
Synthetic data is going to eat you(r job)! with Doug Guion - Webinar recap

Yabble January 15, 2024

In an era where the boundaries between reality and digital innovation are increasingly blurred, the realm of data collection and utilization is undergoing a transformative shift.

Below is a partial transcript of our recent webinar 'Synthetic Data is going to eat you(r job)!', presented by Doug Guion, an industry expert with over 25 years of experience in the data collection and market research space, who delves into this fascinating transition. The focus is on how artificial intelligence, specifically generative AI and synthetic data, is revolutionizing the way we think about and interact with data.

Prepare to explore how these technological advancements are reshaping the market research industry, the implications for strategic planning, and the urgency of embracing these changes to stay ahead in an increasingly AI-driven world.




I want you to imagine that, when it comes to data collection or the way you typically think about creating data, the line between reality and simulation has already started to blur significantly. Thinking about data as facts and figures that have been curated through one method or another, that you then use and set aside, is changing into a partnership where it's the data itself you're partnering with. You learn from the data of course, but the data also quite literally learns from you. And the more you interact with that data, the more knowledgeable it gets and the better it is at answering your questions.

This is also something that is here today. Viewing AI, generative AI, or synthetic data creation as just another tool, like an add-on to a dashboard or a platform, doesn't give it the right context, because that says there's something I'm doing today that I can now do a little more efficiently or a little faster. What AI is giving us is an entirely new way of doing something, where in many cases you'll be able to forgo the way that you're doing it today.

And most importantly, it's here. It's here right now. The train that is AI is barreling at all of us at 1,000 miles an hour.

Even if, from a strategic planning perspective, you're on the right track, you're talking about it in the boardroom, you're talking about it in your planning sessions – if you're not taking action right now, if you stand there too long, even if it's on the right track, you're going to get hit by the train.

So today is about a brief education on what's happening, what synthetic data is, and most importantly, the need to get started, at least from an exploration perspective, because if you wait too long, it's probably going to prove to be too late.

What do we know about AI and synthetic data collection right now?

So along those lines, what do we know about AI and synthetic data collection right now, from a market perspective, from a trajectory perspective? Not a lot, right?

There are serious people, people who are taken seriously, who are exploring synthetic data and saying that they're finding interesting results, and then there are people who are sizing the market in the hundreds of millions, the billions, and the multi-billions.

We don't know exactly where it's going to go, and anyone who says that they do is lying to you, or trying to get your attention, but what I can say with some degree of confidence is that no one's saying it isn't going to be a market.

It's a matter of how large and how quickly. To help substantiate that, another measure that's often looked at, from an adoption perspective or a trajectory/momentum perspective, is how quickly any individual software platform gets to 100 million users.

And as you can see, ChatGPT got there in two months. I think the mean across this set is about 40-ish months, so quite a bit faster and over four times more quickly than its nearest adjacent competitor in this regard, TikTok.

So, while the success of ChatGPT isn't necessarily a bellwether for what's going to happen with AI at large, or with synthetic data specifically – the point is it grabbed attention at critical mass more quickly than anything that I've ever seen before.

Look at people like John Wren from Omnicom, who's saying that it took us 15 years to transform digital marketing and AI is going to do that in 3 to 5.

I think that's way too conservative. I think in three years, everything that we do today will be upside down.

I think in the next 12 to 18 months, the vast majority of the way in which we currently collect data will be completely replaced, or a sizable 60 to 80 percent of that share will have been converted to AI methods.

A new way to think about data creation...do you really need a project?

An example that I use when I'm talking to prospects is this: imagine that you live in New York and you have a customer in Los Angeles and you have to see them in person every week for whatever reason – that's a necessary component of the relationship.

There are a number of ways you can get from New York to LA. You can walk, you can drive, you can take a train – none of which are the fastest.

So typically people will fly there. That's the fastest current method of getting to California from New York. But what if there was a teleportation option?

Suddenly, instead of having to travel to the airport, book your ticket, go through security, fly there, land, take an Uber, and get to the appointment, you could go to a station and immediately be at your destination in a moment.

Have you lost any value that you're going to give to your customers by getting there more quickly? And my proposition is that you haven't.

The analogy to data collection is this. Set aside quality measures, which are important no matter what you're doing, i.e. it needs to be safe to drive, fly or teleport somewhere. If you don't have to program the survey, if you don't have to go to a panel, if you don't risk having your programming messed up, if you don't have to worry about fraud and PII considerations, if you don't have to write a tab spec, and you can start with a thought and end with an artifact of data that you can trust, that's verifiable and substantive, have you lost any value or have you just gotten to your endpoint more quickly?

My proposition is that you haven't, and that is what is happening with synthetic data. It is allowing people who use the very traditional methods of collecting data that we all use, because currently they're the best way to do it, to experiment with an approach where you start with a thought, you move at the speed of thought, you end up with a result and you can then have a conversation with that result, and that can be recursive.

So what I'm trying to tell people is you can get rid of the project construct – you don't have to work just within a project. You can just have a set of ideas or hypotheses and then test them actively using synthetic data.

But, what is it? What is synthetic data?

But what is it, right? Synthetic data is often thought of as something imaginary. As if there's a computer like HAL from the Kubrick movie (2001: A Space Odyssey) that is making things up, right? Or as if it's just going into ChatGPT or a large language model, which can sometimes tell you things that aren't true.

So is that what synthetic data is? If it was, I wouldn't be on this webinar today.

Synthetic data is generated using artificial intelligence techniques, with an infrastructure similar in nature to what we use in the social sciences when we interview people, in the way we structure our instruments. So there is a rigor that goes into how you collect good synthetic data, in the same way that there is for what you might call people-sourced data.

And it's already being used today. This isn't an abstract fantasy that may come at some point in the future. In software development, all of the autonomous vehicle work that's been done (Tesla and all the other companies that are doing driverless cars) is using synthetic data now to test not only real-world scenarios, but also edge cases that are hard to see expressed in a sample population. So synthetic data has actually been present for a number of years, but in terms of its advent in market research, being able to answer consumer questions using synthetic data, that's very new, and it's come with the power of generative AI.

What makes good synthetic data in market research?

Obviously, there are some things that are part and parcel.

It's faster. It's definitely cheaper. You don't have HIPAA. You don't have CCPA or GDPR. You don't have PII. You don't have any of that because you don't need it.

You're not having to worry about the vagaries that come if you want to do an adult beverage survey in the US, where the rules can differ from state to state.

You can forgo all of that and just ask the question directly. The scalability is infinite. It's as much knowledge as is available.

And the accuracy is high as long as it's done well. That's important to remember, because there are going to be a lot of synthetic data offerings coming out, from companies that you trust or from companies that are simply going to start natively in that space, and like all things, some of them will do it well, with rigor, and some of them won't. It is quite possible to take a standalone data set, a large language model, a deep learning database, a customer database that you have, whatever it is, deploy generative AI on top of it, and say, here's a synthetic offering, ask questions.

And it's not always easy to tell the difference between a good answer and a bad answer. So like with all technologies, it's important to understand how is that data being sourced? How is it structured? What are the rigors that go into it? And if synthetic data requires people or days to be able to get back to you, it's not real.

That means there are people behind the curtains who are pulling switches and just claiming it to be AI. True synthetic data is a lot more than just going in and asking a large language model to act as a consumer that you've profiled.

There's a great deal more that goes into making it good. So this is the cautionary tale: get excited about it because it's real and it's here, but enter it with the right degree of skepticism and the right probing questions, to make sure that, as you're evaluating which synthetic offering is the right one to go with, you have baseline questions that you can compare across all of them.

I often talk about synthetic data as a new animal that's been released into the wild, and what that typically does is make people afraid. When there's a new animal, a new species that you've never seen before, you don't understand if it's dangerous. You don't know how to interact with it. What that does with generative AI is it stops people in their tracks. I'm excited about this but I'm not sure if it's going to take my job, is it going to destroy my house, is it going to eat me and my children – I don't know how to trust this. And what that inaction then typically does is make people late adopters.

Fine. You're a late adopter, there's crossing the chasm, there are the early adopters. I think most of that model is gone because of the speed with which AI moves.

If you wait too long, then the expectations that you have in terms of future business prospects or recurring revenue can change so quickly. The sands can shift so fast that you won't have time to recover.

So, here are some examples of long-standing, long-trusted, widely distributed technologies that were replaced nearly instantly with the advent of a new technology, but none of them went at the pace of AI.

Synthetic Data is going to eat you(r job) - Slides

You can't wait to start finding out - you need to explore synthetic data offerings now

This is a call to action. It's fine to have a bit of fear, it's fine to have a bit of reservation and want to understand, but you can't wait to start finding out.

You need to start doing that exploration now. Be curious, right? If you're a researcher, you're actually a curious person. You like to ask questions. So do that. Take a look at the way that your projects are structured.

Take a look at the way that people are buying from you and the types of things that you're selling to people and ask yourself whether it's possible to do them with AI, with generative AI.

And then go out and test it. Imagine a customer that is currently buying from you sitting in a meeting with their VP who has 'a really good idea' that's actually a terrible, terrible idea.

And it takes a week or two to prove that because you have to go out and do research. Imagine if the woman who's getting on the train could, in the space of the meeting, pick a topic, ask the questions, source the audiences, and then within 20 minutes start having a conversation with the VP who came up with the idea, and leave the meeting with an empirical, substantiated understanding of what to do. Green light, red light.

That's where we're going, so test it. And if you're not sure whether it works, what I tell people is to ask a question you already know the answer to. The confirmatory curve is very low with AI, because you can take data that you already have and understand and ask it questions. Or you can recreate a study that you've done recently, run a test, and find the degree of similarity or the degree of veracity that you're looking for.
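One way to put a number on that "degree of similarity" is to compare how answers are distributed in a study you already trust against the synthetic answers to the same question. The sketch below is only an illustration, not how Yabble or any other vendor scores results: the function names are hypothetical, the response lists are made-up sample data, and total variation distance is just one reasonable choice of measure (0 means identical answer shares, 1 means no overlap).

```python
from collections import Counter


def answer_shares(responses):
    """Return the share of respondents who chose each answer option."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = len(responses)
    return {option: n / total for option, n in counts.items()}


def total_variation_distance(real, synthetic):
    """Half the summed absolute difference between the two sets of answer shares."""
    p = answer_shares(real)
    q = answer_shares(synthetic)
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


# Made-up illustration data: one closed question asked of 100 real
# respondents and 100 synthetic respondents.
real = ["yes"] * 60 + ["no"] * 30 + ["unsure"] * 10
synthetic = ["yes"] * 55 + ["no"] * 35 + ["unsure"] * 10

distance = total_variation_distance(real, synthetic)
print(f"similarity: {1 - distance:.2f}")
```

How close is close enough is a judgment call for each study. Running the same comparison across several questions you already know the answers to gives a baseline you can use to compare one synthetic offering against another.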

Getting started with generative AI & synthetic data for market research

If you're interested in getting started, if you want to try something out, get in touch with Yabble to try Virtual Audiences today.

Take a topic of your choice. It's very inexpensive. It's a quick way to find out whether you have the right understanding and depth of knowledge with synthetic data and how it performs against something that you have a question about.


To watch the full webinar and see the Q&A portion of the session, visit the Yabble AI Academy and register to view our past webinars on-demand.