Brainstorm AI London 2024: The Future Of Synthetic Media

Victor Riparbelli, Co-founder and CEO, Synthesia
Mati Staniszewski, Co-founder and CEO, ElevenLabs
Moderator: Jeremy Kahn, FORTUNE
Transcript
00:00 - Hi, everybody. Welcome back. I hope you had a good break. I'm very excited to be joined
00:04 by the CEOs of two of the hottest startups in the generative AI space
00:10 at the moment. And as you heard from Ellie, both working in synthetic media, which is
00:16 maybe using text to generate something, but maybe also using existing video, still images,
00:22 or voice to generate media and content. To show you how this works, first, we've got
00:27 two very exciting demos. First, we're going to go to Victor, who's going to show you a
00:30 little bit about how Synthesia's product works. Victor, go ahead.
00:34 - Thank you so much. I think we should get a visual in just a second here, but maybe
00:38 before we jump into that. So at Synthesia, we're on a mission to make video easy for
00:42 everyone. That's something a lot of people have attempted before us, but we take kind
00:46 of a new approach to this, right? We don't think of this as building smaller, more affordable
00:50 cameras. We don't think of this as like slightly better editing apps that run on your phone.
00:54 We're actually building technology that eventually is going to be able to replace the entire
00:58 physical production process of using studios, cameras, actors, microphones, and all that
01:02 stuff. We want to take that entire workflow and make it into something you can do entirely
01:06 from behind your desk. Today, we're a SaaS platform, and we help predominantly enterprise
01:12 create more videos to communicate better with their stakeholders, which could be employees,
01:16 could be customers. And I think we'll get a visual up here now of how it actually works
01:21 and how easy it actually is to do. So this is the platform, and in some ways, it's sort
01:26 of modeled a little bit on PowerPoint. That's how easy we want it to be to use. What
01:31 we're showcasing here is our AI video assistant, which uses LLMs to essentially help you get
01:35 to a draft of your video in just a few moments. So what you're seeing here is someone putting
01:42 in the URL to an article, of course, a Fortune article, because we're here today, giving
01:47 the system a little bit more context of what it is that we're trying to do with this video.
01:52 And in just a second, the system will then go in, it will parse the content, and it will
01:57 give us a draft, an editable copy that we can then kind of tune to what exactly we want
02:03 it to be like. Now, in the context of an enterprise, this may not be an article like this. This
02:09 would probably be a knowledge-base article. It could be a case study that you want to
02:13 share with your customers or anything else really you want to turn into a video. So as
02:16 you can see here now, we begin to actually write the script. And it is, of course, using
02:22 all the data that it's kind of taking from this URL. The idea here is not that this is
02:27 a final video that you can just publish, but it is to take you 70, 80% of the way. Instead
02:32 of having what we internally call the blank screen of death, we give you something to
02:37 actually begin kind of working from. And we don't just write the script. We actually also
02:40 give you the visuals. So as you'll see here, there's some text on the screen. There'll
02:44 be some bullet points. There'll be a bunch of other things that you can sort of edit
02:48 yourself. Of course, everything can be edited. We have these avatars, which is the sort of
02:52 core of our product. And I think after this video is done, we'll of course see what the
02:56 video looks like. But it's just to give you a quick visual of how easy it actually is
03:00 to make videos with these systems. And it's one of the things that excites us the most
03:04 about this new wave of generative tools. It's awesome if we can make it easier for Hollywood
03:10 people and video production professionals to make video. But I think what's much, much
03:13 more interesting is taking everyone in the world, including everyone here, and making
03:17 you into video creators. In some ways, it sounds a bit science fiction. In other ways,
03:23 I think it's just natural evolution. If you go back 40, 50 years in time, it wasn't part
03:28 of most people's job to write text, right? You had secretaries. You had people in companies
03:33 who would write things on typewriters and send them to the right people. Then we all
03:36 got computers and keyboards. And now I'm pretty sure everyone in this room, you write as part
03:41 of your daily tasks, right? Then PowerPoint came along. We all became designers without
03:46 really knowing. Probably all of you here know how to operate a PowerPoint. And obviously,
03:51 we think that the next iteration of this is going to be video, which is just a much, much
03:55 better way of compressing information than text, at least for most people out there.
04:01 I think with that, let's see the demo of what the final video looks like. This uses some
04:04 of our latest technology, our latest model called Express One, which essentially teaches
04:09 the avatars how they should perform and behave. But let's see the video.
04:13 Imagine stepping into a room where the air crackles with the energy of innovation, a
04:17 place where the future of AI and its impact on our world is not just discussed, but shaped.
04:24 Welcome to Fortune Brainstorm AI in London, a gathering that promises to be the nexus
04:28 of AI's brightest minds from leading technology companies.
04:34 At Fortune Brainstorm AI in London, expect to dive deep into conversations that matter.
04:40 Hear from Google DeepMind's Vice President and Faculty AI CEO as they unveil the future
04:46 of AI and its transformative potential on society. Microsoft's Chief Scientist alongside
04:52 Accenture's Chief AI Officer will explore the profound changes generative AI is set to
04:57 bring to the workplace. This event is not just about listening, but engaging with roundtable
05:03 sessions and ample opportunities for networking. I hope you enjoy the event.
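[Editor's note: a minimal sketch of how an article-to-script assistant like the one demoed above might work. It assumes the requests, beautifulsoup4, and openai Python packages; the prompt, model name, and URL are illustrative, not Synthesia's actual pipeline.]

```python
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

def draft_video_script(article_url: str, context: str) -> str:
    # Fetch the article and strip the HTML down to plain text.
    html = requests.get(article_url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    # Ask an LLM for an editable first draft -- the "70, 80% of the way" step.
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Write a short, scene-by-scene video narration script."},
            {"role": "user",
             "content": f"Context: {context}\n\nSource article:\n{text[:8000]}"},
        ],
    )
    return response.choices[0].message.content

# Hypothetical usage, mirroring the demo:
print(draft_video_script("https://fortune.com/example-article", "Promote Brainstorm AI London"))
```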
05:09 - Great. It's very impressive given how quickly it all generates. - Yeah, absolutely.
05:16 Thank you. - And now we'll hear from Mati about 11 Labs,
05:20 which, as you may know, does voice cloning. Let's see how this works.
05:23 - Thank you, Jeremy. I will attempt to do it live, and maybe first a very quick introduction.
05:29 11 Labs is an audio AI research and deployment company with the goal of making content universally
05:35 accessible across voices and languages. And what I'll show you now is three demos, quick
05:40 demos of how some of those technologies come to life. You should get a visual any second
05:45 now on the screen. And the first of those building blocks is one that most of you might
05:51 be familiar with: foundational text-to-speech. And that's one of the breakthroughs
05:55 that we've figured out: how you can take existing text and, by understanding the nuance
06:01 of the context, turn it into emotion and the right intonation with a specific
06:06 voice. So, cueing it up, I will use 11 Labs to help me with the introduction to all of you
06:12 and we'll see how that comes across.
06:14 Ladies and gentlemen, welcome to the Fortune AI Brainstorm Summit. We are thrilled to have
06:21 you join us from around the globe for this dynamic gathering of minds and ideas. Today,
06:29 we stand on the brink of new discoveries and innovations in the field of artificial intelligence
06:35 that promise to redefine what's possible in business, society and beyond.
06:44 You could hear the pauses, you could hear the intonation, you could hear the excitement,
06:47 and also a little bit of pondering, with the "hmm" as the AI was speaking. And this is one of
06:52 the voices from one of the voice actors we work with. And actually there are plenty of those
06:58 voice actors as part of our platform. We now have over a thousand such voices from actors
07:03 who created clones of their voices and then shared them; every time a voice is generated, they
07:07 earn compensation in return as well. And whether you want this voice, a voice with
07:11 an Australian accent, with a different style or gender, this is all possible within the
07:16 platform. But this is not the end. One of the things that's super exciting about
07:20 the technology is that it not only allows you to create the content in the language of the
07:24 speaker, but actually take the content and turn it into other languages while preserving
07:29 the same voice and the same characteristics. I'll swap over to a demo that some of you
07:36 might be familiar with. It's John F. Kennedy's moon speech. And we'll see how
07:42 initially he speaks in English. And then I will flip it to Spanish and to Hindi live.
07:48 And you'll see how some of those characteristics come across while we play the demo.
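[Editor's note: a minimal sketch of the foundational text-to-speech Mati describes, using 11 Labs' public HTTP API. The endpoint shape follows their published docs at the time of writing, but treat the voice ID, model name, and key handling as placeholder assumptions; the multilingual model is what lets the same voice speak other languages, as in the dubbing demo.]

```python
import requests

ELEVENLABS_API_KEY = "your-api-key"   # placeholder credential
VOICE_ID = "your-voice-id"            # placeholder: any voice from the platform

def synthesize(text: str, out_path: str = "speech.mp3") -> None:
    # POST the text; the API returns raw MP3 audio bytes.
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVENLABS_API_KEY},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        timeout=60,
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)

synthesize("Ladies and gentlemen, welcome to the Fortune AI Brainstorm Summit.")
```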
07:53 >> But why, some say, the moon? Why choose this as our goal? And they may well ask, why climb
08:04 the mountain? Why do we choose to go to the moon? Why do we choose to go to the moon?
08:15 >> Three, two, one, zero. Take off.
08:21 >> We chose to go to the moon this decade to do other things, not because they are easy,
08:27 but because they are difficult. Because the goal is to provide our energy and skill and
08:34 to measure it. Because it is a challenge that we are ready to accept. One that we don't
08:41 want to avoid. And we intend to win.
08:46 >> That's one small step for man, one giant leap for mankind.
08:53 >> You could see the emotions come across as the liftoff and the moon landing happened. And that's
08:58 something we're excited about. How we can take that content and enable those global
09:01 audiences to watch it while still enjoying that original experience. But of course in
09:06 the audio world, there's so much more than speech and the voices. And as we think about
09:10 the future of our work, it's about how we can enable some of that work across some of the
09:15 tangential domains. The sound effects, the music. How we can bring it all together with
09:19 the video and truly make it an immersive experience. And to close it off, I'll show you
09:25 something that you might be familiar with from text-to-video work with Sora from OpenAI,
09:29 and with Synthesia as well, where we'll take some snippets from the videos and we'll
09:34 supplement it with AI-generated sound effects. What we're trying to do is hit that unmute
09:39 button with those videos that you've probably seen online out there.
09:42 [VIDEO PLAYBACK]
10:06 - In a place beyond imagination, where the horizon kisses the heavens, one man dares
10:12 to journey where few have ventured. Armed with nothing but his wit and an unyielding
10:17 spirit, he seeks the answers to mysteries that lie beyond the stars.
10:22 [END PLAYBACK]
10:25 And that's the goal. To make all the content out there accessible across voices,
10:30 across languages, across sounds. And thank you.
10:34 [APPLAUSE]
10:40 - Well, I think you all agree those are pretty impressive demos. And amazing
10:45 technology, also a little bit scary. And I think looking at this, a lot of people
10:49 immediately think about some of the negative use cases around deepfakes, around
10:54 fraud. I know there's already been some concern around 11 Labs, whether people have
10:59 used voice clones to perpetrate frauds. How are you guys trying to prevent that
11:05 from happening and to make sure these technologies are not used for harmful
11:10 uses? Victor, maybe you go first.
11:13 - Yeah, sure. I think as with most new technologies, we immediately go to all the
11:17 things that can go wrong, right? And I think in this case, that's definitely right.
11:20 These technologies will be used by bad actors to do bad things, for sure. I think
11:24 we should not be naive about that at all. And so for us, safeguarding
11:28 the technology has always been part of the company from day one. So we founded the
11:32 company on what we call our ethical framework, which is the three Cs: consent,
11:36 control, and collaboration. It's a long topic, but I think the kind of
11:40 fundamental keystone for us is around consent. So never, ever recreate anyone's
11:46 voice or video avatar without their explicit consent.
11:49 - And how do you guarantee consent?
11:51 - So we have a KYC-style check when you go through, right? You submit your avatar
11:54 footage, it's reviewed by a human being. You have to say out loud some specific
11:58 sentences to make sure you can make your clone. So essentially, it is impossible
12:01 today to go in and take some YouTube videos or something and make a clone of
12:04 someone. That is just not possible. And we intend to keep it that way.
12:08 Control is about content moderation, so we employ pretty heavy content moderation
12:12 and take quite a strong stance on what you're allowed to create and what you're not
12:14 allowed to create. I think for most of us, the sort of what we call the red
12:18 content is easy to agree on, hate speech, violence, swearing, things like that.
12:22 But what gets harder is, well, someone's making a video about cryptocurrencies,
12:25 for example, right? Are you talking about what a great technological invention that
12:28 is, or are you trying to lead me into some fraudulent scheme that promises I'm going
12:32 to get rich in 10 days, right? And we do our best to catch that stuff.
12:36 It's a huge effort. It's, of course, a lot of computers and AI, but it's also a lot
12:40 of humans sort of figuring out who are the bad faith actors on the platform.
12:45 So that's essentially an internal product for us. That said,
12:49 it is an incredibly small number of users we catch trying to do this,
12:52 but we do think it is our responsibility to take some stance on that.
12:55 And the last one is...
12:56 - Okay, good.
12:57 - No, that's fine.
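[Editor's note: a hedged sketch of the automated first pass in the content moderation Victor describes: clear "red content" is rejected outright, while harder, context-dependent cases, like the cryptocurrency example, are routed to human review. The categories, patterns, and policy are illustrative, not Synthesia's actual system.]

```python
import re

# Stand-in patterns for the "easy to agree on" red content.
RED_PATTERNS = [r"\bhate speech\b", r"\bviolence\b"]
# Stand-in patterns for borderline content that needs a human look.
REVIEW_PATTERNS = [r"crypto(currency)?", r"guaranteed returns", r"get rich"]

def moderate(script: str) -> str:
    lowered = script.lower()
    if any(re.search(p, lowered) for p in RED_PATTERNS):
        return "reject"        # clearly disallowed content
    if any(re.search(p, lowered) for p in REVIEW_PATTERNS):
        return "human_review"  # the hard, context-dependent cases
    return "allow"

print(moderate("Invest now and get rich in 10 days!"))  # -> human_review
```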
12:58 - No, Mati, I want to get you in there. How are you at 11 Labs trying to
13:00 prevent this sort of misuse?
13:01 - Yeah, I think, first of all, Jeremy, you're right. Deepfakes are a serious
13:04 concern as we think about this year, with AI-generated content across audio,
13:07 images, and videos coming across. And it's something that's a concern because
13:12 open-source and other commercial models are becoming more active, and there will be
13:16 bad actors that will use and abuse them for nefarious cases. At 11 Labs,
13:21 the first piece is education, making everyone aware of the technology. Second is traceability.
13:27 All the content that's generated by 11 Labs can be traced back to a specific user,
13:30 specific account, and as they use the technology, they will need to verify
13:34 themselves through different steps depending on the technology they use.
13:37 And the last one is detection, and how you can embed the traceability
13:42 as part of detection. So effectively, all the content out there should be known
13:47 as AI-generated, and there should be tools that allow you to quickly get that
13:51 information as a viewer. And we released our tool publicly, which allows everybody
13:55 to check whether a clip was 11 Labs or not and report it back to us.
13:59 - So right now that exists? If I run across some content, an audio clip on the
14:04 internet on X or on LinkedIn or whatever, I can check and see if it was 11 Labs
14:09 generated?
14:10 - Yes, you can. You can directly go on our website, upload that clip,
14:14 and get a probability of how likely it is AI-generated. And now we are working with some partners
14:19 to extend it across other technologies. So some of the open-source models,
14:22 other commercial models, that's part one. And then part two, because as you said,
14:26 you can run into that in social media, but you might not be aware that it's AI-generated
14:30 in the first place. So how we can collaborate with social media,
14:33 with telecommunication companies to check it on the fly while that content is being
14:37 shared.
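[Editor's note: a hypothetical sketch of the check Mati describes: upload a clip, get back a probability that it was generated by 11 Labs. Their classifier is a web tool; the endpoint URL and response field below are invented purely for illustration.]

```python
import requests

def probe_clip(path: str) -> float:
    # Upload the audio file to a (hypothetical) classifier endpoint.
    with open(path, "rb") as f:
        resp = requests.post(
            "https://example.com/ai-speech-classifier",  # placeholder URL
            files={"audio": f},
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["probability"]  # hypothetical response field

score = probe_clip("suspicious_clip.mp3")
print(f"Estimated probability of AI generation: {score:.0%}")
```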
14:38 - I wanted to ask a question about sort of the ethics of some of this.
14:41 There's an interesting use case on the JFK speech. There have already been some
14:45 cases of politicians in various countries wanting to appeal to a certain audience
14:49 in a language that they don't speak. And they have translated their speeches in
14:53 real time, I think maybe even using some of this technology, to appear to speak
14:57 those languages. And they've said, "Oh, that's legitimate. I'm just trying to
15:00 reach a different audience with my content, with my political content."
15:03 But other people have said, "No, that's a complete red flag.
15:06 You don't actually speak that language. You're presenting yourself as something
15:08 you're not." Do you have a policy around this? And also at Synthesia,
15:12 do you have a policy around the creation of political content?
15:15 - Yeah. So we don't allow political content today. I think that will change
15:18 over time. But we take generally a quite permissive stance on what you can use
15:23 these technologies for. We're also mainly serving the enterprise.
15:27 I do think it's a very interesting philosophical question. I think in five
15:29 years' time, something like that, no one will be talking about this anymore,
15:32 and everyone will be doing it. But I think as with any new technology,
15:35 you will have these sort of years, right, where people are trying to figure out
15:38 what's right, what's wrong. You could say, "Is it okay for a politician to have a
15:42 ghostwriter to write a piece in Fortune?" Right? They didn't actually write it
15:45 themselves. Is that worse? Is it the same? Is it okay? So I think, you know,
15:49 all new technologies come with these sort of interesting questions and dilemmas.
15:52 And some of them will turn out to be very important and very true.
15:55 And I think some of them we'll care less about in the future.
15:59 - And Mati, do you have a policy on that?
16:01 - I would certainly echo some of Victor's points as well, where it's going
16:06 to happen at one stage, and now as a society, we need to come together and
16:09 figure out what's the right way of approaching that. Currently, we don't
16:14 allow people to impersonate anyone or pretend that they are one
16:20 of the candidates and make political speeches or political content.
16:25 - Great. I want to take one question from the audience because I think I only have
16:28 time for one. But if there's a question, please raise your hand, and I'll try to
16:31 come to you. Right here, I see there's a gentleman here at this table. If you
16:35 raise your hand again. Oh, that's great. If you could stand up and identify
16:38 yourself.
16:39 - Hi. It's Mark Salmon from the Times. When Sadiq Khan had a deep fake of him
16:44 made, the Met investigated, and they concluded that they couldn't take any
16:48 action. Do you think it's possible to write a law in this country to combat
16:56 against that kind of deep fake, and do you think that's desirable?
17:00 - Should there be laws against deep fakes?
17:03 - I'm obviously not a lawmaker, so I can't give you a definitive answer to that.
17:07 I can give you my personal opinion. I think, ultimately, deep fakes is an
17:11 advanced version of impersonation, and impersonation is, in most cases, illegal
17:16 if you do it for deceptive reasons like fraud, for example. I definitely think we
17:19 should include deepfakes in that, maybe even make the punishment harsher if you're
17:23 using a deep fake for malicious impersonation. But where to draw the lines
17:27 legally, I think, is very difficult to say. That's not my expertise.
17:31 - And to add one quick note to that, and as you rightly asked, this content is out
17:36 there. There's a plethora of tools that can generate that content, and what's the
17:39 true solution? There's definitely the legislation that countries can pass,
17:43 but then also on the technical level, what we can do, and beyond traceability and
17:46 being able to detect it, as you think a few years out, how this could work as part
17:50 of society. And one thing we are advocating for is, beyond watermarking
17:54 what's AI-generated and approved AI, maybe there's a version of watermarking what's real
17:58 content, and for Sadiq Khan or any other candidate to be able to transmit a
18:03 message, and that message is decoded: "This is really Sadiq Khan speaking."
18:07 So we don't only detect what's fake, but actually verify what's true.
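[Editor's note: a minimal sketch of the "verify what's true" idea: a candidate signs a message with a private key, and anyone can check it against the published public key. It uses Ed25519 from the Python cryptography package and illustrates the concept only, not any specific provenance or watermarking standard.]

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The candidate generates a key pair once and publishes the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

message = b"This is really Sadiq Khan speaking."
signature = private_key.sign(message)

# A viewer or platform verifies the signature before trusting the content.
try:
    public_key.verify(signature, message)
    print("Signature valid: the message is authentic.")
except InvalidSignature:
    print("Signature invalid: do not trust this message.")
```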
18:12 - Excellent. We're out of time. Thank you very much for the demos, and thank you
18:15 very much for the conversation. Put your hands together, please, for Mati and
18:18 Victor.
18:20 ♪ [music] ♪
