Meta’s Chief AI Scientist Yann LeCun discusses why he supports open-source large language models and why models need to live in the world to achieve autonomy.
Subscribe to FORBES: https://www.youtube.com/user/Forbes?sub_confirmation=1
Fuel your success with Forbes. Gain unlimited access to premium journalism, including breaking news, groundbreaking in-depth reported stories, daily digests and more. Plus, members get a front-row seat at members-only events with leading thinkers and doers, access to premium video that can help you get ahead, an ad-light experience, early access to select products including NFT drops and more:
https://account.forbes.com/membership/?utm_source=youtube&utm_medium=display&utm_campaign=growth_non-sub_paid_subscribe_ytdescript
Stay Connected
Forbes newsletters: https://newsletters.editorial.forbes.com
Forbes on Facebook: http://fb.com/forbes
Forbes Video on Twitter: http://www.twitter.com/forbes
Forbes Video on Instagram: http://instagram.com/forbes
More From Forbes: http://forbes.com
Forbes covers the intersection of entrepreneurship, wealth, technology, business and lifestyle with a focus on people and success.
Transcript
00:00 Thank you, Yann, and welcome.
00:08 And my God, thank you for this.
00:09 It's the highlight of my year, the opportunity to talk to you.
00:12 I don't know what you can see right now, but there are 2,000 of the smartest people on
00:15 the planet watching you from Cambridge.
00:18 And boy, what an opportunity to pick your brain.
00:21 He's in stereo.
00:22 Look at that one.
00:23 Well, I can see them from the back.
00:25 Okay.
00:26 Yeah, actually, if you want, Yann, you can see their faces behind you too.
00:29 So Yann, what an amazing coincidence.
00:33 Llama 3 dropped just while we were meeting today.
00:36 What are the odds?
00:37 That's unbelievable.
00:38 Absolutely staggering.
00:42 So what came out today was Llama 3: 8B, 8 billion parameters, and 70B.
00:47 So far, what we're hearing in the rumor mill is that the 8B performs as well as the old
00:51 Llama 2 70B did.
00:53 So we're looking at an order of magnitude change.
00:55 Does it sound about right to you?
00:56 Also, I noticed it was trained on 15 trillion tokens.
00:59 Where did you come up with 15 trillion tokens?
01:03 Okay.
01:04 So the first thing I have to say is that I deserve no credit whatsoever for Llama 3.
01:13 Maybe a little bit of credit for making sure our models are open source, but the technical
01:21 contributions are from a very large collection of people, and I have a very, very small part
01:28 in it.
01:29 15 trillion tokens.
01:30 Yeah.
01:31 I mean, you need to get all the data you can get, all the high-quality public data, and
01:35 then fine-tune, and license data and everything.
01:41 So that's how you get to 15 trillion.
01:43 But that's kind of saturating.
01:46 There is only so much text you can get and that's about it.
01:49 Well, I got to say, I owe a big fraction of my life journey to you.
01:54 You didn't know it, but when you were doing optical character recognition way back in
01:58 the day, I was reading your CNN papers.
02:01 He invented convolutional neural nets, which really made those things work.
02:05 The very first dollar of revenue I ever made in a startup came from doing neural
02:08 networks based on your work.
02:10 Changed the course of my life.
02:11 Now you're doing it again, especially for you young folks in the front here.
02:16 By being the champion of open source, I think you're fundamentally giving them an opportunity
02:20 to build companies that otherwise wouldn't be able to be built.
02:23 So first of all, a huge debt of gratitude for you for championing that.
02:31 So the next thing that happens could be one of those events we look back on in history
02:36 and say that was a turning point for humanity.
02:39 The 750B monster neural net that will come out soon will also be open source, I assume?
02:45 405B, from what I gather.
02:49 About 400 million.
02:50 About 400 million?
02:51 A billion.
02:52 A billion, yeah.
02:53 Dense, not sparse, which is interesting.
02:58 So yeah, it's still training, despite all the computers we have our hands on.
03:05 It still takes a lot of time, takes a lot of time to fine tune.
03:09 But it's going to come out.
03:10 A bunch of those variations of those models are going to come out over the next few months.
03:15 Yeah, I was going to ask that question next.
03:17 So they didn't come out concurrently, which is interesting, which means it must still
03:21 be in the training process.
03:22 It's such a massive endeavor.
03:24 And I saw in the news that Facebook had bought another 500,000 NVIDIA chips, bringing the
03:30 total to about a million, by my math, unless you got a discount.
03:33 You might have gotten a volume discount.
03:35 But that's $30 billion worth of chips, which would make the training of this model bigger
03:41 than the Apollo moon mission in terms of research and development.
03:46 Am I getting that about right?
03:48 It's staggering, isn't it?
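For reference, the host's figure is straightforward arithmetic once you assume a per-GPU price; a rough sketch, where the ~$30,000 H100-class price is an assumption and not a number from the conversation:

```python
# Back-of-envelope check of the "$30 billion worth of chips" figure.
gpus = 1_000_000            # "about a million" chips, per the host's math
price_per_gpu_usd = 30_000  # assumed H100-class price; not quoted in the talk
total_usd = gpus * price_per_gpu_usd
print(f"${total_usd / 1e9:.0f}B")  # -> $30B
```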
03:49 Yeah, I mean, a lot of, you know, not just training, but also deployment is limited by
03:56 computational abilities.
03:59 I think one of the issues that we're facing, of course, is the supply of GPUs.
04:05 That's one of them, and the cost of them at the moment.
04:09 But another one is actually scaling up the learning algorithm so that they can be parallelized
04:13 on lots and lots of GPUs.
04:16 And progress on this has been kind of slow in the community.
04:22 So I think we're kind of waiting for breakthroughs there.
04:25 But we're also waiting for other breakthroughs that are, you know, in terms of architectures,
04:29 like new principles, new, like brand new blueprints with which to build AI systems that would
04:36 enable them to do things they can't do today.
04:38 And so since you brought it up, the philosophy of taking an investment that size and then
04:44 open sourcing it, there's no historical precedent for this.
04:47 And the equivalent would be, you know, if you built a gigafactory that builds Teslas
04:54 and somehow gave it to society.
04:56 But the thing is, once you open source it, it can be infinitely copied.
04:59 So it's not even a good analogy to talk about a gigafactory being open sourced.
05:03 So there's no precedent for this in business history.
05:05 What's the logic behind making it open source?
05:07 What do you want to see happen from this?
05:10 Well, so what's happened, I mean, certainly the whole idea of open sourcing infrastructure
05:15 software is very prevalent today.
05:19 And it's been in the DNA of Meta, you know, Facebook before that, since the beginning.
05:24 There's a lot of open source packages that are basically infrastructure software that
05:30 Meta has been open sourcing over the years, including in AI, right?
05:34 So everybody is using PyTorch.
05:36 Well, everybody except a few people at Google, but pretty much everybody is using PyTorch.
05:42 And that's open source.
05:45 It was built originally at Meta.
05:47 Meta actually transferred the ownership of PyTorch to the Linux Foundation.
05:52 So it could be much more of a kind of community effort.
05:56 So that's really in the DNA of the company.
05:58 And the reason is, you know, infrastructure becomes better, faster, when it's
06:03 open source, when more people contribute to it, when there are more eyeballs looking
06:08 at it; it's more secure as well.
06:10 So what is true for, you know, internet infrastructure software is also true for AI.
06:17 And then there is the additional thing for AI, which is that foundation models are so
06:22 expensive to train.
06:25 It would be a complete waste of resources to, you know, have 50 different entities training
06:30 their own foundation model.
06:31 I mean, it's much better if there is only a few, but they make them open.
06:35 And that basically creates the substrate for a whole ecosystem to take off.
06:42 And it's very much the same thing that happened to the internet in the 90s.
06:45 If you remember, in the mid 90s, when the internet started to get popular, the software
06:52 infrastructure was dominated by proprietary platforms from either Microsoft or Sun
06:59 Microsystems.
07:00 And they both lost.
07:01 They kind of disappeared from that market.
07:03 Now it's all Linux, Apache, you know, MySQL, PHP, whatever, you know, all the open source
07:10 stuff, even the core of web browsers is open source.
07:15 Even the software stack of cell phones and cell phone towers is open source nowadays.
07:21 So infrastructure needs to be open source.
07:23 It just makes it progress faster, be more secure and everything.
07:26 Well, I'm so glad to hear you say that because there are definitely diverging philosophies
07:30 on that if you think about where OpenAI is going and where you're going.
07:34 But the version of the world that you're describing is one where all of these startups and all
07:39 of these teams can thrive and be competitive and create and innovate.
07:44 And the alternate version is the one where strong AI is invented in a box and is controlled
07:48 by a very small group of people, and all the benefit, you know, accrues to a very small
07:52 group.
07:53 So I don't have skin in the game on this, but I certainly love your version of the future
07:59 a lot more than alternate versions.
08:01 So very, very glad to hear you say it.
08:04 So I want to spend a lot of our time or limited time that we have talking about the implications
08:09 of this and where you see it going.
08:10 I also want to ask you about V-JEPA.
08:12 So you've been very clear in saying that LLMs will take us down a path, incredible things
08:17 we can build, but it's not going to get you to a truly intelligent system.
08:22 You need experience in the world.
08:25 And V-JEPA, I think, is your solution to that.
08:27 Is that going to carry us to that goal?
08:30 Tell us about V-JEPA, first of all.
08:31 Okay.
08:32 Well, first of all, I have to tell you where I believe AI research is going.
08:38 And I wrote a fairly long kind of vision paper about this about two years ago that I put
08:42 online that you can look for.
08:45 It's on OpenReview.
08:46 It's called A Path Towards Autonomous Machine Intelligence.
08:48 I've replaced "autonomous" with "advanced" now because people are scared by the word autonomous.
08:54 So we have this thing, autonomous or advanced machine intelligence, which is spelled AMI.
08:59 And in French, you pronounce it "ami."
09:02 That means "friend" in French, which I think is a good analogy.
09:07 Anyway, current LLMs are very limited in their abilities.
09:13 And Stephen Wolfram, just before, actually pointed to those limitations as well.
09:20 One of them is they don't understand the world.
09:23 They don't understand the physical world.
09:25 The second one is they don't have persistent memory.
09:28 The third one is they can't really reason in the sense that we usually understand reasoning.
09:33 They can regurgitate previous reasoning that they've been trained on and adapt it to
09:41 the situation, but they can't really reason in the sense that we understand it for humans and
09:46 many animals.
09:47 And the last thing, which is also important, they can't really plan either.
09:51 They can, again, regurgitate plans that they've been trained on, but they can't really plan
09:55 in new situations.
09:56 And there are a lot of studies by various people that show the limitations of LLMs for planning,
10:02 reasoning, understanding the world, et cetera.
10:06 So we need to basically design new architectures, which would be very different from the ones
10:11 we currently have that will make AI systems understand the world, have persistent memory,
10:16 and plan, and also be controllable in a way that you can give them objectives.
10:22 And the only thing they can do is fulfill those objectives and not do anything else,
10:27 subject to some guardrails.
10:29 So that's why we make them safe and controllable as well.
10:33 So the missing part is how do we get AI systems to understand the world by watching it a little
10:39 bit like baby animals and humans?
10:41 It takes a very long time for baby humans to really understand how the world works.
10:45 The fact that an object that is not supported falls because of gravity
10:51 takes nine months for human babies to learn.
10:55 It's not something you're born with.
10:56 It's something you have to observe the world and understand the dynamics of it.
11:01 How do we reproduce this ability with machines?
11:05 So for almost 10 years now, my colleagues and I have been trying to train a system to
11:12 basically do video prediction, with the idea that if you get a system to predict what's
11:16 going to happen in a video, it's got to develop some understanding of the nature of the physical
11:21 world.
11:22 And it's been basically a complete failure.
11:24 And we tried many, many things for many years.
11:28 But then a few years ago, what we realized is that the architectures that we can use
11:33 to train deep learning systems to learn representations of images are not generative.
11:41 They are not things for which you take an image, you corrupt it, and then you train
11:47 a system to reconstruct the uncorrupted image, which is the way we train LLMs:
11:55 we take a piece of text, we remove some of the words,
11:58 and train some gigantic neural net to predict the words that are missing.
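As a rough illustration of the masked-prediction recipe LeCun describes, here is a minimal PyTorch sketch; the model sizes, masking rate, and toy data are illustrative assumptions, not Llama's actual training setup:

```python
import torch
import torch.nn as nn

# Masked-token pretraining: hide some words, predict the missing ones.
vocab_size, d_model, seq_len = 1000, 64, 16
MASK_ID = 0  # reserved id for the [MASK] token (assumption)

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
to_vocab = nn.Linear(d_model, vocab_size)

tokens = torch.randint(1, vocab_size, (8, seq_len))  # a toy batch of "text"
mask = torch.rand(tokens.shape) < 0.15               # hide ~15% of the words
corrupted = tokens.masked_fill(mask, MASK_ID)

logits = to_vocab(encoder(embed(corrupted)))         # predict every position
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # score only hidden ones
loss.backward()
```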
12:02 If you do this with images or video, it doesn't work.
12:04 I mean, it kind of works, but you get representations of images and videos that are not very good.
12:11 And the reason is, it's very difficult to actually reconstruct all the details of an
12:14 image or a video that are hidden from you.
12:18 And so what we figured out a few years ago is that the way to approach that problem is
12:22 through what we call a joint embedding architecture or a joint embedding predictive architecture,
12:27 which is what JEPA means.
12:29 It's an acronym.
12:31 And the idea of joint embedding architecture goes back to the early '90s.
12:35 It's work I did with some colleagues; we used to call them Siamese nets.
12:38 And the idea is basically, if you have, let's say, a piece of video and you mask some parts
12:46 of it, let's say the second half of the video, and then you train a big neural net to try
12:51 to predict what's going to happen next in the video, that would be a generative model.
12:56 Instead of that, we run both pieces through encoders, and then we train a predictor in
13:02 the representation space to predict the representation of the video, not all the pixels of the video.
13:08 And you train the whole thing simultaneously.
13:11 We didn't know how to do this four years ago, and we kind of figured out a number of ways
13:14 to do this.
13:15 We now have half a dozen algorithms for this.
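A minimal sketch of the joint-embedding predictive idea just described: encode both the visible context and the hidden target, and train a predictor in representation space rather than pixel space. The EMA target encoder and all the sizes here are assumptions loosely modeled on Meta's published I-JEPA/V-JEPA recipes, not the exact implementation:

```python
import copy
import torch
import torch.nn as nn

d = 128
context_encoder = nn.Sequential(nn.Linear(512, d), nn.ReLU(), nn.Linear(d, d))
target_encoder = copy.deepcopy(context_encoder)  # updated by EMA, not gradients
predictor = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

x = torch.randn(32, 1024)                 # a "video clip", flattened for illustration
context, target = x[:, :512], x[:, 512:]  # visible first half vs. masked second half

z_ctx = context_encoder(context)
with torch.no_grad():                     # stop-gradient on the target branch
    z_tgt = target_encoder(target)

loss = nn.functional.mse_loss(predictor(z_ctx), z_tgt)  # predict in latent space
loss.backward()

# EMA update of the target encoder (one common anti-collapse trick)
with torch.no_grad():
    for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
        p_t.mul_(0.996).add_(p_c, alpha=0.004)
```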
13:17 So V-JEPA is a particular instance of this kind of architecture.
13:22 And the results are very promising.
13:25 I think ultimately we're going to be able to build or train systems that basically have
13:30 mental world models, have some notion of intuitive physics, have some possibility of predicting
13:35 what's going to happen in the world as a result of taking an action, for example.
13:41 And if you have a model of the world of this type, then you can do planning.
13:45 You can plan a sequence of actions to arrive at a particular objective.
13:49 That's really what intelligence is about.
13:51 That's what we can do.
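To make the planning point concrete, here is a hedged sketch of the simplest version, random-shooting model-predictive control: roll candidate action sequences through a world model and keep the one whose predicted outcome lands closest to the goal. The toy world model below is a stand-in for a trained JEPA-style predictor, not anything from Meta's code:

```python
import torch

def plan(world_model, state, goal, horizon=5, n_candidates=256):
    # Sample random action sequences: (n_candidates, horizon, action_dim).
    actions = torch.randn(n_candidates, horizon, 2)
    s = state.expand(n_candidates, -1)
    for t in range(horizon):
        s = world_model(s, actions[:, t])  # predict the next state
    costs = (s - goal).pow(2).sum(dim=-1)  # distance to the objective
    return actions[costs.argmin()]         # best candidate sequence

# Toy dynamics: next_state = state + action (purely illustrative).
world_model = lambda s, a: s + a
best = plan(world_model, torch.zeros(1, 2), torch.tensor([3.0, 4.0]))
```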
13:52 So I think that's a really critical question, actually.
13:56 When you use diffusion algorithms to create pictures, they'll make six fingers or four
14:00 fingers all the time.
14:01 They never make five fingers.
14:02 But these LLMs have a shocking amount of common sense, yet they're also missing a shocking
14:08 amount of common sense.
14:09 Once you roll in the JEPA data, the V-JEPA data, you give it a lot more of an opportunity
14:14 to think much more like we do, because all the real world experiences of moving around
14:17 and feeling things are folded into the training data.
14:21 So do you think the result of that will then be one massive foundation model?
14:26 Or are we still going to use the mixture of experts approach and glue them together in
14:31 kind of synthetic ways?
14:33 I think ultimately it's probably going to be one big model.
14:37 Of course it'd be modular in the sense that there's going to be multiple modules that
14:42 interact but are not necessarily completely connected with each other.
14:46 There's a big debate now in AI whether if you want a multi-modal system that deals with
14:52 text as well as images and video, should you do early fusion?
14:55 So should you basically tokenize images or videos and then turn them into kind of little
15:00 vectors that you concatenate with the text tokens?
15:04 Or should you do late fusion, which means run your images or video through some sort
15:10 of encoder that is more or less specialized for it and then have some merging at the top?
15:15 I'm more in favor of the second approach, but a lot of the current approaches actually
15:20 are more early fusion because it's easier, it's simpler.
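A toy PyTorch contrast between the two strategies; all module sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

d = 64
text_tokens = torch.randn(2, 10, d)    # 10 text token embeddings
image_patches = torch.randn(2, 49, d)  # 49 image patch embeddings

def enc():
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2
    )

# Early fusion: concatenate all tokens and feed one shared model.
early = enc()(torch.cat([text_tokens, image_patches], dim=1))  # (2, 59, d)

# Late fusion: modality-specific encoders first, then merge at the top.
text_enc, image_enc = enc(), enc()
merge = nn.Linear(2 * d, d)
late = merge(torch.cat(
    [text_enc(text_tokens).mean(dim=1), image_enc(image_patches).mean(dim=1)],
    dim=-1,
))  # (2, d)
```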
15:24 I'm going to do the dangerous thing of asking you to predict the future, but if you can't
15:29 then nobody can, so it has to be you.
15:32 So once you roll in the V-JEPA data and you train these massive models, and suppose you
15:37 go up another 10x, buy another $30 billion or so of chips.
15:43 The combination of the V-JEPA data plus this massive scale, will that be enough to then
15:48 solve fundamental problems like physics problems and biological experimentation problems?
15:53 Or are we still missing something in the pathway that needs to be thought of and added after
15:58 that?
15:59 Well, it's clear that we're missing a number of things.
16:03 The problem is that we don't exactly know what.
16:05 And we can see the first obstacle, really, but where it's going afterward is not clear.
16:14 But the hope is that we're going to get systems that can have some level of common sense.
16:18 You know, at first they're not going to be as smart as a top mathematician or physicist,
16:22 but they're going to be as smart as your cat.
16:24 That would be a pretty good advance already if we had systems that could, you know,
16:30 understand the world the way cats do.
16:32 If we had systems that could be trained very easily, in 10 minutes, like any 10-year-old,
16:38 to clear the dinner table and fill the dishwasher, we would have domestic robots.
16:42 If we had systems that could learn to drive a car in 20 hours of practice, like any 17-year-old,
16:47 that would be a big advantage.
16:48 Hey, Yann, I'm just interrupting you for a sec.
16:52 It's going to take a while.
16:53 So, you know, we spoke at the TIME party at Davos on this subject, and we enjoyed having
17:01 you at Imagination Action in the Dome.
17:03 This is the second of three of our events.
17:05 I don't know if you realize this, but if you speak at all three, the next one's June 6th,
17:09 you get a Chia Pet.
17:10 This is a foot of a Chia Pet.
17:11 And I think a Chia Pet would go great there.
17:14 Do you enjoy speaking under the dome, not the MIT dome, but the MIT event in Davos?
17:20 Yeah, that was fun.
17:22 Yeah.
17:23 All right.
17:24 Can I lock you in for next year?
17:25 There was a spectrum of people, from the sort of techno-positive optimists, and I was not,
17:33 like, at the end of that spectrum.
17:36 And on the other side, the doomers.
17:40 It's Davos.
17:41 Yeah.
17:42 All right.
17:43 Well, we have someone from OpenAI, and given that you work at Meta, you may not want to
17:46 be seen in the same Zoom.
17:48 So ladies and gentlemen, Yann LeCun.
17:51 Thank you, Yann.
17:52 Thank you.
17:53 Well done.