• last year
Meta’s Chief AI Scientist Yann LeCun discusses why he supports open source large language models and why models need to live in the world to achieve autonomy.


Transcript
00:00 Thank you, Yann, and welcome.
00:08 And my God, thank you for this.
00:09 It's the highlight of my year, the opportunity to talk to you.
00:12 I don't know what you can see right now, but there are 2,000 of the smartest people on
00:15 the planet watching you from Cambridge.
00:18 And boy, what an opportunity to pick your brain.
00:21 He's in stereo.
00:22 Look at that one.
00:23 Well, I can see them from the back.
00:25 Okay.
00:26 Yeah, actually, if you want, Yann, to see your faces behind you too.
00:29 So Yann, what an amazing coincidence.
00:33 Llama 3 dropped just while we were meeting today.
00:36 What are the odds?
00:37 That's unbelievable.
00:38 Absolutely staggering.
00:42 So what came out today was 8B, Llama 3, 8 billion and 70B.
00:47 So far, what we're hearing in the rumor mill is that the 8B performs as well as the old
00:51 Llama 2, 70B did.
00:53 So we're looking at an order of magnitude change.
00:55 Does it sound about right to you?
00:56 Also, I noticed it was trained on 15 trillion tokens.
00:59 Where did you come up with 15 trillion tokens?
01:03 Okay.
01:04 So the first thing I have to say is that I deserve no credit whatsoever for Llama 3.
01:13 Maybe a little bit of credit for making sure our models are open source, but the technical
01:21 contributions are from a very large collection of people and I had a very, very small part
01:28 of it.
01:29 15 trillion tokens.
01:30 Yeah.
01:31 I mean, you need to get all the data you can get, all the high quality public data, and
01:35 then fine-tuning and licensed data and everything.
01:41 So that's how you get to 15 trillion.
01:43 But that's kind of a bit, it's kind of saturating.
01:46 There is only so much text you can get and that's about it.
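The figures quoted in this exchange can be checked with a little arithmetic. The sketch below uses only the rounded numbers mentioned in the conversation (8B and 70B parameters, 15 trillion tokens); the variable names are illustrative, and the values are not official specifications.

```python
# Figures quoted in the conversation (rounded, not official specifications).
small_params = 8e9       # Llama 3 8B
large_params = 70e9      # Llama 2 70B, the older model it is compared to
train_tokens = 15e12     # 15 trillion training tokens

# How much smaller is the 8B model than the 70B model it reportedly matches?
size_ratio = large_params / small_params

# How many training tokens does the 8B model see per parameter?
tokens_per_param = train_tokens / small_params

print(f"size ratio: {size_ratio:.2f}x")
print(f"tokens per parameter: {tokens_per_param:.0f}")
```

On these numbers the size gap is a little under ten times, which is what "an order of magnitude" loosely refers to here.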
01:49 Well, I got to say, I owe a big fraction of my life journey to you.
01:54 You didn't know it, but when you were doing optical character recognition way back in
01:58 the day, I was reading your CNN papers.
02:01 He invented convolutional neural nets, which really made those things work.
02:05 That became my very first dollar of revenue I ever made in a startup, was doing neural
02:08 networks based on your work.
02:10 Changed the course of my life.
02:11 Now you're doing it again, especially for you young folks in the front here.
02:16 By being the champion of open source, I think you're fundamentally giving them an opportunity
02:20 to build companies that otherwise wouldn't be able to be built.
02:23 So first of all, a huge debt of gratitude for you for championing that.
02:31 So the next thing that happens could be one of those events we look back on in history
02:36 and say that was a turning point for humanity.
02:39 The 750B monster neural net will come out soon, will also be open source I assume?
02:45 405B, from what I gather.
02:49 About 400 million.
02:50 About 400 million?
02:51 A billion.
02:52 A billion, yeah.
02:53 Dense, not sparse, which is interesting.
02:58 So yeah, it's still training, despite all the computers we have our hands on.
03:05 It still takes a lot of time, takes a lot of time to fine tune.
03:09 But it's going to come out.
03:10 A bunch of those variations of those models are going to come out over the next few months.
03:15 Yeah, I was going to ask that question next.
03:17 So they didn't come out concurrently, which is interesting, which means it must still
03:21 be in the training process.
03:22 It's such a massive endeavor.
03:24 And I saw in the news that Facebook had bought another 500,000 NVIDIA chips, bringing the
03:30 total to about a million, by my math, unless you got a discount.
03:33 You might have gotten a volume discount.
03:35 But that's $30 billion worth of chips, which would make the training of this model bigger
03:41 than the Apollo moon mission in terms of research and development.
03:46 Am I getting that about right?
03:48 It's staggering, isn't it?
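The interviewer's "by my math" estimate implies an average price per chip. A minimal sketch of that division, assuming his round figures of about a million chips and $30 billion in total (both are his estimates, not confirmed numbers):

```python
# Figures as stated by the interviewer (his own estimate, not confirmed numbers).
total_chips = 1_000_000          # "about a million" chips
total_cost_usd = 30_000_000_000  # "$30 billion worth of chips"

# Implied average price per chip, before any volume discount.
cost_per_chip = total_cost_usd / total_chips
print(f"implied cost per chip: ${cost_per_chip:,.0f}")
```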
03:49 Yeah, I mean, a lot of, you know, not just training, but also deployment is limited by
03:56 computational abilities.
03:59 I think one of the issues that we're facing, of course, is the supply of GPUs.
04:05 That's one of them, and the cost of them at the moment.
04:09 But another one is actually scaling up the learning algorithm so that they can be parallelized
04:13 on lots and lots of GPUs.
04:16 And progress on this has been kind of slow, like in the community.
04:22 So I think we're kind of waiting for breakthroughs there.
04:25 But we're also waiting for other breakthroughs that are, you know, in terms of architectures,
04:29 like new principles, new, like brand new blueprints with which to build AI systems that would
04:36 enable them to do things they can't do today.
04:38 And so since you brought it up, the philosophy of taking an investment that size and then
04:44 open sourcing it, there's no historical precedent for this.
04:47 And the equivalent would be as, you know, if you built a gigafactory that builds Teslas
04:54 and somehow you gave it to society.
04:56 But the thing is, once you open source it, it can be infinitely copied.
04:59 So it's not even a good analogy to talk about a gigafactory being open sourced.
05:03 So there's no precedent for this in business history.
05:05 What's the logic behind making it open source?
05:07 What do you want to see happen from this?
05:10 Well, so what's happened, I mean, certainly the whole idea of open sourcing infrastructure
05:15 software is very prevalent today.
05:19 And it's been in the DNA of Meta, you know, Facebook before that, since the beginning.
05:24 There's a lot of open source packages that are basically infrastructure software that
05:30 Meta has been open sourcing over the years, including in AI, right?
05:34 So everybody is using PyTorch.
05:36 Well, everybody except a few people at Google, but pretty much everybody is using PyTorch.
05:42 And that's open source.
05:45 It was built originally at Meta.
05:47 Meta actually transferred the ownership of PyTorch to the Linux Foundation.
05:52 So it could be much more of a kind of community effort.
05:56 So that's really in the DNA of the company.
05:58 And the reason is, you know, infrastructure is better, becomes better, faster when it's
06:03 open source, when more people contribute to it, when there is sort of more eyeballs looking
06:08 at it, it's more secure as well.
06:10 So what is true for, you know, internet infrastructure software is also true for AI.
06:17 And then there is the additional thing for AI, which is that foundation models are so
06:22 expensive to train.
06:25 It would be a complete waste of resources to, you know, have 50 different entities training
06:30 their own foundation model.
06:31 I mean, it's much better if there is only a few, but they make them open.
06:35 And that basically creates the substrate for a whole ecosystem to take off.
06:42 And it's very much the same thing that happened to the internet in the 90s.
06:45 If you remember, in the mid 90s, when the internet started to get popular, the software
06:52 infrastructure was dominated by proprietary platforms from either Microsoft or Sun
06:59 Microsystems.
07:00 And they both lost.
07:01 They kind of disappeared from that market.
07:03 Now it's all Linux, Apache, you know, MySQL, PHP, whatever, you know, all the open source
07:10 stuff, even the core of web browsers is open source.
07:15 Even the software stack of cell phones, cell phone towers is open source nowadays.
07:21 So infrastructure needs to be open source.
07:23 It just makes it progress faster, be more secure and everything.
07:26 Well, I'm so glad to hear you say that because there are definitely diverging philosophies
07:30 on that if you think about where OpenAI is going and where you're going.
07:34 But the version of the world that you're describing is one where all of these startups and all
07:39 of these teams can thrive and be competitive and create and innovate.
07:44 And the alternate version is the one where strong AI is invented in a box and is controlled
07:48 by a very small group of people and all the benefit, you know, confers to a very small
07:52 group.
07:53 So I don't have skin in the game on this, but I certainly love your version of the future
07:59 a lot more than alternate versions.
08:01 So very, very glad to hear you say it.
08:04 So I want to spend a lot of our time or limited time that we have talking about the implications
08:09 of this and where you see it going.
08:10 I also want to ask you about VJEPA.
08:12 So you've been very clear in saying that LLMs will take us down a path, incredible things
08:17 we can build, but it's not going to get you to a truly intelligent system.
08:22 You need experience in the world.
08:25 And VJEPA, I think, is your solution to that.
08:27 Is that going to carry us to that goal?
08:30 Tell us about VJEPA, first of all.
08:31 Okay.
08:32 Well, first of all, I have to tell you where I believe AI research is going.
08:38 And I wrote a fairly long kind of vision paper about this about two years ago that I put
08:42 online that you can look for.
08:45 It's on OpenReview.
08:46 It's called A Path Towards Autonomous Machine Intelligence.
08:48 I've replaced "autonomous" with "advanced" now because people are scared by the word autonomous.
08:54 So we have this thing autonomous or advanced machine intelligence that's spelled AMI.
08:59 And in French, you pronounce it ami.
09:02 That means friend, in French, which I think is a good analogy.
09:07 Anyway, current LLMs are very limited in their abilities.
09:13 And Stephen Wolfram just before actually pointed to those limitations as well.
09:20 One of them is they don't understand the world.
09:23 They don't understand the physical world.
09:25 The second one is they don't have persistent memory.
09:28 The third one is they can't really reason in the sense that we usually understand reasoning.
09:33 They can regurgitate previous reasoning that they've been trained on and adapt that into
09:41 the situation, but really not reason in the sense that we understand it for humans and
09:46 many animals.
09:47 And the last thing, which is also important, they can't really plan either.
09:51 They can, again, regurgitate plans that they've been trained on, but really plan in new situations
09:55 they can't.
09:56 And there are a lot of studies by various people that show the limitations of LLMs for planning,
10:02 reasoning, and understanding the world, et cetera.
10:06 So we need to basically design new architectures, which would be very different from the ones
10:11 we currently have that will make AI systems understand the world, have persistent memory,
10:16 and plan, and also be controllable in a way that you can give them objectives.
10:22 And the only thing they can do is fulfill those objectives and not do anything else,
10:27 subject to some guardrails.
10:29 So that's how we make them safe and controllable as well.
10:33 So the missing part is how do we get AI systems to understand the world by watching it a little
10:39 bit like baby animals and humans?
10:41 It takes a very long time for baby humans to really understand how the world works.
10:45 The whole idea of the fact that an object that is not supported falls because of gravity,
10:51 it takes nine months for human babies to learn this.
10:55 It's not something you're born with.
10:56 It's something you have to observe the world and understand the dynamics of it.
11:01 How do we reproduce this ability with machines?
11:05 So for almost 10 years now, my colleagues and I have been trying to train a system to
11:12 basically do video prediction, with the idea that if you get a system to predict what's
11:16 going to happen in a video, it's got to develop some understanding of the nature of the physical
11:21 world.
11:22 And it's been basically a complete failure.
11:24 And we tried many, many things for many years.
11:28 But then a few years ago, what we realized is that the architectures that we can use
11:33 to train deep learning systems to learn representations of images are not generative.
11:41 They are not things for which you take an image, you corrupt it, and then you train
11:47 a system to reconstruct the uncorrupted image, which is the way we train LLMs.
11:55 That's how we train LLMs, where we take a piece of text, we remove some of the words
11:58 and train some gigantic neural net to predict the words that are missing.
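The corruption step described here, removing some of the words so that a model can be trained to predict them, can be sketched in a few lines. This is only an illustration of the data preparation as the speaker describes it: the `[MASK]` placeholder, the `mask_words` helper, and the masking probability are assumptions, real systems operate on tokens rather than whole words, and no model is trained here.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token

def mask_words(words, mask_prob=0.3, seed=0):
    """Return (corrupted, targets): the corrupted word list and a dict
    mapping each masked position to the original word."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, w in enumerate(words):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = w
        else:
            corrupted.append(w)
    return corrupted, targets

sentence = "an object that is not supported falls because of gravity".split()
corrupted, targets = mask_words(sentence)
print(" ".join(corrupted))
print(targets)  # the words a model would be trained to predict
```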
12:02 If you do this with images or video, it doesn't work.
12:04 I mean, it kind of works, but you get representations of images and videos that are not very good.
12:11 And the reason is, it's very difficult to actually reconstruct all the details of an
12:14 image or a video that is hidden from you.
12:18 And so what we figured out a few years ago is that the way to approach that problem is
12:22 through what we call a joint embedding architecture or a joint embedding predictive architecture,
12:27 which is what JEPA means.
12:29 It's an acronym.
12:31 And the idea of joint embedding architecture goes back to the early '90s.
12:35 Some people I worked with and I used to call them Siamese nets.
12:38 And the idea is basically, if you have, let's say, a piece of video and you mask some parts
12:46 of it, let's say the second half of the video, and then you train a big neural net to try
12:51 to predict what's going to happen next in the video, that would be a generative model.
12:56 Instead of that, we run both videos through encoders, and then we train a predictor in
13:02 the representation space to predict the representation of the video, not all the pixels of the video.
13:08 And you train the whole thing simultaneously.
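The contrast drawn here, predicting a representation rather than pixels, can be illustrated with a toy forward pass. This is a sketch under heavy assumptions and not the V-JEPA implementation: the encoder and predictor are random linear maps, a single shared encoder is used for both halves, the sizes are arbitrary, and nothing is trained. Real joint embedding methods also need some mechanism to stop the representations from collapsing to a constant, which is omitted.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    """Toy stand-in for learned weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

rng = random.Random(0)
PIXELS, EMBED = 16, 4   # toy sizes, chosen only for illustration

# Toy "video": a visible first half and a masked second half, flattened to vectors.
context = [rng.random() for _ in range(PIXELS)]
target = [rng.random() for _ in range(PIXELS)]

encoder = random_matrix(EMBED, PIXELS, rng)    # shared toy encoder (an assumption)
predictor = random_matrix(EMBED, EMBED, rng)   # acts in representation space

s_context = matvec(encoder, context)   # representation of the visible part
s_target = matvec(encoder, target)     # representation of the masked part
s_predicted = matvec(predictor, s_context)

# JEPA-style loss: compare representations, not pixels.
jepa_loss = sq_dist(s_predicted, s_target)
print(len(s_predicted), jepa_loss)
```

The loss is computed over 4 numbers instead of 16, which is the point of the design: the predictor is not asked to reproduce every detail of the hidden part.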
13:11 We didn't know how to do this four years ago, and we kind of figured out a number of ways
13:14 to do this.
13:15 We now have half a dozen algorithms for this.
13:17 So VJEPA is a particular instance of this kind of thing.
13:22 And the results are very promising.
13:25 I think ultimately we're going to be able to build or train systems that basically have
13:30 mental world models, have some notion of intuitive physics, have some possibility of predicting
13:35 what's going to happen in the world as a result of taking an action, for example.
13:41 And if you have a model of the world of this type, then you can do planning.
13:45 You can plan a sequence of actions to arrive at a particular objective.
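Planning with a world model, in the sense described here, means imagining the result of candidate action sequences and picking the one that reaches the objective. A minimal sketch, assuming a hypothetical one-dimensional world in which `world_model` is known exactly and the search is exhaustive; a learned world model and a smarter search would replace both in practice.

```python
from itertools import product

def world_model(state, action):
    """Hypothetical toy dynamics: position on a line, actions move it by -1, 0 or +1."""
    return state + action

def plan(start, goal, horizon, actions=(-1, 0, 1)):
    """Search all action sequences of length `horizon` and return the one whose
    predicted final state is closest to the goal."""
    best_seq, best_cost = None, float("inf")
    for seq in product(actions, repeat=horizon):
        state = start
        for a in seq:
            state = world_model(state, a)   # imagine the outcome, do not act
        cost = abs(goal - state)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost

best_seq, best_cost = plan(start=0, goal=3, horizon=4)
print(best_seq, best_cost)
```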
13:49 That's really what intelligence is about.
13:51 That's what we can do.
13:52 So I think that's a really critical question, actually.
13:56 When you use diffusion algorithms to create pictures, they'll make six fingers or four
14:00 fingers all the time.
14:01 They never make five fingers.
14:02 But these LLMs have a shocking amount of common sense, but they also are missing a shocking
14:08 amount of common sense.
14:09 Once you roll in the JEPA data, the VJEPA data, you give it a lot more of an opportunity
14:14 to think much more like we do, because all the real world experiences of moving around
14:17 and feeling things are folded into the training data.
14:21 So do you think the result of that will then be one massive foundation model?
14:26 Or are we still going to use the mixture of experts approach and glue them together in
14:31 kind of synthetic ways?
14:33 I think ultimately it's probably going to be one big model.
14:37 Of course it'd be modular in the sense that there's going to be multiple modules that
14:42 interact but are not necessarily completely connected with each other.
14:46 There's a big debate now in AI whether if you want a multi-modal system that deals with
14:52 text as well as images and video, should you do early fusion?
14:55 So should you basically tokenize images or videos and then turn them into kind of little
15:00 vectors that you concatenate with the text tokens?
15:04 Or should you do late fusion, which means run your images or video through some sort
15:10 of encoder that is more or less specialized for it and then have some merging at the top?
15:15 I'm more in favor of the second approach, but a lot of the current approaches actually
15:20 are more early fusion because it's easier, it's simpler.
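The two options in this answer differ in where the modalities meet. The sketch below is a toy illustration only: `embed_text`, `patch_vectors`, and `mean_pool` are hypothetical stand-ins for real tokenizers and encoders, and the vectors are two-dimensional for readability.

```python
def embed_text(tokens):
    """Hypothetical text embedding: one 2-d vector per token."""
    return [[float(len(t)), 1.0] for t in tokens]

def patch_vectors(image_patches):
    """Hypothetical image tokenizer: one 2-d vector per patch."""
    return [[sum(p) / len(p), 0.0] for p in image_patches]

def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

text = ["a", "cat", "sleeps"]
patches = [[0.1, 0.3], [0.5, 0.7]]   # two toy image patches

# Early fusion: image vectors join the text tokens in one sequence,
# which a single model would then process.
early_sequence = patch_vectors(patches) + embed_text(text)

# Late fusion: each modality goes through its own encoder (here, mean pooling
# stands in for the encoder), and the two summaries are merged at the top.
late_merged = mean_pool(patch_vectors(patches)) + mean_pool(embed_text(text))

print(len(early_sequence), len(late_merged))
```

Early fusion yields one mixed sequence with an entry per patch and per token, while late fusion yields a single merged vector built from two separately encoded summaries.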
15:24 I'm going to do the dangerous thing of asking you to predict the future, but if you can't
15:29 then nobody can, so it has to be you.
15:32 So once you roll in the VJEPA data and you train these massive models, and suppose you
15:37 go up another 10x, buy another $30 billion or so of chips.
15:43 The combination of the VJEPA data plus this massive scale, will that be enough to then
15:48 solve fundamental problems like physics problems and biological experimentation problems?
15:53 Or are we still missing something in the pathway that needs to be thought of and added after
15:58 that?
15:59 Well, it's clear that we're missing a number of things.
16:03 The problem is that we don't exactly know what.
16:05 And we can see the first obstacle, really, but where is that going afterward is not clear.
16:14 But the hope is that we're going to get systems that can have some level of common sense.
16:18 You know, at first they're not going to be as smart as a top mathematician or physicist,
16:22 but they're going to be as smart as your cat.
16:24 That would be a pretty good advance already if we had systems that, you know,
16:30 could understand the world like cats.
16:32 If you had systems that could be trained very easily in 10 minutes, like any 10-year-old,
16:38 to clear the dinner table and fill up the dishwasher, we would have domestic robots.
16:42 If we had systems that could learn to drive a car in 20 hours of practice, like any 17-year-old,
16:47 that would be a big advantage.
16:48 Hey, Yann, just I'm interrupting you for a sec.
16:52 It's going to take a while.
16:53 So, you know, we spoke at the time party at Davos on this subject, and we enjoyed having
17:01 you at Imagination Action in the Dome.
17:03 This is the second of three of our events.
17:05 I don't know if you realize this, but if you speak at all three, the next one's June 6th,
17:09 you get a Chia Pet.
17:10 This is a photo of a Chia Pet.
17:11 So and I think a Chia Pet would go great there.
17:14 Do you enjoy speaking under the dome, not the MIT dome, but the MIT event in Davos?
17:20 Yeah, that was fun.
17:22 Yeah.
17:23 All right.
17:24 Can I lock you in for next year?
17:25 There was a spectrum of people from the sort of techno-positive optimist, and I was not
17:33 like at the end of that spectrum.
17:36 And on the other side, Doomers, we think.
17:39 Oh, there's the Doomers.
17:40 It's Davos.
17:41 Yeah.
17:42 All right.
17:43 Well, we have someone from OpenAI, and given that you work at Meta, you may not want to
17:46 be seen in the same Zoom.
17:48 So ladies and gentlemen, Yann LeCun.
17:51 Thank you, Yann.
17:52 Thank you.
17:53 Well done.