In many ways, 2023 was the year that people began to understand what AI really is—and what it can do. Here are three of the biggest AI innovations from the past year.
Transcript
00:00 My name is Billy Perrigo, I'm a tech correspondent at Time Magazine and I've spent much of this year
00:04 reporting on artificial intelligence. In a lot of ways 2023 was the year that people began to
00:11 understand what AI really is, but there were plenty of innovations as well. Here are three to keep an eye on.
00:16 The first is multimodality. That's the ability of an AI system to work with lots of different
00:29 types of data, not just text but also images, video, audio and more. 2023 was the first year
00:35 that the public really gained access to powerful multimodal AI models like OpenAI's GPT-4 which
00:41 allowed users to upload images as well as text. GPT-4 could see the contents of images which opened
00:47 up all kinds of possibilities. You could ask it what to make for dinner based on a photograph of
00:51 what was inside your fridge or you could ask it how to fix your bike based on a photograph of a
00:55 broken part. Google DeepMind's latest model Gemini is also able to work with images as well as text.
01:01 In its launch video, after being shown an image of pink and blue yarn and asked what it could be used
01:06 to create, Gemini generated an image of a pink and blue octopus plushie. The real innovation behind
01:12 multimodality is that instead of just being trained on text, the new generation of models are trained
01:17 on video, images and audio. The belief inside many top AI companies is that this extra training data
01:24 will help these models become more capable and more powerful. It's a step on the path,
01:29 many AI scientists hope, towards so-called artificial general intelligence, the kind of
01:34 system that can act in the world, make new scientific discoveries and perform economically
01:39 valuable labour. The second big thing to watch in AI innovation from 2023 is constitutional AI. One
01:47 of the biggest unanswered questions in AI is how to align it to human values. If AI becomes smarter
01:53 and more powerful than humans, it could cause untold harm to our species, some even say total
01:58 extinction, unless somehow it's constrained by a set of rules that puts human survival and human
02:05 flourishing at its centre. Constitutional AI, first described by researchers at Anthropic in December
02:11 last year, harnesses the fact that AI systems are now basically capable enough to understand
02:15 natural language. The idea is quite simple. First, you write a constitution that lays out the values
02:22 you'd like your AI to follow. Then, you train the AI to score its own responses based on how aligned
02:28 they are to the constitution, and then incentivise the model to output only the responses that score
02:34 more highly. If you run that cycle enough times, you're left with an AI that has been reinforced
02:41 to behave in the way that you want it to, and to not behave in the way that you don't want it to.
02:46 There are some problems with constitutional AI. It requires trusting that the AI is interpreting
02:52 your constitution correctly, for example, but it's a promising addition to a field where new
02:56 alignment strategies are few and far between. Of course, constitutional AI doesn't solve the
03:01 problem of whose values AI should be aligned to. Today, it's a small number of Silicon Valley
03:06 executives who are writing those rules. But by making the act of setting rules for an AI so
03:11 explicit, constitutional AI could open the door to a future where the public gets more of a say
03:15 in how AI is governed. The third big thing to watch this year is text-to-video. One of the
03:23 noticeable outcomes of billions of dollars pouring into AI this year has been the rapid rise of text-to-video
03:28 tools. Last year, even text-to-image tools had barely emerged from their infancy, but now there
03:33 are several companies offering the ability to turn normal sentences into moving images with
03:38 increasingly fine-grained levels of accuracy. One of those companies is Runway, a Brooklyn-based AI
03:43 video startup that wants to make filmmaking accessible to anybody. And another is Pika AI,
03:48 which isn't pitched at professional filmmakers but at the general user. Tools like Pika and Runway
03:53 could transform the user-generated content experience as early as 2024, but text-to-video
03:59 is quite computationally expensive still, so don't be surprised if tools start charging for access.
04:04 [Music]