Generative AI models are in full swing right now. They're almost universally accessible and incredibly easy to use. They can generate images, videos and even music. But how do they work? How do AI models learn to generate images? Are they blurring the line between what's real and what isn't? Rukmini Ravishankar explores.
Category: 🗞 News

Transcript
00:00 To see is to believe.
00:02 How many times have we heard that?
00:04 We're always looking for visual proof that something exists.
00:07 And once we find that proof, we jump ahead and believe it, no matter what it is.
00:11 So I have a question for you.
00:13 Which one of these images do you think is real?
00:16 This image of a Zomato delivery person dancing in the rain?
00:19 Or this image of Barack Obama in a hat?
00:24 Whichever one you guessed, you'd be wrong.
00:29 Neither one of these is real.
00:31 Both of them are generated by artificial intelligence.
00:34 Which raises the question: can we believe everything we see?
00:38 Whether we like it or not, artificial intelligence has become a part of our daily lives.
00:52 Every time we use a Snapchat filter to add cat ears to our face,
00:55 every time we use Google Maps to navigate to our destination,
00:58 and every time we speak to Customer Care because of a defective product,
01:01 we're interacting with artificial intelligence.
01:03 What's relatively new is generative AI.
01:06 Most of us experienced generative AI for the first time when we tried out ChatGPT.
01:11 ChatGPT and other AI chatbots can simulate human-like dialogue.
01:15 From preparing legal briefs to composing college essays,
01:18 these can generate any text you want them to.
01:21 And it's not just text.
01:22 There are generative AI models that can create images, video, and even music.
01:26 These are being tweaked and enhanced as we speak.
01:29 The appeal of generative AI is in the simplicity of the interface.
01:32 Almost anyone can use it.
01:34 I've never been very good at art.
01:37 I don't think I have the ability to create something visually appealing
01:40 with just a canvas, some material, and my hands.
01:43 But I'm going to give myself 10 minutes to draw something.
01:46 And that something is written inside this piece of paper.
01:52 [Drawing]
01:54 I'm very quickly going to look up images of the Eiffel Tower on my phone.
02:04 I know what it looks like, but...
02:06 I'd like some reference.
02:09 [Drawing]
02:12 [Music]
02:40 We've reached a point in the evolution of generative AI
02:43 where my abysmal drawing skills will no longer affect my ability to express myself visually.
02:47 And this is how it works.
02:49 All I need to do is type a prompt here.
02:53 A prompt is the text you give the AI model,
02:56 describing the image you want it to generate.
02:58 Wow.
03:06 [Drawing]
03:08 Here it is.
03:14 An image that is better than what I've drawn,
03:16 and in much, much less time too.
03:19 The more detailed the prompt is, the better.
03:22 I can also try and change the aspect ratio
03:25 and the type of content.
03:32 Here's a photorealistic image generated from the prompt I was given.
03:36 I can also change the lighting.
03:41 So, the possibilities are endless.
03:44 In fact, there's a whole field called prompt engineering,
03:48 in which people learn how to describe an idea
03:51 in such a way that the AI model can understand it
03:54 and generate an acceptable image.
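For a sense of what "typing a prompt" maps to in code, here is a minimal sketch using the open-source Hugging Face diffusers library rather than the Firefly tool shown above; the model checkpoint, image size and prompt wording are illustrative assumptions, not what Firefly actually runs.

```python
# A minimal text-to-image sketch with the open-source diffusers library.
# Illustrative only: the checkpoint and parameters are assumptions,
# not the tool demonstrated in the video.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = ("The Eiffel Tower at sunset, photorealistic, "
          "warm golden lighting, shot on a 35mm lens")

# More detail in the prompt generally steers the output more precisely;
# width and height play the role of the aspect-ratio setting in the UI.
image = pipe(prompt, width=768, height=512, num_inference_steps=30).images[0]
image.save("eiffel_tower.png")
```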
03:56 So, how is the AI model doing this?
03:58 Am I just that good at describing an image?
04:00 You just saw that the more detailed my prompts got,
04:03 the better the image turned out.
04:05 [Mumbling]
04:13 [Mumbling]
04:23 [Mumbling]
04:25 But that's not all.
04:32 The AI model actually learned to produce these images.
04:35 How did it learn?
04:36 Well, how does a human learn an art form?
04:39 We first consume several examples of the art form,
04:42 we observe the details of it, and then we practice it.
04:45 When we practice it, someone, usually a teacher,
04:47 points out what is wrong,
04:49 and we learn to tweak the way we do it.
04:52 [Music]
04:54 This is almost exactly the way an AI model learns to produce art.
05:06 It first consumes several thousand examples of art,
05:09 usually fed into it by the human who is building the AI model.
05:12 These examples are part of what is called a data set.
05:15 AI that is being trained to generate images
05:18 learns from a data set of several million images.
05:21 An AI engineer shows the model these images.
05:24 The model learns the patterns, the details, the colors, the shapes, everything.
05:28 And it learns how to associate a particular visual with a particular text.
05:32 Once the AI model has learned from the data set,
05:34 it uses an algorithm to make decisions
05:36 and produce new art based on what it's learned.
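To make that "consume examples, get corrected, tweak" loop concrete, here is a toy sketch in PyTorch; the tiny linear "model" and the random tensors are stand-ins for a real generator network and a captioned-image data set, and nothing here reflects the architecture of any particular product.

```python
# A toy sketch (not a real image model) of the learn-by-feedback loop
# described above: show examples, measure the error, nudge the model.
import torch
import torch.nn as nn

# Pretend dataset: 100 (text-embedding, image) pairs.
# Real models learn from millions of captioned images.
text_vectors = torch.randn(100, 16)    # stand-in for encoded captions
target_images = torch.randn(100, 64)   # stand-in for tiny 8x8 images

model = nn.Linear(16, 64)              # stand-in for a generator network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()                 # the "teacher" that scores mistakes

for epoch in range(200):
    generated = model(text_vectors)            # model attempts the art
    loss = loss_fn(generated, target_images)   # how wrong was it?
    optimizer.zero_grad()
    loss.backward()                            # trace the mistake back
    optimizer.step()                           # tweak the model slightly

print(f"final error: {loss.item():.4f}")
```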
05:39 To draw an analogy, an artist would use paints as the raw material
05:44 and would manipulate that paint with paint brushes, which is the tool.
05:48 Here, for me, there are two broad materials that I work with.
05:54 One is data and then algorithms, which are tools to manipulate that data.
05:59 So, no, it's not just about entering a prompt, generating an image,
06:03 and then calling it art.
06:04 Harshit is an MIT-trained machine learning expert and AI artist.
06:08 He's among the machine learning experts
06:10 who create data sets and train AI models.
06:13 I might have used Firefly as a way to demonstrate what these models are capable of,
06:17 but there are people out there who use these models every day as part of their workflow.
06:21 I've been into filmmaking for almost three years now,
06:24 and I started as an editor.
06:26 And then from there, I got into direction.
06:28 And currently, I'm working at Zomato as a film director.
06:31 I spoke to Ranje Jha, a filmmaker.
06:33 He uses AI programs in his work at Zomato.
06:36 First, like, I usually, what we do is we have a script,
06:39 like our team of writers, they write, of course, like brilliant scripts,
06:42 and that's what I get.
06:44 Then I take that script and I feed it to ChatGPT to start with.
06:47 Now, the best part is that you can tell ChatGPT
06:50 to behave the way you want it to behave.
06:52 Like, I can tell ChatGPT,
06:54 "Write the screenplay like Christopher Nolan."
06:56 I, by default, am getting the styled-up screenplay,
07:00 and I have to put zero effort in it.
07:01 Now, moving to the next part, storyboarding, right?
07:04 We might not have the budget to get a storyboard artist in the first place, right?
07:09 So, when these kind of scenarios are there,
07:11 like, I would use Midjourney for usually creating a storyboard.
07:15 Like, we just have to feed the prompt, you know,
07:18 based on what we're looking at,
07:20 and we are getting visuals right from there.
07:23 We just put all of them in a deck, and then we have a storyboard,
07:26 or, you know, in general, a mood reference ready for us.
07:29 So, that's another way in which I incorporate AI in my workflow.
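In API terms, "telling ChatGPT how to behave" amounts to sending a style instruction along with the script. Here is a hedged sketch using the OpenAI Python client; the model name, file name and exact wording are assumptions for illustration, not the filmmaker's actual setup.

```python
# A sketch of steering a chat model with a style instruction before
# sending it a script. Model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

script = open("episode_script.txt").read()  # hypothetical script file

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system",
         "content": "You are a screenwriter. Rewrite scripts as screenplays "
                    "in the style of Christopher Nolan."},
        {"role": "user", "content": script},
    ],
)

print(response.choices[0].message.content)
```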
07:34 Apart from creatives who use AI models to help with their workflow,
07:38 there are also people who work with open-source AI models
07:41 to make their vision a reality.
07:42 It was raining a lot in Bombay.
07:44 I just saw, in my society, there were a lot of kids,
07:47 they were dancing or something like that,
07:49 and I ordered lunch,
07:51 and one of the Zomato guys came wearing that raincoat,
07:55 and I saw that, and I just combined the two,
07:57 and it clicked for me immediately,
08:00 that these people are, like, you know, delivering food and all of that,
08:03 and the kids are dancing, like, literally, down.
08:07 So, it's a very high contrast.
08:09 Meet Saurabh Dhabai, a creative from Mumbai,
08:11 who came up with a campaign idea, coincidentally also for Zomato,
08:13 executed it on Midjourney,
08:16 a generative AI model, and then watched it go viral.
08:20 So, I tried making it, generating it, using Midjourney,
08:24 but since Midjourney cannot come up with the logos
08:27 and doesn't understand brands that well,
08:30 so it was not generating what I was expecting.
08:33 So, yeah, then I tried to, like, you know,
08:36 simplify it down, I just wrote, like,
08:39 a guy in Mumbai streets, dancing, wearing a red raincoat,
08:45 then it generated something like that,
08:47 and then I did the branding, I would say, in Photoshop.
08:51 I would say this whole process, thinking about the idea,
08:54 and posting it live, took me at most four and a half hours.
08:59 And then there is the third, most important category of AI practitioners,
09:02 those who actually train the AI model to do what it does.
09:05 I'm Harshit, I work as an artist,
09:08 I make art primarily with emerging technologies,
09:11 like machine learning, augmented reality, virtual reality,
09:15 new forms of fabrication.
09:17 All of us spend pretty much 24/7 with technology these days,
09:22 we can't imagine sort of living without it,
09:24 so we have to sort of move from thinking of it as something
09:28 that gets our tasks done, to forming this more poetic partnership with it.
09:33 Harshit works with AI to create abstract works of art
09:36 that depict things which clearly don't exist, but whose intent is to make the viewer think.
09:41 But given how accessible these tools are,
09:43 anyone can also use AI models to create photorealistic images
09:47 of people, objects, places and scenes.
09:50 You saw how Saurabh took a campaign idea
09:52 and created something that is so realistic, yet completely fabricated.
09:56 The humans in those images do not exist.
09:58 So how do you decide what's true and what isn't?
10:01 Can you believe everything you see?
10:03 Till now, the way we consume visuals is to believe first,
10:07 and only if we have enough reasons to doubt it
10:11 do we go on to verify it, right?
10:14 We kind of believe what we see, there's no two ways about it.
10:17 Now, especially with all the digital content that we see,
10:21 I think there has to be a big mental shift
10:24 of training ourselves somehow to doubt first
10:30 and then believe only if there are enough reasons to believe it.
10:34 And I think that kind of transformation is not trivial at all.
10:38 So I think that kind of fundamental shift
10:42 is something that we have to start designing for in systems already.
10:47 Through the course of doing the research for this video
10:49 and speaking to all of these people,
10:51 we have learnt that regardless of how potent the AI is,
10:54 the onus is on users, human beings,
10:56 to make sure that they use it in ways that do not harm the interests of our own kind.
11:00 This video showed you a glimpse of what generative AI is
11:03 and what it is capable of creating.
11:05 But these images, can we really call them art?
11:08 Will AI change our understanding of the concept of art?
11:11 Will it make creative jobs irrelevant?
11:13 Those are the questions I'll be answering in the next video.
11:16 If you have questions or suggestions, let me know in the comments below.
11:19 Subscribe to Deccan Herald's social media channels
11:21 to make sure you don't miss the next video.
11:24 [Music plays]