Analysis: Nvidia Document Leak Sheds Light on AI Training Ethics

A recent internal leak from Nvidia shows the company scraped videos from YouTube to train its generative AI models. TaiwanPlus looks at what the leaked documents reveal and what these AI training methods mean for everyday internet users and for content producers.
Transcript
00:00What do the leaked NVIDIA documents tell us?
00:03So the NVIDIA documents that we've been able to see
00:06indicate that NVIDIA is actively seeking whatever video data they can find.
00:12It looks like they're predominantly going after YouTube
00:15and they're using collated data sets that have links to YouTube videos
00:20and actively using whatever programs they can find
00:23to both download the videos and also mask the fact that it's coming from,
00:27in this case, NVIDIA.
00:28They're hiding what's called their internet protocol address
00:30or their IP address to actively do this.
00:33It raises interesting questions because it says that they're in this
00:36interesting gray zone about what they can and cannot download.
00:40And it raises also interesting questions about the sheer energy consumption
00:43as well as the respect for whatever data rights of the video producers.
00:47And why is that worrying for the everyday users of the internet?
00:51Because if you remember about a decade, a little bit more ago,
00:55there was this idea of Web 2.0.
00:57It was this idea that folks could not just consume content
01:01and browse content on the internet, but they could also be producers,
01:04whether it's with blogs and then later with audio podcasts
01:07and then ultimately with video.
01:08And so a lot of people have put content out there.
01:11Some of that content they put out as part of an ad-generating model
01:15with different companies, including Google and others,
01:18with the expectations that the content producers would be compensated
01:21for their content as it was watched.
01:23And now by doing this mass download, what happens is the AI is being trained
01:27and maybe there is one counted download towards the content producer,
01:31but then the AI is going to use it multiple times in the future
01:34with no source of revenue or at least attribution back
01:37to the original producers of that video content.
01:40What is being done to ensure that companies like OpenAI,
01:43the developer of ChatGPT, train their models ethically?
01:46So one of the things we need to do right now is think about
01:49how can we give voice to content producers, content creators?
01:53And it might very well be that instead of actually expecting companies
01:56to do the right thing, we actually need to galvanize content producers
01:59to find some way to actually express their perceived contractual rights
02:04as content producers.
