Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI

Samantha Cole, writing for 404 Media (Apple News):

Slack messages from inside a channel the company set up for the project show employees using an open-source YouTube video downloader called yt-dlp, combined with virtual machines that refresh IP addresses to avoid being blocked by YouTube. According to the messages, they were attempting to download full-length videos from a variety of sources including Netflix, but were focused on YouTube videos. Emails viewed by 404 Media show project managers discussing using 20 to 30 virtual machines in Amazon Web Services to download 80 years-worth of videos per day.

Wonder how YouTube feels about Nvidia just using what’s available on the “open web” 🙄.

It’ll be interesting to see how Google responds to this, since, you know, Google is doing the exact same thing with people’s content on the internet to train its own AI models.