300TB Of Spotify’s Music Have Been Scraped/Stolen In The Name Of Data Preservation

Low Boon Shen

Recently, Anna’s Archive, a self-proclaimed activist group, announced that it has “scraped” Spotify at a scarcely believable scale: a total of 300 terabytes of data was scraped or stolen (pick your preferred wording), covering 86 million of the 256 million songs that supposedly exist on the music streaming platform.

The Big Spotify Scrape

Now, most would argue that this is textbook piracy, although the group behind this big Spotify heist doesn’t seem to think so. Anna’s Archive says its mission is to preserve “humanity’s knowledge and culture” regardless of medium. While the group usually focuses on text media, this is one of the few times that “an opportunity comes along outside of text”. The group found a way to scrape Spotify at scale, which gave it the chance to grab as much data as possible in the name of “preservation.”


In fact, the group has published a surprisingly detailed blog post laying out the results, so here are a few key numbers. The data falls into two categories: metadata and music files. The group managed to scrape 99.9% of the metadata from the 256 million tracks on Spotify, while the 86 million music files (in OGG Vorbis 160kbps or OGG Opus 75kbps format, depending on popularity score) represent 37% of what’s available on the platform. The metadata has been released to the public via torrent files, while the 86 million actual music files have yet to be released at this time.

Although that is a relatively small chunk of the platform’s songs, that 37% represents 99.6% of listens. In other words, the remaining 63% are songs that practically nobody listens to or, perhaps more likely, AI-generated tracks that nobody knew existed in the first place. It helps that the group scraped songs in descending order of popularity (most to least popular), which lets it preserve pretty much everything any human ever listens to while filtering out most of the AI slop (and avoiding going too far into diminishing returns).
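For the curious, here is a minimal Python sketch of why scraping in popularity order pays off so quickly. It is purely illustrative: the function name `coverage_of_top_fraction` and the listen counts are made up for this example (a simple heavy-tailed curve where the track at rank r gets roughly 1,000,000 / r listens), and none of it comes from Spotify’s or Anna’s Archive’s actual data or tooling.

```python
def coverage_of_top_fraction(listen_counts, fraction):
    """Return the share of all listens captured by the most-listened `fraction` of tracks."""
    ranked = sorted(listen_counts, reverse=True)  # popularity descending
    keep = round(len(ranked) * fraction)          # how many tracks we "scrape"
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:keep]) / total


# Hypothetical catalogue of 10,000 tracks with a heavy-tailed listen distribution.
# These numbers are an assumption for illustration only.
listens = [1_000_000 // rank for rank in range(1, 10_001)]

share = coverage_of_top_fraction(listens, 0.37)
print(f"Top 37% of tracks account for {share:.1%} of listens in this toy catalogue")
```

The steeper the drop-off in popularity, the closer that share gets to 100%, which would be consistent with the group’s claim that 37% of songs cover 99.6% of listens.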

On Spotify’s end, the company said it has “disabled the nefarious user accounts that engaged in unlawful scraping,” and “implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behavior.” While new measures can help prevent future scraping activity, we all know that what’s on the Internet stays on the Internet, so there’s no putting that 300-terabyte genie back into the proverbial bottle.

Pokdepinion: I doubt anyone can fit all 300TB of songs on their home PC. You’ll probably need a whole server rack of storage to begin with.
