Anna’s Archive, a self-proclaimed activist group, recently announced that it has “scraped” Spotify at a scarcely believable scale: a total of 300 terabytes’ worth of data was scraped or stolen (pick your preferred wording), covering 86 million of the 256 million songs that supposedly exist on the music streaming platform.
The Big Spotify Scrape
Now, most would argue that this is textbook piracy, although the group behind this big Spotify heist doesn’t seem to think so. Anna’s Archive says its mission is to preserve “humanity’s knowledge and culture” regardless of medium; while the group usually focuses on text media, this is one of the few times that “an opportunity comes along outside of text”. The group found a way to scrape Spotify at scale, which gave it the chance to grab as much data as possible in the name of “preservation.”

In fact, the group has published a surprisingly detailed blog post laying out the results and data, so here are a few key numbers. The data can be split into two categories: metadata and music files. The group managed to scrape 99.9% of the metadata for the 256 million tracks that exist on Spotify, while the music files, at 86 million (in OGG Vorbis 160kbps or OGG Opus 75kbps format, depending on popularity score), represent 37% of what’s available on the platform. The metadata has been released to the public via torrents, while the 86 million actual music files have yet to be released at this time.
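As a quick sanity check on those numbers, the sketch below divides the reported 300 terabytes by the 86 million music files to estimate the average size of one track, then converts that size into playtime at 160kbps. It assumes decimal terabytes and that nearly all of the 300TB is audio rather than metadata; neither detail is confirmed by the source, and the function names are made up for illustration.

```python
# Back-of-the-envelope check on the reported figures.
# Assumptions (not from the source): 1 TB = 10**12 bytes, and the
# 300 TB is almost entirely audio, with metadata being negligible.
TOTAL_BYTES = 300 * 10**12        # 300 terabytes of scraped data
TRACKS_WITH_AUDIO = 86 * 10**6    # 86 million music files


def average_track_bytes(total_bytes: int, track_count: int) -> float:
    """Average size of a single track in bytes."""
    return total_bytes / track_count


def playtime_seconds(size_bytes: float, bitrate_kbps: float) -> float:
    """Playtime of a file of the given size at a constant bitrate."""
    # 1 kbps = 1000 bits per second, and 1 byte = 8 bits.
    return size_bytes * 8 / (bitrate_kbps * 1000)


if __name__ == "__main__":
    avg = average_track_bytes(TOTAL_BYTES, TRACKS_WITH_AUDIO)
    print(f"average size per track: {avg / 10**6:.2f} MB")
    print(f"playtime at 160 kbps: {playtime_seconds(avg, 160) / 60:.1f} min")
```

Under those assumptions, the average works out to roughly 3.49MB per track, or about 2.9 minutes of audio at 160kbps, which is in the range of a typical song. The less popular tracks stored at 75kbps would hold more playtime per megabyte, so treat this as a rough figure only.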
Although only a relatively small chunk of songs was scraped from the platform, that 37% represents 99.6% of listens, meaning the remaining 63% are songs that practically nobody listened to or, perhaps more likely, AI-generated tracks that nobody knew existed in the first place. It helps that the group scraped that specific 37% in descending order of popularity (most to least popular), which allows it to preserve pretty much everything any human ever listens to while filtering out most of the AI slop (and avoiding going too far into the law of diminishing returns).
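To see why scraping in descending order of popularity captures so many listens with so few tracks, here is a minimal sketch on a made-up ten-track catalogue. The play counts are purely hypothetical toy numbers, not Spotify data, and raw play counts stand in for Spotify’s popularity score, which the group actually sorted by; `coverage_by_popularity` is an illustrative name.

```python
def coverage_by_popularity(play_counts: list[int], keep_fraction: float) -> float:
    """Share of all listens covered by keeping the most-played tracks.

    play_counts   -- listens per track (hypothetical stand-in for popularity)
    keep_fraction -- fraction of the catalogue to keep, e.g. 0.4 for 40%
    """
    ranked = sorted(play_counts, reverse=True)   # most to least popular
    keep = round(len(ranked) * keep_fraction)    # number of tracks kept
    total = sum(ranked)
    return sum(ranked[:keep]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical long-tailed catalogue: one hit, a few moderately
    # played tracks, and a tail that almost nobody listens to.
    toy_plays = [9000, 500, 300, 150, 40, 5, 3, 1, 1, 0]
    share = coverage_by_popularity(toy_plays, 0.4)
    print(f"top 40% of tracks cover {share:.1%} of listens")
```

In this toy catalogue, keeping the top 40% of tracks covers 99.5% of listens, because listens are concentrated in a handful of tracks. The real catalogue is far larger, but the same long-tail shape is what lets 37% of songs account for 99.6% of listens.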
On Spotify’s end, the company said it has “disabled the nefarious user accounts that engaged in unlawful scraping,” and “implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behavior.” While the new measures can help prevent future scraping, we all know that what’s on the Internet stays on the Internet, so there’s no putting that 300-terabyte genie back into the proverbial bottle.
Pokdepinion: I doubt anyone can fit all 300TB of songs on their home PC. You’ll probably need a whole server rack of storage to begin with.
