We know that AI models like ChatGPT rely on a boatload of data for training, which has already sparked a long debate over what constitutes fair use. However, as AI increasingly dominates the internet, there comes a point where AI-generated content ends up in the citations and training data. As The Guardian found, it is already happening with OpenAI's latest large language model.
ChatGPT Sourcing Grokipedia, AI Trains AI

Let's lay out what's involved here. The Guardian reported that ChatGPT's GPT-5.2 has been sourcing some of its information from Grokipedia, and if that name sounds familiar, it's Grok's Wikipedia competitor that is entirely AI-generated. (Grok originates from xAI, which is owned by Elon Musk, who also owns X.) The report indicates that the information sourced from Grokipedia includes uncommon topics like Iranian politics, and it has to be noted that editors cannot directly edit what Grokipedia shows; the only way is to give it prompts on how content is to be edited.
Now, irrespective of what Grokipedia's political bias may be, this kind of AI-feeds-AI situation is, on paper, potentially disastrous for today's AI models. There are two aspects to this: model collapse and LLM grooming. The first is a fairly well-known theory that, when AI is trained on enough hallucinated or nonsensical AI-generated data over time, output quality degrades and information becomes less reliable. Essentially, it's like a snake eating its own tail.
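To make the model collapse idea concrete, here is a minimal toy sketch rather than anything resembling real LLM training. It assumes the "model" is just a bell curve (a mean and a spread), and each new generation is fitted only to samples produced by the previous one. The function name `simulate_collapse` and all of its parameters are made up for illustration.

```python
import random
import statistics

def simulate_collapse(generations=200, sample_size=20, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Toy stand-in for "AI trained on AI output". Returns the fitted
    standard deviation for generation 0 and every generation after it.
    """
    rng = random.Random(seed)
    mean, stdev = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [stdev]
    for _ in range(generations):
        # "Generate content" from the current model.
        samples = [rng.gauss(mean, stdev) for _ in range(sample_size)]
        # "Train" the next model only on that generated content.
        mean = statistics.fmean(samples)
        stdev = statistics.pstdev(samples)
        history.append(stdev)
    return history

if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: stdev = {history[gen]:.4f}")
```

In this setup the fitted spread tends to shrink generation after generation, because each small sample slightly underrepresents the variety of the one before it. That loss of variety is the toy version of the degradation described above; how strongly it applies to production models is a separate question this sketch does not answer.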
The second part, LLM grooming, is the weaponization of that principle. Threat actors can publish large amounts of nonsensical data, or even disinformation, which AI models eventually scrape into their training datasets; over time, this shifts a model's behavior toward what the threat actors intended, such as treating the disinformation as legitimate and presenting it to the user as real information. (There's also the "Nightshade" technique that research teams presented back in 2023, which follows similar logic.)
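The flooding logic behind LLM grooming can be sketched with an even cruder toy. The assumption here, which is ours and not how real LLMs work, is a "model" that simply repeats whichever claim it has seen most often for a topic. The names `most_common_claim`, `clean_corpus`, and `poison`, along with the data itself, are hypothetical.

```python
from collections import Counter

def most_common_claim(corpus, topic):
    """Toy 'model': answer with the claim seen most often for a topic."""
    counts = Counter(claim for t, claim in corpus if t == topic)
    return counts.most_common(1)[0][0]

# Hypothetical scraped data: 50 pages stating the correct claim.
clean_corpus = [("capital_of_france", "Paris")] * 50
# Hypothetical flood of 80 planted pages pushing a false claim.
poison = [("capital_of_france", "Lyon")] * 80

print(most_common_claim(clean_corpus, "capital_of_france"))           # Paris
print(most_common_claim(clean_corpus + poison, "capital_of_france"))  # Lyon
```

Real training pipelines filter and weight their sources, so flooding is unlikely to be this easy in practice, but the sketch shows why sheer volume is the attacker's lever.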
This certainly won't help the companies behind these AI models deal with the constant skepticism from some parts of the public. It should be stressed that, despite improvements, AI models fundamentally cannot eliminate every factor that causes hallucinations, so it is best to double-check whatever they output.
Pokdepinion: Perhaps AI companies will eventually have to decide whether introducing AI-generated datasets is worth the risk of model collapse.
