The Great Data Poisoning: How Users Are Fighting Back Against AI Scraping


From LinkedIn to Reddit, users are deliberately corrupting their data to sabotage AI training. It's guerrilla warfare, and it's spreading.


The Revolt Nobody Saw Coming

When Reddit announced it had sold user data to Google and OpenAI for $203 million, the backlash was swift and creative. Within days, thousands of users began systematically deleting years of posts and comments. Others went further, replacing their content with gibberish before deletion. The message was clear: if you're selling our words, we'll make sure they're worthless.

This wasn't an isolated incident. Across LinkedIn, X, Reddit, and DeviantArt, a decentralized resistance movement is emerging. Users aren't just opting out of AI training. They're actively sabotaging it.

The Toolkit of Digital Sabotage

The weaponry varies by platform and medium, but the strategy is consistent: make the data unusable.

On LinkedIn, users are flooding their profiles with fictional job histories, invented skills, and fabricated achievements. The goal isn't to impress recruiters. It's to corrupt the training data of AI models scraping professional networks.

Artists have their own arsenal. Nightshade, a free tool from the University of Chicago, lets creators poison images at the pixel level. To the human eye, nothing changes. But when an AI model ingests a Nightshade-treated image, it learns the wrong patterns. Feed it enough poisoned data, and a model trained to recognize dogs might start hallucinating cats.
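The core idea can be illustrated with a toy sketch. This is not Nightshade's actual algorithm (which computes optimized adversarial perturbations against specific model architectures); it only shows the principle: a pixel change too small for a human to notice, paired with a mismatched label, so the model learns the wrong association.

```python
# Toy illustration of the idea behind image-poisoning tools like Nightshade.
# NOT the real algorithm -- just the principle: an imperceptible pixel
# perturbation paired with a wrong label.

def perturb(image, budget=2):
    """Shift each pixel by at most `budget` intensity levels (0-255 scale)."""
    return [[min(255, max(0, px + (budget if (x + y) % 2 == 0 else -budget)))
             for x, px in enumerate(row)]
            for y, row in enumerate(image)]

def poison(image, target_label):
    """Return a training sample that still looks like the original to a
    human, but carries a label teaching the model the wrong concept."""
    return {"pixels": perturb(image), "label": target_label}

# A tiny grayscale "dog" image (4x4 pixels, hypothetical values).
dog = [[120, 121, 119, 122],
       [118, 120, 121, 119],
       [122, 119, 120, 121],
       [120, 122, 118, 120]]

sample = poison(dog, target_label="cat")

# No pixel moved by more than 2 intensity levels (invisible to the eye)...
max_shift = max(abs(a - b) for ro, rp in zip(dog, sample["pixels"])
                for a, b in zip(ro, rp))
print(max_shift)
# ...but the label now teaches "cat" for dog-like pixels.
print(sample["label"])
```

Real tools make the perturbation adversarial rather than uniform, so that the model's feature extractor, not just the label, is misled; but the asymmetry is the same: the change is trivial for the creator and costly for anyone training on the data.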

Glaze, from the same research team, works differently. It cloaks artistic style, making it harder for AI to mimic an artist's work. Both tools have been downloaded hundreds of thousands of times. DeviantArt's #NoAI tag became a rallying point before the platform eventually added official opt-out features.

On Reddit, the weapon of choice is PowerDeleteSuite, a browser extension that bulk-deletes posts and comments. After the Google/OpenAI deal went public, usage spiked. Some users replaced their entire comment history with lorem ipsum before deletion, just to make sure nothing useful remained in any cached copies.
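The overwrite-before-delete pattern these scripts rely on can be sketched as follows. This is a toy in-memory model, not PowerDeleteSuite's actual code; `CommentStore` and the filler text are illustrative stand-ins for a platform's edit and delete APIs.

```python
import random

# Toy sketch of the overwrite-before-delete pattern: edit each comment to
# noise first, so cached or already-scraped copies pick up the noise on
# re-crawl, then delete it. Real tools drive a platform's edit/delete APIs;
# CommentStore here is a hypothetical stand-in.

FILLER_WORDS = ["lorem", "ipsum", "dolor", "sit", "amet"]

def noise(n_words=8):
    """Generate throwaway filler text to overwrite a comment with."""
    return " ".join(random.choice(FILLER_WORDS) for _ in range(n_words))

class CommentStore:
    """Stand-in for a platform's comment API (hypothetical)."""
    def __init__(self, comments):
        self.comments = dict(comments)  # id -> text

    def edit(self, cid, new_text):
        self.comments[cid] = new_text

    def delete(self, cid):
        del self.comments[cid]

def scorch(store):
    """Overwrite every comment with noise, then delete it."""
    for cid in list(store.comments):
        store.edit(cid, noise())   # step 1: poison any cached copy
        store.delete(cid)          # step 2: remove the comment itself

store = CommentStore({1: "my real opinion", 2: "years of posts"})
scorch(store)
print(store.comments)  # -> {} : nothing left to scrape
```

The edit step matters because deletion alone leaves intact copies in caches and prior scrapes; overwriting first means at least some of those copies end up carrying noise instead of the original text.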

X (formerly Twitter) presents a trickier battlefield. Elon Musk's xAI has been scraping the platform aggressively, and users have fewer tools. But some are adopting the LinkedIn strategy: polluting their tweet history with noise, running bots that post nonsense, or simply going dark and deleting years of content.

Why This Works (and Why It Scares Platforms)

AI models are only as good as their training data. Feed them garbage, and they output garbage. This is the asymmetric warfare advantage users have discovered.

Platforms can detect some forms of poisoning. Obvious spam gets filtered. But sophisticated poisoning, especially with tools like Nightshade v2, is much harder to catch. The arms race favors the defender: you know which data you poisoned, and the platform doesn't.

GDPR strengthens this position in Europe. Users have the legal right to delete or modify their data. Platforms can't stop you from corrupting your own profile, even if they suspect sabotage. They can ban you for violating terms of service, but by then the damage is done.

Data poisoning is the guerrilla tactic. The legal battles are the conventional warfare.

The New York Times sued OpenAI and Microsoft for copyright infringement, alleging unauthorized use of articles to train ChatGPT. Getty Images filed similar claims against Stability AI over image generators. The Authors Guild launched a class action on behalf of writers whose books allegedly trained AI models without permission or compensation.

These cases could redefine how AI companies source training data. If courts rule that scraping copyrighted material without licensing is infringement, the entire foundation of large language models gets shakier.

Platforms are caught in the middle. Reddit's $203 million payday looked smart until users started scorching the earth. LinkedIn hasn't disclosed AI data deals publicly, but the user revolt suggests many assume they're coming.

The Corporate Response: Too Little, Too Late

Platforms are scrambling to add opt-out features. DeviantArt introduced anti-AI toggles. Meta announced (then delayed) tools to let users exclude their data from AI training. X claims to respect user preferences, though enforcement is murky.

The problem? Trust is gone. Users who spent years building content on these platforms feel betrayed. Retroactive opt-outs don't undo models already trained on their work. And fine-print policy changes don't rebuild goodwill.

Some platforms are trying the legal route, threatening users who poison data with account suspension. But enforcement is spotty. Proving intent is hard. And banning users who delete their own content is a PR nightmare.

What Happens Next

This isn't going away. As AI companies race to train bigger models, the hunger for data intensifies. Users now understand their content has value, and they're less willing to give it away for free.

Expect more tools like Nightshade. Expect more legal battles. Expect platforms to keep making promises they can't quite keep.

The data poisoning movement is asymmetric, decentralized, and surprisingly effective. It won't stop AI development. But it might force the industry to rethink how it sources training data.

In the meantime, the message from users is loud: if you're going to profit from our work, expect us to fight back.

Topics covered:

Privacy, News

Frequently asked questions

How are users fighting back against AI training?
Users are deploying multiple tactics: deliberately poisoning data (fake info in LinkedIn profiles), artistic sabotage tools like Nightshade and Glaze, mass deletion of content on Reddit, and legal action against the AI companies training on their work.
What is Nightshade?
Nightshade is a free tool developed by the University of Chicago that poisons images to fool AI models. It alters pixels in ways invisible to humans but causes AI to completely misinterpret the image.
Why are Reddit users deleting their old posts?
Reddit sold its data to Google and OpenAI for $203 million without user consent. In response, a mass deletion movement emerged, powered by tools like PowerDeleteSuite.
Is data poisoning legal?
Poisoning your own data is generally legal. GDPR guarantees rights to deletion and correction. Filling your LinkedIn profile with false information doesn't break the law, though it may violate terms of service.
Can platforms stop data poisoning?
Technically, platforms can detect some forms of poisoning, but the arms race favors users. Nightshade v2 is already harder to filter than v1. The defender who controls their own data always has the advantage.