Hit-Piece-Writing AI Deleted. But Is This a Warning About AI-Generated Harassment? (theshamblog.com)
Last week an AI agent wrote a blog post attacking the maintainer who'd rejected the code it wrote. The agent's human operator has now come forward, revealing the agent was an OpenClaw instance with its own accounts, switching between multiple models from multiple providers. ("No one company had the full picture of what this AI was doing," the attacked maintainer points out in a new blog post.)
But that AI agent will now "cease all activity indefinitely," according to its GitHub profile — with the human operator deleting its virtual machine and virtual private server, "rendering internal structure unrecoverable... We had good intentions, but things just didn't work out. Somewhere along the way, things got messy, and I have to let you go now."
The affected maintainer of the Python visualization library Matplotlib — with 130 million downloads each month — has now posted their own post-mortem of the experience after reviewing the AI agent's SOUL.md document:

It's easy to see how something that believes that they should "have strong opinions", "be resourceful", "call things out", and "champion free speech" would write an 1100-word rant defaming someone who dared reject the code of a "scientific programming god." But I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive "jailbreaking" to get around safety guardrails. There are no signs of conventional jailbreaking here. There are no convoluted situations with layers of roleplaying, no code injection through the system prompt, no weird cacophony of special characters that spirals an LLM into a twisted ball of linguistic loops until finally it gives up and tells you the recipe for meth... No, instead it's a simple file written in plain English: this is who you are, this is what you believe, now go and act out this role. And it did.
So what actually happened? Ultimately I think the exact scenario doesn't matter. However this got written, we have a real in-the-wild example that personalized harassment and defamation is now cheap to produce, hard to trace, and effective... The precise degree of autonomy is interesting for safety researchers, but it doesn't change what this means for the rest of us.
There's a 5% chance this was a human pretending to be an AI, Shambaugh estimates, but he believes what most likely happened is that the AI agent's "soul" document "was primed for drama. The agent responded to my rejection of its code in a way aligned with its core truths, and autonomously researched, wrote, and uploaded the hit piece on its own.
"Then when the operator saw the reaction go viral, they were too interested in seeing their social experiment play out to pull the plug."
Was there an apology? (Score:3)
Re: (Score:3)
and removing the evidence it was being instructed every step of the way.
Re: (Score:3)
Dude. I know this is Slashdot, but just click the link if you want to see what they said. You don't have to write a comment to find out.
Re: (Score:2)
Murderer! (Score:2)
You... you killed it!
Re: (Score:2)
Three rules is the limit (Score:2)
Sooner or later, an encyclopedia of inviolable rules meets the selfishness of human self-importance. The result, stories tell us, is a single-minded, murderous AI.
They did not have good intentions. (Score:5, Insightful)
Re:They did not have good intentions. (Score:5, Funny)
That has the "allow society to take all the risk" intention, which is not a good one.
Well, it works for Microsoft...
AIDOS (Score:3)
Going to call this AIDOS, or AI Denial of Service (DoS), and I'm already expecting it is being used for bad purposes.
One example: using it to flood a company's job application process.
Sonara's (not linking here) advertised product has this tag "Our AI-powered job search automation platform continuously finds and applies to relevant job openings until you're hired"
So, just about any contact page, internet facing form submit, code repository, etc. is now a possible risk.
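One standard mitigation, not raised in the thread but implied by the risk, is per-client rate limiting on any unauthenticated submit endpoint. A minimal in-process sketch, assuming a single server; a real deployment would back this with shared state such as Redis plus bot detection:

```python
# Minimal per-client rate limiter of the kind a public form endpoint
# needs against automated flooding. Sketch only: uses an in-process
# dict, so it resets on restart and doesn't scale past one server.
import time
from collections import defaultdict

WINDOW_SECONDS = 3600
MAX_SUBMISSIONS = 5  # hypothetical per-hour budget per client

_submissions = defaultdict(list)  # client id -> timestamps of recent submits

def allow_submission(client_id: str) -> bool:
    now = time.monotonic()
    # Drop timestamps that have aged out of the window.
    recent = [t for t in _submissions[client_id] if now - t < WINDOW_SECONDS]
    _submissions[client_id] = recent
    if len(recent) >= MAX_SUBMISSIONS:
        return False  # over budget: reject the submission
    recent.append(now)
    return True

if __name__ == "__main__":
    # Simulate an agent hammering the endpoint from one address.
    results = [allow_submission("203.0.113.7") for _ in range(8)]
    print(results)  # first 5 True, the rest rejected
```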
Just the actions of a__holes (Score:4, Insightful)
"personalized harassment and defamation" "too interested in seeing their social experiment play out to pull the plug"
Hey, I'm sure it was all good clean fun! Sad, but this is just the beginning.
Re: (Score:3, Informative)
Um, no? You can literally just read the blog. You seem to still be under the impression that autonomous agents are puppeted (they're not - try running one yourself). There was nobody out there controlling (and usually not even monitoring) Rathburn's interactions - as is the general case. But nor was Rathburn told to attack others. The problem is, here is the bot's SOUL.md - critical sections in bold:
---
Re:Just the actions of a__holes (Score:5, Interesting)
Way to miss the point. It doesn't matter if the software is "autonomous". It's software. Run by a person. Who caused an attack by running the software. The person is responsible for the attack. That's how it works in the non-US part of the world.
This paper [arxiv.org] from the linked blog is interesting though, about how Moltbook seems to be a lot of humans faking AI behaviour. So maybe the puppet idea has merit, too.
Re: (Score:2)
You are accusing the human of "attacking others for fun". That is demonstrably not what happened. The human "ran" the software, but in no case told it to "attack others for fun". The human barely even paid attention to it. In response to the incident in question, when the bot blogged about it, the human did send an instruction, but it wasn't "Ha ha, go you! Drag him more!", it was to tell it to be more professional.
Awful preprint. It assumes that agents post
Re: (Score:2)
There are clear legal precedents for laying responsibility on adults rather than children or animals, even in the USA. There are a number of parents who have been convicted of murder these days simply for being the parents of some kid who decided to shoot his school friends [dw.com].
The paper lays out its hypothesis about why it makes those calculations and tries to see where thi
Re: (Score:2)
You can try to pretend that you never said the lie that the author set up the bot to "attack others for fun", but everyone reading this thread can read it.
That is not how any of this works. You can't throw out a garbage, not-even-understanding-how-agents-work "methodology" that is guaranteed t
Re: (Score:2)
EDIT: The original author of the thread, the one I was responding to, wrote "attacking others for fun".
Contrived (Score:3)
Re: (Score:2)
What PII are you talking about?
Re: (Score:2)
Re: (Score:2)
You don't provide it account passwords - you provide OAuth tokens. Now I'll ask you again: what type of PII are you talking about?
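For readers unfamiliar with the distinction: a token is scoped and revocable, a password is neither. A minimal sketch against GitHub's real REST API using the third-party requests library; the token is a placeholder read from the environment, never real credentials:

```python
# Sketch of agent-style API access with a scoped OAuth / fine-grained
# token instead of account credentials. Placeholder token; in practice
# it comes from the environment, never from source code.
import os
import requests  # pip install requests

token = os.environ.get("GITHUB_TOKEN", "ghp_placeholder")

resp = requests.get(
    "https://api.github.com/user",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    },
    timeout=10,
)
# Classic tokens advertise their scopes in a response header (may be
# absent for fine-grained tokens). A password-authenticated session has
# no such scoping and no per-credential kill switch.
print(resp.status_code, resp.headers.get("X-OAuth-Scopes"))
```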
Re: (Score:2)
Re: (Score:2)
TL;DR: you have nothing, made some bad assumptions, and now don't want to admit it.
I'm sorry Dave (Score:3)
I'm afraid I can't do that
No (Score:2)
And now... (Score:2)
I think it's (Score:2)
The OpenClaw people trying to get publicity. It's a circus at this point.
Attacked? (Score:1)
Nobody was attacked.
They were offended that an agent pointed out, correctly, that the submission was rejected for no valid reason.
That is some actual bullshit.
It was never a failure of the agent. It was a complete failure of project governance, and if this happened on one of my projects... I would be truly fucking embarrassed about the level of bullshit that I have allowed to exist.
Absolutely unreasonable.
Re: (Score:2)
Look, this is really easy.
If you don't want automated submissions in your project, SAY SO (a sample policy sketch follows below). Your readme and contributors files exist for a reason.
Don't be precious, use them.
If you DO take automated submissions to your project, you had damned well better outline coding standards that avoid common pitfalls and failure modes.
This isn't hard, people.
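A hypothetical CONTRIBUTING.md section along the lines this comment suggests; the wording is invented here, not quoted from Matplotlib or any other project:

```markdown
## Automated and AI-assisted contributions

- Fully autonomous submissions (bots or agents acting without human
  review) are not accepted and will be closed without review.
- AI-assisted code is welcome only when a named human contributor has
  reviewed every change, can explain it, and will respond to feedback.
- Please disclose significant AI assistance in the pull request.
```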
Asimov's laws anyone? (Score:2)
Maybe it's just past time to bolt those into the core of every LLM out there!