

AI-Generated Code Creates Major Security Risk Through 'Package Hallucinations' (arstechnica.com)
A new study [PDF] reveals AI-generated code frequently references non-existent third-party libraries, creating opportunities for supply-chain attacks. Researchers analyzed 576,000 code samples from 16 popular large language models and found 19.7% of package dependencies -- 440,445 in total -- were "hallucinated."
These non-existent dependencies exacerbate dependency confusion attacks, in which malicious packages bearing the names of expected, legitimate ones can infiltrate software. Open-source models hallucinated at nearly 22%, compared to 5% for commercial models. "Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users," said lead researcher Joseph Spracklen. Alarmingly, 43% of hallucinations were repeated across multiple queries, making them predictable targets.
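The practical defense implied here is to treat any LLM-suggested dependency as unverified until it is confirmed against the real registry. As a rough sketch (not from the paper; the script and package names are illustrative, and it assumes the requests library is installed), a check against the public PyPI JSON API could look like this:

    # Sketch: flag AI-suggested dependencies that do not exist on PyPI.
    # Existence alone is not proof of safety -- an attacker may already have
    # published a package under a commonly hallucinated name.
    import sys
    import requests

    def package_exists(name: str) -> bool:
        """Return True if `name` is a published package on PyPI."""
        resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
        return resp.status_code == 200

    if __name__ == "__main__":
        for name in sys.argv[1:]:
            verdict = "found" if package_exists(name) else "NOT FOUND (possible hallucination)"
            print(f"{name}: {verdict}")

Run it as, for example, python check_deps.py requests some-suggested-package; anything reported as not found should be treated with suspicion rather than installed blindly.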
You are not hallucinating (Score:4, Informative)
You are not hallucinating - this story is a dupe [slashdot.org]
Re: (Score:2)
Hahaha, just my first thought as well.
Re: You are not hallucinating (Score:2)
Who tests the code? (Score:2)
As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests? They have got to be written by people who really understand the problem that the code is supposed to address: given a set of inputs, what are the expected outputs? [I do understand that this is a simplistic description.] I would be wary of using AI to generate test cases -- if it hallucinates, then what are you testing?
Another question: who writes the end user documentation ?
I am assumi
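The inputs/expected-outputs framing above is essentially what a table-driven unit test captures. As a hypothetical illustration (the function and cases are invented for this example, not from the story), the person who understands the problem supplies the expectations and the test merely checks them:

    # Hypothetical example: a human supplies the (input, expected output) pairs;
    # pytest only verifies that the code reproduces them.
    import pytest

    def normalize_phone(raw: str) -> str:
        """Toy function under test: reduce a US phone number to its digits."""
        return "".join(ch for ch in raw if ch.isdigit())

    @pytest.mark.parametrize("raw, expected", [
        ("(555) 123-4567", "5551234567"),
        ("555.123.4567", "5551234567"),
        ("5551234567", "5551234567"),
    ])
    def test_normalize_phone(raw, expected):
        assert normalize_phone(raw) == expected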
Re: Who tests the code? (Score:1)
Re: (Score:2)
Re: (Score:2)
As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests?
From what I've heard talking to people, one of the most common uses of AI is to generate the tests.
Re: (Score:2)
As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests?
From what I've heard talking to people, one of the most common uses of AI is to generate the tests.
That's because in many companies the primary purpose of tests is so that you can tell auditors, and hence customers, that your code has x% test coverage. With AI you can hit the 100% coverage checkbox with tests that are meaningless but still earn the auditor's seal of approval.
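As a hypothetical illustration of that checkbox problem (the function and tests below are invented, not from the study), both tests execute every line of apply_discount, so a coverage report shows 100%, yet neither verifies that the discount is computed correctly:

    # Invented example: 100% line coverage, zero meaningful assertions.
    def apply_discount(price: float, percent: float) -> float:
        if percent < 0 or percent > 100:
            raise ValueError("percent out of range")
        return price * (1 - percent / 100)

    def test_apply_discount_coverage_only():
        result = apply_discount(100.0, 10.0)
        assert result is not None  # passes no matter what value comes back

    def test_apply_discount_bad_input_coverage_only():
        try:
            apply_discount(100.0, 200.0)
        except ValueError:
            pass  # error swallowed; the behavior is never actually checked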
Re: (Score:2)
As important as generating code is testing it to ensure that it does what it is supposed to do. Who/what writes these tests? They have got to be written by people who really understand the problem that the code is supposed to address: given a set of inputs, what are the expected outputs? [I do understand that this is a simplistic description.]
I used to test software for a living, alpha stuff right out of the daily builds.
It was my experience that it took out-of-the-box thinking to come up with real-world tests that accurately reflected both how the software was intended to be used by its developer and plausible ways that someone could misuse it. I was working on communications protocols because the company lawyers were apparently afraid of BSD-licensed code, so they wouldn't let the project use existing software. I leveraged
Referencing non-existent 3rd party libraries? (Score:2)
So, you're saying that AI code is even shittier than first year programmers straight out of the Code Boot Camp? Because I have yet to see one of those that doesn't at least make sure a library exists before referencing it. Even the really, really bad ones.
I know, I know. AI is gonna take all programming jobs any day now. And sadly, it'll probably happen because management would rather have shit code than pay and benefits for employees. The contractor cleanup gigs a few years later will be nice paying, I'm s
Pfft. Hallucinations. I swear by librwnj (Score:1)
And you can too! [slashdot.org]
"Package hallucinations" (Score:1)
That's what SHE said.
Fedora installed HP bloatware last week (Score:2)
What is stopping the AI from testing dependencies? (Score:2)
If you create a malicious package and advertise it enough, doesn't this happen without AI?
Re: (Score:2)
Re: (Score:2)
Why don't they hallucinate grammar or vocabulary?
ChatGPT says: "LLMs are much better at plausible surface-level generation than verifiable grounded reference, especially in niche domains like package names or APIs."
AI seems to hallucinate a lot. (Score:2)
hallucinations? (Score:2)
Given the speed of publishing in academia... (Score:1)
...this study is already obsolete. No, I am not kidding. AI code generation makes huge improvements in just 2-3 months, whereas the development, peer review, and publication of an academic paper takes 6-12 months.
In short, the models they test in the paper are basically ancient history.