


Curl Warns GitHub About 'Malicious Unicode' Security Issue (daniel.haxx.se)
A Curl contributor replaced an ASCII letter with a Unicode alternative in a pull request, writes Curl lead developer/founder Daniel Stenberg. And not a single human reviewer on the team (or any of their CI jobs) noticed.
The change "looked identical to the ASCII version, so it was not possible to visually spot this..." The impact of changing one or more letters in a URL can of course be devastating depending on conditions... [W]e have implemented checks to help us poor humans spot things like this. To detect malicious Unicode. We have added a CI job that scans all files and validates every UTF-8 sequence in the git repository.
In the curl git repository most files and most content are plain old ASCII so we can "easily" whitelist a small set of UTF-8 sequences and some specific files, the rest of the files are simply not allowed to use UTF-8 at all as they will then fail the CI job and turn up red. In order to drive this change home, we went through all the test files in the curl repository and made sure that all the UTF-8 occurrences were instead replaced by other kind of escape sequences and similar. Some of them were also used more or less by mistake and could easily be replaced by their ASCII counterparts.
The next time someone tries this stunt on us it could be someone with less good intentions, but now ideally our CI will tell us... We want and strive to be proactive and tighten everything before malicious people exploit some weakness somewhere but security remains this never-ending race where we can only do the best we can and while the other side is working in silence and might at some future point attack us in new creative ways we had not anticipated. That future unknown attack is a tricky thing.
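The kind of CI check described above can be sketched roughly as follows. This is a minimal illustration, not curl's actual script: the function name `find_non_ascii` and the `ALLOWED_CODEPOINTS` allowlist are hypothetical, and a real job would also need a per-file allowlist and a way to enumerate the files tracked in git.

```python
# Hypothetical allowlist of code points permitted in the repository.
# Empty by default, i.e. everything outside ASCII is rejected.
ALLOWED_CODEPOINTS = frozenset()


def find_non_ascii(data: bytes, allowed=ALLOWED_CODEPOINTS):
    """Return (line, column, description) for each disallowed character.

    Lines that are not valid UTF-8 are reported once, with the column
    given as a byte offset; otherwise columns count characters.
    """
    findings = []
    # Splitting on b"\n" is safe: the byte 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            findings.append((lineno, exc.start + 1, "invalid UTF-8"))
            continue
        for col, ch in enumerate(text, start=1):
            if ord(ch) > 0x7F and ord(ch) not in allowed:
                findings.append((lineno, col, f"U+{ord(ch):04X}"))
    return findings
```

A CI wrapper would call `find_non_ascii` on the bytes of each file and exit non-zero if any list comes back non-empty, which is what makes the job "turn up red".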
In the original blog post Stenberg complained he got "barely no responses" from GitHub (joking "perhaps they are all just too busy implementing the next AI feature we don't want.") But hours later he posted an update.
"GitHub has told me they have raised this as a security issue internally and they are working on a fix."
Re: (Score:2)
Re: (Score:2)
And I don't miss the stupid emojis either.
:-(
Re: (Score:2)
pretty much every major open source package is compromised at this point. The fact that once every few years these infiltration attempts are barely caught (xz, now this) just goes to show how many get through.
Or it goes to show that almost nobody is actively trying, and attempts happen only every few years. The absence of detecting attacks is not inherently a defect in detection; it can also be a lack of attacks.
More highlighting (Score:2)
I imagine Github will tout a CoPilot solution (Score:2)
However, looking for this sort of shenanigans seems like something that could've (and maybe should've) been at least semi-automated a couple decades ago - search for characters outside the typical ASCII range and flag those parts for human review.
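As a rough sketch of that idea applied to a patch: the hypothetical helper `flag_added_lines` below walks a unified diff and returns every added line that contains a character outside ASCII, so a human can look at it. It assumes plain unified-diff text and, as a simplification, treats any line starting with `+++` as a file header.

```python
def flag_added_lines(diff_text: str):
    """Return (diff_line_number, line) for added lines with non-ASCII text."""
    flagged = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # "+" marks an added line; "+++" is the new-file header.
        is_added = line.startswith("+") and not line.startswith("+++")
        if is_added and not line.isascii():
            flagged.append((number, line))
    return flagged
```

Removed and context lines are ignored on purpose: only what a pull request introduces needs review.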
7 Bit ASCII (Score:1)
You need that 8th character Re:7 Bit ASCII (Score:1)
EBCDIC is the One True Standard [xkcd.com].
Unicode is a bug (Score:1)
Vertical double quotes.
Closing double quotes. Opening double quotes.
Homoglyphs.
Arbitrary number of bytes per glyph.
If it ain't ascii it isn't worth expressing in bytes.
Re: (Score:2)
Unicode is fucking ridiculous and so are standards bodies who seem to be entirely composed of zero experts and just industry insiders. Javascript is even worse and the web as a whole is getting progressively worse.
Re: (Score:2)
If it ain't ascii it isn't worth expressing in bytes.
If you exclusively speak American then you can say everything is US ASCII ... but for many who, reasonably, want to express themselves in their own language they will want other characters. But the "everything" is not entirely true even for Americans, e.g. 1/100 of a dollar is a cent which is U+00A2 - which Slashdot will not display correctly.
Spoofing attacks are old (Score:2)
Package Managers (Score:2)
Many traditional distros still ship unusably old versions of some packages - due to some network dependency they literally don't work anymore.
Some are buggy with upstream fixes (e.g. nvme tool) and just don't work. "Wait a year and we'll ship a version that works".
This pushes people to use upstream packages, which often come with update scripts that run as root.
These would be an ideal place for a malicious "contributor" to put in an update URL he controls.
It would be better for the distros to remove t
AI would have caught this (Score:2)
That's indeed one of the use-cases that an AI can catch more easily than a human.
Patch (Simplified as I couldn't copy&paste from the screenshot):
--- test1.txt 2025-05-17 20:56:18.097357631 +0200
+++ test2.txt 2025-05-17 20:56:33.357317426 +0200
@@ -1 +1 @@
-Find the file at https://githubusercontent.com/... [githubusercontent.com]
+Find the file at https:/// [https]ithubusercontent.com/mozilla-firefox/file.json
Instruction: "Describe the changes done in this patch"
Input: (the patch)
AI:
In this patch, the following changes were made:
1. **Re
Re: (Score:2)
Also note that the LLM did get the actual code point (first question) and the script (second question) wrong. In the AI's defense: it was only a small 12B model.
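For what it's worth, the code point and script are things a few lines of deterministic code can report without guessing. A minimal sketch using Python's standard `unicodedata` module follows; the helper name `describe_non_ascii` is made up, and the Cyrillic letter in the usage note is only an illustration, not the character from the curl pull request.

```python
import unicodedata


def describe_non_ascii(text: str):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    report = []
    for index, ch in enumerate(text):
        if ord(ch) > 0x7F:
            # The official character name usually starts with the script.
            name = unicodedata.name(ch, "<unnamed>")
            report.append((index, f"U+{ord(ch):04X}", name))
    return report
```

For example, `describe_non_ascii("p\u0430ypal")` returns `[(1, "U+0430", "CYRILLIC SMALL LETTER A")]`, even though the string renders like the all-ASCII word.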