MD5 To Be Considered Harmful Someday 401
Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."
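The "appendable cascade" property the summary describes can be illustrated with a toy iterated (Merkle-Damgard) hash, the same construction MD5 uses, shrunk to a 16-bit state so a collision can be brute-forced in milliseconds. Everything here is illustrative; this is not MD5 and the collision search is nothing like Wang's attack:

```python
import hashlib

BLOCK = 4  # bytes per message block

def compress(state, block):
    # toy 16-bit compression function (demonstration only; NOT secure)
    digest = hashlib.md5(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")

def toy_hash(data):
    # iterate the compression function over fixed-size blocks,
    # zero-padding the tail (real MD5 padding also encodes the length)
    data += b"\x00" * (-len(data) % BLOCK)
    state = 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

# birthday-search two distinct one-block messages that collide;
# with only 2**16 states, a repeat is guaranteed well within 2**20 tries
seen, pair = {}, None
for i in range(1 << 20):
    blk = i.to_bytes(BLOCK, "big")
    s = compress(0, blk)
    if s in seen:
        pair = (seen[s], blk)
        break
    seen[s] = blk

a, b = pair
assert a != b and toy_hash(a) == toy_hash(b)

# the cascade property: because the colliding blocks leave the internal
# state identical, appending any common payload preserves the collision
payload = b"arbitrary appended payload"
assert toy_hash(a + payload) == toy_hash(b + payload)
```

The key point is that the collision is in the internal state, not merely the output, which is why an arbitrary suffix rides along for free.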
Correct me if I'm wrong, but... (Score:5, Insightful)
He can create a file that MD5sums to the same result as a legitimate file, but he does not have full control over the content or size of the result, making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks (i.e. it shouldn't particularly bother anyone except people who already don't care about security).
Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.
Consequently, while this is of academic interest, I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into collisions, because it is impossible to map every possible combination of a large set of data one-to-one onto a smaller set of "fingerprints". You can reduce the risk by using a hash function with a larger output space, but it is impossible to escape this constraint entirely without using fingerprints as large as the data itself.
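The pigeonhole point above is easy to see concretely: shrink the fingerprint and collisions stop being theoretical. A quick sketch, using a deliberately truncated MD5 digest as the tiny "fingerprint":

```python
import hashlib

def tiny_fingerprint(data):
    # a 2-byte "fingerprint" has only 65,536 possible values, so among
    # 65,537 distinct inputs the pigeonhole principle guarantees a collision
    return hashlib.md5(data).digest()[:2]

seen, collision = {}, None
for i in range(1 << 17):          # more than 2**16 distinct inputs
    d = str(i).encode()
    fp = tiny_fingerprint(d)
    if fp in seen:
        collision = (seen[fp], d)
        break
    seen[fp] = d

a, b = collision
assert a != b and tiny_fingerprint(a) == tiny_fingerprint(b)
```

A full 128-bit digest only pushes the same inevitability out to astronomically many inputs; it never removes it.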
Re:Exploit? (Score:3, Insightful)
I think the point was this.
file1: xxxxxxxxxx
file2: xxxxxyyyyy
%file1 = %file2
Now a user downloads the file over P2P, and one peer sends him chunks of file1 while another sends him chunks of file2.
You get:
file3: xxxxxyyxxx
which has the same hash as file1 and file2. That's the problem: not only can two files have the same hash, a combination of those files (as would occur with a mixed P2P download) can also have the same hash.
Or I still could be wrong.
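The mixed-download scenario can be simulated with a toy iterated hash (16-bit state so collisions are brute-forceable; nothing here is real MD5). Because colliding "doppelganger" blocks merge the internal state, a second colliding pair can be found downstream of the first, giving four distinct files that all hash identically; chunks fetched from different peers still verify:

```python
import hashlib
from itertools import product

BLOCK = 4

def compress(state, block):
    # toy 16-bit compression function (demonstration only; NOT secure)
    digest = hashlib.md5(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")

def toy_hash(data):
    state = 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def find_doppelgangers(state):
    # birthday-search two distinct blocks that collide from a given state
    seen = {}
    for i in range(1 << 20):
        blk = i.to_bytes(BLOCK, "big")
        s = compress(state, blk)
        if s in seen:
            return seen[s], blk, s
        seen[s] = blk

# a pair for position 1, then (since the states merge) a pair for position 2
a1, a2, merged = find_doppelgangers(0)
b1, b2, _ = find_doppelgangers(merged)

# four distinct "files" -- every mix of chunks from either source -- one hash
files = [x + y for x, y in product((a1, a2), (b1, b2))]
assert len(set(files)) == 4
assert len({toy_hash(f) for f in files}) == 1
```

Each additional doppelganger position doubles the number of same-hash variants, which is why the summary can talk about "any number" of binary-inequal files.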
______ with be harmful/obsolete in the future.. (Score:3, Insightful)
For the most part: big deal. All current security procedures will be harmful, i.e. compromised, in the future...
Re:Correct me if I'm wrong, but... (Score:5, Insightful)
Secondly, this is quite a significant break. Once a hash function has had an attack like this discovered, it often becomes completely useless not long down the road. I work in cryptography, and the people I know have written off MD5. Heck, the people I know are also quite worried about SHA-1, and the current best attack against that one isn't nearly as strong.
The upshot is that this hash function should NOT be considered secure any more. For now, if you are not protecting anything of high value, you're probably fine. Tomorrow? Possibly. But soon you're not going to be protected at all, and so you should start worrying about that now, instead of when you're already in trouble.
Lea
Re:The "Detailed Summary" (Score:3, Insightful)
Is that right? I'm no expert by any means, but could this reduce the potential for a real attack to pretty much nil?
Dirt (Score:3, Insightful)
The bottom line is that if you care enough about security, this will only be a problem for a day or so when MD5 is finally broken completely and the hashes don't take five days to brute force; they could be solved in a matter of microseconds. As soon as that happens, the next hash system goes up, and it would not likely be so easy to break, unless the whole hashing process is somehow cracked wide open (which would be very bad for a number of reasons).
What we'll likely see is that with the greater increases in CPU speeds available to end users, hash-breaking systems can be tested more easily until a full reversal algorithm is available for MD5. There is already a site that will break your hash and give you *something* with the same hash, and it takes a couple of days.
Re:damn (Score:3, Insightful)
Any hash that maps a large (often infinite) space onto a finite one will always have collisions; it's just a question of how easy they are to find. If you don't want to have collisions, you have to send the whole file.
Re:The "Detailed Summary" (Score:3, Insightful)
The solution is to not use MD5.
Not always... (Score:3, Insightful)
It's well known that encrypting twice doesn't always improve your security, so it wouldn't be surprising if hashing twice didn't always improve your security either.
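For hashing twice specifically, the problem is structural: the outer pass sees only the inner digest, so any collision in the inner hash survives unchanged. A minimal sketch of that shape (MD5-of-MD5 here is just an illustration, not a recommended construction):

```python
import hashlib

def double_md5(data):
    # the outer hash never sees the original message, only the inner digest,
    # so if md5(a) == md5(b) then double_md5(a) == double_md5(b) automatically:
    # composing a hash with itself adds no collision resistance at all
    return hashlib.md5(hashlib.md5(data).digest()).hexdigest()
```

Any pair of colliding inputs to plain MD5 is, by construction, also a colliding pair for this function.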
Re:You are missing the point. (Score:5, Insightful)
Your statement is ironic in the extreme. The big risk here is NOT P2P apps. Here's the real risk.
Using one of these collision generators, I can create two X.509 certificate requests which have the same MD5 hash. One request says, "I am John Smith, kshdfkhs8i76y238888888" and the other request says, "I am Microsoft Corp., oiushir87dsfhgkjshdfg"
Now I get Verisign to issue me a certificate for the first request. Since both requests hash to the same value, the CA's signature is equally valid for the second one; I can swap in the version that says I am Microsoft Corp, and nobody will ever be able to tell the difference. Now I am able to sign code as if I were Microsoft, and Dominate The Earth.
Re:damn (Score:5, Insightful)
Another condition is obviously that the message should not be reconstructable from the hash.
Re:damn (Score:2, Insightful)
If you are concerned that the result-space of a hash algorithm is going to lead to collisions (not an ordinary concern, because the algorithms claim to have dealt with this for us already), then using two very different hash functions in concert will definitely expand the range of possible results to Asize * Bsize and reduce the probability of accidental collisions accordingly.
If however you are concerned about bad guys faking things up, then there is a slightly different problem...
A == MD5
B == SHA
Hashing to both A and B yields a huge range of results.
However, if A is known or suspected to be broken, then you're down to the security provided by B; A is out of the picture.
If A can be tossed then, then you're totally reliant on B for a safe hash. If that's true, then you didn't need A at all and you'd better be confident that B is gonna do what you need it to do, cuz A don't dance.
Note to experts: Please do not grade harshly.
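The A-plus-B idea above, sketched with real library calls (MD5 concatenated with SHA-1): an attacker must now collide both at once, though as the comment says, once A falls you are leaning entirely on B.

```python
import hashlib

def dual_digest(data):
    # concatenate two independent digests; a forged file must collide both
    return hashlib.md5(data).hexdigest() + ":" + hashlib.sha1(data).hexdigest()

print(dual_digest(b"abc"))
# 900150983cd24fb0d6963f7d28e17f72:a9993e364706816aba3e25717850c26c9cd0d89d
```

Both halves here are the standard published test vectors for "abc", so the combined fingerprint is easy to sanity-check by hand.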
Re:The "Detailed Summary" (Score:3, Insightful)
What do you think are the most viable alternatives? It seems to me that SHA-1 would suffer similar vulnerabilities. Does SHA-1 suffer from the appendable cascade issue?
Do you think there is any way to avoid this kind of problem with hashes? I'm not really aware of any alternate techniques that wouldn't suffer from this same kind of attack eventually. Sure, you could develop related algorithms that increase the hash size, but then it looks like it'd just be an arms race between hashers and colliders.
~D
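On the alternatives question: SHA-1 uses the same iterated Merkle-Damgard construction as MD5, so yes, it shares the appendable-cascade property; the pragmatic answer was simply a larger hash from the same family, such as SHA-256. A quick comparison of the output sizes involved, using `hashlib`:

```python
import hashlib

data = b"example release tarball contents"

# same iterated construction, progressively larger state and output;
# a bigger digest postpones the "arms race" but cannot end it
print(len(hashlib.md5(data).hexdigest()))     # 32 hex chars (128-bit)
print(len(hashlib.sha1(data).hexdigest()))    # 40 hex chars (160-bit)
print(len(hashlib.sha256(data).hexdigest()))  # 64 hex chars (256-bit)
```

As the comment notes, growing the digest only raises the attacker's cost; escaping this class of attack entirely would require a different construction, not just more bits.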
The biggest problem is still human fallibility (Score:3, Insightful)
THAT is the biggest problem with MD5 for most users.
Re:damn (Score:2, Insightful)
If the original data is "abcdefg", I can insert some malicious bits 'M', along with some padding bits 'X', and ensure that the result of A stays the same: A("abcdefg") == A("abcdMXfg").
Assuming that I found a similar attack on B, I could find padding bits 'Y' such that B("abcdefg") == B("abcdMYfg").
Of course to fool both checksums, X must equal Y. Finding a value that keeps both checksums the same might turn out to be a non-trivial problem.
PGP can be broken by a brute-force attack given enough time; it's just that the amount of time isn't feasible at the moment.
Cryptographic stacking potentially harmful (Score:4, Insightful)
Maybe, maybe not. The new technique would certainly be more difficult to analyse mathematically, but just stacking complicated but flawed methods does not necessarily result in a more secure method: typically, the security of the weakest link determines the security of the whole system.
What you say reminds me of Don Knuth's experience when he wrote his first innocent 'super' pseudorandom number generator (reported in his Art of Computer Programming, Volume 2, page 4: "Algorithm K" ;-): he composed all sorts of complicated operations, only to learn that the resulting number sequence was far from more (pseudo-)random; in fact it was much worse than the standard one-line modulo function.
Another case of (false sense of) security through obscurity?