MD5 To Be Considered Harmful Someday 401
Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."
damn (Score:2, Interesting)
Re:damn (Score:4, Insightful)
-nB
Very true (Score:2)
However, there is no real need to use weak algorithms (unless you're in the Government and need to use the FIPS standard). Unless you're doing something time-critical, just use SHA-1 and Whirlpool.
Re:Very true (Score:3, Informative)
The ISO approval also carries some weight in industry, although after some rather disasterous specifications (such as ISO 9000), they have lost some of their image. However, there are plenty of organizat
Almost forgot (Score:4, Interesting)
Where time isn't critical (eg: creating and validating checksums for files), I'd say use both. The overhead isn't great, and you'd get much more security.
Where time is critical AND you don't have to be concerned with computers not under your control, use Whirlpool. Rijndael is fast, SHA-1 is slow. Whirlpool also offers a longer hash string than SHA-1.
In any other situation, use SHA-1. Whirlpool might turn out to be the greatest algorithm out there, but that doesn't help if you're trying to talk to a remote computer that doesn't support it.
Not always... (Score:3, Insightful)
It's well known that encrypting twice doesn't always improve your security, so it wouldn't be surprising if hashing twice didn't always improve your security either.
Re:damn (Score:3, Informative)
I have to disagree with you here.
If I have algo A and algo B:
I hash with algo A and get a value which I store.
I hash with algo B and get a value which I store.
While the security does not add up to A^B it does ammount to > A+B, which is still better than A or B only. (I really wish I had my Crypto reference books handy)
Other posters mentioned it was more work and equiv to one secure algo, both those statements are true; as I pointed out this was a
Re:damn (Score:2)
For algorithms that are provably unrelated, it should approach A*B.
However, if they're closely tied to each other (not sure, but probably MD5 and MD4?), it could be as little as A+B.
Re:damn (Score:2)
nah (Score:2)
Re:damn (Score:2)
Re:damn (Score:5, Insightful)
Another condition is obviously that the message should not be reconstructable from the hash.
Re:damn (Score:3, Interesting)
Depends on the hash function, if the input length is equal or smaller then output length reconstructing the message might be possible, ie:
$ echo "hello" | md5sum
b1946ac92492d2347c6235b4d2611184 -
Should be easily reversible with a dictonary attack. However in the more common case a hash maps a large input domain, to a much smaller output domain, so yep, hashes are not reversible unless input is somehow smaller then the hash itself.
Re:damn (Score:3, Insightful)
Re:damn (Score:2, Informative)
Thats the nature of the game we play.
The only hashing system without collisions is sending the original file itself.
Re:damn (Score:5, Interesting)
Yes, but a good hash makes it *extremely* difficult to find them. MD5 is looking pretty mediocre right now.
Re:damn (Score:2, Informative)
That's not hashing:
Producing hash values for accessing data or for security. A hash value (or simply hash), also called a message digest, is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value.
Re:damn (Score:3, Insightful)
Any hash that maps a large (infinitely large, in most cases) space onto a finite ones will always have collisions, it's just a question of how easy they are to find. If you don't want to have have collisions, you have to send the whole file.
Shadow Password Suite (Score:2)
Two files with the same md5 hash? (Score:5, Funny)
Re:Two files with the same md5 hash? (Score:5, Informative)
Re:Two files with the same md5 hash? (Score:3, Funny)
drice@pinky:/tmp$ echo "Hello World" > file2
drice@pinky:/tmp$ md5sum file1
e59ff97941044f85df5297e1c302d260 file1
drice@pinky:/tmp$ md5sum file2
e59ff97941044f85df5297e1c302d260 file2
Cheap, I know...
MP5 harmful? No way! (Score:5, Funny)
Re:MP5 harmful? No way! (Score:2)
Exploit? (Score:5, Interesting)
Re:Exploit? (Score:4, Insightful)
clarification: (Score:2)
Re:Exploit? (Score:2, Informative)
As the length of the file is sent in addition to the MD5, in the vast majority of cases it's going to be impossible to find a file which gives you the same length and MD5. I guess as the size of media files increases this gets more and more likely, but if it ever starts affecting more than 0.0000000001% of files you can just increase the length of the hash.
Re:Exploit? (Score:3, Informative)
Re: Exploit? (Score:4, Informative)
This would worry me, except that BT uses SHA1, not MD5, so this is irrelevant. MD5 has seemed suspect for years, & Bram's the sort to pay attention to that sort of thing.
I checked; Edonkey is based on MD4. Gnutella variants might use MD5.
Re:Exploit? (Score:2)
Of course, I could be completely misunderstanding the information at hand.
Re:Exploit? (Score:3, Insightful)
I think the point was this.
file1: xxxxxxxxxx
file2: xxxxxyyyyy
%file1 = %file2
now, user downloads the file from p2p, and one user gives him file1 and one user gives him file2.
you get:
file3: xxxxxyyxxx
which has the same has file1 and file2. That's the problem - not only can two files have the same hash, the combination of those files (as would occur with a corrupted P2P file) also has the same hash.
Or I still could be wrong.
Re:Exploit? (Score:4, Informative)
file1: xxxxccccccc....
file2: yyyyccccccc....
%file1 = %file2
Which is the example given in the article.
However, Wang said she could get to a collision from any intermediate hash code within the hour (according to the article). That would mean:
file1: ccccxxxxccccccccc......
file2: ccccyyyyccccccccc......
%file1 = %file2
Where xxxx and yyyy are (pre?)calculated and cccc.... is the payload.
If _I_ am not mistaken.
In english (Score:4, Funny)
Re:In english (Score:5, Informative)
Short version: A common technology for verifying that a file you've downloaded is legitimate and untampered-with, known as MD5, isn't as secure as people thought.
Slightly longer version: MD5 is a way of generating a checksum -- a single, comparable value -- from a file. Ideally it is supposed to give you different numbers for different files, so if a web site advertises the checksum a file should have, you can compare that with one generated from the file you actually got to see whether the file you've downloaded has been modified, potentially maliciously.
The research shows that it is possible for someone to construct a drop-in replacement for the file you thought you had that generates the same MD5 checksum as the original, so anyone attempting to validate the file this way would think they had the real thing. If it turns out that you can construct a damaging replacement for a common file -- perhaps an installer for a popular application like Firefox or OpenOffice that's usually downloaded from a public server -- then this could open a loophole for viruses, worms, etc. that would slip through the security net often used by cautious people when downloading such programs.
Re:In english (Score:3, Insightful)
Re:In english (Score:2)
Re:Only 340282366920938463463374607431768211456 ? (Score:2)
Well, if they had found a way to compress any file to something as small as a MD5 hash uniquely, there would be no need for broadband. A 100 mb files has a whole lot more possibilities than a short string, no maths required. Collisions are obvious. SHA1 has even more possibilities, but it's still not enough.
So, why is this news? I guess someone wanted massive traffic on his website to get more people to see ads.
What does this mean (Score:2)
Good analysis (Score:5, Funny)
I am glad somebody does.
Re:Good analysis (Score:2)
Fire burns. Ice remains frozen. Fire and Ice have the same MD5 hash.
Happy?
(From page 3 of the paper)
Re:Good analysis (Score:2)
Re:Good analysis (Score:4, Funny)
Homer: Say it in English, Doc.
Hibbert: You're going to need open heart surgery.
Homer: Spare me your medical mumbo jumbo.
Hibbert: We're going to cut you open and tinker with your ticker.
Homer: Could you dumb it down a shade?
http://www.tvtome.com/tvtome/servlet/GuidePageSer
Let's face it (Score:3, Insightful)
Re:Let's face it (Score:3, Interesting)
True, you'd spend longer validating and decrypting the unholy mess generated than you'd spend downloading it, but I think you'd be fairly safe in assuming that the file was what it claim
Cryptographic stacking potentially harmful (Score:4, Insightful)
Maybe, maybe not. The new technique would certainly be more difficult to analyse mathematically, but just stacking complicated but flawed methods does not necessarily result in a more secure method: typically, the security of the weakest link determines the security of the whole system.
What you say reminds me of Don Knuth's experience when he wrote his first innocent 'super' pseudo random number generator (reported in his Art of Computer Programming, Volume 2, page 4: "Algorithm K" ;-): he composed all sorts of complicated operations, but had to learn the resulting number sequence was far from more (pseudo-)random, in fact much worse than the the standard 1-line modulo function.
Another case of (false sense of) security through obscurity?
--
Try Nuggets [mynuggets.net], our SMS search engine which uses question answering technology; now available across the UK.
md5 vs sha1 vs ? (Score:5, Informative)
here is a very good link about the algo...
This is almost appropriate... (Score:4, Funny)
And your double-clicking icon puts your window in the trash,
And your data is corrupted 'cause the index doesn't hash,
Then your situation's hopeless and your system's gonna crash!
Correct me if I'm wrong, but... (Score:5, Insightful)
He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).
Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.
Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.
Re:Correct me if I'm wrong, but... (Score:2)
It's not much of a threat to signature schemes, but there are enough places where MD5 is used as a one-way encryption of an access token that this attack poses a potential security hazard.
Re:Correct me if I'm wrong, but... (Score:3, Insightful)
Re:Correct me if I'm wrong, but... (Score:3, Informative)
Re:Correct me if I'm wrong, but... (Score:4, Informative)
Re:Correct me if I'm wrong, but... (Score:3, Informative)
Re:Correct me if I'm wrong, but... (Score:2)
Re:Correct me if I'm wrong, but... (Score:3, Interesting)
See the long list of cryptographic hash functions [wikipedia.org] on Wikipedia. (Bottom of page.)
If you used three or four different hash functions, to produce a long combined message digest (concatenated from the several message digests), it would seem to be much more difficult to produce files that collide. An attack resulting in a collision on one algorithm, is not likely to work for another algorithm. Finding two files that simultaneously fool two or more algorithms seems a much m
Re:Correct me if I'm wrong, but... (Score:5, Insightful)
Secondly, this is quite a signifigant break. Once a hash function has had an attack like this discovered, it often becomes completely useless not long down the road. I work in cryptography, and the people I know have written off MD5. Heck, the people I know are also quite worried about SHA-1, and the current best attack against that one isn't nearly as strong.
The upshot of this is that this hash function should NOT be considered secure any more. For now, if you are not protecting anything of high value, you're probably fine. Tomorrow? Possbily. But soon, you're not going to be protected at all, and so you should start worrying about that now, instead of when you're already in trouble.
Lea
Dirt (Score:3, Insightful)
The bottom line is that if you care enough about security this will only be a problem for a day or so when md5 is finally solved completely and the hashes don't take five days to brute force -- they could be solved in a matter of microseconds. As soon as that happens, the next hash system g
Re:Correct me if I'm wrong, but... (Score:2)
True, it's impossible to have a hash without collisions. However it should be difficult (in a crypto computational complexity sense) to find such collisions. Apparently MD5 has a vulnerability where it is possible to compute in reasonable time sets of bits that can b
Re:Correct me if I'm wrong, but... (Score:3, Interesting)
If I had decided to use MD5 for the hash in the digital signature, my scheme is now vulnerable
You're wrong. (Score:5, Informative)
He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).
Suppose you're storing passwords as encrypted hashes, so that intercepting the hashes doesn't tell you what the password is. But if you can generate a password to match that MD5...
You know that GPG keys are identified and signed by their MD5 hashes? Suppose that I can generate a GPG key that would be identified as yours, and distributed it.
Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.
There's a coin-flipping protocol that goes as follows. Suppose that Alice and Bob want to flip a coin (over the Internet), but they don't trust each other.
Now, suppose that Alice generated multiple files in step 1. When Bob makes his guess, she tries to pick a file that will make her win. If she generated only two files, completely randomly, this would let Alice win 75% of the time.
These are just the first ideas I thought of. If I were looking for other problems, I'd think about undeniable signatures [rsasecurity.com], keysigning (which as GPG and X.509 SSL are heavily based on) and other specialized signature systems. In particular, I expect that the first type of crack could cause issues with SSH keys, both user keys (used for authentication) and host keys (to prevent man-in-the-middle attacks).
Digital signatures are used for much more than just testing for file tampering.
Re:You're wrong. (Score:3, Informative)
They're not. The old PGP 2.6 keys are, but GnuPG generates OpenPGP keys that use SHA-1.
GnuPG will use an already generated PGP 2.6 key, but will not make more of them.
Re:Correct me if I'm wrong, but... (Score:2)
Not just MD5 (Score:3, Interesting)
SHA-256 is a good replacement. SHA-1 should be fine but if you are going through the trouble of an upgrade, why not make it sufficiently future proof?
Re:Not just MD5 (Score:2)
Whirlpool (another 256-bit hash) is another excellent hash, but again isn't considered adequately tested to be considered "safe".
Of course, you could opt for an unconventional hash. There are some algorithms based on cellular automata. Although some have known weaknesses, it would seem that this wo
Re:Not just MD5 (Score:3, Informative)
Still... I would switch out MD5 if you have a target that's worth pretty much anything at all. After a break like this, I'd expect MD5 to become basically useless pretty fast. Of course, I don't work in hash collisions, I work mostly in protocols...
Lea
If I Had A Million Terabytes... (Score:5, Funny)
Two files with the same MD5 hash at once. Aaw yeah.
Re:If I Had A Million Terabytes... (Score:2)
The "Detailed Summary" (Score:5, Informative)
I've been doing some analysis on MD5 collision announced by Wang et al. Short version: Yes, Virginia, there is no such thing as a safe hash collision -- at least in a function that's specified to be cryptographically secure. The full details may be acquired at the following link:
http://www.doxpara.com/md5_someday.pdf
A tool, Stripwire, has been assembled to demonstrate some of the attacks described in the paper. It may be acquired at the following address:
http://www.doxpara.com/stripwire-1.1.tar.gz
Incidentally, the expectations management is by no means accidental -- the paper's titled "MD5 To Be Considered Harmful Someday" for a reason. Some people have said there's no applied implications to Joux and Wang's research. They're wrong; arbitrary payloads can be successfully integrated into a hash collision. But the attacks are not wildly practical, and in most cases exposure remains thankfully limited, for now. But the risks are real enough that responsible engineers should take note: This is not merely an academic threat, systems designed with MD5 now need to take far more care than they would if they were employing an unbroken hashing algorithm, and the problems are only going to get worse.
Some highlights from the paper:
* The attack itself is pretty limited -- essentially, we can create "doppelganger" blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum.
* MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. This leads to...
* Attacks are possible using only the proof of concept test vectors released by Wang -- the actual attack is not necessary.
* Stripwire emits two binary packages. They both contain an arbitrary payload, but the payload is encrypted with AES. Only one of the packages ("Fire") is decryptable and thus dangerous; the other ("Ice") shields its data behind AES. Both files share the same MD5 hash.
* Digital Signature systems are vulnerable, as they almost always sign a hashed representation of data rather than the data itself.
* This is an excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature. Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build. At some later point, the embedded payload could be safely "activated", without the MD5 changing. This has implications for Tripwire, DRM, and several package management architectures.
* HMAC's invulnerability has been slightly overstated. It's definitely possible, given the key, to create two datasets with the same HMAC. Attacker possession of the key violates MAC presumptions, so the impact of this is particularly questionable.
* Very interesting possibilities open up once the full attack is made available -- among other things, we can create self-decrypting executables (fire.exe and ice.exe) that exhibit differential behavior based on their internal colliding payloads. They'll still have the same MD5 hash.
* Several doppelgangers may (relatively quickly, as per Joux) be computed within a single multicollision-friendly block. As such, the particular selection of doppelganger sets within a file can itself be made to represent data. It's relatively straightforward to embed a 128 bit signature inside an arbitrary file, in such a way that no matter the value of the signature, a constant MD5 hash is maintained. This is curiously steganographic.
* Many popular P2P networks (and innumerable distributed content databases) use MD5 hashes as both a reliable search handle and a mechanism to ensure file integrity. This makes them blind to any sign
Re:The "Detailed Summary" (Score:3, Insightful)
Is that right? I'm no expert by any means, but could this reduce the potential for a real attack to pretty much nill?
Re:The "Detailed Summary" (Score:3, Insightful)
The solution is to not use MD5.
Re:The "Detailed Summary" (Score:3, Insightful)
Re:The "Detailed Summary" (Score:3, Insightful)
What do you think are the most viable alternatives? It seems to me that SHA-1 would suffer similar vulnerabilities. Does SHA-1 suffer from the appendable cascade issue?
Do you think there is any way to avoid this kind of problem with hashes? I'm not really aware of any alternate techniques that wouldn't suffer from this same kind of attack eventually. Sure, you could develop related algorithms that increase the hash size, but the
Re:The "Detailed Summary" (Score:3, Informative)
SHA-1 has a much stronger design. It's starting to show cracks, though, so I don't recommend anything. Something based on AES will come, though -- maybe AES-OMAC, maybe Whirlpool. At the core of almost every hashing algorithm is just a block cipher anyway...
--Dan
Cash Money? (Score:2)
--grendel drago
Re:Cash Money? (Score:3, Informative)
XiaoyunWang, Dengguo Feng, Xuejia Lai, and Hongbo Yu
"Collisions for hash functions md4, md5,haval-128 and ripemd"
______ with be harmful/obsolete in the future.. (Score:3, Insightful)
For the most part, big deal, all of current security procedures will be harmful ie compromised in the future...
Solution: Use more than one hash algorithm (Score:3, Informative)
Using MD5 with SHA1, or even the older MD2 or MD4 will reduce the probability of creating a compatable binary with the same checksum to virtually zero.
If only one checksum is required then just XOR the resulting checksums from each algorithm.
birthday attack (Score:2)
(heh, Wang)
Switching to SHA1 (Score:2)
md5_hex to sha1_hex
Truncated SHA-1 Hash Also Secure (Score:2)
Re:Switching to SHA1 (Score:2)
Better safe than sorry though I guess... SHA-1 is just fine for me.
It actually easy to see this (Score:2, Interesting)
sort 15,000 jpg image files to remove duplicates.
Since the names varied, and it is not uncommon
to have many images with the same length, it
seemed like a good idea to use md5 hashes.
So I coded it up in python to do the md5 hash,
and then stuffed the file name into a table
keyed by the md5 hash. Big suprise when multiple
different files landed in the same hash.
Some property of jpgs probably reduces the
randomness of the files and increases the
probability of hash collis
hash, equals (Score:2)
Thing search(Thing thingToFind) {
thingList = hashSearch(hash(thingToFind));
for each thing in thingList
if (thing == myThing)
return thing;
return null;
}
s/myThing/thingToFind/ (Score:2)
MD5 To Be Considered Harmful Someday (Score:2)
Maybe start consider adopting magnet links (Score:2)
I personally think any clients that don't support magnet links should really start consider adopting those -- DC++, most Gnutella clients, and Shareaza already do.
A magnet link can look like something like this:
magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO 5C
what about stuff that's harmful NOW? (Score:2)
Yeah, yeah, Obligatory Microsoft Slam. But I don't see any great stampede to isolate and neutralize the threat that this Typhoid Mary of OS's presents to communication stability. Why are we preparing for a threat that barely exists, when we refuse to deal with the present, much greater threat?
Is a two-pass just as vulnerable? (Score:3, Interesting)
Is doing a 2nd pass helpful?
In other words, if
ABCfireDEF
and
ABCiceeDEF
hash the same, does
ABCfireDEFABCfireDEF
necessarily, or even frequently, hash to
ABCiceeABCiceeDEF?
Even if it were to provide protection, in practical terms,
1) other hashing althorithms are likely faster than "MD5 times two"
2) there may be some - hopefully a lot fewer - places in a long file where where
ABCfireDEFABCfireDEF
can be replaced with
ABCiceeDEFABCmeltDEF
I'm beginning to like the "pick two algorithms and call me in the morning" approach.
Dijkstra Considered Dead (Score:2)
The biggest problem is still human falibility (Score:3, Insightful)
THAT is the biggest problem with MD5 for most users.
MD6 (Score:3, Funny)
Re:Already Happening (Score:4, Informative)
I'll give you $50 if you can back that claim up. I want to see two video files. They must start out the same, but have a difference about half-way through. And they have to have the same md5 sum. Just post where I can download the two files, and your paypal address.
The way I see it, you've got a 1/2^64 chance of being right. So I'm risking $50/184467440737095516, which isn't a whole lot.
Re:Inherent to any hashing mechanism (Score:3, Insightful)
You missed the point. (Score:3)
Re:Time is of the essence (Score:2)
Re:MD5 is obviously less secure (Score:3, Informative)
Sure, there are an infinite number of possible passwords, but they're all impossibly huge. I
Re:You are missing the point. (Score:5, Insightful)
Your statement is ironic in the extreme. The big risk here is NOT P2P apps. Here's the real risk.
Using one of these collision generators, I can create two x.509 certificate requests which have the same MD5 hash. One request says, "I am John Smith, kshdfkhs8i76y238888888" and the other request says, "I am Microsoft Corp., oiushir87dsfhgkjshdfg"
Now, I get Verisign to issue me a certificate for the first request. Since the hash is the same, I can rewrite the certificate to say that I am Microsoft Corp, and nobody will ever be able to tell the difference. Now, I am able to sign code as if I were Microsoft, and Dominate The Earth.
Re:You are missing the point. (Score:3, Funny)
You can just ask Verisign for a certificate in the name of Microsoft, and they'll give you one. Much simpler.
It's happened in the past. [microsoft.com]
Re:I have a novel solution (Score:3, Informative)
So you are safe downloading linux for now via bittorrent. Besides, the chances of MD5 collisions happening from sheer luck/unluck are very slim. (after all, we've been using it for ages with no reports)
The most dangerous factor to continued use of MD5 are malicious individuals.