Forgot your password?
typodupeerror
Encryption Security

MD5 To Be Considered Harmful Someday 401

Posted by michael
from the house-built-on-sand dept.
Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."
This discussion has been archived. No new comments can be posted.

MD5 To Be Considered Harmful Someday

Comments Filter:
  • Re:damn (Score:4, Insightful)

    by networkBoy (774728) on Tuesday December 07, 2004 @04:11PM (#11023668) Homepage Journal
    Another option is to hash against two very different algorithms, that even if both are partially insecure, the chances of being able to trick both are exponentially higher.
    -nB
  • Let's face it (Score:3, Insightful)

    by mordors9 (665662) on Tuesday December 07, 2004 @04:15PM (#11023724)
    Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with. Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files. It will come back to "do you trust the source of the file".
  • Re:Exploit? (Score:4, Insightful)

    by Ayaress (662020) on Tuesday December 07, 2004 @04:18PM (#11023775) Journal
    It doesn't have to be harmful to break a ptp system. There's a pretty common exploit on Kazaa where people have a file just containing random junk that registers as a match to a popular file. If you download taht file, and get any portion of it from the fake, the file is corrupted and useless. Somebody could use these fake files to "poison" popular torrents, making it very unlikely that anybody on them will get uncorrupted files.
  • by Sheetrock (152993) on Tuesday December 07, 2004 @04:19PM (#11023799) Homepage Journal
    If I'm translating this properly, a malicious person can do two things with this knowledge:

    He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).

    Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

    Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.

  • Re:Exploit? (Score:3, Insightful)

    by Pxtl (151020) on Tuesday December 07, 2004 @04:23PM (#11023866) Homepage
    Wait - no - I'm completely fscking wrong.

    I think the point was this.

    file1: xxxxxxxxxx
    file2: xxxxxyyyyy

    %file1 = %file2

    now, user downloads the file from p2p, and one user gives him file1 and one user gives him file2.

    you get:
    file3: xxxxxyyxxx
    which has the same has file1 and file2. That's the problem - not only can two files have the same hash, the combination of those files (as would occur with a corrupted P2P file) also has the same hash.

    Or I still could be wrong.
  • Re:damn (Score:1, Insightful)

    by The Milkman (214539) on Tuesday December 07, 2004 @04:24PM (#11023888)
    Which is the same as just one secure algorithm.
  • Really no big surprise, all currently implemented algorithms for security and digital signatures will be rendered useless or easily hackable in the future. The fact that a compromise seems to be soon is the part thats interesting.. however the parent fails to address how serious it is.

    For the most part, big deal, all of current security procedures will be harmful ie compromised in the future...

  • by FearUncertaintyDoubt (578295) on Tuesday December 07, 2004 @04:37PM (#11024055)
    Before you flame, change 256^4 / 256^1000 to 256^1000 / 256^4.
  • Re:damn (Score:3, Insightful)

    by TCM (130219) on Tuesday December 07, 2004 @04:40PM (#11024116)
    A P4 at 2.6GHz does >300MB/s md5 according to openssl speed md5. As noted, you probably wait for the data to be fetched from disk rather than the checksum to be computed.
  • by chialea (8009) <chialea.gmail@com> on Tuesday December 07, 2004 @04:41PM (#11024117) Homepage
    When you're dealing with cryptography, it should be very, very, very hard to find collisions. If you find enough of them, you can proabably find something bad with the same hash value. For example, if you sign a digital document that says you're going to pay me $1 for my pencil, and I find a suitable hash collision, I could make it look like you signed a promise to pay me $3,000 for some used tissue. I wouldn't rule out that someone could find a harmful collision for a program distributed online, and substitute a trojan. If the prize gives enough reward, people will throw a lot of computational power at it, and will likely hit pay dirt.

    Secondly, this is quite a signifigant break. Once a hash function has had an attack like this discovered, it often becomes completely useless not long down the road. I work in cryptography, and the people I know have written off MD5. Heck, the people I know are also quite worried about SHA-1, and the current best attack against that one isn't nearly as strong.

    The upshot of this is that this hash function should NOT be considered secure any more. For now, if you are not protecting anything of high value, you're probably fine. Tomorrow? Possbily. But soon, you're not going to be protected at all, and so you should start worrying about that now, instead of when you're already in trouble.

    Lea
  • by BranMan (29917) on Tuesday December 07, 2004 @04:42PM (#11024138)
    It seems to me that at least the malicious nature of this vulnerability can be limited by using the size of the file as an additional check besides the MD5 hash. In other words, if you know how large the file is supposed to be in bytes, then there is less likelihood that something malicious can be passed off as the original - even if it has the same MD5.
    Is that right? I'm no expert by any means, but could this reduce the potential for a real attack to pretty much nill?
  • Dirt (Score:3, Insightful)

    by mfh (56) on Tuesday December 07, 2004 @04:45PM (#11024175) Homepage Journal
    I've always called this dirt. Some folks call it padding, but it's essentially the same. However up until now, no one has known how to do it with md5 and come up with the same sum. This is nothing more than a hack to me.

    The bottom line is that if you care enough about security this will only be a problem for a day or so when md5 is finally solved completely and the hashes don't take five days to brute force -- they could be solved in a matter of microseconds. As soon as that happens, the next hash system goes up and it would not likely be so easy to break, unless the whole hashing process is somehow cracked wide open (which would be very bad for a number of reasons).

    What we'll likely see is that with the greater increases in CPU speeds available to end users, the hash breaking systems can be tested easier until a final reverse algorhythm is available for md5. There already is a site that will break your hash and give you *something* with the same hash, and it takes a couple days.
  • Re:damn (Score:3, Insightful)

    by Hanji (626246) on Tuesday December 07, 2004 @04:46PM (#11024190)
    That was his point.
    Any hash that maps a large (infinitely large, in most cases) space onto a finite ones will always have collisions, it's just a question of how easy they are to find. If you don't want to have have collisions, you have to send the whole file.
  • by Effugas (2378) * on Tuesday December 07, 2004 @04:49PM (#11024230) Homepage
    File size remains constant between Fire and Ice. Good thinking, though.

    The solution is to not use MD5.
  • Not always... (Score:3, Insightful)

    by Goonie (8651) <robert@merkel.benambra@org> on Tuesday December 07, 2004 @04:50PM (#11024240) Homepage
    How do you know that the algorithms aren't going to be weak for the same text pairs?

    It's well known that encrypting twice doesn't always improve your security, so it wouldn't be surprising if hashing twice didn't always improve your security either.

  • Re:In english (Score:3, Insightful)

    by MindStalker (22827) <mindstalker@gmDALIail.com minus painter> on Tuesday December 07, 2004 @04:50PM (#11024249) Journal
    Important part is you have to construct both files. So if you were the distributor of said common file and people trusted you, you could produce two versions of said file, with same hash. But producing a identical hash to a file you can't change has currently not been successful.
  • by pclminion (145572) on Tuesday December 07, 2004 @04:59PM (#11024406)
    Jesus, I hope I'm wrong... I would have to acctualy buy MS crap.

    Your statement is ironic in the extreme. The big risk here is NOT P2P apps. Here's the real risk.

    Using one of these collision generators, I can create two x.509 certificate requests which have the same MD5 hash. One request says, "I am John Smith, kshdfkhs8i76y238888888" and the other request says, "I am Microsoft Corp., oiushir87dsfhgkjshdfg"

    Now, I get Verisign to issue me a certificate for the first request. Since the hash is the same, I can rewrite the certificate to say that I am Microsoft Corp, and nobody will ever be able to tell the difference. Now, I am able to sign code as if I were Microsoft, and Dominate The Earth.

  • by Facekhan (445017) on Tuesday December 07, 2004 @05:06PM (#11024508)
    A lot of routing protocols use MD5 as the password hash for authentication. If someone found a collision for some of the important internet router BGP peers then large chunks of the internet could get routed the wrong way or just small parts like the subnet your bank controls could get routed to a some black hat's basement in Romania where he could setup a mock intranet and webserver to steal customer info through a spoofing attack.
  • Re:damn (Score:5, Insightful)

    by canavan (14778) on Tuesday December 07, 2004 @05:08PM (#11024548)
    You're using a different definition of a secure hash than everybody else. It's rather obvious that for files larger than the length of the hash (128 bit for md5), there must be quite a lot sharing the same hash, for a given file length about 2^(filelength in bits - hashlength in bits). However for a hash to be considered secure, it's only required that finding two files with the same hash must be as hard as trying (in md5's case 2^127 different files), but in md5's case you can compute those collisions much cheaper under certain circumstances.

    Another condition is obviously that the message should not be reconstructable from the hash.
  • Re:damn (Score:2, Insightful)

    by jyoull (512280) <jim @ m e dia.mit.edu> on Tuesday December 07, 2004 @05:33PM (#11024935)
    Intuitutively, yes, and i'm just pulling this out of the air here cuz i'm supposed to be working on something else, but solve a slightly different problem.

    If you are concerned that the result-space of a hash algorithm is going to lead to collisions (this is not an ordinary concern because the algorithms claim to have dealt with this for us already) then using two very different hash functions in concert will definitely expand the range of possible results and reduce the probability of collisions by Asize * Bsize.

    If however you are concerned about bad guys faking things up, then there is a slightly different problem...

    A == MD5
    B == SHA

    Hashing to both A and B yields a huge range of results.
    However, if A known or suspected to be broken, then you're down to the security provided by B. A is out of the picture.

    If A can be tossed then, then you're totally reliant on B for a safe hash. If that's true, then you didn't need A at all and you'd better be confident that B is gonna do what you need it to do, cuz A don't dance.

    Note to experts: Please do not grade harshly.
  • by Dracolytch (714699) on Tuesday December 07, 2004 @05:33PM (#11024937) Homepage
    I know a fair amount about this stuff, but obviously not as much as you...

    What do you think are the most viable alternatives? It seems to me that SHA-1 would suffer similar vulnerabilities. Does SHA-1 suffer from the appendable cascade issue?

    Do you think there is any way to avoid this kind of problem with hashes? I'm not really aware of any alternate techniques that wouldn't suffer from this same kind of attack eventually. Sure, you could develop related algorithms that increase the hash size, but then it looks like it'd just be an arms race between hashers and colliders.

    ~D
  • by syousef (465911) on Tuesday December 07, 2004 @05:40PM (#11025057) Journal
    Be honest. How many of you have checked the MD5 sums on a file with a TRUSTED source, as opposed to from the same source you got the file? How many of you do this regularly?

    THAT is the biggest problem with MD5 for most users.
  • Re:damn (Score:2, Insightful)

    by goatpunch (668594) on Tuesday December 07, 2004 @11:15PM (#11028687)
    However, if A known or suspected to be broken, then you're down to the security provided by B. A is out of the picture.

    If A can be tossed then, then you're totally reliant on B for a safe hash. If that's true, then you didn't need A at all and you'd better be confident that B is gonna do what you need it to do, cuz A don't dance.

    I'm not so sure, the description of the MD5 attack didn't say it was completely compromised, only that it was possible to find padding bits that could allow malicious content to be inserted without changing the checksum of the whole.

    If the orginal data is: "abcdefg", I can insert some malicious bits 'M', along with some padding bits 'X', and ensure that the result of A would be the same: A("abcdefg") == A("abcdMXfg").

    Assuming that I found a similar attack on B, I could find padding bits 'Y' such that B("abcdefg") == B("abcdMYfg").

    Of course to fool both checksums, X must equal Y. Finding a value that keeps both checksums the same might turn out to be a non-trivial problem.

    PGP is easily broken by a brute force attack given enough time, it's just that the amount of time isn't feasible at the moment.

  • by BranMan (29917) on Tuesday December 07, 2004 @11:16PM (#11028690)
    The file size does indeed remain constant, but the attacker is constrained as to what he can substitute out of the original file while maintaining both the file size and the MD5 hash. Without the file size remaining constant too, the attacker can theoretically make the original file into a trojan. With the file size remaining constant, most likely the only thing an attacker can do is break the file - he doesn't have enough leeway to put something malicious in there. I think.
  • Let's say you break a file into blocks, encrypt those blocks with Rijndael or Serpent using a chaining method that authenticates the prior block, digitally sign the result using (seperately generated) RSA, DSA and ECC signatures in turn, and generate SHA-1 and Whirlpool checksums of both the encrypted and unencrypted file. True, you'd spend longer validating and decrypting the unholy mess generated than you'd spend downloading it, but I think you'd be fairly safe in assuming that the file was what it claimed to be.

    Maybe, maybe not. The new technique would certainly be more difficult to analyse mathematically, but just stacking complicated but flawed methods does not necessarily result in a more secure method: typically, the security of the weakest link determines the security of the whole system.

    What you say reminds me of Don Knuth's experience when he wrote his first innocent 'super' pseudo random number generator (reported in his Art of Computer Programming, Volume 2, page 4: "Algorithm K" ;-): he composed all sorts of complicated operations, but had to learn the resulting number sequence was far from more (pseudo-)random, in fact much worse than the the standard 1-line modulo function.

    Another case of (false sense of) security through obscurity?

    --
    Try Nuggets [mynuggets.net], our SMS search engine which uses question answering technology; now available across the UK.

Genius is ten percent inspiration and fifty percent capital gains.

Working...