Follow Slashdot stories on Twitter

 



Forgot your password?
typodupeerror
×
Encryption Security

MD5 To Be Considered Harmful Someday 401

Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."
This discussion has been archived. No new comments can be posted.

MD5 To Be Considered Harmful Someday

Comments Filter:
  • damn (Score:2, Interesting)

    I guess I know what I'll be coding now - SHA-1 just got a lot more important in my priority list... :(
    • Re:damn (Score:4, Insightful)

      by networkBoy ( 774728 ) on Tuesday December 07, 2004 @04:11PM (#11023668) Journal
      Another option is to hash against two very different algorithms, that even if both are partially insecure, the chances of being able to trick both are exponentially higher.
      -nB
      • by jd ( 1658 )
        The only exception being where the two algorithms have a coincidentally overlapping weakness. At which point, you're no better off.


        However, there is no real need to use weak algorithms (unless you're in the Government and need to use the FIPS standard). Unless you're doing something time-critical, just use SHA-1 and Whirlpool.

      • Not always... (Score:3, Insightful)

        by Goonie ( 8651 )
        How do you know that the algorithms aren't going to be weak for the same text pairs?

        It's well known that encrypting twice doesn't always improve your security, so it wouldn't be surprising if hashing twice didn't always improve your security either.

    • Re:damn (Score:2, Informative)

      There will ALWAYS be collisions with any kind of hashing algorythm.

      Thats the nature of the game we play.

      The only hashing system without collisions is sending the original file itself.
      • Re:damn (Score:5, Interesting)

        by WolfWithoutAClause ( 162946 ) on Tuesday December 07, 2004 @04:22PM (#11023849) Homepage
        There will ALWAYS be collisions with any kind of hashing algorythm.

        Yes, but a good hash makes it *extremely* difficult to find them. MD5 is looking pretty mediocre right now.

      • Re:damn (Score:2, Informative)

        by networkBoy ( 774728 )
        "The only hashing system without collisions is sending the original file itself."

        That's not hashing:
        Producing hash values for accessing data or for security. A hash value (or simply hash), also called a message digest, is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value.
        • Re:damn (Score:3, Insightful)

          by Hanji ( 626246 )
          That was his point.
          Any hash that maps a large (infinitely large, in most cases) space onto a finite ones will always have collisions, it's just a question of how easy they are to find. If you don't want to have have collisions, you have to send the whole file.
    • If you can modify the shadow password suite to support SHA-1 and/or Whirlpool, that would be wonderful. The greatest potential use for the attack is the discovery of aliases for passwords and other login tokens, as then you don't care if what you produce is nonsense. It just has to hash to the same key.
  • by Anonymous Coward on Tuesday December 07, 2004 @04:09PM (#11023635)
    I can only hope I live that long.
  • by October_30th ( 531777 ) on Tuesday December 07, 2004 @04:09PM (#11023644) Homepage Journal
    Aha! So it was MD5 and not MP5 [scottsdalegunclub.com]...
  • Exploit? (Score:5, Interesting)

    by Limburgher ( 523006 ) on Tuesday December 07, 2004 @04:11PM (#11023669) Homepage Journal
    So does this mean that it's possible to find a useful MD5-equivalent file for any file? Just because someone alters a file does not mean they have done anything destructive. Would one be able to take a binary, make a change of some sort, and then run a tool to determine the block of data to add to the binary to both allow the change to take effect and cancel out the MD5 change? How complex would it be to construct this tool?
    • Re:Exploit? (Score:4, Insightful)

      by Ayaress ( 662020 ) on Tuesday December 07, 2004 @04:18PM (#11023775) Journal
      It doesn't have to be harmful to break a ptp system. There's a pretty common exploit on Kazaa where people have a file just containing random junk that registers as a match to a popular file. If you download taht file, and get any portion of it from the fake, the file is corrupted and useless. Somebody could use these fake files to "poison" popular torrents, making it very unlikely that anybody on them will get uncorrupted files.
      • I say this under the assumtion that "someday" somebody will figure out a way to take a good file and make an MD5 equivalent file to poison a torrent with. The way I read the blurb, it sounds like both files were specifically crafted to create a collision.
      • Re:Exploit? (Score:2, Informative)

        As the length of the file is sent in addition to the MD5, in the vast majority of cases it's going to be impossible to find a file which gives you the same length and MD5. I guess as the size of media files increases this gets more and more likely, but if it ever starts affecting more than 0.0000000001% of files you can just increase the length of the hash.

        • Re:Exploit? (Score:3, Informative)

          by wdr1 ( 31310 ) *
          I thought the same at first, but they have a pretty clear example showing that's not the case. From here [tcs.hut.fi]:

          file1.dat:

          00000000 d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
          00000010 2f ca b5 87 12 46 7e ab 40 04 58 3e b8 fb 7f 89
          00000020 55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 71 41 5a
          00000030 08 51 25 e8 f7 cd c9 9f d9 1d bd f2 80 37 3c 5b
          00000040 96 0b 1d d1 dc 41 7b 9c e4 d8 97 f4 5a 65 55 d5
          00000050 35 73 9a c7 f0 eb fd 0c 30 29 f1 66 d1 09 b1 8f
          00000060 75 27 7f 79 30 d5 5c eb 22 e8 ad b

      • Re: Exploit? (Score:4, Informative)

        by Omniscient Ferret ( 4208 ) on Tuesday December 07, 2004 @08:40PM (#11027281)
        Somebody could use these fake files to "poison" popular torrents, making it very unlikely that anybody on them will get uncorrupted files.

        This would worry me, except that BT uses SHA1, not MD5, so this is irrelevant. MD5 has seemed suspect for years, & Bram's the sort to pay attention to that sort of thing.

        I checked; Edonkey is based on MD4. Gnutella variants might use MD5.
    • The blurb is unclear, but it sounds like you have to have created the original file with the original intent that a portion would be "swapped". So, a content creator could create a legitimate piece of content and get that reviewed, and then create a malicious piece of content with the same md5. Not quite as nasty as hacking an existing md5'd file, but still nasty.

      Of course, I could be completely misunderstanding the information at hand.
      • Re:Exploit? (Score:3, Insightful)

        by Pxtl ( 151020 )
        Wait - no - I'm completely fscking wrong.

        I think the point was this.

        file1: xxxxxxxxxx
        file2: xxxxxyyyyy

        %file1 = %file2

        now, user downloads the file from p2p, and one user gives him file1 and one user gives him file2.

        you get:
        file3: xxxxxyyxxx
        which has the same has file1 and file2. That's the problem - not only can two files have the same hash, the combination of those files (as would occur with a corrupted P2P file) also has the same hash.

        Or I still could be wrong.
        • Re:Exploit? (Score:4, Informative)

          by owlstead ( 636356 ) on Tuesday December 07, 2004 @05:00PM (#11024420)
          Yep, you are still wrong. I've tried some code and the (precalculated) different blocks MUST be at the start of the code. So it's more like

          file1: xxxxccccccc....
          file2: yyyyccccccc....

          %file1 = %file2

          Which is the example given in the article.

          However, Wang said she could get to a collision from any intermediate hash code within the hour (according to the article). That would mean:

          file1: ccccxxxxccccccccc......
          file2: ccccyyyyccccccccc......

          %file1 = %file2

          Where xxxx and yyyy are (pre?)calculated and cccc.... is the payload.

          If _I_ am not mistaken.
  • In english (Score:4, Funny)

    by ValuJet ( 587148 ) on Tuesday December 07, 2004 @04:14PM (#11023712)
    Is there a translator from ultra-nerd to english?
    • Re:In english (Score:5, Informative)

      by Anonymous Brave Guy ( 457657 ) on Tuesday December 07, 2004 @04:32PM (#11023990)

      Short version: A common technology for verifying that a file you've downloaded is legitimate and untampered-with, known as MD5, isn't as secure as people thought.

      Slightly longer version: MD5 is a way of generating a checksum -- a single, comparable value -- from a file. Ideally it is supposed to give you different numbers for different files, so if a web site advertises the checksum a file should have, you can compare that with one generated from the file you actually got to see whether the file you've downloaded has been modified, potentially maliciously.

      The research shows that it is possible for someone to construct a drop-in replacement for the file you thought you had that generates the same MD5 checksum as the original, so anyone attempting to validate the file this way would think they had the real thing. If it turns out that you can construct a damaging replacement for a common file -- perhaps an installer for a popular application like Firefox or OpenOffice that's usually downloaded from a public server -- then this could open a loophole for viruses, worms, etc. that would slip through the security net often used by cautious people when downloading such programs.

      • Re:In english (Score:3, Insightful)

        by MindStalker ( 22827 )
        Important part is you have to construct both files. So if you were the distributor of said common file and people trusted you, you could produce two versions of said file, with same hash. But producing a identical hash to a file you can't change has currently not been successful.
  • Does this mean that I can take a given binary file, like kde.rpm, and change it so that its' contents are different, and harmful, yet the MD5 hash is the same? This has never been done before? Sounds like an interesting 4th-year engineering project to me. I'd think it'd be really easy to mess up a file and keep the same MD5 hash (like screwing up an avi file shared ona p2p network). But to actually change the file so that it behaves a certain way, wow.
  • by overbyj ( 696078 ) on Tuesday December 07, 2004 @04:15PM (#11023722)
    By examining the MD5 hash using a sophisticated Fourier schema followed by deconvolution with a bit binary-inequal collision analysis, it is quite obvious I have no freaking clue what this stuff is about.

    I am glad somebody does.

  • Let's face it (Score:3, Insightful)

    by mordors9 ( 665662 ) on Tuesday December 07, 2004 @04:15PM (#11023724)
    Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with. Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files. It will come back to "do you trust the source of the file".
    • Re:Let's face it (Score:3, Interesting)

      by jd ( 1658 )
      Let's say you break a file into blocks, encrypt those blocks with Rijndael or Serpent using a chaining method that authenticates the prior block, digitally sign the result using (seperately generated) RSA, DSA and ECC signatures in turn, and generate SHA-1 and Whirlpool checksums of both the encrypted and unencrypted file.

      True, you'd spend longer validating and decrypting the unholy mess generated than you'd spend downloading it, but I think you'd be fairly safe in assuming that the file was what it claim

      • by j.leidner ( 642936 ) <leidner@NoSpaM.acm.org> on Wednesday December 08, 2004 @02:45AM (#11029878) Homepage Journal
        Let's say you break a file into blocks, encrypt those blocks with Rijndael or Serpent using a chaining method that authenticates the prior block, digitally sign the result using (seperately generated) RSA, DSA and ECC signatures in turn, and generate SHA-1 and Whirlpool checksums of both the encrypted and unencrypted file. True, you'd spend longer validating and decrypting the unholy mess generated than you'd spend downloading it, but I think you'd be fairly safe in assuming that the file was what it claimed to be.

        Maybe, maybe not. The new technique would certainly be more difficult to analyse mathematically, but just stacking complicated but flawed methods does not necessarily result in a more secure method: typically, the security of the weakest link determines the security of the whole system.

        What you say reminds me of Don Knuth's experience when he wrote his first innocent 'super' pseudo random number generator (reported in his Art of Computer Programming, Volume 2, page 4: "Algorithm K" ;-): he composed all sorts of complicated operations, but had to learn the resulting number sequence was far from more (pseudo-)random, in fact much worse than the the standard 1-line modulo function.

        Another case of (false sense of) security through obscurity?

        --
        Try Nuggets [mynuggets.net], our SMS search engine which uses question answering technology; now available across the UK.

  • md5 vs sha1 vs ? (Score:5, Informative)

    by alexandre ( 53 ) on Tuesday December 07, 2004 @04:17PM (#11023763) Journal
    http://en.wikipedia.org/wiki/Md5

    here is a very good link about the algo... :)
  • by freeze128 ( 544774 ) on Tuesday December 07, 2004 @04:18PM (#11023773)
    If your cursor finds a menu item followed by a dash,
    And your double-clicking icon puts your window in the trash,
    And your data is corrupted 'cause the index doesn't hash,
    Then your situation's hopeless and your system's gonna crash!
  • by Sheetrock ( 152993 ) on Tuesday December 07, 2004 @04:19PM (#11023799) Homepage Journal
    If I'm translating this properly, a malicious person can do two things with this knowledge:

    He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).

    Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

    Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.

    • Many passwords are MD5 hashes. You may be able to use this to attack password files, by finding some string that hashes to the same password entry, in a reasonable length of time.


      It's not much of a threat to signature schemes, but there are enough places where MD5 is used as a one-way encryption of an access token that this attack poses a potential security hazard.

      • A lot of routing protocols use MD5 as the password hash for authentication. If someone found a collision for some of the important internet router BGP peers then large chunks of the internet could get routed the wrong way or just small parts like the subnet your bank controls could get routed to a some black hat's basement in Romania where he could setup a mock intranet and webserver to steal customer info through a spoofing attack.
        • Very true, or it could be used to poison the BGP routing tables. The problem with such an attack is that, once one router believes in the poisoned routes, the poison will spread very rapidly.
      • by kiltedtaco ( 213773 ) on Tuesday December 07, 2004 @05:07PM (#11024530) Homepage
        The Wang et al attack does not apply to passwords. Their attack applied to situations where the md5 input plaintext was known. Collisions are nowhere near common enough when using less than 16 character inputs to md5 to provide a feasible means of cracking passwords. Nobody has ever found a collision with under 128 bits of input, and the attacks in the article take considerably more than that.
      • Not really... the article is talking about relatively huge files which when you swap out blocks in order to result in the same MD5. Passwords on the other hand are much shorter than the MD5 in the first place and no two strings shorter than the MD5 length (128 bits) will give you the same MD5. The MD5 space is many orders of magnitude larger than the typical password space - even a 16 character password that uses mixed case, digits and punctuation (basically all of base64), that contains about 96 bits of
    • Couldn't a (newer) P2P app just use two different algorithms to do the checksum? Assuming both use very different schemas, even if one is compromised (like MD5), the other would not be? Not sure how computationally expensive the checksums are, but if this kind of sabotage becomes commonplace, it would make for an easy fix.
      • I was thinking the same thing.

        See the long list of cryptographic hash functions [wikipedia.org] on Wikipedia. (Bottom of page.)

        If you used three or four different hash functions, to produce a long combined message digest (concatenated from the several message digests), it would seem to be much more difficult to produce files that collide. An attack resulting in a collision on one algorithm, is not likely to work for another algorithm. Finding two files that simultaneously fool two or more algorithms seems a much m
    • by chialea ( 8009 ) <chialea@gmail. c o m> on Tuesday December 07, 2004 @04:41PM (#11024117) Homepage
      When you're dealing with cryptography, it should be very, very, very hard to find collisions. If you find enough of them, you can proabably find something bad with the same hash value. For example, if you sign a digital document that says you're going to pay me $1 for my pencil, and I find a suitable hash collision, I could make it look like you signed a promise to pay me $3,000 for some used tissue. I wouldn't rule out that someone could find a harmful collision for a program distributed online, and substitute a trojan. If the prize gives enough reward, people will throw a lot of computational power at it, and will likely hit pay dirt.

      Secondly, this is quite a signifigant break. Once a hash function has had an attack like this discovered, it often becomes completely useless not long down the road. I work in cryptography, and the people I know have written off MD5. Heck, the people I know are also quite worried about SHA-1, and the current best attack against that one isn't nearly as strong.

      The upshot of this is that this hash function should NOT be considered secure any more. For now, if you are not protecting anything of high value, you're probably fine. Tomorrow? Possbily. But soon, you're not going to be protected at all, and so you should start worrying about that now, instead of when you're already in trouble.

      Lea
    • Dirt (Score:3, Insightful)

      by mfh ( 56 )
      I've always called this dirt. Some folks call it padding, but it's essentially the same. However up until now, no one has known how to do it with md5 and come up with the same sum. This is nothing more than a hack to me.

      The bottom line is that if you care enough about security this will only be a problem for a day or so when md5 is finally solved completely and the hashes don't take five days to brute force -- they could be solved in a matter of microseconds. As soon as that happens, the next hash system g
    • any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints").

      True, it's impossible to have a hash without collisions. However it should be difficult (in a crypto computational complexity sense) to find such collisions. Apparently MD5 has a vulnerability where it is possible to compute in reasonable time sets of bits that can b
    • Let's say I have a system that downloads files over the internet. To prevent man-in-the-middle attacks, I digitally sign the files. This prevents me from having to vet all of the code that deals with local files for buffer overflows. I implement the digital signatures by simply encrypting a hash of the file with an RSA private key when I create the file, and decrypting and verifying the hash on the receiving side.

      If I had decided to use MD5 for the hash in the digital signature, my scheme is now vulnerable
    • You're wrong. (Score:5, Informative)

      by Piquan ( 49943 ) on Tuesday December 07, 2004 @05:04PM (#11024481)

      He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).

      Suppose you're storing passwords as encrypted hashes, so that intercepting the hashes doesn't tell you what the password is. But if you can generate a password to match that MD5...

      You know that GPG keys are identified and signed by their MD5 hashes? Suppose that I can generate a GPG key that would be identified as yours, and distributed it.

      Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

      There's a coin-flipping protocol that goes as follows. Suppose that Alice and Bob want to flip a coin (over the Internet), but they don't trust each other.

      1. Alice generates a file with random data.
      2. Alice sends Bob the MD5 hash of the file.
      3. Bob picks a bit in the file, and whether he thinks it's a 0 or a 1.
      4. Bob wins if and only if he picked right.
      5. To verify, Alice sends Bob the file she generated at the beginning.

      Now, suppose that Alice generated multiple files in step 1. When Bob makes his guess, she tries to pick a file that will make her win. If she generated only two files, completely randomly, this would let Alice win 75% of the time.

      These are just the first ideas I thought of. If I were looking for other problems, I'd think about undeniable signatures [rsasecurity.com], keysigning (which as GPG and X.509 SSL are heavily based on) and other specialized signature systems. In particular, I expect that the first type of crack could cause issues with SSH keys, both user keys (used for authentication) and host keys (to prevent man-in-the-middle attacks).

      Digital signatures are used for much more than just testing for file tampering.

      • Re:You're wrong. (Score:3, Informative)

        by Gemini ( 32631 )
        You know that GPG keys are identified and signed by their MD5 hashes?

        They're not. The old PGP 2.6 keys are, but GnuPG generates OpenPGP keys that use SHA-1.

        GnuPG will use an already generated PGP 2.6 key, but will not make more of them.
  • Not just MD5 (Score:3, Interesting)

    by PureFiction ( 10256 ) on Tuesday December 07, 2004 @04:20PM (#11023806)
    Other weaknesses were reported in various other secure digests, including MD4, RIPEMD, HAVAL-128, SHA-0.

    SHA-256 is a good replacement. SHA-1 should be fine but if you are going through the trouble of an upgrade, why not make it sufficiently future proof?
    • Problem with SHA-256 (and SHA-512) is that they're not considered sufficiently verified. The modification used to add the extra bits may or may not be sound. It's probably OK, but there's no assurance of that.

      Whirlpool (another 256-bit hash) is another excellent hash, but again isn't considered adequately tested to be considered "safe".

      Of course, you could opt for an unconventional hash. There are some algorithms based on cellular automata. Although some have known weaknesses, it would seem that this wo

    • Re:Not just MD5 (Score:3, Informative)

      by chialea ( 8009 )
      SHA-1 has an attack that's somewhat troubling. I'd look to next year's crypto and eurocrypt conferences to see starts shaking out as the new standard.

      Still... I would switch out MD5 if you have a target that's worth pretty much anything at all. After a break like this, I'd expect MD5 to become basically useless pretty fast. Of course, I don't work in hash collisions, I work mostly in protocols...

      Lea
  • by Tackhead ( 54550 ) on Tuesday December 07, 2004 @04:21PM (#11023826)
    If I had a million terabytes of storage, y'know what I'd do?

    Two files with the same MD5 hash at once. Aaw yeah.

  • by Effugas ( 2378 ) * on Tuesday December 07, 2004 @04:22PM (#11023859) Homepage
    [This is the author]

    I've been doing some analysis on MD5 collision announced by Wang et al. Short version: Yes, Virginia, there is no such thing as a safe hash collision -- at least in a function that's specified to be cryptographically secure. The full details may be acquired at the following link:

    http://www.doxpara.com/md5_someday.pdf

    A tool, Stripwire, has been assembled to demonstrate some of the attacks described in the paper. It may be acquired at the following address:

    http://www.doxpara.com/stripwire-1.1.tar.gz

    Incidentally, the expectations management is by no means accidental -- the paper's titled "MD5 To Be Considered Harmful Someday" for a reason. Some people have said there's no applied implications to Joux and Wang's research. They're wrong; arbitrary payloads can be successfully integrated into a hash collision. But the attacks are not wildly practical, and in most cases exposure remains thankfully limited, for now. But the risks are real enough that responsible engineers should take note: This is not merely an academic threat, systems designed with MD5 now need to take far more care than they would if they were employing an unbroken hashing algorithm, and the problems are only going to get worse.

    Some highlights from the paper:

    * The attack itself is pretty limited -- essentially, we can create "doppelganger" blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum.

    * MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. This leads to...

    * Attacks are possible using only the proof of concept test vectors released by Wang -- the actual attack is not necessary.

    * Stripwire emits two binary packages. They both contain an arbitrary payload, but the payload is encrypted with AES. Only one of the packages ("Fire") is decryptable and thus dangerous; the other ("Ice") shields its data behind AES. Both files share the same MD5 hash.

    * Digital Signature systems are vulnerable, as they almost always sign a hashed representation of data rather than the data itself.

    * This is an excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature. Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build. At some later point, the embedded payload could be safely "activated", without the MD5 changing. This has implications for Tripwire, DRM, and several package management architectures.

    * HMAC's invulnerability has been slightly overstated. It's definitely possible, given the key, to create two datasets with the same HMAC. Attacker possession of the key violates MAC presumptions, so the impact of this is particularly questionable.

    * Very interesting possibilities open up once the full attack is made available -- among other things, we can create self-decrypting executables (fire.exe and ice.exe) that exhibit differential behavior based on their internal colliding payloads. They'll still have the same MD5 hash.

    * Several doppelgangers may (relatively quickly, as per Joux) be computed within a single multicollision-friendly block. As such, the particular selection of doppelganger sets within a file can itself be made to represent data. It's relatively straightforward to embed a 128 bit signature inside an arbitrary file, in such a way that no matter the value of the signature, a constant MD5 hash is maintained. This is curiously steganographic.

    * Many popular P2P networks (and innumerable distributed content databases) use MD5 hashes as both a reliable search handle and a mechanism to ensure file integrity. This makes them blind to any sign
    • It seems to me that at least the malicious nature of this vulnerability can be limited by using the size of the file as an additional check besides the MD5 hash. In other words, if you know how large the file is supposed to be in bytes, then there is less likelihood that something malicious can be passed off as the original - even if it has the same MD5.
      Is that right? I'm no expert by any means, but could this reduce the potential for a real attack to pretty much nill?
      • File size remains constant between Fire and Ice. Good thinking, though.

        The solution is to not use MD5.
        • The file size does indeed remain constant, but the attacker is constrained as to what he can substitute out of the original file while maintaining both the file size and the MD5 hash. Without the file size remaining constant too, the attacker can theoretically make the original file into a trojan. With the file size remaining constant, most likely the only thing an attacker can do is break the file - he doesn't have enough leeway to put something malicious in there. I think.
    • I know a fair amount about this stuff, but obviously not as much as you...

      What do you think are the most viable alternatives? It seems to me that SHA-1 would suffer similar vulnerabilities. Does SHA-1 suffer from the appendable cascade issue?

      Do you think there is any way to avoid this kind of problem with hashes? I'm not really aware of any alternate techniques that wouldn't suffer from this same kind of attack eventually. Sure, you could develop related algorithms that increase the hash size, but the
      • by Effugas ( 2378 ) *
        SHA-1 also uses a Merkle-Damgard construction, so yes, if we find a SHA-1 collision, it'll be appendable.

        SHA-1 has a much stronger design. It's starting to show cracks, though, so I don't recommend anything. Something based on AES will come, though -- maybe AES-OMAC, maybe Whirlpool. At the core of almost every hashing algorithm is just a block cipher anyway...

        --Dan
  • Was there a cash money prize for someone who could produce a pair of strings with the same MD5 sum? I remember reading or possibly hearing that No One had Ever discovered a hash collision with MD5. Is this the first ever hash collision found?

    --grendel drago
  • by l4m3z0r ( 799504 ) <kevin AT uberstyle DOT net> on Tuesday December 07, 2004 @04:27PM (#11023908)
    Really no big surprise, all currently implemented algorithms for security and digital signatures will be rendered useless or easily hackable in the future. The fact that a compromise seems to be soon is the part thats interesting.. however the parent fails to address how serious it is.

    For the most part, big deal, all of current security procedures will be harmful ie compromised in the future...

  • by NZheretic ( 23872 ) on Tuesday December 07, 2004 @04:27PM (#11023910) Homepage Journal
    This kind of attack can be mitigated into non-existance by just using two dissimilar hash algorithms.

    Using MD5 with SHA1, or even the older MD2 or MD4 will reduce the probability of creating a compatable binary with the same checksum to virtually zero.

    If only one checksum is required then just XOR the resulting checksums from each algorithm.

  • Can someone explain to me how this is something more complicated than the birthday paradox?

    (heh, Wang)
  • I've made the switch to SHA1 for all my website work where I'm storing passwords and session IDs and the like. It's just a simple change from 32 to 40 characters, and a search/replace for:
    md5_hex to sha1_hex
    • It's also safe to truncate 40 byte SHA-1 down to 32 byte if need be.
    • I was thinking about this and I wonder how insecure MD5 actually is when you are feeding it pseudo-random data (to make a session ID) or to store a user's password. I don't believe those kinds of things are vulnerable to any known form of attack other than brute force.

      Better safe than sorry though I guess... SHA-1 is just fine for me.
  • by Anonymous Coward
    I was a bit suprised recently when I tried to
    sort 15,000 jpg image files to remove duplicates.
    Since the names varied, and it is not uncommon
    to have many images with the same length, it
    seemed like a good idea to use md5 hashes.
    So I coded it up in python to do the md5 hash,
    and then stuffed the file name into a table
    keyed by the md5 hash. Big suprise when multiple
    different files landed in the same hash.
    Some property of jpgs probably reduces the
    randomness of the files and increases the
    probability of hash collis
  • Testing the equality of hash codes is not a replacement for testing equality. It should be used as an optimization to limit the set of items for which equals need be applied. e.g.

    Thing search(Thing thingToFind) {
    thingList = hashSearch(hash(thingToFind));
    for each thing in thingList
    if (thing == myThing) // IMPORTANT!
    return thing;
    return null;
    }
  • Think of the children! We cannot allow some MD5 to put our children in the harm's way! Initiate operation AntiMD5, deploy the bombers.

  • Magnet links [sourceforge.net] is an open standard (specification draft [sourceforge.net]) and an alternative to primarly eDonkey2000, eMule, and Overnet hashes that are based on a precursor to the MD5 algorithm, and I doubt very safe anymore?

    I personally think any clients that don't support magnet links should really start consider adopting those -- DC++, most Gnutella clients, and Shareaza already do.

    A magnet link can look like something like this:

    magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO 5C

    ... see "sha1" above -- the stan

  • Like, say, Windows?

    Yeah, yeah, Obligatory Microsoft Slam. But I don't see any great stampede to isolate and neutralize the threat that this Typhoid Mary of OS's presents to communication stability. Why are we preparing for a threat that barely exists, when we refuse to deal with the present, much greater threat?
  • by davidwr ( 791652 ) on Tuesday December 07, 2004 @04:48PM (#11024217) Homepage Journal
    I haven't had time to think this out, so I'm throwing it out for you guys:

    Is doing a 2nd pass helpful?

    In other words, if
    ABCfireDEF
    and
    ABCiceeDEF
    hash the same, does
    ABCfireDEFABCfireDEF
    necessarily, or even frequently, hash to
    ABCiceeABCiceeDEF?

    Even if it were to provide protection, in practical terms,
    1) other hashing althorithms are likely faster than "MD5 times two"
    2) there may be some - hopefully a lot fewer - places in a long file where where
    ABCfireDEFABCfireDEF
    can be replaced with
    ABCiceeDEFABCmeltDEF

    I'm beginning to like the "pick two algorithms and call me in the morning" approach.
  • really. and it's about time this stupid meme died. it's all too easy to toss it about for dramatic effect when attention whoring.
  • by syousef ( 465911 ) on Tuesday December 07, 2004 @05:40PM (#11025057) Journal
    Be honest. How many of you have checked the MD5 sums on a file with a TRUSTED source, as opposed to from the same source you got the file? How many of you do this regularly?

    THAT is the biggest problem with MD5 for most users.
  • MD6 (Score:3, Funny)

    by Danathar ( 267989 ) on Tuesday December 07, 2004 @09:45PM (#11027851) Journal
    That's OK..just invent MD6!

This is clearly another case of too many mad scientists, and not enough hunchbacks.

Working...