Encryption Security AMD Software

RC4 Code Achieves 319 MB/s On AMD64 Opteron

Marc Bevand writes "This recent paper is about optimizing RC4 for AMD64 processors. A working implementation is provided. Its encryption/decryption throughput reaches 319 MB/s on a single AMD Opteron 244 processor running at 1.8 GHz. This makes it, as of today, the world's fastest RC4 symmetric cipher implementation for general purpose CPUs. As the author of this work, I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas." An anonymous reader adds some figures for the old implementation: "Opteron 244 1.8 GHz (32-bit) 163 MB/s; Opteron 244 1.8 GHz (64-bit) 135 MB/s."
This discussion has been archived. No new comments can be posted.

  • by datajack ( 17285 ) on Tuesday November 02, 2004 @06:27AM (#10698584)
    I was initially disappointed with the performance of my Athlon64. CPU intensive 64bit code often seemed much slower than its (heavily optimised) 32bit counterpart.

    Every now & then I come across some code optimised for 64bit processors, and it just flies - as more & more stuff gets the treatment, it will be like upgrading for free :)
    • by Savage-Rabbit ( 308260 ) on Tuesday November 02, 2004 @06:50AM (#10698658)
      it will be like upgrading for free :)

      Just don't get too excited. One of my coworkers made this same discovery a while back. Now he runs around the office wearing an "I love Opteron" T-Shirt and starts shouting "Intel is history - Power PC is dead!" every time somebody mentions the words Opteron or AMD in a sentence. Worst of all he attacks anybody who disagrees and tries to bite them. We tried to knock him out with a dart gun after he savaged a visiting IBM sales rep but even heavy duty veterinary tranquillisers don't seem to have any effect.

      :-D
  • until (Score:4, Insightful)

    by iamnotacrook ( 816556 ) on Tuesday November 02, 2004 @06:27AM (#10698586)
    amd decides to provide a compiler for its chip, optimization will always be behind intel (who do. for linux also).
    • Re:until (Score:4, Insightful)

      by isometrick ( 817436 ) on Tuesday November 02, 2004 @06:38AM (#10698620)
      I agree, to an extent. It's been said [coyotegulch.com] that Intel's compiler can outdo GCC in some performance benchmarks.

      GCC is no slouch though, and obviously Intel is performing some tricks that could also be implemented by GCC.

      I think it'd be a great move for AMD to work WITH GNU to optimize 64-bit AMD code from GCC.

      Seems like Intel is more prone to keeping secrets when it comes to processors. Maybe this is (yet another) way for AMD to give them a run for their money.
    • ICC really sucks for compatibility ....

      For example, my code performs 5 times faster when compiled with gcc than when compiled with ICC ...

      Ok, maybe I'm a special case (I use computed GOTO). But you can't compile the kernel either :)
      • Actually, you can compile the kernel with ICC, but it requires patching and a program between make and ICC. I read how to do it in a magazine somewhere, don't have it on hand right now.
    • Re:until (Score:3, Interesting)

      by ceeam ( 39911 )
      Dare I say that the fact that Intel produces a kick-ass compiler (for certain tasks anyway) has nothing to do with the fact that the same company produces CPUs. BTW - currently (AFAIK) the compiler is developed in Russia whereas chip design is done at "traditional" sites.
      PS: Oh, of course, Intel compiler won't ever support 3dnow, but that's the issue with sponsorship. I mean - AMD don't have to design the compiler themselves. They will be equally ok with sponsoring someone who knows how to do that.
  • Somewhat OT, but... (Score:5, Informative)

    by bhtooefr ( 649901 ) <bhtooefr@bhtoo[ ].org ['efr' in gap]> on Tuesday November 02, 2004 @06:34AM (#10698605) Homepage Journal
    If all a machine is doing is encrypting, A64s and Opterons are a bit overkill. The VIA C3 C5P has an encryption engine that makes top-of-the-line processors look sad. I couldn't find results for RC4, but here is a page from a review of the EPIA MII-12000 which shows AES results. First graph is EPIAs in software, second is a few Intel and AMD CPUs (software), and the MII-12000 in software (which gets creamed by the AXP 2500+ and the P4@2.4) and hardware (which totally obliterates everything). [mini-itx.com]
    • by MrNemesis ( 587188 ) on Tuesday November 02, 2004 @07:01AM (#10698695) Homepage Journal
      AFAIK, the VIA's *only* do AES, as they're designed to make good VPN endpoints. This is cos some hefty AES subroutines are built into the hardware (with software drivers doing the rest).

      So whilst this is all very handy, if you want encryption other than AES (which, if there were ever any significant flaws found in AES' maths, is a certainty) you'd want to dump those VIA boards and get yourself either a dedicated encryption device like an Encipher box (like an expensive version of the VIA) or just a beast of a machine to do encryption entirely in software (like an Opteron).

      I personally shunt everything through DSA stunnels, so a VIA isn't much use to me.
      • by mczak ( 575986 ) on Tuesday November 02, 2004 @07:28AM (#10698787)
        AFAIK, the VIA's *only* do AES, as they're designed to make good VPN endpoints. This is cos some hefty AES subroutines are built into the hardware (with software drivers doing the rest).
        True. VIA padlock (as they call it) can currently only do AES in hardware (and it can also generate true random numbers). The next VIA chip called C7 (C5J Esther) however should be able to also do SHA-1, SHA-256 and parts of RSA in hardware (I think it should be available first half of 2005). That's of course still a limited set of encryption algorithms, but it's certainly an improvement.
        • Wow, I didn't know that. Nice to see that VIA is taking the encryption market seriously, especially in the Linux arena (IIRC they opened the specs to their encryption engine, right? There's definitely support for it available in the kernel via a patch to the Crypto API). As you say, it's not full blown RSA, DSA and MD5 in hardware, but it's a start.

          Now if only they'd be as nice with the damned CLE266 graphics drivers...
        • I should say that from an investment standpoint, having 2-3 different algorithms in hardware would mitigate the obsolescence issue with discovered weaknesses.

          Excuse my ignorance here, but are these chips on an expansion card or can you find motherboards with them?

          • You can get those via cpus as "normal" cpus, (via C3), they fit into socket 370 boards. Standalone via cpus are not very popular, but the VIA mini-itx boards which have them soldered directly to the board surely are (via calls them "Eden" cpus, but it's just the same cpu in a different package). Not all VIA C3 (or Eden) cpus have padlock, only C5P "Nehemiah" have - via did not change the "public" name for newer cpu cores, and it's possible you can still get older ones. Good for small, quiet, cheap home-grow
      • if there were ever any significant flaws found in AES' maths

        I would just like to object to hearing this all the time. Sure, it's POSSIBLE that AES will be found vulnerable, but quite unlikely. The government agency that selected and approved of AES are the same ones who approved of DES, oh so many years ago. I think that alone means it deserves the benefit of the doubt.

        Of course, it's still POSSIBLE, but hearing the same questions about it repeated so often, gives the wrong impression.

        • by swillden ( 191260 ) *

          The government agency that selected and approved of AES are the same ones who approved of DES, oh so many years ago.

          And the same ones who were apparently surprised when flaws were found in SHA-1, which they also selected and approved. And the same ones who developed the Law Enforcement Access Field (LEAF) for Clipper, which was quickly broken by Matt Blaze.

          Thirty years ago when the NSA fixed IBM's Lucifer, which became DES, the NSA clearly had a huge amount of cryptologic knowledge that the public res

          • the NSA clearly had a huge amount of cryptologic knowledge that the public research community did not

            Yet you think the NSA forgot much of that?

            DES has never been broken, therefore they know enough to thwart even the most advanced researchers today.

            But you're convinced, this time around, they don't know enough to do that again?

            SHA-1 and LEAF are a completely different subject, really. If you want to talk about Clipper, talk about Skipjack, which hasn't been found vulnerable yet.

            it's entirely feasible fo

            • Yet you think the NSA forgot much of that?

              Nope. I think the public cryptologists caught up (or close to it).

              DES has never been broken, therefore they know enough to thwart even the most advanced researchers today.

              And what about tomorrow?

              Even if you can break all but the last round, it's still every bit as secure. Even if you can break all but one round, does not mean it's possible to extend the same or similar method to break that last round. Skipjack is again a good example, is it has just enou

      • I personally shunt everything through DSA stunnels

        You encrypt your data with the Digital Signature Algorithm? Good trick, that. Gotta be horribly slow, though.

        Actually, you don't do this. You use DSA to validate DH public keys, use DH to establish a shared secret and use something like RC4 or some block cipher to actually do the bulk encryption. Or maybe you use RSA instead of DSA/DH, or maybe even El Gamal, but you definitely don't use DSA for bulk encryption.

        It's actually quite likely that you
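The division of labour described in this comment (a public-key exchange to agree on a secret, then a fast symmetric cipher for the bulk data) can be sketched in a few lines of Python. This is a toy, not what stunnel or SSH actually run: the prime `P`, the generator `G` and the helper names are illustrative assumptions, the exchange is unauthenticated (the DSA signature step is left out), and the parameters are far too small for real use.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters, chosen only for illustration.
P = 2**127 - 1   # a Mersenne prime; real deployments use much larger groups
G = 3            # hypothetical generator choice for this sketch


def dh_keypair():
    """Return (private, public) for one side of the exchange."""
    private = secrets.randbelow(P - 3) + 2      # 2 <= private <= P - 2
    public = pow(G, private, P)
    return private, public


def dh_shared_key(my_private, their_public):
    """Derive a 16-byte symmetric key from the shared DH secret."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()[:16]


alice_priv, alice_pub = dh_keypair()
bob_priv, bob_pub = dh_keypair()

# Each side combines its own private value with the other's public value.
alice_key = dh_shared_key(alice_priv, bob_pub)
bob_key = dh_shared_key(bob_priv, alice_pub)
```

Both sides end up with the same 16-byte key, which would then be handed to the bulk cipher (RC4 or a block cipher). Only that last stage is what a tuned RC4 implementation speeds up; the `pow` calls stand in for the public-key work done at session startup.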

        • Hehe, well done for pointing out my rather flaky knowledge of crypto and TLA's. I stand corrected, and thanks for explaining how it actually works!

          What I should have said was; everything gets thrown through SSH tunnels and I'd love to see an acceleration of whatever it is that SSH uses, as well as acceleration for creating those huge RSA/DSA keys we use all the time, which are slow to generate even on a dual Athlon 2000. And maybe better use of those RNG's that some of the VIA and AMD chipsets use.

          I have
          • everything gets thrown through SSH tunnels and I'd love to see an acceleration of whatever it is that SSH uses, as well as acceleration for creating those huge RSA/DSA keys we use all the time

            Well, get an Opteron, install the tuned RC4 implementation and configure stunnel to prefer RC4 and you'll have no problems with throughput. The tuned RC4 won't speed up session startup because that's all public-key stuff. Large integer math libs could really benefit from tuning on 64-bit registers, though.

            As far

  • by cheezemonkhai ( 638797 ) on Tuesday November 02, 2004 @06:35AM (#10698608) Homepage
    Don't get me wrong it's good that code is optimised, but I think that RC4 would fly faster on an IA64 than an opteron if specifically optimised to take advantage of the CPU's features.

    RC4 isn't really that relevant in real life as wep is crap & also easily done in hardware anyway.

    The 64 bit advantage will suffer the same fate as the 32bit advantage did for the 486, pentium & especially the Pentium Pro.

    486 = 32bits, faster but people still bought 386's due to cost.

    Pentium = 32bits, sometimes faster but again costs meant 486's stayed popular.

    Pentium Pro = 32bit, 16 bit instructions stalled it. When running pure 32bit code ran like the dogs, when running 16bit code (win 98) ran like a dog.

    Problem is that you're generally better off saving your cash, buying a cheap CPU (32bit in this case) and waiting for the 2nd/3rd Generation CPU. By that time prices will be more reasonable and you will see the full advantages as programs will use the extra bits properly.

    I mean come on MS still hasn't released a final AMD64 version of Winblows yet.

    • I think that RC4 would fly faster on an IA64 than an opteron

      So this code should run directly on a Pentium IV with EM64T. Anybody tried it, yet? How about trying it with the Intel C compiler. Most benchmarks use the Intel compiler, even on AMD CPUs because it's so much better than GCC.

      I don't buy the argument that it's the extra registers, because there have been over 56 registers available for register renaming since the early-mid 90's.

      • I don't buy the argument that it's the extra registers, because there have been over 56 registers available for register renaming since the early-mid 90's.

        I'm no expert, however, from what I understand from the bit of reading I've done and the bit of assembler I've done, it isn't the number of registers on the chip, it is the number of registers available to the user of the chip.

        For example, on the classic 32 bit X86, there are only four general purpose registers - EAX, EBX, ECX and EDX. If you want to

        • Regular x86 has 8 GP registers, AMD64 has 16.
          • Regular x86 has 8 GP registers, AMD64 has 16.

            Not if you want to actually use the stack pointer and your stack-frame base pointer; you have 4 GP regs (EAX .. EDX), two kinda-specific-purpose regs (ESI, EDI), one crippled-kinda-general-purpose-pointer reg (EBP) and one specific-purpose register (ESP).

            AND, if you want to do multiplications and divisions (the worst offenders, IMO), then two of the GP registers are already spoken for (EAX, EDX).

            So actually, the grandparent poster was right.

            -gus
            • Don't forget that many if not most x86 instructions require that source and/or destination registers be fixed, so your data is always in AX or always put in DX, et cetera, and mul and div aren't the only cases of this. Plus, you MUST use the source and destination registers for copies... So arguably, none of the registers are truly general purpose :P
        • If there are not enough user-visible registers, memory operands are used instead. However, these probably fit in the L1 cache anyway, and L1 cache is very fast (IIRC it is just one clock cycle on some CPUs). Also, the CPU can see that a later load depends on a preceding store, and the load can get the result from the preceding store directly. IIRC current processors already do this. So more user-visible general-purpose registers just reduce the number of load operations for the CPU to process, and sinc
    • by joib ( 70841 ) on Tuesday November 02, 2004 @06:47AM (#10698647)

      486 = 32bits, faster but people still bought 386's due to cost.


      The 386 was also a 32-bit processor...
    • The Athlon 64 starts at $141 on NewEgg

      See for yourself [newegg.com]

      And BTW, the 386 was 32 bit.

    • by DigitumDei ( 578031 ) on Tuesday November 02, 2004 @07:40AM (#10698826) Homepage Journal

      I just bought a new PC, and when compared to all the available options, the AMD64 option (I got an AMD64 2800+) was best. Slightly more expensive than the equivalent XP, cheaper than the p4. And they run so cool, it's the first PC I've had in years where I don't have to worry about the temperature. When I bought an XP 2600+ last year, I spent almost half the chip's price again on cooling.

      Just because I'm running a 32bit win XP on it doesn't make it a bad purchase.

      Also, I'm one of those people who bought a 386 instead of a 486 (then later a 486 instead of a pentium 1) because of the price difference. The price difference nowadays is nowhere near comparable to what it was then.

    • by Bert64 ( 520050 ) <.moc.eeznerif.todhsals. .ta. .treb.> on Tuesday November 02, 2004 @07:53AM (#10698874) Homepage
      Actually, the majority of SSL websites are using RC4..
      If you use Mozilla and Apache, you can use 256-bit AES encryption for SSL (try loading up paypal with a mozilla based browser) but if either the server or client is microsoft-based you're stuck with the much weaker 128bit RC4...
      MS - always behind the curve, no 256bit encryption, no 64bit os
    • The 64 bit advantage will suffer the same fate as the 32bit advantage did for the 486, pentium & especially the Pentium Pro.

      What fate would that be? [intel.com] In 1985 I could see buying into a 286 simply because there was really no support for 32bit protected mode let alone expanded memory. Hell extended memory was barely supported. Even in 1990 I could see buying into a 286 if it would save you money. Dos 4.0 was a bug ridden piece of filth and there still was not a lot of support for 32bit protected mode.
      • Motorola, on the other hand, designed their 68000 to be a 32bit chip from the get go; I believe it was first introduced in 1979 or so, at least according to my data book titled "break away from the past". Makes you wonder why anyone thought it was a good idea to use the 8086 for the PC.

        I've spoken to some of the people that made this decision for various companies (e.g. Raytheon). The general consensus was that the difference between the 68K and the x86 was "night and day", but that the Intel

    • I don't quite get your point.

      64 bit systems aren't exactly new, they've been around for ages and the apps have been there as well. The fairly cheap (at the time) DEC alpha series & (way cheaper) assorted clones popularized them further.

      I currently run a 64 bit AMD cpu and my system and all my applications are 64 bit. It's quite easy to run a 64 bit system if you want/need one. You can even tweak your system so it runs 32 bit apps in case you have some old stuff lying around.

      Or you can go get a ready
    • The Pentium Pro ran 16 bit code slowly; 32 bit code ran quite well. However at the time Windows still had a lot of 16 bit code, and so did most major apps. The Pentium Pro did not run faster than the much cheaper Pentium processors that were also available at the time.

      The Athlon 64 architecture currently runs many or most 32 bit applications faster than comparable Intel processors, and is competitively priced. The ability to run 64 bit code is more like a bonus. This seems more comparable to the Pentiu
      • The Athlon 64 architecture currently runs many or most 32 bit applications faster than comparable Intel processors, and is competitively priced. The ability to run 64 bit code is more like a bonus. This seems more comparable to the Pentium II, which was an extremely successful CPU architecture.

        AMD was smart (as in business-smart) by providing a very easy upgrade path from 32 to 64 bit CPUs. I now own an Opteron CPU, and it is a very sweet chip that runs regular old 32-bit WinXP very nicely (as well as m
    • "Don't get me wrong it's good that code is optimised, but I think that RC4 would fly faster on an IA64 than an opteron if specifically optimised to take advantage of the CPU's features."

      Opterons are much cheaper than IA-64, and they run 32-bit x86 stuff at full speed. They make porting applications easy because it's still x86. So whether or not the Itanium is faster/better is moot. They are way expensive and way niche.

      "RC4 isn't really that relevant in real life as wep is crap & also easily don
    • I don't know jack about RC4 but unless it uses floating point it's going to be slower on itanic than on opteron. If you can perform RC4 efficiently using floating point mathematics then itanic will probably whip opteron.
  • by Vo0k ( 760020 ) on Tuesday November 02, 2004 @06:45AM (#10698639) Journal
    ...to allow DRM encryption of movies to become standard :)
  • PowerPC G5 (Score:5, Interesting)

    by TiMac ( 621390 ) on Tuesday November 02, 2004 @06:46AM (#10698643)
    Who wants to optimize RC4 for the PowerPC G5 chip (64-bit implementation) and do a bake-off? Hand-coding PPC assembly doesn't sound as fun as this PHP I'm working on at the moment, so someone else will have to tackle that!
    • Re:PowerPC G5 (Score:2, Insightful)

      by fizze ( 610734 )
      I don't know why everyone jumps off the horse as soon as they hear the magic word "assembly".
      Seriously.
      If you want to get 110% out of your hardware, you have to put effort in, to get effort out. Makes sense, doesn't it?

      I'm not saying people who don't like ASM are sissies, not at all. But I'm saying that assembly has its place, just as so many other programming languages.
      • Re:PowerPC G5 (Score:3, Insightful)

        by TiMac ( 621390 )
        Indeed.

        But when other projects beckon that don't require assembler work, I'm not about to jump on one that does for "fun" either ;)

  • once intel really puts some muscle behind the 64 bit desktop i'm sure we'll start to see loads of new apps compiled for the platform... aside from your os and rare app... games could have a lot to benefit from the extra performance and amd's line has been very well received (and is currently embarrassing intel)... it's still nice to know you can do something super fast with your 64's
  • chip names (Score:4, Funny)

    by Pompatus ( 642396 ) on Tuesday November 02, 2004 @06:52AM (#10698667) Journal
    I'm holding out on the 64 bit systems until amd starts naming the chips commodore.
  • well... (Score:4, Insightful)

    by mx.2000 ( 788662 ) <mx.2000@nOsPAM.gmail.com> on Tuesday November 02, 2004 @06:57AM (#10698685)
    "I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."

    well, maybe in some areas.
    Since this is a cipher, it obviously helps a lot when you can work on 64-Bit chunks of data instead of 32-Bit.

    The same speedup can probably be seen with applications that use numbers larger than 32b (or 64b for floats), since the number of operations necessary will essentially halve.

    But other than that, I don't see much room for huge speedups.
    • See my earlier post [slashdot.org] as to why.

      • But, since the output is stored in one of the source operands, couldn't you do something like:
        MUL eax, ebx
        MUL eax, ecx
        MUL eax, edx
        leaving the solution stored in eax?
        • AFAIK MUL takes two 32-bit values and produces a 64-bit value, which is stored in edx/eax. Therefore you have to do the edx multiplication first. Other than that, this sequence should work.
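The ordering point made in this reply can be checked with a tiny register model. This is a sketch under assumptions: the `regs` dictionary and the `mul` helper are hypothetical stand-ins for the CPU, and real 32-bit x86 `mul` takes a single explicit operand with `eax` implied (the two-operand spelling in the parent post is pseudocode).

```python
MASK32 = 0xFFFFFFFF


def mul(regs, src):
    """Model one-operand MUL: edx:eax = eax * regs[src] (unsigned)."""
    product = regs["eax"] * regs[src]
    regs["eax"] = product & MASK32          # low 32 bits of the product
    regs["edx"] = (product >> 32) & MASK32  # high 32 bits overwrite edx


def run(order):
    """Multiply eax by each named register in turn and return eax."""
    regs = {"eax": 2, "ebx": 3, "ecx": 5, "edx": 7}
    for src in order:
        mul(regs, src)
    return regs["eax"]


edx_last = run(["ebx", "ecx", "edx"])   # edx was clobbered before being used
edx_first = run(["edx", "ebx", "ecx"])  # edx consumed while still intact
```

With these small inputs the first ordering ends with `eax` equal to 0, because `edx` already holds the (zero) high half of an earlier product by the time it is used as a factor, while the second ordering gives 2*3*5*7 = 210.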
        • Heh, you've hit the edge of my assembler knowledge, and I didn't think the example through that well..

          However, the point I was trying to show was that on a processor with additional GP registers, you would be able to add to your example

          mul eax, eex

          If such an "eex" register existed, instead of

          mov <mem location>, ebx
          mul eax, ebx

          In other words, the additional GP registers allow both the number of "mov" instructions, and the delays they cause, to be reduced.

          • This is pushing my knowledge now, but depending on how many cycles it takes, out-of-order execution could be used to load the new values from cache without a significant performance hit.

            Of course, I understand what you are trying to say.
    • The same speedup can probably be seen with applications that use numbers larger than 32b (or 64b for floats), since the number of operations necessary will essentially halve.

      Depends on what you're doing. An add yes, instead of ab + cd you'd have a+c,b+d (plus some overflow flags). ab * cd? a*c + a*d + b*c + b*d (with appropriate magnitudes, of course).

      Still, cryptography is still ideal for going 64-bit. Most other apps won't be significant, it is the added GP registers (which have nothing to do with 64 b
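The add-versus-multiply cost described in this comment can be sketched with 32-bit halves. The helper names (`add64`, `mul64`, `split`) are made up for illustration, and Python integers stand in for the registers: a 64-bit add needs two adds and a carry, while a full 64x64 multiply needs four partial products.

```python
MASK32 = 0xFFFFFFFF


def split(x):
    """Split a 64-bit value into (high, low) 32-bit halves."""
    return (x >> 32) & MASK32, x & MASK32


def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit add from 32-bit halves: two adds plus a carry."""
    lo = a_lo + b_lo
    carry = lo >> 32
    hi = (a_hi + b_hi + carry) & MASK32     # wraps like a 64-bit register
    return hi, lo & MASK32


def mul64(a_hi, a_lo, b_hi, b_lo):
    """Full 128-bit product built from four 32x32 partial products."""
    return (
        ((a_hi * b_hi) << 64)
        + ((a_hi * b_lo + a_lo * b_hi) << 32)
        + (a_lo * b_lo)
    )


x, y = 0xFFFFFFFF00000001, 0x00000001FFFFFFFF
hi, lo = add64(*split(x), *split(y))
total = (hi << 32) | lo
product = mul64(*split(x), *split(y))
```

On a 64-bit CPU each of these collapses to a single instruction, which is where the "operations halve" estimate holds for adds but undersells the saving for multiplies.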
    • 64-bit operations (addition and multiplication) are much faster on a 64-bit CPU such as Athlon 64, for on a 32-bit CPU they have to be emulated in software using multiple instructions, which is slower than the "hardware-accelerated" way in 64-bit CPUs.
      • If you write using generic code without need for carefully crafted data types then the compiler should compile it as you described. Unfortunately I see code all the time that assumes 32 bit ints and they are a real bugger to port to 64 bit.

        It really takes knowledge and skill to write portable code that makes few assumptions about hardware. Porting for OpenOffice.org 64 bit has been worked on for about 18 months. Hopefully 128bit will not be as hard. See the code for dates that is not Y2K compliant writt
    • Sure, most of the apps we use today might not get a HUGE performance increase from 64-bit x86. However.

      Can you imagine a 16-bit version of Office 2003? Or a media player? Or any of the other pretty heavy apps you run nowadays?

      A 64-bit platform opens new doors for doing things that would require a much faster IA-32 chip to perform. Since we're not going to be seeing the huge GHz increases in clock speed for a while, it's a decent thing to focus on.
  • that's good (Score:2, Interesting)

    by b100dian ( 771163 )
    That's good because it is yet another step toward the day when all information (http, smtp etc.) will travel encrypted (today only some pages are served this way, because of the processor load)
    and because every time we hear good news about AMD we're happy :P

    Everybody'll get TLS'ed
  • by Space_Soldier ( 628825 ) <not4_u@hotmail.com> on Tuesday November 02, 2004 @07:19AM (#10698752)
    I wish that every software company would put optimization first and features second. This way, we would not have to buy computers every few years. They can potentially last much longer.
    • I would tend to disagree with that. While one should weigh the performance against an overabundance of features, overzealous optimization can also result in problems. Remember that Knuth said "Premature optimization is the root of all evil."

      If someone took your idea to the extreme, you might get something like this:
      "What does it do?"
      "Nothing, but look how *fast* it does it?"

      I think the best solution is moderation in both ends.
    • Look at my super-fast text editor! Yes, it doesn't have many features; indeed what it does is to show you the old version, and then lets you type in the new one, so it also misses the features found usually in text editors like only changing part of the text without touching the rest at all (which would save you from typing the whole file again if you just want to change one character), but it can be proven that this is not strictly necessary, but you can do all the same with the interface "show old version
    • Buy a games console then =]
    • I wish that every software company would put optimization first and features second. This way, we would not have to buy computers every few years. They can potentially last much longer.

      Want to explain your logic? It seems to me it'd be a once-off win (as everyone switches to focus on optimization), and then business as usual. Think it through:

      1. You have a computer capable of running the software you have at a level of performance you're happy with.
      2. All your software gets magically optimized, obviously y
    • It really is a free market. If people refuse to pay for features because performance is poor then you will see a change. Currently it is easier to buy a new machine every 2-3 years and people expect that redundancy, I even advise it.

      I think with the lack of upgrades to Windows you are starting to see this effect happening. People are simply sticking to what they have. Microsoft (as an example only) will have to consider performance gains on existing hardware as a marketable thing soon.
  • by cgenman ( 325138 )
    Wow, if RC4 is this much faster, just wait until they get to their Gold Master!

  • AMD really need to look at creating a multi-OS optimised compiler. Or actively support the GNU / gcc so that anyone can compile binaries that are compiled specifically for the AMD-64/Athlon whatever.

    Then all the coders need to do is write the code that can be optimised best. The Intel C compiler does magic on intel processors in linux etc the performance difference is clear.

  • ...crucial code, and assembly language monkeys are still worth having around =) .

    I don't see the big deal here. I'd like to see what this algorithm would do if fully-optimized on the other processors out there, including the 64-bit G5. Maybe even better, use an algorithm that would have more practical value (wasn't RC4 cracked a while back already?) Try cracking MD5 or SHA-1 or something...
  • model name : AMD Opteron(tm) Processor 248
    stepping : 10
    cpu MHz : 2191.201
    cache size : 1024 KB

    rc4speed :
    Doing RC4_set_key for 5 seconds
    3429648 RC4_set_key's in 4.97 seconds
    Doing RC4 on 1024 byte blocks for 5 seconds
    1998887 RC4's of 1024 byte blocks in 4.97 second
    RC4 set_key per sec = 690070.02 ( 1.449uS)
    RC4 bytes per sec = 411843116.30 ( 0.019uS)

    The interesting thing is that the Opteron 248 CPU is faster than just clock cycles (using timothy's code)

    319*(2.2/1.8)=390 < 411
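The arithmetic behind that comparison can be reproduced from the numbers in this post. One caveat is an assumption on my part: the story does not say whether its "MB" means 10**6 or 2**20 bytes, so the sketch prints the measured rate both ways.

```python
PAPER_MB_S = 319.0          # figure quoted in the story (1.8 GHz Opteron)
PAPER_GHZ = 1.8
THIS_GHZ = 2.2              # Opteron 248 (reported above as 2191.201 MHz)

# From the rc4speed output above: blocks processed, block size, elapsed time.
BLOCKS, BLOCK_BYTES, SECONDS = 1998887, 1024, 4.97

scaled_mb_s = PAPER_MB_S * (THIS_GHZ / PAPER_GHZ)
measured_bytes_s = BLOCKS * BLOCK_BYTES / SECONDS
measured_mb_s = measured_bytes_s / 10**6     # if 1 MB = 10**6 bytes
measured_mib_s = measured_bytes_s / 2**20    # if 1 MB = 2**20 bytes

print(f"clock-scaled estimate:    {scaled_mb_s:.0f} MB/s")
print(f"measured (10**6 bytes):   {measured_mb_s:.0f} MB/s")
print(f"measured (2**20 bytes):   {measured_mib_s:.0f} MB/s")
```

Read with decimal megabytes the measured rate beats the clock-scaled estimate, as the post says; read with binary megabytes the two land within about one percent of each other, so the conclusion depends on which unit the paper used.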

  • by SiliconEntity ( 448450 ) on Tuesday November 02, 2004 @12:03PM (#10700717)
    The RC4 stream cipher has a number of weaknesses. See Itsik Mantin's RC4 page [weizmann.ac.il]; he is a crypto student who did his master's thesis on RC4. Among other weaknesses, the 2nd byte of the output is twice as likely to match the plaintext as it should be; there are weak keys; and it is possible to distinguish the output from randomness. Some of the attacks are practical and have been used to break the WEP wireless encryption algorithm, which uses RC4.

    If you really need speed, you can use RC4 securely but you have to know what you are doing and be aware of these attacks so you can employ protective countermeasures. Otherwise you are better off to use a cipher like AES which is actually secure.

    • If you really need speed, and you're using a heavy duty CPU, you can do better than RC4 anyhow. RC4 manipulates 1-byte quantities. There are other stream ciphers that take about the same number of instructions but work on 4-byte or 8-byte quantities.
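Two points from this sub-thread, that RC4 works one byte at a time and that its second output byte is biased, can both be seen in a short Python sketch. This is a plain reference-style version for illustration only, not the optimized AMD64 code from the paper; the function names, the 16-byte key length, the trial count and the fixed seed are all choices made for this example.

```python
import random


def rc4_keystream(key):
    """Yield RC4 keystream bytes for the given key (a bytes object)."""
    # Key scheduling: permute the 256-entry state table using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Output generation: one swap and one table lookup per output byte.
    i = j = 0
    while True:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]


def rc4(key, data):
    """Encrypt or decrypt data; XOR with the keystream is its own inverse."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


def second_byte_zero_count(trials, seed=1):
    """Count how often the 2nd keystream byte is 0 over random 16-byte keys."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        key = bytes(rng.randrange(256) for _ in range(16))
        stream = rc4_keystream(key)
        next(stream)                  # discard the first keystream byte
        hits += next(stream) == 0     # a zero byte leaves plaintext unchanged
    return hits


ciphertext = rc4(b"Key", b"Plaintext")
trials = 20000
hits = second_byte_zero_count(trials)
print(ciphertext.hex())
print(f"2nd byte was 0 for {hits} of {trials} keys; "
      f"a uniform byte would give about {trials / 256:.0f}")
```

The first line printed should be `bbf316e8d940af0ad3`, the commonly published test vector for this key and plaintext. The count on the second line should come out at roughly twice the uniform expectation, which is the "twice as likely to match the plaintext" effect mentioned above: a zero keystream byte passes the plaintext byte through unchanged.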
  • The benefits seen in this optimization are largely parallel with the G5. If Apple can get things optimized for 64bit computing that can take advantage of it, we will see great things. In many cases, they already have...

    I just think it's great that AMD is making such strides... for being a Mac guy, I pull for them in the PC world. What can I say? I like the underdog story.

  • Did anybody try to run this on a 64-bit Xeon yet? This kind of algorithm naturally wants 64 bit data types. It would be interesting to compare Xeon vs. Opteron performance here.
