This discussion has been archived. No new comments can be posted.

AMD Unveils SSE5 Instruction Set

  • Re:APL (Score:2, Interesting)

    by Ilyon (1150115) on Friday August 31, 2007 @02:06AM (#20421305)
    I would say APL has always been compatible with the various vector/parallel machine languages. With the general but precise nature of APL expression, it should be easy to generically and efficiently parallelize/vectorize any APL interpreter for any machine architecture. Is there much activity in marketing of current APL products? It seems like IBM is doing nothing more than supporting existing customers. Jim Brown and company established SmartArrays, which caters a specific C APL library to specific customers. MicroAPL seems to be diversifying into other areas, although they still update APLX periodically. I haven't seen much action on the open source front, although I have seen an open source APL project on Sourceforge. Is there any chance that the emergence of parallel architectures will spur a resurgence of interest in APL?
  • by PhrostyMcByte (589271) <phrosty@gmail.com> on Friday August 31, 2007 @02:12AM (#20421335) Homepage

    I don't write those fancy codecs, but I can immediately see where some of these instructions could come in handy - for instance, PCMOV and PTEST (packed cmov/test).

    The new instructions take up an extra opcode byte, but since they reduce the number of instructions you would otherwise execute, I don't see that as a problem. The super instructions (like FMADDPS - Multiply and Add Packed Single-Precision Floating-Point) do more than just help the instruction decoder too: the spec mentions "infinitely precise" intermediate voodoo for several of them, which makes it seem like doing an FMADDPS instead of a MULPS, ADDPS pair will give a more accurate result.
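To make the "infinitely precise" intermediate concrete, here is a rough sketch of the difference between rounding after the multiply and rounding only once at the end. It is a model under assumptions, not the SSE5 instruction itself: it uses Python's double-precision floats rather than packed single precision, and the names `mul_then_add` and `fused_mul_add` are made up for illustration.

```python
from fractions import Fraction

def mul_then_add(a, b, c):
    # Two roundings: the product is rounded to a float, then the sum is rounded.
    return a * b + c

def fused_mul_add(a, b, c):
    # One rounding: compute a*b + c exactly with rationals, then round once.
    # This models the fused behaviour; it is not how hardware does it.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Inputs chosen so the exact product 1 - 2**-60 is not representable.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("separate multiply and add:", mul_then_add(a, b, c))
print("single rounding at the end:", fused_mul_add(a, b, c))
```

With these inputs the separate version loses the small term entirely when the product is rounded, while the single-rounding version keeps it.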

    There are new 16-bit floating point instructions too, which I can see as a boon for graphics wanting the ease of floating point and a little higher rounding precision than bytes with values between 0 and 255 would give, without the large memory requirements of 32-bit floating point.
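As a rough illustration of that trade-off, the sketch below compares storing a dim intensity as an 8-bit level, as a 16-bit float, and as a 32-bit float. It relies on the IEEE half-precision `e` format in Python's `struct` module; whether the SSE5 16-bit format matches IEEE half precision in every detail is an assumption here, and the helper names are hypothetical.

```python
import struct

def to_half(x):
    # Round-trip through a 16-bit IEEE half-precision float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_byte_level(x):
    # Quantize a value in [0, 1] to one of 256 levels, then map back to [0, 1].
    return round(x * 255) / 255

HALF_BYTES = struct.calcsize("<e")
SINGLE_BYTES = struct.calcsize("<f")

dim = 0.001  # a dark value, where 8-bit levels are coarse
print("8-bit level :", to_byte_level(dim))
print("16-bit float:", to_half(dim))
print("bytes per value, half vs single:", HALF_BYTES, SINGLE_BYTES)
```

The dim value collapses to zero as an 8-bit level but survives as a half float, at half the storage of a 32-bit float.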

  • by WoTG (610710) on Friday August 31, 2007 @03:39AM (#20421759) Homepage Journal
    My thought was that the long-term plan is to integrate the GPU anyway (for one product line at least). While the GPU is RIGHT THERE, they will find a way to use as much of it as they can when it's not busy with 3D work... which for the average office environment is 95% of the time.

    Gamers can still buy addon graphics cards, of course.
  • by gnasher719 (869701) on Friday August 31, 2007 @09:53AM (#20423869)
    '' Can one of the cryptographers on slashdot comment on whether this is useful to them or not? ''

    One useful addition (copied from Altivec) is the vector permute instruction. What is clever about it in terms of cryptography is that you can translate a vector using a 256 byte translation table _without doing any memory access_ by using the vector permute instruction in a clever way. Now the execution time is completely data-independent, so one important attack vector is closed.
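One way to picture the trick is sketched below. It is only a model, with assumptions called out: `permute16` is a hypothetical stand-in for a 16-byte permute (the real instruction's operand layout may differ), the 256-byte table is imagined as sitting in sixteen 16-byte registers, and Python lists stand in for those registers. Python itself gives no timing guarantees; the point is that the sequence of operations is the same for every input byte.

```python
def permute16(reg, idx):
    # Hypothetical 16-byte permute: output byte j is reg[low nibble of idx[j]].
    return [reg[i & 0x0F] for i in idx]

def lookup256(table, data):
    # Translate every byte of `data` through a 256-entry table without
    # data-dependent branches or data-dependent table addresses.
    assert len(table) == 256
    regs = [table[k * 16:(k + 1) * 16] for k in range(16)]  # sixteen "registers"
    out = [0] * len(data)
    for k, reg in enumerate(regs):
        # Candidate result if the high nibble happened to be k.
        cand = permute16(reg, data)
        # Mask is 0xFF where the high nibble equals k, otherwise 0x00.
        mask = [(-((b >> 4) == k)) & 0xFF for b in data]
        # Merge the selected bytes into the output.
        out = [o | (c & m) for o, c, m in zip(out, cand, mask)]
    return out

# Made-up table and input, purely for illustration.
table = [(i * 7 + 3) % 256 for i in range(256)]
data = list(range(0, 256, 17))
print(lookup256(table, data) == [table[b] for b in data])
```

Every input goes through all sixteen permutes and masks, so the work done does not depend on the secret byte values.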
  • AMD just forked x86 (Score:2, Interesting)

    by RecessionCone (1062552) on Friday August 31, 2007 @11:47AM (#20425535)
    If you read the fine print, AMD is actually not implementing all of SSE4 on the Bulldozer chip which will be the first to include SSE5. This is disastrous - the SSE "brand" has always implied backwards compatibility: SSE1 contains MMX, SSE2 contains SSE1 & MMX, etc. etc. Now AMD is breaking this, since SSE5 chips will not include all of SSE4. AMD shouldn't have named these new extensions SSE5. As it is, they are forking the x86 instruction set, which is a bad thing for all of us.

    Here's some more information: http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3073 [anandtech.com]
  • by Paul Crowley (837) on Saturday September 01, 2007 @07:57AM (#20432781) Homepage Journal
    I've just paged through the spec PDF, and I can't work out for the life of me how these instructions help you implement AES. In normal implementations AES does sixteen byte-to-word table lookups per round and these lookups take nearly all the time; they also open up a host of vulnerabilities in side channel attacks. To avoid these lookups you have to have a way of doing the GF(2^8) arithmetic directly, and I can't see any way these instructions will help.

    Anyone got any guesses? Someone who understands Matsui's recent work on bitslice AES implementations better than I do? Will this implementation be resistant to lookup-based side channel attacks?
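For reference, "doing the GF(2^8) arithmetic directly" means something like the sketch below: multiplication in the AES field with reduction polynomial 0x11B, written with masks instead of branches or table lookups. This is a plain scalar illustration, not a claim about how the SSE5 instructions would be used, and the function names are made up. The check value comes from the worked example in FIPS-197.

```python
def xtime(a):
    # Multiply by x (i.e. by 2) in GF(2^8), reducing by the AES polynomial.
    a <<= 1
    a ^= 0x11B & -(a >> 8)  # subtract the polynomial only if bit 8 is set
    return a & 0xFF

def gf_mul(a, b):
    # Shift-and-add multiplication in GF(2^8), no branches on a or b.
    result = 0
    for _ in range(8):
        result ^= a & -(b & 1)  # add a if the low bit of b is set
        a = xtime(a)
        b >>= 1
    return result

print(hex(gf_mul(0x57, 0x83)))
```

The loop always runs eight iterations and never indexes a table with secret data, which is the property a lookup-free implementation is after.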
