## AMD Unveils SSE5 Instruction Set

mestlick writes

*"Today AMD unveiled its 128-Bit SSE5 Instruction Set. The big news is that it includes 3 operand instructions such as floating point and integer fused multiply add and permute. AMD posted a press release and a PDF describing the new instructions."*
## It's a couple links deep... (Score:5, Informative)

Read this interview with Dr Dobbs [ddj.com]:

I believe this helps gaming and other simulations.

And then we have the "holy shit" moment:

If I get one of these CPUs, I'll almost certainly be encrypting my hard drives. It was already fast enough, but now...

As for existing OS support, it looks promising:

So, if you're really curious, you can download SimNow and emulate an SSE5 CPU, try to boot your favorite OS... even though they say they're not planning to ship the silicon for another two years. Given that they say the GCC patches will be out in a week, I imagine two years is plenty of time to get everything rock solid on the software end.

## Re:Can someone explain please (Score:3, Informative)

* Yes, PAE was a slight deviation from a 32 bit address space, but in userspace, it's 32 bit flat memory.

## Re:Can someone explain please (Score:4, Informative)

In the x64 world, the general purpose registers are 64 bits wide. Register width used to track the width of the 'int' datatype in the C compiler, but not here: under both the LP64 model (Linux, macOS) and the LLP64 model (Windows), 'int' stays 32 bits when compiling x64 code; it's pointers (and, on LP64, 'long') that widen to 64 bits.

## Re:Can someone explain please (Score:5, Informative)

Technically, the "bit designation" of a platform is defined as the largest number on the spec sheet which marketing is convinced customers will accept as truthful. Seriously, over the years different processors and systems have been "16 bit" or "32 bit" for any number of odd and wacky reasons. For example, the Atari Jaguar was widely touted as a 64 bit platform, yet its control processor was a Motorola 68000. The Sega Genesis also had a 68k in it, and was a 16 bit platform. The thing is, Atari's marketing folks decided that since the graphics processor worked in 64 bit chunks, they could sell the system as a 64 bit platform. C'est la vie. It's an issue that doesn't just crop up in video game consoles -- I just find the Jaguar a particularly amusing example.

But, yeah, having a CPU sold as one "bitness" and being able to work with a larger data size than the bitness is not unusual. The physical address bus width is indeed one common designator of bitness, just as you say. Another is the internal single address width, or the total segmented address width. Also, the size of a GPR is popular. On many platforms, some or all of those are the same number, which simplifies things.

An Athlon64, for example, has 64 bit GPRs and, in theory, a 64 bit address space, but it actually only cares about 48 bits of virtual address space, and only 40 of those bits can actually be addressed physically by current implementations.

A 32 bit Intel Xeon has 32 bit GPRs, but an 80 bit floating point unit, the ability to do 128 bit SSE computations, 32 bit individual addresses, and IIRC a 36 bit segmented physical address space via PAE. But Intel's marketing knew that customers wouldn't believe it if they called it anything but 32 bit, since it could only address 32 bits in a single chunk. (And they didn't want it to compete with IA64!)

## Tom, Jerry, and IOP (Score:3, Informative)

## Re:Well, I'm excited. I think. (Score:4, Informative)

The result will still eventually be stored back into a floating-point number. What it means for an intermediate computation to be infinitely precise is just that it doesn't discard any information that wouldn't inherently be discarded by rounding the end result.

When you multiply two finite numbers, the result has only as many bits as the combined inputs. So it's quite possible for a computer to keep all of those bits, perform the addition at that full precision, and only then chop the result back to 32 bits. Contrast that with implementing the same operation using current instructions, which would be: multiply, (round), add, (round).

## Re:It's a couple links deep... (Score:4, Informative)

For example, the Advanced Encryption Standard (AES) algorithm gets a factor of 5 performance improvement by using the new SSE5 extension

If I get one of these CPUs, I'll almost certainly be encrypting my hard drives. It was already fast enough, but now...

They copied two important features from the PowerPC instruction set: Fused multiply-add (calculate +/- x*y +/- z in one instruction), and the Altivec vector permute instruction, which can among other things rearrange 16 bytes in an arbitrary way. The latter should be really nice for AES, because it does a lot of rearranging 4x4 byte matrices (if I remember correctly).

## Re:32-bit Genesis before 16-bit Super NES? (Score:2, Informative)

## Re:Well, I'm excited. I think. (Score:3, Informative)

I'll give you an example. Let's say we are working with four decimal digits instead of the 53 binary digits that standard double precision uses. Any operation behaves as if it calculated the infinitely precise result and then rounded it. For example, any result x in the range 1233.5 ≤ x ≤ 1234.5 gets rounded to 1234.

Now let's say we calculate x * y + z with infinite precision and round. We have x = 2469, y = 0.5, and z happens to be 0.00000000001. So x * y = 1234.5, and x * y + z is just a tiny bit larger, so the result has to be rounded up to 1235. To do this right, you need x * y with infinite precision; knowing twelve decimals wouldn't be enough. If I told you "x * y equals 1234.50000000 to twelve digit precision", you wouldn't know how to round x * y + z. x * y could be 1234.499999996, and adding z would still give less than 1234.5, so it would need to be rounded down. Or x * y could be 1234.500000004, and x * y + z would need to be rounded up.

That is what is meant by "infinite precision": the processor guarantees to give the same result _as if_ it had used infinite precision for the calculation. In practice it doesn't; about 110 binary digits of internal precision is enough to get the same result.

## Re:Well, I'm excited. I think. (Score:3, Informative)

The key word is "intermediate". You don't get a result of infinite precision; you get a 32-bit result (since the parent mentioned single-precision floating point). But it carries the right number of bits internally, and uses the right algorithms, so that the result is as if the processor did the multiply and add at infinite precision, and then rounded the result to the nearest 32-bit float. Which is better than the result you would get by multiplying two 32-bit floats into a 32-bit float, then adding that to another 32-bit float: there you're limited to 32 bits at all times, and therefore you have intermediate precision loss. Making sense now?