AMD Unveils SSE5 Instruction Set
mestlick writes "Today AMD unveiled its 128-Bit SSE5 Instruction Set. The big news is that it includes 3 operand instructions such as floating point and integer fused multiply add and permute.
AMD posted a press release and a PDF describing the new instructions."
Who cares... (Score:1, Insightful)
Re: (Score:2)
Well, I'm excited. I think. (Score:5, Insightful)
Re:...or are they just toys? (Score:5, Funny)
Re: (Score:2)
Re: (Score:3, Funny)
Nasty AMD added to it.
The better question is how the fuck did AMD get to write the next iteration of an Intel technology. Shouldn't it be AMD 3DNow!^2? This is like Apple deciding their next HFS filesystem will be versioned NTFS 7.0.
They can battle back and forth with version numbers and see who is first to get to 11, the version number where, for whatever reason, developers are forced to come up with a new versioning scheme. That will throw a wrench in the works. Take that Intel!
Re:Well, I'm excited. I think. (Score:4, Interesting)
I don't write those fancy codecs, but I can immediately see where some of these instructions could come in handy - for instance, PCMOV and PTEST (packed cmov/test).
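To make the packed compare/select idea concrete, here is a minimal Python sketch of a branch-free per-lane maximum. A packed compare produces an all-ones or all-zeros mask per lane, and a bitwise select then picks each bit from one of two sources, which is roughly the role a packed conditional move would play. The four 32-bit lanes, the helper names, and the use of Python lists as stand-ins for 128-bit registers are all assumptions for illustration, not the actual SSE5 definitions.

```python
LANE_MASK = 0xFFFFFFFF  # assumed lane width: 32 bits


def compare_gt(a, b):
    """Emulate a packed compare: all-ones lane where a > b, else all-zeros."""
    return [LANE_MASK if x > y else 0 for x, y in zip(a, b)]


def bitwise_select(selector, if_set, if_clear):
    """Per bit: take if_set where the selector bit is 1, otherwise if_clear."""
    return [
        (s & t) | (~s & LANE_MASK & f)
        for s, t, f in zip(selector, if_set, if_clear)
    ]


def packed_max(a, b):
    """Per-lane maximum with no data-dependent branch in the select step."""
    return bitwise_select(compare_gt(a, b), a, b)


print(packed_max([1, 9, 3, 7], [4, 2, 8, 7]))
```

The point of the pattern is that every lane runs the same two operations whatever the data, so there is nothing for a branch predictor to get wrong.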
The new instructions take up an extra opcode byte, but seeing how they will lower the number of instructions you would otherwise need, I don't see that as a problem. The super instructions (like FMADDPS - Multiply and Add Packed Single-Precision Floating-Point) do more than just help the instruction decoder too - they mention "infinitely precise" intermediate voodoo for several of them, which makes it seem like using FMADDPS instead of MULPS followed by ADDPS will give a more accurate result.
There are new 16-bit floating point instructions too, which I can see as a boon for graphics wanting the ease of floating point and a little higher rounding precision than bytes with values between 0 and 255 would give, without the large memory requirements of 32-bit floating point.
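A rough way to see the trade-off is to round-trip a value through each representation. This sketch uses Python's standard `struct` module, whose `"e"` format is IEEE 754 half precision; the sample value and the helper names are assumptions for illustration, and the SSE5 16-bit format is only assumed to behave like IEEE half precision here.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_byte_level(x):
    """Quantize a value in [0, 1] to one of 256 byte levels and back."""
    return round(x * 255) / 255


value = 0.001  # a very dark pixel intensity (arbitrary example)
half_error = abs(to_half(value) - value)
byte_error = abs(to_byte_level(value) - value)

print("bytes per value: half =", struct.calcsize("<e"),
      "single =", struct.calcsize("<f"))
print("half error:", half_error, "byte error:", byte_error)
```

For small values like this one the byte encoding collapses to zero, while the 16-bit float keeps most of the value at half the storage of a 32-bit float.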
Re: (Score:2)
One of my pet peeves is statements like infinite precise
Re: (Score:1)
What I think you meant was, "How can the infinitely precise number be stored and accessed by a computer?" Well, that's not the same thing.
Re: (Score:1)
Re: (Score:2, Insightful)
Re: (Score:1)
Re: (Score:2)
Re:Well, I'm excited. I think. (Score:4, Informative)
The result will still eventually be stored back into a floating-point number. What it means for an intermediate computation to be infinitely precise is just that it doesn't discard any information that wouldn't inherently be discarded by rounding the end result.
When you multiply two finite numbers, the result has only as many bits as the combined inputs. So it's quite possible for a computer to keep all of those bits, then perform the addition with that full precision, and then chop it back to 32 bits. As opposed to implementing the same operation with current instructions, which would be: multiply, (round), add, (round).
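The difference between the two sequences can be sketched in Python. This is an emulation, not the hardware instruction: it uses double precision rather than the 32-bit floats discussed above, and it gets the "keep all the bits" intermediate from exact rational arithmetic in `fractions.Fraction`. The function names and the input values are assumptions chosen to make the effect visible.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum is."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Emulated fused multiply-add: exact intermediate, one final rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs picked so that the exact product, 1 - 2**-60, does not fit in a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("multiply, round, add, round:", mul_then_add(a, b, c))
print("single rounding at the end: ", fused_mul_add(a, b, c))
```

With these inputs the separate sequence rounds the product to exactly 1.0 and then cancels it to zero, while the fused version keeps the small residual that the first rounding threw away.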
Re: (Score:3, Informative)
I'll give you an example. Let's say we are working with four decimal digits instead of 53 binary digits, which is what standard double precision uses. Any op
Re: (Score:3, Informative)
Re: (Score:2)
It's a couple links deep... (Score:5, Informative)
Read this interview with Dr Dobbs [ddj.com]:
I believe this helps gaming and other simulations.
And then we have the "holy shit" moment:
If I get one of these CPUs, I'll almost certainly be encrypting my hard drives. It was already fast enough, but now...
As for existing OS support, it looks promising:
So, if you're really curious, you can download SimNow and emulate an SSE5 CPU, try to boot your favorite OS... even though they say they're not planning to ship the silicon for another two years. Given that they say the GCC patches will be out in a week, I imagine two years is plenty of time to get everything rock solid on the software end.
Re:It's a couple links deep... (Score:4, Funny)
Re: (Score:2)
Backups. (Score:2)
Re: (Score:2)
Re:It's a couple links deep... (Score:4, Informative)
For example, the Advanced Encryption Standard (AES) algorithm gets a factor of 5 performance improvement by using the new SSE5 extension
If I get one of these CPUs, I'll almost certainly be encrypting my hard drives. It was already fast enough, but now...
They copied two important features from the PowerPC instruction set: Fused multiply-add (calculate +/- x*y +/- z in one instruction), and the Altivec vector permute instruction, which can among other things rearrange 16 bytes in an arbitrary way. The latter should be really nice for AES, because it does a lot of rearranging 4x4 byte matrices (if I remember correctly).
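As a sketch of why a byte permute suits AES: the ShiftRows step is nothing more than a fixed rearrangement of the 16 state bytes, so it maps onto a single permute with a constant control vector. The `permute` helper below is a simplified model (each control byte selects one of 32 bytes from two concatenated sources by its low 5 bits); that model, the names, and the column-major state layout are assumptions for illustration rather than the exact SSE5 or AltiVec semantics.

```python
def permute(src_a, src_b, control):
    """Simplified byte permute: control byte c picks byte (c & 0x1F) of a+b."""
    pool = bytes(src_a) + bytes(src_b)
    return bytes(pool[c & 0x1F] for c in control)


# AES state stored column-major: byte at row r, column c lives at index r + 4*c.
# ShiftRows rotates row r left by r positions.
SHIFT_ROWS = bytes(
    r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)
)

state = bytes(range(16))  # dummy state where each byte equals its own index
shifted = permute(state, state, SHIFT_ROWS)

print(list(SHIFT_ROWS))
print(list(shifted))
```

Because the dummy state holds its own indices, the output doubles as a picture of where each byte came from.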
Re: (Score:2)
Any idea how this stacks up against VIA's PadLock?
AES - how is speedup achieved? (Score:3, Interesting)
Anyone got any guesses? Someone who understands Ma
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
no additional CX overhead (Score:2)
APL (Score:4, Funny)
So machine languages are APL-compatible these days.
Re: (Score:2, Interesting)
Matlab, Numpy, FORTRAN, ... (Score:1)
Cryptographer's Take? (Score:1)
(yes, I am paranoid... why do you ask? are you with the CIA?)
Re: (Score:2, Funny)
The weather (www.weather.com) is dependent on where you live and what specific time frame you are inquiring about, subject to the meteorologist's report for that time frame and area.
??!!Whether???!! Hmmm... that's a whole different subject, but as I am with the CIA, why do you ask? Are you paranoid or something?
***Hmmm...jimXugle (921609)....posted....logging on server....LOGGED!
What was your question? We are from the government, we can help, honest!
Re: (Score:1)
Re: (Score:3, Insightful)
Re: (Score:3, Interesting)
One useful addition (copied from Altivec) is the vector permute instruction. What is clever about it in terms of cryptography is that you can translate a vector using a 256 byte translation table _without doing any memory access_ by using the vector permute instruction in a clever way. Now the execution time is completely data-independent, so one important attack vector is closed.
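One way the trick could work, sketched in Python: hold the 256-byte table in eight pairs of 16-byte registers, run the same permute against every pair using the low 5 bits of each index, and keep only the result from the pair that matches the top 3 bits. The `permute` model, the register layout, and all names here are assumptions for illustration; real code would use vector compares and selects where this sketch uses a Python conditional.

```python
def permute(src_a, src_b, control):
    """Simplified byte permute: control byte c picks byte (c & 0x1F) of a+b."""
    pool = bytes(src_a) + bytes(src_b)
    return bytes(pool[c & 0x1F] for c in control)


def table_lookup(table, indices):
    """Translate each index byte through a 256-byte table held in 'registers'."""
    if len(table) != 256:
        raise ValueError("table must have exactly 256 entries")
    result = bytearray(len(indices))
    for chunk in range(8):  # always all eight chunks, whatever the data is
        base = chunk * 32
        low, high = table[base:base + 16], table[base + 16:base + 32]
        candidates = permute(low, high, indices)
        for lane, index in enumerate(indices):
            # Stand-in for a packed compare: all-ones if this chunk owns the index.
            mask = 0xFF if (index >> 5) == chunk else 0x00
            result[lane] |= candidates[lane] & mask
    return bytes(result)


TABLE = bytes((i * 7 + 3) % 256 for i in range(256))  # arbitrary example table
INDICES = bytes([0, 15, 16, 31, 32, 63, 64, 100,
                 127, 128, 150, 191, 192, 224, 250, 255])
looked_up = table_lookup(TABLE, INDICES)
print(list(looked_up))
```

The sequence of operations is identical for every input, which is the property that matters against cache-timing attacks: no address ever depends on secret data.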
Can someone explain please (Score:1)
Re: (Score:2)
http://en.wikipedia.org/wiki/SIMD [wikipedia.org]
Re: (Score:3, Informative)
* Yes, PAE was a slight deviation from a 32 bit address space, but in userspace, it's 32 bit flat memory.
Re:Can someone explain please (Score:4, Informative)
In the x64 world, the general purpose registers are 64-bit wide. This also used to influence the width of the 'int' datatype in the C compiler, although I'm not sure that 'int' is a 64-bit integer when compiling x64 code.
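On the 'int' question: on the two common x86-64 ABIs (System V LP64 on Linux and most Unixes, LLP64 on Windows) 'int' stays at 32 bits, and only pointers (plus 'long' on LP64) grow to 64. A quick way to check what a given platform's C compiler uses is Python's standard `ctypes` module; the variable names below are just for illustration, and the output depends on the machine it runs on.

```python
import ctypes

# Sizes as seen by the C ABI that this Python interpreter was built against.
int_bits = ctypes.sizeof(ctypes.c_int) * 8
long_bits = ctypes.sizeof(ctypes.c_long) * 8
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8

print("int:", int_bits, "bits")
print("long:", long_bits, "bits")
print("pointer:", pointer_bits, "bits")
```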
Re: (Score:2)
That means my twelve-year-old HP48 calculator has a 64-bit processor, despite having a 4-bit bus and 20-bit addresses. :-)
32-bit Genesis before 16-bit Super NES? (Score:2)
Re: (Score:2)
It also states here [wikipedia.org] that a 16-bit architecture is one with a 16-bit data bus, address bus or register size. Perhaps the Motorola 68000 was never advertized as a 32-bit machine, because that sort of marketing
Bit count is still confusing (Score:2)
the Wikipedia article on the matter clearly states that the Motorola 68000 is a 16-bit architecture even though its general purpose registers, and basic arithmetic functions are 32-bit, simply because it has a 16-bit data bus.
It also states here [wikipedia.org] that a 16-bit architecture is one with a 16-bit data bus, address bus or register size.
Wouldn't that make the Super NES an 8-bit system? Its 65C816 CPU had 16-bit registers and an 8-bit data bus. And was the Nintendo 64 an 8-bit system because it used 8-bit RDRAM at a comparatively high clock rate for the time [wikipedia.org]?
Perhaps the Motorola 68000 was never advertized as a 32-bit machine, because that sort of marketing ploy was not exercised at the time?
Believe me, bit counts were the marketing ploy of the time.
Re: (Score:2)
Current "64-bit" CPUs have 128 bit memory busses -- that doesn't make them 128-bit.
Re: (Score:2, Informative)
Re: (Score:2)
The 68000 is a chip capable of performing 32bit arithmetic, but only able to load 16 bits at a time, therefore, it was most efficient to rely on 16bit values when possible (even though the extra 16 bits allowed you to do some neat tricks.) Later revisions of the 68000 exposed the entire 32bit data bus without changing the general architecture of the core. Those are clearly
Re: (Score:2)
Re: (Score:1)
Re:Can someone explain please (Score:5, Informative)
Technically, the "bit designation" of a platform is defined as the largest number on the spec sheet which marketing is convinced customers will accept as truthful. Seriously, over the years different processors and systems have been "16 bit" or "32 bit" for any number of odd and wacky reasons. For example, the Atari Jaguar was widely touted as a 64 bit platform, and the control processor was a Motorola 68000. The Sega Genesis also had a 68k in it, and was a 16 bit platform. The thing is, Atari's marketing folks decided that since the graphics processor worked in 64 bit chunks, they could sell the system as a 64 bit platform. C'est la vie. It's an issue that doesn't just crop up in video game consoles -- I just find the Jaguar a particularly amusing example.
But, yeah, having a CPU sold as one "bitness" and being able to work with a larger data size than the bitness is not unusual. The physical address bus width is indeed one common designator of bitness, just as you say. Another is the internal single address width, or the total segmented address width. Also, the size of a GPR is popular. On many platforms, some or all of those are the same number, which simplifies things.
An Athlon64, for example, has 64 bit GPRs, and in theory a 64 bit address space, but it actually only cares about 48 bits of address space, and only 40 of those bits can actually be addressed by current implementations.
A 32 bit Intel Xeon has 32 bit GPRs, but an 80 bit floating point unit, the ability to do 128 bit SSE computations, 32 bit individual addresses, and IIRC a 36 bit segmented physical address space. But Intel's marketing knew that customers wouldn't believe it if they called it anything but 32 bit since it could only address 32 bits in a single chunk. (And, they didn't want it to compete with IA64!)
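For a sense of scale, the address widths mentioned in this comment translate into the following amounts of byte-addressable memory. This is plain arithmetic; the helper name and the choice of binary units are assumptions for illustration.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable(bits):
    """Return the size of a byte-addressed space with the given address width."""
    size = 1 << bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


for width in (32, 36, 40, 48, 64):
    print(f"{width}-bit addresses: {addressable(width)}")
```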
Tom, Jerry, and IOP (Score:3, Informative)
for example, the Atari Jaguar was widely touted as a 64 bit platform, and the control processor was a Motorola 68000.
The Jaguar had a 64-bit data bus, a 32-bit CPU "Tom" connected to the GPU, a 32-bit CPU "Jerry" connected to the sound chip, and a 32-bit MC68000 with a 16-bit connection to the data bus, used as an I/O processor (in much the same way that the PS2 uses the PS1 CPU). Some games ran their game logic on "Tom"; others (presumably those developed by programmers hired away from Genesis or Neo-Geo shops) ran it on the IOP. Pretty much only graphics operations ever used the full width of the data bus.
Re: (Score:2)
Please show us any example of a processor with a 64 bit address bus. I don't think there are any in existence.
What you mean is the width of logical addresses, which is something completely different.
Re: (Score:2)
Here [umd.edu] is a brief description of what SIMD is and what it can be used for:
Foundations for the GPU+CPU assimilation... (Score:5, Insightful)
It'll take a couple years for "SSE5" to show up in AMD chips... which happens to coincide nicely with their Fusion (combined CPU+GPU) product line plans.
Will Intel pick up on these instructions? Maybe not. Does that mean they die? No, the performance benefits for those areas where this will make the most difference will make it worthwhile. At the very least, AMD can sponsor patches to the most popular bits of OSS to earn a few PR points (and benchmark points).
Re: (Score:2)
Re: (Score:3, Interesting)
Gamers can still buy addon graphics cards, of course.
Sounds good... I hope (Score:2)
As it is, ATI/AMD is maybe less proprietary than nVidia, but their Linux support sucks. Intel, however, typically has very good support, even though it's entirely open drivers, and apparently not sponsored much by Intel itself.
Re: (Score:2)
I can't see that at all. Mostly they have been copying stuff that was present on PowerPC CPUs for ages, filled some obvious gaps in the SSE instruction set, and added s
OK, another PDF on a small subset ........ (Score:2)
Re: (Score:1)
http://developer.amd.com/devguides.jsp#Manuals [amd.com]
Re: (Score:2)
What about 256 bit? (Score:3, Insightful)
Sure, multimedia & games use lower precision FP computations, so a 16b or 32b FP number is enough, but it's strange that AMD doesn't try to improve things for the scientific computation niche.
Maybe it's because the change would be expensive: to be efficient, the memory bus would have to be widened to 256b from the current 128b.
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
They stayed synched when going 64-bit, after all. They can compete on speed and features, but both would lose if they destroyed the x86 platform by becoming incompatible.
AMD just forked x86 (Score:2, Interesting)
Here's some more inf
I thought SSE was Intel's... (Score:2)
Re: (Score:1)
Re: (Score:2)
MMX/3DNow! are the early SIMD instructions which used FPU resources to reduce cost and maintain drop-in compatibility with Operating systems (The OS s
Does this impact molecular dynamics simulations? (Score:2)
For those who actually understand real molecular nanotechnology, aka "Drexlerian" nanotechnology, you may understand that one of the real "breakthroughs" comes when you can computationally simulate the function of a 4 to 8 million atom molecular nanoassembler. Because if you can simulate one and prove that it does not violate any laws of physics then one of the classical oppositions to real molecular nanotechnology falls [1]. The argument transitions entirely from "it can't work" (common among people or
Re:Does this impact molecular dynamics simulations (Score:2)
Re: (Score:2)
Re:Does this impact molecular dynamics simulations (Score:1)
As for designing the system that you want to simulate; the thing with microprocessors is that they're very modular. You can create a register, use it 256 or however many times, and there's your cache. Then you build the part that interfaces the rest of the CPU with that