Grand Unified Theory of SIMD
Glen Low writes "All of a sudden, there's going to be an Altivec unit in every pot: the Mac Mini, the Cell processor, the Xbox2. Yet programming for the PowerPC Altivec and Intel MMX/SSE SIMD (single instruction multiple data) units remains the black art of assembly language magicians. The macstl project tries to unify the architectures in a simple C++ template library. It just reached its 0.2 milestone and claims a 3.6x to 16.2x speed-up over hand-coded scalar loops. And of course it's all OSI-approved RPL goodness."
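To make the "unify the architectures in a template library" idea concrete, here is a minimal sketch. The `simd_block` type and `saxpy_blocks` function are hypothetical names invented for illustration and are not macstl's API. A real library would specialize the template so the operators map onto Altivec or SSE registers; this version only shows the shape of the abstraction, using plain arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only: simd_block is NOT macstl's API.
// A fixed-width block of N lanes with element-wise operators. A real
// library would specialize this per architecture (Altivec, SSE, ...).
template <typename T, std::size_t N>
struct simd_block {
    std::array<T, N> lane{};

    friend simd_block operator+(const simd_block& a, const simd_block& b) {
        simd_block r;
        for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
        return r;
    }

    friend simd_block operator*(const simd_block& a, const simd_block& b) {
        simd_block r;
        for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
        return r;
    }
};

// y[i] = a * x[i] + y[i], processed four floats at a time.
// Assumes n is a multiple of 4 to keep the sketch short.
inline void saxpy_blocks(float a, const float* x, float* y, std::size_t n) {
    assert(n % 4 == 0);
    using block = simd_block<float, 4>;
    block va;
    va.lane.fill(a);  // broadcast the scalar into every lane
    for (std::size_t i = 0; i < n; i += 4) {
        block vx, vy;
        for (std::size_t k = 0; k < 4; ++k) {
            vx.lane[k] = x[i + k];
            vy.lane[k] = y[i + k];
        }
        const block r = va * vx + vy;  // same source on every architecture
        for (std::size_t k = 0; k < 4; ++k) y[i + k] = r.lane[k];
    }
}
```

The calling code never mentions a specific instruction set, so in principle only the `simd_block` specializations would change between platforms.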
Moore's Law has eroded the need for assembly (Score:1, Interesting)
I learned assembly (80x86) long ago and still retain a fair amount of it. I have called on it only a few times, twice in the last eight years... and that's about it.
Yes, some people who write games are still concerned with assembly, as are people in embedded markets. But those jobs, situations and skills are niche, much like the Win32 programming I used to do in the early 90's.
90% of IT jobs are with non-tech companies. Those situations are about the last place you will find anyone caring about something called "assembly language."
-M
Re:16X increase? (Score:2, Interesting)
What is curious is that if you are using a pre-Altivec proc (G3), it'll burn more CPU time, while the same enhancement is natively supported by Altivec-enabled units: a 400MHz G4 PowerBook enhances these synths more efficiently than an 800MHz G3.
I guess this was like the simultaneous operations that the ARM assembly language supports (e.g. both storing and rotating values in an operation)...
Re:Altivec (Score:3, Interesting)
I managed to pick up a ThunderIV last year with the DSP card, and had a run around with Photoshop on it. It's impressive stuff. I have an iMac 350 here I also ran Photoshop on, and while the 350 kicked the Thunder in a Quadra for many unaccelerated things, on those operations where the DSPs kicked in (and the card has those cool little LEDs to show just when it's happening) it could keep up with the iMac nearly neck & neck.
That's a 25MHz 68040 from 1992 and Thunder IVGX vs a 350MHz G3 from 2000. Very cool.
Black Art? Uh... (Score:4, Interesting)
The nice thing about altivec is that it has a C interface. You don't have to use assembly!
Take a look at this Apple tutorial [apple.com] to see how easy it is.
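For a feel of what that C interface looks like, here is a minimal sketch (not taken from the linked tutorial). It uses the `vec_ld`, `vec_add` and `vec_st` intrinsics from `<altivec.h>` when built with Altivec enabled, and falls back to a scalar loop elsewhere. The function name `add_floats` and the alignment and length requirements are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ALTIVEC__)
#include <altivec.h>
// GCC's altivec.h may define these as macros, which clashes with C++ code.
#undef vector
#undef pixel
#undef bool
#endif

// Adds two float arrays element-wise (hypothetical helper for illustration).
// Assumes n is a multiple of 4. On an Altivec build the pointers must also be
// 16-byte aligned, because vec_ld and vec_st ignore the low four address bits.
inline void add_floats(const float* a, const float* b, float* out, std::size_t n) {
    assert(n % 4 == 0);
#if defined(__ALTIVEC__)
    for (std::size_t i = 0; i < n; i += 4) {
        __vector float va = vec_ld(0, a + i);  // load four floats
        __vector float vb = vec_ld(0, b + i);
        vec_st(vec_add(va, vb), 0, out + i);   // add four lanes, store four floats
    }
#else
    // Scalar fallback so the sketch also builds on non-PowerPC targets.
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
#endif
}
```

No assembly is involved: the intrinsics are ordinary function-like calls, and the compiler handles register allocation and scheduling.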
Autovectorization being added in GCC 4.0 (Score:5, Interesting)
GCC vectorization project [gnu.org] (site seems offline atm), but the abstract from a recent GCC summit [gccsummit.org] is up.
Autovectorization Talk (google html view of pdf) [216.239.57.104]
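As a rough illustration of what an auto-vectorizer looks for, the sketch below contrasts a loop that is usually a good candidate with one that is not. The function names are made up for this example. In GCC the relevant option is `-ftree-vectorize`; whether a given compiler version actually vectorizes the first loop is not guaranteed, and since the two pointers may alias, a compiler may need to add a runtime overlap check.

```cpp
#include <cassert>
#include <cstddef>

// A loop shape that auto-vectorizers generally handle well: unit stride,
// no dependency between iterations, trip count known on entry.
inline void scale_and_offset(const float* in, float* out, std::size_t n,
                             float scale, float offset) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * scale + offset;
    }
}

// A loop-carried dependency: each element needs the previous result, so
// straightforward lane-by-lane vectorization does not apply.
inline void prefix_sum(float* data, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        data[i] += data[i - 1];
    }
}
```

The point is that the programmer writes plain scalar code either way; the compiler decides per loop whether SIMD instructions can be used.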
Re:License issues (Score:2, Interesting)
Simple to understand; if you use it for free, you're expected to release your source code (i.e. the 'reciprocal' part of RPL). If you pay to use it, you don't have to release your source code.
--#voxlator
OS X Tiger will do it for you (Score:2, Interesting)
From the limewire... (Score:3, Interesting)
This project may be a step in the right direction. Benchmarks show that SIMD extensions such as SSE/SSE2/SSE3 provide only a marginal speed increase. And meanwhile, the massively parallel computations done on graphics cards dwarf anything SIMD claims to produce.
Perhaps we will see GFX manufacturers selling their technology to the CPU makers.
I forget the specifics, but a new GFX card can perform somewhere around 35 GFLOPS, while a 3.4GHz P4 (executing SIMD code) can only produce around 5-6 GFLOPS at best.
With projects like Brook GPU emerging, the division of CPU and GFX processor may be narrowed significantly.
Ignorant submitter, or smart marketing? (Score:3, Interesting)
Or is this just another advertisement pretending to be a story, with the submitter trying to play ignorant about alternative Altivec and MMX libraries?
liboil (Score:3, Interesting)
However, in the future I can see things changing for the structure of the standard PC.
At the moment in a high end machine you have the CPU, which is a scalar processor, a GPU, which is in essence a glorified vector processor (not just useful for graphics, as projects like GPGPU are showing us), and SIMD extensions to the CPU to allow it to do small amounts of vector processing.
Scalar processors are good for some things (branchy code) and vector processors are good for other things (very predictable parallel code). Having both is very useful.
I would say in the next 5-10 years we will see the GPU join together with the SIMD extensions to provide a separate general-purpose vector processor.
PCs will ship with two processors - one scalar, one vector. And everyone will be happy.
Now, whether this will be transparent to the programmer depends on how automatic code optimisation progresses over the next few years. Is Intel's icc auto vectorisation already good enough? Don't know.
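The scalar-versus-vector split described above (branchy code on one side, predictable parallel code on the other) can be sketched with one operation written both ways. The function names are hypothetical. The second version replaces the per-element branch with a mask, so every element runs the same instruction sequence, which is the general pattern vector units rely on for conditionals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchy form: natural on a scalar core, one decision per element.
inline void clamp_negative_branchy(std::int32_t* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] < 0) v[i] = 0;
    }
}

// Branch-free form: build an all-ones or all-zeros mask per element and
// combine it with AND, so the control flow is identical for every element.
inline void clamp_negative_masked(std::int32_t* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        // keep is 0xFFFFFFFF when v[i] >= 0, otherwise 0.
        const std::uint32_t keep = 0u - static_cast<std::uint32_t>(v[i] >= 0);
        // The masked value is either the original non-negative number or 0,
        // so converting back to a signed type is well defined.
        v[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(v[i]) & keep);
    }
}
```

Both functions produce the same result; the masked form is simply the shape that maps onto several lanes at once.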
Why? Altivec-optimized libraries supplied by Apple (Score:4, Interesting)
Not only that, but Apple's vecLib http://developer.apple.com/ReleaseNotes/MacOSX/ve
Content Addressable Parallel Processors (Score:3, Interesting)
Fortunately there is at least a little ongoing research [mit.edu].
The beauty of these processors is that they integrate memory with computation, so the massive economies of scale we witness in memory fabrication apply to computation speeds as well, so long as we can move toward relational rather than functional computing as a paradigm. Fortunately this appears to be supported by the study of quantum computers; however, those computers may never see the light of day for more fundamental reasons.
macstl vs. Blitz++ (Score:1, Interesting)
Re:16X increase? (Score:3, Interesting)
There are 32 of these registers (independent, not shared with the FPU), which means you can chain together a pretty complex series of calculations without intermediate load/store sequences. The unit has multiple independent computation units with their own dispatch queues (details vary between specific processor models). Some AltiVec opcodes are designed to replace common series of multiple scalar instructions.
The result is that speed ups of more than 16x are not at all rare. 30x is not uncommon in graphics manipulations; I would venture to say that 100x is "rarely the case."
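One way to see where figures like 16x and above can come from: a 128-bit register holds 16 one-byte elements, and a single opcode can stand in for several scalar instructions per element. The sketch below uses a saturating byte add as the example. The helper names are made up, the Altivec path uses the `vec_adds` intrinsic, and the alignment and length requirements are assumptions of this sketch. Actual speed-ups depend on the processor and on memory bandwidth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ALTIVEC__)
#include <altivec.h>
// GCC's altivec.h may define these as macros, which clashes with C++ code.
#undef vector
#undef pixel
#undef bool
#endif

// Number of elements of type T that fit in one 128-bit (16-byte) register.
template <typename T>
constexpr std::size_t lanes_per_register() {
    return 16 / sizeof(T);
}

// Scalar saturating add: widen, add, compare, clamp. Several instructions
// per element.
inline std::uint8_t saturating_add_u8(std::uint8_t a, std::uint8_t b) {
    const unsigned sum = static_cast<unsigned>(a) + b;
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// Saturating add over byte arrays. Assumes n is a multiple of 16; on an
// Altivec build the pointers must also be 16-byte aligned.
inline void saturating_add_bytes(const std::uint8_t* a, const std::uint8_t* b,
                                 std::uint8_t* out, std::size_t n) {
    assert(n % 16 == 0);
#if defined(__ALTIVEC__)
    for (std::size_t i = 0; i < n; i += 16) {
        __vector unsigned char va = vec_ld(0, a + i);
        __vector unsigned char vb = vec_ld(0, b + i);
        vec_st(vec_adds(va, vb), 0, out + i);  // 16 saturating adds in one opcode
    }
#else
    // Scalar fallback so the sketch also builds on non-PowerPC targets.
    for (std::size_t i = 0; i < n; ++i) out[i] = saturating_add_u8(a[i], b[i]);
#endif
}
```

With 16 lanes per operation and the clamp folded into the same opcode, the per-element instruction count drops by more than the lane count alone would suggest.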
Assembly lives! (Score:2, Interesting)