Grand Unified Theory of SIMD 223
Glen Low writes " All of a sudden, there's going to be an Altivec unit in every pot: the Mac Mini, the Cell processor, the Xbox2. Yet programming for the PowerPC Altivec and Intel MMX/SSE SIMD (single instruction multiple data) units remains the black art of assembly language magicians. The macstl project tries to unify the architectures in a simple C++ template library. It just reached its 0.2 milestone and claims a 3.6x to 16.2x speed-up over hand-coded scalar loops. And of course it's all OSI-approved RPL goodness. "
Moore's Law has nothing to do with assembly (Score:2, Insightful)
Moore's Law has eroded the need for assembly
Moore's Law has nothing to do with assembly language and optimizations. From Wikipedia [wikipedia.org]:
Moore's law is an empirical observation stating, in effect, that at our rate of technological development and advances in the semiconductor industry, the complexity of integrated circuits doubles every 18 months.
I wish people would stop saying "But Moore's Law..." for every hardware-related story on Slashdot. Do a bit of reading, please.
Assembly (Score:3, Insightful)
Even in embedded systems, assembly isn't used as much as it used to be. It still gets used in bootloaders, and sometimes in device drivers. However, most devices are memory mapped, so most of the driver is written in C, with asm() calls made where appropriate (eg, asm("eieio");), especially when you get to use gcc's asm() syntax for accessing C variables.
The future (Score:4, Insightful)
The way forward is turning the CPU of a traditional architecture into a nanny for a range of dedicated processing units. IBM saw this years ago, and thus began the whole Cell architecture - but I suspect that their job was much easier. The software that would run on the platform they are designing is fairly specific - games & multimedia, which usually lend themselves well to vectorization.
The real challenge for architects (in my humble opinion) will be applying the same technique to other system bottlenecks.
AMD's (and now Intel's) approach of cramming more and more processing cores onto an IC might pay off in the short term, but like the "free lunch" of clock speed, it will hit a roadblock when memory bandwidth and caching schemes simply have too much work to do, with 4 or 8 processing cores hammering at them all the time.
Re:Too expensive? (Score:2, Insightful)
Once you've paid a $30/hour developer for 10 days' work, you've forked out ~ $2,500...
--#voxlator
Re:Moore's Law has eroded the need for assembly (Score:2, Insightful)
Sure, you could probably get it to work even faster with hand-tuned assembly than by simply using this library. But programmer time is expensive, and customized code adds complexity. By reusing optimized code, you can enjoy some of the benefits of SIMD without having to devote the same amount of resources.
Let's be honest, this isn't a silver bullet - this isn't going to speed up code that doesn't use lots of floating-point vectors anyway. But if it does... (nearly) free performance is always a good thing.
Depends on what you are doing (Score:5, Insightful)
Faster computers mean better simulations. BUT, if the code is not as fast as it can be on a particular architecture, your simulations are not going to be as complete as they could be - at least within a given time allotment.
I've recently applied some code optimizations to a Monte Carlo simulation and saw speed ups of over 1000x. That's significant.
It's naive to think that faster computers mean we should live with sloppy or unoptimized code. SIMD is a useful technique, and if it means the difference between getting work done in one week rather than three, I think I'll take the one-week sim.
Re:Isn't it what std::valarray is for? (Score:3, Insightful)
That's exactly what this is. If you read the part on his website about valarray [pixelglow.com], you'll see that it does extensive SIMD optimizations of valarray for both Altivec and MMX/SSE/SSE2/SSE3 platforms. He's even added "parallelized algorithms such as integer division, trigonometric functions and complex number arithmetic", which you'd have to code yourself in either assembly or the C-based intrinsics if you wanted to do the SIMD programming by hand.
So basically, this allows you to code using std::valarray using normal C++ and then plug this in under the hood to get a nice speed boost.
Re:Moore's Law has eroded the need for assembly (Score:4, Insightful)
Moore's Law has eroded the need for such knowledge
Moore's "law" (which is just an off-the-cuff observation, really) has nothing to do with this. If anything, Moore's law has enabled transistor and space devouring SIMD technology.
It would be like concerning myself on how to design circuits...
No, it's nothing like that at all. Just because you own and know how to use money doesn't mean there is no point to the complex financial reckonings that are made every day at institutions all over the world. You may not need such knowledge, but what you need is not what's under discussion.
Yes, some people who write games are still concerned with assembly, as are people in embedded markets. But those jobs, situations and skills are niche.
By this definition, everything is niche. The whole computing industry becomes "niche". Farming is "niche". The paper industry is "niche". What you're describing is just nondescript white-collar administrative work which happens to involve a computer; bit shuffling, rather than paper shuffling.
Those situations are about the last place you will find anyone caring about something called "assembly language."
Again, completely irrelevant.
The point is that with a few dozen lines of SIMD code (whether in assembly or some high level language) any reasonably competent programmer can achieve four-fold, ten-fold, even twenty-fold speedups on critical path code, from scratch, in as little as a week.
These are amazing results, and people should be encouraged to investigate the possibilities, not be dragged down into this drab netherworld of yours.
Re:faster? Bogus.... (Score:1, Insightful)
It's been tested - take a program that converts assembler to C and then recompile with optimisation - it *will* run faster.
The only exceptions are where the compiler lacks an algebraic or RTL awareness of an instruction on a specific architecture.
jxxx
Re:Depends on what you are doing (Score:3, Insightful)
Nope. Technically, there are two constants buried in here. The definition is g(x) = O(f(x)) => g(x) <= k*f(x) for x > a, for some arbitrary a. If you don't change algorithms, all you can do is manipulate the k. For any given k and any given level of improvement, I can give you a new k that hits that level of improvement.
Honestly: to be able to get a 1000x boost, your original code must have been beyond bullshit.
Also, his original code may have been "bullshit", but it may not have been. It depends a lot on the algorithm in question. The higher the exponent on a polynomial algorithm, the more sensitive its running time is to an optimization in its inner loop.
And of course using SIMD is better than not using it, but I would rather stay at a "let the compiler vectorize it" level. I mean, doing your inner loop in leet assembler only to NOT know after a long simulation whether the results are real or you just botched some line isn't worth it.
This is a simple matter of economics. There's a cost/benefit trade-off to expending the effort to optimize in assembly. If the compiler generates good code, then obviously the cost/benefit ratio of recoding in assembly is pretty poor. However, without specific knowledge of *HIS* economics, I would suggest that you not spout off.