AltiVec Unwrapped
paradesign writes "O'Reilly is running a nice article on AltiVec in the G4 chip. The article includes examples, with code, showing its effectiveness. For everyone who is uneducated as to exactly what Altivec is, this is a must read."
Good (Score:2, Informative)
Re:Good (Score:3, Funny)
How do they compare to AltiVec in terms of speed, precision, cache in/out, etc.?
Oh! http://www.processor-emporium.co.uk seems to be a good reference site...
Re:Good (Score:2)
Well, actually you are not, but that shouldn't keep you from trying. The 2nd example of using AltiVec is the FP vector multiply-add instruction, a no-show on SSE(2) and 3DNow!. The 3rd example relies on the fact that the x[i] and y[i] vectors stay the same, which they don't on the x86 SIMD extensions. So in those examples we already have some of the differences between AltiVec and the lesser SIMDs; others are more registers (32 vector registers vs. 8 on x86) and better instructions for shuffling data. IOW, again, MHz isn't everything, as shown by e.g. distributed.net RC5 scores.
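To make the multiply-add point concrete, here is a plain C sketch of what a 4-lane single-precision vector multiply-add computes: every lane gets a*b + c in one operation, written to a separate destination so none of the three sources is overwritten. This is a scalar emulation for illustration only, not AltiVec code; the `vec4f` type and `vec4f_madd` name are hypothetical. (On real hardware the AltiVec instruction is vmaddfp, reachable from C as `vec_madd`; SSE/SSE2 needs a separate multiply and add, and each of those two-operand instructions overwrites one of its inputs.)

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector, standing in for one 128-bit register. */
typedef struct { float f[4]; } vec4f;

/* Scalar emulation of a vector multiply-add: d = a * b + c, lane by lane.
   Three sources plus a separate destination, so a, b and c are untouched. */
static vec4f vec4f_madd(vec4f a, vec4f b, vec4f c)
{
    vec4f d;
    for (size_t i = 0; i < 4; i++)
        d.f[i] = a.f[i] * b.f[i] + c.f[i];
    return d;
}
```

Usage: with a = {1, 2, 3, 4}, b = {2, 2, 2, 2} and c = {0.5, 0.5, 0.5, 0.5}, `vec4f_madd(a, b, c)` yields {2.5, 4.5, 6.5, 8.5} and leaves a, b and c as they were.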
Re:Good (Score:2)
Let's review. Implementing AltiVec requires a code rewrite. If your application lends itself to parallel processing, why rely on a single processor that performs 4 operations at a time when you could use 6 processors that are clocked nearly twice as fast, most of the time perform 4 operations in parallel, and are sometimes reduced to two by comparison? Even then, at a given price you are running 6 processors at close to double the clock speed. As long as you are going to have to rewrite your code, you might as well rewrite it for a cluster.
So, in our example, we pit three dual-processor 1533 MHz Athlon XP boxes against one 800 MHz G4. The price point is $1600.
In one corner, you have a single bottom-end Apple G4 tower at 800 MHz:
800MHz PowerPC G4
256K L2 cache
256MB SDRAM memory
40GB Ultra ATA drive
CD-RW drive
ATI Radeon 7500
56K internal modem
In the other corner, we have 3U of dual-processor Athlon goodness:
3 Tyan Tiger motherboards (AMD 760MP chipset) @ $522
6 Athlon XP 1800+ CPUs @ $624 (yes, they work)
3 256MB PC2100 registered ECC DDR RAM modules @ $195
3 1U cases w/ 300W power supplies @ $120
3 40GB hard drives @ $162
Price point is $1623.
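As a quick check, the parts list does add up to the quoted price point, reading each "@" figure as the total for that line rather than a unit price:

```latex
522 + 624 + 195 + 120 + 162 = 1623
```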
Now rewrite your code.
That takes 3 weeks, by which time Apple has raised the price of the G4 another hundred dollars while the price of the cluster has dropped a hundred dollars.
OK, that was a flame; let's stick to the matter at hand.
Referencing this article, the Ars Technica article, and the c't article (you know which one I'm talking about: that place where you dare not look, where you'll find x86 staring back at you), we can draw these conclusions:
The G4 with AltiVec performs equally, clock for clock, with an SSE-equipped x86, with some rare exceptions where it is 100% faster clock for clock.
Best-case scenario for our similarly priced systems, using your best-case benchmark for the G4, RC5:
Single G4 800 MHz: 8,243,188 keys per second
6 AMD Athlon XP 1800+: 32,987,538 keys per second
Same price, and x86 is 4 times as productive.
SETI@home, using the Ars Lambchop benchmark work unit: identical!
3.35 per work unit on both.
x86 is 6 times as productive for the same price.
CINT2000: base 648 - Athlon XP 1800+
CINT2000: base 242 - G4 800 MHz
648 vs. 242... and that is a single-processor comparison!
If the work scales across all six processors, x86 is about 16 times as fast for the same price (6 × 648 / 242 ≈ 16).
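The price/performance arithmetic in this comment can be written out as a small sketch. The figures are copied from the comment (using the 648 CINT2000 figure for the Athlon XP 1800+); the assumption that six CPUs deliver exactly six times one CPU's CINT2000 score is the poster's "if it scales" scenario, not a measurement. The constant and function names are hypothetical.

```c
/* Benchmark figures as quoted in the comment above. */
static const double G4_RC5_KEYS   = 8243188.0;   /* one 800 MHz G4, keys/s      */
static const double X86_RC5_KEYS  = 32987538.0;  /* six Athlon XP 1800+, keys/s */
static const double X86_CINT_BASE = 648.0;       /* one Athlon XP 1800+         */
static const double G4_CINT_BASE  = 242.0;       /* one 800 MHz G4              */
static const int    X86_CPUS      = 6;           /* three dual-CPU boxes        */

/* RC5 throughput of the whole cluster relative to the single G4. */
static double rc5_ratio(void)
{
    return X86_RC5_KEYS / G4_RC5_KEYS;
}

/* Single-CPU CINT2000 ratio multiplied by the CPU count, i.e. assuming
   perfect scaling across all six processors. */
static double cint_ratio_scaled(void)
{
    return (X86_CINT_BASE / G4_CINT_BASE) * X86_CPUS;
}
```

`rc5_ratio()` comes out at about 4.0 and `cint_ratio_scaled()` at about 16.1, matching the "4 times" and "16 times" claims; the two systems are within $23 of each other, so no price normalization is applied.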
If you know of any benchmarks where the Mac compares favorably for the price, please let us all know. You are right, MHz is not everything. But you have to come up with some numbers to back the claim that the G4 is even marginally close in performance to machines with nearly twice the clock speed. I'm sure that will convince us all to run out and buy Macs for number crunching.
Re:Good (Score:2)
Re:Good (Score:2)
Run away! (Score:1, Funny)
We did a simple run of elastic polymer equilibria (for nitrogen, of course) and the RAM sub-bus gave out on us after registering a temperature of 87 farads. So we backed off to a simple Newtonian extrapolation using quadrature-integrated Gaussian kinetics, and while it worked, the results were no more accurate than we could have gotten from DOS 5 on a 386.
In short, unless you are planning to run it above the Antarctic circle, don't buy one.
Re:Run away! (Score:2)
Gladly! (Score:1, Informative)
You will also find faradic temperature measurement in such fields as physical proton bombardment, torque pressurization, and shot put.
Re:Gladly! (Score:2)
Re:Run away! (Score:1)
OpenApple (Score:2)
Re:OpenApple (Score:1, Informative)
Re:OpenApple (Score:1)
GCC 3.1 will have both SSE and AltiVec support in code.
Re:OpenApple (Score:1, Informative)
Re:OpenApple (Score:2)
Marvell makes ATX boards with one or two 7450s.
Motorola makes a very nice ATX board with two 7450s on it. They also have the Sandpoint platform, which you can use with many different PPC chips.
Merlancia seems to have some good stuff.
There are a bunch more too (Tundra, GMS, Force); just do a search on Google. You'll likely find, though, that Apple has the best prices. If you want to play with a PPC (I'm assuming you want to do some low-level stuff for fun or profit), you'll end up spending $1500 on just a board from somewhere else, or $1500 on a complete system from Apple. The Apple systems retain their value for a long time too.
Re:OpenApple (Score:2, Informative)
Google is your friend.
LinuxPPC AltiVec support? (Score:2)
Mostly out of curiosity (as I don't have a G4 on my desk anymore - it died), what does anyone know about the status of AltiVec support under LinuxPPC (as opposed to OSX, as discussed in the article)? A quick Google search indicates that Motorola made some patches for gcc a couple years ago, but that it wasn't exactly production quality.
There's a website (altivec.org) that supposedly has tools, but you have to register for their mailing list to see what they've got (and I get enough mail as it is).
-"Zow"
Re:LinuxPPC AltiVec support? (Score:1, Informative)
If the kernel knows about the registers, it can preserve them during context switches. I'd imagine this trivial kernel mod was done years ago.
As for general programming, you're right about gcc. There isn't much vectorisation in gcc (cf. Intel's compiler, which vectorises for SSE2 on the PIV), so a while ago I (with unrealistic self-confidence, as usual) set about writing a C library of vector, matrix, complex, etc. functions that uses the SIMD features of the K6-2/3, Athlon, PIII, PIV and PPC, and provides a plain C implementation for folks without SIMD. If you want to help, have a look here (on sourceforge.net).
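One common way to structure a library like that is runtime dispatch through function pointers: every routine has a portable plain C version, and SIMD versions are swapped in when the CPU supports them. The sketch below assumes that design; it is not taken from the project linked above, and `vec_add_fn`, `simd_kind` and `select_vec_add` are hypothetical names. Only the plain C backend is implemented here.

```c
#include <stddef.h>

/* Hypothetical signature for one library routine: out[i] = a[i] + b[i]. */
typedef void (*vec_add_fn)(float *out, const float *a, const float *b, size_t n);

/* Portable plain C fallback, usable on any CPU with or without SIMD. */
static void vec_add_c(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Hypothetical feature flags; real code would query CPUID on x86 or ask
   the OS about AltiVec on PowerPC before choosing a backend. */
enum simd_kind { SIMD_NONE, SIMD_3DNOW, SIMD_SSE, SIMD_ALTIVEC };

static vec_add_fn select_vec_add(enum simd_kind kind)
{
    switch (kind) {
    case SIMD_3DNOW:    /* a 3DNow! backend would be returned here   */
    case SIMD_SSE:      /* an SSE backend would be returned here     */
    case SIMD_ALTIVEC:  /* an AltiVec backend would be returned here */
    case SIMD_NONE:
    default:
        return vec_add_c;  /* until those exist, everyone gets plain C */
    }
}
```

The appeal of this layout is that callers always go through the same function pointer, so adding a new SIMD backend later does not change any calling code.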
I've only done 3DNow! and plain C so far, for a small number of functions, but one or two people are already interested.
Re:LinuxPPC AltiVec support? (Score:1)
Important stuff like MPEG2 decoders have supported it for a while, either with hand-written assembly or using output from the Apple compilers.
Re:LinuxPPC AltiVec support? (Score:1)
GCC 3.1 has support for AltiVec extensions in C/C++ code; however, the syntax is a little different from Motorola's AltiVec extensions, which are used in Mac OS. Apple is apparently going to support both the old and new AltiVec syntax in its GCC 3.1-based compiler. This means that AltiVec code written with the new syntax should work unmodified on both Linux and Darwin/Mac OS X.
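For comparison, here is a sketch using GCC's generic vector extension (the `vector_size` attribute), which modern GCC and Clang accept and can lower to AltiVec or SSE instructions when the target has them. This is not the AltiVec-specific syntax discussed in the comment, and no claim is made that this exact form worked in GCC 3.1; `v4sf` and `saxpy4` are hypothetical names, and the sketch requires `n` to be a multiple of 4.

```c
#include <stddef.h>
#include <string.h>

/* Four 32-bit floats in one 16-byte vector (GCC/Clang extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* y[i] += alpha * x[i], four floats per step. n must be a multiple of 4.
   memcpy is used for loads and stores to avoid alignment assumptions. */
static void saxpy4(float alpha, const float *x, float *y, size_t n)
{
    v4sf va = {alpha, alpha, alpha, alpha};
    for (size_t i = 0; i < n; i += 4) {
        v4sf vx, vy;
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;            /* element-wise multiply-add */
        memcpy(y + i, &vy, sizeof vy);
    }
}
```

Because the arithmetic is written with ordinary operators, the same source compiles on both PowerPC and x86; which vector instructions (if any) the compiler emits depends on the target flags.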
Check out the Ars Technica article (Score:2, Informative)
--Paul
implementation-specific coding (Score:1)
thanks...
Re:implementation-specific coding (Score:2)
Each AltiVec register is 128 bits wide.
You can use one as four 32-bit integers, four 32-bit floats, eight 16-bit integers, or sixteen 8-bit integers.
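A C union gives a rough picture of those four views of the same 128 bits. This is only an analogy, assuming a 4-byte `float`: it is not how the hardware or the compiler actually exposes vector registers, and the `reg128` name is hypothetical. Note that which byte a given integer lane lands in depends on endianness (the G4 is big-endian, x86 little-endian).

```c
#include <stdint.h>

/* One 128-bit value viewed four ways, mirroring the element types a single
   AltiVec register can hold. */
typedef union {
    uint32_t u32[4];   /* four 32-bit integers   */
    float    f32[4];   /* four 32-bit floats     */
    uint16_t u16[8];   /* eight 16-bit integers  */
    uint8_t  u8[16];   /* sixteen 8-bit integers */
} reg128;

_Static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
_Static_assert(sizeof(reg128) == 16, "every view covers exactly 128 bits");
```

Writing `0x01020304` into `u32[0]` and reading `u8[0]` gives 0x01 on a big-endian G4 and 0x04 on little-endian x86, which is one reason SIMD code that reinterprets lanes is not automatically portable between the two.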
There is a lot of information on altivec.org
Jeff