Using GPUs For General-Purpose Computing 396
Paul Tinsley writes "After seeing the press releases from both Nvidia and ATI announcing their next generation video card offerings, it got me to thinking about what else could be done with that raw processing power. These new cards weigh in with transistor counts of 220 and 160 million (respectively) with the P4 EE core at a count of 29 million. What could my video card be doing for me while I am not playing the latest 3d games? A quick search brought me to some preliminary work done at the University of Washington with a GeForce4 TI 4600 pitted against a 1.5GHz P4. My favorite excerpt from the paper:
'For a 1500x1500 matrix, the GPU outperforms the CPU by a factor of 3.2.' A PDF of the paper is available here."
Not the Point (Score:-1, Insightful)
As has been said many time before ... (Score:5, Insightful)
178 Million in the P4EE (Score:5, Insightful)
In all of this, keep in mind that there's computing and there's computing... the kind of computing power in a GPU is excellent for doing the same numeric computation to every element of a large vector or matrix, not so much for branchy, decisiony things like walking a binary tree. You wouldn't want to run a database on something structured like a GPU (or an old vector-processing Cray), but something like a simulation of weather or molecular modeling could be perfect for it.
The similarities of a GPU to a vector processing system bring up an interesting possibility...could Fortran see a renaissance for writing shader programs?
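The contrast the poster draws can be made concrete. A minimal Python sketch (illustrative only, not from the paper) of the two workload shapes - the same arithmetic applied to every element versus data-dependent branching at every step:

```python
def saxpy(a, x, y):
    """Same multiply-add applied to every element: ideal for a GPU/vector unit,
    since all elements can be processed in lockstep."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def tree_search(node, key):
    """Data-dependent branch and pointer chase at every step: the kind of
    'branchy' work a CPU handles well and a vector unit does not."""
    while node is not None:
        if key == node["key"]:
            return True
        node = node["left"] if key < node["key"] else node["right"]
    return False
```

The first function is trivially parallel; the second serializes on each comparison, which is why a database walk gains nothing from vector hardware.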
Re:Not the Point (Score:4, Insightful)
What's relevant is that the processor on a graphics card is, underneath its dedicated purpose, simply a bunch of logic. There's no dedicated "this must be used for pixels only, all else is waste" circuitry inherent in the system. There are MANY purposes for which the same/similar logic that generates 3D imagery can be used, and that seems to be the point of this paper: run THOSE types of operations on the GPU. Some things it won't be able to do well, no doubt - but those it can, it can do extremely well.
This is BIG (Score:5, Insightful)
Don't miss the point that this is not intended for general purpose computing. Don't port OoO to the graphics chip.
Where it is huge is in signal processing. FPGAs have begun replacing even the G4s in this area recently because of the huge gains in speed vs. power consumption an FPGA affords. However, FPGAs are not bought and used as is, and end up costing a significant amount (of development time/money) to become useful. Being able to use these commodity GPUs for vector processing creates a very desirable price/processing power/power consumption option. If I were nVIDIA or ATI, I would be shoveling these guys money to continue their work.
When... (Score:3, Insightful)
Not the Point-headbanger. (Score:1, Insightful)
Maybe time for a new generation of math-processor? (Score:4, Insightful)
Maybe it's time to start making co-processing add-on cards for advanced operations such as matrix mults and other operations that can be done in parallel at a low level. Add to that a couple of hundred megs of RAM and you have a neat little helper when raytracing etc. You could easily emulate the cards if you didn't have them (or didn't need them). The branchy nature of the program itself would not affect the performance of the co-processor, since it would only be used for calculations.
I for one would like to see this.
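The emulation idea above can be sketched in a few lines of Python. Everything here is hypothetical (the `have_accelerator` flag and the dispatch point stand in for a driver query and a card API that don't exist); the point is that callers target one function and never care whether the card is present:

```python
have_accelerator = False  # in practice this would come from a driver query

def matmul_cpu(a, b):
    """Plain CPU emulation of the co-processor's matrix multiply."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def matmul(a, b):
    """Single entry point: dispatch to the add-on card if present,
    otherwise fall back to software emulation."""
    if have_accelerator:
        raise NotImplementedError("would hand the job to the add-on card here")
    return matmul_cpu(a, b)
```

A program written against `matmul` would transparently speed up the day the card shows up, which is exactly the appeal of the co-processor model.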
Re:Not the Point-headbanger. (Score:4, Insightful)
On those operating systems that require them, that could very well be.
It still makes a nice thought that a Linux box without even X installed, but with a kickass graphics card, could crunch away doing something 4 times quicker than any windowed machine.
Bass Ackwards? (Score:5, Insightful)
While it's true that general-purpose hardware will never perform as well or as efficiently as a design targeted specifically to the task (or at least it had better not), it is equally true that general-purpose/commodity hardware eventually reaches a price/performance point where it is more than "good enough" for the majority.
Re:178 Million in the P4EE (Score:5, Insightful)
IMHO, the perfect friend is someone who is interested in maximum performance, knows how to program, and knows something about computer hardware.
Have you looked at fortran 90, 95 or 2000?
Violation of Compartmentalization (Score:2, Insightful)
Re:Wow (Score:5, Insightful)
Re:Maybe that's the answer... (Score:3, Insightful)
In any case, why do you believe all of Apple's conveniently high numbers, but you don't believe Spec numbers reported by Dell, AMD, etc.? These are not numbers pulled out of a hat; they are standard Spec results. Thus, the numbers should be comparable from company to company. But Apple retested other companies' products and released new numbers without properly optimizing for the x86. Why is it when Microsoft pays for benchmarks, people freak out, but when Apple PERFORMS benchmarks, people believe them instantly?
There are plenty of other links out there that provide similar information. It is patently false advertising for Apple to claim that they use the fastest chip of any PC.
Oh, and re: the Linux issue, you're right. But you'll find that the x86 is faster in Linux with a proper optimizing compiler.
My issue is basically that at best -- at best! -- the results are inconclusive. At worst, Apple blatantly lied. It's foolish to believe Apple blindly just because they're the underdogs and produce a pretty, Unix-based OS. And it's foolish to hold this strange hatred for all that is x86. I don't understand this mentality.
AGP read latency not important when not real time. (Score:3, Insightful)
Re:Unused computing Power? (Score:5, Insightful)
a) Not equal. Apples and oranges. A GPU will do repeated calculations very, very fast, like matrix transforms and the like. A CPU, on the other hand, will make decisions based on input, rather than just crunching numbers.
b) The main display (the GUI) already uses many tricks on the graphics card. The hard part is making sure that all graphics cards support the features. Things like the xrender extension are becoming more common as graphics cards and drivers get "standard" capabilities.
c) Your imagination is the limit as to what it could be used for. Just realize that it's a good data processing unit, not a good program execution unit. Use each for their strengths.
d) Modified? With new cards/drivers, all it takes is OpenGL calls to start taking advantage of this power. All it really takes is someone who knows what they're doing and has a bit of inspiration.
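The points above can be tied together with a sketch of the GPGPU idiom they imply: put your data in a "texture" (a 2D grid), run the same small kernel over every texel the way a fragment shader would, and read back the result. This Python model is illustrative only - a real implementation would go through OpenGL with render-to-texture, not a list comprehension:

```python
def run_kernel(texture, kernel):
    """Apply `kernel` independently to every texel, modeling one
    fragment-shader pass over a data texture."""
    return [[kernel(v) for v in row] for row in texture]

# "Upload" data as a 2x2 texture and run a squaring pass over it.
data = [[0.0, 1.0], [2.0, 3.0]]
squared = run_kernel(data, lambda v: v * v)
```

Because the kernel sees one texel at a time with no shared state, every texel can be computed in parallel - which is the whole reason the GPU is fast at this.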
I think I speak for many of us (Score:5, Insightful)
Sorry for the flames, but seriously, I get so damn sick of all the "all new games suck" whiners. Look, there are legit reasons to want new technology. It is nice to have better graphics, more realistic sound, etc. It is NICE to have game that looks and sounds more like reality. Yes, that doesn't make the game great, but that doesn't mean it's worthless.
What's more, don't pretend like all modern games suck while old games ruled. That's a bunch of bullshit. Sure, there are plenty of modern games that suck, but guess what? There are tons of old games that suck too. Thing is, you just tend to forget about them. You remember the greats that you enjoyed or heard about, the ones that helped shape gaming today. You forget all the utter shit that was released, just as is released today.
So get off it. If you don't like nice graphics, fine. Stick with old games, no one is forcing you to upgrade. But don't pretend like there is no reason to want better graphics in games.
Re:What comes next. (Score:3, Insightful)
Strange, because it is a big problem for using GPUs as coprocessors: scientific computation usually uses 64-bit floats, or even 80-bit floats on Intel!
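The precision gap is easy to demonstrate. This sketch rounds a 64-bit Python float through 32-bit storage (as a GPU of that era would hold it) using the standard `struct` module, and shows an increment the CPU keeps but the GPU would silently lose:

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float by
    packing and unpacking it as a single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

x = 1.0 + 1e-8           # representable at 64-bit precision
assert x != 1.0          # the CPU keeps the tiny increment
assert to_float32(x) == 1.0  # at 32 bits the increment vanishes entirely
```

For long iterative computations these rounding losses accumulate, which is why single-precision GPU hardware was a real obstacle for scientific use.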
Re:178 Million in the P4EE (Score:4, Insightful)
How do you know? In fact, modern GPUs require a large number of small, scattered memory blocks: texture caches, FIFOs for fragments/pixels/texels when they are not in sync, caches for vertex shader and pixel shader programs, etc.
More recent GPUs are notorious for their incredibly long latencies, and long latencies imply that a lot of data has to be stored on chip.
Re:Maybe time for a new generation of math-process (Score:5, Insightful)
There is a certain overhead, because a communications protocol must be established between the main processor and the co-processor. For simple tasks the main processor often stops, waits for the co-processor to complete the task, and retrieves the results. For more complicated tasks, the main processor continues, but later an interrupt occurs that the main processor must service.
You must be very careful, or the extra overhead of this communication makes the task slower than it would be without the co-processor. This is certainly going to happen at some point if central processor power keeps increasing while you keep using the same co-processor.
For example, your matrix co-processor needs to be fed the matrix data, told to start working, and polled or signalled when it is finished. Your performance would be limited not only by the co-processor's speed, but also by the bus transfer rate, by the impact those fast bus transfers have on the available CPU-memory bandwidth, and by the on-CPU cache invalidation they cause.
If you are unlucky, the next CPU you buy is faster at performing the task itself.
Re:I think I speak for many of us (Score:5, Insightful)
There's something that's always puzzled me a little about this site - attached to every single article about some new piece of PC tech - a faster processor, better graphics card, etc - there are a number of comments bemoaning the advance. All of them saying that people don't need the power/speed they have already, that they personally are just fine with 4 year old hardware, or, in this case, that better graphics don't make for better games. Hell, the same is true for mobile phones - I've lost count of the number of comments bemoaning advances in them, too.
It's funny, but I thought this was supposed to be a site for geeks; aren't geeks supposed to *like* newer, better toys?
To get back on topic - no, better graphics are not sufficient for a better game. However, if the gameplay is there, then they can certainly make the experience more enjoyable. Would Quake have been as much fun if it was rendered in wireframes?
Better graphics help add to the sense of realism, making the game a more immersive experience. The whole point of the majority of games is entertainment and (to an extent) escapism. Additionally, what a lot of people like the grandparent poster seem to forget is that most of the big-name game engines are licensed for use in a number of games. Let people like id spend their time and money coming up with the most graphically intensive, realistic engine they can. Think Doom 3'll suck because the gameplay will be crap? Fine, then wait for someone to license the engine and create a better game with it. In the meantime, please shut up and remember that there are those of us who like things to be pretty, as well as useful/well made/fun/(good at $primaryPurpose).
Good graphics on their own won't make a good game, but they will help make a good game great.
Re:Violation of Compartmentalization (Score:4, Insightful)
No, having a CPU that does everything is what violates the tenet.
I don't know about you, but I don't have a chip that does my video processing for me, I don't have a chip that does all the encryption for me, and I don't have a chip that handles (en/de)capsulating network traffic, as well as handling interrupts and routing.
Having a second processor that does some specialized work that a CPU isn't good at is an improvement, not a nightmare. I'd love to be able to plug in a chip or two into my PC and have them do better-than-realtime MPEG-4 encoding that doesn't affect my processor at all... Who wouldn't?
Re:Maybe time for a new generation of math-process (Score:2, Insightful)
It would be much more efficient to implement the co-processor with an FPGA: first program the FPGA with the functions to execute, then feed the data to it; when the calculation is complete, you just reprogram it to become whatever you want.
This way you would have not a math-only board, but a board that could perform many, many functions. You just need to write algorithms to exploit them.
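The program/run/reprogram cycle described above can be modeled in a few lines. The `Fpga` class here is entirely hypothetical - a Python function stands in for a bitstream - but it captures the workflow of one board serving many jobs:

```python
class Fpga:
    """Toy model of a reconfigurable co-processor: load a 'bitstream'
    (here, just a function), stream data through it, reload at will."""

    def __init__(self):
        self._bitstream = None

    def program(self, fn):
        self._bitstream = fn  # stands in for loading a real bitstream

    def run(self, data):
        return [self._bitstream(x) for x in data]

board = Fpga()
board.program(lambda x: x * x)   # configured as a squarer
squares = board.run([1, 2, 3])
board.program(lambda x: x + 1)   # reconfigured as an incrementer
incremented = board.run([1, 2, 3])
```

In real hardware, of course, reprogramming takes milliseconds and bitstream development is the expensive part - which is the cost the parent post pointed out.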
Re:Link to previous discussion on same/similar sub (Score:4, Insightful)
Microsoft can afford to be lazy with their products; they make money either way. I don't think that will last forever, though. Sometimes they do try hard - NT, for example - but then they pile a bunch of poorly designed stuff on top of it, and that ruins it. If you can, check out OS X's directory structure; it's beautiful. Now compare that to Windows' cryptic system...
"Microsoft, as usual, announced the feature after Apple shipped it"
"God I'm tired of hearing that phrase over and over again when 95% of the time it's just because Apple can control the hardware and it would be a total disaster if MS included a technology as fast as they do..."
Re:Audio DSP (Score:4, Insightful)
No, it's just the way that the OpenGL and DirectX APIs evolved. There was never any need in the past for substantial data feedback; the only need back then was to read pixel maps and selection tags for determining when an object had been picked.
All very impressive, but.... (Score:4, Insightful)
However I do know that a lot of people had been wondering about this for a while, could it be done, and was it worth attempting, so now we know. Maybe we shall soon see PCI cards containing an array of GPUs, I imagine the cooling arrangements will be quite interesting!
There are other things which are faster than a typical CPU - aren't some of the processors in games machines 128-bit? Again, you could in theory put some of these together as a co-processor of some sort.
This was a good piece of work technically, but it says something about society that the fastest mass-produced processors, whether for GPUs or games consoles, exist because people want a higher frame rate in Quake. I can't think of any professional application that needs really fast graphics output, but many that could use faster processing. So why can't Intel and AMD stop putting everything in the one CPU (multiple CPUs with one memory are not really much better), and make co-processors again, which will do fast matrix operations on very large arrays, etc, for those who need them? The ultimate horror of the one CPU philosophy was the winmodem and winprinter, both ridiculous. Silicon is in fact quite cheap, as Nvidia have proved, people's time while they wait for long calculations to finish is not.
Maybe we are going to see an architectural change coming, I expect it will be supported by FOSS long before Longhorn, just like the AMD64.
Ever heard of PCI Express? (Score:2, Insightful)
Re:Commodore 64 (Score:3, Insightful)
Re:Altivec (Score:1, Insightful)
Now, compared to GPUs, I think SIMD instructions suck. Why do you think 3D games use GPUs rather than Altivec or SSE2? In general, you can't compare the performance of one part of a general-purpose chip to a chip specifically designed and tuned for the highest performance, free of the trade-offs the general-purpose chip has to make.
Re:The day is saved (Score:5, Insightful)