BrookGPU: General Purpose Programming on GPUs
An anonymous reader writes "
BrookGPU is a compiler and runtime system that provides an easy, C-like programming environment (read: no GPU programming experience needed) for today's GPUs. A shader program running on the NVIDIA GeForce FX 5900 Ultra achieves over 20 GFLOPS, roughly equivalent to a 10 GHz Pentium 4. Combine this with the increased memory bandwidth, 25.3 GB/sec peak compared to the Pentium 4's 5.96 GB/sec peak, and you've got a seriously fast compute engine, but programming GPUs has been a real pain. BrookGPU adds simple data-parallel language extensions to C that allow programmers to specify certain parts of their code to run on the GPU. The compiler and runtime take care of the rest. Here is the Project Page and Sourceforge page."
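To make the "data parallel" idea in the summary concrete, here is a minimal sketch of the stream-and-kernel model in plain Python. It is not Brook syntax and it never touches a GPU; `run_kernel` and `saxpy` are hypothetical names chosen for illustration. The only point is that a kernel is a function applied independently to each element of its input streams, which is what would let hardware process many elements at once.

```python
# Illustrative sketch only: emulates the stream/kernel model on the CPU.
# `run_kernel` and `saxpy` are hypothetical names, not part of BrookGPU.

def run_kernel(kernel, *streams):
    """Apply `kernel` to corresponding elements of equal-length streams.

    Each output element depends only on the inputs at the same index,
    so every element could in principle be computed in parallel.
    """
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all input streams must have the same length")
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy(alpha):
    """Build a per-element kernel computing alpha * x + y."""
    def kernel(x, y):
        return alpha * x + y
    return kernel


xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]
result = run_kernel(saxpy(2.0), xs, ys)
print(result)
```

Because the kernel never reads a neighbouring output or shared mutable state, the loop order does not matter, and that independence is the property a data-parallel compiler can exploit.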
High Performance for General Purpose? (Score:4, Interesting)
Re:High Performance for General Purpose? (Score:4, Informative)
A real-world example - ray tracing (Score:4, Informative)
http://www.theregister.co.uk/content/54/25312.htm
http://online.cs.nps.navy.mil/DistanceEducation/o
Re:High Performance for General Purpose? (Score:5, Insightful)
Re:High Performance for General Purpose? (Score:2)
Re:High Performance for General Purpose? (Score:4, Interesting)
HP for GP?-AGP Bottleneck. (Score:2, Interesting)
Re:HP for GP?-AGP Bottleneck. (Score:5, Insightful)
That said, if you can fit your data sets and your program on to the video memory (128MB isn't uncommon on high-end), and you're doing lengthy calculations on these sets while being only interested in the results (again, not uncommon in HPC), then the relative slowness of reading these results back becomes a nonissue.
Does that help?
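The trade-off described above can be put into a back-of-the-envelope model. This is only a sketch: the function name `offload_time` is hypothetical and every number below is an assumed placeholder, not a measurement of AGP or of any real card.

```python
# Rough cost model; all figures are assumed placeholders, not measurements.

def offload_time(bytes_in, bytes_out, upload_bps, readback_bps, gpu_compute_s):
    """Seconds to upload inputs, compute on the GPU, and read results back."""
    return bytes_in / upload_bps + gpu_compute_s + bytes_out / readback_bps


# Assumed bus speeds: fast upload, much slower readback.
UPLOAD_BPS = 1e9
READBACK_BPS = 100e6

# Long calculation on data that fits in video memory, small result read back.
batch = offload_time(100e6, 1e6, UPLOAD_BPS, READBACK_BPS, gpu_compute_s=2.0)

# Streaming job where as much data comes back as went in.
stream = offload_time(100e6, 100e6, UPLOAD_BPS, READBACK_BPS, gpu_compute_s=0.2)

print(f"batch job:     {batch:.2f} s")
print(f"streaming job: {stream:.2f} s")
```

With these made-up figures the batch job is dominated by compute time, while the streaming job spends most of its time on readback, which is the case where offloading is least likely to pay off.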
Re:HP for GP?-AGP Bottleneck. (Score:2, Insightful)
It will be replaced by PCI-Express, which as a general purpose bus supposedly won't have these issues.
Re:HP for GP?-AGP Bottleneck. (Score:3, Informative)
Re:HP for GP?-AGP Bottleneck. (Score:3, Insightful)
Re:High Performance for General Purpose? (Score:5, Interesting)
Re:High Performance for General Purpose? (Score:3, Informative)
Re:High Performance for General Purpose? (Score:3, Informative)
Those are 4-component (RGBA) types, with 32, 16, and 24 bits per component, respectively.
None of them are enough for double floats, and none of them are good enough for 80-bit reals that x87 uses.
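The gap between 32-bit and double-precision floats can be seen without any GPU at all. The sketch below uses Python's standard `struct` module to round a double to an IEEE 32-bit float and back; `to_float32` is a helper name introduced here for illustration, and it says nothing about how any particular card rounds.

```python
import struct


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 2**24 + 1 needs 25 significant bits; a 32-bit float carries only 24.
big = float(2**24 + 1)
print(big, "->", to_float32(big))

# 0.1 is stored with fewer correct digits in 32 bits than in 64.
print(to_float32(0.1) == 0.1)
```

Values that are exactly representable in 32 bits, such as 0.5, survive the round trip unchanged; everything else picks up a rounding error that a double would not have.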
Re:High Performance for General Purpose? (Score:3, Insightful)
It depends. If you have a gaming card, it will sacrifice precision for speed to hit its price. If you're rendering 100 fps in a game and in a couple of noncontiguous frames the walls don't quite line up, no big deal. But on a professional CAD card, speed is sacrificed for precision: the risk of an engineer making a mistake, or failing to spot one in an assembly alignment, because of a rendering artefact is too high.
In practice, a CAD card is just as fast as a gam
Re:HP for GP?-Fakeout. (Score:4, Informative)
Well, ISV certification - a CAD vendor will assert "with this card, our software produces no rendering artifacts".
Re:High Performance for General Purpose? (Score:4, Interesting)
I especially like the idea that the GPU and CPU can work together on the task. If the GPU was handling neuron tasks and the CPU was handling other necessary tasks we could get a very big boost to desktop AI
TW
Re:High Performance for General Purpose? (Score:4, Insightful)
Of course NN can be used for "graphics-related things", such as image recognition, but not only image, for example voice recognition. And not only recognition, for example forecasting on huge sequences with explicit and implicit (hidden) side-factors.
Stock market trader on GPU, anyone?
Cool, but (Score:3, Interesting)
I mean, you probably just can't run any kind of algorithm on there can you?
Good point. (Score:5, Insightful)
Makes sense of course, as that is what a GPU is all about. (Yes, I'm vastly over-simplifying here.) So I would gather that it might be used for types of data that are streamed a lot? Maybe used for video editing, real-time video, etc., where you're trying to deal with a lot of data at once that you're trying to move around, and not just store it or perform more complicated types of functions on it.
However, I'm no 3D programmer and I sure would love a more detailed analysis of the potential for this.
Re:Cool, but (Score:4, Informative)
Probably. I should imagine it has local storage with the corresponding fetch and store instructions, basic math, and the ability to jump to arbitrary points in the shader program, which makes it very much Turing complete. Everything else is a matter of a compiler backend. Bus latency would be an issue, so it'd be painful for programs that need a lot of I/O, but that's not an issue for a lot of programs.
GPU opcodes (Score:4, Informative)
Basically like having two processors... (Score:4, Interesting)
Re:Basically like having two processors... (Score:2, Offtopic)
Re:Basically like having two processors... (Score:3, Interesting)
I also seem to recall certain music pieces that could play extra parts by blanking the screen. There was also a really cool 9 second sample of 'You really got me' - the Van Halen version - and it blanked the screen to play it.
Wow! Them were the salad days!
Re:Basically like having two processors... (Score:2)
I coded a basic multi-tasker that would allow different threads of a basic program to be run at the same time for the Commodore. It got confusing if you tried to modify a variable in more than 1 spot. It was more fun to play with than really practical.
Re:Basically like having two processors... (Score:4, Interesting)
Cool ... (Score:5, Interesting)
Oh, suddenly, that 'game investment' also gives you a few 100 extra voices of polyphony?
Sweet
Re:Cool ... (Score:2, Informative)
isn't it much more interesting to do things that were not possible before, than to just do the same thing, but in increased quantity? Also, convolution is the single most universal operation in audio DSP (FIR filters, reverb); one well-built plugin would suffice for everything. synth development creativity would certainly suffer from the increased development costs.
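For readers who have not met convolution in the audio sense, here is a minimal CPU-only sketch of a direct-form FIR filter. The name `fir_filter` and the tap values are illustrative assumptions; a real plugin would use a much longer impulse response and a faster algorithm.

```python
def fir_filter(signal, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * signal[n - k].

    Samples before the start of the signal are treated as zero.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


# Feeding in a unit impulse returns the taps themselves (the impulse response).
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
taps = [0.5, 0.25, 0.25]
response = fir_filter(impulse, taps)
print(response)
```

Each output sample is an independent sum of products, which is why this kind of workload is a plausible candidate for data-parallel hardware.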
Re:Cool ... (Score:2, Interesting)
To me it doesn't just mean Virtual Analog, or subtractive... it can be anything that makes noise
It's all good. Let's see what the GPUs can do
first link is incorrect (Score:5, Informative)
Like the good old days (Score:5, Funny)
I'm sure a lot of old farts will tell me how they used some serial controller to compute stuff back in the 60's and that I'm just a little kid.
Re:Like the good old days (Score:3, Informative)
Re:Like the good old days (Score:3, Informative)
Reminds me of the good old days when you used the processors in the C64 tapedrive to compute stuff. Wouldn't want to waste those precious cycles.
Actually it was the old 1540/1541 and later 1571/1581 disk drives. The tape drive did not have a processor in it.
wait a minute (Score:5, Interesting)
wait, if there is a technology that allows construction of a GPU that is 3 times faster than the fastest CPUs, why don't Intel and AMD use this technology to build those 3 times faster CPUs?
are you sure that you can compare the speed of GPU and CPU?
Re:wait a minute (Score:2, Informative)
A shader program
The GPU is designed for CG, not for 'general purpose computing'.
I guess the instruction set is pretty limited too.
Re:wait a minute (Score:2, Informative)
It does not mean you can use the GPU as a general purpose processor effectively, or that it is even Turing complete.
All it means is that certain types of programs could possibly run 3 times faster if ported to this system.
Re:wait a minute (Score:2, Informative)
A GPU on the other hand can do only so much. But its strength lies in areas where the CPU lags. Fast memory interfacing, extreme parallelization etc.
Now there exist computing problems that can be solved very efficiently on the GPU, even with its limited instruction set. This is what this project is all about: to provide a generic programming language that compiles to a vertex/pixel shader.
Re:wait a minute (Score:5, Informative)
It may be ok to compare the speed of a GPU and a CPU if they are in fact different. If a GPU was a CPU made with cheaper material, yeah, it would be unfair. But as life goes, they both have their merits.. so why not? A GPU is prolly best at some matrix math transforms.. or not.
Re:wait a minute (Score:5, Insightful)
Also, good point about comparing GHz to GHz - AMD CPUs do more per cycle than Intel, but are also clocked much lower. You could look at a subset of instructions (e.g. FLoating-point OPerations per Second (FLOPS)) but this only gives you a piece of the overall performance picture.
Without having read the article, my guess is they extrapolated (educated, math-based guess) how fast a 10GHz P4 would perform and compared the results that way.
I'd LOVE to see this tech built into a SETI or Folding@Home client (steroids version). (Imagine the kids - "Mom, I need the Radeon 9800XT to find a cure for Grandma's cancer!")
remember those 3dfx tv ad's... (Score:2, Funny)
Re:wait a minute (Score:3, Interesting)
Re:wait a minute (Score:5, Informative)
Re:wait a minute (Score:4, Interesting)
But you forget the 256MB (at least) RAM on a steaming fast interface that you get with the GeForce... It makes the P4s' cache look pretty paltry in size by comparison.
MP
Re:wait a minute (Score:5, Informative)
are you sure that you can compare the speed of GPU and CPU?
Well, yes and no. In the same way, you can take a render farm and say that "this provides the equivalent of a 100GHz Pentium", which might be true for that specific task. You see it already between CPUs: compare Pentium, Xeon, Athlon XP and Athlon 64. Do you get one benchmark result, "X is 3% faster than Y"? No. Faster at some, slower at others. For a specific benchmark, the difference can be pretty big already among "general" processors.
A specialized processor like a GPU will show much greater variation. It might really shine on some, really suck on others. Which is why it's no good using a GPU as a CPU. Those numbers tell you that it can be much faster than the fastest CPU around. Or better yet, if you can make it run in parallel with the normal CPU, it can give you a total performance which may theoretically be about 13GHz (10 + 3), where 3 of those can be general-purpose operations. Or it may be a task the GPU runs like a dog, and isn't even worth the overhead.
Kjella
Re:wait a minute (Score:5, Interesting)
Professor Pat Hanrahan [stanford.edu], of Stanford University, made a stab at answering this question in his presentation 'Why is Graphics Hardware so Fast? [stanford.edu]'. The first half of the presentation focuses on this question, while the second half covers programming languages that utilize this hardware. Specifically, the Stanford Real-Time Shading Language (RTSL) and Brook are discussed. Overall, it's a good presentation that should get you up to speed with the basics of what's happening in this area of research.
Re:wait a minute (Score:3, Funny)
PowerPoint-like presentation... going dumb... noooooo...
Re:wait a minute (Score:3, Funny)
DSPs = linear equation processors (Score:3, Interesting)
AT&T DSP32 Cluster Supercomputer in late 80s (Score:3, Interesting)
A typi
Re:AT&T DSP32 Cluster Supercomputer in late 80 (Score:3, Funny)
Re:AT&T DSP32 Cluster Supercomputer in late 80 (Score:4, Interesting)
Your music application sounds like fun. I didn't know anybody was still doing anything quite like that by 1990 - there was a whole range of people around John Cage's time who did lots of prepared piano stuff.
Some of the people who were trying to sell our multi-processor supercomputer flavor came up with a music studio application, doing lots of audio processing and mixing, sort of like your device turned inside out. Don't know if they sold more than one of them before the Lucent spinoff took them away.
How does this look? (Score:5, Interesting)
Re:How does this look? (Score:5, Informative)
The deaf leading the blind... (Score:5, Informative)
I would assume that this program simply never calls the drawing function, but instead gets the results back from the GPU. The normal screen should be able to run in the meanwhile (I assume you can e.g. build a 3D environment while showing a 2D cutscreen), so I would think you can have a plain GUI, as long as it doesn't need to use anything advanced.
Kjella
I am not an EE, but... (Score:5, Interesting)
Doing string searches, complex logic analyses, etc. would probably suck, but big data manipulations, such as SETI-style wave transformations, molecular analysis, etc., might be able to take advantage of them.
A good example of how an OS should be programmed. (Score:2, Insightful)
Fast Fourier Transform (Score:4, Interesting)
A lot of scientific code is constrained by how fast you can do an FFT, perhaps of arbitrary size. And a fast graphics card is a lot cheaper than a high-end processor.
For embarrassingly parallel vector problems, this is just the sort of thing for cheap, powerful clusters based around a cheap PC and a fast GPU.
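As a reminder of what the FFT workload looks like, here is a minimal recursive radix-2 Cooley-Tukey sketch in plain Python. It only handles power-of-two lengths (not the arbitrary sizes mentioned above), it runs on the CPU, and it is not the GPU formulation from any paper; `fft` is just an illustrative name.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half, then one butterfly.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


spectrum = fft([1.0, 1.0, 1.0, 1.0])
print([round(abs(v), 6) for v in spectrum])
```

Within each stage the butterflies are independent of one another, which is the property that makes the transform attractive for parallel hardware.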
Re:Fast Fourier Transform (Score:5, Interesting)
Re:Fast Fourier Transform (Score:5, Funny)
Multiply power by N.
You work for Nvidia, don't you?
Re:Fast Fourier Transform (Score:5, Informative)
The FFT on a GPU
This page contains supplemental material for the following paper.
Moreland, K and Angel, E. "The FFT on a GPU." In SIGGRAPH/Eurographics Workshop on Graphics Hardware 2003 Proceedings, pp. 112-119, July 2003.
Re:Fast Fourier Transform (Score:2)
Site is slow (Score:2, Informative)
Homepage of GPGPU research (Score:5, Informative)
www.gpgpu.org [gpgpu.org]
Very cool. Vector/Graphics processors could one day overtake General processors. They are way more energy efficient too.
Drawing text with GPU shader units? (Score:4, Interesting)
1) Each character would have its own shader program.
2) You would set the shader program, draw a rectangle, and the character would appear.
3) The shader programs would be automatically generated by processing TrueType files.
To implement:
1) Break Truetype outline up into a number of convex curve segments.
2) Each of these curve segments would be represented as a set of constants in the shader program
3) For each pixel, test a line from pixel to an edge.
4) If the number of segments crossed is odd the pixel is black else white.
The algorithm can be refined to add antialiasing and hinting.
What you end up with is text that is clear at any resolution. The size of the text is controlled by the rectangle you draw it in. The text can also be clearly rotated and sheared.
An obvious optimization is to get the GPU vendors to add a shader instruction to do the calculation for which side of the bezier curve segment the current point lies.
While not important for games drawing text is critical for desktops. And we all know about the current trends to draw desktops with 3D hardware.
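The per-pixel inside/outside test in the implementation steps above (count the outline crossings; odd means filled) can be sketched on the CPU. This version simplifies the idea by using straight polygon edges instead of Bezier curve segments and does no antialiasing or hinting; `inside_even_odd`, `rasterize`, and the L-shaped outline are all made up for illustration.

```python
def inside_even_odd(px, py, outline):
    """Cast a ray from (px, py) toward +x and count edge crossings."""
    crossings = 0
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        # Does this edge straddle the horizontal line through the sample?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge meets that horizontal line.
            x_hit = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_hit > px:
                crossings += 1
    return crossings % 2 == 1  # odd number of crossings: pixel is filled


def rasterize(outline, width, height):
    """Sample each pixel centre; return text rows, top row first."""
    rows = []
    for y in range(height - 1, -1, -1):
        rows.append("".join(
            "#" if inside_even_odd(x + 0.5, y + 0.5, outline) else "."
            for x in range(width)))
    return rows


# Hypothetical outline: a crude letter "L" in a 4x5 cell.
L_OUTLINE = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 5), (0, 5)]
rows = rasterize(L_OUTLINE, 4, 5)
print("\n".join(rows))
```

Every pixel runs the same small loop over a fixed set of constants, which is the shape of computation a per-glyph shader program would need; scaling the outline coordinates changes the text size without touching the test.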
Re:Drawing text with GPU shader units? (Score:2)
Re:Drawing text with GPU shader units? (Score:3, Interesting)
The app thinks it is drawing into a flat rectangle. But the compositing engine distorts the font bitmap with its transform. With the shader approach the distortion doesn't happen. The same problem happens when the compositing engine does scaling.
You only need one shader program per glyph no matter what point size you want to draw. There is a lot of overhe
Re:Drawing text with GPU shader units? (Score:2)
Re:Drawing text with GPU shader units? (Score:2)
Re:Drawing text with GPU shader units? (Score:2)
The plus is that it can be used to produce fairly nice antialiased text that intermixes well with other primitives, and rendering is very fast.
The minus is that a single set of geometric primitives for a character won't work for all point sizes if you need to use hinting. (Whether this is important or not depends on your application -- especially whether you need very small text, or have so little graphics memory available that y
Re:Drawing text with GPU shader units? (Score:2)
Re:Drawing text with GPU shader units? (Score:3, Insightful)
Re:Drawing text with GPU shader units? (Score:2)
Re:Drawing text with GPU shader units? (Score:2)
( http://freetype.sourceforge.net/patents.html )
Re:Drawing text with GPU shader units? (Score:2)
Re:Drawing text with GPU shader units? (Score:2)
Brook (Score:5, Insightful)
This looks like a straightforward and clean extension that experienced C/C++ programmers won't find difficult to learn, but it isn't entirely clear to me whether just using this language, without any knowledge of GPU architecture, will lead to big improvements in performance. Granted, you don't need to know the details, but you've got to have an idea of what it is that you're trying to do and in a general way how the special constructs of the language allow you to do that. As with other such language extensions, you can nominally write in the language but not really use the extensions (how many "C++" programs have you seen that were really C programs with // comments and a few couts?) or use them in unintended ways that prevent the intended optimization. It seems to me that if the project really is aiming at programmers who are not familiar with GPUs, they need at least to provide a brief introduction to the special properties of GPU architecture and some guidelines as to how to use the features of the language to take advantage of them. At present I don't find this either on the web sites or in the distribution.
Excellent! (Score:3, Interesting)
2003-04-20 01:51:36 Using video processing as "attached processor" (askslashdot,hardware) (rejected)
But as you can see it was rejected. I was particularly interested in the use of the GPU for cryptographic functions (e.g., with a loopback encrypted filesystem), to offload the processing from the main CPU. Is anyone aware of any work in this area?
Is this even a viable implementation, or would the overhead of continually dispatching work to the GPU exceed the benefit derived?
Re:Excellent! (Score:2)
I'd say it won't work. The AGP bus is slow at pushing data out.
Re:Excellent! (Score:4, Informative)
With encryption you are usually looking at processing streams of data. If your encryption method involves a lot of floating point math (almost never) on every bit of information, then it would be nice. But encryption is almost always integer based (GPUs don't shine in integer like they do in floating point), and involves just as much data going in as coming back.
If you are looking for a great (co) processor for integers, look at the Altivec section of the G4 (and the similar one in the G5.. I forget the IBM name).
Research (Score:5, Insightful)
Re:Research (Score:5, Interesting)
Re: (Score:2)
I've always wondered when this would happen... (Score:3, Interesting)
The last few SIGGRAPHs had numerous approaches to detecting collisions between complex volumes in real time using only the GPU. With some minor tweaking, graphics manufacturers can make this 100x more efficient and easier to implement.
With the 'shader' languages now able to create and modify meshes procedurally, this is the best place to detect collisions (sending the mesh data back to your motherboard so that your local CPU can figure out what collided is not efficient).
Re:I've always wondered when this would happen... (Score:4, Interesting)
It's been done. The Havok [havok.com] game physics system is available for the Playstation 2, and the physics is running in the vector processors, where most of the PS2's compute power resides.
Collision detection isn't that CPU-intensive. (This may surprise people not familiar with the field. But it's true. If collision detection is using substantial CPU time, you're doing it wrong.) Correct collision resolution is where the time goes.
Physics code works better with double-precision FPUs. You need both dynamic range and long mantissas to do it well. Some of the game consoles, and most of the GPUs, only have single-precision FPUs. It's possible to make physics code work in single precision, but fast-moving objects that cover considerable distance may have problems.
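One way the single-precision problem can show up is an object far from the origin whose per-step movement is smaller than the spacing between adjacent 32-bit floats. The sketch below emulates 32-bit storage with Python's `struct` module; `f32`, `advance`, and all the numbers are illustrative assumptions, not taken from any physics engine.

```python
import struct


def f32(x):
    """Round a double to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def advance(position, velocity, dt, steps, single):
    """Simple Euler position update; optionally store position in 32 bits."""
    for _ in range(steps):
        position = position + velocity * dt
        if single:
            position = f32(position)
    return position


start = 100000.0  # far from the origin
pos_single = advance(start, velocity=0.1, dt=0.01, steps=1000, single=True)
pos_double = advance(start, velocity=0.1, dt=0.01, steps=1000, single=False)
print("single:", pos_single)
print("double:", pos_double)
```

Near 100000 adjacent 32-bit floats are 1/128 apart, so a step of 0.001 rounds away and the single-precision object never moves, while the double-precision one ends up about one unit further along.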
NVIDIA Cg (Score:4, Informative)
"About Cg The Cg Language Specification is a high-level C-like graphics programming language that was developed by NVIDIA in close collaboration with Microsoft Corporation. The Cg environment consists of two components: the Cg Toolkit including the NVIDIA Cg Compiler Beta 1.0 optimized for DirectX(R) and OpenGL(R); and the NVIDIA Cg Browser, a prototyping/visualization environment with a large library of Cg shaders. Developers also have access to user documentation and a range of training classes and online materials being developed for the Cg language."
http://www.nvidia.com/object/IO_20020612_7133.htm
Re:NVIDIA Cg (Score:2)
Interesting (Score:2)
DivX (Score:2)
memory bandwidth is the key (Score:3, Insightful)
GPU use for scientific programming. (Score:4, Interesting)
More speed for the Terascale cluster? (Score:2, Interesting)
distributed.net (Score:3, Interesting)
I know my PC eats 20 watts more power when in 3D mode, but still, I want the faster agent
Crypto (Score:3, Interesting)
Indeed, I only know of one crypto hack that uses floats -- being from DJB, it's predictably brilliant. Basically, it's easy to compute the floating point error from a given operation, but computationally hard to find an operation that yields a given error. So you can effectively sign (or at least MAC) arbitrary content. Nice!
--Dan
Imagine a Beowulf Cluster... no, seriously (Score:5, Interesting)
This cluster has 70 Playstations (one article said that they'd ordered 100, but only 70 are in the cluster... Obviously the others are being used for "research".)
How long until (Score:4, Funny)
Ray tracing with a GPU? (Score:3, Interesting)
Re:The future is the past (Score:5, Interesting)
Picture five high end GPUs on the motherboard eclipsing the single high-end cpu for a fraction of the price. Intel and AMD would be forced to cut the asking price of their products to compete. We could finally see some real four-way competition for "processors".
TW
Re:multi gpu? (Score:2)
Re:multi gpu? (Score:2)