Why Don't Open Source Databases Use GPUs?

Posted by Unknown Lamer
from the connection-machines-rise-from-the-grave dept.
An anonymous reader writes "A recent paper from Georgia Tech (abstract, paper itself) describes a system that can run the complete TPC-H benchmark suite on an NVIDIA Titan card, at a 7x speedup over a commercial database running on a 32-core Amazon EC2 node, and a 68x speedup over a single core Xeon. A previous story described an MIT project that achieved similar speedups. There has been a steady trickle of work on GPU-accelerated database systems for several years, but it doesn't seem like any code has made it into Open Source databases like MonetDB, MySQL, CouchDB, etc. Why not? Many queries that I write are simpler than TPC-H, so what's holding them back?"

  • by Maury Markowitz (452832) on Wednesday December 25, 2013 @11:12AM (#45781907) Homepage

    The R&D effort in the SQL field is roughly zero, so it's not surprising people aren't keeping up with the latest developments in the hardware field.

    It's bad enough that the only standardized access system is ODBC, designed 25 years ago when pipes were short and thin and a WAN was the next building over. If we can't get that problem fixed, what's the hope for integrating new technologies?

  • by vadim_t (324782) on Wednesday December 25, 2013 @11:15AM (#45781927) Homepage

    "Many queries that I write are simpler than TPC-H, so what's holding them back?" -- simple queries don't need acceleration.

    A "SELECT * FROM users WHERE user_id = 12" or a "SELECT SUM(price) FROM products" doesn't need a GPU; it's I/O bound and would benefit much more from plenty of cache memory and an SSD. A lot of what things like MySQL get used for is forums and similar, where queries are simple. The current tendency seems to be to use the database as an object store, which results in a lack of gnarly queries that could be optimized.

    I do think such features will eventually make it in, but this isn't going to benefit uses like forums much.
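    The "I/O bound" argument above can be checked with back-of-envelope arithmetic. The sketch below is only an illustration: the function name and all throughput figures are assumptions picked for the example, not measurements of any real system.

```python
def scan_bottleneck(table_bytes, storage_bytes_per_s, compute_bytes_per_s):
    """Return (bottleneck, seconds) for a full-table scan such as SUM(price).

    The scan is modelled as a pipeline of "read" and "process" stages,
    so the slower stage determines the total time.
    """
    io_seconds = table_bytes / storage_bytes_per_s
    compute_seconds = table_bytes / compute_bytes_per_s
    if io_seconds >= compute_seconds:
        return "io", io_seconds
    return "compute", compute_seconds


# Illustrative numbers only (assumptions, not benchmarks):
GB = 10**9
table = 10 * GB      # hypothetical 10 GB table
ssd = 0.5 * GB       # assumed sequential read rate of the storage
cpu_sum = 5 * GB     # assumed rate at which one core can sum values

bottleneck, seconds = scan_bottleneck(table, ssd, cpu_sum)
print(bottleneck, seconds)
```

    With these assumed rates the storage stage dominates, so making the summation faster on a GPU would not shorten the query. The conclusion flips only when the data is already in memory or the per-row work is much heavier.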

  • by Arker (91948) on Wednesday December 25, 2013 @11:29AM (#45781975) Homepage

    Wow, a fp that hit the nail on the head.

    Indeed, database applications tend to bottleneck on I/O, not processor, so most uses would see little gain from this. That's probably the biggest reason no one has bothered to do it.

    Certain uses would probably benefit, but then there are other reasons too. You run databases on machines built for it, not gaming machines, so it's not like they already have this hardware. You would have to buy it and add it as an expense. And GPUs are error prone. Not what you want in most database applications either (although again, there may be niches where this would be ok.)

  • by tranquilidad (1994300) on Wednesday December 25, 2013 @11:37AM (#45781997)


    If you go beyond the abstract and read the paper you'll notice that they chose a TPC-H scale factor of 1 (1 GB of data) so that the entire dataset would fit in the GPU.

    The question they seem to really be asking is more akin to, "Why don't we make our datasets small enough for complex queries that it can all fit in the storage attached to a processor we like?"

    They continue to answer their own question when discussing results and admit they can't compare costs of "traditional" implementations because those tests were all run with scale of 100 (100 GB of data).

    They say the comparison is difficult against complete systems because of the scaling factor and "...this paper is about the effectiveness of mapping relational queries to utilize the compute throughput [of] GPUs".

    So, it seems to boil down to a test of compute power on data sets small enough to fit in memory rather than an effective test of relational query processing, though they did use relational queries as their base testing model.
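    The scale-factor point can be made concrete with a small calculation. This is a rough sketch: it uses the 1 GB per scale-factor unit figure quoted above, while the 6 GB device memory and the 10 GB/s host-to-device link speed are assumed values for illustration, and it ignores indexes and intermediate results.

```python
def fits_on_device(scale_factor, device_memory_gb, gb_per_scale_unit=1.0):
    """True if the raw dataset for this scale factor fits in GPU memory."""
    return scale_factor * gb_per_scale_unit <= device_memory_gb


def transfer_seconds(scale_factor, link_gb_per_s, gb_per_scale_unit=1.0):
    """Time to copy the raw dataset to the GPU once over the host link."""
    return scale_factor * gb_per_scale_unit / link_gb_per_s


DEVICE_MEMORY_GB = 6.0   # assumption: memory on the hypothetical card
LINK_GB_PER_S = 10.0     # assumption: effective host-to-device bandwidth

for sf in (1, 100):
    print(sf, fits_on_device(sf, DEVICE_MEMORY_GB),
          transfer_seconds(sf, LINK_GB_PER_S))
```

    Under these assumptions the scale factor 1 dataset sits entirely on the card, while scale factor 100 would have to be streamed in pieces, which turns the problem back into one about data movement.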

  • by Runaway1956 (1322357) on Wednesday December 25, 2013 @11:40AM (#45782005) Homepage Journal

    I'll add that most people who put up the cash for high performing GPUs aren't much interested in actually "computing" with them. They are far more interested in "gaming". They demand video performance, as opposed to crunching database numbers. Those companies that are most likely to pay people for manipulating databases generally have little interest in top notch video, so they aren't going to pay for hundreds of GPUs.

  • by houstonbofh (602064) on Wednesday December 25, 2013 @11:53AM (#45782065)

    ... so they aren't going to pay for hundreds of GPUs.

    Especially when they have already blown the budget on fast SSDs that actually make a real difference in real performance, not just synthetic benchmarks.

  • Not true (Score:5, Insightful)

    by kervin (64171) on Wednesday December 25, 2013 @12:07PM (#45782127) Homepage

    ...because I/O is the limiting factor of database performance, not compute power?

    Just a few projects into Database Performance Optimization would convince you that's not a true statement. I/O, memory, and CPU are in fact largely interchangeable resources on a database. And depending on your schema, you can just as easily run out of any one of them.

    For instance, I'm currently tuning a SQL Server database that's CPU heavy based on our load projection targets. We could increase query caching so that more result sets stay in memory. Fewer complex queries would then be run, drastically reducing I/O and some CPU usage, but drastically increasing memory usage. This is just a simple example of course to illustrate the point.
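    The caching trade-off described here can be sketched in a few lines. This is a minimal illustration, not how SQL Server implements its caches: it uses Python's built-in `sqlite3` module as a stand-in database, and `CachedQueryRunner` is a hypothetical helper invented for the example.

```python
import sqlite3


class CachedQueryRunner:
    """Runs read-only queries and keeps their result sets in memory."""

    def __init__(self, connection):
        self.connection = connection
        self.cache = {}       # (sql, params) -> list of rows
        self.executions = 0   # times the database actually ran a query

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self.cache:
            # Cache miss: spend CPU and I/O once, then hold the rows in RAM.
            self.executions += 1
            self.cache[key] = self.connection.execute(sql, params).fetchall()
        return self.cache[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO products (price) VALUES (?)",
                 [(1.5,), (2.5,), (4.0,)])

runner = CachedQueryRunner(conn)
first = runner.query("SELECT SUM(price) FROM products")
second = runner.query("SELECT SUM(price) FROM products")
print(first, runner.executions)
```

    The second call never touches the database, which is the CPU and I/O saving; the price is that every cached result set stays resident in memory. The sketch has no invalidation, so a real cache would also need to drop entries when the underlying tables change.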

    Databases run out of CPU resources all the time. And a CPU advancement would be very well received.

    My guess as to why this hasn't been done is that it would require end-users to start buying/renting/leasing GPU enabled hardware for their Database infrastructure. This would be a huge change from how we do things today and this sector moves very slowly.

    Also we have many fairly old but more important Database advancements which have been around for years and are still almost unusable. If you ever tried to horizontally scale most popular Open-source databases you may know what I'm talking about. Multi-master, or just scaling technology in general, is required by about every growing "IT-dependent" company at some point. But that technology ( though available ) is still "in the dark ages" as far as I'm concerned based on reliability and performance measurements.

  • by girlintraining (1395911) on Wednesday December 25, 2013 @02:08PM (#45782617)

    Especially when they have already blown the budget on fast SSDs that actually make a real difference in real performance, not just synthetic benchmarks.

    Is now a bad time to point out that many researchers have built clusters out of thousands of GPUs to model the weather, protein folding, and other things? As it turns out, gamers aren't the only ones that buy GPUs. And GPUs aren't functionally all that different from FPGAs, and as I understand it, Linus went off to Transmeta to build CPUs based on such an architecture.

    I'm irritated whenever people here on slashdot can't see past their own personal experience; it's become quite sad. The true innovators don't see something that's already been done and figure out how to do it better. They see the same things as everyone else, but put them together in radically new ways nobody's ever thought of before.

    GPUs for database processing? That's crazy! Which is why it's innovative and will push the limits of information technology. Three hundred quintillion polygasmic retina displays with 99 billion pixels to play Call of Duty 27 will never do that. Most slashdotters that put down an idea like this really have no concept of what geeks and hackers do.

    We push the limits. We fuck with things that ought not to be fucked with. We take the OSI 7 layer model, set it on fire, turn it inside out, and hack out new ways to do it by breaking every rule we can find. We go where we aren't wanted, aren't expected, and we push every button we can find. We do things precisely because people tell us it's impossible, that it can't or shouldn't be done, and take great pleasure in finding novel new ways to do something even if there's already twenty proven ways to do it.

    And while probably 99 times out of 100, the experience matters only for the hacker or geek doing it, and is done merely to learn... that glorious one time when something unexpected and interesting happens, that is what all progress in this industry is based on. And people like you who belch about "synthetic benchmarks" and insist nobody would do X because that's just stupid will never understand.
