Why Don't Open Source Databases Use GPUs?
An anonymous reader writes "A recent paper from Georgia Tech (abstract, paper itself) describes a system that can run the complete TPC-H benchmark suite on an NVIDIA Titan card, at a 7x speedup over a commercial database running on a 32-core Amazon EC2 node, and a 68x speedup over a single-core Xeon. A previous story described an MIT project that achieved similar speedups. There has been a steady trickle of work on GPU-accelerated database systems for several years, but it doesn't seem like any of that code has made it into open source databases like MonetDB, MySQL, CouchDB, etc. Why not? Many queries that I write are simpler than TPC-H, so what's holding them back?"
Cost? Time? Hardware? Skill? (Score:5, Interesting)
The people with the skills don't have jobs and want to write the code, but the hardware is too expensive.
Risk aversion. (Score:2, Interesting)
A few screen "artifacts" tend to be less painful than DB "artifacts". Maybe things have changed, but it's not been that long since NVIDIA shipped a huge batch of video cards that were dying in all sorts of ways.
As for AMD/ATI, I suspect you'd be stuck using some of their crappy software to do that GPU processing.
Re:Something something online sorting (Score:5, Interesting)
performance ... put up cash...
The biggest opportunity for GPUs in databases isn't raw "performance". As others have pointed out, for performance it's easier to just throw money at the problem.
GPU powered databases do show promise for performance/Watt.
http://hgpu.org/?p=8219 [hgpu.org]
However, energy efficiency is not enough; energy proportionality is needed. The objective of this work is to create an entire platform that allows execution of GPU operators in an energy-proportional DBMS, WattBD, and also a GPU sort operator to prove that this new platform works. A different approach to integrating the GPU into the database has been used. Existing solutions to this problem either optimize specific areas of the DBMS or provide extensions to the SQL language to specify GPU operation, and thus either lack the flexibility to optimize all database operations or fail to make GPU execution transparent to the user. This framework differs from existing strategies by manipulating the creation and insertion of GPU operators directly in the query plan tree, allowing a more flexible and transparent way to integrate new GPU-enabled operators. Results show that it was possible to easily develop a GPU sort operator with this framework. We believe that this framework will allow a new approach to integrating GPUs into existing databases, and therefore achieve more energy-efficient DBMSs.
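To make the abstract's plan-rewriting idea concrete, here's a minimal sketch in Python. All names are hypothetical (not from the paper), and the "GPU" operator just sorts on the host, since no device code is assumed; the point is only to show acceleration injected by rewriting the query plan tree rather than by changing the SQL.

```python
# Hypothetical sketch of the abstract's approach: walk the query plan
# tree and swap CPU operators for GPU-backed equivalents, so the
# substitution stays transparent to the user's query.

class Operator:
    def __init__(self, child=None):
        self.child = child

class Scan(Operator):
    def __init__(self, rows):
        super().__init__()
        self.rows = rows
    def execute(self):
        return list(self.rows)

class CpuSort(Operator):
    def execute(self):
        return sorted(self.child.execute())

class GpuSort(Operator):
    # Stand-in: a real operator would copy rows to the device and run a
    # GPU sorting kernel; here we just sort on the host for illustration.
    def execute(self):
        return sorted(self.child.execute())

def inject_gpu_operators(plan):
    """Recursively rewrite the plan, replacing CPU operators that have
    GPU equivalents -- the plan-tree manipulation the abstract describes."""
    if plan is None:
        return None
    plan.child = inject_gpu_operators(plan.child)
    if isinstance(plan, CpuSort):
        return GpuSort(plan.child)
    return plan

plan = CpuSort(Scan([3, 1, 2]))
plan = inject_gpu_operators(plan)
print(plan.execute())  # → [1, 2, 3]
```

Because the rewrite happens at the operator level, every query that uses a sort can pick up the GPU path without SQL extensions, which is the transparency argument the abstract makes.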
Also note that you can write PostgreSQL stored procedures in OpenCL, which may be useful if you're doing something CPU-intensive like storing images in a database and running OCR or facial recognition on them: http://wiki.postgresql.org/images/6/65/Pgopencl.pdf [postgresql.org]
Introducing PgOpenCL - A New PostgreSQL Procedural Language Unlocking the Power of the GPU
Re:Something something online sorting (Score:5, Interesting)
No, I'm not shouting at you, I'm shouting at the fucking bogus pseudo-academics who wanted to bullshit around with micro-optimisation rather than make actual advancements in the field of databases.
Any paper that does X on a GPU generally fits into this category. It's not science to run an existing algorithm on an existing Turing-complete processor; at most it's engineering. But it's a fairly easy way to churn out papers. Doing X "in the cloud" or "with big data" follows a similar strategy. It's usually safe to ignore them.