
Why Don't Open Source Databases Use GPUs?

Posted by Unknown Lamer
from the connection-machines-rise-from-the-grave dept.
An anonymous reader writes "A recent paper from Georgia Tech (abstract, paper itself) describes a system that can run the complete TPC-H benchmark suite on an NVIDIA Titan card, at a 7x speedup over a commercial database running on a 32-core Amazon EC2 node, and a 68x speedup over a single-core Xeon. A previous story described an MIT project that achieved similar speedups. There has been a steady trickle of work on GPU-accelerated database systems for several years, but it doesn't seem like any code has made it into Open Source databases like MonetDB, MySQL, CouchDB, etc. Why not? Many queries that I write are simpler than TPC-H, so what's holding them back?"

  • by Anonymous Coward on Wednesday December 25, 2013 @11:10AM (#45781889)

    ...because I/O is the limiting factor of database performance, not compute power?

  • They're coming... (Score:4, Informative)

    by Heretic2 (117767) on Wednesday December 25, 2013 @11:26AM (#45781965)
  • by Anonymous Coward on Wednesday December 25, 2013 @11:35AM (#45781993)

    Databases in the real world are rarely CPU bound (and when I have seen them CPU bound, it was because something was going badly wrong). Generally they are data bound, and the GPU has several times lower bandwidth to the data than the real CPUs do, so effectively it will be even slower. So while the computation on the GPU may be 10x faster, feeding the data in and out is 10x slower, meaning it did not do anything for you except require a lot of extra coding complication to use it.

    Benchmarks tend not to look like real-world queries, and often you can do something that helps a benchmark but does nothing in the real world.

    Bus-installed co-processors (PCI/PCIe/VME) are only useful if you can fit the entire dataset in the co-processor's memory. When you have to do large accesses outside of that RAM because the data does not fit, the co-processor usually becomes much slower and all advantages go away. That is why it works for supercomputing: the dataset being worked on is tiny in the cases the GPU works well for.
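
    To put rough numbers on the argument above, here is a minimal back-of-the-envelope sketch in Python. The function name, the 8 GB/s bus rate, and the workload sizes are all made-up assumptions for illustration, not measurements from the paper or from any real system.

```python
def offload_speedup(cpu_seconds, compute_speedup, data_bytes, bus_bytes_per_sec):
    """Overall speedup from offloading a job to a bus-attached co-processor.

    Simplified model (an assumption, not a benchmark): the data crosses the
    bus once, then the co-processor does the work `compute_speedup` times
    faster than the CPU. Transfer and compute are not overlapped.
    """
    transfer_seconds = data_bytes / bus_bytes_per_sec
    device_seconds = cpu_seconds / compute_speedup
    return cpu_seconds / (transfer_seconds + device_seconds)


if __name__ == "__main__":
    BUS = 8e9  # hypothetical bus rate: 8 GB/s

    # Hypothetical query that takes 10 s on the CPU and runs 10x faster on the card.
    for label, nbytes in [("already resident on the card", 0),
                          ("8 GB shipped over the bus", 8e9),
                          ("80 GB shipped over the bus", 80e9)]:
        print(f"{label}: {offload_speedup(10.0, 10.0, nbytes, BUS):.2f}x")
```

    With these assumed figures the card keeps its full 10x only when the data is already in its memory; shipping 8 GB cuts the gain to 5x, and shipping 80 GB makes the offload slower than just running on the CPU (about 0.9x).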

  • by Anonymous Coward on Wednesday December 25, 2013 @11:49AM (#45782045)

    The R&D effort in the SQL field is roughly zero, so it's not surprising people aren't keeping up with the latest developments in the hardware field.

    Except for the part where errybody's keeping up with the latest developments. They're just actually looking at developments that matter. GPUs... Do not matter. If you want to know more, check the first post.

    Processing power is inconsequential compared to I/O. RAM is pretty straightforward; newer, faster RAM comes out, larger amounts become cheaper, you buy it, you throw it into the mix.

    The cool stuff is happening around SSDs (which are also pretty straightforward), solid state memory devices (think FusionIO-style cards; Violin devices; RAMSANs), and crazy arse storage solutions.

  • by Anonymous Coward on Wednesday December 25, 2013 @12:02PM (#45782111)

    Try getting a top end Radeon card for a reasonable price at the moment.

    They're being bought out by cryptocoin miners (LTC, for example) to the point that there's a supply shortage that's pushed the price way above MSRP.

    GP's point about GPUs being error prone is only partially correct; they're prone to errors if pushed beyond their power or thermal limits, and most DIY machines don't pay enough attention to either. I'm running an old 560ti on a number of BOINC projects, underclocked slightly and well cooled. Still much, much faster than CPU processing (an FX-8350), yet will crunch happily for as long as I leave the machine running and not produce any validation failures.

  • by fatphil (181876) on Wednesday December 25, 2013 @01:25PM (#45782433) Homepage
    Read the paper - page 7 (which bizarrely doesn't render clearly for me at all, and I can't copy/paste)
    "Scale Factor 1 (SF 1) ... data fits in GPU memory"

    They ran the TPC-H ("H"="Huge") with a dataset that was ABSOLUTELY FUCKING TINY.

    No, I'm not shouting at you, I'm shouting at the fucking bogus pseudo-academics who wanted to bullshit with micro-optimisation rather than making actual advancements in the field of databases.

    Frauds.
  • by znrt (2424692) on Wednesday December 25, 2013 @03:22PM (#45783029)

    that's all nice and good. but what has that to do with "Why Don't Open Source Databases Use GPUs?"? because GPUs provide little benefit to today's DBs! why aren't diamond shaped networks of bread toasters used for open source databases? it's just a stupid question, and has nothing to do with "innovation being misunderstood". there's nothing to understand here besides the fact that someone apparently needed to fill his news-roll with random bullshit.

  • by Arker (91948) on Wednesday December 25, 2013 @03:36PM (#45783109) Homepage Journal

    I love the ignorance of the mods here. Your post isn't interesting, it's boneheadedly stupid.

    "The different "class" of motherboard is simply a different form factor so you can't swap for another one. i.e., vendor lock-in."

    No, it is NOT. Important things like ECC support have to be built into the chipset, so you are using a different chipset. And if you are not getting ripped off, many other components are going to be different as well.

    "RAM is different. It's claimed they use ECC for the safety of your data. In practice it's so you can't go to the local computer store to buy more. Corps tend to buy from the manufacturer because "that's where we got the server, and it was expensive.""

    The ignorance here is appalling. ECC is for the safety of your data; without it you WILL have regular bit errors. They don't use it on consumer equipment because consumers are so dumb they will buy a cheaper computer without it and think they are getting a better deal, and because it's rationalized that no one (should) use consumer equipment for anything important anyway. The known incidence of cosmic radiation alone, combined with the small process size and sheer density of modern RAM, guarantees you will have regular bit errors, and the consequences are essentially 'random': one time the error could be something you won't even notice, but the next time it could necessitate a full reformat of the machine. Or it might just corrupt an important data file instead. There is no way to predict it.

    If you are doing anything important with the computer this is not acceptable and you should just quit being an idiot and get ECC.

  • by HornWumpus (783565) on Wednesday December 25, 2013 @05:25PM (#45783589)

    I used ECC on a workstation once. The BIOS logged ECC fixes. I had 2 over the life of the machine (3 years).

    ECC doesn't hard fault due to parity error. It has the bits to find and fix any single bit flip. That's the point.

    I don't use ECC anymore. Most good (not server grade) MBs do support it.
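
    For anyone wondering how extra bits can "find and fix" a flip rather than just detect it, here is a toy sketch using a textbook Hamming(7,4) code in Python. This is only an illustration of the principle: real ECC memory typically uses a much wider code (64 data bits plus 8 check bits per word), and the function names below are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    Each parity bit covers the positions whose index has that bit set.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, error_position); position 0 means no error seen.

    The three parity checks, read as a binary number, give the 1-based
    position of a single flipped bit, which is then flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1  # repair the single flipped bit
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1  # simulate one bit flipping in memory (position 5)
    print(hamming74_correct(damaged))
```

    Note that this toy code only handles one flipped bit per word; two flips in the same word would be miscorrected, which is why real memory adds an extra check bit to at least detect double errors.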

  • by brambus (3457531) on Wednesday December 25, 2013 @05:42PM (#45783669)

    You obviously have never torn down a server. I've built thousands.

    Bullshit and here's why:

    The last place I was at paid over $300K for a Sun machine with 128 cores and 1TB RAM. I priced the same machine, with 128 cores and 1TB RAM for something like $20K, but with faster components made for gaming use.

    This is such a load of crap it's hard to fathom you had anything to do with server procurement at any point at all. First, you can't (even today) build a 128-core/1TB RAM box using gaming components, so you're looking at a cluster of smaller boxes vs one big box. That impacts the software infrastructure in a big way. For example it's a vastly different affair to run one big DB instance vs a cluster of 12 little ones (not to speak of the extra money you'll spend on these extra instances). Clusters massively complicate administration, backup, replication, disaster recovery, etc.

    RAM is different. It's claimed they use ECC for the safety of your data. In practice it's so you can't go to the local computer store to buy more.

    Another reason you don't know what you're talking about. ECC absolutely *does* work and bits do flip in memory, which in the absence of ECC can result in data corruption or unplanned machine downtime. I've had the OS detect faulty memory sticks via ECC before.

    Corps tend to buy from the manufacturer because "that's where we got the server, and it was expensive."

    No, they do that because that way you have a valid support contract and can blame problems on a supplier if stuff goes down the drain (as it often does). Obviously you've never had to stand in front of top-brass and try to explain why your multi-million dollar project fell flat on its face because of a few bucks you've decided to save on some el-cheapo memory sticks.

    Box? Well, rackmount for racks, desktop for not-racks. I've seen plenty of people ungracefully stack rackmount boxes on the floor of a corner office, and complain when they need to pull out the bottom one. That's not so different than racks. I've seen people rack mount where they put in a shelf, and then put 10 servers on top of it without ever putting in the rail kits.

    It's not exactly the box's fault when you guys are idiots and stack rack-mount servers.

    With only a very few exceptions, they're the same chipsets, using the same technologies.

    Have you *ever* had a server motherboard in your hands?

    Hell, even the hard drives are gaming, or are making their way there. SCSI was the only way to go, even though SATA overtook the performance long ago. Then they started putting 2.5" SAS drives in, which are laptop SATA drives with a bigger pricetag.

    I give up. How could this shit have been upvoted so much? The performance gap between a 2.5" server SAS drive [tomshardware.co.uk] and a 2.5" laptop SATA drive [tomshardware.co.uk] is *huge*. And that's before we get to the way these things tend to behave in failure scenarios in large-HDD storage arrays (do you even know how a freakin' JBOD works?)
