Why Don't Open Source Databases Use GPUs?

Posted by Unknown Lamer on Wednesday December 25, 2013 @11:07AM from the connection-machines-rise-from-the-grave dept.

An anonymous reader writes "A recent paper from Georgia Tech (abstract, paper itself) describes a system that can run the complete TPC-H benchmark suite on an NVIDIA Titan card, at a 7x speedup over a commercial database running on a 32-core Amazon EC2 node, and a 68x speedup over a single core Xeon. A previous story described an MIT project that achieved similar speedups. There has been a steady trickle of work on GPU-accelerated database systems for several years, but it doesn't seem like any code has made it into Open Source databases like MonetDB, MySQL, CouchDB, etc. Why not? Many queries that I write are simpler than TPC-H, so what's holding them back?"

This discussion has been archived. No new comments can be posted.


Something something online sorting (Score:5, Informative)

by Anonymous Coward writes: on Wednesday December 25, 2013 @11:10AM (#45781889)

...because I/O is the limiting factor of database performance, not compute power?

- Re:Something something online sorting (Score:5, Insightful)
  
  by Arker ( 91948 ) writes: on Wednesday December 25, 2013 @11:29AM (#45781975) Homepage
  
  Wow, a fp that hit the nail on the head.
  Indeed, database applications tend to bottleneck on I/O, not processor, so most uses would see little gain from this. That's probably the biggest reason no one has bothered to do it.
  Certain uses would probably benefit, but then there are other reasons too. You run databases on machines built for it, not gaming machines, so it's not like they already have this hardware. You would have to buy it and add it as an expense. And GPUs are error prone. Not what you want in most database applications either (although again, there may be niches where this would be ok.)
  
  - Re:Something something online sorting (Score:4, Insightful)
    
    by Runaway1956 ( 1322357 ) writes: on Wednesday December 25, 2013 @11:40AM (#45782005) Homepage Journal
    
    I'll add that most people who put up the cash for high performing GPU's aren't much interested in actually "computing" with them. They are far more interested in "gaming". They demand video performance, as opposed to crunching database numbers. Those companies that are most likely to pay people for manipulating databases generally have little interest in top notch video, so they aren't going to pay for hundreds of GPU's.
    
    - Re:Something something online sorting (Score:5, Insightful)
      
      by houstonbofh ( 602064 ) writes: on Wednesday December 25, 2013 @11:53AM (#45782065)
      
      ... so they aren't going to pay for hundreds of GPU's.
      Especially when they have already blown the budget on fast SSDs that actually make a real difference in real performance, not just synthetic benchmarks.
      
      - Re:Something something online sorting (Score:5, Insightful)
        
        by girlintraining ( 1395911 ) writes: on Wednesday December 25, 2013 @02:08PM (#45782617)
        
        Especially when they have already blown the budget on fast SSDs that actually make a real difference in real performance, not just synthetic benchmarks.
        Is now a bad time to point out that many researchers have built clusters based out of thousands of GPUs to model the weather, protein folding, and other things? As it turns out, gamers aren't the only ones that buy GPUs. And GPUs aren't functionally all that different from FPGAs, which as I understand Linus went off to Transmeta to build CPUs based off such architecture.
        I'm irritated whenever people here on slashdot can't see past their own personal experience; it's become quite sad. The true innovators don't see something that's already been done and figure out how to do it better. They see the same things as everyone else, but put them together in radically new ways nobody's ever thought of before.
        GPUs for database processing? That's crazy! Which is why it's innovative and will push the limits of information technology. Three hundred quintillion polygasmic retina displays with 99 billion pixels to play Call of Duty 27 will never do that. Most slashdotters that put down an idea like this really have no concept of what geeks and hackers do.
        We push the limits. We fuck with things that ought not to be fucked with. We take the OSI 7 layer model, set it on fire, turn it inside out, and hack out new ways to do it by breaking every rule we can find. We go where we aren't wanted, aren't expected, and we push every button we can find. We do things precisely because people tell us it's impossible, that it can't or shouldn't be done, and take great pleasure in finding novel new ways to do something even if there's already twenty proven ways to do it.
        And while probably 99 times out of a 100, the experience matters only for the hacker or geek doing it, and is done merely to learn... that glorious one time when something unexpected and interesting happens, that is what all progress on this industry is based on. And people like you who belch about "synthetic benchmarks" and insist nobody would do X because that's just stupid will never understand.
        
        
        Re:Something something online sorting (Score:5, Informative)
        
        by znrt ( 2424692 ) writes: on Wednesday December 25, 2013 @03:22PM (#45783029)
        
        that's all nice and good. but what does that have to do with "Why Don't Open Source Databases Use GPUs?"? the answer is that GPUs provide little benefit to today's DBs! why aren't diamond shaped networks of bread toasters used for open source databases? it's just a stupid question, and it has nothing to do with "innovation being misunderstood". there's nothing to understand here besides the fact that someone apparently needed to fill his news-roll with random bullshit.
        
        
        Re: (Score:3)
        
        by VortexCortex ( 1117377 ) writes:
        
        You're answering the wrong question.
        The question is Why don't Open Source Databases use the GPU. There are many answers: Supply and Demand is the best one. The other is that collating database rows in a GPU is fine, but you still have the damn bottleneck getting the data out to main system RAM. So, if your use case is a server then you're fucked because GPUs don't have a NIC interface.
        The true answer is: We don't run databases in the GPU because GPUs are stupidly dedicated designs. General Purpose Comp
        
        Re: (Score:3)
        
        by LWATCDR ( 28044 ) writes:
        
        Slightly off topic but how about using GPUs for RAIDs?
        
        Re: (Score:2)
        
        by houstonbofh ( 602064 ) writes:
        
        A standard CPU is better, and you are not limited to dedicated (and occasionally hard to find) hardware. Look for threads about ZFS or mdraid over hardware raid for a lot of discussion on this.
        
        Re: (Score:2)
        
        by Decker-Mage ( 782424 ) writes:
        
        I get your point but mostly due to the fact that I've been a heretic for the last forty-four years. The funny part is, what I was doing then is what everyone else is doing now, just twenty or so years later. The difference being, in one case, having to forklift my big data into the data center. It keeps me amused.
        
        Older doesn't mean one has to be mellower (Score:2)
        
        by Taco Cowboy ( 5327 ) writes:
        
        I don't do much of this sort of thing anymore but there was a time when I tried to look inside every file on my computer, telnet to every host available just to learn everything I could about this wonderful new tool that had come into my grasp. I sometimes miss those days and that guy, but now I tire easily and kind of just want everything to work.
        I do feel what you are feeling.
        As we get older we get more easily tired. But that doesn't mean I will rest more just because I get tired.
        On the contrary - I push myself harder simply because I get more easily tired.
        Only by doing so I get things to even out - maybe I am not as fast and as sharp, in both physical sense and in mental state, but as I push myself harder, I will do more in the same 24 hours allocated to me every single day.
        Why should I let the young uns having all the fun ?
        Why should I let my sco
    - Re:Something something online sorting (Score:5, Interesting)
      
      by ron_ivi ( 607351 ) writes: <sdotno AT cheapcomplexdevices DOT com> on Wednesday December 25, 2013 @01:00PM (#45782339)
      
      performance ... put up cash...
      The biggest opportunity for GPUs in Databases isn't for "performance". As others pointed out - for performance it's easier to just throw money at the problem.
      GPU powered databases do show promise for performance/Watt.
      http://hgpu.org/?p=8219 [hgpu.org]
      However, energy efficiency is not enough, energy proportionality is needed. The objective of this work is to create an entire platform that allows execution of GPU operators in an energy proportional DBMS, WattBD, and also a GPU Sort operator to prove that this new platform works. A different approach to integrate the GPU into the database has been used. Existing solutions to this problem aims to optimize specific areas of the DBMS, or provides extensions to the SQL language to specify GPU operation, thus, lacking flexibility to optimize all database operations, or provide transparency of the GPU execution to the user. This framework differs from existing strategies manipulating the creation and insertion of GPU operators directly into the query plan tree, allowing a more flexible and transparent framework to integrate new GPU-enabled operators. Results show that it was possible to easily develop a GPU sort operator with this framework. We believe that this framework will allow a new approach to integrate GPUs into existing databases, and therefore achieve more energy efficient DBMS.
      Also note that you can write PostgreSQL stored procedures in OpenCL - which may be useful if you're doing something CPU intensive like storing images in a database and doing OCR or facial recognition on them: http://wiki.postgresql.org/images/6/65/Pgopencl.pdf [postgresql.org]
      Introducing PgOpenCL - A New PostgreSQL Procedural Language Unlocking the Power of the GPU
      
    - Re: (Score:2)
      
      by BLKMGK ( 34057 ) writes:
      
      Actually as one of those "gamers" I'd love to be using my GPU to speed up real world things like x264 and ffmpeg but sadly GPU isn't being used there and seems to be actively scorned. A real bummer as I'd love to be putting this bad boy to more use in things I do that tax my heavily overclocked CPU.
      GPUs crunch numbers well; look at the differences made in password cracking for instance. In the right situation the GPU isn't used for video at all.
      I know several people who have invested serious cash in GPU that
    - - Re: (Score:2)
        
        by Decker-Mage ( 782424 ) writes:
        
        Partially true, at best. If you are serious about using GPU's for anything whose erroneous result can kill people, you don't use consumer-grade GPU cards. I don't know what you have but I only use workstation-grade and then I verify the results, just as I would do with anything hazardous. As an engineer, I can't afford those types of mistakes.
        
        And on the whole database thang? The GPU is chained to a database-eating machine (5.65 GBps SSD array, and yes, that's GigaBytes). Yeah, it can make pretty graphics
  - Re: (Score:2)
    
    by JWSmythe ( 446288 ) writes:
    
    Well, gaming machines do make great servers. What is a gaming machine? Fast CPU, lots of memory, fast storage. The only difference is the video card. For home built servers in PC cases, I just don't bother with the pesky high end video cards. They run so much cooler and quieter. I'd hate to have a rack of servers at the house. I'd rather not have a jet engine running in the next room. :)
    - - Re: (Score:3)
        
        by emj ( 15659 ) writes:
        
        ECC fails in a nice way, and that's why you use it; there are lots of nice graphs around to show you why. If you want to say that ECC isn't important you need to put up some facts to support that.
        That said ECC is slower, more expensive and makes vendor lock in easier,
        
        Re:Something something online sorting (Score:4, Informative)
        
        by Arker ( 91948 ) writes: on Wednesday December 25, 2013 @03:36PM (#45783109) Homepage
        
        I love the ignorance of the mods here. Your post isn't interesting, it's boneheadedly stupid.
        "The different "class" of motherboard is simply a different form factor so you can't swap for another one. i.e., vendor lock-in."
        No, it is NOT. Important things like ECC support have to be built into the chipset, so you are using a different chipset. And if you are not getting ripped off many other components are going to be different as well.
        "RAM is different. It's claimed they use ECC for the safety of your data. In practice it's so you can't go to the local computer store to buy more. Corps tend to buy from the manufacturer because 'that's where we got the server, and it was expensive.'"
        The ignorance here is appalling. ECC is for the safety of your data; without it you WILL have regular bit errors. They don't use it on consumer equipment because consumers are so dumb they will buy a cheaper computer without it and think they are getting a better deal, and because it's rationalized that no one (should) use consumer equipment for anything important anyway. The known incidence of cosmic radiation alone, combined with the small process size and sheer density of modern RAM, guarantees you will have regular bit errors, and the consequences are essentially 'random': one time the error could be something you won't even notice, but the next time it could necessitate a full reformat of the machine. Or it might just corrupt an important data file instead. There is no way to predict it.
        If you are doing anything important with the computer this is not acceptable and you should just quit being an idiot and get ECC.
        
        
        Re: (Score:2)
        
        by theArtificial ( 613980 ) writes:
        
        They don't use it on consumer equipment because consumers are so dumb they will buy a cheaper computer without it and think they are getting a better deal, and because it's rationalized that no one (should) use consumer equipment for anything important anyway.
        You might want to take another look at consumer equipment because nearly all ASUS [asus.com] AMD (look at the AM3+) boards have it. Intel is the one that goes out of their way to disable it.
        
        Re:Something something online sorting (Score:4, Informative)
        
        by HornWumpus ( 783565 ) writes: on Wednesday December 25, 2013 @05:25PM (#45783589)
        
        I used ECC on a workstation once. The bios logged ECC fixes. I had 2 over the life of the machine (3 years).
        ECC doesn't hard fault due to parity error. It has the bits to find and fix any single bit flip. That's the point.
        I don't use ECC anymore. Most good (not server grade) MBs do support it.
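The "find and fix any single bit flip" property mentioned above can be sketched with a Hamming(7,4) code. This is only a small illustration of the idea: ECC DIMMs use wider single-error-correcting, double-error-detecting codes over 64-bit words, and the function names `encode` and `decode` here are made up for the example.

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits.
# Illustrative only; real ECC memory uses wider codes over 64-bit words.

def encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # syndrome = 1-based index of the flip
    if position:
        c[position - 1] ^= 1          # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]], position


word = encode([1, 0, 1, 1])
word[4] ^= 1                          # simulate one flipped bit (position 5)
recovered, where = decode(word)
print(recovered, where)
```

The decoder never raises a fault for a single flip; it just repairs the word and can report where the flip was, which is the kind of event a BIOS might log as a corrected error.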
        
        
        Re: (Score:3)
        
        by baffled ( 1034554 ) writes:
        
        Using your data, assuming the key block is 512 bytes out of say 8GB RAM, the odds of the key block being corrupted is 1 in 16.78 million. Even if you have 16 bit errors a year, there's a 1 in a million chance of that happening in a year. Now compare that to other risk scenarios, and how much you invest in protecting against those.
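The odds quoted above can be checked in a few lines. This sketch assumes that 8GB means 8 GiB and that bit errors are independent and land uniformly at random in RAM; neither assumption comes from the post, and the helper name `one_in` is made up.

```python
# Back-of-the-envelope check of the odds quoted in the comment above.
# Assumptions (not from the post): 8 GB means 8 GiB, and bit errors are
# independent and land uniformly at random across RAM.

RAM_BYTES = 8 * 2**30      # 8 GiB of RAM
KEY_BLOCK_BYTES = 512      # size of the "key block" from the post
ERRORS_PER_YEAR = 16       # bit errors per year, as in the post


def one_in(probability):
    """Express a probability as the N in '1 in N'."""
    return 1.0 / probability


# Chance that a single random bit error lands inside the key block.
p_single = KEY_BLOCK_BYTES / RAM_BYTES

# Chance that at least one of the year's errors lands inside the key block.
p_year = 1.0 - (1.0 - p_single) ** ERRORS_PER_YEAR

print(f"per error: 1 in {one_in(p_single):,.0f}")
print(f"per year:  1 in {one_in(p_year):,.0f}")
```

Under those assumptions the numbers come out close to the ones in the post: about 1 in 16.8 million per error, and about 1 in a million per year.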
        
        Re:Something something online sorting (Score:5, Informative)
        
        by brambus ( 3457531 ) writes: on Wednesday December 25, 2013 @05:42PM (#45783669)
        
        You obviously have never torn down a server. I've built thousands.
        Bullshit and here's why:
        The last place I was at paid over $300K for a Sun machine with 128 cores and 1TB RAM. I priced the same machine, with 128 cores and 1TB RAM for something like $20K, but with faster components made for gaming use.
        This is such a load of crap it's hard to fathom you had anything to do with server procurement at any point at all. First, you can't (even today) build a 128-core/1TB RAM box using gaming components, so you're looking at a cluster of smaller boxes vs one big box. That impacts the software infrastructure in a big way. For example it's a vastly different affair to run one big DB instance vs a cluster of 12 little ones (not to speak of the extra money you'll spend on these extra instances). Clusters massively complicate administration, backup, replication, disaster recovery, etc.
        RAM is different. It's claimed they use ECC for the safety of your data. In practice it's so you can't go to the local computer store to buy more.
        Another reason you don't know what you're talking about. ECC absolutely *does* work and bits do flip in memory, which in the absence of ECC can result in data corruption or unplanned machine downtime. I've had the OS detect faulty memory sticks via ECC before.
        Corps tend to buy from the manufacturer because "that's where we got the server, and it was expensive."
        No, they do that because that way you have a valid support contract and can blame problems on a supplier if stuff goes down the drain (as it often does). Obviously you've never had to stand in front of top-brass and try to explain why your multi-million dollar project fell flat on its face because of a few bucks you've decided to save on some el-cheapo memory sticks.
        Box? Well, rackmount for racks, desktop for not-racks. I've seen plenty of people ungracefully stack rackmount boxes on the floor of a corner office, and complain when they need to pull out the bottom one. That's not so different than racks. I've seen people rack mount where they put in a shelf, and then put 10 servers on top of it without ever putting in the rail kits.
        It's not exactly the box's fault when you guys are idiots and stack rack-mount servers.
        With only a very few exceptions, they're the same chipsets, using the same technologies.
        Have you *ever* had a server motherboard in your hands?
        Hell, even the hard drives are gaming, or are making their way there. SCSI was the only way to go, even though SATA overtook the performance long ago. Then they started putting 2.5" SAS drives in, which are laptop SATA drives with a bigger pricetag.
        I give up. How could this shit have been upvoted so much? The performance gap between a 2.5'' server SAS drive [tomshardware.co.uk] vs a 2.5'' laptop SATA [tomshardware.co.uk] drive is *huge*. And that's before we get to the way these things tend to behave in failure scenarios in large-HDD storage arrays (do you even know how a freakin' JBOD works?)
        
        
        Re: (Score:2)
        
        by drsmithy ( 35869 ) writes:
        
        Hell, even the hard drives are gaming, or are making their way there. SCSI was the only way to go, even though SATA overtook the performance long ago.
        Even old U320 SCSI drives have seek times ca. 2/3 those (and consequently higher IOPS) of the fastest SATA drives.
        Then they started putting 2.5" SAS drives in, which are laptop SATA drives with a bigger pricetag.
        You are utterly clueless.
        You've only got to hold an enterprise SAS drive and a consumer SATA laptop drive in each hand to know they have to be manufa
  - Re: (Score:2)
    
    by the_B0fh ( 208483 ) writes:
    
    Curious why do you think GPUs are error prone?
  - - Re: (Score:2)
      
      by greenfruitsalad ( 2008354 ) writes:
      
      Don't all modern databases run in RAM now? Is I/O therefore still a limiting factor?
      - Re: (Score:2)
        
        by Oligonicella ( 659917 ) writes:
        
        No. Yes.
        
        Re: (Score:2)
        
        by dwater ( 72834 ) writes:
        
        Perhaps he should have asked the question slightly differently...I seem to recall it being a selling point for big SGI machines that they could potentially hold the entire database in RAM. I suppose it totally depends on how big the database is...looking, it seems SGIs can have up to 64 TB of RAM (though the wording is unclear).
        If the database fits into that amount of RAM, then wouldn't that mean I/O is not a limiting factor?
        I have to wonder what happens when there's an unexpected failure...wouldn't the cha
- Not true (Score:5, Insightful)
  
  by kervin ( 64171 ) writes: on Wednesday December 25, 2013 @12:07PM (#45782127)
  
  ...because I/O is the limiting factor of database performance, not compute power?
  Just a few projects into Database Performance Optimization would convince you that's not a true statement. IO/Memory/CPU are in fact largely interchangeable resources on a database. And depending on your schema you can just as easily run out of any of these resources equally.
  For instance, I'm currently tuning a SQL Server database that's CPU heavy based on our load projection targets. We could tweak/increase query caching that would cause more resultsets to stay in memory. This would mean that less complex queries would be run, drastically reducing I/O and some CPU resource usage. But then drastically increasing memory usage. This is just a simple example of course to illustrate the point.
  Databases run out of CPU resources all the time. And a CPU advancement would be very well received.
  My guess as to why this hasn't been done is that it would require end-users to start buying/renting/leasing GPU enabled hardware for their Database infrastructure. This would be a huge change from how we do things today and this sector moves very slowly.
  Also we have many fairly old but more important Database advancements which have been around for years and are still almost unusable. If you ever tried to horizontally scale most popular Open-source databases you may know what I'm talking about. Multi-master, or just scaling technology in general, is required by about every growing "IT-dependent" company at some point. But that technology ( though available ) is still "in the dark ages" as far as I'm concerned based on reliability and performance measurements.
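The query-caching trade-off described in the comment above (spend memory to save CPU and I/O) can be sketched at the application level. This is not how SQL Server caches anything internally; the table, the query function and the cache size are all hypothetical stand-ins.

```python
from functools import lru_cache

# Hypothetical stand-in for a table: (id, group) rows.
ROWS = [(i, i % 10) for i in range(100_000)]
scans = 0   # counts how many full "table scans" were actually executed


@lru_cache(maxsize=128)   # memory budget: keep up to 128 result sets
def count_by_group(group):
    """Pretend query: SELECT COUNT(*) FROM rows WHERE grp = :group."""
    global scans
    scans += 1
    return sum(1 for _, g in ROWS if g == group)


for _ in range(1000):
    count_by_group(3)     # only the first call scans the table

print(scans, count_by_group.cache_info())
```

Raising `maxsize` keeps more result sets resident and avoids more scans; lowering it gives the memory back and pushes the work onto CPU and I/O again.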
  
  - Re: (Score:2)
    
    by Bengie ( 1121981 ) writes:
    
    Rule of thumb: if your dataset can fit in memory, it probably won't benefit from GPUs. I'm talking about 10TB+ datasets and a few long-running data-warehouse-style queries, not small OLTP-style queries. GPUs take a crap if you have any branching, so to be useful, queries must not have conditions that send different rows down different branches, which means very basic WHERE clauses.
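The branching point above can be illustrated with a toy cost model. It assumes a warp of 32 threads runs in lockstep (as on NVIDIA GPUs) and pays for every branch path that at least one of its threads takes; real hardware is more nuanced, and the function name and the per-path costs are made up.

```python
# Toy lockstep (SIMT) cost model. Assumption: a warp pays for every branch
# path that at least one of its threads takes. Real hardware is subtler.

WARP_SIZE = 32   # threads per warp on NVIDIA GPUs


def warp_cost(rows, predicate, cost_true=10, cost_false=10):
    """Instruction slots spent filtering `rows`, one row per thread."""
    total = 0
    for start in range(0, len(rows), WARP_SIZE):
        warp = rows[start:start + WARP_SIZE]
        outcomes = {bool(predicate(row)) for row in warp}
        if True in outcomes:
            total += cost_true    # whole warp steps through the "match" path
        if False in outcomes:
            total += cost_false   # ...and through the "no match" path
    return total


rows = list(range(1024))

# Clustered data: each warp sees only matches or only non-matches.
uniform = warp_cost(rows, lambda r: r < 512)

# Interleaved data: every warp sees both outcomes, so it pays for both paths.
divergent = warp_cost(rows, lambda r: r % 2 == 0)

print(uniform, divergent)
```

In this model the interleaved predicate costs twice as much as the clustered one even though both select exactly half the rows, which is why data-dependent conditions hurt more than the selectivity alone suggests.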
  - true. large, busy Wordpress CPU bound (Score:2)
    
    by raymorris ( 2726007 ) writes:
    
    Indeed. We have a large WordPress based site and it is bound by database CPU despite the fairly powerful CPU it uses. It should scale to many cores, so I'm thinking of trying a pair of the 8 core AMD processors. Intel is faster PER CORE, but an AMD rig could have 16 cores.
  - - Re: Not true (Score:3)
      
      by dgatwood ( 11270 ) writes:
      
      Fully interchangeable, no, but they are somewhat so. It's more like saying that the transmission and engine are interchangeable. In a literal sense, it isn't true—neither can do the other's job—but you can make up for a weak engine by adding more gears.
- Re: (Score:3)
  
  by fzammett ( 255288 ) writes:
  
  Very good point, entirely correct. However... for an in-memory database I wonder if there are gains to be had? I'm not sure CPU-memory I/O is much of a bottleneck, though such DBs aren't suitable to every task of course.
  - Re: (Score:2)
    
    by K. S. Kyosuke ( 729550 ) writes:
    
    Even with in-memory databases, most of the work is simple computation over a large amount of data, often with random access. GPUs like a lot of computation with streaming data. Also: data copying, virtual memory support. But perhaps Kaveri and its successors will be more useful for that.
- Re: (Score:2)
  
  by CODiNE ( 27417 ) writes:
  
  In other words... For databases that fit in memory GPU makes a lot of sense. For really large data sets the limit is how fast you can get the data off the hard disk.
  But what "io bottleneck" people may be missing is that an io bound server could still benefit from this if the freed up CPU time can be used for other things when it's not shuttling data to and from the GPU. It also could end up saving a lot of energy, and that's money.
  - Re: (Score:2)
    
    by marcosdumay ( 620877 ) writes:
    
    Except that GPUs are bad for most of the tasks a database does. Normally, databases require random memory access (not mapping arrays) and complex selection rules. GPUs are best at doing maps over contiguous arrays, with very simple (best if none) conditional cases.
  - Re: (Score:3)
    
    by Bengie ( 1121981 ) writes:
    
    For databases that fit in memory GPU makes a lot of sense.
    A bit more selective than that. For datasets that fit in memory, where memory patterns are sequential, and the queries have almost no branching. GPUs are very picky.
- Re:Something something online sorting (Score:5, Informative)
  
  by fatphil ( 181876 ) writes: on Wednesday December 25, 2013 @01:25PM (#45782433) Homepage
  
  Read the paper - page 7 (which bizarrely doesn't render clearly for me at all, and I can't copy/paste)
  "Scale Factor 1 (SF 1) ... data fits in GPU memory"
  
  They ran the TPC-H ("H"="Huge") with a dataset that was ABSOLUTELY FUCKING TINY.
  
  No, I'm not shouting at you, I'm shouting at the fucking bogus pseudo-academics who wanted to bullshit with micro-optimisation rather than making actual advancements in the field of databases.
  
  Frauds.
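The size complaint above can be put into rough numbers. The sketch assumes the usual TPC-H rule of thumb that scale factor N is about N GB of raw data, a 6 GB card (the 2013 GeForce GTX Titan), and a made-up 2x allowance for intermediate results; the function name is hypothetical.

```python
# Rough check of which TPC-H scale factors could stay resident on one GPU.
# Assumptions: scale factor N is about N GB of raw data, the card has 6 GB
# of memory, and intermediates double the footprint (a made-up allowance).

GPU_MEMORY_GB = 6.0


def fits_on_gpu(scale_factor, gpu_memory_gb=GPU_MEMORY_GB, overhead=2.0):
    """True if the data plus working space fits in GPU memory."""
    return scale_factor * overhead <= gpu_memory_gb


for sf in (1, 10, 100, 1000):
    print(f"SF {sf}: fits = {fits_on_gpu(sf)}")
```

Under these assumptions only the smallest scale factor stays resident on the card; anything larger has to be streamed over the bus, which is where the I/O argument from the rest of the thread comes back in.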
  
  - Re:Something something online sorting (Score:5, Interesting)
    
    by TheRaven64 ( 641858 ) writes: on Wednesday December 25, 2013 @02:51PM (#45782849) Journal
    
    No, I'm not shouting at you, I'm shouting at the fucking bogus pseudo-academics who wanted to bullshit with micro-optimisation rather than making actual advancements in the field of databases.
    Any paper that does X on a GPU generally fits into this category. It's not science to run an existing algorithm on an existing Turing-complete processor. At most it's engineering. But it's a fairly easy way to churn out papers. Doing X 'in the cloud' or 'with big data' have a similar strategy. It's usually safe to ignore them.
    
  - Re: (Score:2)
    
    by ddt ( 14627 ) writes:
    
    Nice catch, Fatphil!
    Also, writing, debugging, and maintaining GPU code is a lot less fun than CPU code. How much open source GPU code do you know of that is still in use after 5 years?
    - Re: (Score:2)
      
      by fatphil ( 181876 ) writes:
      
      Someone else spotted this before me, it appears; I replied to him below, something-dad or something-lad.
      
      One of the problems is that things are just a little bit too new to be tested for longevity. Fragmented architectures plus chipset vendors pushing separate languages didn't help.
      
      To be honest, with modern GPUs, I believe that I was born too early. I was massively getting into optimisation 90s to early 00s. I'm tired of dicking about with all that kind of stuff now. And the kids these days have got it way t
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Bah, pesky facts! Don't you know that the latest buzzwords have to be accepted unquestioningly to be truly hip (and utterly incompetent)?
- - Re: (Score:2)
    
    by K. S. Kyosuke ( 729550 ) writes:
    
    Things like joins and sorts are compute expensive, and can easily become the major bottleneck when most of your hot data fits in memory
    What kinds of joins on data that fits into a computer's operating memory get actually accelerated by something that has its own (seriously limited) physical address space and needs serious data copying to get proper random access to the data in the first place? (Not to mention the effective call latencies.)
    (Oh, BTW, and when did "compute" become a noun? Nouning Latin verbs feels ridiculously retarded.)
- - Re: (Score:3)
    
    by BLKMGK ( 34057 ) writes:
    
    What does this have to do with a machine having a GPU? You can equip a machine with a GPU and not hook a monitor to it - still works for processing just fine...
    - Re: (Score:2)
      
      by TrollstonButterbeans ( 2914995 ) writes:
      
      Hint: I was being absurd.
    - - Re: (Score:2)
        
        by BLKMGK ( 34057 ) writes:
        
        Hardware passthru, I already do this for RAID cards on my ESX setup. I know others who do it with GPU, no biggie.
        
        Re: (Score:2)
        
        by drsmithy ( 35869 ) writes:
        
        Hardware passthru, I already do this for RAID cards on my ESX setup.
        I'm kind of curious about the use case for this ?
        
        Re: (Score:2)
        
        by BLKMGK ( 34057 ) writes:
        
        On my server? I have 3 SAS cards. Two of them are passed through to a VM along with a USB bridge so that it can see its license key and run its NAS software just as if it was on a standalone machine. That software is unRAID and it works well for mass storage that need not be fast and saves power by sleeping drives when not in use. Been using it for years on standalone hardware with a cat and dog collection of drives totaling about 30TB..
        The 3rd SAS card is passed through to another VM running FreeNAS. Fre
Cost? Time? Hardware? Skill? (Score:5, Interesting)

by AHuxley ( 892839 ) writes: on Wednesday December 25, 2013 @11:10AM (#45781891) Journal

The people with the skills have day jobs and want to enjoy time off with other projects.
The people with the skills have no jobs and want to write the code but the hardware is too expensive.

Because SQL is basically dead (Score:3, Insightful)

by Maury Markowitz ( 452832 ) writes: on Wednesday December 25, 2013 @11:12AM (#45781907) Homepage

The R&D effort in the SQL field is roughly zero, so it's not surprising people aren't keeping up with the latest developments in the hardware field.
It's bad enough that the only standardized access system is ODBC, designed 25 years ago when pipes were short and thin and a WAN was the next building over. If we can't get that problem fixed, what's the hope for integrating new technologies?

- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  The R&D effort in the SQL field is roughly zero, so it's not surprising people aren't keeping up with the latest developments in the hardware field.
  Except for the part where errybody's keeping up with the latest developments. They're just actually looking at developments that matter. GPUs... Do not matter. If you want to know more, check the first post.
  Processing power is inconsequential compared to I/O. RAM is pretty straightforward; newer, faster RAM comes out, larger amounts become cheaper, you buy it, you throw it into the mix.
  The cool stuff is happening around SSDs (which are also pretty straightforward), solid state memory devices (think
- Re: (Score:3)
  
  by houstonbofh ( 602064 ) writes:
  
  Run a big query on your database. Now, while the hard drive light is solid red, look at your CPU load. See how it is not using all the CPU because it is waiting on the hard drive? A GPU will not help that.
  - - Re: (Score:2)
      
      by advocate_one ( 662832 ) writes:
      
      16 gigs of fast RAM as well would be a good boost... but with really big databases, you go parallel... with multiple machines running against small subsets of the data at the same time...
      I'm surprised we haven't seen FPGAs being deployed instead of GPUs...
      - Re: (Score:3)
        
        by TheRaven64 ( 641858 ) writes:
        
        16GB of RAM? What is this, 2000? My laptop has 16GB of RAM, the machines in the rack have 256GB, and they're a year old (and due an upgrade). If you're running a database on something with 16GB of RAM, you either don't have much data or you're seriously skimping on hardware.
    - Re: (Score:2)
      
      by EvilAlphonso ( 809413 ) writes:
      
      I haven't seen the CPU as the bottleneck on any of the DB servers I have administered in the last 4 years, except on seriously under-spec'd systems. The most CPU intensive DB at work is peaking at 3 cores out of 24, but maxes on IOPS (8GB link to an auto-tiered SAN, 50% SSD disk pool) and RAM (256GB) throughout the entire job.
    - Re: (Score:2)
      
      by houstonbofh ( 602064 ) writes:
      
      No, but a multi-channel SSD RAID would. Expensive, yes, but certainly possible. Then where is your bottleneck?
      Next would be I/O to the graphics card, as mentioned in the article. But in general, the bus is the next bottleneck.
      However, on a server where you pay by CPU, having a non-CPU extension makes a lot of sense. This would be why proprietary systems have GPU extensions and open source systems do not.
      In short: if you need CPU in open source, get another CPU; it's cheap. If you need CPU in a closed source application, you get a GPU: it doesn't work as well as a CPU, but it adds performance and doesn't incur more licensing fees.
      Are you actually using crappy licensing as a reason to use alternative hardware? And when the license is open and the only restriction is technical, you can use a cheaper solution, and that is a bad thing?
"Them"? (Score:3)

by FaxeTheCat ( 1394763 ) writes: on Wednesday December 25, 2013 @11:12AM (#45781909)

so what's holding them back?
Wrong question. It is open source. If you need it, you fix it.

- Re: (Score:2)
  
  by houstonbofh ( 602064 ) writes:
  
  so what's holding them back?
  Wrong question. It is open source. If you need it, you fix it.
  No, it is the right question. And the answer is, the people who actually understand how these things work also know this will not help anything in real-world applications. They are also busy optimizing for additional cheap RAM, and the new and fast SSD cards that are almost affordable.
Risk aversion. (Score:2, Interesting)

by Anonymous Coward writes:

Because a lot of us have personal experience on how "reliable" GPU calculations are.

A few screen "artifacts" tend to be less painful than db "artifacts". Maybe things have changed. But it's not been that long since nvidia had a huge batch of video cards that were dying in all sorts of ways.

As for AMD/ATI, I suspect you'd normally use some of their crappy software when doing that GPU processing.
You just answered your own question (Score:5, Insightful)

by vadim_t ( 324782 ) writes: on Wednesday December 25, 2013 @11:15AM (#45781927) Homepage

"Many queries that I write are simpler than TPC-H, so what's holding them back?" -- simple queries don't need acceleration.
A "SELECT * FROM users WHERE user_id = 12", or a "SELECT SUM(price) FROM products" doesn't need a GPU; it's I/O bound and would benefit much more from having plenty of cache memory and an SSD. A lot of what things like MySQL get used for is forums and similar, where queries are simple. The current tendency seems to be to use the database as an object store, which results in a lack of gnarly queries that could be optimized.
I do think such features will eventually make it in, but this isn't going to benefit uses like forums much.
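
To make the two quoted queries concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schemas, row counts, and prices are assumptions made up for illustration; only the two query strings come from the comment above. It shows how little arithmetic each query involves: the first touches a single row through the primary key, and the second is one addition per row in a single pass.

```python
import sqlite3

# Hypothetical schema: table and column names follow the two quoted queries;
# the row counts and prices are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, price INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 1001)])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 101)])

# Point lookup: resolved through the primary key, so only one row is read.
user = conn.execute("SELECT * FROM users WHERE user_id = 12").fetchone()

# Aggregate: one sequential pass over the table, one addition per row.
total = conn.execute("SELECT SUM(price) FROM products").fetchone()[0]

# Ask SQLite how it plans to run the lookup. The plan text differs between
# SQLite versions, so it is only printed here, not parsed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE user_id = 12").fetchall()

print(user)
print(total)
print(plan)
```

On a real server the cost of either query is dominated by fetching pages from disk or cache, which is the part a GPU does not speed up.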

- Re:You just answered your own question (Score:5, Insightful)
  
  by tranquilidad ( 1994300 ) writes: on Wednesday December 25, 2013 @11:37AM (#45781997)
  
  This...
  If you go beyond the abstract and read the paper you'll notice that they chose a TPC-H scale factor of 1 (1 GB of data) so that the entire dataset would fit in the GPU.
  The question they seem to really be asking is more akin to, "Why don't we make our datasets small enough for complex queries that it can all fit in the storage attached to a processor we like?"
  They continue to answer their own question when discussing results and admit they can't compare costs of "traditional" implementations because those tests were all run with scale of 100 (100 GB of data).
  They say the comparison is difficult against complete systems because of the scaling factor and "...this paper is about the effectiveness of mapping relational queries to utilize the compute throughput [of] GPUs".
  So, it seems to boil down to a test of compute power on data sets small enough to fit in memory rather than an effective test of relational query processing, though they did use relational queries as their base testing model.
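
The scale-factor point can be put into rough numbers. The sketch below is a back-of-the-envelope check, not anything taken from the paper: it assumes about 1 GB of raw data per unit of TPC-H scale factor (as described above), 6 GB of on-board memory for a GTX Titan class card, and it ignores the extra room needed for intermediate results.

```python
import math

# Assumptions for this sketch (not from the paper):
GPU_MEMORY_GB = 6.0          # on-board memory of a GTX Titan class card
GB_PER_SCALE_FACTOR = 1.0    # raw TPC-H data per unit of scale factor


def fits_on_gpu(scale_factor, gpu_memory_gb=GPU_MEMORY_GB):
    """Return True if the raw data set fits in GPU memory in one piece."""
    return scale_factor * GB_PER_SCALE_FACTOR <= gpu_memory_gb


def min_partitions(scale_factor, gpu_memory_gb=GPU_MEMORY_GB):
    """Smallest number of equal pieces such that each piece fits on the card."""
    return math.ceil(scale_factor * GB_PER_SCALE_FACTOR / gpu_memory_gb)


for sf in (1, 100):
    print(sf, fits_on_gpu(sf), min_partitions(sf))
```

Under these assumptions the scale factor 1 data set fits in one piece, while a scale factor 100 data set would have to be streamed through the card in at least 17 pieces, each paying the transfer cost.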
  
  - Re: (Score:2)
    
    by fahrbot-bot ( 874524 ) writes:
    
    So, it seems to boil down to a test of compute power on data sets small enough to fit in memory rather than an effective test of relational query processing, though they did use relational queries as their base testing model.
    Or... Just because you can do something, doesn't mean you should.
  - Re: (Score:2)
    
    by fatphil ( 181876 ) writes:
    
    Exactly!
    
    "They say the comparison is difficult against complete systems because of the scaling factor[...]"
    
    The TPC go a little bit further:
    """
    Note 1: The TPC believes that comparisons of TPC-H results measured against different database sizes are misleading and discourages such comparisons. The TPC-H results shown below are grouped by database size to emphasize that only results within each group are comparable.
    """
    
    Their toy is simply irrelevant in the field of real world databases.
- Re: (Score:2)
  
  by locopuyo ( 1433631 ) writes:
  
  SELECT * FROM orders ORDER BY amount
  For huge sorting queries, the GPU destroys the CPU.
They're coming... (Score:4, Informative)

by Heretic2 ( 117767 ) writes: on Wednesday December 25, 2013 @11:26AM (#45781965)

MapD is a GIS-centric database. [istc-bigdata.org]

What's holding them back? (Score:2)

by Culture20 ( 968837 ) writes:

Many queries that you write are simpler than TPC-H. Necessity is the mother of invention.
Why not? (Score:4, Funny)

by Black Parrot ( 19622 ) writes: on Wednesday December 25, 2013 @11:49AM (#45782043)

It's waiting for you to get on it.

Hardware costs are limiting factor (Score:2)

by dkf ( 304284 ) writes:

What's holding them back? I'd have thought it was obvious!
The big issue with GPGPU for DB work is that you have to have the DB entirely in memory or your performance will suck (even SSDs aren't that fast). To get a big database to work in such a scenario, you have to split it into many smaller pieces, but that makes working with these sorts of things expensive even with an open source DB. The paper even says this. That makes this sort of work only really interesting for people with significant budgets, and
Improvements have to come a few at a time (Score:2)

by leandrod ( 17766 ) writes:

All of these SGBDs are actually toys being sold for more than they are capable of. So developers there have to try to catch up to PostgreSQL before it becomes (even) easier to use and eats their lunch.
Meanwhile, the issues meriting scarce development and, mainly, review time at PostgreSQL are more interesting than accelerating a few workloads in hardware which is not yet in the servers out there. Things like making PostgreSQL even easier to install, set-up and manage, even more ISO SQL compliant, even more
- Re: (Score:2)
  
  by cyber-vandal ( 148830 ) writes:
  
  SGBD = DBMS en Anglais ;-)
  - Re: (Score:2)
    
    by leandrod ( 17766 ) writes:
    
    Thank you, even if I fear it is too late to fix.
    While I do speak French too, the mistake is probably from my native (Brazilian) Portuguese.
    - Re: (Score:2)
      
      by cyber-vandal ( 148830 ) writes:
      
      Sorry French results were the first on Google when I wondered what SGBD meant :-)
      - Re: (Score:2)
        
        by leandrod ( 17766 ) writes:
        
        No prob at all.
It depends (Score:5, Funny)

by Waffle Iron ( 339739 ) writes: on Wednesday December 25, 2013 @12:01PM (#45782109)

Research shows that there is good news and bad news on this approach.
The good news: Certain SQL queries can get a massive speedup by using a GPU.
The bad news: Only a small subset of queries got any benefit. They generally looked like this:
SELECT pixels FROM characters JOIN polygons JOIN textures ON characters.character_id = polygons.character_id WHERE characters.name = 'orc-wielding-mace' AND textures.name = 'heavy-leather-armor' AND color_theme = 'green' ORDER BY y, x

- Re: (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  ORDER BY z ... sorry
there are certainly CPU-bound databases (Score:2)

by hedrick ( 701605 ) writes:

I'm responsible for a large university learning management system (Sakai). The database is completely CPU limited. I assume that's because the working set of data fits in memory. I would think lots of university and enterprise applications would be similar. Another data point is the experiments done on a no-SQL interface to InnoDB. That shows very large speedups. Surely some of this is due to the CPU overhead in processing SQL.
- Re: (Score:2)
  
  by marcosdumay ( 620877 ) writes:
  
  I assume that's because the working set of data fits in memory.
  As memory access counts as CPU time, not I/O, doing any query on a dataset that is in memory will be CPU bound. But that does not mean that you'll get improvements by adding CPU speed.
- Re: (Score:2)
  
  by Arker ( 91948 ) writes:
  
  As the other poster pointed out, given that your set fits in memory, it's going to appear to be CPU bound. It still probably is not, however. Memory access is still likely to be the actual bottleneck.
Plenty to do first... (Score:2)

by fostware ( 551290 ) writes:

Besides datasets not fitting into GPGPU memory, and I/O bottlenecks, I'm still seeing plenty of badly written SQL.
A current contract has plenty of SQL work (not for me though), and the bulk of their time is cleaning up data exceptions, badly written report queries, and moving oft-used or large-dataset queries to stored procedures. GPGPUs will hide some of the rot, but if the SQL was written better in the first place, we'd be able to use parallelism and make better use of existing commodity hardware in clients virtua
- Re: (Score:2)
  
  by Grishnakh ( 216268 ) writes:
  
  This might be a stupid question as I'm not a DB expert, but isn't the problem of badly-written SQL something that could be mitigated by improvements in the SQL parser of an RDBMS? Other programming language compilers are frequently designed to optimize output code despite non-optimal constructs written by programmers. It seems to me that some of the improvements you talk of could be automated, especially moving oft-used queries to stored procedures.
  - Re: (Score:2)
    
    by fostware ( 551290 ) writes:
    
    I honestly don't know of any decent SQL optimisers...
    I know MS SQL Management Studio has SQL Profiler, Index Tuning Advisor, and Database Performance Tuning Advisor.
    But there's nothing in Aqua Data Studio that works with PostgreSQL, which means co-workers and I must rely on good looks and mad skillz (I'm only passable on both)
  - Re: (Score:2)
    
    by Bengie ( 1121981 ) writes:
    
    When your queries start getting into the 10 table joins, the join optimizer starts to attempt to make educated guesses because of the number of possible join arrangements. The metadata used is based on samples of the current data. To mitigate having to keep this metadata perfectly up to date, which would be very expensive and slow, the RDBMS only samples a subset.
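
To give a feel for "the number of possible join arrangements", here is a small sketch of one common textbook way of counting them. It is an illustration under stated assumptions, not a description of how any particular RDBMS enumerates plans: left-deep plans are counted as the n! orderings of the tables, and bushy plans as n! times the Catalan number C(n-1), which simplifies to (2(n-1))! / (n-1)!. The function names are hypothetical.

```python
import math


def left_deep_join_orders(n_tables):
    """Number of left-deep join orders: every permutation of the tables."""
    return math.factorial(n_tables)


def bushy_join_orders(n_tables):
    """Number of bushy join trees: n! * Catalan(n - 1) = (2(n-1))! / (n-1)!."""
    return math.factorial(2 * (n_tables - 1)) // math.factorial(n_tables - 1)


for n in (3, 5, 10):
    print(n, left_deep_join_orders(n), bushy_join_orders(n))
```

At 10 tables that is already millions of left-deep orders and billions of bushy trees, which is why optimizers fall back on sampled statistics and heuristics instead of costing every plan.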
    
    While this works most of the time, there are some cases that don't. I've had quite a few times where I had to force join orders and/or join ty
    - Re: (Score:2)
      
      by Grishnakh ( 216268 ) writes:
      
      See, (again I'm speaking from a position of relative ignorance here) it seems like the RDBMS should be intelligent enough to figure this stuff out automatically, instead of requiring an in-house expert. It should be adaptive and learn from the current usage patterns, in relation to the data it stores. So if, for instance, breaking the query up and using temp tables speeds things up, the DB should figure this out and do it automatically. It wouldn't work for one-time queries, but if the same kind of queri
      - Re: (Score:2)
        
        by Bengie ( 1121981 ) writes:
        
        Many of these optimizations that are done "manually" can be done because I know certain things that the RDBMS does not know about the use case. It can guess about things and use current metadata, but those guesses are not always correct.
        
        Let's make an example. Say table A is a small table with a relation to table B, and table B is several orders of magnitude larger than A. Now say table B has a relation to table C, but table C is only a few factors larger than B.
        
        Lets assume there is also a reverse, where tabl
Postgres (Score:3)

by slackergod ( 37906 ) writes: on Wednesday December 25, 2013 @01:05PM (#45782355) Homepage Journal

Looks like exactly what PostgreSQL's PGStrom [postgresql.org] project is trying to achieve.

Conspicuous omission - PostgreSQL (Score:2)

by bill_mcgonigle ( 4333 ) * writes:

it doesn't seem like any code has made it into Open Source databases like MonetDB, MySQL, CouchDB, etc.
Lemme guess, MySQL fanatic?
You can already go download:
https://wiki.postgresql.org/wiki/PGStrom [postgresql.org]
if it fits your problem domain and PostGIS has some hackers adding GPU support:
http://data-informed.com/fast-database-emerges-from-mit-class-gpus-and-students-invention/ [data-informed.com]
Why not the others? Perhaps because PostgreSQL makes developing extensions easier - it's got the largest extensio
GPU not 7 times faster than 32 CPU cores (Score:3)

by loufoque ( 1400831 ) writes: on Wednesday December 25, 2013 @02:29PM (#45782749)

A GPU, even a GTX Titan, simply isn't 7 times faster than a modern 32-core x86 CPU in real life. Most of the gain probably comes from just general optimization that could have been done on the CPU too.

The economics are wonderful (Score:3)

by Groo Wanderer ( 180806 ) writes: <charlie@semiaccura[ ]com ['te.' in gap]> on Wednesday December 25, 2013 @03:04PM (#45782927) Homepage

Gee, a $1000 GPU that runs 7x as fast as 1/8th of a $1500 CPU. It would be a good idea if you didn't need that CPU to run it, but just barely so. If you cheap out on the CPU and only spend ~$750 on it, assuming there is no slowdown on the GPU because of it, then the economics break. And people wonder why GPU compute on databases isn't catching on.
Then there is the power use aka TCO/running costs to think about. And everything mentioned above. And.... This study has all the hallmarks of an Nvidia research project whose targets are financial analysts rather than potential customers. The science is fine but that is not the intent.
-Charlie

Interesting but flawed paper (Score:2)

by sloth jr ( 88200 ) writes:

This is clearly the question that corporate co-authors Nvidia and Logicblox hoped you would ask.

The paper seems to represent more of an evolutionary rather than revolutionary approach, but suffers from some unfortunate hand-waving, particularly in their attempt to negate the real cost of memory->PCIe transfers (to their credit, at least they call out that latency), their unwillingness to perform comparisons on like-to-like base hardware, and their rather odd choice of front-end environment. Coupled with
ARM64 in the data centre? (Score:2)

by ChunderDownunder ( 709234 ) writes:

Typical responses above:
(a) DB operations aren't CPU intensive
(b) Servers don't come with dedicated graphics cards of any note
(c) Loading each server with an AMD or Nvidia card would increase power usage
So in summary, certain operations may benefit from using GPUs, but there's not a cost-effective solution to warrant such experimentation.
I'd be surprised if ARM haven't sponsored cloud research into OpenCL on the Mali GPUs.
- Re: (Score:3)
  
  by laffer1 ( 701823 ) writes:
  
  One problem is OS and toolchain support. You might get something together for Windows, OS X and Linux, but that's where the buck stops.
  The next problem is that standalone compute cards are rather expensive and putting in a high power GPU has considerable power requirements. Then most server racks are full of 1u wonders not designed to get rid of heat or even hold a huge AMD or NVIDIA GPU.
  Open source databases are great, but they're often pushed as a cost savings to companies. To turn around and buy extra
  - - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2)
        
        by michaelmalak ( 91262 ) writes:
        
        1+1 occasionally equaling 4
        
        Are you referring to the lack of ECC RAM on consumer grade GPUs or are you saying you know of FDIV or overclocking style unreliability in the compute engines themselves?
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
    - Re: (Score:2)
      
      by countach74 ( 2484150 ) writes:
      
      Sorry, had to chime in on your claim that price would come down if they became popular. That is contrary to the Law of Supply and Demand. Top-end GPUs are expensive because there is demand for the latest and greatest, which then justifies the cost of making very expensive, cutting-edge cards.
- Re: (Score:3)
  
  by fuzzyfuzzyfungus ( 1223518 ) writes:
  
  Most servers do not have powerful GPUs, and that is where heavy production databases are run.
  Servers turn over comparatively quickly, though (sure, every shop has ol' reliable trucking away on the 13GB SCSI drive that was pretty cool when it left the factory, doing something obscure but vital; but the population as a whole churns faster than that), and servers with nice chunks of PCIe (typically intended for your zippy network cards or fancy storage HBAs; but they are perfectly normal PCIe slots) aren't at all difficult to find. Nor has (Nvidia in particular, AMD trailing a touch) Team Graphics bee
  - Re: (Score:3)
    
    by houstonbofh ( 602064 ) writes:
    
    But for that money, more ram or faster drives makes more of a difference...
    - Re: (Score:2)
      
      by fuzzyfuzzyfungus ( 1223518 ) writes:
      
      Oh, with the exception of dedicated GPU compute setups, definitely, that's why the servers in use are configured as they are. My point was not that servers should have more GPU power; but that (if a change in software made doing so a good idea) the existing hardware wouldn't provide too much 'inertia' to stop or slow adoption.
      
      There doesn't seem to be too much interest, on the whole; but if one were interested they could change the composition of their servers in fairly short order; and a broader shift co
- Re: (Score:2)
  
  by PPH ( 736903 ) writes:
  
  IT staff needs GPUs to play Crysis. Your DBMS gets a lower priority.
- Re: (Score:2)
  
  by StripedCow ( 776465 ) writes:
  
  so while the computation on the GPU may be 10x faster... feeding the data in/out is 10x slower, meaning it did not do anything for you, except require a lot of extra coding complication to use it. ...
  Benchmarks tend not to look like real-world queries, and often you can do something that helps a benchmark but does nothing in the real world.
  But what if the benchmark is larger than the memory size of the GPU? I don't know the actual size, but I guess they use at least realistic amounts of data (larger than the memory of the GPU card), so that would prove your theory wrong!
  By the way, there's more to databases than just queries. Skimming through the abstract, I see that they only address speeding up the queries. The commit phase of a database is also interesting, but they don't seem to address it.
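
The "10x faster compute, slower data feed" argument can be sketched as a simple timing model. Everything here is an assumption for illustration: the function name, the 1 second of CPU work, the 10x kernel speedup, the 1 GB moved, and the 8 GB/s link are made-up numbers, not measurements from the paper. The model just adds the host-to-device transfer time to the accelerated compute time.

```python
def end_to_end_speedup(cpu_compute_s, gpu_compute_speedup,
                       bytes_moved, link_bytes_per_s):
    """Speedup once the time spent moving data to and from the GPU is included."""
    gpu_compute_s = cpu_compute_s / gpu_compute_speedup
    transfer_s = bytes_moved / link_bytes_per_s
    return cpu_compute_s / (gpu_compute_s + transfer_s)


# Hypothetical workload: 1 s of CPU work, a 10x faster GPU kernel,
# 1 GB moved over a link that sustains 8 GB/s.
with_transfer = end_to_end_speedup(1.0, 10.0, 1e9, 8e9)

# Same workload if the data were already resident on the card.
resident = end_to_end_speedup(1.0, 10.0, 0.0, 8e9)

print(round(with_transfer, 2), round(resident, 2))
```

With these numbers the transfer takes longer than the GPU computation itself, so the 10x kernel speedup shrinks to well under 5x end to end; how much survives in practice depends on how much data each query has to move.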
- Re: (Score:2)
  
  by mc6809e ( 214243 ) writes:
  
  ...maybe it has something to do with the fact that it's called a Graphics Processing Unit? Why the fuck are we using them as CPUs?
  We use them as CPUs because we don't suffer from that cognitive bias [wikipedia.org] known as functional fixedness. [wikipedia.org]
- Re: (Score:2)
  
  by Maow ( 620678 ) writes:
  
  Don't use open source db. Use SQL Server for security and speed.
  I agree, simply because I'm paid 50 cents to post this.
  How much were you paid?
