Databases | Open Source | Oracle | Upgrades

Oracle Claims Dramatic MySQL Performance Improvements

New submitter simula67 writes "Oracle wins back some karma from the open source community by releasing MySQL Cluster 7.2 with ambitious claims of 70x performance gains. The new release is GPL, and Oracle claims it has processed over 1 billion queries per minute. Readers may remember the story about Oracle adding commercial extensions to MySQL."
  • by micheas ( 231635 ) on Thursday February 16, 2012 @02:12PM (#39064007) Homepage Journal
    If you write a query in MySQL with an IN clause in the WHERE clause and put a subquery inside the IN, current versions will run the subquery once for each row of the primary table you are querying. Caching that result alone would probably get the 70x speedup. I suspect that there are other performance stupidities in MySQL that are worked around by people doing a simple query and then using PHP/Perl/Python/Java/etc. to parse the result and generate the second query.
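
    For illustration, a minimal sketch of the pattern being described (the orders/customers schema here is hypothetical):

      -- On older MySQL, the optimizer turns this IN subquery into a dependent
      -- subquery and re-executes it for every row of the outer table:
      SELECT o.id
      FROM orders o
      WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.country = 'US');
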
    • by Anonymous Coward on Thursday February 16, 2012 @02:22PM (#39064183)

      If you write a query in MySQL with an IN clause in the WHERE clause and put a subquery inside the IN, current versions will run the subquery once for each row of the primary table you are querying. Caching that result alone would probably get the 70x speedup. I suspect that there are other performance stupidities in MySQL that are worked around by people doing a simple query and then using PHP/Perl/Python/Java/etc. to parse the result and generate the second query.

      Yes, but many database engines have problems with inner SELECT statements. Refactor your query to use a join and you'll be fine.
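
      A minimal sketch of that rewrite, against the same hypothetical schema:

        -- Assuming customers.id is the primary key, this join returns the same
        -- rows as the IN form, without re-running an inner query per row:
        SELECT o.id
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        WHERE c.country = 'US';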

    • by wstrucke ( 876891 ) on Thursday February 16, 2012 @03:01PM (#39064785)
      You're probably right, but that's not what this release is referring to. The NDBCLUSTER engine [mysql.com] separates "API" nodes from data nodes. A server running MySQL with NDBCLUSTER enabled is considered an API node, but you can also have a C++ or Java or whatever API node that isn't MySQL. Data nodes are provisioned in one or more "node groups" with one or more data nodes in each group, though it would be dumb to have a single node group or a single node in a node group. Each node group holds a partition of the data. You can force tables to exist on just one group, but by default if you have three node groups you would have approximately a third of your data in each group.
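
      As a minimal sketch (assuming an NDB-enabled build), putting a table on the data nodes is just a storage-engine choice:

        -- Rows live on the data node groups, not locally on the MySQL (API) node:
        CREATE TABLE t1 (
          id INT NOT NULL PRIMARY KEY,
          payload VARCHAR(255)
        ) ENGINE=NDBCLUSTER;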

      Anyway -- prior to NDBCLUSTER 7.2, if you performed any join whatsoever the API node had to pull the complete tables from all data node groups before doing the join itself and returning the result. This made join performance, simply put, terrible. I've tested the same query on a standalone, out-of-the-box MySQL server and against a 7.1 cluster: a query that took 8 seconds standalone took several minutes to come back from the cluster due to the join performance.

      NDBCLUSTER 7.2 adds what were called "push-down joins" [blogspot.com] during development -- basically, for certain joins the data nodes now do the join within their own subsets of the data, resulting in a dramatic improvement in performance, since the API nodes now just get the result over the network instead of the entire dataset.
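
      A rough sketch of how you would see this (table names hypothetical; the variable name is the one the 7.2 docs use):

        -- Join pushdown is on by default in 7.2 and can be toggled:
        SET ndb_join_pushdown = ON;

        -- For a pushable join, EXPLAIN's Extra column reports the inner table
        -- as a child in a "pushed join", i.e. evaluated on the data nodes:
        EXPLAIN SELECT * FROM t1 JOIN t2 ON t2.fk = t1.id;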

      It really is an amazing improvement and is a result of the dynamic capabilities of MySQL. NDBCLUSTER was never designed for the type of data people are throwing at it, and with the recent improvements it might actually be viable for use on production web sites for more common usage scenarios.

      What I do not see addressed yet with 7.2 is the reload time -- if you have a cluster loaded with several GB of data it can take upwards of 20 minutes to reload the data and indices across the cluster. While the cluster is designed to be up 24x7 (even through upgrades), a single bug or memory error that takes it down can result in a prolonged outage. There are several of these [mysql.com] open in the bug tracker.
    • by Anonymous Coward on Thursday February 16, 2012 @03:15PM (#39064979)

      If you write a query in MySQL with an IN clause in the WHERE clause and put a subquery inside the IN, current versions will run the subquery once for each row of the primary table you are querying. Caching that result alone would probably get the 70x speedup

      We have exactly that in MariaDB 5.3 (a fork of MySQL): http://kb.askmonty.org/en/subquery-cache. It won't always give the 70x speedup; sometimes you need to start execution from the subquery to get decent performance: http://kb.askmonty.org/en/semi-join-subquery-optimizations.

      Sorry for the shameless self-promotion, couldn't resist :-)

    • by jon3k ( 691256 ) on Thursday February 16, 2012 @03:44PM (#39065359)
      Just FYI, it's called a correlated subquery.
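
      For example (hypothetical tables), the correlated form makes the per-row dependency explicit:

        -- The inner query references o.customer_id from the outer row,
        -- so it is evaluated once per row of orders:
        SELECT o.id
        FROM orders o
        WHERE EXISTS (SELECT 1 FROM customers c
                      WHERE c.id = o.customer_id AND c.country = 'US');
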
    • by StuartHankins ( 1020819 ) on Thursday February 16, 2012 @04:35PM (#39066197)
      If your living depends on you knowing how to do something well, you will learn to use the tools most effectively.

      Professional DBAs know this, and know how to code correctly to avoid the problem in the first place. Having it "work" vs. having it work reliably, quickly, and in a manner which scales is what separates any computer user from a professional. Training, ambition, curiosity, and experience are what transform you into that professional.

      So while it may be nice to have the machine try to outsmart you while you work in a 4th generation language, learning how and why it works can be illuminating in other projects and other methods. You are able to do more because you know more.
    • Here's an actual example:

      http://code.google.com/p/django-tagging/issues/detail?id=160 [google.com]

      Basically, the subquery was re-run every time, leading to an amplification of biblical proportions. The solution there was to put that subquery inside another subquery (yo dawg), as shown in http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/ [xaprb.com]

      We also tried a temp table, but this solution was cleaner :) The optimized version ran in around 1/100th of the time for a sample query.

      We also confirmed that the problem still existed in MySQL 5.5.
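
      Sketched against a hypothetical tagging schema, the trick from that post looks like this:

        -- Slow on old MySQL: the IN subquery becomes a dependent subquery.
        SELECT * FROM items
        WHERE id IN (SELECT item_id FROM tagged WHERE tag_id = 7);

        -- The "yo dawg" fix: wrapping it in a derived table forces MySQL to
        -- materialize the inner result once instead of re-running it per row.
        SELECT * FROM items
        WHERE id IN (SELECT item_id FROM
                      (SELECT item_id FROM tagged WHERE tag_id = 7) AS t);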

  • by K. S. Kyosuke ( 729550 ) on Thursday February 16, 2012 @02:14PM (#39064053)
    I knew they would implement the Oracle storage backend into MySQL one day...
  • by CastrTroy ( 595695 ) on Thursday February 16, 2012 @02:14PM (#39064063)
    MySQL Cluster seems like it could be interesting, but I can't get over how it requires the whole thing to be hosted in memory. I'm much more interested in Percona Cluster [percona.com], which is also based on MySQL.
    • by ducomputergeek ( 595742 ) on Thursday February 16, 2012 @02:22PM (#39064181)

      Using memory for DB cluster nodes isn't new. Teradata has been doing it for a long time now in their data warehousing. It has its advantages on large databases when you have the money to afford the nodes you need for reliability.

  • by Anonymous Coward on Thursday February 16, 2012 @02:15PM (#39064083)

    Not the same thing.

  • by photonyx ( 2507666 ) on Thursday February 16, 2012 @02:20PM (#39064137)
    The case where the data set is bigger than the available RAM has not been investigated (link here [blogspot.com]; see the comments). Hard drive I/O speed would slow it dramatically, unless it's an expensive array of SSDs.
    • by fuzzyfuzzyfungus ( 1223518 ) on Thursday February 16, 2012 @02:28PM (#39064265) Journal
      Nothing can really save your performance numbers once you have to hit the platters, so the question would be not 'does it suck more when it can't work purely in RAM?' but 'how good are the mechanisms available for minimizing (ideally automatically, but at least without too much black magic on the application programmers' part) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?'...
      • by tqk ( 413719 ) <s.keeling@mail.com> on Friday February 17, 2012 @01:18PM (#39076975)

        How good are the mechanisms available for minimizing (ideally automatically, but at least without too much black magic on the application programmers' part) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?

        Are we talking about a web back end db, or just a generic db? I ask as the page I'm replying to shows (according to AdBlockPlus) 47 separate entities running in the background. With all that !@#$ going on, what does DB fetch performance even mean?

        I think network performance is far and away the determining factor in this stuff. Optimize your queries to hell and back, but you'll still have to wait for the network. Zzzzzz ...

        "The DB's too slow!"

        Not necessarily. Why don't people understand performance analysis these days? Complex systems are complex. There are lots of variables involved.

    • by Bill, Shooter of Bul ( 629286 ) on Thursday February 16, 2012 @03:59PM (#39065617) Journal
      Who the hell uses actual disks when performance is an issue?
      Also, who the hell uses SSDs when performance is an issue?
      The fast kids use PCI Express cards loaded with flash.
      Texas Memory Systems or Fusion-io. Expensive, yes, but this is MySQL CLUSTER we are talking about. You don't use that unless you need the speed.
    • by sitkill ( 893183 ) on Thursday February 16, 2012 @05:36PM (#39067027)
      That's weird... I believe that when the dataset is bigger than RAM, you just break it up across different servers (or just get more RAM). This isn't anything new; this is actually old, OLD technology, and you can find cases of it everywhere (Google; Microsoft uses it for Hotmail; etc.), called the brick architecture or something like that.

      Actually, a ton of research was done on these types of architectures by Jim Gray (who tragically was lost at sea a few years ago) at Microsoft. Not sure how many of these links are still active but: http://research.microsoft.com/en-us/um/people/gray/ [microsoft.com]

      From my own experience, at a job a few years ago, we were dealing with a little over 1 petabyte of data, and the system was engineered to NEVER hit platter, and to always have data in ram.
    • by Trogre ( 513942 ) on Thursday February 16, 2012 @06:07PM (#39067405) Homepage

      That would make it a RAED array.

  • by Daetrin ( 576516 ) on Thursday February 16, 2012 @02:30PM (#39064295)
    Now can they please work on some dramatic usability improvements so I don't have to cringe every time an Oracle support question comes up at work?
    • by jedidiah ( 1196 ) on Thursday February 16, 2012 @05:53PM (#39067225) Homepage

      If you can't be bothered, you can ignore a lot of the finer details in Oracle, just as you would be prone to do with any other RDBMS.

      • by Daetrin ( 576516 ) on Thursday February 16, 2012 @06:03PM (#39067369)
        Uh, if it were something I could ignore, then I wouldn't be cringing; I'd just be shrugging and ignoring the problem. As long as some of our (big) customers insist on using Oracle, we have to work with Oracle to support them. And they're not going to accept "we didn't implement/debug that feature on Oracle because working with Oracle is a pain" very well.
    • by tqk ( 413719 ) <s.keeling@mail.com> on Friday February 17, 2012 @01:54PM (#39077461)

      Now can they please work on some dramatic usability improvements so I don't have to cringe every time an Oracle support question comes up at work?

      You may wish to question whether you're in the wrong line of work. RDBs aren't really all that complex. Stuff it in, then drag it out and present it in a pretty display format. What else is there to know?

      People are way too hung up on complexity. Life doesn't have to be that hard. You're making it harder than it has to be. Simplify.

      • by Daetrin ( 576516 ) on Friday February 17, 2012 @03:16PM (#39078533)
        I think you're pretty massively missing the point. It's not RDBs that are the problem, it's Oracle specifically. Try developing complex software using a SQL back end that supports both MS SQL and Oracle. Getting everything working on Oracle takes longer. And when there are problems, it takes longer to debug on Oracle. Just getting Oracle installed properly was a huge pain in the ass compared to MS SQL.

        _Oracle_ is making it harder than it has to be. If I had the choice, I'd simplify by dropping support for Oracle and only working with MS SQL, but that's not my choice to make.
  • Yes, I RTFA (sue me) (Score:4, Informative)

    by billcopc ( 196330 ) <vrillco@yahoo.com> on Thursday February 16, 2012 @02:30PM (#39064301) Homepage

    If I read the sales pitch correctly, they just integrated Memcached as a backend storage module so that it plays nicely with respect to ACID compliance. Yeah, memory is 70x faster than disk I/O... big whoop!

    Anyone running a sizeable MySQL installation already has heaps of RAM allocated to the InnoDB buffers/caches anyway. It sounds like Oracle compared a stock, distro-default MySQL to their memory-hungry tweaks. Yeah, DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64 MB to 48 GB.
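
    For what it's worth, the knob in question is a one-line config change (a sketch; the 48G figure echoes the number above, not a recommendation):

      # my.cnf
      [mysqld]
      # The historical default buffer pool is tiny; sizing it to hold the
      # working set is the single biggest InnoDB tuning lever:
      innodb_buffer_pool_size = 48G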

    • by medv4380 ( 1604309 ) on Thursday February 16, 2012 @03:04PM (#39064817)
      If you're going to read an article about a benchmark, you might actually read the benchmark too.

      Previously each data node exploited about 6 CPUs with 8 threads, in this benchmark we used about 14 CPUs with 24 threads. We've also removed a number of bottlenecks and the configuration of the threads is done in such a manner as to handle many different loads well.

      The little thing you read about Memcached was about adding a native API, which probably helped things, but getting the multithreading upgraded probably had more to do with it. They were also comparing two benchmarks that both used just memory, not a case where one used disk and the other used RAM.

    • by dreemernj ( 859414 ) on Thursday February 16, 2012 @03:52PM (#39065495) Homepage Journal
      It sounds like the RAM isn't actually the real boost. This is MySQL Cluster. The last time I used it, it kept everything in RAM already. I think since then they let you store non-indexed columns on disk, but that's it. The speed boost sounds like it is for NDB, not InnoDB. NDB doesn't support foreign keys, and removing that constraint probably helps its performance too.

      In this case, the actual boost is from AQL, which allows for a query to be split up by JOINs and sent out to different servers to run against their local tables in parallel.
    • by lewiscr ( 3314 ) on Thursday February 16, 2012 @05:37PM (#39067037) Homepage

      DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64 MB to 48 GB.

      Just make sure you actually test it. Some of the performance recommendations in the manual don't scale from 64 MB to 48 GB. MyISAM Key Buffer and Query Cache, I'm looking at you.

      • by billcopc ( 196330 ) <vrillco@yahoo.com> on Friday February 17, 2012 @08:31PM (#39081753) Homepage

        I, er... I think I shorthand a lot of this stuff, because it seems trivial to me as a programmer. Anything that works like a hashtable is going to experience diminishing returns, because you have to scan through that ever-growing table. O(n^2) and stuff.

        On the other hand, anything that spares you from hitting a disk, that's a guaranteed win. What I commonly do with my DB servers, since my data sets are small but complex, is to give them enough memory to cache the entire database in RAM. The disk effectively becomes a backup copy, in case I need to reboot the box. With RAM being so cheap these days, it's usually more cost effective than deploying more cluster nodes. Memcached is great if all you want is a key-value bucket, but for speeding up existing applications that (ab)use SQL, RAM is the undisputed king of performance.

  • by Amadodd ( 620353 ) on Thursday February 16, 2012 @02:35PM (#39064377)
    Slightly off-topic, but I recently had the opportunity to test the speed of a MySQL in-memory database. I have some frequently queried read-only data that simply would not handle the load in MS SQL, and I was looking for an in-memory solution. MySQL provided the simplest implementation - simply tell the table to use memory storage and configure the server to allow the amount of data you want to host (~250 MB in this case). You also have to remember to reload the data from the normal InnoDB tables every time you restart the server. I used the same table structures, keys, indexes, and (almost the same) stored procedures to query the data, and linked it through MS SQL so that my applications never knew the difference. On exactly the same hardware the speed increase was at least 50x over MS SQL.
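
    A minimal sketch of that setup (table names hypothetical; assumes the columns are MEMORY-compatible, i.e. no BLOB/TEXT):

      -- Raise the cap on MEMORY tables (applies to new sessions):
      SET GLOBAL max_heap_table_size = 256 * 1024 * 1024;  -- ~250 MB

      -- Keep the durable copy in InnoDB; serve reads from a MEMORY twin.
      CREATE TABLE lookup_mem LIKE lookup;
      ALTER TABLE lookup_mem ENGINE=MEMORY;
      INSERT INTO lookup_mem SELECT * FROM lookup;  -- re-run after every restart
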
    • by bbbaldie ( 935205 ) on Thursday February 16, 2012 @02:55PM (#39064687) Homepage
      I've programmed PHP to use both SQL Server and MySQL; the MySQL statements typically run in milliseconds, while it usually takes SQL Server two or three seconds to respond.
      • by rgbrenner ( 317308 ) on Thursday February 16, 2012 @03:13PM (#39064955)

        it usually takes SQL Server two or three seconds to respond

        Then you're incompetent. Stack Overflow uses SQL Server 2008 and serves 12-14 million pages a day. [brentozar.com]

        • by gbjbaanb ( 229885 ) on Thursday February 16, 2012 @05:29PM (#39066961)

          That's 162 *pages* per second, not SQL queries. Thing is, how many of those are cached and simply re-served? How big are the servers that run these? How many servers do they use? How much Lucene do they use instead of SQL Server? (You didn't watch the video, did you? 23 minutes in.)

          Anyway, your argument is like saying Facebook takes 2 seconds to return my wall of data whereas MyPersonalWebsite with a PHP script I wrote returns instantly, therefore Facebook are useless losers.

          There might well be a 2-second delay if the server is repeating many different SQL queries - I used to see an Oracle DB that spent most of its time waiting on the query cache - or if it's hitting the DB with queries that lock a lot of data pages - I've seen an app that happily took a lock on nearly all the data in various tables, escalating the lock to a full table lock. So yes, they're incompetent, but sometimes queries that work well on one DB don't work well on others; in particular, queries that happily ran on Oracle killed SQL Server, simply because of the locking designs these two DBs implemented (OK, SQL Server now has row-level locking, but there could be others still in there causing problems).

      • by xaxa ( 988988 ) on Thursday February 16, 2012 @04:22PM (#39066021)

        I've programmed PHP to use both SQL Server and MySQL; the MySQL statements typically run in milliseconds, while it usually takes SQL Server two or three seconds to respond.

        Can you give an example?

        We're evaluating Postgres vs MS SQL at work, and at the moment haven't found any problem big enough to rule either out.

        • by codepunk ( 167897 ) on Thursday February 16, 2012 @05:31PM (#39066981)

          Problems other than cost, you mean.

          • by xaxa ( 988988 ) on Friday February 17, 2012 @07:49AM (#39072825)

            Problems other than cost, you mean.

            We already have an MS SQL database for the HR, finance, etc. systems, so using that might be as cheap as (or cheaper than) a new Postgres database, given existing staff (DBA) knowledge and some odd way they fudge the numbers. (Recurring costs for MS software are apparently negotiated and paid for centrally, rather than charged to the department using the software. Daft.)

            Essentially, arguing on grounds of initial/ongoing cost is difficult. My best cost argument is that the no-cost license for Postgres means we could change our architecture, have ten times as many servers, have virtual servers, have lots of development servers -- anything -- without having to pay. I'm also concerned that although Microsoft sees us as "academic" at the moment, that might change -- we aren't a university, and the government is telling us we should be trying to make money with our skills/data where we can.

            So a good, technical reason is much better. Various limitations have ruled out MySQL (e.g. problems with multiple engines -- you may have to choose between full-text search and transactions, for example). Postgres has the edge over MS SQL at the moment -- partly flexibility, partly the GIS extensions; I'll elaborate if anyone's interested. But SQL Server at least claims to do these things too.

      • by rev0lt ( 1950662 ) on Thursday February 16, 2012 @11:53PM (#39070721)
        Did you test the queries on the actual database server, or did you just run your script? Were you using regular drivers, or PDO? Was the SQL server on the same machine, or on a separate one?
    Just because your script didn't run well doesn't mean that SQL Server was slower. Probably the PHP database modules had a lot to do with it - I guess there is much more attention given to MySQL integration than to SQL Server - or some misconfiguration (always 2-3 seconds? Maybe it's a DNS issue; were you using hostnames or just IPs?).
  • by jorela ( 84188 ) on Thursday February 16, 2012 @02:56PM (#39064693)

    The speedup is for joins, which previously were quite poorly handled.
    See http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/?utm_source=rss&utm_medium=rss&utm_campaign=70x-faster-joins-with-aql-in-mysql-cluster-7-2

  • 70x (Score:5, Funny)

    by slasho81 ( 455509 ) on Thursday February 16, 2012 @03:06PM (#39064835)
    Developers: We've got some really good ideas for increasing performance of complex queries by...
    Marketing: How much in the best conceivable case?
    Developers: Oh, I dunno, maybe 70x.
    Marketing: 70x? Is that good?
    Developers: Yeah, I suppose, but the cool stuff is...
    Marketing: Wow! 70x! That's a really big number!
    Developers: Actually, please don't quote me on that. They'll make fun of me on Slashdot if you do. Promise me.
    Marketing: We promise.
    Developers: Thanks. Now, let me show you where the good stuff is...
    Marketing (on phone): Larry? It's me. How big can you print me up a poster that says "70x"?
  • by Skinkie ( 815924 ) on Thursday February 16, 2012 @06:30PM (#39067629) Homepage
    So if Oracle is so happy with MySQL, give us a TPC-H [tpc.org] benchmark. Scale factor: 100.
