Forgot your password?
typodupeerror
Databases Open Source Oracle Upgrades

Oracle Claims Dramatic MySQL Performance Improvements 168

Posted by timothy
from the breathing-in-those-sulfurous-fumes dept.
New submitter simula67 writes "Oracle wins back some karma from the open source community by releasing MySQL cluster 7.2 with ambitious claims of 70x performance gains. The new release is GPL and claims to have processed over 1 billion queries per minute. Readers may remember the story about Oracle adding commercial extensions to MySQL."
This discussion has been archived. No new comments can be posted.

Oracle Claims Dramatic MySQL Performance Improvements

Comments Filter:
  • by micheas (231635) on Thursday February 16, 2012 @03:12PM (#39064007) Homepage Journal
    If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.
    • by Anonymous Coward on Thursday February 16, 2012 @03:22PM (#39064183)

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.

      yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

      • by K. S. Kyosuke (729550) on Thursday February 16, 2012 @03:30PM (#39064299)

        yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

        And what's wrong with using an RDBMS with a non-braindead optimizer?

        • by cheater512 (783349) <nick@nickstallman.net> on Thursday February 16, 2012 @04:56PM (#39065575) Homepage

          The poor RDBMS has to put up with crap like that over and over again.
          It will probably need therapy after you are through with it.

          The programmer is supposed to know how to use the tool effectively to get the best performance.
          In that example there is a stupidly obvious solution - use a join.

          • by K. S. Kyosuke (729550) on Thursday February 16, 2012 @07:22PM (#39067561)

            The poor RDBMS has to put up with crap like that over and over again. It will probably need therapy after you are through with it.

            The programmer is supposed to know how to use the tool effectively to get the best performance. In that example there is a stupidly obvious solution - use a join.

            To me that sounds like fixing a bug in the wrong place. RDBMS is *not* supposed to be dumb. If it were supposed to be dumb, we would not be using SQL in the first place and we'd be passing our own execution plans in the DB engine's internal intermediate language instead. (Well, you can sort of do it with Firebird (BLR) and SQLite (VDBE), but it's not how it's supposed to be used.)

            • by fatp (1171151)

              RDBMS is *not* supposed to be dumb.

              It is supposed to do what it was asked - if someone asks it to do something dumb, it has no choice.

          • This is going to happen less and less as programmers think they don't need to understand databases because they ORM like hibernate. I know of many programmers who don't want to know about sql beyond what they need to make their hibernate mapping work.
            • by apotheon (577944)
              It's tough to blame those programmers who don't want to use SQL. It's a miserable language. Even one of its principle designers hates it.
              • There are actually two different issues here. First, the relational model. That one is quite fine, it has strong theoretical foundations (easy to do correctness proofs of programs, introduce execution time optimizations, parallel execution, etc., unlike with previous pointer-chasing models), can model a lot of things and is pretty useful overall. Now, whether SQL is a good front end to an RDBMS...that's about as clever a question as whether COBOL is a good programming language. ORMs take away a lot of pain
          • by u38cg (607297)
            Point one, there are queries that can only be written using a sub-query (unusual, but they exist); point two, computers are supposed to help, not hinder. What's the point of having a declaritive language if I can't declare a query in the form I think of it in?
    • by wstrucke (876891) on Thursday February 16, 2012 @04:01PM (#39064785)
      You're probably right, but that's not what this release is referring to. The NDBCLUSTER engine [mysql.com] separates "API" nodes from Data nodes. A server running MySQL with NDBCLUSTER enabled is considered an API node, but you can also have a C++ or Java or whatever API node that isn't MySQL. Data nodes are provisioned in one or more "node groups" with one or more data nodes in each group, though it would be dumb to have a single node group or a single node in a node group. Each node group splits the data somehow. You can force tables to exist on just one group, but by default if you have three node groups you would have approximately a third of your data in each group.

      Anyway -- prior to NDBCLUSTER 7.2 if you performed any join whatsoever the API node had to pull the complete tables from all data node groups prior to doing the join on itself and returning the result. This made join performance, simply put -- terrible. I've tested the same query on a standalone out of the box mysql server against a 7.1 cluster and had an 8 second query come back from the cluster in several minutes due to the join performance.

      NDBCLUSTER 7.2 adds what was called "push down joins" [blogspot.com] in development -- basically the data nodes now do the joins within their own sub-sets of the data for certain joins resulting in a dramatic improvement in performance, since now the API nodes just get the result from the network instead of the entire dataset.

      It really is an amazing improvement and is a result of the dynamic capabilities of MySQL. NDBCLUSTER was never designed for the type of data people are throwing at it, and with the recent improvements it might actually be viable for use on production web sites for more common usage scenarios.

      What I do not see addressed yet with 7.2 is the reload time -- if you have a cluster loaded with several GB of data it can take upwards of 20 minutes to reload the data and indices across the cluster. While the cluster is designed to be up 24x7 (even through upgrades), a single bug or memory error that takes it down can result in a prolonged outage. There are several of these [mysql.com] open in the bug tracker.
    • by Anonymous Coward on Thursday February 16, 2012 @04:15PM (#39064979)

      If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up

      We have exactly that in MariaDB 5.3 (a fork of MySQL) : http://kb.askmonty.org/en/subquery-cache. It won't always give the 70x speedup, sometimes you need to start execution from the subquery to get decent performance: http://kb.askmonty.org/en/semi-join-subquery-optimizations.

      I am sorry for shameless self-promotion, couldn't resist :-)

    • by jon3k (691256) on Thursday February 16, 2012 @04:44PM (#39065359)
      Just fyi, It's called a correlated subquery.
      • Just fyi, It's called a correlated subquery.

        If the thing he's talking about had been correlated subquery, he wouldn't have been able to cache its results. Ergo, he must be talking about a lousy execution of a non-correlated subquery.

    • If your living depends on you knowing how to do something well, you will learn to use the tools most effectively.

      Professional DBA's know this, and know how to code correctly to avoid the problem in the first place. Having it "work" vs having it work reliably, quickly, and in a manner which scales is what separates any computer user from a professional. Training, ambition, curiosity, and experience are what transforms you into that professional.

      So while it may be nice to have the machine try to outsmart yo
      • by mosb1000 (710161)

        While I agree it's a professional's job to know this stuff, I would also like to point out it's not a crime to build your RDBMS in such a way that queries run as fast as possible regardless of how the query was written. Indeed, anything less is an inferior product.

    • by Terrasque (796014)

      Here's an actual example:

      http://code.google.com/p/django-tagging/issues/detail?id=160 [google.com]

      Basically, the subquery was run all the time, leading to an amplification of biblical proportions. The solution there was to put that subquery in another subquery (yo dawg), as shown in http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/ [xaprb.com]

      We also tried with a temp table, but this solution was cleaner :) The performance difference was around 1/100th of the time for a sample query.

      We also confirm

  • by K. S. Kyosuke (729550) on Thursday February 16, 2012 @03:14PM (#39064053)
    I knew they would implement the Oracle storage backend into MySQL one day...
  • MySQL Seems like it could be interesting, but I can't get over how it requires the whole thing to be hosted in memory. I'm much more interested in Percona Cluster [percona.com] which is also based off MySQL.
    • Using memory for db cluster nodes isn't new. Teradata has been doing it for a long time now in their data warehousing. It has its advantages on large databases and you have the money to afford the nodes you need for reliability.

  • by Anonymous Coward on Thursday February 16, 2012 @03:15PM (#39064083)

    Not the same thing.

    • by Animats (122034)

      MySQL Cluster used to be a pay-only product. Is it free now?

      (The setup with one MySQL master and N read-only slaves has been free for years. That powers Wikipedia, for example. It's not MySQL Cluster. A MySQL Cluster has multiple read-write machines.)

  • by photonyx (2507666) on Thursday February 16, 2012 @03:20PM (#39064137)
    The case when the data set is bigger than RAM amount has not been investigated (link here [blogspot.com], see the comments). The hard drive I/O speed would slow it dramatically, unless it's an expensive array of SSDs.
    • Nothing can really save your performance numbers once you have to hit the platters, so the question would be not 'does it suck more when it can't work purely in RAM?' but 'How good are the mechanisms available for minimizing(ideally automatically, but at least without too much black magic on the application programmers' parts) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?'...
      • by tqk (413719)

        How good are the mechanisms available for minimizing(ideally automatically, but at least without too much black magic on the application programmers' parts) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?

        Are we talking about a web back end db, or just a generic db? I ask as the page I'm replying to shows (according to AdBlockPlus) 47 separate entities running in the background. With all that !@#$ going on, what does DB fetch performance even mean?

        I think network performance is far and away the most determinant factor in this stuff. Optimize your queries to hell and back, but you'll still have to wait for the network. Zzzzzz ...

        "The DB's too slow!"

        Not necessarily. Why don't people understand Performance

    • Who the hell uses actual disks when performance is an issue?
      Also, who the hell uses ssd's when performance is an issue?
      The fast kids use PCI Express cards loaded with flash.
      Texas Memory systems or Fusion IO. Expensive, yes, but this is Mysql CLUSTER we are talking about. You don't use that unless you need the speed.
    • by sitkill (893183)
      That's weird....I believe that when the dataset is bigger than ram, you just break it into different servers (or just get more ram). This isn't anything new, this is actually old OLD technology, and you can find cases of this everywhere (google, microsoft uses it for hotmail, etc), called the Brick architecture or something like that.

      Actually, a ton of research was done on these types of architectures by Jim Gray (who tragically was lost at sea a few years ago) at Microsoft. Not sure how many of these li
    • by Trogre (513942)

      That would make it a RAED array.

  • Now can they please work on some dramatic usability improvements so i don't have to cringe every time an Oracle support question comes up at work?
    • by jedidiah (1196)

      If you can't be bothered, you can ignore a lot of the finer details in Oracle just as much as you would be prone to with any other RDBMS.

      • by Daetrin (576516)
        Uh, if it was something i could ignore then i wouldn't be cringing, i would be just shrugging and ignoring the problem. As long as some of our (big) customers insist on using Oracle we have to work with Oracle to support them. And they're not going to accept "we didn't implement/debug that feature in Oracle because working with Oracle is a pain" very well.
    • by tqk (413719)

      Now can they please work on some dramatic usability improvements so i don't have to cringe every time an Oracle support question comes up at work?

      You may wish to question whether you're in the wrong line of work. RDBs aren't really all that complex. Stuff it in, then drag it out and present it in a pretty display format. What else is there to know?

      People are way too hung up on complexity. Life doesn't have to be that hard. You're making it harder than it has to be. Simplify.

      • by Daetrin (576516)
        I think you're pretty massively missing the point? It's not RDBs that are the problem, it's Oracle specifically. Try developing complex software using a SQL back end that supports both MS SQL and Oracle. Getting everything working on Oracle takes longer. And when there are problems it takes longer to debug on Oracle. Just getting Oracle installed properly was a huge pain in the ass compared to MS SQL.

        _Oracle_ is making it harder than it has to be. If i had the choice i'd simplify just by dropping support
        • by tqk (413719)

          It's not RDBs that are the problem, it's Oracle specifically.

          Ah. On that, we agree. Larry's one sharp snakeoil salesman.

  • Yes, I RTFA (sue me) (Score:4, Informative)

    by billcopc (196330) <vrillco@yahoo.com> on Thursday February 16, 2012 @03:30PM (#39064301) Homepage

    If I read the sales pitch correctly, they just integrated Memcached as a backend storage module, so that it plays nicely wrt ACID compliance. Yeah, memory is 70x faster than disk I/O... big whoop!

    Anyone running a sizeable MySQL installation already has heaps of RAM allocated to the InnoDB buffers/caches anyway. It sounds like Oracle compared a stock, distro-default MySQL to their memory-hungry tweaks. Yeah, DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64mb to 48 gigabytes.

    • by medv4380 (1604309)
      If you're going to read an article about a benchmark you might actually read the benchmark too.

      Previously each data node exploited about 6 CPUs with 8 threads, in this benchmark we used about 14 CPUs with 24 threads. We've also removed a number of bottlenecks and the configuration of the threads is done in such a manner as to handle many different loads well.

      The little thing you read about memcached was about adding in a Native API which probably helped things, but getting the multithreading upgraded probably had more to do with it. They also were comparing two benchmarks that both were using just memory, and not a case where 1 used disk and the other used ram.

    • by dreemernj (859414)
      It sounds like the RAM isn't actually the real boost. This is MySQL Cluster. The last time I used it, it kept everything in RAM already. I think since then they let you store non-indexed tables onto disk, but that's it. The speed boost sounds like it is for NDB, not InnoDB. NDB doesn't support foreign keys and removing that constraint probably helps its performance too.

      In this case, the actual boost is from AQL, which allows for a query to be split up by JOINs and sent out to different servers to run
    • by lewiscr (3314)

      DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64mb to 48 gigabytes.

      Just make sure you actually test it. Some of the performance recommendations in the manual don't scale from 64 MB to 48 GB. MyISAM Key Buffer and Query Cache, I'm looking at you.

      • by billcopc (196330)

        I, er... I think I shorthand a lot of this stuff, because it seems trivial to me as a programmer. Anything that works like a hashtable is going to experience diminishing returns, because you have to scan through that every-growing table. O(n^2) and stuff.

        On the other hand, anything that spares you from hitting a disk, that's a guaranteed win. What I commonly do with my DB servers, since my data sets are small but complex, is to give them enough memory to cache the entire database into RAM. The disk effe

  • by nimbius (983462) on Thursday February 16, 2012 @03:35PM (#39064375) Homepage
    was GPL as well.
    dont forget that Oracle committed that MySQL server will continue to use the dual-licensing strategy long used by MySQL AB with commercial and GPL versions available until at least 2015. if other big players are any indication this is oracles attempt to avoid another apache/java landmine and preserve the trust and respect of developers until such time as the product is sufficiently 'in-housed' to divorce from the community without fear of retalliation. by 2015 the forked code, should there be yet another fork, can probably enjoy a tenth of the performance of oracles version and have to compete with ingrained lock-ins and contract conditions developed by oracle to further ostracize open source competetors.

    but im not sure if its really relevant at all. databases like hypercube and couch are really giving oracle a run for their copious amounts of money. by 2015 the "paradigm" may have "shifted" as the PHB says.
  • by Amadodd (620353) on Thursday February 16, 2012 @03:35PM (#39064377)
    Slightly off-topic, but I recently had the oppurtunity to test the speed of a MySQL in-memory database. I have some frequently queried read-only data that simply would not handle the load in MS SQL and was looking for an in-memory solution. MySQL provided the simplest implementation - simply tell the table to use memory storage and configure the server to allow the amount of data you want to host (~250MB in this case). You also have to remember to reload the data from normal InnoDB tables every time you restart the server. I used the same table structures, keys indexes and stored procedures (almost the same) to query the data and linked it through MS SQL so that my applications never new the difference. On exactly the same hardware the speed increase was at least 50X over MS SQL.
    • by bbbaldie (935205)
      I've programmed PHP to use SQL Server and MySQL, the MySQL statements typically run in milliseconds, it usually takes SQL Server two or three seconds to respond.
      • by rgbrenner (317308)

        it usually takes SQL Server two or three seconds to respond

        then you're incompetent. stackoverflow uses SQL Server 2008, and serves 12-14 million pages a day. [brentozar.com]

        • by gbjbaanb (229885)

          that's 162 *pages* per second, not SQL queries. Thing is, how many of those are cached and simply re-served up again? How big are the servers that run these? How many servers do they use? How much Lucene do they use instead of SqlServer??? (you didn't watch the video did you? - 23 minutes in)

          Anyway, your argument is like saying Facebook takes 2 seconds to return my wall of data whereas MyPersonalWebsite with a PHP script I wrote returns instantly, therefore Facebook are useless losers.

          There might well be a

      • by xaxa (988988)

        I've programmed PHP to use SQL Server and MySQL, the MySQL statements typically run in milliseconds, it usually takes SQL Server two or three seconds to respond.

        Can you give an example?

        We're evaluating Postgres vs MS SQL at work, and at the moment haven't found any problem big enough to rule either out.

        • by codepunk (167897)

          problems other than cost you mean.

          • by xaxa (988988)

            problems other than cost you mean.

            We already have an MS SQL database for the HR, finance (etc) systems, so using that might be as cheap (or cheaper) than a new Postgres database, given existing staff (DBA) knowledge and some odd way they fudge the numbers. (Recurring costs for MS software are apparently negotiated and paid for centrally, rather than charged to the department using the software. Daft.)

            Essentially, arguing on grounds of inital/ongoing cost is difficult. My best cost argument is the no-cost license for Postgres means we coul

      • by rev0lt (1950662)
        Did you test the queries on the actual database server, or did you just run your script? Were you using regular drivers, or PDO? Was the SQL server on the same machine, or on a separate one?
        Just because your script didn't run well, doesn't mean that SQL Server was slower. Probably the PHP database modules had a lot to do with it - I guess there is much more attention given to MySQL integration than with SQL Server, or some misconfiguration (always 2-3 seconds? Maybe is a DNS issue, were you using hostnames
  • by jorela (84188) on Thursday February 16, 2012 @03:56PM (#39064693)

    which previously were quite poorly handled.
    See http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/?utm_source=rss&utm_medium=rss&utm_campaign=70x-faster-joins-with-aql-in-mysql-cluster-7-2

  • 70x (Score:5, Funny)

    by slasho81 (455509) on Thursday February 16, 2012 @04:06PM (#39064835)
    Developers: We've got some really good ideas for increasing performance of complex queries by...
    Marketing: How much in the best conceivable case?
    Developers: Oh, I dunno, maybe 70x.
    Marketing: 70x? Is that good?
    Developers: Yeah, I suppose, but the cool stuff is...
    Marketing: Wow! 70x! That's a really big number!
    Developers: Actually, please don't quote me on that. They'll make fun of me on Slashdot if you do. Promise me.
    Marketing: We promise.
    Developers: Thanks. Now, let me show you where the good stuff is...
    Marketing (on phone): Larry? It's me. How big can you print me up a poster that says "70x"?
  • by Skinkie (815924)
    So if Oracle is so happy with MySQL. Give us a TPC-H [tpc.org] benchmark. Scalefactor: 100.

Wherever you go...There you are. - Buckaroo Banzai

Working...