Oracle Claims Dramatic MySQL Performance Improvements

Slashdot is powered by your submissions, so send in your scoop

Oracle Claims Dramatic MySQL Performance Improvements 168

Posted by timothy on Thursday February 16, 2012 @03:01PM from the breathing-in-those-sulfurous-fumes dept.

New submitter simula67 writes "Oracle wins back some karma from the open source community by releasing MySQL cluster 7.2 with ambitious claims of 70x performance gains. The new release is GPL and claims to have processed over 1 billion queries per minute. Readers may remember the story about Oracle adding commercial extensions to MySQL."

This discussion has been archived. No new comments can be posted.

Oracle Claims Dramatic MySQL Performance Improvements

Search 168 Comments Log In/Create an Account

Comments Filter:

MySQL Cluster != MySQL (Score:5, Informative)

by Anonymous Coward writes: on Thursday February 16, 2012 @03:15PM (#39064083)

Not the same thing.

Share
twitter facebook
Re:Considering sub queries in IN statements. (Score:5, Informative)

by Anonymous Coward writes: on Thursday February 16, 2012 @03:22PM (#39064183)

If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up. I am suspect that there are other performance stupidities in mysql that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.
yes, but many database engines have problems with inner select statements. Refactor your query to use a join an you'll be fine.

Parent Share
twitter facebook
Yes, I RTFA (sue me) (Score:4, Informative)

by billcopc ( 196330 ) writes: <vrillco@yahoo.com> on Thursday February 16, 2012 @03:30PM (#39064301) Homepage

If I read the sales pitch correctly, they just integrated Memcached as a backend storage module, so that it plays nicely wrt ACID compliance. Yeah, memory is 70x faster than disk I/O... big whoop!
Anyone running a sizeable MySQL installation already has heaps of RAM allocated to the InnoDB buffers/caches anyway. It sounds like Oracle compared a stock, distro-default MySQL to their memory-hungry tweaks. Yeah, DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64mb to 48 gigabytes.

Share
twitter facebook
Re:Considering sub queries in IN statements. (Score:3, Informative)

by micheas ( 231635 ) writes: on Thursday February 16, 2012 @03:40PM (#39064437) Homepage Journal

SELECT id, title FROM page WHERE id IN (SELECT pageid FROM hotnews WHERE user = 8)
The answer to this is to execute the sub query and pass the result back as an array. If you have 10,000 records in table page, you will see about a 2000x speed improvement if you are using php

Parent Share
twitter facebook
Re:Considering sub queries in IN statements. (Score:2, Informative)

by Anonymous Coward writes: on Thursday February 16, 2012 @03:49PM (#39064581)

That's just a crappy query... no wonder you've got bad performance.

SELECT id, title FROM page JOIN hotnews on page.id = hotnews.pageid AND user = 8

There. Enjoy your performance boost.

Parent Share
twitter facebook
The 70x is for (distributed) joins (Score:5, Informative)

by jorela ( 84188 ) writes: on Thursday February 16, 2012 @03:56PM (#39064693)

which previously were quite poorly handled.
See http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/?utm_source=rss&utm_medium=rss&utm_campaign=70x-faster-joins-with-aql-in-mysql-cluster-7-2

Share
twitter facebook
Re:Considering sub queries in IN statements. (Score:5, Informative)

by wstrucke ( 876891 ) writes: on Thursday February 16, 2012 @04:01PM (#39064785)

You're probably right, but that's not what this release is referring to. The NDBCLUSTER engine [mysql.com] separates "API" nodes from Data nodes. A server running MySQL with NDBCLUSTER enabled is considered an API node, but you can also have a C++ or Java or whatever API node that isn't MySQL. Data nodes are provisioned in one or more "node groups" with one or more data nodes in each group, though it would be dumb to have a single node group or a single node in a node group. Each node group splits the data somehow. You can force tables to exist on just one group, but by default if you have three node groups you would have approximately a third of your data in each group.

Anyway -- prior to NDBCLUSTER 7.2 if you performed any join whatsoever the API node had to pull the complete tables from all data node groups prior to doing the join on itself and returning the result. This made join performance, simply put -- terrible. I've tested the same query on a standalone out of the box mysql server against a 7.1 cluster and had an 8 second query come back from the cluster in several minutes due to the join performance.

NDBCLUSTER 7.2 adds what was called "push down joins" [blogspot.com] in development -- basically the data nodes now do the joins within their own sub-sets of the data for certain joins resulting in a dramatic improvement in performance, since now the API nodes just get the result from the network instead of the entire dataset.

It really is an amazing improvement and is a result of the dynamic capabilities of MySQL. NDBCLUSTER was never designed for the type of data people are throwing at it, and with the recent improvements it might actually be viable for use on production web sites for more common usage scenarios.

What I do not see addressed yet with 7.2 is the reload time -- if you have a cluster loaded with several GB of data it can take upwards of 20 minutes to reload the data and indices across the cluster. While the cluster is designed to be up 24x7 (even through upgrades), a single bug or memory error that takes it down can result in a prolonged outage. There are several of these [mysql.com] open in the bug tracker.

Parent Share
twitter facebook
Re:Considering sub queries in IN statements. (Score:4, Informative)

by Anonymous Coward writes: on Thursday February 16, 2012 @04:15PM (#39064979)

If you create a query in mysql with an IN statement in the where clause and you put a sub query as the in statement current versions will run the query once for each row of the primary table you are querying. Caching result alone would probably get the 70x speed up
We have exactly that in MariaDB 5.3 (a fork of MySQL) : http://kb.askmonty.org/en/subquery-cache. It won't always give the 70x speedup, sometimes you need to start execution from the subquery to get decent performance: http://kb.askmonty.org/en/semi-join-subquery-optimizations.
I am sorry for shameless self-promotion, couldn't resist :-)

Parent Share
twitter facebook
Re:Considering sub queries in IN statements. (Score:4, Informative)

by jon3k ( 691256 ) writes: on Thursday February 16, 2012 @04:44PM (#39065359)

Just fyi, It's called a correlated subquery.

Parent Share
twitter facebook

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Oracle Claims Dramatic MySQL Performance Improvements 168

Oracle Claims Dramatic MySQL Performance Improvements More Login

Oracle Claims Dramatic MySQL Performance Improvements

MySQL Cluster != MySQL (Score:5, Informative)

Re:Considering sub queries in IN statements. (Score:5, Informative)

Yes, I RTFA (sue me) (Score:4, Informative)

Re:Considering sub queries in IN statements. (Score:3, Informative)

Re:Considering sub queries in IN statements. (Score:2, Informative)

The 70x is for (distributed) joins (Score:5, Informative)

Re:Considering sub queries in IN statements. (Score:5, Informative)

Re:Considering sub queries in IN statements. (Score:4, Informative)

Re:Considering sub queries in IN statements. (Score:4, Informative)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot