Oracle Claims Dramatic MySQL Performance Improvements
New submitter simula67 writes "Oracle wins back some karma from the open source community by releasing MySQL Cluster 7.2 with ambitious claims of 70x performance gains. The new release is GPL and is claimed to have processed over 1 billion queries per minute. Readers may remember the story about Oracle adding commercial extensions to MySQL."
Considering sub queries in IN statements. (Score:5, Insightful)
Re:Considering sub queries in IN statements. (Score:5, Informative)
If you create a query in MySQL with an IN clause in the WHERE and put a subquery inside the IN, current versions will run that subquery once for each row of the primary table you are querying. Caching the result alone would probably get the 70x speedup. I suspect that there are other performance stupidities in MySQL that are worked around by people doing a simple query and then using php/perl/python/java/etc to parse the result and generate the second query.
Yes, but many database engines have problems with inner select statements. Refactor your query to use a join and you'll be fine.
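For readers who haven't seen that refactor spelled out, here is a minimal sketch of the two forms, using made-up orders/customers tables (not from the thread):
-- IN-subquery form: older MySQL may re-evaluate the subquery per outer row
SELECT o.id, o.total
FROM orders o
WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.status = 1);
-- Equivalent JOIN form (assuming customers.id is unique, so the join adds no duplicate rows)
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.status = 1;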
Re:Considering sub queries in IN statements. (Score:5, Insightful)
Yes, but many database engines have problems with inner select statements. Refactor your query to use a join and you'll be fine.
And what's wrong with using an RDBMS with a non-braindead optimizer?
Re:Considering sub queries in IN statements. (Score:4, Insightful)
The poor RDBMS has to put up with crap like that over and over again.
It will probably need therapy after you are through with it.
The programmer is supposed to know how to use the tool effectively to get the best performance.
In that example there is a stupidly obvious solution - use a join.
Re:Considering sub queries in IN statements. (Score:5, Insightful)
The poor RDBMS has to put up with crap like that over and over again. It will probably need therapy after you are through with it.
The programmer is supposed to know how to use the tool effectively to get the best performance. In that example there is a stupidly obvious solution - use a join.
To me that sounds like fixing a bug in the wrong place. RDBMS is *not* supposed to be dumb. If it were supposed to be dumb, we would not be using SQL in the first place and we'd be passing our own execution plans in the DB engine's internal intermediate language instead. (Well, you can sort of do it with Firebird (BLR) and SQLite (VDBE), but it's not how it's supposed to be used.)
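A quick aside for the curious: in SQLite you really can look at that intermediate language. A bare EXPLAIN dumps the compiled VDBE program for a statement, while EXPLAIN QUERY PLAN shows the higher-level plan. Any statement will do:
-- Lists the VDBE opcodes (addr, opcode, p1..p5) SQLite compiled the statement into
EXPLAIN SELECT 1;
-- The human-readable query plan instead
EXPLAIN QUERY PLAN SELECT 1;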
Re: (Score:2)
RDBMS is *not* supposed to be dumb.
It is supposed to do what it was asked - if someone asks it to do something dumb, it has no choice.
Re:Considering sub queries in IN statements. (Score:5, Informative)
Anyway -- prior to NDBCLUSTER 7.2, if you performed any join whatsoever, the API node had to pull the complete tables from all data node groups before doing the join itself and returning the result. This made join performance, simply put, terrible. I've tested the same query on a standalone out-of-the-box MySQL server and against a 7.1 cluster: a query that came back in 8 seconds from the standalone server took several minutes on the cluster due to the join performance.
NDBCLUSTER 7.2 adds what was called "push down joins" [blogspot.com] in development -- for certain joins, the data nodes now do the join within their own subsets of the data, resulting in a dramatic improvement in performance, since the API nodes now receive just the result over the network instead of the entire dataset.
It really is an amazing improvement and is a result of the dynamic capabilities of MySQL. NDBCLUSTER was never designed for the type of data people are throwing at it, and with the recent improvements it might actually be viable for use on production web sites for more common usage scenarios.
What I do not see addressed yet with 7.2 is the reload time -- if you have a cluster loaded with several GB of data it can take upwards of 20 minutes to reload the data and indices across the cluster. While the cluster is designed to be up 24x7 (even through upgrades), a single bug or memory error that takes it down can result in a prolonged outage. There are several of these [mysql.com] open in the bug tracker.
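For anyone upgrading, the push-down join behaviour mentioned above is exposed as a server switch in 7.2. A rough sketch of how to check it (variable name as I recall it from the 7.2 docs; tables t1/t2 are placeholders):
-- Should be ON (the 7.2 default) for joins to be pushed down to the data nodes
SELECT @@global.ndb_join_pushdown;
SET GLOBAL ndb_join_pushdown = ON;
-- With pushdown active, EXPLAIN's Extra column should note which tables take part in a pushed join
EXPLAIN SELECT * FROM t1 JOIN t2 ON t2.t1_id = t1.id WHERE t1.flag = 1;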
Re:Considering sub queries in IN statements. (Score:4, Informative)
If you create a query in MySQL with an IN clause in the WHERE and put a subquery inside the IN, current versions will run that subquery once for each row of the primary table you are querying. Caching the result alone would probably get the 70x speedup
We have exactly that in MariaDB 5.3 (a fork of MySQL): http://kb.askmonty.org/en/subquery-cache. It won't always give the 70x speedup; sometimes you need to start execution from the subquery to get decent performance: http://kb.askmonty.org/en/semi-join-subquery-optimizations.
I am sorry for shameless self-promotion, couldn't resist :-)
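If you want to see whether that cache is actually kicking in on a MariaDB 5.3 box, it is controlled through an optimizer_switch flag (a sketch; flag name as in the linked KB page):
-- Look for subquery_cache=on in the list of flags
SELECT @@optimizer_switch;
-- Toggle it per session to compare timings with and without the cache
SET SESSION optimizer_switch = 'subquery_cache=off';
SET SESSION optimizer_switch = 'subquery_cache=on';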
Re:Considering sub queries in IN statements. (Score:4, Informative)
Re: (Score:2)
Just FYI, it's called a correlated subquery.
If the thing he's talking about had been a correlated subquery, he wouldn't have been able to cache its results. Ergo, he must be talking about a lousy execution of a non-correlated subquery.
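To make the distinction concrete, a small illustration with invented tables (not from the thread):
-- Non-correlated: the inner query never references the outer row,
-- so a decent engine runs it once and can cache the result
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE status = 1);
-- Correlated: the inner query references the outer row (o.id),
-- so it must be evaluated per row unless the optimizer decorrelates it
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM refunds r WHERE r.order_id = o.id);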
Re: (Score:3)
Professional DBAs know this, and know how to write queries correctly to avoid the problem in the first place. Having it "work" vs. having it work reliably, quickly, and in a manner which scales is what separates any computer user from a professional. Training, ambition, curiosity, and experience are what transform you into that professional.
So while it may be nice to have the machine try to outsmart yo
Re: (Score:2)
While I agree it's a professional's job to know this stuff, I would also like to point out it's not a crime to build your RDBMS in such a way that queries run as fast as possible regardless of how the query was written. Indeed, anything less is an inferior product.
Re: (Score:2)
Here's an actual example:
http://code.google.com/p/django-tagging/issues/detail?id=160 [google.com]
Basically, the subquery was being run over and over, leading to an amplification of biblical proportions. The solution there was to put that subquery inside another subquery (yo dawg), as shown in http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/ [xaprb.com]
We also tried with a temp table, but this solution was cleaner :) The performance difference was around 1/100th of the time for a sample query.
We also confirm
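For anyone who doesn't want to click through, the "subquery in another subquery" trick amounts to wrapping the inner query in a derived table so old MySQL materializes it once instead of re-running it per row. A rough sketch with made-up table names:
-- Slow on old MySQL: the IN subquery can get re-evaluated for every row of posts
SELECT * FROM posts
WHERE id IN (SELECT post_id FROM taggings WHERE tag_id = 42);
-- The workaround: wrap it so it becomes a derived table, materialized a single time
SELECT * FROM posts
WHERE id IN (SELECT post_id FROM (SELECT post_id FROM taggings WHERE tag_id = 42) AS t);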
Re: (Score:3, Interesting)
The latest PostgreSQL releases (the past year's worth) automatically rewrite subqueries as joins for you.
However, MSSQL 2008 performance dropped significantly from prior versions due to the exact opposite: subqueries are now looped through, where previously they were handled internally with hash tables.
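Easy enough to check on your own PostgreSQL install; EXPLAIN will tell you whether the IN became a semi-join or a per-row SubPlan (hypothetical tables again):
EXPLAIN
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE status = 1);
-- Look for a semi-join node (e.g. "Hash Semi Join") rather than a SubPlan
-- that gets re-executed for every row of orders.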
Re: (Score:3)
SELECT * FROM STATEMENTS WHERE CLIENT IN (SELECT ID FROM CLIENTS WHERE STATUS=1);
SELECT * FROM STATEMENTS JOIN CLIENTS ON STATEMENTS.CLIENT=CLIENTS.ID WHERE CLIENTS.STATUS=1;
The statements table has about 2.5 million rows and clients has about 30k rows. The match (status=1) results in roughly 10k rows.
Now the interesting
Re: (Score:2)
You are doing a SELECT *, so when you use the JOIN you are getting more columns back than when you use IN. Rewrite it to specify which columns you want back, and I suspect they will perform equally.
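In other words, something along these lines, keeping the table names from the query above (untested sketch):
-- Return only the STATEMENTS columns, so the result width matches the IN version
SELECT STATEMENTS.*
FROM STATEMENTS
JOIN CLIENTS ON STATEMENTS.CLIENT = CLIENTS.ID
WHERE CLIENTS.STATUS = 1;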
Re: (Score:2)
Re: (Score:3, Informative)
SELECT id, title FROM page WHERE id IN (SELECT pageid FROM hotnews WHERE user = 8)
The answer to this is to execute the subquery separately and pass the result back in as a list of values. If you have 10,000 records in the page table, you will see about a 2000x speed improvement if you are using PHP
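Roughly what that workaround looks like in practice (the literal ids in step 2 are obviously made up):
-- Step 1: run the subquery on its own and fetch the ids into the application
SELECT pageid FROM hotnews WHERE user = 8;
-- Step 2: the application splices those ids back in as literals
SELECT id, title FROM page WHERE id IN (3, 17, 42);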
Re: (Score:3)
Re: (Score:2)
Where do you see the client and the round trips? There would be only one (the DB is accessed once), and this method would be 100% transparent to the client.
Re: (Score:2)
Re: (Score:2)
Yes, he could have phrased it better, but PHP is processed server-side; the client only gets the final result. So no more round trips except the ones between the web server and the database (and in most cases they are on the same machine or cluster, so it's negligible).
His method does speed things up if the query is executed with an IN rather than a JOIN... even if it's not best practice (lots of concurrency issues if you do things like that frequently).
Re: (Score:2)
"Yes, he could have phrased it better, but php is processed server side, the client only gets the final result."
Uhm, no. For the DB server, the PHP process is the client. As it is good practice to have the two machines separate (I really don't know where you got the idea of running these two things on a single machine; that's nothing short of insane), you do get round trips over the wire. These will definitely cost you more processing time than keeping it inside the DB server's process memory space the whole time - you have to convert the inner query results to the wire protocol, send it through the TCP/IP stack, have it arrive int
Re: (Score:2)
And, as stated, it still runs faster than the IN statement (not the JOIN, as everyone already said). The server is the client... never heard that one before; I guess the OP is not the only one who needs to express himself better. Where I come from, I've never heard the server called the client before.
Second, for small businesses and the like, yes, the webserver and the DB server share resources. It's not only common, it's actually the default configuration option when you configure apache and php (so I'm guessin
Re: (Score:2)
And, as stated, it still runs faster than the IN statement (not the JOIN, as everyone already said).
If you have that as a practical experience, then there must be something horribly broken in your DB server software.
the time (sometimes minutes) that the server takes to process the IN, so you're still wrong.
Ahhh, that's the "horribly broken thing" I'm talking about. I'm not wrong, since there is no practical reason for that to take minutes, unless, of course, the result of the subquery (SELECT pageid FROM hotnews WHERE user = 8) is really huge. Do you really have authors who have written millions of stories?
Re: (Score:2, Informative)
SELECT id, title FROM page JOIN hotnews on page.id = hotnews.pageid AND user = 8
There. Enjoy your performance boost.
Re: (Score:2)
Those queries (Score:2)
are exactly equivalent. If your RDBMS is so crappy that it can't see that, then that's not the user's problem. SQL was supposed to be an abstraction, right?
Re: (Score:3)
Re: (Score:2, Insightful)
"some" people are perfectly aware of JOIN and have suffered one too many hairball, incomprehensible SQL statements with 10+ tables mashed together under FROM using JOIN. Sub-queries provide opportunities to factor out some of these into discrete parts that can be individually examined and tested.
Please, stop discouraging the use and application of sub-queries. If you have encountered people that insist on using them despite MySQL's tragically bad optimizer, it is likely they have been exposed to powerful
Re: (Score:2)
Which would be fine if it wasn't for the fact that the version with the JOIN isn't just faster, but simpler to understand.
Seriously there is no excuse for bad SQL. Subqueries have their places, but using them like that is just bad programming.
Re: (Score:3)
Re: (Score:2)
select id, title
from page as a
where exists (
select 1
from hotnews b
where b.user = 8
and b.pageid = a.id
)
I would rearrange the fields in the "where" clause to match the index, highest selectivity first. If there were no suitable index I'd consider creating one, possibly a covering index.
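For this shape of query, the covering index being hinted at would presumably look something like this (hypothetical index name, columns taken from the query above):
-- Lets the EXISTS probe be answered entirely from the index
CREATE INDEX idx_hotnews_user_pageid ON hotnews (user, pageid);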
Re: (Score:2)
The version of mysql in debian no longer silently truncates strings by default.
I found out because I was abusing the "feature" to avoid worrying about sanitizing the length of input data; my import script suddenly stopped working last year and I had to do things a little less wrong.
Re: (Score:2)
MySQL choked on "...and (condition or condition)", doing a seq scan instead of filtering even though there was an index on the table those conditions were on. I changed it to a "where in (...)" and got the same result. I had to move the OR conditions into a derived table joined in.
For the same type of query Informix does a dynamic hash join with no SQL thuggery necessary.
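The derived-table rewrite being described looks roughly like this; table and column names are invented purely for illustration:
-- Instead of: ... AND (f.kind = 'a' OR f.kind = 'b'), which triggered the seq scan,
-- each branch of the OR becomes an index-friendly query inside a derived table
SELECT i.*
FROM items i
JOIN (SELECT item_id FROM flags WHERE kind = 'a'
      UNION
      SELECT item_id FROM flags WHERE kind = 'b') AS matched
  ON matched.item_id = i.id;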
Finally! (Score:3)
Re:Finally! (Score:4, Interesting)
No, rather it's amazing what happens when you architect the DB such that the entire thing is forcibly held in memory.
Re: (Score:2)
MySQL Cluster (Score:2)
Re: (Score:2)
Using memory for DB cluster nodes isn't new. Teradata has been doing it for a long time now in their data warehousing. It has its advantages on large databases, if you have the money to afford the nodes you need for reliability.
Re: (Score:2)
... Unless you're running WAMP or something.
I hate people like you. [urbandictionary.com]
dict wamp
From V.E.R.A. -- Virtual Entity of Relevant Acronyms (June 2006) [vera]:
WAMP
Windows, Apache, MySQL, PHP (Windows, Apache, PHP, DB, SQL)
Damnit, define your !@#$ing terms, and what a stupid acronym that is! The OS should be irrelevant.
I really, really hate this century. :-P
[Okay, I feel better now. Apologies for the outburst, but please do feel free to suck a rock and die.]
MySQL Cluster != MySQL (Score:5, Informative)
Not the same thing.
Re: (Score:2)
MySQL Cluster used to be a pay-only product. Is it free now?
(The setup with one MySQL master and N read-only slaves has been free for years. That powers Wikipedia, for example. It's not MySQL Cluster. A MySQL Cluster has multiple read-write machines.)
But of course it reads from RAM (Score:4, Interesting)
Re: (Score:3)
Re: (Score:2)
How good are the mechanisms available for minimizing (ideally automatically, but at least without too much black magic on the application programmers' parts) the frequency with which your setup ends up needing something that isn't in RAM and being bottlenecked by having to fetch it from disk?
Are we talking about a web back end db, or just a generic db? I ask as the page I'm replying to shows (according to AdBlockPlus) 47 separate entities running in the background. With all that !@#$ going on, what does DB fetch performance even mean?
I think network performance is far and away the most determinant factor in this stuff. Optimize your queries to hell and back, but you'll still have to wait for the network. Zzzzzz ...
"The DB's too slow!"
Not necessarily. Why don't people understand Performance
Re: (Score:3)
Also, who the hell uses SSDs when performance is an issue?
The fast kids use PCI Express cards loaded with flash.
Texas Memory systems or Fusion IO. Expensive, yes, but this is Mysql CLUSTER we are talking about. You don't use that unless you need the speed.
Re: (Score:2)
Actually, a ton of research was done on these types of architectures by Jim Gray (who tragically was lost at sea a few years ago) at Microsoft. Not sure how many of these li
Re: (Score:2)
That would make it a RAED array.
That's great... (Score:2)
Re: (Score:2)
If you can't be bothered, you can ignore a lot of the finer details in Oracle just as much as you would be prone to with any other RDBMS.
Re: (Score:2)
Re: (Score:2)
Now can they please work on some dramatic usability improvements so i don't have to cringe every time an Oracle support question comes up at work?
You may wish to question whether you're in the wrong line of work. RDBs aren't really all that complex. Stuff it in, then drag it out and present it in a pretty display format. What else is there to know?
People are way too hung up on complexity. Life doesn't have to be that hard. You're making it harder than it has to be. Simplify.
Re: (Score:2)
_Oracle_ is making it harder than it has to be. If I had the choice I'd simplify just by dropping support
Re: (Score:2)
It's not RDBs that are the problem, it's Oracle specifically.
Ah. On that, we agree. Larry's one sharp snakeoil salesman.
Yes, I RTFA (sue me) (Score:4, Informative)
If I read the sales pitch correctly, they just integrated Memcached as a backend storage module, so that it plays nicely wrt ACID compliance. Yeah, memory is 70x faster than disk I/O... big whoop!
Anyone running a sizeable MySQL installation already has heaps of RAM allocated to the InnoDB buffers/caches anyway. It sounds like Oracle compared a stock, distro-default MySQL to their memory-hungry tweaks. Yeah, DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64 MB to 48 gigabytes.
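For anyone playing along at home, the knob people usually mean on an InnoDB box is the buffer pool; the figures below are placeholders, not recommendations:
-- See what the server is currently running with
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Raised at startup via my.cnf, e.g.:
--   [mysqld]
--   innodb_buffer_pool_size = 48G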
Re: (Score:3)
Previously each data node exploited about 6 CPUs with 8 threads, in this benchmark we used about 14 CPUs with 24 threads. We've also removed a number of bottlenecks and the configuration of the threads is done in such a manner as to handle many different loads well.
The little thing you read about memcached was about adding a native API, which probably helped things, but getting the multithreading upgraded probably had more to do with it. They also were comparing two benchmarks that both used just memory, not a case where one used disk and the other used RAM.
Re: (Score:2)
In this case, the actual boost is from AQL, which allows for a query to be split up by JOINs and sent out to different servers to run
Re: (Score:2)
Re: (Score:2)
DUH. I can get a 70x speedup too if I increase MySQL's memory usage from the default 64 MB to 48 gigabytes.
Just make sure you actually test it. Some of the performance recommendations in the manual don't scale from 64 MB to 48 GB. MyISAM Key Buffer and Query Cache, I'm looking at you.
Re: (Score:2)
I, er... I think I shorthand a lot of this stuff, because it seems trivial to me as a programmer. Anything that works like a hashtable is going to experience diminishing returns, because you have to scan through that ever-growing table. O(n^2) and stuff.
On the other hand, anything that spares you from hitting a disk, that's a guaranteed win. What I commonly do with my DB servers, since my data sets are small but complex, is to give them enough memory to cache the entire database into RAM. The disk effe
Comment removed (Score:3)
MySQL in-memory speed (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
it usually takes SQL Server two or three seconds to respond
Then you're incompetent. Stack Overflow uses SQL Server 2008, and serves 12-14 million pages a day. [brentozar.com]
Re: (Score:2)
That's 162 *pages* per second, not SQL queries. Thing is, how many of those are cached and simply re-served up again? How big are the servers that run these? How many servers do they use? How much Lucene do they use instead of SQL Server??? (you didn't watch the video, did you? - 23 minutes in)
Anyway, your argument is like saying Facebook takes 2 seconds to return my wall of data whereas MyPersonalWebsite with a PHP script I wrote returns instantly, therefore Facebook are useless losers.
There might well be a
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I've programmed PHP against both SQL Server and MySQL; the MySQL statements typically run in milliseconds, while it usually takes SQL Server two or three seconds to respond.
Can you give an example?
We're evaluating Postgres vs MS SQL at work, and at the moment haven't found any problem big enough to rule either out.
Re: (Score:2)
problems other than cost you mean.
Re: (Score:2)
problems other than cost you mean.
We already have an MS SQL database for the HR, finance (etc) systems, so using that might be as cheap (or cheaper) than a new Postgres database, given existing staff (DBA) knowledge and some odd way they fudge the numbers. (Recurring costs for MS software are apparently negotiated and paid for centrally, rather than charged to the department using the software. Daft.)
Essentially, arguing on grounds of initial/ongoing cost is difficult. My best cost argument is that the no-cost license for Postgres means we coul
Re: (Score:2)
Just because your script didn't run well doesn't mean that SQL Server was slower. Probably the PHP database modules had a lot to do with it - I guess there is much more attention given to MySQL integration than to SQL Server - or some misconfiguration (always 2-3 seconds? Maybe it's a DNS issue; were you using hostnames
The 70x is for (distributed) joins (Score:5, Informative)
which previously were quite poorly handled.
See http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/?utm_source=rss&utm_medium=rss&utm_campaign=70x-faster-joins-with-aql-in-mysql-cluster-7-2
70x (Score:5, Funny)
Marketing: How much in the best conceivable case?
Developers: Oh, I dunno, maybe 70x.
Marketing: 70x? Is that good?
Developers: Yeah, I suppose, but the cool stuff is...
Marketing: Wow! 70x! That's a really big number!
Developers: Actually, please don't quote me on that. They'll make fun of me on Slashdot if you do. Promise me.
Marketing: We promise.
Developers: Thanks. Now, let me show you where the good stuff is...
Marketing (on phone): Larry? It's me. How big can you print me up a poster that says "70x"?
Re: (Score:2)
Re: (Score:2)
ftfy
Re: (Score:2)
TPC-H (Score:2)
Re:PostgreSQL (Score:5, Funny)
Is PostgreSQL webscale? MongoDB is.
Re:PostgreSQL (Score:5, Funny)
Any questions? [youtube.com]
Re: (Score:3)
The first 27 seconds made a good point about comparing various products and evaluating them on their merits. The remaining 5 minutes was a mix of strawmen and fallacies. Is there supposed to be a point to that?
Re: (Score:3)
http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale [xtranormal.com]
Re: (Score:2)
Yes: its strengths are reliability and price point (free), but it's pretty fast, has clustering capabilities, and has been used for large-scale web applications.
Seriously, it's hard to go wrong with PostgreSql when you need a relational database. MongoDB, of course, is a very different animal intended for very different tasks.
Re: (Score:3)
Versus all the other factors you can throw in there for anything involving heavy lifting for an enterprise app, raw price point of the DB engine is pretty close to the bottom of the list.
Re: (Score:2)
Are you kidding? In an Oracle discussion?
We're talking very large numbers here, and potentially more money than you will make in your entire lifetime.
Someone definitely cares. They may bite their tongue and still buy Oracle, but they do care. The numbers are not trivial.
Re: (Score:2)
Kubuntu did a nasty on me with a surprise, non-optional upgrade to kmail 4.7.2, which is just an unfinished work, to put it mildly. As a confirmed glutton for punishment, I decided to make it work no matter what the cost. Anyway, it came with a MySQL backend that was sort-of working, but one day it just fell down and couldn't get up. So I replaced it with the experimental, unsupported PostgreSQL backend, and after a little bit of pain learning how to administrate the server, it came up and so far has worked
Re: (Score:2)
At least when PostgreSQL starts working it stays working, but I agree the same can't be said for MySQL. A recent upgrade changed the layout of mysql.user, causing none of the users to be able to log in. The Oracle-approved fix? mysqldump and restore, but that just restored the bad layout of mysql.user. The actual fix involved dumping everything one database at a time and recreating mysql.user from a script. My best guess is that whoever designed the MySQL login system ignored SQL best practices and did the
Re: (Score:2)
If you start to manage lots of data (and I think some people enter into this category for their email), it makes sense to use a DB.
It makes sense on paper. It has yet to be proved that it makes sense for email, and in fact, so far kmail 7.2+ is pretty good evidence for the contrary. I am afraid you will have a difficult time finding anyone who has stayed with Kmail through its recent convulsions and has in fact forgotten about the performance issue. Just do a quick search on "kmail 7.3 slow" to convince yourself.
Re: (Score:2)
Re: (Score:2)
k. This is slightly laughable. 33,500 rows? In 87 seconds? That seems glacial. And 1.23 seconds being the new speed? That seems as expected.
Aside from comparing the speed of a Lada vs. a common garden slug, how does this compare against other databases?
It was actually only 2000 rows returned -- the source tables had 35K rows. But in fairness, this was for a join across 11 tables, and it's easy for a database to perform slowly at this task if it doesn't have a good query optimizer. A badly optimized query could end up doing thousands (or even millions) of full table scans of multi-thousand row tables.
Re: (Score:2)
The only heavy MySQL application I use is Magento (and it relies heavily on JOINs), and a small server can blow these results out of the water, on similar-sized tables, and with multiple concurrent queries (but only using 3 to 5 JOINs per query).
Re: (Score:2)
The thing is that prior to this release, applications that were using SQL joins (involving more than 2 tables) were ruled OUT.
Now a lot more applications *can* use MySQL Cluster.
Yeah, that's a less flattering remark than a 70x increase in speed, but probably more useful :D. I'm not familiar with MySQL Cluster, and I'm surprised it is *that* different from the server version.
Re: (Score:2)
Is there a Virtualmin module for MariaDB yet?
If so, I'll switch tonight.
Re: (Score:2)
Tried Drizzle yet?