MemSQL Makers Say They've Created the Fastest Database On the Planet

mikejuk writes "Two former Facebook developers have created a new database that they say is the world's fastest, and it is MySQL-compatible. According to Eric Frenkiel and Nikita Shamgunov, MemSQL, the database they have developed over the past year, is thirty times faster than conventional disk-based databases. MemSQL has put together a video showing MySQL versus MemSQL carrying out a sequence of queries, in which MySQL performs around 3,500 queries per second while MemSQL achieves around 80,000 queries per second. The documentation says that MemSQL writes back to disk/SSD as soon as the transaction is acknowledged in memory, and that a combination of write-ahead logging and snapshotting keeps your data secure. There is a free version, but pricing for the full version has not yet been announced." (See also this article at SlashBI.)
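The durability scheme the summary describes, write-ahead logging combined with periodic snapshots, can be sketched as a toy in Python. This is an illustrative sketch only, not MemSQL's actual implementation; the class, methods, and file names are all invented:

```python
import json
import os

class ToyMemStore:
    """Toy in-memory key/value store with write-ahead logging and
    snapshots. Illustrative only -- not MemSQL's actual design."""

    def __init__(self, log_path="wal.log", snap_path="snapshot.json"):
        self.log_path = log_path
        self.snap_path = snap_path
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a")

    def set(self, key, value):
        # Append to the log and force it to disk *before* acknowledging:
        # replaying the log after a crash restores every acknowledged write.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value  # the transaction is acknowledged here

    def snapshot(self):
        # Persist the full state, then empty the log so replay stays short.
        with open(self.snap_path, "w") as f:
            json.dump(self.data, f)
        self.log.truncate(0)

    def _recover(self):
        # Load the last snapshot, then replay log entries written after it.
        if os.path.exists(self.snap_path):
            with open(self.snap_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    rec = json.loads(line)
                    self.data[rec["k"]] = rec["v"]
```

Whether the write is acknowledged before or after the fsync is exactly the durability trade-off the summary glosses over.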
This discussion has been archived. No new comments can be posted.

  • by alphatel ( 1450715 ) * on Sunday June 24, 2012 @07:27PM (#40433251)
    It sounds cool, but we can get 200k IOPS on RAID 10 SSD without degradation.
  • by Kergan ( 780543 ) on Sunday June 24, 2012 @07:42PM (#40433395)

    Show me benchmarks vs Oracle, PostgreSQL or SQLServer. Spare me the comparison with MySQL or some other toy.

  • Err... what? (Score:4, Interesting)

    by Splab ( 574204 ) on Sunday June 24, 2012 @07:42PM (#40433399)

    Ok, so both the article and the video are extremely thin on details, the explanation for the massive performance is pretty much gibberish, and their argument for ACID compliance is bullshit.

    Just leaves me with the question, what are they trying to get out of this BS?

  • Ahhhh, Pick! (Score:5, Interesting)

    by hedronist ( 233240 ) on Sunday June 24, 2012 @08:56PM (#40433891)

    The most over-the-top DB God I know started in Pick-land (ca 1972?). Although he does (is forced to?) use SQL nowadays, he thinks in ways that do not come out of any SQL DBA handbook. As a result he gets DBMSs to do things that are ... unnatural.

    He is currently doing some data-cubing stuff for us that I didn't think could be done with something less than a DOD budget. He says his touchstone is thinking in Pick and then 'translating' to SQL.

    I still think that the 2 missing courses from any CS degree program are 1) how to debug, and 2) history of computing.

  • Speed vs. speed (Score:5, Interesting)

    by Todd Knarr ( 15451 ) on Sunday June 24, 2012 @08:58PM (#40433903) Homepage

    Speed's fine, but what kind? Or more specifically, over what timeframe? High transaction rates are fine, but they don't do any good if you can only sustain them for a few seconds or minutes before the whole thing collapses. I want to know the transaction rate the thing can sustain over 24 hours of continuous operation. In the real world you have to be able to keep processing transactions continuously.

    That long-time-period test also shows up another potential problem area: disk bottleneck. In-memory's fine, but few serious databases are small enough to fit completely in memory. And even if it will fit, you can't lose your database when you shut down to upgrade the software so eventually the data has to be written to disk. And that becomes a bottleneck. If your system can't flush to disk at least as rapidly as you're handling transactions, your disk writes start to lag behind. Sooner or later that'll cause a collapse as the buffers needed to hold data waiting to be written to disk compete for memory with the actual data. You can play algorithmic games to minimize the competition, but sooner or later you run up against the hard wall of disk throughput. And the higher your transactions rates are, the harder you're going to hit that wall.
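The collapse argument above can be made concrete with a back-of-the-envelope calculation; the transaction rate is taken from the summary's benchmark, while the flush rate and record size are invented for illustration:

```python
def backlog_bytes(seconds, txn_rate, flush_rate, txn_bytes=200):
    """Unflushed bytes after `seconds` of sustained load. When the
    incoming transaction rate exceeds what the disk can absorb, the
    backlog grows linearly until it competes with live data for RAM."""
    surplus = max(0, txn_rate - flush_rate)  # txns/sec the disk can't keep up with
    return surplus * seconds * txn_bytes

# 80,000 txn/s in, disk absorbing the equivalent of 20,000 txn/s:
day = 24 * 3600
print(backlog_bytes(day, 80_000, 20_000) / 2**30, "GiB of backlog after 24 hours")
```

With those invented numbers the backlog approaches a terabyte within a day, which is the "hard wall of disk throughput" in the comment above.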

  • Top coder (Score:5, Interesting)

    by Taco Cowboy ( 5327 ) on Sunday June 24, 2012 @09:21PM (#40434045) Journal

    They did have an ad to lure in "Top Coders" at []

    Apart from their ad, what they said about top coders was interesting, with one exception: the claim that top coders memorize whole books filled with algorithms. Top coders do not memorize anything; top coders do not get to be top coders by memorizing.

    Instead, top coders have the instinct to _know_ which algorithm to adapt and apply, and they know where (and how) to look for the algorithm, whether in their own archive, in books, in old magazines, or in some strange corner of the Web.

  • Re:Ya Don't Say! (Score:4, Interesting)

    by bill_mcgonigle ( 4333 ) * on Sunday June 24, 2012 @09:23PM (#40434063) Homepage Journal

    Not just that - you can get a FusionIO ramdisk device for really big databases and get performance that's somewhere between SSD and memory. Those are all battery backed and such, so no monkeying around with whether the ACID was done right or not.

  • TimesTen Database (Score:2, Interesting)

    by Anonymous Coward on Sunday June 24, 2012 @10:00PM (#40434323)

    So what is the difference between MemSQL and TimesTen []?

    Other than the fact that TimesTen has been out 16 years longer, that Oracle now owns it, that it runs on both 32-bit and 64-bit Linux and Windows, that it can run in front of another database engine to give it a boost, and that it has customer installations up into the terabyte range.

    Just another lame attempt to reinvent the wheel.

  • Re:Ya Don't Say! (Score:5, Interesting)

    by gman003 ( 1693318 ) on Sunday June 24, 2012 @10:21PM (#40434449)

    It's a bit more complex. There are four main ways to do MySQL storage in RAM (which I know of because my current work project is a MySQL application).

    First, the NDB Cluster system is there, which is what you've mentioned. That's basically just a MySQL frontend to a distributed, memory-based NoSQL database, though. Convenient, but not truly "MySQL".

    The second is using the "Memory" storage engine, where it actually stores a normal MyISAM table in memory. However, this is a surprisingly crappy option, because it uses table-level locks for writing, so parallel write performance is only marginally faster than disk.

    The third is to store regular InnoDB tables on a ramdisk. This can be crazy fast, but it also means that if your server crashes or loses power, you're *fucked*.

    The fourth is to use Memcached, which isn't really a MySQL thing at all. You're basically just caching data in a memory-only NoSQL database, at the application level. This is actually what we ended up doing, because all the others are pretty crappy options - Cluster is the best one, but the hardware requirements are higher than we could justify spending given our performance requirements. Shoving memcached onto the web server (which has RAM to spare) and setting certain queries to cache their results there sped things up significantly, at minimal cost.

    As far as I can tell from the summary (I refuse to read the articles for such a blatant slashvertisement), this "MemSQL" doesn't do anything you can't do by configuring MySQL properly, although they likely optimized some rarely-used modules to make them faster.
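The application-level caching the parent describes, checking memcached first and falling back to MySQL on a miss, is the classic cache-aside pattern. A minimal sketch, with a plain dict standing in for the memcached client and an invented `query_fn` standing in for the real database query:

```python
import time

class CacheAside:
    """Cache-aside: check the cache first, run the real query only on a
    miss, and store the result with a TTL. A dict stands in for memcached."""

    def __init__(self, query_fn, ttl=60.0):
        self.query_fn = query_fn   # the expensive database query
        self.ttl = ttl
        self._cache = {}           # key -> (expires_at, value)

    def get(self, key):
        hit = self._cache.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]                       # fresh cache hit
        value = self.query_fn(key)              # miss: go to the database
        self._cache[key] = (time.monotonic() + self.ttl, value)
        return value

    def invalidate(self, key):
        # Call this on writes so stale results don't linger until the TTL.
        self._cache.pop(key, None)
```

The TTL plus explicit invalidation on writes is what keeps cached query results from drifting too far from the underlying tables, which is the usual cost of caching outside the database.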

  • Re:Top coder (Score:5, Interesting)

    by Surt ( 22457 ) on Sunday June 24, 2012 @10:55PM (#40434653) Homepage Journal

    All of the best developers I've met had phenomenal memories. I think both a natural reasoning ability and great memory are assets. If you are missing one, you aren't going to be as strong as someone who has both.

  • Re:Top coder (Score:5, Interesting)

    by gweihir ( 88907 ) on Sunday June 24, 2012 @11:26PM (#40434851)

    I have met quite a few people who could fake being good coders using a really good memory. They were in fact at best mediocre coders and sometimes really bad ones. While these people can code solutions to simpler problems really fast, they usually do not notice when they are out of their depth and would need to look things up or think about them for a while. Then they screw up royally. That most people mistake them for really good coders (and no, memory does not help reasoning ability; it hinders it) makes things worse. One of the hallmarks of a great coder is a very keen sense for when he/she needs to be careful because something is more difficult than it appears to be. Those with really good memories regularly fail that test. Bad memory is an asset here.

  • Re:Ya Don't Say! (Score:5, Interesting)

    by arth1 ( 260657 ) on Monday June 25, 2012 @01:06AM (#40435405) Homepage Journal

    The biggest issue with RAM drives is their cost.

    Yes and no. If you can fit the Innodb writeahead-logs and a few of the worst bottleneck tables on, say, an 8 GB ram drive, it's a bargain.

    HyperDrive: $300
    2 * 4GB 240-Pin DDR2-800 SDRAM ECC: $234
    16 GB CF card for backup: $30
    Total: $564

    That's downright cheap compared to what a RAID 10 or 50 of SSDs or short-stroked 10k/15k rpm drives would cost.
    If it solves a bottleneck, it could be a big money saver.

  • Re:Speed vs. speed (Score:5, Interesting)

    by Todd Knarr ( 15451 ) on Monday June 25, 2012 @02:36AM (#40435849) Homepage

    A terabyte of RAM costs quite a lot of money, far more than a terabyte of hard drive does. And it's not as big as it sounds; I've dealt with bigger databases. Usually the ones that demand the highest performance are also the ones that eat the most space once you start taking indexes and such into account.

    And multiple power supplies? Won't help you when the data center rack loses all power. I recall at least 2, maybe more, reports of total loss at data centers in the last 12 months, so it's not like it's that rare an event. That's not counting partial losses, or cases where someone simply fumble-fingered and powered down or rebooted the wrong server. And it certainly doesn't count maintenance outages when the server or the database software had to be restarted to upgrade software. Redundant power supplies won't help against that, and while it's no big deal normally it's a really big deal when it means losing 100% of the contents of the database when memory gets cleared. Sooner or later you need the data on persistent storage, disk or an equivalent. You can handwave that need over the short term, minutes to maybe hours, but when you start talking about maintaining the database for months to years it's a different story. And if you want to say you don't need that kind of up-time, well, the business people where I work would probably boot you out the door so hard you'd bounce twice for suggesting they could just live with losing all our data a couple of times a year. Having it happen even once would probably be the end of the company.

  • by Anonymous Coward on Monday June 25, 2012 @02:58AM (#40435937)

    I work on a system like that right now at a really big company. Let me tell you something: it's shit. If you need concurrent access to the files/directories by several processes, you'll have a heap of issues. Consumers pick up files before they are completely written by the producers (now fixed by file renaming, but that required work). Some directories now hold 300k files, and any file operation is extremely slow; filesystems aren't designed for this (in the process of being fixed by splitting directories squid-style into many subdirectories). On top of that, because we need to access the repository from multiple machines and we want reliability, it's now on NAS. This means any file open/close operation takes ~6 ms, at least 2 round trips to the NAS server and back. That works out to 166 IO operations per second and no more. You want notification when a file arrives? You get to use polling; FS file modification notifications don't work on NAS. It is terrible. I wasn't there when this thing was designed, but I'd like to find the architect who thought this was a good idea and punch him in the face.
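The polling workaround the parent mentions, made necessary because filesystem change notifications don't propagate over NAS mounts, typically looks something like this sketch; the interval and directory handling are placeholders, not the poster's actual code:

```python
import os
import time

def poll_for_new_files(directory, interval=5.0):
    """Yield filenames as they appear by rescanning the directory.
    A crude stand-in for inotify, which doesn't work over NAS mounts."""
    seen = set()
    while True:
        for name in sorted(os.listdir(directory)):
            if name not in seen:
                seen.add(name)
                yield name
        time.sleep(interval)
```

Each pass is a full directory listing, which is exactly why the 300k-file directories described above hurt so much.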

  • Uh, Oracle? (Score:2, Interesting)

    by bytesex ( 112972 ) on Monday June 25, 2012 @05:38AM (#40436517) Homepage

    Seriously, why do people, and by people I mean slashdot nerds, think 'fast database' and then think 'MySQL'? 'MySQL-compatible' equals 'bad' in my world and, in comparison with Oracle, 'not so fast at all'.
