
 



Databases Software IT News

MemSQL Makers Say They've Created the Fastest Database On the Planet 377

mikejuk writes "Two former Facebook developers have created a new database that they say is the world's fastest, and it is MySQL-compatible. According to Eric Frenkiel and Nikita Shamgunov, MemSQL, the database they have developed over the past year, is thirty times faster than conventional disk-based databases. MemSQL has put together a video showing MySQL versus MemSQL carrying out a sequence of queries, in which MySQL performs at around 3,500 queries per second while MemSQL achieves around 80,000 queries per second. The documentation says that MemSQL writes back to disk/SSD as soon as the transaction is acknowledged in memory, and that a combination of write-ahead logging and snapshotting keeps your data secure. There is a free version, but there is no word yet on what a full version will cost." (See also this article at SlashBI.)
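The durability scheme the summary describes, acknowledge in memory, persist via a write-ahead log plus periodic snapshots, can be sketched roughly as follows. This is a hypothetical illustration of the general technique, not MemSQL's actual code; the class and method names are invented, and the WAL and snapshot are plain Python objects standing in for disk files.

```python
# Hypothetical sketch of in-memory storage with write-ahead logging
# and snapshotting. Lists/dicts stand in for on-disk files.

class MemStore:
    def __init__(self):
        self.data = {}          # in-memory table
        self.wal = []           # write-ahead log (stand-in for a log file)
        self.snapshot = {}      # last snapshot of the table
        self.snap_len = 0       # WAL length at snapshot time

    def put(self, key, value):
        self.wal.append((key, value))   # record the change in the log...
        self.data[key] = value          # ...and apply it in memory

    def take_snapshot(self):
        # Periodic snapshots bound how much WAL must be replayed on recovery.
        self.snapshot = dict(self.data)
        self.snap_len = len(self.wal)

    def recover(self):
        # Rebuild state from the last snapshot plus the WAL tail.
        state = dict(self.snapshot)
        for key, value in self.wal[self.snap_len:]:
            state[key] = value
        return state
```

Whether the acknowledgement happens before or after the log write is exactly the durability question debated in the comments below; the sketch deliberately leaves that ordering simplified.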
  • Ya Don't Say! (Score:5, Insightful)

    by Rary ( 566291 ) on Sunday June 24, 2012 @07:28PM (#40433271)

    Really? Accessing RAM is faster than accessing a disk? What a novel discovery!

    It seems to me that MySQL can also be run in memory. Apparently that's how the clustered database works (or used to work). I've never tried it, but let's see some benchmarks between MemSQL and an entirely memory-based MySQL.

  • by tomhath ( 637240 ) on Sunday June 24, 2012 @07:44PM (#40433427)
    Price/performance is a better question. If it's fast enough that you don't need a RAID 10 SSD array, it could be a good choice. Throw enough hardware at any DBMS and you'll get good throughput.
  • Meh. (Score:5, Insightful)

    by hey! ( 33014 ) on Sunday June 24, 2012 @07:45PM (#40433435) Homepage Journal

    Give me fast enough, robust, easy to administer, and standards-compliant. A little less speed may mean you throw more hardware at a problem, but that doesn't matter unless the overall cost and risk are inflated. A platform decision boils down to three things: (1) is it good enough; (2) is it economical; (3) if we decide later this doesn't work for us, are we totally screwed?

    In any case, there's no meaningful way you can make a claim that a database management system is the fastest on the planet. All you have is benchmarks, and different benchmarks apply to different use-cases.

  • Re:okay...? (Score:5, Insightful)

    by Kergan ( 780543 ) on Sunday June 24, 2012 @07:51PM (#40433489)

    MySQL is the last thing I think of, personally. It sucks as soon as you make it ACID compliant and start hitting it with thousands of concurrent requests. You're much better off with PostgreSQL.

  • by realityimpaired ( 1668397 ) on Sunday June 24, 2012 @08:10PM (#40433611)

    As a long time SysAd/webmaster/developer, I'm certainly interested

    At the risk of sounding incredibly condescending....

    If you were really a sysadmin who could benefit from that kind of speed improvement, you'd know that it's possible to achieve that level of performance with MySQL already, either by running it from memory or by using a fast drive array. The simplest/cheapest option to drastically improve MySQL performance is to throw a large amount of RAM at a system and point MySQL at the memory: MySQL can be configured to keep the database in active memory and sync to disk on a regular basis, which is almost exactly the behaviour described for MemSQL. For an exceptionally large database that can't be stored in system memory, I imagine the advantage MemSQL is boasting would evaporate. There are other ways to go about it, such as running a fast disk array or a cluster, to get around the limitations of using RAM, but ultimately the prime determining factor for MySQL's speed is speed of access to the database file itself.
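For illustration, the kind of tuning described above mostly comes down to buffer pool sizing and relaxed log flushing. A minimal my.cnf sketch using real InnoDB settings, with placeholder values that would need tuning for the actual hardware:

```ini
[mysqld]
# Size the InnoDB buffer pool to hold the working set in RAM
# (placeholder value; often set to most of a dedicated server's memory).
innodb_buffer_pool_size = 48G

# Trade durability for speed: flush the log to disk roughly once per
# second instead of at every commit (risks ~1s of transactions on crash).
innodb_flush_log_at_trx_commit = 2

# Skip the OS page cache so the buffer pool is the only cache layer.
innodb_flush_method = O_DIRECT
```

With settings like these, reads and writes hit memory and dirty pages sync to disk in the background, which is the behaviour the comment compares to MemSQL's.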

  • Re:okay...? (Score:4, Insightful)

    by evilviper ( 135110 ) on Sunday June 24, 2012 @08:15PM (#40433643) Journal

    When I think of fast databases to compare to, the first thing I think of is MySQL.

    MySQL is actually very fast under light loads and one-off queries, particularly if you choose to leave it at its non-ACID-compliant default settings (e.g. innodb_flush_log_at_trx_commit).

    That's probably the only reason why it got popular... There weren't any open source NoSQL DBs at the time, and MySQL seems fast when tested with a basic, shallow benchmark. Of course others like PostgreSQL completely leave it in the dust once there's some real load, or complex queries, or you WANT to be absolutely sure transactions were committed to disk before returning.

    As a single point of evidence, I give you Zabbix... It supports all the major databases (PostgreSQL, DB2, Oracle, SQLite, etc.) as backends, yet MySQL is recommended as it performs the fastest.
    http://www.zabbix.com/documentation/1.8/manual/performance_tuning [zabbix.com]

    /Actually, I'd rather see a comparison to Pick or other lightning fast MV dbs

    Level-2 overflow! Resize analysis! Change the modulo! Ahhhh!

    I've done the PICK-OS thing for a few years, and I'm not a big fan. I'm infinitely happier administering PostgreSQL DBs.

    Besides, you don't have to go to something as exotic as PICK to get away from SQL. Try the ages-old Berkeley DB (db4), or any of the newer NoSQL options.

  • Re:Ya Don't Say! (Score:4, Insightful)

    by errandum ( 2014454 ) on Sunday June 24, 2012 @08:41PM (#40433813)

    That and memcached (I think that's the name).

    This comparison is far from fair... Is it ACID, or does it eventually sync up? How does it compare with other memory-based DBs?

    Comparing it with a slow relational DB will not give you any kind of credibility.

  • by Anonymous Coward on Sunday June 24, 2012 @09:43PM (#40434211)


    If you were really a sysadmin who could benefit from that kind of speed improvement, you'd know that it's possible to achieve that level of performance with MySQL already, by either running it from memory or by using a fast hard drive array.

    The guys who wrote it are former Facebook employees, so I have to assume they know how to get the best performance out of MySQL, and that it doesn't suit their needs for whatever reason.

    The article doesn't really go into much detail about why, but my point is really about not jumping to conclusions and admonishing someone because you think you know more than they do. Maybe this whole product is useless, and maybe it's brilliant and useful, but you can't determine that solely from this article.

  • Re:Ahhhh, Pick! (Score:5, Insightful)

    by Zenin ( 266666 ) on Sunday June 24, 2012 @11:05PM (#40434713) Homepage

    I still think that the 2 missing courses from any CS degree program are 1) how to debug, and 2) history of computing.

    Practical software engineering is mostly about debugging. An actual course in debugging would imply that a Computer Science curriculum had something to do with practical software engineering, which we're all painfully aware it hasn't in the slightest.

  • Re:Speed vs. speed (Score:4, Insightful)

    by hawguy ( 1600213 ) on Monday June 25, 2012 @12:17AM (#40435155)

    I can buy servers with over a terabyte of RAM, multiple power supplies, and 4 x 10G interfaces for FCoE.
    What is a disk for again, other than to boot from?

    The disk is something to hold your data when a backhoe cuts your datacenter power, and cuts the network connections that you use to replicate data to your remote site.... then your UPS runs out of battery after an hour of transactions have been applied to the database with no replication to the remote site.

    Sometimes sh*t happens in ways you haven't planned for... when you have N degrees of redundancy, you'll get bit by the rare N+1 event. It's better to have your data stored somewhere that doesn't disappear after the power goes away (or the machine reboots).

    (if you're using your FCoE network to connect to the SAN to store your data, you're still using disks but there's no reason to use a local disk to boot from)

  • Re:Top coder (Score:5, Insightful)

    by mwvdlee ( 775178 ) on Monday June 25, 2012 @01:25AM (#40435517) Homepage

    Juggling 20 factors in your brain (short term memory) is not the same as having a good memory (long term memory).
    In fact they literally use different parts of the brain.

  • Re:Ya Don't Say! (Score:5, Insightful)

    by mwvdlee ( 775178 ) on Monday June 25, 2012 @03:00AM (#40435955) Homepage

    TFS states that transactions are written to disk after being "acknowledged" in memory.
    I assume that means transactions are written to disk only after the database reports back a successful commit,
    thereby failing to meet the D of ACID compliance.

  • Re:Top coder (Score:5, Insightful)

    by zig007 ( 1097227 ) on Monday June 25, 2012 @05:15AM (#40436441)

    Except that very little of programming these days is about algorithms.
    Rather, it is about elegantly solving business problems and knowing one's way around huge frameworks.
    Being a "top coder" is in itself a very good thing, of course, but there are very few companies that actually work with technical details like implementing a better hash algorithm and so forth.

    Rather, in most developers' jobs, it is very valuable to:
    * be good at understanding, handling, and especially changing large systems.
    * be good at producing solutions that reasonably balance cost and customer demands against simplicity, performance, structure, and other technical values.
    * be able to foresee the usage of a solution over different time frames, and thereby make systems cheaper and easier to evolve. Sometimes a quick and butt-ugly solution is a really good thing to get the customer going while they figure out what they really want, as long as all parties are aware of the situation and know that a complete rewrite will have to be paid for next.
    * not act like a stubborn child, pushing on regardless, when one's pet solution or technology gets scrapped or rejected, or when the rest of the company thinks it's risky to invest time in going down that road.
    * be professional and keep working even when the current task is really boring.

  • Re:Ya Don't Say! (Score:5, Insightful)

    by xelah ( 176252 ) on Monday June 25, 2012 @05:48AM (#40436571)

    I don't think that's something which can be changed, except by changing the hardware. The starting point is this: when a COMMIT is made, all changes have to be written to the write-ahead log before a success response can be returned to the client. The WAL is written sequentially, so if you're using ordinary disks and are sensible you give it its own set of spindles (RAID 1, say). That means that between each write you have to wait for one disk rotation: you append to the log, you process the next transaction, then you have to wait for the disk to rotate back to just after where you finished writing before you can write the next entry. So with this basic setup you can do at most 15k transactions per minute on a 15k RPM disk (one commit per rotation, about 250 per second).

    You can do things to make this faster. You can write several transactions at once, and you can put slight delays in to transaction commits to wait for others to bundle them with (PostgreSQL I believe will do the first and can be configured to do the second). You can use battery backed caches in your RAID system, which will have much the same effect (and leave you limited by disk bandwidth and cache size). You can use SSDs that don't need to seek.
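The arithmetic behind those two paragraphs is easy to put in a back-of-envelope model: a 15,000 RPM disk completes 250 rotations per second, one log flush per rotation caps solo commits at 250/s, and group commit multiplies that ceiling by the batch size. This ignores seek time, transfer time, and caching, so treat it as a rough upper bound only.

```python
# Back-of-envelope commit-rate model for a WAL on a single spindle.
RPM = 15_000
rotations_per_sec = RPM / 60          # 250 rotations per second

# One transaction flushed per rotation:
solo_commits_per_sec = rotations_per_sec

# Group commit: bundle several transactions into one log flush,
# so each rotation carries batch_size commits.
def batched_commits_per_sec(batch_size):
    return rotations_per_sec * batch_size

print(solo_commits_per_sec)           # 250.0
print(batched_commits_per_sec(10))    # 2500.0
```

A battery-backed RAID cache or an SSD changes the model entirely by removing the wait for rotation, which is why those are listed as the other ways out.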

    I can't see anything in TFA that MemSQL is supposed to be doing differently here, or anything it CAN do differently. From TFA: 'The key ideas are that SQL code is translated into C++, so avoiding the need to use a slow SQL interpreter, and that the data is kept in memory, with disk read/writes taking place in the background.' The first I'm not too sure I understand (presumably they're not turning it into C++ and then passing it through a C++ compiler...), but maybe we can blame the journalist for that. Or maybe they've just reinvented prepared statements. The second is what databases do anyway, except, of course, for the WAL and when you're reading data which isn't in memory. Perhaps what they're doing is flushing the WAL after the commit has returned to the client, which makes the database very much not ACID, and is also something other databases can be configured to do if you don't care about your data.

    Potentially what they could do, though, is design all of their data structures, algorithms, locking, and so on around the assumption that everything is in memory. There are big differences in the best query plan to use when data is in memory versus on disk, and traditional databases don't necessarily make the right choices. They try, but may, for instance, use table scans for queries which return a large proportion of the rows in a table because sequential IO is faster, when they should be using indexes if the data is in memory. And B-trees, and the way data everywhere is split into pages, are things traditional DBs do because they work well even when most of your data is on disk. So maybe that's what they've done differently that other DBs haven't already been doing.

  • Re:Ya Don't Say! (Score:4, Insightful)

    by drsmithy ( 35869 ) <drsmithy@nOSPAm.gmail.com> on Monday June 25, 2012 @05:50AM (#40436575)

    Wake me up when that is 800GB to a few TB; then you can make that claim. It might shock you to learn that some businesses use their databases to drive more than just web forums.

    You can already put a TB of RAM into a server if you want. If you really need to have that amount of data with next to zero latency, then the cost (which is still relatively low) is unlikely to be much of a stumbling block.

    It clearly will shock you to learn that most databases are well under a couple of hundred GB in size.

"I've seen it. It's rubbish." -- Marvin the Paranoid Android

Working...