Databases Facebook Data Storage

Facebook Trapped In MySQL a 'Fate Worse Than Death'

Posted by timothy
from the cake-or-death-or-trapped-in-mysql dept.
wasimkadak writes with this excerpt from GigaOM: "According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to 'a fate worse than death,' and the only way out is 'bite the bullet and rewrite everything.' Not that it's necessarily Facebook's fault, though. Stonebraker says the social network's predicament is all too common among web startups that start small and grow to epic proportions."

  • Commercial databases (Score:3, Interesting)

    by drolli (522659) on Saturday July 09, 2011 @09:30AM (#36704210) Journal

    Well, then they convert from one DB to another. So what? It's not like that would be a completely new thing to happen, and I am sure that Oracle or any other big DB provider will send experts to help with the task.

    • Delegate scalability downwards. Throw hardware at the problem.

      • by Z00L00K (682162)

        It may not be a hardware problem, it may be a problem that actually has more to do with the fact that Oracle owns MySQL.

        • It may not be a hardware problem, it may be a problem that actually has more to do with the fact that Oracle owns MySQL.

          It's not unreasonable to suppose Oracle might "nudge" Facebook into the deeper end of Oracle's trough of slimy swill. But who to root for? This is a bit of a conundrum. Seeing Facebook's delicate bits getting squeezed is not an unattractive proposition, but seeing Oracle benefit therefrom would be appalling.

          • Maybe this guy's problem is that Facebook HAS created such a large and successful business without paying Oracle millions of dollars or his company millions of dollars.

            Kind of sounds like that commercial for Scottrade where the fat-cat broker is trying to keep his clients so he gets his fat commissions.

            • Re:Maybe... (Score:4, Insightful)

              by shutdown -p now (807394) on Saturday July 09, 2011 @04:11PM (#36707298) Journal

              Um, RTFA? It's not a pitch for Oracle. In fact, it's a rant against SQL in general. Quote:

              In Stonebraker’s opinion, “old SQL (as he calls it) is good for nothing” and needs to be “sent to the home for retired software.” After all, he explained, SQL was created decades ago before the web, mobile devices and sensors forever changed how and how often databases are accessed.

              Sounds like the usual NoSQL FUD, right? But wait, there's more here:

              Stonebraker thinks sacrificing ACID is a “terrible idea,” and, he noted, NoSQL databases end up only being marginally faster because they require writing certain consistency and other functions into the application’s business logic.

              Right... so what then? More magic buzzwords to the rescue!

              But Stonebraker — an entrepreneur as much as a computer scientist — has an answer for the shortcoming of both “old SQL” and NoSQL. It’s called NewSQL or scalable SQL ... Pushed by companies such as Xeround, Clustrix, NimbusDB, GenieDB and Stonebraker’s own VoltDB, NewSQL products maintain ACID properties while eliminating most of the other functions that slow legacy SQL performance. VoltDB, an online-transaction processing (OLTP) database, utilizes a number of methods to improve speed, including by running entirely in-memory instead of on disk.

              Now the article is pretty light on details regarding what that "new" SQL actually is, and Googling around doesn't really help. So far, to be honest, it sounds more like a bunch of DB makers have ganged together and come up with a nifty word to market their products against Oracle, DB2, MSSQL, Postgres etc - if it's "new" it must be good, right?

          • Re: (Score:3, Funny)

            by davester666 (731373)

            Maybe Facebook should just put all our data in the cloud...it's not like security or privacy is a big concern for Facebook...

    • Re: (Score:2, Interesting)

      by cgeys (2240696)
      But like the summary and article note, that requires rewriting the whole codebase. They should have gone with Oracle Database to begin with, but of course no one ever thinks about the possibility of expansion when they're starting out and just want something free, i.e. MySQL.
      • by Relayman (1068986)
        Why would that require rewriting the whole codebase? Isn't SQL standard?
        • by svick (1158077)

          SQL is a standard, but every provider implements it differently, with their own additions. So, for any non-trivial uses of SQL, you need to do at least some changes.

          In some cases, the changes could be really big. Especially when using some of the more complex features, like the support for recursive queries.
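
          As a small illustration of the dialect point above, here is a hedged Python sketch of how even "give me the first N rows" is spelled differently across products. The function name `first_n_rows` and the `users` table are made up for the example, and the Oracle form shown is the classic pre-12c idiom; real queries would of course use bound parameters rather than string formatting.

```python
# Illustrative only: one simple request, several dialect-specific spellings.
def first_n_rows(dialect: str, table: str, n: int) -> str:
    """Return a simplified 'first n rows' query for the given dialect."""
    if dialect in ("mysql", "postgresql"):
        return f"SELECT * FROM {table} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table}"
    if dialect == "oracle":
        # Classic Oracle idiom; newer releases also accept FETCH FIRST.
        return f"SELECT * FROM {table} WHERE ROWNUM <= {n}"
    raise ValueError(f"unknown dialect: {dialect}")


if __name__ == "__main__":
    for name in ("mysql", "sqlserver", "oracle"):
        print(name, "->", first_n_rows(name, "users", 10))
```

          Multiply that by every query in a large codebase, plus stored procedures and triggers, and "at least some changes" adds up quickly.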

        • I would guess that instead of using PDO or similar abstraction layer, their PHP code is littered with "mysql_*" function calls, so they'll necessarily need to modify everything to handle any other database.

          Or just wait for enough people to move to Google+ instead so that their database load is reduced...
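
          To make the abstraction-layer idea above concrete, here is a minimal sketch in Python rather than PHP. `UserStore` and `make_store` are hypothetical names, and SQLite (via the standard `sqlite3` module) merely stands in for whatever database is actually behind the layer. The point is only that application code calls `get_name()` and never touches the driver directly.

```python
import sqlite3


class UserStore:
    """Hypothetical access layer: the only code that knows which driver is in use."""

    def __init__(self, conn):
        self._conn = conn

    def add_user(self, user_id, name):
        self._conn.execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
        )
        self._conn.commit()

    def get_name(self, user_id):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def make_store():
    """Build a store backed by an in-memory SQLite database (stand-in backend)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return UserStore(conn)


if __name__ == "__main__":
    store = make_store()
    store.add_user(1, "alice")
    print(store.get_name(1))
```

          Swapping the backend then means rewriting one class instead of hunting down every driver call, though as others in this thread point out, such a layer can cost performance.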

        • by kimvette (919543)

          SQL is a standard, but no, "SQL" isn't standard. There are syntax differences between databases, and if you get into stored procedures (or equivalent) and triggers (or equivalent), or rely on referential integrity (which is implemented on some RDBMS systems, but not others, and doesn't always work the same), it won't be a matter of dumping the database from one RDBMS and then importing it to another RDBMS. Things are going to break.

          I'd hate to have to deal with a(-) Facebook dump file(s); I'm sure everythi

          • by kimvette (919543)

            And, to add to that, Facebook is insane if they didn't implement what is commonly called an "access layer" for abstraction, so that the system can be rapidly ported from one RDBMS to another. However, even if they did implement that in their architecture, some issues come up: is it implemented throughout the project, or did some developers bypass it for performance, and is it intermingled with presentation code? Can they re-implement the access layer without performance suffering? Does the new RDBMS provide

            • by sphealey (2855)

              > And, to add to that, Facebook is insane if they didn't implement
              > what is commonly called an "access layer" for abstraction, so
              > that the system can be rapidly ported from one RDBMS to another.

              "Access layer" aka "database independence". Otherwise known as "absolute death to any hope of performance and scalability". The reason one pays large sums of money for an Oracle, DB2, or even SQL Server implementation and programming is _exactly_ to get the performance and scalability potential and improv

          • by ppanon (16583)
            It's not much of a RDBMS if it doesn't support referential integrity.
        • Re: (Score:2, Insightful)

          by GooberToo (74388)

          Yes and no. There is ANSI SQL. PostgreSQL is probably one of the more compliant databases and is by far one of the more portable solutions. But even that is iffy.

          MySQL is on the other end. MySQL is well known for being non-compliant, teaching very poor SQL code, offering minimal SQL compatibility and lowest common denominator features to achieve the same goal. That's also why, contrary to the lies and marketing hype, MySQL is almost always one of the slowest and least scalable solutions of any generally ava

          • by sphealey (2855)

            > Far too often, vast ignorance, huge ego, and massive pride
            > prevent people from considering alternative database solutions
            > and their ignorance of the domain allows them to quickly
            > become self assured they've picked a winner.

            I won't say that those factors don't come into play, because they do. But I think a more fundamental problem is that there has never been a good source of education on how relational databases really work in the corporeal world (as opposed to the theoretical world of CS

          • by LurkerXXX (667952)

            Oh give MySQL a break. They finally fixed it so it no longer recognized February 31st as a real date, so they are making progress.

          • by DragonHawk (21256) on Saturday July 09, 2011 @01:14PM (#36706164) Homepage Journal

            Geez, GooberToo, did a MySQL developer kill your father or something? You've posted two giant rants about how MySQL is so unsuitable for anything that it can't possibly work for any serious project. You make it sound like simply installing MySQL causes a server to immediately explode.

            You *are* aware that Facebook, Slashdot, Wikipedia, and many other sites use MySQL, yes? Maybe there are better choices (more likely, there are different tradeoffs, but whatever), but MySQL works well enough to power some of the most popular websites in the world. Proof by existence that what you claim is inaccurate.

      • by tgv (254536)

        They should have gone with Oracle? Why? I work with that expensive cr*p, and it can't perform its way out of an open box. They can't have that much db dependent software anyway. Just plug in a compatibility layer and use something fast under it. I guess this is only news because it's facebook and facebook is worth so much money.

      • by NeoMorphy (576507) on Saturday July 09, 2011 @10:09AM (#36704514)

        If you want to start a fun/interesting project that you don't expect any revenue from, it makes more sense to use free software. MySQL is a popular choice for web applications, and there is a lot of freely available documentation and examples. Many people have been successful doing it, so it's a proven path that works.

        Oracle is expensive. It would have cost a fortune to start Facebook with Oracle, and I can't imagine what it would cost them now. But even if they have to hire a ton of experts to convert to Oracle (assuming that is the best thing to do...), that can probably be funded by the money saved by not using Oracle over the past couple of years.

        Maybe Oracle would have been a mistake; there are companies migrating from Oracle to DB2, DB2 to Oracle, Oracle to Sybase, Sybase to MySQL, mainframe to AIX, AIX to Solaris, Solaris to Linux, etc. It seems like nobody can agree on the best hardware/OS/database solution, but there are plenty of people who swear that the solution they know is the best one.

        • by drolli (522659)

          The fact is: there is no single best solution. Specific bottlenecks will require specific solutions.

  • Still their fault (Score:4, Informative)

    by nurb432 (527695) on Saturday July 09, 2011 @09:30AM (#36704212) Homepage Journal

    Once they started the trend to grow beyond being a toy, they should have redone things right then.

    Waiting until you are painted into a corner is irresponsible.

  • Professor Plum: What are you afraid of, a fate worse than death?
    Mrs. Peacock: No, just death, isn't that enough?

  • ... "Michael 'Ingres' 'Postgres' 'VoltDB' Stonebraker says 'MySQL doesn't scale'".
    • MySQL is 'fast' mainly because of its lack of features and robustness. Implying that market share means quality is like implying that current crappy pop music is better than classical music because of the market share it gets.
      • by laffer1 (701823)

        How ten years ago of you. You do realize that MySQL and Postgres are getting rather close on feature parity right? Both of them have been adding features the other lacked. MySQL 5.5 has stored procedures, views, triggers, etc.

        The biggest selling point with Postgres for me is the schema handling/support. One database can have many schemas.

        It's fine if you like Postgres, but saying that it's technologically more advanced in every way is wrong. There are benefits to both systems.

        Regarding the marketshare comments,

        • Maybe feature-wise they're on par; however, the MySQL syntax for nested queries is incredibly weird.

        • by segedunum (883035)

          How ten years ago of you. You do realize that MySQL and Postgres are getting rather close on feature parity right? Both of them have been adding features the other lacked. MySQL 5.5 has stored procedures, views, triggers, etc.

          Not sure which way to take that, to be honest. Postgres has got faster without sacrificing useful and sometimes essential features, while MySQL has had to scramble to catch up. It's an ass-backwards way of going about things, and it's one of the reasons why there are so many problems with it. Facebook, Google and God knows who else have goodness knows how many forks of MySQL now. That's the complaint being levelled by the article.

      • by repetty (260322)

        MySQL is 'fast' mainly because of its lack of features and robustness.

        I've read the same thing about C.

  • They would probably need to build an abstraction layer on top of MySQL. Something like "FBSQL" :) . This would create the ability to move to whatever database system they want.
    • by sphealey (2855)

      > They would probably need to build an abstraction layer on top of MySQL.
      > Something like "FBSQL" :) . This would create the ability to move to
      > whatever database system they want.

      And get equal performance on all of them. Equally poor, that is.

      sPh

  • The ex-Facebook developers who founded Quora think Facebook is stuck on [PHP] for legacy reasons, not because it's the best choice right now [quora.com].
  • "We're so new" (Score:5, Insightful)

    by michaelmalak (91262) <michael@michaelmalak.com> on Saturday July 09, 2011 @09:44AM (#36704360) Homepage

    I love the snippets "After all, he explained, SQL was created decades ago before the web, mobile devices and sensors forever changed how and how often databases are accessed" from the article and "We’ve been using stone age technology to solve problems that didn’t exist 30 years ago." Yes, the problems existed 30 years ago, such as (land-line) telephone billing. I don't know how those problems were solved -- probably with a mainframe and a custom non-SQL database and not a PC running a SQL-based server -- but they were solved.

  • And this opinion has nothing to do with the fact that this is the guy who wrote PostgreSQL and he has been bitching for years about how MySQL has too big a market share??

    MySQL has been faster than PostgreSQL for years. It doesn't have as many features, but it is **fast** !!

    • by Lennie (16154)

      Actually, if I understand it correctly he worked on Postgres, not PostgreSQL which came later.

    • by dzfoo (772245) on Saturday July 09, 2011 @10:48AM (#36704878)

      And this opinion has nothing to do with the fact that this is the guy who wrote PostgreSQL and he has been bitching for years about how MySQL has too big a market share??

      Not at all. But it does have something to do with the fact that he is plugging his new product, which implements something he calls "NewSQL."

            dZ.

    • by Rufty (37223)

      MySQL has been faster than PostgreSQL for years. It doesn't have as many features, but it is **fast** !!

      /dev/null is even faster, but I wouldn't use that for data storage, either.

  • What, you want to make 100 billion dollars through an IPO and you moan because you might actually have to WORK?
  • by tyler_larson (558763) on Saturday July 09, 2011 @10:03AM (#36704480) Homepage

    Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.

    Presumably he's tired of Facebook being used as a counter-example to everything he's been preaching.

    • by fermion (181285) on Saturday July 09, 2011 @10:31AM (#36704696) Homepage Journal
      And note that two stories down it is reported that SAP is once again over budget and over schedule on a major implementation. So I suppose that now everyone will stop using SAP as it is unreliable.
    • by rgmoore (133276) <glandauer@charter.net> on Saturday July 09, 2011 @10:33AM (#36704712) Homepage

      No, he's not an academic purist; he's a businessman who's selling a product that competes with MySQL. So he's trying to convince web startups to pay a bunch of money for his product rather than rely on free MySQL because he claims it will help them scale better than Facebook. IOW, businessman trashes competitor's product, claims you should buy from him instead. Nothing to see here.

    • by Animats (122034) on Saturday July 09, 2011 @10:40AM (#36704778) Homepage

      Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.

      Right.

      Some of the key architects of Facebook have spoken at Stanford about how the system is put together, and I went to that presentation and had a chance to talk to them. They didn't consider MySQL to be a bottleneck. Their big problem was PHP performance. They were writing a PHP compiler to fix that.

      Internally, the user-facing side of Facebook is in PHP. But the front end machines don't talk directly to the databases. They use an RPC system to talk to other machines that do the "business logic" parts of the system. Building a Facebook reply page may involve a hundred machines. There's heavy caching all over the system, of course, so the databases aren't hit for most read requests.

      The RPC system isn't HTML, JSON, or SOAP. It's a binary system that doesn't require text parsing. Otherwise, RPC would be the bottleneck.

      This makes for a flexible, easy to enhance system. New services go in new machines, which talk to existing machines.
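
      The "heavy caching so the databases aren't hit for most reads" idea can be sketched as a cache-aside reader. This is a toy, single-process illustration in Python under obvious assumptions: `CachedReader`, `load_from_db` and the `db_hits` counter are invented for the example, and a plain dict stands in for a distributed cache such as memcached.

```python
class CachedReader:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # callable that performs the expensive DB read
        self._cache = {}            # stand-in for a real distributed cache
        self.db_hits = 0            # how many reads actually reached the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, e.g. after the underlying row was written."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    reader = CachedReader(lambda key: key.upper())
    reader.get("profile:1")
    reader.get("profile:1")
    print("database reads:", reader.db_hits)
```

      The flip side shows up when the cache layer disappears: every read falls through to the database at once.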

      • by rekoil (168689) on Saturday July 09, 2011 @12:06PM (#36705518)

        The RPC system they're using is Thrift (http://thrift.apache.org/), which they developed because JSON was becoming a bottleneck. And yeah, there's a metric crapload of memcached in their data centers as well. The multi-hour outage Facebook had late last year was due to a near-complete failure of the memcached layer, resulting in an overload of requests to the main MySQL farms.
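
        For a feel of why a binary encoding avoids text parsing, here is a hedged Python comparison using only the standard library. To be clear, this is NOT Thrift's wire format; the two-field message (`user_id`, `post_count`) and the `RECORD` layout are assumptions made up for the illustration.

```python
import json
import struct

# Hypothetical fixed layout: unsigned 64-bit user_id + unsigned 32-bit post_count,
# network byte order, no padding (8 + 4 = 12 bytes).
RECORD = struct.Struct("!QI")


def encode_binary(user_id, post_count):
    """Pack the two fields into a fixed-size binary record."""
    return RECORD.pack(user_id, post_count)


def decode_binary(payload):
    """Unpack a record; no tokenizing or number parsing is needed."""
    return RECORD.unpack(payload)


def encode_json(user_id, post_count):
    """The same message as UTF-8 encoded JSON text."""
    return json.dumps({"user_id": user_id, "post_count": post_count}).encode("utf-8")


if __name__ == "__main__":
    print(len(encode_binary(42, 7)), "bytes binary")
    print(len(encode_json(42, 7)), "bytes JSON")
```

        The binary record is smaller and decodes with a fixed-offset read, at the cost of both sides having to agree on the layout in advance.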

    • by sco08y (615665) on Saturday July 09, 2011 @10:41AM (#36704786)

      Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.

      Presumably he's tired of Facebook being used as a counter-example to everything he's been preaching.

      He's no academic purist. He's pushing his product, and he's either an outright liar or, worse, doesn't know what he's talking about:

      Stonebraker said the problem with MySQL and other SQL databases is that they consume too many resources for overhead tasks (e.g., maintaining ACID compliance and handling multithreading)

      Is that so? MySQL, as with virtually all SQL DBMSs, defaults to "repeatable read" [mysql.com] transactional guarantees, and it doesn't even spend time guaranteeing foreign key relationships [mysql.com] by default. About the only thing MySQL really guarantees out of the box is durability.

      It's just nonsense to talk about all the "wasted resources" when, if they don't need them, it's a few lines in a config file to turn them off.

  • That should do the trick, eh?

    Oops, time to buy Oracle stock and short FaceBook, I guess!

  • by tommeke100 (755660) on Saturday July 09, 2011 @10:13AM (#36704534)
    If anything, it's a success story for MySQL.
  • If what this guy says is true and Facebook devs have to rewrite everything, the solution (as I see it) is quite simple.

    Facebook user data is relatively similar across profiles. Their export application lends credence to this. Correct me if I'm wrong, but wouldn't it be simple to write a routine to port data from one product to the other? Test this thoroughly to make sure it's bulletproof. Then run it until the job is done. To make this simple purge all the bullshit like wall posts, messages, notes and other use
    • Exporting the data isn't the issue. The issue is going through the codebase and altering all the MySQL-specific code and then making sure it all works.

    • by kimvette (919543)

      To make this simple purge all the bullshit like wall posts, messages, notes and other user content.

      So, what you are saying is to purge everything which makes Facebook worthwhile to the masses? What else could possibly give Google+ an opportunity to instantly overtake Facebook? :)

    • by sco08y (615665)

      Correct me if I'm wrong, but wouldn't it be simple to write a routine to port data from one product to the other?

      Depends on your definition of "simple":

      1. You build a second system.
      2. Your "porting routine" has to keep both systems constantly in sync, bidirectionally.
      3. You rewrite every client for the first system, or add bridges, to talk to the new system.
      4. Finally, once all the clients are updated, you can switch off the old system.
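
      A toy version of those steps, sketched in Python with plain dicts standing in for the two systems. `DualWriteStore` and `backfill` are hypothetical names, and this simplifies the list above: it writes to both systems from one wrapper and backfills old rows once, rather than doing true bidirectional sync.

```python
class DualWriteStore:
    """Hypothetical wrapper used while migrating from old_store to new_store."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old = old_store
        self.new = new_store
        self.read_from_new = read_from_new  # flipped once the new system is trusted

    def put(self, key, value):
        # Every write goes to both systems so neither falls behind.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        source = self.new if self.read_from_new else self.old
        return source.get(key)


def backfill(old_store, new_store):
    """Copy rows that predate dual writes, without overwriting newer writes."""
    for key, value in old_store.items():
        new_store.setdefault(key, value)


if __name__ == "__main__":
    old, new = {"u1": "alice"}, {}
    store = DualWriteStore(old, new)
    store.put("u2", "bob")          # dual write during the migration window
    backfill(old, new)              # bring over pre-existing rows
    store.read_from_new = True      # cut reads over; old system can then retire
    print(store.get("u1"), store.get("u2"))
```

      Even in this toy form the hard parts are visible: ordering, partial failures between the two writes, and knowing when it is safe to flip the read switch.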

  • by solferino (100959) <hazchem@gmSLACKWAREail.com minus distro> on Saturday July 09, 2011 @10:27AM (#36704660) Homepage

    The guy in the article [wikipedia.org] does have some cred. He was a professor at UC Berkeley for 29 years where he was project leader on Ingres and led the creation of its follow up, Postgres.

    His new database, VoltDB [wikipedia.org], based on the 'NewSQL' ideas touched on in the article, is Free Software licensed under the GPLv3.

  • by sgt scrub (869860) <<moc.oohay> <ta> <muitnias>> on Saturday July 09, 2011 @10:35AM (#36704728)

    I don't think so. Facebook isn't known for having a lot of down time. It is known for opening up information to the public. If anything, that would be considered too much up time. I've used MySQL and PostgreSQL. I found MySQL to be limited but most limitations were easily worked around in code. PostgreSQL wasn't as limited. However, the options that it provided forced the need to vacuum the database. I would rather write code but to each his own.

  • by spectro (80839) on Saturday July 09, 2011 @10:42AM (#36704802) Homepage

    Over and over we hear about this "scrap and start over" concept. It sounds like a great idea but you are assuming you can do a better job than the guys before and more often than not you will be wrong.

    I used to suggest it but now I know better. I have seen new devs with little experience passionately suggest so called "total refactoring". It has never ended well.

  • by MAXOMENOS (9802) <maxomai.gmail@com> on Saturday July 09, 2011 @10:45AM (#36704840) Homepage

    The underlying problem according to Stonebraker:

    During an interview this week, Stonebraker explained to me that Facebook has split its MySQL database into 4,000 shards in order to handle the site’s massive data volume, and is running 9,000 instances of memcached in order to keep up with the number of transactions the database must serve.

    Or you could put MySQL on an IBM Power Systems LPAR and use a commercial MySQL plug-in [krook.net] to store the data in a DB2 database. Then you can get away with maybe a dozen database machines instead of thousands. I have to imagine, btw, that Oracle has a similar offering in the works.

    Lesson: academic credentials are no match for real world experience.
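
    For readers unfamiliar with what "split into 4,000 shards" means in practice, here is a minimal hash-routing sketch in Python. It is an assumption-laden illustration, not a description of how Facebook actually routes requests: `shard_for` is a made-up helper, and the only number taken from the article is the shard count.

```python
import zlib

NUM_SHARDS = 4000  # shard count quoted in the article


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard index with a stable hash.

    crc32 gives the same answer in every process, so any front-end machine
    can work out which database instance holds a given user's rows.
    """
    return zlib.crc32(str(user_id).encode("ascii")) % num_shards


if __name__ == "__main__":
    for uid in (1, 2, 12345):
        print(uid, "-> shard", shard_for(uid))
```

    Routing like this is cheap, but queries that span many users now touch many shards, which is part of why a large caching layer tends to sit in front.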

  • by midom (535130) on Saturday July 09, 2011 @11:19AM (#36705118) Homepage

    (reposting as a logged in user) I wrote a bit longer response to this:
    stonebraker trapped in stonebraker 'fate worse than death' [dom.as]

    I think I know a bit more about the database situation inside FB than Mr. Stonebraker. Go figure.
