Databases Programming

Moving From CouchDB To MySQL 283

Posted by Unknown Lamer
from the hep-cats-just-use-postgres dept.
itwbennett writes "Sauce Labs had outgrown CouchDB and too much unplanned downtime made them switch to MySQL. With 20-20 hindsight they wrote about their CouchDB experience. But Sauce certainly isn't the first organization to switch databases. Back in 2009, Till Klampaeckel wrote a series of blog posts about moving in the opposite direction — from MySQL to CouchDB. Klampaeckel said the decision was about 'using the right tool for the job.' But the real story may be that programmers are never satisfied with the tool they have." Of course, then they say things like: "We have a TEXT column on all our tables that holds JSON, which our model layer silently treats the same as real columns for most purposes. The idea is the same as Rails' ActiveRecord::Store. It’s not super well integrated with MySQL's feature set — MySQL can’t really operate on those JSON fields at all — but it’s still a great idea that gets us close to the joy of schemaless DBs."
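To make the quoted pattern concrete, here is a minimal Python sketch of a model layer that keeps undeclared attributes in a JSON-encoded TEXT column and exposes them like ordinary columns. The `Job` class, the `jobs` table and the `extra` column are hypothetical names chosen for illustration, and SQLite stands in for MySQL so the example is self-contained; it is not Sauce Labs' actual code.

```python
import json
import sqlite3


class Job:
    """Hypothetical model: two real columns plus a JSON blob in `extra`."""

    def __init__(self, id, owner, **extra):
        self.id = id
        self.owner = owner
        self._extra = dict(extra)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real columns win.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._extra[name]
        except KeyError:
            raise AttributeError(name) from None

    def save(self, conn):
        # The schemaless part is serialized into a single TEXT value.
        conn.execute(
            "INSERT OR REPLACE INTO jobs (id, owner, extra) VALUES (?, ?, ?)",
            (self.id, self.owner, json.dumps(self._extra)),
        )

    @classmethod
    def load(cls, conn, job_id):
        row = conn.execute(
            "SELECT id, owner, extra FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None
        return cls(row[0], row[1], **json.loads(row[2]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner TEXT, extra TEXT)")

Job(1, "alice", browser="firefox", tags=["smoke"]).save(conn)
job = Job.load(conn, 1)
```

The catch the quote admits to is visible here: the database only sees an opaque string in `extra`, so filtering or indexing on `browser` has to happen in application code.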
This discussion has been archived. No new comments can be posted.

  • Not getting RDMS (Score:5, Insightful)

    by Anonymous Coward on Wednesday May 16, 2012 @09:41AM (#40016041)

    And in another three years they will switch to whatever is the coolest up-and-coming storage solution. Incompetent developers will always be incompetent developers.

    • by gbjbaanb (229885) on Wednesday May 16, 2012 @10:24AM (#40016533)

      true, just reading their blog

      Things like SQL injection attacks simply should not exist.

      HTTP API. Being able to query the DB from anything that could speak HTTP (or run curl) was handy.

      so sql injection is real bad, bad design of SQL... yet allowing any old HTTP javascript queries is somehow ok. Yes, incompetent developers indeed.

      They also say

      Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?

      apart from the concepts of query caches - and stored procedures - so what if the language is related to COBOL, javascript is closely related to C which is almost as old. And that has plenty of relations to Algol which is even older.

      So yes, it sounds like they haven't really got a clue. Great advert for their business!

      • by arth1 (260657)

        so sql injection is real bad, bad design of SQL... yet allowing any old HTTP javascript queries is somehow ok.

        HTTP isn't a subset of javascript - no javascript queries are needed for HTTP. Even for JSON and other javascript objects.

        That said, yes, the developers don't seem to "get it". An object/method based database query language, which they seem to want, has already been tried. Look where Informix is right now.

        Yes, parsing can be a bitch, which is why using a structured database isn't always the right choice to start with. If you're just using it for data storage, it rarely makes sense.

      • by Xest (935314) on Wednesday May 16, 2012 @10:45AM (#40016881)

        "Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?"

        I couldn't agree with you more, this quote makes me want to vomit. Is this really how low the average competence of today's web developer has stooped? Between PHP developers not getting why PHP is a pretty shitly designed and developed language and stuff like this, I barely get how the web even runs anymore.

        To answer the original quote, the reason we're "still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?" is because SQL is a language based on mathematically sound principles, and which is supported widely, and known widely, and is processed by database engines across the globe that have literally decades of stability behind them, data in them and so forth.

        There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL. The fact the kiddie (I can only assume he's a kiddie due to his blatant lack of knowledge and/or experience in the field) who wrote that blog post doesn't get this suggests he should absolutely not be trusted with your data as he'll only lose it.

        This is a classic example of someone bitching about something not because it's bad, but because they simply don't understand it and believe that rather than learn about it properly, it's better to bitch and hope you can somehow effect change by bitching.

        The advantage of most SQL/RDBMS is that they do adhere to the ACID principles, and for people who want to be able to have some degree of trust in their data source that's pretty fucking important. It's no surprise that they've moved over to MySQL though as it's one of the few RDBMS that is completely shit at adhering to the ACID principles and keeping up to date with solid, stable implementations of modern database functionality.

        • mod parent up
        • Re:Not getting RDMS (Score:5, Interesting)

          by K. S. Kyosuke (729550) on Wednesday May 16, 2012 @11:41AM (#40017611)

          There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL.

          False. First of all, SQL is NOT based on mathematically sound principles of relational algebra. SQL took the mathematically sound principles of relational algebra and fucked them up. There should be no NULLs, there should be no natural ordering of "columns", there should be no possibility of having duplicate rows, there should be no possibility of inconsistent intermediate states in transactions (no deferred checking) etc. SQL has them all, and then some. Why? Because SQL simply ignores the relational model and "does what IBM and Oracle always did". That's not the same thing as "implementing the relational model".

          Second, there is a separation between the surface structures of a language and its foundations. I really don't think that a language based on relational algebra has to look like SQL. That's like saying that a language with nouns having singular and plural and verbs having tenses has to look like English. Nope, it doesn't have to at all. Just look at VB.NET and C#. Basically two front-ends to a virtually identical language semantics, only one of them does not avoid non-alphabetic structural delimiters like the plague (and is so much more pleasant for it).

          • Re:Not getting RDMS (Score:5, Interesting)

            by TheRealMindChild (743925) on Wednesday May 16, 2012 @12:32PM (#40018163) Homepage Journal
            There should be no NULLs
            Then how do I, say, indicate the date of death for someone who hasn't died? An IsDead field? Really? (Yes, a NULL in a field is a shortcut for proper relationship, but a lack of relationship when using a linking table will still be represented by NULL)

            there should be no natural ordering of "columns"
            Does it really matter? The natural ordering of columns is the order in which you added them to the table. Ignore it. It isn't important, and not in need of a "solution"

            there should be no possibility of having duplicate rows
            Firstly, get to know your DISTINCT SQL keyword. Secondly, data in real life sometimes IS duplicate. What the hell should people do? Have a DuplicatedThisManyTimes field? Ugh.

            possibility of inconsistent intermediate states in transactions
            That is a property of the database engine, not SQL.

            Because SQL simply ignores the relation model and "does what IBM and Oracle always did". That's not the same thing as "implementing the relational model".
            Where do you get this shit? Are you telling me the function of foreign key constraints and referential integrity, and the good ol INNER/RIGHT/LEFT join keywords are just smoke and mirrors and everything is really just a chaotic bowl of soup? References please.
            • GP is correct, and your understanding of the relational model appears to be - no offense - a bit lacking. To address your first example: people and deaths are different, though related, concepts. Ideally, they should have separate tables, plus a view. If someone died, he or she has a row in a Deaths table, which joins to the People table; otherwise, not; no NULLS necessary. When interacting with the data from outside the database, you use a view, which can be engineered to appear to contain NULLs, dupli
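The layout described in the comment above can be sketched as follows. This is only an illustration under assumptions: the table, view and column names (`people`, `deaths`, `people_with_death`, `died_on`) and the sample rows are made up, and SQLite via Python's `sqlite3` module is used purely so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- A person has a row here only if they have died; no NULL is stored.
    CREATE TABLE deaths (
        person_id INTEGER PRIMARY KEY REFERENCES people(id),
        died_on   TEXT NOT NULL
    );
    -- Outside consumers read the view, where NULL appears only as a
    -- by-product of the outer join.
    CREATE VIEW people_with_death AS
        SELECT p.id, p.name, d.died_on
        FROM people AS p
        LEFT JOIN deaths AS d ON d.person_id = p.id;
    """
)
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.execute(
    "INSERT INTO deaths (person_id, died_on) VALUES (?, ?)",
    (1, "1999-01-01"),
)

rows = conn.execute(
    "SELECT name, died_on FROM people_with_death ORDER BY id"
).fetchall()
```

Rows for people without a matching `deaths` entry come back from the view with `died_on` as NULL, even though neither base table stores one.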
          • by Xest (935314)

            "False. First of all, SQL is NOT based on mathematically sound principles of relational algebra."

            No, you've completely missed the point - I'm not saying SQL is an implementation of, and only of the relational model and nothing more, and nothing less, merely that those are its foundations. SQL absolutely IS based on the principles of relational algebra - it's still ultimately based on much of the important set theory that underlies that when it comes down to it. The point being that sure, whilst SQL is far

        • ...I barely get how the web even runs anymore.

          In the cloud, obviously.

          What, you didn't get the memo?

        • I've worked on quite a few large-ish database applications (eg 800 - 2000 tables, some with multi-million rows), and I'd say I'm fluent with SQL. But the thing that annoys me most about SQL, from a maintenance perspective, is how much of the database structure ends up strewn around in your code base. SQL is *not* good at encapsulation.

          When a new requirement comes in that should cause you to change some of the primary relationships in your database, you have a look at how much code you'd need to change to d

      • by serviscope_minor (664417) on Wednesday May 16, 2012 @11:04AM (#40017155) Journal

        so sql injection is real bad, bad design of SQL...

        SQL injection actually has nothing to do with SQL.

        Exactly the same attacks happen in any system where you build up a string from user data and pass it off to an interpreter. SQL has nothing to do with it.

        Exactly the same thing used to happen with sudo shell scripts.

        Exactly the same thing happened with javascript injection in very early webmail systems.

        There are plenty of opportunities for code injection on poorly written PHP, too.
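As a small illustration of the point that the flaw lies in string building rather than in SQL itself, the sketch below runs the same lookup twice: once by concatenating user input into the query text, and once by passing it as a bound parameter. The `users` table and the input string are invented for the example; `sqlite3` is Python's standard-library driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

# Attacker-controlled value containing a quote and extra SQL.
user_input = "nobody' OR '1'='1"

# Vulnerable: the input becomes part of the code handed to the interpreter.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the query text is fixed and the input travels separately as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
```

The concatenated version matches every row because the injected `OR '1'='1'` is parsed as part of the query; the parameterized version matches nothing, since no user is literally named `nobody' OR '1'='1`.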

      • by Lisias (447563)

        And yet these are the same developers who are being *highly* paid in these Web 2.0 times.

        Seriously. I was one of them - but got kicked out because I made the huge mistake of pointing out the obvious: you must be a skilled programmer to do programs right. Ruby On Rails will not make a good coder out of a dumb ass.

        The dumb asses joined up and kicked me out. =D

        • by gmack (197796)

          That is a common reason for firing. A couple of years ago some programmers wanted me to support them with the boss on switching a project written in python to Java. Their justification? The python programmer called them a bunch of monkeys. No technical arguments at all.

          Unfortunately the boss sided with the monkeys and I was next on the chopping block for pointing out that a 200 Bingo player max using 3 machines (1 web 1 db, 1 backup db) was a design flaw.

      • by plopez (54068)

        SQL is nothing like COBOL. Once again they show how they are clueless rookies.

    • by gorzek (647352) <gorzek@gmail.3.1415926com minus pi> on Wednesday May 16, 2012 @10:25AM (#40016557) Homepage Journal

      I think the main problem is application developers not understanding anything about database theory. The vast majority of databases I encounter are not normalized at all, and it's almost always because they were designed by a developer with no database background.

      Granted, I didn't come into this field with that background, either, but I made a point to learn it, and now I'm very cognizant of implementing sound database designs. This whole idea of throwing random strings of structured text into a database column, and then relying entirely on the program code to parse and use it... well, why the hell even use a relational database, then?

      Relational databases aren't suitable for every application, nor are "bigtable" and other NoSQL implementations. The problem is that developers use a particular kind of database without really understanding how to use it properly. If they can get data in, and get data out, that's basically all they care about. Never mind if they make it a maintenance nightmare in the process.

      • Yes, it makes sense up to a point, but it starts to suffer from the law of diminishing returns and at some point having to do complicated multi-table joins actually slows down your queries so much that it becomes simpler and faster to suffer duplicate data than normalise to the Nth degree.

        • by Xest (935314)

          It depends on the task though, I'd wager 90% of SQL work that is done by developers day to day isn't in such a performance sensitive environment that it needs to favour performance over normalisation, and I agree with the GP, there's far too many developers out there that just don't do it and hence simply don't have the performance excuse. It really is just bad database design as a result of incompetence most of the time.

          • by gorzek (647352)

            I can definitely see the value in making an informed tradeoff, but like you said, a lot of the time it's not an informed decision--they just do it to make it work and don't really have the expertise to know which is the right way to go. I've definitely seen enough bad database designs to know that most developers just have no clue how to design them. The worst I've seen had bad designs and poor performance, and were built in a completely ad hoc manner without any eye toward maintainability, performance, or

          • by siride (974284)

            And in many databases, there'd be more performance gains from proper normalization than premature optimization. I'm working with a legacy database that has this problem. Proper normalization would probably make it lightning fast, but instead it's slow as fuck because too many concerns are put in one table when they should be put in several tables. Also, it uses functions to retrieve values, which is just...so wrong.

        • Yeah, it really depends on what you are doing. But any time you break normalization there should be a good reason. Performance is certainly a valid reason. "I'm too lazy to make a well-designed database," however, is not.

          If you find yourself breaking normalization all the time, then you've probably found a use case where a relational database isn't the best tool for the job.

          While there is a "right" way to use a given tool, there is no one tool that is right for every situation. People who get this backwards are zealots and will often make poor decisions.

        • Yes it makes sense up to a point , but it starts to suffer from the law of diminishing returns and at some point having to do complicated multi-table joins actually slows down your queries so much that it becomes simpler and faster to suffer duplicate data than normalise to the Nth degree.

          The question is whether this should be solved at the conceptual model level. As a developer, I don't care whether the database cheats and duplicates something to speed things up, as long as I don't have to do it in the data model and as long as the implementation is correct. The same logic applies to CPU caches and compiler optimizations. The computer is allowed to "cheat" if it can prove that the shortcut is correct. But you shouldn't be forced to do it manually, since it only makes your code (and data str

      • by SQLGuru (980662) on Wednesday May 16, 2012 @11:01AM (#40017119) Journal

        I completely agree. A lot of non-DB centric people think that they can do more in the app tier, effectively using their databases as glorified file stores. Why even have a database server in those instances? I'm not saying that everything should be done in the database, either, but take advantage of every tool you have.

        NoSQL has a place, so does relational. Learn their strengths and determine which is the best fit for your project. Then, learn how to use the tool to its fullest.

        • Unfortunately the developers of these "NoSQL" databases seem to have the same idea. I'm working with one right now that shall remain nameless but sounds oddly like a piece of fruit. The generally accepted best practice for scaling is to pull as much of the logic as possible out of the database layer. While there are fancy aggregation pieces, they're all impossibly slow (and hamper concurrency). Argh.

        • by Grishnakh (216268)

          A lot of non-DB centric people think that they can do more in the app tier, effectively using their databases as glorified file stores. Why even have a database server in those instances?

          This is pretty easy to answer, I think: because databases offer ACID attributes. Reimplementing those on your own is a big project and likely to create bugs; it's a lot easier to just grab an existing database and use it.

          For instance, what if you need a "glorified file store" that multiple processes on multiple systems can

      • by tgd (2822)

        I think the main problem is application developers not understanding anything about database theory. The vast majority of databases I encounter are not normalized at all, and it's almost always because they were designed by a developer with no database background.

        Or a developer who is experienced enough to know how bad an idea an overly normalized database is for most applications.

  • Why not PostgreSQL? (Score:5, Interesting)

    by JamesA (164074) on Wednesday May 16, 2012 @09:48AM (#40016117)
    • by squiggleslash (241428) on Wednesday May 16, 2012 @10:18AM (#40016449) Homepage Journal

      Because it's an urban myth.

      The reality is there are only two SQL databases in the entire universe: MySQL and Oracle. You might have been told others exist, hell, you might even have worked on something called "SQL Server" in your .NET shop, but in reality: they don't. They're all figments of your imagination. Your imagination is SO determined to find better, more robust, faster, powerful, alternatives to MySQL and Oracle that an entire fantasy world comprised of "a successor to Ingres that makes MySQL look like a piece of crap" and "A Microsoft product that doesn't feel like a thirty year old mainframe product hacked onto a modern platform" develops in your head.

      C'mon, if these mythical products actually existed, sites like Slashdot wouldn't ignore them, right? Right?

      • by aclarke (307017)
        I'm not generally a Microsoft fan, but I love SQL Server. However, I haven't started a new project with it in years, I guess since pricing for SQL Server 2008 was announced. I've not been in a situation where I could justify the costs as the project (hopefully) was successful and scaled up. I also don't like being forced to run my database server on Windows. For these reasons, I just don't use it any more except in projects where it was selected years ago. I know you have to look at TCO, but I still ca
  • Nosql in Postgres (Score:4, Interesting)

    by rla3rd (596810) on Wednesday May 16, 2012 @09:48AM (#40016123)
    You can get json support using the PLV8 extension http://code.google.com/p/plv8js/wiki/PLV8 [google.com]

    or alternatively you can use the hstore data type.
  • by vlm (69642) on Wednesday May 16, 2012 @09:54AM (#40016197)

    But the real story may be that programmers are never satisfied with the tool they have.

    Ah typo

    But the real story may be that programmers don't know how to store data

    They may not know because no one knows the business needs, but more often because they have no idea what they're doing WRT data storage.

    IT training tends to cover data manipulation pretty well: "how to add two numbers"
    IT training gets shaky on data structures: "So, in junior level class we will talk about data structures, which is too bad because you've already developed at least two years of bad habits first"
    IT training tends to pretty much skip data storage: "In a senior level class, you might talk about scalability, maybe in an optional class. Or maybe you'll take a semester of cobol instead"

    • by Zocalo (252965)

      But the real story may be that programmers are never satisfied with the tool they have.

      Ah typo

      Possibly, but given how quick many programmers are to get into a fruitless pissing match over their favourite language it's quite apropos, no?

  • It seems to be a knee jerk reaction amongst a lot of developers and designers that as soon as your app starts requiring persistent data beyond ini values a database is needed. Why? For large but simply structured data something like json or XML or even a flat csv file is perfectly adequate. Performance can be an issue during searches but if for example you have a fixed record size with key sorted data then finding a given key is simple (binary chop or similar).

    It seems to me that reaching for a DB is the ea
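The fixed-record lookup mentioned above ("binary chop") can be sketched in a few lines of Python. The record layout (a 4-byte unsigned key followed by a 16-byte payload), the helper names `write_records` and `find`, and the sample data are all assumptions made for the example; an in-memory `BytesIO` stands in for a real file opened in binary mode.

```python
import io
import struct

# Hypothetical layout: little-endian 4-byte key, 16-byte NUL-padded payload.
RECORD = struct.Struct("<I16s")


def write_records(f, items):
    """Write (key, text) pairs sorted by key as fixed-size records."""
    for key, text in sorted(items):
        f.write(RECORD.pack(key, text.encode("utf-8")))


def find(f, key):
    """Binary-search the file for `key`; return its text or None."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        # Fixed record size lets us jump straight to record number `mid`.
        f.seek(mid * RECORD.size)
        k, payload = RECORD.unpack(f.read(RECORD.size))
        if k == key:
            return payload.rstrip(b"\0").decode("utf-8")
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None


f = io.BytesIO()
write_records(f, [(30, "carol"), (10, "alice"), (20, "bob")])
```

Calling `find(f, 20)` seeks directly to the middle record instead of scanning the file. What the sketch leaves out is the part the replies focus on: concurrent writers, inserts that keep the file sorted, and recovery after a crash.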

    • by TheSpoom (715771)

      Starting with a database avoids the pain of migrating flat files to a database later when the database is needed (and if your app gets at all popular, it will be).

      Sure, if you're only ever expecting 10k rows of data with very little concurrent access, go nuts with your flat files.

    • by rtaylor (70602) on Wednesday May 16, 2012 @10:23AM (#40016521) Homepage

      A CSV or XML or JSON file is a db (a DB is just structured data).

      Are relational DBs always required? Certainly not.

      The big benefit to a relational DB with lots of enforcement at the data layer is that you can have one or more applications reading/writing to it with minimal concern of data corruption.

      What isn't obvious is that second application is often aggregate reporting for management. "How many customers are using $foo and where do they live geographically". With a relational DB, I might knock that query out in a few minutes across millions of customers.

      With a flat XML file per customer spread across a number of servers, this could take days to assemble, particularly if $foo is nested deep in the structure.

      Having spent far too much time writing one-off scripts to gather customer data because the middleware didn't support that type of query, I've actually gone the other way and started shoving some business logic into the DB.

      Functions such as isCustomerPaymentOverdue are now in the relational DB with a very thin model in the middleware to allow for much easier and faster reporting.
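For a sense of what the "few minutes" reporting query might look like, here is a hedged sketch using Python's `sqlite3`. The schema (`customers`, `subscriptions`), the `region` column and the sample rows are hypothetical; the poster's actual tables are not shown in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id     INTEGER PRIMARY KEY,
        region TEXT NOT NULL
    );
    CREATE TABLE subscriptions (
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product     TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "EU"), (2, "EU"), (3, "US"), (4, "US")],
)
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [(1, "foo"), (2, "foo"), (3, "foo"), (4, "bar")],
)

# "How many customers are using $foo and where do they live geographically"
report = conn.execute(
    """
    SELECT c.region, COUNT(DISTINCT c.id) AS customers
    FROM customers AS c
    JOIN subscriptions AS s ON s.customer_id = c.id
    WHERE s.product = ?
    GROUP BY c.region
    ORDER BY c.region
    """,
    ("foo",),
).fetchall()
```

The whole report is one declarative statement; with a per-customer XML file the same question would mean opening and parsing every file.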

      • by serviscope_minor (664417) on Wednesday May 16, 2012 @11:12AM (#40017241) Journal

        The big benefit to a relational DB with lots of enforcement at the data layer is that you can have one or more applications reading/writing to it with minimal concern of data corruption.

        Not just that, but good use of relations and normalization makes whole classes of bug impossible.

        • by Lisias (447563)

          Not just that, but good use of relations and normalization makes whole classes of bug impossible.

          That's precisely the motive the current cast of "developers" avoid it like the Devil.

          They NEED bugs in order to justify the overdue payments and overpaid weekend death marches.

          Software Development *must* be an arcane practice, not a scientific knowledge - or they will be measured under rational arguments, and end up losing their jobs.

          These kids think they are artists, and behave as if they are.

  • by Xanni (29201) on Wednesday May 16, 2012 @10:15AM (#40016411) Homepage

    PostgreSQL 9.2 (now in beta) includes native JSON fields:

    http://www.h-online.com/open/news/item/PostgreSQL-9-2-beta-improves-scalability-adds-JSON-1573815.html [h-online.com]

    It's also available as an extension for the current 9.1 release:

    http://people.planetpostgresql.org/andrew/index.php?/archives/255-JSON-for-PG-9.2-...-and-now-for-9.1!.html [planetpostgresql.org]

  • by kibbey (96367)

    Hop into the wayback machine and fire up any flavor of PICK. The database where schema is applied on use, not on storage. No length limits on fields and very fast on old hardware (really fast on new). Storing bits of xml and code are no problem. And for those users who simply must have SQL, many versions will support that too (UniData and UniVerse are two examples). It's not cool, not new, but it does work.

  • Urban Airship (Score:4, Interesting)

    by jjohnson (62583) on Wednesday May 16, 2012 @10:33AM (#40016685) Homepage

    Urban Airship went PostgreSQL to MongoDB to Cassandra to PostgreSQL. http://wiki.postgresql.org/images/7/7f/Adam-lowry-postgresopen2011.pdf [postgresql.org]

    It's a good presentation because they're in love with none of them and are moving for specific reasons each time, handling different issues. It's not coders chasing the new hotness.

  • ... the joy of schemaless DBs.

    You mean working with a file system and not using a DB at all, not needing to pay a DBA, not dealing with corrupted databases, not using arcane tools, etc.?

    I jest, but not entirely. Clearly there are purposes for which databases are the right tool for the job. I'm most definitely not convinced that big blob storage is one of them.

    • by The Moof (859402)

      not using arcane tools

      I know database concepts are difficult for some people, but it's by no means magic.

      • by pz (113803)

        not using arcane tools

        I know database concepts are difficult for some people, but it's by no means magic.

        Sorry, I beg to differ. You select a DB. Turns out that's just the interface, and you have to *then* select the actual DB engine. Some engines / databases allow checking for and repair of corruption on-line, some don't. There's locking. Line level, table level, database level. Oh, wait, you didn't know about tables vs databases? What do you do when your query takes too long? Didn't you know about connecting before making a query, persistent connections, and how to interpret obscure error messages?

  • by Animats (122034) on Wednesday May 16, 2012 @11:29AM (#40017449) Homepage

    a majority of our unplanned downtime was due to CouchDB issues

    Nowhere on the CouchDB home page [apache.org] is reliability even mentioned. And that's the real issue. Developing a reliable database system is a difficult design and programming task. It requires real software engineering. The hacks who write PHP and use JSON aren't up to a job like that. The "aw, we'll fix it in the next release" attitude doesn't cut it in databases.
