Forgot your password?
typodupeerror
Databases

Yale Researchers Prove That ACID Is Scalable 272

Posted by CmdrTaco
from the i-could-prove-lunch dept.
An anonymous reader writes "The has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID compliant database systems."
This discussion has been archived. No new comments can be posted.

Yale Researchers Prove That ACID Is Scalable

Comments Filter:
  • by poet (8021) on Wednesday September 01, 2010 @01:36PM (#33438188) Homepage

    Spoken with proud ignorance.

    Anyone who has properly scaled an application knows the database isn't the problem. If it was, it wouldn't take 12 applications servers to bring the thing to its knees. That said, most of your gripes equate to:

    I am not a DBA and therefore I do not understand DBA and therefore I must complain.

    Further SQL has nothing to do with ACID. AT ALL!

  • Re:Pfah. (Score:5, Informative)

    by Trieuvan (789695) on Wednesday September 01, 2010 @01:41PM (#33438296) Homepage
    It is if you use innodb .
  • by Anonymous Coward on Wednesday September 01, 2010 @01:52PM (#33438440)

    ANSI defines a query mechanism for walking heirarchies in Common Table Expressions. Granted, the garbage that is MySQL has no support for this, but the latest releases of Postgresql, Oracle, DB2 and Microsoft SQL Server all do using the same syntax. I will say that SQL is not really the best suited for such things, but it does work.

    As for schema-less data, there are a couple of solutions which I believe are all DB-specific. Microsoft SQL Server allows storage of XML data as well as SQL extensions utilizing XQuery to query into that data. It also supports indexing that data and using XML schemas to constrict the nature of that data if necessary. Microsoft SQL Server 2008 also added sparse table support which is built on top of XML storage which allows a table to have 30,000 columns and optimized for the majority of those columns being NULL on any particular row. I know that ANSI does get into XML storage a little bit but I'm not sure which DBs actually implement the standard, if any, especially to a level where it would be a workable solution.

  • by elwin_windleaf (643442) on Wednesday September 01, 2010 @02:01PM (#33438582) Homepage

    From the Wikipedia Article (http://en.wikipedia.org/wiki/ACID [wikipedia.org])

    "In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction."

  • by davidbrit2 (775091) on Wednesday September 01, 2010 @02:07PM (#33438658) Homepage

    ...because on every application I have ever worked on, the Database has always been the performance bottleneck.

    What alternative have you seen that handles the same workload more efficiently? Flat files? I've seen plenty of database-related performance issues, but it's almost never inherent in the database - it's the idiot that wrote the lousy table-scanning code that's reading a couple rows out of a table with millions that's the problem.

    Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without reseting the database.

    If only you could start something like a "transaction", which you could then "roll back" after finishing the test, leaving the database in its original state. And if you could somehow "back up" the database and "restore" it on a test server, or under a different name. That would be awesome.

    And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.

    Checking your create/change scripts into source control is no more difficult than checking your C source in prior to compiling it.

    SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-Cobal.

    While I don't totally disagree on this point, calling SQL "fixed" is a bit like saying C# and Java are the same. I promise you any meaty SQL Server code will not run on Oracle without very significant changes that will have to be done by someone that will cost you a lot of money (and likewise with Oracle to SQL Server). The capabilities vary wildly by platform, and the syntax is only identical for the simplest of CRUD statements.

    Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.

    I have to give this one a LOLWUT. If you're using a big RDBMS, it's likely a multi-user system. If you've got multiple users and connections, you want ACID. This isn't like imposing sorting overhead on data structures, it's like imposing the basic memory protection, process isolation, and filesystem durability you find in any competent operating system. If you want to see what it's like without those protections, go use Mac OS 9 for a week or so, or an Access database used by a few dozen people over a network.

  • Re:Pfah. (Score:1, Informative)

    by Anonymous Coward on Wednesday September 01, 2010 @03:43PM (#33440080)

    But then it's not scalable.

    If you want ACID and scalability, use Postgres.

  • by Just Some Guy (3352) <kirk+slashdot@strauser.com> on Wednesday September 01, 2010 @03:58PM (#33440296) Homepage Journal

    And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.

    That's easily solvable:

    1. Create a subdirectory called "storedprocs" inside your SCM directory.
    2. Inside that subdirectory, make files with names like "checkinvoice.sql" that store the sequence of commands required to create a stored procedure - one per file. Start each one with a statement like CREATE OR REPLACE FUNCTION myschema.checkinvoice([...]).
    3. Manage those files with your SCM system. Group them by database, or by project, or by phase of the moon, or by whatever else makes sense to you.
    4. To update every stored procedure you've ever written, or to build out a new database: cd storedprocs; psql < *.sql

    Stored procedures don't have to be any more difficult to manage than any other code.

  • Summary (Score:5, Informative)

    by azmodean+1 (1328653) on Wednesday September 01, 2010 @04:07PM (#33440488)

    Short Summary:
    We make some claims about scaling ACID databases, but then don't support them.

    Longer summary:
    We don't like NoSQL and enjoy making baseless cracks about it such as it being a "lazy" approach.
    In our paper we demonstrate that our unconventional version of an ACID database scales better than a traditional ACID database in a specific environment, while merely throwing away some robustness guarantees and changing how transaction ordering works.
    No direct comparison to any NoSQL implementation is made.

    So yea, I'm not holding my breath for companies to start migrating away from NoSQL.

  • Re:Pfah. (Score:5, Informative)

    by DragonWriter (970822) on Wednesday September 01, 2010 @05:01PM (#33441258)

    Doesn't work so well if you've got a graph structure or a tree. If in a family tree, you want to find all 5'th descendants or all descendants of some guy, SQL won't make you happy.

    A decade plus ago, and that would be true.

    Standard SQL from SQL-99 on will, in fact, do this quite easily with via recursive Common Table Expressions. Now, some SQL-based DBMSs don't support enough of the standard to use this, but, current versions of, I believe, DB2, Firebird, PostgreSQL, and SQL Server all implement standard CTEs well enough to do those examples in SQL fairly directly, and Oracle has its own proprietary syntax (CONNECT BY) that works for the examples that you pose, though its less general than SQL-99 recursive CTEs.

  • Re:Pfah. (Score:5, Informative)

    by QuoteMstr (55051) <dan.colascione@gmail.com> on Wednesday September 01, 2010 @05:28PM (#33441658)

    An ACID compliant RDBMS can't even get read access to the user, car, friend, picture and pet_survey_answer table set as long as any of the million users of the system is making a change to his data, even if the application only locks one table at a time for write access, let alone the problem of a million users trying to gain write access to the same table at the same time.

    You have no idea what you're talking about, probably because your brain has been irreversibly warped by MySQL. Concurrent writing is widely-supported.

    Hint: MVCC [wikipedia.org].

  • Re:Pfah. (Score:5, Informative)

    by LurkerXXX (667952) on Wednesday September 01, 2010 @07:14PM (#33443234)

    An ACID compliant RDBMS can't even get read access to the user, car, friend, picture and pet_survey_answer table set as long as any of the million users of the system is making a change to his data, even if the application only locks one table at a time for write access, let alone the problem of a million users trying to gain write access to the same table at the same time.

    Wow. Just wow. Any serious ACID complient RDBMS can do that with no problem.

  • by Futurepower(R) (558542) <MJennings.USA@NOT_any_of_THISgmail.com> on Thursday September 02, 2010 @10:02AM (#33449394) Homepage
    SQL Antipatterns [amazon.com] may interest you. As one of the reviews says, "An excellent guide to database design tradeoffs".
  • Re:Pfah. (Score:3, Informative)

    by GWBasic (900357) <{slashdot} {at} {andrewrondeau.com}> on Thursday September 02, 2010 @04:46PM (#33456962) Homepage

    So, remember, NoSQL means that's anything but SQL. It's not a standard; rather, it's an honest effort to try to experiment with different database techniques where traditional SQL just isn't meeting an industry need. Key-value databases aren't going to satisfy the "give me how many widgets we sold in June to evil inventors in the tri-state area" need; but they do satisfy the scalability need for sites that have millions of concurrent users.

    Regarding Mongo, the NoSQL database that I use, it can answer the "give me how many widgets we sold in June to evil inventors in the tri-state area." Basically, instead of having 100 tables with foreign key relationships, you'll have 10 collections of "documents," which are really just data structures. You can query deeply into data structures and return partial data structures.

    Let's assume I have an "invoices" collection. Each invoice has an array of "line items", and each item has a count. I can do the following in Mongo:

    Again, NoSQL isn't a standard. It's basically experimenting with different ways of having a database with the hopes of finding one that's easier to work with. Mongo is a lot closer to SQL then things like Key-Value databases.

  • Re:Pfah. (Score:3, Informative)

    by DragonWriter (970822) on Friday September 03, 2010 @11:58AM (#33465490)

    I think the issue is that SQL is procedural, and query languages are better when declarative.

    SQL, as such, is declarative. Many RDBMSs include, in addition to SQL, an SQL-derived procedural scripting language (Oracle's PL/SQL, and so on.)

E = MC ** 2 +- 3db

Working...