Yale Researchers Prove That ACID Is Scalable 272
An anonymous reader writes "The has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID compliant database systems."
Re:I hate SQL and Databases in General... (Score:5, Informative)
Spoken with proud ignorance.
Anyone who has properly scaled an application knows the database isn't the problem. If it was, it wouldn't take 12 applications servers to bring the thing to its knees. That said, most of your gripes equate to:
I am not a DBA and therefore I do not understand DBA and therefore I must complain.
Further SQL has nothing to do with ACID. AT ALL!
Re:Pfah. (Score:5, Informative)
Re:NoSQL is also about arbitrary schemas (Score:1, Informative)
ANSI defines a query mechanism for walking heirarchies in Common Table Expressions. Granted, the garbage that is MySQL has no support for this, but the latest releases of Postgresql, Oracle, DB2 and Microsoft SQL Server all do using the same syntax. I will say that SQL is not really the best suited for such things, but it does work.
As for schema-less data, there are a couple of solutions which I believe are all DB-specific. Microsoft SQL Server allows storage of XML data as well as SQL extensions utilizing XQuery to query into that data. It also supports indexing that data and using XML schemas to constrict the nature of that data if necessary. Microsoft SQL Server 2008 also added sparse table support which is built on top of XML storage which allows a table to have 30,000 columns and optimized for the majority of those columns being NULL on any particular row. I know that ANSI does get into XML storage a little bit but I'm not sure which DBs actually implement the standard, if any, especially to a level where it would be a workable solution.
Just in case anybody else doesn't know... (Score:4, Informative)
From the Wikipedia Article (http://en.wikipedia.org/wiki/ACID [wikipedia.org])
"In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction."
Re:I hate SQL and Databases in General... (Score:3, Informative)
What alternative have you seen that handles the same workload more efficiently? Flat files? I've seen plenty of database-related performance issues, but it's almost never inherent in the database - it's the idiot that wrote the lousy table-scanning code that's reading a couple rows out of a table with millions that's the problem.
If only you could start something like a "transaction", which you could then "roll back" after finishing the test, leaving the database in its original state. And if you could somehow "back up" the database and "restore" it on a test server, or under a different name. That would be awesome.
Checking your create/change scripts into source control is no more difficult than checking your C source in prior to compiling it.
While I don't totally disagree on this point, calling SQL "fixed" is a bit like saying C# and Java are the same. I promise you any meaty SQL Server code will not run on Oracle without very significant changes that will have to be done by someone that will cost you a lot of money (and likewise with Oracle to SQL Server). The capabilities vary wildly by platform, and the syntax is only identical for the simplest of CRUD statements.
I have to give this one a LOLWUT. If you're using a big RDBMS, it's likely a multi-user system. If you've got multiple users and connections, you want ACID. This isn't like imposing sorting overhead on data structures, it's like imposing the basic memory protection, process isolation, and filesystem durability you find in any competent operating system. If you want to see what it's like without those protections, go use Mac OS 9 for a week or so, or an Access database used by a few dozen people over a network.
Re:Pfah. (Score:1, Informative)
But then it's not scalable.
If you want ACID and scalability, use Postgres.
Re:I hate SQL and Databases in General... (Score:4, Informative)
And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.
That's easily solvable:
Stored procedures don't have to be any more difficult to manage than any other code.
Summary (Score:5, Informative)
Short Summary:
We make some claims about scaling ACID databases, but then don't support them.
Longer summary:
We don't like NoSQL and enjoy making baseless cracks about it such as it being a "lazy" approach.
In our paper we demonstrate that our unconventional version of an ACID database scales better than a traditional ACID database in a specific environment, while merely throwing away some robustness guarantees and changing how transaction ordering works.
No direct comparison to any NoSQL implementation is made.
So yea, I'm not holding my breath for companies to start migrating away from NoSQL.
Re:Pfah. (Score:5, Informative)
A decade plus ago, and that would be true.
Standard SQL from SQL-99 on will, in fact, do this quite easily with via recursive Common Table Expressions. Now, some SQL-based DBMSs don't support enough of the standard to use this, but, current versions of, I believe, DB2, Firebird, PostgreSQL, and SQL Server all implement standard CTEs well enough to do those examples in SQL fairly directly, and Oracle has its own proprietary syntax (CONNECT BY) that works for the examples that you pose, though its less general than SQL-99 recursive CTEs.
Re:Pfah. (Score:5, Informative)
You have no idea what you're talking about, probably because your brain has been irreversibly warped by MySQL. Concurrent writing is widely-supported.
Hint: MVCC [wikipedia.org].
Re:Pfah. (Score:5, Informative)
An ACID compliant RDBMS can't even get read access to the user, car, friend, picture and pet_survey_answer table set as long as any of the million users of the system is making a change to his data, even if the application only locks one table at a time for write access, let alone the problem of a million users trying to gain write access to the same table at the same time.
Wow. Just wow. Any serious ACID complient RDBMS can do that with no problem.
Book: SQL Antipatterns (Score:3, Informative)
Re:Pfah. (Score:3, Informative)
So, remember, NoSQL means that's anything but SQL. It's not a standard; rather, it's an honest effort to try to experiment with different database techniques where traditional SQL just isn't meeting an industry need. Key-value databases aren't going to satisfy the "give me how many widgets we sold in June to evil inventors in the tri-state area" need; but they do satisfy the scalability need for sites that have millions of concurrent users.
Regarding Mongo, the NoSQL database that I use, it can answer the "give me how many widgets we sold in June to evil inventors in the tri-state area." Basically, instead of having 100 tables with foreign key relationships, you'll have 10 collections of "documents," which are really just data structures. You can query deeply into data structures and return partial data structures.
Let's assume I have an "invoices" collection. Each invoice has an array of "line items", and each item has a count. I can do the following in Mongo:
Again, NoSQL isn't a standard. It's basically experimenting with different ways of having a database with the hopes of finding one that's easier to work with. Mongo is a lot closer to SQL then things like Key-Value databases.
Re:Pfah. (Score:3, Informative)
SQL, as such, is declarative. Many RDBMSs include, in addition to SQL, an SQL-derived procedural scripting language (Oracle's PL/SQL, and so on.)