Databases

Yale Researchers Prove That ACID Is Scalable 272

Posted by CmdrTaco
from the i-could-prove-lunch dept.
An anonymous reader writes "There has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID-compliant database systems."

  • Pfah. (Score:5, Interesting)

    by stonecypher (118140) <stonecypher.gmail@com> on Wednesday September 01, 2010 @12:53PM (#33437610) Homepage Journal

    NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.

    Digg's engineers wear clown shoes to work.

  • Re:Pfah. (Score:1, Interesting)

    by Anonymous Coward on Wednesday September 01, 2010 @01:00PM (#33437746)

    NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL [...]

    Is MySQL ACID?

  • Interesting thesis (Score:5, Interesting)

    by Peeteriz (821290) on Wednesday September 01, 2010 @01:09PM (#33437876)

    In essence, TFA takes the traditional ACID guarantee, "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B", and rather than abandoning it (as NoSQL systems do), strengthens it: the result will always be as if the transactions had arrived in A-B-C order. TFA claims that this stronger guarantee solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.
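
A toy sketch of the difference between the two guarantees, not TFA's actual system: the three transactions, the starting balance, and every function name here are made up for illustration. It runs the same three transactions in each possible serial order, then shows that replicas which all apply one agreed order end in the same state without comparing notes.

```python
from itertools import permutations

# Three hypothetical transactions on a single integer balance.
def deposit(amount):
    def txn(balance):
        return balance + amount
    return txn

def apply_interest(rate_percent):
    def txn(balance):
        return balance + balance * rate_percent // 100
    return txn

def withdraw_if_possible(amount):
    def txn(balance):
        return balance - amount if balance >= amount else balance
    return txn

def run_serially(initial, txns):
    """Apply the transactions one by one, in the order given."""
    state = initial
    for txn in txns:
        state = txn(state)
    return state

transactions = {
    "A": deposit(100),
    "B": apply_interest(10),
    "C": withdraw_if_possible(150),
}

# Traditional serializability: any of these serial orders is an acceptable
# result, so two replicas that pick different orders may disagree.
outcomes = {
    "-".join(order): run_serially(100, [transactions[name] for name in order])
    for order in permutations("ABC")
}

# Strengthened guarantee: every replica applies the agreed order A-B-C.
agreed_order = ["A", "B", "C"]
replica_states = [
    run_serially(100, [transactions[name] for name in agreed_order])
    for _ in range(3)
]

for order, final_balance in sorted(outcomes.items()):
    print(order, final_balance)
print("replicas:", replica_states)
```

With these particular transactions the six serial orders produce five different final balances, while the three replicas all end at the A-B-C result.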

  • by Kaboom13 (235759) <kaboom108@bellsout[ ]et ['h.n' in gap]> on Wednesday September 01, 2010 @01:13PM (#33437902)

    Because the entire site had been completely overwhelmed by spammers? Digg went from a great site to go see what's new to a glorified RSS feed for cracked.com, CollegeHumor and reddit. They had to change something,

  • Re:Pfah. (Score:5, Interesting)

    by bluefoxlucid (723572) on Wednesday September 01, 2010 @01:14PM (#33437914) Journal

    NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google

    Google uses BigTable, a NoSQL database.

  • by Ouija (93401) on Wednesday September 01, 2010 @02:02PM (#33438592)

    SQL syntax is dated and very obtuse. Just look at how different the syntax is between an insert and an update. Wouldn't you rather just have "save"?

    Object-relational mapping is cumbersome and mismatched in SQL. 1:many either yields n+1 queries or a monster Cartesian product set. And what about inheritance? It just doesn't jibe.

    It isn't about losing ACID, although not every purpose needs ACID. Your average shared drive filesystem isn't ACID, for example.

    When you have anemic domains that aren't nailed down and need to be readily flexible without big re-designs, JSON-based No-SQL works very well.
    When you want to avoid n+1 and have well-defined data needs with 4MB of data across your object graph, No-SQL works... very very well.
    When you want to segregate the business services and its backing data store from the separate concern of BI, No-SQL keeps the riff-raff out of your data store.

    It's different. It solves different problems. Keep your mind open.
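
For readers who haven't hit the n+1 problem mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The authors/posts schema and the `run` helper are invented for the example; the point is only the query count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

counter = {"queries": 0}

def run(sql, params=()):
    """Execute a query and count it, so the two approaches can be compared."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()

# n+1: one query for the parent rows, then one more query per parent.
n_plus_one = {}
for author_id, name in run("SELECT id, name FROM authors ORDER BY id"):
    rows = run(
        "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
        (author_id,),
    )
    n_plus_one[name] = [title for (title,) in rows]
n_plus_one_queries = counter["queries"]

# Single join: one query, but the author columns repeat on every post row,
# and joining several 1:many children at once multiplies the rows.
counter["queries"] = 0
joined = {}
for name, title in run(
    "SELECT a.name, p.title FROM authors a "
    "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
):
    joined.setdefault(name, []).append(title)
join_queries = counter["queries"]

print(n_plus_one_queries, join_queries)
```

Both approaches build the same author-to-titles mapping; the first takes three queries for two authors, the second takes one.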

  • Not NoACID, NoSchema (Score:3, Interesting)

    by bokmann (323771) on Wednesday September 01, 2010 @02:03PM (#33438606) Homepage

    Interesting article (and yes, I read the article), but the point of the NoSQL movement isn't so much about SQL, or ACID, as much as it is about Schema.

    Most applications today are written in object-oriented languages like Java, C#, Ruby, etc... and most common frameworks in these languages use object-relational models to essentially 'unpack' the object into a relational model, and then reconstitute the objects on demand. This post [tedneward.com] explains the kinds of problems better than most.

    NoSchema is about storing data closer to the format we process it in today. Key-Value pairs. XML. Sets and Lists. Object-Oriented data structures. This is about abstractions that make developers more productive. It is a tool in a toolbox, and useful in some circumstances and not in others.

    SQL databases do not have to be the 'one persistence data mechanism to rule them all'. We don't need one; we need many that solve differing classes of problems well.

  • by quanticle (843097) on Wednesday September 01, 2010 @03:11PM (#33439528) Homepage

    All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?

    Those statements could be applied to any technology that's being used inappropriately. Why are our programs so sensitive to bad algorithm design?

  • by smcdow (114828) on Wednesday September 01, 2010 @03:15PM (#33439576) Homepage

    TFA hints at this but doesn't come out and say it: the larger you scale, the more you swamp yourself with atomicity protocol overhead. If your database is geographically distributed, then you have to decide if atomicity is more important than forgoing the very large bills for the associated network usage. I suspect that this may explain a lot about why Google, Amazon, etc., went with NoSQL solutions.
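
A back-of-the-envelope sketch of that overhead, assuming the atomicity protocol is textbook two-phase commit (the comment does not name one). The function name and the round-trip times are illustrative assumptions, not measurements.

```python
def two_phase_commit_cost(participants, round_trip_ms):
    """Rough cost of one textbook two-phase commit.

    Messages: prepare, vote, commit/abort and ack for each participant.
    Latency: two sequential coordinator-to-participant round trips.
    """
    messages = 4 * participants
    latency_ms = 2 * round_trip_ms
    return messages, latency_ms

# Hypothetical round-trip times: same datacenter vs. across continents.
same_datacenter = two_phase_commit_cost(participants=3, round_trip_ms=0.5)
cross_continent = two_phase_commit_cost(participants=3, round_trip_ms=150)

print(same_datacenter)
print(cross_continent)
```

The message count grows with the number of participants either way, but the added latency per transaction is what changes when the participants are far apart.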

  • Re:Pfah. (Score:4, Interesting)

    by RAMMS+EIN (578166) on Wednesday September 01, 2010 @04:15PM (#33440614) Homepage Journal

    ``There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.''

    I agree, but ...

    ``While we've come up with some clever solutions like ORM to alleviate the problem,''

    I don't think ORM alleviates the problem so much as entrenches it. The classes-and-instances object model and the relational model are different, but can be expressed in one another. Object-relational mapping makes this easy by pretending the models are the same, and doing the mapping behind the scenes. This works for some cases, but if you want to get the best performance, you have to express things in a way that takes into account the efficiency considerations of the actual implementation. With ORM, you run into the situation where what is most succinct to express in code is not necessarily what is most efficient in terms of disk access and network resource usage. So, for efficiency reasons, you end up breaking the abstractions that your ORM provided ...

    ``why not just store the data directly without any mapping?''

    There isn't really such a thing as "without any mapping". However, you can ensure that the constructs your API provides are equivalent to what you can efficiently fetch or store in your data store. Since typical RDBMSs are usually optimized to execute typical SQL queries efficiently, SQL is actually a fairly good starting point. You can optimize this by creating indices to speed up common operations, and by tuning your RDBMS to speed up common operations. And, no doubt, you can do even better by creating custom shortcuts for specific needs of your application.
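
As a small illustration of the index point, here is a sketch with Python's built-in sqlite3 module. The `orders` table, the index name, and the `plan` helper are hypothetical, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT total FROM orders WHERE customer_id = 42"

plan_before = plan(lookup)  # no usable index yet: the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(lookup)   # the planner can now search via the new index

result = conn.execute(lookup).fetchall()

print(plan_before)
print(plan_after)
print(len(result))
```

The query text and its result are identical before and after; only the access path the database chooses changes.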

    This is sort of what so-called NoSQL databases do: they are optimized for specific scenarios, and thus may outperform stock RDBMSs that are optimized for "we don't know what you want to do, so we try to make everything reasonably fast". It's also worth noting that NoSQL systems often return stale data or even allow inconsistencies in order to improve performance. By contrast, the strength of a good relational database is preserving the integrity of your data no matter what happens. Different tools for different jobs - or at least, different optimizations for different scenarios.

  • Re:Pfah. (Score:4, Interesting)

    by NNKK (218503) <nknight@runawaynet.com> on Wednesday September 01, 2010 @04:47PM (#33441050) Homepage

    That is an excellent question for a DBA evaluation exercise.

    So...

    Efficient SQL Usage == Programmer + DBA

    Efficient NoSQL Usage == Programmer

    Thank you for making the case for NoSQL so clearly.

  • Re:Pfah. (Score:5, Interesting)

    by Johnno74 (252399) on Wednesday September 01, 2010 @10:00PM (#33444742)

    Totally agree. Only problem is writing recursive CTE queries is beyond most programmers. Hell, a lot of programmers struggle with anything but simple inner joins.

    IMHO CTEs are one of the most underused and powerful features of SQL. Not just for recursive queries, but for bridging the gap between functional and procedural programming.

    I write all my complex queries as a series of simple CTEs now - each CTE gets me one step closer to the actual query I need, and the magic of the query optimizer combines them all into a single query plan. Makes testing, debugging and maintaining a complex query about a million times easier.
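
A minimal sketch of both styles, run through Python's built-in sqlite3 module (any database with `WITH` and `WITH RECURSIVE` support would do). The `sales` and `employees` tables and all CTE names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 100), ('north', 250), ('south', 80), ('south', 40), ('east', 300);

    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'ceo', NULL), (2, 'vp', 1), (3, 'lead', 2), (4, 'dev', 3), (5, 'cfo', 1);
""")

# A complex query as a chain of simple CTEs: each step can be tested by
# temporarily selecting from it directly.
stepwise = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
),
overall AS (
    SELECT AVG(total) AS avg_total FROM region_totals
),
above_average AS (
    SELECT r.region, r.total
    FROM region_totals r, overall o
    WHERE r.total > o.avg_total
)
SELECT region, total FROM above_average ORDER BY total DESC
"""
above_average_regions = conn.execute(stepwise).fetchall()

# A recursive CTE: walk the reporting chain below one manager.
recursive = """
WITH RECURSIVE reports(id, name, depth) AS (
    SELECT id, name, 0 FROM employees WHERE id = 2
    UNION ALL
    SELECT e.id, e.name, r.depth + 1
    FROM employees e JOIN reports r ON e.manager_id = r.id
)
SELECT name, depth FROM reports ORDER BY depth
"""
reporting_chain = conn.execute(recursive).fetchall()

print(above_average_regions)
print(reporting_chain)
```

Whether the steps are merged into a single plan or materialized separately depends on the database and version, so check the plan if performance matters.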
