Yale Researchers Prove That ACID Is Scalable - Slashdot

Yale Researchers Prove That ACID Is Scalable 272

Posted by CmdrTaco on Wednesday September 01, 2010 @12:49PM from the i-could-prove-lunch dept.

An anonymous reader writes "There has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID-compliant database systems."

This discussion has been archived. No new comments can be posted.

Re:digg does not need to worry anymore (Score:3, Insightful)

by Pojut ( 1027544 ) writes: on Wednesday September 01, 2010 @12:59PM (#33437728) Homepage

offtopic:
Considering how fanatical digg users can be, I can't possibly imagine why they thought it was a good idea to implement the changes they've made.

Re:Pfah. (Score:4, Insightful)

by TheSunborn ( 68004 ) writes: <mtilstedNO@SPAMgmail.com> on Wednesday September 01, 2010 @01:05PM (#33437818)

It was never database size that was the problem, but the number of queries per second (aka performance) that could be executed.
You can run a Google-size database on MySQL, but you can't use MySQL* to implement a search solution with performance like Google's without requiring much, much, much more hardware.
*Or any other SQL database.

Re:Pfah. (Score:5, Insightful)

by mini me ( 132455 ) writes: on Wednesday September 01, 2010 @01:06PM (#33437830)

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
There is a strong disconnect between the way SQL represents data and the way traditional programming languages do. While we've come up with some clever solutions like ORM to alleviate the problem, why not just store the data directly without any mapping?
I am not suggesting that SQL is never the right tool for the job, but it most certainly is not the right tool for every job. It is good to have many different kinds of hammers, and perhaps even a screwdriver or two.

Possible != Practical (Score:4, Insightful)

by Tablizer ( 95088 ) writes: on Wednesday September 01, 2010 @01:17PM (#33437952) Journal

A bigger issue may be the cost of ACID even if it can in theory scale. Supporting ACID is not free. A free web service may be able to afford losing say 1 out of 10,000 web transactions. Banks cannot do it, but Google Experiments can. The extra expense of big-iron ACID may not make up for the relatively minor cost of losing an occasional transaction or customer. It's a business decision.

Re:Pfah. (Score:3, Insightful)

by TooMuchToDo ( 882796 ) writes: on Wednesday September 01, 2010 @01:27PM (#33438084)

Google initially used MySQL for Adwords, tried to switch away from it, and then switched back (if I recall correctly). Your Googling May Vary.

ACID does not imply SQL (Score:3, Insightful)

by LightningBolt! ( 664763 ) writes: <lightningboltlightningbolt@@@yahoo...com> on Wednesday September 01, 2010 @01:28PM (#33438094) Homepage

For instance, Neo4J is a scalable graph-based "nosql" DB with ACID.

Re:Pfah. (Score:5, Insightful)

by bluefoxlucid ( 723572 ) writes: on Wednesday September 01, 2010 @01:28PM (#33438098) Homepage Journal

There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.
Yes but there is a strong disconnect between computer RAM and information. Computer RAM contains DATA; information comes in associated tables. Relational databases represent data in tables with indexes, keys, etc. A Person is unique (has a unique ID), but they may share First Name, Last Name, and even Address (junior/senior in same household). There are many Races, and a Person will be of a given Race (or mix, but this is horribly difficult to index anyway). A Person will own a specific Car; that Car, in turn, will be a particular Make-Model-Year-Trim, which itself is a hierarchy of tables (Trim and Year are pretty separate, Model however will be of a particular Make, while a particular car available is going to be Model-Year-Trim).
Indexing and relating data in this way turns it into information, which is what we want and need. Separating the data eliminates redundancies and lets us use fewer buffers along the way, crunching down smaller tables and making fast comparisons to small-size keys before we even reference big, complex tables. Meanwhile, we're still essentially asking questions like "Find me all people who own a 1996-2010 Year Toyota Prius." Someone might own 15 cars, so we're looking in the table of all individual Cars with MYT where table MYT.Model = (Toyota Prius) and .Year is between 1996 and 2010, and pulling all entries in table Persons for each unique Cars.Owner = Persons.ID (an inner join).
Information theory versus programming. We're studying information here. We might have something more interesting to do than look in a giant array of Cars[VIN] = &Owners[Index]. For the actual data, the model we use makes sense; programmers get an API that says "Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer." That two-dimensional array is suitable for programming logic to manipulate specific structured data; extracting that data from the huge store of structured information is complex, but handled by a front-end that has its own language. You tell that front-end to find this data based on these parameters and string it together; it does tons of programming shit to search, sort, select, copy, and structure the data for you.
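The Prius question above can be sketched against a small normalized schema. This is only an illustration using Python's built-in `sqlite3` module; the table and column names (`persons`, `cars`, `myt`, `trim_level`) and the sample rows are assumptions made up for the example, not anything from a real system.

```python
import sqlite3

# Hypothetical schema: persons own cars, each car points at a
# Make-Model-Year-Trim (myt) row. All names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE myt (id INTEGER PRIMARY KEY, make TEXT, model TEXT,
                      year INTEGER, trim_level TEXT);
    CREATE TABLE cars (vin TEXT PRIMARY KEY,
                       owner INTEGER REFERENCES persons(id),
                       myt_id INTEGER REFERENCES myt(id));
    INSERT INTO persons VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Poe');
    INSERT INTO myt VALUES (10, 'Toyota', 'Prius', 2004, 'Base'),
                           (11, 'Toyota', 'Prius', 2012, 'Four'),
                           (12, 'Honda', 'Civic', 2001, 'LX');
    INSERT INTO cars VALUES ('VIN1', 1, 10), ('VIN2', 2, 11),
                            ('VIN3', 3, 12), ('VIN4', 1, 12);
""")

def prius_owners(conn, first_year=1996, last_year=2010):
    """People owning at least one Toyota Prius in the given model-year range."""
    rows = conn.execute("""
        SELECT DISTINCT p.id, p.first_name, p.last_name
        FROM persons AS p
        JOIN cars AS c ON c.owner = p.id      -- inner join: person to their cars
        JOIN myt  AS m ON m.id = c.myt_id     -- inner join: car to its make/model/year
        WHERE m.make = 'Toyota' AND m.model = 'Prius'
          AND m.year BETWEEN ? AND ?
        ORDER BY p.id
    """, (first_year, last_year))
    return rows.fetchall()

print(prius_owners(conn))
```

`DISTINCT` is there because one person may own several matching cars; without it that person would appear once per car.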

Re:Pfah. (Score:5, Insightful)

by DragonWriter ( 970822 ) writes: on Wednesday September 01, 2010 @01:29PM (#33438114)

NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.
Database size was never the main driving force behind the new move toward NoSQL databases. Support for distributed architectures is. In part, this is about handling lots of queries rather than handling lots of data; it also -- particularly if you are Google -- deals with latency when the consumers of data are widely distributed geographically.
And note that one of the companies that is heavily involved in building, using, and supplying non-SQL distributed databases is Google, who, as you so well point out, is very much aware of both the capabilities and limits of scaling with current relational DBs.
This new research may offer new prospects for better databases in the future -- but TFA indicates that the new design has a limitation which seems common in distributed, strongly-consistent systems: "It turns out that the deterministic scheme performs horribly in disk-based environments".
In fact, given that it proposes strong consistency, distribution, and relies on in-memory operation for performance, it sounds a lot like existing distributed, strongly-consistent systems based around the Paxos algorithm, like Scalaris. And it seems likely to face the same criticism from those who think that durability requires disk-based persistence, and who object to replacing storage on disks (which, one should keep in mind, can also fail) with storage in-memory simultaneously on a sufficient number of servers (which, yes, could all simultaneously fail, but durability is never absolute; it's at best a matter of the degree to which data is protected against probable simultaneous combinations of failures).
So -- reading only the blog post that is TFA announcing the paper and not the paper itself yet -- I don't get the impression that this is necessarily a giant leap forward, though more work on distributed, strongly-consistent databases is certainly a good thing.

Re:I hate SQL and Databases in General... (Score:5, Insightful)

by jeff4747 ( 256583 ) writes: on Wednesday September 01, 2010 @01:34PM (#33438182)

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?
Because it works.
"It's old" is a terrible reason to replace something. Go back to your previous arguments and you have a case. After all, a Core i7 is based on a 1960's view of a problem with an enormous number of band-aids applied in the intervening years, but you don't seem too concerned with replacing that.

You hate what you don't understand (Score:5, Insightful)

by frist ( 1441971 ) writes: on Wednesday September 01, 2010 @01:38PM (#33438224)

Sounds like you don't really understand what you're talking about. The reason we continue to use ACID-compliant RDBMSs is that they work and they work well. If you don't think that RDBMSs have changed over the years, you're simply lacking experience. I feel this is most likely the case, as you complain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!) Complaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacesuit when you jog in the park.

They answered the wrong question (Score:2, Insightful)

by mysidia ( 191772 ) writes: on Wednesday September 01, 2010 @01:45PM (#33438348)

We knew ACID can scale already.
With enough money poured into it, and new implementations, ACID can scale.
They solved some problems with scaling out, not necessarily the problems with it scaling up. Scaling does not necessarily just mean replicas and quick failover -- it means good performance without millions spent on hardware too, in terms of overhead, storage requirements, storage performance, server performance.
NoSQL scales in certain cases less expensively, with less work, and doesn't require complicated DBM algorithms. The representation of data is also simpler, and requires less work to maintain than tables.
It's just a result of major existing SQL implementations being so expensive with large datasets, that sometimes it costs more in terms of performance and required hardware, than simply using NoSQL.
I also love this gem from the article:
If the system is also stripped of the right to arbitrarily abort transactions (system aborts typically occur for reasons such as node failure and deadlock), then problem (b) is also eliminated. ... given an initial database state and a sequence of transaction requests, there exists only one valid final state. In other words, determinism.
I suppose the authors are from a land where hard drive space is infinite, database server resources are always guaranteed ahead of time... I/Os never have unrecoverable errors, syscalls never return error codes, RAM is infinite, programs never crash.
The conclusion that ACID alone is the bottleneck is not necessarily true. The SQL language itself requires a complex implementation just to parse and implement queries, that can add latency.
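For reference, the determinism property quoted from the article can be shown with a toy model. This is a sketch under heavy assumptions, not the researchers' protocol: the "database" is a dict, transactions are hypothetical `(source, destination, amount)` transfers, and the only abort allowed is one decided by transaction logic (insufficient funds), which every replica decides identically. It says nothing about the node failures and resource limits raised above.

```python
from copy import deepcopy

def apply_transfer(state, txn):
    """Apply one transfer in place; return False if it aborts on its own logic."""
    src, dst, amount = txn
    if state.get(src, 0) < amount:
        return False  # logic-based abort: every replica reaches the same verdict
    state[src] -= amount
    state[dst] = state.get(dst, 0) + amount
    return True

def replay(initial_state, log):
    """Run an agreed-upon, ordered transaction log against a copy of the state."""
    state = deepcopy(initial_state)
    outcomes = [apply_transfer(state, txn) for txn in log]
    return state, outcomes

initial = {"alice": 100, "bob": 20}
log = [("alice", "bob", 70), ("alice", "bob", 50), ("bob", "alice", 90)]

# Two replicas that see the same initial state and the same ordered log
# end in the same final state without exchanging any messages.
replica_a = replay(initial, log)
replica_b = replay(initial, log)
print(replica_a == replica_b, replica_a)
```

The second transfer aborts on both replicas for the same reason, so the outcome is still a function of the initial state and the log alone.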

Re:Pfah. (Score:1, Insightful)

by Anonymous Coward writes: on Wednesday September 01, 2010 @01:47PM (#33438374)

Right, raw size is only one component. As a practical matter, if you have 100 trillion records in a DB, you probably also have ferocious insertion and query rates, as well. Not enforcing ACID has its advantages under those conditions.
Whether such a tack was logically required is an interesting question...

Re:digg does not need to worry anymore (Score:5, Insightful)

by Dan667 ( 564390 ) writes: on Wednesday September 01, 2010 @01:53PM (#33438456)

Actually, most of the change was to allow auto-submitting of stories from big publishers/companies. They basically changed digg into a paid-for RSS ad service. If you hated the gaming of the old digg site, I am sure you just stopped using the new digg site altogether. No one goes to a website to read ads.

Re:Possible != Practical (Score:3, Insightful)

by Peeteriz ( 821290 ) writes: on Wednesday September 01, 2010 @01:54PM (#33438476)

Typically the NoSQL approach just shifts the problems from the database layer to the application programmer. If the issue is simply ignored, a typical app can't cope with unpredictable or corrupt data being returned from the db, which results in weird bug reports that cost a lot of development time to find and fix; and with these fixes, parts of ACID compliance are simply re-implemented in the app layer.
You gain some performance of the db, you lose some (hopefully less) performance in the app, and it costs you additional complexity and programmer-time in the app.
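As a concrete picture of what the database layer provides here, the following sketch uses Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are all hypothetical. The point is that a multi-statement update either fully applies or fully rolls back, which is exactly the kind of logic an app ends up re-implementing when the store does not offer it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint lets the database itself reject overdrafts.
conn.execute("""CREATE TABLE accounts (
                    name TEXT PRIMARY KEY,
                    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically; return False if the database rejects it."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# The credit to bob succeeds first, the debit from alice then violates the
# CHECK constraint, and the rollback undoes the credit as well.
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

Without the transaction, the failed transfer would leave bob credited and alice untouched, and the app would have to detect and repair that state by hand.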

Re:Pfah. (Score:3, Insightful)

by GWBasic ( 900357 ) writes: <slashdot@@@andrewrondeau...com> on Wednesday September 01, 2010 @02:01PM (#33438568) Homepage

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
I 100% agree. Earlier this year I moved a prototype application built around SQLite and flat files to MongoDB. MongoDB is SQL-like in its ability to have queries and indexes, but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables. This dramatically reduced complexity in code that used to deal with 5-6 SQLite tables. In the case of MongoDB, I was able to replace 5-6 tables with a single collection of structured documents. MongoDB lets me write queries against data that's deeply nested, yet it can return the full data structure, so I don't have the performance hit (and programmer time hit) of running (and writing) many queries to hydrate data structures around foreign key relationships.
The other advantage to MongoDB is that its schemaless approach makes it much easier to handle inheritance. I can have documents with common parts for base classes, and varying parts for child classes. This is much harder in SQL, because I either need to design a super-table that can handle all variations of the base class, or I need to use a multi-join around all potential classes that I can query. MongoDB's document-based approach, as opposed to SQL's table approach, lets me write a single query that can handle future subclassing of the data, and future variations of the data.
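The document idea can be sketched without any database at all. The example below is plain Python over a list of dicts and is not the MongoDB driver API; `get_path`, `find`, and the `shapes` data are hypothetical names invented for illustration, with dotted paths loosely modeled on how document stores address nested fields.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as 'style.color' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path is absent in this document
        current = current[key]
    return current

def find(collection, criteria):
    """Return whole documents whose values at the given paths match."""
    return [doc for doc in collection
            if all(get_path(doc, path) == wanted
                   for path, wanted in criteria.items())]

# One collection holds a "base class" part (kind, style) plus fields
# that vary per subclass; no super-table or multi-join is involved.
shapes = [
    {"kind": "circle", "style": {"color": "red"}, "radius": 2},
    {"kind": "rect", "style": {"color": "red"}, "width": 3, "height": 4},
    {"kind": "rect", "style": {"color": "blue"}, "width": 1, "height": 1},
]

red = find(shapes, {"style.color": "red"})
print([doc["kind"] for doc in red])
```

A new subclass with its own fields can be added to `shapes` later and the same query keeps working, which is the flexibility described above; the trade-off is that nothing enforces that a `rect` actually has a `width`.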

Re:Pfah. (Score:4, Insightful)

by Anpheus ( 908711 ) writes: on Wednesday September 01, 2010 @02:17PM (#33438806)

Well, and if you don't need it [the guarantees of ACID], why pay for it? I mean, if you have to spend any amount of time thinking about "How do I make that work?" that's a cost.
Whereas if all you care about is updating individual records without global consistency, well, don't enforce global consistency.

Re:I hate SQL and Databases in General... (Score:3, Insightful)

by GooberToo ( 74388 ) writes: on Wednesday September 01, 2010 @02:43PM (#33439160)

All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?
Because fairly consistently, for the past forty years, every time someone says they've created something better than SQL and released it to the market, the market proves them woefully and completely wrong. As such, as much as people piss and moan about SQL, SQL has consistently proven to be an excellent, general purpose solution and amazingly poorly understood by the masses. And solutions such as MySQL have only made things worse. That's not to say there are not superior niche solutions, only that SQL is one of the few database technologies which has continued to survive for decades as a general purpose solution, and rightfully so.
It's like the world suddenly doing their own plumbing, framing, and mechanical work and then proudly exclaiming the state of architecture and the car industry stinks because the world is falling apart around them. In reality, that means we need far more qualified DBAs and far fewer people who can barely spell "SQL" designing and condemning the world around us.
It's literally been years since I've run into a qualified DBA, despite the fact "DBA" was part of their title. Turns out, being able to spell "DBA" is all too often enough to qualify one for such a position. And don't get me started on the all-too-common case of people who don't even know what a DBA does and yet are responsible for actually creating the schema/data model.

Re:Pfah. (Score:5, Insightful)

by h4nk ( 1236654 ) writes: on Wednesday September 01, 2010 @02:55PM (#33439316)

Well said. This "problem" has more to do with architects and developers understanding the concepts of layering and information hiding. When programmers are allowed to dictate architecture under the pretense that certain interfaces to a Service should determine the structure of the Information itself, there is a huge problem at the business level. How does this happen? Uninvolved or under-skilled DBAs and data architects. This is their job. My experience is that business managers and programmers have always seen the database as some sort of necessary evil without understanding its full purpose. Too many programmers with very little database experience are given direct access to databases themselves. The motivation of "Get it to work" takes precedence over well-researched and proven approaches, approaches that will only benefit in the long run. Companies that implement poor strategies for the sake of short-term gains usually have the idea that the best approach is somehow the one that takes the least time to implement. Short-sighted solutions are put into play and, almost as soon as they are implemented, the scalability and data requirement issues begin to crop up. These poor strategies are often the result of inexperience and poor education on all levels. This is why it is so important to hire people that really know what they are doing, from C-level management down to the programmers. I have seen bad thinking gut companies. A service built on sound architecture will have issues maturing, no doubt. How well it matures depends on the wisdom and skill of the company.

Re:digg does not need to worry anymore (Score:1, Insightful)

by Anonymous Coward writes: on Wednesday September 01, 2010 @03:06PM (#33439472)

The first time I boycotted Digg was when they had a top headline or story where the URL didn't even resolve. Like 2,000 diggs for a host not found. I then went back for the almost safe for work mindless BS that they had for a while. Remember, digg used to be called the L1 cache for slashdot. Now, it looks like some kind of Windows XP clone and I have no idea what the content is supposed to be targeted for, so I think I'm done for now with them.
All around, a poor website as time has gone on. It was at least useful as comic relief, but that is gone as well now; it's not really worth anything anymore...

Re:Pfah. (Score:2, Insightful)

by bsdaemonaut ( 1482047 ) writes: on Wednesday September 01, 2010 @03:08PM (#33439482)

NoSQL has a lot to do with scalability. Sure, there are other reasons, but not enough to recommend them over hash databases. Hash databases have been around for decades and do what you propose and a lot more; their main con is the lack of scalability -- hence NoSQL. BerkeleyDB is an example, but the list is too huge to continue.

Whose data is it? (Score:4, Insightful)

by sbjornda ( 199447 ) writes: <sbjornda&hotmail,com> on Wednesday September 01, 2010 @03:17PM (#33439608)

but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables.
I take it this is not business-type data? Otherwise you're doing it backwards. Start with your Entity-Relationship diagrams, devolve into logical then physical data models, and THEN start programming.
I forget who said it but it's true: The data belongs to the business, not to the application. The data should be structured and stored in a way that it will still be readable years after your program has become obsolete. (Unless it's data that has a short "best before" date.)
--
.nosig

Re:I hate SQL and Databases in General... (Score:2, Insightful)

by jimrthy ( 893116 ) writes: on Wednesday September 01, 2010 @04:08PM (#33440508) Homepage Journal

Please don't take this wrong. I really do mean my comments respectfully and politely. It's been a long day, and I'm not sure I managed to write as sincerely as I intended.
... because on every application I have ever worked on, the Database has always been the performance bottleneck
Wow. We've had very different experiences, then. Sure, there have been plenty of times when the database was the bottleneck. But it seems like I've had more issues with network speeds. And I can think of a few cases where the file system was the issue. At my current day job, the system bus seems to be the most common bottleneck. Not that we touch databases all that often.
Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without resetting the database.
Isn't that generally considered a "best practice" anyway? I mean, I've pretty much always just taken that as a given. What do you consider a feasible alternative?
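One common way to get that reset for free is to give every test its own throwaway database. The sketch below uses Python's built-in `sqlite3` and `unittest`; the `users` schema and the `add_user` and `run_suite` helpers are hypothetical, and a real project would load its actual schema and might use a dedicated test server rather than an in-memory SQLite database.

```python
import sqlite3
import unittest

# Hypothetical schema for the example.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"

def add_user(conn, name):
    """Code under test: insert a user and return the new row id."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cur.lastrowid

class UserTests(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test: nothing leaks between tests.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(SCHEMA)

    def tearDown(self):
        self.conn.close()

    def test_first_user_gets_id_1(self):
        self.assertEqual(add_user(self.conn, "ann"), 1)

    def test_duplicate_name_rejected(self):
        add_user(self.conn, "ann")
        with self.assertRaises(sqlite3.IntegrityError):
            add_user(self.conn, "ann")

def run_suite():
    """Run the tests programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests insert "ann" and both pass in any order, because each one starts from an empty table.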
Configuring applications to use this database or that database also ends up being a problem for most applications.
OK, now I really have to ask what kind of development environment you're using. That's always seemed like a fairly moderate "no-brainer." Sure, it's mildly inconvenient to make sure connection strings got changed when migrating from dev to test to staging to production, but it's not that big a deal.
Furthermore, while programming in general has continued to progress through many languages, exploring many different ways to describe problems, SQL is still SQL. SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-COBOL.
That's one way of looking at it, sure. Maybe you're missing the point, though? I mean, so many other languages and approaches have changed so drastically over the years...maybe SQL hasn't because it's good enough for what it does?
Bottom line: SQL is tedious, ugly, slow, and difficult to test.
Compared to what? Keep in mind its original purpose: letting business users look up algebraic sets while programmers got on with the serious data analysis. It just happened that having a standardized API that made it relatively easy to swap out back-ends turned out to be the easiest way for programmers to do our jobs.
If you really do have access to some magic technology that lets you look up persisted data (in a way that's anywhere near as flexible as SQL) significantly faster than any of the major RDBMSs...why haven't you founded a business on that and made your fortune?
And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.
You definitely need to look into some better tools. File | Save As... to stash your SP's in some directory, add to source control (if it's new), check in.
Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.
It's been a while since I had to mess with SQL, but I seem to recall specifying hints about how much transactional consistency I actually needed. I think you may be exaggerating the overhead a smidge. And I'm pretty sure there are ways to work around it. But that's getting way off track.
Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?
Two suggestions. 1) It works. And DBA's hate learning new technology. 2) No one's come up with an alternative that's compelling enough to convince more than a tiny fraction of companies to
Read the rest of this comment...

Re:Pfah. (Score:2, Insightful)

by Krahar ( 1655029 ) writes: on Wednesday September 01, 2010 @04:09PM (#33440536)

Doesn't work so well if you've got a graph structure or a tree. If, in a family tree, you want to find all 5th-generation descendants or all descendants of some guy, SQL won't make you happy. As far as I can see, you end up iterating a query to add children until you reach a fixed point, and SQL doesn't have fixed-point operators so you have to do it by hand. Right?
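For what it's worth, the iterate-until-fixed-point step can be expressed with a recursive common table expression (`WITH RECURSIVE`, part of the SQL standard since SQL:1999), in databases that support it. Here is a minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical `person` table with a single `parent_id` per row, which is a simplification since a real family tree has two parents per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical single-parent tree for illustration.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         parent_id INTEGER REFERENCES person(id));
    INSERT INTO person VALUES
        (1, 'Root', NULL),
        (2, 'Child A', 1),
        (3, 'Child B', 1),
        (4, 'Grandchild', 2),
        (5, 'Great-grandchild', 4),
        (6, 'Unrelated', NULL);
""")

DESCENDANTS_SQL = """
    WITH RECURSIVE descendants(id, name, depth) AS (
        -- base case: direct children of the chosen ancestor
        SELECT id, name, 1 FROM person WHERE parent_id = :ancestor
        UNION ALL
        -- recursive step: children of rows found so far
        SELECT p.id, p.name, d.depth + 1
        FROM person AS p JOIN descendants AS d ON p.parent_id = d.id
    )
    SELECT name, depth FROM descendants
    WHERE :depth IS NULL OR depth = :depth
    ORDER BY depth, id
"""

def descendants(conn, ancestor_id, depth=None):
    """All descendants, or only those exactly `depth` generations down."""
    params = {"ancestor": ancestor_id, "depth": depth}
    return conn.execute(DESCENDANTS_SQL, params).fetchall()

print(descendants(conn, 1))
print(descendants(conn, 1, depth=2))
```

The engine repeats the recursive step until it produces no new rows, which is the by-hand loop described above. Whether this performs well on a large graph is a separate question that this sketch does not address.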

My article summary (Score:1, Insightful)

by Anonymous Coward writes: on Wednesday September 01, 2010 @05:51PM (#33442062)

Academic determines that if only you're willing to insert a single point of failure, all of your replication problems can be hand waved away. Also if you have this new single point of failure, somehow magically transactions will never need to abort ever again.

RDBMS is a golden hammer (Score:3, Insightful)

by yaphadam097 ( 670358 ) writes: on Wednesday September 01, 2010 @05:51PM (#33442070)

The reason that NoSQL is necessary is that ACID is not the only thing that developers need to think about. RDBMS was an innovative solution to the limitations of mainframe hierarchical databases circa 1970. Since then it has been the only game in town (At least for most enterprise software. Some of us do other things occasionally.)
It turns out that there are reasons to do things other ways, and having other options allows you to consider trade-offs. For many applications eventually consistent data scales just fine. For some applications, both big and small, an enterprise RDBMS is overkill. Why not just persist objects to a document store? Or even the file system?
The research is interesting, although I agree that we already knew we could scale the ACID paradigm. The conclusion is ridiculous. NoSQL has nothing to do with ACID, and it brings a richness to the conversation that has been missing for far too long. Like the Perl folks say, TMTOWTDI.

Re:Pfah. (Score:3, Insightful)

by lennier ( 44736 ) writes: on Wednesday September 01, 2010 @06:01PM (#33442190) Homepage

Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer.
That's fine until someone asks you an unstructured question for which a two-dimensional array cannot contain the answer.
Like, for example, 'Here's an ordered DOM tree of nodes each containing tags, subtrees and/or chunks of CDATA'.
Or 'Here is a set of objects each of which contain their own custom properties not found in others.'
Not every form of useful information in the real world is strictly typeful and represents a well-formed relation over finite domains.

Re:Pfah. (Score:5, Insightful)

by hey! ( 33014 ) writes: on Wednesday September 01, 2010 @06:51PM (#33442914) Homepage Journal

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
I've actually been in the business long enough to remember when relational databases were the new thing. What people seem to forget is that modeling your data in a different way than your application does *was the whole point*. The idea was to make data a reusable resource *across applications*. Of course, that turned out to be a lot harder than we thought it would be. Philosophically, one might well ask whether it is possible to understand data at all apart from its intended applications. Of course, by the time we'd figured that out, a whole new generation was coming up trying to create a Semantic Web.
I basically agree that SQL isn't always the right tool for the job. I happen to think certain aspects of the relational model are somewhat broken (e.g. composite keys), and SQL is a pretty crappy query language in any case. But I think because RDBMSs are a mature technology, recently trained programmers don't bother to understand them, and cover that lack of understanding by pooh-pooh-ing the stuff that's over their head. I went through a patch a few years ago where I was interviewing programming candidates who had XML coming out of their ears but hadn't the foggiest idea of what "NULL" means in the relational model. Naturally they had all kinds of problems on the relational end of things, and tended to view the RDBMS as a kind of pitfall in which bad things inexplicably happen. Consequently, they tended to think of the database as simply a backing store for the application *they* were working on. In some cases this is acceptable, but one often sees abominable schema that are the product of ignorance, pure and simple.
Naturally, non-relational systems are most attractive where performance is at a higher premium than flexibility. This characterizes many web applications that do a small number of relatively simple things, but do them on a scale that takes special expertise to achieve using a relational model. That was very much the case at the beginning of the relational era, when applications tended to be narrower in scope and query optimization primitive. You thought of order line items as "part-of" an order, whereas in relational thinking they could just as easily be considered attributes of products. This made the programmer's job a lot easier, so long as the RDBMS could process invoices fast enough to make the users happy.

Re:Pfah. (Score:3, Insightful)

by mikelieman ( 35628 ) writes: on Thursday September 02, 2010 @03:39AM (#33446636) Homepage

Unless you're writing the code for the database engine, you are NOT a database programmer, you're an application programmer...

Re:Pfah. (Score:3, Insightful)

by bluefoxlucid ( 723572 ) writes: on Thursday September 02, 2010 @09:34AM (#33448852) Homepage Journal

That depends. If I'm storing video data I don't want a relational database. A small-scale family tree might be good in a proprietary format. A large-scale family tree might also be good in a proprietary format. The Windows registry is inherently hierarchical and needs a non-relational model, just like file systems (quit arguing that file systems should be relational DBs; the current model is fine).
A large-scale family tree that I need to use to look up other information with absolute identity (i.e. there are 15 James Clyde Simmons in the world, 7 in my city somehow, and 3 in my zip code!) needs to at least sync its individual identifiers with the primary key of an RDBMS holding all the other data in any case where relational analysis is also needed, i.e. find me all PERSONS with $ATTRIBUTE. Keeping these two things in absolute sync requires a specialized database engine; but you can write program code that fakes it for all useful cases if you keep the primary common identifier unique and static.
There are going to be tasks where an RDBMS is excellent and anything else is going to be complete failure. College information systems, forever, have to track students vs student IDs vs all completed courses and grades vs when those courses were completed vs what courses the student is enrolled in now vs if they've paid for their tuition... this is the wrong kind of information to list line by line (flatfile) or hierarchically. Maybe I want to see everyone enrolled in MATH314, or everyone enrolled in MATH314 class DXA, or everyone enrolled in MATH314 on Middlesex campus. Maybe I want to see all courses James Peak is enrolled in, or has enrolled in ever. For these tasks, you need an RDBMS.
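The enrollment questions above map directly onto joins. A small sketch with Python's built-in `sqlite3` follows. The three-table layout and the helper names `enrolled` and `courses_for` are assumptions; MATH314, section DXA, the Middlesex campus, and James Peak come from the examples above, while the other rows (Dana Cole, HIST101, Downtown) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sections (id INTEGER PRIMARY KEY, course TEXT NOT NULL,
                           section TEXT NOT NULL, campus TEXT NOT NULL);
    CREATE TABLE enrollments (student_id INTEGER REFERENCES students(id),
                              section_id INTEGER REFERENCES sections(id),
                              PRIMARY KEY (student_id, section_id));
    INSERT INTO students VALUES (1, 'James Peak'), (2, 'Dana Cole');
    INSERT INTO sections VALUES (1, 'MATH314', 'DXA', 'Middlesex'),
                                (2, 'MATH314', 'DXB', 'Downtown'),
                                (3, 'HIST101', 'AAA', 'Middlesex');
    INSERT INTO enrollments VALUES (1, 1), (1, 3), (2, 2);
""")

def enrolled(conn, course, section=None, campus=None):
    """Names of students in a course, optionally narrowed by section or campus."""
    rows = conn.execute("""
        SELECT s.name
        FROM students AS s
        JOIN enrollments AS e ON e.student_id = s.id
        JOIN sections AS x ON x.id = e.section_id
        WHERE x.course = :course
          AND (:section IS NULL OR x.section = :section)
          AND (:campus IS NULL OR x.campus = :campus)
        ORDER BY s.name
    """, {"course": course, "section": section, "campus": campus})
    return [name for (name,) in rows]

def courses_for(conn, student_name):
    """Courses a given student is enrolled in."""
    rows = conn.execute("""
        SELECT DISTINCT x.course
        FROM sections AS x
        JOIN enrollments AS e ON e.section_id = x.id
        JOIN students AS s ON s.id = e.student_id
        WHERE s.name = ?
        ORDER BY x.course
    """, (student_name,))
    return [course for (course,) in rows]

print(enrolled(conn, "MATH314"))
print(enrolled(conn, "MATH314", campus="Middlesex"))
print(courses_for(conn, "James Peak"))
```

The same three tables answer the question from either direction, by course or by student, without storing the data twice.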
There are also going to be good flatfile cases -- MP3s, video files, XCF, etc. As well, there will be stores of information that must fall into hierarchical organization -- file systems, genealogy databases, the Windows registry. These should optimally not use an RDBMS structure.
There will be tasks that operate on one set of data but bring a corner case that benefits from another method of organization. For example, looking through a database at an insurance company to check for dependents (parents/children/spouses). Of course hierarchical databases might be better for this operation; but all the information and all operations you'll ever do is going to go better in an RDBMS, and any other storage method will require either tons of cross-indexing (to the point of implementing a BAD RDBMS) or lots of memory and time to do 0.06 second queries in 10 minutes. Too slow, too broken. The corner case operations cause trouble, but what can you do?
