Are Relational Databases Obsolete?
jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created Ingres and Postgres as a researcher at UC Berkeley, beginning in the early 1970s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."
They're not mutually exclusive. (Score:5, Insightful)
Okay, at the risk of sounding stupid...
Since when are a column-store database and a relational database mutually exclusive concepts? I thought that column-store and row-store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.
Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...
that doesn't mean they're going to become obsolete (Score:5, Insightful)
In fact, the new wave of user-generated-content websites and webapps seems to me to indicate the exact opposite: row-store databases, with their usefulness in write-heavy applications, should if anything become more and more necessary/useful on the web.
So...chalk this one up to some grandstanding on the part of a guy who wants to put more money in his pockets...
Yea, it's all the same. (Score:5, Insightful)
Therefore, pick your method depending on your needs. Are you constantly writing new data? Column stores are probably not for you... Your application will run better on a row store, because writing to a row store is a simple matter of appending one more record to the file, whereas writing to a column store is often a matter of writing a record to many files... Obviously more costly.
On the other hand, are you dealing with a relatively static dataset, where you have far more reads than writes? Then a row store isn't the best bet, and you should try a column store. A query on a row store has to read entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return. With column stores, you can ignore any columns that aren't referenced in your query... Additionally, your data is homogeneous within a column, so you avoid the overhead of dealing with different datatypes and can choose the best data compression by field rather than by data block.
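A toy sketch of the tradeoff the parent describes, using plain Python lists in place of files. The layouts and names are hypothetical for illustration; real engines add pages, compression, and indexes on top of this:

```python
# Row-oriented: each record is one unit; appending touches one structure
# (in practice, one file).
rows = [
    {"id": 1, "name": "alice", "age": 34},
    {"id": 2, "name": "bob", "age": 28},
]
rows.append({"id": 3, "name": "carol", "age": 41})  # one cheap write

# Column-oriented: the same data, one list ("file") per column.
columns = {
    "id": [1, 2],
    "name": ["alice", "bob"],
    "age": [34, 28],
}

# Appending the same record means a write per column: N writes, not 1.
for field, value in {"id": 3, "name": "carol", "age": 41}.items():
    columns[field].append(value)

# But a read that needs only "age" scans one homogeneous list and
# ignores every other column entirely.
average_age = sum(columns["age"]) / len(columns["age"])
```

The write path favors the row layout; the selective scan favors the column layout, which is exactly the warehouse-vs-OLTP split the comment is making.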
Why do people insist that one size really does fit all?
Re:They're not mutually exclusive. (Score:5, Insightful)
An opinion is subjective, but it's not necessarily biased. A disinterested observer could have an unbiased opinion.
Re:Object Databases (Score:3, Insightful)
1. OR mappers like Hibernate have gotten to the point that they are quite good, so they make the value-add proposition of object databases less compelling.
2. Object databases are never going to match the speed of relational databases. This is the real dealbreaker. Suppose an object database can handle 95% of my queries with adequate performance. All well and good, but I'm totally screwed on the other 5%. On the other hand, if I were using a relational database with Hibernate, Hibernate might handle 95% of the queries with adequate performance, but for the other 5% I can work around it by writing custom SQL. With that setup I get the best of both worlds.
I don't know of any attempts to use object databases on large enterprise projects that haven't been complete failures, with the failure always due to performance issues.
Re:Yea, it's all the same. (Score:5, Insightful)
I went back and read the original article. To Michael Stonebraker's credit, the Computerworld article (and the submitter) grossly misrepresents what he said.
He did not say that RDBMSes are "long in the tooth." He said that the technology underlying them hasn't changed since the 1970s, and that column stores are a better way to represent data in certain situations. In fact, the very title of his original column was "One Size Fits All - A Concept Whose Time Has Come and Gone".
Re:The guy... (Score:3, Insightful)
An interesting idea for improving database technology is to actually change the way that database data is mirrored in a disk array. Rather than writing EXACT duplicates of the data, perhaps one set could be written in row-oriented form, while the other set would be written in column-oriented form. This guarantees that the data is always duplicated, but offers a new option to the query engine for retrieving the data.
The primary issue I see is avoiding "the worst of both worlds." Obviously, row-oriented data is going to be faster to write, so you can't wait for the column-oriented copy to finish writing. A secondary process would have to manage that in parallel, with the assumption that the database reports the data as "committed" as soon as the row-oriented write is complete.
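A minimal sketch of the idea above: one logical insert feeding both a row-oriented copy and a column-oriented copy. Everything here is hypothetical design, not a real product, and for simplicity the column copy is filled synchronously where the comment proposes a background process:

```python
class HybridMirror:
    """Keeps the same data in two layouts: one fast to append, one fast to scan."""

    def __init__(self, fields):
        self.fields = fields
        self.row_copy = []                        # row-oriented replica
        self.col_copy = {f: [] for f in fields}   # column-oriented replica

    def insert(self, record):
        # The row-side append is the cheap write; a real system would
        # acknowledge the commit here and update the column copy
        # asynchronously, as the parent comment suggests.
        self.row_copy.append(record)
        for f in self.fields:
            self.col_copy[f].append(record[f])

    def scan_column(self, field):
        # Analytical reads go to the column-oriented replica.
        return self.col_copy[field]
```

The open question the comment raises remains visible in the sketch: once the column update is made asynchronous, a query against the column replica can lag the acknowledged commit.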
It's a simple trick, really (Score:2, Insightful)
First, pick a dualism. Any dualism. Here are some examples:
1) You can type in a word processor, and you can type in a spreadsheet.
2) You've got computers, and you've got networks.
3) Skirts can be short, or skirts can be long.
Now, make the unsupported claim that the dualism is really a continuum. It might be true, or it might be bullshit. Like this:
1) Things you can type in resemble word processors or spreadsheets, to certain degrees.
2) You can do computing activity on an isolated computer, or you can do it on a totally networked system.
3) Skirts can be any length between long and short.
Now, assert that there is movement. Look at which end of the continuum you've created the asserted movement points toward: there's your prediction for the future. If you hype that like a pro, you too can be a tech pundit on Slashdot or even somewhere important.
Here are the predictions that result:
1) Over time, word processors are going to become more like spreadsheets. Entire sections of documents are going to be calculated by formulas.
2) In the future, the network will be the computer.
3) Next year we're going to have short skirts. Yay!
How often do we see this kind of thing? Very often. And sometimes they can even be good ideas, worthy of making products out of. But even if they are terrible ideas, they're useful for hyping yourself.
Here are some more:
1) This article - Rows and columns. You assert that rows are more important than columns. Except in the glorious future, where columns take over.
2) We've got embedded processors outnumbering desktop computers. In the future, no desktops, just embedded processors in our skulls.
3) Paper is giving way to computers. In the future, completely paperless office!
4) Mouse, Joystick. In the future, you tilt your mouse.
5) Structured and object. Obviously, everything's going objects in the future. The death of structured programming means that loops and decisions are obsolete.
And so on.
I know this because I was a consultant. It's a basic skill.
Re:Should be, but isn't, and won't. (Score:3, Insightful)
I've been in the banking industry for the past 6 years, and every bank I've worked at has relied on text-only server-side applications that we connect to via various terminal emulators. The workstations are all modern, but we don't use anything more taxing than Excel and an e-mail app.
Why have none of them changed beyond a few interface bolt-ons? Well . . . one of them actually did once . . . and it wasn't pretty. Sure, it was graphical and point-and-click and more "user friendly" in appearance. But the fact of the matter is that we were a production environment, and what could be done with hotkeys in 3 steps became a 12-step process of clicking widgets, etc. To be fair, there were still a few hotkeys, but they were all different and everyone had to relearn them. Productivity never quite got back to what it was under the old system.
Larger companies will kick and scream to avoid "upgrades." Many of them have had horrifying experiences and they just can't risk getting stuck with software worse than what they have now.
Common Business Mistake (Score:3, Insightful)
#1: Assuming what you think your customer needs is what your customer wants.
#2: Assuming they are the ones who made the mistake when you lost the job.
Index on every column, how revolutionary! (Score:3, Insightful)
That's why row-oriented databases have indexes and perform index scans.
Re:IMS--Hierarchical DB harder to use? (Score:3, Insightful)
You are confusing an RDBMS with a persistent storage mechanism. It's really not hard to just keep data persistent any more. You don't need a 3GL or anything fancy, just some hooks to record your modifications on permanent storage, and keep a small working set cached in memory. It's an easy, trivial, solved problem. And it was solved before relational databases were invented.
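The "easy, trivial, solved" persistence the parent describes can be sketched in a few lines: hook every modification, append it to a log on disk, and replay the log at startup to rebuild the in-memory working set. The class and file format here are made up for illustration:

```python
import json
import os

class PersistentDict:
    """Toy persistence: an in-memory dict backed by an append-only log."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Replay prior modifications to rebuild the working set.
            with open(path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def set(self, key, value):
        # Record the modification on permanent storage before
        # applying it to the cached working set.
        with open(self.path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
        self.data[key] = value

    def get(self, key):
        return self.data[key]
```

Note that this gives you durability and nothing else, which is exactly the parent's point: it offers none of the shared access or declarative integrity guarantees listed below it.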
RDBMSs do a lot more. Here are just a few advantages:
* different applications can access the same data
* guarantee integrity via declarative constraints that can validate against all of the data at once, not just the single record in question
* different applications can have the same guarantees of integrity, and a bug in the first application can't break the guarantee for the second application
RDBMSs were invented for a reason. Many, many software bugs can be traced back to a bad data state -- some invariant that was broken and uncaught. Often these bugs are not caught until long after the insert has taken place, and they often cause a cascade of new bad data, so you don't find out until many records are wrong. A lot of code is imperative, and re-stating the invariant declaratively (i.e. as a database constraint) helps catch a lot of those bugs.
Trying to put these declarative constraints in the application is a bad idea. When should they be checked? And in which applications should they be checked (all of them, one would hope)? If you see a declarative constraint, are you sure it's correct, or might it have been added after inconsistent data was entered and before the constraint was actually run?
Databases solve this by making some promises. If you put a "... CHECK (age > 0)" on an attribute, the database will check all existing records before applying it, and all records inserted afterward will need to pass through that constraint. That's a much stronger guarantee, and you know it holds for all applications. Someone else's bug or quick hack won't violate it, so your application can rely on it as the truth. Same with UNIQUE or FOREIGN KEY.
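The promise described above can be demonstrated with Python's built-in sqlite3. The table and values are illustrative; the point is that the rejection comes from the database, so any application writing to this table, including a buggy one, hits the same CHECK:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER CHECK (age > 0))")

# A valid record passes the constraint.
conn.execute("INSERT INTO person VALUES ('alice', 34)")

# A record that breaks the invariant is rejected by the database itself,
# before it can become "bad data state" for some later reader.
try:
    conn.execute("INSERT INTO person VALUES ('bob', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

No application-side validation code was involved: the invariant lives with the data, which is the comment's argument for declaring constraints in the schema rather than in each application.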
If you think about your reasoning for a moment, it's very narrowly focused on storing and retrieving single records. Presumably, anything needing to look at the data as a whole would need to read it fully into the application and process it from application code.
You don't take into account other readers of the data who might require consistent reports, or anything else that needs to look at more than one record. You also don't take into account the horrible mess you have when the application is wrong and stores bad data, or when you need to change the data format. In the types of databases you describe, almost any change requires reorganizing the data physically. In an RDBMS, you can make many changes without touching the physical layout.
Re:Yea, it's all the same. (Score:3, Insightful)
There are tried and true approaches to the problems you describe; several actually for most of them, depending on your needs.
If you run into these problems due to an evolutionary growth into these features, then it's time to stop, take a step back, and re-architect your schemas to handle these needs from the get-go.
There's no reason at all to resort to hacks like stored procedures and triggers. These are only used when your schema is fundamentally mismatched to your needs, and so you have to be continuously cleaning up after your data mods.
Any mature system doesn't ever delete 'entity' objects from the DB. That would just be silly (and wouldn't be possible if you had referential integrity set up correctly). So all of these sorts of entities have an 'active' column that determines whether they're active or not.
In the salary example you give, you DO want to store the 'current' salary field in the employees table, but then you also have a 'positions' or 'incumbents' table to store the history of things that change over time.
I think the core problem you're describing comes from trying to store all of this stuff in one table, and that's not the correct approach.
The 'event-based storage' you're talking about is another table, with one row per event. Some folks advocate another approach, where entity tables are versioned and all old versions are kept (sometimes moved to another table), so you can see the history of all changes for all time. That's not a very well-normalized solution, but there are times when it's appropriate.
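The pattern described in this comment, an 'active' flag instead of deletes plus a history table with one row per salary-change event, can be sketched with sqlite3. All table and column names here are illustrative, not taken from any real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  INTEGER NOT NULL,           -- current value, kept in the entity table
        active  INTEGER NOT NULL DEFAULT 1  -- never DELETE, just deactivate
    );
    CREATE TABLE salary_history (           -- one row per change event
        employee_id INTEGER NOT NULL REFERENCES employees(id),
        salary      INTEGER NOT NULL,
        changed_on  TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO employees (id, name, salary) VALUES (1, 'alice', 50000)")
conn.execute("INSERT INTO salary_history VALUES (1, 50000, '2007-01-01')")

# A raise updates the current value and appends an event row.
conn.execute("UPDATE employees SET salary = 55000 WHERE id = 1")
conn.execute("INSERT INTO salary_history VALUES (1, 55000, '2007-09-01')")

# "Deleting" the employee just flips the flag; the history stays intact,
# and referential integrity is never violated.
conn.execute("UPDATE employees SET active = 0 WHERE id = 1")
```

Current state stays cheap to query in the entity table, while the full salary history remains queryable as ordinary rows, which is the separation the comment is advocating.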