
Are Relational Databases Obsolete? (417 comments)

Posted by kdawson
from the long-in-the-tooth dept.
jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created the Ingres and Postgres technologies as a researcher at UC Berkeley, starting in the early 1970s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."
  • by KingSkippus (799657) * on Thursday September 06, 2007 @12:28PM (#20495819) Homepage Journal

    Okay, at the risk of sounding stupid...

    Since when are a column store database and a relational database mutually exclusive concepts? I thought that both column store and row store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.

    Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...

  • by curmudgeon99 (1040054) <curmudgeon99.gmail@com> on Thursday September 06, 2007 @12:39PM (#20495979)
    You've all heard of the IBM product called DB2, right? So what was DB1? Answer: IMS, which is a hierarchical database. They were a pain in the ass to use--PSBs and all--but they were/are faster than hell and I doubt any company is going to throw them out for any reason. Same goes for relational databases. They're going nowhere. Sure, we have room for more but nobody is going to displace the RDBMS anytime soon.
  • by Arathon (1002016) on Thursday September 06, 2007 @12:39PM (#20495991) Journal
    Obviously, he's biased. But more importantly, he just said that column-store databases are going to take over the WAREHOUSE market. That doesn't mean that row-store databases are going to become obsolete, because there will always be applications out there that do a substantial amount of writing as well as reading.

    In fact, the new wave of user-generated-content websites and webapps seems to me to indicate the exact opposite: row-store databases, with their usefulness in write-heavy applications, should if anything be becoming more and more necessary/useful on the web.

    So...chalk this one up to some grandstanding on the part of a guy who wants to put more money in his pockets...
  • by SatanicPuppy (611928) * <[Satanicpuppy] [at] [gmail.com]> on Thursday September 06, 2007 @12:42PM (#20496021) Journal
    Column stores are great (better than a row store) if you're just reading tons of data, but they're much more costly than a row store if you're writing tons of data.

    Therefore, pick your method depending on your needs. Are you writing massive amounts of data? Column stores are probably not for you...Your application will run better on a row store, because writing to a row store is a simple matter of adding one more record to the file, whereas writing to a column store is often a matter of writing a record to many files...Obviously more costly.

    On the other hand, are you dealing with a relatively static dataset, where you have far more reads than writes? Then a row store isn't the best bet, and you should try a column store. A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return. With column stores, you can ignore any columns that aren't referenced in your query...Additionally, your data is homogeneous in a column store, so you lose overhead attached to having to deal with different datatypes and can choose the best data compression by field rather than by data block.

    Why do people insist that one size really does fit all?
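    The tradeoff described above can be sketched in a few lines of Python. This is a toy in-memory model, not a real engine: the point is only that a row store appends one record per insert while a column store must touch one structure ("file") per column, and that a column store can scan a single column without walking whole rows.

```python
# Toy sketch of the row-store vs. column-store write/read tradeoff.
class RowStore:
    def __init__(self, columns):
        self.columns = columns
        self.rows = []                       # one append per insert

    def insert(self, record):
        self.rows.append([record[c] for c in self.columns])

    def scan(self, column):
        i = self.columns.index(column)
        # must walk every full row even though we want a single field
        return [row[i] for row in self.rows]

class ColumnStore:
    def __init__(self, columns):
        # one separate structure ("file") per column
        self.cols = {c: [] for c in columns}

    def insert(self, record):
        for c, values in self.cols.items():  # one write per column
            values.append(record[c])

    def scan(self, column):
        return self.cols[column]             # touch only the data we need

rows = RowStore(["id", "name", "age"])
cols = ColumnStore(["id", "name", "age"])
for rec in [{"id": 1, "name": "a", "age": 30},
            {"id": 2, "name": "b", "age": 40}]:
    rows.insert(rec)
    cols.insert(rec)
```

    Both stores hold the same data; the difference is purely in how many structures each insert touches and how much data each single-column scan reads.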
  • by OECD (639690) on Thursday September 06, 2007 @12:51PM (#20496185) Journal
    Congratulations! You figured out what an opinion is.

    An opinion is subjective, but it's not necessarily biased. A disinterested observer could have an unbiased opinion.

  • by SashaMan (263632) on Thursday September 06, 2007 @01:00PM (#20496325)
    In a word, yes. I think there are a couple reasons for this:

    1. OR mappers like Hibernate have gotten to the point that they are quite good, so they make the value proposition of object databases less compelling.
    2. Object databases are never going to get the speed of relational databases. This is the real dealbreaker. Suppose an object database can handle 95% of my queries with adequate performance. All well and good, but I'm totally screwed on those other 5%. On the other hand, if I was using a relational database with hibernate, hibernate might handle 95% of the queries with adequate performance, but for those other 5% I can workaround by writing custom SQL. With that setup I get the best of both worlds.

    I don't know of any attempts to use object databases on large enterprise projects that haven't been complete failures, with the failure always due to performance issues.
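    The "mapper for the 95%, raw SQL for the 5%" setup described above can be sketched with Python's built-in sqlite3 standing in for the RDBMS (Hibernate itself is a Java library, and the mapper class here is hypothetical, not any real ORM's API):

```python
import sqlite3

# A trivial object mapper handles the routine queries;
# a raw-SQL escape hatch covers the hard cases.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
db.executemany("INSERT INTO employee VALUES (?, ?, ?)",
               [(1, "Ada", 90000.0), (2, "Bob", 60000.0)])

class EmployeeMapper:
    """The 95% case: simple per-object loads, no hand-written SQL."""
    def get(self, emp_id):
        row = db.execute("SELECT id, name, salary FROM employee WHERE id = ?",
                         (emp_id,)).fetchone()
        return {"id": row[0], "name": row[1], "salary": row[2]}

def raw_query(sql, params=()):
    """The 5% case: drop to hand-tuned SQL when the mapper is too slow."""
    return db.execute(sql, params).fetchall()

mapper = EmployeeMapper()
```

    The escape hatch is the point: because the store underneath is relational, the hard 5% of queries can bypass the mapping layer entirely.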
  • by KingSkippus (799657) * on Thursday September 06, 2007 @01:00PM (#20496341) Homepage Journal

    Why do people insist that one size really does fit all?

    I went back and read the original article. To Michael Stonebraker's credit, the ComputerWorld article (and the submitter) grossly misrepresent what he said.

    He did not say that RDBMSes are "long in the tooth." He said that the technology underlying them hasn't changed since the 1970s, and that column stores are a better way to represent data in certain situations. In fact, the very name of his original column was "One Size Fits All - A Concept Whose Time Has Come and Gone."

  • Re:Rotate (Score:5, Insightful)

    by ben there... (946946) on Thursday September 06, 2007 @01:12PM (#20496473) Journal

    Excel only handles 255 Columns.
    It should be noted that if you've designed a database (rather than an Excel abomination) with more than 255 columns, chances are, you're doing it wrong.
  • Re:The guy... (Score:3, Insightful)

    by AKAImBatman (238306) <akaimbatman@@@gmail...com> on Thursday September 06, 2007 @01:30PM (#20496745) Homepage Journal

    Or in other words column wise read would be fast but write slow. I could see this being an option within DBs in the future. As it is just a data layout problem not a language problem.

    An interesting idea for improving database technology is to actually change the way that database data is mirrored in a disk array. Rather than writing EXACT duplicates of the data, perhaps one set could be written in row-oriented form, while the other set would be written in column-oriented form. This guarantees that the data is always duplicated, but offers a new option to the query engine for retrieving the data.

    The primary issue I see is avoiding "the worst of both worlds." Obviously row-oriented data is going to be faster to write, so you can't wait for the column-oriented copy to finish writing. A secondary process will have to manage that in parallel, with the assumption that the database reports the data as "committed" as soon as the row-oriented write is complete.
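    The scheme above (commit on the row-oriented write, let a background process bring the column-oriented replica up to date) can be sketched like this. All names are illustrative; no real database works exactly this way, and a real system would also need crash recovery for the pending queue:

```python
import queue
import threading

class DualFormatStore:
    """Mirror data in row-oriented and column-oriented form."""
    def __init__(self, columns):
        self.row_log = []                    # fast row-oriented replica
        self.col_replica = {c: [] for c in columns}
        self.pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def write(self, record):
        self.row_log.append(record)          # commit point: row write done
        self.pending.put(record)             # column replica catches up later
        return "committed"

    def _drain(self):
        while True:
            record = self.pending.get()
            for c in self.col_replica:       # one append per column
                self.col_replica[c].append(record[c])
            self.pending.task_done()

    def sync(self):
        self.pending.join()                  # wait until replicas converge

store = DualFormatStore(["id", "val"])
status = store.write({"id": 1, "val": "x"})
store.sync()
```

    The query engine could then pick whichever replica suits the query, at the cost of the column copy lagging slightly behind the commit point.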
  • by MarsDefenseMinister (738128) <dallapieta80@gmail.com> on Thursday September 06, 2007 @01:36PM (#20496821) Homepage Journal
    This guy isn't really making a prediction, he's just generating publicity for himself, and that requires a gimmick. Creating the gimmick is easy, and it's done like this:

    First, pick a dualism. Any dualism. Here's some examples:

    1) You can type in a word processor, and you can type in a spreadsheet.
    2) You've got computers, and you've got networks.
    3) Skirts can be short, or skirts can be long.

    Now, make the unsupported claim that the dualism is really a continuum. It might be true, or it might be bullshit. Like this:

    1) Things you can type in resemble word processors or spreadsheets, to certain degrees.
    2) You can do computing activity on an isolated computer, or you can do it on a totally networked system.
    3) Skirts can be any length between long or short.

    Now, assert that there is movement. Look at which end of the bullshit continuum you've created that your asserted movement is pointing at. There's your prediction for the future. If you hype that like a pro, you too can be a tech pundit on Slashdot or even somewhere important.

    Here's the predictions that result:

    1) Over time, word processors are going to become more like spreadsheets. Entire sections of documents are going to be calculated by formulas.
    2) In the future, the network will be the computer.
    3) Next year we're going to have short skirts. Yay!

    How often do we see this kind of thing? Very often. And sometimes they can even be good ideas, worthy of making products out of. But even if they are terrible ideas, they're useful for hyping yourself.

    Here's some more:
    1) This article - Rows and columns. You assert that rows are more important than columns. Except in the glorious future where columns take over.
    2) We've got Embedded processors outnumbering desktop computers. In the future, no desktops, just embedded processors in our skulls.
    3) Paper is giving way to computers. In the future, completely paperless office!
    4) Mouse, Joystick. In the future, you tilt your mouse.
    5) Structured and object. Obviously, everything's going objects in the future. The death of structured programming means that loops and decisions are obsolete.

    And so on.

    I know this because I was a consultant. It's a basic skill.
  • by Orange Crush (934731) on Thursday September 06, 2007 @01:41PM (#20496881)

    I've been in the banking industry for the past 6 years and every bank I've worked at has relied on text-only server-side applications that we connect to via various terminal emulators. The workstations are all modern, but we don't use anything more taxing than Excel and an e-mail app.

    Why have none of them changed beyond a few interface bolt-ons? Well . . . one of them actually did once . . . and it wasn't pretty. Sure, it was graphical and point-and-click and more "user friendly" in appearance. But the fact of the matter is that we were a production environment, and what could be done with hotkeys in 3 steps was now a 12-step process of clicking through widgets. To be fair, there were still a few hotkeys, but they were all different and everyone had to relearn them. Productivity never quite got back to what it was under the old system.

    Larger companies will kick and scream to avoid "upgrades." Many of them have had horrifying experiences and they just can't risk getting stuck with software worse than what they have now.

  • Re:Rotate (Score:1, Insightful)

    by Anonymous Coward on Thursday September 06, 2007 @02:46PM (#20497715)
    You're doing it wrong.
  • by WindBourne (631190) on Thursday September 06, 2007 @03:16PM (#20498121) Journal
    1. He helped create THE first relational DB.
    2. He later moved on to creating an object-relational DB with Postgres in the '80s. Much of that tech has found its way into other DBs such as Oracle, and even helped create the OODBMS world.
    3. Now he is creating a column store DB and announces that this will be the next big thing.
    I would listen to him. Biased or not, he has a better track record than most intelligent people (and all the wanna-bes/has-beens; Dvorak comes to mind) in the tech field.
  • by C10H14N2 (640033) on Thursday September 06, 2007 @04:06PM (#20498791)

    #1: Assuming what you think your customer needs is what your customer wants.
    #2: Assuming they are the ones who made the mistake when you lost the job.
  • by porneL (674499) on Thursday September 06, 2007 @07:16PM (#20500983) Homepage

    A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about

    That's why row-oriented databases have indexes and perform index scans.
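    The point can be illustrated with a toy index in Python: given an index on a field, a single-field lookup becomes one probe into the index rather than a walk over every full row. (Real database indexes are B-trees on disk, not hash maps in memory; this only shows the access-pattern difference.)

```python
# Toy row store with a secondary index on the "first name" field.
rows = [
    (1, "Ada", "Lovelace"),
    (2, "Alan", "Turing"),
    (3, "Grace", "Hopper"),
]

# Index: field value -> row position, built once up front.
name_index = {row[1]: pos for pos, row in enumerate(rows)}

def lookup(name):
    """Index scan: one probe instead of touching every full row."""
    pos = name_index.get(name)
    return rows[pos] if pos is not None else None
```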

  • by ChrisA90278 (905188) on Thursday September 06, 2007 @08:56PM (#20501931)
    Michael Stonebraker is certainly a well-respected name, and he has been right on these issues in the past. Coincidentally, I'm testing my software with a new version of PostgreSQL as I type. I think column vs. row storage can be considered simply an option; I can even see it being an option that you specify at the table level. Most DBMS users really don't have much data. Today a 1,000,000-row table can be cached in RAM on even a low-end PC-based server, and once cached in RAM, row vs. column storage does not matter. I would imagine that 99% of the database tables in the world have far fewer than a million rows. This discussion applies only to the very few that are really large.
  • by Kjella (173770) on Thursday September 06, 2007 @10:14PM (#20502565) Homepage
    Will the imminent transition to SSDs make any difference? Row-based DBs mean you're typically accessing one large chunk (one row) sequentially, while column stores mean accessing many small chunks (one per column) for every row. I'd think that if random access time were almost nil, you'd get almost the same write performance, while read performance could be greatly improved because you only read the columns you need. It'd certainly make DB design easier too if you didn't have to worry about putting very light information in the same table as heavy blobs.
  • by jadavis (473492) on Friday September 07, 2007 @02:22AM (#20504243)

    From a standard 3rd-generation programming language one can read and write into flat files, and we can do close to this with a hierarchical database.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping, i.e. the language data structures do not correspond to the RDBMS data structures and vice versa.

    You are confusing an RDBMS with a persistent storage mechanism. It's really not hard to just keep data persistent any more. You don't need a 3GL or anything fancy, just some hooks to record your modifications on permanent storage, and keep a small working set cached in memory. It's an easy, trivial, solved problem. And it was solved before relational databases were invented.
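    The "trivial persistence" described above really is small: hook every modification into an append-only log, keep the working set in memory, and rebuild state by replaying the log on startup. A minimal sketch (file format and class names are mine, not any particular library's):

```python
import json
import os
import tempfile

class PersistentDict:
    """In-memory dict backed by an append-only JSON log."""
    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):             # replay the log on startup
            with open(path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        self.data[key] = value               # in-memory working set
        with open(self.path, "a") as f:      # durable append-only record
            f.write(json.dumps([key, value]) + "\n")

    def get(self, key):
        return self.data.get(key)

log = os.path.join(tempfile.mkdtemp(), "store.log")
d1 = PersistentDict(log)
d1.put("answer", 42)
d2 = PersistentDict(log)                     # a fresh "process" sees the data
```

    Note everything an RDBMS adds is absent here: no shared access, no constraints, no ad hoc queries. That absence is the poster's point.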

    RDBMSs do a lot more. Here are just a few advantages:

      * different applications can access the same data
      * guarantee integrity via declarative constraints that can validate against all of the data at once, not just the single record in question
      * different applications can have the same guarantees of integrity, and a bug in the first application can't break the guarantee for the second application

    RDBMSs were invented for a reason. Many, many software bugs can be traced back to a bad data state -- some invariant that was broken and uncaught. Often, these bugs are not caught until long after the insert has taken place, and often cause a cascade of new bad data and you don't find out until many records are wrong. A lot of code is imperative, and re-stating the invariant declaratively (i.e. a database constraint) helps catch a lot of those bugs.

    Trying to put these declarative constraints in the application is a bad idea. When should they be checked? And in which applications should they be checked (all of them, one would hope)? If you see a declarative constraint, are you sure it's correct, or might it have been added after inconsistent data was entered and before the constraint was actually run?

    Databases solve this by making some promises. If you put a ".. CHECK (age > 0)" on an attribute, it will check all the records before applying it, and then all the records afterward will need to pass through that constraint. That's a much better guarantee, and you know it's true for all applications. Someone else's bug or quick hack won't violate it, so your application can rely on that as the truth. Same with UNIQUE or FOREIGN KEY.
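    The CHECK (age > 0) example above can be demonstrated with SQLite (standing in here for any RDBMS): once the constraint is declared, no application path can insert a violating record, and the bad data never enters the table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (name TEXT, age INTEGER CHECK (age > 0))")
db.execute("INSERT INTO person VALUES ('Ada', 36)")      # passes the check

try:
    db.execute("INSERT INTO person VALUES ('Bug', -1)")  # rejected by the DB
    violated = False
except sqlite3.IntegrityError:
    violated = True                                      # constraint held
```

    The rejection happens at insert time, in the database, for every client, which is exactly the guarantee an in-application check cannot make.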

    If you think about your reasoning for a moment, it's very narrowly focused on storing and retrieving single records. Presumably, anything needing to look at the data as a whole would need to read it fully into the application and process it from application code.

    You don't take into account other readers of data who might require consistent reports or anything else that needs to look at more than one record. You also don't take into account the horrible mess you have when the application is wrong and stores bad data, or when you need to do data format changes. In the types of databases you describe, almost any change requires reorganizing the data physically. In an RDBMS, you can make many changes without physically changing the physical layout.
  • by tgv (254536) on Friday September 07, 2007 @03:04AM (#20504447) Journal
    I don't think that has anything to do with the article. The article is about storage on disk, not about manipulating pointers in memory to such an extent that a programming language that should never have been invented in the first place becomes even more incomprehensible.
  • by NoOneInParticular (221808) on Friday September 07, 2007 @02:33PM (#20511921)
    Ouch. And this is exactly why SQL should die as the primary interface to the RDBMS. How the hell is my compiler going to help me find simple typing errors when the interface to use the RDBMS is built upon ... strings! You are generating code from code; that's not integration, that's a hack! It's useful, as there is no other sound way to approach an RDBMS, but it's not a pretty sight.
  • by Allador (537449) on Friday September 07, 2007 @08:20PM (#20516259)
    The answer to what you're describing is not to give up relational dbs, but to design your schema correctly.

    There are tried and true approaches to the problems you describe; several actually for most of them, depending on your needs.

    If you run into these problems due to an evolutionary growth into these features, then it's time to stop, take a step back, and re-architect your schemas to handle these needs from the get-go.

    There's no reason at all to resort to hacks like stored procedures and triggers. These are only used when your schema is fundamentally mismatched to your needs, and so you have to be continuously cleaning up after your data mods.

    Any mature system doesn't ever delete 'entity' objects from the DB. That would just be silly (and wouldn't be possible if you had referential integrity set up correctly). So all of these sorts of entities have an 'active' column that determines whether they're active or not.

    In the salary example you give, you DO want to store the 'current' salary field in the employees table, but then you also have a 'positions' or 'incumbents' table to store the history of things that change over time.

    I think the core problem you're describing is due to the fact that you're trying to store all of this stuff in one table. And that's not the correct approach.

    The 'event-based storage' you're talking about is another table, with one row per event. Some folks advocate another approach, where entity tables are versioned and all old versions are kept (sometimes moved to another table), so you can see the history of all changes for all time. That's not a very well-normalized solution, but there are times when it's appropriate.
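    The salary design described above (current value on the employee row, one history row per change event) looks like this in SQLite; table and column names are illustrative, not from any real schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        active INTEGER DEFAULT 1,   -- entities are deactivated, never deleted
        current_salary REAL
    );
    CREATE TABLE salary_history (
        employee_id INTEGER REFERENCES employee(id),
        salary REAL,
        effective_date TEXT
    );
""")

def set_salary(emp_id, salary, date):
    """Update the current value and record the change event."""
    db.execute("UPDATE employee SET current_salary = ? WHERE id = ?",
               (salary, emp_id))
    db.execute("INSERT INTO salary_history VALUES (?, ?, ?)",
               (emp_id, salary, date))

db.execute("INSERT INTO employee (id, name, current_salary) VALUES (1, 'Ada', 0)")
set_salary(1, 90000.0, "2007-01-01")
set_salary(1, 95000.0, "2007-09-06")
```

    Point-in-time questions go to salary_history; the common "what is it now" query stays a single-row read on employee.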

"Just Say No." - Nancy Reagan "No." - Ronald Reagan

Working...