Forgot your password?
typodupeerror
Databases

Are Relational Databases Obsolete? 417

Posted by kdawson
from the long-in-the-tooth dept.
jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created the Ingres and Postgres technology while a researcher at UC Berkeley in the early 1970s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."
This discussion has been archived. No new comments can be posted.

Are Relational Databases Obsolete?

Comments Filter:
  • by DrinkDr.Pepper (620053) on Thursday September 06, 2007 @12:33PM (#20495879)
    Relational databases aren't being obsoleted. Some schema design heuristics are.
  • well (Score:5, Informative)

    by stoolpigeon (454276) * <bittercode@gmail> on Thursday September 06, 2007 @12:34PM (#20495891) Homepage Journal
    every article linked makes it clear that this is about warehousing as opposed to oltp. so is the technology dead? no - can it do everything? no
  • Re:dual-mode db? (Score:3, Informative)

    by XenoPhage (242134) on Thursday September 06, 2007 @12:37PM (#20495951) Homepage
    I believe you can build a storage engine in MySQL that deals with column-based storage. I'm not sure if it's been done yet, but I don't see why it couldn't be.
  • by cwford (848987) on Thursday September 06, 2007 @12:40PM (#20495997) Homepage
    From TFA:

    "Column-oriented databases -- such as the one built by Stonebraker's latest start-up, Andover, Mass.-based Vertica Systems Inc. -- store data vertically in table columns rather than in successive rows. "

    Marketing hype for his startup.

    What a sleezeball.

  • Re:Rotate (Score:4, Informative)

    by Hijacked Public (999535) * on Thursday September 06, 2007 @12:50PM (#20496159)
    The most recent release (2007) will handle 2^14 columns.
  • by roman_mir (125474) on Thursday September 06, 2007 @12:52PM (#20496199) Homepage Journal
    Once someone shows that there is no longer a use for any relationship between data entries, then we'll be able to say that RDBMSs are obsolete. Actually both headlines (/. and the linked article) are mistaken about what Michael Stonebraker is saying. He is talking about read intensive applications mostly and he is talking about optimization of data for reading purposes. This does not mean that RDBMSs are obsolete for all uses, just that he sees a faster way to retrieve data for certain uses.
  • Re:dual-mode db? (Score:3, Informative)

    by tinkertim (918832) on Thursday September 06, 2007 @12:59PM (#20496315) Homepage

    I believe you can build a storage engine in MySQL that deals with column-based storage. I'm not sure if it's been done yet, but I don't see why it couldn't be.

    The FA threw me for a loop a couple of times, I honestly _did_ try to read it :) Correct me if I'm incorrect, but wouldn't having a service for column stores be (usually) not needed on most Unix-like platforms? Since this is mostly reading, I would think such efforts might be better spent on sqlite (or similar)?

    If your in a situation where you're mostly reading with (likely) only one infrequent writer, wouldn't eliminating the overhead of a database service entirely be desirable?

    I can't think of a situation where you would want many frequent writers to a column store schema, again, correct me if I'm off.
  • rtfa before posting (Score:4, Informative)

    by jilles (20976) on Thursday September 06, 2007 @01:12PM (#20496479) Homepage
    I can understand people not reading every link on a slashdot article they comment on. But if you post the bloody link, is it too much asked to actually RTFA?! It's an article about a column. The actual column is quite interesting.

    To add some content, this is about optimal storage for SQL databases in a data warehouse context where there are some interesting products that use something more optimal than the one size fits all solutions currently available from the big RDBMS vendors. The API on top is the same (i.e. SQL and other familiar dataware house APIs), which makes it quite easy to integrate.

    Regarding the obsolescence question, one size fits all will be good enough for most for some time to come. Increasingly people are more than happy with lightweight options that are even less efficient on which they slap persistence layers that reduce performance even more just because it allows them to autogenerate all the code that deals with stuffing boring data in some storage. Not having to deal with that makes it irrelevant how the database works and allows you to focus on how you work with the data rather than worrying about tables, rows and ACID properties. Autogenerating code that interacts with the database allows you to do all sorts of interesting things in the generated code and the layers underneath. For example, the hibernate (a popular persistence layer for Java) people have been integrating Apache Lucene, a popular search index product, so that you can index and search your data objects using lucene search queries rather than sql. It's a quite neat solution that adds real value (e.g. fully text searchable product catalogs are dead easy with this).

    Column based storage is just an optimization and not really that critical to the applications on top. If you need it, there are some specialized products currently. The author of the column is probably right about such solutions finding their way into mainstream products really soon. At the application level, you'll still be talking good old SQL to the damn thing though.
  • by SpaceLifeForm (228190) on Thursday September 06, 2007 @01:28PM (#20496707)
    The Slashdot article headline is trollish.

    The relational concept will still exist regardless of the underlying storage methods.

  • by Doctor Faustus (127273) <SlashdotNO@SPAMWilliamCleveland.Org> on Thursday September 06, 2007 @01:45PM (#20496929) Homepage
    I can't begin to tell you how often the words "FoxPro" and "MS SQL"
    You do know those aren't remotely comparable, right? FoxPro scales to more users than Access (due to tables separated into different files), but they're otherwise on a similar level in terms of what sort of jobs they're appropriate for. MS SQL Server is a full-fledged enterprise RDBMS. It may not scale quite as far as Oracle or DB2, but it get closer every generation, and having worked mostly in Oracle for the last year or so, I've been missing SQL Server.

    I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is."
    That was probably a polite way of saying "We don't believe you.". Maybe you made a case to them for why relation is obsolete, but you certainly didn't here.
  • Re:dual-mode db? (Score:3, Informative)

    by einhverfr (238914) <chris...travers@@@gmail...com> on Thursday September 06, 2007 @01:53PM (#20497019) Homepage Journal
    Ok, think about it this way:

    If you are doing killer aggregates (tell me the sum of the sales in every month for the last 25 years), you are going to be limited by possibly 2 things: CPU cycles and disk I/O throughput.

    There are several ways of addressing these issues. Basically this means either optimizing or parallelizing. Column-oriented stores are likely to help optimize the disk i/o throughput so you can just thow more processor effort at the problem.

    You can also do what Teradata and BizgressMPP do which is basically break the database into lots of little db's running on different servers and distribute the disk and processor time that way. Iirc Oracle and DB2 also offer this sort of option. Depending on how this is set up, it may be possible to use in an OLTP environment as well.

    On the other hand, suppose you are doing a lot of updates in a high transaction load environment. Which would you rather do? Update each column value (and skip around the disk doing this) or update the entire row on one disk block?

    Column-oriented databases are helpful for some things. They are not the only solution to the problem out there. And they are still relational. Hint: They are still based on Edgar Codd's relational mathematics.
  • by Sharkeys-Day (25335) on Thursday September 06, 2007 @02:06PM (#20497147) Homepage

    I think a place to start is to ask how to map language structures and RDBMS structures into a common denominator. One should never be looking at function calls with over 1000 parameters. That is just plain stupid. One should also never be dynamically mapping each and every tidbit of every field in a row on the fly at run time and especially so for each row in a table.
    Quite right, which is why programmers who still have have their sanity use JDBC or DBI. This part of your problem has already been solved at least twice.

    Here's how I write out a customer record:

      $dbh->do(
          'insert into customer (id,name,yada1,yada2,yada3) values (?,?,?,?,?)',
          undef,
          @customer{id,name,yada1,yada2,yada3}
      );

    I think that's even easier than your 3rd-gen code, and I didn't have to write my own indexing code.
  • by SixDimensionalArray (604334) on Thursday September 06, 2007 @04:04PM (#20498769)
    Just to add to that - PICK/D3 is called a "multivalue" database, and there are a lot of interesting kinds of databases out their that use this "multidimensional"/sparse array storage concept, such as the ancient MUMPS legacy system/database (now implemented as Intersystem's Cache product) found in old Veteran's Affairs (VA) systems in the US.

    I actually wonder if some of the current databases such as Microsoft SQL Server, etc. aren't going to actually start morphing into these older styles of databases due to the increase in use of XML files/data, which by their very nature are hierarchical and kind of "multidimensional". Actually, the company that now maintains PICK/D3 (Raining Data) has an interesting XML database (Tigerlogic) that uses some of the old technology and new technology combined.

    This could be a great and possibly painful experience of history repeating itself!

    SixD
  • ODBMS (Score:3, Informative)

    by Lodragandraoidh (639696) on Thursday September 06, 2007 @04:35PM (#20499125) Journal
    I think the object database management system (ODBMS) [wikipedia.org] will overtake RDBMSs in the future for several reasons (from the link and my own musings):

    1. Object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, Visual Basic .NET, C++ and Smalltalk. This makes implimentation quick and easy - yet stable and scalable at the same time.

    2. ODBMSs use exactly the same model as object-oriented programming languages.

    3. It is also worth noting that object databases hold the record for the World's largest database (over 1000 Terabytes at Stanford Linear Accelerator Center).

    4. Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly without a search, by following pointers (e.g. the objects are stored in trees for fast retrieval). Dynamic indexing schemes further speeds up retrieval of full text searches.

    5. Provides data persistence to applications that are not necessarily 'always on' - e.g. HTTP based stateless applications.

    I think RDBMSs will be around for some time -- but they will be relegated to more structured situations and smaller data sets. ODBMSs will take over where data is changing, persistence is critical, data types are mostly large binary objects with associated meta-data, and datasets are humongous.

    Right now my favorite ODBMS is the ZODB (Zope Object Data Base) [wikipedia.org] - an ODBMS system tightly integrated with both Python (implimented using Python's native 'pickle' object persistence functionality), and the Zope web application development system - which itself is built with and uses Python. You can learn more about Zope at Zope.org [zope.org].
  • by SatanicPuppy (611928) * <Satanicpuppy@g m a i l .com> on Thursday September 06, 2007 @04:36PM (#20499147) Journal
    Not sure how that follows...The column style database would be functionally identical to the row style database, only you'd have column proliferation instead of row expansion.

    The easiest way to deal with proliferating events is to create a very simple table that has a timestamp, your basic audit information (user who made the change, change the terminal was made from, etc), and the change itself.

    So say Bob makes 50,000 dollars. This entry was put in the table when he was hired and contains bob's employee record id, bob's salary, the date, and the audit crap. That's it. Then when bob gets a raise to 55,000 there is another simple entry, id, salary, date, audit crap. Etc, etc. All your data is there, you can easily retrieve the history, you know when the changes were made and by whom.

    It's all about normalization. Why put in two date fields if you don't have to? Two records, each with one date, will give you all the info you need, simplify your queries, whiten your teeth, etc. Whenever you have an event driven model, just throw the event, in the simplest possible form, in a table. If your tables start proliferating out of control, check your normalization and make sure you're not duplicating data across multiple tables. If that's not the problem, try to refine the scope of your database. If it's doing too many things, then try to break things out by their relevance.
  • by Anonymous Coward on Thursday September 06, 2007 @05:06PM (#20499481)
    No, it's not "PIC", it's "PICK", named after it's creator Richard Pick who originally wrote Generalized Information Retrieval Language System (GIRLS) in 1965 with Don Wilson to help inventory helicopter parts during the Vietnam War.

    http://en.wikipedia.org/wiki/Pick_operating_system [wikipedia.org]

    PIC is something completely different.
  • by Jim Ethanol (613572) on Thursday September 06, 2007 @06:16PM (#20500249) Homepage
    The next geekSessions [geeksessions.com] is on this topic. If you're interested in hearing about alternatives to RDBMS in and using them in practice, take a look at the site. The event will be held on October 2nd in downtown SF and will also be available via webcast. In addition to the presentations, we'll have a Q&A session along with some food and FREE BEER. Speakers at the event are:

    Josh Berkus from the PostgreSQL core team

    Paul Querna from Apache and Bloglines (wrote his own filesystem for Bloglines)

    Chad Walters from Powerset who is implementing BigTable there.

    Hope to see you there!
  • by CodeBuster (516420) on Thursday September 06, 2007 @06:34PM (#20500475)
    PICK has some very serious disadvantages over the relational databases, not the least of which is the rather annoying property that queries often do not return *ALL* of the related records unless the person writing the query has a very intimate knowledge of the data cubes and how the database is connected and even then there are no guarantees. The brilliance of atomicity defined by Codd and the query guarantees that one is able to make when the "one value per cell" rule is enforced are not of trivial benefit. PICK and multidimensional can be convenient in some circumstances, but the marketplace has basically decided that those niches are not worth the hassle of giving up the query benefits of the relational model for more general cases, especially when absolute speed of the query is less important than completeness of the information returned. It has also not helped that multivalue database vendors, the few that are left anyway, have generally lagged behind the relational database vendors in terms of tool support and integration with other systems.

    Note: I may be somewhat biased because of my long use of relational databases and a really bad experience updating a jBase solution that really was "long in the tooth". You might argue that I just saw a bad example of multivalue database application, but I think that it goes deeper than that or else we would all be using PICK today and not some flavor of SQL.
  • by c.gerritsen (960884) on Thursday September 06, 2007 @07:24PM (#20501075)

    Actually, Stonebraker did say, and I quote the original article that you went back and read (with a dereferenced pronoun in []),

    [The current major relational DBMSs] should be considered legacy technology, more than a quarter of century in age and "long in the tooth".

    He called the existing major RDBMSs legacy and long in the tooth for not implementing the feature he is trumpeting, column storage, as an option when setting up a DB for use. He laments the fact that these vendors don't give you the option of setting up your DB in a way that provides huge performance gains for some usages that are becoming more common everyday, and says that there will be a revolution.

    He is not saying that we don't still need row-oriented RDBMSs...which makes me wonder, does Vertica support both row and column stores?

  • by yl_mra (809735) on Thursday September 06, 2007 @08:32PM (#20501757)
    To be a relational database the database must meet a very specific set of requirements. While a standard view of the databases from the DB administrators and normal users view may allow limited ways to manipulate the data, looking at the actual storage structures and how to efficiently use the resultant files can provide some extremely efficient computational methods. For example, if one structures data in a relational database with few fields, perhaps as low as two, per representation where each representation represents a single file then the data read directly from the file, the reads may be via flat file techniques, into an array or efficient storage table, then you achieve the best of both worlds. If the above example requires compressed data, then the relational data must be read from a single file via DB operations with two different mappings overlaying the single database file. The programmer must understand the data orientation so that compression/decompression occurs correctly. Extremely fast reads/writes may be achieved this way. The advantage of a column oriented database is that the files are inherently optimized for data mining without the need to hire an expensive programmer. If the company had multiple requirements for the same data then multiple databases may be required, unless they are willing to hire the expensive programmer. A key problem results when multiple databases loose synchronization. BTW, row based databases optimized for storage size (footprint) compress data by column, not by row or record. Using these techniques, I have achieved far greater data compression than comparable Google stored data.
  • by WindBourne (631190) on Friday September 07, 2007 @02:29AM (#20504275) Journal
    Not Quite. Stonebraker was THE core developer on Ingres which was the second relational DB created (System R was the first developed by Codd et.al using SQL). He operated at both ends of the spectrum. Gray/Mohan did the same as well. While System R disappeared, Ingres was developed into a major company (ingres) and was ultimately bought by CA.
    Later, Stonebraker's work on postgres (theorey AND code) was how to handle different datatypes within databases. He took an OO approach to that. That was directly used in Illustra and then went on to Informix. More importantly, Oracle used a lot of that work to create 8i as has other DBs. IOW, he IS a leading theorists AND knows the code.

    Considering that he has been on top of all the major advances within the DB world, why would you discount what the man says? As it is, you mention Gray and Mohan who both did some good work at IBM, but have not really advanced DBs forward that much. They simply moved relational model DB forward( Bascially, they were red herrings). But Stonebraker is working across ALL the spectrums and contributes heavily to knew models. His work is everywhere.

    Finally, think about what he says. The column major is more useful for data warehousing BECAUSE it allows for data to be compressed quickly, tighter (which makes sense), AND allows you to work with just the data that you need. In a row major, you will end up creating and maintaining indexes to increase the speeds of reads. But an index is for the most part a single (or just a few) columns, which basically makes them a column major. But this requires LOADS of cpu and space to maintain. The column major approach simply keeps the indexes, if you will and discards the rows. This allows for FAST operations if you are doing LOADS of reads, and little changes. That is PERFECT for data warehousing.

    So armed with that knowledge, exactly WHY would you discount his work and his statements?
  • by MariusBoo (883340) on Friday September 07, 2007 @03:53AM (#20504665)
    Seemed interesting to me too. Try http://www.cs.aau.dk/TimeCenter/pub.htm [cs.aau.dk]. Lots of stuff, not obvious where to begin.

Computers will not be perfected until they can compute how much more than the estimate the job will cost.

Working...