Databases

Are Relational Databases Obsolete? 417

Posted by kdawson
from the long-in-the-tooth dept.
jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created the Ingres technology as a researcher at UC Berkeley in the 1970s and later led the Postgres project there in the 1980s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."
This discussion has been archived. No new comments can be posted.


  • dual-mode db? (Score:5, Interesting)

    by 192939495969798999 (58312) <<info> <at> <devinmoore.com>> on Thursday September 06, 2007 @11:33AM (#20495889) Homepage Journal
    Is there a dual-mode db, that lets you create a row-based or column-based "table"? I imagine cross-mode queries would kill performance, but at least you could have a system front-loaded with row tables, where data comes in, and then archive this data over time into the column-based tables, so that reads were fast.
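    The front-loaded setup the parent describes — rows come in fast, then get archived into column-based tables for fast reads — can be sketched in a few lines. This is a toy illustration with made-up names, not a real engine:

    ```python
    # Toy sketch (hypothetical, not a real DB engine): new rows land in a
    # row-oriented "hot" buffer for fast writes, then get archived into
    # column-oriented lists so analytic reads can scan one column at a time.

    class DualModeTable:
        def __init__(self, columns):
            self.columns = columns
            self.hot_rows = []                    # row store: list of tuples
            self.cold = {c: [] for c in columns}  # column store: one list per column

        def insert(self, row):
            self.hot_rows.append(tuple(row))      # cheap append, write-optimized

        def archive(self):
            # Move buffered rows into the column store in one batch.
            for row in self.hot_rows:
                for col, value in zip(self.columns, row):
                    self.cold[col].append(value)
            self.hot_rows.clear()

        def column_avg(self, col):
            # Analytic read touches only the one column it needs.
            values = self.cold[col]
            return sum(values) / len(values) if values else None

    t = DualModeTable(["id", "amount"])
    t.insert([1, 100])
    t.insert([2, 300])
    t.archive()
    print(t.column_avg("amount"))  # 200.0
    ```

    The cross-mode query problem shows up in `archive()`: moving data between the two layouts is exactly the transpose work you'd want to batch up, not do per row.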
  • by XenoPhage (242134) on Thursday September 06, 2007 @11:35AM (#20495911) Homepage
    Since when are a column store database and a relational database mutually exclusive concepts? I thought that both column store and row store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.

    Agreed. It definitely looks like a storage preference. Though column-based storage has definite benefits over row-based when it comes to store once, read many operations. Kinda like what you'd find in a data warehouse situation...

    Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...

    Hrm.. You must be new here....
  • by stoolpigeon (454276) * <bittercode@gmail> on Thursday September 06, 2007 @11:37AM (#20495947) Homepage Journal
    You are exactly right, and this is backed up by the home page for c-store [mit.edu]. It says: "C-Store is a read-optimized relational DBMS" - c-store is the open source project that apparently is the basis for Vertica, Stonebraker's commercial offering.
  • The guy... (Score:5, Interesting)

    by AKAImBatman (238306) <akaimbatman@gmai ... m minus language> on Thursday September 06, 2007 @11:38AM (#20495963) Homepage Journal
    ...is duping [slashdot.org] himself [slashdot.org] and thus Slashdot is duping the stories by extension.

    Stonebraker has been pushing the concept of column-oriented databases for quite some time now, trying to get someone, ANYONE, to listen that it's superior. While I think he has a point, I'm not sure if he really goes far enough. Our relational databases of today are heavily based on the ISAM files of yesteryear. Far too many products threw foreign keys on top of a collection of ISAMs and called it a day. Which is why we STILL have key integrity issues to this day.

    It would be nice if we could take a step back and re-engineer our databases with more modern technology in mind. e.g. Instead of passing around abstract id numbers, it would be nice if we had reference objects that abstracted programmers away from the temptation of manually managing identifiers. Data storage is another area that can be improved, with Object Databases (really just fancy relational databases with their own access methods) showing how it's possible to store something more complex than integers and varchars.

    The demands on our DBMSes are only going to grow. So there's something to be said for going back and reengineering things. If column-oriented databases are the answer, my opinion is that they're only PART of the answer. Take the redesign to its logical conclusion. Let's see databases that truly store any data, and enforce the integrity of their sets.
  • by dada21 (163177) <adam.dada@gmail.com> on Thursday September 06, 2007 @11:42AM (#20496039) Homepage Journal
    In my IT business, a vast majority of our top tier clients (grossing over US$100 million annually) are still using antiquated software with a relational database backend. While these companies are generally VERY efficient in terms of providing services or products to their market, their accounting, purchase orders and project management software is decades outdated. Many of the companies that maintain these packages have merely made the interface more current (though it's still 5+ years old), but are still using terribly outdated software underneath. I can't begin to tell you how often the words "FoxPro" and "MS SQL" come up, and it ends up being a relational database "solution" or even worse.

    It is very frustrating because we do have programmers on staff that create third party plug-ins to these databases to try to make solutions that the OEM code doesn't. When you meet younger programmers, many of them are frustrated themselves to work on ancient solutions that have no hope of being upgraded, because these industries we work in are not in a rush to try anything new and shiny, but instead are happy with the status quo.

    I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is." That's money thrown out the window, employees who are frustrated (without knowing why), and forcing the company to lose efficiency by not being able to compete with newer companies that are utilizing newer technology to better their bottom line.

    Ugh.
  • Object Databases (Score:4, Interesting)

    by jjohnson (62583) on Thursday September 06, 2007 @11:45AM (#20496101) Homepage
    Are they now officially an also-ran? Has the whole concept failed to be usefully implemented commercially, or will it be another Lisp--elegant, beautiful, and largely unused because it's kind of weird?
  • by theGreater (596196) on Thursday September 06, 2007 @11:47AM (#20496125) Homepage
    So it seems to me the -real- money is in integrating an RDBMS which, for usage purposes, is row-oriented; but which, for archival purposes, is column-oriented. This could either be a backup-type thing, or an aging-type thing. Quick, to the Pat(ent)mobile!

    -theGreater
  • He may have a point (Score:3, Interesting)

    by duffbeer703 (177751) * on Thursday September 06, 2007 @11:49AM (#20496143)
    For data warehousing, a higher or different level of abstraction may be useful and make database design easier, particularly as parallelism becomes more and more common. Storing rich markup language or media in a database might be problematic as well.

    But there's no way that RDBMS's are going away -- relational algebra simply solves too many data storage problems.
  • by stoolpigeon (454276) * <bittercode@gmail> on Thursday September 06, 2007 @11:51AM (#20496189) Homepage Journal
    Maybe, but I doubt it. The money is in the data warehouse market and the ETL tools that move the data from the OLTP environment to the warehouse environment. I think what the author points out is not that people are trying to use the same database to do both, but rather that they are trying to use the same product to do both. He says it would make more sense to use Oracle (for example) for OLTP - and something else for the warehouse, rather than trying to get Oracle to do both well.
  • by homb (82455) on Thursday September 06, 2007 @11:56AM (#20496265)
    I wish we could put this thing to rest once and for all. And I wish so-called "experts" in the field actually were.

    Rule of thumb:
    - you use row dbs for OLTP. They're great for writing.
    - you use column dbs for data mining. They're amazing for reading aggregates (average, max, complex queries...)

    The major problem with column dbs is the writing part. If you have to write one row at a time, you're screwed, because the store has to touch every column: read it, insert the new value, and write it back. If you can write in batch, the whole process isn't much more expensive. So writing a single row could take 500ms, but writing 1000 rows will take 600ms.
    Once the data's in, column dbs are the way to go.
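    The batch-vs-single-row point can be illustrated with a toy column store (a hypothetical sketch; the fan-out across every column's storage is the per-row cost that batching amortizes):

    ```python
    # Toy column store: each insert must fan out across every column's
    # storage, so the fixed per-column overhead is paid once per batch
    # rather than once per row when writes are batched.

    columns = {"id": [], "name": [], "amount": []}

    def insert_batch(rows):
        # One append pass per column, no matter how many rows are in the batch.
        for i, col in enumerate(columns):
            columns[col].extend(row[i] for row in rows)

    insert_batch([(1, "a", 10), (2, "b", 20), (3, "c", 30)])

    # Aggregate reads then touch only the column they need:
    print(max(columns["amount"]))  # 30
    ```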
  • Wrong approach? (Score:4, Interesting)

    by Aladrin (926209) on Thursday September 06, 2007 @12:20PM (#20496569)
    Maybe his approach is all wrong. The database my company uses has MANY tables that are rarely written to, but a few that are written to ALL the time. Instead of trying to cram his 'one size fits all' database scheme down our throats and replace the current 'one size fits all' database scheme, maybe he should be trying to create a database engine that can do both.

    I think you would have to determine the main use of the table beforehand (write-seldom or write-often), but the DB engine could use a different scheme for each table that way. I know some will claim that it can't be more efficient to split things this way, but remember that this guy is claiming 50x the speed for write-seldom operations.

    As for Relational Databases... How is this exclusive to that? This is simply how the data is stored and accessed. If he is claiming 50x speed-up because he doesn't deal with the relational stuff, that's bunk. You could write a row-store database with much greater speed as well, given those parameters.
  • by dpbsmith (263124) on Thursday September 06, 2007 @12:20PM (#20496589) Homepage
    '"In every major application area I can think of, it is possible to build a SQL DBMS engine with vertical market-specific internals that outperforms the 'one size fits all' engines by a factor of 50 or so," he wrote.'

    I know very little about DBMS systems, but I thought it has always been true that you can achieve monumental performance increases by building somewhat specialized database systems in which the internals of the system make assumptions about, and are tied to, the structure of the data being modelled. In fact, when RDBMS systems came in, one of the knocks on them was that they were far more resource-intensive than the hierarchical databases they displaced. However, the carved-in-stone assumptions of those models made them difficult and expensive to change or repurpose.

    I'm sure I remember innumerable articles insisting that "relational databases don't need to be really all that much terribly slower if you know how to optimize this that and the other thing..."

    In other words, as an outsider viewing from a distance, I've assumed that the increasing use of RDBMS was an indication that in the real world it turned out that it was better to be slow, flexible, and general, than fast, rigid, and specialized.

    So, what is a "column store?" It sounds like it is an agile, rapid development methodology for generating fast, rigid, specialized databases?
  • by goombah99 (560566) on Thursday September 06, 2007 @12:22PM (#20496617)
    Traditionally perl-objects are hashes with one blessed hash per instance. The hash contains all the instance variable values using their names as keys.

    Instead one can use blessed scalars holding a single integer value for instances and let the class variable contain all the instance data in arrays indexed by the instance's scalar value.

    This technique was originally promoted as an indirection to protect object data from direct manipulation that bypassed get/set methods. But it also allows the object to be either row or column oriented internally. That is, the class could store all the instance hashes in an array indexed by the scalar, or it could store each instance variable in a separate array that is indexed by the scalar value.

    Thus the perl class can, on-the-fly, switch itself from column-oriented to row-oriented as needed while maintaining the same external interface.

    Of course this is not a perl-exclusive feature and it can be implemented in other languages. It just happens to be particularly easy and natural to do in perl.
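    A rough Python analog of the same trick (hypothetical names; the instance holds only an integer index while the class holds one per-attribute array, i.e. a column per attribute):

    ```python
    # Inside-out / column-oriented objects: the instance is just an index
    # into the class's per-attribute lists, so "column" scans like summing
    # all salaries never touch per-instance dicts.

    class Employee:
        _names = []       # one "column" per attribute
        _salaries = []

        def __init__(self, name, salary):
            self._i = len(Employee._names)   # instance stores only its index
            Employee._names.append(name)
            Employee._salaries.append(salary)

        @property
        def name(self):
            return Employee._names[self._i]

        @property
        def salary(self):
            return Employee._salaries[self._i]

    e1 = Employee("Alice", 50000)
    e2 = Employee("Bob", 60000)
    print(e2.name, sum(Employee._salaries))  # Bob 110000
    ```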

  • by cdn-programmer (468978) <terr@terraloLAPL ... t minus math_god> on Thursday September 06, 2007 @12:35PM (#20496819)
    On the contrary.

    From a standard 3rd generation programming language one can read and write into flat files, and we can do close to this with a hierarchical database.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. I.e., the language data structures do not correspond to the RDBMS data structures and vice versa.

    As an example - in PostgreSQL, the last time I looked at writing a simple row into a table, there were something like 100 columns in the row...

    In the 3rd generation programming languages this was just a simple structure with 100 entries.

    The data transfer from that structure generated a function call with more than 1000 parameters. This was to be mapped and re-mapped with each call to transfer data, this is even though the structure itself is static and determined at compile time.

    Next: There were about 10 parameters per field (column).

    1: Column name
    2: Column name length
    3: data type
    4: data length
    5: character representation ... etc

    finally 10: Address where the data lives.

    The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.

    Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be built and torn down and rebuilt for each call.

    Next - the implementation was so bad that the program would run in test mode with only a few parameters, but it failed when the whole row was to be transferred.

    I gave up on that interface.

    ---------------

    Oracle had pre-compilers. They did the same damn thing. The code generated by the pre-compilers was just awful.

    ---------------

    While there is much good to say about RDBMS's in general, the issue I ran into was that the interface from 3rd generation languages took a HUGE step backward. IMHO we should have high level language statements called DBRead() and DBWrite(). In C these should generally correspond to fread() and fwrite(). If this is too complex then DBWriteStruct() could be implemented with a suitable mapping helper function.

    Nooo...

    In the old days one could read and write into a flat file at a given location with a single statement or function call depending on the language. Of course "where" to read and write became a real issue and I do fully understand the complexity of file based tree structures and so forth, especially since I wrote a lot of code to implement these algorithms.

    The thing is now we have RDBMS and other solutions that give us the data organisational abilities we need - and we lose the ease of mapping these structures into a suitable structure or object in the programming language.

    I for one do not think we have stepped forward very far at all.

    -------------

    I'll toss in a case in point made by a good buddy of mine who just happens to be one of the top geophysical programmers in this city.

    One of his clients was running an application hooked to an Oracle database running on a fast SUN. Run times were measured in close to a day.

    Finally they removed the Oracle interface and replaced it with a glorified flat file. They clearly built in some indexing. The result is the run times dropped to under 20 minutes.

    As my buddy says - He will NOT use any RDBMS. He can take 5 views of the data comprising 1000's of seismic lines and the user can click on any trace number, line number, well tie and so forth and in real time he can modify all views of the data on as many as 5 s
  • Re:dual-mode db? (Score:3, Interesting)

    by mr_mischief (456295) on Thursday September 06, 2007 @12:52PM (#20497007) Journal
    That would actually be a smart way to handle it. Lots of apps write only to a single master (or its fallback) and read only from the slaves already. If you had a row-based master for fast writes and replicated to column-based slaves that could be a real win.

    RLE on the data columns is a pretty big win for column-based stores, too. If the slaves manage RLE during a replication, you could have one hell of a DB farm.
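    A minimal sketch of RLE on a column (illustrative only; real column stores use considerably more elaborate encodings, but sorted or low-cardinality columns are where the win comes from):

    ```python
    # Run-length encoding: a column of repeated values collapses into
    # (value, count) pairs; decode reverses it exactly.

    def rle_encode(column):
        runs = []
        for v in column:
            if runs and runs[-1][0] == v:
                runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
            else:
                runs.append((v, 1))               # start a new run
        return runs

    def rle_decode(runs):
        return [v for v, n in runs for _ in range(n)]

    col = ["NY", "NY", "NY", "CA", "CA", "TX"]
    encoded = rle_encode(col)
    print(encoded)  # [('NY', 3), ('CA', 2), ('TX', 1)]
    assert rle_decode(encoded) == col
    ```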
  • by Jerry (6400) on Thursday September 06, 2007 @01:26PM (#20497447)
    Cell-based storage!!! Best of both worlds!!! Mix of both Row and Column based storage, how can we go wrong!


    You are years late. The PICK operating system/db already does that. Back in 1985 I used the DOS-based Advanced Revelation to write GAP accounting packages. It used the ASCII 253 character to separate "columns" of data in a cell. Reading and writing was parsed automatically. Invoice information, for example, was stored in a Customer's info table, not in an invoice table, and doing a query on accounts receivable produced very fast results. Symbolic dictionary definitions containing formulas allowed for easy column and row totals.

    In fact KDB/K looks a lot like a PICK system that uses FORTH as the language.
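    The multivalue idea described above can be sketched like this (a hypothetical Python illustration of the chr(253) separator, not actual PICK/Advanced Revelation code):

    ```python
    SEP = chr(253)  # the high-ASCII sub-value separator described above

    def pack(values):
        # Join several sub-values into one stored "cell".
        return SEP.join(str(v) for v in values)

    def unpack(cell):
        # Parse the cell back into its sub-values on read.
        return cell.split(SEP)

    # Several invoice amounts stored inside a single customer record field:
    customer = {"name": "Acme", "invoice_amounts": pack([125, 99, 240])}
    amounts = [int(x) for x in unpack(customer["invoice_amounts"])]
    print(sum(amounts))  # 464
    ```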

  • by Bonewalker (631203) on Thursday September 06, 2007 @01:42PM (#20497653)
    So, what we need is a morphing database structure depending on need. While I am writing, I want it to be row-based. As soon as I am done and ready for reading, it should switch to a column-based structure. Who's up for starting a new project on sourceforge.net? The shape-shifting relational database! SSRD is the wave of the future!
  • by bjourne (1034822) on Thursday September 06, 2007 @03:08PM (#20498827) Homepage Journal
    The sane thing to me seems to be to just skip the OLTP step, since the data contained therein is a strict subset of the data in the warehouse. Let's say you design a relational database for a company. You have a table with Employees and another with their Positions, a Project table and so on, like the orthodox relational model tells you to. Which works great until someone quits - then you have a problem.

    Naturally, you don't want to delete that person, because then you lose lots of important archival data. So you introduce two fields in the Employee table which contain the dates when the employee started and quit. You also need to do the same for almost all other data. If Bob gets a raise, you don't want to simply update his salary field; you want the data to state that Bob had salary $x during the dates $A-$B and $y during the dates $B-$C. So you need even more fields for validity intervals to fully describe your data. It gets incredibly messy, all your queries become really complicated, and you need to employ stored procedures and triggers to keep your data intact. The relational model breaks down.

    What you need at this point, is some kind of "event based data storage." Like "Bob got a 5% raise 20070801" or "Company address changed from $x to $y 20001010." For these kinds of data, a warehouse is much better suited. A column based storage would probably also be much faster than a row based for it.
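    The "Bob got a 5% raise 20070801" style of storage can be sketched as an append-only event log, with any historical value recovered by replay (a toy illustration; names and dates are made up):

    ```python
    # Event-based storage sketch: nothing is updated in place. Current or
    # historical state is derived by replaying events up to a date, instead
    # of bolting validity-interval columns onto every table.

    from datetime import date

    events = []  # append-only log

    def record(event_date, kind, **data):
        events.append({"date": event_date, "kind": kind, **data})

    def salary_on(employee, as_of):
        # Replay salary events up to a date to recover the value then.
        salary = None
        for e in sorted(events, key=lambda e: e["date"]):
            if (e["kind"] == "salary_set" and e["employee"] == employee
                    and e["date"] <= as_of):
                salary = e["amount"]
        return salary

    record(date(2007, 1, 1), "salary_set", employee="Bob", amount=50000)
    record(date(2007, 8, 1), "salary_set", employee="Bob", amount=52500)  # the raise

    print(salary_on("Bob", date(2007, 6, 1)))  # 50000
    print(salary_on("Bob", date(2007, 9, 1)))  # 52500
    ```

    Since the log is write-once, read-many and queries scan one field across many events, it is also the access pattern where a column layout should shine.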
  • by lgw (121541) on Thursday September 06, 2007 @03:15PM (#20498915) Journal
    The "relational" concept existed before SQL, really. It's just a question of whether you want to do the work on the client side or the server side.

    The modern RDBMS is good when the pipe from client to server is much smaller than the pipe from server to backing store/cache. Minimal communication for maximum results. The trade-off, of course, is that the server needs lots of resources because it's doing significant work on behalf of every client.

    "Non-relational" databases still have their place today, however, in embedded/appliance environments. When the pipe between the client and server is very large (shared memory), and the server has only a token amount of memory for cache, an RDBMS is a poor choice. The distinction is only the implementation: nothing stops you from using relational concepts and pushing the work to the client side. But the implementation difference really changes what's fast and what's slow.

    Similarly, a column-based store is just an implementation difference from a traditional RDBMS, but it also makes a real change in what's fast and what's slow, which of course means a real change in what makes a good DB design.
  • by einhverfr (238914) <chris.traversNO@SPAMgmail.com> on Thursday September 06, 2007 @03:16PM (#20498927) Homepage Journal

    On the contrary.

    From a standard 3rd generation programming language one can read and write into flat files, and we can do close to this with a hierarchical database.

    I think there is a key distinction here: application object store vs. data management. Hierarchical db's are far better at storing object information, but *far* worse at real data management.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. I.e., the language data structures do not correspond to the RDBMS data structures and vice versa.

    In LedgerSMB, we solved this by putting a functional interface in the db. Then we dynamically map the objects and their properties into functions and arguments. Works great :-)
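    A hedged sketch of that dynamic mapping (names are hypothetical, not LedgerSMB's actual code): object properties become the argument list of a database function call, so the db's functional interface, rather than its table layout, is the contract.

    ```python
    # Build a "SELECT func(%s, %s, ...)" call from an object's properties.
    # The resulting (sql, args) pair is what you would hand to any DB-API
    # cursor.execute(); no per-call, hand-written column mapping needed.

    def build_db_call(func_name, obj, arg_names):
        placeholders = ", ".join(["%s"] * len(arg_names))
        sql = f"SELECT {func_name}({placeholders})"
        args = [getattr(obj, name) for name in arg_names]
        return sql, args

    class Invoice:
        def __init__(self):
            self.id = 42
            self.amount = 99.5

    sql, args = build_db_call("post_invoice", Invoice(), ["id", "amount"])
    print(sql)   # SELECT post_invoice(%s, %s)
    print(args)  # [42, 99.5]
    ```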

    As an example - in PostgreSQL, the last time I looked at writing a simple row into a table, there were something like 100 columns in the row...

    You are either trolling or you need to fire the DB architect who designed that. There is *no way* that a 100-column table is good DB design. (OK, mathematically, there is nothing that precludes it being good db design, but I can't even imagine a scenario where this would be OK.)

    In the 3rd generation programming languages this was just a simple structure with 100 entries.

    Oh, you were the one who designed the 100-column table. Sorry..... Please go out and get some books on db design. You will thank me :-)

    The data transfer from that structure generated a function call with more than 1000 parameters. This was to be mapped and re-mapped with each call to transfer data, this is even though the structure itself is static and determined at compile time.

    IMO, your problem honestly is in the fact that you are using a monkey wrench as a ball peen hammer. It may sorta work but you are using the wrong tool for the job. If you want a simple object store use BDB or something like it. If you want a real data management solution, build your db *first.* If that is not your goal, use something other than an RDBMS.

    Next: There were about 10 parameters per field (column).

    1: Column name
    2: Column name length
    3: data type
    4: data length
    5: character representation ... etc

    finally 10: Address where the data lives.

    The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.

    Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be built and torn down and rebuilt for each call.

    How is this an issue with RDBMS's?

    Next - the implementation was so bad that the program would run in test mode with only a few parameters, but it failed when the whole row was to be transferred.

    Again, this is not a PostgreSQL problem ;-)

    I gave up on that interface.

    From your description, that sounds like a wise choice.

    While there is much good to say about RDBMS's in general, the issue I ran into was that the interface from 3rd generation languages took a HUGE step backward. IMHO we should have high level language statements called DBRead() and DBWrite(). In C these should generally correspond to fread() and fwrite(). If this is too complex then DBWriteStruct() could be implemented with a suitable mapping helper function.

    Again, this is an issue with the frameworks you are using. Personally, I tend to do the
