Slashdot
Are Relational Databases Obsolete?
Posted by kdawson on Thu Sep 06, 2007 11:27 AM
from the long-in-the-tooth dept.
jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created the Ingres and Postgres technology while a researcher at UC Berkeley in the early 1970s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."
They're not mutually exclusive. (Score:5, Insightful)
(http://skippus.blogspot.com/ | Last Journal: Sunday June 19 2005, @07:25AM)
Okay, at the risk of sounding stupid...
Since when are a column store database and a relational database mutually exclusive concepts? I thought that both column store and row store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.
Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...
Re:They're not mutually exclusive. (Score:5, Interesting)
(http://blog.godshell.com/)
Agreed. It definitely looks like a storage preference. Though column-based storage has definite benefits over row-based when it comes to store once, read many operations. Kinda like what you'd find in a data warehouse situation...
>Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...
Hrm.. You must be new here....
Yea, it's all the same. (Score:5, Insightful)
(Last Journal: Tuesday December 19 2006, @05:12PM)
Therefore, pick your method depending on your needs. Are you storing massive amounts of data? Column stores are probably not for you...Your application will run better on a row store, because writing to a row store is a simple matter of adding one more record to the file, whereas writing to a column store is often a matter of writing a record to many files...Obviously more costly.
On the other hand, are you dealing with a relatively static dataset, where you have far more reads than writes? Then a row store isn't the best bet, and you should try a column store. A query on a row store has to read entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return. With column stores, you can ignore any columns that aren't referenced in your query...Additionally, your data is homogeneous in a column store, so you lose the overhead attached to having to deal with different datatypes and can choose the best data compression by field rather than by data block.
Why do people insist that one size really does fit all?
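A minimal sketch of the read-side difference described above, assuming a toy in-memory table. The field names and values are made up for illustration, and a real engine works with disk pages rather than Python lists.

```python
# Hypothetical toy data; the field names are made up for illustration.
rows = [
    {"id": 1, "region": "east", "amount": 120.0},
    {"id": 2, "region": "west", "amount": 75.5},
    {"id": 3, "region": "east", "amount": 30.0},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list per field (a column layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def total_from_rows(rows, field):
    """Row layout: every whole row is visited even though one field is needed."""
    return sum(row[field] for row in rows)

def total_from_columns(columns, field):
    """Column layout: only the one referenced column is touched."""
    return sum(columns[field])

columns = to_columns(rows)
print(total_from_rows(rows, "amount"), total_from_columns(columns, "amount"))
```

Both functions return the same total. The difference is in what has to be read to get there: the row version walks every record, the column version only the `amount` list.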
Re:Yea, it's all the same. (Score:5, Interesting)
(http://dhbarr.freeshell.org/)
-theGreater
Re:Yea, it's all the same. (Score:5, Interesting)
(http://thepeckfamily.us/ | Last Journal: Thursday November 08, @11:19AM)
Re:Yea, it's all the same. (Score:5, Funny)
Cell-based storage!!! Best of both worlds!!! Mix of both Row and Column based storage, how can we go wrong!
Just think about it, what could be better than one file for each column in each row?
And they said I couldn't have my cake and eat it too, sheesh
Re:Yea, it's all the same. (Score:5, Interesting)
You are years late. The PICK operating system/db already does that. Back in 1985 I used the DOS based Advanced Revelation to write GAP accounting packages. It used the ASCII 253 character to separate "columns" of data in a cell. Reading and writing was parsed automatically. Invoice information, for example, was stored in a Customer's info table, not in an invoice table, and doing a query on accounts receivable produced very fast results. Symbolic dictionary definitions containing formulas allowed for easy column and row totals.
In fact KDB/K looks a lot like a PICK system that uses FORTH as the language.
Re:Yea, it's all the same. (Score:5, Insightful)
(http://skippus.blogspot.com/ | Last Journal: Sunday June 19 2005, @07:25AM)
I went back and read the original article. To Michael Stonebraker's credit, the Computerworld article (and the submitter) grossly misrepresent what he said.
He did not say that RDBMSes are "long in the tooth." He said that the technology underlying them hasn't changed since the 1970s, and that column stores are a better way to represent data in certain situations. In fact, the very name of his original column was "One Size Fits All - A Concept Whose Time Has Come and Gone".
Are Relational Databases Obsolete? Not at all (Score:4, Informative)
The relational concept will still exist regardless of the underlying storage methods.
Re:Yea, it's all the same. (Score:5, Funny)
(Last Journal: Monday August 22 2005, @11:02AM)
Oh, the horror! That's a heinous crime on Slashdot! Not even the editors do that!!!
Perl Objects have both column and row DB advantage (Score:5, Interesting)
instead one can use blessed scalars holding a single integer value for instances and let the class variable contain all the instance data in arrays indexed by the instance's scalar value.
This technique was originally promoted as an indirection to protect object data from direct manipulation that bypassed get/set methods. But it also allows the object to be either row or column oriented internally. That is, the class could store all the instance hashes in an array indexed by the scalar, or it could store each instance variable in a separate array that is indexed by the scalar value.
Thus the perl class can, on-the-fly, switch itself from column-oriented to row-oriented as needed while maintaining the same external interface.
Of course this is not a perl-exclusive feature and it can be implemented in other languages. It just happens to be particularly easy and natural to do in perl.
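As a rough illustration of the same idea outside perl, here is a sketch in Python. The `PointStore` class, its field names and its methods are all hypothetical. Callers hold only integer handles, and the store can flip between a per-instance (row) layout and a per-field (column) layout without changing its interface.

```python
class PointStore:
    """Holds all instance data; callers only ever see integer handles."""

    FIELDS = ("x", "y")  # hypothetical instance variables

    def __init__(self):
        self.layout = "row"
        self.rows = []                                   # row layout: one dict per instance
        self.cols = {name: [] for name in self.FIELDS}   # column layout: one list per field
        self.count = 0

    def new(self, x, y):
        """Store a new instance and return its integer handle."""
        handle = self.count
        self.count += 1
        values = {"x": x, "y": y}
        if self.layout == "row":
            self.rows.append(values)
        else:
            for name in self.FIELDS:
                self.cols[name].append(values[name])
        return handle

    def get(self, handle, name):
        """Same accessor regardless of the internal layout."""
        if self.layout == "row":
            return self.rows[handle][name]
        return self.cols[name][handle]

    def switch_layout(self):
        """Convert the stored data to the other layout in place."""
        if self.layout == "row":
            self.cols = {n: [r[n] for r in self.rows] for n in self.FIELDS}
            self.rows = []
            self.layout = "column"
        else:
            self.rows = [{n: self.cols[n][i] for n in self.FIELDS}
                         for i in range(self.count)]
            self.cols = {n: [] for n in self.FIELDS}
            self.layout = "row"
```

Code that calls `store.get(handle, "x")` keeps working across a `switch_layout()` call, which is the point being made about keeping the external interface fixed.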
Re:They're not mutually exclusive. (Score:5, Interesting)
(http://thepeckfamily.us/ | Last Journal: Thursday November 08, @11:19AM)
Re:They're not mutually exclusive. (Score:5, Funny)
Re:They're not mutually exclusive. (Score:5, Interesting)
Rule of thumb:
- you use row dbs for OLTP. They're great for writing.
- you use column dbs for data mining. They're amazing for reading aggregates (average, max, complex queries...)
The major problem with column dbs is the writing part. If you have to write one row at a time, you're screwed because it needs to take each column, read, insert into it and store. If you can write in batch, the whole process isn't much more expensive. So writing a single row could take 500ms, but writing 1000 rows will take 600ms.
Once the data's in, column dbs are the way to go.
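A toy model of why batching helps, assuming the simplification that each write has to rewrite every column once. The `ColumnTable` class and the sample fields are hypothetical, and the sketch counts rewrites rather than measuring time.

```python
class ColumnTable:
    """Toy column store that counts how often a whole column is rewritten."""

    def __init__(self, names):
        self.columns = {name: [] for name in names}
        self.rewrites = 0

    def write_batch(self, rows):
        """Append rows; each column is rewritten once per batch, not once per row."""
        for name in self.columns:
            self.columns[name] = self.columns[name] + [row[name] for row in rows]
            self.rewrites += 1

def make_rows(n):
    """Generate n made-up rows."""
    return [{"id": i, "qty": i % 7, "price": i * 1.5} for i in range(n)]

# Writing 1000 rows one at a time: every row touches every column.
one_by_one = ColumnTable(["id", "qty", "price"])
for row in make_rows(1000):
    one_by_one.write_batch([row])

# Writing the same 1000 rows as a single batch.
batched = ColumnTable(["id", "qty", "price"])
batched.write_batch(make_rows(1000))

print(one_by_one.rewrites, batched.rewrites)
```

The stored data ends up identical either way. Only the number of column rewrites differs, which is the cost the rule of thumb above is pointing at.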
Stonebraker's current track record (Score:5, Insightful)
(Last Journal: Friday December 01 2006, @10:51AM)
- He helped create THE first relational DB.
- He later moved to creating an Object-Oriented Relational DB with Postgres in the 80's. Much of that tech has found its way into other DBs such as Oracle and even helped create the OODBMS world.
- Now, he is creating the Column store DB and announces that this will be the next big thing.
I would listen to him. Biased or not, He has a better track record than most intelligent ppl (and all the wanna-be/hasbeens; dvrack comes to mind) in the tech field.
Re:Stonebraker's current track record (Score:5, Informative)
(Last Journal: Friday December 01 2006, @10:51AM)
Later, Stonebraker's work on postgres (theory AND code) was how to handle different datatypes within databases. He took an OO approach to that. That was directly used in Illustra and then went on to Informix. More importantly, Oracle used a lot of that work to create 8i, as have other DBs. IOW, he IS a leading theorist AND knows the code.
Considering that he has been on top of all the major advances within the DB world, why would you discount what the man says? As it is, you mention Gray and Mohan who both did some good work at IBM, but have not really advanced DBs forward that much. They simply moved the relational model DB forward (basically, they were red herrings). But Stonebraker is working across ALL the spectrums and contributes heavily to new models. His work is everywhere.
Finally, think about what he says. The column major is more useful for data warehousing BECAUSE it allows for data to be compressed quickly, tighter (which makes sense), AND allows you to work with just the data that you need. In a row major, you will end up creating and maintaining indexes to increase the speeds of reads. But an index is for the most part a single (or just a few) columns, which basically makes them a column major. But this requires LOADS of cpu and space to maintain. The column major approach simply keeps the indexes, if you will and discards the rows. This allows for FAST operations if you are doing LOADS of reads, and little changes. That is PERFECT for data warehousing.
So armed with that knowledge, exactly WHY would you discount his work and his statements?
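To make the compression point concrete, here is a small sketch using run-length encoding. RLE is picked here only as an example; the comment above does not name a scheme, and the sample values are made up.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical "region" column from a warehouse table.
region = ["east", "east", "east", "west", "west", "east"]
encoded = rle_encode(region)
print(encoded)
```

A column holds one datatype and often many repeated values, so runs like these are common. In a row layout the same values are interleaved with unrelated fields, which breaks up the runs.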
Re:They're not mutually exclusive. (Score:5, Insightful)
(Last Journal: Monday August 20, @01:07PM)
An opinion is subjective, but it's not necessarily biased. A disinterested observer could have an unbiased opinion.
C'mon, the guy is biased! (Score:5, Funny)
(http://www.networkmirror.com/ | Last Journal: Thursday July 05, @04:34PM)
Re:C'mon, the guy is biased! (Score:5, Funny)
(Last Journal: Tuesday June 06 2006, @01:50PM)
Mod Article -1 (Author doesn't get it) (Score:5, Informative)
dual-mode db? (Score:5, Interesting)
(http://www.devinmoore.com/ | Last Journal: Thursday May 24, @06:16AM)
well (Score:5, Informative)
(http://thepeckfamily.us/ | Last Journal: Thursday November 08, @11:19AM)
Rotate (Score:5, Funny)
>"column stores will take over the [data] warehouse market over time, completely displacing row stores."
Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness? Who knew it was so easy.
Re:Rotate (Score:4, Informative)
Re:Rotate (Score:5, Insightful)
(Last Journal: Tuesday October 17 2006, @12:18AM)
The guy... (Score:5, Interesting)
(http://www.intelligentblogger.com/ | Last Journal: Monday August 27, @11:47AM)
Stonebraker has been pushing the concept of column-oriented databases for quite some time now, trying to get someone, ANYONE, to listen that it's superior. While I think he has a point, I'm not sure if he really goes far enough. Our relational databases of today are heavily based on the ISAM files of yesteryear. Far too many products threw foreign keys on top of a collection of ISAMs and called it a day. Which is why we STILL have key integrity issues to this day.
It would be nice if we could take a step back and re-engineer our databases with more modern technology in mind. e.g. Instead of passing around abstract id numbers, it would be nice if we had reference objects that abstracted programmers away from the temptation of manually managing identifiers. Data storage is another area that can be improved, with Object Databases (really just fancy relational databases with their own access methods) showing how it's possible to store something more complex than integers and varchars.
The demands on our DBMSes are only going to grow. So there's something to be said for going back and reengineering things. If column-oriented databases are the answer, my opinion is that they're only PART of the answer. Take the redesign to its logical conclusion. Let's see databases that truly store any data, and enforce the integrity of their sets.
IMS--Hierarchical DB Still Exists (Score:5, Insightful)
(http://freejavalectures.googlepages.com/)
Re:IMS--Hierarchical DB harder to use? (Score:5, Interesting)
From a standard 3rd generation programming language one can read and write into flat files and we can do close to this with a hierarchical database.
We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.
What this means is that every transaction to and from the database must go through a literally horrible re-mapping, i.e. the language data structures do not correspond to the RDBMS data structures and vice versa.
As an example - in PostgreSQL, the last time I looked at writing a simple row into a table where there were something like 100 columns in the row...
In the 3rd generation programming languages this was just a simple structure with 100 entries.
The data transfer from that structure generated a function call with more than 1000 parameters. This was to be mapped and re-mapped with each call to transfer data, this is even though the structure itself is static and determined at compile time.
Next: There were about 10 parameters per field (column).
1: Column name
2: Column name length
3: data type
4: data length
5: character representation
finally 10: Address where the data lives.
The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.
Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be built and torn down and rebuilt for each call.
Next - the implementation was so bad that the program would run in test mode with only a few parameters but it failed when the whole row was to be transferred.
I gave up on that interface.
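For what it's worth, the "set up the mapping once, then reuse it" idea can be sketched like this. It uses Python's sqlite3 module as a stand-in, not the PostgreSQL C interface discussed above, and the table name `wide` and columns `c0`..`c99` are made up.

```python
import sqlite3

# Hypothetical 100-column table; the names c0..c99 are made up.
COLUMNS = [f"c{i}" for i in range(100)]

def prepare(conn):
    """Create the table and build the INSERT text once, up front."""
    column_defs = ", ".join(f"{name} INTEGER" for name in COLUMNS)
    conn.execute(f"CREATE TABLE wide ({column_defs})")
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    return f"INSERT INTO wide ({column_list}) VALUES ({placeholders})"

def write_record(conn, insert_sql, record):
    """Per call, only the record's values are passed; the mapping is reused."""
    conn.execute(insert_sql, [record[name] for name in COLUMNS])

conn = sqlite3.connect(":memory:")
insert_sql = prepare(conn)
for n in range(3):
    write_record(conn, insert_sql, {name: n for name in COLUMNS})
```

The column list is walked once to build the statement. After that each write hands over one sequence of values, which is roughly the shape of interface the comment is asking for.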
---------------
Oracle had pre-compilers. They did the same damn thing. The code generated by the pre-compilers was just awful.
---------------
While there is much good to say about RDBMSs in general, the issue I ran into was that the interface from 3rd generation languages took a HUGE step backward. IMHO we should have a high level language statement called DBRead() and DBWrite(). In C this should generally correspond to fread() and fwrite(). If this is too complex then DBWriteStruct() could be implemented with suitable mapping helper functions.
Nooo...
In the old days one could read and write into a flat file at a given location with a single statement or function call depending on the language. Of course "where" to read and write became a real issue and I do fully understand the complexity of file based tree structures and so forth, especially since I wrote a lot of code to implement these algorithms.
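A sketch of the flat-file style being described, assuming fixed-length records. The `db_write` and `db_read` names are hypothetical, loosely modelled on the DBWrite()/DBRead() wish above, and the record layout is invented for the example.

```python
import io
import struct

# Hypothetical record layout: 4-byte int id, 16-byte name, 8-byte float value.
# "<" selects little-endian with no alignment padding, so every record is 28 bytes.
RECORD = struct.Struct("<i16sd")

def db_write(f, slot, rec_id, name, value):
    """Write one record at a given slot, much like fseek() plus fwrite() in C."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode("ascii"), value))

def db_read(f, slot):
    """Read one record back from a given slot, much like fseek() plus fread()."""
    f.seek(slot * RECORD.size)
    rec_id, raw_name, value = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\0").decode("ascii"), value

# Any seekable binary file works; an in-memory buffer keeps the example self-contained.
f = io.BytesIO()
db_write(f, 0, 1, "alpha", 1.5)
db_write(f, 1, 2, "beta", 2.5)
db_write(f, 2, 3, "gamma", 3.5)
db_write(f, 1, 20, "delta", 9.0)   # overwrite slot 1 in place
print(db_read(f, 1))
```

The hard part mentioned above, deciding which slot to read or write, is not addressed here. That is what the indexes and tree structures are for.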
The thing is now we have RDBMS and other solutions that give us the data organisational abilities we need - and we lose the ease of mapping these structures into a suitable structure or object in the programming language.
I for one do not think we have stepped forward very far at all.
-------------
I'll toss in a case in point made by a good buddy of mine who just happens to be one of the top geophysical programmers in this city.
One of his clients was running an application hooked to an Oracle database running on a fast SUN. Run times were measured in close to a day.
Finally they removed the Oracle interface and replaced it with a glorified flat file. They clearly built in some indexing. The result is that the run times dropped to under 20 minutes.
As my buddy says - He will NOT use any RDBMS. He can take 5 views of the data comprising 1000's of seismic lines and the user can click on any trace number, line number, well tie and so forth and in real time he can modify all views of the data on as many as 5 s
Re:IMS--Hierarchical DB harder to use? (Score:4, Interesting)
(http://www.metatrontech.com/ | Last Journal: Sunday October 21, @01:39PM)
>From a standard 3rd generation programming language one can read and write into flat files and we can do close to this with a hierarchical database.
>What this means is that every transaction to and from the database must go through a literally horrible re-mapping, i.e. the language data structures do not correspond to the RDBMS data structures and vice versa.
>1: Column name
>2: Column name length
>3: data type
>4: data length
>5: character representation
>finally 10: Address where the data lives.
>The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.
>Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be built and torn down and rebuilt for each call.
Re : Are Relational Databases Obsolete? (Score:5, Funny)
It's like the packet of crisps that says "Is there a 20 pound note in here !!?" - the answer should always be 'No'.
Except maybe for one person.
sed -e 's/crisps/potato chips/' -e 's/pound/dollar/'
that doesn't mean they're going to become obsolete (Score:5, Insightful)
(Last Journal: Monday October 08, @07:57PM)
In fact, the new wave of user-generated-content websites and webapps seems to me to indicate the exact opposite - if anything, row-store databases, with their usefulness in write-heavy applications, should be becoming more and more necessary/useful on the web.
So...chalk this one up to some grandstanding on the part of a guy who wants to put more money in his pockets...
Aha! (Score:5, Funny)
(http://ofteninspired.com/ | Last Journal: Sunday April 01 2007, @05:49PM)
turning your head sideways.
Should be, but isn't, and won't. (Score:5, Interesting)
(http://www.unanimocracy.com/about.html | Last Journal: Tuesday April 04 2006, @12:04PM)
It is very frustrating because we do have programmers on staff that create third party plug-ins to these databases to try to make solutions that the OEM code doesn't. When you meet younger programmers, many of them are frustrated themselves to work on ancient solutions that have no hope of being upgraded, because these industries we work in are not in a rush to try anything new and shiny, but instead are happy with the status quo.
I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is." That's money thrown out the window, employees who are frustrated (without knowing why), and forcing the company to lose efficiency by not being able to compete with newer companies that are utilizing newer technology to better their bottom line.
Ugh.