Are Relational Databases Obsolete?
jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created the Ingres and Postgres technology while a researcher at UC Berkeley in the early 1970s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."
Mod Article -1 (Author doesn't get it) (Score:5, Informative)
well (Score:5, Informative)
Re:dual-mode db? (Score:3, Informative)
Marketing hype by FUD.. typical (Score:2, Informative)
"Column-oriented databases -- such as the one built by Stonebraker's latest start-up, Andover, Mass.-based Vertica Systems Inc. -- store data vertically in table columns rather than in successive rows. "
Marketing hype for his startup.
What a sleazeball.
Re:Rotate (Score:4, Informative)
Are relations obsolete? (Score:5, Informative)
Re:dual-mode db? (Score:3, Informative)
The FA threw me for a loop a couple of times. I honestly _did_ try to read it.
If you're in a situation where you're mostly reading, with (likely) only one infrequent writer, wouldn't eliminating the overhead of a database service entirely be desirable?
I can't think of a situation where you would want many frequent writers to a column store schema, again, correct me if I'm off.
rtfa before posting (Score:4, Informative)
To add some content, this is about optimal storage for SQL databases in a data warehouse context, where there are some interesting products that use something better suited than the one-size-fits-all solutions currently available from the big RDBMS vendors. The API on top is the same (i.e. SQL and other familiar data warehouse APIs), which makes it quite easy to integrate.
Regarding the obsolescence question, one-size-fits-all will be good enough for most people for some time to come. Increasingly, people are more than happy with lightweight options that are even less efficient, on top of which they slap persistence layers that reduce performance even more, just because that lets them autogenerate all the code that deals with stuffing boring data into some storage. Not having to deal with that makes it irrelevant how the database works and lets you focus on how you work with the data rather than worrying about tables, rows and ACID properties. Autogenerating the code that interacts with the database allows you to do all sorts of interesting things in the generated code and the layers underneath. For example, the Hibernate (a popular persistence layer for Java) people have been integrating Apache Lucene, a popular search index product, so that you can index and search your data objects using Lucene search queries rather than SQL. It's quite a neat solution that adds real value (e.g. fully text-searchable product catalogs are dead easy with this).
Column-based storage is just an optimization and not really that critical to the applications on top. If you need it, there are some specialized products currently. The author of the column is probably right about such solutions finding their way into mainstream products really soon. At the application level, you'll still be talking good old SQL to the damn thing though.
Are Relational Databases Obsolete? Not at all (Score:4, Informative)
The relational concept will still exist regardless of the underlying storage methods.
Re:Should be, but isn't, and won't. (Score:3, Informative)
You do know those aren't remotely comparable, right? FoxPro scales to more users than Access (due to tables separated into different files), but they're otherwise on a similar level in terms of what sort of jobs they're appropriate for. MS SQL Server is a full-fledged enterprise RDBMS. It may not scale quite as far as Oracle or DB2, but it gets closer every generation, and having worked mostly in Oracle for the last year or so, I've been missing SQL Server.
I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is."
That was probably a polite way of saying "We don't believe you." Maybe you made a case to them for why relational is obsolete, but you certainly didn't here.
Re:dual-mode db? (Score:3, Informative)
If you are doing killer aggregates (tell me the sum of the sales in every month for the last 25 years), you are going to be limited by possibly 2 things: CPU cycles and disk I/O throughput.
There are several ways of addressing these issues. Basically this means either optimizing or parallelizing. Column-oriented stores are likely to help optimize the disk I/O throughput so you can just throw more processor effort at the problem.
You can also do what Teradata and Bizgres MPP do, which is basically to break the database into lots of little databases running on different servers and distribute the disk and processor time that way. IIRC Oracle and DB2 also offer this sort of option. Depending on how this is set up, it may be possible to use it in an OLTP environment as well.
On the other hand, suppose you are doing a lot of updates in a high transaction load environment. Which would you rather do? Update each column value (and skip around the disk doing this) or update the entire row on one disk block?
Column-oriented databases are helpful for some things. They are not the only solution to the problem out there. And they are still relational. Hint: They are still based on Edgar Codd's relational mathematics.
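To make the trade-off above concrete, here is a minimal in-memory sketch. It is not how any real column store is implemented; the sales data and all function names are made up for illustration. It only shows why an aggregate touches less data in a column layout while a single-record update touches more places.

```python
# Hypothetical sales data: (month, region, amount). Purely illustrative.
rows = [
    ("2007-01", "east", 100),
    ("2007-01", "west", 250),
    ("2007-02", "east", 175),
    ("2007-02", "west", 125),
]

# Row-oriented layout: each record is stored together, so an aggregate
# over one field still walks every full record.
def total_row_store(records):
    return sum(record[2] for record in records)

# Column-oriented layout: one sequence per column, so the same aggregate
# reads only the "amount" sequence and never looks at the other columns.
columns = {
    "month": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def total_column_store(cols):
    return sum(cols["amount"])

# Updating one record replaces a single entry in the row store...
def update_row_store(records, index, new_record):
    records[index] = new_record

# ...but has to write into every column sequence in the column store.
def update_column_store(cols, index, new_record):
    for name, value in zip(("month", "region", "amount"), new_record):
        cols[name][index] = value
```

Both layouts answer the same query with the same result; the difference is only in how much unrelated data each one has to pass over, which on disk translates into I/O.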
Re:IMS--Hierarchical DB harder to use? (Score:3, Informative)
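Here's how I write out a customer record:
# $dbh is a connected DBI handle; %customer holds the record's fields
$dbh->do(
    'insert into customer (id,name,yada1,yada2,yada3) values (?,?,?,?,?)',
    undef,
    # hash slice; qw() quotes the keys so this also runs under "use strict"
    @customer{qw(id name yada1 yada2 yada3)}
);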
I think that's even easier than your 3rd-gen code, and I didn't have to write my own indexing code.
Re:Yea, it's all the same. (Score:3, Informative)
I actually wonder if some of the current databases such as Microsoft SQL Server, etc. aren't going to actually start morphing into these older styles of databases due to the increase in use of XML files/data, which by their very nature are hierarchical and kind of "multidimensional". Actually, the company that now maintains PICK/D3 (Raining Data) has an interesting XML database (Tigerlogic) that uses some of the old technology and new technology combined.
This could be a great and possibly painful experience of history repeating itself!
SixD
ODBMS (Score:3, Informative)
1. Object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, and Visual Basic.
2. ODBMSs use exactly the same model as object-oriented programming languages.
3. It is also worth noting that object databases hold the record for the world's largest database (over 1000 terabytes at Stanford Linear Accelerator Center).
4. Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly without a search, by following pointers (e.g. the objects are stored in trees for fast retrieval). Dynamic indexing schemes further speed up retrieval in full-text searches.
5. Provides data persistence to applications that are not necessarily 'always on' - e.g. HTTP based stateless applications.
I think RDBMSs will be around for some time -- but they will be relegated to more structured situations and smaller data sets. ODBMSs will take over where data is changing, persistence is critical, data types are mostly large binary objects with associated meta-data, and datasets are humongous.
Right now my favorite ODBMS is the ZODB (Zope Object Data Base) [wikipedia.org] - an ODBMS tightly integrated with both Python (implemented using Python's native 'pickle' object persistence functionality), and the Zope web application development system - which itself is built with and uses Python. You can learn more about Zope at Zope.org [zope.org].
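Point 4 in the list above (following pointers instead of joining) can be sketched in plain Python. This is only an in-memory illustration and not ZODB or any real ODBMS; the customers, orders, and function names are all hypothetical.

```python
# Relational style: two "tables" linked by a foreign key. Finding a
# customer's orders means matching keys across both tables (a join).
customers = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [
    {"id": 10, "customer_id": 1, "item": "widget"},
    {"id": 11, "customer_id": 2, "item": "gizmo"},
    {"id": 12, "customer_id": 1, "item": "gadget"},
]

def items_via_join(customer_name):
    return [
        o["item"]
        for c in customers if c["name"] == customer_name
        for o in orders if o["customer_id"] == c["id"]
    ]

# Object style: each customer holds direct references to its own orders,
# so no key matching is needed once you have the customer object.
class Order:
    def __init__(self, item):
        self.item = item

class Customer:
    def __init__(self, name):
        self.name = name
        self.orders = []

alice = Customer("alice")
alice.orders.append(Order("widget"))
alice.orders.append(Order("gadget"))

def items_via_pointers(customer):
    return [order.item for order in customer.orders]
```

The naive join here scans every order, which a real RDBMS would avoid with an index, so treat this as a picture of the two access patterns rather than a performance comparison.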
Re:Yea, it's all the same. (Score:3, Informative)
The easiest way to deal with proliferating events is to create a very simple table that has a timestamp, your basic audit information (the user who made the change, the terminal the change was made from, etc.), and the change itself.
So say Bob makes 50,000 dollars. This entry was put in the table when he was hired and contains Bob's employee record id, Bob's salary, the date, and the audit crap. That's it. Then when Bob gets a raise to 55,000 there is another simple entry: id, salary, date, audit crap. Etc., etc. All your data is there, you can easily retrieve the history, and you know when the changes were made and by whom.
It's all about normalization. Why put in two date fields if you don't have to? Two records, each with one date, will give you all the info you need, simplify your queries, whiten your teeth, etc. Whenever you have an event-driven model, just throw the event, in the simplest possible form, in a table. If your tables start proliferating out of control, check your normalization and make sure you're not duplicating data across multiple tables. If that's not the problem, try to refine the scope of your database. If it's doing too many things, then try to break things out by their relevance.
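The salary example above could look roughly like this with Python's built-in sqlite3 module. The table name, column names, dates, and the "hr_admin" user are assumptions made up for the sketch; only the 50,000 and 55,000 figures come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE salary_event (
        employee_id INTEGER NOT NULL,
        salary      INTEGER NOT NULL,
        changed_on  TEXT    NOT NULL,  -- single date per event (ISO format)
        changed_by  TEXT    NOT NULL   -- audit: who made the change
    )
    """
)

def record_salary(employee_id, salary, changed_on, changed_by):
    # Every change is a new row; existing rows are never updated.
    conn.execute(
        "INSERT INTO salary_event VALUES (?, ?, ?, ?)",
        (employee_id, salary, changed_on, changed_by),
    )

def salary_history(employee_id):
    return conn.execute(
        "SELECT changed_on, salary FROM salary_event "
        "WHERE employee_id = ? ORDER BY changed_on",
        (employee_id,),
    ).fetchall()

def current_salary(employee_id):
    # The current value is simply the most recent event.
    row = conn.execute(
        "SELECT salary FROM salary_event "
        "WHERE employee_id = ? ORDER BY changed_on DESC LIMIT 1",
        (employee_id,),
    ).fetchone()
    return row[0] if row else None

# Bob (hypothetical employee id 1) is hired, then gets a raise.
record_salary(1, 50000, "2006-03-01", "hr_admin")
record_salary(1, 55000, "2007-03-01", "hr_admin")
```

With one date per row there is no "valid from / valid to" pair to keep consistent; the end of one salary period is implied by the start of the next.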
Re:Yea, it's all the same. (Score:2, Informative)
http://en.wikipedia.org/wiki/Pick_operating_syste
PIC is something completely different.
We're talking about this topic at geekSessions (Score:3, Informative)
Josh Berkus from the PostgreSQL core team
Paul Querna from Apache and Bloglines (wrote his own filesystem for Bloglines)
Chad Walters from Powerset who is implementing BigTable there.
Hope to see you there!
Re:Yea, it's all the same. (Score:3, Informative)
Note: I may be somewhat biased because of my long use of relational databases and a really bad experience updating a jBase solution that really was "long in the tooth". You might argue that I just saw a bad example of a multivalue database application, but I think that it goes deeper than that, or else we would all be using PICK today and not some flavor of SQL.
Re:Yea, it's all the same. (Score:2, Informative)
Actually, Stonebraker did say, and I quote the original article that you went back and read (with a dereferenced pronoun in []),
He called the existing major RDBMSs legacy and long in the tooth for not implementing the feature he is trumpeting, column storage, as an option when setting up a DB for use. He laments the fact that these vendors don't give you the option of setting up your DB in a way that provides huge performance gains for some usages that are becoming more common every day, and says that there will be a revolution.
He is not saying that we don't still need row-oriented RDBMSs...which makes me wonder, does Vertica support both row and column stores?
Arther makes great point, complex subject (Score:3, Informative)
Re:Stonebraker's current track record (Score:5, Informative)
Later, Stonebraker's work on Postgres (theory AND code) was on how to handle different datatypes within databases. He took an OO approach to that. That was directly used in Illustra and then went on to Informix. More importantly, Oracle used a lot of that work to create 8i, as have other DBs. IOW, he IS a leading theorist AND knows the code.
Considering that he has been on top of all the major advances within the DB world, why would you discount what the man says? As it is, you mention Gray and Mohan, who both did some good work at IBM but have not really advanced DBs forward that much. They simply moved the relational model DB forward (basically, they were red herrings). But Stonebraker is working across ALL the spectrums and contributes heavily to new models. His work is everywhere.
Finally, think about what he says. Column major is more useful for data warehousing BECAUSE it allows data to be compressed quickly and more tightly (which makes sense), AND allows you to work with just the data that you need. In row major, you end up creating and maintaining indexes to speed up reads. But an index is for the most part a single column (or just a few), which basically makes it column major. And this requires LOADS of CPU and space to maintain. The column major approach simply keeps the indexes, if you will, and discards the rows. This allows for FAST operations if you are doing LOADS of reads and few changes. That is PERFECT for data warehousing.
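A rough sketch of why a column compresses well: values in one column tend to repeat, so even simple run-length encoding shrinks it, and some aggregates can run on the compressed form directly. Run-length encoding is just a stand-in chosen for illustration, not necessarily what Vertica or any other product uses, and the sample columns are made up.

```python
from itertools import groupby

def rle_encode(values):
    # Collapse each run of equal adjacent values into (value, run_length).
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(pairs):
    # Expand (value, run_length) pairs back into the original sequence.
    return [value for value, count in pairs for _ in range(count)]

def rle_sum(pairs):
    # Sum a numeric column without decompressing it.
    return sum(value * count for value, count in pairs)

# Hypothetical columns from a warehouse table, stored separately.
region_column = ["east", "east", "east", "west", "west", "north"]
quantity_column = [5, 5, 5, 5, 2, 2]

encoded_region = rle_encode(region_column)
encoded_quantity = rle_encode(quantity_column)
```

In a row layout the same values are interleaved with other fields, so runs like these rarely sit next to each other and this kind of encoding has much less to work with.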
So armed with that knowledge, exactly WHY would you discount his work and his statements?
Re:Yea, it's all the same. (Score:1, Informative)