Catch up on stories from the past week (and beyond) at the Slashdot story archive

 



Forgot your password?
typodupeerror
×
Databases Programming Software IT Technology

Is the Relational Database Doomed? 344

DB Guy writes "There's an article over on Read Write Web about what the future of relational databases looks like when faced with new challenges to its dominance from key/value stores, such as SimpleDB, CouchDB, Project Voldemort and BigTable. The conclusion suggests that relational databases and key value stores aren't really mutually exclusive and instead are different tools for different requirements."
This discussion has been archived. No new comments can be posted.

Is the Relational Database Doomed?

Comments Filter:
  • by bogaboga ( 793279 ) on Friday February 13, 2009 @05:48PM (#26849855)

    It has been suggested before that the life of the relational DB is coming to an end. I must say that while I agree with this statement: -

    Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale.

    I disagree with the following statement: -

    Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming, and the characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.

    I submit that the complexity can be managed and that's why we have jobs.

    I am an IT consultant at a major bank and we keep all kinds of data. Data that many find useless and is spread across 27 [major] nodes. Total records in our biggest table number about 57 million with 49 rows. I can tell you that data querying and integrity maintaining are a breeze if the schematic design is correct in the first place.

    We are always designing and testing different scenarios. In cases where we have had to change the schema, it has been simple if one knows what to do.

    I must say that Open Source DBs have worked for us though we rely on products from IBM and Oracle.

    Our philosophy is: If it works in PostgreSQL, it will even do wonders on DB2 or Oracle. I do not see how we can do away with the relational DB. Whoever designed it in the beginning did a marvelous job.

  • by mlwmohawk ( 801821 ) on Friday February 13, 2009 @06:05PM (#26850079)

    The relational database is not going anywhere and nothing in that article is based on any firm understanding of managing data.

    Is the notion of a "join" obsolete? No, but it is typically impractical in a high volume system. You would probably use denormalization as a strategy.

    Scaling many nodes? OK, you still gotta put your data "in" something.

    key/value indexing? yawn. select val from keyvalue_tab where key = foo;

    The value can be basically anything, and most "relational" databases have good object support as well as XML, JSON, etc.

    So we can establish that a SQL relational database can do *everything* a simpler system can do. Now, think about ALL the things you can do with your data in a real database.

    What is the point of using a limited and less functional system? A good system, like Oracle, DB2, PostgreSQL, etc (!mysql of course) will do what you need AND allow you do do more should you be successful.

    The problem with data is two fold: Managing read/write/deletes and finding what you are looking for. These problems have been solved. A good database will do this for you. Want to store object? XML, JSON, binary objects, or a specialized database extension works perfectly.

  • by iamhigh ( 1252742 ) on Friday February 13, 2009 @06:12PM (#26850195)
    Does that example of a relational DB have a serious error, or is that just me? Why have make key in two tables?

    He lost cred right then.
  • by Eravnrekaree ( 467752 ) on Friday February 13, 2009 @06:14PM (#26850223)

    Actually i read TFA, and I just couldnt make sense of the benefits offered by the key value thing. You basically should be able to get the same benefits with a relational database system with a query that does a lookup on a single column index. This would involve searching the b-tree for that column, which would yield a row data address of some sort, to either a linked list of cells or a list of addresses of those cells. Once the single b-tree is done it is then very fast to find the other column values in that row. The b-tree or other index lookup also has to be done with the key value pair, the relational is just a collection of multiple key value indexes.

    There is the issue of having a variable number of pieces of data linked to a certain key. But you can do this in relational too. Just create a table with an id column, value type column and value column. A well designed relational, if you do a query on the id column, the b-tree will lead to data which has all of the row data addresses in the database that match the id. EAch of those rows will contain a different data type/data payload for the id. This is again pretty much as fast as a simple single index database.

  • Re:new record (Score:1, Informative)

    by Anonymous Coward on Friday February 13, 2009 @07:12PM (#26850815)
    And now the first search result on Google for Jah-Wren Ryel returns a conversation discussing whether or not he's a child molester.
  • by cat_jesus ( 525334 ) on Friday February 13, 2009 @07:21PM (#26850897)
    Total records in our biggest table number about 57 million with 49 rows.

    I think you mean columns.
  • Re:Here's a match.. (Score:5, Informative)

    by DarkOx ( 621550 ) on Friday February 13, 2009 @07:38PM (#26851083) Journal

    Wow, um where to being really....

    So you realize that the structure you are suggesting can be easily built in a traditional RDB, using a star-schema or cluster design right?

    Next you suggest doing the sorting on the client, and then say that if there is more data then a client can handle the server can be asked to send chunks according to the clients sort order. That means the server has to have all the sort logic the client has and probably in all but the most trival applications do all the sorting anyway... Seems to me a star schema and indexing the fact table on the attributes that are most comonly going to be used for sorting makes much more sense; because as I said the serve is going to be sorting anyway.

    Now there are data sets that non relational structers do make some more sense, but we have hierarchy , and navigational designes for those, yours is not one of them.

  • Re:new record (Score:3, Informative)

    by Pseudonym ( 62607 ) on Friday February 13, 2009 @07:57PM (#26851313)

    This is what linguists refer to the "tabloid headline question mark". Its use is to say something inflammatory and only tangentially related to the story in order to get readers.

    Examples:

    "Is Jennifer pregnant?"
    "Steve Ballmer: Love child of Satan?"

  • by WuphonsReach ( 684551 ) on Friday February 13, 2009 @10:56PM (#26852729)
    Over 15 years ago Paradox's query-by-example was light-years ahead of today's soul-killing SQL crap.

    QBE grids are nothing more then a UI abstraction of the underlying SQL SELECT statement. In fact, in MS-Access (which has a QBE grid), you can flip between looking at the QBE and looking at the raw SQL SELECT statement.

    Sometimes it's faster to do it in raw SQL, sometimes it's faster to setup the query in a QBE grid.
  • by DougWebb ( 178910 ) on Friday February 13, 2009 @11:42PM (#26852999) Homepage

    Map/Reduce [wikipedia.org] was developed at Google. It's a bit tough to wrap your head around at first, and once you get it you wonder what the big deal is, until you realize how suitable it is for Google's datacenters.

    Basically, you take a dataset (a bunch of key/value pairs) and a mapping function, and you run the mapping function over every item in the dataset. This gives you an intermediate dataset with different keys and values. You then run that through a reducing function, which produces your final dataset. This can be a single result, or a dataset that can then be processed with a different map/reduce pair of functions.

    The big deal for Google is that many of their problems can be expressed in terms of map and reduce functions that can operate in parallel over their datasets, and that their datacenters can handle absolutely enourmous quantities of parallel operations. So, for the mapping operation, they take the original dataset and mapping function, subdivide the dataset over thousands of servers, and let them run the mapping function in parallel. When these servers return their results, it's common for many different servers to return the same or related keys in the intermediate set. These are collated, so that when the intermediate dataset is distributed with the reduce function, all of the values with the same keys go to the same servers. This helps the reduce function to be run in parallel; it's often counting the number of original items that were assigned to the same key in the intermediate set.

  • by Anonymous Coward on Saturday February 14, 2009 @01:35AM (#26853535)

    E F Codd, an IBM mathematician. And I won't even look at a technology that claims to replace the RDB until I've seen a fully developed mathematical treatment that at least approaches the sophistication of Codd's work.

  • by encoderer ( 1060616 ) on Saturday February 14, 2009 @02:12AM (#26853695)

    Suggesting that you could replace a MS-SQL server with SQLite basically forces anybody in the know to ignore every other point you make.

    MySQL is good, unless you need a highly performent query analyzer.

    Postgres is good, unless you need actual replication features.

    SQLite is good, if your datastore is less than 1GB.

    Oracle is no-doubt a valid replacement and improvement upon SQL Server. And I use MySQL more than any other DB. But you need to hire Percona to get the same performance out of MySQL that you get from SQL Server out of the box.

  • by SanityInAnarchy ( 655584 ) <ninja@slaphack.com> on Saturday February 14, 2009 @02:51AM (#26853877) Journal

    Suggesting that you could replace a MS-SQL server with SQLite basically forces anybody in the know to ignore every other point you make.

    You're assuming that the person using MS-SQL Server knows what they're doing. How do you know it's more than just a glorified Access database?

    MySQL is good, unless you need a highly performent query analyzer.

    In other words, the query analyzer is slow? Because the queries work well enough.

    Postgres is good, unless you need actual replication features.

    Like these [tinyurl.com]?

    SQLite is good, if your datastore is less than 1GB.

    Another quick Google, and we find these limits [sqlite.org] -- by default, the maximum database size is just under 32 terabytes.

    Not that I'm suggesting it's a good choice at that point, especially with multiple processes. But it does make it kind of hard to take you seriously with that kind of imagined limit, unless you're suggesting there's a practical, performance wall after 1 gig.

  • Re:Here's a match.. (Score:1, Informative)

    by Anonymous Coward on Saturday February 14, 2009 @02:12PM (#26857107)

    Need to die? Really. I'm not starting a pissing contest but I work on a 300GB database with single tables containing 700,000,000+ rows. And in the right hands, this stuff flies.

    And I think you go on to invent hierarchical databases. From personal experience, 1993 is calling. (And your indexing may get complicated. Wonder if anyother systems have had to address this?)

    Sorting on the client? Comedy gold. There are lots of way to sort but the big-ass server can usual cope. You're not suggesting something that might involve sorting a few million rows across the network (in chunks) just to return the first 100 or so? That would be stupid. If you're serious I'll sell you lots of hardware and bandwidth that's completely unnecessary.

    RBDMSes work. Other stuff will work too. Doesn't mean RBDMS need to die.

    As for faster read performance, it'll depend on what your doing. But I can pull numbers out of my arse/ass too.

  • by arevos ( 659374 ) on Sunday February 15, 2009 @09:34AM (#26862717) Homepage

    What you call a "hash table database" others might call an "indexed cursor".

    Others would be wrong ;)

    An indexed cursor only contains a reference to the original data. Memcached contains a duplicate of the original data, so I'd argue it was a database in its own right.

    However, even if Memcached doesn't meet the criteria of a database, DBM [wikipedia.org]-based databases certain do. They operate on a similar principle; a unique key points to a specific piece of data. Unlike Memcached, they are persistent, but like Memcached they are very fast and easily scalable.

    I was asking for an example of a data storage technique that scales better than RDB.

    Well, consider a modern DBM-based database like Tokyo Cabinet [sourceforge.net]. Let's say we want to distribute it evenly across 16 machines, labelled 1 to F. When a request for data comes in, we MD5 the key and use the first 4 bits to determine the machine to use. This gives us an even and consistent spread of data between machines.

    Relational databases can't easily use the same trick, because table joins are very costly to perform if the table data is distributed across several machines. In a nutshell, the flexibility of relational databases reduces their speed and scalability compared to databases with a more limited scope.

He has not acquired a fortune; the fortune has acquired him. -- Bion

Working...