Forgot your password?
typodupeerror
Databases Programming Software IT Technology

Is the Relational Database Doomed? 344

Posted by ScuttleMonkey
from the just-more-tools-in-the-bucket dept.
DB Guy writes "There's an article over on Read Write Web about what the future of relational databases looks like when faced with new challenges to its dominance from key/value stores, such as SimpleDB, CouchDB, Project Voldemort and BigTable. The conclusion suggests that relational databases and key value stores aren't really mutually exclusive and instead are different tools for different requirements."
This discussion has been archived. No new comments can be posted.

Is the Relational Database Doomed?

Comments Filter:
  • new record (Score:5, Interesting)

    by hguorbray (967940) on Friday February 13, 2009 @05:14PM (#26849365)

    that's efficient -a summary that refutes the inflammatory headline

    I'm just sayin'

  • Yes, but not soon. (Score:4, Interesting)

    by pwnies (1034518) * <j@jjcm.org> on Friday February 13, 2009 @05:17PM (#26849419) Homepage Journal
    The flexibility offered in key/value databases is simply too good of a feature to pass up. However, do you really think you can get people to give up MSSQL? It'll be nice for smaller projects, but corporations wont even consider it for a number of years.
  • by SanityInAnarchy (655584) <ninja@slaphack.com> on Friday February 13, 2009 @05:24PM (#26849503) Journal

    do you really think you can get people to give up MSSQL?

    In favor of MySQL, PostgreSQL, SQLite, even Oracle, yes, I do.

    corporations wont even consider it for a number of years.

    You must have some specific corporations in mind, because I've known many corporations to use each of the above technologies. In fact, SQLite is one of the most popular databases ever.

    No, the reason it's not soon is because these other ones (CouchDB) aren't mature, and the ones that are (BigTable) aren't available at any price.

  • by Mr. Underbridge (666784) on Friday February 13, 2009 @05:27PM (#26849561)
    This same basic story keeps getting submitted from the same group of people who are generally trying to sell non-relational-DB stuff. This is an ad. Move along.
  • by Ckwop (707653) <Simon.Johnson@gmail.com> on Friday February 13, 2009 @05:29PM (#26849585) Homepage

    99.9% of database claim to follow the relational model.

    The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.

    So the answer is a simple, emphatic, no.

  • by Anonymous Coward on Friday February 13, 2009 @05:34PM (#26849659)

    Ugh, yet another superficial blog post pimped out on slashdot. The guy doesn't have a solid technical grasp about data system and what really constitutes the difference between a system like BigTable or SimpleDB versus an RDBMS. Instead of talking about the differences in transaction management, consistency guarantees, etc. he comes up with brilliant ideas like RDBMSes are slower because they are more consistent.

    Enough with the bad blog posts already, it's like facebook, only less interesting.

  • Not buying it. (Score:5, Interesting)

    by reginaldo (1412879) on Friday February 13, 2009 @06:13PM (#26850203)
    In theory, I agree the most costly actions in a database are joins. It seems like the key/value model is a great solution to this, on the surface. However, what the key/value model does is push the cost to the application layer. Instead of ensuring relational integrity and conformity in the database, suddenly all app code has to do this on the frontend. Also, instead of managing this process in a single place, suddenly this process is distributed among multiple methods. Sure, the DB is more scaleable, but suddenly the app is a mess.
  • Here's a match.. (Score:3, Interesting)

    by Slicker (102588) on Friday February 13, 2009 @06:20PM (#26850277)

    Relational databases need to die. I loved them and preached the goodness of them 10 years ago, but they are just too rigid for contemporary needs. I've learned better ways of organizing and filtering data.. but the old RDBMS school is too canonical (stubborn) and self-indulging to realize that needs are changing and their model doesn't fit.

    We need efficient attribute/value models. We need to stop referencing data by where it is and start referencing it by what it is. There is too much data that needs to exist in different views, based on policy--not explicit placement.

    Dumb-tags (attributes without values) like those used with Delicious bookmarks are also broken. They are too vague.

    My own approach is that every attribute may have any number of value instances. Each value instance may, in turn, have sub-attributes. So you can look up data based on its characteristics even with disregard for its name. For example: /mycompany/mailserver1/ip of zone = infirewall

    This returns all IP addresses under the "zone" attribute while also under the mailserver1 attribute that is under the mycompany attribute.

    When validating instances of the "ip" attribute, it looks backward in the path because it is extremely quick that way.

    The data server's sole responsibility is storing and retrieving information (not just data) in context (aka filtering).

    Sorting is the responsibility of the client. This makes sense because there are an infinite number of algorithms one could have for sorting data (e.g. alphabetic mixed case, ASCII order, etc). To facilitate this, I wrote a method to return the number of values that would be returned if the values were requested. If too big a bite for the client, it can re-request the size of a smaller chunk, segmented according to the client's ordering method. This is useful for scale, in any case. Processing in chunks makes sense whether over a network of limited capacity or from directly form disk with limited memory.

    And--this is a columnar approach like Google's BigTable is.. That means you get 10+ times faster read performance.

    Matthew

  • by Cajun Hell (725246) on Friday February 13, 2009 @06:27PM (#26850351) Homepage Journal
    Documented here [thedailyshow.com].
  • by photon317 (208409) on Friday February 13, 2009 @07:16PM (#26850865)

    Yes, these newer simple key/value databases like BigTable and CouchDB are effectively a subset of RDBMS functionality, so of course the same thing can be implemented relationally by just not using features.

    The reason these projects have taken off is that the relational features being skipped comprise most of the complexity of an RDBMS. Without them, it's relatively trivial to write new database engines from scratch instead of re-using MySQL, PostgreSQL, and so-on. These new feature-poor rewrites can take on many challenges that are harder for the big relational guys, like stellar performance on huge datasets, and being truly distributed in nature.

  • by Savantissimo (893682) on Friday February 13, 2009 @07:55PM (#26851303) Journal

    SQL and all its pointy-headed progeny are the real problem with databases, not the relational vs. newMarketingBuzzwordDuJour arguments.

    Database operations do not need to look like code or algorithms, the only reason they do is to provide jobs for database programmers.

    Over 15 years ago Paradox's query-by-example was light-years ahead of today's soul-killing SQL crap.

    SQL is not going away, though, any more than its idiot older brother Mumps (M, Caché).

  • by mlwmohawk (801821) on Friday February 13, 2009 @08:01PM (#26851357)

    no, the relational database is not going anywhere, you are correct. but, that does not mean that there aren't instances where a non-relational database, with the addition of map/reduce, aren't extremely useful.

    A relational database is a lot like C++. It provides the tools for relationships but that does not mean you have to use in that way or that it is sub-optimal.

    non-relational databases have been around for decades, and are in use for quite a number of applications involving rapid development and storage of very large records. couple this with map/reduce, and you have the ability to scale quickly with very large datasets.

    What is a large data set? I have a few PostgreSQL databases with some pretty HUGE data sets, and it blows the doors off anything else I could use.

    scaling quickly is a very difficult problem to solve with an RDBMS - you either need to continue to throw more hardware at the problem, to the point of diminishing returns, or re-architect your data at the cost of possible significant downtime, while still attempting to serve up the data in a timely manner.

    Without any details this sounds like an urban legend. If you designed your system as you would have with a lesser system like a simple "key/value" pair, how would a RDBMS be any different?

    i've been deep in the bowels of oracle RAC, fighting to get just 5% more speed out of a query over a billion rows and realizing that i have to start over with a new schema, just to squeeze more data out. compare that to simply adding another machine and letting the map functionality run across one more cpu before returning it for the reduce.

    Why do you need to start over with a new schema? Why not simply denormalize some of the tables. You can alter the table on line.

    I don't mean to impugn your abilities, but if you can't get oracle to do what you need, you aren't going to get a lesser system to do better.

    once again, correct, but having to denormalize to a snowflake or a star isn't always the best solution. you're taking the best parts of the relational database model, and throwing them out - normalization, referential integrity, just to squeeze more out of something that may not be the best tool for the job.

    Normalization is a tool. Denormalization is also a tool. They are approaches to handling data. The RDBMS doesn't care. Why bitch about referential integrity if you aren't going to use it anyway?

    do you hammer with a wrench? i have before, and i managed to hurt my thumb.

    Poor analogy. Do you use the bottle opener screwdriver in your swiss army knife or do you use a real screw drive?

  • by Tablizer (95088) on Friday February 13, 2009 @08:03PM (#26851375) Homepage Journal

    Some of those systems appear to more or less still be "relational". If each row is treated like a map (associative array) of strings, then the "schema" for a given table is the set union of all attributes used in the table, and non-existing columns for a given row can be treated as nulls.

    As long as an asterisk is not used in a query (ex: "select * from tableX"), then it will pretty much act like existing RDBMS, and as long as the type-explicitness issues are resolved based on dynamic language conventions. (Asterisks can be implemented perhaps, but it could be computationally expensive.)

    It's kind of like dynamic (AKA "scripting") languages versus static or type-heavy languages. The static kind of languages requires more up-front info that "protects" the integrity of the thing at the expense of flexibility and declaration volume. The same dichotomy can be applied to RDBMS also. We have RDBMS that like a lot of info up-front, and now those which accept incremental or ad-hoc insertions are starting to be common (but still less standardized).

    And constraints can be incrementally added, such as later requiring that every new record in a "Cars" table have a value for "brand" or the like.

    One possible exception is that there were some examples that violated "map-ness" of records, such as having two colors for a car. If they instead supplied "color_1" and "color_2", then map set rules would not be violated, keeping it closer to true relational.

    In short: We don't have to abandon relational to get dynamism.

  • by arevos (659374) on Friday February 13, 2009 @08:47PM (#26851803) Homepage

    99.9% of database claim to follow the relational model.

    The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.

    Uh, actually, relational databases are pretty damn hard to scale. That's basically the main problem with them. Why do you think relational databases are so often paired with a cache made from a hashtable-based database?

  • by DougWebb (178910) on Friday February 13, 2009 @11:22PM (#26852855) Homepage

    Without any details this sounds like an urban legend. If you designed your system as you would have with a lesser system like a simple "key/value" pair, how would a RDBMS be any different?

    The difference is optimization vs generalization. Many problems can be handled using simple key/value pair relationships. You can model this in an RDBMS using two-column tables that you never join across, where all of your queries are SELECT val FROM tab WHERE key=? and INSERT INTO tab (key,val) VALUES (?,?). However, if you use the RDBMS this way, you're paying for the overhead of the SQL engine, (usually) a client/server connection, and your language's library for interacting with an RDBMS.

    The alternative is a non-relational database like BerkeleyDB [wikipedia.org], which is optimized for key/value pair operations. All the fetch and store operations do is fetch and store the value for a given key, with a minimum of overhead. BerkeleyDB is also an in-process database, where your application is accessing the database files directly using the BerkeleyDB library code. (The library handles locking so that multiple processes can use the database files at the same time.) Again, the overhead is kept to a minimum.

    BerkeleyDB is much less flexible than an RDBMS, but for the problem domains where that flexibility is not needed, BerkeleyDB is much more efficient. I've easily achieved over 6000 read/write transactions per second on modest hardware in a single-threaded process; a multi-threaded and/or multi-process application can achieve much higher rates. Compare that to a typical Oracle database connection, where you're lucky to get as many as a few hundred transactions per second, just because of the network round-trip.

  • by Estanislao Martínez (203477) on Saturday February 14, 2009 @01:03AM (#26853365) Homepage

    The name of the MapReduce framework comes from the functional programming operations "map" and "reduce." Map takes as its input a collection of data, and a function that transforms data elements into other elements; it outputs a collection where each element of the input collection has been replaced by the result of applying that function to it. Reduce takes a collection of elements, an initial value of the same type as the elements, and a two-place, commutative, associative and symmetric operation; it produces as its output the value that results from applying the operation to the initial value and each element of the collection in turn, accumulating the partial results.

    Map and reduce are operations that can be trivially parallelized. To parallelize map, you divide the collection into subcollections (in any arbitrary manner), and map over each of them in parallel. To parallelize reduce, you divide the collection into subcollections, also arbitrarily, reduce each subcollection independently, then apply the reduction operation to the partial results. (That works because the reduction operation is commutative, associative and symmetric.)

    Well, guess what: this sort of technique is trivially applicable to relational database queries. A SQL query translates down to a combination of joins (the FROM clause), filters (the WHERE clause) and maps (the SELECT clause). Joins are trivially parallelizable; you give each execution unit a subset of the tuples of the driving relation. Filtering (the WHERE clause) is a kind of reduce operation. SELECT is a kind of map operation. This means that relational queries are not any less amenable to parallel execution than the stuff Google does.

    But the killer thing here is that MapReduce says absolutely nothing about the updates problem. This is one of the big features of RDBMSs: the ability to handle concurrent query and modification. It also says nothing about the data integrity problem, which is also one of the big RDBMS features.

    So, when you get down to it, there is a good argument to be made that many applications could make use of database technologies that support much faster querying, at the expense of very little updating. But there's no convincing argument that that technology isn't best implemented in the context of an RDBMS.

  • by SanityInAnarchy (655584) <ninja@slaphack.com> on Saturday February 14, 2009 @02:55AM (#26853893) Journal

    let me guess, you don't like mssql because it's microsoft?

    And because it's proprietary, single-platform, and expensive for what is, at the end of the day, just a database.

    And because I have seen new and interesting things built with MySQL, like NDB. What has MS SQL got on that?

    what a fucking sheep

    Look who's talking.

    More seriously, while I have pretty much no MS SQL experience, I don't particularly want to. The only good experience I've ever had from a Microsoft product was Halo. Bungie was acquired, and has now been sold, making me wonder if Microsoft had the chance to screw them up yet.

  • Re:Ridiculous (Score:3, Interesting)

    by xelah (176252) on Saturday February 14, 2009 @06:58AM (#26854767)
    Hierarchical queries are a historical weakness of SQL (but not the relational model) - that's why he chose it as an example. You'd actually do something like this (but you'll need a very recent database):

    WITH RECURSIVE hierarchy AS (
    SELECT * FROM employees WHERE name = 'personsname'
    UNION ALL
    SELECT sub.* FROM employees AS sub, employees AS super
    WHERE super.id = sub.parent_department
    )
    SELECT * FROM hierarchy;

  • by anothy (83176) on Saturday February 14, 2009 @08:58AM (#26855201) Homepage

    But you need to hire Percona to get the same performance out of MySQL that you get from SQL Server out of the box.

    this has not been my experience. at least with version 8 (two back from current), performance was miserable compared to either mysql or postgresql of comparable vintage. this was my first serious experience using mssql, but with no tuning on either side, both mysql and postgresql outperformed mssql by a factor of about 2.
    while we never got the database on the production system swapped out (development was underway to replace the application it was supporting anyway), and thus i can't speak to mysql or postgresql's reliability in the same use environment, mssql was very unstable. the database would hang indefinitely if either a query or the resulting data was too large, and, as near as we could tell, once every other month or so for no particular reason. the data set was tens of thousands of records a month going back a few years, which is not a trivial sum of data, but shouldn't be considered a lot for a modern database.
    while it's not a direct comparison, i've used mysql in several production projects and have seen less than a half dozen hangs in production total. i've only used postgresql in production on one project, but have seen no production hangs.

"In order to make an apple pie from scratch, you must first create the universe." -- Carl Sagan, Cosmos

Working...