Slashdot is powered by your submissions, so send in your scoop

 



Forgot your password?
typodupeerror
×
Databases Programming Software IT

Beyond Relational Databases 360

CowboyRobot writes "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems. But modern warehousing of data results in terabytes of information that needs to be organized, and the growing prevalence of mobile devices points to the increasing need for intelligent caching on the local hardware. According to the ACM, the future of database architecture must include more modularity and configuration. Although no concrete solutions are included, the article is a good overview of the problems with modern data systems."
This discussion has been archived. No new comments can be posted.

Beyond Relational Databases

Comments Filter:
  • KISS (Score:5, Insightful)

    by mcrbids ( 148650 ) on Tuesday May 24, 2005 @03:14PM (#12626311) Journal
    Some of the biggest problems that "new" database designs have:

    1) Overly complex

    2) Don't scale

    3) Tied to a single platform/implementation

    4) Poor performance

    It's typical to see all four in a single try!

    SQL, on the other hand:

    1) Reasonably simple API

    2) Scales to very large databsaes

    3) Cross-platform/architecture

    4) Performs very well.

    Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)
  • by Anonymous Coward on Tuesday May 24, 2005 @03:17PM (#12626348)
    Doesn't make it obsolete. "Databases are old and kludgey. Teh suXX0rs for R0xxng H4XX0rs liek me.

    Just because people are too stupid to take the time to read and understand the theory and learn the application doesn't mean the technology is no longer relevant.

    Of course no solutions are proposed. There are none because relational theory is correct, and appropriate for real database driven applications. Little crap bulletin boards can use MySQL.

    Netcraft confirms relational databases are dead!
  • by Vile Slime ( 638816 ) on Tuesday May 24, 2005 @03:17PM (#12626349)
    People,

    Have been crying for the need to replace relational databases since the early nineties at least.

    We can all see where that got them.
  • by kfg ( 145172 ) on Tuesday May 24, 2005 @03:18PM (#12626352)
    Funny how they never are, eh?

    KFG
  • Re:I did not RTFA (Score:5, Insightful)

    by AKAImBatman ( 238306 ) * <akaimbatman@gmaYEATSil.com minus poet> on Tuesday May 24, 2005 @03:19PM (#12626373) Homepage Journal
    I didn't RTFA but for my needs

    Or the summary

    mySQL suits me quite well.

    That's nice. It won't handle a multi-terabyte database, though. That's the domain of Terabase, Oracle, and (blech) DB2. It's also what the article is about.

    The power of PHP and mySQL is all I need.

    And a moped is all you need to get to work. If you want to haul 300 metric tons of rock from point A to point B, you need a dump truck. Again, that's what this article is about.

    Back on topic, this entire article is mostly speculative for the moment. A lot of excellent work has been done in OODB and XMLDB designs, but no singular design has yet emerged to solve all our woes. For example, I love the Prevayler [prevayler.org] concept. It solves a lot of problems, lowers data access times, and provides for complete data security. It also isn't usable or scalable without a lot more design work.

    The future will hold some very interesting things, but for now we'll have to keep inventing until we come up with a consolidated solution.
  • Re:Databases? Bah! (Score:2, Insightful)

    by techwolf ( 26278 ) on Tuesday May 24, 2005 @03:22PM (#12626409)
    That's what Perl is for...
  • Re:I did not RTFA (Score:5, Insightful)

    by techwolf ( 26278 ) on Tuesday May 24, 2005 @03:28PM (#12626448)
    Quite true. MySQL does very well into the gigabytes. I haven't seen any good evidence of its abilities in handling terabytes of data. Don't get me wrong, I'm a huge fan of the MySQL, but I'm a bigger fan of using the right tool for the job. For your web message board, MySQL works fine. For holding product, sales, distribution, etc. information for, say Levis, it would not.
  • by tyler_larson ( 558763 ) on Tuesday May 24, 2005 @03:30PM (#12626478) Homepage
    After extensive research, our Databases Of The Future department has come up with the following prediction:

    In the future, databases will be more "good."

    While we can't yet go into detail as to how this will all work, suffice it to say that we have a pretty solid idea what the future holds.

  • Why 'Beyond'? (Score:5, Insightful)

    by Anonymous Coward on Tuesday May 24, 2005 @03:34PM (#12626510)
    Designed in the 1970s, the RDBMS has nevertheless proven to be the cornerstone of Web development three decades later. Thanks to systems like MySQL deployments are surely at record levels.

    Essayist Clay Shirky has gone to far as to suggest that MySQL is at the center of a whole new software movement [shirky.com].

    In my experience with Web applicaions the chief problem with the RDBMS seems to be that it does not do text indexing and search very well, so I have to keep a second store of data in something like Lucene.

    The other major problem is the level of skill required to tune the database to achieve high-performance SQL queries, so hopefully the RDBMS will evolve with more self-configuration capability.

    The article, which I only skimmed, actually addresses these two concerns but seems to pooh-pooh the notion of simply refining the existing RDBMS systems. Instead it says " Old-style database systems solve old-style problems; we need new-style databases to solve new-style problems. "

    The paper seems awfully squishy on what this means. The clearest I found was a call to "produce a storage engine that is more configurable so that it can be tuned to the requirements of individual applications."

    But this call for new highly modular/configurable storage "engines" seems to me to require at least as much fussy care and feeding as a traditional RDBMS. You're just replacing one DBA with another. And throwing out decades of refinement in the process.

    The raison d'etre of the RDBMS is to allow the programmer to treat storage as a black box while gaining nifty ACID features. Extending this to text indexing seems logical.
  • by Over_and_Done ( 536751 ) on Tuesday May 24, 2005 @03:35PM (#12626522)
    I agree with the sentiments of the posters that SQL is not going anywhere, but I had a question.
    As I am designing more and more complex web apps, I am constantly having to think of new, innovative ways to design the tables and databases and am currently making it up as I go. Does anyone have a reccomendation for books/sites that talk about good design proactices, that is not "How to use SQL" and relatively agnostic on the specific brand on DB?
    Sorry for the OT post, its just something that has been bugging me for a while
  • by Anonymous Coward on Tuesday May 24, 2005 @03:49PM (#12626652)

    by MARGO SELTZER, SLEEPYCAT

    Sleepycat? The guys who make a brain-dead key/value database with no data manipulation or integrity capabilities? Who are they to educate others on the topic of relational databases? (Sleepycat's products are useful tools, but they are not true databases).

    while data management has become almost synonymous with RDBMS, however, there are an increasing number of applications for which lighter-weight alternatives are more appropriate.

    Ahh, so the proper title of this paper should be: "Beneath Relational Databases" or "Below Relational Databases". Because the relational model is a *complete* model for data storage and manipulation, so if you have a subset of this functionality, you are not "beyond" it.

    As argued by Stonebraker, the relational vendors have been providing the illusion that an RDBMS is the answer to any data management need. For example, as data warehousing and decision support have emerged as important application domains, the vendors have adapted products to address the specialized needs that arise in these new domains. They do this by hiding fairly different data management implementations behind the familiar SQL front end. This model breaks down, however, as one begins to examine emerging data needs in more depth.

    Well, the mention of Stonebraker's name as an authority on databases is generally an indicater of a content-free paper, but let's be sure we're talking about the same thing: the relational *model* is a *complete* model. There is no other more effective model, in fact as far as I know, there are no other complete models!

    So if you want to use the relational model as a foundation to build new database products, go right ahead. If you're talking the same old vendor BS about "post relational" or "XML" (hierarchical) or "object" (network and/or hierarchical), then please shut up!!

    My feeling is when he says "in depth", he means "less depth".

    As more documents are created, transmitted, and operated in XML, these translations become unnecessary, inefficient, and tedious. Surely there must be a better way. Native XML data stores with XQuery and XPath access patterns represent the next wave of storage evolution. While new items are constantly added to and removed from an XML repository, the documents themselves are largely read-only.

    Uh, yes there is a better way: create an XML data type in a relational database with a full set of XML operators. The relational model doesn't care about data types.

    I have no interest in giving up the general relational model for a hierarchic model (rejected decades ago as not being general enough) based on a TEXT FILE FORMAT.

    Stream processing is a bit of an outcast in this laundry list of data-intensive applications.

    I smell Stonebraker.. yes, it's an outcast because stream processing has nothing to do with data storage!!!

    Some argue that database architecture is in need of a revolution akin to the RISC revolution in computer hardware

    Yes, all these people need to study and understand the relational model which was developed 30 years ago and is still the only complete data model. The relational model can be described in half a page, and consists of a small number of core operations from which any possible data storage and manipulation need can be developed. Stop thinking about implementations, think about the *model* and then use that develop new implementations!!

    Old-style database systems solve old-style problems; we need new-style databases to solve new-style problems.

    What does this mean exactly? I need to store and manipulate data without limitations. The relational model offers this. What is "old" or "new" here? I'm not going to switch to an ad-hoc subset of the relational model because it's "new".

    This "paper" (wasn't there one a couple weeks from some Microsoft dude, which was equally useless?) commits the same old sins: 1) look at existin

  • Re:KISS (Score:4, Insightful)

    by NoOneInParticular ( 221808 ) on Tuesday May 24, 2005 @03:51PM (#12626675)
    Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.

    As for SQL I do not agree with point 4: SQL does not perform very well. We're in the age of Ghz processors, fast disk drives and it *still* is an performance issue to add a few million records to a database? What SQL sorely lacks is a recognition that the 4th generation of software languages was a bad mistake, and get back to a third generation language: explicit indices, explicit loops over these indices and fast (compiled) execution of said loops. Just freaking program the database instead of waving chicken bones at it.

  • We've already got RDBMS tech - why reinvent an inadequate version of it?

    Because current RDBMS designs are unsuitable for filesystems. Relational theory still holds (just as it does for OODBs), but the physical design should be quite different if it's going to be effecient.

    As I said, this has been beaten to death in the research communities. BeOS even included a DBFS design [nobius.org], but it went largely unused. NTFS also has all the necessary stuff in it, but Microsoft constantly removes it in final releases. ReiserFS has DBFS features, but these also go largely unused.

    I think the problem is that making effective use of a DBFS requires a very different set of applications. i.e. If the applications are aware of the functionality, then they can assist the user and provide useful support. But without this form of OS and application support, the user will find that the metadata is nothing but added confusion.
  • Re:KISS (Score:4, Insightful)

    by MSBob ( 307239 ) on Tuesday May 24, 2005 @04:00PM (#12626803)
    Relational algebra is a very nice and tidy concept but SQL is a piss poor (and limited) implementation thereof.

    In other words, relational databases are very nice and elegant but their interface (SQL) is bad and should be replaced.

    Also relational databases by themselves can't supply the needs of a typical enterprise and that's where technologies such as OLAP are built on top of RDBMS to make certain data manipulations efficient.

  • Re:AS-400 (Score:3, Insightful)

    by Monkelectric ( 546685 ) <[moc.cirtceleknom] [ta] [todhsals]> on Tuesday May 24, 2005 @04:04PM (#12626860)
    As/400's are great for accounting and manufacturing. They are *NOT* used by IT professionals or Scientists and are only used in accounting (primarily gambling -- to monitor slot machines) and manufacturing.

    AS/400's are hopelessly complex even to seasoned IT professionals such as myself, and they're only around is not because people like them but because the work SO GOD DAMNED WELL :)

    point being, tractors work well to but you dont drive one day to day do you? :)

  • by Frank T. Lofaro Jr. ( 142215 ) on Tuesday May 24, 2005 @04:06PM (#12626870) Homepage

    Little crap bulletin boards can use MySQL.

    Like Slashdot :) Remember yesterday's fiasco where posts were migrating into other articles and the time before that where it happened and the time before that. :)

    Why people don't use PostgreSQL is beyond me - unless they don't like fact it is under a BSD license instead of the GPL (please don't get me started on that).

  • by darthium ( 834988 ) on Tuesday May 24, 2005 @04:07PM (#12626890)
    I concur with this opinion, I've seen a lot of supposedly 'Top notch' designs and I've seen flaws that made me blush myself....mostly the problem is with stupid design mistakes and poor concepts.

    Instead of blaming the technology and tools, they should improve skills in the Sytem Artchitects and all the way down the road the people involved in Software Development.
  • by The Slashdolt ( 518657 ) on Tuesday May 24, 2005 @04:29PM (#12627140) Homepage
    Here is the problem with your idea. Unlike the relational model, XML does not link facts. XML documents can be joined in any way, either valid or invalid, without you knowing one way or another. The relationships between documents are weak. There is no referential integrity. Within a proper relational model you are stating facts and factual relationships. Joins of those facts generate derived facts that are as true and accurate as your original model. Why add the overhead and complexity of xml? Why not just use a proper relational model?

  • Re:How about.... (Score:2, Insightful)

    by Anonymous Coward on Tuesday May 24, 2005 @04:39PM (#12627241)

    1) a table is *not* a class or an object.

    Good advice. A table is simply a snapshot of a relational value. True relational values have no top/bottom or left/right ordering so you can't really show them on paper or on the screen. It doesn't make sense to map classes or objects to tables.

    2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only once rule.

    Normalization has nothing to do with integrity per se. You can add constraints to a denormalized database that make it logically equivalent to a normalized one. I'm not aware of any "once and only once" rule in relational theory.

    Normalization is about dependencies: your data shouldn't have certain types of dependencies in it.

    As a rule of thumb if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.

    Wrong, the number of fields has nothing to do with normalization. And you should review your data model even if it has 1 field.

    Ditch Raid 5. 0+1 will give better perfomance in most cases. Manager like Raid 5 because it is cheap, you get what you pay for.

    Yes, 0+1 is faster, but lots of RAM is even faster. However this is a physical detail and not anything to do with the relational model.

    6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble.

    This should be point #1 with a double underline. The lack of understanding and even hostility toward theory is frightening in the IT industry. If the author of the paper above had any foundation knowledge her paper would be much different.

    8) Avoid XML. Too much bloat.

    Just remember that XML is FILE FORMAT like Jpeg, CSV, and you'll do fine.

    9) Learn how to use indices on tables.

    Christ, I hope people know this one! They should also realize that an index is a PHYSICAL manifestion of a logical idea: keys.

  • by Frank T. Lofaro Jr. ( 142215 ) on Tuesday May 24, 2005 @04:45PM (#12627307) Homepage
    If full 3rd normal form is too slow, you need better hardware.

    If a few joins (assuming you've got indexes, etc setup) make it too slow you are too close to the edge.

    Saving on hardware but spending more in dealing with data problems is false economy.
  • Re:KISS (Score:4, Insightful)

    by WaterBreath ( 812358 ) on Tuesday May 24, 2005 @04:57PM (#12627441)
    Usually a piss-poor database reflects on piss-poor developers.

    This is a BIG IMPORTANT POINT. But, unfortunately, the people that are architecting the databases in these cases usually don't realize that they're producing bad designs. All they see is "The database is slow. Why the *!?$ is it so slow!?" There's usually one or two answers: 1) Your schema is organized poorly. 2) You're just plain juggling too much data at once. And let's not forget the deadly combination of both 1 and 2. 100M rows, for example, is a lot of data, any way you look at it. If you've got 100 bytes per row (remarkably small in many cases), that's 10G of data. If you're shuffling through all that data every time, yeah, it's gonna be slow. Ya might want to look into reorganizing, or archiving.

  • Re:KISS (Score:2, Insightful)

    by waveman ( 66141 ) on Tuesday May 24, 2005 @05:08PM (#12627558) Homepage
    > SQL, on the other hand:
    > 1) Reasonably simple API...

    The outrageous success of SQL is in part because it complies with the "second law of axiomatic design": a good design has minimum entropy. Prior to SQL a lot of application knowledge was duplicated in the database which increased entropy. Also a lot of knowledge of the physical structure of the data was embedded in the programs which also increased entropy. This is bad because the more redundant information the system contains the harder it is to change.

    With SQL you can change the physical structure without rewriting all your programs. And you can change your code without affecting the DBMS design eg you can get the rows back in a different order. In the bad old days if you wanted things in a different order you had to reload the database and recompile all your programs.

    Even though SQL uses 3 to 10 times as many CPU cycles as hierarchical DBMSs or ISAM files it took over because of this flexibility.

    Similarly, I think that one of the reasons OO programming has not delivered the productivity benefits it was supposed to is that OO somehow encourages people to embed a lot of duplicated knowledge of the application into the class structure. This duplicated information makes it hard to change the system. I have seen this again and again. In contrast, before OO a lot of this information ended up in data or in a database, where it could be easily changed.
  • by irritating environme ( 529534 ) on Tuesday May 24, 2005 @05:15PM (#12627602)
    JDBC (probably ODBC too, tho haven't used it in eight years) helps to standardize key generation (in JDBC 3.0 FINALLY), and Date processing (christ, date functions are so annoying). Most other operations can usually be done by the platform language that is processing the data, so you can avoid the tie-in that results from various SQL dialects' built-in functions. XML databases were a total flop, and so were object databases. I agree, SQL engines are so mature now that I don't see any database tech replacing them for another ten years. By the way, can we get PostGres (and now Oracle's) support of regular expression LIKEs standardized? And can we please get JDBC to support something like: INSERT $price INTO pricetable where productid = $productid rather than using ?'s and counting which ? to set values to? Hibernate's query language does this, and I really like it.
  • by irritating environme ( 529534 ) on Tuesday May 24, 2005 @05:18PM (#12627630)
    An XML doc is a tree, thus it is hierarchically organized data. There have been hacks to try to extend around this limitation, but relational data still has superior flexibility.

    that's why XML databases flopped
  • Re:KISS (Score:4, Insightful)

    by Brian_Ellenberger ( 308720 ) on Tuesday May 24, 2005 @06:20PM (#12628344)

    Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.



    Google is a very unique case. There are two things in Google's advantage that most RDBMS system have to take into account:

    1. Google does not have to update in realtime. If I add a page to my website it is not immediately available on Google. Contrast this to a normal RDBMS where if I add a record it must be available immediately. Google is much more similar to a Data Warehouse than a RDBMS.

    2. Websites indexes are more easily parallelized because complex joining of data is not needed. Naively, Google can store all websites starting with 'A' on a server, 'B' on a server, etc. You can store the User table and the Address table on separate database servers and expect to query on users and their addresses with any sort of performance.

    3. Going along with 3, the queries expected out of most RDBMS systems are much much more complex than any queries expected out of Google right now. I'm assuming you haven't seen any complex financial reports or statistics that have been generated from a RDBMS. Databases do alot more than "select name from user where id=1234".



    Brian
  • Re:KISS (Score:3, Insightful)

    by isomeme ( 177414 ) <cdberry@gmail.com> on Tuesday May 24, 2005 @06:30PM (#12628463) Journal
    Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.

    Google's performance and value are both amazing. But it's easy to drive yourself nuts (if you're an enterprise software architect, which I am) trying to get that kind of performance out of other types of application. You see, Google has two major advantages over nearly all other large data-backed applications:

    1. There is no "right" answer

    Google keeps their ranking and indexing schemes proprietary, so nobody can say what "should" come back from a given search. Indeed, execute the same query at the same time from different machines, or the same machine at different times, and you sometimes get wildly varying results. SEO folks call this the "Google Dance".

    2. Writes are asynchronous with reads

    As near as anyone can tell, the Google index is rebuilt by their crawler over the span of a few weeks, and then the whole new index is exported to production machines over the span of a few days. Only the crawler writes data to the index, and the index as it's being built is not read by end-user clients; the production index is *only* read.

    These advantages let Google use a distributed, loosely coupled, inconsistent server farm made of cheap boxes that needn't be in sync with one another. It doesn't matter if queries to two of them give different answers, after all.

    Contrast this with (e.g.) an online bookstore; once you order a book, every node in every part of the system needs to know about that order in order to keep everything consistent (stock level monitoring, end-user purchase tracking, and so forth). This is a much, much harder problem to solve.

    So again, not to take anything away from Google's tech (which I more or less worship), but it's not a fair point of comparison for most large enterprise apps.
  • by Shadowlore ( 10860 ) on Tuesday May 24, 2005 @09:49PM (#12630051) Journal
    Let me ask you this: How often do you see an OSS product (EG: phpwiki) that doesn't offer support for numerous databases?

    Quite a bit actually. In my experience most only support MySQL. Why? They don't understand the value of data integrity and the features MySQL lacks/lacked. As a result they have had to poorly implement things that the DB itself should do, and when they try to "port" this to another DB they run into all sorts of issues.

    And SQL is a language, not an API.
    So they give up.

To do nothing is to be nothing.

Working...