Become a fan of Slashdot on Facebook

 



Forgot your password?
typodupeerror
×
Databases Programming Software IT

Beyond Relational Databases 360

CowboyRobot writes "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems. But modern warehousing of data results in terabytes of information that needs to be organized, and the growing prevalence of mobile devices points to the increasing need for intelligent caching on the local hardware. According to the ACM, the future of database architecture must include more modularity and configuration. Although no concrete solutions are included, the article is a good overview of the problems with modern data systems."
This discussion has been archived. No new comments can be posted.

Beyond Relational Databases

Comments Filter:
  • SQL isn't a database (Score:5, Informative)

    by Nytewynd ( 829901 ) on Tuesday May 24, 2005 @03:30PM (#12626477)
    SQL, on the other hand:
    1) Reasonably simple API
    2) Scales to very large databsaes
    3) Cross-platform/architecture
    4) Performs very well.
    Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)


    SQL is a language for set operations. By itself it isn't a database or storage utility. There are some different versions similar to what you describe. Oracle's PL/SQL allows you to make temporary tables and materialized views. Neither solves the overall problem the article describes.

    SQL by itself doesn't perform. It is based on the database engine, and how good the developer is. I have gotten SQL queries that took minutes to exectue in seconds by adding indexes, analyzing tables, and totally rewriting inefficient code. It is only "cross-platform" if you follow the ANSI SQL standard. Each database has it's own set of handy functions that make the code database centric.

    SQL doesn't really have an API. It is a specification that is sometimes followed by database designers, and sometimes ignored. For example, in Oracle you can either use the ANSI joining sytax (LEFT OUTER JOIN) or use the (+) in the where clause.

    It scales to large databases only when they are designed properly. I work with 18 terabytes of data. My sql code wouldn't work so hot if the tables weren't designed correctly. Indexing, partitioning, and table structure have more to do with performance at that level than the code. The code can make a large difference too, but if the underlying structure is wrong, even the best SQL won't help you.
  • Synchronization (Score:4, Informative)

    by 3770 ( 560838 ) on Tuesday May 24, 2005 @03:30PM (#12626479) Homepage
    Most mainstream databases support replication. They are designed to be as fast as possible under heavy load.

    Synchronization for a mobile device has another main requirement, robustness when the connection to the server is lost. A mobile device has to gracefully handle when the owner runs down into the subway.
  • by turgid ( 580780 ) on Tuesday May 24, 2005 @03:54PM (#12626720) Journal
    The Pick [wikipedia.org] OS.
  • by Dammital ( 220641 ) on Tuesday May 24, 2005 @03:56PM (#12626746)
    "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems.
    Huh? Go back and reread some of Codd's papers (in the late 60's, BTW) and you'll see that efficiency was never a motivator. Simplicity was his aim, filesystem details were made irrrelevant, explicit navigation was obsoleted, and a built-in security model was included.

    When relational systems finally began to appear (and I'm thinking specifically about IBM's System R) they were dog slow, and the extant hierarchical and CODASYL network databases of the day ran rings around them. Still do, unless you throw lots of hardware at the RDBMS.

    RDBMS have lots of advantages over older technologies, but performance is not among them.

  • Re:KISS (Score:3, Informative)

    by Frank T. Lofaro Jr. ( 142215 ) on Tuesday May 24, 2005 @04:02PM (#12626832) Homepage
    Umm SQL does have explicit indexes and they make a big difference:

    create index t_c_idx on t (c);

    creates an index "t_c_idx" on column "c" of table "t".

    I've shaved 2 multi-second queries to under 1/10 of a second (it was an overall speedup of over 100X) just by adding indexes and letting them be used (e.g. you can't have where some_function(c)=value and expect the DB to use the index on c).

  • by erasmix ( 880448 ) on Tuesday May 24, 2005 @04:10PM (#12626918) Homepage Journal
    Many of the capabilities that, according to the article, should exist in the next generation of databases exist today in IBM Informix Dynamic Server. Examples: The hability to use other than B-Tree+ indexing technology, the hability do describe data beyond traditional types, etc. Furthermore the article relies heavily on comments by Stonebreaker, the father of Informix's Object Relational capabilities.
  • by genus babbage ( 630038 ) <slashdot@@@wgm...net> on Tuesday May 24, 2005 @04:12PM (#12626946)
    Depends what level you're after really, but here are some pointers that might help:

    I know it says access on the cover, but Steve Roman's book Access Database Design & Programming [amazon.co.uk] is pretty good as a starting point, IMO.

    You could also try Michael Hernandez's Database design for mere mortals [amazon.co.uk]

    I found both to be pretty readable and a good introduction to the theory.

    Once you've got the basics down, there's lots of further stuff you can try, from the obvious books by Codd and Date, to the more esoteric and weird.

    I would suggest, however, that you don't stress the "platform independance" of databases; you can do it, but it's not going to be pretty - most of them have their own syntax, additions, quirks and foibles, and that's just in the SQL; it gets worse when you actually try and design code stuff; for example if you program your application to work with Oracle for locking, and then switch to, for example, SQL Server, you're going to have to redesign the way it works, since they have completely different strategies (or at least they did - been a while since I've done any SQL Server).

    If you try for DB independance, you'll also end up coding to the lowest common denominator (no sequences for example... 'cause they all have their own standard...), which kinda takes the fun out of anything.

    For Oracle, Tom Kyte is probably a good bet - Expert One-on-One Oracle [amazon.co.uk] is very good, IMO, well worth the asking price; I can't really recommend anything for non-oracle though, maybe someone else can help out; O'Reilly are bound to have something for pretty much everything and everyone.

  • How about.... (Score:5, Informative)

    by plopez ( 54068 ) on Tuesday May 24, 2005 @04:21PM (#12627055) Journal
    really implementing a relational model to begin with? Then we can decide if the relational model is broken or just the vendor implementation.

    How about... a query language that is fully set operations compliant, i.e., something other than ANSI SQL which is a strange mixture of set and bag operations, and a mixture of relational algebra and relational calculas and some other 'extensions'.

    How about... realizing that a major design goal for the relational model was data integrity. Modularity and configurability are also good goals but if you are serious about your data, integrity will be at the top of the list.

    The biggest problems I see with databases is very few people understand how to use them. Here's a few tips:
    1) a table is *not* a class or an object. Tables + constraints + user defined types + constraints etc. when used properly can define domains which are close to classes and objects.

    2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only once rule. As a rule of thumb if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.

    3) Point 2 is often the consequence of mindlessly slurping in spread sheets or MS Access database tables. Anyone doing this has no business being within 50 feet of an IDE.

    4) Ditch Raid 5. 0+1 will give better perfomance in most cases. Manager like Raid 5 because it is cheap, you get what you pay for.

    5) Have multiple channels for data, transaction logs, large indices and O/S or user applications to reduce bottle necks. This is expensive but for large databases going cheap will hurt you.

    6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble. Do not be afraid of learning about the technology you are using. After all, technology is what you are good at, right?

    7) If it is a read only database, turn off logging for speed (impossible to due under SQL Server 2000 btw). Also, if a table is on a purge and load paradigm (many reporting and/or datawarehouse tables are) turn off logging on the table level if your version of database engine allows you to do so. Likewise, turning off logging on a hand held or other single user system may be appropriate, just make sure two people do not try to use the database at the same time.

    8) Avoid XML. Too much bloat.

    9) Learn how to use indices on tables.

    10) Learn how to read a perfomance monitor/top etc.

    Postgresql is both working hard to become truly relational AND is adding support for geographic objects and objects. The MySQL crew is working hard to improve. Oracle has some nice perfomance features but I think their 'Object/Relational' implementation is broken. SQL Server is getting 'long in the tooth'. There is also a great need for temporal databases and lightwieght engines. But remember, there is no 'silver bullet', no short cuts. Just hard work to be done.

  • by mcrbids ( 148650 ) on Tuesday May 24, 2005 @04:22PM (#12627079) Journal
    SQL by itself doesn't perform. It is based on the database engine, and how good the developer is.

    A truth I hold to be self-evident. The language of SQL provides all the tools you need to make your application perform well, as you state.

    SQL doesn't really have an API.

    Realistically, SQL is an API. It's a highly abstracted interface for communicating between two programs. (your app, and the DB server software)

    It is only "cross-platform" if you follow the ANSI SQL standard.

    Sorta. See, I can write a script using PHP with a particular SQL call, and do the same thing in Perl, Java, ASP, C, C++, Python, Ruby, and even BASH, on Linux, Windows, Mac, or just about anything else with a tcp stack and a compiler. Sure, SQL implementations are different, with various shortcuts and extensions, but I'd call that cross platform if ever there was one.

    Let me ask you this: How often do you see an OSS product (EG: phpwiki) that doesn't offer support for numerous databases?
  • Hmm .. (Score:5, Informative)

    by ghakko ( 261165 ) on Tuesday May 24, 2005 @04:38PM (#12627233)
    Has anyone noticed that the author of the article is from Sleepycat (which sells commercial licenses for Berkeley DB to embedded systems developers)?

    She puts forth a case against SQL and relational databases in general and claims that many applications (like directory services and search engines) have read-heavy, hierarchial access patterns which favour lighter-weight, non-relational, transaction-optional databases.

    And .. it just so happens that Sleepycat's flagship products are Berkeley DB (a flat-file database) and DBXML (an XQuery engine built on top of that).

  • Re:KISS (Score:2, Informative)

    by 3nuff ( 824173 ) <erecshion@gmail.com> on Tuesday May 24, 2005 @04:41PM (#12627259) Homepage Journal
    I'm currently working on one of those GUI applications, not Informatica. I am pleased with it's capabillities and I think it's a very powerful and reliable tool. Big projects can take a couple days once a good template is built and the metadata management is simple and useful. Reusable code and SMP processing are available and greatly enhance the speed of data transfer.

    In the wrong hands though, such a tool can be made into a piece of crap. The problem is that too many "programmers" don't know SQL before they use these tools. They just go at it. No standards in place; just head down. Judgements need to be made on when to use the tool's sql generation and hand coded SQL to get best performance.

    Too many companies get substandard people in place and then wonder why their $1M software doesn't work, it puts these tools in a bad light, that is undeserved.

  • by uss_valiant ( 760602 ) on Tuesday May 24, 2005 @04:59PM (#12627467) Homepage
    Another approach to the problem: JSR 170: Content Repository for JavaTM technology API [jcp.org]
    Standardizing the interfaces to various data resources (filesystem, database, cache, ...).

    The expert group reads like a who's who in data management. And it seems to be very near to the final draft.
  • by HornWumpus ( 783565 ) on Tuesday May 24, 2005 @06:18PM (#12628318)
    You're advice: don't store total's on invoices and just requery the line items every time? (don't let me construct a straw man please correct me if I'm misinterperting)

    You do realize you don't know how big my parent table is nor how infrequently the children change.

    All generalizations are false.

    IMHO When the child data changes very rarely or never, update triggers that recalculate parent totals are sometimes the way to go. This violates third normal form and is the most common de-nomilization I've done. Hell I've lived without the update triggers and just stored totals. When I was a kid we ran data checking batch jobs to check data validity.

  • by tepples ( 727027 ) <tepples.gmail@com> on Tuesday May 24, 2005 @07:22PM (#12628971) Homepage Journal

    I think it's more because people look at the name, try to figure out how to pronounce it, then give up

    Why, given that this FAQ entry [postgresql.org] clearly states that it's "post-gress-cue-ell"?

  • by HornWumpus ( 783565 ) on Tuesday May 24, 2005 @08:01PM (#12629326)
    Sure it could. But it does'nt.

    The DBMS could theoretically fix badly stated SQL and do it right anyhow. But it does'nt.

    I've tuned data collection processes that were originally written by a moron who wrote his masters thesis on query optimizing. My version turned a two and a half minute data collection, reduction and display process into a fifteen second process. Hearing him exclaim 'that's impossible' over the cube walls was a highlight of that job. (Bozo did a masters thesis on optimizing, but did'nt know how to read a query plan).

    Just because someone has ivory tower credentials does'nt mean they know their head from their ass.

  • Re:KISS (Score:3, Informative)

    by ejamie ( 765128 ) on Tuesday May 24, 2005 @09:09PM (#12629786) Journal

    I fully agree with this the parent post. I've built several systems (web or otherwise) running on SQL Server over the last 5 years.

    I've learned this from experience myself: There is no reason why a database driven application should be slow, provided the database is layed out appropriately and the built-in performance facilities of the RDBMS are utilized.

    Show me a slow performing database-driven application, and I will show you a set of indexes, stored procedures, vertical/horizontal table partitioning, or table normalization/denormalization that will fix the problem.

    The other cool things about DB optimization (one of my favorite development subject areas), is that generally putting in effort to optimize in these areas will give you an immediate and many-fold increase in performance. Not so easy to do in non-RDBMS environments!

The one day you'd sell your soul for something, souls are a glut.

Working...