Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
Technology

SQL, XML, and the Relational Database Model 453

Kardamon writes "In an article on DBAzine, Fabian Pascal writes that SQL is not a good representation of the relational model, and is afraid the situation will get worse with XML and XQUERY. An overview of some of the reactions on the positions Pascal and also C.J. Date take on this issue is given in this article over at SearchDatabase.com by Sara Cushman."
This discussion has been archived. No new comments can be posted.

SQL, XML, and the Relational Database Model

Comments Filter:
  • Of course XML is going to be hard to represent in a relational database. Unless your tables are ( id, object text) and you pull out your XML and parse it.
    • by Bitsy Boffin ( 110334 ) on Monday June 28, 2004 @03:14PM (#9554063) Homepage
      No.

      XML is a file format, it has nothing to do with objects, no more than HTML does, which is not at all.

      However to counter your claim that XML is hard to represent in a relational database. Uhm. No, it's not.

      XML consists (simplifying) of elements and attributes, elements may be nested.

      A generic mapping to a relational database is that elements correspond to the entity tables, attributes correspond to columns in those tables, and the nesting of elements is modelled as a foreign key in the child entity records.

      Whats so hard?

      • Whats so hard?

        The "arbitrary XML" part. You must have existing mappings set up to process the XML. New forms of XML thus require a great deal of work on the part of the DB developer.

        XML databases such as Xindice [apache.org] OTOH, allow you to create a table and insert XML in whatever format you chose. XPATH queries take a bit of getting used to, but you can query on tags, attributes, CDATA, or whatever else you chose at whatever level in the XML hierarchy you choose. Thus I can query for the list of addresses for all records that have a firstname attribute that is LIKE "Bob". Or I can dive down to the individual address level and query for all records that have an address of "Drury Lane" (important for tracking down the Muffin Man when you need a giant gingerbread cookie).

        It's not like you can't do this stuff with SQL databases, it's simply a different method of accomplishing the task. Depending on the data you're working with, an XML database may very well be a more efficient method of storage and queries.
        • > XPATH queries take a bit of getting
          > used to, but you can query on

          Right on. Another nice thing about XPath is that it can be mapped onto other hierarchical structures. For example, the Java static analysis utility PMD uses XPath to query Java source code for problems. This XPath query checks for empty if statements:

          //IfStatement/Statement/Block[count(*) = 0]

          Good stuff; more XPath rules are here [sourceforge.net]. Props to the Jaxen [sourceforge.net] and SaxPath [sourceforge.net] guys for their fine work!

          • Sure, this is a nice syntax to describe one-off problem. Though for a database you'd hope that it is done once to transform it to something useful instead of 100 times per second.

            //*[@color = "Red"] is taken as a good example in your sense but is actually the worst case in the article. A database would hopefully aleady have track of "red things" and shouldn't have to traverse a tree full of other useless data. Multiple trees spanning the same data is only a contradiction in XML.

            • A database would hopefully aleady have track of "red things" and shouldn't have to traverse a tree full of other useless data.

              In fact, that's exactly what a good XML database will do. By indexing all the accessible fields, it's able to provide fast access to database wide queries like the one you suggested. Of course, these indexes do come at a cost of disk space and memory, so not everything is quite rosy in Denmark. :-( The upshot is that with processing and storage power at all time highs many (most?)
        • The "arbitrary XML" part. You must have existing mappings set up to process the XML. New forms of XML thus require a great deal of work on the part of the DB developer

          Um... I haven't read this article yet, but to respond to your comment.. um, no, not really. Here's a trivial possible setup to describe an arbitrary xml tree:

          Table tagname: int tagname_id, varchar name
          Table attrname: int attrname_id, varchar name
          Table tag: int tag_id, tagname_id tagname, tag_id parent
          Table attr: attrname_id attrname, varcha
          • Ok, so you break down the XML by its structure. Now how do you do a valid query on it? Since you've completely dereferenced its structure and stuck it in a relational model, you've cut yourself off from the prospect of doing XPATH type queries. Instead, you'll need to make multiple (perhaps hundreds?) of passes at the table to reconstruct its data structure. XMLDBs don't have this problem. They deal with the XML in its natural form and are thus able to index, order, and query in that fashion.

            As I said before, you can do many of the same things with an SQL database as you can with an XML database. That's not the point. The point is working with the data in a form that is natural to it and will provide the best results.

      • by MojoRilla ( 591502 ) on Monday June 28, 2004 @03:25PM (#9554180)
        Try to model anything moderately complex with XML, where things have many to many relationships with each-other. Nesting becomes impossible.
    • Of course XML is going to be hard to represent in a relational database. Unless your tables are ( id, object text) and you pull out your XML and parse it.

      This is completely false. If you had RTFA, it is mentioned that the relational model can represent hierarchies (and thus XML) just fine. It is SQL that is deficient for this purpose.

      Also, it makes no sense to call XML "object-oriented," which is a programming language term[semi-OO? LOL]. XML is a syntactic hierarchy that can be used to represent "object
    • Of course XML is going to be hard to represent in a relational database.

      Generic XML, sure, but you can always layout your XML in a relational style, like this:

      <root>
      <Table1>
      <Table1Row Table1RowID="1"/>
      <Table1Row Table1RowID="2"/>
      </Table1>
      <Table2>
      <Table2Row Table2RowID="1" Table1RowID="1"/>
      <Table2Row Table2RowID="2" Table1RowID="1"/>
      </Table1>
      </root>

      Join support would be nice for simple jobs, but this works really well for more complicated jobs in XSLT. You can use a for-each on "/root/Table2/Table2Row", calling a template and drill down to "/root/Table1/Table1Row[@Table1RowID='$Table1ID'] within the template. This lets you use whatever hierarchy you want, rather than being stuck with the one hierarchy the original designer chose. Just like real relational databases.

    • by yintercept ( 517362 ) on Monday June 28, 2004 @04:06PM (#9554623) Homepage Journal
      One does not know whether to laugh or cry. It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular

      This quote needs to be placed toward the beginning of the Grand Encyclopedia of Intellectual Arrogance. Let's see, you have flat tables with a defined primary key and you form relations between these flat tables.

      I do agree that SQL is not the best possible query language, but it succeeds where the other languages fail, it is easy for people to grasp and manipulate. Likewise, HTML has many faults. Plain HTML is still the preferred choice of most web designers because it is easy to learn and write.

      Personally, I think the primary intellectual impulse is to add convolution to simple processes. There will never be an end to the stream of blither about how nulls cannot exist, and anyone who simply uses an sequence counter as a primary key is the devil incarnate. HTML and SQL have two things that almost all the stuff coming from arrogant snits like this author lack. They were designed by people who were actually doing stuff.

      This quote needs a position in the library of intellectual arrogance as well:

      Indeed, data/information management requires "some organizing principle"; that is, structure; anything "unstructured" -- and many in the industry promote XML for that purpose -- is not data, but meaningless random noise that carries no information.

      A snit crassly dismisses several millenia of literature because it is unstructured.

      Quite frankly, meaning and structure are independent of each other. It is possible to find meaning in things with radically different structures. It is true that there is a correlation between structure and the ability to communicate meaning, but a healthy mind can find meanings in things that have not been normalized.

      Likewise, you can have meaningless garbage in relational databases. A case in point is the large number of fake web sites that do things like join the FIPS database to product names so that they can have millions of pages that show up in search engines. Likewise, we see academician filling volume after volume of publications with meaningless tripe.

      • (...)is not data, but meaningless random noise that carries no information.
        A snit crassly dismisses several millenia of literature because it is unstructured.

        He was not talking about 'data for human consumption', but rather 'data for machine consumption'. As far as a machine is concerned, all literature carries no information, because a machine cannot extrapolate meaning from it like humans can.

  • by MattRog ( 527508 ) on Monday June 28, 2004 @03:08PM (#9553993)
    Celko is misquoting Darwin in saying that "The idea that you will always know everything is arrogant".

    Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:
    http://www.hughdarwen.freeola.com/TheThirdManifest o.web/Missing-info-without-nulls.pdf [freeola.com]

    And yes, XML DBMS are a throwback to IBM IMS and other hierarchical DBMS products. Anyone who has ever used a hierarchical DBMS will tell you that there are some pretty non-trivial problems that you cannot work around due to their hierarchical data model, yet XML DBMS proponents propose we go back to that old, inflexible system!
    • by MattRog ( 527508 ) on Monday June 28, 2004 @03:15PM (#9554066)
      To those not "in the know" here's some further clarification:
      "The use of the terms "flat tables" or "2D tables" to describe data stored in a relational database is wrong, he added."

      Basically what I take from this is that the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation. That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure. The problem is that current SQL DBMS products do NOT do this and so we have the associated performance problems with normalized schemas. If the DBMS was truly a RDBMS then it could optimize the physical storage to improve performance.
      When asked if the relational model was implemented soundly in today's systems, Craig Mullins' instant reply was "no," but he doesn't think the situation is as bad as Date says it is.

      "We're doing production work and delivering value," Mullins said. "Isn't that what it is all about?"

      The question is not "Are current SQL systems providing value" because certainly they are. They overthrew the hierarchic DBMS products for good reason - they were better. The real question is "Are the current SQL systems providing all the value they can". One can simply look at the wide array of DBMS offshoot products like XML DBMS, so-called "Multivalued DBMS" etc. to know that there are significant limitations of SQL products - ones which Date/Pascal/Darwin stress are not limitations of the Relational Model but merely these SQL products. To put words in their mouth, but I don't think they'd disagree at my summation, they'd suggest that if someone were to implement a Truly Relational Database Management System that these other products would quickly become obsolete.

      • by Wastl ( 809 )
        Basically what I take from this is that the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation. That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure.

        True. However, this encoding is usually very inconvenient (consider representing an HTML document or a structured piece of literature in this manner).

        Besides this, nested structures are at least as logical as flat structures (I continue to call them flat because they

    • by Chops ( 168851 ) on Monday June 28, 2004 @03:36PM (#9554287)
      While you're at it, check out this tripe (from the article):

      The relational model is predicate logic applied to databases. Predicate logic is the real-world's two-valued logic (true/false) ... logic guarantees correctness -- defined as consistency -- of query results. It is to preserve logical correctness, therefore, that Codd's Information Principle requires that all information in relational databases be represented as values in relations. The term "NULL values" suggests that Chamberlin does not realize that part of the problem with NULLs is that they are not values -- indeed, they are supposed to be markers for the absence of values. Whatever a database table with NULLs is, neither is it a relation, nor do NULLs represent anything in the real world and, consequently, correctness and the rest of the relational benefits are lost.

      Incidentally, "inapplicable values" are a red herring. They are an artifact of bad database design. There is only one kind of missing value -- unknown -- and as I demonstrate in the above-mentioned chapter, it can be handled relationally, without the huge problems of SQL's NULLs.


      I read this and pretty much gave up getting anything of value out of this article -- I hadn't understood much that went before it, though my distrust of all things XML had led me to believe this guy might know what he's talking about.

      If you removed NULLs from relational database design, people would reinvent them (poorly) -- probably by using IDs of -1 or 0, or IDs to a special magic "null" row, which I suspect is what he's talking about by "it can be handled relationally." To suggest that missing or inapplicable values are not part of "the real world" is so wrong it's... well... wrong. Anyone who's actually done database work (or programming work, for that matter) knows this.
      • SQL NULLs are the worst thing since unslicable bread. They break boolean logic. You would think that if (X = Y) is false, then (X != Y) would be true. With SQL, if either X or Y or both are NULL, then any expression evaluating it is false.

        I understand the argument (NULL indicates no data--so you can't claim it's equal to anything). Academic bullshit. Anyone who's maintained code using SQL NULL semantics will agree. If you really want to claim that NULL is so much 'not a value' that you can't compare it to
        • nulls shouldn't be used in the case where this behavior would be unwanted. a null isn't a blank value, it's a missing value. implementing a missing value relationaly would mean absence of a reference (the null is the result of a left outer join or simply isn't returned).
        • I cannot think of a situation where I would want NULL=NULL to be true. I'm no SQL guru and I'm a bit rusty, but:

          1. the only situation I can think of where you are comparing two values, where neither is a literal, is when you are joining on a pair of columns

          2. if one of the values *is* a literal, there is no reason not to write "IS NULL" instead of "=NULL"

          3. if you *are* joining on a pair of columns, letting "NULL=NULL" would not make any sense (you would get the cartesian product of all rows from both t
        • by IBitOBear ( 410965 ) on Monday June 28, 2004 @06:43PM (#9556040) Homepage Journal
          table: order ID, part number, quantity shipped

          select part number, sum(quantity shiped), avg(quantity shipped)
          Group By Part Number;

          This works with NULLs in the column for quantity shipped on parts which have not yet been shipped. If you just use zero for "no shipment" then your average number will have no real value for answering questions like "how much do we spend shipping these parts, on average?" etc.

          If you wan't to throw an exception you can throw the execption or not in your program. In that case you fetch the individual values and do the math yourself and the "that's not a number" that is caused by the null gives you the chance to throw your exception.

          But since, in aggregate operations, your program isn't even interractin with the data yet, where would such an exception go?

          What would the SQL syntax be communicating a list of results PLUS a list of exceptions to your program? Which order would things be processed in?

          Your boolean analogy is also flawed. "You have stopped beating your wife?" is not a yes-or-no question because it carries a predicate around with it that you may not fulfill for serveral reasions (not married; you are hetrosexual female, so you don't have a "wife", you have never beaten your wife so you can't "stop" doing it; etc). There are a surprisingly large number of "real data" that nature. For those of you who have trouble abstracting this, the "real comparason matrix" is "True, False, and Not Applicable". NILL buys you "Not Applicable" so very cheaply.

          In poin of fact, people who don't like NULL, usually because they don't understand its purpose and use, make a hell of a lot of work for themselves.

          My current employer has a large database of test values that grows by huge numbers of elements each day. The programmer "didn't understand" NULLs (ro RDBMS' for that matter) and has "-" in fields that should be NULL.

          Consequently we cannot aggregate. All of our client applications end up haveing to bulk-fetch whole table ranges and run through elaborate statistical routines full of conditionals; or do separate fetches with "field != '-'" in the where clause and run a concordance operation in ram after the repeated bulk fetches.

          This costs bocup time and degrades the quality of the product.

          You call "academic bullshit", I suspect you have never had to work the really large or significant data sets. I suspect that you don't ever ask the server-side to aggregate for you. And I suspect you have never worked time-critical transactions across a "slow" link.

          You can't have. You think of "NULL" in terms of equality.

          I will give you the "syntatic" point that "Where X = NULL" ought to be unversal. But, for instance, the cartesian nightmare of having "NULL == NULL" in a join is beyond idiotic.

          • by ynotds ( 318243 ) on Monday June 28, 2004 @08:48PM (#9556845) Homepage Journal
            (I just used my mod points in another thread, so I gotta hope other mods recognise the parent post.)

            In the world of self taught dabblers, NULL is not well enough understood to be expected to do anything more than cause the kind of problems you alude to with the likes of '-' to (partially) imply what NULL should be used for.

            SQL has to coexist with other components where an empty string and a numeric zero are assumed null and treated accordingly, the quantity shipped example you give being just as easy to understand and implement with zero meaning not shipped as with a separate null (just add "where quanity shipped > 0").

            There are also several possible reasons for a data value to be left NULL or undef, not all of which are mutually exclusive. Is it "not yet", "not known" or "not applicable"? In the real world we sometimes need to pair a status enum and a (numeric or string) value column to properly represent a single logical datum which needs to sometimes take state values not sensibly representable by numbers or strings.

            We used to use a string of 9s in a numeric key field to represent end of data and even today Perl's DBI interface uses the 0E0 kludge to represent a "true" zero.

    • by doom ( 14564 )

      Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:

      Well, I read it and essentially what they're saying is that instead of having a single NULL that means "we don't have the info and we don't know why", they recommend having lots of single column tables pointing at the missing values, one table for each reason/excuse. So instead of having NULLs in the salary field, you've got tables that explain "No salary because the guy is on commision", "No

  • by fiannaFailMan ( 702447 ) on Monday June 28, 2004 @03:10PM (#9554014) Journal
    It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular; and SQL was hardly developed in accordance with good language design principles.
    What exactly is the problem with SQL?
    • by XMyth ( 266414 ) on Monday June 28, 2004 @03:14PM (#9554062) Homepage
      Well, all the apps which are backed by SQL databases are crashing all over the place. After its several years in the field now SQL has been proven to be unstable, unreliable, and completely incapable of doing the job.

      Evidence of this is in the hundreds of companies who are completely unable to maintain a database of any significant size despite vendor claims to the contrary. Also, note the thousands of websites which routinely fail due to random database problems. It appears that all SQL products are sad implementations of a horrible standard which simply does not cut it .

      (the above is intended entire as sarcasm)
    • by afidel ( 530433 ) on Monday June 28, 2004 @03:18PM (#9554107)
      The problem with standards is their's so many to choose from. Or in the case of SQL every vendor seems to think that the standardized language is inadequate and yet they make no roads towards improving the standard. This leads to every vendor having their own superset of the language which makes maintainability in cross database projects exceedingly difficult and migration in applications that aren't designed for it incredibly difficult. As to fundamental flaws in the concepts around SQL I have yet to hear a concrete argument against it, mostly vague rantings from people who's ideas weren't chosen by the marketplace to serve real world needs.
      • The one bone I have to pick about SQL is there is no standard way of retrieving the value of an Identity, Counter, AutoIncrement (or whatever your database calls it) field after inserting a record. This is brain dead.

        Every SQL dialect has a special, unique way of getting the value in this field. Many (not all) things in SQL can be written to be compatible with virtually every database (if you are very very careful) except this one important thing. I know you can requery the database with all the fields or
    • by Tarantolato ( 760537 ) on Monday June 28, 2004 @03:19PM (#9554114) Journal
      One problem is that as a language it lacks elegance and is awkward to build large queries in. More deeply smug relational weenies [c2.com] insist that it does not properly model [c2.com] the relational algebra model [c2.com] pioneered by Ted [databaseanswers.com] Codd [wikipedia.org].

      I'm not sufficiently versed in database theory to understand the technical side, but SQL certainly does feel to me like a non-optimal solution.
      • Ugh, the lameness filter kept complaining about this!

        That's why sequences (as implemented by PostgreSQL and Oracle) are handy. Simply create a sequence, and call NEXTVAL(sequence_name) to get the identity of the next record. It isn't an autoincrementer (but can be used as one with default values), and you get the advantage of knowing what ID you will use before your insert - very handy for inserting a lot of related data at once. You can also do tricks like having sequences with different increment numbers

    • by tanguyr ( 468371 ) <tanguyr+slashdot@gmail.com> on Monday June 28, 2004 @03:20PM (#9554127) Homepage
      What exactly is the problem with SQL?

      Wouldn't call it a problem, but there just seems to be something about it that drives all the Oo fanboys up the wall - maybe it's the fact that they can't make nice UML diagrams of a query or something.

      There nothing wrong with SQL and RDBMS - they've been around for years and they'll be around for years to come. I have this argument each and every day at work with people who seem to think that the solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code and invest a king's ransom in application server licenses to run it on. Don't get me wrong: i've seen some real horrorshow coding with SQL mixed into code, but a bad coder will produce bad code in any language. Until then, SQL works. What more can you say?
      • by sql*kitten ( 1359 ) * on Monday June 28, 2004 @03:37PM (#9554293)
        there just seems to be something about it that drives all the Oo fanboys up the wall

        Yeah, I know what you mean. These kids can't wrap their tiny minds around the following concepts:
        • A table is not a class
        • A row is not an object
        • A column is not a property

        Whenever I see a project gone horribly wrong, and the language is C++ or Java, the problem usually is the system architect didn't grok the above statements. They should be tattooed onto the forehead of every OO programmer, so when they're "pair programming" they can read it off each other.

        solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code

        I get that too - then I show 'em the logs that show the database processor is mostly idle as it waits for their application to either request more data or finish working on what it's got!
        • by tanguyr ( 468371 ) <tanguyr+slashdot@gmail.com> on Monday June 28, 2004 @03:41PM (#9554329) Homepage
          I get that too - then I show 'em the logs that show the database processor is mostly idle as it waits for their application to either request more data or finish working on what it's got!

          Here's one that'll make you howl: "sorting is a presentation-tier concern"

          • Technically, it's not a relational-database concern. Relation variables (tables) aren't sorted, order doesn't matter. Any time your query asks the output to be sorted, it's asking for something extra that isn't appropriate for relations, but -is- appropriate for list/vector/array/what-have-you.

            Yes, it's more efficiently done server-side. But a relational database server could, academically, be a full-blown RDBMS and not have any sorting abilities whatsoever.
          • by D-Cypell ( 446534 ) on Monday June 28, 2004 @05:33PM (#9555506)
            Here's one that'll make you howl: "sorting is a presentation-tier concern"

            Largely it is. If the user hits a table header to sort on the selected row, are we are supposed go back to the database and do a different 'order by'?... I dont think so!

            If the sort is required purely to provide the user with a list in 'alphabetical' order then sorting in the presentation tier tends to be smarter because it reduces the 'bug-space'. That data generally passes through several levels of indirection, at any time someone may decide to replace an ordered collection with an unordered one and by the time it reaches the screen... it all out of sequence. Also, its very possible that while the data needs to be sorted to give users that 'fuzzy feeling' the same API can be used to provide a SOAP/XML-RPC/CORBA interface that doesnt require sorting (or rather, let the consumer decide). Why do this, fairly expensive, operation on a tier that doesnt always require that it is done?

            If sorting is required in the middle-tier it is usually due to some search algorithm or something. In this case, I prefer to put the sort with the search so that it is clear. Where I have worked with pure SQL (tend to use ORM tools now) I like to put the SQL in a seperate repository that the DBA's can tweak without recompliation. That 'order by' gets dropped pretty quick when the PHB is complaining about DB performance... and WHOOPS... really weird bug in the search code.

            So it seems to me, that when you factor in the real world issues around using the database as a sorting tool.... its not quite so 'howl-worthy'.
            • Largely it is. If the user hits a table header to sort on the selected row, are we are supposed go back to the database and do a different 'order by'?... I dont think so!

              Then think again. It's far from a black and white question. If data transfer between persistence and presentation tier isn't costly, you're better off passing the sort to the database. It may have the data already ordered, either because the a similar query or subquery is cached, or because the database has an index on the sorted colum

          • "Here's one that'll make you howl: "sorting is a presentation-tier concern"

            Yeah, well when you have to sort String values using oddball locale-specific methods it sure the heck is.

      • As a OO fanboy, I must protest your mischaracterizatoin of us. Unlike the DB fanboys, we never claimed that OO is suitable for every problem domain.

        What drives me nuts aobut the DB fanboys is that they have to use DB for everything. For example, I maintain an embedded system where some nitwit ex-web-developer decided to implement the process table with MySQL. Huh?

        DB is suitable when you have massive amounts of uniform data. OO is suitable when you have heterogenous structured data. Often these two areas o
      • by perlchild ( 582235 ) on Monday June 28, 2004 @04:24PM (#9554789)

        Until then, SQL works. What more can you say?

        That SQL is mostly a kludge?

        Let me restructure that...

        The experts who know what the heck the relational model is and is not argue that the language we use to query a specific type of relational-like database, that they call the SQL databases, the SQL language, has unsufficient representation power to represent the whole model, and hence can't be used to get the whole power of the model.

        That's certainly interesting, and leaves us to ponder two things:

        1) a) Just how much more power could we get? b) And at what cost?

        2) What about alternatives, can we get that same power elsewhere, cheaper?

        1)a) is beyond my mere predicate logic skills at this time.

        1)b) The cost of a model for data storage, representation and management is directly linked to it's adaptability to the data you represent. The article mentions a lot of errors with NULLs(I remember thinking, while reading the article: a NULL was an attempt by the language developers to simulate an interrupt in a language that doesn't have any, this is of course, an oversimplification on my part, but considering stored procedures and triggers[SQL's own exceptions] weren't around yet, they sound like a good basis for further research.) There are a lot of other hidden "costs" for people who use a relational tool for not-quite-so-relational data, but that's not part of the cost of a relational language, per se.

        2) Brings up a few notions: there are the types of databases relational databases replaced, like network databases, and there are attempted replacements, like object databases. There are also further possibilities that I will explore deeper later. Object databases can certainly be interesting, in the sense that by bundling data with code, you can have data that can handle itself, in the very basic sense that we humans apply it to ourselves. The problem is that we tend to have a very fuzzy, real-world view of such data, and can't work with it that easily(we are using computers to make data easier to work with, so if we had software that could handle real-world data complexity outside of our brains, we wouldn't be having this discussion). Object data is certainly very adept with data that has some broad commonalities, re-usable behaviours, and follows set-rules. We can call those business rules for now. Those business rules imply that a certain subset of "The Universe" interests us more than the rest, and follows predictable commonalities, making our mental models a lot less complex. On the other hand, object methodology is not always well understood, and the documentation and models it generates sometimes dwarf some production systems implemented to solve the same problems.

        Now, at the beginning, Relational Systems were data-handling "toolkits" set to handle specific subsets of data, who also followed business rules.
        That's interesting to my purpose, simply because I can envision, at this time(some vendors have similar concepts, but don't formalize them in any way), a new set of "toolkits" where the relational model is only one of many "toolsets" available.

        Indeed, what is probably the most used sql-based server available(MySQL) has been lacking true relational functionality for most of its life, yet that doesn't make the tool less useful for most of its users. Future toolkits can inspire themselves by focusing on specific uses of technology to solve specific problems, and yet keep the SQL as a sort of security blanket, since that's where most of the training about databases(and indeed, usually most of the training about data, period, is in database classes and perhaps, some algorithmics classes)

        After reading the linked articles about XML's weaknesses, though, I don't think it belongs into any toolkit of that nature. Simply because the tool that belongs in the toolkit is the "self-documenting data", and XML's weakness in that area is evidenced there. XML's early focus as a medium of e

    • by MattRog ( 527508 ) on Monday June 28, 2004 @03:21PM (#9554137)
      There are a couple of "problems" that they have identified:
      1) You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.
      2) Little/No support for relational domains (e.g. complex data types)
      3) Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)
      4) Weak support for complex integrity constraints (e.g. business rules)
      5) No support for entity sub/supertype relationships
      6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)

      Try searching www.dbdebunk.com for SQL. Or pick up the great book "Practical Issues in Database Management" by Fabian Pascal.
      • by LostCluster ( 625375 ) * on Monday June 28, 2004 @03:48PM (#9554404)
        1) You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.
        Just because two queries return the same results today do not mean that they will continue to do so in the future. If a value that used to be bounded from 1 to 10 suddenly is declared to be allowed to be cranked to 11, then suddenly "equal or greater than 9" and "equal to 9 or equal to 10" will have gone from always returning the same results to now specifying different sets. Clearly, the more specific code will execute faster, but if an assumed boundry no longer holds in the future, the program will become obsoleted and require revision to the less specific version. This isn't a language-specific issue, it's just a problem that crops up whenever a computer program encounters a situation its designer wasn't expecting.

        2) Little/No support for relational domains (e.g. complex data types)
        Not a bug, it's a feature. The S in SQL is for "structure"... go hammer out your data into a structured format rather than a complex one and then come back.

        3) Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)
        Totals will always be a non-updatable view. You can't change the number of objects you have without creating some new objects or chosing to get rid of some existing objects. Fields in a one-to-many relationship cannot be changed because to do so would be ambigious... do you want to create a new entry in the other table, or do you want to rename an existing entry in the other table. Go do what you meant to do, then refresh your view.

        4) Weak support for complex integrity constraints (e.g. business rules)
        That's more an issue for applications rather than databases. The program or user that's creating the query should know what's allowed by business rules, because if the database is going to refuse a query due to business rule violations, that query shouldn't have been offered to the database in the first place. Those errors should be trapped upstream before they get that far. SQL triggers for business rules should be a last line of defense, not something that should be regularly asked to function.

        5) No support for entity sub/supertype relationships
        Plenty of support, just not intrinsically. Just use a one-to-many relationship in your DB structure and go along your way.

        6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)
        That's like trying to do math without a concept of zero. Sometimes, things just don't apply and we put "N/A" on the form and "NULL" in the database.
        • by MattRog ( 527508 ) on Monday June 28, 2004 @04:01PM (#9554567)
          "Just because two queries return the same results today do not mean that they will continue to do so in the future."

          Total misunderstanding of what I wrote. To put it another way:
          SQL allows you to write queries which are mathematically equivalent but result in vastly different query plans and performance. Again, not a particularly stinging-indictment of SQL as such but had it been designed differently it could have avoided such ambiguity in the language.

          "Not a bug, it's a feature. The S in SQL is for "structure"... go hammer out your data into a structured format rather than a complex one and then come back."

          So you're saying a tree has no "structure"? That a domain has no structure? If it had no structure, it would be a little difficult for computers to process.

          "View stuff"
          Pascal (or Date, can't remember) provides an iron-clad (mathematical definition) method of creating views which will always be updatable. There are structural deficiencies in SQL which prohibit this. I will not waste time/typing here illustrating them, they are all identified at their web site.

          "SQL triggers" etc.
          It is precisely the reason that applications were enforcing business rules that DBMS were invented all those years ago! There are plenty of reasons that application-enforcement of business rules is a bad thing. Again these are illustrated on their web site. Also, your quote about "SQL triggers" is basically re-stating what I mentioned: that SQL is poor at implementing business rules!

          "Plenty of support, just not intrinsically."
          Which is exactly the same as saying "no support for entity sub/supertypes". Plus, one-to-many tables are not the same, you're thinking of something else. Chapter 6 of Fabian Pascal's book "Practical Issues in Database Management" covers this in some depth.

          "That's like trying to do math without a concept of zero."
          Not quite the same. Remember that the relational model is based upon predicate logic and set theory. Set theory has the empty set, which is not the same as NULL. SQL products currently handle null in a ridiculous manner (some sort NULL oddly, comparison is difficult, summation is odd). Pascal/Date do not suggest that the concept of "unknown" is bad, just that the SQL representation as NULL is.
    • by Xentax ( 201517 ) on Monday June 28, 2004 @03:32PM (#9554242)
      No kidding. This sounds like Andy griping that Linus' "school project" is an inferior kernel.

      Sounds like a semantic argument to me; where the Rubber Meets The Road, Linux is the kernel of a variety of widely used, production-quality OS's, while Minix is an academic *model* (on purpose, to be sure, but a *model* rather than a useable-on-a-daily-basis kernel, nevertheless). Similary, claiming SQL is crappy for various academic/theoretical reasons doesn't change the fact that it's in wide use today, as a concrete solution to any number of million- and billion-dollar abstract problems.

      So, if SQL is so bad, maybe they should stop cursing the darkness, and show us the light. In the meantime, people will use (and incrementally) improve the tools at hand to solve the problems at hand.

      Right now, SQL-based database solutions are generally the best solutions for *real* data problems that we have to solve, from mySQL-driven personal webpages, to enterprise-grade databases powering major websites, business-to-business e-commerce, and everything in between.

      Invent a better mousetrap, and the world will beat a path to your door. Criticizing the mousetrap as an inferior pest control device doesn't do much to keep the mice out...

      Xentax
  • My Two Cents (Score:4, Insightful)

    by Doesn't_Comment_Code ( 692510 ) on Monday June 28, 2004 @03:13PM (#9554043)
    I think relational SQL databases are just fine, easy enough to use, and serve their purpose very well. They DO take some serious thought when designing tables and queries - but we shouldn't be afraid to think. If your head hurts from SQL, keep on it for a couple more minutes and you'll probably have it! If your head hurts from SQL, you've tried thinking about it - and you still don't get it - you're probably in the wrong business. Complex information retrieval is complicated and sometimes difficult to understand.

    On to the next part. XML serves its purpose very well. Although I wouldn't consider XML and SQL to serve the same problem sets equally well. There are certain situations where SQL is perfect. And there are other situations where XML is preferable. If you think of the two as two sides of the same coin, I think you're making a mistake. Likewise, you can't just flip between the two on a whim. Choose the format that's most suited to what you want to do and go forward.

    It aint broke and don't need fixin'
  • by MisanthropicProgram ( 763655 ) on Monday June 28, 2004 @03:13PM (#9554047)
    Here is a history of SQL. I wanted to check the article's facts. Also, I was curious... History of SQL [itworld.com]
  • by b0lt ( 729408 ) on Monday June 28, 2004 @03:13PM (#9554051)
    Easy to use, easy to debug, easy to understand, powerful. Isn't this good enough?
  • SQL sucks? (Score:5, Insightful)

    by localman ( 111171 ) on Monday June 28, 2004 @03:15PM (#9554074) Homepage
    From the article:

    It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular

    Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.

    Everybody goes through a phase where they bitch about SQL. So did I. And I built a clever OO DataModel module that abstracted it into pretty heirarchies and all sorts of clever crap. Then I tried actually building systems with it and realized I was better off with ugly ol' SQL.

    I've been part of too many projects where people pulled out the UML books in favor of a decent First Normal Form DB design and led the team down the tubes.

    I'm not saying these other methods don't have their place -- they do. But they aren't going to displace SQL because it has it's place also. And it's place isn't theoretical, it's been practically demonstrated a million times.

    Cheers.
    • Re:SQL sucks? (Score:5, Insightful)

      by kpharmer ( 452893 ) * on Monday June 28, 2004 @03:28PM (#9554207)
      Keep in mind that Fabian Pascal is generally considered a crackpot purist. He's been insisting for years that there is no such thing as a relational database product - since none implement a purely relational model.

      However, he hasn't delivered an exaple of one, he hasn't clearly articulated the differences between his vision and the commercial options, and he apparently refuses to acknowledge that some problems in life fail to fit well into the relational model (hierarchies, networks, inheritance, etc).

      Much of what he, Celko, and Date complained about were actually responses by vendors to adapt to the real world. They were somewhat successful - and now SQL can be used successfully to solve a far greater set of problems than Pascal has ever admitted exist.

      A perfect example of this nonsense is there insistence that good indexing in a 3NF model outperforms denormalized data in a star schema. Sounds great, absolutely doesn't work. Across eighteen years working with relational databases I've never seen their suggestions work. Of course they have a response to this - the vendors should just "make the databases faster" - like it's fucking magic or something.

      Of course, this isn't to say that he's wrong about xquery - trying to work with unbalanced networks or hierarchies in which the rules change change throughout the schema causes a few problems.

      We already have extensive support for recursion & networks in the more powerful RDMBS (db2 & oracle for sure). But combining that with data structures supporting optional branching, complete lack of declarative constraints, optional rules, etc - sounds like something that will never work well.

      And going back to the days in which you have to spend a day writing code against a hierarchical database in order to answer a simple question sucks.

      Hmmm, haven't people gotten tired of the xml hype yet?
    • Re:SQL sucks? (Score:5, Insightful)

      by Tassach ( 137772 ) on Monday June 28, 2004 @03:31PM (#9554234)
      people pulled out the UML books in favor of a decent First Normal Form DB
      I'm not sure I'd ever use the words "decent" and "First Normal Form" in the same sentence.

      In 15+ years as a database developer, I've never seen a non-trivial problem which could be modelled correctly in 1NF. In my experience, 3NF is pretty universially considered to be the default level of normalization. Any decision to deviate from 3NF, either up or down, should be documented and backed up with a solid engineering case as to why it's necessary.

    • Re:SQL sucks? (Score:3, Insightful)

      by iabervon ( 1971 )
      Actually, SQL is the interface to every non-toy scale data storage project. Non-toy databases are not SQL inside, but use SQL as the high-level language that they compile into their execution plans. SQL is the sole interface language supported by databases with efficient internal implementations, and therefore all of SQL's flaws are overwhelmed in the marketplace by the fact that it's what the good software uses. It's like the named.conf format, which is widely used, but only because it's what BIND uses, no
  • by deepchasm ( 522082 ) on Monday June 28, 2004 @03:17PM (#9554094)

    Readers interested in what Pascal and Date have to say may be interested in visiting Database Debunkings [dbdebunk.com]. It has lots of articles written by one or both of them.

    Personally, I recommend the articles written by Date because they are clearly, concisely, accurately, and calmly written. Pascal's tend to turn into a rant, which I wouldn't mind but he always seems to refer to his books rather than give a detailed justification of his arguments.

  • What the?!... (Score:5, Insightful)

    by __aagmrb7289 ( 652113 ) on Monday June 28, 2004 @03:19PM (#9554115) Journal
    As a rant, this article does a great job. But here's what is missing - what the heck is he talking about? Everything he says is liberally sprinkled with statements telling us these things are self-evident, when they are anything but. He is constantly is referring to how this will clearly show that, or pointing out that this proves this or that later on, but never gets there.

    Can anyone summarize what is being said here in some sort of logical way? Because I'm confused. I see the title, I see no information supporting the title (unless, perhaps, I was to do the research myself).
    • Re:What the?!... (Score:4, Interesting)

      by OscarGunther ( 96736 ) on Monday June 28, 2004 @05:38PM (#9555544) Journal
      Ever read any Trotsky? Or Lenin? Pascal sounds like any of the old Communists (not the later totalitarians, but the true believers who were old enough to have known Marx or Engels personally). His diatribe is entirely typical of the species. He gratuitously belittles his targets:
      "Natural" perhaps for those without a grasp of data fundamentals.
      (Yes, Fabian, the co-inventor of SQL probably doesn't have a grasp of data fundamentals.) He sprinkles his text liberally with "quotes" and italics so you can "feel" his anger, his dismay -- indeed, you can almost hear him spitting the words in Chamberlin's face. You can almost hear him chortling to himself as he bangs away on his keyboard, demolishing his opponents.

      He venerates the Founder. Finding a quote that supports your argument settles the matter. Codd the Wise avoided the errors that Chamberlin made; clearly the latter is the inferior intellect. And there's only a small core with the Founder. "We" are the true believers; all others are apostates and heretics.

      Overstatement is a definite tell. Chamberlin's explanation of the difference between SQL and XML data is "unbelievable." The nesting argument is "ridiculous." Industry pronouncements are "incoherent." And most prominent of all is the cutting remark that's meaningless to anyone not in the know or already in agreement:

      Unbelievable. Any wonder that SQL fails so abysmally at relational fidelity? We may not expect the average practitioner to distinguish between
      pictures of relations, which are "flat" due to the presentation medium, and relations of N cardinality themselves, which are N-dimensional logical structures. But we sure expect "industry experts" to be aware of the difference.
      And I sure expect a polemicist to know enough about his art to understand when he's descended into self-parody.
  • Good grief (Score:3, Insightful)

    by Dracolytch ( 714699 ) on Monday June 28, 2004 @03:22PM (#9554145) Homepage
    I'm a pretty good software developer, but if someone doesn't explain to me what the argument is in plain english without extreme haughtiness, I'm going to write off this whole issue as a pissing contest.

    ~D
    • I'm a pretty good software developer, but if someone doesn't explain to me what the argument is in plain english without extreme haughtiness, I'm going to write off this whole issue as a pissing contest.

      The issues in dispute can't be expressed in plain english. That's why we need to upgrade to haughtiness and phase out plain english immediately.
    • by IBitOBear ( 410965 ) on Monday June 28, 2004 @04:46PM (#9555026) Homepage Journal
      The problem is that most "Relational Database Management Systems" only represent one type of "relationship", that being "the table".

      This, in turn, means that every operator (programmer, statement, etc) on the database must _individually_ "already understand" all the relationships that lie outside "the table" before they can act on the data at any significant scope.

      That is, you, the programmer or operator must know, from some source besides the RDBMS itself, how the different tables work with eachother.

      In simpler SQL-biased terms, you have to know, before you start, what is "good" to put in your WHERE CLAUSE to make a join. And then the RDBMS query optimizer needs to guess how to satisfy your needs in something other than glacial time.

      Consider a new verb "EXPOSE thing, thing, thing, thing, thing..." that would fish out of a database the one-or-more relationships between the things, and produce a table-looking vector of "tuples" that consist sets of actual values for those things. This is what the theoretical "perfect" RDBMS would do.

      Given (somewhat denormal 8-):

      Customer ID -> Customer Name
      Customer ID -> Street Address
      Customer ID -> Zip Code
      Customer ID -> Purchase Order ID
      Purchase Order ID -> Part Number, Quantity

      One should be able to "EXPOSE Part Number, Zip Code" and have the database "know the relationship" and produce the correct vector of tuples.

      But that doesnt happen.

      Now all the people bleating about the Higherarchial databases and bad things from the past are doing this harping because they remember the bad-old-days when a database would maintian one tree-structured set of relationships like this. In the higherarchical model, you could basically do this EXPOSE operation, but only if you had, by dint of pre-knowledge, asked for things lying on one linear path through the tree. (* simplified for brevity).

      In essence, SQL requires you, the programmer, to be in the business of making up relationships that should be in the data or schema structure but isn't.

      A magically complete RDBMS would take a series of vectors of the form "Independent Datum (key)->Dependent Datum (value)" (where either side of the arrow could be a list of atomic values). The RDBMS would then assemble and maintian tables or linked lists or whatever with no exposure of SQL-esque "tables" and the accessors would be storage method agnostic. (That is, there would be no such thing as a FROM CLAUSE.)

      For instance, in the above list of relationships, Customer ID, Customer Name, Street Address, and Zip Code *could* all live in a "table", or not, but you would never know that. But a better vector of
      Customer ID -> Street Name, Building Number, Suite to replace "Customer ID -> Street Address" has a table-feel, but would not bias against "EXPOSE Street Name, Part Number".

      The power of this comes from being able to do:

      EXPOSE Building, Part Number, Quantity
      Where Building == "Word Trade Center 2";

      And have the RDBMS already know the sequence of relationships to get from Building to (address elements) to Customer ID, to Part Number without the programmer writing the three stage join across the "uninteresting middle tables".

      (The above presumes you have a building relation that has Street Name, Street Number -> Building or some such.)

      All the XML nonsense is nonsense *_BECAUSE_* the strict-nesting enclosure requirements of XML make it "naturally" become hierarchically organized. But data exists outside the single-inheretance strict parantage trees that the hierarchical model dictates.

      The problem is that SQL got real popular and so the idea of structural inferrence got pared down to tables and Query-Like constraints on tables before anybody had a chance to formalize the idea of living, encoded relationships between arbitrarily stored datum. So we never really got a language or system that could "EXPOSE".
  • Relation arithmetic (Score:3, Interesting)

    by Baldrson ( 78598 ) on Monday June 28, 2004 @03:22PM (#9554148) Homepage Journal
    While at HP's E-speak project I spearheaded some work in reviving Bertrand Russel's relation arithmetic based on the work of Tom Etter, a researcher who had been working at Interval Research on some advanced theories of quantum software. We were trying to solve some basic problems with the way RDF and predication were being pursued in systems like Cyc and the semantic web. Unfortunately, basic research like this is rarely afforded any support at all, and what little support I was able to get wheedle out of the E-speak project dried up after a few months. At least we had a preliminary paper written up for future work.

    I described a general vision for this sort of formalism in a prior slahsdot post [geocities.com]. Suffice to say some progress has been made since then -- and work in other areas is starting to converge. There is much yet to be done.

  • by LostCluster ( 625375 ) * on Monday June 28, 2004 @03:24PM (#9554164)
    Most of the problems that I've seen with SQL commands that are more complex than they really should be are a result of mistaken assumptions made during the design phase of the database. As a result, extra tables get added late, and therefore create new "features" that code then has to be revised to take advantage of...

    XML's going to be no better in this area. Mistakes made during the design phase will always come back to haunt while implementing and using the system. If a single query can't return the results desired, then that should have been thought of while designing the tables of the DB. Trying to get a query to specify "All things that are red" like Pascal suggests is only going to work if all objects implement the "color" property the same way. If somebody uses CMYK, somebody else uses RGB, and a third uses Play-Doh color names, it's still gonna be a mess that requires code to figure out who really matches whom.

    I don't see how this "new model" fixes the real problems with working with SQL between databases that weren't designed to work with each other.
  • by BillsPetMonkey ( 654200 ) on Monday June 28, 2004 @03:27PM (#9554202)
    A legacy application is one that works. And the same can be said of SQL. Actually XML works too.

    It's important to understand what XML is replacing - binary or proprietary interfaces. This means an acceptable tradeoff between human readability (a hugely underrated requirement of a structured data format) and efficiency.

    An example is EDI vs XML. EDI "efficiency" accrues only to the intermediary that invented the means to setup trading relationships across their proprietary network. XML uses the end users' bandwidth but it simultaneously eliminates the intermediary completely. This single fact saves literally millions in kilocharacter and storage charges.

    SQL is what my old compsci prof would call opportunity-cost efficient. It's quick and can be implemented a number of ways - it's a data query framework, not a requirement for databases. There may be better ways to do it but you'll have a tradeoff somewhere.

    And if someone offers something better, then when they've finished telling you how good the new way is, wait and listen for the inevitable sales pitch.

    • XML doesn't eliminate the intermediary. Eliminating the intermediary eliminates the intermediary.

      I mean, one can kiss of the intermediary with CSV files, financial 5-bit data files, packed decimal data files, string-and-cans dictation systems, etc.

      And one can sign up to use any number of intermediaries with their brand-new XML data files backed with excellent XSDs, etc.

      The data file does not bear on the intermediary.

      I don't have time here but I will postulate what anyone doing data integration with disp
  • by Decaff ( 42676 ) on Monday June 28, 2004 @03:50PM (#9554434)
    The author seems to have many serious misunderstandings about XML.

    The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged - semantics - and once they do that, there is no need to repeat the tags in each and every record/document being transmitted

    A major point of XML is that the semantics should be explicit. If you don't repeat the tags, you reply purely on position to indicate meaning. This is a Bad Thing. For example, it does not allow sparse data in which non-default or null values can be excluded.

    Inter-system data exchange requires an agreed-on efficient machine-readable delimited file format.

    XML was designed to avoid the issue of 'yet another machine-readable format'. XML can be read reasonably efficiently, but always preserves meaning, ignoring the horrors of character sets and byte order. Compressed XML is a very efficient way to transmit data.

    An important part of XML design was that it should allow information to be expressed in a way that is independent of the software that uses it. In this way, it has something in common with SQL. The point of 'human reading and writing' is that in the last resort, you still have you data! It also makes data transfer hugely easier to debug.

    There is also a lot of confusion about the order of tags in XML. Its possible to specify in a schema or DTD that some tags are required and should be in a certain order, but its also possible to just not care about order. XML is neutral about this.

  • by wdavies ( 163941 ) on Monday June 28, 2004 @04:03PM (#9554585) Homepage
    ok, aside from efficiencies in implementation and writing, the main concern should be with the expressability.

    Question - can you express the recursive ancestor relation in SQL? You can express a single relation such as grand parent, but not the full relation.

    Reason? Because SQL is not full relational calculus. It is basically propositional calculus (actually I maybe slightly wrong, and it falls in a higher calculus than propositional).

    Prolog for example is closer, but still not true relational calculus (I forget whats missing).

    There's a reason for SQL's limitations, and that's decidability - guaranteeing that the query will terminate... (admittedly in theory, and you can write some pretty horrendous statements).

    Anyway, just wanted to get that off my chest. Doesn't say much about where XML falls in the scheme of thing - I guess strictlt speaking it doesn't - its just a layout. In fact, to compare XML and SQL is a complete misunderstanding. Its XSLT which should be compared.

  • by MobyDisk ( 75490 ) on Monday June 28, 2004 @04:03PM (#9554589) Homepage
    This guy is really smart. He takes issue with the design of SQL, and with some of the commonly-used relational-database concepts (such as the NULL). His criticisms here are valid, but his position seems to be extreme. With that said, he is absolutely right about XQuery/XPath/XQueryX++/whatever.

    XML is great for data exchange. Schemas are a wonderful way to describe data. But it is completely inappropriate for querying. If you need to query XML, you should import it into a database then query it that way. The very design trade-offs made when building XML were to make it extensible, and hard to query. XPath is nice to have for simple dinky import scripts. But trying to build a whole hierarchical query language on top of XML is silly.

    If someone wants to build a standardized hierarchical query language, that's great. Very few people use hierarchical databases and need such a language, but I wish them luck in that endeavor. But don't pretend that it is appropriate to use it on XML, or that it has anything to do with XML. XML is an interchange format. Leave it where it works well.
  • by Qbertino ( 265505 ) <moiraNO@SPAMmodparlor.com> on Monday June 28, 2004 @04:22PM (#9554774)
    I know some people who do in some cases, but I wouldn't exactly call that a standard procedure. Or call those people DB designers for that matter. 'Cause that is NOT database design.
    You design a DB best with a pen and a large sheet of paper. Or some drawing tool your extremely good at.
    SQL is the language you feed you results into the box so it builds a more or less representative imprint of the abstract reality you've designed. Which can be as relational as you want it to - as long as it meets the physical constraints of non-abstract reality. As soon as you put it onto a computer, you'll have to cut corners. That's the difference between a database _model_ and a database _implementation_. That takes stuff into account like DB load, DB Server Features and data types.

    Types for instance - somewhat relevant when dealing with DB Servers and SQL - are a thing you don't want to touch with a ten-foot pole when designing a _model_.

    I'm suprised a supposedly db expert guy get's all worked up about this and doesn't seem to be able to keep apples and pears apart.

    Anyone initially designing a non-trivial DB with SQL and - on top of that - bitchering about this DB language not being rational deserves a clobbering.
    My 2 cents.
  • by 14erCleaner ( 745600 ) <FourteenerCleaner@yahoo.com> on Monday June 28, 2004 @04:58PM (#9555171) Homepage Journal
    I'm not really much into data modelling theory and such, but I do have two perspectives from which to view this dispute:

    I've done several years of application programming using SQL

    I've also implemented the (XQuery-derived) query processing module for a native XML database [xpriori.com]

    In my former life as a application programmer, I really liked SQL. It allowed some pretty complicated computation to be done in the query, and very concisely in many cases compared to doing the same thing in, say, C++. For example, things like grouping are very nice for many application purposes.

    In my current job, I'm hoping to create an XML query language that supports the same sort of capabilities as SQL. Our XML query language implementation has decent path/predicate, sorting, and output structuring capabilities, mostly derived from earlier drafts of XQuery.

    My feeling about XQuery 1.0 is that it is extremely bloated. XML seems really simple; querying it shouldn't be all that complicated, should it? But the XQuery committee has created several hundred pages of specifications for the new language. This seems excessive, to say the least. We basically have implemented a subset of an earlier version (with paths, predicates, sorting, XML construction, a few dozen functions), and stopped tracking what they were doing. This is kind of unfortunate, but we really don't have the resources to support his behemoth in all its awesome grandeur.

    We just want a language that lets programmers efficiently access our database. I think we're on the right track. I'm not at all sure that XQuery is going to wind up as a long-term success, partly because of its bloat factor.

    My favorite illustration of the XQuery bloat is this: early versions (up to about April 2002 [w3.org]) of the XQuery language description contained this sentence in the introduction:
    It is designed to be a small, easily implementable language in which queries are concise and easily understood.

    Starting in August 2002 [w3.org], this was changed to:
    It is designed to be a language in which queries are concise and easily understood.

    The "small, easily implementable" part got smothered up by the avalanche of features they were adding.

  • by anttix ( 194675 ) on Monday June 28, 2004 @05:19PM (#9555378)
    "XML is not meant to be read by humans, it's a data interchange format and thus meant to be read by machines" - Good heavens!
    Someone saying something like that has really got a BS in BS! Or perhaps even worse: a PhD in BS.
    XML is all about programmers being able to understand the data! Yes, because we are not anywhere near that nirvana of fully semantic systems that can (semi)automatically understand each other. NO! Programmers have to do the work to make the systems fit together and XML gives them the advantage that they do not have to reverse engineer another proprietary data format or dig into a horsepile of documentation. XML makes it easy to understand how to process the information at hand - without any extra work!
    Also it's great format for storing small amounts of constantly changing data (like user preferences) cause it's extensible and with only a tiny bit of effort backward compatible as well.
    Anyone trying to use XML for processing large amount of data (like data warehousing) is either nuts or doesn't give a damn about the speed or costs.
    However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain (anybody who has tried to decode data from some long gone accounting package from the '80-s knows what I am talking about).
    So yes XML is self describing only to humans and that's the whole point of it. Formalizing data semantics is not the goal of XML, has never been and will never be, thats what we have RDF, RDFS, OWL and other nice initiatives from Semantic Web movement for.
    • XML is all about programmers being able to understand the data!

      No. XML is all about storing meta-data alongside the data. And you are implying that programmers will not understand data that is not easy readable, like XML; this implication is an oversimplification.

      (...) reverse engineer another proprietary data format (...)

      XML is certainly not the only well-documented interchange format.

      (...) dig into a horsepile of documentation.

      Is is fairly easy to create XML documents that require horsepiles
  • Conflicting goals (Score:3, Interesting)

    by esap ( 2010 ) on Monday June 28, 2004 @06:12PM (#9555816) Homepage
    I think this whole dispute is not necessary. It's just a simple case of conflicting goals. One side wants "efficiency" and another wants "extensibility". What both sides miss is that real systems can't afford to choose just one of these, you have to get both. So you have to have just enough extensibility to allow reasonable extensions, and still attain reasonable levels of efficiency. But trying to get all of either thing will totally lose the game. So the XML camp is wrong to think that extensibility is everything (ever tried parsing XML in a real-time system?). And the "binary transfer format" camp is wrong to think efficiency is everything (ever tried to make two versions of your protocols to interoperate?).
  • by Salamander ( 33735 ) <jeff.pl@atyp@us> on Monday June 28, 2004 @07:15PM (#9556270) Homepage Journal

    It's one thing to say that either SQL or XQuery have their problems, because they do. It's quite another to say that SQL is bad because it doesn't live up to some arbitrary never-achieved (and perhaps unachievable) standard of relational purity that even Codd himself found superfluous. When Pascal does nothing but the latter, and in addition takes a dozen thoroughly unprofessional swipes at Chamberlin for having been involved in both SQL and XQuery, his professional jealousy is becomes thick enough to choke on. I wish he would, so we would be spared the incessant ranting of someone whose whole career has been marked by a lot of words and not a single deed to back them up.

  • by rycamor ( 194164 ) on Tuesday June 29, 2004 @01:00AM (#9557877)
    One thing that everyone should understand: even though Pascal, Date, etc... argue that SQL is a bad implementation of the relational model, they *still* agree that it provides value, and that it is miles better than its hierarchical database predecessors. Since it is their job(s) to provide a reasoned critique of the field, it is only natural for them to rigorously compare SQL to the goals of the relational model. Many of their complaints fall in these categories (although there is much more):
    1. Does too much -- too many ways of doing the same thing, and too many unecessary operations that could be better done another way.
    2. Overly complex -- the SQL 1999 standard was something like 1200 pages.
    3. Allows programmer to circumvent relational integrity. Things like "hidden identifiers", pointers, etc...
    4. Too wrapped up in implementation -- users must spend a lot of time understanding the physical storage, rather than focusing on queries in abstraction.
    5. Many small inconsistencies in SQL itself

    But the problems with SQL are impossible to judge if you only know SQL. It's like the people who used to ask what was wrong with a perfectly good typewriter that made people want to use a word processor. To any who are curious, I suggest you do some reading. The absolute best simple introductions for these problems are in a two online documents by Hugh Darwen at www.thethirdmanifesto.com [thethirdmanifesto.com]. Look for "The Askew Wall", and "The Importance of Column Names".
  • by master_p ( 608214 ) on Tuesday June 29, 2004 @04:55AM (#9558485)
    One foundamental error in today's operating systems is that they are datatype-agnostic. They simply don't know what the data they handle is. This task is left completely to applications, and each application usually provides its own way of managing data. This causes incompatibilities between applications, and these incompatibilities are not solved either by SQL or XML (that are nothing more than human-readable representations).

    In my opinion, an operating system must primarily provide a data management solution. It must provide the common ways to organize, store, retrieve and process data. This means that the application should only care about the logic behind the data, not how data management is implemented. There are only a few methods of data organization anyway, and it is a shame that these methods are not available when an O/S is installed.

    The algorithms that concern the data types should also be available along with the data types. This means that an operating system not only should provide data management, but it should be object-oriented: each "data node" in the system should be available as a class in the chosen programming language (if it supports such a concept).

    The availability of data and their types on the O/S level would also boost security and safety, as it would not be possible for a 'devious' application to approach the data in any other way other than the intended one.

    Finally, the concept of 'application' is also wrong, and I am saying this in the context of data: in our day and age, data not only multiply fast, but the types of data are frequently modified. The liquid status of data (and their data types) makes the concept of an 'application' (thousands of source code lines, cast in the stone, with a huge degree of coupling between them) a huge obstacle in really making computers useful. Applications need to be replaced by a live system of persistent objects that do simple jobs and inform the world (through events) about changes in their state or the results of their computations; the O/S should be responsible of organizing how objects communicate with each other (either in the same memory space, in different memory spaces, or in different computers).

    Since the current situation is not exactly orthogonal (as described above), there are many misunderstandings and problems in defining concepts clearly; many thousands of dollars are spent in re-inventing the wheel, and many work hours and brain power is consumed in creating what should already be there...(and thus we can have a nice /. discussion when half of the posters say that SQL sucks and the other half saying that XML sucks!)

  • Oh dear (Score:4, Informative)

    by Ankh ( 19084 ) * on Tuesday June 29, 2004 @11:18AM (#9561111) Homepage
    I'm not going to try and reply in detail, but since I participate in the W3C XML Query Working Group and am also the w3C XML Activity Lead, a few comments may be useful.

    The article seems to says "I don't like SQL and I don't like XML and I think XML Query is about mergin them although I don't understand it very well, so the people working on XML Query must be stupid, and in any case it's easier to attack people than understand a specification".

    Perhaps that's unfair, but it's clear to me that the writer is a little fuzzy on the design goals of XML and also on the focus of SQL development over the past 10 or 15 years.

    In both cases the story is about interoperability.

    If you look at the XML Query Home Page [w3.org] you'll see approximately two dozen implementations of the XML Query draft, including a number of open source ones. If you look at the public mailing list for comments, you'll see we received over 1100 detailed technical comments at the last public review. So there's a lot of interest in this work.

    Why is that? One reason is that, like Web services and SOAP, XML Query is able to replace a lot of proprietary and hard-to-maitain middleware. Another reason is that for the first time we'll have a standard way to search over multiple kinds of data source.

    Don is the primary editor of the XQuery language, but the technical decisions reflected in the specification are a result of collaboration, and are agreed on by aconsensus process by a much larger number of particpants. The goal is to make a language that people agree to implement and to use. With support announced by Microsoft, Oracle, IBM, BEA and others (see Web page mentioned above) and judging by the public interest, I think it's fair to say that's going to happen.

    It's pretty rare to see a large complex system that everyone is happy with. It's actually pretty rare to see a small system that everyone is happy with. There are people who are unhappy with some features in the Unix cat program, but it's better to have cat in every Unix system than to have millions of shell scripts break on systems where it's missing! The trick, then, is often to include features that will lead to massively wider adoption, even if some people would rather be without them.

    Then we have (as part of W3C Process [w3.org]) a public call for implementations so that we can test to see how confident we are that all the major features can be implemented compatibly (i.e. interoperably) in multiple independent implementations.

    Features that were not implemented get removed before the specifications are final.

    Is XML Query a waste of time? Is XML evil? Is SQL evil? A lot of people think otherwise, and some of them are pretty smart, so if you are concerned, take the time to read the specs and decide for yourself. :-)

In the long run, every program becomes rococco, and then rubble. -- Alan Perlis

Working...