
Choosing the Right XML Database?

Saqib Ali asks: "Later this year, I will be starting a project that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"
This discussion has been archived. No new comments can be posted.

  • your xml (Score:5, Funny)

    by Anonymous Coward on Wednesday March 12, 2003 @02:42PM (#5495670)
    <post>
    first
    </post>
    <!-- take that beyotches -->
    • What is the major advantage of an XML-Database?
      --
      Stefan

      DevCounter [berlios.de] - An open, free & independent developer pool
      created to help developers find other developers, help, testers and new project members.
      • What is the major advantage of an XML-Database?

        Optimized, open, standards-based, buzzword compliance.
        • Let's rephrase the question. What requests does one make of an XML database that are difficult or impossible to make of a RDBMS?

          Are you trying to determine Elements that contain a given Attribute value or name? Are you trying to return Text nodes that contain a particular search string? Or, are you simply storing small Blobs of XML data organized through some higher level data?

          As a starting point for discussion (and having not researched XML databases), here is a simple table structure:

          [ElementMapper]
          Pa
          • ...but I suppose as this gets complicated the performance of the SQL queries required to answer questions like (in pseudo-code) SELECT Address WHERE Street Contains "Main Street" becomes difficult...

            It shouldn't have to. Some RDBMS systems allow you to optimize locality of rows from multiple tables based on some key. Oracle calls this a "table cluster". Rows from multiple tables are stored together on disk when their keys match across tables. They behave like normal tables in most other respects. Que
            • So it's kind of like a virtual table on disk. Or a pre-joined join.

              What's the advantage? You're spending runtime (disk I/O, data verification) joining every row together as you load the tables, not knowing if you're going to need every row joined later on.

              As opposed to just doing a join for what you need later when you're pulling data out of the tables.

              Maybe I just don't understand.....
  • by kzeddy ( 529579 ) on Wednesday March 12, 2003 @02:44PM (#5495692)
    Berkeley DB XML [sleepycat.com] is a new product. I have not tested it, though, so this is not a recommendation.
    • by Anonymous Coward on Wednesday March 12, 2003 @03:00PM (#5495833)
      Yup I was going to mention that one. I've tested it and it works great. Basically regular Berkeley DB which rocks the house already, with an XML-aware layer on top.

      If you have lots of small XML documents this is definitely the best choice. Dunno about big reports. Berkeley scales to any size, but maybe he should split his big documents into "metadata.xml" and "report.xml".. then store and index metadata.xml in the database and put report.xml on disk. I believe there is a standard for XML Includes now, so he could have the metadata.xml actually point to the report.

      Lots of ideas. Check out Berkeley DB though, it beats Xindice (especially since it's not written in Java, which pretty much ruled it out for my purposes.)
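
The metadata.xml/report.xml split suggested above can be sketched with XInclude. This is only a minimal illustration using Python's standard-library xml.etree.ElementInclude, not anything specific to Berkeley DB XML. The two file names come from the comment; the element names, the in-memory FILES dict (standing in for files on disk), and the loader function are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical documents: a small, indexable metadata record that points
# at a large report body kept elsewhere. The dict stands in for the disk.
FILES = {
    "report.xml": "<report><section>Q1 figures</section></report>",
    "metadata.xml": (
        '<metadata xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Quarterly report</title>"
        '<xi:include href="report.xml"/>'
        "</metadata>"
    ),
}


def loader(href, parse, encoding=None):
    """Resolve an xi:include href against the in-memory FILES dict."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


def load_with_includes(name):
    """Parse a document and expand its xi:include elements in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=loader)
    return root


full = load_with_includes("metadata.xml")
print(full.find("report/section").text)
```

Only metadata.xml would need to be stored and indexed in the database; the include is expanded when the complete document is actually wanted.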
    • Looks good, but it doesn't have a Java API. My app is going to use Apache Cocoon, which runs on Tomcat, so I would prefer a DB that has a Java API.
      • by Anonymous Coward on Wednesday March 12, 2003 @04:02PM (#5496441)
        It does have a Java API! Did you check it out? It comes with C/C++, Java, Perl, Python, and Tcl support out of the box. It's just not *written* in Java, which makes it more flexible. Since it's still "prerelease" you have to sign up to get the software, but that's not a big deal.
      • For what it's worth, at my workplace at the moment, we're doing the exact same thing, but already have a ton of data that we need to get ingested. The pointy-haired boss hired These Guys [luminas.co.uk], who know their stuff pretty well and prefer to use Xindice. The only problem is that it's, well, quite slow.

        Other commercial alternatives are Ipedo [ipedo.com] or Tamino [softwareag.com] if your development house has the cash. Education discounts of 99% are available, I believe, from Tamino, but the Ipedo people aren't as forthcoming with what they
        • The intro page to Xindice has the following item:

          The benefit of a native solution is that you don't have to worry about mapping your XML to some other data structure. You just insert the data as XML and retrieve it as XML. You also gain a lot of flexibility through the semi-structured nature of XML and the schema independent model used by Xindice.

          This is especially valuable when you have very complex XML structures that would be difficult or impossible to map to a more structured database.

          Is this ever

          • Manually, by hand, figuring out a way to store XML relationally can be a nightmare, especially as schemas grow more complex. XML databases solve this problem.

            Impossible? Not at all, ever. There's always a way to represent it in an RDBMS, but it usually makes it quite hairy to retrieve in a meaningful fashion. It may not be efficient, but it's always possible.

            Basically, that whole paragraph describes ALL XML databases (although many do it better), not just Xindice. The benefit of native XML databases is t
      • So, like I said earlier, we've been doing this at work, and we found some new stuff since my post yesterday.

        I believe we've found our solution (hope I'm not speaking too soon). We happened upon eXist [exist-db.org] for an XML database solution. While SourceForge lists it as alpha, the current version number is 0.9 and it seems rather mature, and FAR faster than Xindice. It looks to be a really good solution, and is easy to administer. It also boasts Cocoon interoperability. Since you're going to be using Java
  • by jeffdill ( 553365 ) <slashdot@zuovo.cCOLAom minus caffeine> on Wednesday March 12, 2003 @03:00PM (#5495831) Homepage
    To pick the right database, you need to analyze the structure of your data and the operations you intend to perform on it. XML is a useful general format for interchange of serialized data, but just because you have some data represented in XML doesn't mean you should store it in XML. What is the structure of the data? What will you do with it? Why is a relational database or a object database a bad choice for your application?
    • The XML that I will be storing is a tree-like structure with lots of children, so mapping that to a relational database was not easy.

      The other option for me was to use a pure OODB like Objectivity, which I have used in another project. I could still use it for this project.

      But I thought it would be better to use an engine that supports XPath and XQuery.

      • by Anonymous Coward on Wednesday March 12, 2003 @03:44PM (#5496277)
        Even if the data you're storing is XML formatted, it might be better to map certain tags to relational columns and just store the XML doc itself as part of a normal relational table. The searches are guaranteed to be more efficient, especially with decent indexing. This won't work if you really need to do searches involving parent/child/sibling relationships between nodes.

        At the minimum make sure there's good XQuery support. XPath just won't cut it if you need to scale.

        DB2 has decent XML support currently, and great XML support coming along the pipe at some point afaik. My experiences with it have been very positive.
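
The hybrid layout described in this comment (promote a few tags to indexed relational columns, keep the whole document in the same row) might look roughly like the sketch below. It uses Python's sqlite3 and ElementTree purely for illustration; the reports table, the author and title tags, and the function names are all hypothetical, not taken from any product mentioned in the thread.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store():
    """Create an in-memory table: promoted tags plus the raw document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports ("
        " id INTEGER PRIMARY KEY,"
        " author TEXT, title TEXT,"  # tags promoted to ordinary columns
        " doc TEXT NOT NULL)"        # the untouched XML document
    )
    conn.execute("CREATE INDEX reports_author ON reports (author)")
    return conn


def store_report(conn, xml_text):
    """Extract the promoted tags once, at insert time."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO reports (author, title, doc) VALUES (?, ?, ?)",
        (root.findtext("author"), root.findtext("title"), xml_text),
    )


def find_by_author(conn, author):
    """Search on the indexed column, then hand back parsed documents."""
    rows = conn.execute(
        "SELECT doc FROM reports WHERE author = ? ORDER BY id", (author,)
    )
    return [ET.fromstring(doc) for (doc,) in rows]
```

As the comment warns, this only helps for searches on the promoted tags; queries about parent/child/sibling relationships still require parsing each stored document.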
    • by rycamor ( 194164 ) on Wednesday March 12, 2003 @04:25PM (#5496694)
      For more opinions to make you think:

      http://www.dbazine.com/pascal9.html [dbazine.com]
      http://www.dbazine.com/pascal8.html [dbazine.com]

      And here, C.J. Date argues that a truly relational DBMS should be able to support an XML data type:

      http://www.dbdebunk.com/lauri1.htm [dbdebunk.com]

      (PostgreSQL is an example of a DBMS with extensible types)

      • How does that work? My XML knowledge goes no further than simply representing a datastructure/object/whatever in a TAG-like format.

        What use is it to store the XML in a table? Wouldn't that just be storing a string?

        Please help :)

        --moi
        • >wouldn't that just be storing a string?

          Oh no. I mean, yes, you could just store XML as a string in a BLOB column, but that's no better than just storing as a file.

          A custom XML datatype would not treat the XML as a blob, but actually parse the XML upon input into the table, storing an internal representation (probably as an associative array) which would allow custom operators to traverse the tree, visit nodes, etc...

          But, it would also allow you to perform relational queries and place integrity constr
    • Take the simple instance of a BOM relationship.

      For those not sure what a BOM is, it stands for bill of materials. In those relationships, you have a part. It is made up of other parts. Each of those parts is made up of parts, etc. etc. The end result of large complex parts is a non-determinant SQL join. Say you need to find how many screws you need for a car. It's a nasty issue for relationals. XML systems, OTOH, handle it beautifully. XPath would do that query simply, pulling out a single part t
      • How is this at all a problem for the relational model?

        The relational model is a logical model and I challenge you to find any example of data that cannot be represented quite easily in the relational model. In your example, you have traded any notion of data integrity for what you assume will be faster data access. In fact, since the relational model makes no recommendations on how data is physically stored, this is not necessarily the case.

        How would XPath enforce your rules on how parts can relate to

        • Take the example of a BOM, given above, with subassemblies (components which are not raw materials). This type of layout is common in manufacturing situations.

          I can build something called WIDGET1, for example.

          WIDGET1 uses 10 screws and 3 of WIDGET2.

          WIDGET2 uses another 20 screws and 4 more of WIDGET3.

          Write a SQL query which can express this information. The only requirement placed upon your tables is that they must be at least 2NF.

          You're going to end up rejoining your BOM file at least a couple times for tha
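
For reference, the WIDGET1 challenge above can be answered without hand-written repeated self-joins on an engine that supports recursive common table expressions. The sketch below assumes such an engine (SQLite, driven from Python) and a hypothetical single bom(parent, child, qty) table; WIDGET3's own contents are not given in the comment, so it is treated as a leaf.

```python
import sqlite3

# The rows from the comment: WIDGET1 = 10 screws + 3 WIDGET2,
# WIDGET2 = 20 screws + 4 WIDGET3.
BOM_ROWS = [
    ("WIDGET1", "SCREW", 10),
    ("WIDGET1", "WIDGET2", 3),
    ("WIDGET2", "SCREW", 20),
    ("WIDGET2", "WIDGET3", 4),
]

# Walk the parts tree, multiplying quantities on the way down,
# then total the quantity needed of each part.
EXPLODE_SQL = """
WITH RECURSIVE explode(part, qty) AS (
    SELECT child, qty FROM bom WHERE parent = ?
    UNION ALL
    SELECT bom.child, explode.qty * bom.qty
    FROM explode JOIN bom ON bom.parent = explode.part
)
SELECT part, SUM(qty) FROM explode GROUP BY part
"""


def make_bom():
    """Build the hypothetical bill-of-materials table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER,"
        " PRIMARY KEY (parent, child))"
    )
    conn.executemany("INSERT INTO bom VALUES (?, ?, ?)", BOM_ROWS)
    return conn


def total_required(conn, part):
    """Return {component: total quantity} needed to build one `part`."""
    return dict(conn.execute(EXPLODE_SQL, (part,)).fetchall())


print(total_required(make_bom(), "WIDGET1")["SCREW"])  # 10 + 3 * 20 = 70
```

The depth of the assembly does not have to be known in advance, which is the part plain joins cannot express.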
          • I believe you and your example are confused.

            First, you use SQL and relational interchangeably. That is incorrect.

            Second, you fail to provide a coherent logical model of your data - something that is necessary regardless of your preference for an RDBMS or an "XMLDB".

            For example, you refer to WIDGET1 as an entity when really it is a type. In your database, you will need to track the instances of WIDGET1. Something is a WIDGET1 because it needs to relate to 10 screws and 3 instances of WIDGET2.

            So far we

  • by angel'o'sphere ( 80593 ) <{ed.rotnemoo} {ta} {redienhcs.olegna}> on Wednesday March 12, 2003 @03:02PM (#5495854) Journal
    These are the things I would question first:

    a) Does it use XQuery/XPath to access the DB, or another standard way, or is it proprietary?
    b) Does it support your programming language of choice?
    c) Where do you get a running prototype fastest?

    c) is the most important point IMHO. You only know whether you have chosen the right DB AFTER you have implemented your application. (Well, you can try to find test cases and try to predict whether the DB is the right one by trying to scale tests up.) Note: I used the word "try" several times, because such an approach is only trial and error.

    OK, if you can just start coding (that was point c) and a standard as in a) is supported, then you should easily be able to hide the actual DB behind a suitable interface.

    b) is only a matter of your flexibility ....

    I would guess the application has more constraints that will limit you, or challenge you to overcome them, than the DB used behind it.

    I once read an article in a German magazine where they had put a DOM writer and a DOM reader as stored procedures into an SQL database.

    All the XML was stored in a few tables (element, attribute, and such) ... it was very fast ... and, well, you programmed your XML manipulation by directly manipulating "virtual" DOM trees inside the DB. In SQL and in a relational DB, of course.

    So much for "relational won't fit your needs" :-)

    Regards,
    angel'o'sphere

    P.S. You did not give many hints as to why you need an XML database. An XML database only makes sense if your natural document format is ... XML. It makes no sense if you think you need to use XML because of hype or something ....
    • Thanks for the reply :) Actually, the data that will be stored is XML, and is very well suited for XML. Plus, the application that will retrieve it later will be expecting well-formed and valid XML as the input. Apache's Xindice also supports storing everything in a relational DB.
    • Actually, I'd approach it differently... What do you need to do? Saying an "XML" database is already pretty limiting. Hell, a database might not even be the right answer. In some cases, a few flat files will do the trick, or a pipe, or other things.
    • If you need to hit the DB from some type of programming environment I'd recommend using a DB with an implementation of the XML:DB API [xmldb.org]. I've been looking at Xindice [apache.org], and Software AG [softwareag.com]'s Tamino [softwareag.com], both of which support the Java XML:DB API, which actually seems rather nice.

      As for the speed, I can't comment from personal experience, but according to the Software AG folks it's quite fast even for their customers who are indexing terabytes of data. Of course, that's PR-bunny speak so it's to be taken with a grain of

  • by tongue ( 30814 ) on Wednesday March 12, 2003 @03:52PM (#5496360) Homepage
    Frankly, I don't think I understand why relational is considered a poor choice for this. Would someone please explain this? (This is not a troll; I really don't know.) Is it just the work involved in storing an object in a set of tables?

    • tree like structure of XML vs tabular format of RDBMS.
      An ORDBMS might work in some situations.
      • tree like structure of XML vs tabular format of RDBMS. An ORDBMS might work in some situations.

        Trees are easy to implement in an RDBMS. Just think of it as a series of one-to-many relationships. Just because your data is in an XML format doesn't mean you need to store it that way. XML is just another file format, and it's a horribly inefficient one for data storage and retrieval. It's the data that you really need to worry about, not the XML code wrapped around it. Generating XML on the fly from a relational database gives you all sorts of flexibility.
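
A minimal sketch of "generating XML on the fly" from a one-to-many relationship, using Python's sqlite3 and ElementTree for illustration. The report and section tables, their columns, and the sample rows are hypothetical; the point is only that sibling order has to be carried by an explicit column.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Hypothetical schema: one report row owns many ordered sections."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE section (
            report_id INTEGER NOT NULL REFERENCES report(id),
            position  INTEGER NOT NULL,
            heading   TEXT NOT NULL,
            PRIMARY KEY (report_id, position)
        );
        INSERT INTO report VALUES (1, 'Quarterly report');
        INSERT INTO section VALUES (1, 2, 'Outlook');
        INSERT INTO section VALUES (1, 1, 'Summary');
    """)
    return conn


def report_to_xml(conn, report_id):
    """Build the XML view of one report from its rows."""
    (title,) = conn.execute(
        "SELECT title FROM report WHERE id = ?", (report_id,)
    ).fetchone()
    root = ET.Element("report", title=title)
    rows = conn.execute(
        "SELECT heading FROM section WHERE report_id = ? ORDER BY position",
        (report_id,),
    )
    for (heading,) in rows:
        ET.SubElement(root, "section").text = heading
    return ET.tostring(root, encoding="unicode")


print(report_to_xml(make_db(), 1))
```

Each extra level of nesting needs another table (or a self-referencing one) and another loop or join, which is the cost the other side of this argument is pointing at.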
    • by Anonymous Coward on Wednesday March 12, 2003 @04:08PM (#5496503)
      Well, if you're just sticking the entire XML document into the table like a blob of text, then yeah, there's no problem. But then you can't really do anything with it other than retrieve it (i.e., you can't run a query on parts of it).

      But if you want the database to be aware of the *structure* of the data, you have to decompose the data into pieces, stick them in various tables, keep the integrity between the tables, and, oh yeah, write some code to convert the data back into XML when you want to get the whole document.

      For instance, if you are storing an XML document that's made up of one or more Chapters, Paragraphs, or Sentences, and each Chapter has one or more Paragraphs, and each Paragraph can contain Sentences, etc., you have a complex many-to-one structure you have to store in multiple tables. How would you do it? Well, you'd make a document table, a chapter table, a sentence table, and link them all together with unique IDs, etc. You get the point, I hope: the XML doc's rich structure has to be decomposed into rows and columns.

      XML databases take care of this automatically and also can *index* the various parts of the document so that queries (XPath or otherwise) run faster (i.e., give me the documents that contain sentences beginning with "Hello").
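
To make the decomposition work concrete, here is a rough sketch of shredding a document into rows and rebuilding it, in Python with sqlite3. It assumes a single generic node table rather than one table per element type, and it deliberately ignores attributes, mixed content, and namespaces; the table, column, and function names are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    position  INTEGER NOT NULL,   -- preserves sibling order
    tag       TEXT NOT NULL,
    text      TEXT
)
"""


def shred(conn, elem, parent_id=None, position=0):
    """Store one element and, recursively, its children; return its id."""
    text = (elem.text or "").strip() or None
    cur = conn.execute(
        "INSERT INTO node (parent_id, position, tag, text)"
        " VALUES (?, ?, ?, ?)",
        (parent_id, position, elem.tag, text),
    )
    node_id = cur.lastrowid
    for i, child in enumerate(elem):
        shred(conn, child, node_id, i)
    return node_id


def rebuild(conn, node_id):
    """Turn the rows under node_id back into an element tree."""
    tag, text = conn.execute(
        "SELECT tag, text FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    elem = ET.Element(tag)
    elem.text = text
    children = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY position",
        (node_id,),
    ).fetchall()
    for (child_id,) in children:
        elem.append(rebuild(conn, child_id))
    return elem


def sentences_starting_with(conn, prefix):
    """The 'sentences beginning with Hello' query, as plain SQL."""
    rows = conn.execute(
        "SELECT text FROM node WHERE tag = 'sentence' AND text LIKE ?"
        " ORDER BY id",
        (prefix + "%",),
    )
    return [text for (text,) in rows]
```

Usage is `conn = sqlite3.connect(":memory:"); conn.execute(SCHEMA); root_id = shred(conn, ET.fromstring(doc))`. A native XML store hides this bookkeeping; whether that is worth a separate database is what the rest of this sub-thread argues about.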
      • XML databases take care of this automatically

        Take care of what? Parsing? That is a parser, not a database. How about a specific example.

        Relational is pretty flexible if you just know how to use it. (I agree that existing commercial relational systems could use some adjustments, but let's not throw the Cray out with the bathwater.)

        Too much of this XML database stuff sounds like a return to the "navigational" databases of the 1960's. Do we really want that? Dr. Codd rescued us from those. Now you want to be un-rescued?
        • <orderlist>
            <order id="1" customer="Aunt Bea">
              <apple type="golden" color="yellow" />
              <orange />
            </order>
            <order id="2" customer="Bob">
              <car type="pinto" color="yellow" />
            </order>
          </orderlist>

          What sort of relational schema do you use to save the above data? How do I query for orders with 2 items? orders with yellow items? yellow apples? How about the items that a customer who bought a chair and a yellow apple in possibly different trips has bought? XP
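
For comparison, the first three questions can be asked directly of the document. The sketch below uses Python's standard-library ElementTree, which implements only a subset of XPath, so the item count is done in Python; a full XPath engine could express it as //order[count(*)=2]. The function names are made up for the example, and the sample data is the order list above with the root tag spelled consistently.

```python
import xml.etree.ElementTree as ET

ORDERS = """
<orderlist>
  <order id="1" customer="Aunt Bea">
    <apple type="golden" color="yellow" />
    <orange />
  </order>
  <order id="2" customer="Bob">
    <car type="pinto" color="yellow" />
  </order>
</orderlist>
"""

root = ET.fromstring(ORDERS)


def orders_with_n_items(root, n):
    """Ids of orders that contain exactly n item elements."""
    return [o.get("id") for o in root.findall("order") if len(o) == n]


def orders_with_yellow_items(root):
    """Ids of orders with at least one item whose color is yellow."""
    return [
        o.get("id")
        for o in root.findall("order")
        if o.find("*[@color='yellow']") is not None
    ]


def yellow_apples(root):
    """All apple elements, in any order, whose color is yellow."""
    return root.findall("order/apple[@color='yellow']")


print(orders_with_n_items(root, 2))
print(orders_with_yellow_items(root))
print([a.get("type") for a in yellow_apples(root)])
```

None of these queries needs to know in advance which item types or attributes exist, which is the flexibility being claimed here.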
          • Actually, I've implemented relational databases with schemas exactly like the one in your example. Of course, you'd have Customer, Order, and OrderItem tables. The Product table would be generic and primarily contain a unique ID for each product, whether it's a car, apple, orange, whatever. This table might also have some other generic fields like Description, Price, etc.

            To handle the specific attributes of each product, one way to do it is to have a separate table for each product type that has unique

            • You can store anything in a SQL database, but you do have to take the time to design it and migrate the data as the schema changes.

              Spending lots of time and money designing a system that the customer can not imagine is a waste of money, because you will have to change the design as the business units focus on what they want, normally after they see your initial results.

              Sometimes you have to use duct tape.

              I have one app in production that uses XML files as data stores. There are about 24 users. I also h
          • What sort of relational schema do you use to save the above data? How do I query for orders with 2 items?

            I will leave the SQL for such a query as a reader exercise because that kind of query tends to vary per dialect. It will probably involve a GROUP BY and a COUNT operation, or perhaps a correlated subquery. (SQL is not the ideal relational language IMO.)

            Here is one schema approach. Note that it may vary per business.

            Table: Customers
            ----------------
            custID
            nameMI
            lastName
            etc...

            Table: Products
            -------
            • In my observation, if you don't have enough info to create a starting schema, then you need to do some more analysis.

              This is exactly the problem. How do you get any analysis if the customer doesn't know what to ask for? Applications evolve. The flexibility offered by an unstructured data store like XML lets you evolve the data model like the rest of the application.

              You gloss over the hard part with "etc..." Attributes or even structured child tags can not be anticipated and built into the schema or els
              • This is exactly the problem. How do you get any analysis if the customer doesn't know what to ask for.

                You have to "probe" them. Study the manual process. Look at their manual reports. Look at other systems for similar companies. Make some sample screens and reports for the client to jog their mind. Ask them questions like, "Is there only one address per employee, or could they have multiple addresses/contacts?"

                XML is NOT going to make up for a lack of understanding of what is needed. You can make an or
                • Apparently we won't come to an agreement on software development. My experience indicates that no amount of research will best quickly evolving software (XP, prototyping, whatever today's buzzwords are). XML databases are better suited to support evolving software than relational databases, due to the lack of or the flexibility of the schema. Relational databases can be better optimized than XML databases. OO databases (or hierarchical) can be best optimized, but in my opinion are rarely needed.

                  You ca
                  • XML databases are better suited to support evolving software than relational databases, due to the lack of or the flexibility of the schema.

                    Adding a new column is a snap on some systems. What is the complaint? I realize that some shops have rather static rules WRT schemas, but that is a political issue, not a technical one.

                    BTW, nice website. You are wrong :-), but nice site.

                    If wrong, then show it.

                    (My background is large company IT projects.)

                    So? A wide variety of techniques are used on large
        • Too much of this XML database stuff sounds like a return to the "navigational" databases of the 1960's. Do we really want that? Dr. Codd rescued us from those. Now you want to be un-rescued?

          I think the advantage would be its "hierarchical", not "navigational", nature. And this is the problem with relational databases for the kind of problems I encounter. Ever tried to store complex *inherently* hierarchical data in them? It's just the wrong idiom, and shouldn't even be attempted. Of course, some clown alw
          • I think the advantage would be its "hierarchical", not "navigational", nature. And this is the problem with relational databases for the kind of problems I encounter. Ever tried to store complex *inherently* hierarchical data in them?

            If you look at the needs of most "complex hierarchical structures", it often turns out that trees are the wrong "structure" to begin with. Trees are easy for managers and users to grok, but they simply don't reflect the complexities of the real world relationships and chang
            • If you look at the needs of most "complex hierarchical structures", it often turns out that trees are the wrong "structure" to begin with

              What about cases where entities can contain instances of *themselves*? Or where the depth and width of the nesting is not necessarily known up front?

              You end up creating these artificial "id" fields, and in so doing build a "tree" on top of the relational database, which is a very silly thing to do.

              And what about cases where ordering of contained elements is important?
              • Take it where? I suggest your journal.

                Sure! I have yet to use it. Now is as good a time as any to try it I guess.

                particularly for engineering, scientific, and mathematical problem domains.

                I deal most in the custom biz app domain. Maybe math is different. I never said relational was always the best solution. But, it is often bashed or passed up for the wrong reasons IMO.
                • Sure! I have yet to use it. Now is as good a time as any to try it I guess

                  Well, create an entry and we'll start slugging it out :)

                  I'd certainly be interested to see if ordered data can be represented easily in a relational model. My suspicion is that it can't, and since I do a lot of "modelling" and "infrastructure software" (that needs persistifiable state to ensure QOS), ordered whole-part relationships come up a lot. And scoping. And nesting.

                  I deal most in the custom biz app domain

                  That's where I
        • Why does Tabelizer always get "Insightful"s for his ranting?

          XML data is in general not held well in RDBMSs, Tabelizer. There are exceptions, of course, as I pointed out in my post above.

          What does an RDBMS return for an SQL query? A textual table starting with a header of column names, followed by rows of text.

          That is not XML, is it?

          Furthermore: to query a full document, so that the querying application can regenerate the XML, you need to make several queries one after the other based on the returned data from
  • by DevilM ( 191311 ) <devilm@@@devilm...com> on Wednesday March 12, 2003 @04:56PM (#5497065) Homepage
    http://builder.com.com/article.jhtml?id=u00320030306gcn01.htm

    http://www.devx.com/xml/article/9796
  • Does anyone know if any of the above can maintain XSLT transformations of the data as views, much like you can create SQL views? That would be a useful feature.
  • by mattc58 ( 603979 ) on Wednesday March 12, 2003 @06:39PM (#5498566)
    It's interesting that you bring this up.

    I just finished writing an article for an online magazine on object databases and .NET. You might want to look into Matisse [matisse.com]. It's got bindings for all the popular languages, it's an object database, and it's got SQL interfaces. Nice.

    And I'll point everybody to my article when it's published.
  • If you have a lot of money, try Tamino [tamino.com].
  • It seems that nowadays most people have a great problem distinguishing between the logical and the physical representation/storage of data. (Personally, I think that XML sucks from a logical point of view, because its semantics are rather weak and limited.) What we lack is tools for mapping logical representations to physical representations. I think that the main reason why we do not have such tools is that from a marketing perspective they would be very undesirable. (No serious commercial company likes to
  • From the ground up, Object Store [odi.com] was built as a purely XML database.
  • IMS [ibm.com] is a hierarchical database from IBM. The structure of the DB matches up with XML nicely and it is super fast. Of course it is also one of the oldest software products in existence...
  • I don't see exactly where I would need that kind of XML database... My applications usually have a big load of model objects which represent the structure of my data at "work-time". This is a very beautiful and elegant way of building applications.

    The real problem (in terms of flexibility and time) is the massive work needed for fetching data from a relational DB (everything is working in Java, using JDBC2-compatible connection pools) and getting it into the data model and back again...

    So there are tw

  • Try the generic IBM XML [ibm.com] page also.


    • It's not relational, it's been described as 'document oriented' which is perfect for storing and retrieving XML docs. It's also extremely flexible, extremely secure (NSA, CIA, FAA, and 80+ million other users), and fast to program with (RAD), and supports tons of open standards. For you fans of "View Model Controller" - Domino has been using this architecture for over 15 years now...

      The XML classes are built in (or easily extend your own classes using LotusScript, Java, C++, COM, anything really!!!)
  • by munkinut ( 171466 ) on Friday March 14, 2003 @04:49PM (#5514495)
    When I worked on the Ananova [ananova.com] project, we started off using Tamino by Software AG, which was great while we were in development, but we had trouble scaling from tens of stories per day to dealing with thousands of stories per day when we went live. Backing up, moving data between versions, and restoring onto higher spec boxes proved to be a nightmare, and we soon moved to Oracle instead. This was 3 years ago however, and the product may have matured since then. It would meet your requirements as stated certainly, and would be worth checking out. There are also Netbeans modules to aid development in Java.
  • Is an XML database the right tool for the job?

    I'm not a relational-zealot like the sorts found at dbdebunk.com; I don't worship the table and the join, but neither do I worship the DTD and the entity. If you're just starting a project, think long and hard about your options. Maybe an XML database will be the best tool for the job, or maybe a relational database will, or maybe an OODBMS will work better, or maybe you'd be better off with an object-persistence system such as Prevayler.

    I can't know the answe
