Choosing the Right XML Database?
Saqib Ali asks: "Later this year, I will be starting a project that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"
your xml (Score:5, Funny)
first
</post>
<!-- take that beyotches -->
Re:your xml (Score:1)
--
Stefan
DevCounter [berlios.de] - An open, free & independent developer pool
created to help developers find other developers, help, testers and new project members.
Re:your xml (Score:2)
Optimized, open, standards-based, buzzword compliance.
Re:your xml (Score:1)
Are you trying to determine Elements that contain a given Attribute value or name? Are you trying to return Text nodes that contain a particular search string? Or, are you simply storing small Blobs of XML data organized through some higher level data?
As a starting point for discussion (and having not researched XML databases), here is a simple table structure:
[ElementMapper]
Pa
Re:your xml (Score:2)
It shouldn't have to. Some RDBMS systems allow you to optimize locality of rows from multiple tables based on some key. Oracle calls this a "table cluster". Rows from multiple tables are stored together on disk when their keys match across tables. They behave like normal tables in most other respects. Que
Re:your xml (Score:2)
What's the advantage? You're spending runtime (disk I/O, data verification) joining every row together as you load the tables, without knowing whether you're going to need every row joined later on.
That is as opposed to just doing a join for what you need later, when you're pulling data out of the tables.
Maybe I just don't understand.....
Berkley DB XML also an option (Score:4, Informative)
Re:Berkley DB XML also an option (Score:5, Informative)
If you have lots of small XML documents this is definitely the best choice. Dunno about big reports. Berkeley scales to any size, but maybe he should split his big documents into "metadata.xml" and "report.xml", then store and index metadata.xml in the database and put report.xml on disk. I believe there is a standard for XML Includes now, so he could have the metadata.xml actually point to the report.
Lots of ideas. Check out Berkeley DB though, it beats Xindice (especially since it's not written in Java, which pretty much ruled it out for my purposes).
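The metadata/report split could be sketched with XInclude roughly as follows. This is only an illustration using Python's standard library: the file names come from the post, but the document contents, the in-memory FILES dict, and the loader function are assumptions standing in for real files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical documents: small, indexable metadata plus a large report body.
FILES = {
    "report.xml": "<report><section>Full report text goes here.</section></report>",
    "metadata.xml": (
        '<metadata xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Q3 summary</title>"
        '<xi:include href="report.xml"/>'
        "</metadata>"
    ),
}


def loader(href, parse, encoding=None):
    """Resolve an include from the in-memory FILES dict instead of the disk."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def load_full_document(name):
    """Parse a document and expand its xi:include elements in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=loader)
    return root


full = load_full_document("metadata.xml")
print(ET.tostring(full, encoding="unicode"))
```

Only metadata.xml would need to live in (and be indexed by) the database; the report is pulled in when the whole document is actually wanted.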
Re:Berkley DB XML also an option (Score:3, Informative)
Re:Berkley DB XML also an option (Score:5, Informative)
Re:Berkley DB XML also an option (Score:2)
Re:Berkley DB XML also an option (Score:2, Informative)
Other commercial alternatives are Ipedo [ipedo.com] or Tamino [softwareag.com] if your development house has the cash. Education discounts of 99% are available I believe from Tamino, but the Ipedo people aren't as forthcoming with what they
Re:Berkley DB XML also an option (Score:1)
Is this ever
Re:Berkley DB XML also an option (Score:2)
Impossible? Not at all, ever. There's always a way to represent it in an RDBMS, but it usually makes it quite hairy to retrieve in a meaningful fashion. It may not be efficient, but it's always possible.
Basically, that whole paragraph describes ALL XML databases (although many do it better), not just Xindice. The benefit of native XML databases is t
Update: Just found another good DB (Score:2)
I believe we've found our solution (hope I'm not speaking too soon). But we happened upon eXist [exist-db.org] for an XML database solution. While SourceForge lists it as alpha, the current version number is 0.9 and it seems rather mature, and FAR faster than Xindice. It looks to be a really good solution, and is easy to administer. It also boasts Cocoon interoperability. Since you're going to be using Java
why an xml database? (Score:5, Insightful)
Re:why an xml database? (Score:2)
The other option for me was to use a pure OODB like Objectivity, which I have used in other projects. I could still use it for this project.
But I thought it would be better if I used some engine that supports XPath and XQuery.
Re:why an xml database? (Score:5, Informative)
At the minimum make sure there's good XQuery support. XPath just won't cut it if you need to scale.
DB2 has decent XML support currently, and great XML support coming along the pipe at some point afaik. My experiences with it have been very positive.
Re:why an xml database? (Score:5, Informative)
http://www.dbazine.com/pascal9.html [dbazine.com]
http://www.dbazine.com/pascal8.html [dbazine.com]
And here, C.J. Date argues that a truly relational DBMS should be able to support an XML data type:
http://www.dbdebunk.com/lauri1.htm [dbdebunk.com]
(PostgreSQL is an example of a DBMS with extensible types)
Re:why an xml database? (Score:2)
What use is it to store the XML in a table? Wouldn't that just be storing a string?
please help
--moi
Re:why an xml database? (Score:3, Informative)
Oh no. I mean, yes, you could just store XML as a string in a BLOB column, but that's no better than just storing as a file.
A custom XML datatype would not treat the XML as a blob, but actually parse the XML upon input into the table, storing an internal representation (probably as an associative array) which would allow custom operators to traverse the tree, visit nodes, etc...
But, it would also allow you to perform relational queries and place integrity constr
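A rough way to see what mixing relational predicates with XML traversal looks like is below. It is only an emulation: SQLite has no XML type, so the xml_text function, the reports table, and its contents are all made up for illustration, and the XML is re-parsed on every call rather than stored in an internal representation as the post describes.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_text(doc, path):
    """Return the text of the first node matching `path`, or None."""
    node = ET.fromstring(doc).find(path)
    return None if node is None else node.text


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL (hypothetical name).
conn.create_function("xml_text", 2, xml_text)
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, dept TEXT, doc TEXT)")
conn.executemany(
    "INSERT INTO reports (id, dept, doc) VALUES (?, ?, ?)",
    [
        (1, "sales", "<report><author>Ann</author><total>40</total></report>"),
        (2, "sales", "<report><author>Bob</author><total>75</total></report>"),
        (3, "ops", "<report><author>Ann</author><total>90</total></report>"),
    ],
)

# One query combines an ordinary column predicate with a look inside the XML.
rows = conn.execute(
    "SELECT id FROM reports "
    "WHERE dept = 'sales' AND CAST(xml_text(doc, 'total') AS INTEGER) > 50 "
    "ORDER BY id"
).fetchall()
print(rows)
```

A real XML column type would parse once on insert and could be indexed; this sketch only shows the shape of the query.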
Re:why an xml database? -- There are many reasons (Score:1)
For those not sure what a BOM is, it stands for bill of materials. In those relationships, you have a part. It is made up of other parts. Each of those parts is made up of parts, etc. etc. The end result of large complex parts is a non-determinant SQL join. Say you need to find how many screws you need for a car. It's a nasty issue for relationals. XML systems, OTOH, handle it beautifully. XPath would do that query simply, pulling out a single part t
Re:why an xml database? -- There are many reasons (Score:2, Insightful)
The relational model is a logical model and I challenge you to find any example of data that cannot be represented quite easily in the relational model. In your example, you have traded any notion of data integrity for what you assume will be faster data access. In fact, since the relational model makes no recommendations on how data is physically stored, this is not necessarily the case.
How would XPath enforce your rules on how parts can relate to
Re:why an xml database? -- There are many reasons (Score:1)
I can build something called WIDGET1, for example.
WIDGET1 uses 10 screws and 3 of WIDGET2.
WIDGET2 uses another 20 screws and 4 more of WIDGET3.
Write a SQL query which can express this information. The only requirement placed upon your tables is that they must be at least 2NF.
You're going to end up rejoining your BOM file at least a couple times for tha
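For what it's worth, the widget challenge can be sketched with a recursive common table expression, assuming a database that supports them (SQLite does, used here through Python's sqlite3 module). The bom table layout is an assumption, and the numbers are read as "each WIDGET1 contains 10 screws and 3 WIDGET2; each WIDGET2 contains 20 screws and 4 WIDGET3".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical bill-of-materials table: one row per (assembly, component) pair.
conn.execute("CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO bom VALUES (?, ?, ?)",
    [
        ("WIDGET1", "screw", 10),
        ("WIDGET1", "WIDGET2", 3),
        ("WIDGET2", "screw", 20),
        ("WIDGET2", "WIDGET3", 4),
    ],
)

# Walk the parts tree to any depth, multiplying quantities along the way.
EXPLODE = """
WITH RECURSIVE explode(part, qty) AS (
    SELECT child, qty FROM bom WHERE parent = ?
    UNION ALL
    SELECT b.child, e.qty * b.qty
    FROM explode AS e JOIN bom AS b ON b.parent = e.part
)
SELECT part, SUM(qty) FROM explode GROUP BY part
"""

totals = dict(conn.execute(EXPLODE, ("WIDGET1",)).fetchall())
print(totals)
```

The single self-join is repeated by the engine as many times as the tree is deep, so the query does not have to know the depth up front.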
Re:why an xml database? -- There are many reasons (Score:1)
First, you use SQL and relational interchangeably. That is incorrect.
Second, you fail to provide a coherent logical model of your data - something that is necessary regardless of your preference for an RDBMS or an "XMLDB".
For example, you refer to WIDGET1 as an entity when really it is a type. In your database, you will need to track the instances of WIDGET1. Something is a WIDGET1 because it needs to relate to 10 screws and 3 instances of WIDGET2.
So far we
Take the easyst way (Score:5, Insightful)
a) Does it use XQuery/XPath or another standard way to access the DB, or is it proprietary?
b) Does it support your programming language of choice?
c) Which one gets you to a running prototype fastest?
C) is the most important point IMHO. You only know whether you have chosen the right DB AFTER you have implemented your application. (Well, you can try to find test cases and try to predict whether the DB is the right one by trying to scale the tests up.) Note: I used the word "try" several times, because such an approach is only trial and error.
OK, if you can just start coding (that was point c) and a standard as in a) is supported, then you should easily be able to hide the actual DB behind a suitable interface.
b) is only a matter of your flexibility.
I would guess the application has more constraints that will limit or challenge you than the DB used behind it.
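Hiding the actual DB behind an interface might look something like the sketch below. The ReportStore and InMemoryStore names are hypothetical, and the in-memory implementation merely stands in for whichever XML database ends up being chosen; it uses the limited XPath subset that Python's ElementTree supports.

```python
from abc import ABC, abstractmethod
from typing import List
import xml.etree.ElementTree as ET


class ReportStore(ABC):
    """The only surface the application is allowed to depend on."""

    @abstractmethod
    def put(self, name: str, xml: str) -> None:
        """Store one XML document under a name."""

    @abstractmethod
    def query(self, path: str) -> List[str]:
        """Return the text of every node matching an XPath-style path."""


class InMemoryStore(ReportStore):
    """Prototype backend; a Xindice or eXist adapter would replace it later."""

    def __init__(self) -> None:
        self._docs = {}

    def put(self, name: str, xml: str) -> None:
        self._docs[name] = ET.fromstring(xml)

    def query(self, path: str) -> List[str]:
        return [
            node.text
            for doc in self._docs.values()
            for node in doc.findall(path)
        ]


store = InMemoryStore()
store.put("q1", "<report><title>Q1</title></report>")
store.put("q2", "<report><title>Q2</title></report>")
print(store.query(".//title"))
```

If the prototype shows the first backend was the wrong choice, only a new ReportStore subclass has to be written.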
I once read an article in a German magazine where they had put a DOM writer and a DOM reader into an SQL database as stored procedures.
All the XML was stored in a few tables: element, attribute, and such.
So much for "relational won't fit your needs".
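The magazine article's code is not available here, so the following is only a guess at the general idea: a small "writer" that shreds a document into an element table and an attribute table. Table and column names are assumptions, it runs in application code rather than as a stored procedure, and it ignores tail text, comments, and namespaces.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE element (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,      -- NULL for the document root
        position INTEGER,       -- order among siblings
        tag TEXT,
        text TEXT
    );
    CREATE TABLE attribute (element_id INTEGER, name TEXT, value TEXT);
    """
)


def shred(node, parent_id=None, position=0):
    """Store one element, its attributes, and (recursively) its children."""
    cur = conn.execute(
        "INSERT INTO element (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
        (parent_id, position, node.tag, (node.text or "").strip()),
    )
    element_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute VALUES (?, ?, ?)",
        [(element_id, name, value) for name, value in node.attrib.items()],
    )
    for i, child in enumerate(node):
        shred(child, element_id, i)
    return element_id


doc = '<report id="7"><title lang="en">Q3</title><body>Fine.</body></report>'
shred(ET.fromstring(doc))

# Ordinary SQL can now ask which elements carry a given attribute value.
tags = conn.execute(
    "SELECT e.tag FROM element AS e "
    "JOIN attribute AS a ON a.element_id = e.id "
    "WHERE a.name = 'lang' AND a.value = 'en'"
).fetchall()
print(tags)
```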
Regards,
angel'o'sphere
P.S. You did not give many hints as to why you need an XML database. An XML database only makes sense if your natural document format is
Re:Take the easyst way (Score:2)
Re:Take the easyst way (Score:2)
Re:Take the easyst way (Score:2)
If you need to hit the DB from some type of programming environment I'd recommend using a DB with an implementation of the XML:DB API [xmldb.org]. I've been looking at Xindice [apache.org], and Software AG [softwareag.com]'s Tamino [softwareag.com], both of which support the Java XML:DB API, which actually seems rather nice.
As for the speed, I can't comment from personal experience, but according to the Software AG folks it's quite fast even for their customers who are indexing terabytes of data. Of course, that's PR-bunny speak, so it's to be taken with a grain of
someone please explain... (Score:4, Interesting)
Re:someone please explain... (Score:2)
An ORDBMS might work in some situations.
Re:someone please explain... (Score:2)
Trees are easy to implement in an RDBMS. Just think of it as a series of one-to-many relationships. Just because your data is in an XML format doesn't mean you need to store it that way. XML is just another file format, and it's a horribly inefficient one for data storage and retrieval. It's the data that you really need to worry about, not the XML code wrapped around it. Generating XML on the fly from a relational database gives you all sorts of flexibility.
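As a minimal sketch of that claim, here is a tree kept as one-to-many rows (each node points at its parent) and turned into XML on the fly. The node table, its columns, and the sample rows are all hypothetical, and the code issues one query per node, which is simple rather than fast.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, position INTEGER, "
    "tag TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, 0, "book", None),
        (2, 1, 0, "chapter", None),
        (3, 1, 1, "chapter", None),
        (4, 2, 0, "para", "Hello world."),
        (5, 3, 0, "para", "Goodbye."),
    ],
)


def build(node_id):
    """Rebuild the XML element rooted at node_id, children in stored order."""
    tag, text = conn.execute(
        "SELECT tag, text FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    elem = ET.Element(tag)
    elem.text = text
    children = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY position", (node_id,)
    ).fetchall()
    for (child_id,) in children:
        elem.append(build(child_id))
    return elem


xml = ET.tostring(build(1), encoding="unicode")
print(xml)
```

The position column is what preserves document order, which a plain parent/child relationship does not give you by itself.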
Re:someone please explain... (Score:5, Insightful)
But if you want the database to be aware of the *structure* of the data, you have to decompose the data into pieces, stick them in various tables, keep the integrity between the tables, and, oh yeah, write some code to convert the data back into XML when you want to get the whole document.
For instance if you are storing an XML document that's made with one-or-more Chapters, Paragraphs, or Sentences and each Chapter has one-or-more Paragraphs, and each Paragraph can contain Sentences
XML databases take care of this automatically and also can *index* the various parts of the document so that queries (XPath or otherwise) run faster (i.e., give me the documents that contain sentences beginning with "Hello").
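To make the indexing point concrete, here is a toy version in Python. It is not how any particular XML database builds its indexes; the documents and names are invented, and "begins with" is approximated by indexing only the first word of each Sentence element.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical stored documents, keyed by name.
DOCS = {
    "a.xml": "<Chapter><Paragraph><Sentence>Hello there.</Sentence>"
             "<Sentence>Bye.</Sentence></Paragraph></Chapter>",
    "b.xml": "<Chapter><Paragraph><Sentence>Nothing here.</Sentence>"
             "</Paragraph></Chapter>",
    "c.xml": "<Chapter><Paragraph><Sentence>Hello again.</Sentence>"
             "</Paragraph></Chapter>",
}

# Built once at load time: first word of a sentence -> documents containing it.
index = defaultdict(set)
for name, xml in DOCS.items():
    for sentence in ET.fromstring(xml).iter("Sentence"):
        words = (sentence.text or "").split()
        if words:
            index[words[0]].add(name)


def docs_with_sentence_starting(word):
    """Answer the query from the index without re-reading any document."""
    return sorted(index.get(word, set()))


print(docs_with_sentence_starting("Hello"))
```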
Re:someone please explain... (Score:3, Insightful)
Take care of what? Parsing? That is a parser, not a database. How about a specific example.
Relational is pretty flexible if you just know how to use it. (I agree that existing commercial relational systems could use some adjustments, but let's not throw the Cray out with the bathwater.)
Too much of this XML database stuff sounds like a return to the "navigational" databases of the 1960's. Do we really want that? Dr. Codd rescued us from those. Now you want to be un-rescued?
Re:someone please explain... (Score:2)
<orderlist>
<order id="1" customer="Aunt Bea">
<apple type="golden" color="yellow"/>
<orange/>
</order>
<order id="2" customer="Bob">
<car type="pinto" color="yellow"/>
</order>
</orderlist>
What sort of relational schema do you use to save the above data? How do I query for orders with 2 items? orders with yellow items? yellow apples? How about the items that a customer who bought a chair and a yellow apple in possibly different trips has bought? XP
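The first three of those queries can be tried with the XPath subset in Python's standard library. The ORDERS string is a cleaned-up copy of the order list; the self-closing tags and the bare orange element are assumptions, since the sample in the post is garbled.

```python
import xml.etree.ElementTree as ET

ORDERS = """
<orderlist>
  <order id="1" customer="Aunt Bea">
    <apple type="golden" color="yellow"/>
    <orange/>
  </order>
  <order id="2" customer="Bob">
    <car type="pinto" color="yellow"/>
  </order>
</orderlist>
"""

root = ET.fromstring(ORDERS)
orders = root.findall("order")

# Orders with exactly 2 items: len() of an element counts its children.
two_items = [o.get("id") for o in orders if len(o) == 2]

# Orders with any yellow item, whatever the item's tag is.
yellow_items = [o.get("id") for o in orders if o.findall("*[@color='yellow']")]

# Orders with a yellow apple specifically.
yellow_apples = [o.get("id") for o in orders if o.findall("apple[@color='yellow']")]

print(two_items, yellow_items, yellow_apples)
```

None of these needs to know in advance which item types or attributes exist, which is the point the post is making.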
Re:someone please explain... (Score:3, Interesting)
To handle the specific attributes of each product, one way to do it is to have a separate table for each product type that has unique
Re:someone please explain... (Score:3, Insightful)
Spending lots of time and money designing a system that the customer can not imagine is a waste of money, because you will have to change the design as the business units focus on what they want, normally after they see your initial results.
Sometimes you have to use duct tape.
I have one app in production that uses XML files as data stores. There are about 24 users. I also h
Re:someone please explain... (Score:1)
I will leave the SQL for such a query as a reader exercise because that kind of query tends to vary per dialect. It will probably involve a GROUP BY and a COUNT operation, or perhaps a correlated subquery. (SQL is not the ideal relational language IMO.)
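For readers who want one possible answer to the exercise, here is a sketch in SQLite's dialect, run through Python's sqlite3 module. The order_items table and its rows are assumptions made up for the example; other dialects may need small changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical line-item table: one row per item on an order.
conn.execute("CREATE TABLE order_items (order_id INTEGER, product TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [
        (1, "apple", "yellow"),
        (1, "orange", None),
        (2, "car", "yellow"),
    ],
)

# Orders with exactly two items: group the line items and count them.
two_item_orders = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM order_items "
        "GROUP BY order_id HAVING COUNT(*) = 2 ORDER BY order_id"
    )
]
print(two_item_orders)
```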
Here is one schema approach. Note that it may vary per business.
Table: Customers
----------------
custID
nameMI
lastName
etc...
Table: Products
-------
Re:someone please explain... (Score:3, Insightful)
This is exactly the problem. How do you get any analysis if the customer doesn't know what to ask for? Applications evolve. The flexibility offered by an unstructured data store like XML lets you evolve the data model like the rest of the application.
You gloss over the hard part with "etc..." Attributes or even structured child tags can not be anticipated and built into the schema or els
Re:someone please explain... (Score:1)
You have to "probe" them. Study the manual process. Look at their manual reports. Look at other systems for similar companies. Make some sample screens and reports for the client to jog their mind. Ask them questions like, "Is there only one address per employee, or could they have multiple addresses/contacts?"
XML is NOT going to make up for a lack of understanding of what is needed. You can make an or
Re:someone please explain... (Score:2)
You ca
Re:someone please explain... (Score:1)
Adding a new column is a snap on some systems. What is the complaint? I realize that some shops have rather static rules WRT schemas, but that is a political issue, not a technical one.
BTW, nice website. You are wrong
If wrong, then show it.
(My background is large company IT projects.)
So? A wide variety of techniques are used on large
Re:someone please explain... (Score:1)
I think the advantage would be its "hierarchical", not "navigational", nature. And this is the problem with relational databases for the kind of problems I encounter. Ever tried to store complex *inherently* hierarchical data in them? It's just the wrong idiom, and shouldn't even be attempted. Of course, some clown alw
Re:someone please explain... (Score:1)
If you look at the needs of most "complex hierarchical structures", it often turns out that trees are the wrong "structure" to begin with. Trees are easy for managers and users to grok, but they simply don't reflect the complexities of the real world relationships and chang
Re:someone please explain... (Score:2, Insightful)
What about cases where entities can contain instances of *themselves*? Or where the depth and width of the nesting is not necessarily known up front?
You end up creating these artificial "id" fields, and in so doing build a "tree" on top of the relational database, which is a very silly thing to do.
And what about cases where ordering of contained elements is important?
Re:someone please explain... (Score:1)
Sure! I have yet to use it. Now is as good a time as any to try it I guess.
particularly for engineering, scientific, and mathematical problem domains.
I deal most in the custom biz app domain. Maybe math is different. I never said relational was always the best solution. But, it is often bashed or passed up for the wrong reasons IMO.
Re:someone please explain... (Score:1)
Well, create an entry and we'll start slugging it out
I'd certainly be interested to see if ordered data can be represented easily in a relational model. My suspicion is that it can't, and since I do a lot of "modelling" and "infrastructure software" (that needs persistifiable state to ensure QOS), ordered whole-part relationships come up a lot. And scoping. And nesting.
I deal most in the custom biz app domain
That's where I
Re:someone please explain... (Score:1, Flamebait)
XML data is in general not well held in RDBMSs, Tabelizer. There are exceptions, of course, as I pointed out in my post above.
What does an RDBMS return for an SQL query? A textual table starting with a header of column names, followed by rows of text.
That is not XML, is it?
Furthermore, to query a full document so that the querying application can regenerate the XML, you need to make several queries one after the other, based on the returned data from
two articles on the subject (Score:3, Interesting)
http://www.devx.com/xml/article/9796
Transformation views (Score:1)
Don't count out object databases (Score:3, Informative)
I just finished writing an article for an online magazine on object databases and
And I'll point everybody to my article when it's published.
Tamino (Score:1)
logical versus physical (Score:2, Interesting)
Re:logical versus physical (Score:2)
An XML Database (Score:1)
Re:An XML Database (Score:1)
Here's an option: (Score:2)
I don't see the point... (Score:2, Interesting)
I don't see exactly where I would need that kind of XML database... My applications usually have a big load of model objects which represent the structure of my data at "work-time". This is a very beautiful and elegant way of building applications.
The real problem (in terms of flexibility and time) is the massive work needed for fetching data from a relational DB (everything is working in Java, using JDBC2-compatible connection pools), getting it into the data model, and back again...
So there are tw
IBM's (Score:2)
IBM's Domino is well suited (Score:2)
It's not relational, it's been described as 'document oriented' which is perfect for storing and retrieving XML docs. It's also extremely flexible, extremely secure (NSA, CIA, FAA, and 80+ million other users), and fast to program with (RAD), and supports tons of open standards. For you fans of "View Model Controller" - Domino has been using this architecture for over 15 years now...
The XML classes are built in (or easily extend your own classes using LotusScript, Java, C++, COM, anything really!!!)
Tamino by Software AG (Score:3, Interesting)
The major question is... (Score:2)
I'm not a relational-zealot like the sorts found at dbdebunk.com; I don't worship the table and the join, but neither do I worship the DTD and the entity. If you're just starting a project, think long and hard about your options. Maybe an XML database will be the best tool for the job, or maybe a relational database will, or maybe an OODBMS will work better, or maybe you'd be better off with an object-persistence system such as Prevayler.
I can't know the answe