What Do You Know About Databases And XML?
Dare Obasanjo writes: "XML has become a pervasive part of significant
segments of software development in a relatively short
time. From file formats to network protocols to
programming languages, the influence of XML has been
felt. I have written an
overview of XML schemas, XML querying languages,
XML-Enabled databases and native XML databases.
Below is a shortened version of the article." Obasanjo's original OODBMS
article
has been updated to reflect more of the disadvantages
of picking an OODBMS over an RDBMS.
Super short intro to XML (Score:3, Flamebait)
XML solves the interchange problem. By this, it is meant that XML allows two systems that do not share a predetermined data exchange protocol to share data.
That's it.
Where two systems share a common predetermined protocol, it is almost always more efficient than XML.
Applications of XML to programming lang design (XSL) and other domains are largely a waste of time and won't last.
Re:Super short intro to XML (Score:3, Insightful)
So if someone designs a new (not like XML) format for exchanging data, and manages to get it standardized, then won't this also allow two systems that do not share a predetermined data exchange protocol to share data? One could also be careful in this design and make sure it is more efficient than XML, not only in space and bandwidth, but also in CPU time and programming time. Now does such a format need to be text based as XML is?
That would be ASN.1 then? (Score:2)
But a lot of effort has gone into XML, and we can afford the extra overhead now, and it is standard and widely available for most languages and platforms. It isn't time to throw that away. I would use XML for all new application development; however, the benefit of migrating old applications and their datatypes to XML is marginal - why fix something that isn't broken?
Re:That would be ASN.1 then? (Score:2)
At least XML is more open than ASN.1 is. Not that that means a lot. You can debug ASN.1 with something a little more sophisticated than the "cat" or "more" command.
I recently embarked on writing an XML parser because existing APIs IMHO sucked. But digging into the XML documentation, which was huge, also revealed a "mine field" of bizarre syntax and ambiguities. Meant for human consumption? Certainly machines could have some trouble with it. I went back to using expat, and of course found it to be buggy. But it is huge code and not easy to debug, so I just have to live with it for now.
The first attempt I saw back in the early 1970's for this, and it appeared to have originated with early PL/1 or Algol work, was something IBM called HDF. Too bad they let it drop, even though I see it all over the place today; it's just not recognized.
XML was intended for documents, and for giving those documents certain useful properties. The text string "John Smith" might not be so obvious that it is a name, but "<name>John Smith</name>" is. Then if someone wants their browser to make all names be hyperlinks to look them up in the staff directory, that works. But what is good as a document format just doesn't seem to be all that great for bulk data.
So we have all this storage capacity and bandwidth, so let's waste it? Let XML turn a terabyte of database into two terabytes of text transfer format. That's the ticket! I think I'd rather go with ASN.1 and BER even if documents for them are not so readily available. But if those don't get opened up, I'm sure something new can be built to replace them as well.
Re:That would be ASN.1 then? (Score:2)
It doesn't need to be XML to be standard. Part of the problem I see is that XML (with DTDs) is trying to make it possible to have "standards" without a standardizing process. I see many pitfalls in that. But even a DTD doesn't really attach semantics to names. We're going to end up with a huge mess of DTDs all over the place. We'll be swimming in DTDs.
My email list format is RFC822 addresses separated by newline characters. When you get the first copy you can look at that and see what it is. No DTD lookup needed.
Who decides what DTD will be used as an interchange between businesses anyway? A committee? Why not just have a standards group decide these things?
Re:Super short intro to XML (Score:2)
Define "correctly". Perhaps your idea of that is like mine, which is more along the lines of "do the right thing". When I see some discussions of XML, though, "correct" tends to be more a case of pedantic completeness. But how can all of this solve problems when transferring data between databases of different design, where not only are the relationships different, but the names and tables are as well? Someone still has to establish some kinds of localized semantics, regardless of the DTD. The transfer data simply becomes a sort of "data Esperanto" if carefully designed. But how does XML aid in that?
Re:Super short intro to XML (Score:2)
What? Me *for* XML? When did that happen? :-)
And why do I need to have my data encapsulated? And why is XML redundantly closing contexts? Why not just close everything with </> instead of having to repeat the name? This is not human (error prone) data entry, this is machine generated data. If an implementation generates bad XML, you find out in the testing phases and go back and fix or redesign. Doing verification on every instance of usage is like leaving all the debug prints enabled and writing them to a log file.
Java is not any more capable than say, C, for adding in logic. You still have to code the logic and that could be done in most languages.
The data Esperanto concept can work, as long as both the speaker and listener (sender and receiver ... encoder and decoder) know the same language. And that comes about from standardization. Now the next thing I'm likely to hear about in the XML world is that if you don't like one DTD, there are plenty more to choose from to do the same thing. DTDs are almost like standards themselves now ... without the thought being really put into them.
Re:Super short intro to XML (Score:2, Informative)
Wow, this guy posts early but it looks good so he gets modded up. What a crock.
XML solves the interchange problem. By this, it is meant that XML allows two systems that do not share a predetermined data exchange protocol to share data. Thats it.
That is a VAST oversimplification. What if instead you had said "computers allow us to carry out a repeated set of instructions. That's it." Doesn't quite tell the whole story, does it? Nor does your kindergarten-level definition of XML tell the whole story.
Applications of XML to programming lang design (XSL) and other domains are largely a waste of time and won't last.
Hmmm.... And will the stock market rebound in the next six months? Will Jesus FINALLY return and lift the Believers up into heaven? Will it rain next Friday?
You can speak out of your ass all you like. Doesn't mean it's gonna happen. XSL/XSLT has been around for a while now, and its user base has only been expanding.
He's a moron, obviously has done nothing more than skimmed a few chapters in some cheap-ass Wrox text, and he gets modded up to 5. There is no justice, I tell you!
Re:Super short intro to XML (Score:2)
Yet it's the working definition you'll find in many articles at XML.com. Why would you presume XML serves some larger purpose?
XSL/XSLT has been around for a while now, and its user base has only been expanding.
Once again, XML.com has some very informed articles trashing XSL, and they aren't naive posts by someone who just read the WROX book. Stop by and read them.
Re:Super short intro to XML (Score:5, Funny)
I hear you. The product that I'm working on right now is XML heavy. It's using entirely proprietary data formats, and the XML processing is taking up 80% of the query time. After achieving full buzzword compliance, we decided that the system is way too slow, and now have to strip the whole bloody lot back out again.
Note that there was no reason to use XML in the first place, other than some designers wanted to put it on their resumes. I kid you not.
Re:Super short intro to XML (Score:2)
On the other hand, if you've let the XML-aware code permeate all parts of the system, it's going to be a lot of work to strip it out.
Re:Super short intro to XML (Score:2)
If you can agree on the schemas, of course. Given the large amount of wrangling / discussion / argument over what schemas to adopt, e.g. ebXML, XML is getting bogged down. For those of us who would like to actually create something better than EDI, something that can be the basis for lowering transaction costs and making the world a saner place for doing business, it is a pain to have to deal with all this. I'm wondering if a better way of doing things would be to use RDF, as the semantics are embedded with the data and the syntax is easy to read for both machines and humans.
Re:Super short intro to XML (Score:2)
Re:Super short intro to XML (Score:3, Interesting)
Lord knows how annoying it is to write a document so generic that translating it to other forms is possible. XML is the perfect format, since there is always some middleware that can turn XML into, say, PDFs or HTML. To HTML, you have XSLT; it's a no-brainer. But to, say, a PDF, you can use another scripting language to process the XML and write out the PDF binary. Now we can create a handbook and have some cool stuff on the web without destroying the site.
XML can also have internal uses for, say, templating. Using XSLT, you can build a template that would do cool stuff like
[html][body]Hi [username/][/body][/html]
which would be translated into something like
[html]
[body]
[script language="php"]
getUsername();
[/script]
[/body]
[/html]
VERY nice stuff for designers to use.
Yes, I know my php tags and html open/close entities are arcane/wrong... but this is to make it easier to type on my part
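For anyone who wants to play with the placeholder idea above without setting up an XSLT processor, here is a rough Python sketch. It is not XSLT and it does not emit a PHP script block like the example; it just swaps a hypothetical `<username/>` element for a value using the standard library, which is enough to show the "designer writes a tag, code fills it in" split. The function name and the template string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical designer-written template with a placeholder element.
TEMPLATE = "<html><body>Hi <username/></body></html>"

def fill_username(template: str, username: str) -> str:
    """Replace every <username/> placeholder element with plain text."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "username":
                continue
            # Keep any text that followed the placeholder.
            replacement = username + (child.tail or "")
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + replacement
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + replacement
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")

print(fill_username(TEMPLATE, "alice"))
```

A real XSLT stylesheet would express the same substitution declaratively; the point here is only that the template stays a well-formed document the designer can keep editing.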
XSL-FO (Score:2)
Converting to PDF is easy too. Just use XSL-FO. Apache has an implementation [apache.org] of this. Currently they only create PDFs, but it could easily be sent to a printer directly. You can also convert directly to TeX [ibm.com].
Re:Super short intro to XML (Score:2)
[Start Flamebait] Please would the moderators have some knowledge of the topic BEFORE flagging things as "Interesting" or "Informative"[End flamebait]
XML solves the data format problem, and nothing more. It does not solve the interchange problem because apps still need to know where to locate relevant information in an XML doc, and how to interpret it. i.e. they have to have knowledge of the DTD and translate from the XML (structured according to the DTD) into their own internal format.
So instead of needing to create a reader for a binary EDI format, you plug in a bog standard parser and get named values. So it makes interchange EASIER for the programmer. Especially those with languages that don't do binary data very well.
God only knows what XSL has to do with "programming language design". XSL has two explicit goals: 1. (XSLT) a generic translation from one XML format to another. Why? Because everyone wants to use their OWN XML DTD, so to interact with umpteen other products you need to understand the DTDs of each ... or you write an XSL for each to change it into your format. 2. (XSL-FO) display primitives to allow an XML document to be transformed into a display language, so we can see the damn thing.
Re:Super short intro to XML (Score:2)
Actually, less. It solves the metaformat problem.
It does not solve the interchange problem because apps still need to know where to locate relevant information in an XML doc
A simple policy is to reject any file without a valid DOCTYPE.
As for what meaning you infer from tagged data, no standard is ever going to tell you that.
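A rough sketch of the "reject any file without a DOCTYPE" policy mentioned above, in Python with the standard expat binding. Big caveats: this only checks that a DOCTYPE is present and that its public identifier is on a list; it does NOT validate the document against the DTD (expat is a non-validating parser). The public identifier and function name are invented for the example.

```python
import xml.parsers.expat

# Hypothetical allowlist of DTD public identifiers this receiver understands.
ACCEPTED_PUBLIC_IDS = {"-//Example//DTD Order 1.0//EN"}

def accept(document: bytes) -> bool:
    """Return True only for well-formed input declaring a known DOCTYPE."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen.append(public_id)

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False  # not even well-formed
    return bool(seen) and seen[0] in ACCEPTED_PUBLIC_IDS

GOOD = b'<!DOCTYPE order PUBLIC "-//Example//DTD Order 1.0//EN" "order.dtd"><order/>'
BARE = b"<order/>"
print(accept(GOOD), accept(BARE))
```

As the comment says, passing this gate tells you which vocabulary the sender claims to use, not what the tags mean.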
Re:Super short intro to XML (Score:2)
Isn't that statement an oxymoron? If both systems understand XML then they DO 'share a predetermined data exchange protocol.'
--jeff
Missing the Big Picture (Score:5, Insightful)
The interesting thing about XML to me is NOT that it solves the interchange problem (though it helps with that). The great thing is that it solves the PARSING problem. No longer do I have to write a parser every time I have some simple task of reading in something externally.
What XML does is define for you a standard means of parsing, and by defining the API for parsing and the structure of the documents lets you think about how you want to structure external information, not how you're going to read it in.
Also, because the API for parsing is now hiding the engine details below, parsers can be specialized depending on what kind of task you have. Parsing thousands of 1k XML documents would seem to demand a different processor altogether from a few multi-GB documents, but you only have to know one parser (OK, really two - SAX and the DOM interface). You could even have specialized XML processors that wrote the stream out in a weird custom binary format for compactness and read it back in with the normal DOM API so clients wouldn't have to adjust. I'll grant you that there don't seem to be many specialized XML processors - yet.
I also like the robustness of XML exchanges (here I'm getting more into your main point). If you add or drop attributes from an XML document, clients that read that document are less likely to break (unless of course they relied entirely on the node(s) you have removed!). That is especially true of XSL, where missing nodes of a document simply correspond to missing parts of output (which can also be a useful effect).
You might think of XSL as a useless language, but I'll be happy to make a counter-prediction that it will grow and thrive. It's simply too useful a transformation tool to do anything else. I know the syntax seems overbearing, but for the kinds of short transformational work it's normally put to that's not much of an issue and you get used to it quickly.
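To make the robustness point above concrete, here is a small Python sketch of a reader that keeps working when the sender adds or drops fields. The element and attribute names (`customer`, `tier`, and so on) are invented for the example; the only real API used is the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

def read_customer(xml_text: str) -> dict:
    """Pull out the fields we care about, with defaults for missing ones."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name", default=""),
        "email": root.findtext("email", default=""),
        "tier": root.get("tier", "standard"),
    }

# Old sender format.
OLD = "<customer><name>Ann</name><email>ann@example.org</email></customer>"
# Newer sender: added attributes and a <phone> node, dropped <email>.
NEW = ('<customer tier="gold" region="EU">'
       "<name>Ann</name><phone>555-0100</phone></customer>")

print(read_customer(OLD))
print(read_customer(NEW))
```

The reader ignores nodes it does not know about and falls back to defaults for ones that vanished, which is exactly the failure mode a fixed-offset binary record would not tolerate.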
Re:Super short intro to XML (Score:2)
Then why not compress the other more efficient protocol and receive the proportionately larger gains?
Re:Super short intro to XML (Score:2)
The optimized protocol will always win.
Re:Super short intro to XML (Score:2)
With compressed XML, you have to use some tool like "zcat" to read it. Isn't the whole idea of XML to make it work with "cat"? :-)
Seriously, what is the readability issue all about, anyway? What's so wrong with using a tool to read a format that happens to be in a binary form? I personally find XML is harder to read than HTML and HTML is not that easy to read. And XML is only getting worse.
Re:Super short intro to XML (Score:3, Informative)
You're thinking LZ or Huffman. But you could very easily use a tag-id, data-length, data layout.
If tag-id and data-length are binary integers, then you reduce any tag combination to 8 bytes (which, except for single-character tag names, is shorter). It most definitely produces faster read times, since you read entire chunks without lexical comparisons.
For 1-level-deep data structures, this is pretty good. You can even reduce the tag size down to 1 byte (thus having only 5 bytes of overhead per CDATA). This is especially good for protocols between web-server apps, and the like. For multi-level data structures, you have the choice of either combining all the levels into new tag types (though this doesn't allow for recursion), or having the reader keep track of state.
Since this can easily be converted back and forth to XML, what this could mean is that XML is used externally and compressed XML is used internally.
Note that even this has limited usefulness: it is only useful at all when interacting with 3rd-party apps, or when being saved to disk (to allow vi modification).
-Michael
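The tag-id, data-length, data layout described above is easy to sketch in Python. This is only an illustration under my own assumptions: 4-byte big-endian integers for both header fields (giving the 8 bytes of overhead mentioned), a flat one-level record list, and a made-up tag table mapping numbers to names.

```python
import struct

# tag-id and data-length as 4-byte unsigned integers: 8 bytes of overhead.
HEADER = struct.Struct(">II")

# Hypothetical tag table shared by both ends.
TAG_NAMES = {1: "name", 2: "email"}

def encode(records):
    """Pack (tag_id, payload_bytes) pairs into one binary buffer."""
    out = bytearray()
    for tag_id, data in records:
        out += HEADER.pack(tag_id, len(data)) + data
    return bytes(out)

def decode(buf):
    """Read records back without any lexical comparison of tag names."""
    records, pos = [], 0
    while pos < len(buf):
        tag_id, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        records.append((tag_id, buf[pos:pos + length]))
        pos += length
    return records

def to_xml(records):
    """Convert back to the external XML form (no escaping in this sketch)."""
    return "".join(
        "<%s>%s</%s>" % (TAG_NAMES[t], d.decode("utf-8"), TAG_NAMES[t])
        for t, d in records
    )

packed = encode([(1, b"John Smith")])
print(len(packed), len(to_xml(decode(packed))))
```

Switching the header to `struct.Struct(">BI")` would give the 1-byte tag, 5-byte overhead variant. Nested data would need either composite tag ids or a reader that tracks depth, as the post says.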
Re:Super short intro to XML (Score:2)
Tweak being the keyword. Tweaking fonts / colors is completely different from structure. The goal of a web art designer is to "structure" the web site in an ergonomic and visually pleasing manner. Being a pure hacker, I find such tweaks distasteful. I focus on just getting the data in and out of the user's head. I long for the ability of our graphic designer to be able to take the data generated by my "business logic" and format it properly.
ASP/PHP/JSP/CGI generation of tables / forms does not easily lend itself to "tweaking" of format (only primitive CSS tweaks). Instead what we want is a "winamp" style skinnability where each widget can be relocated at the graphic designer's whim, completely independent of how those widgets work.
There are only two technologies today that I can imagine that do this. One is XSL, and the other is *gack* component-based programming. With the latter, the components generate zero HTML, but instead internal data structures which the front-end designer attaches to various HTML widgets (and can thus choose between drop-downs, check-boxes, tables, etc).
In another post, I suggest that if future browsers are XSL compliant, then we can offload all formatting to the browser and just render raw XML. The CSS and XSL would be static pages mapped to the XML data, which thus augments performance incredibly.
-Michael
Re:Super short intro to XML (Score:2)
Non-profitable dot.coms didn't last very long. How long will a non-profitable XML implementor last? eh?
Re:Super short intro to XML (Score:2)
The "profitably" bit already shows that you're not reasonable.
No - profitability is ultimately the only viable test for any technology.
Re:Super short intro to XML (Score:2)
Hello! I whole-heartedly disagree that ASP / PHP / JSP performs any separation (though I'm not familiar enough with Cold Fusion). Instead they perform the exact opposite. They interlace content, presentation and even security. They do not natively allow for the separation of the work a hacker, an artist, and a DBA perform, since they all require arbitrarily agreed-upon artificial APIs.
Presentation is only separable when function calls to "header", "footer", URIs are made. A new developer wouldn't necessarily know which file(s) to use for such linkage. Login authentication / authorization are often relegated to the individual web pages, which makes the vulnerability dependent on the bug-free-ness of each page (weakest link is weakest/buggiest URI). Granted, there are proprietary means within each system to enforce authen/authz.
Though I'm not a great fan of XML, XSL allows a 100% separation of style (beyond even that of style-sheets). Even the use of head/foot within PHP/ASP doesn't allow external formatting of the individual tables as XSL does. What's more, theoretically future XML / XSL compliant browsers can off-load the server overhead of the presentation layer completely. Thus there are zero "extra layers" of work for the server. You generate XML however you please (just as you would HTML), then associate static style sheets / XSL sheets from which the future browser renders the image.
For the time being, all this technology is in flux. As I said, I'm no big fan of XML, but I've had to produce proprietary solutions to the data/format separation problem, since no existing "ASP-style" architecture solved it to my company's satisfaction.
-Michael
Re:Super short intro to XML (Score:2)
Here at the PA in the UK we handle a vast amount of sports data, from football (soccer) to athletics, and more and more of that data is being generated, transmitted and processed as XML.
There are several reasons for this; the three I can think of off the top of my head are:
1) XML is mostly repeated text, which allows very good compression rates.
2) XML makes it easy to grab just the data you require out of a stream and discard the rest.
3) XML must meet certain criteria in order to be valid, which allows most of the processing to be done by pre-built modules such as expat.
That does not mean that things can't go wrong. I have one feed, I won't say from whom, that isn't well formed and another with 160 different DTDs, most of which could have been merged if someone had thought about things first.
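The three reasons above can be tried out in a few lines of Python. This is a sketch with a fabricated feed (the element names and the 200 fake matches are mine, not a real PA format): it compresses the repetitive text with zlib, pulls only the `<score>` elements out of the stream with `iterparse`, and relies on the parser to complain if the feed is not well formed.

```python
import io
import zlib
import xml.etree.ElementTree as ET

# Fabricated, highly repetitive feed standing in for real sports data.
FEED = ("<results>" + "".join(
    f'<match id="{i}"><home>Team A</home><away>Team B</away>'
    f"<score>{i}-0</score></match>"
    for i in range(200)
) + "</results>").encode("utf-8")

def scores(feed: bytes):
    """Stream through the feed, keep only <score> text, discard the rest."""
    found = []
    for _event, elem in ET.iterparse(io.BytesIO(feed), events=("end",)):
        if elem.tag == "score":
            found.append(elem.text)
        elif elem.tag == "match":
            elem.clear()  # free the parts we do not need
    return found

compressed = zlib.compress(FEED)
print(len(FEED), len(compressed), len(scores(FEED)))
```

A feed that is not well formed makes `iterparse` raise `ET.ParseError`, which is the cheap sanity check reason 3 is talking about. It does nothing about the 160-DTDs problem, of course.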
Web-Apps need XML (Score:5, Informative)
The breakdown is not on the logic/content side of the equation, or the presentation/content side, but mainly in the presentation/logic arena.
Imagine an HTML designer who has mocked up a page for a web-app, and hands it off to the dev team for them to add in the necessary logic to dynamically include the user-name, current balance, contents of the shopping cart, etc. Depending on the exact paradigm that their tools use, they will either:
a) Chop up the page and include various fragments in the programs that are designed to emit said fragments at the opportune times to be assembled into a text stream eventually received by a browser
or b) Various bits of logic get stuck into the page in order to parameterize and/or conditionalize it, using either some sort of special tagging format or actual inlined blocks of code.
Whichever approach the dev team's tools use, the result is the same: the designer can no longer change the altered page.
Even in case b), which maintains some semblance of a coherent 'page', the designer cannot load the page-with-logic into their favorite visual editor and see anything resembling the actual page. They certainly can't edit it to change the look-and-feel without breaking the carefully constructed logic.
The end result is that the designer has no recourse other than to take their page design, change it, and hand it over to the dev-team again for them to re-include (in some cases re-code) all of their logic.
This is obviously a very wasteful approach.
Amazingly, there actually is a solution to this problem. It's called Template Attribute Language [zope.org] (TAL), and it solves the problem by adding programming directives to the page via XHTML attributes on the existing tags. The language is deliberately designed to only be suitable for presentation logic, relegating business logic code to some other objects, where the designer can't see them. This helps enforce the appropriate distinction between presentation logic and business logic that most current development environments ignore, thus encouraging their admixture.
Currently, TAL (and the related specifications TALES [zope.org] and METAL [zope.org]) are only implemented in one environment [zope.org], but the language has been deliberately designed to be as platform agnostic as possible. Other implementations of the specification are possible, and even desirable.
Articles:
Zope Page Templates: Getting Started [zope.org]
Zope Page Templates: Advanced Usage [zope.org]
Using Zope with Amaya, Dreamweaver, and other WYSIWYG Tools [zope.org]
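To give a feel for the attribute-based approach described above, here is a toy Python sketch. It is emphatically NOT the Zope implementation: it handles only a `tal:content` attribute, and it treats the attribute value as a plain dictionary key rather than a real TALES path expression. The page text and the `render` function are invented for illustration; the namespace URI is the one I believe Zope uses for TAL, so treat that as an assumption too.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for TAL attributes.
TAL_NS = "http://xml.zope.org/namespaces/tal"

# The designer's page: still valid markup, with dummy text they can preview.
PAGE = (
    '<p xmlns:tal="http://xml.zope.org/namespaces/tal">'
    'Welcome, <b tal:content="username">Jane Designer</b>. '
    'Balance: <span tal:content="balance">$0.00</span></p>'
)

def render(page: str, context: dict) -> str:
    """Replace the content of every element carrying tal:content."""
    root = ET.fromstring(page)
    attr = "{%s}content" % TAL_NS
    for elem in list(root.iter()):
        key = elem.attrib.pop(attr, None)
        if key is None:
            continue
        for child in list(elem):
            elem.remove(child)  # dummy content goes away entirely
        elem.text = str(context[key])
    return ET.tostring(root, encoding="unicode")

print(render(PAGE, {"username": "Ann", "balance": "$12.50"}))
```

The thing to notice is that PAGE opens fine in a visual editor with the dummy text showing, so the designer can restyle it and hand it back without anyone re-adding the logic.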
Re:Web-Apps need XML (Score:3, Informative)
Re:Super short debunking of XML (Score:2)
Note that you can have zero or more required arguments, zero or more optional arguments, and zero or more named arguments.
(document
'(
(paragraph
(
"Hello"
(style
"World."
(style
Where paragraph has one required argument, the data in the paragraph, and one or more optional keyword arguments containing the style, formatting, etc.
More on OODBMS (Score:2, Informative)
Same article on kuro5hin (Score:3, Funny)
An other interesting link (Score:3, Informative)
There was a good discussion on XML data bases on the XML-Dev mailing list, which is summarized pretty well by Leigh Dodds XML and Databases? Follow Your Nose [xml.com].
xml is an interchange format, not a storage format (Score:5, Interesting)
Oracle is taking some BIGTIME performance hits for stacking all that OO crap in there, and MS SQL Server is seeing the same thing now that they've got the XML in theirs. Don't believe me?
Why is NASA switching to MySQL from Oracle [fcw.com] and noticing speed increases?
Don't get me wrong, I'm a big fan of XML.. as a data interchange format.. but when I want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.
Re:xml is an interchange format, not a storage for (Score:5, Insightful)
However, citing NASA as a source for technology or trends is a bit silly, for a number of reasons. The primary one is this: NASA is so large, and so diverse, that at one of their sites/on one of their projects they use one of just about every technology product you can name.
I was once running two back-to-back software evaluations for products in the $20-million range. For both applications, the top ten vendors all claimed that their system was "used by NASA for the Space Shuttle". We checked up and guess what - they were all telling the truth.
So you need a better example.
sPh
Re:xml is an interchange format, not a storage for (Score:2)
Re:xml is an interchange format, not a storage for (Score:3, Interesting)
But what if your data representation is already an XML schema? And a pretty complicated one at that? For example, look at METS [loc.gov] : The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.
Have a look at that schema [loc.gov] and tell me how you'd store that in a traditional RDBMS (I'd be interested if you could, because I know SQL, I don't know OODBMS or XML repositories - this is painful for me). Databases have been for storing data, but when your data is already a complex XML representation of an object, there's little use in saying don't use OODBMS.
Re:xml is an interchange format, not a storage for (Score:2)
Re:xml is an interchange format, not a storage for (Score:3, Insightful)
So what do you think of using XML for system configurations? On UNIX systems that tends to be a lot of separate files, traditionally edited with vi, although today the tools are getting more and more dummy friendly and have a smaller space of possibilities.
Re:xml is an interchange format, not a storage for (Score:3, Informative)
Re:xml is an interchange format, not a storage for (Score:2)
XML tends to be good for hierarchical, widely-parseable data. In this sense, XML is good for configuration files, because many of the more advanced ones need some type of hierarchy to be sane. Also, it makes it easy to have one editing mode for many different configuration files, and configurations can be displayed/queried in a more universal manner.
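As a quick illustration of the hierarchical-config point above, here is a Python sketch that reads a made-up service configuration. The file layout (`config`, `service`, `listen`) is purely hypothetical; the same few lines of query code would work on any config file written in XML, which is the "universal manner" being argued for.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration with two levels of nesting.
CONFIG = """
<config>
  <service name="web" enabled="true">
    <listen port="80"/>
    <listen port="443"/>
  </service>
  <service name="mail" enabled="false">
    <listen port="25"/>
  </service>
</config>
"""

def enabled_ports(config_text: str) -> dict:
    """Map each enabled service name to the list of ports it listens on."""
    root = ET.fromstring(config_text)
    return {
        svc.get("name"): [int(l.get("port")) for l in svc.findall("listen")]
        for svc in root.findall("service")
        if svc.get("enabled") == "true"
    }

print(enabled_ports(CONFIG))
```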
Re:xml is an interchange format, not a storage for (Score:2)
I find the tags are a major hindrance to proper editing technique. If the tool is vi, I have to deal with the tags manually. If the tool hides the tags, then it has to be interpreting them and presenting some logical construct. But I've yet to see any tool that can let me do all I want with config files. How would /etc/rc look in XML?
Re:xml is an interchange format, not a storage for (Score:5, Informative)
xml is an interchange format, not a storage format
Absolutely, positively agree. Not only is XML only an interchange format, but it only makes sense in some situations (for instance if we have an embedded piece of hardware that we have to communicate with, and we're communicating to it from a Windows box, and there is no shared common data encapsulation format, I'd greatly prefer XML (with XSD) vastly over Jimmy the Programmer making up his own data encapsulation format/documentation method/extraction system, but if I have two Windows machines running SQL Server and they're in a common security context and they'll never change, I'd use DTS or replication, not XML).
and MS SQL Server is seeing the same thing now that they've got the XML in theirs
The XML "in" SQL Server is surface fluff (I love SQL Server and I'm saying this as a good thing, not a bad thing). i.e. Some modules that'll convert an XML query to an underlying DB query, and the results back to XML, and some basic XML importing and exporting routines. This hasn't affected the underlying operations of SQL Server whatsoever.
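The "thin layer on top of an ordinary query" idea above can be sketched in Python. To be clear, this is not how SQL Server does it and uses none of its APIs; SQLite from the standard library stands in as the database, and the table, column and element names are invented. The relational engine runs the query untouched, and the XML only appears at the edge.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, root_tag="rows", row_tag="row"):
    """Run a plain SQL query and wrap the result set in XML."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for record in cur:
        row = ET.SubElement(root, row_tag)
        for col, value in zip(columns, record):
            ET.SubElement(row, col).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
print(query_to_xml(conn, "SELECT id, name FROM clients ORDER BY id"))
```

Storage and indexing stay fully relational; only the result crossing the wire is XML.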
"SQL Server" (Score:2, Funny)
Oracle vs. MySQL performance (Score:3, Insightful)
I wouldn't be surprised if "OO crap" does indeed slow down Oracle, but I know the JVM for Oracle is completely optional. I can't speak to any XML features in Oracle, I'm not familiar with them.
Re:Oracle vs. MySQL performance (Score:2)
is simply because MySQL doesn't have transactional support
Actually, the newer versions are adding transactions. They still have table locking, so performance will probably suffer if you have a lot of concurrency.
Re:Oracle vs. MySQL performance (Score:2)
What is the overhead of [dis]assembling the data? (Score:2)
Exactly, and XML is a format for encoding structured data. There are many kinds of documents that live their entire lives as XML, from XHTML documents to configuration files to the myriad kinds of XML documents that exist today [xml.org].
Why is NASA switching to MySQL from Oracle [fcw.com] and noticing speed increases?
If all you want is speed then MySQL is all you need. Similarly I can quote how much faster TUX is than Apache but that means nothing if I have dynamic database driven content that I want to use JSP or Perl to access.
There is more to picking a database than how quickly it performs some SQL queries.
Don't get me wrong, I'm a big fan of XML.. as a data interchange format.. but when i want tight storage and quick retrieval, give me a normalized RDBMS any day of the week. Because that's what it's for.
This means you're suggesting that people shred XML documents into relational data to store them in the DB and then reassemble them whenever they retrieve them. This is massive overhead and error prone, since you're depending on your developers to come up with custom ways of doing this for each application. It is also typically very difficult to ensure that the XML that was stored in the DB can be accurately reconstructed (what happens to comments, processing instructions, entities, etc).
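The reconstruction problem above is easy to demonstrate. In this Python sketch (document, tuple layout and function names all invented), a naive shredder keeps only what maps onto rows, and the reassembled document silently loses the comment. A real mapping layer could store comments too, but somebody has to write and maintain that for every document type.

```python
import xml.etree.ElementTree as ET

ORIGINAL = '<order><!-- rush delivery --><item sku="A1">Widget</item></order>'

def shred(xml_text: str):
    """Flatten the document into (sku, description) rows, as for a table."""
    root = ET.fromstring(xml_text)  # the default parser drops comments
    return [(item.get("sku"), item.text) for item in root.findall("item")]

def reassemble(rows) -> str:
    """Rebuild an <order> document from the stored rows."""
    root = ET.Element("order")
    for sku, text in rows:
        ET.SubElement(root, "item", sku=sku).text = text
    return ET.tostring(root, encoding="unicode")

rebuilt = reassemble(shred(ORIGINAL))
print(rebuilt)
print(rebuilt == ORIGINAL)
```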
Re:xml is an interchange format, not a storage for (Score:2)
Huh?
What tasks don't perform faster when you run them on faster hardware? Are you trying to say that the code and architecture are absolutely optimal, and no performance gains are possible without a hardware upgrade? Not likely.
Re:xml is an interchange format, not a storage for (Score:2)
Re:xml is an interchange format, not a storage for (Score:2)
Yes, it is true that some software does not scale well, but that's not nearly enough information to mean anything. Does that mean that if you add another machine you get more performance? Another CPU? More memory? Software that "scales well" in one environment (say on 4-8 CPU x86 machines) may not scale well in other environments (large mainframes).
Another point. Say that the performance of the software scales linearly, so your performance is multiplied by the number of whatever hardware devices you're adding. You could argue that the software scales well, but if said software has a slow section of code in its main execution path, optimization of that code (or removal if it's a fluff feature) shifts your whole curve. There is no reason that a piece of software can't both scale well architecturally and perform like crap at the same time.
I have no experience with SQL Server, so I cannot say whether this is the case or not. I do know that I would not be able to make a decision about it knowing only that it scales well.
Re:xml is an interchange format, not a storage for (Score:2)
Are there any tools to access XML with SQL? (Score:2)
Re:Are there any tools to access XML with SQL? (Score:2)
The problem with XML is... (Score:5, Funny)
Granted, XML has some advantages. Data interchange among dissimilar clients, for one. But storing XML in a database is a gross waste of space and processing power, and is realistically impossible for all but the smallest of databases.
Re:The problem with XML is... (Score:2)
Requesting a list of clients and their sales will return an XML file that describes this list and sublists.
We definitely would not use XML as the actual storage format.
SAX! (Score:2)
Now, if you're pointing out that XML provides no mechanism for indexing so you'll have to scan the file *until* you reach the record you're interested in, I agree. But as others have pointed out, nobody uses XML as the storage format for anything but the smallest databases. (E.g., configuration files.) But the translation to/from XML format for queries no more breaks its 'purity' than converting SQL "insert" clauses into binary data stored in B-tree or ISAM tables breaks its relational purity.
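Here is what "scan the file until you reach the record you're interested in" looks like with SAX, as a Python sketch. The `<db>`/`<record>` layout and the exception-based early exit are my own choices for the illustration, not anything mandated by SAX. The handler counts how many records it had to walk past, which makes the lack of an index visible.

```python
import xml.sax

class Found(Exception):
    """Raised by the handler to stop parsing once the record is located."""

class FindRecord(xml.sax.ContentHandler):
    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.seen = 0  # records examined so far

    def startElement(self, name, attrs):
        if name == "record":
            self.seen += 1
            if attrs.get("id") == self.wanted_id:
                raise Found(dict(attrs.items()))

def find(xml_bytes, wanted_id):
    """Return (attributes or None, number of records scanned)."""
    handler = FindRecord(wanted_id)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except Found as hit:
        return hit.args[0], handler.seen
    return None, handler.seen

# Made-up data file with 1000 records.
DATA = ("<db>" + "".join(
    '<record id="%d" name="n%d"/>' % (i, i) for i in range(1000)
) + "</db>").encode("utf-8")

print(find(DATA, "3"))
print(find(DATA, "nope"))
```

Memory stays flat no matter how big the file is, but the cost of a lookup is linear in the record's position, which is why nobody wants this as a storage format for anything large.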
Re:SAX! (Score:2)
My experiences with OODBMS (Score:2, Informative)
After several weeks of dealing with growing pains and general brokenness, my manager wisely decided to transition our systems back to a UNIX environment. I worked in the group that was responsible for this, and after obtaining source code to several of our accounting and inventory applications, we moved the operation over to a Linux 2.2 (Debian potato) system. Things have worked flawlessly since then, and the OODBMS and Java developers are long gone. The promise of an OO architecture was great, but it just didn't work out in the real world - Linux was the solution for us.
-CT
Re:My experiences with OODBMS (Score:2)
Methinks from the above (and your handle) that your post is a joke aimed at highlighting the moderator's ineptness. Shame on the moderator for letting it through - evidence enough that all you need to do to be moderated up is throw in enough buzzwords to confuse the moderators (who don't really know much about the issues anyway).
Now watch me get modded down.
Re:My experiences with OODBMS (Score:2)
JAVA RUNS ON UNIX. He just tossed out the Linux reference to get you guys to mod it up: "Ooh, Windows and Java failed! Linux worked! +1, informative!"
The decision of whether or not you use Linux has absolutely nothing whatsoever to do with the decision to use Java and/or OO techniques. Further, I've never seen an unstable JRE in my life -- the JRE is the single most stable Windows app I have ever used (although the instability of Windows itself still leaves it undesirable). The last time I saw a JRE crash (even once) was, I believe, three years ago using JDK 1.2 beta 4. I program Java seven days a week, and it simply does not crash.
And I'm also pretty impressed that you could hire new people, redesign a complex system, reimplement the new design in a completely different language/platform/database, realize it wasn't working, fire the new people, assign new people to the project, and transition over to yet *another* new platform in the space of a few short months. That's the quickest turnaround I've ever heard of.
(Translation: this guy's a troll. Please stop handing the frickin' trolls karma points.)
Re:My experiences with OODBMS (Score:2)
We really didn't have a choice. Porting the original system to Linux was the most cost-effective option available.
And yes, we did accomplish everything within a few months. Our developers spend significant amounts of time doing actual work (it's part of the corporate culture) and very little time playing your alleged "troll busting" game on Slashdot. That goes a long way toward explaining our unusually high productivity.
-CT
Re:My experiences with OODBMS (Score:2)
"...we had reasons to believe it was the JRE and/or OS..."
I obviously have no retort to this other than to stand by my assertion that Sun's JRE is rock-solid. I have already stated that I would never use Windows in a production environment, but that's Windows' problem, not Java's. A real Java program could have been moved to any of the discussed platforms in a few minutes. I actually develop my server software on Windows and then deploy to Solaris, and in three years of doing this I've never had an issue.
"Our developers spend significant amounts of time doing actual work (it's part of the corporate culture) and very little time playing your alleged 'troll busting' game on Slashdot"
Yet, here you are posting on Slashdot, same as me. And you're implying that you guys are more efficient because
"That goes a long way toward explaining our unusually high productivity."
It actually wasn't your high productivity I was commenting on. After all, the net result was that you spent a few months and (presumably) tens of thousands of dollars, and in the end all you accomplished was porting from HP-UX to Linux. That's a remarkably slow and expensive porting job. The bit I was commenting on was how quickly the plans were abandoned and the guys were fired -- you said "a few months", and presumably most of that time was doing the port. How long did you give them to try to fix it? It just sounded like the new plan wasn't given a serious chance for survival, but then I wasn't there so I don't know how long they dicked around with it.
Everybody hires idiots now and then, and kudos to you guys for getting rid of them so quickly, assuming they really didn't know what they were doing. But these problems were not caused by Java, Windows, or an OODBMS -- they were caused by incompetence, plain and simple.
RFC (Score:2)
Can we please, please, please append the definition of XML to allow "" to close whatever the last tag was?
That simple change would probably cut the size of the average XML file in half.
Re:RFC (Score:2)
Can we please, please, please append the definition of XML to allow "</>" to close whatever the last tag was?
That simple change would probably cut the size of the average XML file in half.
(corrected post, please moderate my other one down. I have plenty of Karma...)
Re:RFC (Score:2)
This would take away from the self-documenting nature of XML, I think.
Inevitably, authors would begin terminating their deeply nested documents with tags like:
which is a lot less informative/helpful/debuggable than:
Know what I mean?
Re:RFC (Score:2)
You would still be able to do it that way, but I don't see the advantage of requiring the trailing tags. If you're creating files that are only ever going to be read by machines, it makes no sense to waste the space. Heck, if you're debugging something that always used the "shortcut", it would be trivial to make a little filter to fill in the trailing tags with the full names.
The biggest problem with XML is the incredibly wasteful and verbose nature.
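The "little filter to fill in the trailing tags" could look roughly like the sketch below. It is only an illustration of the idea: the "</>" shortcut is the proposal from this thread, not part of XML, and the regex-based scanner is an assumption that ignores comments, CDATA sections and ">" characters inside attribute values.

```python
import re

# Matches one tag: optional leading "/", a name, the rest, optional trailing "/".
TAG = re.compile(r"<(/?)([^<>/\s]*)([^<>]*?)(/?)>")


def expand_short_closes(text):
    """Rewrite every "</>" as the full closing tag of the innermost open element."""
    stack = []

    def fix(match):
        closing, name, _rest, self_closing = match.groups()
        if name.startswith(("?", "!")):
            return match.group(0)          # declarations and PIs pass through
        if closing:
            if not stack:
                raise ValueError("closing tag with nothing open")
            opened = stack.pop()
            if name and name != opened:
                raise ValueError(f"expected </{opened}>, got </{name}>")
            return f"</{opened}>"
        if not self_closing:
            stack.append(name)             # ordinary opening tag
        return match.group(0)

    return TAG.sub(fix, text)


if __name__ == "__main__":
    print(expand_short_closes("<foo> <bar>baz</> <mumble>grumble</></>"))
```

Since the filter only needs a stack of open element names, it can run as a streaming pre-processor in front of an ordinary XML parser.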
Re:RFC (Score:2)
Look, I respect the Ivory Tower as much as the next guy, but at some point you need to live in the real world.
The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
Agreed. It's a cost/benefit analysis. It seems to me that my suggestion has a HUGE benefit at a very slight cost.
XML documents should be human-legible and reasonably clear.
Minimal legibility is sacrificed. The opening tag is the important one, not the trailing tag.
Terseness in XML markup is of minimal importance.
Apparently it is to you, but not to those of us who have suffered this problem. The average document would be half the size, but many documents would become 1/10th the size. Think about a database with lots of single-character data, but with long column names.
Re:RFC (Score:2)
Nonsense. All the XML I have seen insists on using 1-letter command names in order to get things reasonably short. I fully agree with the original poster, having shorter close tags would probably halve the reasons for these short tags and actually make things more readable.
In my opinion XML is a mistake. We should have copied ASCII design, which reserved 32 control characters for exactly this purpose.
I would like to see an XML replacement where the text is UTF-8, there are NO "escape" sequences. And "<" is replaced with the control character "^[", ">" is replaced with "^\", close commands replaced with "^]", and '="' is replaced with "^[" and closing quote replaced with "^]".
This would change "<command arg="foo">data</command>" into "^[command arg^[foo^]^\data^]" and would be much easier for a program to parse and editors could be made to display this in a user-friendly way just like they special-case ^J and ^I now.
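A minimal sketch of that basic mapping, assuming ^[ is ESC (0x1B), ^\ is FS (0x1C) and ^] is GS (0x1D), and using Python's xml.etree.ElementTree only to read the XML side. How multiple attributes are separated is not spelled out above, so the space before each attribute name is an assumption taken from the single example.

```python
import xml.etree.ElementTree as ET

ESC, FS, GS = "\x1b", "\x1c", "\x1d"   # ^[  ^\  ^]


def encode(elem):
    """Serialize one element (and its children) in the control-character form."""
    parts = [ESC, elem.tag]
    for key, value in elem.attrib.items():
        # '="' becomes ^[ and the closing quote becomes ^]
        parts += [" ", key, ESC, value, GS]
    parts.append(FS)                       # ">" becomes ^\
    parts.append(elem.text or "")
    for child in elem:
        parts.append(encode(child))
        parts.append(child.tail or "")
    parts.append(GS)                       # the close command becomes ^]
    return "".join(parts)


def xml_to_control(xml_text):
    return encode(ET.fromstring(xml_text))


if __name__ == "__main__":
    print(repr(xml_to_control('<command arg="foo">data</command>')))
```

Because the three delimiters never appear in ordinary text, no escape sequences are needed on the data, which is the whole attraction of the scheme.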
Other more controversial ideas I had:
^A, ^B, etc (all except ^I..^M) mean the same as ^[A^\, etc, so they are 1-byte shortcuts for all 1-letter commands.
^^...^J is a comment, ie ^^ is defined as introducing an end-of-line comment. This can also be used to remove linebreaks from the data.
The sequence ^M^J should cause the machine to crash immediately :=)
XML + XSL(t) client side database. (Score:3, Interesting)
It requires Netscape 6.(not out yet), IE 6, or Mozilla 0.9.5+ because of its use of XSL Transform functions.
You can view the page here. [singleclick.com]
Joseph Elwell.
Closed minded people sadden me... (Score:3, Informative)
XML allows data to be stored with context. For example if you have the data element "CmdrTaco", that doesn't mean much. But with xml, you can store this bit of information with context:
<SlashDot>
<Editor>
<Name>CmdrTaco</Name>
</Editor>
</SlashDot>
Isn't that more informative?
It is surprising to me that people who like OO don't like XML. OO allows you to have functionality attached to your data. XML allows you to put context (and even functionality) around your data.
Another big advantage of XML databases is the lack of a schema. If you want to have a dynamic database in the relational world, you are looking at a large schema migration. An XML database allows you to just add the information with no migration at all.
Advanced storage techniques allow queries against an XML database to be just as fast as against a relational database. How can that be? The XML is stored in a specialized indexed form that allows for fast retrieval.
Sure, there are applications where it doesn't make sense to use an XML database. Using an XML database to store relational data doesn't make sense; that's what relational databases are for. But if you can think outside the mold, and store your data in a new way, XML databases are for you.
I might be a little biased in this area, since I work for an XML database company (http://www.neocore.com). I have seen XML in action, and it is more than just a data transport. I hope that I can convince at least one person to look at this advanced technology.
Re:Closed minded people sadden me... (Score:2)
> But with xml, you can store this bit of information with context:
<SlashDot>
<Editor>
<Name>CmdrTaco</Name>
</Editor>
</SlashDot>
Isn't that more informative?
Yes, and I can do the same thing in Scheme with about half as many characters, and with the added advantage of being able to treat parts of my data and stylesheet as executable code if I wish.
Nor do I have to reformat it with a bunch of ampersands to post it to Slashdot, by the way.
(SlashDot
(Editor
(Name CmdrTaco)
)
)
Even more readable, IMO.
Oh, and people had been doing this for years before XML was ever misbegotten.
XML: More snake oil to the rescue.
Yes (Score:2)
I really don't get people who complain about Lisp syntax and then tell me how wonderful XML is - XML is 10x more annoying than Lisp!
Also, if you want to deal with XML in a semi-sane way, may I recommend just transforming it into Scheme, processing it with the normal LISPy tricks, then pretty-printing it back out... See here [sourceforge.net] for the best way to deal with XML weenies.
Re:Closed minded people sadden me... (Score:2)
I've noticed XML is basically Scheme too. One question though, how do you do XML attributes in Scheme?
<Slashdot>
<Editor Type="Full-Time">
<Name>CmdrTaco</Name>
</Editor>
</Slashdot>
Re:Closed minded people sadden me... (Score:2)
(Slashdot
((Editor
(type Full-Time)
(worthless-stock-options yes))
(name CmdrTaco)))
--jeff
Re:Closed minded people sadden me... (Score:2)
Well that breaks stuff, right?
If elements with attributes start with two parentheses that makes them different from elements without attributes. There's gotta be a way since the SSAX [sourceforge.net] project has to handle it somehow.
Re:Closed minded people sadden me... (Score:2)
> Well that breaks stuff, right? If elements with attributes start with two parentheses that makes them different from elements without attributes.
I wouldn't use the double parens. Something like -
(Editor
(type Full-Time)
(worthless-stock-options yes)
(name CmdrTaco)
)
would work. In fact that's what I would probably do (depending on exactly what I needed to represent).
Notice that if you have already found the Editor structure, you can take the cdr to get a list of key-value pairs, and use assoc to find the key-value pair that you want.
This can be abstracted pretty easily into a hierarchy of lookup tables, and you can write really simple functions to extract the parts you want.
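For readers who don't speak Scheme, the cdr-plus-assoc lookup described above can be sketched in Python with nested lists standing in for S-expressions. The helper names cdr, assoc, lookup and lookup_path are written for this illustration and only mimic the Scheme functions of the same names.

```python
# The Editor structure from the post, as nested lists.
editor = ["Editor",
          ["type", "Full-Time"],
          ["worthless-stock-options", "yes"],
          ["name", "CmdrTaco"]]
slashdot = ["Slashdot", editor]


def cdr(lst):
    """Everything after the head of the list."""
    return lst[1:]


def assoc(key, pairs):
    """First sub-list whose head equals key, or None."""
    for pair in pairs:
        if pair[0] == key:
            return pair
    return None


def lookup(structure, key):
    """Value stored under key directly inside structure."""
    pair = assoc(key, cdr(structure))
    return pair[1] if pair else None


def lookup_path(structure, path):
    """Walk a list of keys down a hierarchy of such structures."""
    node = structure
    for key in path:
        node = assoc(key, cdr(node))
        if node is None:
            return None
    return node[1]


if __name__ == "__main__":
    print(lookup(editor, "name"))
    print(lookup_path(slashdot, ["Editor", "type"]))
```

Note that in this flat layout attributes and child elements are indistinguishable, which is the trade-off being discussed in the replies above.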
Your criticisms are naive (Score:2)
Re:Closed minded people sadden me... (Score:2)
[snip]
Isn't that more informative?
When was the last time you looked at the data files of your database system? I don't think I've looked at the actual on-disk data of MSSQL or MySQL in quite some years. Who gives a rat's arse whether that data is readable? The output from the database perhaps, but that's a formatting issue, not an architectural one.
RFC (corrected) (Score:5, Interesting)
Can we please, please, please append the definition of XML to allow "</>" to close whatever the last tag was?
That simple change would probably cut the size of the average XML file in half.
(corrected post, please moderate my other one down. I have plenty of Karma to spare...)
They'd never do that... (Score:2)
<foo> <bar>baz</bar> <mumble>grumble</mumble></foo>
is equivalent to
<foo> <bar>baz</> <mumble>grumble</></>
which is semantically equivalent to
(foo (bar "baz") (mumble "grumble"))
And if they did that, they might have to admit that XML is semantically equivalent to Lisp S-expressions, and not a major advance in computer science after all.
And they'd never do that.
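The equivalence claimed above is mechanical enough to sketch in a few lines. This is an illustration only, using Python's xml.etree.ElementTree; it ignores attributes, drops whitespace-only text, and does not escape quotes inside text, all of which are simplifying assumptions.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Render an element as (tag "text" child ...), skipping blank text."""
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    for child in elem:
        parts.append(to_sexpr(child))
        tail = (child.tail or "").strip()
        if tail:
            parts.append(f'"{tail}"')
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    doc = "<foo> <bar>baz</bar> <mumble>grumble</mumble></foo>"
    print(to_sexpr(ET.fromstring(doc)))
```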
Re:They'd never do that... (Score:2)
(car (cdr (car (cdr (cdr (car "x y m q")))))))))))
Is cool - and thus, we have editors that automagically balance parentheses. But don't get me wrong, I have a real appreciation for people that can do "real programming" (like video codecs) in Lisp.
Yeah.. (Score:2)
Of course, one of the "ideas" of XML is that you can just strip out all of the tags and have a document you can sort of read. That would be anathema to a Lisp person, and for good reason. Lisp is all about simple, minimalistic expression and manipulation of hierarchical data. XML is about an underspecified hodgepodge of structure and free form data.
Which is not to say that it's not useful, regardless.
Re:They'd never do that... (Score:2)
The reason why XML uses the notation it does is that it is somewhat more robust. The problem with S-Expressions is that one misplaced parenthesis can cause the entire semantics of the expression to be changed, or as we computer scientists say 'be fucked beyond recognition'.
Most major advances in computing are not major advances in computer science. There was absolutely nothing original in C, it was merely a version of B with a few additional features added back from BCPL which was itself merely a subset of CPL which was merely a revision of ALGOL and so on.
Packaging counts for an awful lot.
Re:They'd never do that... (Score:2)
Worst of Both Worlds (Score:2)
ControllID
ParentControllID
DataType
FormLocationX
FormLocationY
Then they had a giant data table like this
DataID
ParentDataID
ControllID
Data
Argh! The madness of it all. Everything of substance was in these two tables. I'll admit that it's a nice hack, and they can tell all their clients that their data is 'easily exported into a CSV file', but good grief! It reminds me of those people who made so many #define macros in C as to make it look like Pascal.
Performance (Score:2, Informative)
This may be great for academia, or perhaps small projects, but in "The Real World"(tm) this won't fly. As a performance guy working on a big system, I can tell you that using OO databases and/or XML queries/storage will butcher performance.
For most of our clients, performance is the #1 concern, as that is what dictates hardware. Buying one 32-way p680 for a typical RDBMS solution vs. two for a fancy OO/XML solution isn't much of a choice.
XML is the storage format for some things (Score:3, Interesting)
The solution is XML. You create an XML Schema and start storing stuff. Some company wants more parameters - no problem, extend the schema. If you need to migrate previous XML docs to adhere to the current schema, use XSLT. Or you can add these as optional parameters and every document that exists already will conform to the schema.
Speed in XML is an issue. But people who think you need to read the entire XML document to process it don't know what they're talking about. You can do modular processing. Also, you can do smart indexing to increase speed. And in a production environment, you turn Schema checking off unless you're getting documents from untrusted sources. Will XML ever be as fast as an RDBMS? Probably not. But XML doesn't store relational data. And with current research in XML Query languages, I'm sure XML's speed will be good enough for most applications in the future that deal with fuzzy schemas. (If you need a high-performance DB, then you have to bite the bullet and use an RDBMS.)
My two cents.
Database storage in XML format is fine, if... (Score:4, Insightful)
1) Can certain records be considered 'atomic'?
This is similar to the RDBMS question of whether or not it makes sense to construct a view. View definitions represent a common query. If you consider a query as a means of tying together disparate data from many tables into a single, denormalized set of records, the record could just as easily be expressed in some XML format.
Now, if that record represents some physical or conceptual entity in the data model, it is in fact a set of properties about an object. This is what XML is good at representing. Decomposing that set of object data (record) into normalized relations may not make sense if such 'objects' are frequently requested; but there are other considerations...
2) Ad hoc queries are difficult when data is stored internally in XML, because each XML blob has to be parsed and checked for the query values. If you don't know in advance if the XML structure even has the fields you're looking for, then you must do an exhaustive search. Some have used indexed XPath information to work around this issue. Since we're mentioning indexes...
3) How do you find the XML blobs you're looking for? We've used an ORDBMS for our XML data, and indexed on the ID or key values (as defined in an XML Schema) for each element stored in the database. This makes looking up element instances easier. It also makes relating them easier, too, if you use IDREF or keyrefs as your foreign keys.
Now every XML document has a single root element. If you're storing that document in a database, you could choose to store just that one root element instance. More likely, you'll want to decompose the root so that accessing subelements by ID or key in the database will be easier.
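The ID-index idea in point 3 can be sketched in memory with a plain dictionary. This is not the ORDBMS setup described above, just an illustration of the lookup pattern; the element names, the "id" attribute and the "supplier" reference attribute are all made up for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <supplier id="s1"><name>Acme</name></supplier>
  <product id="p1" supplier="s1"><name>Widget</name></product>
  <product id="p2" supplier="s1"><name>Gadget</name></product>
</catalog>"""


def build_id_index(root):
    """Map every id attribute value to the element that carries it."""
    return {e.get("id"): e for e in root.iter() if e.get("id") is not None}


root = ET.fromstring(DOC)
index = build_id_index(root)

# Look up an element by key, then follow its reference like a foreign key.
product = index["p2"]
supplier = index[product.get("supplier")]

if __name__ == "__main__":
    print(product.findtext("name"), "is supplied by", supplier.findtext("name"))
```

Decomposing the root element, as suggested above, amounts to storing each indexed element separately so that a lookup like this never has to parse the whole document.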
Got to run off now,
Jeff Lowery
most problems xml is used for (Score:2)
I'm not just talking out of my ass either. I've worked with EDI systems (data in binary format means you need proprietary software on both ends), XML, and plain old text files. I've used all 3 in the context of transferring data between businesses, which is what XML aims to solve. My feeling is that plain old text files, along with a descriptive file of how the text file is laid out, is overall the best solution for most data interchanges between businesses.
One really good example of this is using diff. Suppose your supplier maintains a database of products you can order, and this data changes daily. Using text files you can easily diff today's file with the one you retrieved the day before and get a much smaller file to use to update your internal database. I can't imagine a more elegant solution using XML.
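A minimal sketch of that daily-diff workflow, using Python's standard difflib in place of the diff command. The pipe-delimited product lines are invented sample data, and the one-record-per-line layout is an assumption.

```python
import difflib

# Hypothetical supplier files: one product per line, "id|name|price".
yesterday = ["1001|Widget|4.50\n", "1002|Gadget|7.25\n", "1003|Sprocket|1.10\n"]
today = ["1001|Widget|4.75\n", "1002|Gadget|7.25\n", "1004|Flange|2.00\n"]


def changed_lines(old, new):
    """Return (added, removed) record lines between two versions of the file."""
    added, removed = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("+ "):
            added.append(line[2:].rstrip("\n"))
        elif line.startswith("- "):
            removed.append(line[2:].rstrip("\n"))
        # lines starting with "  " are unchanged, "? " lines are hints
    return added, removed


if __name__ == "__main__":
    added, removed = changed_lines(yesterday, today)
    print("apply:", added)
    print("retire:", removed)
```

Only the added and removed lines need to be applied to the internal database; unchanged records never leave the diff step.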
I have found one good solution that uses XML - outputting XML on the fly over the net in response to a query. If you have customers that query your data regularly over the web, any change to the HTML will throw their queries off if they are "screen scraping" to get at your data. XML solves this problem nicely, even if new fields are added or if the XML page layout changes in some way. I don't see the logic of actually storing XML in the database though.
My experience of being in a business where data interchanges take place on a regular basis with other businesses, is that formatted text files are still the best way overall. They are easier to deal with and faster than XML ever will be.
Re:most problems xml is used for (Score:2)
The difference being that for an XML file, the code for loading and parsing the data into an object model, manipulating and querying it is the same for every XML document. Whereas, for plain text files and human-readable descriptions you need a programmer to write and test code for each type of file. For XML this code has already been written and tested.
I don't agree with the 'diff' example either, for example the diff between two text files tells you nothing about the context of the diff, ie what the meaning of the change is (and no, just knowing the few lines above/below doesn't necessarily tell you anything, either). You have to manually refer to the original document and the description of the file-format in order to work out what has changed: just knowing that a particular line changed doesn't tell you what that change actually means. With an XML document it's easy to automatically derive the context of the diff, and there are already many programs which will do this.
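As a rough illustration of "deriving the context of the diff", the sketch below compares two XML product lists and reports which field of which product changed. It is not one of the existing diff programs mentioned above; the product/name/price layout and the "id" attribute are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

OLD = ('<products><product id="1001"><name>Widget</name>'
       '<price>4.50</price></product></products>')
NEW = ('<products><product id="1001"><name>Widget</name>'
       '<price>4.75</price></product></products>')


def field_changes(old_xml, new_xml):
    """List (product id, field, old value, new value) for products in both files."""
    old = {p.get("id"): p for p in ET.fromstring(old_xml).iter("product")}
    new = {p.get("id"): p for p in ET.fromstring(new_xml).iter("product")}
    changes = []
    for pid in sorted(old.keys() & new.keys()):
        for field in new[pid]:
            before = old[pid].findtext(field.tag)
            if before != field.text:
                changes.append((pid, field.tag, before, field.text))
    return changes


if __name__ == "__main__":
    for change in field_changes(OLD, NEW):
        print(change)
```

The result names the element that changed rather than a line number, which is the kind of context a plain line diff cannot give without consulting the format description.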
Re:most problems xml is used for (Score:2, Insightful)
XML is not a language, notwithstanding its own name. It's a metacodification, used to create codifications such as XQL, HTML, DocBook and so on.
OO people are usually programmers with very little CS fundamentals, so they don't even get this right: when they are talking about XML in database contexts, they should at least specify the coding they want to use. And then it should be understood that you need to use it for storage encoding, or for data communications, or both.
Thus one cannot say that XML was created for data interchange -- it was created for metacodification. One can create a data interchange codification based on XML -- but that's kind of stupid, since XML codifications usually will give big overheads. We've been doing data interchange with text files with little problems for years. The issue of agreeing on data model and codification between applications does not go away just because you agreed on using some codification with a big overhead.
But I still haven't touched on the worst part of using XML codifications in database contexts -- it is that both XML and OO are hierarchical, thus a regression to thirty years ago when there were navigational databases, no data independence, hierarchical and network systems... we are throwing away thirty years of relational research without ever having implemented it right.
But that's the way of an uneducated world... just as people adopting proprietary technologies have thrown away open systems ideals without ever having got it right.
--
My XML and DB experiences (Score:2)
For my small projects, using DBXML has been a joy. There are certain things for which using XML makes a lot more sense. Some data models just fit more naturally into hierarchical structures, for example users and groups. If you have unique usernames, you can pull data on a user, then pull their group quite easily without the need for a reference table simply by pulling the user's parent.
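The "pull the user's parent" trick can be sketched with Python's standard xml.etree.ElementTree as a stand-in; this is not DBXML's API, and the groups/group/user layout is an assumption. ElementTree elements do not keep a parent pointer, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

DOC = """<groups>
  <group name="admins"><user name="alice"/><user name="bob"/></group>
  <group name="guests"><user name="carol"/></group>
</groups>"""

root = ET.fromstring(DOC)

# Child -> parent map, built once by walking the whole tree.
parent_of = {child: parent for parent in root.iter() for child in parent}


def group_of(username):
    """Name of the group element that contains the given (unique) user."""
    user = root.find(f".//user[@name='{username}']")
    if user is None:
        return None
    return parent_of[user].get("name")


if __name__ == "__main__":
    print(group_of("carol"))
```

The hierarchy itself plays the role the user-to-group reference table would play in a relational design.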
This isn't to say I think XML databases are the answer to everything. One of the largest problems I find so far is that queries that are relatively easy in SQL can get a bit tedious in XPath. Also, as of yet there doesn't seem to be any truly standard query language. This is understandable, given how new the designs are, but it is a bit difficult to decide how to do things sometimes. Do you check in a document, or XUpdate it? Play with DBXML and you'll see what I'm talking about.
For those of you complaining about XML not being an efficient way of data storage because of the high memory cost of keeping documents in memory, bear in mind that there are more parsers out there than just DOM and its relatives. SAX is quite efficient, and even if you're using DOM it is entirely possible to pull fragments out of the document as you see fit; in fact XPath makes this quite easy.
I may be crazy, but I eventually see XML databases providing solid competition to standard RDBMS systems. I've seen complaints about performance -- I think much of this is lodged in the fact that a lot of these systems are not native XML databases -- they are RDBMSs with XML capabilities thrown on top. One way or another, it should be interesting to see how things pan out.
End rant.
Report writers for non-relational databases (Score:2, Insightful)
I have spent a lot of time training non-technical users to get their own damn reports from databases. It's hard to imagine putting data--any data--into a system where the tools to get it out haven't been written yet.
Triple stores (Score:4, Interesting)
In a triple store, you have objects that are defined by a set of properties. The word "triple" comes from the fact that you have triples of objects, properties and property values. For example, you could have a person, John Q, who has an age 37, a phone number 1234 and an employer Foo Ltd. Foo Ltd. in turn has a phone number 5678 and any number of other properties. This forms the following triples: John Q --age--> 37. John Q --phone number--> 1234. John Q --employer--> Foo Ltd. Foo Ltd --phone number--> 5678.
When you look at these, you can see that Foo Ltd. is both the employer of John Q (a property value) and an object in itself that is described by a set of properties. In RDF, the triples form a graph that describes your data. The graph is typically serialized as XML.
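The four triples above can be held in a plain list of tuples to see how a property value doubles as an object. This is only a sketch of the data model, not an RDF library; the match helper is a name made up for the illustration.

```python
# (object, property, value) triples from the example above.
triples = [
    ("John Q", "age", 37),
    ("John Q", "phone number", "1234"),
    ("John Q", "employer", "Foo Ltd"),
    ("Foo Ltd", "phone number", "5678"),
]


def match(subject=None, prop=None, value=None):
    """Return every triple that agrees with the fields that are not None."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (prop is None or t[1] == prop)
            and (value is None or t[2] == value)]


if __name__ == "__main__":
    # Follow the graph: John Q -> employer -> phone number.
    employer = match("John Q", "employer")[0][2]
    print(match(employer, "phone number")[0][2])
```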
At first, it would seem that this lends itself very well for relational databases. A row in a table would be the object to be described and columns are the properties. The intersection is the value. However, the problem - and strength of RDF - is that you can have any number of properties for an object. Basically, you could have any number of columns and sometimes, the property value is not just a value - it can be a database row in itself or even a set of rows.. or a set of values.
The app I wrote mapped arbitrary RDF files to relational databases and back as well as provided an API to perform queries on the data. The result of the queries were RDF graphs in themselves.
While this was quite cool, it turned out to be quite difficult to turn the query result graphs into meaningful stuff in a user interface. Also, queries on the RDF graphs could turn out to be extremely complex SQL queries... Most of these problems were eventually solved but the code wasn't used directly for any real world app, except heavily modified as a metadata database for a web publishing system.
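To give a feel for why graph queries turn into awkward SQL, here is a sketch using one common relational layout for triples: a single three-column table, queried with a self-join per hop. It is an assumption for illustration and not necessarily how the app described above stored its data; it uses Python's standard sqlite3 module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("John Q", "age", "37"),
        ("John Q", "phone number", "1234"),
        ("John Q", "employer", "Foo Ltd"),
        ("Foo Ltd", "phone number", "5678"),
    ],
)

# One hop in the graph costs one self-join: person -> employer -> phone.
EMPLOYER_PHONE = """
SELECT phone.object
FROM triples AS job
JOIN triples AS phone ON phone.subject = job.object
WHERE job.subject = ?
  AND job.predicate = 'employer'
  AND phone.predicate = 'phone number'
"""


def employer_phone(person):
    """Phone numbers of the given person's employer(s)."""
    return [row[0] for row in conn.execute(EMPLOYER_PHONE, (person,))]


if __name__ == "__main__":
    print(employer_phone("John Q"))
```

Each extra hop or optional property adds another join on the same table, so even modest graph queries grow quickly in this layout.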
A markup weenie rebuts. (Score:4, Interesting)
But these are critiques directed at the hype machine, not the specification. This is really distressing me. The machine is so efficient that there are APIs for XML (which shall remain nameless) being written and optimized for message passing which cannot handle mixed content as a matter of design. As though it were somehow so useful in this area that a section of the spec should be tossed to make it efficient. As though there weren't already gallons of ink being spilled on EDI, etc.
XML was not designed to replace S-expressions, to facilitate cross-platform communications, revolutionize EDI or DBMs, to theorize about language design, yada, yada. XML is just that, an Extensible bloody Markup Language, a document tagging scheme. In this regard it is a tremendous advance. It is 80% less suck, by volume, than what went before. If you think your XML parser is bloated, have a look at any SGML parser. Part of what gets stripped out is tag minimization, the absence of which another poster complained about.
Hey, it's text and not binary because I need to write it and read it. Yes, Virginia, I've got 400 users tagging XML in flat-file editors. They complained about the loss of tag-minimization, too. But my svelte little Xerces needs a hand to stay so lean.
The goal is to get structural and semantic information into my documents. (Yes, it's data, but a special kind of data called a document. You can call the message you're passing a document, and use XML to format it, but there is some overhead the hype machine may not have emphasised in their rush to market.) I also strive to eliminate formatting or presentation instructions from the document (or hide them in PIs) to facilitate multi-target outputs. This lets my typesetters typeset and my data-entry people enter data.
XML is designed to bring something of this model to the web. HTML is too presentation oriented. SGML is too bulky. That's what it do, babe. I take a single source file from somewhere on the filesystem, incorporate pieces from elsewhere (entity resolution, DB queries, etc.), turn it into one of five possible outputs. I use two different pagination engines with different proprietary formatting macros, XSL(T|FO), or a trap door on the bottom to dump pretty-printed ASCII. It's a publishing tool.
Re:A markup weenie rebuts. (Score:2, Insightful)
XSL is also cool, once you climb the steep learning curve and bend your mind around its declarative style.
As for native xml db's - that is probably mostly hype.
performance (Score:3, Funny)
Re:XML == bad job (Score:2)
Perhaps your opinion does not count for very much if you don't know enough about the subject you are prattling on about that you have to make such an inane statement.
SGML predated UTF by at least a decade. SGML also predated the fad for reading Chomsky in the compiler writing community. The original SGML 'standard' is more or less documentation of Goldfarb's original code (COBOL from the looks of the spec).
XML is a cleanup of SGML which removes the more demented parts of the original architecture. The DTDs are one such part.