An Overview of Modern XML Processing Techniques and APIs


Posted by timothy on Thursday July 10, 2003 @04:00PM from the three-letters-that-spell-work dept.

Dare Obasanjo writes with a link to his article "A Survey of APIs and Techniques for Processing XML" on xml.net. It starts off "In recent times the landscape of APIs and techniques for processing XML has been in the process of reinventing itself as developers and API designers learn from their experiences and some past mistakes. APIs such as DOM and SAX which used to be the bread and butter of XML APIs are giving way to new models of examining and processing XML. However although some of these techniques have become widespread amongst developers who primarily work with XML they are still unknown to the general body of developers. Nothing highlights this better than a recent article by Tim Bray one of the co-inventors of XML entitled XML is too Hard for Programmers and the subsequent responses on Slashdot." Read the entire article to learn more about the state of the XML art. Added in the missing link.


You know... (Score:1, Offtopic)

by avalys ( 221114 ) writes:

With the new technology known as a hyperlink, we can simply click a location on the screen and be taken to the article, instead of having to go to xml.net and find it ourselves.

Argh...
- Re:You know... (Score:5, Funny)
  
  by ComputerSlicer23 ( 516509 ) writes: on Thursday July 10, 2003 @05:02PM (#6409732)
  
  Don't worry, it'll get fixed up on the duplicate post in about 4 hours...
  Kirby
  
Actually on xml.com (Score:5, Informative)

by DeathBunny ( 24311 ) writes: on Thursday July 10, 2003 @04:10PM (#6409334)

The article is actually on xml.com, not xml.net. Here is the url: http://www.xml.com/pub/a/2003/07/09/xmlapis.html [xml.com]

No Link? (Score:4, Insightful)

by Snerdley ( 98439 ) writes: on Thursday July 10, 2003 @04:14PM (#6409361)

This is a horrible post!

There is no link to the article, and the one link that comes close (to xml.net [xml.net] ) points to a site that says:
xml.net will be online soon. Sign up now and we'll keep you posted on our progress.
Timothy, how did you read this as the editor?
I am interested in the topic: please fix the post so that we can read the article.

- Re:No Link? (Score:4, Funny)
  
  by Anonymous Coward writes: on Thursday July 10, 2003 @04:21PM (#6409416)
  
  I am interested in the topic: please fix the post so that we can read the article.
  
  The fix will be uploaded in a few days but subscribers can click now and beat the rush!
  
Plain Text and XML (Score:5, Insightful)

by cpeterso ( 19082 ) writes: on Thursday July 10, 2003 @04:16PM (#6409378) Homepage

XML sucks because it's being used wrongly. It is being used by people who view it as being an encapsulation of semantics and data, and it's not. XML is purely a way of structuring files, and as such, really doesn't add much to the overall picture. XML came from a document preparation tradition. First there was GML, a document preparation system, then SGML, a document preparation system, then HTML, a document preparation system, and now XML. All were designed as ways humans could structure documents. Now we've gotten to the point where XML has become so obscure and so complex to write, that it can no longer be written by people. If you talk to people in Sun about their libraries that generate XML, they say humans cannot read this. It's not designed for human consumption. Yet we're carrying around all the baggage that's in there, because it's designed for humans to read. So XML is a remarkably inefficient encoding system. It's a remarkably difficult to use encoding system, considering what it does. And yet it's become the lingua franca for talking between applications, and that strikes me as crazy.

People think, "Once I've got my data in XML that's all I've got to do. I've now got self-describing data," but the reality is they don't. They're just assuming that the tags that are in there somehow give people all the information they need to be able to deal with the data. Now, for some things there are standards. For example, there are some standards like RSS and RDF, which give you very simple ways of describing web page content. But a random XML file, especially machine generated XML files, can be as obscure as binary data.

Ant is a really good example, because in that case you're using XML as a user-specified input language, which is really inappropriate in that context. I'd much rather have a genuine grammar. I want to be able to type something simple and easy for me. I don't care if it's easy for the tool to parse, that's the tool's problem. I want it to be easy for me to write. And in cases like that, it's really the case of the programmer saying, "Oh look, here's an XML parser. I can just take XML files. That's easier." So one programmer in one context puts a burden on the other 100,000 programmers trying to use it.

- Re:Plain Text and XML (Score:5, Insightful)
  
  by battjt ( 9342 ) writes: on Thursday July 10, 2003 @04:36PM (#6409528) Homepage
  
  Once data is in XML I can manipulate it without having to write a parser. This is pretty handy in an enterprise setting where data is coming from all over and headed somewhere else. Efficiency of the overall business process is important, not the efficiency of my program.
  Joe
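battjt's claim can be illustrated with Python's standard-library ElementTree: an off-the-shelf XML processor does the tokenizing, and the program only walks and edits the resulting tree. The `<orders>` document and the `double_quantities` helper are hypothetical. Note that the code still has to know what `qty` means, which is the semantic knowledge that, as the replies argue, no generic processor can supply.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document arriving from another system.
INCOMING = """<orders>
  <order id="1"><item>widget</item><qty>3</qty></order>
  <order id="2"><item>gadget</item><qty>5</qty></order>
</orders>"""

def double_quantities(xml_text: str) -> str:
    """Parse with the stdlib XML processor, double every <qty>, re-serialize."""
    root = ET.fromstring(xml_text)
    for qty in root.iter("qty"):
        # The program, not the parser, decides that <qty> holds an integer.
        qty.text = str(int(qty.text) * 2)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(double_quantities(INCOMING))
```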
  
  - Re:Plain Text and XML (Score:3, Informative)
    
    by __past__ ( 542467 ) writes:
    
    Once data is in XML I can manipulate it without having to write a parser.
    Um, no. Or yes, but only in not too interesting ways.
    An XML processor (Note that the W3C XML Rec carefully avoids the term "parser". That is for a reason.) is more like a lexer than a parser in traditional terms. It tells you about the syntactic elements of an XML document, but nothing about their meaning or relation. In other words, XML is not a language, XML applications are. Yet languages are what people need.
    It turns out tha
  - Re:Plain Text and XML (Score:2, Informative)
    
    by the hermit ( 176716 ) writes:
    
    And those manipulations you do are just another form of parsing. There's really no difference between writing a grammar that parses data and manipulates it and using SAX or DOM to manipulate some XML data. In either case, you still have to know the semantics to do anything useful. Using SAX is a big pain in the neck to interpret/manipulate the data hierarchy. Using DOM wastes a lot of memory making a tree out of your whole dataset.
    
    However, it's ten million times easier for the end user of your data to create
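The hermit's SAX-versus-DOM trade-off, sketched with Python's standard `xml.sax` and `xml.dom.minidom` modules. The SAX handler receives a stream of events and must track its own state (here, whether it is inside a `<title>`), while the DOM version is shorter but loads the entire document into memory first. The catalog document and function names are illustrative only.

```python
import xml.sax
from xml.dom import minidom

DOC = "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>"

class TitleCollector(xml.sax.ContentHandler):
    """SAX: the parser pushes events; the handler keeps its own state."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self.in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self.in_title = False

def titles_sax(xml_text):
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles

def titles_dom(xml_text):
    # DOM: the whole document becomes an in-memory tree, then we navigate it.
    dom = minidom.parseString(xml_text)
    return [t.firstChild.data for t in dom.getElementsByTagName("title")]
```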
  - Re:Plain Text and XML (Score:1)
    
    by Ataru ( 50540 ) writes:
    
    XML is a way of wrapping human-readable content in such a way that it can be easily processed and transformed by machines.
    
    But it's rubbish at that! XML files are inherently hierarchical, but you can't skip stuff you don't need, you have to parse the whole thing. It's so frighteningly inefficient, I have absolutely no idea why it is championed so. But then I'm a game programmer so that sort of thing frightens me easily.
    
    So, given that XML is not particularly human read/writeable, and that it isn't effic
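On Ataru's complaint about not being able to skip unneeded data: a streaming API does not avoid scanning the whole file, but it does avoid building the whole tree. A minimal sketch using Python's `xml.etree.ElementTree.iterparse`, assuming a flat list of hypothetical `<item>` records that each carry a `<price>`; clearing the root after every finished record keeps memory roughly bounded by one record rather than the whole file.

```python
import io
import xml.etree.ElementTree as ET

def sum_field(stream, record="item", field="price"):
    """Total one numeric field across records without keeping the whole tree."""
    total = 0.0
    events = ET.iterparse(stream, events=("start", "end"))
    _, root = next(events)  # first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == record:
            value = elem.findtext(field)
            if value is not None:
                total += float(value)
            root.clear()  # discard finished records so they don't accumulate
    return total

if __name__ == "__main__":
    data = (b"<items><item><name>a</name><price>1.5</price></item>"
            b"<item><name>b</name><price>2.5</price></item></items>")
    print(sum_field(io.BytesIO(data)))  # 4.0
```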
    - Re:Plain Text and XML (Score:4, Informative)
      
      by jwdg ( 676461 ) writes: on Thursday July 10, 2003 @06:43PM (#6410613)
      
      Manipulating XML may be cheaper than you think. libxml2 [xmlsoft.org] is very fast (IME). I've used it with PostgreSQL for doing XPath queries on database columns, and an XPath search (which involves building a DOM, parsing the XPath query and then executing it, for each row) across 1200 rows is fast enough to be useful. (It was a fraction of a second IIRC, obviously dependent on the nature of our XML docs.)
      Yeah, I was surprised too.
      I disagree about the human readable/writable bit. It is easily human readable/writable if it's properly structured (if it's complex because the information is complex, that's an inevitability. Make the data model simpler, if that's a problem to you). In terms of efficiency - sure, binary formats are more efficient, but they are much harder to debug when they go wrong.
      I agree that XML documents are not necessarily self-documenting. That isn't surprising. XML is about syntax, not semantics. You can use XSD to provide basic (integer vs char) semantics, but anything more complicated comes back to human understanding and agreed specification. If you understand the objects in your schema, XML can provide a good presentation of those objects.
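A rough stand-in for jwdg's per-row XPath search, using Python's ElementTree rather than libxml2 (the tool he actually used). ElementTree implements only a subset of XPath, and the `[.='text']` predicate below needs Python 3.7 or later. The rows and the `matching_ids` helper are hypothetical, and nothing here reproduces his timings.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, each holding one XML document (as a DB column might).
ROWS = [
    (1, "<doc><author>Bray</author><year>2003</year></doc>"),
    (2, "<doc><author>Obasanjo</author><year>2003</year></doc>"),
    (3, "<doc><author>Bray</author><year>1998</year></doc>"),
]

def matching_ids(rows, path):
    """Return ids of rows whose document has at least one match for `path`.

    Each row is parsed into a tree and queried separately, mirroring the
    "build a DOM, run the query, per row" pattern described above.
    """
    return [row_id for row_id, xml_text in rows
            if ET.fromstring(xml_text).find(path) is not None]

if __name__ == "__main__":
    print(matching_ids(ROWS, "author[.='Bray']"))
```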
      
      - Re:Plain Text and XML (Score:1)
        
        by jwdg ( 676461 ) writes:
        
        It was a rhetorical you.
        Yes, it is obviously slower. But that's a tradeoff that many people are willing to make. And I didn't say that libxml2 was the fastest parser - I'm not qualified to say that, as I've not benchmarked it properly.
        Bear in mind that many uses of XML are for data interchange, where speed is less important than compatibility. XML gives you more potential to add extra data into the format and still use a mix of old and new tools. Binary formats generally require that all programs using
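jwdg's point about mixing old and new tools can be sketched as follows, under the assumption that readers look up elements by name and ignore ones they don't recognise. That is a convention the format's users have to agree on; XML itself doesn't enforce it. The contact format and both reader functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Version 1 of a hypothetical "contact" format.
V1 = "<contact><name>Ada</name><email>ada@example.org</email></contact>"
# Version 2 adds a <phone> element that old readers have never heard of.
V2 = ("<contact><name>Ada</name><email>ada@example.org</email>"
      "<phone>555-0100</phone></contact>")

def read_contact_v1(xml_text):
    """An 'old' reader: picks out the fields it knows and ignores the rest."""
    root = ET.fromstring(xml_text)
    return {"name": root.findtext("name"), "email": root.findtext("email")}

def read_contact_v2(xml_text):
    """A 'new' reader: knows about <phone> but still accepts v1 documents."""
    contact = read_contact_v1(xml_text)
    contact["phone"] = ET.fromstring(xml_text).findtext("phone")  # None if absent
    return contact
```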
- Re:Plain Text and XML - Use DocUtils! (Score:2)
  
  by FFFish ( 7567 ) writes:
  
  So if you're unhappy with working directly with XML -- lord knows I am: it obscures the content far too much -- use a formal structured human readable markup system like DocUtils [sourceforge.net] or ASCIIDoc [methods.co.nz].
  
  They're both quite robust, well-suited to documenting APIs, writing technical manuals, etcetera. They can both pump out DocBook-XML from the plaintext, lightly-formatted input.
  
  The beauty of these formats is that they are simple and often intuitive.
  
  You emphasize text by wrapping it in *asterisks*, just like you used t
- good points (Score:1)
  
  by pyrrho ( 167252 ) writes:
  
  you make good points.
  
  I understand the idea that if you need an XML editor (or compiler!) to make it easy to write XML, what's the point of having a human-readable format? Would you suggest standard binary formats?
  
  I think, in fact, there are good binary encodings that also can encapsulate DOM shaped structures. It seems to me that the accessibility is still a good thing, I like XML as a universal interchange format that supports arbitrary nesting properties for the embedded blocks. All that could be done
- Re:Plain Text and XML (Score:1)
  
  by jo42 ( 227475 ) writes:
  
  Ah... Someone that groks that "XML" is just a text-based file format, and YAT (Yet Another Three-Letter-Acronym), and not some magick thingy that makes the world a better place...
XML is too hard for *rubbish* programmers (Score:5, Insightful)

by vbweenie ( 587927 ) writes: <dominic,fox1&ntlworld,com> on Thursday July 10, 2003 @05:45PM (#6410053) Homepage

It's also "too hard" in a variety of circumstances where the reason it's too hard is that it's the wrong thing to use.

Good programmers can cope with XML just fine when it's just what they need to get the job done, and are smart enough to avoid it when it isn't.

- Re:XML is too hard for *rubbish* programmers (Score:1)
  
  by JonnyRo88 ( 639703 ) writes:
  
  What then is the proper use for XML?
  
  I am working on a project where I have to link a TCL based program with a PHP program.
  
  I have been having a ridiculously horrible time figuring out how to link the two using XML-RPC. I briefly tried SOAP but that wasn't much better.
  
  Is XML and XML-RPC any easier in python or C?
  - Re:XML is too hard for *rubbish* programmers (Score:1)
    
    by vbweenie ( 587927 ) writes:
    
    There are dozens of ways to link one program with another. If you can send and receive XML, you can send and receive text formatted in any other format you like - or "raw" binary data, if that's your bag. The only reason to use XML is if both sender and receiver have some nifty tools (like the MS SOAP toolkit, xmlrpclib, SOAPPy, SOAP::Lite or whatever) for sending and receiving messages that happen to use some XML vocabulary to encode things on the wire.
    
    If you have such tools, you should use them (unless
    - Re:XML is too hard for *rubbish* programmers (Score:1)
      
      by JonnyRo88 ( 639703 ) writes:
      
      Neither end starts out as XML. The troubles arose when trying to pass arrays from within TCL over the XML-RPC layer. I found that the available XML-RPC layers for TCL were very picky.
      
      I am going to give python a shot. TCL is a really cool language, but I think I need to hit this problem from a different perspective.
      
      I also recently added a book about XML-RPC from O'Reilly to my Safari Bookshelf, so I'm going to RTFM, as the online guides on the matter have proved a bit too difficult to use (esp
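vbweenie's suggestion to let a toolkit handle the wire format looks like this with Python's xmlrpclib (renamed `xmlrpc.client` in Python 3): arrays, the part the Tcl bindings were picky about, are marshalled to and from XML automatically. This sketch only builds and parses a request body locally; `store_scores` is a made-up method name and no server is contacted.

```python
import xmlrpc.client  # named xmlrpclib in Python 2

def encode_call(method, *params):
    """Build the XML-RPC request body a client would POST for method(*params)."""
    return xmlrpc.client.dumps(params, methodname=method)

def decode_call(body):
    """Parse a request body back into (params_tuple, method_name)."""
    return xmlrpc.client.loads(body)

if __name__ == "__main__":
    body = encode_call("store_scores", ["alice", "bob"], [3, 5])
    print(body)                # XML with <methodCall> and <array> elements
    print(decode_call(body))   # the lists come back as Python lists
```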
Object Mapping / Marshalling techniques (Score:2, Informative)

by jamesmrankinjr ( 536093 ) writes:

My team (myself and another guy) implemented a mapping framework in Java that I think is more useful than the other frameworks I've seen.
1. Order of fields in mapping file specifies order of elements in generated XML.
2. Formatting of String, Date, etc. classes determined by formatter string in the mapping file.
3. Can use an XPath like path to specify the location in the XML, not just a key name. This lets you decouple the structure of the object and the structure of the XML.
4. Likewise, object fields are spe
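A small Python sketch of the first three ideas in that list. It is not the poster's Java framework, whose code isn't shown; the `Invoice` class, the mapping format, and `to_xml` are all hypothetical, and the "path" here supports only plain slash-separated child steps, not real XPath.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

@dataclass
class Invoice:  # hypothetical domain object
    number: int
    customer: str
    issued: date

# Hypothetical mapping: (object attribute, slash-separated path, format spec).
# 1. List order determines element order in the generated XML.
# 2. Each field carries its own format string.
# 3. Paths decouple the object's shape from the XML's shape.
INVOICE_MAPPING = [
    ("number",   "header/id",           "{:06d}"),
    ("issued",   "header/date",         "{:%Y-%m-%d}"),
    ("customer", "party/customer/name", "{}"),
]

def to_xml(obj, mapping, root_tag="invoice"):
    root = ET.Element(root_tag)
    for attr, path, fmt in mapping:
        *parents, leaf = path.split("/")
        parent = root
        for step in parents:
            # Reuse an existing intermediate element, or create it.
            child = parent.find(step)
            if child is None:
                child = ET.SubElement(parent, step)
            parent = child
        ET.SubElement(parent, leaf).text = fmt.format(getattr(obj, attr))
    return ET.tostring(root, encoding="unicode")
```

For example, `to_xml(Invoice(42, "ACME", date(2003, 7, 10)), INVOICE_MAPPING)` nests `id` and `date` under a shared `<header>` even though the object itself is flat.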
XML often violates relational rules (Score:1)

by Tablizer ( 95088 ) writes:

I know some of you don't care and/or are tired of hearing this, but XML data tends to violate relational rules. I would like to see a souped-up comma-delimited standard for data sharing. XML is perhaps suited okay for documents, but NOT structured data (except in relatively rare circumstances). Dr. Codd knew what he was doing. Relational has a more ordered, consistent structure than XML.
- Re:XML often violates relational rules (Score:3, Insightful)
  
  by vidarh ( 309115 ) writes:
  
  More often than not the data I work on don't fit naturally in a relational structure. A lot of data is more naturally structured in tree structures or graph structures than in a matrix. One of the reasons I like XML is because it fits my data much better than a matrix.
  - Re:XML often violates relational rules (Score:1)
    
    by Tablizer ( 95088 ) writes:
    
    More often than not the data I work on don't fit naturally in a relational structure. A lot of data is more naturally structured in tree structures or graph structures than in a matrix. One of the reasons I like XML is because it fits my data much better than a matrix.
    
    I am a bit skeptical of this. What are some real-world examples? Trees are often the improper design IMO. Besides, trees can also be represented relationally if need be.
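Representing a tree relationally usually means an adjacency list: one row per node, each pointing at its parent. A minimal Python sketch, assuming only element tags and text matter (attributes and mixed content are ignored); `tree_to_rows`, `load_into_sqlite`, and the `node` table are illustrative names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def tree_to_rows(xml_text):
    """Flatten an XML tree into (id, parent_id, tag, text) rows.

    Ids are assigned in document order; the root's parent_id is None.
    """
    rows = []
    def walk(elem, parent_id):
        node_id = len(rows) + 1
        text = (elem.text or "").strip() or None
        rows.append((node_id, parent_id, elem.tag, text))
        for child in elem:
            walk(child, node_id)
    walk(ET.fromstring(xml_text), None)
    return rows

def load_into_sqlite(rows):
    """Store the rows in an in-memory table with a self-referencing key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, "
                 "parent_id INTEGER REFERENCES node(id), tag TEXT, text TEXT)")
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)
    return conn
```

Once loaded, ordinary SQL such as `SELECT text FROM node WHERE parent_id = 1` replaces tree navigation, at the cost of recursive queries for deeper traversals.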
    - Re:XML often violates relational rules (Score:2)
      
      by gregfortune ( 313889 ) writes:
      
      Besides, trees can also be represented relationally if need be.
      
      And XML can represent the structured content when necessary. Because XML allows some flexibility, we may see programmers *use* it incorrectly, but it can certainly be used in a relational fashion for structured data. The "relatively rare circumstances" you mention in the parent post are likely a result of programmer quality rather than a reflection of XML.
      - Re:XML often violates relational rules (Score:1)
        
        by Tablizer ( 95088 ) writes:
        
        And XML can represent the structured content when necessary. Because XML allows some flexibility, we may see programmers *use* it incorrectly, but it can certainly be used in a relational fashion for structured data. The "relatively rare circumstances" you mention in the parent post are likely a result of programmer quality rather than a reflection of XML.
        
        That is why we need a standard relational alternative IMO. It would make it harder to violate relational rules and still be called a "relational forma
What is the best toolkit for simple XML (Score:1)

by JonnyRo88 ( 639703 ) writes:

I have an application that does not have any access to a database or any database libraries on the server in which it will be run.

It needs to store a small amount of data in a text file. Initially I thought I would use XML, but figuring out how to parse the data after it was created proved very difficult. I had some small luck with DOM inside tcldom, but it seemed like a lot more effort than it was worth.

This file is a basic tree with branches of depth 2. All branches have the exact same structure.
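JonnyRo88 is working in Tcl, so the following is not a drop-in answer. It is a Python sketch of how little code a tree API needs for a fixed two-level file like the one described. The `<settings>`/`<user>` element names and the `FIELDS` tuple are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: a root whose branches all share the same shape.
SAMPLE = """<settings>
  <user><name>alice</name><shell>/bin/sh</shell></user>
  <user><name>bob</name><shell>/bin/tcsh</shell></user>
</settings>"""

FIELDS = ("name", "shell")

def load(xml_text):
    """Read every branch into a dict keyed by field name."""
    root = ET.fromstring(xml_text)
    return [{f: branch.findtext(f) for f in FIELDS} for branch in root]

def dump(records, root_tag="settings", branch_tag="user"):
    """Write the records back out in the same two-level shape."""
    root = ET.Element(root_tag)
    for rec in records:
        branch = ET.SubElement(root, branch_tag)
        for f in FIELDS:
            ET.SubElement(branch, f).text = rec[f]
    return ET.tostring(root, encoding="unicode")
```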
- Re:What is the best toolkit for simple XML (Score:2)
  
  by Col. Klink (retired) ( 11632 ) writes:
  
  There's another DOM processor for Tcl called tDOM [tdom.org]. I prefer tDOM, and the mailing list [yahoo.com] is very helpful (despite the fact that it's hosted by Yahoo! egroups), but you can see the TclDOM vs tDOM [mini.net] wiki for more info.
  But also, if you don't mind Tcl, you can just store the data in Tcl lists. Tcl's syntax is simple enough that it makes this sort of thing pretty straightforward.
  - Re:What is the best toolkit for simple XML (Score:1)
    
    by JonnyRo88 ( 639703 ) writes:
    
    Thanks for the information. I'll definitely check it out.
    
    I really enjoy working in TCL because of the very clean syntax and straightforwardness of TCL.
    
    Still, it will take me a little while to get used to not having access to pointers.
    - Re:What is the best toolkit for simple XML (Score:1)
      
      by ultratimepass ( 688959 ) writes:
      
      Take a look at XPath. It's designed to eliminate the manual traversal of the DOM tree, and makes creating and parsing XML files very easy.
The value of XML (Score:1)

by sh4na ( 107124 ) writes:

The fact that XML is a human-readable format that anyone can produce is, I think, the biggest problem of all in this ongoing xml-is-the-biggest-thing-since-sliced-bread saga.

I've been working with XML for 2 years now, and I am constantly reminded that, just as it happens with html, vb, or any other "simple to use" technology out there, anyone can use it, but few know how to use it well. I've seen xml structures that would have you rolling on the floor laughing, so inefficient and dumb were they.

It's just lik
XPath r0x0rs, can't wait for XQuery (Score:1)

by r4lv3k ( 638084 ) writes:

XPath is awesome for getting at what you really want. SAX and DOM are too low level for implementing anything other than an XPath or XSLT engine :) Even easier is putting System.Xml.Serialization attributes on your properties in C#. Blammo, instant configuration file for your classes. And I hear XQuery [w3.org] shall revolutionize the world as we know it. There are some early implementations already. r4lv3k
Attributes (Score:2, Funny)

by Skeme ( 687563 ) writes:

XML sucks because of attributes. I can have a <something stuff="thing"> and a <something><stuff>thing</stuff></something>, and they are treated differently. How pointless that is.

Plus any drooling idiot can come up with a way to represent a tree in a file. They did that 100 years ago with Lisp.
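Skeme's complaint is easy to see in code: with Python's ElementTree the two spellings are reached through different calls (`get` for attributes, `find`/`findtext` for child elements), so a reader that wants to accept both has to check both. `get_stuff` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<something stuff="thing"/>')
as_element = ET.fromstring('<something><stuff>thing</stuff></something>')

def get_stuff(elem):
    """Accept either spelling: prefer the attribute, fall back to a child element."""
    value = elem.get("stuff")
    return value if value is not None else elem.findtext("stuff")

if __name__ == "__main__":
    print(as_attribute.get("stuff"), as_attribute.findtext("stuff"))  # thing None
    print(as_element.get("stuff"), as_element.findtext("stuff"))      # None thing
```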
