The Internet

The Future of XML 273

An anonymous reader writes "How will you use XML in years to come? The wheels of progress turn slowly, but turn they do. The outline of XML's future is becoming clear. The exact timeline is a tad uncertain, but where XML is going isn't. XML's future lies with the Web, and more specifically with Web publishing. 'Word processors, spreadsheets, games, diagramming tools, and more are all migrating into the browser. This trend will only accelerate in the coming year as local storage in Web browsers makes it increasingly possible to work offline. But XML is still firmly grounded in Web 1.0 publishing, and that's still very important.'"
This discussion has been archived. No new comments can be posted.

  • by Ant P. ( 974313 ) on Thursday February 07, 2008 @06:43PM (#22342468)
    Sparingly. JSON is just plain better, and doesn't inflict an enterprisey mindset on anyone that tries to use it.
    • by DragonWriter ( 970822 ) on Thursday February 07, 2008 @06:58PM (#22342650)

      Sparingly. JSON is just plain better, and doesn't inflict an enterprisey mindset on anyone that tries to use it.


      JSON/YAML is/are better (not considering, of course, the variety and maturity of available tools; but then, perhaps, you don't always need most of what is out there in XML tools, either) for lots of things (mostly, the kinds of things TFA notes XML wasn't designed for and often isn't the best choice for): things that aren't marked-up text. Where you actually want an extensible language for text-centric markup, rather than a structured format for interchange of something that isn't marked-up text, XML seems to be a pretty good choice. Of course, for some reason, that seems to be a minority of the uses of XML.

    • by MagikSlinger ( 259969 ) on Thursday February 07, 2008 @07:32PM (#22343000) Homepage Journal

      Sparingly. JSON is just plain better, and doesn't inflict an enterprisey mindset on anyone that tries to use it.

      JSON is inflicting Javascript on everyone. There are other programming languages out there. Also, XML can painlessly create meta-documents made up of other people's XML documents.

      • by DragonWriter ( 970822 ) on Thursday February 07, 2008 @08:00PM (#22343292)

        JSON is inflicting Javascript on everyone.


        No, it really doesn't, but if "JavaScript" in the name bothers you, you might feel better with YAML.

        There are other programming languages out there.


        And there are JSON and/or YAML libraries for quite a lot of them. So what?
        • by MagikSlinger ( 259969 ) on Thursday February 07, 2008 @08:33PM (#22343584) Homepage Journal

          JSON is inflicting Javascript on everyone.

          No, it really doesn't, but if "JavaScript" in the name bothers you, you might feel better with YAML.

          No, it wouldn't because JSON is bare bones data. It's simply nested hash tables, arrays and strings. XML does much more than that. XML can represent a lot of information in a simple, easy-to-understand format. JSON strips it out for speed & efficiency. Which sort of gets into the point I did want to make but was too impatient to explain: JSON is good where JSON is best, and XML is good where XML is best. I dislike the one-uber-alles arguments because it's ignoring other situations and their needs.

          There are other programming languages out there.

          And there are JSON and/or YAML libraries for quite a lot of them. So what?

          Would you like to live in a world of S-expressions [wikipedia.org]? The LISP people would point out there are libraries to read/write S-expressions, so why use JSON? The answer of course is that we want more than simply nesting lists of strings. We want our markup languages to fit our requirements, not the other way around. And saying "JSON for Everything", which the original poster did was... silly.

          My problems with JSON are:

          • No schema: XML Schema not only makes it easier to unit test, but it can be fed into tools that can do useful things like automatic creation [sun.com] of Java classes and code to read/write. Does JSON have anything like that? Of course not, because it would defeat JSON's purpose: easy Javascript data transmission.
          • Expressability: With XML, I can create a model that fits my logical model of the data where I use attributes to augment the data in the child elements. Doing that in JSON is a kludge with a hash-table to represent an element which can't be easily converted into a graph for easy understanding.
          • Diversity: I use GML [wikipedia.org] in my day job. A lot. I can easily set up an object conversion rule with Jakarta Digester [apache.org] that I can painlessly drop into future projects without modification. That's the power of namespaces. I can build an XML document using tags from a dozen different schema, and then feed it to another application that only looks for the tags it cares about.
          • XPath [slashdot.org]. 'Nuff said. Ok, one thing: this should have replaced SAX/DOM years ago.
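The namespace and XPath points above can be sketched in a few lines of Python. This is only an illustration using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset; the document, the `http://example.com/inventory` namespace and the `inv:` element names are invented for the example, and only the GML namespace URI is real.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies; the inv: names are made up.
DOC = """\
<inv:site xmlns:inv="http://example.com/inventory"
          xmlns:gml="http://www.opengis.net/gml">
  <inv:name>Depot 7</inv:name>
  <gml:Point>
    <gml:coordinates>100,200</gml:coordinates>
  </gml:Point>
  <inv:owner>ACME</inv:owner>
</inv:site>
"""

NS = {"gml": "http://www.opengis.net/gml"}

def gml_coordinates(xml_text):
    """Return the text of every gml:coordinates element, ignoring other namespaces."""
    root = ET.fromstring(xml_text)
    # ".//gml:coordinates" is within ElementTree's limited XPath subset.
    return [el.text for el in root.findall(".//gml:coordinates", NS)]

print(gml_coordinates(DOC))
```

The consumer never mentions the `inv:` elements, so tags from other schemas can be added to the document without touching this code.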

          JSON is great for AJAX where XML is clunky and a little bit slower (my own speed tests haven't shown a huge hit, but it is significant). XML is great for document-type data like formatted documents or electronic data interchange between heavy-weight processes. My point was that the original poster's "JSON for everything" stance was narrow-minded, and that XML answers a very specific need. There are tonnes of mark-up languages out there, and I think XML is a great machine-based language. I hate it when humans have to write XML to configure something though. That really ticks me off. But that's the point: there should not be one mark-up language to rule them all. A mark-up language for every purpose.

          • by aoteoroa ( 596031 ) on Thursday February 07, 2008 @09:20PM (#22343928)
            Hear! Hear!

            One file (format) will not rule them all.

            XML is good if you want to design a communication protocol between your software, and some other unknown program.

            JSON is much lighter. Far fewer kilobits are needed to transfer the same information, so when performance is important and you control everything, use JSON.

            When it comes to humans editing config files I find traditional ini files, or .properties easier to read and perfectly suitable in most cases.

            Writing more complex, relational data to disk? Sqlite often solves the problem quickly.
          • by frank_adrian314159 ( 469671 ) on Thursday February 07, 2008 @09:43PM (#22344162) Homepage
            Would you like to live in a world of S-expressions?

            If you're giving me a choice... why yes, please! Where can I get one of these worlds you're talking about?

          • by jhol13 ( 1087781 ) on Thursday February 07, 2008 @10:55PM (#22344718)
            You forgot XSLT.

            It is an extremely powerful tool; I once (ages ago) made a pure XSLT implementation to convert XML into C. With a CSS the XSLT was even browser/human viewable (the output was somewhat similar to the C program output).

            I do not think JSON can do that.
            • Re: (Score:3, Interesting)

              by Simon Brooke ( 45012 )

              You forgot XSLT.

              It is an extremely powerful tool; I once (ages ago) made a pure XSLT implementation to convert XML into C. With a CSS the XSLT was even browser/human viewable (the output was somewhat similar to the C program output).

              I do not think JSON can do that.

              XSLT is a nice backwards chaining theorem prover, very similar to Prolog. I like it and use it a lot - currently for me it generates SQL, Hibernate mappings, C# code and Velocity macros from a single source XML document. But there's nothing magic about it, and if we didn't have XSLT it would be very easy to do the same sort of thing in LISP or Prolog, or (slightly more awkwardly) in conventional programming languages.

          • Given that XPath is a query language for the DOM, I'm not really sure how it would replace it. Given that SAX is a way that a DOM can be built, I'm not sure it would replace that, either. I suppose you're just talking about all the tedious code to traverse a DOM to find the elements and attributes you're looking for, or the stacks and other data structures necessary to figure out where you are in a document when using SAX? Yeah, XPath is way better than that stuff, but it's never going to replace them. They
          • by CoughDropAddict ( 40792 ) * on Friday February 08, 2008 @03:20AM (#22346032) Homepage
            I'm so depressed. You represent an entire generation of programmers who can't figure out the difference between marked-up text and data, and why mark-up languages suck so bad for data interchange.

            Pop quiz. Here's an excerpt of GML from that page you linked to.

            <gml:coordinates>100,200</gml:coordinates>
            Do the contents of this node represent:
            1. the text string "100,200"
            2. the number 100200 (with a customary comma for nice formatting)
            3. the number 100.2 (hey, that's the way that the crazy Europeans do it)
            4. a tuple of two numbers: 100 and 200
            "Obviously it's two numbers, they're coordinates" you may say. But such things are not "obvious" to an XML parser. If you're an XML parser the answer is (1): it's a simple text string. So to get to the real data you have to parse that text string again to split on a comma, and to turn the two resulting text strings into numbers. Note this is a completely separate parser and is completely outside the XML data model, so all your fancy schema validation, xpath, etc. are useless to access data at this level.

            Why all this pain? Because XML simply has no way to say "this is a list of things" or "this is a number."
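The two-stage parse described above looks roughly like this in Python. It is a sketch with the standard library's `xml.etree.ElementTree`; the `xmlns:gml` declaration is added so the snippet parses on its own, and treating the text as two comma-separated integers is an assumption of the example, not something the XML parser knows.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
SNIPPET = '<gml:coordinates xmlns:gml="%s">100,200</gml:coordinates>' % GML_NS

def parse_coordinates(xml_text):
    """Parse the element, then parse its text a second time by hand."""
    node = ET.fromstring(xml_text)
    raw = node.text  # the XML parser stops here: this is a plain str
    # Second, hand-written parser. Assumption: the text is "x,y" with integers.
    x, y = (int(part) for part in raw.split(","))
    return raw, (x, y)

raw, pair = parse_coordinates(SNIPPET)
print(type(raw).__name__, repr(raw))
print(pair)
```

Everything after `node.text` happens outside the XML data model, which is the point being made.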

            Sure, you can approximate such things. You could write something like:

            <gml:coordinates>
                <gml:coordinateX>100</gml:coordinateX>
                <gml:coordinateY>200</gml:coordinateY>
            </gml:coordinates>
            But the fact remains that even though you may intuitively understand this to be two coordinates when you look at it (and at least you can select the coordinates individually with xpath in this example, but they're still strings, not numbers) to XML this is still nothing but a tree of nodes.

            Did you catch that? A tree of nodes. You're taking a concept which is logically a pair of integers, and encoding it in a format that's representing it in a tree of nodes. Specifically, that tree looks something like this:

            elementNode name=gml:coordinates
            \-> textNode, text="\n " *
            \-> elementNode name=gml:coordinateX
                \-> textNode text="100"
            \-> textNode, text="\n " *
            \-> elementNode name=gml:coordinateY
                \-> textNode, text="200"
            \-> textNode, text="\n" *


            (*: yep, it keeps all that whitespace that you only intended for formatting. XML is a text markup language, so all text anywhere in the document is significant to the parser.)
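The whitespace nodes are easy to see for yourself. A small sketch with Python's standard `xml.dom.minidom` (the `xmlns:gml` declaration is added only so the fragment parses standalone) lists the direct children of the root element:

```python
from xml.dom import minidom

DOC = (
    '<gml:coordinates xmlns:gml="http://www.opengis.net/gml">\n'
    '    <gml:coordinateX>100</gml:coordinateX>\n'
    '    <gml:coordinateY>200</gml:coordinateY>\n'
    '</gml:coordinates>'
)

def describe_children(xml_text):
    """List the root's direct children as (kind, value) pairs."""
    root = minidom.parseString(xml_text).documentElement
    out = []
    for child in root.childNodes:
        if child.nodeType == child.TEXT_NODE:
            out.append(("text", child.data))
        else:
            out.append(("element", child.tagName))
    return out

children = describe_children(DOC)
for kind, value in children:
    print(kind, repr(value))

# Text nodes that contain nothing but formatting whitespace.
whitespace_only = [v for k, v in children if k == "text" and not v.strip()]
print(len(children), "children,", len(whitespace_only), "whitespace-only text nodes")
```

Of the root's five direct children, three are text nodes holding only the indentation and newlines.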

            So let's recap. Using XML, we've taken a structure which is logically just a pair of integers and encoded it as a tree of 8 nodes, three of which are meaningless whitespace that was only put there for formatting, and even after all this XML has no clue that what we're dealing with is a pair of integers.

            Now let's try this example in JSON:

            {"coordinates": [100, 200]}
            JSON knows two things that your fancy shmancy XML parser will never know: that 100 and 200 are numbers, and that they are two elements of an array (which might be more appropriately thought of as a "tuple" in this context). It's smart enough to know that the whitespace is not significant, it doesn't build this complex and meaningless node tree; it just lets you express, directly and succinctly, the data you are trying to encode.
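For comparison, a minimal Python sketch with the standard `json` module; the `load_coordinates` helper is just a name invented for the example. The decoded value is already a list of integers, and reformatting the text with extra whitespace changes nothing.

```python
import json

def load_coordinates(text):
    """Decode the JSON text and return the 'coordinates' value."""
    return json.loads(text)["coordinates"]

compact = load_coordinates('{"coordinates": [100, 200]}')
spaced = load_coordinates('{\n  "coordinates": [\n    100,\n    200\n  ]\n}')

print(compact, type(compact).__name__, type(compact[0]).__name__)
print(compact == spaced)
```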

            That's because JSON is a data format, and XML is a marked up text format. But we're suffering from the fact that no one realized this ten years ago, and compensated for the parity mismatch by layering mountains of horribly complex software on top of XML instead of just using something that is actually good at data interchange.
            • by Anonymous Coward on Friday February 08, 2008 @05:48AM (#22346692)
              You are arguing the merits of isolated XML, while in fact it is a collective technology. Yes, the XML itself is not strongly typed; that is why you have SCHEMA (formerly DTD). Using XML-SCHEMA (coincidentally also written in XML) you DO get a strongly typed document where you can say that a tag can only contain one letter followed by 12 digits or whatever. Then you can use XSL to transform the document, knowing with certainty every single bit of the format.

              The only difference here is that XML separates these 3 (markup, validation, transformation) operations, since you might find situations where you don't need all of them.
              • Re: (Score:3, Insightful)

                What you are describing is the "mountains of horribly complex software on top of XML" that I was referring to. Sure, you can always add one more layer of standards and software to address the deficiencies of what you started with. You could add a validation layer on top of absolutely anything; that doesn't mean that what you started with is any good.

                Also, this isn't just a matter of validation. It's a matter of actually being able to access the structure of the data you're trying to encode. OK, so let's
          • My problems with JSON are:
            • No schema: XML Schema not only makes it easier to unit test, but it can be fed into tools that can do useful things like automatic creation [sun.com] of Java classes and code to read/write. Does JSON have anything like that? Of course not, because it would defeat JSON's purpose: easy Javascript data transmission.
            While I hate XML Schema, it's still better for many uses than just throwing random crap on the wire and hoping that the other end can make sense of it. And no, throwing some javascript on the wire as well doesn't help that much. I want the computer to do stuff with the data without having to ship a program specifically for the purpose; after all, I can't think of all the purposes for the data right now and I want to let others come up with new cool stuff too.

            • XPath [slashdot.org]. 'Nuff said. Ok, one thing: this should have replaced SAX/DOM years ago.
            I use a DOM library [tdom.org] that includes XPath support so that I can simply do a search starting at any node. It makes working with DOM much more pleasant.

            I hate it when humans have to write XML to configure something though. That really ticks me off.
            Simpler stuff works quite well when you've got configuration of software from a single vendor, but when you've got to combine stuff from lots of sources, XML stops being quite such a bad choice. (If only people who edit config files would actually demonstrate an ability to write well-formed XML though. Idiots...)
            • Re: (Score:3, Insightful)

              While I hate XML Schema, it's still better for many uses than just throwing random crap on the wire and hoping that the other end can make sense of it.


              XML Schema may let the other end validate it, but it doesn't let the other end make sense of it. The other end can only make sense of it if they've got code written to handle the kind of data it contains: which is true, really, of any data format.
          • Re: (Score:3, Interesting)

            JSON is inflicting Javascript on everyone.

            No, it really doesn't, but if "JavaScript" in the name bothers you, you might feel better with YAML.

            No, it wouldn't because JSON is bare bones data. It's simply nested hash tables, arrays and strings.

            "You might feel better..." -> "No, it wouldn't..."? WTF is that supposed to mean? How is that even a response to what precedes it?

            XML does much more than that.

            "JSON is..." -> "XML does much more than that." Again, this is incoherent. XML is simply tree-structure

      • Re: (Score:3, Insightful)

        by filbranden ( 1168407 )

        JSON is inflicting Javascript on everyone. There are other programming languages out there.

        On the browser? If you want to use AJAX-like technology, JavaScript is still the only viable and portable option as the programming language for the client side.

    • Re: (Score:2, Insightful)

      by slawo ( 1210850 )
      JSON is fit mostly for communication and transfer of simple data between JS and server side scripts through object serialization. But it remains limited. You can compare JSON to XML only if your knowledge of XML stops at the X in AJAX
      Beyond that scope comparing these two unrelated "things" is irrelevant.

      The tools and libraries available for XML go well beyond JSON's scope. DOM [w3.org], RSS & ATOM [intertwingly.net], OASIS, XPath, XSLT, eXist DB [sourceforge.net] are just a few examples of tools and libraries surrounding XML.
      XML is designed to le
    • Re: (Score:3, Interesting)

      Sparingly. JSON is just plain better, and doesn't inflict an enterprisey mindset on anyone that tries to use it.

      While I understand your pain, XML is still a very nice *markup* language, for marking up documents and simple content trees.

      Can you imagine HTML / XHTML implemented as JSON? I doubt that.

      The fault with people here lies in XML abuse, namely SOAP-like XML API-s and using XML for everything, where binary formats, or more compact and simpler formats, like JSON, do better.
  • by Anonymous Coward on Thursday February 07, 2008 @06:44PM (#22342474)
    XML is like violence. If it doesn't solve your problem, you're not using enough of it.
  • by ilovegeorgebush ( 923173 ) * on Thursday February 07, 2008 @06:44PM (#22342476) Homepage
    I don't get it. We can argue the merits of data exchange formats 'till we're blue in the face; yet I cannot see why XML is so popular. For the majority of applications that use it, it's overboard. Yes, it's easier on the eye, but ultimately how often do you have to play with the XML your CAD software uses?

    I'm a programmer, just like the rest of you here, so I'm quite used to having to write a parser here or there, or fixing an issue or two in an ant script. The thing that puzzles me, is why it's used so much on the web. XML is bulky, and when designed badly it can be far too complex; this all adds to bandwidth and processing on the client (think AJAX), so I'm not seeing why anyone would want to use it. Formats like JSON are just as usable, and not to mention more lightweight. Where's the gain?
    • by daeg ( 828071 )
      I can see using it for some program data formats, but for one reason only: upgrading old file formats to a new format via XSL. In practice, I'm unaware of many software packages that do this, though.
    • by SpaceHamster ( 253491 ) on Thursday February 07, 2008 @07:29PM (#22342970) Homepage
      My best stab at the popularity:

      1. Looks a lot like HTML. "Oh, it has angle brackets, I know this!"
      2. Inertia.
      3. Has features that make it a good choice for business: schemas and validation, transforms, namespaces, a type system.
      4. Inertia.

      There just isn't that much need to switch. Modern parsers/hardware make the slowness argument moot, and everyone knows how to work with it.

      As an interchange format with javascript (and other dynamically typed languages) it is sub-optimal for a number of reasons, and so an alternative, JSON, has developed which fills that particular niche. But when I sit down to write yet another line of business app, my default format is going to be XML, and will be for the foreseeable future.
    • by El Cubano ( 631386 ) on Thursday February 07, 2008 @07:32PM (#22343010)

      For the majority of applications that use it, it's overboard.

      You mean like this? [thedailywtf.com]

    • by GodfatherofSoul ( 174979 ) on Thursday February 07, 2008 @07:33PM (#22343020)

      XML gives you a parsable standard on two levels; generic XML syntax and specific to your protocol via schemas. It's verbose enough to allow by-hand manual editing while the syntax will catch any errors save semantic errors you'll likely have. It's also a little more versatile as far as the syntax goes. Yes, there are less verbose parsing syntaxes out there, but you always seem to lose something when it comes to manual viewing or editing.

      Plus, as far as writing parsers, why burn the time when there are so many tools for XML out there? It's a design choice I suppose like every other one; i.e. what are you losing/gaining by DIYing? Personally, I love XML and regret that it hasn't taken off more. Especially in the area of network protocols. People have been trying to shove everything into an HTML pipe, when XML over the much underrated BEEP is far more versatile. There are costs, though, as you've already mentioned.

      • Re: (Score:3, Informative)

        by Viol8 ( 599362 )
        "Especially in the area of network protocols."

        Oh please. It's bad enough having this bloated standard in data files, but please don't start quadrupling the amount of bits that need to be sent down a pipe to send the same amount of data just so it can be XML. XML is an extremely poor format to use for any kind of streamed data because you have to read a large chunk of it to find suitable boundaries to process. Not good for efficiency or code simplicity. And if you say "so what" to that then you've obviously
    • by machineghost ( 622031 ) on Thursday February 07, 2008 @07:36PM (#22343046)
      The "bulkiness" of XML is also its strength: XML can be used to mark up almost any data imaginable. Now it's true that for most simple two-party exchanges, a simpler format (like comma separated values or YAML or something) would require fewer characters, and would thus save disk space, transmit faster, etc.

      However, the modern programming age is all about sacrificing performance for convenience (this is why virtually no one is using C or C++ to make web apps, and almost everyone is using a significantly poorer performing language like Python or Ruby). We've got powerful computers with tons of RAM and hard drive space, and high-speed internet connections that can transmit vast amounts of data in mere seconds; why waste (valuable programmer) time and energy over-optimizing everything?

      Instead, developers choose the option that will make their lives easier. XML is widely known, easily understood, and is human readable. I can send an XML document, without any schema or documentation, to another developer and they'll be able to "grok it". There's also a ton of tools out there for working with XML; if someone sends me a random XML document, I can see it syntax colored in Eclipse or my browser. If someone sends me an XML schema, I can use JAXB to generate Java classes to interact with it. If I need to reformat/convert ANY XML document, I can just whip up an XSLT for it and I'm done.

      So yes, other formats offer some benefits. But XML's universality (which does require a bit of bulkiness) makes it a great choice for most types of data one would like to markup and/or transmit.

      P.S. JSON is just as usable? Try writing a schema to validate it ... ok I admit, that wasn't so hard, just some Javascript right? But now you have to write a new batch of code to validate the next type of JSON you use. And another for the next, and so on. With XML, you have a choice of not one but four different schema formats; once you learn to use one of them, you can describe a validation schema far more quickly than you ever could in Javascript.

      Same deal with transformations: if you want to alter your JSON data in a consistent way, you have to again write custom code every time. Sure XSLT has a learning curve, but once you master it you can accomplish in a few lines of code what any other language would need tens or even hundreds of lines to do.
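To make the "custom code every time" point concrete, here is what a hand-written check for one particular JSON shape might look like. It is a sketch in Python rather than Javascript, and both the `validate_point` name and the `{"coordinates": [x, y]}` format are invented for the example; a different message type would need its own function.

```python
import json

def validate_point(doc):
    """Return a list of problems found in a decoded {'coordinates': [x, y]} document."""
    if not isinstance(doc, dict):
        return ["top level must be an object"]
    problems = []
    coords = doc.get("coordinates")
    if not isinstance(coords, list) or len(coords) != 2:
        problems.append("'coordinates' must be a two-element array")
    elif not all(
        isinstance(c, (int, float)) and not isinstance(c, bool) for c in coords
    ):
        problems.append("'coordinates' entries must be numbers")
    return problems

good = validate_point(json.loads('{"coordinates": [100, 200]}'))
bad = validate_point(json.loads('{"coordinates": "100,200"}'))
print(good)
print(bad)
```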
    • by Anonymous Coward on Thursday February 07, 2008 @07:43PM (#22343130)
      I don't get it. We can argue the merits of data exchange formats 'till we're blue in the face; yet I cannot see why XML is so popular.

      Because it's a standard that everyone (even reluctantly) can agree on.

      Because there are well-debugged libraries for reading, writing and manipulating it.

      Because (as a last resort) text is easy to manipulate with scripting languages like perl and python.

      Because if verbosity is a problem, text compresses very well.
    • by Xtifr ( 1323 ) on Thursday February 07, 2008 @07:55PM (#22343254) Homepage
      Like a lot of things, XML is popular because it's popular. Parsing is done with libraries, so programmers don't have to see or care how much overhead is involved, and it's well-known and well-understood, so it's easy to find people who are familiar with it. Every programmer and his dog knows the basics. It's easy to cobble up some in a text editor for testing purposes. You can hand it off to some guy in a completely separate division without worrying that he's going to find it particularly confusing. And you can work with it in pretty much any modern programming language without having to worry about the messy details. It's the path of least resistance. It may not be good, but it's frequently good enough, and that's usually the bottom line.

      I mean, yeah, when I was a kid, we all worked in hand-optimized C and assembler, and tried to pack useful information into each bit of storage, but systems were a lot smaller and a lot more expensive back then. These days, I write perl or python scripts that spit out forty bytes of XML to encode a single boolean flag, and it doesn't even faze me. Welcome to the 21st century. :)
      • Re: (Score:2, Insightful)

        by solafide ( 845228 )

        so programmers don't have to see or care how much overhead is involved

        Which is how we got to the point that Dr. Dewar and Dr. Schonberg [af.mil] describe:

        ...students who know how to put a simple program together, but do not know how to program. A further pitfall of the early use of libraries and frameworks is that it is impossible for the student to develop a sense of the run-time cost of what is written because it is extremely hard to know what any method call will eventually execute.

        And you're saying overhead doesn't matter?

        • by Xtifr ( 1323 ) on Thursday February 07, 2008 @09:09PM (#22343852) Homepage
          From an academic viewpoint, it probably matters. From a point of view of trying to get the job done...not so much. I studied the performance and efficiency of a wide variety of sort algorithms when I was in school, but nowadays, I generally just call some library to do my sorting for me. It may not be quite as efficient for the machine to use some random, generic sort, but for me, it's the difference between a few seconds to type "sort" vs. a few hours to code and debug a sort routine that is probably, at best, only a few percent faster.

          XML is, in many cases (including mine), the path of least resistance. It's not particularly fast or efficient, but it's simple and quick and I don't have to spend hours documenting my formats for the dozens of other people in the company who have to use my data. Many of whom are probably not programmers by Dewar and Schonberg's definition, but who still do valuable work for the company.
          • Two things:

            How do you know if what you've done actually gets the job done? Any monkey can type away randomly and get something done, but it's usually not the job that actually needs doing. For that, you need the skills academic work teaches.

            You missed the point of studying sorting algorithms. They are taught not so that you can reimplement a quicksort later in life, they are taught because they are a great no-frills case study of the basic concepts you need to get a job done while knowing that you got t

    • by batkiwi ( 137781 ) on Thursday February 07, 2008 @07:59PM (#22343272)
      XML IS:
      -Easily validated
      -Easily parsed
      -Easily compressed (in transit or stored)
      -Human readable in case of emergency
      -Easily extendable

      • Re: (Score:2, Insightful)

        -Easily compressed (in transit or stored)

        Which just means that it has lots of redundancy. Or, as one might call it, bloat.
        • by Otto ( 17870 ) on Thursday February 07, 2008 @09:41PM (#22344142) Homepage Journal

          -Easily compressed (in transit or stored)

          Which just means that it has lots of redundancy. Or, as one might call it, bloat.
          Test question: Which is quicker?
          1. Spending a few hours coding your formats in some binary format making maximum use of all the bits.
          2. Spending a few minutes writing code to send your internal data structure to a library that will serialize it into XML and then running the XML through a generic compression routine (if space/speed actually makes any difference to your particular application).

          Consider the question in both the short and the long term. Also consider that you're paying that programmer a few hundred an hour.

          Discuss.
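Option 2 can be sketched with nothing but the Python standard library. The record layout and element names below are made up for illustration, and the sizes printed will depend on the data, so no particular ratio is claimed.

```python
import gzip
import xml.etree.ElementTree as ET

def records_to_xml(records):
    """Serialize a list of flat dicts to XML bytes (element names are illustrative)."""
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")

# Hypothetical data: 1000 small records with very repetitive structure.
records = [{"id": i, "name": "item%d" % i, "active": True} for i in range(1000)]

raw = records_to_xml(records)
packed = gzip.compress(raw)  # generic compression, no format-specific work

print(len(raw), "bytes of XML")
print(len(packed), "bytes after gzip")
```

The repeated tag names are exactly the kind of redundancy a generic compressor removes, which is why the serialization code can stay this simple.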

          • by cyborch ( 524661 )

            Which just means that it has lots of redundancy. Or, as one might call it, bloat.

            Test question: Which is quicker?
            1. Spending a few hours coding your formats in some binary format making maximum use of all the bits.
            2. Spending a few minutes writing code to send your internal data structure to a library that will serialize it into XML and then running the XML through a generic compression routine (if space/speed actually makes any difference to your particular application).

            Consider the question in both the short and the long term. Also consider that you're paying that programmer a few hundred an hour.

            Discuss.

            or...

            3. Spending a few minutes writing code to send your internal data structure to a library that will serialize it into YAML and then NOT running the YAML through a generic compression routine (since YAML has far less bloat and therefore far less need for compression).

            I think I'll go for option 3.

          • Re: (Score:3, Interesting)

            by SimonInOz ( 579741 )
            >> Test question: Which is quicker?
            >> 1. Spending a few hours coding your formats in some binary format making maximum use of all the bits.
            >> 2. Spending a few minutes writing code to send your internal data structure to a library that will serialize it into XML and then running the XML through a generic compression routine (if space/speed actually makes any difference to your particular application).

            A while back (before XML parsers were common) I built a kinda cool system whereby a mainfr
        • Re: (Score:3, Insightful)

          by batkiwi ( 137781 )
          Why is it bloat? How does it affect anything?

          -it doesn't affect transit time when compressed
          -it minimally takes more cpu to gunzip a stream, but the same could be said of translating ANY binary format (unless you're sending direct memory dumps, which is dangerous)
          -it's never really in memory as the entire point is to serialize/deserialize

        • by MrNaz ( 730548 )
          You've obviously never worked with accounts. Or networks. Or anything requiring reliability, really. Redundancy != bloat, although sometimes it may be.
    • Re: (Score:3, Insightful)

      by Anonymous Coward
      For one, it has off-the-shelf parsers and validation tools. Parsing XML is, at its hardest, a matter of writing a SAX parser. XML binding APIs make things even easier. The standardized validation tools also make it great for ensuring that people in charge of generating the data are using the form expected by those receiving the data.

      Our biggest usage is in our customer data feeds. These feeds are often 1GB+ when compressed. Since switching to an XML format from a tab-delimited format, we've been able to gi
    • Re: (Score:3, Insightful)

      by kwerle ( 39371 )
      I don't get it. We can argue the merits of data exchange formats 'till we're blue in the face; yet I cannot see why XML is so popular. For the majority of applications that use it, it's overboard. Yes, it's easier on the eye, but ultimately how often do you have to play with the XML your CAD software uses?

      Let's say you need to store data, and a database is not an option. What format shall you store it in?
      1. Proprietary binary
      2. Proprietary text
      3. JSON
      4. XML

      1 & 2 are untried, untested, and it is not possible to fi

      • Either sqlite or custom binary, depending on what the data is and why a database is "not an option".
    • by TummyX ( 84871 )
      Duh.

      Because XML is a standard. Almost all languages have a standards compliant XML parser that you can easily use. Why invent a new format and a parser, when you can use an existing standard that has most of the issues already sorted out? You don't have to spend time working out if a bug is caused by your parser or something else. XML handles things like character escaping, unicode, etc gracefully whereas a format you design may not unless you spend a lot of time on it.
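A small sketch of the escaping and Unicode point, using Python's standard `xml.etree.ElementTree`. The `round_trip` helper and the sample string are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def round_trip(text):
    """Write text as element content and as an attribute, then parse it back."""
    elem = ET.Element("note")
    elem.text = text
    elem.set("title", text)
    data = ET.tostring(elem, encoding="utf-8")  # the library escapes <, & and quotes
    parsed = ET.fromstring(data)
    return data, parsed.text, parsed.get("title")


# Markup characters, quotes, and non-ASCII characters in one string.
tricky = 'a < b && "quoted" \u00e9\u4e2d'
data, text_back, attr_back = round_trip(tricky)

print(data)
print(text_back == tricky, attr_back == tricky)
```

None of the escaping is written by hand here, which is the argument for using an existing parser instead of a home-grown format.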


      Formats like JSON are just as usab
    • this all adds to bandwidth and processing on the client (think AJAX),

      Why? You can perform XSL transformations on the server and return plain HTML.
    • by TheMCP ( 121589 )
      XML is so popular because business people don't understand it and think it can magically do a lot of things it can't, so they choose software that uses XML when it really doesn't matter.

      I have a lot of experience consulting with various organizations - some Fortune 500, some nonprofit, some educational - about their software selection process. I've watched many times as a vendor gives a presentation to my employer or client talking about how wondrous it is that their software saves all its data in XML so y
    • Re: (Score:3, Insightful)

      by l0b0 ( 803611 )

      Maybe another comparison would help: QWERTY vs. Dvorak. The one "everyone" knows and uses - and, incidentally, design keyboard shortcuts according to; I'm looking at you, Vim - was designed to avoid jams in mechanical keyboards [wikipedia.org] way back in the ass-end of time, while the other was designed to be efficient on electronic hardware.

      A "Dvorak solution" for XML would have to solve some fundamental problem while keeping all the good attributes (no pun intended) of the original. IMO, that would mean more readable c

  • It seems to me to be a slight improvement on ini files, csv and the like. But parsing it is hideously inefficient compared to a binary format. It's bloated too, so it takes more time to send it over the net or save it to disk. I've seen some XML schemas that are aggressively hard to read too. And yet it's become something that every new technology, protocol or application needs to namecheck.
    • Re: (Score:2, Informative)

      XML is not necessarily for human eyes. With the strict rules on non-overlapping closing of tags, its parsing and expansion is very easily stored and visualized as a tree. So parsing in general is actually quite easy. Also when you consider people like this http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=247&page=4 [acmqueue.org] (ACM! So take it seriously!) who want to convert all Turing complete programming languages into XML abstractions, and call it the future, well... I'm honestly not sure
      • by cp.tar ( 871488 ) <cp.tar.bz2@gmail.com> on Thursday February 07, 2008 @07:26PM (#22342954) Journal

        Then again, maybe it's a response to, "Hey! _Anything_ is better than LISP!"

        Funny, that. I've heard LISPers say "XML looks quite like LISP, only uglier."

      • Re: (Score:3, Informative)

        by Hal_Porter ( 817932 )
        its parsing and expansion is very easily stored and visualized as a tree

        Why not store it as a tree in a format computers can parse efficiently? Invent a binary format with parent and child offsets and binary tags for the names and values. It's smaller in memory and faster. Better basically. You don't need to parse them if machines are going to read them. And decent human programmers can read them with a debugger or from a hexdump in a file, or write a tool to dump them as a human friendly ASCII during develop
    • Re: (Score:2, Insightful)

      by InlawBiker ( 1124825 )
      Because everyone said "XML is the future." And because it has an "X" it was perceived as shiny and cool. So therefore all managers and inexperienced developers jumped all over it. Now I have to encapsulate a few bytes into a huge XML message and reverse it on incoming messages, when I could have just said "name=value" and been done with it. I can see a use for XML in some applications, but it's been dreadfully overused.
  • by milsoRgen ( 1016505 ) on Thursday February 07, 2008 @06:50PM (#22342572) Homepage
    FTA:

    Netscape's dream of replacing the operating system with a browser is also coming true this year.

    They've been saying that for years, and frankly it won't happen. A vast number of users relish the control that having software stored and run locally provides. Of course there will always be exceptions, as web-based e-mail has shown us.

    As far as the future of XML... I can't seem to find anything in this article that states anything more than the obvious, it's on the same path it's been on for quite some time.

    FTA:

    Success or failure, XML was intended for publishing: books, manuals, and--most important--Web pages.

    Is that news to anyone? My understanding of XML is that its intended use is to provide information about the information.
  • The thing with XML (Score:3, Interesting)

    by KevMar ( 471257 ) on Thursday February 07, 2008 @06:53PM (#22342606) Homepage Journal
    The thing with XML is that it so easily supports whatever design the developer can think of. Even the really bad ones. Now that it is such a buzzword, the problem gets worse.

    I had someone call me up to design them a simple web app. But he wanted it coded in XML because he thought that was the technology he wanted. His Access database was not web friendly enough.

    I did correct him a little to put him in check and at least gave him the right buzzwords to use with the next guy.

    I think XML is dead simple to use if used correctly. I do like it much better than ini files. That is about all I use it for now. Easy to use config files that others have to use.
  • by Bryan Ischo ( 893 ) on Thursday February 07, 2008 @07:17PM (#22342850) Homepage
    I have had far too many 'this stuff sucks' moments with XML to ever consider using it in any capacity where it is not forced upon me (which unfortunately, it is, with great frequency).

    I first heard about XML years ago when it was new, and just the concept sucked to me. A markup language based on the ugly and unwieldy syntax of SGML (from which HTML derives)? We don't need more SGML-alikes, we need fewer, was my thought. This stuff sucks.

    Then a while later I actually had to use XML. I read up on its design and features and thought, OK well at least the cool bit is that it has DTDs to describe the specifics of a domain of XML. But then I found out that DTDs are incomplete to the extreme, unable to properly specify large parts of what one should be able to specify with it. And on top of that, DTDs don't even use XML syntax - what the hell? This stuff sucks.

    I then found that there were several competing specifications for XML-based replacements for the DTD syntax, and none were well-accepted enough to be considered the standard. So I realized that there was going to be no way to avoid fragmentation and incompatibility in XML schemas. This stuff sucks.

    I spent some time reading through supposedly 'human readable' XML documents, and writing some. Both reading and writing XML is incredibly nonsuccinct, error-prone, and time consuming. This stuff sucks.

    Finally I had to write some code to read in XML documents and operate on them. I searched around for freely available software libraries that would take care of parsing the XML documents for me. I had to read up on the 'SAX' and 'DOM' models of XML parsing. Both are ridiculously primitive and difficult to work with. This stuff sucks.

    Of course I found the most widely distributed, and probably widely used, free XML parser (using the SAX style), expat. It is not re-entrant, because XML syntax is so ridiculously and overly complex that people don't even bother to write re-entrant parsers for it. So you have to dedicate a thread to keeping the stack state for the parser, or read the whole document in one big buffer and pass it to the parser. XML is so unwieldy and stupid that even the best freely available implementations of parsers are lame. This stuff sucks.

    Then I got bitten by numerous bugs that occurred because XML has such weak syntax; you can't easily limit the size of elements in a document, for example, either in the DTD (or XML schema replacement) or expat. You just gotta accept that the parser could blow up your program if someone feeds it bad data, because the parser writers couldn't be bothered to put any kind of controls in on this, probably because they were 'thinking XML style', which basically means, not thinking much at all. This stuff sucks.

    Finally, my application had poor performance because XML is so slow and bloated to read in as a wire protocol. This stuff sucks.

    XML sucks in so many different ways, it's amazing. In fact I cannot think of a single thing that XML does well, or a single aspect of it that couldn't have been better planned from the beginning. I blame the creators of XML, who obviously didn't really have much of a clue.

    In summary - XML sucks, and I refuse to use it, and actively fight against it every opportunity I get.
    • Re: (Score:3, Insightful)

      by Belial6 ( 794905 )
      I'm not saying that XML is the end all be all, but if your application blows up because someone fed it bad data in XML, your program is broken, and no data format is going to fix it. As the developer, it is your responsibility to vet the data before trying to use it.
      • by tjstork ( 137384 )
        I'm not saying that XML is the end all be all, but if your application blows up because someone fed it bad data in XML, your program is broken, and no data format is going to fix it. As the developer, it is your responsibility to vet the data before trying to use it.

        As long as you guys want to fit the bill for supporting that shoddy format, go right ahead!

        interoperability is overrated.
      • I agree with you 100%. And so the only way to parse XML in a nonbroken way is to write your own XML parser, adding "nonstandard" constraints that prevent your program from blowing up. So you don't get to re-use existing parsing technologies like expat, you have to write everything yourself. This is a direct consequence of the suckfulness of XML, which is so lame that nobody even bothers to write good free parsers for it.

        An example of nonstandard constraints you have to put on your parser - DTD doesn't al
        • Whoops, I forgot to properly escape my example text.  The first example was supposed to be:

          <foo>{100 megabytes of the letter 'a'}</foo>

          And the second was supposed to be:

          <{100 MB of 'a'}>hello</foo>
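For what it's worth, here is a hedged sketch of how such limits could be bolted on from the outside, using Python's standard binding to expat (`xml.parsers.expat`). The `parse_with_limits` function, the `LimitExceeded` exception, and both thresholds are hypothetical names chosen for this example. Nothing here is part of expat itself or of any XML standard.

```python
import xml.parsers.expat


class LimitExceeded(Exception):
    """Raised when the input breaks one of our (non-standard) size limits."""


def parse_with_limits(chunks, max_text=1024, max_total=1_000_000):
    """Feed byte chunks to expat one at a time, aborting on oversized input.

    max_text bounds the character data seen since the last start or end tag;
    max_total bounds the whole document, which also covers a giant tag name.
    Returns the element names in the order they were opened.
    """
    names = []
    text_len = 0

    def start(name, attrs):
        nonlocal text_len
        text_len = 0
        names.append(name)

    def end(name):
        nonlocal text_len
        text_len = 0

    def chars(data):
        nonlocal text_len
        text_len += len(data)
        if text_len > max_text:
            raise LimitExceeded("text run longer than %d characters" % max_text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars

    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_total:
            raise LimitExceeded("document longer than %d bytes" % max_total)
        parser.Parse(chunk, False)  # incremental: more data may follow
    parser.Parse(b"", True)         # signal end of input
    return names


print(parse_with_limits([b"<foo>", b"hello", b"</foo>"]))
```

The first example above trips `max_text`; the second, with its giant tag name, can only be caught by the cruder `max_total` check, because the handlers never see a name until the parser has read all of it. So the complaint stands in the sense that the limits live in application code rather than in the schema.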
    • by QRDeNameland ( 873957 ) on Thursday February 07, 2008 @07:32PM (#22343016)

      Too bad I used up all my mod points earlier...this post deserves a +1 Insightful.

      I was just a neophyte developer when XML first surfaced in buzzword bingo, but it was the beginning of my realization of how to recognize a "Kool-aid" technology: if the people who espouse a technology cannot give you a simple explanation of what it is and why it's good, they are probably "drinking the Kool-aid".

      Unfortunately, I also have since discovered the unsettling corollary: you will have it forced down your throat anyway.

    • by cpeterso ( 19082 )
      But what is the alternative?
    • by pla ( 258480 )
      I searched around for freely available software libraries that would take care of parsing the XML documents for me.

      Not "free", but believe it or not, .NET actually has pretty decent XML support... Except as you point out:



      Then I got bitten by numerous bugs that occurred because XML has such weak syntax

      Based on the exhibited behavior, I suspect virtually all programs that parse XML use SelectSingleNode() (or comparable). And there we have a problem, in that XML itself doesn't require node uniquenes
    • Re: (Score:3, Interesting)

      by jma05 ( 897351 )
      I just am adding finishing touches for a several year long project where I was bitten by XML (My problems were with schema namespace support in libraries at the time). I had to resort to non-standard hacks.

      While I share your disdain (and I agree with every one of your points), the question is this - what other *standard* way do we have to describe a format that has a *moderate to high* level of complexity? JSON is great when I don't need to apply any constraints on the data. I would gladly choose it (along wit
      • Actually I have been working on this problem off and on for a couple of years. I wrote a description of a binary format which could encode any hierarchical data structure, and had all of the features of XML (that I know of) while being fast and safe for computers to parse and emit. I also had a re-entrant parser that could read a document byte by byte if necessary and properly managed its state (allowing the programmer to drive the parser using for example a select loop with sockets). It worked really we
    • Re: (Score:2, Insightful)

      by Anonymous Coward
      Wow...that's a lot of FUD to fit into one single post.

      To pick just a few of your actual points...

      So you have to dedicate a thread to keeping the stack state for the parser, or read the whole document in one big buffer and pass it to the parser.

      Why on earth would you use a separate thread. SAX callbacks allow you ample opportunity to maintain whatever state you need and DOM parsers cache the entire thing into a hierarchy that you can navigate to avoid having to maintain any state of your own. Granted, the us

      • It's been a couple of years (like, 2 or 3) since I wrote the code to which I am referring, or had the experiences to which I am alluding. So my memory of details is fuzzy, and I may have missed the mark on some of it because I may be misremembering. However, I do very clearly recall that, when I was in the thick of my XML efforts, and had a clear idea of what the problems were, that I had many 'this stuff sucks' moments like the ones I described. Maybe the details are a little off, but the point remains,
      • Re: (Score:3, Interesting)

        Finally, my application had poor performance because XML is so slow and bloated to read in as a wire protocol.

        When was the last time you tried it, 1995? Nowadays, compression algorithms require so little processing power that XML adds only a minimal amount of overhead when transferred over the wire.

        Actually, you are demonstrating some cluelessness here. Size bloat is only a small part of why XML massively sucks as a wire protocol compared to functionally equivalent universal representations such as ASN.

    • by ad0gg ( 594412 )
      I spent some time reading through supposedly 'human readable' XML documents, and writing some. Both reading and writing XML is incredibly nonsuccinct, error-prone, and time consuming. This stuff sucks.

      There are so many more readable formats, like JSON. Or just use byte offsets. Hell, we could be using pipe-delimited data.

    • XML sucks in so many different ways, it's amazing. In fact I cannot think of a single thing that XML does well, or a single aspect of it that couldn't have been better planned from the beginning. I blame the creators of XML, who obviously didn't really have much of a clue.

      XML was never intended as a data storage format. It was intended as a document markup format. The fact that people started immediately using it for arbitrary data came as a surprise to the people who created it.

  • by kbob88 ( 951258 ) on Thursday February 07, 2008 @07:26PM (#22342962)
    The future of XML?

    Probably a long, healthy life in a big house on the top of buzzword hill, funded by many glowing articles in magazines like InformationWeek and CIO, and 'research papers' by Gartner. Sitting on the porch yelling, "Get off my lawn!" to upstarts like SOA, AJAX, and VOIP. Hanging out watching tube with cousin HTML and poor cousin SGML. Trying to keep JSON and YAML from breaking in and swiping his stuff. Then fading into that same retirement community that housed such oldsters as EDI, VMS, SNA, CICS, RISC, etc.
  • by MasterC ( 70492 ) <cmlburnett@gm[ ].com ['ail' in gap]> on Thursday February 07, 2008 @07:29PM (#22342980) Homepage
    XML is easy to understand because of the prevalence of HTML knowledge. XML is easy because it's text. XML is easy because, like perl, you can store the same thing in 15 ways. XML is easy because there is only one data type: text. XML is flexible because you can nest to your heart's content.

    All these things are why people use it.

    All these things are why people abuse it.

    All these things are why we won't be able to get rid of it soon.

    TFA has nothing to say about the future of XML but the tools to use XML. XQuery and XML databases. Whoopity do. The threshold for getting posted on /. steps down yet another notch. IMHO: if you loathe/hate XML then you should think about a change in career because it's not going away any time soon...
  • YAML (Score:3, Informative)

    by CopaceticOpus ( 965603 ) on Thursday February 07, 2008 @07:33PM (#22343028)
    JSON is lightweight, and yet it remains human readable and editable. XML lets you forget some of the security concerns of JSON, and has the advantage of not being tied to a specific programming language.

    If only there was a standardized format that combined these advantages, without all that XML bloat. There is! Try YAML [yaml.org].

    XML's big win is supposed to be its semantics: it tells you not only what data you have, but what sort of data it is. This allows you to create all sorts of dreamy scenarios about computers being able to understand each other and act super intelligently. In reality, it leads to massively bloated XML specifications and protracted fights over what's the best way to describe one's data, but not to any of the magic.

    As my all time favorite Slashdot sig said: "XML is like violence: if it doesn't solve your problem, you aren't using enough of it."
    • JSON is almost exactly equivalent to LISP S-expressions. Unfortunately, JSON has major security problems due to a Javascript design error. In LISP, there's the "reader", which takes in a string, parses it, and generates a tree structure (or graph; there's a way to express cross-links), and just returns the data structure. Then LISP has "eval", which takes a data structure created by the reader and runs it.
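A sketch of the reader-only idea in Python, whose standard `json` module parses text into data and never executes it. The `read_data` wrapper is just a name for this example.

```python
import json


def read_data(text):
    """Act like a LISP 'reader': turn text into a data structure, run nothing."""
    return json.loads(text)


good = read_data('{"tags": ["xml", "json"], "count": 2}')

# Something that is code rather than data is rejected, not run.
try:
    read_data('__import__("os").getcwd()')
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(good["count"], rejected)
```

The design point is the separation: a parser that can only build lists, dicts, strings and numbers has nothing to exploit, whatever the input says.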

      Javascript combines both functions into one, called "eval", which takes a string, parses it, and

  • Doesn't that mean I can use it until um... er... text runs out?

    It's not rocket science - MS were using it in MediaPlayer long before EkksEmmEll came along... it was called "sticking your crap in angle brackets and parsing it" - HTML is a subset of SGML and I'm pretty sure that it (in its XHTML form) will be around for a while yet.

    How does that die out? Just because you give it a name and rant about standards in some poxy white paper/media blag doesn't mean it's going to die and go away...
  • ok true story.

    We once had to port live data from Texas to Oregon from giant tables repeatedly, not too well built. So we looked to send XML, enforcing a DTD/schema on the sender teams. We ended up writing the encoders because we used an early and crude compression scheme:

    We took the source table and counted the number of duplicate sets per column, then returned sorted data in order of highest duplicates to lowest.
    Then, we encoded in XML using a column, then row order. Scanning dow
  • In the long run, we are all dead. XML's future is in the graveyard. Alas, that is probably too much to ask. :(
  • by corsec67 ( 627446 ) on Thursday February 07, 2008 @08:17PM (#22343434) Homepage Journal
    S-expressions [wikipedia.org] (think the lisp format) are much nicer, more compact, and easier to use than XML, while sharing almost all of the same properties otherwise.

    For example:
    <tag1>
        <tag2>
          <tag3/>
        </tag2>
    </tag1>

    becomes:
    (tag1
        (tag2
            (tag3)
        )
    )
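A rough sketch of the mapping in Python, assuming the simplest possible case: element names only, with no attributes, text, or namespaces. `to_sexp` and `from_sexp` are hypothetical helpers, not functions from any library.

```python
import xml.etree.ElementTree as ET


def to_sexp(elem):
    """Render an element tree as a one-line S-expression (tag names only)."""
    parts = [elem.tag] + [to_sexp(child) for child in elem]
    return "(" + " ".join(parts) + ")"


def from_sexp(text):
    """Parse the S-expression form back into an element tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if tokens[pos] in ("(", ")"):
            raise ValueError("expected a tag name")
        elem = ET.Element(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            elem.append(parse())
        pos += 1  # consume ')'
        return elem

    root = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root expression")
    return root


xml_text = "<tag1><tag2><tag3/></tag2></tag1>"
sexp = to_sexp(ET.fromstring(xml_text))
print(sexp)
print(to_sexp(from_sexp(sexp)) == sexp)
```

Attributes, text content and mixed content are where the two notations stop being a one-line translation, so treat this as a toy.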

    • Re: (Score:3, Informative)

      by jefu ( 53450 )

      Sure, you can build a different text representation for XML as sexps. But if it represents the same thing, it doesn't much matter.

      Imagine that you do so, and you can write a function P that takes xml into sexps and a function Q that takes it back. If Q(P(xml-stuff)) == xml-stuff and P(Q(sexps)) == sexps, then they both do the same thing and you can effectively use either syntax. So you use the syntax you want and convert when you need to. Of course, if either equality doesn't work, then one syntax

    • In Concise XML [concisexml.org], that's:

      <tag1
        <tag2
          <tag3/>
      />
      />
  • by leighklotz ( 192300 ) on Thursday February 07, 2008 @08:18PM (#22343442) Homepage
    XML has tremendous, huge, giant levels of adoption that dwarf its use as XHTML and in XMLHTTPRequest (AJAX) stuff.
    WHATWG's HTML 5 and JSON will have no effect on these other uses. It's just that nobody in hangouts like this sees it.

    For example, the entire international banking industry runs on XML Schemas. Here's one such standard: IFX. Look at a few links: http://www.csc.com/industries/banking/news/11490.shtml [csc.com] , http://www.ifxforum.org/home [ifxforum.org]
    But there are other XML standards in use in banking.

    The petroleum industry is a heavy user of XML. Example: Well Information Transfer Standard Markup Language WITSML (http://www.knowsys.com/ and others).

    The list goes on and on, literally, in major, world-wide industry after industry. XML has become like SQL -- it was new, it still has plenty of stuff going on and smart people are working on it, but a new generation of programmers has graduated from high school, and reacts against it. But it's pure folly to think it's going to go away in favor of JSON or tag soup markup.

    So yes, success in Facebook applications can make a few grad students drop out of school to market their "stuff," and Google can throw spitballs at Microsoft with a free spreadsheet written in Javascript, but when you get right down to it, do you really think the banking industry, the petroleum industry, and countless others are going to roll over tomorrow and start hacking JSON?
  • by Qbertino ( 265505 ) <moiraNO@SPAMmodparlor.com> on Thursday February 07, 2008 @08:19PM (#22343452)
    Ok. I've once again seen the full range of XML comments here. From 'cool super technology modern java' to 'OMFG it sucks' right over to 'XML has bad security' - I mean ... WTF? XML is a Data Format Standard. It has about as much to do with IT security as the color of your keyboard.

    And for those of you out there who haven't yet noticed: XML sucks because data structure serialisation sucks. It always will. You can't cut open, unravel and string out an n-dimensional net of relations into a 1-dimensional string of bits and bytes without it sucking in one way or the other. It's a, if not THE, classic hard problem in IT. Get over it. It's with XML that we've finally agreed upon in which way it's supposed to suck. Halle-flippin'-luja! XML is the unified successor to the late sixties way of badly delimited literals, indifference between variables and values and flatfile constructs of obscure standards nobody wants. And which are so arcane by today's standards that they are beyond useless (check out AICC if you don't know what I mean). Crappy PLs and config schemas from the dawn of computing.

    That's all there is to XML: a universal n-to-1 serialisation standard. Nothing more and nothing less. Calm down.

    And as for the headline: Of-f*cking-course it's here to stay. What do you want to change about it (much less 'enhance')? Do you want to start color-coding your data? Talking about the future of XML is almost like talking about the future of the wheel ("Scientists ask: Will it ever get any rounder?"). Give me a break. I'm glad we got it and I'm actually - for once - grateful to the academic IT community for doing something useful and pushing it. It's universal, can be handled by any class and style of data processing and when things get rough it's even human readable. What more do you want?

    Now if only someone could come up with a replacement for SQL and enforce universal utf-8 everywhere we could finally leave the 1960s behind us and shed the last pieces of vintage computing we have to deal with on a daily basis. That's what discussions like these should actually be about.
    • by Shados ( 741919 )
      Completely agree with you, from the security comment all the way to the "UTF-8 everywhere and be gone with SQL" thing.

      Just out of curiosity, have you ever had to work with EDI? Because you sound like someone who probably got burnt by something like that in the past :)

    • Cheers, Qbertino. This is the best explanation of XML's raison d'etre I have ever heard.

      I think what people might hate most is DTDs. That makes sense. Even their creator says they suck. There are many ways around them... Lisp can be one big full-service XML processor. Easily. With happy ending and no need for the DOM or SAX.

      The bottom line is, XML is nothing (literally) until you spec YourML. And most people don't have a need for that! So it seems useless to them. If you are writing markup languages for
      • by Shados ( 741919 )
        I'm not the person you replied to, but... One thing I've noticed from my personal experience: XML sucks hard without schema. XSD sucks hard to make schemas. XSD is the norm. Put everything together, and you get "XML SUCKS!"

        There are XSD alternatives, and also nice tools and editors to handle XSDs: then you're fine.

        Also, having taken a look at the mainstream C++ APIs for XML, that would make most anyone hate it. It isn't bad in Java or .NET really (I didn't do any XML in Java in a long time, but a few years
  • Use it if mandated, try to avoid using it for application configuration if possible, try to avoid transformations as much as possible, accept that web services / ajax do make sense in certain situations.

    Basically like any tool use where it makes most sense, avoid using it in other cases.
  • by chip2004 ( 913568 ) on Thursday February 07, 2008 @09:43PM (#22344160)
    XML tries to make everything fit into a single hierarchy. Most real-world information is comprised of graphs of data. ISO STEP provides better readability compared to XML, a more strongly typed schema mechanism, and a more compact size. Best of all, programs can process and present results of STEP incrementally instead of requiring closing tags so you can hold gigabytes of information in the same file and seek randomly.

    Example:
    #10=ORGANIZATION('O0001','LKSoft','company');
    #11=PRODUCT_DEFINITION_CONTEXT('part definition',#12,'manufacturing');
    #12=APPLICATION_CONTEXT('mechanical design');
    #13=APPLICATION_PROTOCOL_DEFINITION('','automotive_design',2003,#12);
    #14=PRODUCT_DEFINITION('0',$,#15,#11);
    #15=PRODUCT_DEFINITION_FORMATION('1',$,#16);
    #16=PRODUCT('A0001','Test Part 1','',(#18));
    #17=PRODUCT_RELATED_PRODUCT_CATEGORY('part',$,(#16));
    #18=PRODUCT_CONTEXT('',#12,'');
    #19=APPLIED_ORGANIZATION_ASSIGNMENT(#10,#20,(#16));
    #20=ORGANIZATION_ROLE('id owner');
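To make the graph-versus-hierarchy point concrete, here is a deliberately naive Python sketch that pulls the `#n` references out of a few of the lines above. It is a regex toy written for this example, not a conforming STEP reader, and it would be fooled by a `#` inside a quoted string.

```python
import re

# One instance per line: #id=TYPE_NAME(arguments);
LINE = re.compile(r"^#(\d+)=([A-Z_]+)\((.*)\);$")


def parse_step(text):
    """Map each instance id to (type name, list of referenced instance ids)."""
    graph = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        ident, kind, args = match.groups()
        refs = [int(r) for r in re.findall(r"#(\d+)", args)]
        graph[int(ident)] = (kind, refs)
    return graph


sample = """
#10=ORGANIZATION('O0001','LKSoft','company');
#11=PRODUCT_DEFINITION_CONTEXT('part definition',#12,'manufacturing');
#12=APPLICATION_CONTEXT('mechanical design');
#16=PRODUCT('A0001','Test Part 1','',(#18));
#18=PRODUCT_CONTEXT('',#12,'');
"""

graph = parse_step(sample)
# Instances that point at #12: more than one parent, so this is not a tree.
referrers_of_12 = sorted(k for k, (_, refs) in graph.items() if 12 in refs)
print(referrers_of_12)
```

In XML the shared node would have to be duplicated or pointed at with an ID/IDREF-style convention, which is the same cross-reference trick under a different syntax.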
    • Strange. I don't know why, but this STEP reminds me of BASIC. :-)
      Is this supposed to be a step forward?

      Wikipedia page for ISO STEP mentions that many consider replacing it with XML [wikipedia.org], or rather creating XML schemas to represent the information STEP does (I didn't find Wikipedia's external reference for this though).

      ...programs can process and present results of STEP incrementally instead of requiring closing tags...

      It's not true that XML cannot be rendered incrementally. This Mozilla FAQ [mozilla.org] points out that versions before Firefox 3/Gecko 1.9 don't support it, which makes me believe that Firefox 3 does suppo

  • by scottsevertson ( 25582 ) on Friday February 08, 2008 @12:06AM (#22345168) Homepage

    "XML is really just data dressed up as a hooker."

    --Dave Thomas [wikipedia.org]

    XML does suck if you stick with some of the W3C standards and common tools. Suggestions to make it less painful:

    • Ditch W3C's XML Schema

      W3C Schema is painful; it forces object-oriented design concepts onto a hierarchical data model. Consider RELAX NG [relaxng.org] (an Oasis-approved standard) instead; it's delightful in comparison. Use the verbose XML syntax when communicating with the less technical - if you've seen XML before, it's pretty easy to comprehend:

      <r:optional>
      <r:element name="w3cSchemaDescription">
      <r:choice>
      <r:value>painful</r:value>
      <r:value>ugly</r:value>
      <r:value>inflexible</r:value>
      </r:choice>
      </r:element>
      </r:optional>

      Switch to the compact syntax when you're among geeks:

      element w3cSchemaDescription { "painful" | "ugly" | "inflexible" }?

      There's validation support on major platforms, and even a tool (Trang [thaiopensource.com]) to convert between verbose/compact formats, and output to DTD and W3C Schemas. And, if you need to specify data types, it borrows the one technology W3C Schema got right: the Datatypes library [w3.org].

    • Don't use the W3C DOM

      The W3C DOM attempts to be a universal API, which means it must conform to the lowest common denominator in the programming languages it targets. Consider the NodeList [w3.org] interface:

      interface NodeList {
      Node item(in unsigned long index);
      readonly attribute unsigned long length;
      };

      While similar to the native list/collection/array interfaces most languages provide, it's not an exact match. So, DOM implementers create an object that doesn't work quite like any other collection on the platform. In Java, this means writing:

      for(int i = 0; i < nodeList.getLength(); i++)
      {
      Node node = nodeList.item(i);
      // Do something with node here...
      }

      Instead of:

      for(Node node : nodeList)
      {
          // Do something with node here...
      }
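
      One way to narrow that gap, as a minimal sketch: wrap the NodeList in a read-only java.util.List view so the for-each form above compiles. The class name NodeLists and the helpers asList and childNames are hypothetical, made up for illustration; only the org.w3c.dom and javax.xml.parsers calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.AbstractList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of any standard library.
class NodeLists {

    /** Exposes a W3C NodeList as a read-only List so it works with for-each. */
    static List<Node> asList(final NodeList nodes) {
        return new AbstractList<Node>() {
            @Override
            public Node get(int index) {
                // NodeList.item() returns null when out of range; List.get() must throw.
                if (index < 0 || index >= nodes.getLength()) {
                    throw new IndexOutOfBoundsException("index " + index);
                }
                return nodes.item(index);
            }

            @Override
            public int size() {
                return nodes.getLength();
            }
        };
    }

    /** Demo: comma-separated names of the root element's child elements. */
    static String childNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder names = new StringBuilder();
            for (Node node : asList(doc.getDocumentElement().getChildNodes())) {
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    if (names.length() > 0) {
                        names.append(',');
                    }
                    names.append(node.getNodeName());
                }
            }
            return names.toString();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }
}
```

      The view is backed by the underlying NodeList, which the DOM defines as live, so modifying the tree while iterating is still a bad idea. It only papers over the syntax; it does not make the DOM behave like a native collection.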

      Dynamic languages allow an even more concise syntax. Consider this Ruby builder code to build a trivial XML document:

      x.date {
        x.year "2006"
        x.month "01"
        x.day "01"
      }

      I thought about writing the W3C DOM equivalent of the above, but I'm not feeling masochistic tonight. Sorry.
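
      For comparison, here is a rough sketch of what a W3C DOM version could look like in Java. It is one possible rendering, not a canonical one: the class DateDocument and the appendText helper are hypothetical names, and the helper already hides some of the repetition that a literal translation would spell out three times.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name; builds the same <date> document as the Ruby snippet.
class DateDocument {

    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element date = doc.createElement("date");
            doc.appendChild(date);
            appendText(doc, date, "year", "2006");
            appendText(doc, date, "month", "01");
            appendText(doc, date, "day", "01");
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Every child needs three calls: create the element, create the text node, attach both.
    private static void appendText(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }
}
```

      Note that this only builds the tree in memory; turning it into text takes yet another API (for example javax.xml.transform), which the Ruby version does not need.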

      The alternatives depend on your programming language, but plenty of choices exist for DOM-style traversal/manipulation.

    • Forget document models entirely (maybe)

      In-memory object models of large XML documents can consume a lot of resources, but often you only need part of the data. Consider using an XMLPull [xmlpull.org] or StAX [codehaus.org] parser instead. Pull means you control the document traversal, only descending into (and fully parsing) sections of the XML that are of interest. SAX [saxproject.org] based parsers have equivalent capabilities, but the programming model is uncomfortable for many developers.
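
      A minimal sketch of the pull style, using the StAX API that ships with the JDK (javax.xml.stream). The class name PullExample and the element name "title" are assumptions chosen for illustration. The loop asks the parser for one event at a time and stops as soon as it has what it wants, so the rest of the document is never read.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class.
class PullExample {

    /** Returns the text of the first <title> element, or null if there is none. */
    static String firstTitle(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The caller drives the parser: one event per next() call.
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // getElementText() expects a text-only element.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

      Calling PullExample.firstTitle on a document whose title comes before a large body returns without touching the body at all, and no tree is built at any point.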

      Even better, some Pull processors are wicked fast, even when using them to construct a DOM. In Winter 2006, I benchmarked an XML-heavy application, and found Woodstox [codehaus.org] to be an order of magnitude faster at constructing thousands of small DOM4J documents

  • by Aceticon ( 140883 ) on Friday February 08, 2008 @04:31AM (#22346354)
    The future of XML is where its past is: in the back-end, connecting systems designed by different teams and even different companies.

    I've been working with XML ever since it first came out, and the whole XML-on-the-front-end thing is a fad that comes and goes periodically.

    The pros of XML
    • There are a gazillion libraries out there to parse and process XML. Any idiot can pick an XML library and in 5 minutes enable his/her program to read and write XML. This means less development work is needed to get the data into and out of the messages and more time can be used for actually dealing with the data itself (as in, figuring out what it should be and what to do with it).
    • Built-in validation. This is great when different teams are doing the two sides of an interface between two systems using XML as the transport format: basically the XML Schema acts as the de facto Interface Requirements Specification - it lists all fields, their type, their mandatory status, their location in the data structure and, if well done, even their allowed values or minimum and maximum values. If both the sender and the receiver actually enable validation of the XML messages against the schema, then in practice it's close to impossible for a sender to create a message which breaks the receiver. However, when both the sender and the receiver are being developed by the same team, this is a lot less useful.
    • People can actually open an XML message and check it with a standard text editor. This is only good for relatively small messages though - if whatever is generating the messages doesn't put end of lines anywhere, for big messages interpreting its contents is still a headache.
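
    The validation point can be sketched with the javax.xml.validation API from the JDK. Everything about the message format here is an assumption made up for illustration: the class name MessageValidator, the order/quantity elements and the 1 to 100 range. The schema plays the role of the interface specification, and the receiver rejects anything that does not conform before its own code ever sees it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical receiver-side check.
class MessageValidator {

    // Hypothetical contract: <order> must contain one <quantity>, an int from 1 to 100.
    static final String ORDER_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='order'><xs:complexType><xs:sequence>"
          + "<xs:element name='quantity'><xs:simpleType>"
          + "<xs:restriction base='xs:int'>"
          + "<xs:minInclusive value='1'/><xs:maxInclusive value='100'/>"
          + "</xs:restriction>"
          + "</xs:simpleType></xs:element>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    /** Returns true if the message conforms to ORDER_XSD, false otherwise. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(ORDER_XSD)));
        } catch (SAXException e) {
            // A broken schema is a programming error, not an invalid message.
            throw new IllegalStateException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    A real receiver would compile the schema once and reuse it rather than rebuilding it per message; it is rebuilt here only to keep the sketch in one method.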


    Cons of XML
    • XML is the best file expansion scheme known to man. Encoding something in XML can easily turn 10KB worth of data into a 1MB monstrosity - just try encoding an n-by-m matrix of integers into XML without using fancy tricks like (non-XML) separators to see what I mean.
    • High memory usage for parsers. This is related both to the first con listed and to the fact that the most common standard used to represent an XML document in memory in an object-oriented language (DOM) actually uses one object per XML node (elements, attributes, etc.) - which means that an XML document is further expanded when loaded into memory (unless your element and attribute names are really long, the memory footprint of a DOM node is usually bigger than the XML representation of that node).
    • Parsing XML can be a lot slower than parsing most binary formats. This is again related to the cons listed above.
    • High cross dependency between different parts of the file. More specifically, in order to reach any element inside an XML stream, the whole stream up to that point has to be parsed.
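
    To make the matrix example concrete, here is a small sketch that encodes an integer matrix with one element per cell and compares its size with a plain 4-bytes-per-int binary layout. The class name MatrixSize and the matrix/row/cell element names are hypothetical; real formats tend to use longer names, attributes and indentation, all of which push the ratio up further.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical size comparison; element names are made up for illustration.
class MatrixSize {

    /** Encodes the matrix with one <cell> element per value and no whitespace. */
    static String toXml(int[][] matrix) {
        StringBuilder xml = new StringBuilder("<matrix>");
        for (int[] row : matrix) {
            xml.append("<row>");
            for (int value : row) {
                xml.append("<cell>").append(value).append("</cell>");
            }
            xml.append("</row>");
        }
        return xml.append("</matrix>").toString();
    }

    /** Size of the XML encoding in UTF-8 bytes. */
    static int xmlBytes(int[][] matrix) {
        return toXml(matrix).getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size of a raw binary encoding at 4 bytes per int, with no header. */
    static int binaryBytes(int[][] matrix) {
        int cells = 0;
        for (int[] row : matrix) {
            cells += row.length;
        }
        return 4 * cells;
    }
}
```

    For the 2-by-2 matrix {{1, 2}, {3, 4}} this gives 95 bytes of XML against 16 bytes of binary, and that is with short tag names, single-digit values and no whitespace.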


    The pros and cons mean that the best place to use XML is for interoperability between systems/applications developed by different teams/vendors where not much data is sent around and processing is not time sensitive. This does cover some front-end applications where the data can be generated by a program done by one vendor and read by a program done by a different vendor. It does not, however, cover files which are meant to be written and read by the same application.

    The second best place is to quickly add support for a tree-structured storage format for data to an application (for example, for a config file), since you can just pick up one of the XML libraries out there and half your file format problems will be solved (you still have to figure out and develop the "what to put in there" and "where to put it" part, but need not worry about most of the mechanics of generating, parsing and validating the file).

