Does the World Need Binary XML?

sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."
  • For Starters (Score:2, Insightful)

    what can be done to make XML better, faster and stronger.

    For starters, keep Microsoft out of it.

    • Re:For Starters (Score:5, Interesting)

      by Omega1045 ( 584264 ) on Friday January 14, 2005 @01:00PM (#11364249)
      Why? Microsoft has done a fairly good job promoting XML and SOAP XML Web Services. As long as they stick to the standards (yes, I know) I see no reason to keep them out.

      IBM has actually tried to introduce some goofy stuff into the XML standards, like line breaks, that should not be in a pure node-based system like XML. Why aren't you picking on them in your comment?

      As far as SOAP and XML Web Services (standardized protocols for XML RPC transactions) go, Microsoft was way ahead of the pack. And I rather enjoy using their rich set of .NET XML classes to talk to our Unix servers. It helps my company interop.

    • Re:For Starters (Score:3, Insightful)

      by leerpm ( 570963 )
      Good idea. Without support from Microsoft's tools division, this idea will be dead on arrival.
    • Their .NET XML components are pretty damn nice. They make parsing XML really easy. The ability to save Office documents as XML is really nice as well. So far, Microsoft has only helped spread the usage of XML.
  • Then what (Score:3, Funny)

    by chris_mahan ( 256577 ) <chris.mahan@gmail.com> on Friday January 14, 2005 @12:55PM (#11364124) Homepage
    Then what happens? Do you base64 the binary XML and wrap it in an ASCII XML document?
    • DIME attachments.
    • by Tackhead ( 54550 ) on Friday January 14, 2005 @01:09PM (#11364407)
      > Then what happens? Do you base64 the binary XML and wrap it in an ASCII XML document?

      Of course not! That's not XML!

      <file=xmlbinary> <baseencoding=64> <byte bits=8> <bit1>0 </bit><bit2>1 </bit><bit3>1 </bit><bit4>0 </bit><bit5>1 </bit><bit6>0 </bit><bit7>0 </bit><bit8>1 </bit> </byte>
      <boredcomment>(Umm, I'm gonna skip a bit if y'all don't mind)</boredcomment>
      </baseencoding> </file>

      Now it's XML!

      • Since others feel the need to correct you, I'll join in:

        <file type="xmlbinary">
        <baseencoding base="64">
        <byte bits="8">
        <bit seq="0">0</bit>
        <bit seq="1">1</bit>
        <bit seq="2">1</bit>
        <bit seq="3">0</bit>
        <bit seq="4">1</bit>
        <bit seq="5">0</bit>
        <bit seq="6">0</bit>
        <bit seq="7">1</bit>
        </byte>
        <!--
        (Umm, I'm gonna skip a bit if y'all don't mind)
        -->
        </baseencoding>
        </file>
      • by kahei ( 466208 ) on Friday January 14, 2005 @01:34PM (#11364855) Homepage

        Aside from the mistakes pointed out by others, you also forgot to reference the xmlbinary namespace, the xmlbyte namespace, and the xmlboredcommentinparentheses namespace, and to qualify all attributes accordingly. You also didn't include any CDATA sections or magic words, and you didn't define any entities. You also failed to supply a DTD and an XSL schema.

        This is therefore still not _true_ XML. It simply doesn't have enough inefficiency. Please add crap to it :)

  • by LordOfYourPants ( 145342 ) on Friday January 14, 2005 @12:55PM (#11364128)
    Use the Z-modem protocol between Information Superhighway routers to compress the plaintext.
  • by Anonymous Coward
    Binary XML = zip file.xml > file.xml.zip
    That's all you need. XML compresses great.
    • by Dasein ( 6110 ) <tedc@@@codebig...com> on Friday January 14, 2005 @01:14PM (#11364483) Homepage Journal
      The problem is that many systems that produce XML have a more compact internal storage (rows from a DB or whatever), then they go through an "expansion" to produce XML.

      So, to propose simply compressing it means that there's an expansion (which is expensive) followed by a compression (which is really expensive). That seems pretty silly. However, given upfront knowledge of which tags are going to be generated, it's pretty easy to implement a binary XML format that's fast and easy to decode.

      This is what I did for a company that I worked for. We did it because performance was a problem. Now, if we don't get something like this through the standards bodies, more companies are going to do what mine did and invent their own format. That's a problem -- back to the bad old days before we had XML for interoperability.

      Now, if we get something good through the standards body then, even though it won't be human-readable, it should be simple to provide converters. To have something fast that is convertible to human-readable and back seems like a really good idea.
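
      A minimal Python sketch of that sort of scheme: a length-prefixed encoding over a tag table both ends agree on upfront. The tag names and wire layout here are hypothetical, not any real standard.

      import struct

      TAGS = {"name": 0, "balance": 1}           # agreed upfront by both ends
      NAMES = {v: k for k, v in TAGS.items()}

      def encode(tag, text):
          # One element: [tag id: 1 byte][payload length: 4 bytes][UTF-8 payload]
          payload = text.encode("utf-8")
          return struct.pack("!BI", TAGS[tag], len(payload)) + payload

      def decode(buf, offset=0):
          # Returns (tag name, text, offset of the next element)
          tag_id, length = struct.unpack_from("!BI", buf, offset)
          start = offset + struct.calcsize("!BI")
          return NAMES[tag_id], buf[start:start + length].decode("utf-8"), start + length

      record = encode("name", "Ada Lovelace") + encode("balance", "1024.50")
      pos = 0
      while pos < len(record):
          tag, value, pos = decode(record, pos)
          print(tag, "=", value)                 # name = Ada Lovelace, balance = 1024.50

      No string comparisons, no angle brackets to scan for, and a decoder can skip whole elements it doesn't care about by jumping ahead by the length field.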

      • Why are you using XML? If you're using it for buzzword compliance, then you're wrong, and nobody but your PHBs cares anyway, so it doesn't matter. If you're using it for interchange with other companies, then why are you worried about inefficiency, and why is compressing it too much of a barrier? There are lots of obstacles in the way of direct communication with other businesses; compressing your XML is pretty trivial. If you're using it internally as an exchange format, maybe you should consider using someth
    • Zip functionality is so easy to implement in servers and clients that there really isn't any argument about "binary XML".

      This is all about different companies trying to get THEIR binary format to be the "standard" with XML.

      From the article:

      Manufacturers of consumer devices such as Canon, as well as mobile-phone companies such as Nokia, have argued for a binary XML format. Without it, large files such as images will take too long to download to devices such as mobile phones, they argue.

      Images are already compressed.

  • KISS (Score:5, Interesting)

    by stratjakt ( 596332 ) on Friday January 14, 2005 @12:55PM (#11364141) Journal
    On the face of it, compressing XML documents by using a different file format may seem like a reasonable way to address sluggish performance. But the very idea has many people -- including an XML pioneer within Sun -- worried that incompatible versions of XML will result.

    I agree with his point.

    What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?
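
    A quick Python sketch of how well repetitive tags compress (the sample document is made up; real ratios depend on the data):

    import gzip

    row = "<item><sku>A-1</sku><name>widget</name></item>"
    xml = "<items>" + row * 10000 + "</items>"
    packed = gzip.compress(xml.encode("utf-8"))
    print(len(xml), "->", len(packed))           # a huge ratio on this degenerate input
    assert gzip.decompress(packed).decode("utf-8") == xml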
    • This is often done (large feeds to Amazon.com are compressed). However, you still have to decompress and parse the resulting stream, which is where a big penalty is incurred. I'm hoping that whatever compression they are considering will reduce the uncompressed size as well as make parsing/searching faster.
    • My previous company used XML as a realtime protocol (I know, very lame), and it's not the size of the docs, it's the overhead in parsing, especially when you have several MB a second and only one Intel CPU. ASCII --> binary --> ASCII really kills an app.
    • Re:KISS (Score:2, Interesting)

      What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2? I'll tell you one thing that's wrong: these compression algorithms might run fine on your desktop or server; but on an embedded system with restricted memory and CPU power, that's another matter...
    • Re:KISS (Score:5, Informative)

      by Ramses0 ( 63476 ) on Friday January 14, 2005 @01:37PM (#11364911)
      On the surface that works, but it only solves a portion of the problem.

      Data => XML.

      XML == large (lots of verbose tags)

      XML == slow (have to parse it all [dom], or build big stacks [sax] to get at data)

      Solution:

      XML => .xml.gz

      You've solved (kind of) the large problem, but you still keep the slow problem.

      What they're suggesting is nothing more than:

      XML => .xml.gzxml

      Basically, using a specialized compression scheme that understands the ordered structure of XML, tags, etc., and probably has some indexes to say "here are the locations of all the [blah] tags/attributes", so you can just fseek() instead of having to do DOM-walking or stack-building. This is important for XML selectors (XQuery), and for "big iron" junk it makes a lot of sense and can save a lot of processing power. Consider that Zip/Tar already do something similar by providing a file-list header as part of their specifications (wouldn't it suck to have to completely unzip a zip file when all you wanted was a list of the filenames/sizes?)

      "Consumer"/Desktop applications already do compress XML (look at star-office as a great example, even JAR is just zipped up stuff which can include XML configs, etc). It's the stream-based data processors that really benefit from a standardized binary-transmission format for XML with some convenient indexes built in.

      That is all.

      --Robert
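
      In the same spirit as the zip central directory, here's a toy Python sketch of an offset index over XML records; the layout (count, offset table, then the concatenated records) is invented for illustration.

      import struct

      def pack_with_index(records):
          # [record count][offset table][concatenated UTF-8 records]
          body, offsets = b"", []
          for rec in records:
              offsets.append(len(body))
              body += rec.encode("utf-8")
          header = struct.pack("!I", len(records)) + b"".join(
              struct.pack("!I", off) for off in offsets)
          return header + body

      def read_record(buf, i):
          # Seek straight to record i without parsing anything else
          (count,) = struct.unpack_from("!I", buf, 0)
          table_end = 4 + 4 * count
          (start,) = struct.unpack_from("!I", buf, 4 + 4 * i)
          if i + 1 < count:
              (end,) = struct.unpack_from("!I", buf, 4 + 4 * (i + 1))
          else:
              end = len(buf) - table_end
          return buf[table_end + start:table_end + end].decode("utf-8")

      blob = pack_with_index(["<a>1</a>", "<b>22</b>", "<c>333</c>"])
      print(read_record(blob, 1))                # <b>22</b> -- no DOM-walking, no stack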
      • Re:KISS (Score:3, Interesting)

        by e2d2 ( 115622 )
        What you said is right on target. I've worked with XML in a few applications (specifically web services), and every time we saw a performance drop it was not because of a network bandwidth issue but because the documents were so large that the parser became the bottleneck. And then when you throw in stylesheets for manipulation... well, you get the point.

        So if the need is for compression over networks, well, that's only half of XML's performance problems. And if the end result becomes a binary form
  • But make it an open-source one...

    I guess this is another itch to scratch by the community...
  • Check out CWXML/BXML [cubewerx.com]. Especially significant, though perhaps unintuitive, are the savings in compression time from the source data being more compact.
  • Looks like the developer in question is a little too close to his prize development. Speeding up XML by removing all the bloat, however that would be accomplished, be it compiling XML into some sort of byte code or whatnot, seems like a much better idea from the client and server point of view. Why transfer 100KB of text data when you can send 10KB of binary data for the same message?
    • by SnapShot ( 171582 )
      Considering that for most purposes XML contains a lot of redundant formatting, it seems like you could get nearly 10:1 compression simply by using (as has already been mentioned) zip or some other compression algorithm.

      However, if you wanted to go to a binary encoding, you could try something relatively straightforward like:

      original:

      <tag name="value"/>

      patented XML encoding algorithm (hexadecimal):

      3c746167 206e616d 653d2276 616c7565 222f3e00

  • by PipianJ ( 574459 ) on Friday January 14, 2005 @12:57PM (#11364182)
    Binary XML is nothing new, as I wager that many people here are already using it, albeit unknowingly.

    One of the earliest projects to attempt a binary XML (as far as I'm aware) was EBML (Extensible Binary Meta-Language) [sourceforge.net], which is used in the Matroska media container [matroska.org].
    • Of course, there are a zillion ways to binary encode XML, but none are a W3C standard.
    • Exactly... the question shouldn't be "Does the world need Binary XML?" because the answer is "the world already has it, about 1000 different kinds in fact!" It's not like Tim Bray's whining is going to make it go away. ("Waaaahh... someone doesn't like my stuff!")

      The question should instead be "How can we best standardize binary XML?"

      My main fear is that the typical "design by committee" style of standards bodies will lead to a super-bloated binary standard containing every pet feature of each participant

  • FTFA "The goal of the Fast Infoset project is to generate interest among developers and eventually create a standardized binary format."

    I'm not sure why they think that one has to come before the other.

    Frankly, make it a standard so I can write proper code to handle it, and you'll have me (joe random developer) interested.
    • Re:Goals (Score:2, Insightful)

      FTFA "The goal of the Fast Infoset project is to generate interest among developers and eventually create a standardized binary format." I'm not sure why they think that one has to come before the other.

      Because standards written in a vacuum tend to suck. Why wouldn't you want input from developers with different backgrounds and needs, then cherry-pick the best ideas (many of which you didn't think of), toss out universally reviled ones, and implement a broad, usable standard?
      • I agree in principle that standards written in a vacuum, as you say, tend to suck. However, they could release a "preliminary" spec, and I (and others interested) could write to that, give feedback, etc., and they could perhaps use it to develop a release-1.0 spec. Specifications can change as long as it's clear what specification a particular piece of software relies on.

        Basically, they could start with some structure, to ensure that structure may always be present. Hopefully. :)
  • gzip ? (Score:2, Interesting)

    by JonyEpsilon ( 662675 )
    Am I missing something, or would just gzip'ing XML when it goes over the network not solve the problem? And isn't this sort of solution already widely implemented for web content?

    Somebody fill me in...

  • by ophix ( 680455 ) on Friday January 14, 2005 @12:58PM (#11364201) Homepage
    ... it's called zipping; most web servers have it as an option to zip the data up as it streams to the client browser.

    I fail to see the need to have a "binary XML" file format when there are already facilities in place to compress text streams.
    • by rootmonkey ( 457887 ) on Friday January 14, 2005 @01:11PM (#11364441)
      I'll say it again: it's not the size of the document, it's the overhead in parsing.
    • Binary XML wouldn't be just about getting the files smaller, but also about making the parsers simpler. Parsing an XML file today is quite complex and slow; sure, it doesn't matter much for a webpage or two, but if you have larger amounts of data it's really no fun at all. A proper binary XML standard might speed that up by an order of magnitude or two.
  • by Stevyn ( 691306 ) on Friday January 14, 2005 @01:00PM (#11364237)
    Programs written in assembly can run faster than programs written in C, but it's easier for someone to open a .c file and figure out what's going on.

    I'm sure when C came out, the argument was similar: the performance hit doesn't make up for the readability or cross-compatibility. But as computers and network connections became faster, C became a more viable alternative.
    • Holy smokes, that's wrong. C code will run at exactly the same speed as assembly code if they are both compiled to the same machine code. Computers don't read C or assembly; they read binary machine instructions, whether those instructions were originally written in assembly, C, Java, Perl, Python, etc. If a computer had to read C code every time it wanted to run, everything would take so, so, so much longer. XML is great for humans, but sucks for computers. Not only are you sending gobs of string da
  • by Nom du Keyboard ( 633989 ) on Friday January 14, 2005 @01:00PM (#11364238)
    XML's verbosity and lack of inherent compression...XML standard calls for information to be stored as text.

    Text compresses quite well, especially redundant text like the tags. So why not just leave XML alone and compress it at the transport level: send it as a zip, let v.92 modems do it automatically, or whatever. No need to touch XML itself at all.

    • Actually, you could compress XML by a significant amount by making one simple change to the language. Picture the following piece of XML:

      <SomeTagName>some character data</SomeTagName>

      According to the XML spec, the closing tag must close the nearest opening tag. So why does it have to include the opening tag's name? This is 100% redundant information, and is included in every XML tag with children or cdata. An obvious compression would be to replace this with:

      <SomeTagName>some character data</>
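
      A toy Python round-trip of that idea (regex-based for brevity, not a real XML parser; the helper names are made up):

      import re

      def shrink(xml):
          # "</Anything>" carries no information, so write it as "</>"
          return re.sub(r"</[A-Za-z][\w.-]*>", "</>", xml)

      def expand(xml):
          # Rebuild the tag names from the nesting with a stack
          out, stack, pos = [], [], 0
          for m in re.finditer(r"<(/?)([A-Za-z][\w.-]*)?([^>]*)>", xml):
              out.append(xml[pos:m.start()])
              if m.group(1):                         # an anonymous close tag
                  out.append("</%s>" % stack.pop())
              else:
                  out.append(m.group(0))
                  if not m.group(3).endswith("/"):   # not self-closing
                      stack.append(m.group(2))
              pos = m.end()
          out.append(xml[pos:])
          return "".join(out)

      doc = "<SomeTagName>some character data</SomeTagName>"
      assert expand(shrink(doc)) == doc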

  • by Saint Stephen ( 19450 ) on Friday January 14, 2005 @01:01PM (#11364264) Homepage Journal
    For starters, we already have binary XML, it's called ASN.1. Don't argue, I know it's not exactly the same.

    But secondly, no, you don't need Binary XML, all you need to do is Gzip it on the wire. It gets as small as Binary XML.

    One of the easiest ways to shrink your XML by about 90% is to use tags like:
    <a><b><c>
    instead of
    <FirstName><CompanyName><Address>
    You can use a transformation to switch between the short names and the long names on the wire.
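
    A toy version of that transformation in Python; the name map and sample document are hypothetical:

    import re

    SHORT = {"FirstName": "a", "CompanyName": "b", "Address": "c"}
    LONG = {v: k for k, v in SHORT.items()}

    def rename(xml, table):
        # Rewrite tag names in both open and close tags, leaving content alone
        return re.sub(r"(</?)(\w+)",
                      lambda m: m.group(1) + table.get(m.group(2), m.group(2)),
                      xml)

    wire = rename("<FirstName>Ada</FirstName><Address>Elm St</Address>", SHORT)
    print(wire)                # <a>Ada</a><c>Elm St</c>
    print(rename(wire, LONG))  # round-trips back to the long names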
  • Amen To That (Score:5, Insightful)

    by American AC in Paris ( 230456 ) * on Friday January 14, 2005 @01:02PM (#11364275) Homepage
    XML, as originally designed, is deliciously straightforward. Data is encoded into discrete, easy-to-process chunks that any given XML parser can make sense of.

    XML, as implemented today, is often little more than a thin wrapper for huge gobs of proprietary-format data. Thus, any given XML parser can identify the contents as "a huge gob of proprietary data", but can't do a damned thing with it.

    Too many developers have "embraced" XML by simply dumping their data into a handful of CDATA blocks. Other programmers don't want to reveal their data structure, and abuse CDATA in the same way. Thus, a perfectly good data format has been bastardized by legions of lazy/overprotective coders.

    The slew of publications that exist for the sole purpose of "clarifying" XML serves as testament to the abuse of XML.

    • The problem with trying to solve the connector conspiracy (in this case, obtuse undocumented binary files) is that not everybody *wants* to solve the connector conspiracy. Some people would rather have their file format die off than have a competitor gain any advantage whatsoever over their product. They also don't want people buying cheap knockoffs of their products and think they can stop this by not giving away any details on how to interface with their product. If we find a way to change this per
    • Come on - how many real projects have you had to deal with "huge gobs of proprietary data" wrapped in XML? People AGREE on a data exchange format, everything else defeats the purpose.

      If the nails look bent - blame the hammer or the carpenter?
  • A Huffman transform will give you entropy+1 compression. Not suitable for larger data sets (dictionary-based compression is even better for this). 7z compression (or is it z7?) will give you a neat storage format.

    Let's talk about where this verbose talk of verbosity is stemming from:


    apple
    orange
    pineapple


    This is a data set. No one knows what it is.
    Here it is again with some pseudo-XML-style tags:
    I am listing vegetables here

    this is a list of vegetables
    vegetables are listed on their own without any children
  • 1) Isn't the greatest benefit of XML that it can be opened in a text editor and made sense of?

    2) Can't web servers and browsers (well, maybe not IE, but then it's not a browser... it's an OS component, haha) transparently compress XML with gzip or some other scheme?

    3) Making it binary won't compress it all that much; using a proper compression algo will.

    4) Doesn't something like XML, which makes use of Latin characters and a few punctuation marks, compress at insane ratios even with lame compression algos?

    5) I
    • The problem is that XML is being used for web services which are unlike HTML: the requesting machine will not like waiting 2-3 seconds for the response to the method call. These are interoperating applications, not people downloading text to read, so the response time is much more critical.

      I agree that gzip compression is a simple solution to the network problem. It does not address the parsing time problem, and in fact exacerbates it, but in my opinion the network issue is the big one. Time works in favor

  • I've had to work with binary XML for formatting WAP push messages and it is the ghastliest thing ever. Yes, I can see that it has low-bandwidth applications but my opinion is that I'd much rather have less bandwidth than have to deal with binary XML :-)
  • I would suggest that people seeking fast, standard ways to deliver binary data look at SMPTE KLV (key, length, value) coding. It is SMPTE 336M, and is the standard for metadata coding in television, video, and digital cinema.
  • I totally drank the XML kool-aid, so don't interpret this as saying that I hate XML or anything. I really love it. However, you don't really get an appreciation of just how slow and bloaty XML is until you see it used in real life a few times. I sometimes wonder if these guys have ever built a system on something that wasn't a top-notch research bed.

    I'm not seeing in the article where he submits a solution to the problem; he just says that as computers and networks get faster, the bloat won't be slow anymore. T
  • by Anonymous Coward
    The XML guys are funny. First they make a text version of binary protocols to make it easy to sell XML to the mass of "31137 HTML programmers" who feel comfortable "programming" in Dreamweaver; and then they make a binary version to make it work.
  • Roy Fielding, who is developing the Waka protocol, which is binary, argued at ApacheCon 2000 that as long as the protocol is still understood, binary utilities could be made to decode things for debugging. But the other 99.9% of requests would be more important and would benefit more from being binary.
  • XML transfer protocol.

    Ok, we got a name. Now all we need is one fart smella to design it.
  • What the world needs now, is binary XML?

    Nope, sorry, those lyrics suck. We're gonna stick with Mr. Bacharach's version.
  • That's what you get when somebody forgets to choose "BIN" in their FTP client and dumps a bunch of XML to a directory, right?
  • by MarkWPiper ( 604760 ) on Friday January 14, 2005 @01:13PM (#11364470) Homepage
    The fact is, ASCII is a binary format. It just happens to be a format that has become universally accepted. As the article says, there are certainly benefits to having ASCII-based XML: "The fact that XML is ordinary plain text that you can pull into Notepad... has turned out to be a boon, in practice," he said. "Any time you depart from that straight-and-narrow path, you risk loss of interoperability."

    However, if anything, XML has shown us the power of well-structured information. XML has given the possibility of universal interoperability. Developments in XML-based technologies have led us to the point where we know enough now to create a standard for structured information that will last for several decades.

    It's time that we had a new ASCII. That standard should be binary XML.

    When I think of the time that has been wasted by every developer in the history of Computer Science, writing and rewriting basic parsing code, I shudder. Binary XML would produce a standard such that an efficient, universal data structure language would allow significant advances in what is technically possible with our data. For example: why is what we put on disk any different from what's in memory? Binary XML could erase this distinction.

    A binary XML standard needs to become ubiquitous, so that just as Notepad can open any ASCII file today, SuperNotepad could open any file in existence, or look at any portion of your computer's memory, in an informative, structured manner. What's more, we have the technology to do this now.

    • Jesus Christ, no. The solution is simple:
      (1) Have every PC OS contain a DBMS (this is not as difficult as you would think)
      (2) Always keep your data in a DBMS
      (3) Have said DBMS transfer the data via whatever method it would like. Chances are this would be some sort of compact, efficient binary method.
  • by morane ( 773038 )
    Without it, large files such as images will take too long to download!

    Yeah, right! XML binary images... So needed...

    <image>
    <pixel x="0" y="1">
    <r value="255" />
    <g value="255" />
    <b value="255" />
    </pixel>
    ...
    </image>
  • by GOD_ALMIGHTY ( 17678 ) <(curt.johnson) (at) (gmail.com)> on Friday January 14, 2005 @01:17PM (#11364538) Homepage
    of "I told you so!" coming over. Between all the people who jumped on the web services bandwagon without any clue how to handle distributed systems efficiently and the "OMG! It's human readable!" crowd, the architecture de jour has become a bloated PITA. Why this wasn't built into the spec in the first place alludes me. If we can use tools like ethereal to read those binary IP datagrams, why wouldn't the same concept be used for this standard? A standardized, compressed, data format with a standardized API for outputting plaintext (XML), would have allowed this system to be much more efficient.

    Didn't anyone remember that text processing was bulky and expensive? Sometimes the tech community seems to share the same uncritical mind as people who order get-rich-quick schemes off late night infomercials. I doubt XML would have gotten out of the gate as is, had the community demanded these kinds of features from the get-go.
    • The obvious advantage of a text format, and the reason XML became popular, is that we had tons and tons of text processing tools already available. All you needed were parsers, and your dev tools already worked. Starting with a binary protocol would have been too steep. Same reason HTML succeeded.
  • Just gzip, and proceed as before. It would require only minimal changes in the worst case and none at all in the best case. Isn't this how OpenOffice works?
  • by digitalgimpus ( 468277 ) on Friday January 14, 2005 @01:25PM (#11364681) Homepage
    I think that's where the true problem lies: HTTP.

    We need to look towards HTTP 2.0. What I would want:

    - pipelining that works, so that it could be enabled for use on any server that supports HTTP 2.0
    - gzip and 7zip [7-zip.org] support.
    - All data is compressed by default (a few excludes such as .gz files, .zip files etc. since that would be pointless).
    - Option to initiate a persistent connection (remove the stateless-protocol concept) via an HTTP header on connect. This would allow for a whole new level of web applications via SOAP/XML.

    There are tons of other things that could be enhanced for today's uses.

    HTTP is the problem. Not XML.

  • It's a markup language; it's not supposed to be ideal for general-purpose data transfer.

    People should stop trying to optimize it for a task it wasn't designed for. Focus on making XML better for markup, and for pity's sake come up with something else that's concise and simple and efficient for general purpose use.

  • by DunbarTheInept ( 764 ) on Friday January 14, 2005 @01:39PM (#11364957) Homepage
    The real problem with XML is that it adds the extra verbosity of the metadata text tag for EACH INSTANCE of a piece of data, even in cases where that metadata is identical for row after row of data. In the case of table data, that is really stupid. There should be some sort of XML means to handle a table of values better: a way to say "Column 1 has the following XML properties: name, etc.", then "Column 2 has the following XML properties: name, etc."... and then, after that section, a way to syntactically list just the values up until the end of the loop.

    This is what made us balk at using XML for storing NMR spectroscopy data, even though it is already in a textual form to begin with. The current textual form is whitespace-separated little numbers less than 5 digits long, for hundreds of thousands of rows. That isn't really that big in ASCII form. But turn it into XML, and a 1 meg ASCII file turns into a 150 meg XML file because of the extra repetitive tag stuff.

    In another bit of irony, we can't find an in-memory representation of the data as a table that is more compact than the ASCII file. The original ASCII file is even more compact than a 2-D array in RAM (because it takes 4 bytes to store an int even when that int is typically just one digit and is only larger on rare occasions).
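
    For what it's worth, here is one way such a column-header format could look; the layout and field names are invented, not anything standardized:

    rows = [(4096, 1, 7), (4097, 2, 9)]      # made-up spectroscopy-ish numbers

    doc = "\n".join(
        ["<table>",
         '  <columns><col name="freq"/><col name="re"/><col name="im"/></columns>',
         "  <rows>"]
        + ["    " + " ".join(str(v) for v in row) for row in rows]
        + ["  </rows>", "</table>"])
    print(doc)

    The tag overhead is paid once, in <columns>, instead of on every one of the hundreds of thousands of rows.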
  • by Da VinMan ( 7669 ) on Friday January 14, 2005 @01:45PM (#11365038)
    It doesn't tell us what the specific performance problems are with XML. Does it take too long to transmit? Does it take too long to validate? Does it take too long to parse? Does it take too long to format? What's the real problem here?

    From experience, I can state that using XML in any high performance situation is easy to screw up. But once you get past the basic mistakes at that level, what other inherent problems are there?

    Oh, and just stating "well, the format is obviously wasteful" just because it's human readable (one of its primary, most useful, features) is NOT an answer.

    I get the feeling that this perception of XML is being perpetuated by vendors who do not really want to open up their data formats. Allowing them to successfully propagate this impression would be a very real step backwards for all IT professionals.
    • Anecdotal example (Score:3, Interesting)

      by plopez ( 54068 )
      Had data to be delivered to a client, dumped from a database. As flat files they were ~20MB in size. That bloated to ~120MB after conversion to XML.

      The client attempted to open it in a DOM-based application, which I suspect used recursion to parse the data (recursion is easy to code). Needless to say, it brought their server to its knees.

      We switched to flat files shortly thereafter.

      In my problem domain, where 20MB is a small data set, XML is useless. XML does not seem to scale well at all (though using a SA
  • by iabervon ( 1971 ) on Friday January 14, 2005 @01:59PM (#11365261) Homepage Journal
    Three ideas, in order of increasing significance and increasing difficulty:

    Stop using bad DTDs. There seems to be a DTD style in which you avoid using attributes and instead add a whole lot of tags containing text. Any element with a content type of CDATA should be an attribute on its parent, which improves the readability of documents and lets you use ID/IDREF to automatically check stuff. Once you get rid of the complete cruft, it's not nearly so bad.

    Now that everything other than HTML is generally valid XML, it's possible to get rid of a lot of the verbosity of XML, too. A new XML could make all close tags "</", since the name of the element you're closing is predetermined and there's nothing permitted after a slash other than a >. The > could be dropped from empty tags, too. If you know that your DTD will be available and not change during the life of the document, you could use numeric references in open tags to refer to the indexed child element type of the type of the element you're in, and numeric references for the indexed attribute of the element it's on. If you then drop the spaces after close quotes, you've basically removed all of the superfluous size of XML without using a binary format, as well as making string comparisons unnecessary in the parser.

    Of course, you could document it as if it were binary. An open tag is indicated with an 0x3C, followed by the index of the element type plus 0x30 (for indices under 0xA). A close tag is (big-endian) 0x3C2F. A non-close tag is an open tag if it ends with an 0x3E and an empty tag if it ends with an 0x2F. Attribute indices are followed with an 0x3D. And so forth.
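
    A toy Python encoder for that byte layout (the DTD-derived child indexes are faked with a hardcoded table):

    CHILDREN = {"root": ["name", "addr"]}    # stand-in for indexes derived from a DTD

    def emit(parent, tag, body=""):
        # Open tag: 0x3C, then child index + 0x30, then 0x3E; close tag: 0x3C 0x2F
        idx = CHILDREN[parent].index(tag)
        return (bytes([0x3C, 0x30 + idx, 0x3E])
                + body.encode("utf-8") + bytes([0x3C, 0x2F]))

    wire = emit("root", "name", "Ada") + emit("root", "addr", "Elm St")
    print(wire)                              # b'<0>Ada</<1>Elm St</' -- legible, yet positional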
  • Wrong Problem (Score:3, Insightful)

    by slyckshoes ( 174544 ) on Friday January 14, 2005 @02:57PM (#11366043)
    It seems to me that the problem isn't with XML, it's with what people are using it for. I read some complaints here from people saying "I tried to use XML for BLAH and it was too slow." However, if they'd thought about it, BLAH would have been better served by some binary format in the first place. The article also discusses the fact that mobile devices need something less cumbersome for transferring pictures/media. Why are they using XML for that at all? One of the benefits of XML is that it's human readable, but in those applications you don't need that benefit, so don't use XML. Instead of coming up with a binary XML standard, come up with a generic binary standard that does exactly what you want. Too many people have been given the hammer of XML and now everything looks like a nail.
    • Re:Wrong Problem (Score:3, Insightful)

      by johnjaydk ( 584895 )
      Dead on.

      Use XML in places where it makes sense: Interfaces between different companies/business partners/departments etc, interfaces between mutually hostile vendors, really long time data storage.

      Using XML as the data format between two tightly coupled Java programs that are standing next to each other and exchanging massive amounts of data is insane.

      This is of course a simplified example, BUT the point is: ALWAYS beware of the trade-offs you make when you make a technology choice. Same goes for algorithms

  • by smcdow ( 114828 ) on Friday January 14, 2005 @03:06PM (#11366169) Homepage
    Our applications (real-time geographically distributed RF DSP) involve shipping around lots and lots and lots and lots of digitized RF data. We have our share of wonks who think we should be using XML for this kind of thing. We all agree that XML would solve many problems for us. Except there's no convenient way to represent the actual data payloads, which consist of scads of binary data.

    A good binary XML specification could be an extremely good fit for us.

    And, don't suggest that we just compress XML and send that. Here's why: first we have to expand all that digitized data into some sort of ASCII encoding, which is then compressed. End result: no gain, and a possible loss of precision in the data.

    A real, live, useful binary XML spec could help us immensely. I say BRING IT ON!!!!

    BTW, wasn't DIME [wikipedia.org] supposed to address these problems? What happened to DIME, anyway?
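
    The expansion cost is easy to put a number on: base64 inflates any binary payload by about a third before the compressor even sees it. A quick sketch, with random bytes standing in for digitized RF data:

    import base64, os

    payload = os.urandom(1_000_000)          # stand-in for digitized RF samples
    encoded = base64.b64encode(payload)
    print(len(encoded) / len(payload))       # ~1.33, and random data won't compress back down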

  • by smittyoneeach ( 243267 ) * on Saturday January 15, 2005 @09:20AM (#11372785) Homepage Journal
    ...but I thought that the strategic goal of XML is to sell more hardware.
    We should rejoice, buy more CPUs, and move the problem from XML, to languages with poor concurrency support.
