Does the World Need Binary XML?
sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."
For Starters (Score:2, Insightful)
For starters, keep Microsoft out of it.
Re:For Starters (Score:5, Interesting)
IBM has actually tried to introduce some goofy stuff into the XML standards, like line breaks, etc., that should not be in a pure node-based system like XML. Why aren't you picking on them in your comment?
As far as SOAP and XML Web Services (standardized protocols for XML RPC transactions) go, Microsoft was way ahead of the pack. And I rather enjoy using their rich set of .NET XML classes to talk to our Unix servers. It helps my company interop.
The fake grass is always greener... (Score:3, Insightful)
You had me until then; no self-respecting engineer would ever use those terms.
Re:For Starters (Score:3, Insightful)
Re:For Starters (Score:2)
Re:For Starters (Score:4, Insightful)
However, let me re-phrase the grandparent:
"For starters, make sure Microsoft can't extend it to lock out compeditors in some way."
Better?
Soko
Then what (Score:3, Funny)
Two words. (Score:2)
Re:Two words. (Score:2)
Re:Two words. (Score:2)
Then we wrap it again, that's what! (Score:5, Funny)
Of course not! That's not XML!
<file=xmlbinary> <baseencoding=64> <byte bits=8> <bit1>0 </bit><bit2>1 </bit><bit3>1 </bit><bit4>0 </bit><bit5>1 </bit><bit6>0 </bit><bit7>0 </bit><bit8>1 </bit> </byte>
<boredcomment>(Umm, I'm gonna skip a bit if y'all don't mind)</boredcomment>
</baseencoding> </file>
Now it's XML!
Re:Then we wrap it again, that's what! (Score:3, Funny)
<file type="xmlbinary">
<baseencoding base="64">
<byte bits="8">
<bit seq="0">0</bit>
<bit seq="1">1</bit>
<bit seq="2">1</bit>
<bit seq="3">0</bit>
<bit seq="4">1</bit>
<bit seq="5">0</bit>
<bit seq="6">0</bit>
<bit seq="7">1</bit>
</byte>
<!--
(Umm, I'm gonna skip a bit if y'all don't mind)
-->
</baseencoding>
</file>
Vast omissions! (Score:5, Funny)
Aside from the mistakes pointed out by others, you also forgot to reference the xmlbinary namespace, the xmlbyte namespace, and the xmlboredcommentinparentheses namespace, and to qualify all attributes accordingly. You also didn't use any magic words like CDATA, and you didn't define any entities. You also failed to supply a DTD and an XSL schema.
This is therefore still not _true_ XML. It simply doesn't have enough inefficiency. Please add crap to it.
The solution is clear... (Score:4, Funny)
Re:The solution is clear... (Score:3, Insightful)
Step 1 to getting binary XML (Score:2, Insightful)
That's all you need. XML compresses great.
Re:Step 1 to getting binary XML (Score:4, Insightful)
So, to propose simply compressing it means that there's an expansion (which is expensive) followed by a compression (which is really expensive). That seems pretty silly. However, given upfront knowledge of which tags are going to be generated, it's pretty easy to implement a binary XML format that's fast and easy to decode.
This is what I did for a company that I worked for. We did it because performance was a problem. Now, if we don't get something like this through the standards bodies, more companies are going to do what mine did and invent their own format. That's a problem -- back to the bad old days before we had XML for interoperability.
Now, if we get something good through the standards body then, even though it won't be human readable, it should be simple to provide converters. To have something fast that is convertible to human-readable form and back seems like a really good idea.
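To make the idea concrete, here is a minimal sketch of that kind of pre-agreed binary encoding in Python. The tag table, wire layout, and sample document are all invented for illustration and have nothing to do with the poster's actual in-house format.

import struct
import xml.etree.ElementTree as ET

# Hypothetical tag dictionary agreed on up front by both ends.
TAGS = ["order", "item", "qty", "price"]
TAG_ID = {name: i for i, name in enumerate(TAGS)}

def encode(elem):
    # 1 byte tag id, 2 bytes text length, text, 1 byte child count, then children.
    text = (elem.text or "").encode("utf-8")
    out = struct.pack("!BH", TAG_ID[elem.tag], len(text)) + text
    out += struct.pack("!B", len(elem))
    for child in elem:
        out += encode(child)
    return out

def decode(buf, pos=0):
    tag_id, text_len = struct.unpack_from("!BH", buf, pos)
    pos += 3
    text = buf[pos:pos + text_len].decode("utf-8")
    pos += text_len
    elem = ET.Element(TAGS[tag_id])
    elem.text = text or None
    (nchildren,) = struct.unpack_from("!B", buf, pos)
    pos += 1
    for _ in range(nchildren):
        child, pos = decode(buf, pos)
        elem.append(child)
    return elem, pos

doc = ET.fromstring("<order><item><qty>2</qty><price>9.99</price></item></order>")
blob = encode(doc)
roundtripped, _ = decode(blob)
print(len(ET.tostring(doc)), "bytes of XML ->", len(blob), "bytes of binary")

Because both sides share the tag table, converting back to ordinary, human-readable XML (as the poster suggests) is just the decode step plus ET.tostring.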
Re:Step 1 to getting binary XML (Score:3, Insightful)
This is really about making it proprietary. (Score:3, Insightful)
This is all about different companies trying to get THEIR binary format to be the "standard" with XML.
From the article
Images are already
Re:Step 1 to getting binary XML (Score:2, Interesting)
KISS (Score:5, Interesting)
I agree with his point.
What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?
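It really is that simple in practice. A quick sketch (Python standard library, with a deliberately repetitive made-up document) showing how well verbose XML squeezes down:

import gzip, bz2

# A deliberately repetitive, made-up XML document.
xml = "<vegetables>" + "<vegetable><name>carrot</name></vegetable>" * 1000 + "</vegetables>"
raw = xml.encode("utf-8")

print("raw  :", len(raw), "bytes")
print("gzip :", len(gzip.compress(raw)), "bytes")
print("bzip2:", len(bz2.compress(raw)), "bytes")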
Re:KISS (Score:2)
Re:KISS (Score:2)
Re:KISS (Score:2, Interesting)
Re:KISS (Score:5, Informative)
Data => XML.
XML == large (lots of verbose tags)
XML == slow (have to parse it all [DOM], or build big stacks [SAX] to get at the data)
Solution:
XML => gzip(XML)
You've solved (kind of) the large problem, but you still keep the slow problem.
What they're suggesting is nothing more than:
XML => structure-aware compressed XML, with indexes
Basically, use a specialized compression scheme that understands the ordered structure of XML, tags, etc., and probably has some indexes that say "here are the locations of all the [blah] tags and attributes," so you can just fseek() instead of having to do DOM-walking or stack-building. This is important for XML selectors (XQuery), and for "big iron" junk it makes a lot of sense and can save a lot of processing power. Consider that Zip/Tar already do something similar by providing a file-list header as part of their specifications (wouldn't it suck to have to completely unzip a zip file when all you wanted was to pull out a list of the filenames / sizes?)
"Consumer"/Desktop applications already do compress XML (look at star-office as a great example, even JAR is just zipped up stuff which can include XML configs, etc). It's the stream-based data processors that really benefit from a standardized binary-transmission format for XML with some convenient indexes built in.
That is all.
--Robert
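The zip analogy is easy to see from code. A small sketch (Python standard library, made-up member names) that lists an archive's contents by reading only the central directory, without decompressing any member:

import io, zipfile

# Build a small in-memory archive with a few made-up XML members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name in ("config.xml", "data.xml", "manifest.xml"):
        zf.writestr(name, "<doc>" + name * 200 + "</doc>")

# Listing names and sizes only touches the central directory and headers,
# not the compressed payloads themselves.
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, info.file_size, "->", info.compress_size)

An indexed binary XML format would give XQuery-style selectors the same kind of jump-to-the-data access.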
Re:KISS (Score:3, Interesting)
So if the need is for compression over networks, well, that's only half of XML's performance problems. And if the end result becomes a binary form
Re:KISS (Score:2)
Re:KISS (Score:2)
Re:KISS (Score:3, Informative)
There's no reason why it couldn't be used for xml just as it is for html.
Ewan
Re:KISS (Score:2)
Make a XML compiler... (Score:2)
I guess this is another itch to scratch by the community...
Re:Make a XML compiler... (Score:2)
Re:Make a XML compiler... (Score:2)
Oooh, limelight! (Score:2)
a kabosh? (Score:2)
Re:a kabosh? (Score:2, Funny)
However, if you wanted to go to a binary encoding, you could try something relatively straightforward like:
original:
patented XML encoding algorithm (hexadecimal):
Binary XML has been around a while... (Score:5, Informative)
One of the earliest projects that has tried to make a binary XML (as far as I'm aware) was the EBML (Extensible Binary Meta-Language) [sourceforge.net] which is used in the Matroska media container [matroska.org].
Re:Binary XML has been around a while... (Score:2)
Re:Binary XML has been around a while... (Score:3, Insightful)
The question should instead be "How can we best standardize binary XML?"
My main fear is that the typical "design by committee" style of standards bodies will lead to a super-bloated binary standard containing every pet feature of each participant.
Goals (Score:2)
I'm not sure why they think that one has to come before the other.
Frankly, make it a standard so I can write proper code to handle it, and you'll have me (joe random developer) interested.
Re:Goals (Score:2, Insightful)
Because standards written in a vacuum tend to suck. Why wouldn't you want input from developers with different backgrounds and needs, then cherry-pick the best ideas (many of which you didn't think of), toss out the universally reviled ones, and implement a broad, usable standard?
Re:Goals (Score:2)
Basically, they could start with some structure, to ensure that structure may always be present. Hopefully.
gzip ? (Score:2, Interesting)
Somebody fill me in ...
Re:gzip ? (Score:2)
there are already standards for this... (Score:3, Interesting)
I fail to see the need to have a "binary xml" file format when there are already facilities in place to compress text streams.
Re:there are already standards for this... (Score:5, Insightful)
Re:there are already standards for this... (Score:3, Insightful)
Binary formats contain pointers all over the place... pointers that say "this many bytes to the next record", or if the binary format is designed to be very fast to read, will even contain pointers that say "record 22031 is at offset XXX, record 22032 is at offset YYY". It's very quick to get to record 22032 for these formats, you just jump there and don't even have to wait eons for a physical disk to read in every single byte in between.
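A sketch of that kind of direct access (Python, with an invented fixed-size record layout), where getting record 22032 is a single seek and read:

import struct, io

RECORD = struct.Struct("!I8s")   # made-up fixed-size record: 4-byte id + 8-byte name

# Write 100,000 records to an in-memory "file".
f = io.BytesIO()
for i in range(100_000):
    f.write(RECORD.pack(i, b"rec%05d" % i))

# Jump straight to record 22032: one seek, one read, no parsing of anything before it.
f.seek(22032 * RECORD.size)
rec_id, name = RECORD.unpack(f.read(RECORD.size))
print(rec_id, name)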
Now, compare to XML. EVEN
Re:there are already standards for this... (Score:2)
Maybe this is like comparing assembly to C (Score:5, Insightful)
I'm sure when C came out, the argument was similar: that the performance hit wasn't worth the readability or cross-compatibility. But as computers and network connections became faster, C became a more viable alternative.
Re:Maybe this is like comparing assembly to C (Score:3, Insightful)
Re:WHO NEEDS FREAKING READABILITY ?! (Score:3, Informative)
And when someone sends me a bunch of data they want importing into a database, in what format should they send it? I'd like to be able to ensure that their data is correct before giving it to my import routine, and when my validator says there's an error, I'
Human readability makes it much easier (Score:3, Informative)
Many people claim that XML is so great because you can "just read and understand it" without having to use cumbersome and hard-to-understand specifications. This is exactly what makes XML nice for typesetting purposes like HTML, and maybe as an alternative for simple configuration files etc., but indeed NOT for RPC and databases, as you write. I couldn't agree more.
I have seen so much time and money lost due
Re:WHO NEEDS FREAKING READABILITY ?! (Score:3, Insightful)
The best use for XML is at system or domain boundaries, where you cannot control the software on both sides.
For example, a support system might use file exchange to open support tickets in a vendor's system for hardware failures. In this case, the vendor probably needs to deal with multiple different customers, and each of their customers might be dealing with several vendors.
Being able to encapsulate to XML, in this case, is v
You don't need to change XML itself (Score:3, Insightful)
Text compresses quite well, especially redundant text like the tags. So why not just leave XML alone and compress it at the transport level: send it as a zip, let v.92 modems do it automatically, or whatever. No need to touch XML itself at all.
Re:You don't need to change XML itself (Score:3, Interesting)
<SomeTagName>some character data</SomeTagName>
According to the XML spec, the closing tag must close the nearest opening tag. So why does it have to include the opening tag's name? This is 100% redundant information, and is included in every XML tag with children or cdata. An obvious compression would be to replace this with:
<SomeTagName>some chara
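A toy sketch of that compression in Python. The bare "</>" shorthand is the poster's suggestion, not real XML, so this only round-trips documents produced by the same tool, and it ignores comments and CDATA:

import re

def shorten(xml):
    # Replace every named close tag with the bare "</>" suggested above.
    return re.sub(r"</[A-Za-z_][\w.-]*\s*>", "</>", xml)

def expand(short):
    # Re-insert the tag names by tracking the stack of open elements.
    out, stack, pos = [], [], 0
    for m in re.finditer(r"<(/?)([A-Za-z_][\w.-]*)?([^>]*)>", short):
        out.append(short[pos:m.start()])
        if m.group(1):                        # "</>" -> close the innermost element
            out.append("</%s>" % stack.pop())
        else:
            out.append(m.group(0))
            if not m.group(3).endswith("/"):  # not a self-closing tag
                stack.append(m.group(2))
        pos = m.end()
    out.append(short[pos:])
    return "".join(out)

doc = "<a><b>text</b><c/><b>more</b></a>"
print(shorten(doc))                      # <a><b>text</><c/><b>more</></>
print(expand(shorten(doc)) == doc)       # True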
Binary XML is called ASN.1 (Score:3, Insightful)
But secondly, no, you don't need Binary XML, all you need to do is Gzip it on the wire. It gets as small as Binary XML.
One of the easiest ways to shrink your XML by about 90% is to use short tag names instead of long ones. You can use a transformation to switch between the short names and the long names on the wire.
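A sketch of that long-name/short-name transformation using Python's ElementTree; the mapping table and element names are invented for the example:

import xml.etree.ElementTree as ET

# Invented mapping between verbose wire names and one-letter local names.
LONG_TO_SHORT = {"purchaseOrder": "p", "lineItem": "l", "quantity": "q"}
SHORT_TO_LONG = {v: k for k, v in LONG_TO_SHORT.items()}

def rename(xml_text, table):
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = table.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

verbose = "<purchaseOrder><lineItem><quantity>3</quantity></lineItem></purchaseOrder>"
short = rename(verbose, LONG_TO_SHORT)
print(short)                                   # <p><l><q>3</q></l></p>
print(rename(short, SHORT_TO_LONG) == verbose) # True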
Re:Binary XML is called ASN.1 (Score:2)
And it becomes even slower to parse as a result. Binary XML's advantage isn't its size, it is its parsing performance.
Re:Binary XML is called ASN.1 (Score:2)
Amen To That (Score:5, Insightful)
XML, as implemented today, is often little more than a thin wrapper for huge gobs of proprietary-format data. Thus, any given XML parser can identify the contents as "a huge gob of proprietary data", but can't do a damned thing with it.
Too many developers have "embraced" XML by simply dumping their data into a handful of CDATA blocks. Other programmers don't want to reveal their data structure, and abuse CDATA in the same way. Thus, a perfectly good data format has been bastardized by legions of lazy/overprotective coders.
The slew of publications that exist for the sole purpose of "clarifying" XML serves as testament to the abuse of XML.
Re:Amen To That (Score:2)
Re:Amen To That (Score:2)
If the nails look bent - blame the hammer or the carpenter?
Re:Amen To That (Score:5, Insightful)
The data is interchangeable either way - the only difference is that a binary XML file is not immediately human-readable.
Re:Amen To That (Score:2)
Compression and huffing around (Score:2, Insightful)
Let's talk about where this verbose talk of verbosity is stemming from:
apple
orange
pineapple
This is a data set. No one knows what it is.
Here it is again with some pseudo-XML-style tags:
I am listing vegetables here
this is a list of vegetables
vegetables are listed on their own without any children
Several points. (Score:2)
2) Can't web servers and browsers (well, maybe not IE, but then it's not a browser... it's an OS component, haha) transparently compress XML with gzip or something similar?
3) Making it binary won't compress it all that much; using a proper compression algorithm will.
4) Doesn't something like XML, which uses Latin characters and a few punctuation marks, compress with insane ratios even under lame compression algorithms?
5) I
Re:Several points. (Score:3, Informative)
The problem is that XML is being used for web services which are unlike HTML: the requesting machine will not like waiting 2-3 seconds for the response to the method call. These are interoperating applications, not people downloading text to read, so the response time is much more critical.
I agree that gzip compression is a simple solution to the network problem. It does not address the parsing time problem, and in fact exacerbates it, but in my opinion the network issue is the big one. Time works in favor
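For the network half, the negotiation already exists in HTTP. A sketch (Python standard library, hypothetical endpoint URL) of a client asking for a gzip-compressed XML response:

import gzip
import urllib.request

# Hypothetical XML web service endpoint.
req = urllib.request.Request(
    "http://example.com/api/orders.xml",
    headers={"Accept-Encoding": "gzip"},
)

with urllib.request.urlopen(req) as resp:
    body = resp.read()
    if resp.headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)

print(body[:200])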
Oh please god no (Score:2)
SMPTE KLV (Score:2)
it's needed today, not tomorrow (Score:2)
I'm not seeing where in the article he submits a solution to the problem; he just said that as computers and networks get faster, the bloat won't be slow anymore. T
Sounds like CORBA or any other RPC. (Score:2, Insightful)
Fielding on binary Waka (HTTP replacement) (Score:2)
xtp:// (Score:2)
Ok, we got a name. Now all we need is one fart smella to design it.
Doesn't work at all (Score:2)
Nope, sorry, those lyrics suck. We're gonna stick with Mr. Bacharach's version.
Binary XML? (Score:2)
But ASCII is binary after all... (Score:3, Interesting)
However, if anything, XML has shown us the power of well-structured information. XML has given the possibility of universal interoperability. Developments in XML-based technologies have led us to the point where we know enough now to create a standard for structured information that will last for several decades.
It's time that we had a new ASCII. That standard should be binary XML.
When I think of the time that has been wasted by every developer in the history of Computer Science, writing and rewriting basic parsing code, I shudder. Binary XML would produce a standard such that an efficient, universal data structure language would allow significant advances in what is technically possible with our data. For example: why is what we put on disk any different from what's in memory? Binary XML could erase this distinction.
A binary XML standard needs to become ubiquitous, so that just as Notepad can open any ASCII file today, SuperNotepad could open any file in existence, or look at any portion of your computer's memory, in an informative, structured manner. What's more, we have the technology to do this now.
Re:But ASCII is binary after all... (Score:3, Interesting)
(1) Have every PC OS contain a DBMS (this is not as difficult as you would think)
(2) Always keep your data in a DBMS
(3) Have said DBMS transfer the data via whatever method it would like. Chances are this would be some sort of compact, efficient binary method.
Re:But ASCII is binary after all... (Score:3, Insightful)
However (as I tried to emphasize), ASCII is binary too. It's not that binary is inherently more difficult to debug. It's that we need a binary standard as universal as ASCII has become.
Imagine debugging in the 1960s, before ASCII was standardized. We forget about those
XML images !? (Score:2, Funny)
Yeah, right! XML binary images... So needed...
Overwhelming feeling... (Score:5, Insightful)
Didn't anyone remember that text processing was bulky and expensive? Sometimes the tech community seems to share the same uncritical mind as people who order get-rich-quick schemes off late night infomercials. I doubt XML would have gotten out of the gate as is, had the community demanded these kinds of features from the get-go.
Re:Overwhelming feeling... (Score:2)
what's wrong with GZip? (Score:2)
Why not re-examine http? (Score:3, Interesting)
We need to look towards http 2.0. What I would want:
- pipelining that works, so that it could be enabled for use on any server that supports http 2.0
- gzip and 7zip [7-zip.org] support.
- All data is compressed by default (a few excludes such as
- Option to initiate a persistent connection (remove the stateless protocol concept) via an HTTP header on connect. This would allow a whole new level of web applications via SOAP/XML.
There are tons of other things that could be enhanced for today's uses.
HTTP is the problem, not XML.
Re:Why not re-examine http? (Score:3, Insightful)
Please remember that not all XML data is transmitted by HTTP however (thank god).
It's a markup language (Score:2)
It's a markup language, it's not supposed to be ideal for general purpose data transfer.
People should stop trying to optimize it for a task it wasn't designed for. Focus on making XML better for markup, and for pity's sake come up with something else that's concise and simple and efficient for general purpose use.
Binary not needed - better table format needed. (Score:3, Insightful)
This is what made us balk at using XML for storing NMR spectroscopy data, even though it is already in a textual form to begin with. The current textual form is whitespace-separated short numbers, fewer than 5 digits each, for hundreds of thousands of rows. That isn't really that big in ASCII form. But turn it into XML, and a 1 MB ASCII file turns into a 150 MB XML file because of all the extra repetitive tag stuff.
In another bit of irony, we can't find an in-memory representation of the data as a table which is more compact than the ASCII file. The original ASCII file is even more compact than a 2-D array in RAM (because it takes 4 bytes to store an int even when that int is typically just one digit and is only larger on rare occasions).
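For what it's worth, the in-memory size can at least be narrowed with typed arrays. A sketch (Python, made-up values that fit in 16 bits) comparing a plain list of Python ints with a packed array:

import sys
from array import array

# Made-up column of small values, typically one digit, occasionally larger.
values = [7, 3, 0, 12, 5, 9981, 4, 1] * 50_000

as_list = list(values)                  # list of Python int objects
as_array = array("h", values)           # packed signed 16-bit ints, 2 bytes each

print("list of ints :", sys.getsizeof(as_list), "bytes for the pointer array alone (the int objects are extra)")
print("array('h')   :", len(as_array) * as_array.itemsize, "bytes for all the data")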
The article doesn't go far enough... (Score:5, Insightful)
From experience, I can state that using XML in any high performance situation is easy to screw up. But once you get past the basic mistakes at that level, what other inherent problems are there?
Oh, and just stating "well, the format is obviously wasteful" just because it's human readable (one of its primary, most useful, features) is NOT an answer.
I get the feeling that this perception of XML is being perpetuated by vendors who do not really want to open up their data formats. Allowing them to successfully propagate this impression would be a very real step backwards for all IT professionals.
Anecdotal example (Score:3, Interesting)
The client attempted to open it in a DOM-based application, which I suspect used recursion to parse the data (recursion is easy to code). Needless to say, it brought their server to its knees.
We switched to flat files shortly thereafter.
In my problem domain, where 20MB is a small data set, XML is useless. XML does not seem to scale well at all (though using a SA
XML doesn't need to be non-ascii to be small (Score:4, Informative)
Stop using bad DTDs. There seems to be a DTD style in which you avoid using attributes and instead add a whole lot of tags containing text. Any element with a content type of CDATA should be an attribute on its parent, which improves the readability of documents and lets you use ID/IDREF to automatically check stuff. Once you get rid of the complete cruft, it's not nearly so bad.
Now that everything other than HTML is generally valid XML, it's possible to get rid of a lot of the verbosity of XML, too. A new XML could make all close tags "</", since the name of the element you're closing is predetermined and there's nothing permitted after a slash other than a >. The > could be dropped from empty tags, too. If you know that your DTD will be available and not change during the life of the document, you could use numeric references in open tags to refer to the indexed child element type of the type of the element you're in, and numeric references for the indexed attribute of the element it's on. If you then drop the spaces after close quotes, you've basically removed all of the superfluous size of XML without using a binary format, as well as making string comparisons unnecessary in the parser.
Of course, you could document it as if it were binary. An open tag is indicated with an 0x3C, followed by the index of the element type plus 0x30 (for indices under 0xA). A close tag is (big-endian) 0x3C2F. A non-close tag is an open tag if it ends with an 0x3E and an empty tag if it ends with an 0x2F. Attribute indices are followed with an 0x3D. And so forth.
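That "binary" documentation is really just the ASCII bytes of the stripped-down syntax. Here is a toy sketch of the encoder in Python, using an invented global element table (the scheme above would index per content model from the DTD, which this simplifies) and ignoring attributes and tails with text:

import xml.etree.ElementTree as ET

# Invented element table; the real scheme would derive indices from the DTD.
ELEMENTS = ["order", "item", "qty"]
INDEX = {name: i for i, name in enumerate(ELEMENTS)}

def encode(elem):
    idx = bytes([0x30 + INDEX[elem.tag]])          # element index + 0x30
    if len(elem) == 0 and not (elem.text or ""):
        return b"\x3c" + idx + b"\x2f"             # empty tag ends with 0x2F
    out = b"\x3c" + idx + b"\x3e"                  # open tag: 0x3C, index, 0x3E
    out += (elem.text or "").encode("utf-8")
    for child in elem:
        out += encode(child)
        out += (child.tail or "").encode("utf-8")
    return out + b"\x3c\x2f"                       # close tag: 0x3C 0x2F ("</")

doc = ET.fromstring("<order><item><qty>2</qty><qty>5</qty></item></order>")
print(encode(doc))   # b'<0><1><2>2</<2>5</</</'

The parser never has to compare tag names, only single index bytes.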
Wrong Problem (Score:3, Insightful)
Re:Wrong Problem (Score:3, Insightful)
Use XML in places where it makes sense: interfaces between different companies/business partners/departments, interfaces between mutually hostile vendors, really long-term data storage.
Using XML as the data format between two tightly coupled Java programs, sitting next to each other and exchanging massive amounts of data, is insane.
This is of course a simplified example, BUT the point is: ALWAYS beware of the trade-offs you accept when you make a technology choice. The same goes for algorithms
XML not useful for xferring copious binary data (Score:3, Insightful)
A good binary XML specification could be an extremely good fit for us.
And don't suggest that we just compress the XML and send that. Here's why: first we have to expand all that digitized data into some sort of ASCII encoding, which is then compressed. End result: no gain and a possible loss of precision in the data.
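That expansion is easy to quantify. A quick sketch (Python, with a random made-up payload standing in for digitized instrument data) showing the ~33% growth from base64 and how gzip only claws most of it back:

import base64, gzip, os

payload = os.urandom(1_000_000)          # stand-in for already-digitized binary data

ascii_form = base64.b64encode(payload)   # what you'd have to embed in the XML
compressed = gzip.compress(ascii_form)

print("raw binary       :", len(payload), "bytes")
print("base64 in XML    :", len(ascii_form), "bytes (+~33%)")
print("base64 then gzip :", len(compressed), "bytes (back to roughly where you started)")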
A real, live, useful binary XML spec could help us immensely. I say BRING IT ON!!!!
BTW, wasn't DIME [wikipedia.org] supposed to address these problems? What happened to DIME, anyway?
Possibly I'm a cynic (Score:3, Funny)
We should rejoice, buy more CPUs, and move the problem from XML to languages with poor concurrency support.
Re:Binary = Proprietary (Score:2)
Re:Binary = Proprietary (Score:3, Insightful)
Of course binary doesn't equal proprietary. Those are two completely different concepts.
PNG is a binary format. It isn't proprietary, though. And although I can't immediately find a text-based proprietary format, such formats are not impossible (although arguably easier to reverse-engineer than binary proprietary formats).
But if the XML is really such a problem, I suggest the simple solution. Compressing XML with a simple and open algorithm like gzip or bzip2 is the way to go. XML usually compresses very
Re:Binary = Proprietary (Score:3, Insightful)
As long as it's standardized, the standard is freely available to anyone who wants it, it does not depend on an external library, and it is unencumbered by any sort of patent, it isn't proprietary.
I hate XML right now because of all the string processing and parsing. Text is a sloppy way of defining something, and it begets lots of big processing libraries. It's OK for big PC memory hog apps, but I can't build a small enough one that is still robust enough to w
Microsoft XML (Score:3, Interesting)
If Microsoft doesn't respect text-only XML, what do you think will happen when^H^H^H^Hif binary XML is out?
Re:Binary = Proprietary ... I disagree (Score:3, Insightful)
It far outweighs it, huh? I guess you have never heard of a large segment of the computing world referred to as embedded systems.
If you can develop a good parser (not that hard), the cost difference is negligible, if any.
This is simply untrue. Development of a good parser is easy, but it's added bloat that isn't negligible for many computing devices outside of the PC/server realm. Not to mention the added network tra
Re:Binary = Proprietary ... I disagree (Score:3, Insightful)
Re:ZIP ?! (Score:2)
Because smaller file sizes is only one of the reasons for Binary XML.
Simply compressing it makes it smaller, but does nothing to simplify handling. Parsing XML is the big hairy deal in this case. XML includes a lot of ambiguities and complexity; parsing and representing the trees can be a challenge. Think of the processing of namespaces and all the myriad other things in XML.
I suspect the purp
Because it's freaking slow (Score:2)
As it happens, most SOAP requests are NOT human-readable. Sure, I can sit and figure one out, but unless it's a trivial example, trying to decipher it isn't easy.
A standard binary XML format would allow a standard binary SOAP variant. Debuggers could hand bsoap->soap tran