Stephane Rodriguez Dismantles Open XML
Elektroschock writes "Stephane Rodriguez, a reengineering specialist who became popular for his article on MS Office 2007 binary data, now comprehensively debunks Microsoft's new Open XML format. With small case studies he demonstrates the impossible challenges third-party developers will face. His conclusion: it is 'defective by design.' Next week members of the International Organization for Standardization (ISO) are likely to approve the format as a second official ISO standard for office documents, even though most nations have submitted comments. Rodriguez claims he is 'not affiliated to any pro-MS or anti-MS party/org[anization]/ass[ociation].'"
This is not proof of OOXML being defective by desi (Score:3, Insightful)
Re:This is not proof of OOXML being defective by d (Score:4, Insightful)
Re:This is not proof of OOXML being defective by d (Score:5, Insightful)
Re:This is not proof of OOXML being defective by d (Score:2, Insightful)
That's why the title says "Microsoft Office XML Formats? Defective by design"
not "OOXML defective by design"
He is dissing Microsoft's claims of transparency and openness for the Microsoft Office XML formats.
ISO Credibility (Score:1, Insightful)
Re:This is not proof of OOXML being defective by d (Score:5, Insightful)
If Office can't read OOXML files produced by other tools, and other tools can't read Office OOXML files, where do you suppose end users will place the blame?
And what do you suppose users will do when faced with incompatibilities?
It's a brilliant strategy: Define a new "standard" but don't quite implement it yourself, ensuring that no one can implement a competitive office suite that is compatible with yours. Further, make the standard complex and weird enough that you can always blame inconsistencies on the other implementations. Voila! You get to proclaim to the world that your de facto standard office suite supports an open, ISO-blessed international standard format -- but with no worries about losing your lock-in.
Some Points Are Valid, Others Not (Score:3, Insightful)
For example, the part about "Entered versus stored values" is certainly valid (though I wonder if that's not a problem with Excel itself, and not the format). The complaint about the date format is also on the money.
However, other things seem either wrong or biased towards hand editing of the files, e.g. "International, but US English first and foremost". He complains that it uses U.S. English settings. He may not like the U.S., but it's called picking a canonicalized format. Consider the alternative for implementing this in software: parsing of the values in the XML would now depend on settings also found in the XML. That would be insane.
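To make the canonical-format argument concrete, here is a minimal Python sketch. The function names `parse_canonical` and `parse_localized` are made up for illustration and are not taken from the OOXML spec or from any real library. It only shows that a value stored in one fixed convention can be parsed from the string alone, while a localized value cannot be interpreted until the separator settings have been read from somewhere else.

```python
from decimal import Decimal


def parse_canonical(text: str) -> Decimal:
    """Parse a number stored in one fixed convention: '.' as decimal point, no grouping."""
    return Decimal(text)


def parse_localized(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    """Parse a number whose separators are declared elsewhere (hypothetical settings)."""
    # Drop grouping characters first, then map the decimal separator to '.'.
    normalized = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


# Canonical storage: the string is enough on its own.
print(parse_canonical("1234.56"))

# Localized storage: the same characters mean different numbers
# depending on settings the parser has to look up first.
print(parse_localized("1.234", decimal_sep=",", group_sep="."))
print(parse_localized("1.234", decimal_sep=".", group_sep=","))
```

Whether U.S. conventions are the right fixed convention is a separate question from whether the format should have a single fixed convention at all.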
Re:Some Points Are Valid, Others Not (Score:3, Insightful)
XML provides an application independent way of sharing data. With a DTD, independent groups of people can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify that data that you receive from the outside world is valid. You can also use a DTD to verify your own data.
A lot of forums are emerging to define standard DTDs for almost everything in the areas of data exchange. Take a look at: CommerceNet's XML exchange and http://www.schema.net./
Re:Can anyone repro? (Score:5, Insightful)
One of the advantages of UTF-8 for text files is that you don't need a BOM. With XML it's even easier because, as you point out, the XML declaration ("XMLDecl" in the spec) header can contain the "EncodingDecl" to tell you explicitly that the file is in UTF-8. If the EncodingDecl says UTF-8, and the file is encoded in UTF-8, then if an XML parser cannot handle that, it's seriously fucked and needs to be fixed.
You might also want to go read STD-63 at some point. It points out that there are a few problems with using BOMs in UTF-8, and that if there is a way for UTF-8 to be determined in a way other than with the use of a BOM, that should be used instead. Given that XML specifically includes support for an "EncodingDecl" in the "XMLDecl", it is clear that best practices dictate that you *shouldn't* use a BOM when working with UTF-8 encoded XML files. Even if your tools _insist_ on writing BOMs to such files, they had *better* still be able to work if the BOM is missing.
Heck, with OOXML, you could also use the ZIP's manifest file to keep track of file metadata like the character encoding.
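For anyone who wants to try this at home, a rough Python sketch of the point about BOMs follows. The helper name `sniff_xml_encoding` is invented for this example, and it assumes an ASCII-compatible byte stream (it does not attempt UTF-16 or UTF-32 detection). It drops a UTF-8 BOM if one is present, reads the encoding from the XML declaration, and falls back to UTF-8 when there is no declaration, so the file is handled the same way with or without the BOM.

```python
import codecs
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(data: bytes):
    """Return (encoding, payload): payload is data with any UTF-8 BOM removed."""
    payload = data
    if payload.startswith(codecs.BOM_UTF8):
        payload = payload[len(codecs.BOM_UTF8):]
    match = _DECL.match(payload)
    # No declaration means UTF-8 by default for this kind of stream.
    encoding = match.group(1).decode("ascii") if match else "UTF-8"
    return encoding, payload


plain = b'<?xml version="1.0" encoding="UTF-8"?><v>caf\xc3\xa9</v>'
with_bom = codecs.BOM_UTF8 + plain

for doc in (plain, with_bom):
    encoding, payload = sniff_xml_encoding(doc)
    # Both variants yield the same encoding and the same parsed text.
    print(encoding, ET.fromstring(payload).text)
```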
Re:Personally.. (Score:4, Insightful)
Yes.
Didn't you read the original article? Haven't you been following the OOXML story at all? There is every evidence that Microsoft has not changed, and works hard to pervert standards and processes to favor their platform over any other. Not just here, but in other areas, as well. Name one major Microsoft product that follows open, published standards without proprietary deviation. Just one. I dare you.
Also important to note, Bill Gates isn't running MS anymore.
No. Ballmer is. Bill Gates is a very smart guy (in business, at least). Ballmer is vicious, and even more cold-blooded than Gates (if that can be possible). And the corporation idolizes Gates. His influence will remain long after he's completely retired from the company.
Call me a cynic (Score:3, Insightful)
OOXML will be voted in as an ISO standard.
Third-party vendors trying to implement the "standard" will waste time, money and effort and accomplish nothing of import.
MS will continue as normal, claiming support for open standards while locking anyone they can into formats/software they own.
ODF will continue as a marginalized format used by people on the "fringe".
Re:This is not proof of OOXML being defective by d (Score:3, Insightful)
No, I don't think so. It will serve Microsoft's purposes better if they too cannot properly implement the OOXML standard. Then their fully proprietary file formats would continue to be used since no one could trust that an OOXML document hasn't been corrupted by the OOXML save process.
This is how Microsoft destroyed the nascent RTF standard that the US Navy wanted to use: they implemented it, but gee there were problems in getting it to work right so maybe all you sailor boys should use Word's native file formats until we get things worked out (which never happened).
Windows just doesn't belong on a battleship or aircraft carrier. You would have thought the US Navy would have known that, but no, they had to go and try it anyway.
Re: Brilliant Strategies (Score:4, Insightful)
(Scene at office)
ComputerGuy: "Sure, let's open that with GoogleApps."
Colleague: "Why am I getting a catastrophic failure? Maybe I better use Excel."
Re:This is not proof of OOXML being defective by d (Score:4, Insightful)
Yep. Brilliant, isn't it? Given a horribly complex and incomplete specification, Microsoft can easily blame any problems on the other tools -- and they can do this with a straight face because they'll be right! (Quietly ignoring the fact that their own tool produces non-compliant OOXML). Even better, they can smugly point out how their tools fix the "errors" caused by other crappy tools, even as the text of their messages frightens users away from trying any tool that doesn't come from Microsoft ("catastrophic failure", no less!).
If MS weren't trying to pull a fast one, they'd have designed a more reasonable format, one that does make it practical to make small edits to the XML and expect reasonable results or, even better, used an existing standard like ODF. If ODF can't fully represent all facets of Office documents, the format has a well-defined technical and procedural path to add any necessary extensions.
By way of comparison, try the same series of experiments with a .ods document, using any of the handful of available applications that support it, and you'll quickly see how a format that is designed to be straightforward, accessible and specifiable in less than 500 pages compares to the brilliantly-executed monstrosity that is OOXML.
I see Miguel has flown in fast to defend Microsoft (Score:1, Insightful)
Otherwise, horrible things could happen, like ODF could be used instead, or it could be extended to include stuff in OOXML and then the world would have one unified standard, instead of two that even experts can't use and that are not interoperable. We couldn't have that.
So the entire FOSS world wishes to thank Miguel for helping Microsoft keep its users locked in. Hey, man, what kind of game are you playing?
Re: US English not "canonical" (Score:5, Insightful)
I don't think you intended it that way, but you should be aware of the vast number of people you just insulted. US English and US dates are only "canonical" in the minds of US citizens. If not for Microsoft purposely and determinedly screwing up the implementation of anything but US standards in their software, the usage would have no traction at all.
The majority of the "English speaking" world still uses the English language and English formats and standards, not US variant ones. The fact that the USA has seen fit to re-invent English, still refer to that as English, and then foist it on the rest of the world doesn't make it "canonical."
As the author of this article so aptly describes, date formats and language implementations are a multi-stage nightmare in Office, to the point that the majority of users, even in English-speaking countries like Canada, Australia, New Zealand and the UK itself, often end up using American English and American dates simply because Office is the only game in town and you can only bash your head against the wall on these things for so long. That doesn't make it right, and it doesn't mean that those users wouldn't be happier and more productive if they were not forced to use a US standard when they may never even have traveled to the US.
Any kind of English except the US variant is severely broken in Office and always has been. Your answer sounds to me a lot like: "So what, they should all be using our standards and language anyway." Not helpful at all, and illogical as well.
Re:Are you sure of that? (Score:3, Insightful)
Then we could judge if his example is reasonable or not. I realize we could all do this ourselves, but I for one am not going to go out and buy Excel 2007 just to do that!
Not so much. (Score:3, Insightful)
Instead, he has to go update all the reference and dependency information, which programs have to generate and update all the time anyway. I can't really think of a good reason this information needs to be saved to disk, and I certainly can't think of a good reason that Excel deletes the cell, rather than updating the dependencies itself to reflect the physical document.
In fact, I can't think of a good reason to store the value alongside the formula, except as an optional cache, which a program can recalculate if needed.
They are using XML in the first place. The point of XML is interoperability and human-readability/editability, not performance.
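A toy sketch of the "optional cache" idea, in Python. Everything here is hypothetical: the `Cell` class, the `evaluate` and `value_of` helpers, and the formula syntax (a `+`-separated list of cell references) are invented for illustration and have nothing to do with how Excel or the OOXML schema actually models cells. There is also no cycle detection. The point is only that a reader can ignore a stored value and rebuild it from the formula whenever the value is missing or untrusted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    formula: Optional[str] = None   # e.g. "A1+A2"; None for a plain constant
    cached: Optional[float] = None  # stored value; optional when a formula exists


def evaluate(name: str, sheet: Dict[str, Cell]) -> float:
    """Compute a cell from scratch, ignoring cached values on formula cells."""
    cell = sheet[name]
    if cell.formula is None:
        if cell.cached is None:
            raise ValueError(f"{name} has neither a value nor a formula")
        return cell.cached
    # Toy formula language: a sum of cell references.
    return sum(evaluate(ref.strip(), sheet) for ref in cell.formula.split("+"))


def value_of(name: str, sheet: Dict[str, Cell], trust_cache: bool = True) -> float:
    """Use the stored value if present and trusted; otherwise recalculate."""
    cell = sheet[name]
    if trust_cache and cell.cached is not None:
        return cell.cached
    return evaluate(name, sheet)


sheet = {
    "A1": Cell(cached=2.0),
    "A2": Cell(cached=3.0),
    "A3": Cell(formula="A1+A2"),               # no stored value at all
    "A4": Cell(formula="A1+A2", cached=99.0),  # stale stored value after a hand edit
}

print(value_of("A3", sheet))                     # recalculated, nothing was stored
print(value_of("A4", sheet, trust_cache=False))  # stale cache ignored
```

With a design like this, a hand edit that changes A1 only needs the dependent caches dropped, not kept in sync by the person editing the file.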
Re:This is not proof of OOXML being defective by d (Score:5, Insightful)
> and all others must (attempt to) conform to the behavior of that implementation or
> be judged defective.
It's worse than that. Since MS defines a number of aspects of the specification solely
in terms of compliance with MS application software, the MS implementation is not only
the -de facto- standard, but the very explicit standard. Not only can no one conform
to a sufficient level to be judged compliant in the marketplace, for all contractual
specifications, -nothing- but MS software can -ever- be 100% compliant.
This means on big, contract-driven projects, such as many government projects, MS
and vendors using MS tools are effectively the only possible competitors, unless
the contracts and specifications specifically waive vendor compliance with those
parts of the spec.
And I strongly doubt anyone would ever write a contract like that.