
Stephane Rodriguez Dismantles Open XML 188

Posted by kdawson
from the some-kind-of-joke dept.
Elektroschock writes "Stephane Rodriguez, a reengineering specialist who became popular for his article on MS Office 2007 binary data, now comprehensively debunks Microsoft's new Open XML format. With small case studies he demonstrates the impossible challenges third-party developers will face. His conclusion: it is 'defective by design.' Next week members of the International Organization for Standardization (ISO) are likely to approve the format as a second official ISO standard for office documents, even though most nations have submitted comments. Rodriguez claims he is 'not affiliated to any pro-MS or anti-MS party/org[anization]/ass[ociation].'"
This discussion has been archived. No new comments can be posted.

  • by tsa (15680) on Sunday August 26, 2007 @08:13AM (#20361263) Homepage
    This is not proof of OOXML being defective by design. It only shows that apparently MS's software isn't able to handle OOXML properly.
    • by darkatom (94914) on Sunday August 26, 2007 @08:26AM (#20361313)
      But that's still a problem. Microsoft's implementation becomes the de facto standard and all others must (attempt to) conform to the behavior of that implementation or be judged defective. This is what happened when MS published the MAPI (Mail API) spec and then released an implementation alongside it. Lotus and others could never fully mimic what the MS implementation did, so they eventually languished.
      • by GIL_Dude (850471)
        I seem to recall Lotus didn't like MAPI and wanted to push their own API called VIM? (http://en.wikipedia.org/wiki/Vendor_Independent_Messaging).
        • by Gription (1006467)
          Seeing that MS controls the platform and the platform calls MAPI it seems like a silly battle to fight.

          To go back and reiterate darkatom's comment: Microsoft has always taken 'standards' and extended them to break everyone else's version except theirs. Nothing has ever stopped them except a court order (like JAVA, maybe...) but if they don't dominate and control they always try to take their ball and go home. ("I'll see your JAVA and raise you an Active-X (who cares if it makes using the web uncontrollabl
      • Re: (Score:3, Insightful)

        by Anonymous Coward

        Microsoft's implementation becomes the de facto standard

        No, I don't think so. It will serve Microsoft's purposes better if they too cannot properly implement the OOXML standard. Then their fully proprietary file formats would continue to be used since no one could trust that an OOXML document hasn't been corrupted by the OOXML save process.

        This is how Microsoft destroyed the nascent RTF standard that the US Navy wanted to use: they implemented it, but gee there were problems in getting it to work right so maybe all you sailor boys should use Word's native fi

      • Re: (Score:3, Interesting)

        by tsa (15680)
        But that's still a problem. Microsoft's implementation becomes the de facto standard and all others must (attempt to) conform to the behavior of that implementation or be judged defective.

        I wonder what happens if OOXML is not voted a standard. Will MS simply discard it, and embrace ODF, or will they continue to use .doc as if nothing happened? I guess they will do the latter since it's the most economical option for them. If that happens I'm curious what the EU will think of that, and how long it will take
      • by QuestorTapes (663783) on Sunday August 26, 2007 @06:34PM (#20365967)
        > But that's still a problem. Microsoft's implementation becomes the de facto standard
        > and all others must (attempt to) conform to the behavior of that implementation or
        > be judged defective.

        It's worse than that. Since MS defines a number of aspects of the specification solely
        in terms of compliance with MS application software, the MS implementation is not only
        the -defacto- standard, but the very explicit standard. Not only can no one conform
        to a sufficient level to be judged compliant in the marketplace, for all contractual
        specifications, -nothing- but MS software can -ever- be 100% compliant.

        This means on big, contract driven projects, such as many government projects, MS
        and vendors using MS tools are effectively the only possible competitors, unless
        the contracts and specifications specifically waive vendor compliance with those
        parts of the spec.

        And I strongly doubt anyone would ever write a contract like that.

      • "Microsoft's implementation becomes the de facto standard and all others must (attempt to) conform to the behavior of that implementation..."

        Didn't Java have a reference standard?

        Two vendors can't even implement HTML to render the same results from a given set of pages, since default fonts, sizes, margins, padding, and so on for many elements are implementation dependent.

        Just seems like another excuse (not that we need one) to bash MS...
        • by bentcd (690786)

          Didn't Java have a reference standard?

          I believe you were able to get the source code for that though (under a "research" license or somesuch).

          The real question, though, is how willing the vendor is to fix bugs in the reference implementation. If the vendor has intentionally made the standard incompatible with the reference implementation and/or is unwilling to bring the reference in line with the standard (or vice versa) and/or is unwilling to tell you what it is, exactly, that the reference is doing/expecting then you're out of luck. If, on t

    • by Anonymous Coward on Sunday August 26, 2007 @08:29AM (#20361327)
      "by design" is of course about motivation which we can know in OOXML from emails, quotes, obtuse or brittle design, and lack of specification.

      The document contains all of these. I suggest that you read it.

      By the way -- there's newly discovered undocumented Microsoft tech present in OOXML, such as SSPI ("Security Service Provider Interface"), which is a proprietary Microsoft-developed protocol for security providers, and OLE ("Object Linking and Embedding"), which is for embedding (e.g., taking an Excel spreadsheet and putting it into a Word document). This is undefined in OOXML and only available on Microsoft Windows.
      • Re: (Score:2, Troll)

        by man_of_mr_e (217855)
        Dude. You do realize that OpenOffice also has OLE and SSPI support, right? These are platform specific features, and any office product on Windows has to support them, or they won't be very popular.

        You're not coming up with some kind of revelation. It's more of a "Duh, no shit sherlock".
        • There's a big difference between supporting platform specific IPC mechanisms in an application and integrating those mechanisms into file format that you claim can be implemented on other platforms.

      • Re: (Score:3, Informative)

        OO.o also support OLE.
        Also, Mac Office supports OLE as well, so it's not "Windows-only".

        And you claim that OLE is "newly discovered"? It's been around for over 13 years, and was present in the very first OOXML specs.

        I don't know about SSPI, but given that your OLE knowledge is so woeful, I feel safe in assuming that your SSPI complaint is FUD as well.
    • by bomanbot (980297) on Sunday August 26, 2007 @08:29AM (#20361331)

      This is not proof of OOXML being defective by design. It only shows that apparently MS's software isn't able to handle OOXML properly

      Um, isn't the fact that not even Microsoft's own software can handle OOXML, which by the way was designed by Microsoft itself, proof enough that something is seriously wrong with the design of OOXML?

      I mean if not even the maker of OOXML can get it to work properly in its own products, how are third parties supposed to do it? And if no one is able to implement OOXML correctly, what is this "standard" good for besides being a great smoke-and-mirrors tactic by Microsoft themselves?
      • by setagllib (753300) on Sunday August 26, 2007 @08:33AM (#20361349)
        It's deliberate. The standard is just a distraction, to keep competitors busy trying to implement it, while documents are actually being created in the Office 2007 variant of OOXML. A few months of legacy almost guarantees a transition to the real OOXML would be an uphill battle, especially with no real documentation of how *either* format works. So even with a supposed 'standard' and a near-enough implementation, the vendor lockin is just as strong as it was with the binary formats.
        • by Sweetshark (696449) on Sunday August 26, 2007 @08:51AM (#20361441)
          This "OpenXML" stunt is just a smokescreen covering Microsoft's controlled retreat in the office format battle. It only needs to keep parties distracted until Microsoft has reclaimed control over business content by means of vendor lock-in v2.0, aka Microsoft Office Sharepoint Server.

          http://weblog.infoworld.com/openresource/archives/2007/04/while_you_were.html [infoworld.com]
          http://www.itbusinessedge.com/blogs/mia/?p=198 [itbusinessedge.com]
          • by Danse (1026)

            This "OpenXML" stunt is just a smokescreen covering Microsoft's controlled retreat in the office format battle. It only needs to keep parties distracted until Microsoft has reclaimed control over business content by means of vendor lock-in v2.0, aka Microsoft Office Sharepoint Server.

            The problem is that while they note that Sharepoint leads to lock-in, and they suggest open source as an alternative, they don't mention anything specifically. That means there's no OSS equivalent or competitor to Sharepoint? Is it a roll-your-own situation? That can get really expensive too. Not every organization has the ability to fund that kind of development, with its associated risks, in the time-frame they have to get a solution in place. Unless someone comes up with an alternative that can fun

        • Re: (Score:3, Informative)

          by zlogic (892404)
          Still it's better than the original DOC format.
          A DOC is actually a FAT12-like filesystem (called OLE) that has files and clusters. Clusters can be lost and files can be fragmented. One of the files is the document's text; it's not plaintext but rather another obscure binary format, with text chunks separated by some kind of metadata (my brain nearly exploded when trying to understand how to separate text from the metadata and I gave up). Images, videos and embedded objects are stored as separate files in th
      • Re: (Score:3, Funny)

        by man_of_mr_e (217855)
        No. What it means is that Office has so much legacy code that they can't rewrite it all to be conformant. Think of OOXML as a target that MS feels they can eventually meet with office, not necessarily what office will actually meet today. After all, much was changed in OOXML after Office 2007 went to bed. One would expect the next version of Office to be much closer to the spec, since they will have had a full design cycle to conform to it.
    • OOXML is a theoretically perfect standard that just happens to have no implementations whatsoever.
    • You are correct.
      That's why the title says "Microsoft Office XML Formats? Defective by design"
      not "OOXML defective by design"
      He is dissing Microsoft's claims of transparency and openness of Microsoft Office XML.
    • by canuck57 (662392) on Sunday August 26, 2007 @09:06AM (#20361509)

      This is not proof of OOXML being defective by design. It only shows that apparently MS's software isn't able to handle OOXML properly.

      OK, let's give MS their choice either way on this one.

      If their office tools work well but are not using the OOXML spec, they must be using some other spec, perhaps MOOXML. In which case they are not OOXML compliant.

      On the other hand, if they want to be OOXML compliant, then I guess Redmond programmers can't read their own spec and thus are having problems being compliant.

      Either way, and for whatever reason, Microsoft is not compliant with their own spec. Shall we call this MOOXML? And while I have only read a part of the spec, it is far too "undefined" and thus too ambiguous to be reliably used by itself. A standard needs to be defined well enough that two or more parties could take the standard's specification, run off and program it from scratch, and have a reasonable chance that their code will interoperate on the same data sets.

      Trouble is, if Microsoft cannot do that, how is anyone else?

      But might I submit, Microsoft wrote Office and then wrote the spec. A poster child for why thinking about and writing the spec before the software is good practice.

      • Did it ever occur to you that Office 2007 was finished before the OOXML spec was? Remember, there were many changes in the ECMA committee long after Office 2007 was finalized.
        • by orcrist (16312) on Sunday August 26, 2007 @03:55PM (#20364515)

          Did it ever occur to you that Office 2007 was finished before the OOXML spec was? Remember, there were many changes in the ECMA committee long after Office 2007 was finalized.


          My guess is, yes, it occurred to the poster you were responding to, since I highly doubt that when he wrote exactly that, it was in his sleep. Did it occur to you that reading his post all the way to the end might have resulted in slightly less of your foot being inserted into your mouth? ;-)
        • by bigpat (158134)
          Yes, and Microsoft has every right to do whatever the hell they want to do with their own damn proprietary OOXML format.

          I believe it is the part where Microsoft is pretending that OOXML is an open standard and pretending that it is what is being implemented in Office 2007 that people are calling MS on.
    • This is not proof of OOXML being defective by design. It only shows that apparently MS's software isn't able to handle OOXML properly.

      If Office can't read OOXML files produced by other tools, and other tools can't read Office OOXML files, where do you suppose end users will place the blame?

      And what do you suppose users will do when faced with incompatibilities?

      It's a brilliant strategy: Define a new "standard" but don't quite implement it yourself, ensuring that no one can implement a competitive office suite that is compatible with yours. Further, make the standard complex and weird enough that you can always blame inconsistencies on the other implementations. Voila! You get to proclaim to the world that your de facto standard office suite supports an open, ISO-blessed international standard format -- but with no worries about losing your lock-in.

      • by TaoPhoenix (980487) <TaoPhoenix@yahoo.com> on Sunday August 26, 2007 @01:02PM (#20363059) Journal
        Don't forget the delicious language. Instead of the legendary "syntax error", we now get a "catastrophic failure". Do it yourself FUD!

        (Scene at office)
        ComputerGuy: "Sure, let's open that with GoogleApps."
        Colleague: "Why am I getting a catastrophic failure? Maybe I better use Excel."

      • by miguel (7116)

        This is not proof of OOXML being defective by design. It only shows that apparently MS's software isn't able to handle OOXML properly.

        If Office can't read OOXML files produced by other tools, and other tools can't read Office OOXML files, where do you suppose end users will place the blame?

        And what do you suppose users will do when faced with incompatibilities?

        but Office can read OOXML files produced by other tools; You just have to generate proper files.

        As it's pointed out in this thread:

        http://blogs.msd [msdn.com]

        • In addition, Excel happens to recover nicely from the lack of data that Stephane complains so loudly about, you just happen to get a warning if the file you feed it happens to be incorrectly formed and even offers you an option to "repair" it.

          Yep. Brilliant, isn't it. Given a horribly complex and incomplete specification, Microsoft can easily blame any problems on the other tools -- and they can do this with a straight face because they'll be right! (Quietly ignoring the fact that their own tool produces non-compliant OOXML). Even better, they can smugly point out how their tools fix the "errors" caused by other crappy tools, even as the text of their messages frighten users away from trying any tool that doesn't come from Microsoft ("catastrophic failure", no less!).

          If MS weren't trying to pull a fast one, they'd have designed a more reasonable format, one that does make it practical to make small edits to the XML and expect reasonable results or, even better, used an existing standard like ODF. If ODF can't fully represent all facets of Office documents, the format has a well-defined technical and procedural path to add any necessary extensions.

          By way of comparison, try the same series of experiments with a .ods document, using any of the handful of available applications that supports it, and you'll quickly see how a format that is designed to be straightforward, accessible and specifiable in less than 500 pages compares to the brilliantly-executed monstrosity that is OOXML.

    • The examples given by Rodriguez do indeed only prove that Microsoft's implementation sucks. Parent's assertion is correct.

      On the other hand, a rather lengthy list of objections against the standard itself can be found here:
      http://www.grokdoc.net/index.php/EOOXML_objections [grokdoc.net]

      So it seems that both the standard and its implementation suck ;-)
      • Some of the complaints only indicate that MS sucks at implementing its own standard.
        Other complaints are with the format itself, such as numerous different ways of marking up the same thing; dependencies hidden in various files instead of listed up front (forcing a parsing of the entire zip file to make a trivial change); inclusion of proprietary, undocumented, or partially documented parts, like VML; including asinine legacy structure, like the way dates are improperly stored, and on and on.
      • by bwt (68845)
        Microsoft's implementation sucks

        What implementation is that? They don't have an implementation. OOXML is a sham. MS Office does not, nor will it ever, implement OOXML. The existence of OOXML, which offers rivals to many existing standards (like ISO 8601, 639, 8632, 26300, 10118-3 and W3C SVG, MathML, XML-ENC, and SMIL) is justified by the backwards compatibility argument under the pretense of helping Microsoft to document its existing document format, which it doesn't. There are two valid paths here: create
    • It should also be pointed out that many of his complaints would require application specific extensions in ODF as well, e.g. ODF doesn't define a way to encrypt documents, or store filesystem metadata. The same applies where he talks about calculation chains and other aspects that have no equivalents in current ODF documents because of a lack of spreadsheet formula definition, etc...

      Basically, many of his arguments could be said about ODF (though not all), since ODF doesn't provide a standard way to do those things, they
      • by Bert64 (520050)
        I always figured that, since an ODF file is basically zipped, an encrypted ODF is just an encrypted ZIP.
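
A minimal sketch of the "it's basically a ZIP" point, assuming nothing beyond Python's standard `zipfile` module. The two part names and their contents are placeholders built in memory for illustration, not a valid spreadsheet. It lists each entry together with the ZIP-level "encrypted" flag bit, which is what you would inspect to test the guess above; whether any particular office suite actually uses ZIP-level encryption is not something this sketch establishes.

```python
import io
import zipfile

def build_package(parts):
    """Pack a {name: text} mapping into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()

def list_parts(package_bytes):
    """Map each entry name to True if its ZIP 'encrypted' flag (bit 0) is set."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {info.filename: bool(info.flag_bits & 0x1) for info in zf.infolist()}

# Placeholder parts, for illustration only.
demo_parts = {
    "mimetype": "application/vnd.oasis.opendocument.spreadsheet",
    "content.xml": "<doc/>",
}
demo_listing = list_parts(build_package(demo_parts))
```

Pointing `list_parts` at the bytes of a real .ods or .xlsx file should work the same way, since both are ZIP containers.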
      • It should also be pointed out that many of his complaints would require application specific extensions in ODF as well. i.e. ODF doesn't define a way to encrypt documents,

        I remain to be convinced that encryption is a useful concept at the application level. (Note : this is a challenge ; convince me!)

        • For encryption to be useful as a protection against e.g. laptop theft, it needs to be active at the operating system or platform level, so that all data in non-volatile storage is encrypted. Users can't
        • All-or-nothing thinking has its place in computer security, but it isn't the only kind of security there is. Consider the case of clerical workers who routinely handle data that can be used to commit petty fraud. A certain amount of trust is necessary, and a certain amount of dishonesty is inevitable. The security situation here is not an all-or-nothing one: it is bad for the company when a clerk pulls off a petty scam, but it is bad in a limited, survivable way. It is also impossible to have perfect s
    • Disingenuous (Score:5, Informative)

      by golodh (893453) on Sunday August 26, 2007 @03:52PM (#20364489)
      I see four questions here:

      -Q(1) What does Rodriguez's article show?

      -Q(2) is OOXML in and by itself flawed?

      -Q(3) What's the practical relevance of the question whether OOXML is flawed?

      -Q(4) So what's in it for Microsoft? Why do they bother?

      -

      - Q(1) : What does Rodriguez's article show?

      - A(1) : Rodriguez's article shows that the OOXML format written by the latest Microsoft Office applications, among them MS Excel, is:

      - sorely defective in that you can't be sure to get your original data back after saving it to OOXML

      - impossible to change outside MS Office applications

      - tied to the MS Office way of representing internationalised versions of documents because "of the way Microsoft chose to store XML using the US English locale, no matter how good your implementation is, you have to retrofit it to work just like Office does" in order to accommodate internationalised documents

      - burdened with MS Office legacy formats, which are supported throughout and greatly (and unnecessarily) contribute to the size and complexity of the 6,000 page standard.

      - Q(2): Is OOXML flawed in and by itself?

      - A(2):Yes, I think so, partly because of Rodriguez's article, partly because of flaws documented elsewhere: see http://www.noooxml.org/petition [noooxml.org] The points 2,3,4,5 listed there seem especially crippling to me:

      (2) There is no provable implementation of the OOXML specification: Microsoft Office 2007 produces a special version of OOXML, not a file format which complies with the OOXML specification;

      (3) There is information missing from the specification document, for example how to do a autoSpaceLikeWord95 or useWord97LineBreakRules;

      (4) More than 10% of the examples mentioned in the proposed standard do not validate as XML;

      (5) There is no guarantee that anybody can write software that fully or partially implements the OOXML specification without being liable to patent lawsuits or patent license fees by Microsoft;

      - Q(3): What's the practical relevance of the question whether OOXML is flawed?

      - A(3): Enormous. We currently see that Microsoft is trying to convince the world to accept OOXML as an ISO "standard", whereas it's no such thing. It's too loosely defined and, unlike the existing OpenDocument standard, it has no open-source reference implementation. So there will be a morass of possible implementations, of which only Microsoft's own implementations will be guaranteed mutually compatible. That's a polite way of saying that Microsoft simply aims at continuing its format lock-in, only this time under the name of OOXML.

      - Q(4) : So what's in it for Microsoft? Why do they bother?

      - A(4) : Well ... Microsoft has a policy whereby it quite explicitly does not want other people's software, let alone Open Source software, to render MS Office documents correctly.

      For reference, see this email, (cited from Rodriguez's article):

      From: Bill Gates

      Sent: Saturday, December 5 1998

      To: Bob Muglia, Jon DeVann, Steven Sinofsky

      Subject : Office rendering

      One thing we have got to change in our strategy - allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company.

      We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities.

      Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destroy Windows.

      I would be glad to explain at a greater length.

      Likewise this love of DAV in Office/Exchange is a huge problem. I would also like to make sure people understand this as well.

      Is that

    • It only shows that apparently MS's software isn't able to handle OOXML properly.

      The question is whether they have any intention of supporting it "properly".

      I say the answer is a big "no". Their XML is just a thin ASCII veneer applied to their existing format.

      The only reason for making OOXML was political; they never had any intention of it being useful to anybody except Microsoft.

      Users of OOXML will be just as locked in to Office as if they kept right on using the old binary format.
  • Personally.. (Score:5, Interesting)

    by nrgy (835451) on Sunday August 26, 2007 @08:42AM (#20361391) Homepage
    Personally I like this link [slated.org] (pdf) in the article.

    From: Bill Gates
    Sent: Saturday, December 5 1998
    To: Bob Muglia, Jon DeVann, Steven Sinofsky
    Subject : Office rendering

    One thing we have got to change in our strategy - allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company.

    We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities.

    Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destroy Windows.

    I would be glad to explain at a greater length.

    Likewise this love of DAV in Office/Exchange is a huge problem. I would also like to make sure people understand this as well.

    I'm not saying this as some Linux nut job, but it's things like that which just drive me nuts. Regardless of whichever OS I prefer, that kind of thinking just boils my blood.

    How can any committee deciding on open standards take seriously a company which has been proven time and time again to play by its own rules, and whose offerings labeled OPEN are about as open as the doors to Fort Knox are to the average person?

    • by dpilot (134227)
      >How can any committee deciding on open standards seriously take a company which has been
      >proven time and time again to play by its own rules and whenever it offers something labeled
      >OPEN it's about as open as the doors to Fort Knox are to the average person.

      Plain and simple, arm twisting and blackmail, though both are no doubt couched in far more polite and legal-sounding terms. Microsoft-apology has become the dominant counter-culture on Slashdot of recent. But the fact remains that in spite of
    • Yeah, that mail explains what Sharepoint is really for - and why it is part of the Office line.
  • I don't believe OOXML should be a standard, but it seems to me to be pretty nit-picking to complain that numeric values are stored with "rounding errors" since that is inherent in converting between ASCII values and any binary format, including IEEE-standard floats. How does ODF handle this? It explicitly defines how the conversions are to be done? Or it caches the string the user typed?

    Other than that, most of the other stuff he talks about is rather damning.
    • by The New Andy (873493) on Sunday August 26, 2007 @09:30AM (#20361625) Homepage Journal
      The relevant code from an ODF spreadsheet:

      <table:table-row table:style-name="ro1">
        <table:table-cell/>
        <table:table-cell office:value-type="float" office:value="123456.123456789">
          <text:p>123456.12</text:p>
        </table:table-cell>
      </table:table-row>
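
For anyone who wants to poke at that fragment, here is a rough sketch of reading the stored value and the displayed text separately with Python's standard `xml.etree.ElementTree`. The namespace declarations are added by me so the fragment parses on its own (the snippet above omits them), and `read_cells` is a made-up helper name, not part of any ODF library.

```python
import xml.etree.ElementTree as ET

# OpenDocument namespace URIs; declared here because the pasted fragment omits them.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

ROW = """
<table:table-row
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    table:style-name="ro1">
  <table:table-cell/>
  <table:table-cell office:value-type="float" office:value="123456.123456789">
    <text:p>123456.12</text:p>
  </table:table-cell>
</table:table-row>
"""

def read_cells(row_xml):
    """Return (stored_value, displayed_text) for every cell that carries a value."""
    row = ET.fromstring(row_xml)
    value_attr = "{%s}value" % NS["office"]
    cells = []
    for cell in row.findall("table:table-cell", NS):
        stored = cell.get(value_attr)
        if stored is None:
            continue  # empty cell, nothing stored
        shown = cell.findtext("text:p", default="", namespaces=NS)
        cells.append((stored, shown))
    return cells

cells = read_cells(ROW)
```

The full-precision number lives in the `office:value` attribute and the two-decimal rendering lives in `text:p`, so a reader never has to reverse-engineer one from the other.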
      • Separating the value and the display solves the problem. As long as the value stored is preserved, other programs can work with it without introducing arbitrary changes. That M$ does not store the exact value and relies on the reader to make the same rounding error is crazy. It's a trap for every system that is not M$, and might not even work across different processors for M$.

        I've run into this problem in my own work, where it did not matter. A data acquisition system I used required Winblows. It cou

    • Re: (Score:3, Interesting)

      by putaro (235078)
      No, this is a pretty reasonable thing to point out. It wasn't a value that was undisplayed. When you look at the cell it shows it (in decimal) as 1234.1234 (without the cell rounding). So it shows you that on the screen but doesn't store it properly in the XML file. I would say it's a problem. If it were stored as a binary floating point number in the XML I'd say you might have a point, but if it's displayed on the screen in decimal and then the decimal value in the file is different, that's pretty bro
      • by epine (68316)

        Did I just eat too many syrup coated waffles? He's telling me the rounding error is 10^-4 or 10^-5 on values with more trailing nines than I can count between sugary blinks. Not long ago I came across a slide presentation from a HEP lab concerning C++0x with a slide proclaiming that decimal floating point in hardware was the wave of the future. Now while I don't see any numerical advantage to this change, it will probably reduce the number of floating point gurus who gouge their own eyes out after rubbi
    • by jbengt (874751)
      When you use MSExcel, you type in decimal numbers, represented by ASCII (ANSI?) characters.
      You expect to get that stored exactly in the ANSI characters of the XML file.

      And you can store IEEE floating point numbers exactly using ASCII characters.
      (after all, you can code binary as a series of ASCII "0"s and "1"s)
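
A quick sketch of that claim in Python, using only built-ins. The value 1234.1234 is borrowed from elsewhere in this thread purely as an example. It shows three ASCII encodings of a double that all restore the identical binary value: the hexadecimal form, the shortest round-tripping decimal that `repr` produces, and a 17-significant-digit decimal.

```python
value = 1234.1234  # example value only

as_hex = value.hex()                  # hexadecimal text form of the exact binary value
as_decimal = repr(value)              # shortest decimal string that round-trips
as_17_digits = format(value, ".17g")  # 17 significant digits always round-trip a double

# Parse each text form back into a float.
restored = [float.fromhex(as_hex), float(as_decimal), float(as_17_digits)]
all_exact = all(r == value for r in restored)
```

So losing bits on the way into a text file is a choice of serialization, not something ASCII forces on you.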
    • by HeroreV (869368)
      Only an extremely poor programmer could not understand that it's possible to represent and work with numbers of arbitrary complexity. IEEE floats are fast and easy, but that doesn't mean it's impossible to represent a number that can't be fit into an IEEE float.

      We've had lots of well tested relatively fast bignum libraries for years. Introducing rounding errors in a spreadsheet without being explicitly told by the user that such errors are allowed is absolutely unacceptable.
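
As a small illustration of that point, here is a sketch using Python's standard `decimal` module as a stand-in for "a bignum library"; it says nothing about what any spreadsheet actually uses internally.

```python
from decimal import Decimal

# Binary doubles cannot represent 0.1 or 0.2 exactly, so the sum drifts.
binary_sum = 0.1 + 0.2
binary_is_exact = (binary_sum == 0.3)

# Decimal arithmetic on the strings the user typed stays exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))

# Converting the double itself exposes the value that is really stored.
stored_tenth = Decimal(0.1)
tenth_differs = (stored_tenth != Decimal("0.1"))
```

The trade-off is speed: decimal arithmetic is slower than hardware floats, which is presumably why spreadsheets default to doubles.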
    • by Eivind (15695)
      That wasn't the problem. The problem is the numbers are stored with rounding-errors *AND* Excel contains some undocumented method of consistently correcting this and displaying the number as originally entered.

      This method is not documented in the standard. Thus *other* programs that want to read Excel-files have to resort to guesswork to do a very basic thing that Excel does: Display a number that was entered by the user, the way the user entered it.
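
To make the guesswork concrete, here is one plausible mechanism sketched in Python. To be clear, this is my assumption about the kind of thing that could be going on (write 17 significant digits to the file, show 15 on screen), not Excel's documented behaviour, and 0.1 is used only because its digits are well known.

```python
def stored_form(x):
    """Hypothetical writer: 17 significant digits, enough to round-trip a double."""
    return format(x, ".17g")

def display_form(x):
    """Hypothetical display rule: 15 significant digits hide the binary noise."""
    return format(x, ".15g")

entered = "0.1"             # what the user typed
x = float(entered)          # nearest binary double
stored = stored_form(x)     # '0.10000000000000001'
shown = display_form(x)     # '0.1'

# A reader that knows the display rule recovers the user's text;
# a reader that does not must guess how many digits to trust.
recovered = display_form(float(stored))
```

If the standard simply stated a rule like this, third parties would not have to guess.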

      This means if we both get sent a valid OOXML-document, and yo
  • Can anyone repro? (Score:2, Interesting)

    by figleaf (672550)
    I tried to repeat the cell changes experiment but I do not see the Excel error.

    I bet Mr. Stephane is not saving the sheet XML in UTF-8.
    The header of the XML file says it's UTF-8, but he might be saving it without the UTF-8 BOM header.

    • by gardyloo (512791) on Sunday August 26, 2007 @09:22AM (#20361583)
      Interesting experiment. However, I suggest you do not title your posts "Can anyone repro?" on Slashdot. The answers you get may be, well, .... exciting and very, very scary.
    • Re:Can anyone repro? (Score:4, Informative)

      by YA_Python_dev (885173) on Sunday August 26, 2007 @10:49AM (#20362049) Journal

      The header of the XML file says it's UTF-8, but he might be saving it without the UTF-8 BOM header.

      So? It's still perfectly valid XML even without the BOM. XML is a real standard, and I suggest you read it; it's not Notepad.

      And don't even start talking about malformed UTF-8 since he only used characters in the ASCII subset, so even saving it as Latin-1 would have generated valid XML.

    • by Karellen (104380) on Sunday August 26, 2007 @11:00AM (#20362113) Homepage
      Uh, UTF-8 files do not need a BOM. What the fuck is the point of a byte-order-mark on an encoding that is byte-order neutral?

      One of the advantages of UTF-8 for text files is that you don't need a BOM. With XML it's even easier because, as you point out, the XML declaration ("XMLDecl" in the spec) header can contain the "EncodingDecl" to tell you explicitly that the file is in UTF-8. If the EncodingDecl says UTF-8, and the file is encoded in UTF-8, then if an XML parser cannot handle that, it's seriously fucked and needs to be fixed.

      You might also want to go read STD-63 at some point. It points out that there are a few problems with using BOMs in UTF-8, and that if there is a way for UTF-8 to be determined in a way other than with the use of a BOM, that should be used instead. Given that XML specifically includes support for an "EncodingDecl" in the "XMLDecl", it is clear that best practices dictate that you *shouldn't* use a BOM when working with UTF-8 encoded XML files. Even if your tools _insist_ on writing BOMs to such files, they had *better* still be able to work if the BOM is missing.

      Heck, with OOXML, you could also use the ZIP's manifest file to keep track of file metadata like the character encoding.
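
A quick way to check the BOM argument, sketched with Python's standard `xml.etree.ElementTree` (which is backed by expat). The tiny `<sheet>` document is made up for the test and is not a real SpreadsheetML part.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical minimal document; the encoding is declared in the XMLDecl.
doc = b'<?xml version="1.0" encoding="UTF-8"?><sheet><c>42</c></sheet>'
doc_with_bom = codecs.BOM_UTF8 + doc  # same bytes, prefixed with EF BB BF

# A conforming parser accepts both byte streams.
root_plain = ET.fromstring(doc)
root_bom = ET.fromstring(doc_with_bom)

print(root_plain.find("c").text, root_bom.find("c").text)
```

Both parses yield the same tree, so a missing BOM alone should not explain a file being rejected.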
      • by HeroreV (869368)

        What the fuck is the point of a byte-order-mark on an encoding that is byte-order neutral?

        So you know what the character encoding is, just like STD 63 aka RFC 3629 explains. It's not a very good method of specifying the character encoding, but it's better than not saying anything at all, which most text files do. Whenever there's another method, as there is in XML, you should use it (without the BOM), but there often is no other way to specify the character encoding.

        I remember a time when I was working on a UTF-8 text file with lots of characters that weren't ASCII-compatible. One day I opened

  • Well, I suppose that's an improvement over Vista, "defective by nature." I can just imagine Bill Gates stamping his foot and crying. "Defective? I meant to do that!"
  • by SwashbucklingCowboy (727629) on Sunday August 26, 2007 @10:21AM (#20361871)

    For example, the part about "Entered versus stored values" is certainly valid (though I wonder if that's not a problem with Excel itself, and not the format). The complaint about the date format is also on the money.

    However, other things seem either wrong or have a bias towards hand editing of the files, e.g. "International, but US English first and foremost". He complains that it uses U.S. English settings. He may not like the U.S., but it's called picking a canonicalized format. Consider the alternative for implementing this in software, parsing of the values in the XML would now depend on settings also found in the XML. That would be insane.
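
The canonical-format argument can be sketched as follows: values are stored in one fixed, locale-free spelling and only converted to local conventions at display time. The `DISPLAY` table, the function names, and the choice of ISO 8601 dates are assumptions made for this illustration; they are not what OOXML specifies.

```python
from datetime import date

# Hypothetical display conventions; a real application would ask the OS.
DISPLAY = {
    "en_US": {"decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de_DE": {"decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
}

def parse_stored_number(text: str) -> float:
    """Stored numbers always use '.', whatever the author's locale was."""
    return float(text)

def show_number(value: float, loc: str) -> str:
    """Swap in the local decimal separator only when displaying."""
    return repr(value).replace(".", DISPLAY[loc]["decimal"])

def show_date(iso_text: str, loc: str) -> str:
    """Render a canonically stored date in the reader's local layout."""
    d = date.fromisoformat(iso_text)
    return DISPLAY[loc]["date"].format(y=d.year, m=d.month, d=d.day)

print(show_number(parse_stored_number("1234.5"), "de_DE"))  # 1234,5
print(show_date("2007-08-26", "en_US"))                     # 08/26/2007
print(show_date("2007-08-26", "de_DE"))                     # 26.08.2007
```

The parser never has to look at locale settings stored elsewhere in the file, which is the property the comment is defending. Whether that canonical form should be a US-specific one is a separate question.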

    • Re: (Score:3, Insightful)

      by kabz (770151)

      He may not like the U.S., but it's called picking a canonicalized format. Consider the alternative for implementing this in software, parsing of the values in the XML would now depend on settings also found in the XML. That would be insane.

      Here's a reference to XML DTDs [w3schools.com]. This is exactly what should be used for defining localized formula names etc. With XML, you might not be able to do much with it, but given a 'real', properly defined XML format, it should at *least* be possible to parse all the information in the damn thing!!

      Why use a DTD?

      XML provides an application independent way of sharing data. With a DTD, independent groups of people can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify that data that you receive from the outside world is valid. You can also use a DTD to verify your own data.

      A lot of forums are emerging to define standard DTDs for almost everything in the areas of data exchange. Take a look at: CommerceNet's XML exchange and http://www.schema.net./ [www.schema.net]

      Where is a DTD referenced? That's right, at the top of the XML file.

    • by Jeremy_Bee (1064620) on Sunday August 26, 2007 @04:49PM (#20364991)

      However, other things seem either wrong or have a bias towards hand editing of the files, e.g. "International, but US English first and foremost". He complains that it uses U.S. English settings. He may not like the U.S., but it's called picking a canonicalized format.
      This is offensive bull.

      I don't think you intended it that way, but you should be aware of the vast number of people you just insulted. US English and US dates are only "canonical" in the minds of US citizens. If not for Microsoft purposely and determinedly screwing up the implementation of anything but US standards in their software the usage would have no traction at all.

      The majority of the "English speaking" world still uses the English language and English formats and standards, not US variant ones. The fact that the USA has seen fit to re-invent English, still refer to that as English, and then foist it on the rest of the world doesn't make it "canonical."

      As the author of this article so aptly describes, date formats and language implementations are a multi-stage nightmare in Office. To the point that the majority of users even in English speaking countries like Canada, Australia, New Zealand and the UK itself, often end up using American English and American dates simply because Office is the only game in town and you can only bash your head against the wall on these things for so long. That doesn't make it right, and that doesn't mean that those users wouldn't be happier and more productive if they were not forced to use a US standard when they may have not even traveled to the US.

      Any kind of English except the US variant, is severely broken in Office and always has been. Your answer sounds to me a lot like: "So what, they should all be using our standards and language anyway." Not helpful at all, and illogical as well.
      • This is offensive bull. I don't think you intended it that way, but you should be aware of the vast number of people you just insulted.

        As a manager of mine once said, sometimes people TAKE offense. They take it where none was given. Sometimes people are looking to be offended - I suspect that's true in your case.

        A CANONICAL format is generally preferred for storing data (e.g. storing time in GMT and then adjusting for local time). MS picked U.S. English as the canonical format for OOXML. They could

    • by drew (2081)
      Actually, the entered vs. stored values seemed like the least compelling part of the link to me. It sounds like it's probably just a matter of using IEEE floats to store values. Unless the spec never defines how floating point values are to be stored, or the implementation differs from what is described, this appears to me to be a red herring. In either case, we need more information, as I am not particularly interested in digging through the spec myself.
  • Foresight (Score:4, Informative)

    by akaiONE (467100) on Sunday August 26, 2007 @11:53AM (#20362489) Homepage Journal
    "..Next week members of the International Standard Organization are likely to approve the format as a second official ISO standard for office documents.."

    Err.. Next week news called, they want their draft story back.

    There is no certain outcome of next week's vote, and the fact that we are even discussing the defects of OOXML is proof that the ISO body will have many problems just waving this through. Please refrain from taking sides just because this is a 'Microsoft-standard'.

    I'd say it's possible that OOXML will NOT be approved next week. It will probably have to take the long road through the ISO as a real standard proposal, not just a fast-tracked 6000 page gorilla.
    • Microsoft is manipulating many members of the ISO technical committee into voting 'yes', and even rigging the voting process in several countries.

      The voting process is 'defective by design' as votes from each country must be unanimous. Microsoft is a member in most (if not all) countries and will always vote 'yes'. This means that the vote can only be 'yes' or (when no unanimous vote can be reached) 'abstain'. All other votes will be declared invalid, and only the 'yes' votes count. Still believe the outcom
  • Call me a cynic (Score:3, Insightful)

    by PinkyGigglebrain (730753) on Sunday August 26, 2007 @12:06PM (#20362577)
    I already know how this is going to turn out.

    OOXML will be voted in as an ISO standard.

    Third-party vendors trying to implement the "standard" will waste time, money and effort and accomplish nothing of import.

    MS will continue as normal, claiming support for open standards while locking anyone they can into formats/software they own.

    ODF will continue as a marginalized format used by people on the "fringe".
    • I've seen this show before. A horrible standard (XML or otherwise) can have repercussions that transcend the evil acts of one company, but it can also backfire on them simultaneously. That company may have an edge if the format takes off, but usually, the way it ends up is with that company supporting its proprietary format under two names: the old proprietary name, and a "public" name. Same crap, new umbrella.

      I've been burned by a poorly written XML specification before, which was tightly coupled to one ve
  • What his first item seems to come down to, when you put all the parts together, is that if you want to change data in a spreadsheet that contains formulas that reference other cells, you have to write a program that understands spreadsheets that contain formulas that reference other cells, so you can make sure to update the reference and dependency information.

    Well, duh!

    • Exactly. He ignored the spec, made whatever changes seemed okay to him, and produced a non-standards-conforming document. With that kind of "I don't have time to read the spec" attitude you have to wonder how he came to care about international standards at all. It's actually a *good thing* (in the long run) for programs to reject clearly nonconformant documents instead of attempting to render them.

      Some of his other points look good, though, and in any case, it just strains credulity that Microsoft would
    • Not so much. (Score:3, Insightful)

      He wanted to remove a formula from a given cell. His first attempt was to simply remove the formula and change the value.

      Instead, he has to go update all the reference and dependency information, which programs have to generate and update all the time anyway. I can't really think of a good reason this information needs to be saved to disk, and I certainly can't think of a good reason that Excel deletes the cell, rather than updating the dependencies itself to reflect the physical document.

      In fact, I can't t
  • Except he doesn't. (Score:4, Informative)

    by miguel (7116) on Sunday August 26, 2007 @02:12PM (#20363627) Homepage

    Stephane has for a long time presented a weak case against Office Open XML.

    "1) Self-exploding spreadsheets"

    His top issue "1) Self-exploding spreadsheets" has been discussed on Brian Jones' weblog:

    http://blogs.msdn.com/brian_jones/archive/2007/08/ 15/why-there-s-no-microsoft-in-open-xml.aspx [msdn.com]

    It boils down to this: the fact that it is XML does not mean that you can modify it in any way you want. There are rules for modifying the schema, and Mr Stephane is not happy with that. Had he followed the actual rules, he would have had no issue.

    This is a case where two locations must be updated per the spec. He can avoid updating the two locations by removing the calcChain.xml file (which is optional, and which Excel will reconstruct). He later gets upset because, if he does that, he claims performance on load will be slower.
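
Why the calculation chain part can be optional: the evaluation order is derivable from the formulas themselves. The sketch below assumes a toy formula syntax with plain `A1`-style references and uses `graphlib.TopologicalSorter` (Python 3.9+). It is not Excel's actual algorithm, and the cell names are hypothetical.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Toy pattern for A1-style references (assumption: no ranges, no sheet names).
CELL_REF = re.compile(r"\b[A-Z]+[0-9]+\b")

def calculation_order(formulas: dict) -> list:
    """Return formula cells in an order where every input is computed first."""
    # Map each formula cell to the set of cells it reads from.
    graph = {cell: set(CELL_REF.findall(text)) for cell, text in formulas.items()}
    order = TopologicalSorter(graph).static_order()
    # Plain value cells (A1, B1) need no calculation, so drop them.
    return [cell for cell in order if cell in formulas]

formulas = {"E1": "D1-A1", "C1": "A1+B1", "D1": "C1*2"}
order = calculation_order(formulas)
print(order)  # ['C1', 'D1', 'E1']
```

Rebuilding this on load costs time proportional to the number of formulas and references, which is the performance trade-off mentioned above.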

    "2) Entered versus stored values"

    His second point, "2) Entered versus stored values", is an interesting distinction between entered values and stored values. It reflects the way that Excel works (and so does Gnumeric): it stores values instead of the data that was entered by the user. This responds to the need of the spreadsheet to do something interesting with the data; for example, when you enter a date, it is stored as a number with a format applied, not as a string. This allows computations on dates to happen based on the underlying numeric value. The feature is used extensively by spreadsheets.
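
A small sketch of the "date stored as a number" idea, assuming Excel's 1900 date system and dates on or after 1 March 1900 (earlier serials are shifted by the well-known fictitious 29 February 1900). The function names are made up for the example.

```python
from datetime import date, timedelta

# Day zero chosen so that serials match Excel's 1900 system from 1900-03-01 on.
EXCEL_EPOCH = date(1899, 12, 30)

def serial_to_date(serial: int) -> date:
    """Convert a stored serial number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)

def date_to_serial(d: date) -> int:
    """Convert a calendar date to the serial number a cell would store."""
    return (d - EXCEL_EPOCH).days

stored = date_to_serial(date(2007, 8, 26))  # what the cell holds
one_week_later = serial_to_date(stored + 7) # date arithmetic is integer math
print(stored, one_week_later)               # 39320 2007-09-02
```

Because the cell holds a plain number, adding days or subtracting two dates needs no string parsing; the date format is applied only for display.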

    In the Excel/gnumeric case you have to generate a single value, in the ODF case you must generate and update the two values (which just a point before, Stephane was having a seizure about).

    The precision issue that he brings up, I suspect is merely an issue with double format precision. He claims that the data is unusable and there is a loss of precision, but handing that out to a C compiler will produce the expected result with no loss of precision. I do not know how "atof" or the compiler work internally to cope with this issue, but at least my libc/gcc combo does not have this problem.

    I would not be surprised if this is an artifact of floating point, someone with more background on doubles and floating point math could probably answer the question with more authority, but a cursory read of "What Every Computer Scientist Should Know about Floating Point" seems to validate that there is no error in the floating point representation for the values that he uses: http://docs.sun.com/source/806-3568/ncg_goldberg.h tml [sun.com]
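
The floating-point suspicion can be tested directly. In the sketch below, the input `"0.1"` is a hypothetical entered value, and rounding to 15 significant digits is an assumed recovery rule, not documented Excel behavior.

```python
entered = "0.1"          # hypothetical text typed by the user
stored = float(entered)  # nearest IEEE 754 double, as atof/strtod would give

full = format(stored, ".17g")   # 17 digits always identify the double uniquely
short = format(stored, ".15g")  # 15 digits hide the binary rounding noise

print(full)   # 0.10000000000000001
print(short)  # 0.1
```

Parsing `full` returns exactly the same double, so nothing is lost in the file; the long digit string is just the honest decimal spelling of a binary value. What a reader of the file must know is that rounding to about 15 significant digits recovers the text as entered.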

    3) Optimization artefacts become a feature instead of an embarrasment

    His 3rd point is open for debate, like the 1st case, we have a case where he has to handle things differently. Stephane sells a commercial product to handle Excel files and I suspect that his product has to cope with the same patterns in different ways, which has naturally upset him. OOXML might be inspired by Excel's needs, but it does not mean that it has to be a 1-to-1 match.

    4) VML isn't XML

    VML is labeled as "deprecated" in the OOXML documentation (Section 8.6.2, page 25) and it states: "The VML format is a legacy format originally introduced with Office 2000 and is included and fully defined in this Standard for backwards compatibility reasons. The DrawingML format is a newer and richer format created with the goal of eventually replacing any uses of VML in the Office Open XML formats. VML should be considered a deprecated format included in Office Open XML for legacy reasons only and new applications that need a file format for drawings are strongly encouraged to use preferentially DrawingML."

    So the standard basically says "VML is still in use, but it's better to use DrawingML". Stephane misconstrues the above statement and tries to portray this as evil

    • by spectecjr (31235)
      I only wish that the Java runtime specs back in the 1.0 days had gotten this much attention. Having to implement UI controls three different ways for Netscape's JVM vs. the Sun JVM vs. the Microsoft JVM was a painful experience.

      I also wish that Stephane didn't have such an obvious chip on his shoulder. It's completely destroying his credibility, and making him come to some incredibly sloppy conclusions.

      I took a look at his examples. "The calc chain doesn't work if you modify the data". Oh wow. You mean, the
    • So what you seem to be saying is that, if OOXML gets voted in as a standard, MS-Office or at least Excel might fail OOXML certification? That is probably good news because, if MSFT backs a truck full of money on the driveway of these committees and buys their votes, there is a second avenue open to challenge the adoption of MS-Office, claiming it does not comply with the OOXML standard. Right?
  • As was explained in A Beautiful Mind, there's no point everyone hitting up the hot blond who is going to reject you anyway, when your failure to achieve your first objective then compromises your chances to succeed with a second objective. Does anyone here think that a total annihilation of Open XML is in the cards? The point of that scene is not that it's a particularly good expression of Nash equilibrium, but that even a blond can understand it, which indirectly serves as a good example of settling for se
  • I have great doubt that Microsoft fixed the issues with OOXML in this short period of time so one must ask why is this format being addressed again so quickly?! One has to wonder if the ISO simply realizes that no matter what they do Microsoft will just keep pushing this until they finally get a yes vote. Are they simply caving?

    If they agree to this it will simply be plain evidence that they are being influenced or the members are not competent. I'm of the opinion that they need to force a delay between
  • This shows that neither OO.o nor K-Office handle ODF faithfully, nor are they compatible with each other.
    http://develop.opendocumentfellowship.org/testsuit e/summary.html [opendocume...owship.org]

    Also, OO.o adds things to its files that are outside of the ODF spec. If MSO's files aren't true OOXML files, then OO.o's files aren't true ODF files either.

    Same situation as many other standard formats, such as HTML. Different apps handle formats differently, and often not 100% faithful to the spec.
