Dark Corners of the OpenXML Standard

Posted by CowboyNeal on Thu Jan 04, 2007 11:54 PM
from the dared-to-comply dept.
Standard Disclaimer writes "Most here on Slashdot know that Microsoft released its OpenXML specification to counter ODF and to help preserve its market position, but most people probably aren't aware of all the interesting legacy code the OpenXML specification has brought to light. This article by Rob Weir details many of the crazy legacy features in the dark corners of OpenXML. As it concludes after analyzing specification requirements like suppressTopSpacingWP, 'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'"
This discussion has been archived. No new comments can be posted.
  • It's not a true standard... (Score:4, Funny)

    by creimer (824291) on Thursday January 04 2007, @11:58PM (#17469738)
    (http://www.creimer.ws/ | Last Journal: Friday January 26 2007, @12:40PM)
    Until it supports WordStar [wikipedia.org] documents.
    • Bah! by mrchaotica (Score:3) Friday January 05 2007, @12:58AM
      • Re:Bah! by Solra Bizna (Score:1) Friday January 05 2007, @01:28AM
        • Re:Bah! by MrMr (Score:2) Friday January 05 2007, @07:19AM
    • Changing file formats and Archiving by Alien54 (Score:2) Friday January 05 2007, @01:08PM
  • Length (Score:4, Funny)

    by jcnnghm (538570) on Friday January 05 2007, @12:08AM (#17469792)
    I don't know why anyone would complain, the spec is only 6,000 pages long.
    • Size (Score:5, Funny)

      I don't know why anyone would complain, the spec is only 6,000 pages long.

      And the best part is, these [umn.edu] are the pages it uses... (I mean, why else do those specs cost so much?)
    • Re:Length by MillionthMonkey (Score:2) Friday January 05 2007, @12:59AM
      • Re:Length by Heir Of The Mess (Score:2) Friday January 05 2007, @01:26AM
        • Re:Length by chthon (Score:3) Friday January 05 2007, @02:42AM
        • Re:Length by redcane (Score:2) Friday January 05 2007, @02:47AM
        • Re:Length by 99BottlesOfBeerInMyF (Score:3) Friday January 05 2007, @09:51AM
          • Re:Length by Heir Of The Mess (Score:2) Saturday January 06 2007, @03:15AM
        • Re:Length by computational super (Score:2) Friday January 05 2007, @11:25AM
    • MS are slow learners (Score:4, Interesting)

      by WebCowboy (196209) on Friday January 05 2007, @02:14AM (#17470484)
      ...but they do learn....slowly...eventually.

      Their "open" XML format for office docs is a prime example of this.

      I think Steve Jobs was the one who first said "Microsoft just doesn't get it". Microsoft was probably the very first third-party software developer for the Mac and this was Jobs' reaction to Microsoft's first Mac applications (I think a port of Multiplan--which was re-incarnated into Excel IIRC, and MSBasic). They really WERE "tasteless", ugly and took almost no advantage of the revolutionary GUI interface--their DOSness really showed through--I think in the case of Multiplan the mouse could be used only to jump the cursor to a certain cell and that was it--the rest was all like in DOS.

      MS Windows is another example--Microsoft didn't "get it" well enough until the third major release. Now MS is SLOWLY "getting it" with the beneficial characteristics of XML standards. Microsoft's early XML efforts are like Windows 1.0--there is some very rudimentary understanding of the mechanics but not the philosophy of XML, and I wonder if this is why SOAP ended up NOT so simple (given Microsofties were involved in its creation and seemed to be trying to make it a DCOM-in-XML-but-dumber thing). Microsoft's "Version 1" XML might look like this:

      <Soap:Envelope>
      <Soap:Body>
      <wsWriteLegacyData>
        <encodedBinaryData>
      SDFgkdfkljSDFJLDFSJKLkjdfbks df jklsdfklj;hk/jkjnb.kndf
      jk.sdfjkldfsddfsdfkkjsdfh kvbkjnkjkjksdfkjsdfkeuieru903
      oijooeoefvkmefmklef lmkseflkvfeklmlmermklemleflmdvldflk
        </encodedBinaryData>
      </wsWriteLegacyData>
      </Soap:Body>
      </Soap:Envelope>
      "See? We're using XML and SOAP! We're hip! We're cooool! You can't say we don't play by the rules now!"

      Of course, this is an obtuse, opaque and obfuscated way to use XML and totally NOT in the spirit of interoperability and openness. I won't even go into the nifty XML tools MS has made...nifty to use but they've done a lot to obliterate the S out of SOAP in their crazy output.
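To make the complaint about blobs concrete, here is a minimal Python sketch. Both payloads, their element names, and the byte string are invented for illustration and are not taken from any real Microsoft service. It only shows that generic XML tooling can see individual fields in a structured payload, whereas a payload wrapping encoded binary data exposes a single opaque string that needs out-of-band knowledge to interpret.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payloads, invented for illustration only.
OPAQUE = """<envelope><body><writeLegacyData>
<encodedBinaryData>{blob}</encodedBinaryData>
</writeLegacyData></body></envelope>"""

STRUCTURED = """<envelope><body><invoice>
<customer>ACME</customer><total currency="USD">42.50</total>
</invoice></body></envelope>"""


def leaf_fields(xml_text):
    """Return {tag: text} for every element that has no child elements."""
    root = ET.fromstring(xml_text)
    return {el.tag: (el.text or "").strip() for el in root.iter() if len(el) == 0}


# An arbitrary byte string standing in for an application-specific record.
blob = base64.b64encode(b"\x01\x00ACME\x00\x9a\x10").decode("ascii")

# Generic tooling sees one field here: the blob itself.
opaque_fields = leaf_fields(OPAQUE.format(blob=blob))

# Here the same tooling sees the individual values.
structured_fields = leaf_fields(STRUCTURED)

print(opaque_fields)
print(structured_fields)
```

The customer name is present in both payloads, but only the structured one lets an XPath query, a schema, or a transform reach it without knowing the private binary layout.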

      OOXML (Opaque and Obfuscated XML) standard is "version 2.0"--they're doing their best to eliminate ambiguity but now we've gone over to hyper-specificity, and the standard is being shared a bit better...problem is that they don't fully describe the interpretation of the standard elements so as to keep its advantage. All they've done is taken every formatting option and mapped it to an XML element--it is monolithic and completely non-extensible. But hey, at least it's publicly available and doesn't involve weirdness like encoded-binary-blobs.

      In a few years MS will reach version 3.0 of "getting" XML...
  • The author is exactly right. (Score:4, Insightful)

    by JoshJ (1009085) on Friday January 05 2007, @12:09AM (#17469800)
    This is why the Microsoft Office XML (let's not kid ourselves, this is far from "open") format should not become an ISO standard.
  • The power of legacy systems... (Score:5, Insightful)

    by Anonymous Coward on Friday January 05 2007, @12:09AM (#17469804)
    The power of legacy systems is at once both Microsoft's greatest strength and greatest weakness. Nobody in OSS is going to have the patience to rebuild the same level of backwards compatibility needed to displace them but the code must be an absolute tarpit of accumulated cruft and security holes that's incredibly difficult for them to keep going.
    • Re:The power of legacy systems... by blincoln (Score:2) Friday January 05 2007, @01:40AM
      • The hitch here is that *not* having them means tons and tons of reverse engineering, and that's only after tracking down every release of every version of every MS Office ever.

        The real hitch, as the article hints, is that the releases are contradictory. For instance, the Mac version of small caps is different from others. This is part of the reason Word is so bloated and does not preserve printing type setting from one machine to the next.

        Ten years ago, a state agency I was working for was forced to move from Word Perfect to Word. Hundreds, if not thousands, of documents were painstakingly converted from one format to the other. The typesetting, which they had never had a problem with previously, was easily broken by moves from one machine to the other or by changing printers. That is the kind of thing that no program can account for - it was broken from then and can not be created correctly today. It's also probably the reason for all of the nebulous "guidance" sections that don't tell you anything other than to look at, and presumably measure, old printed examples. Not even M$ knows what it was really doing in the field. As I saw at the time, no two were alike.

        Of course, the time to get things right is not in your XML; it's when you import the document. The author tells us this in so many words. The XML should be general enough to encompass any kind of typesetting. It is the importing program's task to figure out what the old format wanted things to look like. As the author points out, the spec does not do anything other than create something impossible to follow. It's not going to magically make things look right no matter how hard they wish it would.

      • Re:The power of legacy systems... by clodney (Score:2) Friday January 05 2007, @10:23AM
      • Re:The power of legacy systems... by Bert64 (Score:2) Friday January 05 2007, @01:57PM
  • by AiY (175830) on Friday January 05 2007, @12:10AM (#17469806)
    (http://slashdot.org/)
    Sweet! I actually have copies of those somewhere. The reverse engineering process will begin immediately. Now where did I put my 286....
  • Basically (Score:5, Insightful)

    by DrYak (748999) on Friday January 05 2007, @12:11AM (#17469810)
    (http://www.sympato.ch/)
    ODF is the former SXW format that was taken and transformed into a standard by a committee comprising several Office software makers. It's supposed to describe the normal features that anyone should expect from any word processing application, be it OpenOffice.org, KWord, AbiWord, Corel WordPerfect, etc., all this in a perfectly neutral way. It was designed with a function in mind (storing word processing documents in an open and interoperable way). Its benefits are comparable to the standardisation of HTML.

    OpenXML is Microsoft trying to translate its proprietary DOC file inside an XML container (because it's a big buzzword) and propose it as a standard to ECMA (because everyone is speaking about ODF being an ISO standard). It describes not only what is to be expected from a word processor, but also all the MS-Word-specific Microsoftisms. It was designed with a specific piece of software in mind (and partly derives from the internal functioning of MS-Word). It's only a small improvement over the previous MS XML format (which had a lot of information hidden in a binary blob).

    The good thing for Microsoft is that they can pretend this limitation is "Not-a-bug-but-a-feature", and brag that there is a lot of stuff that MS-Word couldn't store inside an ODF and only OpenXML can carry.

    Microsoft's plan :
    1. Embrace
    2. Extend <- They are here
    3. Extinguish
    • Don't forget the page counts... (Score:5, Interesting)

      by Anonymous Coward on Friday January 05 2007, @12:34AM (#17469946)

      ODF spec page count: 722 [iso.org].

      OpenXML spec page count: 6000 [regdeveloper.co.uk]!!
    • No bragging rights there. by Erris (Score:2) Friday January 05 2007, @01:09AM
    • Re:Basically by megabyte405 (Score:3) Friday January 05 2007, @01:30AM
      • Re:Basically (Score:4, Insightful)

        by blincoln (592401) on Friday January 05 2007, @01:45AM (#17470338)
        (Last Journal: Sunday March 21 2004, @11:14PM)
        There's nothing wrong with saving in a file format that matches your internal representation, in fact, it's a darn good idea (see .ABW for AbiWord, .DOC for Word, .WPD for WordPerfect I would also wager is the same idea).

        I would argue that when it's taken to the extreme of Office prior to 2007, it *is* a bad thing. AFAIK, the old Word format is more or less a (very) partial RAM dump (which is why you can often find all sorts of interesting stuff in Word files that the authors think they've deleted). That makes for faster dev times, but because the load and save functions don't really "understand" the content of the file, IMO the developers made things a lot harder for themselves in the big picture. I imagine reproducing issues in testing is a particular nightmare.
        • Re:Basically by megabyte405 (Score:2) Friday January 05 2007, @02:00AM
      • Re:Basically (Score:5, Interesting)

        by Nicopa (87617) <nickNO@SPAMreloco.com.ar> on Friday January 05 2007, @01:46AM (#17470342)
        No. ODF has several real, factual, benefits. It might have been originated in a single product but... it reuses existing standard technologies (SVG, CSS...). It has properly designed XML tags that act as "markup", in OpenDocument xml tags act as container for chunks of data. ODF tries to separate content from style.

        And about your RTF suggestion... can I draw diagrams with RTF? Can I have a ToC? Can I do complex styling? Can I have a "gallery" of styles? Can I include images? No. RTF is not a solution.
        • Re:Basically by megabyte405 (Score:3) Friday January 05 2007, @01:58AM
        • Re:Basically by I'm Don Giovanni (Score:2) Friday January 05 2007, @02:16AM
        • Re:Basically by dominator (Score:3) Friday January 05 2007, @10:31AM
      • Re:Basically (Score:4, Informative)

        by iluvcapra (782887) on Friday January 05 2007, @02:15AM (#17470486)
        (http://www.soundepartment.com/)

        After having written some tools on OS X that do stuff with RTF:

        RTF is well documented and you can make an RTF document on all manner of platforms (I've done it in Ruby and Cocoa), but many platforms have extended RTF in their own way in order to support special features. OS X has added a few special methods to RTF files to support Mac OS X typography, and I've noticed that different versions of Word handle document attributes (like headers and page numbers) in different ways.

        RTF is great if you want to make up something quick that is ONLY formatted text, but readers have all manner of different ways of interpreting the exact appearance of tables, page layouts and margins, and there doesn't seem to be any manageable common mechanism for including images or other documents, something Word and OO.org excel(pun) at. Even HTML seems to be better at this.

        I use RTF output in a few little in-house tools I have, so people can get the text+attributes they create and open them in a text editor of their choice for touching-up and delivery. When my tools have to create something that is supposed to be finished, they make PDFs.

        RTF is great for interoperability, but I never expect an RTF file to contain a "finished product," unless the recipient expects quality on par with a Selectric. It is merely a relatively-open serialization format for strings with attributes.

      • Documents outlive applications (Score:5, Insightful)

        by Geof (153857) on Friday January 05 2007, @03:03AM (#17470690)
        (http://www.geof.net/)

        There's nothing wrong with saving in a file format that matches your internal representation, in fact, it's a darn good idea (see .ABW for AbiWord, .DOC for Word, .WPD for WordPerfect I would also wager is the same idea).

        Documents are worth far more than software, and they outlive the applications used to create them. See the comment [robweir.com] to the original article - reading documents after 5, 20, 30, 100 years or more is not optional. You can pay the price of developing an independent format now, or you can pay the price of reverse engineering over and over again every time you change your internal representation.

        Repeated implementation limits future change and innovation. It's expensive: it likely costs more even for Microsoft. But they can afford it; their competitors may not be able to. Plus, Microsoft already has their first implementation.

        interoperability seems to work best when taken from the ground up - when working with another application's data structure of any complexity, you simply can't do a lossless roundtrip without losing before you've started.

        Perhaps so. But compare that cost to the cost I've just outlined. It is in the best interest of users and software developers (maybe even of Microsoft) to bite the bullet now, do the conversion once, and develop a clean format for the future.

        Maybe you have in mind an argument you're not making, but I don't see any sufficient basis for your broad contention that using a file format based on an internal representation is a "darn good idea". In specific cases, yes (e.g. where the cost of development time or effort are the most important factors). In general, I very much doubt it. That successful applications in the past have taken that approach is weak evidence. They were developed when the up-front cost of development in a time of rapid innovation, the loss of customer lock-in, and a lack of open-format competition were good business reasons for making such a choice - even if it was inferior technically, increased cost in the long term, and was bad for consumers. In today's climate of slower innovation, competition from open formats, and customers who are running into their own long-term interests, the situation is different.

        Which is not to say Microsoft's apparent attempt to set the rules of the game and throw sand in the gears of change is not in their interests, or that it will be unsuccessful.

      • Re:Basically by AuMatar (Score:3) Friday January 05 2007, @04:10AM
      • OOXML's Origin Is Not The Problem (Score:4, Interesting)

        by NickFortune (613926) on Friday January 05 2007, @09:15AM (#17472688)
        (http://www.nymar.demon.co.uk/)
        ODF is a nice idea in theory, but really, it's a similar situation (OpenOffice.Org internal dataformat jammed into a standard, so designed with OO.o in mind by necessity)
        The ODF format must necessarily describe the structure and layout of an office document. There's no need for it to reflect the internal data structures of any specific application, except to the extent that they too describe office documents.

        OOXML includes data elements that should be part of internal import routines rather than being enshrined in the document format, and it includes elements that are not specified except by reference to applications for which no public specs exist. This is the problem, not the fact that OOXML is derived from MS Office file formats.

        RTF. It may not get press attention, but it's actually a fairly well-documented standard, has been working as an interchange format for years, and yet is designed with enough expandability that it's still useful with the kinds of documents produced today. It's a true de-facto standard.
        Well, I was a big fan of RTF at one time. But a few years back I found that documents with any kind of formatting more complex than paragraph+justification+font just weren't working between MS Office and back. I don't know if this was because the format couldn't cope, or because of faulty implementations. In either case, it led me to give up on RTF.

        In any event, to be a replacement, RTF would need to work for spreadsheets and presentations at a minimum - something I don't think there's a lot of support for in the current RTF specification. We'd also lose the benefits of an XML based format, which given the amount of work on the seamless integration of XML documents into databases, web services and other data management applications means losing a lot of functionality.

        for those who really want interoperability, RTF is the way to go with today's software
        Interoperability is only part of the problem. We also want a spec that can be fully and freely implemented by anyone, which isn't under the control of any single vendor. We want a format to which we can entrust documents, knowing that in twenty years' time there will be an application capable of reading them.

        an unnecessary dichotomy is drawn between OpenXML and ODF with regard to their design goals - both are repurposed native formats for a single application.
        I don't know what you mean by native in this case, but the repurposing of OOXML isn't the problem. It's one of size and obfuscation, and, as TFA points out, specification by reference to closed formats and the behaviour of extinct proprietary software. These are non-trivial problems with OOXML which are not (to the best of my knowledge) found in ODF.

        There's nothing wrong with ODF. Re-creating it based on the non-XML RTF would be a waste of time and effort.

      • Formats... by DrYak (Score:2) Friday January 05 2007, @02:30PM
    • Re:Basically by telso (Score:1) Friday January 05 2007, @03:52AM
    • Re:Basically by urbanradar (Score:2) Friday January 05 2007, @05:25AM
    • binary blob RAS? by ifknot (Score:1) Friday January 05 2007, @06:55AM
      • No. by DrYak (Score:2) Friday January 05 2007, @02:11PM
    • Re:Basically by fitten (Score:1) Friday January 05 2007, @09:38AM
      • Re:Basically by Bert64 (Score:2) Saturday January 06 2007, @06:40AM
  • Solution? (Score:1)

    by Nicopa (87617) <nickNO@SPAMreloco.com.ar> on Friday January 05 2007, @12:17AM (#17469852)
    Is it the only solution to all this to attack key Microsoft executives while they are sleeping?
  • The site seems to be slow... (Score:5, Informative)

    by junglee_iitk (651040) on Friday January 05 2007, @12:18AM (#17469854)
    (Last Journal: Monday October 23 2006, @03:10AM)
    You want to hire a new programmer and you have the perfect candidate in mind, your old college roommate, Guillaume Portes. Unfortunately you can't just go out and offer him the job. That would get you in trouble with your corporate HR policies which require that you first create a job description, advertise the position, interview and rate candidates and choose the most qualified person. So much paperwork! But you really want Guillaume and only Guillaume.

    So what can you do?

    The solution is simple. Create a job description that is written specifically to your friend's background and skills. The more specific and longer you make the job description, the fewer candidates will be eligible. Ideally you would write a job description that no one else in the world except Guillaume could possibly match. Don't describe the job requirements. Describe the person you want. That's the trick.

    So you end up with something like this:

    * 5 years experience with Java, J2EE and web development, PHP, XSLT
    * Fluency in French and Corsican
    * Experience with the Llama farming industry
    * Mole on left shoulder
    * Sister named Bridgette

    Although this technique may be familiar, in practice it is usually not taken to this extreme. Corporate policies, employment law and common sense usually prevent one from making entirely irrational hiring decisions or discriminating against other applicants for things unrelated to the legitimate requirements of the job.

    But evidently in the realm of standards there are no practical limits to the application of the above technique. It is quite possible to write a standard that allows only a single implementation. By focusing entirely on the capabilities of a single application and documenting it in infuriatingly useless detail, you can easily create a "Standard of One".

    Of course, this begs the question of what is essential and what is not. This really needs to be determined by domain analysis, requirements gathering and consensus building. Let's just say that anyone who says that a single existing implementation is all one needs to look at is missing the point. The art of specification is to generalize and simplify. Generalizing allows you to do more with less, meeting more needs with few constraints.

    Let's take a simplified example. You are writing a specification for a file format for a very simple drawing program, ShapeMaster 2007. It can draw circles and squares, and they can have solid or dashed lines. That's all it does. Let's consider two different ways of specifying a file format for ShapeMaster.

    In the first case, we'll simply dump out what ShapeMaster does in the most literal way possible. Since it allows only two possible shapes and only two possible line styles, and we're not considering any other use, the file format will look like this:

    <document>
    <shape iscircle="true" isdotted="false"/>
    <shape iscircle="false" isdotted="true"/>
    </document>
    Although this format is very specific and very accurate, it lacks generality, extensibility and flexibility. Although it may be useful for ShapeMaster 2007, it will hardly be useful for anyone else, unless they merely want to create data for ShapeMaster 2007. It is not a portable, cross-application, open format. It is a narrowly-defined, single application format. It may be in XML. It may be reviewed by a standards committee. But it is by its nature, closed and inflexible.

    How could this have been done in a way which works for ShapeMaster 2007 but also is more flexible, extensible and considerate of the needs of different applications? One possibility is to generalize and simplify:

    <document>
    <shape type="circle" lineStyle="solid"/>
    <shape type="square" lineStyle="dotted"/>
    </document>
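A small Python sketch of the difference between the two ShapeMaster formats quoted above. The reader functions are hypothetical, and the triangle with a dashed line is an assumption added here to stand in for "some other application's shape"; neither appears in the article. The literal format can only ever answer two yes/no questions, while the generalized one carries a new shape and line style without any change to the reader.

```python
import xml.etree.ElementTree as ET

# The literal, single-application format quoted above.
LITERAL = """<document>
<shape iscircle="true" isdotted="false"/>
<shape iscircle="false" isdotted="true"/>
</document>"""

# The generalized format quoted above, plus one hypothetical extra shape
# (triangle/dashed) that ShapeMaster 2007 itself does not support.
GENERAL = """<document>
<shape type="circle" lineStyle="solid"/>
<shape type="square" lineStyle="dotted"/>
<shape type="triangle" lineStyle="dashed"/>
</document>"""


def read_literal(xml_text):
    """Decode the boolean-flag format; anything not a circle must be a square."""
    shapes = []
    for el in ET.fromstring(xml_text).iter("shape"):
        kind = "circle" if el.get("iscircle") == "true" else "square"
        style = "dotted" if el.get("isdotted") == "true" else "solid"
        shapes.append((kind, style))
    return shapes


def read_general(xml_text):
    """Decode the generalized format; new values pass through unchanged."""
    return [
        (el.get("type"), el.get("lineStyle"))
        for el in ET.fromstring(xml_text).iter("shape")
    ]


print(read_literal(LITERAL))
print(read_general(GENERAL))
```

In the boolean format there is simply no place to put a triangle: adding one means inventing another flag and revising every reader, which is the "Standard of One" problem in miniature.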
  • zOMG (Score:1)

    by dexomn (147950) on Friday January 05 2007, @12:26AM (#17469892)
    Does this mean the big E won't work on my windows 97 anymore?
  • Backwards compatibility (Score:3, Interesting)

    by Bob54321 (911744) on Friday January 05 2007, @12:29AM (#17469916)
    I thought most people considered themselves lucky if their documents could open in successive versions of Office. Why would anyone want to implement support for really old versions if Microsoft does not do it themselves?
  • I understand that these tags will be needed when converting legacy documents, but how many people are going to meet all the following conditions to even be affected by this:

    A) Desire to convert an old Word 5/95/WordPerfect 6 document to OOXML.
    B) Have the original document actually use one of the undocumented legacy features
    C) After converting the file actually experience a problem in formatting

    First of all, the number of people who fall into category A is going to be small to begin with, same with category B. Although Microsoft is not providing the documentation like they should be, category C would be up to Corel, Sun, and other producers of future OOXML compatible word processors to implement. They're going to implement OOXML, so they're going to be encountering these issues as they program anyway. I trust they can figure out a way to display "full-width East Asian characters" and other such issues that are not fully documented in the standard.

  • My favorite quote (Score:4, Insightful)

    by IvyKing (732111) on Friday January 05 2007, @12:50AM (#17470032)
    From TFA


    This is not a specification; this is a DNA sequence.


    Outrageously funny and to the point.

    • M$ DNA by Erris (Score:2) Friday January 05 2007, @01:38AM
  • companies make it their policy to only purchase software which uses truly open standards to store their data.
  • by PurifyYourMind (776223) on Friday January 05 2007, @01:48AM (#17470350)
    (http://trollchat.org/)
    ...14- and 16-year-olds is illegal.
  • Unfair (Score:1)

    by BCoates (512464) on Friday January 05 2007, @02:27AM (#17470540)
    This spec sounds like a bloated monster, but the criticism the FA is making is entirely unfair. If OOXML is going to be a useful one-size-fits-all document format, it'll need to be a superset of all existing things word processors can do, even the weird old bits that don't make much sense. There's two ways to do this: Either spec out the broken behavior into the already-bloated specification, or add a flag that says "old broken spacing" and let implementors decide how faithfully to represent it.

    If they take the first option, then writing a tool that converted to and from OOXML would be a nightmare, you'd have to work out all those broken options into something that looked right, even if the end application supported it natively, since the converter app would be the last chance to attempt this obscure conversion. Making the old format->OOXML->old format loop actually end with a document that rendered anything like the starting document would be pretty much impossible.

    The way they did it, a converter app that reads in those standards can just set the appropriate flag, and let the downstream renderer deal with it. If the user actually needs these crazy old features they can go get a patch to their word processor to support it; or they can find a special-purpose converter that modifies the document to not need the flag anymore; or they can convert the doc back to the original obsolete format and open it in the ancient app itself. If the document had already been mangled by a half-baked conversion/export tool, the user couldn't have done any of these.

    Tools that don't care about legacy support are unaffected by this; they can just pick the closest modern option to whatever the legacy flag calls for on input, and not output documents that use them.
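A rough Python sketch of the "set a flag and let the renderer decide" approach described in this comment. The document layout, the `compat` wrapper element, the fallback names, and `someUnknownLegacyFlag` are all assumptions made up for illustration; only the flag names suppressTopSpacingWP and mwSmallCaps come from this discussion, and nothing here reflects how the specification actually nests or defines them.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy compatibility flags to the closest modern
# behaviour this imaginary renderer supports. The fallback names are invented.
FALLBACKS = {
    "suppressTopSpacingWP": "default-top-spacing",
    "mwSmallCaps": "standard-small-caps",
}

# Hypothetical document: one flag the renderer knows, one it does not.
SAMPLE = """<document>
<compat><suppressTopSpacingWP/><someUnknownLegacyFlag/></compat>
<body>Hello</body>
</document>"""


def plan_rendering(xml_text):
    """Return (fallbacks to apply, warnings) for the legacy flags in a document."""
    root = ET.fromstring(xml_text)
    applied, warnings = [], []
    for flag in root.findall("./compat/*"):
        if flag.tag in FALLBACKS:
            # Approximate the legacy behaviour with the nearest modern option.
            applied.append(FALLBACKS[flag.tag])
        else:
            # Keep going, but tell the user that fidelity may suffer.
            warnings.append(f"ignoring unsupported legacy flag: {flag.tag}")
    return applied, warnings


applied, warnings = plan_rendering(SAMPLE)
print(applied)
print(warnings)
```

The sketch makes the trade-off visible: the converter stays trivial, but every approximation is a choice the renderer's author has to make without guidance from the standard.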

    • Re:Unfair (Score:4, Insightful)

      Tools that don't care about legacy support are unaffected by this; they can just pick the closest modern option to whatever the legacy flag calls for on input, and not output documents that use them.

      And thus tools, legally, are not OOXML, and won't qualify for purchasing by companies that specify OOXML. Which is the entire point.

      There's a difference between 'We need to make sure that old documents can be converted correctly.', and 'We will literally convert old documents into a new representation that contains all their weirdness, and we won't explain how to implement said weirdness in the standard.'.

      What Microsoft has produced is not even a standard. Standards must specify everything, or reference other standards that specify everything. They can't reference applications.

      If Microsoft wants to keep secret how to turn Office 95 documents into OOXML, fine. Producing a standard doesn't mean you have to explain how to convert things into that standard.

      It does, however, mean you have to explain exactly what should happen if mwSmallCaps is true, to the pixel. You can't just pawn it off on the unexplained hypothetical behavior of some other application.

      • Re:Unfair by Todd Knarr (Score:2) Friday January 05 2007, @11:47AM
        • Re:Unfair by DavidTC (Score:1) Friday January 05 2007, @12:28PM
      • Re:Unfair by Simetrical (Score:1) Friday January 05 2007, @04:13PM
        • Re:Unfair by DavidTC (Score:1) Saturday January 06 2007, @11:47AM
    • Re:Unfair by Askmum (Score:3) Friday January 05 2007, @03:16AM
    • Re:Unfair by SanityInAnarchy (Score:2) Friday January 05 2007, @04:15AM
    • Re:Unfair by spitzak (Score:2) Friday January 05 2007, @03:36PM
  • by Knutsi (959723) on Friday January 05 2007, @02:51AM (#17470630)
    This was a worrying, but good, article. I'm sure MS is in a bit of a tight spot as well, if they really desire backwards compatibility (which is what they survive on in a way). But it would make more sense to make supporting legacy documents more optional.

    When I save a Word 2007 document to the old .doc format, it warns me that "minor loss of fidelity" may happen. Similarly, when opening a document, support for way-back-then formats could be optional or plug-in based, with the app instead warning that "minor loss of fidelity" may happen since the document was converted from an old source.

    Forcing every new app ever written to this standard to support diffuse behaviours from the good ol' days is just ridiculous. Besides, most of it appears to apply just to appearance.
  • by shadowmatter (734276) on Friday January 05 2007, @03:58AM (#17470932)
    You can view all the atrocities of OpenXML that he's blogged about here [robweir.com]. Highlights include dumping bitmasks into XML as hexadecimal on a byte-by-byte basis, and an XML element for specifying whether the dates in the workbook start in 1904.
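For readers who have not met the 1904 flag before, a minimal Python sketch of why a reader has to honour it. It assumes the commonly documented spreadsheet behaviour: serial dates count days from one of two epochs, and the two systems differ by 1,462 days. The function name and the `date1904` parameter are made up for this example and are not taken from the specification.

```python
from datetime import date, timedelta

# Commonly documented offset, in days, between the 1900 and 1904 date systems.
DAYS_BETWEEN_SYSTEMS = 1462


def serial_to_date(serial, date1904=False):
    """Convert a spreadsheet serial day number to a calendar date.

    Assumes whole-day serials. In the 1900 system the result is only
    meaningful for serials of 61 and above, because that system counts a
    29 February 1900 that never existed.
    """
    if date1904:
        # 1904 system: serial 0 is 1 January 1904.
        return date(1904, 1, 1) + timedelta(days=serial)
    # 1900 system: using 30 December 1899 as the base absorbs the phantom
    # leap day for every serial from 61 onward.
    return date(1899, 12, 30) + timedelta(days=serial)


# The same stored number means two different days depending on the flag.
same_number_1900 = serial_to_date(39086)
same_number_1904 = serial_to_date(39086, date1904=True)

# Shifting by the offset lines the two systems up again.
shifted_1904 = serial_to_date(39086 - DAYS_BETWEEN_SYSTEMS, date1904=True)

print(same_number_1900, same_number_1904, shifted_1904)
```

A consumer that ignores the flag silently moves every date in the workbook by about four years, which is why the setting has to live somewhere in the file, however odd it looks as an XML element.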

    I can't believe this became a ratified standard.

    "Let him who has understanding calculate the number of the beast, for the number is that of a standard; and its number is three hundred and seventy-six." Common-freaking-sense 13:16-18

    - shadowmatter
  • Blind leading the blind (Score:3, Interesting)

    It's instructive to observe the panic-ridden frenzy with which Microsoft have approached the business of using XML as a file format. The marketing influence is all too plain to see, with the result that they feel an inner compulsion to preserve the appearance of the document at all costs, sacrificing all logic and common-sense to do it.

    OOo did the same, but with greater elegance and less haste because they were ahead of the field. Corel screwed it up with WordPerfect by keeping their stylesheet format proprietary so that transfer between WP document code and XML was made as hard as possible (a Class A blunder, given that their XML editor is actually quite good). AbiWord makes a good job of saving DocBook XML, but it's not trying to pretend it's reimportable; it screws up LaTeX formidably, though, by trying to pretend that it absolutely has to preserve line-length and font-size, which is evidence of the same neurotic attitude as Microsoft.

    The problem in all cases is not that the assorted authors and coders don't understand XML (although some of them clearly failed that test too), but that they don't understand documents. This is particularly true at Microsoft, where leaders such as Jean Paoli have been proselytizing XML for years. They still think a document is a jumble of letters; they have no idea of structure, and the DOM is simply laughable as a non-model of a document. Microsoft's particular problem with XML is that they came to it too late, and viewed it as a way of storing data, not text...indeed to this day many XML users, trained with Microsoft blinkers on, are unaware that XML can be used for normal text documents.

    With this level of ignorance surrounding Microsoft, it's hardly unexpected that they should blunder so badly.

  • by rjungbeck (1038398) on Friday January 05 2007, @05:04AM (#17471232)
    Where is the problem in doing the conversion (for the legacy features) in the converter, so that the new format is free from this bloat? OK, it's harder to write the converter (which has to implement these old behaviors), but it's Microsoft who wants to have the backward compatibility. So it only needs to be done once.
  • by gjuk (940514) on Friday January 05 2007, @05:14AM (#17471290)
    As often, purism is the enemy of progress here. Whilst it'd be great to be able to render, faithfully, every detail of any legacy document - it's an unnecessary and unrealistic constraint. One day, Microsoft themselves will choose to drop support for WPx or WW8 etc. They will. Really, they will. For owners of documents whose only record is held in proprietary formats - that will happen one day. Might as well happen with the adoption of a standard which prevents it happening again. Let's face it - PCs no longer ship with 5.25 inch floppies. Try opening an EBCDIC WordStar document in anything now; or a Tasword III document. Legal documents usually specify in the preamble that the layout is purely for ease of reading and of no legal significance. Even old photocopies and faxes are usually mashed in some way. To be honest - the inability of many earlier versions of Word to render correctly on different printers (even making it difficult to use A4 in the UK when Word insists on US Letter) was much more of an issue for many of us than the lack of WP support. At the end of the day, if the document retains the right characters and numbers in the right order, it meets the real needs of users. Let's make it easier to open up the document market by being realistic; not close it down through artificial maintenance of unnecessary standards.
  • Seems fair enough (Score:2)

    by istartedi (132515) on Friday January 05 2007, @08:00AM (#17472122)
    (Last Journal: Thursday April 18 2002, @07:50PM)

    If you were faced with output from a 15 year old program, what would you do? 15 years? In software, that's an eternity. These tags are essentially saying "here is where this old crap used to be". How many people are actually using these programs? Maintaining documents in the old format? I defy any of you out there in Linux-land to say you wouldn't take the same approach under the same set of circumstances. Actually, Linux people would probably just say "it may not open old documents properly, but that's OK because you have the source". Really not much better.

  • Guillaume Portes = Bill Gates (Score:2, Interesting)

    by tendays (890391) on Friday January 05 2007, @08:49AM (#17472420)
    I don't know how many of you noticed: The fictional name "Guillaume Portes" is actually a literal translation of "Bill Gates" in French ...
  • Looks the same--who cares! (Score:2, Interesting)

    by Anonymous Coward on Friday January 05 2007, @10:37AM (#17473992)
    'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'
    Someone needs to tell every developer of word processing and page layout software on the planet to abandon the 'must look the same' obsession described by the above. Why worry about making content in application B look like content in application A? I create books out of Word files submitted by several people. The last thing I want is all the inconsistent formatting from each of them to control a book's look.

    Named styles are the answer. If a paragraph is body text, call it that. If it's an inset quote, call it a quote. If a term is in italics, label it with an italic style, not Times Italic 12 point. But don't get all hung up in the distinctions between Times Roman and Times New Roman. The purpose of XML is to define what something is. Not what someone thought it ought to look like on Tuesday three weeks ago.

    Ditto transfers between applications. Why is there so much effort devoted to importing every little odd quirk of Word into InDesign as if the quirk mattered? Bring in the text tagged with what it is and let InDesign determine what it looks like. InDesign is far more powerful and predictable than Word anyway.

    It is, of course, to Microsoft's advantage for everyone to define RTF and now OpenXML as the "standard" and obsess over the sort of things described above. But there's no sane reason for this obsession to exist. If you want a text to always look the same, use PDF. If you want a document to look good, make it look good in the application you're using. Don't try to make that application retain the "sorta-looks-ok" feel of another application. That's too much work for too little result. It's why all too many ordinary users shrug their shoulders and buy Word rather than hassle with import quirkiness.
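The named-styles idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Paragraph` class, the `render_plan` function, the style names, and the font values are all hypothetical and belong to no real word processor or layout tool. The point is that the content carries only a label for what it is, and each application supplies its own stylesheet.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    style: str  # what the paragraph *is*, e.g. "body" or "quote"
    text: str


def render_plan(paragraphs, stylesheet, fallback="body"):
    """Pair each paragraph's text with the look the *importing*
    application chooses for its style name. Unknown style names
    fall back to the body style instead of importing foreign quirks."""
    return [
        (p.text, stylesheet.get(p.style, stylesheet[fallback]))
        for p in paragraphs
    ]


# Content as submitted: labels only, no fonts or point sizes.
manuscript = [
    Paragraph("body", "First paragraph of the chapter."),
    Paragraph("quote", "An inset quotation."),
    Paragraph("sidebar", "A style this book design does not define."),
]

# One book's design (hypothetical values); another book could swap this out.
BOOK_STYLES = {
    "body": {"font": "Times New Roman", "size_pt": 11, "italic": False},
    "quote": {"font": "Times New Roman", "size_pt": 10, "italic": True},
}

plan = render_plan(manuscript, BOOK_STYLES)
```

Swapping `BOOK_STYLES` for a different mapping changes the whole book's look without touching the submitted text, which is the consistency the comment is asking for.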

  • by Assmasher (456699) on Friday January 05 2007, @10:55AM (#17474260)
    (Last Journal: Saturday April 03 2004, @07:10PM)
    ...you would indeed need to support everything that is covered by the specification, duh, right? Well, how likely is anyone to even want to support OOXML fully except for Microsoft?

    If I want to write a plugin for an open source text editor so that people can exchange Word 2000 and later documents with my own editor, I would certainly not concern myself with supporting the aspects of OOXML which denote Word 95 emulation.
  • MS is lazy! (Score:1)

    by Bob-taro (996889) on Friday January 05 2007, @11:59AM (#17475322)
    I doubt the spec was written specifically to make rendering the xml difficult, but rather to make creating it easy. They probably don't know how to explain the behavior of these old word processors because they were buggy and inconsistent. So rather than have to figure out how small Word 5.0 would render small caps in a given situation, they can just tack on a "do it like Word 5.0" attribute. Much simpler!
  • Yo (Score:2)

    by jav1231 (539129) on Friday January 05 2007, @12:38PM (#17476016)
    I say keep it real, Yo! ASCII FOREVER!
  • "Dark Corners"? (Score:1)

    by chris-chittleborough (771209) on Saturday January 06 2007, @09:41AM (#17487464)
    (Last Journal: Wednesday August 03 2005, @01:39AM)
    "Dark Corners"? Those corners aren't just dark, they're full of grues. And they're not just corners, they're entire dungeons. All dark, and full of enormous, very hungry grues ...
  • Re:MIcrosoft sucks. (Score:2, Insightful)

    by theLOUDroom (556455) on Friday January 05 2007, @12:11AM (#17469812)
    The crazy amount of backwards compatibility is what allowed Microsoft to rise to the position it holds today...

    Or maybe it was their illegal business tactics?

    It would be pretty easy for me to run a successful business too if I could break federal law with impunity.
    • Re:MIcrosoft sucks. by Brandybuck (Score:3) Friday January 05 2007, @12:15AM
      • Re:MIcrosoft sucks. (Score:5, Insightful)

        by Aadain2001 (684036) on Friday January 05 2007, @12:36AM (#17469950)
        (Last Journal: Monday June 23 2003, @07:07PM)
        But they broke plenty of laws to keep their monopoly :) And while their actions during their rise to the top may not have been illegal, they could easily be called 'strong-armed'.
        • Re:MIcrosoft sucks. by kjart (Score:2) Friday January 05 2007, @02:00AM
          • Re:MIcrosoft sucks. (Score:5, Insightful)

            by hachete (473378) on Friday January 05 2007, @06:50AM (#17471726)
            (http://www.badstep.net/ | Last Journal: Tuesday December 30 2003, @06:04AM)
            Yes, they got into trouble for bundling, but that misses the point every time. The secret sauce that Microsoft uses is to strong-arm the OEMs into bundling Windows with PCs, especially for consumers. I'm also thinking that the Windows Tax is levied even if you buy Linux on a Dell. This is the linchpin of Microsoft domination; without it all their other strategies wither on the vine. Without bundling of Windows with new PCs, the bundling of IE (and all the other software), the resistance against interoperability, the mysterious file formats etc. wither on the vine. I've been disappointed that none of the investigations I've read about have gone after the OEM-Microsoft link. Break that, and you'll have a free market again.

            I think the Office XML format style is a play straight out of IBM's hand-book: make the standard complex and incomprehensible, and the little players - that's you - will find it hard to compete. In a way, that's a good sign: Microsoft is now lumbering into middle-age, hoist on their own evermore complex petard.

            The other thing about middle-age is that every little technological step away from their established base-line is treated as a revolution. In reality, it's no such thing, just a small stepping stone to shouting "pesky kids. Get off my lawn." Or maybe they've reached that stage already.
          • Re:MIcrosoft sucks. by Overly Critical Guy (Score:2) Friday January 05 2007, @11:05AM
      • Re:MIcrosoft sucks. (Score:4, Insightful)

        by edwdig (47888) on Friday January 05 2007, @02:26AM (#17470532)
        (http://slashdot.org/)
        Microsoft broke no laws getting DOS onto every PC. They happened to be in the right place at the right time, and the market fell onto them. But from there, Microsoft bent and broke the law every chance they got to ensure that there never was any competition.

        Also don't forget that although MS's purchase of DOS was perfectly legal, it was ethically horrible. They arrived at a handshake agreement to license the code from Seattle Computer Products. While the MS paperwork was being finalized by the lawyers, SCP then made arrangements to finance other business ventures using the MS money. MS then presented them a contract to buy the code rather than license it, and told SCP to take it or leave it. As SCP had already committed to the other deals, they had no choice but to take MS's offer. Sure, no one held a gun to the head of the SCP executives forcing them to take the deal; however, they didn't have any other reasonable alternatives. MS's behavior was legal, but certainly not ethical.
      • Re:MIcrosoft sucks. by tyme (Score:3) Friday January 05 2007, @02:45AM
      • Re:MIcrosoft sucks. by CAPSLOCK2000 (Score:2) Friday January 05 2007, @08:28AM
      • Re:MIcrosoft sucks. by Mo Bedda (Score:1) Friday January 05 2007, @08:40AM
      • Re:MIcrosoft sucks. by 99BottlesOfBeerInMyF (Score:3) Friday January 05 2007, @09:25AM
      • Re:MIcrosoft sucks. (Score:5, Insightful)

        by TrekkieGod (627867) on Friday January 05 2007, @02:28AM (#17470544)

        If you get to the point where you build up a company that can even consider garnering the term "monopoly", then get back to us...At that point, maybe, just maybe, you may come to thinking that you earned what you got, and the government has no right to tell you how to run your business...

        Yeah. Because the people best suited to decide what a company should or should not be allowed to do are the people who own the company. Of course you're going to want to be completely unrestricted to mow down your competitors using whatever advantages you have if you are in a position to do so. What you're missing is that no one should be allowed to use unfair practices to do it. Some people think we should idolize the free market as some sort of religion. We don't like free market economy because it was given to us by the gods. We like it because it tends to result in better products and lower prices. That ceases to be true when you have a monopoly in the mix.

        That being said, I'm not really informed about any Microsoft specifics, so I'm not going to argue in favor or against any "federal laws" as it applies to them (or failed to apply to them). However, suggesting that only people who have built a company that holds a monopoly should be able to decide what is fair regulation isn't rational. It may even be that the current federal laws regarding monopolies may be unfair and in need of reform, but the fact remains that the existence of a set of laws to regulate businesses is necessary.

      • Re:wow, subtle... by poopdeville (Score:2) Friday January 05 2007, @03:15AM
        • Re:wow, subtle... by Zontar The Mindless (Score:2) Friday January 05 2007, @10:19AM
      • Re:MIcrosoft sucks. (Score:4, Insightful)

        by ArsenneLupin (766289) on Friday January 05 2007, @03:41AM (#17470850)

        If you get to the point where you build up a company that can even consider garnering the term "monopoly", then get back to us. Until then, you have no idea what you're talking about, especially when quoting arbitrary and esoteric "federal laws". Call me nuts, but if you ever got to that point, you might even get a crazy idea in your head that those "federal laws" that you are so damned proud of are about as fair and just as our drug laws. At that point, maybe, just maybe, you may come to thinking that you earned what you got, and the government has no right to tell you how to run your business that you started in your teens, and proceeded to build to make it one of the most successful companies in the history of capitalism.

        Until you get to that point, I suggest that you those "federal laws" out your ass, Mr. Ashcroft.
        I agree 100% with you. However, for fairness' sake, we should then abolish all those unjust business-hampering federal laws, including copyright and patent law.

        Oh, and also those so-called "computer misuse" laws. Indeed, if I want to set up a consultancy where I propose to convert customers' ASP scripts to PHP, I should be allowed to demo to my prospective customers in great graphical detail why ASP is so insecure, even if I don't yet have an existing business relationship. Why should I tolerate that the government tells me how I may and may not recruit new customers?

        Anything less would be one-sided and unfair.

      • Re:MIcrosoft sucks. by Overly Critical Guy (Score:2) Friday January 05 2007, @11:09AM
      • Re:wow, subtle... by I'm Don Giovanni (Score:2) Friday January 05 2007, @11:26AM
    • Posters sucks, then blows, then sucks some more. by Anonymous Coward (Score:1) Friday January 05 2007, @12:29AM
    • Re:MIcrosoft sucks. by Al Dimond (Score:3) Friday January 05 2007, @01:00AM
    • Re:MIcrosoft sucks. by WalterGR (Score:2) Friday January 05 2007, @01:00AM
  • Re:Suck it up (Score:3)

    by oohshiny (998054) on Friday January 05 2007, @01:00AM (#17470118)
    Are you kidding? This is not a format specification. And it reflects badly on Microsoft and the engineers that authored this document: either they are too stupid to know that this is not a specification, or they are taking everybody else to be fools.
  • Re:Suck it up (Score:2)

    by Al Dimond (792444) on Friday January 05 2007, @01:13AM (#17470178)
    (Last Journal: Tuesday April 12 2005, @01:04AM)
    Because there's an opportunity for the format to not be ugly, so that the engineers can get as much done with less work and spend the rest of the time doing something that's really useful instead of duplicating their futile efforts. Or they might just kick off early and sip margaritas on the beach for all I care.
  • Forbidden partial implementation? (Score:5, Interesting)

    by tepples (727027) <slash2006@pineight.com> on Friday January 05 2007, @01:52AM (#17470370)
    (http://myatomic.com/ | Last Journal: Sunday November 19 2006, @12:31AM)

    OOXML is just as open as ODF

    The behavior of years-old proprietary word processing software is included by reference into OOXML. How is any spec that includes by reference the behavior of proprietary software exactly "open"? True, implementors could produce a partial implementation of the spec that degrades away the legacy baggage (more or less) gracefully, but some standards' patent licensors forbid implementors to publish a partial implementation. I don't know if this applies to OOXML's license.

  • Re:Suck it up (Score:5, Funny)

    by animaal (183055) on Friday January 05 2007, @03:32AM (#17470820)

    What's the deal with you people? I have seen engineers take apart the most difficult situations. You have the format in your hands. It's ugly and crappy, go figure. Just get it done and stop bitching. Why is everyone so lazy?
    Jeff, is that you? Haven't seen you much since you became a project manager. Congrats on getting the MBA!
  • Re:Suck it up (Score:2)

    by mwvdlee (775178) on Friday January 05 2007, @04:41AM (#17471132)
    (http://www.vanderlee.com/)
    Ever heard of the saying "good programmers are lazy programmers"?

    Yes, we could all duplicate the significant effort of reverse engineering the missing parts of the standard (you still have a working copy of WP5 around somewhere?). Or we could just save everybody a lot of time and money in the future by making a one-time small investment of fixing the standard now.
    • Re:Suck it up by tehcyder (Score:1) Friday January 05 2007, @08:42AM
      • Re:Suck it up by mwvdlee (Score:2) Friday January 05 2007, @10:06AM
  • Who's Scared? (Score:4, Informative)

    ... immediately render billions of existing MSO documents obsolete if you could get govt to mandate ODF exclusively. And the bonus is that such govt mandate would render any and all features not supported by ODF (i.e. not supported by OO.o) irrelevant.

    Eh? Isn't that why M$ made this supposedly "open" format? Because governments were tired of paying through the nose for secret formats that broke between versions? The purpose of an archive is to read it later. Governments and companies have already moved to pdf for archives. They are going to move their working documents to reasonable formats next.

    But MS opened their own format, thus leveling the playing field so that you must again compete on features ...

    You must not have read the 6000 page spec, which includes lots of sections like this:

    Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance

    That's neither open, nor a standard.

    Microsoft is hoping people believe what you say, but everyone knows better. Shit like OOXML only proves that they have not changed. It's just another, more elaborate and more expensive lie. Even the name, by using "OO", is intentionally confusing. The New Office is everything the old Office was and always will be. Vista and Office 2007 are non-starters.

  • by cching (179312) on Friday January 05 2007, @03:13PM (#17478992)

    is faster to load than ODF, and has smaller file size than ODF
    That's because what they don't implement in the file format has to be implemented in code. Indeed:

    lineWrapLikeWord6
    means that's all you have to specify in the document, but then you need more, specific code *in the application* to handle just that. And they aren't even going to tell you *how to write that code*. They just say "go load up Word 1.0 and figure it out for yourself."

    Some people just can't think past what's in front of their eyes.
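The trade-off described above (a tiny flag in the file, a pile of behaviour in the application) can be sketched roughly as follows. This is a simplified illustration, not the real file layout: the XML below drops namespaces and the surrounding package structure, and `compat_flags`, `unsupported_flags`, and the `HANDLERS` table are hypothetical names. Only the two flag names come from the discussion.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a settings part: each flag costs a few bytes.
SETTINGS = (
    "<settings><compat>"
    "<lineWrapLikeWord6/><suppressTopSpacingWP/>"
    "</compat></settings>"
)


def compat_flags(xml_text):
    """Return the names of the compatibility flags set in the document."""
    compat = ET.fromstring(xml_text).find("compat")
    return [] if compat is None else [child.tag for child in compat]


# Hypothetical table: flag name -> function reproducing the legacy
# behaviour. It is empty here because the spec does not describe that
# behaviour, so there is nothing to implement it from.
HANDLERS = {}


def unsupported_flags(xml_text):
    """Flags the document asks for that this reader cannot honour."""
    return [flag for flag in compat_flags(xml_text) if flag not in HANDLERS]


missing = unsupported_flags(SETTINGS)
```

Reading the flags is trivial; every entry that would have to go into `HANDLERS` is where the undocumented work lives.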