Microsoft Software

Microsoft Releases Pre-2007 Binary File Format Specs 269

An anonymous reader writes "Microsoft has released the specifications for the binary file formats used by pre-2007 Microsoft Office applications. They're accurate this time! Honest! While the documents are enormous (Word alone requires 533 pages; Excel runs over 1000 plus another 850 pages for the Office 2007 binary format), they hopefully will be useful to developers trying to create or extract information from Microsoft Office files (which despite their flaws, have been the de facto standard in many fields for some time now)."
This discussion has been archived. No new comments can be posted.

  • by MickLinux ( 579158 ) on Monday June 30, 2008 @03:28PM (#24006605) Journal

    I know it's old hat by now, but back in the Office 98 days, file corruption was a big deal.

    I wonder what was going on, but it occurs to me that now I could conceivably actually back out
    the errors, and figure the thing out.

    • Re: (Score:2, Interesting)

      by Anonymous Coward

      It might have been software state corruption unrelated to the file format, and so this might not help (I'm not asserting it does help either way).

      If this is anything like their previous documentation it will be full of errors and omissions [blogspot.com]. Wait until this has been reviewed by engineers who reverse engineer their formats and then you'll know if this is more useful than (for example) the KOffice source code, or OpenOffice.org, Abiword, Gnumeric, etc.

    • Re: (Score:3, Informative)

      by stewbacca ( 1033764 )
      Not trying to troll, but why is it that only Microsoft products get "corrupted"? Seriously, I use three platforms (Windows, Solaris, and OSX) and I don't ever recall a corrupted file in anything that was NOT made by Microsoft.
      • Re: (Score:3, Interesting)

        Work in an office with other people using the same stuff. It happens all the time. I just got back into my own office from being upstairs repairing a designer's OS X.5 permissions. It happens everywhere, but because we all detest Microsoft we make more of a note of it.

        Continuing off topic for a moment: I actually notice that there are a stack of bugs I come across all the time on my Debian or CentOS boxes that I just fix and move on without ever really registering that they occurred - it's a technical skill
  • Personally, the VBA .pdf is the most interesting of the lot.
    Wouldn't want to sound ungrateful about some of the tasty bits not present, so let me hope that this is yet another positive step that encourages follow-on.
  • by Anonymous Coward on Monday June 30, 2008 @03:30PM (#24006653)

    A far cry from the 6,000 pages for OOXML ..

    • by peragrin ( 659227 ) on Monday June 30, 2008 @03:33PM (#24006721)

      Actually, that's in addition to the 6,000 pages for the OOXML spec, since the OOXML spec references that data.

      • by kestasjk ( 933987 ) on Tuesday July 01, 2008 @02:32AM (#24012751) Homepage
        Because "pages" are a great way to measure a spec's size..

        What about line spacing, detail of information, number of examples? If the spec is clearest when fully expanded who cares if they can squeeze it onto a single page in microfilm by cutting out helpful documentation?

        Rather than looking at the number of pages why not look at the number of distinct node types/attributes? Surely that would give a better idea of spec size?
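
A rough sketch of the measurement suggested above, counting distinct element and attribute names rather than pages. It assumes you have a sample document (or schema) as XML text; the function name `distinct_names` is made up for illustration, and this is only one possible way to approximate "spec size".

```python
import xml.etree.ElementTree as ET

def distinct_names(xml_text):
    """Return the sets of distinct element names and attribute names in an XML string."""
    root = ET.fromstring(xml_text)
    tags, attrs = set(), set()
    for elem in root.iter():        # walks the root and every descendant
        tags.add(elem.tag)
        attrs.update(elem.attrib)   # adds the attribute names (dict keys)
    return tags, attrs

if __name__ == "__main__":
    sample = "<doc lang='en'><p style='x'>hi</p><p>there</p></doc>"
    tags, attrs = distinct_names(sample)
    print(len(tags), "element types;", len(attrs), "attribute names")
```

Namespaced documents would report names in `{namespace}tag` form, so the counts stay distinct across namespaces.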
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      You're not counting the documents for Powerpoint and various other supporting components (VBA, Forms, etc.). When all of that is included, the total is around 5000 pages. And I don't think that that counts the OLE file format specification.

  • by clang_jangle ( 975789 ) * on Monday June 30, 2008 @03:30PM (#24006663) Journal
    ...to finally share proper doc of the old standards. This just means they feel confident that MS Office 2007 will take firm enough root to ensure that the old game of catch up for FOSS projects will stay the same.
    And wasn't it just yesterday some twits had an article about how MS is changing/will change? I sure wouldn't hold my breath!
    • by 10scjed ( 695280 ) on Monday June 30, 2008 @03:35PM (#24006759) Homepage
      Not that Open...

      Some of the Microsoft protocols include patented inventions, and others do not. You may benefit from a patent license if you are distributing implementations of these protocols commercially or if you use an implementation of any of the protocols covered by Microsoft patents. For more information, contact the Microsoft Open Protocols Team.

      Check out the patent maps here [microsoft.com]

      • Re: (Score:3, Informative)

        Thanks, good info for those who will not RTFA. Same old self-serving MS...
      • by KokorHekkus ( 986906 ) on Monday June 30, 2008 @03:44PM (#24006909)
        To be fair, the article also adds:

        It is important to note that open source developers, whether commercial or non-commercial, will not need a patent license for the development of implementations of these protocols or for the non-commercial distribution of these implementations, according to Microsoft's Patent Pledge for Open Source Developers.

        • by Xtifr ( 1323 ) on Monday June 30, 2008 @04:26PM (#24007539) Homepage

          It is important to note that open source developers, whether commercial or non-commercial, will not need a patent license for the development of implementations of these protocols or for the non-commercial distribution of these implementations,

          So...commercial developers can develop as long as they don't distribute. Boy, that's helpful/useful. About as helpful and useful as a kick in the nuts. :)

          I still say the idea that a protocol can be patented is silly to the point of almost being an oxymoron. We can, perhaps, debate whether an implementation of a protocol can be patented, but the idea that the protocol itself can be patented seems like blatant abuse of the patent system, even if you're one of those who believes that software or business-method patents are a valid notion.

          Fortunately, it does seem to be getting easier to challenge patents. Now if only we could get MS to admit what patents they think various open source projects might be violating, so we can start the search for prior art.... :)

          (Alternatively, maybe we can keep them muttering vague threats about their patents without being specific long enough that we can ask for estoppel or laches if they ever do try to get specific. The rumblings help because that way they can't pretend that they didn't know about the supposed violations all along, a vital point in raising a defense of laches.)

          • Re: (Score:2, Interesting)

            by Kjella ( 173770 )

            It is important to note that open source developers, whether commercial or non-commercial, will not need a patent license for the development of implementations of these protocols or for the non-commercial distribution of these implementations,

            So...commercial developers can develop as long as they don't distribute. Boy, that's helpful/useful. About as helpful and useful as a kick in the nuts. :)

            Maybe someone with a law degree could sort it out but I thought it simply meant that a commercial company like Novell, Canonical or Red Hat could develop code as long as the distribution of the implementation itself is non-commercial. In short:

            1. Give this away for free
            2. Get more users and support for your distro
            3. Profit
            4. ??? (sorry)

          • by fyoder ( 857358 )

            So...commercial developers can develop as long as they don't distribute. Boy, that's helpful/useful.

            It's all a part of their long term commitment to encouraging the development of open source software, nothing new here.

            (I assume from the summary that we're talking about the mirror universe Microsoft, the universe in which in 1976 Bill Gates wrote an open letter to the hacker community praising them for their efforts and exhorting them to "keep software free for the good of everyone, for the good of the world.")

        • The exception for "non-commercial distribution" doesn't sound compatible with OSS licenses like GPL. I'm pretty sure that GPL'd code can be commercially distributed, as long as the source is made available.

        • by spitzak ( 4019 )

          Yep, they want to make sure everybody thinks "open source" and "non-commercial" are the same thing. Same old Microsoft.

      • Re: (Score:2, Funny)

        by Chris Burke ( 6130 )

        You may benefit from a patent license if you are distributing implementations of these protocols commercially or if you use an implementation of any of the protocols covered by Microsoft patents. For more information, contact the Microsoft Open Protocols Team.

        Ah, well, at least now I know that "Open" in this context means "Open Your Wallet".

      • Re: (Score:3, Insightful)

        by Z34107 ( 925136 )

        Sigh. Microsoft can never do anything right, can they?

        A week or so ago people were whining that they wouldn't release the specs. Well, they've started external documentation for the 2003 binaries - and your link has documentation links for 2007 as well.

        At least they warn you that they might have patents - this isn't some kind of submarine patent trolling operation. For commercial products, they even give you a link to some Nice People who will help you wade through the minefield.

        Not perfect, amazing, mir

        • by DickBreath ( 207180 ) on Monday June 30, 2008 @04:36PM (#24007693) Homepage
          > Sigh. Microsoft can never do anything right, can they?

          They *could* do something right, but they choose not to. It would work against their business model.

          They *could* release specs unencumbered by patents. They simply don't want to.

          True interoperability is the last thing that they truly want.

          This has happened before. It will happen again. See IBM decades ago. The entrenched monopolist is never in favor of true interoperability -- never mind whatever they may say. Everybody else who lives on the scraps is in favor of interoperability. Who you think is right depends on whether you think the monopolist currently in power has the God given right to be the only one in the business.
      • Re: (Score:3, Interesting)

        F***ing bullshit, I say! Nice of them to give us precise royalty rates, but "patented" and "applied for patents" ticks instead of patent numbers? Is there *any* sane way to get to the list of USPTO patent numbers in question at all? For me, this is another FUD along the lines of "pay for something but do not ask for what you are paying (and why) otherwise we might sue you". I am so happy to live in Europe (and, at the same time, afraid that this might change really soon with all those US companies' attempts
    • Re: (Score:3, Insightful)

      by _xeno_ ( 155264 )

      It's useful for people who want to generate Word documents. A project I worked on wanted to generate Excel spreadsheets as a way to download reports from a web application. We got it to work using Apache POI's HSSF, which, while it doesn't implement everything, has reverse-engineered enough for it to work.

      ...Wait a moment. Allowing people to generate documents using old formats that work with the current Office actually helps Microsoft's Office monopoly, doesn't it? And here I thought they were just being kind.

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        A project I worked on wanted to generate Excel spreadsheets as a way to download reports from a web application.

        Or you could NOT be a fucking retard and just use CSV.

        But then it would be interoperable with every spreadsheet and you wouldn't be able to make the Microsoft bash. So I guess that wouldn't serve your purpose.

        • Re: (Score:2, Insightful)

          by Anonymous Coward

          not playing devil's advocate here but csv is just that: "comma separated values." he might want to include formatting, simple formulae, etc. in the generated excel file.

          • This is a good point. Once, I was on a project where literally everything possible to save poor IIS 4.0 was worth doing.
            So, in addition to setting the content header to "application/excel" so that LoserNet Explorer would send the information that way, I also had the HTML table include a "Total" row, with "=SUM(a1:a75)" or whatever the final row number would be within the markup, so that the total would be calculated on the client.
            Oh, what a right disaster that .asp was, he remembered bitter-fondly.
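
A minimal sketch of the trick described above: emit an HTML table whose last row holds a spreadsheet formula, so the total is computed on the client. The function name `report_table` and the single-column layout are assumptions for illustration; serving it with the "application/excel" content header is the commenter's approach and is not shown or verified here.

```python
from html import escape

def report_table(values):
    """Render one column of values as an HTML table ending in a SUM formula row."""
    rows = [f"<tr><td>{escape(str(v))}</td></tr>" for v in values]
    if values:
        # The formula is plain cell text; the spreadsheet client evaluates it.
        rows.append(f"<tr><td>=SUM(A1:A{len(values)})</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

if __name__ == "__main__":
    print(report_table([10, 20, 30]))
```

The row number in the formula is derived from the data length, which matches the "whatever the final row number would be" part of the comment.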
        • by Fallus Shempus ( 793462 ) on Monday June 30, 2008 @04:21PM (#24007457) Homepage
          Or you could just tab delimit it and stick a .xls on the end...
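
For what it's worth, the tab-delimited approach is only a few lines. This is a sketch with made-up names (`to_tab_delimited`, `report.xls`); whether a given Excel version opens such a file without complaint is the commenter's claim, not something checked here.

```python
import csv
import io

def to_tab_delimited(rows):
    """Serialize rows (lists of values) as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    text = to_tab_delimited([["item", "qty"], ["widget", 3]])
    # Hypothetical output name: plain text with an .xls extension, as suggested above.
    with open("report.xls", "w", newline="") as fh:
        fh.write(text)
```

Unlike the formula trick, this carries values only: no formatting and no formulae survive beyond what the client infers.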
      • by neokushan ( 932374 ) on Monday June 30, 2008 @03:57PM (#24007103)

        If they keep hold of the spec and don't release it, you'll bitch about them not being very friendly.

        If they release the spec to everyone and promise not to go after any Open Source projects that may take advantage of it, you'll bitch about them still trying to line their own pockets.

        Really, Microsoft has no chance of pleasing you, do they? Just accept that it's good for everyone to have open standards, regardless of the possible ulterior motives involved.

        • I'm sure this move was somewhat forced to please the European Union or something.

          In any case, I'm sure this would be just what Sun needs to make OpenOffice(.org) more compatible with MS Office than MS Office itself :)

        • by jsebrech ( 525647 ) on Monday June 30, 2008 @04:23PM (#24007493)

          Really, Microsoft has no chance of pleasing you, do they? Just accept that it's good for everyone to have open standards, regardless of the possible ulterior motives involved.

          The point is that MS's patent licenses (and therefore their specs), due to the non-commerce clause, are not GPL compatible. See, MS is not threatened by a BSD license, because if a BSD product takes off, they can just embrace, extend, extinguish. They're really worried about GPL though, because any GPL project that succeeds is a true competitive threat.

          In short, I don't think they've opened the specs. Documented them, yes, published them, sure, but they have NOT opened them.

    • by Lord Crc ( 151920 ) on Monday June 30, 2008 @03:51PM (#24007023)

      ...to finally share proper doc of the old standards. This just means they feel confident that MS Office 2007 will take firm enough root to ensure that the old game of catch up for FOSS projects will stay the same.

      I guess that whole ISO [slashdot.org] voting [slashdot.org] stuff [slashdot.org] on [slashdot.org] OOXML [microsoft.com] just passed you by?

  • interesting... (Score:5, Interesting)

    by AmaDaden ( 794446 ) on Monday June 30, 2008 @03:31PM (#24006673)
    Did anyone else notice this is coming out on the first business day at MS that is Gates free...?
  • by bragolach ( 855994 ) on Monday June 30, 2008 @03:31PM (#24006675)
    is WHEN are they going to release the source code to the Flight Sim in Excel 98?
    • by MiniMike ( 234881 ) on Monday June 30, 2008 @03:47PM (#24006969)

      That's actually hidden in the released documents. You have to go to a specific page of the Excel portion, and by starting at a specific line and skipping the correct numbers of lines between read lines, the spec will be revealed. The exact details are left as an exercise for the morbidly curious.

  • by advocate_one ( 662832 ) on Monday June 30, 2008 @03:33PM (#24006727)
    the "license" conditions no doubt will contain several pitfalls for anyone who actually wants to use it to implement a file input/output filter in conjunction with free software... and the other problem is once having seen the specification, you'll never be able to safely work on other free software projects again...
  • Old News (Score:3, Informative)

    by nmb3000 ( 741169 ) on Monday June 30, 2008 @03:34PM (#24006737) Journal

    Isn't this old news? I mean, it's been covered on Slashdot at least twice [slashdot.org] now [slashdot.org]. (Dear timothy, I'd like to introduce you to my friend Google [google.com].)

    Yes, the formats are large and complicated, but for a variety of good, if antiquated, reasons. I'd suggest anyone interested read Joel Spolsky's [joelonsoftware.com] blog post on it (which, being posted last February, isn't news either but hey, this is Slashdot).

  • Honest Attempt (Score:5, Insightful)

    by clampolo ( 1159617 ) on Monday June 30, 2008 @03:35PM (#24006763)

    I honestly believe that they are trying to give out complete information. It's just that they have 20 years of spaghetti code to somehow shape into an API document. I doubt if anyone at Microsoft really knows how the code works.

    With a 1000 page document describing how to list off spreadsheet information, I shudder to think about how organized their kernel is.

    • Re: (Score:3, Interesting)

      by kentrel ( 526003 )

      It's just that they have 20 years of spaghetti code to somehow shape into an API document. I doubt if anyone at Microsoft really knows how the code works

      Really? Care to provide some evidence for that "20 years of spaghetti code" comment? If MS can make Office 07 faster and more efficient for me to use than OpenOffice with its painfully slow operation, then surely it's a miracle that they can do that despite using 20 year old spaghetti code

      • by eggz128 ( 447435 )

        OpenOffice.org is a descendant of Star Office, originally released in 1984 (according to Wikipedia anyway). They have plenty of their own legacy spaghetti code :)

        • OpenOffice.org is a descendant of Star Office, originally released in 1984 (according to Wikipedia anyway). They have plenty of their own legacy spaghetti code :)

          I was given to understand that OO being Open Source meant that hundreds of developers with unlimited free time would magically fix all of that for us and/or spin some straw into gold. :)

    • Undocumented spaghetti code is a feature you implement to ensure that your company doesn't dare fire or outsource you! ;)
    • Re:Honest Attempt (Score:5, Informative)

      by Blakey Rat ( 99501 ) on Monday June 30, 2008 @04:05PM (#24007231)

      Read this article:

      http://www.joelonsoftware.com/items/2008/02/19.html [joelonsoftware.com]

      Summarizing how Office file formats were made super complex without anybody necessarily doing anything wrong, or anybody writing bad code.

      • Re:Honest Attempt (Score:5, Insightful)

        by syousef ( 465911 ) on Monday June 30, 2008 @05:25PM (#24008387) Journal

        Joel on Software my arse. I do wish people would stop quoting that shill. He's a Microsoft apologist who in the past has managed to present Bill Gates' unprofessional attitude (swearing at staff etc) as some kind of misunderstood genius. No Joel, your boss was an unprofessional asshole.

        As for this article. No intern should have been working on Microsoft's flagship product even 15 years ago. That's 1992 we're talking about, not 1982. It's entirely possible to write efficient code that isn't unreadable spaghetti and it's not always a good solution to use Office automation to read office documents.

      • Re: (Score:3, Informative)

        by Koiu Lpoi ( 632570 )
        And everything I can do in word I can do in LaTeX - and more. Strangely enough, the LaTeX specs don't make my head spin. As much.
    • This isn't API documentation, this is file format specification. Very different.

      More importantly, people need to understand the basics of Office formats: They are essentially FAT filesystems unto themselves.

      They contain a "root" with any number of nodes, which themselves may contain nodes, etc.

      And this is why you can create a spreadsheet, embed a word document, and then embed that into a powerpoint.

      When you think of it from this POV, needing 1000 pages to document a filesystem isn't unreasonable.
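
To make the "filesystem in a file" point concrete, here is a small sketch that checks whether a file starts with the compound-file container header and reads its sector size. The signature bytes and the offset of the sector-shift field are taken from my reading of the Compound File Binary layout and should be treated as assumptions to verify against the published spec; the function names are invented for this example.

```python
import struct

# Assumed 8-byte signature at the start of an OLE2 / compound-file container.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def looks_like_compound_file(header: bytes) -> bool:
    """True if the buffer begins with the compound-file signature."""
    return header[:8] == CFB_MAGIC

def sector_size(header: bytes) -> int:
    """Sector size in bytes, from the little-endian uint16 shift assumed at offset 30."""
    (shift,) = struct.unpack_from("<H", header, 30)
    return 1 << shift

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        head = fh.read(512)
    if looks_like_compound_file(head):
        print("compound file, sector size", sector_size(head))
    else:
        print("not a compound file")
```

Everything past the header (the allocation table and the directory of "nodes" the comment describes) is what takes the remaining pages of documentation.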

  • The catch (Score:4, Funny)

    by Anonymous Coward on Monday June 30, 2008 @03:36PM (#24006777)

    The released specifications are in a pre-2007 MS Office binary file format.

    • Re: (Score:3, Funny)

      by MBGMorden ( 803437 )

      You laugh, but I remember seeing someone upload (to a BBS many years ago) a copy of PKZip in .zip format . . .

  • Kudos to them (Score:5, Insightful)

    I can't understand the negativity. Sure Microsoft has an unpleasant past, but this is a good move on their part and should be met with nothing less than praise.

    We want to encourage more behavior like this.

    • by KWTm ( 808824 ) on Monday June 30, 2008 @04:02PM (#24007193) Journal

      I can't understand the negativity. Sure Microsoft has an unpleasant past, but this is a good move on their part and should be met with nothing less than praise.
      We want to encourage more behavior like this.

      You are right. This is a great step forward. However, I think the Slashdot community, with its cynical eye on Microsoft, is reminding us to take this in the proper context. It remains to be seen whether this is the beginning of a slow but steady change of course for the world's largest software company, or whether this is a fake-out to fool people into thinking that Microsoft is nice.

      Personally, I suspect that this reflects internal conflict within Microsoft, with some portions of the behemoth trying to do something good, while another faction is still trying to squeeze money out of Microsoft's unique position in the software world.

      In any case, remember how some people would say, "You always complain about Microsoft! What would it take for you to admit that Microsoft is doing something good?"

      #2 on the list was: Stop hijacking the HTML standard and make a compliant browser! Then they put out IE7. (Not perfect, but a heckuva lot better than IE6!)

      #1 on the list was: Open up the Word document file format. Okay, so they've done that. (Again, not perfect, but a heckuva lot better than what went on before!)

      Congrats, Microsoft. You did it. A little late in coming, and you really didn't impress us with your OOXML fiasco waving that money around, but I'm willing to adopt a wait-and-see attitude to see whether it's still those same money-grubbing upper level managers that are in control, or whether this really is a new day at Microsoft.

      • The weird thing about OOXML (which was pure evil) is that a Microsoft spokesman recently said they are admitting they've lost that battle, and thusly they're adding in support for ODF.

        I'm curious how accurately that comment represents Microsoft's actual future strategy.

  • by BobNET ( 119675 ) on Monday June 30, 2008 @03:42PM (#24006881)

    The only problem? They released them in Word format...

    (Okay, not really -- someone must have realized that that would be silly.)

  • by Tumbleweed ( 3706 ) on Monday June 30, 2008 @03:43PM (#24006905)

    Wait ... what did I just say? ...

    I don't think I'm feeling well. I'm gonna go lie down now.

  • Holy Crap! (Score:5, Interesting)

    by erroneus ( 253617 ) on Monday June 30, 2008 @03:45PM (#24006913) Homepage

    Or is it Wholly Crap?

    I guess we'll see. I'm rather shocked by this. This is a kind of "giving in" gesture that is MOST uncharacteristic of Microsoft. Is this what the "Post-Gates" Microsoft will be like? How much more cooperative spirit will the community enjoy?

  • It's a trap (Score:2, Interesting)

    by symbolset ( 646467 )
    It's always a trap.
  • If Microsoft hopes to enable an acceptable level of compatibility, automatic test suites (including a complete range of test data files) for the specifications are needed. Descriptive specifications this large are always unclear or simply inconsistent with themselves or just wrong, somewhere.

    Descriptive specifications alone are never good enough.

    • Tell that to the W3C, please. Their specs are a joke (IMO) until they're willing to commit to writing a reference implementation.

  • 2 things though... (Score:4, Insightful)

    by hee gozer ( 1261036 ) on Monday June 30, 2008 @03:48PM (#24006983)

    a) Does this mean the standard GNU response [gnu.org] is now invalid?

    b) If someone writes a FOSS implementation of a .doc/.xls viewer, does that mean MSFT could more easily throw their weight to declaring .doc a standard? (Since a standard ought to have multiple implementations, although maybe office 2003 and 2007 counts as two, or office and word/excel/powerpoint viewer :p )

    • Re: (Score:3, Insightful)

      by Sloppy ( 14984 )

      I don't think it has made GNU's response invalid, just a little weaker. It used to be somewhere in between legally impossible and nearly impossible, to implement Microsoft's format. Now it is "merely" pragmatically impossible. It's still a joke-of-a-format, with absurdly-unnecessary complexity.

      I don't think anyone will ever write a reliable and complete (*) viewer for these formats, but I guess I shouldn't misunderestimate the amount of money someone like Novell or Sun might throw into something like th

  • I've known about this since August 2007 and even submitted it to Slashdot twice, although it didn't get picked for the front page. See http://developers.slashdot.org/~mastropiero/journal/ [slashdot.org]

    This is definitely useful for app developers of free software.

    • free software .. (Score:3, Insightful)

      by rs232 ( 849320 )
      "This is definitely useful for app developers of free software"

      You mean as in you work on the implementation for free and Microsoft benefits from any commercial developments.
  • Raymond Chen [msdn.com] (well known Microsoft blogger) linked to Joel on Software today about Why the MS Office file formats are so complicated [joelonsoftware.com]
  • Visio (Score:5, Insightful)

    by llzackll ( 68018 ) on Monday June 30, 2008 @04:03PM (#24007201)

    Where is Visio ?

  • "In addition to posting this documentation, Microsoft also published a list indicating which of the published protocols built into the following products are covered [microsoft.com] by Microsoft patents or patent applications"

    "Some of the Microsoft protocols include patented inventions, and others do not. You may benefit from a patent license [microsoft.com] if you are distributing implementations of these protocols commercially or if you use an implementation of any of the protocols covered by Microsoft patents"
  • This might be Microsoft's way to help combat global warming, by freezing Hell. :-)
  • by stox ( 131684 ) on Monday June 30, 2008 @04:12PM (#24007341) Homepage

    20 years ago, at what was the world's largest software project, we used to joke that if we wanted to ruin our competition, we would send them a copy of our specs. It looks to me that Microsoft got the same idea.

  • Meh.. /.-ers (Score:5, Insightful)

    by comm2k ( 961394 ) on Monday June 30, 2008 @04:13PM (#24007363)
    For all those thinking that this has anything to do with Gates leaving - you're wrong, it's neither right nor interesting AND CERTAINLY NOT 5+ INSIGHTFUL.
    Microsoft releases api/ protocol specs | Feb. 2008
    http://www.theregister.co.uk/2008/02/21/microsoft_goes_open/ [theregister.co.uk]
    Microsoft releases further specs | April. 2008
    http://www.theregister.co.uk/2008/04/08/microsoft_posts_protocol_documents/ [theregister.co.uk]

    And they state that more will come after gathering feedback between then and June.

    Between now and June it will garner feedback from the developer community. Then, at the end of June, Microsoft will publish the final versions of technical documentation - along with definitive patent licensing terms.

  • Hehe. I love how the PDF was produced by Microsoft Word 2007, nice little dig at Adobe after they kicked up a fuss about it being installed by default.

  • by Anonymous Coward on Monday June 30, 2008 @04:20PM (#24007435)

    This means, as far as I know, that GPL implementations are not allowed. So it's an even worse situation than before, because Free Software developers can't even look at this documentation to verify any of the conclusions of their reverse engineering.

  • Could somebody explain to me the "flaws" of the office documents format? Besides not being open format, that is. This is a genuine question for a genuinely interested person.
  • is that they're ready with their new "standard", and they're confident that that won't be Reverse Engineered....
