Programming Microsoft IT Technology

Microsoft Releases Office Binary Formats

Microsoft has released documentation on their Office binary formats. Before jumping up and down gleefully, those working on related open source efforts, such as OpenOffice, might want to take a very close look at Microsoft's Open Specification Promise to see if it seems to cover those working on GPL software; some believe it doesn't. stm2 points us to some good advice from Joel Spolsky to programmers tempted to dig into the spec and create an Excel competitor over a weekend that reads and writes these formats: find an easier way. Joel provides some workarounds that render it possible to make use of these binary files. "[A] normal programmer would conclude that Office's binary file formats: are deliberately obfuscated; are the product of a demented Borg mind; were created by insanely bad programmers; and are impossible to read or create correctly. You'd be wrong on all four counts."
This discussion has been archived. No new comments can be posted.

  • by VosotrosForm ( 1242886 ) on Wednesday February 20, 2008 @09:25AM (#22486956)
    I would like to point out another good option Joel doesn't have on his list: a product called OfficeWriter, from a company named SoftArtisans in Boston. When I last checked/worked there, it was capable of generating Excel and Word docs on the server, and I believe PowerPoint was probably coming relatively soon. Creating a product that can write Office documents isn't quite as impossible in terms of labor as Joel is saying... but it's still way beyond any hobby project. Plus, he is suggesting that you use Excel automation or the like through scripts to create documents on the server, which is a decent suggestion if you want Excel or Word to constantly crash and lock up your server, and you enjoy rebooting them every day. If you want to do large-scale document generation on a server you are going to need something like OfficeWriter. -Vosotros/Matt
  • by ContractualObligatio ( 850987 ) on Wednesday February 20, 2008 @09:43AM (#22487132)

    If there are any optional parts of the spec, those parts aren't covered.

    RTFA. That's in the FAQ. Yes they are.

    If the spec refers to another spec to define some part of the format, that part isn't covered.

    In other words - if you do something related to a spec that isn't covered, it isn't covered. How could it be any different?!

    I'm not saying that there aren't any flaws, but this kind of ill informed, badly thought out comment (a.k.a. "+5 Insightful", of course) has little value.

  • Re:first post? (Score:4, Informative)

    by julesh ( 229690 ) on Wednesday February 20, 2008 @09:48AM (#22487156)
    I'd assume it has something to do with the antitrust action the EU was taking. Didn't they order that Microsoft had to open all their protocols/formats?

    As far as I remember, they only insisted on protocols (it was on the basis of a complaint from server OS vendors that MS was tying their market-leading desktop OSs to their server OSs and gaining an unfair advantage).
  • Re:Joel (Score:5, Informative)

    by zootm ( 850416 ) on Wednesday February 20, 2008 @09:49AM (#22487164)

    I'm not going to say anything against the Microsoft doc; he's pretty much absolutely right and it's a great introduction to why older formats are how they are in general to boot.

    The Hungarian thing – no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system; it's essentially adding unverifiable documentation to variable names in a way that is unnecessary, in a language which can verify type assertions perfectly well. The examples in the article are just ones where good variable naming would have been more than sufficient. It's not good enough.
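
For what it's worth, here is a minimal C sketch of the "let the type system verify it" argument. Everything in it (RowIndex, ColIndex, cell_offset, row_start, the grid width) is made up for illustration and comes from neither the article nor the Office specs. The idea is only that two values which are both plain ints underneath can be given distinct types, so mixing them up is a compile error rather than something a naming convention has to flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid width, chosen only for this illustration. */
enum { COL_COUNT = 8 };

/* Distinct wrapper types: the compiler treats them as incompatible,
   even though each one holds nothing but an int. */
typedef struct { int value; } RowIndex;
typedef struct { int value; } ColIndex;

/* Offset of a cell in a row-major grid that is COL_COUNT cells wide. */
size_t cell_offset(RowIndex row, ColIndex col)
{
    assert(row.value >= 0);
    assert(col.value >= 0 && col.value < COL_COUNT);
    return (size_t)row.value * COL_COUNT + (size_t)col.value;
}

/* Offset of the first cell in the given row. */
size_t row_start(RowIndex row)
{
    ColIndex first = { 0 };
    /* cell_offset(first, row) would be rejected at compile time,
       with no prefix on the names needed to catch it. */
    return cell_offset(row, first);
}
```

The trade-off is the extra `.value` noise at every use, which is arguably the price C charges for this kind of checking.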

    Oh god I've started another hungarian argument.

  • by leuk_he ( 194174 ) on Wednesday February 20, 2008 @09:52AM (#22487200) Homepage Journal
    Did you read the article? Nah, why would you do so for some MS bashing.

    If you read the article you would notice that the binary format of WinWord 97 (which is in fact compatible with its predecessors) was a good solution in 1992, when Word for Windows 2.0 was created. Machines had less memory and processing power than your phone, and still had to be able to open a document fast.

    My conclusion is that the OpenOffice devs were crazy to ever support the Word .doc format, and did a surprisingly good job.
  • by Chief Camel Breeder ( 1015017 ) on Wednesday February 20, 2008 @09:55AM (#22487232)
    Actually, I think they're releasing it now because they were ordered to in a (European?) court settlement, not because they want to.
  • Re:Joel (Score:5, Informative)

    by mhall119 ( 1035984 ) on Wednesday February 20, 2008 @10:03AM (#22487294) Homepage Journal

    Programmers didn't understand why Hungarian originally used his famous notation
    It wasn't created by some guy named "Hungarian", it was created by Charles Simonyi.

    http://en.wikipedia.org/wiki/Hungarian_notation [wikipedia.org]
  • by jsight ( 8987 ) on Wednesday February 20, 2008 @10:34AM (#22487632) Homepage

    Hurr hurr. The Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.


    Among other issues, borderlayoutmanager did not behave properly in MS's implementation. It was buggy in incompatible ways, but you're right that that in and of itself wasn't the big problem. The big problem was their insistence on both not fixing the bugs and not going along with major initiatives (such as JFC/Swing).

    But back in the day, the Microsoft J++ development environment was far superior to anything Sun had to offer. We're talking a good 10 years ago. Sun has finally managed to catch up in the past two or three years, but still, Sun's problem wasn't that the Microsoft implementation was worse: their problem was that it was better.


    If by "2 or 3 years" you mean about 5 years, then I'd agree. Java development tools didn't really reach maturity until things like Eclipse came onto the scene about 5 years ago.
  • Re:Joel (Score:4, Informative)

    by encoderer ( 1060616 ) on Wednesday February 20, 2008 @10:49AM (#22487834)
    "Programmers didn't understand why Hungarian originally used his famous notation"

    Uhh.. There was never a "Mr. Hungarian" ....

    It was invented by Charles Simonyi, and the name was both a play on "Polish Notation" and a nod to Simonyi's fatherland (Hungary), where the family name precedes the given name.

  • by Pofy ( 471469 ) on Wednesday February 20, 2008 @10:51AM (#22487846)
    >As PJ pointed out over on Groklaw, MS are giving a "Promise"
    >not to sue but this is very very far from a license.

    Some (hypothetical?) questions:

    What would happen if those patents were in some way transferred to someone else?

    Despite the promise, are you still actually infringing the patent? Just with an assurance from the current patent holder that he won't do anything?

    If so, what would happen if it becomes criminal to infringe a patent (it was quite close to being part of an EU directive not so long ago)? Together with such suggestions, one has also seen suggestions that police should be allowed (and required?) to act on those crimes even without a filing from someone suffering infringement. How would that apply to a situation with such a promise?
  • Re:Joel (Score:4, Informative)

    by encoderer ( 1060616 ) on Wednesday February 20, 2008 @10:55AM (#22487902)
    It's not the language that makes it obsolete, it's today's IDEs.

    First, understand that nearly every bit of "Hungarian Notation" you've ever seen is misused. The original set of prefixes suggested by Simonyi were designed to convey the PURPOSE of the variable, not simply the data type. It was adding semantic data to the variable name.

    This is still valuable today.

    However, in the days of lesser IDEs, the more common use of Hungarian Notation was still helpful, as it was a lot more work to trace a variable back to its declaration to identify the type.
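
A small C sketch of the "purpose, not data type" distinction, offered as an illustration rather than anything taken from Simonyi's paper. It uses the cb (count of bytes) and cch (count of characters) prefixes familiar from Windows headers; the functions cb_from_cch and copy_text are hypothetical. Both counts are size_t, so the type says nothing about which is which, while the prefix does.

```c
#include <assert.h>
#include <stddef.h>

/* cch = count of characters, cb = count of bytes. Both are size_t,
   so only the prefix records which unit a value is measured in. */
size_t cb_from_cch(size_t cchText)
{
    return cchText * sizeof(wchar_t);
}

/* Copies at most cchDst - 1 characters from wszSrc into wszDst and
   always terminates the result when cchDst is nonzero.
   Returns the number of characters copied, excluding the terminator. */
size_t copy_text(wchar_t *wszDst, size_t cchDst, const wchar_t *wszSrc)
{
    size_t cchCopied = 0;

    if (cchDst == 0) {
        return 0;
    }
    while (cchCopied + 1 < cchDst && wszSrc[cchCopied] != L'\0') {
        wszDst[cchCopied] = wszSrc[cchCopied];
        cchCopied++;
    }
    wszDst[cchCopied] = L'\0';
    return cchCopied;
}
```

Passing sizeof(buffer) as cchDst compiles without complaint, but a byte count flowing into a cch parameter is exactly the kind of mistake the prefix is meant to make visible on the page.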

  • by Thundersnatch ( 671481 ) on Wednesday February 20, 2008 @10:59AM (#22487956) Journal

    Anyways, it's no surprise that it's all the OLE, spreadsheet-object-inside-a-document, stuff that would make it difficult to design a Word killer. (How often do people actually use that anyway?)

    At my company, our users do that every day. Excel spreadsheets embedded in Word or PowerPoint, Microsoft Office Chart objects embedded in everything. It's what made the Word/Excel/PowerPoint "Office Suite" a killer app for businesses. MS Office integration beat the pants off the once best-of-breed and dominant Lotus 1-2-3 and WordPerfect. When you embed documents in Office, instead of a static image, the embedded doc is editable in the same UI, and can be linked to another document maintained by somebody else and updated automatically. It saves tremendous amounts of staff time.

  • by ozmanjusri ( 601766 ) <aussie_bob@hotmail . c om> on Wednesday February 20, 2008 @11:08AM (#22488050) Journal
    Microsoft implementation of Java wasn't buggy: far from it, it was actually superior to the Sun implementation. It was faster and integrated better with Windows.

    Ah, marketing. Where would we be without it?

    Microsoft developed J/Direct specifically to make Java non-portable to other OSs. The MS JVM wasn't better than Sun's, it was just tied heavily into the OS, and code developed for it broke if run on any other VM.

    J++ was another lock-in tool to ensure any "Java" developed in Microsoft's IDE would only run on Microsoft OSs. JBuilder was always a better package anyway.

  • by Jugalator ( 259273 ) on Wednesday February 20, 2008 @11:11AM (#22488102) Journal

    So while I'm not a conspiracy nut, I do believe one of Microsoft's goals here is to assist the process of those binary formats becoming obsolete, to drive Office 2007/2008 adoption.
    Not a chance. Microsoft is bound to release Office 2003 security updates until January 14, 2014 [microsoft.com].
  • by slapout ( 93640 ) on Wednesday February 20, 2008 @11:26AM (#22488350)
    Joel worked on the Excel team.
  • by mlwmohawk ( 801821 ) on Wednesday February 20, 2008 @12:23PM (#22489210)
    Ok, I was going to respond to this but I will not get dragged into another one of these discussions. It's worse than tabs vs. spaces, I tells ya.

    I have to disagree, tabs and spaces are easily handled with an "indent" program.

    On VERY LARGE projects where there are hundreds of include files and hundreds of source files, it is not convenient or even possible in all cases to find the definition of an object that may be in use.

    Context and type information in the name makes it easier to quickly read a section of code:

    for(int ndx=0; ndx < nLimit; ndx++)
    {
            pnUsrData[ndx] = pnReceived[ndx];
    }

    To anyone versed in your prefixing, it is easy to see pnUsrData is an array of integers, and we are assigning values from another array of integers.

    However:
    for(int ndx=0; ndx < nLimit; ndx++)
    {
            pnUsrData[ndx] = foobar[ndx];
    }

    In the above, it is clear we are assigning data to elements in an integer array from a subscript on an object, but what kind of object? Where do we find its definition?

    Now, renamed it looks like this:
    for(int ndx=0; ndx < nLimit; ndx++)
    {
            pnUsrData[ndx] = mytypeFoobar[ndx];
    }

    Now we can see it is a "mytype" object and we can easily find its reference and declaration.

    That's what Hungarian notation provides, and it is not useless. IMHO, its overzealous use made code less readable. Rather than give hints, zealous proponents attempted to create a whole new language for specifying variable and function names that was virtually impenetrable.

  • by sohp ( 22984 ) <snewton@@@io...com> on Wednesday February 20, 2008 @01:22PM (#22490156) Homepage
    ...is total BS.

    A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They're still in the file format for backwards compatibility, and because it doesn't cost anything for Microsoft to leave the code around.


    You better believe it costs Microsoft quite a bit to keep it around. At the lowest level, having the codebase that big means the tools and practices needed to manage it have to be equal to the task. Here's a hint: MS does not use SourceSafe for the Office codebase. (They use the Team tools in visual studio, so they do eat their own dogfood, but not the lite food).

    Far more insidious is the technical debt incurred by carrying around that backwards compatibility with Version-1-which-supported-123-bugs-and-all. Interdependencies mean that a bug either can't be fixed without introducing regressions, or can only be fixed by dint of a complex scheme involving things like the 1900 vs. 1904 epoch split that Joel discusses.
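
To make the epoch split concrete, here is a rough C sketch of what a reader of these files has to cope with. The function names and constants are hypothetical, and this is not Excel's code or anything lifted from the spec. It assumes Excel's two documented date systems: in the 1900 system serial 1 is 1900-01-01 and a nonexistent 1900-02-29 is counted, while in the 1904 system serial 0 is 1904-01-01.

```c
#include <assert.h>

/* The same calendar date is 1462 serials higher in the 1900 system:
   1460 days for the years 1900-1903, one because the 1900 system
   starts counting at 1 rather than 0, and one for the nonexistent
   1900-02-29 that it counts. */
enum { SERIAL_OFFSET_1904_TO_1900 = 1462 };

/* Serial number of 1970-01-01 in the 1900 system. */
enum { SERIAL_1900_UNIX_EPOCH = 25569 };

/* Converts a 1904-system serial to the equivalent 1900-system serial. */
long serial_1904_to_1900(long serial1904)
{
    return serial1904 + SERIAL_OFFSET_1904_TO_1900;
}

/* Whole days between 1970-01-01 and the date a serial stands for.
   uses1904 is nonzero when the workbook uses the 1904 system.
   Only handles 1900-03-01 and later (1900-system serial 61 and up),
   where the phantom leap day no longer needs special treatment. */
long days_since_1970(long serial, int uses1904)
{
    long serial1900 = uses1904 ? serial_1904_to_1900(serial) : serial;

    assert(serial1900 >= 61);
    return serial1900 - SERIAL_1900_UNIX_EPOCH;
}
```

Every consumer of a date cell has to carry that uses1904 flag around, which is a fair picture of what the compatibility baggage looks like from the outside.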

    Oh yes, it costs a small fortune to carry around that baggage, and only a company as big as Microsoft with Microsoft's revenues can afford it. The price might seem like 'nothing' in the billions of dollars that flow in and out of Microsoft, but ignoring the elephant in the room doesn't make the elephant go away.
  • by baboo_jackal ( 1021741 ) on Wednesday February 20, 2008 @05:31PM (#22494038)

    Maybe MS does now want to move away from their old proprietary formats into new open ones. But the old formats were built over decades without that goal.

    No argument there.

    The summary also points out with links to why this release might not actually indicate MS is really releasing their formats to break with that past after all.

    No. The article doesn't make that claim. That's your own interpretation. The overall intent of the article is simply to convey a few simple points:

    1) Why the MS office document format is so crufty (minus conspiracy theories).
    2) How to work *with* the Windows OS to use those documents.
    3) How to use better, more open, alternatives to creating office documents.

    Nothing in the article contradicts anything I said earlier.
