Microsoft Releases Office Binary Formats
Microsoft has released documentation on their Office binary formats. Before jumping up and down gleefully, those working on related open source efforts, such as OpenOffice, might want to take a very close look at Microsoft's Open Specification Promise to see if it seems to cover those working on GPL software; some believe it doesn't. stm2 points us to some good advice from Joel Spolsky to programmers tempted to dig into the spec and create an Excel competitor over a weekend that reads and writes these formats: find an easier way. Joel provides some workarounds that render it possible to make use of these binary files. "[A] normal programmer would conclude that Office's binary file formats: are deliberately obfuscated; are the product of a demented Borg mind; were created by insanely bad programmers; and are impossible to read or create correctly. You'd be wrong on all four counts."
Office Doc Generation on the Server (Score:5, Informative)
Re:patent promise doesn't sound very good (Score:5, Informative)
RTFA. That's in the FAQ. Yes they are.
In other words - if you do something related to a spec that isn't covered, it isn't covered. How could it be any different?!
I'm not saying that there aren't any flaws, but this kind of ill-informed, badly thought-out comment (a.k.a. "+5 Insightful", of course) has little value.
Re:first post? (Score:4, Informative)
As far as I remember, they only insisted on protocols (it was on the basis of a complaint from server OS vendors that MS was tying their market-leading desktop OSs to their server OSs and gaining an unfair advantage).
Re:Joel (Score:5, Informative)
I'm not going to say anything against the Microsoft doc; he's pretty much absolutely right and it's a great introduction to why older formats are how they are in general to boot.
The Hungarian thing – no, I still don't see it. Hungarian should not be used in any language which has a reasonable typing system; it's essentially adding unverifiable documentation to variable names in a way that is unnecessary, in a language which can verify type assertions perfectly well. The examples in the article are just ones where good variable naming would have been more than sufficient. It's not good enough.
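To make the "let the type system verify it" point concrete, here is a minimal sketch in C. The names Row, Col and cell_index are hypothetical, invented for this illustration and not taken from the article. The idea is that two distinct struct types cannot be swapped by accident, so no prefix is needed to keep rows and columns apart.

```c
#include <assert.h>

/* Hypothetical wrapper types: two distinct structs that the compiler
   will refuse to substitute for one another. */
typedef struct { int value; } Row;
typedef struct { int value; } Col;

/* Flat index of a cell in a row-major grid that is `width` columns wide. */
int cell_index(Row row, Col col, int width)
{
    assert(width > 0);
    assert(row.value >= 0);
    assert(col.value >= 0 && col.value < width);
    return row.value * width + col.value;
}
```

With Row r = {2} and Col c = {3}, cell_index(r, c, 10) returns 23, while cell_index(c, r, 10) fails to compile because the struct types are incompatible. With plain ints and prefixed names, the same mistake would compile silently.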
Oh god I've started another hungarian argument.
Re:I thought it was pretty well known (Score:3, Informative)
If you read the article you would notice that the binary format of WinWord 97 (which is in fact compatible with its predecessors) was a good solution in 1992, when Word for Windows 2.0 was created. Machines had less memory and processing power than your phone, and still had to be able to open a document fast.
my conclusion is that the open office devs are crazy that they ever supported the word
Re:One possible reason for releasing the specs now (Score:5, Informative)
Re:Joel (Score:5, Informative)
http://en.wikipedia.org/wiki/Hungarian_notation [wikipedia.org]
Re:patent promise doesn't sound very good (Score:5, Informative)
Among other issues, borderlayoutmanager did not behave properly in MS's implementation. It was buggy in incompatible ways, but you're right that in and of itself that wasn't the big problem. The big problem was their insistence on both not fixing the bugs and not going along with major initiatives (such as JFC/Swing).
If by "2 or 3 years" you mean about 5 years, then I'd agree. Java development tools didn't really reach maturity until things like Eclipse came onto the scene about 5 years ago.
Re:Joel (Score:4, Informative)
Uhh.. There was never a "Mr. Hungarian"
It was invented by Charles Simonyi, and the name was both a play on "Polish Notation" and a reference to Simonyi's fatherland (Hungary), where the family name precedes the given name.
Re:Promise not a license (Score:3, Informative)
>not to sue but this is very very far from a license.
Some (hypothetical?) questions:
What would happen if those patents were in some way transferred to someone else?
Despite the promise, are you still actually infringing the patent? Just with an assurance of the current patent holder that he won't do anything?
If so, what would happen if it becomes criminal to infringe a patent (it came quite close to being part of an EU directive not so long ago)? Along with such proposals, one has also seen suggestions that police should be allowed (and required?) to act on those crimes even without a filing from someone suffering infringement. How would that apply to a situation with such a promise?
Re:Joel (Score:4, Informative)
First, understand that nearly every bit of "Hungarian Notation" you've ever seen is misused. The original set of prefixes suggested by Simonyi were designed to convey the PURPOSE of the variable, not simply the data type. It was adding semantic data to the variable name.
This is still valuable today.
However, in the days of lesser IDEs, the more common use of Hungarian Notation was still helpful, as it was a lot more work to trace a variable back to its declaration to identify the type.
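A rough sketch of the "purpose, not type" idea, using the familiar cb (count of bytes) and cch (count of characters) prefixes. The function names are made up for this example. Both quantities are plain size_t, so the compiler sees no difference between them and only the prefix carries the meaning.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, wchar_t */

/* cch = count of characters, cb = count of bytes.
   Both are size_t, so the type alone cannot tell them apart. */
size_t cb_from_cch(size_t cch)
{
    return cch * sizeof(wchar_t);
}

size_t cch_from_cb(size_t cb)
{
    /* A byte count for a wide-character buffer must be a whole number of characters. */
    assert(cb % sizeof(wchar_t) == 0);
    return cb / sizeof(wchar_t);
}
```

With this convention a line like cbBuffer = cchName looks wrong at a glance, whereas cbBuffer = cb_from_cch(cchName) reads correctly, even though the compiler accepts both.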
Re: "compound documents." oh no, run away! (Score:5, Informative)
At my company, our users do that every day. Excel spreadsheets embedded in Word or PowerPoint, Microsoft Office Chart objects embedded in everything. It's what made the Word/Excel/PowerPoint "Office Suite" a killer app for businesses. MS Office integration beat the pants off the once best-of-breed and dominant Lotus 1-2-3 and WordPerfect. When you embed documents in Office, instead of a static image, the embedded doc is editable in the same UI, and can be linked to another document maintained by somebody else and updated automatically. It saves tremendous amounts of staff time.
Re:patent promise doesn't sound very good (Score:5, Informative)
Ah, marketing. Where would we be without it?
Microsoft developed J/Direct specifically to make Java non-portable to other OSs. The MS JVM wasn't better than Sun's, it was just tied heavily into the OS, and code developed for it broke if run on any other VM.
J++ was another lockin tool to ensure any "Java" developed in Microsoft's IDE would only run on Microsoft OSs. JBuilder was always a better package anyway.
Re:One possible reason for releasing the specs now (Score:4, Informative)
Re:Joel being apologetic (Score:3, Informative)
Re:Joel - Hungarian Notation (Score:3, Informative)
I have to disagree, tabs and spaces are easily handled with an "indent" program.
On VERY LARGE projects where there are hundreds of include files and hundreds of source files, it is not convenient or even possible in all cases to find the definition of an object that may be in use.
Context and type information in the name makes it easier to quickly read a section of code:
for(int ndx=0; ndx < nLimit; ndx++)
{
pnUsrData[ndx] = pnReceived[ndx];
}
To anyone versed in your prefixing, it is easy to see pnUsrData is an array of integers, and we are assigning values from another array of integers.
However:
for(int ndx=0; ndx < nLimit; ndx++)
{
pnUsrData[ndx] = foobar[ndx];
}
In the above, it is clear we are assigning data to elements in an integer array from a subscript on an object, but what kind of object? Where do we find its definition?
Now, renamed it looks like this:
for(int ndx=0; ndx < nLimit; ndx++)
{
pnUsrData[ndx] = mytypeFoobar[ndx];
}
Now we can see it is a "mytype" object and we can easily find its reference and declaration.
That's what Hungarian notation provides, and it is not useless. IMHO, its overzealous use made code less readable. Rather than give hints, zealous proponents attempted to create a whole new language for specifying variable and function names that was virtually impenetrable.
old code costs nothing.. (Score:3, Informative)
You better believe it costs Microsoft quite a bit to keep it around. At the lowest level, having the codebase that big means the tools and practices needed to manage it have to be equal to the task. Here's a hint: MS does not use SourceSafe for the Office codebase. (They use the Team tools in Visual Studio, so they do eat their own dogfood, but not the lite food).
Far more insidious is the technical debt incurred by carrying around that backwards compatibility with Version-1-which-supported-123-bugs-and-all. Interdependencies that mean a bug either can't be fixed without introducing regressions, or can only be fixed by dint of a complex scheme involving things like the 1900 vs. 1904 epoch split that Joel discusses.
Oh yes, it costs a small fortune to carry around that baggage, and only a company as big as Microsoft with Microsoft's revenues can afford it. The price might seem like 'nothing' in the billions of dollars that flow in and out of Microsoft, but ignoring the elephant in the room doesn't make the elephant go away.
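For anyone who hasn't run into the 1900 vs. 1904 split, here is a small sketch of what that baggage looks like in code. The function names are hypothetical. The 1462-day gap is the documented difference between Excel's two date systems: the 1900 system counts from 1 and keeps the 1-2-3-compatible fictitious 29 February 1900, while the 1904 system counts from 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Serial 1462 in the 1900 system and serial 0 in the 1904 system
   both mean 1 January 1904. */
enum { EPOCH_GAP_DAYS = 1462 };

/* Real Gregorian rule: 1900 is NOT a leap year. */
bool is_leap_gregorian(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* The 1-2-3-compatible rule: as above, except 1900 is treated as a leap year. */
bool is_leap_compat_1900(int year)
{
    return year == 1900 || is_leap_gregorian(year);
}

/* Convert a day serial between the two date systems. */
long serial_1900_to_1904(long serial_1900)
{
    assert(serial_1900 >= EPOCH_GAP_DAYS);  /* earlier dates do not exist in the 1904 system */
    return serial_1900 - EPOCH_GAP_DAYS;
}

long serial_1904_to_1900(long serial_1904)
{
    assert(serial_1904 >= 0);
    return serial_1904 + EPOCH_GAP_DAYS;
}
```

Any code that reads these files has to carry both calendars and the deliberate leap-year bug forever, which is the kind of cost being described here.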
Re:Don't Adopt. Convert. (Score:2, Informative)
No argument there.
No. The article doesn't make that claim. That's your own interpretation. The overall intent of the article is simply to convey a few simple points:
1) Why the MS office document format is so crufty (minus conspiracy theories).
2) How to work *with* the Windows OS to use those documents.
3) How to use better, more open, alternatives to creating office documents.
Nothing in the article contradicts anything I said earlier.