Slashdot
Microsoft Releases Pre-2007 Binary File Format Specs
Posted by
timothy
on Mon Jun 30, 2008 03:25 PM
from the in-exchange-they-can-read-the-odf-spec dept.
An anonymous reader writes "Microsoft has released the specifications for the binary file formats used by pre-2007 Microsoft Office applications. They're accurate this time! Honest! While the documents are enormous (Word alone requires 533 pages; Excel runs over 1000 plus another 850 pages for the Office 2007 binary format), they hopefully will be useful to developers trying to create or extract information from Microsoft Office files (which, despite their flaws, have been the de facto standard in many fields for some time now)."
Related Stories
Submission: Microsoft releases binary file format specs. by Anonymous Coward
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
,,, or undo file corruption? (Score:5, Interesting)
I know it's old hat by now, but back in the Office 98 days, file corruption was a big deal.
I wonder what was going on, but it occurs to me that now I could conceivably back out the errors and figure the thing out.
Re:,,, or undo file corruption? (Score:4, Interesting)
Considering that the Office files are almost binary dumps of the software state, you're saying the same thing ;)
Parent
So that's only about 2400 pages! (Score:5, Interesting)
A far cry from the 6,000 pages for OOXML ..
Re:So that's only about 2400 pages! (Score:4, Insightful)
Actually, that's in addition to the 6,000 pages for the OOXML spec, since the OOXML spec references that data.
Parent
Re:So that's only about 2400 pages! (Score:4, Insightful)
What about line spacing, detail of information, number of examples? If the spec is clearest when fully expanded, who cares if they can squeeze it onto a single page in microfilm by cutting out helpful documentation?
Rather than looking at the number of pages why not look at the number of distinct node types/attributes? Surely that would give a better idea of spec size?
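A rough sketch of that alternative measure, in Python: count the distinct element names and attribute names that appear in a document instead of counting pages. The function name `vocabulary_size` and the sample XML are made up for illustration; a fair comparison would have to run over the schemas themselves rather than one sample file.

```python
import xml.etree.ElementTree as ET


def vocabulary_size(xml_text):
    """Return (distinct element names, distinct attribute names) in xml_text."""
    root = ET.fromstring(xml_text)
    elements = set()
    attributes = set()
    # root.iter() walks the root and every descendant element.
    for node in root.iter():
        elements.add(node.tag)
        attributes.update(node.attrib.keys())
    return elements, attributes


# Hypothetical sample, not taken from any real spec.
SAMPLE = """<doc>
  <para style="body">Hello <run bold="1">world</run></para>
  <para style="title" align="center">Again</para>
</doc>"""

if __name__ == "__main__":
    elements, attributes = vocabulary_size(SAMPLE)
    print(len(elements), "element types,", len(attributes), "attribute names")
```

Even this count is only a proxy: it says nothing about how much behavior hangs off each node type.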
Parent
How freaking "open" of them... (Score:5, Insightful)
And wasn't it just yesterday some twits had an article about how MS is changing/will change? I sure wouldn't hold my breath!
Re:How freaking "open" of them... (Score:5, Informative)
Check out the patent maps here [microsoft.com]
Parent
Re: (Score:3, Informative)
Re:How freaking "open" of them... (Score:5, Funny)
Parent
Re:How freaking "open" of them... (Score:4, Insightful)
Parent
Re:How freaking "open" of them... (Score:5, Informative)
Parent
Re:How freaking "open" of them... (Score:5, Insightful)
It is important to note that open source developers, whether commercial or non-commercial, will not need a patent license for the development of implementations of these protocols or for the non-commercial distribution of these implementations,
So...commercial developers can develop as long as they don't distribute. Boy, that's helpful/useful. About as helpful and useful as a kick in the nuts. :)
I still say the idea that a protocol can be patented is silly to the point of almost being an oxymoron. We can, perhaps, debate whether an implementation of a protocol can be patented, but the idea that the protocol itself can be patented seems like blatant abuse of the patent system, even if you're one of those who believes that software or business-method patents are a valid notion.
Fortunately, it does seem to be getting easier to challenge patents. Now if only we could get MS to admit what patents they think various open source projects might be violating, so we can start the search for prior art.... :)
(Alternatively, maybe we can keep them muttering vague threats about their patents without being specific long enough that we can ask for estoppel or laches if they ever do try to get specific. The rumblings help because that way they can't pretend that they didn't know about the supposed violations all along, a vital point in raising a defense of laches.)
Parent
Re:How freaking "open" of them... (Score:5, Insightful)
They *could* do something right, but they choose not to. It would work against their business model.
They *could* release specs unencumbered by patents. They simply don't want to.
True interoperability is the last thing that they truly want.
This has happened before. It will happen again. See IBM decades ago. The entrenched monopolist is never in favor of true interoperability -- never mind whatever they may say. Everybody else who lives on the scraps is in favor of interoperability. Who you think is right depends on whether you think the monopolist currently in power has the God-given right to be the only one in the business.
Parent
Re: (Score:3, Insightful)
It's useful for people who want to generate Word documents. A project I worked on wanted to generate Excel spreadsheets as a way to download reports from a web application. We got it to work using Apache POI's HSSF, which, while it doesn't implement everything, reverse-engineered enough for it to work.
...Wait a moment. Allowing people to generate documents using old formats that work with the current Office actually helps Microsoft's Office monopoly, doesn't it? And here I thought they were just being kind.
Re:How freaking "open" of them... (Score:5, Insightful)
If they keep hold of the spec and don't release it, you'll bitch about them not being very friendly.
If they release the spec to everyone and promise not to go after any Open Source projects that may take advantage of it, you'll bitch about them still trying to line their own pockets.
Really, Microsoft has no chance of pleasing you, do they? Just accept that it's good for everyone to have open standards, regardless of the possible ulterior motives involved.
Parent
Re:How freaking "open" of them... (Score:5, Informative)
Really, Microsoft has no chance of pleasing you, do they? Just accept that it's good for everyone to have open standards, regardless of the possible ulterior motives involved.
The point is that MS's patent licenses (and therefore their specs), due to the non-commerce clause, are not GPL compatible. See, MS is not threatened by a BSD license, because if a BSD product takes off, they can just embrace, extend, extinguish. They're really worried about GPL though, because any GPL project that succeeds is a true competitive threat.
In short, I don't think they've opened the specs. Documented them, yes, published them, sure, but they have NOT opened them.
Parent
Re:How freaking "open" of them... (Score:4, Informative)
Actually the non-commercial clause is incompatible with the BSD license as well.
Parent
Re:How freaking "open" of them... (Score:4, Funny)
Parent
Re:CSV is crap (Score:4, Interesting)
Wise man say building all corporate data on excel spreadsheets is building a house of cards.
I couldn't agree with you more, but the more recent trend is to use Excel as the presentation layer, which is much, much safer. You build a web site that pumps the data out of the database, create Excel sheets dynamically, and you got a lot of happy Excel junkies.
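A minimal sketch of that pattern in Python, assuming the data lives in SQLite and that the "Excel sheet" is emitted as an XML Spreadsheet 2003 document (a text format that Excel 2003-era versions could open). The table name `sales`, the helper names `to_spreadsheet_xml` and `export_report`, and the sample rows are all hypothetical, and whether a particular Excel version accepts the output is not verified here.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def to_spreadsheet_xml(headers, rows, sheet_name="Report"):
    """Render a header row plus data rows as an XML Spreadsheet 2003 string."""
    lines = [
        '<?xml version="1.0"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
        '          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
        f' <Worksheet ss:Name={quoteattr(sheet_name)}>',
        '  <Table>',
    ]
    for record in [headers, *rows]:
        lines.append('   <Row>')
        for value in record:
            # Numbers stay numeric so the spreadsheet can sum them.
            kind = "Number" if isinstance(value, (int, float)) else "String"
            lines.append(
                f'    <Cell><Data ss:Type="{kind}">{escape(str(value))}</Data></Cell>'
            )
        lines.append('   </Row>')
    lines += ['  </Table>', ' </Worksheet>', '</Workbook>']
    return "\n".join(lines)


def export_report(conn, query):
    """Run a read-only query and return the result as a spreadsheet document."""
    cursor = conn.execute(query)
    headers = [column[0] for column in cursor.description]
    return to_spreadsheet_xml(headers, cursor.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 1250.5), ("South & East", 980)],
    )
    print(export_report(conn, "SELECT region, total FROM sales ORDER BY region"))
```

The database remains the source of truth; the spreadsheet is a throwaway view regenerated on each request, so edits made in Excel never flow back into the data.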
Parent
Re:How freaking "open" of them... (Score:4, Insightful)
...to finally share proper doc of the old standards. This just means they feel confident that MS Office 2007 will take firm enough root to ensure that the old game of catch up for FOSS projects will stay the same.
I guess that whole ISO [slashdot.org] voting [slashdot.org] stuff [slashdot.org] on [slashdot.org] OOXML [microsoft.com] just passed you by?
Parent
interesting... (Score:5, Interesting)
Re:interesting... (Score:5, Informative)
Parent
Re: (Score:3)
Re:interesting... (Score:5, Funny)
Indeed. This is a strange new move by the borg.
This reminds me of the episode of House M.D. when he started acting nice one day and everyone began freaking out.
You should chill out and think of this being more of a partial victory than an enemy's plan.
Parent
I think the real question (Score:5, Funny)
Re:I think the real question (Score:5, Funny)
That's actually hidden in the released documents. You have to go to a specific page of the Excel portion, and by starting at a specific line and skipping the correct numbers of lines between read lines, the spec will be revealed. The exact details are left as an exercise for the morbidly curious.
Parent
wouldn't touch it with a ten foot pole... (Score:3, Interesting)
By following the links.... (Score:3, Informative)
From here -> You or anyone else has nothing to worry about. Microsoft has changed its tune. [microsoft.com]:
Microsoft irrevocably promises not to assert any Microsoft Necessary Claims against you for making, using, selling, offering for sale, importing or distributing any implementation to the extent it conforms to a Covered Specification ("Covered Implementation"), subject to the following. This is a personal promise directly from Microsoft to you, and you acknowledge as a condition of benefiting from it
are we clear ? (Score:5, Funny)
Parent
Re:By following the links.... (Score:5, Informative)
This has been dissected and shown to promise nothing - because it's impossible to clearly see what exactly the "necessary claims" are, and because useful implementation of the spec without the "merely referenced" stuff may be impossible.
Parent
Old News (Score:3, Informative)
Isn't this old news? I mean, it's been covered on Slashdot at least twice [slashdot.org] now [slashdot.org]. (Dear timothy, I'd like to introduce you to my friend Google [google.com].)
Yes, the formats are large and complicated, but for a variety of good, if antiquated, reasons. I'd suggest anyone interested read Joel Spolsky's [joelonsoftware.com] blog post on it (which, being posted last February, isn't news either but hey, this is Slashdot).
Honest Attempt (Score:5, Insightful)
I honestly believe that they are trying to give out complete information. It's just that they have 20 years of spaghetti code to somehow shape into an API document. I doubt if anyone at Microsoft really knows how the code works.
With a 1000 page document describing how to list off spreadsheet information, I shudder to think about how organized their kernel is.
Re: (Score:3, Interesting)
It's just that they have 20 years of spaghetti code to somehow shape into an API document. I doubt if anyone at Microsoft really knows how the code works
Really? Care to provide some evidence for that "20 years of spaghetti code" comment? If MS can make Office 07 faster and more efficient for me to use than OpenOffice with its painfully slow operation, then surely it's a miracle that they can do that despite using 20-year-old spaghetti code.
Re:Honest Attempt (Score:5, Informative)
Read this article:
http://www.joelonsoftware.com/items/2008/02/19.html [joelonsoftware.com]
It summarizes how the Office file formats became super complex without anybody necessarily doing anything wrong, or anybody writing bad code.
Parent
Re:Honest Attempt (Score:5, Insightful)
Joel on Software my arse. I do wish people would stop quoting that shill. He's a Microsoft apologist who in the past has managed to present Bill Gates' unprofessional attitude (swearing at staff etc) as some kind of misunderstood genius. No Joel, your boss was an unprofessional asshole.
As for this article. No intern should have been working on Microsoft's flagship product even 15 years ago. That's 1992 we're talking about, not 1982. It's entirely possible to write efficient code that isn't unreadable spaghetti and it's not always a good solution to use Office automation to read office documents.
Parent
The catch (Score:4, Funny)
The released specifications are in a pre-2007 MS Office binary file format.
Re: (Score:3, Funny)
You laugh, but I remember seeing someone upload (to a BBS many years ago) a copy of PKZip in .zip format . . .
Kudos to them (Score:5, Insightful)
I can't understand the negativity. Sure Microsoft has an unpleasant past, but this is a good move on their part and should be met with nothing less than praise.
We want to encourage more behavior like this.
Yes, kudos for this ... but not for MS's past (Score:5, Insightful)
You are right. This is a great step forward. However, I think the Slashdot community, with its cynical eye on Microsoft, is reminding us to take this in the proper context. It remains to be seen whether this is the beginning of a slow but steady change of course for the world's largest software company, or whether this is a fake-out to fool people into thinking that Microsoft is nice.
Personally, I suspect that this reflects internal conflict within Microsoft, with some portions of the behemoth trying to do something good, while another faction is still trying to squeeze money out of Microsoft's unique position in the software world.
In any case, remember how some people would say, "You always complain about Microsoft! What would it take for you to admit that Microsoft is doing something good?"
#2 on the list was: Stop hijacking the HTML standard and make a compliant browser! Then they put out IE7. (Not perfect, but a heckuva lot better than IE6!)
#1 on the list was: Open up the Word document file format. Okay, so they've done that. (Again, not perfect, but a heckuva lot better than what went on before!)
Congrats, Microsoft. You did it. A little late in coming, and you really didn't impress us with your OOXML fiasco waving that money around, but I'm willing to adopt a wait-and-see attitude to see whether it's still those same money-grubbing upper level managers that are in control, or whether this really is a new day at Microsoft.
Parent
Chicken and Egg (Score:5, Funny)
The only problem? They released them in Word format...
(Okay, not really -- someone must have realized that that would be silly.)
Yay for Microsoft! (Score:3, Funny)
Wait ... what did I just say? ...
I don't think I'm feeling well. I'm gonna go lie down now.
Holy Crap! (Score:5, Interesting)
Or is it Wholly Crap?
I guess we'll see. I'm rather shocked by this. This is a kind of "giving in" gesture that is MOST uncharacteristic of Microsoft. Is this what the "Post-Gates" Microsoft will be like? How much more cooperative spirit will the community enjoy?
2 things though... (Score:4, Insightful)
a) Does this mean the standard GNU response [gnu.org] is now invalid?
b) If someone writes a FOSS implementation of a .doc/.xls viewer, does that mean MSFT could more easily throw their weight behind declaring .doc a standard? (Since a standard ought to have multiple implementations, although maybe Office 2003 and 2007 count as two, or Office and the Word/Excel/PowerPoint viewers :p )
Visio (Score:5, Insightful)
Where is Visio ?
It's a Trap! (Score:5, Funny)
20 years ago, at what was the world's largest software project, we used to joke that if we wanted to ruin our competition, we would send them a copy of our specs. It looks to me that Microsoft got the same idea.
Meh.. /.-ers (Score:5, Insightful)
Microsoft releases API/protocol specs | Feb. 2008
http://www.theregister.co.uk/2008/02/21/microsoft_goes_open/ [theregister.co.uk]
Microsoft releases further specs | April 2008
http://www.theregister.co.uk/2008/04/08/microsoft_posts_protocol_documents/ [theregister.co.uk]
And they state that more will come after gathering feedback between then and June.
Between now and June it will garner feedback from the developer community. Then, at the end of June, Microsoft will publish the final versions of technical documentation - along with definitive patent licensing terms.
Has no one noticed that they're covered by "OSP"? (Score:4, Informative)
This means, as far as I know, that GPL implementations are not allowed. So it's an even worse situation than before, because Free Software developers can't even look at this documentation to verify any of the conclusions of their reverse engineering.
Re: (Score:3, Insightful)
free software .. (Score:3, Insightful)
You mean as in you work on the implementation for free and Microsoft benefits from any commercial developments.
What makes you think so? (Score:5, Funny)
Parent