
Jean-loup Gailly On gzip, go, And Mandrake

Posted by Roblimo on Fri Mar 10, 2000 02:00 PM
from the ships-and-sealing-wax-and-kings-and-Linux dept.
Jean-loup is the kind of person I love to see us interview here. He's important in the sense that work he's done (positively) affects almost every Linux or Unix user, but the chance of Jean-loup ever getting any "mainstream" media attention is zero. Or possibly less. Without people like Jean-loup there would be no Open Source movement, and I consider the chance to present him as a Slashdot interview guest a *huge* honor. The readers who asked the excellent questions, and the moderators who helped select them, also deserve major kudos. So thanks to all of you for an excellent Q&A session!

1) bzip2 Support
by Aaron M. Renn

When is gzip going to provide (transparent) support for bzip2 files and the Burrows-Wheeler algorithm?

Will BW be an algorithm option within the gzip file format itself ever?

Gailly:

I have worked very closely with Julian Seward, the author of bzip and bzip2. The goal was to integrate a Burrows-Wheeler algorithm inside zlib 2.0 (upon which gzip 2.0 is based). One of the requirements was to avoid the kind of arithmetic coding used in bzip because of both patent and decoding speed concerns, so Julian wrote the Huffman coding code now used in bzip2. Another requirement was to put the code in library form and Julian did that too.
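For readers unfamiliar with the Burrows-Wheeler transform mentioned above, here is a minimal Python sketch of the naive textbook version. It is not Julian Seward's code and not how bzip2 implements the transform; the function names and the `$` end marker are assumptions chosen for illustration. The transform only reorders characters so that similar contexts end up next to each other, which is what lets a later entropy-coding stage (Huffman coding, in bzip2's case) do a better job.

```python
def bwt(text, end="$"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if end in text:
        raise ValueError("end marker must not occur in the input")
    s = text + end  # the unique end marker makes the transform invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, end="$"):
    """Rebuild the original text by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the marker is the original text plus the marker.
    original = next(row for row in table if row.endswith(end))
    return original[:-1]
```

Calling `bwt("banana")` returns `annb$aa`, and `inverse_bwt` turns that back into `banana`. Sorting every rotation is far too slow for real files; this version exists only to show the idea.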

Unfortunately, Julian decided to release bzip2 independently instead of staying within the gzip 2.0 project. It was mainly my fault, since I couldn't spend enough time on the other parts of the project, and the project was not advancing fast enough. Since Julian left, the project progressed even more slowly, and new blood is obviously necessary, because other responsibilities no longer leave me enough time for gzip. If you're an expert in data compression, e-mail me to convince me that you are the most qualified person to turn the zlib/gzip 2.0 project into an overwhelming success :-)

2) The Data Compression Book
by drudd

I am a happy owner of The Data Compression Book (2nd Ed). With the increasing availability of compression routines within libraries (Java's GZIP streams spring to mind), does this make your book a little unnecessary?

Should software authors continue to write their own compression routines, or simply trust the versions available to them in library form?

I can see some definite advantages to library code, i.e. the ability to upgrade routines, and having standardized algorithms which can be read by any program which utilizes the library.

Gailly:

The compression routines in The Data Compression Book were written mainly for clarity, not for efficiency. The source code is present to help understand how the compression algorithms work. It is not designed to be used as is in other software packages, although it does work if efficiency is not a concern. Consider the book as teaching material, not as a data compression library distributed in printed form.

This doesn't mean that the book is unnecessary. Good data compression libraries don't appear magically; their authors had to learn compression techniques one day. If the book helps one person to get started in the data compression area and this person later writes a great compression library, the book will have been useful.

Judging by the success of my zlib data compression library, I think that a vast majority of software authors prefer using an existing library rather than reinventing the wheel. This is how the open-source model works: building upon the work of others is far more efficient than rewriting everything.
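As an illustration of what using an existing library looks like in practice, here is a small sketch with Python's standard `zlib` module, which wraps the zlib library. The `roundtrip` helper and the sample data are assumptions made up for this example, not anything from the book or from zlib itself.

```python
import zlib


def roundtrip(data, level=9):
    """Compress bytes with zlib at the given level, then decompress them again."""
    packed = zlib.compress(data, level)
    restored = zlib.decompress(packed)
    return packed, restored


# Highly repetitive sample input, so the size reduction is easy to see.
sample = b"building upon the work of others " * 100
packed, restored = roundtrip(sample)
print(len(sample), "bytes in,", len(packed), "bytes compressed")
```

The caller never touches the compression algorithm itself, which is the point being made: the library can be upgraded without changing the calling code.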

3) Compression patents
by Stephen

The compression world has many patents, notably for Lempel-Ziv compression as used in GIF. What is your view on companies patenting non-obvious algorithms for such processes as data compression?

Gailly:

The worst problem is companies patenting obvious algorithms. There are far more patents on obvious ideas than patents on really innovative ideas. In the data compression area, even something as basic as run-length encoding (replace "aaaaa" with a special code indicating repeat "a" 5 times) has been patented at a time when this technique had been well known and widely used for many years.
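To show just how basic run-length encoding is, here is a minimal Python sketch. The function names and the list-of-pairs output are assumptions for illustration; real formats pack the counts into bytes in various ways.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(text: str) -> List[Tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: List[Tuple[str, int]]) -> str:
    """Expand (character, count) pairs back into the original text."""
    return "".join(char * count for char, count in pairs)
```

With this sketch, `rle_encode("aaaaa")` gives `[("a", 5)]`, which is exactly the 'repeat "a" 5 times' code described above.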

It is distressing to see the U.S. patent office granting such patents, in contradiction with the law requiring an idea to be both novel and non-obvious to be patentable. Philip Karn has made a good analysis of the problem.

Patents on non-obvious algorithms are a different matter. One view is that algorithms should not be patentable at all, whether obvious or not. This used to be the case, until the US patent office started to grant patents on methods which were nothing else than pure algorithms. I'm afraid that a switch back to the original situation is extremely unlikely.

Several reforms are necessary:

  • The patent term should be significantly shortened, at least for algorithms. The patent system was designed to benefit society as a whole, ensuring that new ideas would eventually be made public after a limited period of time instead of being kept as trade secrets. But 20 years is incredibly long in the software area. Granting a monopoly for such a long time no longer benefits society.
  • The non-obviousness requirement should be applied much more strictly. A little bit of common sense would avoid a lot of patents on trivial ideas.
  • Prior art should be checked more thoroughly. Even non-obvious ideas should not be patented if they have been in use for several years already.

4) A question about Mandrake...
by Mr. Penguin

As we all know, at first Mandrake was little more than a repackaged version of Red Hat. That's changed a bit with the newer versions. My question is this: to what degree will Mandrake continue to differ from RedHat and will there ever be a "developer" version (i.e. one that is centered towards those who are a bit more technically competent)?

Gailly:

That's changed more than a bit. Our distribution is now completely made by us. Believe me, doing everything ourselves represents a significant amount of work. Few people understand how much work is involved in making an independent distribution. We have our own development teams producing things like our graphical install DrakX, our disk partitioner DiskDrake, management of security levels in msec, hardware detection with Lothar, etc. Our packages are more recent than those of Red Hat and have more functionality (such as supermount support in the kernel). Red Hat is now even copying packages made by MandrakeSoft (e.g. rpmlint). I hate having to speak like a salesman here, but it is really unfair to say that Mandrake just repackages RedHat; this is simply not true anymore.

Have you looked at Linux-Mandrake 7.0? It does include a developer version. At install time, select the option "Custom" then "Development". You will get all necessary development tools. We, as developers, use our own distribution :-)

5) Why is Mandrake better than Red Hat?

I guess that you have at least a little something to say about this.

Is the 586 optimization enough to justify Mandrake's position? Are you especially proud of any of the architectural differences between the distributions (from what I have been told, the Apache-PHP layout is quite a bit different).

How do you feel about the steps that Red Hat has taken to change their distribution in reaction to yours?

Gailly:

Mandrake is far more than Red Hat plus 586 optimization. It is an independent distribution. (See the answer to A question about Mandrake above.) We have enhanced some packages (such as the kernel or Apache) to provide additional functionality for users.

It's clear that Mandrake pushes Red Hat to improve its own version and nowadays Red Hat includes some development from MandrakeSoft. There is a coopetition: Red Hat and MandrakeSoft both benefit from the same open-source community, but they compete for the customer. This coopetition is fully beneficial for Linux users since we both need to constantly improve our version. We just make sure that Mandrake stays ahead :-)

6) Winzip
by Uruk

I noticed that you allowed the people who make the Winzip product to incorporate code written for Gzip. I think it's cool that you did that, because it would be horrible if winzip couldn't handle the gzip format, but at the same time, what are your thoughts about allowing free software code to be included in closed-source products?

Just out of curiosity (tell me it's none of my business if you want to and I'll be OK with that), did you receive a license fee for the code from the company that makes Winzip?

Gailly:

I started writing compression code simply because my 20 MB hard disk, the biggest size one could get at the time, was always full. I didn't write it for money. Even after I got a bigger hard disk, I continued writing compression code for fun. In particular I was not interested in writing a Windows interface. This is why I allowed my code to be used in Winzip. I received exactly 0$ for this.

The zlib license also allows it to be used in closed-source products. This was an absolute requirement for the success of the PNG image format, which relies on zlib for data compression. If we had used a GPL license, Netscape and Microsoft Explorer wouldn't support PNG, and the PNG format would be dead by now. I also received 0$ for zlib, if you're curious...

Even though I allowed my code to be used in closed-source products, I am a strong supporter of the open-source model. That's also why I work for MandrakeSoft. The open-source model is getting so much momentum that it will in the end dominate the software industry.

7) What about wavelets?
by Tom Womack

The Data Compression Book was an excellent reference when it came out, but there are some hot topics in compression that it doesn't cover - frequency-domain lossy audio techniques (MP3), video techniques (MPEG2 and especially MPEG4), wavelets (Sorenson video uses these, I believe, and JPEG2000 will), and the Burrows-Wheeler transform from bzip.

Do you have any plans for a new edition of the book, or good Web references for these techniques? BZip is covered well by a Digital research note, but documentation for MPEG2 seems only to exist as source code and I can't find anything concrete about using wavelets for compression. The data is all there on the comp.compression FAQ, but the excellent exposition of the book is sorely lacking.

Gailly:

These are all very worthy topics, and Mark Nelson and I would like to incorporate them into a new version of the book someday. However, the decision to produce a new version is taken by the publisher, not us.

Note also that these are all very big topics, and it would be quite easy to write an entire book on each one. I don't think they will fit well in a chapter or two. Covering JPEG in one chapter was difficult, and Mark Nelson has been criticized for not describing the specifics of the standard algorithm.

You can find some Web references here and there, in addition to the comp.compression FAQ.

8) Compression software
by jd

It is a "truism" in the Free Software community that code should be released early and released often.

However, much of the software you've written has started gathering a few grey hairs. Gzip, for example, has been at 1.2.4 for many, many moons, and looks about ready to collect its gold watch.

Is compression software in a category that inherently plateaus quickly, so that significant further work simply isn't possible? Or is there some other reason, such as Real Life(tm) intruding and preventing any substantial development?

(I noticed, for example, a patch for >4Gb files for gzip, which could have been rolled into the master sources to make a 1.2.5. This hasn't happened.)

Gailly:

I knew this question would come when I accepted a Slashdot interview. But I had to face it :-(

In short, you are completely right. While working on gzip 2.0, I continued to maintain gzip 1.x, accumulating small patches, and answering a lot of e-mail. But I was hoping to be able to release gzip 2.0 directly, without having to make an intermediate 1.x release. See my answer to the question "bzip2 Support" concerning the state of gzip 2.0 and the Real Life interference. I'd be glad to hand over all my patches for 1.2.4 to the person who can help me get the gzip 2.0 project up to full speed.

9) Proprietary algorithms
by Tom Womack

The field of compression has been thronged with patents for a long time - but patents at least reveal the algorithm.

What do you think of the expansion of trade-secret algorithms (MP3 quantisation tables, Sorenson, RealAudio and RealVideo, Microsoft Streaming Media) where the format of the data stream is not documented anywhere?

Gailly:

The hardware specifications for some video cards were kept as trade secrets. As a result, the XFree86 project couldn't support these cards. Increasing pressure from users who didn't buy those cards because they couldn't be supported has led the manufacturers to release the hardware specifications, and those cards are now well supported.

Similarly, I think that pressure from the open-source community can become strong enough to force companies to open their formats. We're not completely there yet, but I believe that the open-source model will win in the end. Even a giant like Microsoft is starting to consider Linux a real threat.

10) Go and Compression
by Inquisiter

When I think of a game like go or chess, I think that each player develops their own algorithm to beat their opponent. If you agree, what relationships or similarities do you see between your interest in Go and your interest in compression?

Gailly:

What a nice question!

Even though the rules of go are very simple, the complexity of go is astonishing. The best go programs can be beaten by a human beginner. The search space in go is so large that it is impossible to apply the techniques that are so successful in chess. Professional go players never evaluate all possible moves. They are able to compress an enormous amount of information into a relatively small number of concepts.
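To give a rough feel for the size of that search space, here is a back-of-the-envelope calculation in Python. It assumes a standard 19x19 board and counts each point as empty, black, or white; that only gives an upper bound on board configurations, since many of those arrangements are not legal positions. The function name is made up for this sketch.

```python
def configuration_upper_bound(size=19):
    """Upper bound on board configurations: 3 states per point, size*size points."""
    return 3 ** (size * size)


bound = configuration_upper_bound()
print("3^361 has", len(str(bound)), "decimal digits")
```

The result is a number with 173 decimal digits, and that counts only static positions, not sequences of moves.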

Where a human beginner would have to painfully examine many possibilities to realize that a certain group is doomed, and would most likely fail in the process, a go expert can immediately recognize certain shapes and can very quickly determine the status of a group. One gets stronger at the game by reaching higher levels of abstraction, which are in effect better compression ratios. A professional go player can elaborate concepts that an average player would have great difficulty understanding.

Current go programs are overwhelmed by the amount of information present in a game of go. They are unable to understand what is really going on. Since brute force techniques can't work in go, programs will only improve by compressing the available information down to a manageable level.

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by cprincipe (100684) on Friday March 10 2000, @09:14AM (#1211601) Homepage
    Yeah, I know it sounds hokey, but maybe it would be a good idea to have something on /. about the "heroes" of Open Source, the unsung people who really do the interesting and important things, but get no media attention.

    Maybe the slashdot effect will help these folks get some well-deserved attention.

  • by Mr. Penguin (87934) <drj&trivergent,net> on Friday March 10 2000, @09:17AM (#1211602) Homepage
    I do hope that this doesn't get moderated down because I am replying to my own question.

    When I asked if there would ever be a "developer" version of Mandrake, I didn't mean a distribution that automatically included all of the developer tools. I do use Mandrake 7.0, and I do have all of the developer tools installed. What I was referring to would be something a bit more like Debian, where things aren't quite so user-friendly and "windowsish". In other words, a distribution that would be more fun for developers to play around with.

    I realize that Debian exists for this purpose, but I would like to see a variety of developer platforms of Linux. I think that the community as a whole seems to be forgetting that Linux started as Linus's hobby, and grew to be the hobby of dedicated hackers everywhere. Now, Linux has moved to the corporate world (even though I do love the new "business Tux" /. icon) and the roots and history are really gone.

    Brad Johnson
    --We are the Music Makers, and we
    are the Dreamers of Dreams

  • by Wah (30840) on Friday March 10 2000, @09:20AM (#1211603) Homepage Journal
    how about a quick bio for all interviewees? Some sort of resume, track record or something. When part of the payback is people knowing your name, it would be a nice service for /. to add a face and a history to it. (Yes, I am extremely lazy)

    --
  • by drenehtsral (29789) on Friday March 10 2000, @09:21AM (#1211604) Homepage
    I like this sort of thing, because a lot of people involved in the more "sexy" parts of development (videogames, GUIs, compilers...) get heard from a lot. It's nice to hear from somebody who is in the trenches working on the mundane but absolutely invaluable day-to-day tools that we all use.
    One of my Russian friends, when talking about the programmers who write all the little bits of useful code, translated (awkwardly) a proverb that I cannot remember. It said basically that there is just as much dignity in being a farmer as [some high profile occupation] because without them we would all starve.
    If anybody knows the correct wording and/or translation for this...
  • by Anonymous Coward
    I am perfectly satisfied with a Biscuit...

    Have you tried Powdermilk Biscuits?
    My, they're tasty, and expeditious...

    thank you.

  • I love this word "coopetition". (Well, I don't know if it's new, but I've never seen it before). In one word, it describes exactly why and how the open source model can work.
  • by RancidPickle (160946) on Friday March 10 2000, @09:35AM (#1211608) Homepage
    First off, thanks to J-L and /. for another informative interview.

    Compression algorithms should be patentable, but for a 2-year period. This way, the company that develops a new one will have a temporary monopoly to make their money, but it can be included in other programs and libs after it is opened to public use. Developing new algorithms is great, but useless if they're not used because of a 20-year lock on them.

    I think that Mandrake is pushing RH to innovate. Remember, RH is used more by the newbies because it is more 'commercial' and it does some of their work for them. No, I'm not slamming RH, I'm just saying it's probably the most used by newcomers who, after they get used to Linux, can then either stay with RH or try the other flavors.

    I'm also quite pleased that J-L released the gz lib to Winzip. Even though he's allowing them to make money from his work, he is actually pushing the acceptability and openness of Linux. The man is to be commended for looking at the whole picture and truly accepting open-source GPL.
  • by Anonymous Coward
    If you go to Mandrake's web site you will find a link to a project called "cooker". This is the bleeding-edge development version for the next Mandrake release. Okay, here, try this address: http://www.linux-mandrake.com/en/cookerdevel.php3
  • I for one had never heard of the author of gzip and zlib; I was just aware they worked. I was also not aware he allowed his code to be used in Winzip and other projects such as IE so that PNG could survive. The little pieces all fit together and he deserves props. Good job. I am glad not everyone in the world is after money! He could have easily asked for some money, and I'm glad to see someone who's proud of NOT taking money! Good job again!

    Baaaaaa -Sheep

  • by pb (1020) on Friday March 10 2000, @09:42AM (#1211611)
    That "go" question was really cool. I wish I played it well enough to attempt to write a great computer player, as this is a classic hard problem in AI, with a $2,000,000 reward, IIRC...

    Of course, his "compression" is generally expressed in terms of "rules", and even some tips from image compression might help here. (recognize similar configurations, whether they be rotated, translated, etc., and adjust your strategy accordingly)

    bzip2 is a really great program, generally offering better compression than gzip at least for large bodies of text. What I'd really like to see is a meta-compression format that has some heuristic to identify the type of file, and use the appropriate (optimal) algorithm. I know most modern compression programs do something like this already, (like RAR and its multimedia compression) but it's still neat. The few bits to identify the compression methods can be well worth it...

    Also, hopefully those compression patches will eventually make it into Linux; it'd be great to see something like that working at the VFS level.

    If it used something like LZO [uni-linz.ac.at], there'd be up to gzip levels of compression with practically no performance hit on even a modest system. Maybe even speed improvements would be possible, due to having to read less data from the disk...

    In those situations, I'd advocate compressing swap (and even memory!) where it would help (not recently used data), and maybe merging more of that into the filesystem too...
    ---
    pb Reply or e-mail; don't vaguely moderate [152.7.41.11].
  • by dsplat (73054) on Friday March 10 2000, @09:43AM (#1211612)
    4) A question about Mandrake... by Mr. Penguin

    As we all know, at first Mandrake was little more than a repackaged version of Red Hat. That's changed a bit with the newer versions. My question is this: to what degree will Mandrake continue to differ from RedHat and will there ever be a "developer" version (i.e. one that is centered towards those who are a bit more technically competent)?


    I was a bit disappointed that Jean-Loup didn't mention the inclusion of quite a number of localization [linux-mandrake.com] packages into their release, and actively soliciting additional translations into any language they can find translators for [linux-mandrake.com]. In the spirit of full disclosure, my name does appear there, and I did receive a copy in exchange for some late-night translation efforts.

    Speaking of unsung heroes. I'd be interested in seeing an interview with one of the people who have kept the internationalization and localization of open source moving forward such as François Pinard of the Free Translation Project [umontreal.ca] or Pablo Saratxaga of MandrakeSoft who is also running the Linux i18n Project [linuxi18n.org].
  • FWIW, it seemed to come up a lot during the (most-recent) Microsoft anti-trust trial. MS execs seemed to think it describes the way the software industry in general works.

  • with regard to patents:

    Several reforms are necessary:

    The patent term should be significantly shortened, at least for algorithms. The patent system was designed to benefit society as a whole, ensuring that new ideas would eventually be made public after a limited period of time instead of being kept as trade secrets. But 20 years is incredibly long in the software area. Granting a monopoly for such a long time no longer benefits society.

    Unfortunately, I don't think a lot of people on Capitol Hill read slashdot frequently. Someone will eventually need to trudge to D.C. and make the case. The viability of the software industry is at stake.

  • When I asked if there would ever be a "developer" version of Mandrake, I didn't mean a distribution that automatically included all of the developer tools. I do use Mandrake 7.0, and I do have all of the developer tools installed. What I was referring to would be something a bit more like Debian, where things aren't quite so user-friendly and "windowsish". In other words, a distribution that would be more fun for developers to play around with.


    If you are really interested in being a developer, it's best to download the source. I use and operate Debian every day. I also update the packages I need from the unstable directory (OK, not within the last 3 weeks because of my schedule) and then install them. If you compare the versions of some things, say your favorite software package, you can usually get a more up-to-date version at the maintainer's site.

    One of my major headaches is wondering when some of these developers work on things.

    Anyone using the e2compr kernel patch? My hard drive kind of needs it because it's not that large (340 MB; no, not 34 GB or 3.4 GB, 340 MB). Anyway, if you check the current development kernel version you will note that it is 2.3.50, and the alpha patch is 2.3.51-2. But the latest development for the 2.3.x series is at 2.3.6, and that doesn't even compile (I currently use an old 2.2.7 version on my machine). The last date of any change was November 11, 1999 :(. See, even then you can get burned.

    I guess the best possible thing you could do would be to have, say, a separate partition and install the source and binaries on that. Possibly you could then create Debian packages for source and binaries for the rest of us (they are really wanted), but that's enough of my ranting.

    I realize that Debian exists for this purpose, but I would like to see a variety of developer platforms of Linux. I think that the community as a whole seems to be forgetting that Linux started as Linus's hobby, and grew to be the hobby of dedicated hackers everywhere. Now, Linux has moved to the corporate world (even though I do love the new "business Tux" /. icon) and the roots and history are really gone.


    In general I have found that, unless you like repartitioning your HD, Debian is a good choice, although Red Hat is getting better in this regard with their upgrade features and experimental packages.

    I realize that, and I find it to be the main reason I can use Linux and get away with it. I can code, at least well enough to get out of a paper bag, but I am not that good. What I found out, while thinking with nothing particularly exciting to do, was that most of the apps used on these various forms of Unix were created in house rather than by a vendor, so new apps essentially didn't matter. However, when you don't have time to write that spiffy new fractal generator in your spare time, it becomes a problem. Therefore you have to rely on development in the community.
  • There have been arguments that maybe the best person to make a go program would be one who is not very strong at go: the programmer would just know the basic game and make a program to do all the fancy thinking. Or maybe it would be good if a weak player made a program and, as the player got better, always made sure the program could beat him.
    I have spent some time trying to figure out a way to make a strong go-program, but this is beyond my capability and/or interest.
  • by SheldonYoung (25077) on Friday March 10 2000, @10:03AM (#1211618)
    What I'd really like to see is a meta-compression format that has some heuristic to identify the type of file, and use the appropriate (optimal) algorithm. I know most modern compression programs do something like this already, (like RAR and its multimedia compression) but it's still neat. The few bits to identify the compression methods can be well worth it...

    Yikes! Tell your brain "It's okay, you only thought it was a good idea".

    The TIFF image format does exactly this, which is why nobody uses it. The problem with a meta-format is that it can never be completely implemented by anybody. So user Joe downloads all of Project Gutenberg in a tiny 500 kb file compressed by meta-zip, only to discover he doesn't have the GutenSquash codec. He then has to go hunt down the codec and install it, exactly as if he had to download and install a whole new compression tool.

    The problem can be reduced by having an auto-download feature like Windows Media, but the basic problem is the standard format is really just a standard container for unstandard things.

    Also, with fewer compression methods out there you stand a much better chance of finding something that can decompress your thesis 10 years from now.

  • by EXTomar (78739) on Friday March 10 2000, @10:12AM (#1211619)
    I'm so glad the last question was chosen and answered! It goes right to the heart of AI.

    - Go is severely complex for a computer. Playing requires a combination of looking "n" moves ahead (which a computer does well) and recognizing patterns (which a computer doesn't do so well). The combination is way out of the reach of current computer AI.
    - Computers can't recognize what is in a digital image without massive hardware, and even then the results aren't inspiring (think abstract art or oil painting). Yet humans can easily grasp the concepts in images.
    - Computers can't recognize the content in text either. There are spell checking programs out there that can check grammar, but none can actually know what they are doing... and "spell checking" is pattern matching! Humans can easily judge whether a sentence is "clever" or "stupid", but a computer cannot, nor can a computer tell the difference between a sonnet and a memo.
    - We all know how poor game AI tends to be. A computer can easily be outplayed because it cannot distinguish a good time to attack/defend from a bad time to attack/defend. The best it can do is guess (I've got "n" units, they've got "m"... my number is bigger so I should attack). Position and disposition are completely lost on a computer.

    I believe the Holy Grail in AI has to be real human-like "recognition"**. Pattern recognition is something a computer does well, but that really isn't the whole problem. Humans recognize "things" and instantly have memories associated with that "thing". Both steps seem to be compressed together. To a computer, recognition is an exhaustive, brute force search of its set of knowledge (i.e. a database) and the association is another exhaustive, brute force search through a different kind of set of knowledge (i.e. another database). Both steps are impossible to implement or maintain with our current methods and algorithms for pattern matching and searching. Something new has to be invented, and I believe that "compression" and "hashing" will be key to creating a true AI-like system.

    ** note: for a computer to have "real human like recognition" it must also have the ability to be "mistaken". :-)
  • Fabrice Bellard [www-stud.enst.fr] wrote LZEXE, which transparently compressed executables under DOS, as well as the 486 mpg123 patch [www-stud.enst.fr] -- yes, MP3 playback on a 486, which I used on a 33 MHz box. Thanks to both of them, we all squeeze more efficiency out of hardware.
  • by pb (1020)
    Point taken. I wouldn't want to see a format implemented like that. AVIs and RealPlayer files do the same thing, and it annoys the crap out of me.

    The first thing to do is to make sure that all of the codecs used are implemented directly in the program / library, preferably open source so they can't be taken out later.

    The interesting (and different, I think) part I was mentioning would be the ability to change which "codec" is being used at any point in the file. VBR encoding for MP3s works sort of like this: it detects properties of the sound (not as much detail needed, fewer frequencies) and adjusts its compression accordingly. Of course, that's lossy. I guess PNGs can already do this too, and they're lossless, but I'd like to see more of this for general data compression, not just images.

    The advantage would be having one compressed file format that's good enough for many varieties of data, and we can go back to plain data files compressed with generic compressors. (like ps.gz instead of pdf files, or xcf.bz2 instead of .psd files, but with one great compressor. I don't think we'll ever settle on a standard data file format, even though it's probably not impossible to do...)
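    To make the "switch codec at any point in the file" idea concrete, here is a rough sketch, not an existing format. It splits the input into blocks, tries a few stock Python compressors on each block, and keeps whichever result is smallest. The tag numbers, the 5-byte record header, and the names `pack` and `unpack` are all made up for illustration.

```python
import bz2
import lzma
import struct
import zlib

# Hypothetical codec table: tag byte -> (compress, decompress).
CODECS = {
    0: (lambda b: b, lambda b: b),          # stored, no compression
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
    3: (lzma.compress, lzma.decompress),
}


def pack(data: bytes, block_size: int = 64 * 1024) -> bytes:
    """Compress each block with whichever codec gives the smallest output."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Try every codec on this block and keep the smallest result.
        tag, payload = min(
            ((t, comp(block)) for t, (comp, _) in CODECS.items()),
            key=lambda pair: len(pair[1]),
        )
        # Record layout (assumed): 1-byte tag, 4-byte big-endian length, payload.
        out += struct.pack(">BI", tag, len(payload)) + payload
    return bytes(out)


def unpack(blob: bytes) -> bytes:
    """Walk the records and undo each block with the codec its tag names."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        tag, length = struct.unpack_from(">BI", blob, pos)
        pos += 5
        out += CODECS[tag][1](blob[pos:pos + length])
        pos += length
    return bytes(out)
```

    Because every codec in the table ships inside the one program, nothing can be "taken out later", which is the point made above. The cost is that each block gets compressed several times at pack time.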

    Hey, if you're paranoid, just store your thesis as "thesis.roff.zoo", and you should be fine! ;)
    ---
    pb Reply or e-mail; don't vaguely moderate [152.7.41.11].
  • by dillon_rinker (17944) on Friday March 10 2000, @10:16AM (#1211622) Homepage
    One view is that algorithms should not be patentable at all, whether obvious or not.
    Algorithms are patentable. In a sense, Algorithms have always been patentable. You just had to have a good patent attorney. Here's how it works:

    1. Submit application for algorithm patent
    2. Get laughed at by patent office clerk
    3. Hire good attorney to write application
    4. Submit application for a DEVICE which implements the algorithm
    5. Rake in the big bucks

    Anyone can implement your algorithm, as long as they don't manufacture a device that implements the algorithm. So if I had a compression algorithm, I would patent a device, consisting of CPU, memory, and disk drive that compresses data by (insert algorithm here). You would be free to implement the algorithm by hand, or maybe in a device made up of steam-powered wheels, cams, and rods. For all practical purposes, my algorithm would be patented.

    BTW, this same technique could be used to get around five year limits on software patents. Bezos is almost certainly aware of this, so I found his open letter to be quite disingenuous.
  • I always thought the road towards solving the Go problem would be in the graphics analysis space. People who play go well often talk about seeing shapes and patterns, much more so than someone who, for instance, has mastered Chess. It would not surprise me at all if you took a Go board, and thought of it as a B&W image and performed standard AI-like graphics analysis on it (edge-detection, boundary analysis, etc) you'd find something interesting.

    Anyone know if this has ever been tried? I'd love to but I just don't know quite enough about either go or graphics...

    Eric

    Want to work at Transmeta? Hedgefund.net? Priceline?

  • by pb (1020)
    That's how the first good checkers AIs worked: the algorithm got to be smarter than the human programmer. Of course, Go is a *much* more complicated game than Checkers. But now that Chess is basically solved, maybe people will concentrate on Go, and we'll see some new approaches. AI could definitely use some new motivation.
    ---
    pb Reply or e-mail; don't vaguely moderate [152.7.41.11].
  • by Anonymous Coward on Friday March 10 2000, @10:18AM (#1211625)
    Don't forget, Slashdot is News for Nerds, not News for Open Source proselytizers. If they start a Hall of Fame (which is a very good idea) it should include people that were great nerds that produced great things, even if they decided their inventions weren't worthless after all and charged money for them. The original developers at Sierra and Broderbund and just about everyone else mentioned in Steven Levy's book "Hackers: Heroes of the Digital Revolution" should be included.

    And no one should be allowed in simply because they were Open Source. For instance, GNOME and KDE developers are great contributors to the Open Source scene, but they're not doing anything innovative. New Xerox-PARC-based interfaces have been springing up since the 80s, there is no reason to accolade something that should be described as YAGUI.

    E.
  • That's surely the way. I'm not a great player, but the shapes are extremely important, at least for humans. One of the first things a go player learns is to look for shapes: are they good or bad, and are they 'alive' or 'dead'? This brings down the number of possible 'good' moves. The problem as I see it is that there are so many shapes where only one stone is the difference between a great move and a lost game.
    Figuring these out automatically seems to be out of reach with current computing power.
  • I fail to see why what you describe is a serious problem. The goal should not be to write a metaformat with hooks so anyone can plug in their own algorithm; the goal should be to write a single application which implements LOTS of algorithms. Successive versions of the program would, of course, implement more algorithms. Granted, early in the application's life span, there could be lots of revisions. But as the application matured, the only time there should be a revision is when a new type of content is developed, or mathematical advances result in significantly improved compression routines.

    Do you see problems with this? Am I missing something?
  • Heh, in their case, shouldn't the word be "co-optition"?
  • Except that in many vision applications the difference between two interpretations of a shape-image is a handful of pixels or less. There's a famous Dept. of Defense challenge that involves making out the shapes in a box full of tools. The difference between finding a hammer and a wrench instead of a giant hammerwrenchthing is extremely small.

    My hope is that one of those techniques would prove workable...

    Want to work at Transmeta? Hedgefund.net? Priceline?

  • Maybe I'm misremembering, but I think Bezos mentioned that he was going to actively seek out and try to meet with some of the Capitol Hill guys about the patent duration issue.


    ---
  • It sounds to me like you're suggesting that the file should start with a header that describes what format it's in, and the compression program then figures out which engine to apply, sort of like starting a text file with:

    #!/bin/perl

    So that bash knows what engine to load for it when it's the first thing on the command line, right?
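    A rough sketch of that header-dispatch idea, using two formats that already announce themselves this way: gzip files begin with the bytes 1f 8b and bzip2 files begin with "BZh". The function name `decompress_any` and the table layout are made up for illustration; the decompressors are the ones in Python's standard library.

```python
import bz2
import gzip

# Leading "magic" bytes of two real formats, paired with a matching decoder.
MAGIC = [
    (b"\x1f\x8b", gzip.decompress),  # gzip member header
    (b"BZh", bz2.decompress),        # bzip2 stream header
]


def decompress_any(blob: bytes) -> bytes:
    """Pick the decompression engine by looking at the first bytes."""
    for magic, decompress in MAGIC:
        if blob.startswith(magic):
            return decompress(blob)
    raise ValueError("unrecognized compression header")
```

    For example, `decompress_any(gzip.compress(b"hello"))` and `decompress_any(bz2.compress(b"hello"))` both give back `b"hello"` without the caller saying which format was used.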
  • The RLE compression patent to which Gailly refers is "4056828: Run length encoding and decoding methods and means", filed by - you guessed it - Xerox. The full patent can be found here [ibm.com] at IBM's Patent Server [ibm.com], or here [164.195.100.11] at the US Patent and Trademark Office [uspto.gov]. The good news is that, as far as I know, this patent has expired because it's over 23 years old.
  • Successive versions of the program would, of course, implement more algorithms.

    Of course. This is mostly what new releases are about. The problem is that gzip hasn't had one in quite a while.

    I completely agree, as long as the point is better algorithms, and not just more.
  • by Anonymous Coward
    #Linux-Mandrake on irc.openprojects.net is a dedicated Linux-Mandrake Channel. It's quite nice, we help out newbies and the developers discuss the latest stuff in Mandrake. #Linux-Mandrake ON: irc.linux.com
  • I stand corrected re: only Open Source. I meant anyone, whether working for profit or not, who develops interesting and meaningful projects. I guess I focused on the Open Source stuff because usually these folks don't get the press the for-profit people do.

  • "Increasing pressure from users who didn't buy those cards because they couldn't be supported has led the manufacturers to release the hardware specifications, and those cards are now well supported.

    Similarly, I think that pressure from the open-source community can become strong enough to force companies to open their formats."

    I think this is probably a bit too optimistic. No matter how formidable the Open Source movement becomes, companies are probably not going to be releasing the full specifications for everything they develop. Competition won't allow for it.

    It's much more likely that, as Linux becomes more popular, hardware companies will begin to write drivers for Linux, instead of releasing the specs... on the software side, if we can't get vendors to start using open source formats, the time honored tradition of reverse engineering proprietary formats will continue...

  • But now that Chess is basically solved, maybe people will concentrate on Go

    Careful about the word "solved". Chess and checkers have not been solved, they've merely reached the point where computers can beat the best humans. Solved would mean that we know if the game is a white win/black win/draw before any moves have been made, assuming optimal play by both sides. Tic-Tac-Toe has been solved (draw). Checkers has been solved for, IIRC, 9 pieces or less on the board. (Chinook's endgame database.)
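    To show what "solved" means in that sense, here is a small sketch that searches the whole Tic-Tac-Toe game tree and reports the value of the empty board under optimal play by both sides. The board encoding (a 9-character string, "." for empty) and the function names are just choices made for this example.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str):
    """Return "X" or "O" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def solve(board: str = "." * 9, player: str = "X") -> int:
    """Game value with optimal play: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    # Exhaustive minimax: X picks the maximum value, O picks the minimum.
    values = [solve(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == "."]
    return max(values) if player == "X" else min(values)


print(solve())  # prints 0: the empty board is a draw with best play
```

    The same exhaustive approach is hopeless for Chess or Go because the tree is far too large, which is why beating the best humans and solving the game are different milestones.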

    I do agree on the AI for Go. As a beginning Go player, I'd love to have a better computer to practice on- even at my level I see the computer do just totally boneheaded things. (GnuGo for the Pilot is so bad I uninstalled it after about 4 games.)

    Eric

  • by Eccles (932)
    The goal should not be to write a metaformat with hooks so anyone can plug in their own algorithm; the goal should be to write a single application which implements LOTS of algorithms.

    The problem with this is that there isn't just one program. Thus someone else has to implement the code, and can't use an existing library (for language reasons, design reasons, platform differences, etc.), and they may just do a partial implementation for their short-term needs, and then the long-term implementation may never happen. Moreover, even if they're more conscientious, they have to do more implementation than if only a few algorithms were needed, and some may give so little improvement it really isn't worth the support overhead.
  • by Anonymous Coward
    I've also considered the stuff you're talking about. Data compression could also be called decomplexification- removing the irrelevant. (It could also be called maximal complexification too I suppose- separating the 'data' from 'that which is referenced to determine what the data means' to get a hunk of data with no pattern. That is, no redundancy.) Labeling is a kind of compression. Storing the name of a thing rather than the thing itself. Information itself could be called compression. Reducing infinite territory to finite map. Reducing an experience/phenomenon describable in an infinite number of ways from an infinite number of perspectives to a representation of that experience with only a finite number of perspectives from which that representation "makes sense". O mind is a tiny tiny comicbook =) J
  • Seems to me... MandrakeSoft is now just as belligerent toward end users as Microsoft is. I have posted in the Mandrake cooker mailing list. I have posted in various Linux newsgroups/websites. So far I have 2 responses. NO responses from anybody at MandrakeSoft. So I can translate this to: MandrakeSoft is in it for the money ONLY. Let people buy the OS. Let them find out that it will NOT run stable on their AMD K6/K6-2 computer. Let them reinstall Windows 98. Let them put Mandrake 7.0 on the shelf to gather dust. If they question a simple issue... ignore it. DO NOT respond. THEY will GO AWAY. Well... I am not going to go away. There is not much I can do, except REFUSE to install Mandrake 7.0 on any and all machines I build and sell. I hope you enjoy the money you get from the retail sales to users who have no idea you only want their money. I really liked Mandrake. Then I get this kind of response to a simple issue. I guess I will go back to where the first STABLE retail Linux began... RED HAT. That is where Mandrake started from. I guess I have to join the Mandrake sucks newsgroups? No, I refuse to drop that low. I would prefer a simple response from the MandrakeSoft people. It won't happen... but it would be nice.

    take care
    Christopher De Long
  • That's not quite accurate; as I understand the patenting laws (and going from past history, such as the early spreadsheet programs), you'd have patented a particular implementation of the algorithm. People would still be free to implement it other ways, even if that still involved memory/CPU/etc.

    #include
  • What I was referring to would be something a bit more like Debian, where things aren't quite so user-friendly and "windowsish". In other words, a distribution that would be more fun for developers to play around with.

    "Fun" in what sense? I don't consider configuring and tweaking a machine anywhere near as much "fun" as developing software for it; I have no problem with a "user-friendly and 'windowsish'" desktop, say - one of the reasons I use KDE on my machine at home is that the environment it gives me is "good enough" that I don't need to spend a lot of time figuring out all the knobs and buttons I'd need to tweak to make it "good enough" for me.

    I'd rather be editing source code for Ethereal [zing.org], say, than editing configuration files.... (And, when trying to look into a networking bug, I'd rather be looking at network traces with Ethereal, say, than pawing through raw hex dumps of packets; part of the reason I got involved with Ethereal in the first place was that neither snoop, nor tcpdump, nor Microsoft's Network Monitor could cope with NFS or SMB requests or replies that required more than one frame. Having had to pull apart an NFS READDIR reply by hand, I decided that it was the sort of work best done by a computer, rather than by a human....

    Ethereal can't yet handle multiple-frame "packets", either, but it's something we want to do, and we have some ideas about how to do it.)

    I.e., as I said in another comment in another thread [slashdot.org]:

    I'd rather use my brain cells doing software development than configuring software tools, tweaking my system so that it recognizes my PnP ISA sound card, blah blah blah.

    (I also think that a system that doesn't need configuration is preferable to a system that offers a Nice Friendly GUI for configuration. If the system can figure something out for itself, e.g. what sort of peripherals it has, or what properties some peripheral has, it shouldn't oblige you to tell it that something, regardless of whether you do it by editing a configuration file or pushing buttons on a nice shiny GUI application.)

  • To a computer, recognition is an exhaustive, brute force search of its set of knowledge (i.e. a database) and the association is another exhaustive, brute force search through a different kind of set of knowledge (i.e. another database).

    Not always. Yes, in traditional procedural AI (Prolog - yuck) this would be one way to go about it, but there are better ways (IMHO). Neural networks are very good at pattern recognition, and hetero-associative and auto-associative neural networks are very good at recall: even with significant amounts of error or knocked-out axons these systems still perform well. Holland's work on classifier systems (using genetic algorithms to generate new classifiers) is another way to perform some of these operations.
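    As a toy illustration of auto-associative recall, here is a minimal Hopfield-style network in plain Python. It is a sketch under simple assumptions (two tiny, mutually orthogonal +1/-1 patterns, Hebbian weights, in-order asynchronous updates), not a claim about how any particular system is built. The names `train` and `recall` are made up for this example.

```python
def train(patterns):
    """Hebbian learning: w[i][j] sums p[i] * p[j] over patterns, zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            # A unit follows the sign of its input; on a tie it stays put.
            new = state[i] if field == 0 else (1 if field > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


stored_a = [1, 1, 1, 1, -1, -1, -1, -1]
stored_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([stored_a, stored_b])

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # stored_a with its first unit flipped
print(recall(weights, noisy) == stored_a)  # prints True
```

    There is no search through a database here: the damaged input settles into the nearest stored pattern, which is the "recall with errors" behaviour described above.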
  • Ooh, a fellow user of both debian and e2compr.

    With regards to e2compr, I've coped simply by not upgrading the kernel. I know, I probably should...

    A word to the wise: keep a spare kernel capable of reading e2compr partitions on a floppy. You'll never know when Debian's going to decide to nuke your master boot record and keep you from booting.

  • I quite like this idea of having a time limit on algorithm patents (it surprised me that algorithms based on artificial, relatively finite, formal systems got patented in the first place; unless you're likeminded to Penrose, this is almost like patenting a mathematical expression [patent pending] ;-).

    However, there is one worrisome aspect. Essentially what we're trying to do here is achieve a balance between protecting a consumer who doesn't want to see his work fobbed off by someone (insert Big Monopoly of choice here), and protecting the skilful programmer, the careful orator of the formal system, in moving about in this problem space as he chooses.

    I simply fear that 2 years is inadequate (if we feel an algorithm is a patentable idea) to protect programmers. IBM, Sunsoft, Really Big Widgets Inc. [patent pending] are not terribly concerned to squeeze another 10% out of the Widget; they are concerned about not having their opponents squeeze 10% more than them. In some markets w/ high entry barriers it is easy to imagine the Big Companies simply waiting out this period.

    Of course, the Internet is a great help in this, by promoting OpenSource, a spectacular exercise in cutting barriers to entry to software development.

    In short, I think it would make the most sense to have some sort of variable patenting scheme. Of course, this would require the various patent offices actually stopping to think about grants GASP! instead of just blindly issuing them and then leaving the ensuing mess to the courts!

    ____________________________

  • I love this word "coopetition". (Well, I don't know if it's new, but I've never seen it before). In one word, it describes exactly why and how the open source model can work.

    Sad-but-true dept: the word was coined by Ray Noorda, the founder of Novell, not too very long before attempting to "co-op-ete" with Microsoft nearly destroyed Novell and got him tossed out on his ear.

    When did "innovate" become a synonym for "destroy all competition"?

  • (I'm drunk, but try to hold on)
    Just my opinion, but so far there have been weak programs written by strong players and really bad programs written by decent programmers. What we really need for go is a program, written by a pro, that knows the fundamentals and is able to actually _learn_ from that and evolve to something really sophisticated.
  • Hrmm is that what he did? This post wasn't very coherent, and I don't remember the other one.

    I wish he had been a little clearer and more consistent - I can assure you that there is no issue with mandrake on K6 - I'm on a K6 box running mandrake right now.

    Unless he bought a service agreement of course they are under no obligation to provide tech support, unless he can narrow down the problem a bit it wouldn't help anyhow - I repeat, mandrake + K6 works just fine, there has to be another problem...
  • Beg to differ. Evolution would always choose perfection over the ability to improve.

    --Kevin
  • Howdyho!

    I'm running Mandrake 7.0 (2nd ISO bugfix release) on a K6/2-450, and it seems to be working quite well. There were a few small issues with the compiler being unhappy (I got rid of that by installing egcs RPMs, re-compiling GCC 2.95 and installing that) but otherwise it's been quite well mannered.

    For reference, my system includes 128MB RAM, a Matrox G100, 2 ethernet cards (1 realtek ISA/1 3C905B PCI), an ISA aha152x SCSI card with a zip drive attached, a Voodoo 2 8MB, 3 IDE HDs, a SB16 non-pnp sound card and an IDE CD-ROM drive. Quite an assortment of hardware, all autodetected, and it runs very smoothly (unlike Windows 2000 Pro, which doesn't like my 3com ethernet card...)

    I'd chance to guess you might be having a problem with the CPU overheating. This is very common with the higher speed K6/2 and K6/3 chips. Memory is also a possible problem, there are many odd errors with RAM that only rear their ugly heads when certain access patterns are used, these are possibly exacerbated by Mandrake's "Pentium Optimizations."

    If you have multiple DIMMs/SIMM banks, try removing some of the RAM to see if that makes a difference. If it works properly, put the RAM you just removed in the box by itself and try it again. If it fails, you've likely just found your problem.

    You can also try turning down the voltage on your processor by 1/10th of a volt, your processor will probably run stably at this voltage. This was recommended by Tyan on their web site, it won't damage your processor to try it and it works quite well on my 1590S. Dropped the temp by quite a few degrees on my CPU, and lower temps mean fewer failures.

    Hope this helps!

    Mr. Hankey
  • The RLE compression patent to which Gailly refers is "4056828: Run length encoding and decoding methods and means", filed by - you guessed it - Xerox.

    The comp.compression FAQ [faqs.org], which Gailly maintains, mentions patent 4,586,027 [ibm.com] as "[covering] run-length encoding in its most primitive form."

    The good news is that the FAQ is incorrect. Patent 4,586,027 does not cover RLE as such, but a variant in which the run length is placed before and after the compressed run. This allows the decompressor to read the stream backwards, as from a magnetic tape, and still correctly decompress the stream. All of the independent claims [ibm.com] make some mention of "front and back" or "front and end."

    Moreover, this patent expired in 1998 for nonpayment of maintenance fees.
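    To picture why a length on both sides of a run lets a reader go through the stream backwards, here is a deliberately simplified sketch. It is only an illustration of the general idea described above, not a reading of the patent's claims, and the three-byte record layout (count, value, count) and the function names are assumptions made for this example.

```python
def encode(data: bytes) -> bytes:
    """Emit each run as three bytes: count, value, count (count is 1..255)."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i], run])
        i += run
    return bytes(out)


def decode(stream: bytes) -> bytes:
    """Expand count/value/count records, checking that both counts agree."""
    out = bytearray()
    for pos in range(0, len(stream), 3):
        count, value, count_again = stream[pos:pos + 3]
        if count != count_again:
            raise ValueError("mismatched run lengths")
        out += bytes([value]) * count
    return bytes(out)
```

    Each record reads the same in either direction, so running `decode` over the reversed stream yields the original data reversed, which is what a reader moving backwards over a tape would see. Plain RLE with a single leading count does not have that property.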
  • >Now, Linux has moved to the corporate world (even though I do love the new "business Tux" /. icon) and the roots and history are really gone.

    Well, I can only partly agree, myself. Yes, Linux is becoming more business-friendly. But, quite frankly, the one major thing that you're probably thinking of is the desktop interface known as KDE.

    Quite frankly, KDE was designed by a bunch of coders that took what Apple and Microsoft came up with, and bashed them over the head with it. :^) Perhaps, like me, you'd like to see more scriptability in newer developments such as KDE and GNOME. Well, the only thing close on KDE is kTk (which is kinda obsolete in KDE KRASH) and personally, as far as GNOME is concerned, I hate parens (not that you can tell by my comment here. :^)

    If nothing else, I'd like to see something spring up that's:

    1.) integrated with KDE
    2.) mostly scripted
    3.) extremely easy to code for

    Perhaps this could be an updated version of TkDesk, or something completely new. I was always kinda disappointed that TkDesk got the shaft after KDE/GNOME came about. :^( The darn thing is *still* powerful and versatile. I've seen KDE & GNOME apps that go on for several hundred lines that could probably be done in 10 lines of code in a TkDesk config file. :^) Yeah, it's not got a themeable toolkit, it's not multithreaded, it's got a clunky interface, blah blah blah, but it more than makes up for being long in the tooth with its versatility.

    Anyone care to hack together some stuff for it? I once had KDE/GNOME menus going on it (used KDE2WM.pl to do it, though...) and I'm sure that a GUI config could be hacked together easily enough...TkDesk with wizards, now *there's* a scary thought! :^)