ICFP 2001 Task 110
David Mentré writes: "The ICFP 2001 programming contest TASK is now available. The objective is to write an optimizer (aka compressor) for an HTML-like language. It must work for arbitrarily big inputs and in a limited wall-clock time. Can you guess what will be the winning language? ;)" We already announced the contest, but now that the task is available, it might be fun to look at. So what will the contestants come up with? Anyone think perl might be the language of choice? :)
Caution!!! (Score:1)
For those who are actually participating the competition, SAVE THE DOCUMENT!!! Don't give yourself an excuse for LOSING when the page is SLASHDOTTED and you can't lookup the task again!!!!!
My favorite language is (Score:1)
Now what was the article about ?
Its almost certain (Score:1)
Re:What about XML/XSL? (Score:1)
Re:no ++ (Score:1)
Of course. Go ahead, and ridiculize yourself by submitting an non-functional entry.
Those guy are functional from the bottom up, and their preaching about it. The contest is _designed_ to make non-functional languages ridiculized.
I bet that the SML/NG input to the software will be tailored to make functional software win (ie: you will have incredibely complex input, that will only be suitable for languages with built-in matching, so functional languages will be standing on the shoulders og giants).
Using C or C++ in this constest is akin using perl to write a linux kernel module.
Note that it is not a bad contest, on the contrary. It shows the areas where functional languages shine. Of course, I'll be rotfmalo the day where someone will submit a winning C entry (for instance, by having a library that would implement most of the goodies functional languages are good at)
Cheers,
--fred
Re:this story is gay (Score:1)
Re:no ++ (Score:1)
I am really short of time, but I am pretty sure that there are case where an optimisation will depend on very distant tags. This is the place where you C program will have a very hard time (in particular because the time limit imposes non-brute force heuristics).
That beeing said, 3 days is a lot of time, and the contest *looks* easier than the preceeding ones.
Cheers,
--fred
Re:And the winner is... (Score:1)
Re:no ++ (Score:2)
Re:Oh, OK. Sometimes I'm just really stupid. (Score:1)
Re:Oh, OK. Sometimes I'm just really stupid. (Score:1)
Hint: There are also EM/S nesting, size nesting, and underline nesting optimizations.
Local optimizations can be done on the fly as you initially indicated. The global optimization problems for nesting are algorithmically "hard." My guess is it is NP. The global optimization problem can be stated as a path minimization problem in a fairly large state space (naive size is 2x2x2x2x2x4x10x8) with ordered waypoints. The state space is actually a bit smaller than that.
Oh, and the state transition distances are different. A <B>,</B> pair costs 7 bytes, but a <PL>,</PL> pair costs 9. I bet that linear programming will choke on this problem too. There are a bunch of local minima. Time to dust off my Numerical Recipes book and read up on simulated annealing.
Of course, what do I know?
Re:Doesn't sound too hard... (Score:1)
<B></B>This is not Bold<B></B>
or
<B><B>This was already Bold</B></B>
The first pass optimization will worst case include all of the original tags and should therefore pass the stupidity test. Of course, after this initial cleanup, then the real work has to start.
Purely functional C (Score:2)
Hmm... I have better things to do than figuring out an algorithm for this, really I do...
What about XML/XSL? (Score:2)
--
Barry de la Rosa,
public[at]bpdlr.org
Re:Who will win? Look at past years: (Score:1)
--sam
--sam
Mercury faster than C? (Score:1)
Doug forgot to enable the Mercury compiler's optimizations! He's fixed that now, so we now do a lot better than before.
Nonetheless, I think the Mercury folks took a prize last year though, didn't they?
We came fourth, which was good enough to rate a mention, but not good enough for a prize.
I think ANSI Pascal's up to the challenge. (Score:1)
Re:Guess What ? (Score:1)
i suppose if you are a real glutton you could use assembler *shakes head*
actually a language with excellent string manipulation is also Business Basic (Basis Business Basic [basis.com] or ProvideX Basic) [pvx.com]
pick your favorite language
\\||//
--ooo00ooo--
Re:If you need me... (Score:1)
[Dilbert] Not so good, you just authored IIS.
Remember: it's a "Microsoft virus", not an "email virus",
Re:excuse me... (Score:1)
You just very rarely see perl being used functionally because despite the fact it has functional features, its functional features are really quite clumsy to use, and anyone prone to functional programming would probably prefer another language anyway.. in other words, you can be as functional as you like in perl. It just isn't worth the bother.
Larry Wall likes LISP, though.
Scheme/Lisp (Score:3)
for example: <WEIGHT unit="pound">
<NET certified="certified"> 67 </NET>
<GROSS> 95 </GROSS>
</WEIGHT>
becomes:
(WEIGHT (@ (unit "pound"))
(NET (@ (certified)) 67)
(GROSS 95)
)
How about.... (Score:4)
Anyone? Ok, I'm actually serious, don't mod me up +1 Funny.
Lex &Yacc (Score:1)
I dare you... (Score:1)
Re:How about.... (Score:2)
Why go for third party add-ons? You do know there's support for regular expressions [sun.com] in Java these days?
Let's see the code people (Score:1)
Re:innovation (Score:1)
Re:Mercury (Score:1)
Check out comment 28 [slashdot.org], it's his point.
---
Re:Just to torture myself (Score:1)
gzip *.html
Re:Who will win? Look at past years: (Score:1)
Re:What about XML/XSL? (Score:3)
No.
XML [jasmine.org.uk] is not a programming language, it's a markup language. You can't use it to compute anything. XSL-FO [jasmine.org.uk] is not a programming language, it's a page description language. You can't use it to compute anything. XSL-T [jasmine.org.uk] is a programming language, and a very elegant one, but it isn't (IMHO) well suited to this task, and certainly would not produce a compact output.
Re:And the winner is... (Score:2)
While that's partly true, it won't help you. The task is designed for functional languages, and it shows. Functional languages have algebraic data types and pattern matching built in. In C you'd have to implement this yourself, plus all the code required for memory management and write the optimiser, all in three days.
That's why I predict O'Caml will win. Of all the languages which have native support for precisely this kind of task, it's the one that also has the support for beating the time limit.
Re:And the winner is... (Score:2)
Java does not have pattern matching, unless you think method dispatch is pattern matching. :-)
Having said that, Java would be a good general-purpose language to solve this problem in. I don't like your chances given the three day limit, though.
Re:Any language will do (Score:2)
No, flexibility will be the deciding factor, closely followed by speed of development. Speed of execution won't make a huge difference, because the judges are going to throw in a test case so large and complex that no program can find the "optimal" sequence in the given wall time on the test machine. So the winner will be the program which can make the best use of the time available.
The winner will use a set of optimisation techniques (from simple transformations to expanding the string out into decorated characters and re-encoding), and will be able to combine these techniques seamlessly. And this will be coded in three days.
This is not a test for a systems programming language like C. This is a test for a language in which you can write complex decision logic quickly.
And the winner is... (Score:5)
I predict that the winner will probably be O'Caml, and for one reason only: it will beat the time limit. Moreover, this will have little to do with the speed of the O'Caml implementation.
Your program has to complete within a given wall time (not CPU time). This is supplied as a command-line argument and environment variable. The winner will be the tool which can stop computing before the time is up and produce the output. The winning program will be one that iteratively improves the input string, but is monitored by another thread which stops the optimiser thread when the time is almost up, then grabs the best solution so far and performs the output.
O'Caml just happens to have good built-in primitives to write this kind of "interrupt the thread when the time is up and let me know what you did" computation.
SNOBOL... (Score:1)
Re:Who will win? Look at past years: (Score:4)
Re:Hesitating ? (Score:1)
- Steeltoe
Re:And the winner is... (Score:1)
Java? Memory management - check; pattern matching - check; threads - check. All there and done for you, just have to write the actual optimiser. I might even give it a go and see how it handles it...
Re:How about.... (Score:1)
I'd give you +2 Funny, were it possible.
Mind you, that's not because of the Java suggestion; rather it's the mod point filer request that cracks me up.
Re:How about.... (Score:1)
http://alphaworks.ibm.com/tech/regex4j
Still, this isn't exactly Javas' core competency.
Solution in /bin/sh (Score:1)
echo Content-Type: text/html
echo
echo <html><title>Server overloaded</title>
echo <body bgcolor="#0000FF">
echo <font face="Courier, Courier New, Fixed">
echo IIS.EXE caused a general protection fault in
echo module BLUESCREEN.DLL at 0123:45F00F00
# Yes, I know it's not exactly clean HTML
# It'll work in almost all browsers though, and
# they DID say compressed...
Re:This is easy... (Score:2)
print "404 Not FoundThe requested URL was not found on this server.";
Any language will do (Score:2)
All you have to do, is parse the document, and then spew it back out in a smaller representation. There are lots of booleans in there, and those are as far as i can see what you really need to worry about.
Of course, I'm not gonna bother to do this one...
Anyhow, speed will be the deciding factor on this one. I wouldn't be surprised if the winner is written in plain ole C.
Re:SML/NG (Score:1)
Re:Who will win? Look at past years: (Score:1)
Re:actually... (Score:1)
I would expect that a good C implementation would not be faster, either, because using a higher-level, safe language would allow you to make more optimizations to your program (since you can write it faster).
Go ahead and prove me wrong, though!
Re:And the winner is... (Score:1)
I'd like to see a java entry in this, but having written compilers in java before, I know it is very painful. (Not as painful as C, mind you, but painful.) The ML family and related languages have a huge advantage here.
Re:Guess What ? (Score:1)
Ha, good luck with those. This is not just simple string manipulation...
Well, Microsoft Research. (Score:1)
Well, that would be Microsoft Research, actually, and last I checked Peyton-Jones was at the Cambridge one.
Mercury Speed (Score:1)
Here's benchmarks:
http://www.bagley.org/~doug/shootout/craps.shtm
Lots of tests aren't implemented for Mercury, which explains its low score.. but it isn't doing to well in the ones that are implemented, anyway.
Nonetheless, I think the Mercury folks took a prize last year though, didn't they? (Don't get me wrong, I'm all for unorthodox new languages.)
Re:LISP (Score:2)
I don't think lisp is the most popular functional language any more.. I would say it's probably SML or O'Caml. Lisp hasn't won in any of the previous years, I don't think.
I agree wholeheartedly, though, that Perl won't be powerful enough for this kind of AST manipulation. I like that the task will probably make slashdot kids think that perl is extremely appropriate -- that will put them in their place all right.
Brainf*ck. (Score:2)
I doubt it... (Score:2)
XSLT is designed not to have side effects and is instead more of a functional language meaning that you can't assign values to variables dynamically. Considering that this task may require quite a lot of storing of state before deciding which optimization to make I think it is rather unlikely that XSLT is a good language to use to solve this particular problem.
Then again I didn't look too hard at the task but it may be that the judges have designed it in such a way that it is amenable to functional programming considering that it is the being run by the International Conference on Functional Programming [mit.edu]. Even then most functional languages do allow you to store state, even Lisp so an XSLT solution would have to be rather clever.
--
Re:Doesn't sound too hard... (Score:3)
Consider: <B><I>Hello</I></B>World&l t;B><I>Foo</I></B>
Should be: <B><I>Hello<PL>World</PL>F oo</I></B>
But how do you know if it's more optimal to go with a PL tag vs closing the B and I? You would need the array to go further down and see that you will need the B and I again so you don't bother to close it.
This is the problem I'm running in to now
--BEGIN SIG BLOCK--
I'd rather be trolling for goatse.cx [slashdot.org].
Re:no ++ (Score:1)
If you need me... (Score:5)
Re:And the winner is... (Score:2)
Re:And the winner is... (Score:2)
Re:Where's the torture? Scheme is fun! (Score:2)
Re:Where's the torture? Scheme is fun! (Score:2)
Re:Any language will do (Score:3)
Anyhow, speed will be the deciding factor on this one.
I'll be pretty surprised if speed ends up being the deciding factor, at least for the first few places. This is a pretty complex problems with many ways you can optimize certain aspects, like overlap inversion, color nestsing, redundency elimination, etc. But to really minimize the size of the output file (the most important factor after qualification and validity) contestants have to come up with a great balance of optimizing all of those factors together. Because it of time constraints, contestants cannot effectively run multiple attempts across all optimizations to find the single smallest output (this would run in exponential time, so small inputs might work, but large ones would run for days). The winners will likely have the best algorithms for optimizing the output in some kind of polynomial time, regardless of the language used.
Re:And the winner is... (Score:2)
Instead it refers to a convenient mechanism for building switch-like statements over algebraic datatypes. Since the language in the task is described by a grammar, it can be represented as an algebraic datatype in a functional language.
In Java-programmer-friendly lingo, this would mean: Make one constructor (differently named) for each element in the grammar. That way you can build a tree from the text you parse. Then you can make recursive functions, either modifying or returning a new tree, where you will typically use switch-statements on the data, differentiating each elemeent by the constructor used to create it.
Most functional languages are quite convenient when working with data represented in a tree-form, because algebraic datatypes are so extremely convenient. In Java, one would have to use classes and subclasses instead of one simple definition of an algebraic datatype, and lot's of casting and if-statements instead of pattern matching. It's not that it can't be done, it is just a bit more to write (which happens to be somewhat important when having limited time available, but certainly not all-important, if you type fast).
72 hours, No Sleep Till Brooklyn! (Score:1)
Re:Mercury (Perl has first-class functions) (Score:1)
No, Perl has too many capabilities to be considered a purely functional language, however it does have first-class functions, so any program that can be created in more than one functional language (say Lisp and ML) can also be created in Perl:
http://perl.plover.com/lambda/ [plover.com]
Re:And the winner is... (Score:3)
Re:no ++ (Score:5)
ie, it doesn't have to be a functional programming language.
Re:If you need me... (Score:1)
...the awesome power... (Score:2)
Mercury (Score:3)
As far as time limits are concerned, may I suggest Mercury [mu.oz.au]. It's a functional/logical hybrid that's somewhat unorthodox for VHLLs in that it's a compile-only rather than interpreted-only or interpreted/compiled. Its designers claim that it can get similar performances to programs of similar functionality written in C, alhtough I haven't seen any benching, and I'll believe it when I see it. :)
Anyone think perl might be the language of choice? :)
No, because Perl isn't a functional language. *sigh*.
And if M$ doesn't win... (Score:1)
Arbitrarily Large Inputs (Score:2)
Actually, they say they will test with inputs as large as ``a few megabytes''. Note that it is always legal to merely echo an input if you can't figure out how to optimize it...
Regardless of how we do in the competition, we're learning a lot about our programming language [nickle.org] and its implementation, anyhow. The task isn't a perfect fit for our language, which is good, since it has tickled some of the dark corners already.
Two days to go!
Re:Mercury (Perl has first-class functions) (Score:2)
Lisp is not just a functional language, as any functional purist will tell you with an air of disdain. It supports multiple paradigms.
I don't know Perl, so the following is a real question, not merely rhetorical: Can you create new functions on the fly in Perl as you can in Lisp? That is, can you have a function1 that returns a new function2 as its value, where function2's definition is relative to the parameters that were passed into function1 at its time of creation? That is, can you create closures in Perl? In Lisp, this is done as follows:
Just passigning functions around does not make a language functional. Heck, C can do this function pointer. Truly functional languages support the creation of closures (among other things). Can Perl do this?
Re:actually... (Score:2)
It's a great choice. You'll kill the guys working in x86 assembler.
Re:And the winner is... (Score:1)
Hesitating ? (Score:1)
I'll personally offer 99 bottles of Chimay Belgian beer to the winner of he does it in BrainF*** [ls-la.net]!
--
Guess What ? (Score:2)
--
SML/NG (Score:3)
Where's the torture? Scheme is fun! (Score:2)
MSFT took 2nd in 1999 with Haskell entry (Score:2)
Large projects necessitate bad code? (Score:2)
Hint: 20 levels of bracing means you should be breaking that function down into smaller, reusable functions. You don't tend to do that in C++ because it's such a drag -- cut, paste, declare, etc. In Scheme new functions are cheap and easy.
That aside, who says you have to use it for a large project before it's fun? I'm having fun with a couple of 3KLOC projects.
Who will win? Look at past years: (Score:5)
I would like (Score:1)
Re:SML/NG (Score:2)
maybe you know something i don't.
---
My money's on Apple's Integer Basic (Score:2)
Irritating Superfluous Parentheses (Score:1)
JOVIAL!!! (Score:1)
JOVIAL will kick some serious buttocks!
-D
innovation (Score:3)
Art At Home [artathome.org]
LISP (Score:2)
The dark horse (Score:1)
It sure would be nice to hear some positive news about SNOBOLing
I know, bad joke
Re:this story is gay (Score:5)
Re:Mercury (Perl has first-class functions) (Score:2)
#!/usr/bin/perl -w
use strict;
sub make_adder {
my $addend = shift;
sub {my $x = shift; $x + $addend}}
my $adder = make_adder(3);
print &$adder(2), "\n";
IMHO not as nice as Lisp or ML, but there nonetheless.
nevermind all that... (Score:1)
10 print "please have a cup of coffee"
20 compress data
30 goto 10
done!
Just to torture myself (Score:3)
Re:MSFT took 2nd in 1999 with Haskell entry (Score:2)
Python Team (Score:4)
Everyone is welcome.
This is easy... (Score:3)
<HTML>
<HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD>
<BODY>
<H1>Not Found</H1>
The requested URL was not found on this server.
</BODY>
</HTML>";
Lightning entry (Score:2)
Re:Just to torture myself (Score:3)
By the way, what exactly do we win if we have the best entry?
Doesn't sound too hard... (Score:2)
A somewhat simple but slow way of doing it would be to create an array in which each non-tag character has a set of flags for bold, italic, whatever and a set of fields for color, size, etc. Then simply go through the newly created array and put in tags when ever the flags of fields change.
It should be possible to optimize that... but I imagine significantly more efficient ways to do it will be found.
I think this method will create highly optimized output... I can't think of anything to screw it up off the top of my head... but maybe I'm just missing something really obvious.
Perhaps if some of the tags turn other tags off? I don't remember. That could screw it up.
Comments?