
Google Programming Contest

AccordionGuy writes: "Google has just announced its first annual programming contest! The objective is to write a program that does something "interesting" with the roughly 900,000 Web pages' worth of data that Google provides. In addition to writing the program, contestants also have to convince the judges why their program is interesting (or useful) and why it will scale (that is, handle a constantly increasing load of data that grows as the Web grows). The prize is US$10,000 in cash, a V.I.P. tour of the Google facility in Mountain View, California, and possibly a chance to run their program on Google's complete billion-Web-page store."
  • by suso ( 153703 ) on Wednesday February 06, 2002 @06:26PM (#2964129) Journal
    I think I'll write a program that will delete pages as it finds them. This should scale pretty nicely and make the web faster in the process.
    • by letxa2000 ( 215841 ) on Wednesday February 06, 2002 @07:27PM (#2964546)
      This is a way for Google to get free ideas and, better than that, free expert-level code development for them to make money off.

      I wouldn't go for $10k. Perhaps $100k, or perhaps $20k plus some percentage of future revenue attributable to my invention.

      Got to hand it to them, though, it's an innovative way to receive hundreds of ideas and get a working prototype. Only one person wins but they probably retain the rights to develop their own code that accomplishes the ideas submitted by everyone else.

      Basically, they want a cool idea for something innovative but their brainstorming sessions haven't come up with anything new...

      • While I am all for Free Software, I have to agree with the poster of this comment, at least in principle. $10k is a small price to pay for tons of ideas. While I'm sure the majority of the ideas will not be worth the time spent reviewing them, there will always be that precious gem buried somewhere.

        For once, I just might agree with a binary only submission. That way if Google is truly interested they can license the code from the developer or have some sort of other agreement / arrangement.

        It isn't like Google is offering up their source to the rest of the world, so I don't see why it is unreasonable to only offer up a binary to them. At the risk of sounding like a "me too" post - I still think that this would be something fun to be involved in if I had the creativity or the passion to pursue something of this sort.
        • would binary only even matter? it's the IDEA they need... they have tons of coders easily available to implement whatever ideas they can glean from this. it's not always about source control.
        • For once, I just might agree with a binary only submission.

          Ahh, but if you read the submission requirements, you have to submit your source, a Makefile, and use only GPL or other open source libraries, so they've covered their butt there.

          I hope anybody who does decide to participate in this contest realizes the implications of it. $10K is nothing for Google to pay to get ideas, source code, etc. Also note that, per the submission requirements, any entry made to Google becomes their sole property. Christ, I could afford $10K, a tour of my house, and letting somebody run their prize-winning code on the data on my computers if somebody were going to give me this kind of intellectual property. I really think that it's a pretty raw deal for the developer.

          • by WNight ( 23683 ) on Wednesday February 06, 2002 @08:26PM (#2964797) Homepage
            The problem is that ideas aren't worth a lot without a way to use them. I've had a lot of neat thoughts about mapping connectivity and so on, but without something like Google to run it on I'd have to spider the whole web myself on my cable.

            They might get a good idea, but if you don't win the contest they don't really have much of a legal leg to stand on to take your idea, so you're pretty safe unless you're the winner, in which case you get $10k for hacking together a script that you never could have afforded to run anyway. (It's only the concept they want, not the polished results of a two-month dev process.)

            It honestly sounds like a good deal to me. I hack for a night or two on a project that I find interesting. If I lose, no big deal. If I win I get 10k USD (3 months wages for me, I get paid in Canadian $s) and I'd be famous in exactly the circles who are looking to hire a coder with good ideas...

            People go on about the value of ideas all the time, but really, without proper backing ideas are a dime a dozen. I've said many times "Hey, how about a ..." and seen it advertised a few years later. That doesn't mean I lost out on it, because I didn't have the cash to develop it, let alone market it.

            This is why patents on wide ideas are so damaging. Any idiot can have a good idea every now and then, but it takes more work (and funding unfortunately) to make them fly. If you let someone with an undeveloped idea block off a whole field it does a great disservice to the people with the ability to follow through, who likely had the idea independently.
        • by Ragin'Cajun ( 135704 ) on Thursday February 07, 2002 @01:34AM (#2965818) Homepage

          For once, I just might agree with a binary only submission. That way if Google is truly interested they can license the code from the developer or have some sort of other agreement / arrangement.

          It isn't like Google is offering up their source to the rest of the world, so I don't see why it is unreasonable to only offer up a binary to them.

          Well, they *have* been running the best search engine on the web FOR FREE for the past 3 years. They don't clutter their main page with flashing X10 ads, or the irritating news+sports+weather+financialnews+email combo that everybody seems to think people want. This might not be a bad way to give something back to the company that's saved us so much time and effort finding information.

          And to the guys out there who wouldn't bother with this contest for less than $100K: if your idea is so good, go develop it yourself! Get a lawyer, and work out a deal with Google that suits you better.

      • by MouseR ( 3264 ) on Wednesday February 06, 2002 @09:42PM (#2965120) Homepage
        I wouldn't go for $10k. Perhaps $100k, or perhaps $20k plus some percentage of future revenue attributable to my invention.

        Pardon me for asking but... what are you doing developing, maintaining or otherwise promoting a system for not even free beer?

        If a chance to provide useful code for a worthy cause (Google still being the best search engine out there, and one that still doesn't plaster your screen with pop-up ads), spend a couple of weeks on it, and get paid $10K doesn't sound attractive, what would?
      • This is also a good way to get a job at Google. They pay a lot of money.
    • by Greyfox ( 87712 ) on Wednesday February 06, 2002 @08:14PM (#2964746) Homepage Journal
      That detects MS IIS servers with the Code Red backdoor installed and takes over the server, forcing it to cache Google content and directing Google accesses from the same subnet to that machine first?
  • by Cruciform ( 42896 ) on Wednesday February 06, 2002 @06:28PM (#2964146) Homepage
    How about adding the option to have google understand what I *mean* to search for, not what I tell it to search for.

    Oh, and the ability to find one non-fake Britney porn pic.
  • by I am the blob ( 239590 ) on Wednesday February 06, 2002 @06:28PM (#2964151) Homepage
    Much like the recent discovery of the average color of the universe, this would be a pointless, but fun, use of the data. Of course, I'm not sure exactly what to average. Do you take into account browser real-estate a particular color occupies? Do you simply average each color= and stylesheet instance?

    Ideas?
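    A minimal sketch of the naive version (just averaging every hex color literal found in the markup, ignoring how much screen real estate each color actually covers), assuming the pages are sitting around as local HTML files, might look like this:

      import glob
      import re

      # Naive sketch: average every hex color literal in a pile of saved pages.
      # Ignores named colors, external stylesheets, and how much area a color covers.
      HEX_COLOR = re.compile(r'#([0-9a-fA-F]{6})\b')

      def average_color(paths):
          totals = [0, 0, 0]
          count = 0
          for path in paths:
              with open(path, errors="ignore") as f:
                  for match in HEX_COLOR.finditer(f.read()):
                      value = match.group(1)
                      for i in range(3):
                          totals[i] += int(value[2 * i:2 * i + 2], 16)
                      count += 1
          return None if count == 0 else tuple(t // count for t in totals)

      # average_color(glob.glob("pages/*.html")) -> one (r, g, b) tuple for the lot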
  • by The Bungi ( 221687 ) <thebungi@gmail.com> on Wednesday February 06, 2002 @06:29PM (#2964153) Homepage
    10K is nice along with the recognition and all, but... I'm sure that's a lot cheaper than paying a few Google staff coders to come up with the same thing in a few months.

    Jus' being paranoid.
    • From the agreement:

      With regard to an entry you submit as part of the Contest, you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry.


      Hey Google! Why not make the agreement state that all entries go under the GPL?
      • by benwb ( 96829 ) on Wednesday February 06, 2002 @06:41PM (#2964258)
        Notice that they don't say exclusive license. You should be able to release it as GPL yourself.
        • Does the GPL allow the creator to grant a license to certain commercial vendors? Otherwise, you wouldn't be able to GPL it. However, you can certainly release the source under some open license. What Google is doing is perfectly reasonable--if you create something based on their code, they are asking for the right to use it. It's similar to many licenses already out there.

          One thing I do wish was part of the rules was that if they used your code/algorithms, etc. that they notify you. After all, you may think your idea is great, but it would be a big endorsement if Google used it, even if you didn't win. If anyone in charge of this contest reads this, I'd urge doing that anyway--it would be a good cheap way to reward more talented programmers.
    • by plalonde2 ( 527372 ) on Wednesday February 06, 2002 @07:04PM (#2964415)
      More to the point though is that it gives Google a great pool of potential employees. That should be of greater benefit to Google than the ideas.

      Always think of the potential of hiring people with good ideas, rather than buying the ideas outright.

      Geese and golden eggs, and all that.

  • This is brilliant (Score:2, Insightful)

    by jkujawa ( 56195 )
    Evil, but brilliant.

    Get hundreds of people to crank out code for you, pay a paltry sum to one of them, keep all the code. Pay $10K for millions of dollars in potential technology.

    That's about the slickest thing I've ever seen. You have to admire them for their evil. Microsoft could learn a thing or ten from them.
    • by dotderf ( 548723 )
      It's not evil, it's just business. Other companies have been doing it for years. Back in the day, car companies used to sponsor "car design" contests for little kids. The winner would get $50 and his car would be whisked away to the labs. Why pay a team of designers and engineers to do what a trained^H^H^H^H^H^H^H normal person would do for cheap? Maybe we'll get a spiffy new feature on Google! Hurrah!
    • by JordanH ( 75307 )
      • You have to admire them for their evil. Microsoft could learn a thing or ten from them.

      What's evil about it? Smart maybe, but evil?

      Anybody who would enter such a contest is primarily motivated by the challenge, I would think. Getting the $10K gives you bragging rights is all.

      Sure, Google gets some value, but a lot of highly motivated programmers get a challenging problem.

      If all good programmers were primarily motivated by money, there'd be no Linux, BSD, Apache, Emacs, Vim...

      I reserve evil for things that actually hurt someone. This seems like a win-win to me.

    • by saint10 ( 248611 ) on Wednesday February 06, 2002 @06:45PM (#2964292)
      Better yet, post a story to Slashdot about a contest with a prize of $10k, read all the responses modded at 4 and above, spend a weekend coding a few of 'em up, and cash in!

      Now that's evil!
    • by slam smith ( 61863 ) on Wednesday February 06, 2002 @06:50PM (#2964330) Homepage
      The key word here is potential. I think that you would almost waste more money in evaluating a lot of the trash that comes in. The most valuable thing they probably will get from it are the ideas that people come up with. Notice how they made it as open ended as they could.
    • Riiight... (Score:5, Insightful)

      by jonr ( 1130 ) on Wednesday February 06, 2002 @06:55PM (#2964358) Homepage Journal
      When did you last donate to Google? How many times have you used Google on your job, saving yourself and your company money? Where is the friggin' "Do it for the love of coding" thinking now? I would be happy to enter (I just need the right idea ;)) and if Google gets better because of my code, so be it!
      J.
    • Re:This is brilliant (Score:5, Informative)

      by epsalon ( 518482 ) <slash@alon.wox.org> on Wednesday February 06, 2002 @07:06PM (#2964428) Homepage Journal
      If you read the rules [google.com], you will see that you don't even have to assign copyrights to Google. You only have to give them a license. This means you can GPL your code or even BSD it. Sounds fair to me.
    • You just described open source exactly. Except the part about paying ANYTHING at all. Pretty slick!
  • Usefulness? (Score:2, Interesting)

    I'm honestly curious as to what kind of useful programs could be run on that collection of pages and still be interesting. Statistical analysis? Boring! Or maybe market analysis? Again, BORING! Some of the more trivial interesting things, like how often phrase or word x appears on the Internet, couldn't really be termed useful... Hopefully, somebody will prove me wrong. Good luck to all you developers...
    • by shayne321 ( 106803 ) on Wednesday February 06, 2002 @07:13PM (#2964468) Homepage Journal
      Here's a free idea to anyone who has the time/initiative to code it (i.e. Not Me): a program that scans a page and rates it with an annoyance rating (x out of 100?) based on annoying things you'll find on the page if you open it: webbugs, cookies sent back to doubleclick, pop-unders, banner ads, java applets, BLINK tags, poorly formed HTML/CSS, broken images, sql/asp/php errors, etc. The higher the number the more annoying the page, and therefore the more likely the user is to click a different search result. Google could also tie it in to their ranking system to rank annoying pages lower in the results. Seems to me like it'd make the web a better place.

      Shayne
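      One rough way to put a number on that kind of annoyance, sketched here with made-up weights and assuming BeautifulSoup is available (pop-unders, forced bookmarking, and window resizing would really need a JavaScript engine to catch):

        from bs4 import BeautifulSoup

        # Hypothetical weights; a real scorer would need tuning and a JS engine.
        WEIGHTS = {"blink": 10, "popup": 15, "tracker": 10, "applet": 5, "broken_img": 3}

        def annoyance_score(html):
            soup = BeautifulSoup(html, "html.parser")
            score = 0
            score += WEIGHTS["blink"] * len(soup.find_all("blink"))
            score += WEIGHTS["applet"] * len(soup.find_all("applet"))
            for script in soup.find_all("script"):
                # crude pop-up detection: window.open() in inline scripts
                if script.string and "window.open" in script.string:
                    score += WEIGHTS["popup"]
            for img in soup.find_all("img"):
                src = img.get("src", "")
                if "doubleclick" in src:
                    score += WEIGHTS["tracker"]
                if not src:
                    score += WEIGHTS["broken_img"]
            return min(score, 100)  # the "x out of 100" rating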

      • by YoJ ( 20860 ) on Wednesday February 06, 2002 @08:06PM (#2964709) Journal
        I like this idea. But I would limit the definition of "annoyance" to something easily quantifiable. Broken links might be the easiest, but even for that you have the problem of internet addresses being sporadically available, or just slow some days.


        Another idea is to just count the number of HTML errors as the annoyance factor. I'm sure there are many tools out there that can do this rather quickly. If this were actually implemented by Google, so sites with bad HTML were ranked below all other sites, imagine how much cleaner the web would get!

        • Perhaps the W3C's HTML Validator [w3.org] or something similar? Rate the page based on conformance to the HTML specs (say, number of errors divided by length of HTML), in the hopes that this has some correlation to how generally useful the page is (i.e., if they can't be bothered to follow the technical rules, they probably don't have enough of a clue to put out content of genuine use to their users instead of just brochureware or scams or the like)? This wouldn't be perfect, of course, and utility is very much a subjective measure...
        • by shayne321 ( 106803 ) on Wednesday February 06, 2002 @08:54PM (#2964902) Homepage Journal

          Another idea is to just count the number of HTML errors as the annoyance factor.

          That's not really what I had in mind... HTML errors are nowhere NEAR as annoying as pr0n sites that pop open ads all over the place, resize your browser, bookmark themselves, etc, etc. That's what I mean by annoyance, the kind of site that makes Joe Sixpack (as well as me) get upset when he gets stuck in a loop that for every window he closes two pop open. I'm more worried about discouraging sites from using bad behavior than I am encouraging them to use proper html. Of course, malformed html should ADD to the annoyance factor, but not be the only thing counted. That's my opinion anyway.

          Shayne

  • What a coincidence! (Score:2, Interesting)

    by ctkrohn ( 462529 )
    I was just talking to someone on IRC, and we were playing a game with Google. You had to find two correctly spelled words which would return a page or less of results. He mentioned that a distributed client which searches for the longest string of words returning less than a page would be a cool idea.

    Just a thought...
  • by Kjella ( 173770 ) on Wednesday February 06, 2002 @06:30PM (#2964168) Homepage
    10000$/x hours of work we could get done for us...

    Make sure we get a slashdot posting so a bunch of geeks with programming skills will enter.

    The only thing I'd want is for google to stay just the way it is though, don't bloat. Great service, maybe I'm just pessimistic but sites rarely do everything well.

    Kjella
  • Sounds to me like Google is getting lots of programs for only $10k and a tour.
  • Some Inspiration (Score:5, Insightful)

    by Eloquence ( 144160 ) on Wednesday February 06, 2002 @06:31PM (#2964174)
    A lot of implicit rating data can be gathered from the links pointing to a page. Google is already doing this when sorting the search results (frequently linked-to pages rank higher). It would be interesting to see how this could be used to detect very popular new sites. I sent this mail to Google a while ago:

    Hi,

    it occurred to me, since you are evaluating the number of links pointing to a page anyway, that it would be a very nice thing to have a sort of "Top 40 Links of the Day" page, regularly updated to include only new and unique stuff. You could use an algorithm similar to the one used by

    http://blogdex.media.mit.edu/ [mit.edu]

    or

    http://www.daypop.com/ [daypop.com]

    Both of these sites have become immensely popular through this feature (in the case of Daypop, I find http://www.daypop.com/top.htm very valuable), and I think it would also be a great addition to Google. I don't think inappropriate content would be much of a problem since it would hardly show up high on the list, and besides, a top 40 list can be looked through by a human.

    What do you think?

    Of course this could be spammed, but as I said, a human could filter the results every day; besides, it would be hard to create a very large number of unique links from different servers pointing to a page. I'm sure Google is already doing some of this to prevent spamming their search-order algorithm anyway.
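    A toy version of that, assuming each crawled page comes with a crawl date and its outbound links (the exact shape of the repository data is an assumption here), just counts how many distinct recent hosts link to each URL:

      from collections import defaultdict
      from datetime import date, timedelta

      # pages: iterable of (crawl_date, source_url, [linked_urls]) -- assumed shape
      def top_links(pages, days=1, n=40):
          cutoff = date.today() - timedelta(days=days)
          inbound = defaultdict(set)
          for crawl_date, source, links in pages:
              if crawl_date < cutoff:
                  continue
              host = source.split("/")[2] if "//" in source else source
              for url in links:
                  inbound[url].add(host)  # distinct linking hosts, so naive spamming is harder
          ranked = sorted(inbound.items(), key=lambda kv: len(kv[1]), reverse=True)
          return [(url, len(hosts)) for url, hosts in ranked[:n]]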

    • Re:Some Inspiration (Score:2, Informative)

      by jimbo3123 ( 320148 )
      it occurred to me, since you are evaluating the number of links pointing to a page anyway, that it would be a very nice thing to have a sort of "Top 40 Links of the Day" page, regularly updated to include only new and unique stuff. You could use an algorithm similar to the one used by


      It's called Google Zeitgeist.

      It's at: Google Zeitgeist [google.com]
    • Re:Some Inspiration (Score:3, Interesting)

      by costas ( 38724 )
      I hate to link a beta-level site from /., but that's exactly what I am trying out [memigo.com]...
  • Cool, but..... (Score:2, Insightful)

    This sounds really great, doesn't it? A 10,000 USD cash prize, visiting their facilities (who wouldn't be curious to see the world's biggest Beowulf cluster), and more.

    Thing is, though that is a lot of money, what happens if you make them, say, 20,000 USD with a great new compression/analysis algorithm?

    What then? You have no claim to a part of their profits. I guess that's just a part of competing to give your ideas to a company.

    -mike
    • Especially considering that Google gets to 'own' all the entries, and not just the winning one.

      Hey... it worked for Microsoft (Their 'Compression' contest)
    • Thing is, though that is a lot of money, what happens if you make them, say 20,000 USD with a great new compression/analysis algorithm.

      If you're that good, they'll probably hire you to at least consult for them to maintain the code you wrote.

  • So basically... (Score:3, Insightful)

    by Dutchmaan ( 442553 ) on Wednesday February 06, 2002 @06:34PM (#2964212) Homepage
    They're going to (hopefully) get tons of interesting ideas and almost as much useful code for the price of $10,000. Sure beats hiring programmers.

    That's assuming that any contest entries automatically become the property of Google.

    Perhaps this is the evolution of a new business model... Either way, I don't really care as long as Google remains free, fast, and useful!
    • Re:So basically... (Score:5, Informative)

      by anthony_dipierro ( 543308 ) on Wednesday February 06, 2002 @06:39PM (#2964242) Journal

      That's assuming that any contest entries automatically become the property of Google.

      With regard to an entry you submit as part of the Contest, you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry

      So basically, Google doesn't own your code, only the right to use it. GPLing your code would satisfy the worldwide, perpetual, non-exclusive license grant.

    • they do, but many companies have done this in the past.
    • Finding Programmers! (Score:5, Interesting)

      by rbeattie ( 43187 ) <russ@russellbeattie.com> on Wednesday February 06, 2002 @08:20PM (#2964775) Homepage
      Sure beats hiring programmers.

      No, that's it!

      According to this article [yahoo.com], Google is getting deluged with resumes; this is just a way for them to weed out the 600+ resumes they get a day.

      The winner of this contest (and maybe a few of the runners-up) will most likely get a job offer as well. Beats having to weed through 4200 greatly exaggerated CVs every week...

      -Russ
  • by p-n-wise ( 526587 ) on Wednesday February 06, 2002 @06:37PM (#2964229) Homepage Journal
    I'd go for a dictionary of every word ever used on the web. Complete with common usage examples.
  • I know! (Score:2, Interesting)

    by AntiFreeze ( 31247 )
    Someone could do a CRC (cyclic redundancy check) on all the pages in the cache, that way, one could tell when the Internet's been updated...

    Even stupider: not only is it easy, but it could allow Google to create static result pages for common searches: it would just update the result page when the cache CRC changes.
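    A sketch of the fingerprint half of that, using zlib's CRC-32 (a cryptographic hash would be safer against collisions, but the idea is the same): recompute each cached page's checksum and only rebuild result pages whose sources actually changed.

      import zlib

      def page_crc(html_bytes):
          # CRC-32 of the raw page bytes; collisions are possible, so MD5/SHA-1
          # would be a safer choice in practice.
          return zlib.crc32(html_bytes) & 0xFFFFFFFF

      def changed_pages(old_crcs, fetch_page, urls):
          """Return URLs whose cached copy no longer matches the live page.

          old_crcs: dict url -> previously stored CRC
          fetch_page: callable url -> bytes (stand-in for the crawler/cache)
          """
          changed = []
          for url in urls:
              crc = page_crc(fetch_page(url))
              if old_crcs.get(url) != crc:
                  changed.append(url)
                  old_crcs[url] = crc
          return changed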

  • by edrugtrader ( 442064 ) on Wednesday February 06, 2002 @06:38PM (#2964236) Homepage
    how about have google parse every page, and save the homepage as an image. then take the map of the internet, and make it using tiny thumbnails of the most heavily linked (popular) sites.

    this would be just like those mosaic photos, only much nerdier. thinkgeek execs are drooling already....
  • by t0qer ( 230538 ) on Wednesday February 06, 2002 @06:40PM (#2964244) Homepage Journal
    A few years back there was a game, I think it was called Virus or something like that. It would scan your directory structure and make a map for the FPS world based on that.

    Looking at the web, I always thought it would be cool to make a game based on the same concept, but use web pages instead of your hard drive directory.

    I'm just throwing out ideas.
    • A few years back there was a game, I think it was called Virus or something like that. It would scan your directory structure and make a map for the FPS world based on that.
      So if you were standing in the C: room and unleashed a flurry of rockets, was that equivalent to rm -rf /?
  • by wizarddc ( 105860 ) on Wednesday February 06, 2002 @06:40PM (#2964253) Homepage Journal
    Google Contest Winner Offers Better Porn Searches

    Winner of the First annual Google Programming Contest creates greatest porn spider ever.

    MOUNTAIN VIEW, Calif. - December 11, 2001 - Google Inc., developer of the award-winning Google search engine, today announced its first winner of the Annual Google Programming Contest. Winner I. C. Porno has created a program to help catalog and organize Google's cache of the Internet, also referred to as the World Wide Web of Porn.

    "This announcement is an important step in Google's ongoing effort to provide search services that are fast, easy to use, and that help people find the information they need," said Larry Page, Google's co-founder and president of Products. "To search our collection of 3 billion documents for porn by hand, it would take 5,707 years, searching twenty-four hours per day, at one minute per document. With I. C.'s new program, it takes less than a second."

    World's Largest Collection of Porn
    Google users now have the world's largest and most comprehensive collection of porn right at their fingertips and can immediately satisfy primal urges using the following services:

    Google Web Porn Search: The company's newest search service now offers more than 2 billion documents - 25 percent of which are non-English language web pages. Google Web Search also offers users the ability to search for numerous non-HTML files such as PDF, Microsoft Office, and Corel documents. Google's powerful and scalable technology searches this comprehensive set of information and delivers a list of relevant porno in less than half-a-second.

    Google Porn Groups: This 20-year archive of Usenet porn conversations is the largest of its kind and can serve as a powerful reference tool, while offering more porno than the Internet. Google Groups was released from beta today with 700 million postings in more than 35,000 topical porno categories.

    Google Image Search: Comprising more than 330 million nude images, Google Image Search enables users to quickly and easily find porn images relevant to a wide variety of topics, including pictures of celebrities and popular travel destinations. Advanced features include search by image size, format (JPEG and/or GIF), coloration, and the ability to restrict searches to specific genres of porn.

    About Google Inc.
    With the largest index of websites available on the World Wide Web and the industry's most advanced search technology, Google Inc. delivers the fastest and easiest way to find relevant information on the Internet. Google's technological innovations have earned the company numerous industry awards and citations, including two Webby Awards; two WIRED magazine Readers Raves Awards; Best Internet Innovation and Technical Excellence Award from PC Magazine; Best Search Engine on the Internet from Yahoo! Internet Life; Top Ten Best Cybertech from TIME magazine; and Editor's Pick from CNET. A growing number of companies worldwide, including Yahoo! and its international properties, Sony Corporation and its global affiliates, AOL/Netscape, and Cisco Systems, rely on Google to power search on their websites. A privately held company based in Mountain View, Calif., Google's investors include Kleiner Perkins Caufield & Byers and Sequoia Capital. More information about Google can be found on the Google site at http://www.google.com.
    • Google Web Porn Search: The company's newest search service now offers more than 2 billion documents - 25 percent of which are non-English language web pages. Google Web Search also offers users the ability to search for numerous non-HTML files such as PDF, Microsoft Office, and Corel documents.

      For all that Corel formatted porn out there...

    • Brings new meaning to Google's "I'm Feeling Lucky" search option.
  • by option8 ( 16509 ) on Wednesday February 06, 2002 @06:41PM (#2964262) Homepage
    i actually bugged the google guys a while ago about adding a spellchecking function to google. throw a URL or a set of pages at it, and it spits out a list of misspelled or questionable words - highlighted in the way they already do search terms in the cache...

    anyway, someone there emailed me back basically saying it was an interesting idea, but not something on their agenda.

    maybe someone out there can work up a scalable google spellchecker that i can run my big-ass database-driven website through (which is a major pain to spellcheck, considering the client simply refuses to do it when they provide the content)
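    a rough sketch of the batch half of this, assuming a word list at /usr/share/dict/words and locally saved pages (the part google would bring is doing it at crawl scale and highlighting the hits in the cached copy):

      import glob
      import re

      def load_dictionary(path="/usr/share/dict/words"):
          with open(path) as f:
              return {line.strip().lower() for line in f if line.strip()}

      def questionable_words(html, dictionary):
          text = re.sub(r"<[^>]+>", " ", html)        # crude tag stripping
          words = re.findall(r"[A-Za-z']+", text)
          return sorted({w for w in words if w.lower() not in dictionary})

      # usage sketch:
      # dictionary = load_dictionary()
      # for path in glob.glob("site/*.html"):
      #     print(path, questionable_words(open(path).read(), dictionary))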
  • Count all of the letters A, T, C, and G from all the web pages in the search results and sequence that into a DNA strand to produce the perfect human. Myuhahahahahaha.
  • Restoring meta-tags (Score:5, Interesting)

    by Charles Dodgeson ( 248492 ) <jeffrey@goldmark.org> on Wednesday February 06, 2002 @06:43PM (#2964278) Homepage Journal
    I've been kicking around an idea for a scheme to end meta-tag (keyword, description) abuse so that they can actually become useful again. But it would require the cooperation and effort of Google (and others) to do this.

    The idea is roughly to refuse to index sites which engage in keyword/description abuse.

    1. Index keywords and description data.
    2. Allow users to search with keywords on or off.
    3. If users search with keywords on, provide a mechanism for users to nominate a site as engaging in keyword abuse.
    4. Semi-automatically, and then manually, review nominations (a rough sketch of the automatic part is below).
    5. Refuse to index sites which have engaged in keyword abuse.
    This isn't so much a system that meets the specs of the contest. And there is a scaling issue, but it is on my wish-list for google (and others) to do.
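    The semi-automatic screening in step 4 could start as simply as comparing the meta keywords against the words that actually appear in the visible body text; a hypothetical heuristic (the 0.3 threshold is made up, and it assumes BeautifulSoup):

      import re
      from bs4 import BeautifulSoup

      def keyword_abuse_ratio(html):
          """Fraction of meta keywords that never appear in the body text.

          A high ratio suggests keyword stuffing; the threshold below is a guess.
          """
          soup = BeautifulSoup(html, "html.parser")
          meta = soup.find("meta", attrs={"name": "keywords"})
          if meta is None or not meta.get("content"):
              return 0.0
          keywords = [k.strip().lower() for k in meta["content"].split(",") if k.strip()]
          if not keywords:
              return 0.0
          body_words = set(re.findall(r"[a-z']+", soup.get_text(" ").lower()))
          missing = [k for k in keywords
                     if not all(w in body_words for w in k.split())]
          return len(missing) / len(keywords)

      def nominate_for_review(html, threshold=0.3):
          return keyword_abuse_ratio(html) > threshold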
  • Hmm.. (Score:2, Informative)

    by WndrBr3d ( 219963 )
    Does anyone else find the announcement of this contest, and the unusually high number of WANTED: Programmers listings in their 'Work for Google' section, a bit odd?

    They want someone to develop a system and get paid less than $10k for it. Screw that.
  • How about a program that searches for the meta generator tags and looks for "Microsoft Frontpage X.X", deletes the page from the database, and commences a DoS attack from the rest of the Slashdot community?

    Go Google! Get rid of the fake HTML goons!
  • Something to think about... you know that cool caching feature that Google has? That basically means they have the entire internet saved on their disk array. Seriously though, I've been doing a lot of work and research in the area of neural nets, fuzzy logic, evolutionary algorithms, etc. I wouldn't mind feeding 900,000 webpages into a neural net and seeing how well it learns, or *what* it learns.
  • Why not (Score:3, Funny)

    by dmouritsendk ( 321667 ) on Wednesday February 06, 2002 @06:45PM (#2964293)
    Make an image-to-ASCII-art converter, so you could have a text-only option on the Google cache.
  • jargon watcher (Score:5, Interesting)

    by MbM ( 7065 ) on Wednesday February 06, 2002 @06:47PM (#2964309) Homepage
    Write an application to track keyword usage over time, when a keyword goes from only 10 hits to several thousand then flag it for jargon. The jargon can then be presented as a webpage of the top whatever with various statistics over popularity and suspected origin urls.
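    A toy version of the tracker, assuming you already have per-word page counts from an earlier and a later crawl snapshot (getting those counts out of the repository is the real work):

      def emerging_jargon(old_counts, new_counts, max_old=10, min_new=1000, top=50):
          """Flag words that jumped from near-zero usage to heavy usage.

          old_counts / new_counts: dict word -> number of pages containing it,
          taken from two crawl snapshots (assumed shape).
          """
          candidates = []
          for word, new in new_counts.items():
              old = old_counts.get(word, 0)
              if old <= max_old and new >= min_new:
                  growth = new / (old + 1)
                  candidates.append((growth, word, old, new))
          candidates.sort(reverse=True)
          return candidates[:top]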
  • Regular Expressions! (Score:3, Interesting)

    by Oink.NET ( 551861 ) on Wednesday February 06, 2002 @06:51PM (#2964333) Homepage
    If someone can come up with a regular expression search engine that scales to billions of pages, that would be the killer app for Google. It would probably have to be a Deterministic Finite Automaton (DFA) regex engine, not the more powerful Nondeterministic Finite Automaton (NFA) engines like you have in Perl, Python, Emacs, and Tcl, but still, that would rock!
    • Ummm... (Score:3, Insightful)

      by Tom7 ( 102298 )
      DFA and NFA are equivalently powerful. (It is a relatively simple proof to show transformations between them.)

      It's true that Emacs et al. support a richer language than what's offered by traditional regular expressions (as can be implemented on DFA or NFA) but that's because the languages are *not regular*. It has nothing to do with the distinction between DFA and NFA.
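      The transformation is the subset construction: each DFA state is a set of NFA states. A compressed sketch, with the NFA given as a dict from (state, symbol) to a set of next states (epsilon moves omitted for brevity):

        def nfa_to_dfa(nfa_delta, start, accepting, alphabet):
            """Subset construction: DFA states are frozensets of NFA states."""
            start_set = frozenset([start])
            dfa_delta, seen, todo = {}, {start_set}, [start_set]
            while todo:
                current = todo.pop()
                for symbol in alphabet:
                    nxt = frozenset(s for q in current
                                    for s in nfa_delta.get((q, symbol), ()))
                    dfa_delta[(current, symbol)] = nxt
                    if nxt not in seen:
                        seen.add(nxt)
                        todo.append(nxt)
            dfa_accepting = {state for state in seen if state & accepting}
            return start_set, dfa_delta, dfa_accepting

        def dfa_match(text, start, delta, accepting):
            state = start
            for ch in text:
                state = delta.get((state, ch), frozenset())
            return state in accepting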
  • Spam page deleter (Score:3, Interesting)

    by www.sorehands.com ( 142825 ) on Wednesday February 06, 2002 @06:51PM (#2964334) Homepage
    How about a program that checks for spam, then deletes the entries in the database that spammers have used to publicize themselves. Then, if there are more than 3 spam pages, it notifies the ISP and deletes every page in the database from that ISP.

  • by anthony_dipierro ( 543308 ) on Wednesday February 06, 2002 @06:51PM (#2964336) Journal
    Connect any two pages on the web to each other with the minimum number of hyperlinks.
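    That is just breadth-first search over the link graph; a sketch, assuming the repository has been boiled down to a dict from page URL to its outbound links:

      from collections import deque

      def shortest_link_path(links, source, target):
          """BFS over the link graph; links: dict url -> list of outbound urls."""
          if source == target:
              return [source]
          parent = {source: None}
          queue = deque([source])
          while queue:
              page = queue.popleft()
              for nxt in links.get(page, ()):
                  if nxt in parent:
                      continue
                  parent[nxt] = page
                  if nxt == target:
                      path = [nxt]
                      while parent[path[-1]] is not None:
                          path.append(parent[path[-1]])
                      return path[::-1]
                  queue.append(nxt)
          return None  # no chain of hyperlinks connects the two pages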
  • by Embedded Geek ( 532893 ) on Wednesday February 06, 2002 @06:52PM (#2964342) Homepage
    Many posters have commented on how Google will essentially get free labor out of this (by having thousands of man hours expended for that $10K prize). The only thing that surprises me is that people think this is innovative/new/evil/dastardly or otherwise unique. Fact is, it's old hat.

    I mean, how many contests have you seen on the back of a cereal box to "create a new slogan!" or "write an essay"? Just a cheap way to create some buzz and get your customers to write your advertising copy for you. Heck, the most blatant scams in memory are HBO's Project Greenlight (trolling for scripts - you don't even want to know what the Writers' Guild thought of this) and the Lego Film Contest [lego.com] (trolling for complete commercials).

    Hardly new stuff. Remember Mark Twain's Tom Sawyer? There's a bit where he holds a "contest" to see which kid can whitewash the fence he's supposed to paint fastest. I'm sure that even as Twain wrote that bit, he thought "I better be sure to give the fence painting thing a unique spin so it works. After all, it's an awfully old idea..."

    • Well, don't forget that they actually have to look through all this crap and find the good ideas (if they exist). So it is a gamble, but it's probably a good one. Anyway, I'm sure many people will be happy to do this, so don't spoil their fun. ;)
  • by Xzzy ( 111297 ) <sether@@@tru7h...org> on Wednesday February 06, 2002 @06:53PM (#2964348) Homepage
    I think their example ideas pretty much suck, dunno, maybe they did it on purpose so no one would try that stuff or maybe they just don't wanna see much creativity.

    I personally think it'd be coolest to turn it into an art project.. imagine you had a repository of the consciousness of an entire race and could run a script on it. Things like the map of the internet. Or the web collage [jwz.org]. Or use it to power some kind of AI chatterbot [google.com].

    I dunno. Their webpage on it didn't seem to do much to promote being creative; they just want to pay someone 10k to develop a new way to make more relevant search results.

  • by Mr. Sketch ( 111112 ) <`mister.sketch' `at' `gmail.com'> on Wednesday February 06, 2002 @06:56PM (#2964361)
    It seems like it would be very easy to come up with something interesting, and only a small fraction of those interesting things are actually useful.

    Examples of a few interesting non-useful things I can come up with just off the top of my head:
    Google Poet: Generate rhyming poetry from randomly rhyming sentences on the webpages in the database.
    Googlesaic: Input a picture and scavenge the webpages for pictures from which to create a large mosaic of the input picture.
    Google Map: Create a picture/graph of all the website connections (links) in the webpage list, perhaps with 3D navigation. Perhaps perform graph operations and maybe find the longest path one can travel through the links and still stay within the Google search results/database.

    These are just a few, I'm sure plenty of other people can find much more exciting/interesting things to do, but they won't always be useful to the google company.
    • Make a "find person" function. Write a name and Google figurs out what the facts are: e-mail, work, icq and interests. The problem today is that a lot of people are called the same, but with the corelation with email and other data. The program would be able to separate two persons with the same name. A great Big Brother function.
  • Search Engine Wars (Score:5, Interesting)

    by Van Halen ( 31671 ) on Wednesday February 06, 2002 @06:58PM (#2964378) Journal
    I already made a game last year I called Search Engine Wars [byond.com]. I wonder if it would qualify?

    It's a party game. The basic idea is that a bunch of people are in the game, and it goes around in turns. On your turn, you type in a few words to search for. The game goes and queries google for the first hit on that search, and sends everyone's browser to that page. Then the other players get 100 seconds to guess which words you searched for. The first player to guess correctly gets points for the amount of time remaining.

    It's written using BYOND [byond.com], which you'll have to download if you want to play.

  • "With regard to the software and repository that you obtain for the Contest, you agree to the license terms as stated in files you download or receive. With regard to an entry you submit as part of the Contest, you grant Google a worldwide, perpetual, fully paid-up, non-exclusive license to make, sell, or use the technology related thereto, including but not limited to the software, algorithms, techniques, concepts, etc., associated with the entry.

    If you are selected as a contest winner, you agree that Google may publicize your name, likeness, and the description of work you did to win the contest. Apart from the prizes associated with being selected as a winner, Google shall not be obligated to compensate you in any way for such publicity."


    So in other words, google buys the next great thing for $10K. The only upside of the above is that it's a non-exclusive license which means you could go and sell it to a competing search engine too...

    Of course, good luck finding a competing search engine :-)
  • Why are all you dorks posting your ideas? Go do it, or don't complain when someone implements your idea and wins a bunch of money!!!
  • The contest rules state that you grant Google a "non-exclusive license" to your entry, so theoretically you could use your work in other areas too. Doesn't sound TOO bad, though I'd prefer to see the $10k upped to $50k. :)
  • If you'd like to see the license before actually downloading the actual (huge, and possibly slashdotted) .tar:
    This repository of web page information is being provided to you by Google Inc. solely for academic and research purposes related to the Google programming contest. You may not modify, distribute, or make any commercial use of the repository.


    This source code is copyrighted 2002 by Google Inc. All rights reserved. You are given a limited license to use this source code for purposes of participating in the Google programming contest. If you choose to use or distribute the source code for any other purpose, you must either (1) first obtain written approval from Google, or (2) prominently display the foregoing copyright notice and the following warranty and liability disclaimer on each copy used or distributed.

    The source code and repository (the "Software") is provided "AS IS", with no warranty, express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular use. In no event shall Google Inc. be liable for any damages, direct or indirect, even if advised of the possibility of such damages.
  • by thehossman ( 198379 ) on Wednesday February 06, 2002 @07:08PM (#2964439)
    JWZ already wrote the coolest apps I've ever seen that harness the power of Internet search engines...

    Webcollage [jwz.org] -- slowly builds a random collage of images from the net.

    DadaDodo [jwz.org] -- generates random sentences based on word probabilities in pages on the net.
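    DadaDodo does much more careful sentence analysis, but the core word-probability idea is just a first-order Markov chain over words, roughly:

      import random
      from collections import defaultdict

      def build_chain(texts):
          # word -> list of words observed following it across the pages
          chain = defaultdict(list)
          for text in texts:
              words = text.split()
              for a, b in zip(words, words[1:]):
                  chain[a].append(b)
          return chain

      def babble(chain, length=20):
          word = random.choice(list(chain))
          out = [word]
          for _ in range(length - 1):
              followers = chain.get(word)
              if not followers:
                  break
              word = random.choice(followers)
              out.append(word)
          return " ".join(out)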

  • My program (Score:5, Funny)

    by Anonymous Coward on Wednesday February 06, 2002 @07:13PM (#2964471)
    s/www\.microsoft\.com/www\.goatse\.cx/g
  • by belphegore ( 66832 ) on Wednesday February 06, 2002 @07:16PM (#2964482)
    Six degrees of Google Bacon. How many links (and what's the path) to get from any page on the web to Kevin Bacon's personal homepage. Or more interesting from any page to any other page.
  • 57mb Download (Score:2, Interesting)

    by RageMachine ( 533546 )
    I have to say the download is quite smooth. 160k a second is nice. I wonder how much bandwidth google actually has? Probably a gigabit or more?
    This many people with Cable/DSL downloading that file, and its not even slashdotted.

    I haven't untarred the file yet. But I wonder just how many people it takes to run Google. How many are on staff? And how many work on the actual code that powers such a huge site?
  • by jjeffries ( 17675 ) on Wednesday February 06, 2002 @07:47PM (#2964621)
    ...something that looks through that data and finds the interesting bits based on a set of terms that the user provides?

    Or has someone done that already?
  • by jcwren ( 166164 ) on Wednesday February 06, 2002 @09:51PM (#2965146) Homepage
    Personally, I'd like to see hits to pages marked, and the top 100 hits from each search fed back in to be re-indexed. This would eliminate a lot of dead site material, I should think.

    --John
  • by paylett ( 553168 ) on Thursday February 07, 2002 @04:14AM (#2966100)
    A couple of months ago, I sent Google an email suggesting that they should add an "I'm feeling really lucky" feature that would go to any page in the whole Google database at random.

    Maybe something like pressing I'm feeling lucky with no search string?

    Haven't seen it yet :(
  • by NewtonsLaw ( 409638 ) on Thursday February 07, 2002 @04:42AM (#2966165)
    Hey, aren't Google breaching the copyright of at least some of those whose pages are included in the sample data being used -- especially the CDROM's worth that will be sent out?

    As for the cost-savings involved in running such a contest, I expect the fact that they only have to pay $10,000 will be more than offset by the fact that they'll have to sort through a mountain of crappy submissions. That'll take a lot of people a lot of time.
  • by chrysalis ( 50680 ) on Thursday February 07, 2002 @05:20AM (#2966267) Homepage
    So that pages that can properly be read by any browser come first.
    Then, maybe webmasters will stop doing IE-only pages.

  • Scrabble (Score:4, Funny)

    by nicklott ( 533496 ) on Thursday February 07, 2002 @06:17AM (#2966382)

    I've got one:

    Let's take all 900,000 pages and look at the statistical distribution of the frequency of appearance of each letter of the alphabet. That way we could check to 10 decimal places that the letter values in Scrabble are REALLY correct...
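    Counting the letters is the easy part; a sketch that tallies frequencies across a pile of saved pages (turning the frequencies into "correct" tile values is left as an exercise):

      import glob
      import re
      from collections import Counter

      def letter_frequencies(paths):
          counts = Counter()
          for path in paths:
              with open(path, errors="ignore") as f:
                  text = re.sub(r"<[^>]+>", " ", f.read())  # crude tag stripping
                  counts.update(c for c in text.lower() if c.isalpha())
          total = sum(counts.values())
          return {letter: counts[letter] / total for letter in sorted(counts)}

      # for letter, f in letter_frequencies(glob.glob("pages/*.html")).items():
      #     print(f"{letter}: {f:.10f}")  # the ten decimal places requested above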

  • by cascadefx ( 174894 ) <morlockhq@@@gmail...com> on Thursday February 07, 2002 @10:58AM (#2967336) Journal

    I hope Google reads these pages and gets some free ideas from it. At least take mine! Please. God knows that I don't have the coding chops to do it myself. I sent this same idea to Allaire (remember them?) a long time ago and I had a couple of software engineers write me back, but nothing ever came of it. My guess is that this is a hard problem.

    I want a browser control/plugin/whatever that harnesses a backend of web information to make my surfing more productive/predictive.

    The gist would be to have a hover option for links which would give you information about what is behind the link without having to actually follow it. While browsing, the user would just hover over a link in a page and information pertaining to the page beyond the link would show up in a hovering menu or a sidebar (this would be great with Mozilla, but I could see an ActiveX control as well).

    The types of information is where it gets useful. Using some of the more advanced summarization algorithms out there, it would pull up the summaries of those pages if they were in the offsite database (Allaire, Google, and the WayBack Machine being possible backends). Based on your preferences a short, medium or long summary would be displayed. If it wasn't in the cache, it could be summarized on the fly and then presented after some delay (the new summary now being cached).

    It would also list, in an orderly way and subject to preferences, links from the page on the other side. That way the user could follow one of those if it turns out that she only needed the summary and a link. It would also list the elements of the page, like graphics, and give their specs (i.e. dimensions and estimated download times and ALT tag entries if present) and give the option to display them on a page by page basis. All of this would be nested, of course, so that a user could hover over links in the summary pages and get the same information all over again for that link (which is why I see it more as a "sidebar" feature). Theoretically a user could just surf by these summaries if they wanted.

    Now, I realize that this would pose some problems like trusting the summaries and so forth. However, the nice thing about it would be features that could be built into the user's preferences. For instance, you could make it so that the user could have certain words or phrases set that would then be scanned for during the summarization process. You could then either relax the amount of summary for the entire page or, better yet, still pull the cached summary but also pull a user-definable number of lines before and after their keywords (best of both worlds).

    Each summary could also list a numeric rank of where that page fits in "status" (like google's ranking system) based on the summary (generically) or the keywords of the user (specifically). Finally, it could pay for itself with text advertising (small and innocuous like the ones seen on Google).

    If you start to think about it for a while, there are all sorts of things you could do with this, and it would help cut through the "padding" that you usually go through while looking for information on a certain subject. I think it would be great! It is kind of based on the idea of the "magic spyglass" that was heralded almost a decade ago, but never implemented in any OS that I know of.

    Like I said, I can't code it, but I would love to see it done. So have at it if you think it is good. Google's cache of pages and images and its ranking technology make it perfectly suited for this type of problem, and they have enough PhDs that the summarization issue should prove an "interesting" problem to solve.

    Then again, it might suck. If you do implement it, let me know. I would love to beta-test it. I called the whole thing the Clairvoyant Browser Plugin... but you could use what you want.
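    The backend half of that is mostly a cache sitting in front of a summarizer; a very rough sketch of just that piece, with a placeholder extractive "summarizer" (the real work is obviously the summarization quality, the ranking, and the browser-side UI):

      import re

      class SummaryCache:
          """Hypothetical backend for a link-hover plugin: summarize a page on
          demand and cache the result so the next hover over that URL is instant."""

          def __init__(self, fetch_page):
              self.fetch_page = fetch_page  # callable url -> html (crawler or cache)
              self.cache = {}

          def summary(self, url, sentences=3):
              if url not in self.cache:
                  text = re.sub(r"<[^>]+>", " ", self.fetch_page(url))
                  text = re.sub(r"\s+", " ", text).strip()
                  # placeholder summarizer: just the first few sentences
                  parts = re.split(r"(?<=[.!?])\s+", text)
                  self.cache[url] = " ".join(parts[:sentences])
              return self.cache[url]

          def outbound_links(self, url):
              return re.findall(r'href="(http[^"]+)"', self.fetch_page(url))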

"If it ain't broke, don't fix it." - Bert Lantz

Working...