Google Programming Contest Winner
asqui writes "The First Annual Google Programming Contest, announced about 4 months ago, has ended. The winner is Daniel Egnor, a former Microsoft employee. His project converted addresses found in documents to latitude-longitude coordinates and built a two-dimensional index of those coordinates, allowing you to limit a query to a certain radius around a geographical location. Good for difficult questions like "Where is the nearest all-night pizza place that will deliver at this hour?" Unfortunately there is no mention of whether this technology is on its way to Google Labs yet. There are also details of 5 other excellent project submissions that didn't quite make it."
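The write-up doesn't describe Egnor's actual index structure, but the core query is easy to picture: geocode each document's addresses, then filter candidates by great-circle distance. A minimal Python sketch, with made-up data and names, purely to illustrate the idea:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(docs, lat, lon, radius_km):
    """Filter (url, lat, lon) records to those within radius_km of (lat, lon)."""
    return [d for d in docs if haversine_km(lat, lon, d[1], d[2]) <= radius_km]

# Hypothetical toy data: documents whose addresses were geocoded.
docs = [("pizza-place.example/index.html", 40.7580, -73.9855),
        ("laundromat.example/index.html", 40.7306, -73.9352)]
print(within_radius(docs, 40.7484, -73.9857, 2.0))
```

(A real system would combine this with the text query and a proper two-dimensional index rather than a linear scan.)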
more details (Score:5, Informative)
This is an impressive bit of database manipulation. Somehow I didn't think that all of the datatypes, etc., would be so easily parsed.
Although I do recall telephone directories that used to give you results for a specified radius for certain types of businesses
Re:if i'd only known (Score:4, Informative)
Let me quote from the homepage of the annual contest:
"Grand Prize
$10,000 in cash
VIP visit to Google Inc. in Mountain View, California
Potentially run your prize-winning code on Google's multi-billion document repository (circumstances permitting)"
A project that really could be good for Google... (Score:5, Informative)
So you think Google's results are fair? You're wrong. The best-ranked results come from sites that cheat heavily.
Since Google has aggressively removed fake generated sites linking to each other, new ways of cheating have been immediately adopted.
Apart from cloaking (what the Google crawler sees is different from what users see), generated sites now include fake, generated English-like sentences to make Google think the text is real. Spam indexing is now distributed across multiple IPs. Content is dynamic and changes every day (random links and text are generated). Temporary sites are hosted on cheap external colocated servers that haven't been blacklisted yet. Invisible frames are added, and so on.
I'm not speaking hypothetically: the company I work for is actively doing it. And it works. And they say, "Spam? Huh? Who's talking about spam? It brings us money, so it's not spam, it's our business."
There are ways to prevent cheating on Google. It's probably very complex, but it's doable. If any human looked at our 'spam site', they would immediately see that it's not a real site. It's a mess, there purely for the keywords and links.
If such a project had been submitted to the Google contest, it would have been wonderful.
Google is still the best search engine out there. Their technology rocks, and they are always looking for innovation. But what could make a huge difference between Google and the other search engines is fair results: the same wheel of fortune for everybody.
Yet this is not the case. Trust me, all the well-ranked web sites for common keywords belong to a few companies that are actively cheating.
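For what it's worth, those "English-like sentences" are usually nothing fancier than a Markov chain run over scraped text. A toy word-bigram sketch of the idea (not anyone's actual code, and the corpus is made up):

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=12):
    """Random-walk the bigram model to produce English-like nonsense."""
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = ("the quick brown fox jumps over the lazy dog "
          "the lazy dog sleeps while the quick fox runs")
model = build_bigram_model(corpus)
print(generate(model, "the"))
```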
Re:I see one being implemented soon (Score:2, Informative)
Markovian Dependence [google.com] - The condition where observations in a time series depend on previous observations in the near term. Markovian dependence dies out quickly, while long-memory effects, like Hurst dependence, decay over very long time periods.
Markov processes (Score:3, Informative)
A Markov process is basically a sequence of random variables where the value of X^(i+1) depends only on X^i. The idea is that if you want to predict the value of X^(i+1), all of the information you could possibly use is already contained in the value of X^i.
Lots of processes are Markovian: a random walk, for instance. If you're at point x at time t, then you know there's a fifty-fifty chance you'll be at x-1 or x+1 at time t+1. Knowing all of the previous points along the walk won't help you predict the next point any better than that.
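A quick Python sketch of that walk, just to make the Markov property concrete; note that the simulation never stores the history, because the current position is all the next step depends on:

```python
import random

def random_walk(steps, start=0):
    """Simulate a symmetric random walk on the integers.

    Markovian: the next position depends only on the current one,
    so the full history is never needed.
    """
    x = start
    for _ in range(steps):
        x += random.choice((-1, 1))  # fifty-fifty step left or right
    return x

print(random_walk(1000))
```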
Re:About the "Former Microsoft Employee" bit.. (Score:2, Informative)
Yes, really, it's not a large room full of monkeys!
Re:more details (Score:4, Informative)
Although I do recall telephone directories that used to give you results for a specified radius for certain types of businesses
That's just a standard spatial query. It's easy to implement an R-tree to do (relatively) quick "give me the points within x meters of this one" searches on a database. There's nothing extremely revolutionary about Daniel's project; anyone with some basic geometry knowledge and the patience to download the 33GB of TIGER data could have done it in a few weeks. (Ironically enough, I've been doing the same thing with 1.2 million addresses against TIGER data for the past month.)
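A toy version of the idea in Python, just to show how little machinery a planar "points within radius" query needs. (A real system would use an R-tree and great-circle distances; the cell size and names here are made up.)

```python
import math
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees (roughly 1 km of latitude)

def cell(lat, lon):
    """Bucket a point into a fixed-size lat/long grid cell."""
    return (math.floor(lat / CELL), math.floor(lon / CELL))

class GridIndex:
    """Toy stand-in for an R-tree: hash points into grid cells."""

    def __init__(self):
        self.cells = defaultdict(list)

    def insert(self, point_id, lat, lon):
        self.cells[cell(lat, lon)].append((point_id, lat, lon))

    def near(self, lat, lon, radius_deg):
        """Return ids of points within radius_deg, treating degrees as planar."""
        r = math.ceil(radius_deg / CELL)
        ci, cj = cell(lat, lon)
        hits = []
        for i in range(ci - r, ci + r + 1):
            for j in range(cj - r, cj + r + 1):
                for pid, plat, plon in self.cells.get((i, j), []):
                    if math.hypot(plat - lat, plon - lon) <= radius_deg:
                        hits.append(pid)
        return hits

idx = GridIndex()
idx.insert("pizza", 40.7580, -73.9855)
print(idx.near(40.7484, -73.9857, 0.02))  # -> ['pizza']
```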
But that's the true genius and beauty of it. Now that it's been said, it's such a mindbogglingly obvious and useful application of web search and spatial search technology that it's hard to believe nobody thought of it before.
I'd be honestly surprised if Google doesn't run with the ball and fold it into their main search engine. The only thing standing in the way is the storage space and CPU time to do it.