New Leader In Netflix Prize Race With One Day To Go
brajesh writes "The Netflix Prize, an algorithm competition to improve the Netflix Cinematch recommendation system by more than 10%, has a new leader — The Ensemble — just one day before the competition ends. The 30-day race to the end was kicked off after BellKor's Pragmatic Chaos submitted the first entry to break the 10% barrier, with the results showing a 10.08% improvement. The Ensemble, made up of three teams that chose to join forces ('Grand Prize Team,' 'Opera Solutions' and 'Vandelay United'), has managed to overtake BellKor with a score of 10.09% — an improvement of 0.01 percentage points over the former leaders. From the article on TechCrunch: 'The competition will end [today], so teams still have a little bit of time left to make their last-second submissions, but things are looking good for The Ensemble. This has to be absolutely brutal for team BellKor.'"
I think (Score:5, Insightful)
Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.
Re: (Score:3, Funny)
(Joking, partly).
Re: (Score:1)
Syntax is a subset of grammar, you insensitive clod!!
Ha! But you're equivocating, of course. He means code syntax.
Re: (Score:3, Insightful)
(-1 Offtopic) But I've sort of hoped that a site such as Slashdot would open-source its site code. It's a sort of "community" already, and given the nature of the site and the number of users, there are probably about 5,000 people capable of contributing decent code/help, and a significant number of those would be willing to.
Add a section devoted to it, then polls about which contribution should be implemented, etc. Articles/submissions are already a sort of (controlled) "open source",
Re: (Score:1)
This has been empty for a while now:
http://www.slashcode.com/sites.pl [slashcode.com]
Re: (Score:2)
Re: (Score:2)
Yeah, I know about that, but I think the reason it's stagnant is that it's not really a part of "Slashdot" — it's off in its own little URL world. It should be merged into slashdot.org.
Re:I think (Score:5, Funny)
>Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.
I remember playing Call of Cthulhu many years ago and being told of the hideously deranging results of mere mortals who happened to gaze upon the unspeakable things that lurked in the dark places.
I beg you not to lead others down your insane and twisting path.
NO GOOD CAN COME OF IT! NO GOOOD!
Re: (Score:2)
Re: (Score:2)
Uve Boll (Score:3, Funny)
It's not Uve (Score:3, Informative)
Re: (Score:2)
Re: (Score:1)
Make it Michael Bay and I'd say we have a winnah!
I used to be very elitist about my reading (Score:2, Interesting)
Back when I first began using Amazon.com, I never bought a book based on the recommended items. I felt the recommendations were trite, ill-advised, and typically only peripherally related to the item I was buying.
Then the recommendations got better. Much better. I started to find myself buying things right out of the recommended section, and the product combination deals also became very tempting.
If Netflix can turn their recommendation engine into something similar, they will be sitting on a goldmine. As t
Re: (Score:1)
The only assumption that I made was that the recommendation engine could be improved.
With approximately 10 million subscribers (as of 2008) and $1.3B in revenue from those subscribers, even a 1% increase in rentals would be worth over 10 times the $1M they are paying to the winner of this contest.
Amazon has almost $20B in revenue from a much larger group of customers. A 1% increase per customer there would be huge.
Netflix, in addition to increasing the number of rentals per customer, should also be thinking about inc
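The parent's back-of-the-envelope math is easy to check. A quick sketch, using the parent's own approximate figures (not audited numbers):

```python
# Figures are the parent comment's rough approximations, in USD.
netflix_revenue = 1.3e9   # ~2008 Netflix revenue
amazon_revenue = 20e9     # ~Amazon revenue
prize = 1e6               # Netflix Prize purse

# Value of a hypothetical 1% lift in rentals/sales, relative to the prize.
netflix_gain = 0.01 * netflix_revenue
amazon_gain = 0.01 * amazon_revenue

print(netflix_gain / prize)  # → 13.0, i.e. over 10x the $1M prize
print(amazon_gain / prize)   # → 200.0
```

So even if the improved engine moves rentals by only a fraction of a percent, the prize pays for itself.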
Re: (Score:2)
Wow, you need movies & books to be recommended?
I have far more movies & books that I'm at least *vaguely* interested in than I can 'consume'. (A large part of the reason I started using the Netflix profile system was because of the 500 item limit in the queue.. and yes, I realize I won't ever watch the VAST majority of them.. but I would add movies/TV shows/documentaries that sounded interesting, and hit the limit. Note obviously a lot of the multiple items are separate discs in a collection, such
Why now? (Score:1, Insightful)
Why not wait another day before submitting the improvement? All they did was give the other team one day to respond, and if the other team succeeds, I doubt they'll be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline? Or am I missing something about the rules — e.g., does any submitted improvement prolong the deadline by one day?
Re: (Score:2)
Maybe they already have a solution which is higher and they are just being dicks? Maybe they aren't dicks at all and want to see the best team win? Maybe they think that their solution is unbeatable?
Whatever it is, it is certainly a lot more interesting than I thought it'd ever be. Kudos to the groups that have broken the 10% barrier!
Re: (Score:3, Insightful)
Re:Why now? (Score:5, Interesting)
Why not wait another day before submitting the improvement? All they did was give the other team one day to respond, and if the other team succeeds, I doubt they'll be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline? Or am I missing something about the rules — e.g., does any submitted improvement prolong the deadline by one day?
For the grand prize, there was a final 30-day countdown from the time the first entry that achieved greater than 10% was received, which was a month ago. So it seems like this will indeed come down to an eBay-like sniping situation in the last few hours.
I wouldn't feel too sorry for BellKor/KorBell though -- they've got many, many best paper awards at conferences and a huge degree of publicity out of the whole endeavor. In fact, in KDD 2009, they detailed most of the methods that most likely got them to the top -- i.e. they incorporated the fact that tastes and preferences drift over time. Simple, in retrospect of course. If you have an ACM subscription, you can read the 2009 paper here [acm.org].
Plus, since they work for AT&T/Yahoo Research, I remember Yehuda Koren stating that the money wouldn't have gone to them anyway -- possibly a large bonus, but I think they're entitled to that anyway. So I wouldn't feel too sorry for them.
Re:Why now? (Score:5, Informative)
The leaderboard shows scores on the QUIZ dataset, which is half of the answers a team submits. The WINNER of the million dollars is whoever does best on the TEST dataset, the other half of the answers they submit. Nobody knows how well these teams are doing on the TEST set; either team could be overfitting [wikipedia.org] the quiz set.
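A minimal sketch of how this quiz/test holdout works, assuming the contest's RMSE scoring metric (the data here is synthetic, just to illustrate the split):

```python
import math
import random

def rmse(preds, truth):
    """Root-mean-square error: the Netflix Prize scoring metric."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth))
                     / len(truth))

random.seed(0)
truth = [random.uniform(1, 5) for _ in range(1000)]        # hidden ratings
preds = [t + random.gauss(0, 0.9) for t in truth]          # a noisy model

# Netflix splits the submitted predictions in half: "quiz" scores go on
# the public leaderboard, "test" scores decide who wins the money.
quiz_preds, test_preds = preds[:500], preds[500:]
quiz_truth, test_truth = truth[:500], truth[500:]

print(rmse(quiz_preds, quiz_truth), rmse(test_preds, test_truth))
# The two halves differ by sampling noise alone, which is why a tiny
# leaderboard (quiz) gap can reverse on the test half.
```

With a gap of only 0.0001 RMSE between the top teams, that sampling noise is easily large enough to flip the final ranking.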
Re: (Score:1)
Yeah, there is a flaw in the evaluation mechanism, in my opinion. The good thing is that you don't need to hit 10% on the test set to win the money. Whichever team qualifies (10% on the quiz set) AND has the best test score wins. Even if they have terribly overfit the quiz set (the quiz set has been around fo
Re: (Score:3, Informative)
Re: (Score:1)
Try to flush out the competition, maybe? (Unless it really is the best they have, or think they'll have.)
Or perhaps try to lull the competition into a false sense of security by only edging them out by a hair, when they have something better held back?
Of course, with the amount of effort the teams have put into this, and the money at stake, you'd be nuts not to keep working on it flat-out until the time runs out; but still, if you're tired it could make a difference if you think you've got the competition by a comfo
Re: (Score:1)
should've "gamed" it (Score:5, Interesting)
Re:should've "gamed" it (Score:5, Insightful)
Re: (Score:2)
I'd be very surprised if Belkor doesn't have something better to submit at the last second.
It'd certainly have been an awful strategy to trigger the endgame with all your cards on the table.
Re:should've "gamed" it (Score:5, Insightful)
This isn't eBay; they can't just magic up high scores.
Game it or not, everyone will end up submitting their max score, because, well... why wouldn't they? Who cares if the other team knows you have 10.8%? Either they can beat it and will submit that score, or they can't and won't.
Re: (Score:2)
Game it or not, everyone will end up submitting their max score, because, well... why wouldn't they? Who cares if the other team knows you have 10.8%? Either they can beat it and will submit that score, or they can't and won't.
OR maybe they can do better than 10.8% but because they thought they had it in the bag, they didn't put the extra effort in to really push those improvements through and now, with less than a day left, they don't have the time to get those improvements fully polished enough for submission
This isn't eBay; they can't just magic up high scores.
Actually this is precisely like eBay. It appears that the prize got "sniped" out from under BellKor. The problem, just like eBay, is that the process has a fixed end date. The way to avoid this problem (and produce the b
Re: (Score:2)
Actually, Netflix used a different way to prevent gaming the system. They split the submitted predictions into two sets - the "quiz" set and the "test" set. The quiz set results are on the leaderboard; the test set is used for final judging.
Re: (Score:2)
Maybe they did, and the 10.08 (pretty minimal increase from 10) was their low end result, and they will announce their 25% increase result in the coming day..
Then again, maybe they didn't :-)
Re: (Score:2)
The 10.08 was a 10.08% improvement over the original system. That's not exactly a minimal increase, and considering that the new leaders posted a 10.09% improvement over the original — just 0.01 percentage points better — it's rather harsh to write off the 10.08% improvement as "pretty minimal".
Re: (Score:2)
Does Mighty Mouse come in time to save the day?
Tune in next week, to see the Action-packed conclusion!
Re: (Score:2)
True, and if only their own interest counts, that would be a good choice.
Things is, it's not good sportsmanship to "game the rules" that way.
Re: (Score:3, Insightful)
I don't think that this contest is about honor.
Re: (Score:2)
Basically impossible. The teams cannot compute their improvement. Netflix computes the improvement. The improvement is computed on a "secret" test dataset that only Netflix has access to. The models are developed on a public dataset available to everyone.
Ensemble learning (Score:1)
Re:Ensemble learning (Score:5, Informative)
Many teams actually combined multiple methods to get a better score. In fact, "BellKor's Pragmatic Chaos" is a combination of three teams, I'm guessing - BellKor, BigChaos and Pragmatic Theory.
Also, it helps to remember that what's posted on the leaderboard is the result of the "quiz" set - half of the actual set of recommendations you're asked to make. The other half, the "test set," is used for final judging. With such a small difference between BellKor's Pragmatic Chaos and The Ensemble on the quiz set (.0001 RMSE), the test set rank may actually end up reversed.
Re: (Score:2)
Actually it is not about averaging out. It's about building a better classifier from many good ones. See AdaBoost [wikipedia.org].
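AdaBoost is one flavor of ensembling; what the Netflix teams mostly published was linear blending of many strong predictors. A toy pure-Python sketch of the blending idea (the two "models" and their biases are made up for illustration):

```python
import math
import random

def rmse(preds, truth):
    """Root-mean-square error, the contest's scoring metric."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth))
                     / len(truth))

random.seed(1)
truth = [random.uniform(1, 5) for _ in range(2000)]
# Two decent but differently-biased predictors: stand-ins for, say, a
# matrix-factorization model and a neighborhood model.
model_a = [t + random.gauss(0.3, 0.8) for t in truth]
model_b = [t + random.gauss(-0.3, 0.8) for t in truth]

# Grid-search a blend weight on a probe set (here, the toy data itself).
best_w, best_err = 0.0, float("inf")
for w in (i / 100 for i in range(101)):
    blend = [w * a + (1 - w) * b for a, b in zip(model_a, model_b)]
    err = rmse(blend, truth)
    if err < best_err:
        best_w, best_err = w, err

print(best_err < rmse(model_a, truth))  # blend beats either model alone
print(best_err < rmse(model_b, truth))
```

Because the two models make partly independent errors, the blend's RMSE is lower than either component's — which is exactly why merged teams like The Ensemble could leapfrog the leaderboard.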
Re: (Score:1)
Algorithms? (Score:1)
I thought Vandelay was into manufacturing latex.
Re: (Score:2)
Any winner at all? (Score:5, Interesting)
Re:Any winner at all? (Score:4, Insightful)
Why would that disqualify them? They didn't form multiple teams; they did the opposite — they started with multiple teams and then merged them into one, abandoning or deleting the old, multiple accounts.
I suppose you could speculate that the teams weren't ever independent, but I think that's fairly obviously not the case.
Re: (Score:3, Funny)
Call me crazy, but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.
Actually, yes, I think I will call you crazy.
Re: (Score:3, Funny)
Okay, you're crazy :-)
So, there's approximately minus 2855 days left?
I just want to know if netflix gets to keep John Titor's time machine [wikipedia.org] ... the time frame (2001) is right ...
Re: (Score:2)
Competition had 30 days to submit after the qualifying submission was presented. From your link: "After three (3) months have elapsed from the start of the Contest, when the RMSE of a submitted prediction set on the quiz subset improves beyond the qualifying RMSE an electronic announcement will inform all registered Participants that they have thirty (30) days to submit additional candidate prediction sets to be considered for judging."
Re: (Score:2)
(Unless someone reaches the 10% goal *before* the end date).
Sometimes better design beats better algorithms (Score:3, Insightful)
Re: (Score:1)
Re: (Score:2, Insightful)
That'd be all your fault. You should be creating separate account profiles for yourself and your wife.
Re: (Score:1)
Yeah, I should totally jump through hoops to improve their ability to sell to me. Just because it would make the programmers' lives easier :)
No, if Netflix wants to sell more, they should follow up on that recommendation and make it very very easy to have multiple identities on a given account and a button on the page to switch them.
The reason is that there is a difference between the information needs of the administration of purchases (tied to an account in a 1:1 relationship) and the information needs of
Re:Sometimes better design beats better algorithms (Score:4, Informative)
Yeah, they do. See "Your account", "Account profiles". And then there's a dropdown at the top of the page. I don't see how they could make it much easier.
Re: (Score:2)
At least some ensemble modeling techniques handle this just fine. They will develop classifiers that detect your ratings, classifiers that detect her ratings, and classifiers that detect your joint ratings. See the previous citation for AdaBoost on Wikipedia. They do this by looking at the error from a given classifier and finding additional weak classifiers that address the error. So if your wife likes Schwarzenegger movies, your liking for tearjerkers will show up as errors, and the algorithm will
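A toy sketch of that residual-fitting idea for a shared account (the data, the genre feature, and the two-stage setup are all made up for illustration; real systems use far richer features):

```python
import statistics

# Toy joint-account ratings: (genre, rating). One household member
# loves action, the other loves drama, and both share the account.
ratings = [("action", 5), ("action", 4), ("action", 5),
           ("drama", 1), ("drama", 2), ("drama", 1)]

# Stage 1: a single "account taste" model -- just the account mean,
# which washes out both members' preferences.
base = statistics.mean(r for _, r in ratings)

# Stage 2 (the boosting idea): fit a weak learner to the residuals.
# Here it's simply the mean residual per genre, which soaks up the
# error left behind by the one-taste-per-account assumption.
residuals = {}
for genre in {"action", "drama"}:
    errs = [r - base for g, r in ratings if g == genre]
    residuals[genre] = statistics.mean(errs)

def predict(genre):
    return base + residuals.get(genre, 0.0)

print(predict("action"))  # ≈ 4.67: the action-lover's taste recovered
print(predict("drama"))   # ≈ 1.33: the drama-lover's taste recovered
```

The base model alone would predict 3.0 for everything; the residual stage recovers both household members' tastes without ever knowing there are two people.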
Re: (Score:2, Insightful)
Data sets like this always have garbage. There's the jackass who rates everything 5 stars. There's the jackass who rates everything 1 star. There's the jackass who rates the consensus-worst movies 5 stars, and vice versa.
There are 61,441,618 ratings by 478,548 unique users in the publicly available training set.
It just doesn't matter.
10.1% (Score:1)
Be afraid.... be very afraid... (Score:4, Interesting)
EPIC (Score:2)
Creepy (Score:1)