New Leader In Netflix Prize Race With One Day To Go

Posted by Soulskill on Sunday July 26, 2009 @10:15AM from the sniped-like-an-ebayer dept.

brajesh writes "The Netflix Prize, an algorithm competition to improve the Netflix Cinematch recommendation system by more than 10%, has a new leader, The Ensemble, just one day before the competition ends. The 30-day race to the end was kicked off after BellKor's Pragmatic Chaos submitted the first entry to break the 10% barrier, with the results showing a 10.08% improvement. The Ensemble, made up of three teams who chose to join forces ('Grand Prize Team,' 'Opera Solutions' and 'Vandelay United'), has managed to overtake BellKor with a score of 10.09%, an improvement of .01% over the former leaders. From the article on Techcrunch: 'The competition will end [today], so teams still have a little bit of time left to make their last-second submissions, but things are looking good for The Ensemble. This has to be absolutely brutal for team BellKor.'"

This discussion has been archived. No new comments can be posted.

I think (Score:5, Insightful)

by sys.stdout.write ( 1551563 ) writes: on Sunday July 26, 2009 @10:18AM (#28826715)

that other websites should do this as well.

Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.

- Re: (Score:3, Funny)
  
  by Anonymous Coward writes:
  
  Are you joking? Slash is written in Perl; the best maintenance method is to start again.
  
  (Joking, partly).
  - - - Re: (Score:1)
        
        by brentonboy ( 1067468 ) writes:
        
        Syntax is a subset of grammar, you insensitive clod!!
        Ha! But you're equivocating, of course. He means code syntax.
- Re: (Score:3, Insightful)
  
  by Vectronic ( 1221470 ) writes:
  
  (-1 Offtopic) But I've sort of hoped that a site such as Slashdot would somehow open-source its site code. It's a sort of "community", and considering the context of the site and the number of users, there are probably about 5,000 people capable of contributing decent code/help, and a rather significant number of those must be willing to.
  Add a section devoted to it, then Polls, about which contribution should be implemented, etc. Articles/Submission are sort of (controlled) "open-source",
  - - Re: (Score:1)
      
      by Exception Duck ( 1524809 ) writes:
      
      This has been empty for a while now:
      http://www.slashcode.com/sites.pl [slashcode.com]
  - Re: (Score:2)
    
    by caramelcarrot ( 778148 ) writes:
    
    There is slashcode, but that project seems to be stagnant. http://www.slashcode.com/ [slashcode.com]
    - Re: (Score:2)
      
      by Vectronic ( 1221470 ) writes:
      
      Yeah, I know about that, but I think the reason that it is stagnant is that it's not really a part of "Slashdot"; it's off in its own little URL world. It should be merged into slashdot.org
- Re:I think (Score:5, Funny)
  
  by Blue Stone ( 582566 ) writes: on Sunday July 26, 2009 @10:40AM (#28826901) Homepage Journal
  
  >Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.
  I remember playing Call of Cthulhu many years ago and being told of the hideously deranging results of mere mortals who happened to gaze upon the unspeakable things that lurked in the dark places.
  I beg you not to lead others down your insane and twisting path.
  NO GOOD CAN COME OF IT! NO GOOOD!
  
  - Re: (Score:2)
    
    by houstonbofh ( 602064 ) writes:
    
    So you are saying that looking at all of the slashdot code, and actually understanding it breaks your mind? Well that explains this nasty system choking javascript then.
Uve Boll (Score:3, Funny)

by Afforess ( 1310263 ) writes: <afforess@gmail.com> on Sunday July 26, 2009 @10:19AM (#28826717) Journal

What did they do, make sure that all of Uve Boll's movies never came up as a "Recommended for you" movie?

- It's not Uve (Score:3, Informative)
  
  by thetoadwarrior ( 1268702 ) writes:
  
  Uwe Boll. It only sounds like a v because he's German.
- Re: (Score:2)
  
  by Anubis IV ( 1279820 ) writes:
  
  That was the first 9%, but that last 1.08% took a bit more thinking.
- Re: (Score:1)
  
  by strat ( 39913 ) writes:
  
  Make it Michael Bay and I'd say we have a winnah!
I used to be very elitist about my reading (Score:2, Interesting)

by BadAnalogyGuy ( 945258 ) writes:

Back when I first began using Amazon.com, I never bought a book based on the recommended items. I felt the recommendations were trite, ill-advised, and typically only peripherally related to the item I was buying.
Then the recommendations got better. Much better. I started to find myself buying things right out of the recommended section, and the product combination deals also became very tempting.
If Netflix can turn their recommendation engine into something similar, they will be sitting on a goldmine. As t
- - Re: (Score:1)
    
    by BadAnalogyGuy ( 945258 ) writes:
    
    The only assumption that I made was that the recommendation engine could be improved.
    With approximately 10 million subscribers (as of 2008), and 1.3B in revenues from these subscribers, even a 1% increase in rentals would be worth more than 10 times the 1M they are paying to the winner of this contest.
    Amazon has almost 20B in revenues from a much larger group of customers. A 1% increase per customer here would be huge.
    Netflix, in addition to increasing the number of rentals per customer, should also be thinking about inc
- Re: (Score:2)
  
  by mattack2 ( 1165421 ) writes:
  
  Wow, you need movies & books to be recommended?
  I have far more movies & books that I'm at least *vaguely* interested in than I can 'consume'. (A large part of the reason I started using the Netflix profile system was because of the 500 item limit in the queue.. and yes, I realize I won't ever watch the VAST majority of them.. but I would add movies/TV shows/documentaries that sounded interesting, and hit the limit. Note obviously a lot of the multiple items are separate discs in a collection, such
Why now? (Score:1, Insightful)

by Anonymous Coward writes:

Why not wait another day before submitting the improvement? All they did now was give the other team one day to respond, and if they succeed, I doubt they will be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline, or am I missing something about the rules, e.g. any submitted improvements prolong the deadline by one day?
- Re: (Score:2)
  
  by garcia ( 6573 ) writes:
  
  Maybe they already have a solution which is higher and they are just being dicks? Maybe they aren't dicks at all and want to see the best team win? Maybe they think that their solution is unbeatable?
  Whatever it is, it is certainly a lot more interesting than I thought it'd ever be. Kudos to the groups that have broken the 10% barrier!
- Re: (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  It does seem like a slight flaw in the rules if there is only one 30-day countdown timer. That is, if a competing team can hold off until the last moment to release their version that bests the current leader, as is the case here. Now that this improvement has been made public, there should be something like a 10-day response time for the other competing teams.
- Re:Why now? (Score:5, Interesting)
  
  by caffeinemessiah ( 918089 ) writes: on Sunday July 26, 2009 @10:47AM (#28826961) Journal
  
  >Why not wait another day before submitting the improvement? All they did now was give the other team one day to respond, and if they succeed, I doubt they will be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline, or am I missing something about the rules, e.g. any submitted improvements prolong the deadline by one day?
  For the grand prize, there was a final 30-day countdown from the time the first entry that achieved greater than 10% was received, which was a month ago. So it seems like this will indeed come down to an ebay-like sniping situation in the last few hours.
  I wouldn't feel too sorry for BellKor/KorBell though -- they've got many, many best paper awards at conferences and a huge degree of publicity out of the whole endeavor. In fact, in KDD 2009, they detailed most of the methods that most likely got them to the top -- i.e. they incorporated the fact that tastes and preferences drift over time. Simple, in retrospect of course. If you have an ACM subscription, you can read the 2009 paper here [acm.org].
  Plus, since they work for AT&T/Yahoo Research, I remember Yehuda Koren stating that the money wouldn't have gone to them anyway -- possibly a large bonus, but I think they're entitled to that anyway. So I wouldn't feel too sorry for them.
  
  - Re:Why now? (Score:5, Informative)
    
    by brian_tanner ( 1022773 ) writes: on Sunday July 26, 2009 @12:49PM (#28827871)
    
    It's also true that the winner is not the person who gets the highest score on the leaderboard. Most people seem to miss this.
    
    The leaderboard gives the score on the QUIZ dataset, which is half of the answers that the team submits. The WINNER of the million dollars is the person who does best on the TEST dataset, the other half of the answers they submit. Nobody knows how well these guys are doing on the TEST set; either team could be overfitting [wikipedia.org] the quiz set.
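
The quiz/test split described above can be illustrated with a toy simulation. This is a minimal sketch, not Netflix's actual scoring code: the candidate count, set sizes, noise level and function names are all made up. Every candidate model here is equally good in truth, and the team simply submits whichever one scores best on the quiz half, so the quiz score it is picked on tends to look better than its score on the hidden test half.

```python
import math
import random


def rmse(predictions, targets):
    """Root mean squared error between two equal-length lists."""
    total = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(total / len(targets))


def pick_by_quiz(num_candidates=200, half_size=500, seed=1):
    """Score hypothetical candidates on both halves, then select by quiz score only."""
    rng = random.Random(seed)
    # Hypothetical true ratings for the quiz half and the test half.
    quiz_truth = [rng.uniform(1.0, 5.0) for _ in range(half_size)]
    test_truth = [rng.uniform(1.0, 5.0) for _ in range(half_size)]

    results = []
    for _ in range(num_candidates):
        # Each candidate is the truth plus independent noise of the same size,
        # so no candidate is genuinely better than another.
        quiz_pred = [t + rng.gauss(0.0, 1.0) for t in quiz_truth]
        test_pred = [t + rng.gauss(0.0, 1.0) for t in test_truth]
        results.append((rmse(quiz_pred, quiz_truth), rmse(test_pred, test_truth)))

    # The leaderboard only shows the quiz score, so that is what gets optimised.
    best = min(results, key=lambda pair: pair[0])
    return best, results


best, results = pick_by_quiz()
print("quiz RMSE of selected candidate:", round(best[0], 4))
print("test RMSE of selected candidate:", round(best[1], 4))
```

Running it typically shows the selected candidate's quiz RMSE sitting below its test RMSE, which is the optimism that judging on the test half is meant to remove.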
    
    - - Re: (Score:1)
        
        by brian_tanner ( 1022773 ) writes:
        
        Yeah, but if they're not hitting the 10% mark on the quiz set, then they're probably not going to hit the target 10% on the test set either, regardless of whether they're overfitting to the public data.
        Yeah, there is a flaw in the evaluation mechanism, in my opinion. The good thing is that you don't need to hit 10% on the test set to win the money. Whatever team is qualified (10% on quiz) AND has the best test score wins. Even if they have terribly overfit the quiz set (the quiz set has been around fo
    - Re: (Score:3, Informative)
      
      by currivan ( 654314 ) writes:
      
      In fact, according to the second post by Yehuda Koren in this thread, it looks like BellKor does have the best test error rate and will be declared the winner. http://www.netflixprize.com/community/viewtopic.php?id=1498 [netflixprize.com]
- Re: (Score:1)
  
  by SpinyNorman ( 33776 ) writes:
  
  Try and flush out the competition, maybe? (Unless it really is the best they have, or think they'll have.)
  Or perhaps try to lull the competition into a false sense of security by only edging them by a hair, when they have something better held back?
  Of course, with the amount of effort the teams have put into this, and the money at stake, you'd be nuts not to keep working on it flat-out until the time runs out; but still, if you're tired it could make a difference if you think you've got the competition by a comfo
- Re: (Score:1)
  
  by tonycheese ( 921278 ) writes:
  
  Well, perhaps they did not know exactly how Netflix would rate their efficiency until after a submission. .01% is a pretty close difference, and they might not have known whether they would overtake first or not without submitting and having their algorithm run by Netflix.
should've "gamed" it (Score:5, Interesting)

by petes_PoV ( 912422 ) writes: on Sunday July 26, 2009 @10:29AM (#28826799)

Rather than declaring their best result early, the BellKor team should have employed a bit of strategy and only declared a lesser result (if any). That would give the other teams something to aim at, without giving away their best results. These would be held back right up until the last minute and then submitted, so that other teams would not have time to make any further improvements (in fact, maybe this IS what they're doing). It's been a successful bidding strategy on eBay for years, so why wouldn't it translate into other competitive areas too?

- Re:should've "gamed" it (Score:5, Insightful)
  
  by stuckinarut ( 891702 ) writes: on Sunday July 26, 2009 @10:34AM (#28826837)
  
  Who's to say they haven't? People smart enough to win this competition are probably smart enough to think of this.
  
- Re: (Score:2)
  
  by SpinyNorman ( 33776 ) writes:
  
  I'd be very surprised if BellKor doesn't have something better to submit at the last second.
  It'd certainly have been an awful strategy to trigger the endgame with all your cards on the table.
- Re:should've "gamed" it (Score:5, Insightful)
  
  by Manip ( 656104 ) writes: on Sunday July 26, 2009 @10:36AM (#28826865)
  
  This isn't eBay, they can't just magic high scores.
  If you game it or otherwise, everyone will end up submitting their max score, because, well... Why wouldn't they? Who cares if the other team knows you have 10.8%... Either they can beat it and will submit that score, or they cannot and won't.
  
  - Re: (Score:2)
    
    by Jah-Wren Ryel ( 80510 ) writes:
    
    >If you game it or otherwise, everyone will end up submitting their max score, because, well... Why wouldn't they? Who cares if the other team knows you have 10.8%... Either they can beat it and will submit that score, or they cannot and won't.
    OR maybe they can do better than 10.8% but because they thought they had it in the bag, they didn't put the extra effort in to really push those improvements through and now, with less than a day left, they don't have the time to get those improvements fully polished enough for submission
    >This isn't eBay, they can't just magic high scores.
    Actually this is precisely like ebay. It appears that the prize got "sniped" out from under BellKor. The problem, just like ebay, is that the process has a fixed end-date. The way to avoid this problem (and produce the b
    - Re: (Score:2)
      
      by Stile 65 ( 722451 ) writes:
      
      Actually, Netflix used a different way to prevent gaming the system. They split the submitted predictions into two sets - the "quiz" set and the "test" set. The quiz set results are on the leaderboard; the test set is used for final judging.
- Re: (Score:2)
  
  by mrvan ( 973822 ) writes:
  
  Maybe they did, and the 10.08 (pretty minimal increase from 10) was their low end result, and they will announce their 25% increase result in the coming day..
  Then again, maybe they didn't :-)
  - Re: (Score:2)
    
    by MartinSchou ( 1360093 ) writes:
    
    The 10.08 was a 10.08% improvement over the original system. That's not exactly a minimal increase, and considering that the new leaders posted a 10.09% improvement over the original (0.0098% better than 10.08%) it's rather harsh to write off the 10.08% improvement as "pretty minimal".
  - Re: (Score:2)
    
    by MrShaggy ( 683273 ) writes:
    
    Does Mighty Mouse come in time to save the day?
    Tune in next week, to see the Action-packed conclusion!
- Re: (Score:2)
  
  by shentino ( 1139071 ) writes:
  
  True, and if only their own interest counts, that would be a good choice.
  Thing is, it's not good sportsmanship to "game the rules" that way.
  - Re: (Score:3, Insightful)
    
    by Sancho ( 17056 ) writes:
    
    I don't think that this contest is about honor.
- Re: (Score:2)
  
  by flynt ( 248848 ) writes:
  
  Basically impossible. The teams cannot compute their improvement. Netflix computes the improvement. The improvement is computed on a "secret" test dataset that only Netflix has access to. The models are developed on a public dataset available to everyone.
Ensemble learning (Score:1)

by mysterons ( 1472839 ) writes:

I'm actually surprised that this hasn't been done before. You can prove that using multiple models will on average produce better results than using any single model in isolation. For example, each netflix system will make different errors; using multiple systems will tend to average-out these errors and the consensus decision is most likely to be correct.
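
A rough sketch of the averaging argument, assuming each model's errors are independent noise around the true rating (a strong simplification; real models make correlated errors, so the gain is smaller). The model count, noise level and names are invented for the example.

```python
import math
import random


def rmse(predictions, targets):
    """Root mean squared error between two equal-length lists."""
    total = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(total / len(targets))


def simulate(num_models=5, num_ratings=2000, noise=1.0, seed=0):
    """Compare each hypothetical model's RMSE with the RMSE of their plain average."""
    rng = random.Random(seed)
    truth = [rng.uniform(1.0, 5.0) for _ in range(num_ratings)]
    # Each model predicts the true rating plus its own independent Gaussian error.
    models = [[t + rng.gauss(0.0, noise) for t in truth] for _ in range(num_models)]
    # The "ensemble" is the unweighted mean of the models' predictions per rating.
    blend = [sum(column) / num_models for column in zip(*models)]
    single_scores = [rmse(model, truth) for model in models]
    return single_scores, rmse(blend, truth)


single_scores, blend_score = simulate()
print("individual RMSEs:", [round(s, 3) for s in single_scores])
print("blended RMSE:    ", round(blend_score, 3))
```

The blended squared error can never exceed the average squared error of its members (that follows from convexity); beating the single best member, as happens here, depends on the errors being sufficiently uncorrelated.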
- Re:Ensemble learning (Score:5, Informative)
  
  by Stile 65 ( 722451 ) writes: on Sunday July 26, 2009 @10:58AM (#28827037) Homepage Journal
  
  Many teams actually combined multiple methods to get a better score. In fact, "BellKor's Pragmatic Chaos" is a combination of three teams, I'm guessing - BellKor, BigChaos and Pragmatic Theory.
  Also, it helps to remember that what's posted on the leaderboard is the result of the "quiz" set - half of the actual set of recommendations you're asked to make. The other half, the "test set," is used for final judging. With such a small difference between BellKor's Pragmatic Chaos and The Ensemble on the quiz set (.0001 RMSE), the test set rank may actually end up reversed.
  
- Re: (Score:2)
  
  by janwedekind ( 778872 ) writes:
  
  Actually it is not about averaging out. It's about building a better classifier from many good ones. See Adaboost [wikipedia.org].
  - Re: (Score:1)
    
    by mysterons ( 1472839 ) writes:
    
    Well, you really want to think about bias/variance reductions which brings ideas of averaging and using better classifiers together. For example, "bagging" can be thought of as a variance-reduction technique; "boosting" does both if I recall.
Algorithms? (Score:1)

by wkurzius ( 1014229 ) writes:

I thought Vandelay was into manufacturing latex.
- Re: (Score:2)
  
  by frieko ( 855745 ) writes:
  
  They're thinking of quitting the exporting, and adding more import statements. And this is causing a problem, because, why not do both?
Any winner at all? (Score:5, Interesting)

by Fnord666 ( 889225 ) writes: on Sunday July 26, 2009 @11:42AM (#28827315) Journal

My question is whether there will be any winner at all other than netflix? One of the rules for the competition was that you could not form multiple teams. This was to prevent people from gaining multiple submissions per day. Otherwise a five person group could create 30 teams and thus be able to submit 30 attempts per day. I believe both teams that have exceeded the 10% threshold and thus are eligible for the grand prize are composed of members from other teams and could be disqualified.

- Re:Any winner at all? (Score:4, Insightful)
  
  by ceoyoyo ( 59147 ) writes: on Sunday July 26, 2009 @11:57AM (#28827447)
  
  Why would that disqualify them? They didn't form multiple teams, they did the opposite -- they started with multiple teams and then merged them into one, abandoning or deleting the old, multiple accounts.
  I suppose you could speculate that the teams weren't ever independent, but I think that's fairly obviously not the case.
  
- Re: (Score:3, Funny)
  
  by Qubit ( 100461 ) writes:
  
  >Call me crazy, but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.
  Actually, yes, I think I will call you crazy.
- Re: (Score:3, Funny)
  
  by tomhudson ( 43916 ) writes:
  
  >Call me crazy,
  
  Okay, you're crazy :-)
  >but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.
  So, there's approximately minus 2855 days left?
  I just want to know if netflix gets to keep John Titor's time machine [wikipedia.org] ... the time frame (2001) is right ...
- Re: (Score:2)
  
  by sleeper0 ( 319432 ) writes:
  
  Competition had 30 days to submit after the qualifying submission was presented. From your link: "After three (3) months have elapsed from the start of the Contest, when the RMSE of a submitted prediction set on the quiz subset improves beyond the qualifying RMSE an electronic announcement will inform all registered Participants that they have thirty (30) days to submit additional candidate prediction sets to be considered for judging."
- Re: (Score:2)
  
  by daveime ( 1253762 ) writes:
  
  (Unless someone reaches the 10% goal *before* the end date).
Sometimes better design beats better algorithms (Score:3, Insightful)

by davidannis ( 939047 ) writes: on Sunday July 26, 2009 @12:28PM (#28827713) Homepage

They could improve the predictive value immensely if they allowed me and my wife to each rank the movies we watch together separately. With the current system, some movies are rated by just me, some by just her, and some have a consensus rating. It leads to a dataset full of garbage.

- Re: (Score:1)
  
  by memristance ( 1285036 ) writes:
  
  This brings up an interesting point. The Netflix algorithm is working from flawed/incomplete data generated from poor design decisions, so no matter how good the algorithm gets it still won't be able to accurately predict what movies will actually interest people based on a very subjective unidimensional rating. For example, the same person might rate a movie differently under differing conditions, and the rating itself may hinge entirely on one thing in the movie (s)he did(n't) like, whereas the movie mi
- Re: (Score:2, Insightful)
  
  by Hawke666 ( 260367 ) writes:
  
  That'd be all your fault. You should be creating separate account profiles for yourself and your wife.
  - Re: (Score:1)
    
    by St.Creed ( 853824 ) writes:
    
    Yeah, I should totally jump through hoops to improve their ability to sell to me. Just because it would make the programmers' lives easier :)
    No, if Netflix wants to sell more, they should follow up on that recommendation and make it very very easy to have multiple identities on a given account and a button on the page to switch them.
    The reason is that there is a difference between the information needs of the administration of purchases (tied to an account in a 1:1 relationship) and the information needs of
    - Re:Sometimes better design beats better algorithms (Score:4, Informative)
      
      by Hawke666 ( 260367 ) writes: on Sunday July 26, 2009 @04:17PM (#28829413) Homepage
      
      Yeah, they do. see "Your account", "Account profiles". And then there's a dropdown on the top of the page. I don't see how they could make it much easier.
      
  - Re: (Score:2)
    
    by Alpha830RulZ ( 939527 ) writes:
    
    At least some of the ensemble modeling techniques handle this just fine. They will develop classifiers that detect your ratings, classifiers that detect her ratings and classifiers that detect your joint ratings. See the previous citation for adaboost at wikipedia. They do this by looking at error from a given classifier, and finding additional weak classifiers that address the error. So if your wife likes schwarzenegger movies, your liking for tear jerkers will show up as errors, and the algorithm will
- Re: (Score:2, Insightful)
  
  by coaxial ( 28297 ) writes:
  
  Data sets like this always have garbage. There's the jackass that rates everything 5 stars. There's the jackass that rates everything 1 star. There's the jackass that rates the worst movies by consensus 5 stars, and vice versa.
  There are 61,441,618 ratings by 478,548 unique users in the publicly available training set.
  It just doesn't matter.
10.1% (Score:1)

by paxcoder ( 1222556 ) writes:

According to the linked leaderboard it's 10.10% for Ensemble, and 10.09% for BellKor's Pragmatic Chaos.
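
For anyone wondering how the leaderboard percentages relate to RMSE, here is a minimal sketch, assuming the percentage is the relative reduction in RMSE against the Cinematch baseline. The baseline value of 0.95 and the two candidate scores are made-up numbers for illustration, not official figures.

```python
def improvement_percent(baseline_rmse, candidate_rmse):
    """Percent reduction in RMSE relative to the baseline (lower RMSE is better)."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


def qualifying_rmse(baseline_rmse, target_percent=10.0):
    """Largest RMSE that still meets the target percent improvement."""
    return baseline_rmse * (1.0 - target_percent / 100.0)


BASELINE = 0.95  # hypothetical baseline RMSE, not an official figure

# Two hypothetical submissions that differ by .0001 RMSE.
team_a = improvement_percent(BASELINE, 0.8550)
team_b = improvement_percent(BASELINE, 0.8551)
gap = team_a - team_b

print("RMSE needed for a 10% improvement:", round(qualifying_rmse(BASELINE), 4))
print("gap between the two teams, in percentage points:", round(gap, 4))
```

With a baseline near 0.95, a .0001 RMSE difference works out to about .01 percentage points, which matches the size of the gap discussed in the thread.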
Be afraid.... be very afraid... (Score:4, Interesting)

by Baldrson ( 78598 ) * writes: on Sunday July 26, 2009 @04:17PM (#28829417) Homepage Journal

It's interesting that the fearmongering of the prior /. post about AI got hundreds of responses but this /. post, which is far more relevant to real AI, has gotten less than a hundred responses thus far. Anyway, congratulations to Netflix for doing the right thing for their business in response to The Hutter Prize.

EPIC (Score:2)

by idontneedanickname ( 570477 ) writes:

I think this is bringing us one step closer to EPIC [wikipedia.org] (video [albinoblacksheep.com]).
Creepy (Score:1)

by TranscenDev ( 1602045 ) writes:

While I would appreciate some good movie recommendations, I can't help but feel a little creeped out that netflix may be able to read my mind one day....maybe I can make up a movie in my imagination and netflix can play it for me! ~Ami
Chicago Web Design [transcendevelopment.com]
