New Leader In Netflix Prize Race With One Day To Go

brajesh writes "The Netflix Prize, an algorithm competition to improve the Netflix Cinematch recommendation system by more than 10%, has a new leader, The Ensemble, just one day before the competition ends. The 30-day race to the end was kicked off after BellKor's Pragmatic Chaos submitted the first entry to break the 10% barrier, with the results showing a 10.08% improvement. The Ensemble, made up of three teams who chose to join forces ('Grand Prize Team,' 'Opera Solutions' and 'Vandelay United'), has managed to overtake BellKor with a score of 10.09%, an improvement of 0.01% over the former leaders. From the article on Techcrunch: 'The competition will end [today], so teams still have a little bit of time left to make their last-second submissions, but things are looking good for The Ensemble. This has to be absolutely brutal for team BellKor.'"

  • I think (Score:5, Insightful)

    by sys.stdout.write ( 1551563 ) on Sunday July 26, 2009 @09:18AM (#28826715)
    that other websites should do this as well.

    Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.
    • Re: (Score:3, Funny)

      by Anonymous Coward
      Are you joking? Slash is written in Perl; the best maintenance method is to start again.

      (Joking, partly).
    • Re: (Score:3, Insightful)

      by Vectronic ( 1221470 )

      (-1 Offtopic) But I've sort of hoped that a site such as Slashdot would somehow open-source its site code. It's a sort of "community", and considering the context of the site and the number of users, there are probably about 5,000 people capable of contributing decent code/help, and a rather significant number of those must be willing to.

      Add a section devoted to it, then Polls, about which contribution should be implemented, etc. Articles/Submission are sort of (controlled) "open-source",

    • Re:I think (Score:5, Funny)

      by Blue Stone ( 582566 ) on Sunday July 26, 2009 @09:40AM (#28826901) Homepage Journal

      >Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.

      I remember playing Call of Cthulhu many years ago and being told of the hideously deranging results of mere mortals who happened to gaze upon the unspeakable things that lurked in the dark places.

      I beg you not to lead others down your insane and twisting path.

      NO GOOD CAN COME OF IT! NO GOOOD!

  • Uwe Boll (Score:3, Funny)

    by Afforess ( 1310263 ) <afforess@gmail.com> on Sunday July 26, 2009 @09:19AM (#28826717) Journal
    What did they do, make sure that none of Uwe Boll's movies ever came up as a "Recommended for you" movie?
  • Back when I first began using Amazon.com, I never bought a book based on the recommended items. I felt the recommendations were trite, ill-advised, and typically only peripherally related to the item I was buying.

    Then the recommendations got better. Much better. I started to find myself buying things right out of the recommended section, and the product combination deals also became very tempting.

    If Netflix can turn their recommendation engine into something similar, they will be sitting on a goldmine. As t

    • Wow, you need movies & books to be recommended?

      I have far more movies & books that I'm at least *vaguely* interested in than I can 'consume'. (A large part of the reason I started using the Netflix profile system was because of the 500 item limit in the queue.. and yes, I realize I won't ever watch the VAST majority of them.. but I would add movies/TV shows/documentaries that sounded interesting, and hit the limit. Note obviously a lot of the multiple items are separate discs in a collection, such

  • Why now? (Score:1, Insightful)

    by Anonymous Coward

    Why not wait another day before submitting the improvement? All they did now was give the other team one day to respond, and if they succeed, I doubt they will be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline, or am I missing something about the rules, e.g. any submitted improvements prolong the deadline by one day?

    • by garcia ( 6573 )

      Maybe they already have a solution which is higher and they are just being dicks? Maybe they aren't dicks at all and want to see the best team win? Maybe they think that their solution is unbeatable?

      Whatever it is, it is certainly a lot more interesting than I thought it'd ever be. Kudos to the groups that have broken the 10% barrier!

    • Re: (Score:3, Insightful)

      by Anonymous Coward
      It does seem like a slight flaw in the rules if there is only one 30-day countdown timer. That is, if a competing team can hold off until the last moment to release their version that bests the current leader, as is the case here. Now that this improvement has been made public, there should be something like a 10-day response time for the other competing teams.
    • Re:Why now? (Score:5, Interesting)

      by caffeinemessiah ( 918089 ) on Sunday July 26, 2009 @09:47AM (#28826961) Journal

      >Why not wait another day before submitting the improvement? All they did now was give the other team one day to respond, and if they succeed, I doubt they will be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline, or am I missing something about the rules, e.g. any submitted improvements prolong the deadline by one day?

      For the grand prize, there was a final 30-day countdown from the time the first entry that achieved greater than 10% was received, which was a month ago. So it seems like this will indeed come down to an eBay-like sniping situation in the last few hours.

      I wouldn't feel too sorry for BellKor/KorBell though -- they've got many, many best paper awards at conferences and a huge degree of publicity out of the whole endeavor. In fact, in KDD 2009, they detailed most of the methods that most likely got them to the top -- i.e. they incorporated the fact that tastes and preferences drift over time. Simple, in retrospect of course. If you have an ACM subscription, you can read the 2009 paper here [acm.org].
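To make the "tastes drift over time" idea concrete, here is a minimal sketch of a time-dependent user bias, along the lines of the linear drift term as I recall it from Koren's KDD 2009 paper. The function names, the example numbers and the default exponent are illustrative assumptions, not the team's actual code.

```python
import math

def time_deviation(day, mean_day, beta=0.4):
    """Signed, damped distance between a rating date and the user's mean rating date.

    beta < 1 flattens large gaps; the default here is an assumed value.
    """
    delta = day - mean_day
    return math.copysign(abs(delta) ** beta, delta)

def user_bias(day, static_bias, alpha, mean_day, beta=0.4):
    """User bias that drifts linearly in the damped time deviation."""
    return static_bias + alpha * time_deviation(day, mean_day, beta)

# Hypothetical user: rates 0.3 stars above average overall, drifting upward over time.
early = user_bias(day=50, static_bias=0.3, alpha=0.02, mean_day=100)
late = user_bias(day=150, static_bias=0.3, alpha=0.02, mean_day=100)
```

With a positive `alpha`, `late` comes out above `early`, so the same user is predicted to rate a little more generously later on than at the start.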

      Plus, since they work for AT&T/Yahoo Research, I remember Yehuda Koren stating that the money wouldn't have gone to them anyway -- possibly a large bonus, but I think they're entitled to that anyway. So I wouldn't feel too sorry for them.

      • Re:Why now? (Score:5, Informative)

        by brian_tanner ( 1022773 ) on Sunday July 26, 2009 @11:49AM (#28827871)
        It's also true that the winner is not the person who gets the highest score on the leaderboard. Most people seem to miss this.

        The leaderboard gives the score on the QUIZ dataset, which is half of the answers that a team submits. The WINNER of the million dollars is the team that does best on the TEST dataset, the other half of the answers they submit. Nobody knows how well these guys are doing on the TEST set; either team could be overfitting [wikipedia.org] the quiz set.
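The quiz/test split can be sketched in a few lines. This is a toy simulation under stated assumptions: the data, the noise level and the 50 candidate submissions are all made up, and `score` and `make_submission` are hypothetical helpers, not anything Netflix published.

```python
import math
import random

def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

rng = random.Random(0)
n_items = 2000
actual = [rng.randint(1, 5) for _ in range(n_items)]  # made-up true ratings

# Hidden split: half the held-out ratings feed the leaderboard, half decide the prize.
indices = list(range(n_items))
rng.shuffle(indices)
quiz_ids, test_ids = indices[: n_items // 2], indices[n_items // 2 :]

def make_submission():
    """A made-up model: the truth plus independent noise."""
    return [a + rng.gauss(0, 0.9) for a in actual]

def score(submission, ids):
    return rmse([(submission[i], actual[i]) for i in ids])

# Pick whichever of many equally skilled submissions looks best on the leaderboard.
submissions = [make_submission() for _ in range(50)]
best = min(submissions, key=lambda s: score(s, quiz_ids))
quiz_rmse = score(best, quiz_ids)
test_rmse = score(best, test_ids)
```

Because `best` was chosen for its quiz score, `quiz_rmse` is a flattering estimate; `test_rmse` is the number that says how good the submission really is, which is why the leaderboard order need not be the final order.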
        • Re: (Score:3, Informative)

          by currivan ( 654314 )
          In fact, according to the second post by Yehuda Koren in this thread, it looks like BellKor does have the best test error rate and will be declared the winner. http://www.netflixprize.com/community/viewtopic.php?id=1498 [netflixprize.com]
    • Try and flush out the competition, maybe? (unless it really is the best they have, or think they'll have).

      Or perhaps try to lull the competition into a false sense of security by only edging them by a hair, when they have something better held back?

      Of course, with the amount of effort the teams have put into this, and the money at stake, you'd be nuts not to keep working on it flat-out until the time runs out; but still, if you're tired it could make a difference if you think you've got the competition by a comfo

    • Well, perhaps they did not know exactly how Netflix would rate their efficiency until after a submission. .01% is a pretty close difference, and they might not have known whether they would overtake first or not without submitting and having their algorithm run by Netflix.
  • should've "gamed" it (Score:5, Interesting)

    by petes_PoV ( 912422 ) on Sunday July 26, 2009 @09:29AM (#28826799)
    Rather than declaring its best result early, the BellKor team should have employed a bit of strategy and only declared a lesser result (if any). That would give the other teams something to aim at, without giving away their best results. These would be held back right up until the last minute and then submitted, so that other teams would not have time to make any further improvements (in fact, maybe this IS what they're doing). It's been a successful bidding strategy on eBay for years, so why wouldn't it translate into other competitive areas too?
    • by stuckinarut ( 891702 ) on Sunday July 26, 2009 @09:34AM (#28826837)
      Who's to say they haven't? People smart enough to win this competition are probably smart enough to think of this.
    • I'd be very surprised if BellKor doesn't have something better to submit at the last second.

      It'd certainly have been an awful strategy to trigger the endgame with all your cards on the table.

    • by Manip ( 656104 ) on Sunday July 26, 2009 @09:36AM (#28826865)

      This isn't eBay, they can't just magic high scores.

      If you game it or otherwise, everyone will end up submitting their max score, because, well... Why wouldn't they? Who cares if the other team knows you have 10.8%... Either they can beat it and will submit that score, or they cannot and won't.

      • >If you game it or otherwise, everyone will end up submitting their max score, because, well... Why wouldn't they? Who cares if the other team knows you have 10.8%... Either they can beat it and will submit that score, or they cannot and won't.

        OR maybe they can do better than 10.8% but because they thought they had it in the bag, they didn't put the extra effort in to really push those improvements through and now, with less than a day left, they don't have the time to get those improvements fully polished enough for submission

        >This isn't eBay, they can't just magic high scores.

        Actually this is precisely like eBay. It appears that the prize got "sniped" out from under BellKor. The problem, just like eBay, is that the process has a fixed end-date. The way to avoid this problem (and produce the b

        • Actually, Netflix used a different way to prevent gaming the system. They split the submitted predictions into two sets - the "quiz" set and the "test" set. The quiz set results are on the leaderboard; the test set is used for final judging.

    • by mrvan ( 973822 )

      Maybe they did, and the 10.08 (pretty minimal increase from 10) was their low end result, and they will announce their 25% increase result in the coming day..

      Then again, maybe they didn't :-)

      • The 10.08 was a 10.08% improvement over the original system. That's not exactly a minimal increase, and considering that the new leaders posted a 10.09% improvement over the original (0.0098% better than 10.08%) it's rather harsh to write off the 10.08% improvement as "pretty minimal".
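For anyone who wants to put the two percentages on a common scale, here is a small worked calculation. It assumes the improvement is measured as a reduction in RMSE relative to Cinematch, and it assumes a Cinematch baseline of 0.9514 on the quiz set; that baseline is my recollection, not a figure from this thread.

```python
CINEMATCH_RMSE = 0.9514  # assumed quiz-set baseline, not stated in this thread

def rmse_for_improvement(pct, baseline=CINEMATCH_RMSE):
    """RMSE implied by a given percentage improvement over the baseline."""
    return baseline * (1 - pct / 100)

old_rmse = rmse_for_improvement(10.08)  # former leader
new_rmse = rmse_for_improvement(10.09)  # new leader

# How much lower the new RMSE is, as a percentage of the old RMSE.
relative_gain_pct = (old_rmse - new_rmse) / old_rmse * 100
```

Under those assumptions the gap between the two teams is roughly 0.0001 RMSE, or about 0.011% of the former leader's error, so the race at the top is close even though both entries sit a full 10% below the baseline.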

      • Does Mighty Mouse come in time to save the day?

        Tune in next week, to see the Action-packed conclusion!

    • True, and if only their own interest counts, that would be a good choice.

      Thing is, it's not good sportsmanship to "game the rules" that way.

    • by flynt ( 248848 )

      Basically impossible. The teams cannot compute their improvement. Netflix computes the improvement. The improvement is computed on a "secret" test dataset that only Netflix has access to. The models are developed on a public dataset available to everyone.

  • I'm actually surprised that this hasn't been done before. You can show that combining multiple models will, on average, produce better results than using any single model in isolation. For example, each Netflix system will make different errors; combining multiple systems will tend to average out these errors, and the consensus prediction is most likely to be correct.
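A minimal sketch of the averaging argument, assuming five made-up models whose errors are independent of each other. Real recommender models make correlated errors, so the gain in practice is much smaller than in this toy.

```python
import math
import random

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

rng = random.Random(42)
truth = [rng.uniform(1, 5) for _ in range(5000)]  # made-up true ratings

# Five hypothetical models: each is the truth plus its own independent noise.
models = [[t + rng.gauss(0, 1.0) for t in truth] for _ in range(5)]

# The ensemble prediction is the plain average of the five models.
blend = [sum(column) / len(column) for column in zip(*models)]

single_rmse = [rmse(m, truth) for m in models]
blend_rmse = rmse(blend, truth)
```

Here `blend_rmse` lands well below every entry in `single_rmse`, because independent errors partly cancel when averaged. The teams in the contest reportedly went further than a plain average and learned weights for each model, which this sketch does not attempt.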
    • Re:Ensemble learning (Score:5, Informative)

      by Stile 65 ( 722451 ) on Sunday July 26, 2009 @09:58AM (#28827037) Homepage Journal

      Many teams actually combined multiple methods to get a better score. In fact, "BellKor's Pragmatic Chaos" is a combination of three teams, I'm guessing - BellKor, BigChaos and Pragmatic Theory.

      Also, it helps to remember that what's posted on the leaderboard is the result of the "quiz" set - half of the actual set of recommendations you're asked to make. The other half, the "test set," is used for final judging. With such a small difference between BellKor's Pragmatic Chaos and The Ensemble on the quiz set (.0001 RMSE), the test set rank may actually end up reversed.

    • Actually it is not about averaging out. It's about building a better classifier from many good ones. See AdaBoost [wikipedia.org].
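For readers who haven't met AdaBoost, here is a minimal sketch on a made-up one-dimensional toy problem, using decision stumps as the weak classifiers. It illustrates the reweighting idea only; the Netflix task is rating prediction rather than classification, and nothing here is taken from any team's method.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: one label below the threshold, the opposite label above it."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the stump with least weighted error."""
    thresholds = [x - 0.5 for x in sorted(set(xs))] + [max(xs) + 0.5]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, upweighting the points the previous stumps got wrong."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
```

On this toy set the best single stump misclassifies three of the ten points, while three boosted stumps together classify all ten correctly: each new stump concentrates on the points the earlier ones missed, which is a different mechanism from simply averaging independent models.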

      • Well, you really want to think about bias/variance reductions which brings ideas of averaging and using better classifiers together. For example, "bagging" can be thought of as a variance-reduction technique; "boosting" does both if I recall.
  • I thought Vandelay was into manufacturing latex.

    • by frieko ( 855745 )
      They're thinking of quitting the exporting, and adding more import statements. And this is causing a problem, because, why not do both?
  • Any winner at all? (Score:5, Interesting)

    by Fnord666 ( 889225 ) on Sunday July 26, 2009 @10:42AM (#28827315) Journal
    My question is whether there will be any winner at all other than Netflix? One of the rules for the competition was that you could not form multiple teams. This was to prevent people from gaining multiple submissions per day. Otherwise a five-person group could create 30 teams and thus be able to submit 30 attempts per day. I believe both teams that have exceeded the 10% threshold, and thus are eligible for the grand prize, are composed of members from other teams and could be disqualified.
    • by ceoyoyo ( 59147 ) on Sunday July 26, 2009 @10:57AM (#28827447)

      Why would that disqualify them? They didn't form multiple teams, they did the opposite -- they started with multiple teams and then merged them into one, abandoning or deleting the old, multiple accounts.

      I suppose you could speculate that the teams weren't ever independent, but I think that's fairly obviously not the case.

  • by davidannis ( 939047 ) on Sunday July 26, 2009 @11:28AM (#28827713) Homepage
    They could improve the predictive value immensely if they allowed me and my wife to each rank the movies we watch together separately. With the current system, some movies are rated by just me, some by just her, and some have a consensus rating. It leads to a dataset full of garbage.
    • This brings up an interesting point. The Netflix algorithm is working from flawed/incomplete data generated from poor design decisions, so no matter how good the algorithm gets it still won't be able to accurately predict what movies will actually interest people based on a very subjective unidimensional rating. For example, the same person might rate a movie differently under differing conditions, and the rating itself may hinge entirely on one thing in the movie (s)he did(n't) like, whereas the movie mi
    • Re: (Score:2, Insightful)

      by Hawke666 ( 260367 )

      That'd be all your fault. You should be creating separate account profiles for yourself and your wife.

      • Yeah, I should totally jump through hoops to improve their ability to sell to me. Just because it would make the programmers' lives easier :)

        No, if Netflix wants to sell more, they should follow up on that recommendation and make it very very easy to have multiple identities on a given account and a button on the page to switch them.

        The reason is that there is a difference between the information needs of the administration of purchases (tied to an account in a 1:1 relationship) and the information needs of

      • At least some of the ensemble modeling techniques handle this just fine. They will develop classifiers that detect your ratings, classifiers that detect her ratings and classifiers that detect your joint ratings. See the previous citation for AdaBoost at Wikipedia. They do this by looking at the error from a given classifier, and finding additional weak classifiers that address that error. So if your wife likes Schwarzenegger movies, your liking for tear jerkers will show up as errors, and the algorithm will

    • Re: (Score:2, Insightful)

      by coaxial ( 28297 )

      Data sets like this always have garbage. There's the jackass that rates everything 5 stars. There's the jackass that rates everything 1 star. There's the jackass that rates the worst movies by consensus 5 stars, and vice versa.

      There are 61,441,618 ratings by 478,548 unique users in the publicly available training set.

      It just doesn't matter.

  • According to the linked leaderboard it's 10.10% for Ensemble, and 10.09% for BellKor's Pragmatic Chaos.
  • by Baldrson ( 78598 ) * on Sunday July 26, 2009 @03:17PM (#28829417) Homepage Journal
    It's interesting that the fearmongering of the prior /. post about AI got hundreds of responses but this /. post, which is far more relevant to real AI, has gotten fewer than a hundred responses thus far. Anyway, congratulations to Netflix for doing the right thing for their business in response to The Hutter Prize.
  • I think this is bringing us one step closer to EPIC [wikipedia.org] (video [albinoblacksheep.com]).
  • While I would appreciate some good movie recommendations, I can't help but feel a little creeped out that Netflix may be able to read my mind one day... maybe I can make up a movie in my imagination and Netflix can play it for me! ~Ami
    Chicago Web Design [transcendevelopment.com]
