Forgot your password?
typodupeerror
Programming Technology

Collaborative Filtering and the Rise of Ensembles 58

Posted by timothy
from the soon-there-will-be-symphonies dept.
igrigorik writes "First the Netflix challenge was won with the help of ensemble techniques, and now the GitHub challenge is over, and more than half of the top entries are also based on ensembles. Good knowledge of statistics, psychology and algorithms is still crucial, but the ensemble technique alone has the potential to make the collaborative filtering space a lot more, well, collaborative! Here's a look at the basic theory behind ensembles, how they shaped the results of the GitHub challenge, and how this pattern can be used in the future."
This discussion has been archived. No new comments can be posted.

Collaborative Filtering and Rise of Ensembles

Comments Filter:
  • by Anonymous Coward

    So can ensembles be used to create more sophisticated forms of direct democracy? That is, where everyone has input into decision-making, but where that input is vastly more complex than simple majority rule by the mob?

    FWIW, this is called open source governance [metagovernment.org].

    • Re: (Score:1, Insightful)

      by Anonymous Coward

      Thing about gov is everyone insists on having equal voting power... and the whole reason ensembles work is due to unequal weights---maybe those who vote "better" than others are given more voting weight?

      • Re: (Score:3, Insightful)

        by sakdoctor (1087155)

        "better" would require a fitness function; and everyone thinks that they vote "best".

        If it were possible to define such a perfect function, we wouldn't really need voting anyway. We could just get a computer to crunch all the parameters and spit out a utopia.

        • by idontgno (624372)

          Donald Fagen [wikipedia.org], is that you?

          A just machine to make big decisions
          Programmed by fellows with compassion and vision
          Well be clean when their work is done
          Well be eternally free yes and eternally young

          -- I.G.Y. [donaldfagen.com]

        • by kthejoker (931838) on Tuesday September 01, 2009 @04:16PM (#29277679)

          "Fitness" doesn't always have to be related to the output; it can be related to the quality of a guessed input.

          Consider the corollary of a poll test: a model in which "trusted" voters receive extra votes while everyone else still gets on vote. You can determine "trustworthiness" (or "karma", if you will) the same way Slashdot does - through moderation and meta-moderation, or you can use a more objective "de minimis research" criteria (like a poll test but without the punishment for failure.)

          So someone voting on a school board bond election who can correctly answer questions about the stated usage of that bond, or the school district's financial bond rating, or who attends a school board meeting discussing the bond, could get 2 votes for the price of one.

          This would a) allow "passionate" (albeit informed) voters to have more of a say than someone who is indifferent, and b) encourage people to do research and get involved in politics.

          In a way, it's anti-democratic, but if you are going to insert any sort of elitism into the system, it might as well be a meritocracy.

          • Re: (Score:3, Interesting)

            by Korin43 (881732)
            Instead of trying to come up with a way to decide who gets extra votes, why not just go with transferable votes, where you can either vote on issues or tranfer your vote to another person (reversible of course). That way if I know that someone else is more informed than me, I can transfer my vote to them and not bother with with the details of politics. Each representative then has their vote, plus all of their constituents votes.

            It deals with the problems of underrepresentation (right now if 51% of peopl
            • Instead of trying to come up with a way to decide who gets extra votes, why not just go with transferable votes, where you can either vote on issues or tranfer your vote to another person (reversible of course).

              because a lot of people are lazy and/or greedy and would simply sell all thier votes to the highest bidder

              It deals with the problems of underrepresentation (right now if 51% of people voted for one person, and the other 49% voted for another, the 49 group is completely unrepresented

              Not really, if you consider a president normally has little power, most power is held in houses where everything is voted on, so the 49% get their representation, unfortunately people at some point decided strong governance is worth more than representation of individuals so most countries have retarded voting systems that take away quite a bit of this representation. currently most countries have fair

            • That way if I know that someone else is more informed than me, I can transfer my vote to them and not bother with with the details of politics. Each representative then has their vote, plus all of their constituents votes.
              It deals with the problems of underrepresentation (right now if 51% of people voted for one person, and the other 49% voted for another, the 49 group is completely unrepresented)

              How does it deal with that? The 49% aren't represented, whether they vote directly for the losing candidate or

          • Re: (Score:3, Insightful)

            by DragonWriter (970822)

            So someone voting on a school board bond election who can correctly answer questions about the stated usage of that bond, or the school district's financial bond rating, or who attends a school board meeting discussing the bond, could get 2 votes for the price of one.

            This would a) allow "passionate" (albeit informed) voters to have more of a say than someone who is indifferent, and b) encourage people to do research and get involved in politics.

            Giving everyone one vote already does that, without the ability

    • Not a chance. "The IQ of a mob is equal to that of its lowest member....divided by the size of the mob."

  • Group Labor (Score:2, Informative)

    by r6_jason (893331)
    Of course having a group of people working together is a strength. If you are having a bad day or just feel like slacking off some one else is there to pick up the slack and keep the project moving. See also Division of labour http://en.wikipedia.org/wiki/Division_of_labour [wikipedia.org]
    • Re:Group Labor (Score:5, Interesting)

      by Trepidity (597) <delirium-slashdot@@@hackish...org> on Tuesday September 01, 2009 @03:19PM (#29277103)

      Although that's true with humans, it's a bit curious why it'd be true with algorithms. After all, the aggregation of 3 algorithms is still just an algorithm. It's not even totally clear which algorithms are ensembles and which aren't--- some non-ensemble methods could be re-analyzed using ensemble terminology, and some ensemble methods could be rewritten as unified iterative loops that don't look very ensemble-y. The jury's still out on the whole subject, as far as I can tell (I'm not an ML person, but I'm an AI person whose research bleeds into ML).

      An exception is when you're aggregating information from truly different statistical problems, in which case you inherently have an ensemble problem, until someone comes up with the theory (plus tractable implementation) to view the problem as one unified statistical problem. I think collaborative filtering is currently in that stage--- there's no canonical way to pose the problem in the terminology of statistical regression/etc. that captures all aspects of it.

      • I think you're looking too deep into the pond on this one. The load bearing can be taken up with a few "if/elseif/else" calculations.
        • Re: (Score:1, Insightful)

          by Anonymous Coward

          No, I think his pond is just too deep for you.

          • Your inability to grasp concepts is apparently making mountains out of molehills. It's people like you who write unreadable wikipedia articles, written in pure jargon because they don't understand the topic well enough to form it in their own words -- so they have to borrow the words that were given to them to define it. In this case, here's the definition of what you think is the "deep pond":

            "For A's optimal conditions, use A"
            "For B's optimal conditions, use B"
            "For any other conditions, use C"

            I'm s
      • Well, it's firstly true with humans because you have always been able to do parallel processing with multiple humans.

        But it's perfectly applicable with algorithms, especially now that we have better parallel machines.

        But even so, you take a task where you have several heuristics, something like the ubiquitous quicksort but a little less reliable. If all of them have a 50/50 chance of working, you set all of them at the problem at once, and even on a single core, you end up with a fairly robust and accurate

      • by amplt1337 (707922)

        So in other words, it's Yet Another Buzzword?

    • by xdotx (966421)
      Untrue, a group of people is NOT necessarily a strength. Groups encourage slacking off and discourage taking responsibility. Or in short, http://en.wikipedia.org/wiki/Groupthink [wikipedia.org]
  • by Trepidity (597) <delirium-slashdot@@@hackish...org> on Tuesday September 01, 2009 @03:07PM (#29276997)

    There's a lot of argument over why ensemble techniques work well in general, when using them on well-posed statistical problems. But in the collaborative filtering case, they work well at least in part because there's not a canonical way of posing the problem statistically that's also tractable--- there are instead multiple ways to view the problem, which expose different information. Aggregating those views is a pretty straightforward way of getting more information.

    For example, you can see the Netflix prize as a few different standard statistical problems. As a per-movie regression, predicting what Person A will rate Movie B, given ratings vector of Person A and the ratings vectors of everyone who's already rated Movie B [the per-person ratings vectors excluding B are the X's, and the ratings on B are the Y's]. Or you slice the movie-ratings matrix the other way, with per-movie ratings vectors as the X's. Add in some other views (those are the two most straightforward), aggregate all the info you get from them, and you do better than any one approach alone.

    • by mbkennel (97636) on Tuesday September 01, 2009 @04:17PM (#29277695)

      "There's a lot of argument over why ensemble techniques work well in general, when using them on well-posed statistical problems."

      My opinion:

      1) It's a form of regularization & noise averaging. Different classes of estimators have different systematic errors and proper averaging almost always performs better than any single estimator. In more limited contexts, e.g. parametric estimators with variable numbers of free parameters (small compared to # of data points) this is well established in Bayesian contexts. It's like regularization in the sense that the averaging will exclude the howlers---occasional idiosyncratic screw-ups from any single estimator, phenomena that tend to happen with under-regularized estimators.

      2) "But in the collaborative filtering case, they work well at least in part because there's not a canonical way of posing the problem statistically that's also tractable--- there are instead multiple ways to view the problem, which expose different information. Aggregating those views is a pretty straightforward way of getting more information."

      If the thing to be predicted also includes the unknown parameter of "exactly how are the judges going to define the performance metric", then similarly averaging over different possibilities is a good risk-minimization strategy.

      In practice (knowing how people watch movies and what netflix cares about)the statistical setting of Netflix seems to be this:

      There is an unknown distribution of draw tuples of (movies, people, ratings). (in practice, "date-of-rating" and"date-of-movie" turn out to be additional useful data).

      You have observed a large number of these tuples already.

      Then, given a number of draws for (movies, person, ratings) where person is fixed, predict the the rating given (new_movie, same_person).

      The asymmetry is natural because movies and people are not interchangable: movies do not have opinions about people.

      I don't consider Netflix to be very ill-posed statistically.

      • That's not what ill-posed [wikipedia.org] usually means.

        In fact, the actual problem definition was very well thought out, and the performance metric is not at all dependent on the judges, but is precisely the least squares criterion.

        What makes Netflix ill-posed is that the solution is highly sensitive to small changes in the supplied data. This is empirically clear from the fact that people tried many, many different solution techniques (which use the data in different ways), and each of those doesn't get very close to

    • by NoOneInParticular (221808) on Tuesday September 01, 2009 @06:43PM (#29279275)

      There's a lot of argument over why ensemble techniques work well in general, when using them on well-posed statistical problems.

      Not sure if there's a lot of argument, as the bias-variance decomposition does seem to give some critical insight in why ensemble techniques work. Couldn't find a good link, but loosely every statistical prediction method will make errors due to bias (being systematicaly wrong, always in the same direction), and errors due to variance (being highly dependent on the fitness data, overfitted). Error methods such as sum-of-squares can be readily decomposed into a term for bias and a term for variance.

      Some methods will have errors mainly due to bias (linear methods). Others have error mainly due to variance (neural nets for instance). Whichever method has the lowest sum of the errors - bias + variance - wins. This holds for any prediction method, so also for an ensemble method. If you take for instance the ensemble method of bagging, you will use the same method on differently sampled data (Bootstrap AGGregated). This has as an effect that you average out all the error due to variance, and are left with the error due to bias. If you happen to use a low bias method (such as a neural network or a decision tree), this works fine.

      For the recommender challenges, something even niftier is happening. Here many different methods are aggregated, each with different biases and different variances. So, in theory, when doing this, one should be able to average out both bias and variance error, and converge on the (Bayesian) optimal predictor. Note however that averaging is in itself a prediction method, so ensemble methods have their own bias and variance tradeoff. Fortunately, computing the mean is an unbiased and low variance method in its own right.

      What I always found very interesting about ensemble methods is that they effectively contradict Occam's razor, in that the end result of an ensemble is not a single theory that predicts the data well, but a whole set of mutually contradicting theories that each hold some of the truth. The ensemble result might actually be huge, even when the system is simple.

      • What I always found very interesting about ensemble methods is that they effectively contradict Occam's razor, in that the end result of an ensemble is not a single theory that predicts the data well, but a whole set of mutually contradicting theories that each hold some of the truth. The ensemble result might actually be huge, even when the system is simple.

        Well put. We have some software that does predictive modeling/data mining using ensemble techniques. The ensemble models dependably work better, with

    • This idea that there are multiple ways of partially viewing the full problem is only a symptom that we don't yet know the *right* way to view the full problem.

      There's nothing special about ensemble methods that makes them better than a unified model. On the contrary, they are linear combinations of predictors, and the lack of unification makes this a rather inefficient idea.

      Compare with a series representation (eg Fourier, wavelets, etc): if each of the component predictors of your ensemble method is de

  • by reginaldo (1412879) on Tuesday September 01, 2009 @03:19PM (#29277101)
    One of the difficulties of ensemble development is weighting the logic that is being develeped. For instance, one of the problems we deal with at my job is matching incoming text to it's cleaned value. We have a list of approved words ['happy', 'sad', 'angry', 'sleepy'], and a text input of 'hap'. We need to determine which valid word 'hap' should match. Some rules I can think of for properly matching are:

    1.)Length of input compared to cleaned word.
    2.)Number of nonpositional letter matches.
    3.)Number of positional letter matches.

    Depending on how rules are weighted determines what the answer will be (either sad or happy). I know at my job this weighting process requires very careful politicking. :D
    • by kthejoker (931838)

      Serious question: do you guys have a software routine you use to account for fat-fingered typos? So do you weight the positional changes by how close a letter is to another on the keyboard?

      So for example,

      Leyboard

      would be a closer to match to "keyboard" than

      Reyboard

      because L is geospatially closer to K on the keyboard than R is?

      That would be an interesting program to see, anyway.

  • I have been a developer since the 80's. There are so many trendy cute buzzwords flying around (cloud computing...what the ****?), "mashups", "ensemble", etc. Can't we just call it what it is instead of this marketing crap? So tired of it.
    • No problem! Please inform us what year we should have stopped naming stuff and just stuck with the tried and true.

      Sorry, progress doesn't stop because you want it to or are getting a little long in the tooth to learn new things.

      • by amplt1337 (707922)

        The "progress" in terminology you're talking about is mostly the result of you being too young to have bothered to learn the old things.

        Which is fine. Way too much stuff has happened in the past for anybody to know it all; it's just unremarkable that everything old should be new again, under a new name, as ideas get born and reborn.

      • I understand the what the parent post is getting at. There are a lot of marketing crap names out there. I've just gotten used to telling them please explain what you are talking about and very often I find that is where vendor, peer employee, employee get's stuck. You find out later it's just a dynamic application using a database and webserver.
        But in this case "Ensemble" is not a new term, even by my old fart standards, it's been around since 1995 and describes the use of multiple modeling methods to
    • As much as I share your loathing for marketing, I don't believe this is a case of that. There is no company pushing this term, it's a community (or research?) -driven word. I'm not sure there IS a word for it yet except for ensemble. If you know otherwise, please let us know.

      I'll get off your lawn now.

      • Point taken. Utilizing the results of multiple models via bootstrap is nothing new. The term "ensemble" as applied just .... makes me have to deal with a new way which really annoys me. An ensemble as typically defined is a group of musicians working together in harmony. As applied in statistical modeling it is a process which selects the model which fits the data the best. Thus it it makes me irate as my understanding is in opposition with its classical use.
        • by Anonymous Coward

          A field is typically defined as a patch of grass with some kind of enclosure around it, often made of wood, stone or natural bushes.

          Please while you are on your campaign, have math, science and computing rename the concepts they refer to as fields.

        • by mbkennel (97636)

          As applied in statistical modeling it is a process which selects the model which fits the data the best. Thus it it makes me irate as my understanding is in opposition with its classical use.

          In fact it isn't. The classical practice is selecting the model which fits the data the best, known as "model selection".

          Ensemble methods do little "yes/no" model selection and instead make predictions from weighted combinations of many base predictors. To make a prediction, you need to score many or all of the base m

  • Machine learning ensembles sounds just like monte-carlo tree search (MCTS) techniques (also called UCT), which are used in computer go (and more and more other AI problems) with great success.

    The idea is that instead of trying to analyze a board position (which can be really, really difficult) using clever algorithms, you ask a random/simplistic algorithm to play out the rest of the game thousands upon thousands of times and see how many of those games it wins. The more it wins the better the positions.

    Soun

Nothing is more admirable than the fortitude with which millionaires tolerate the disadvantages of their wealth. -- Nero Wolfe

Working...