Can Machine Learning Replace Focus Groups?

itwbennett writes "In a blog post, Steve Hanov explains how 20 lines of code can outperform A/B testing. Using an example from one of his own sites, Hanov reports a green button outperformed orange and white buttons. Why don't people use this method? Because most don't understand or trust machine learning algorithms, mainstream tools don't support it, and maybe because bad design will sometimes win."
This discussion has been archived. No new comments can be posted.


  • Translation (Score:5, Informative)

    by Anonymous Coward on Thursday May 31, 2012 @06:56PM (#40173949)

    So that you don't have to click through the slashvertisement, I have read TFA for you.

    Here is a summary: Let's say you have several different designs for a web interface that you want to test to find out which one works the best.

    One method is to have a "testing period" in which you show each person one of the designs at random and measure how well it works for that person. Then, once you've shown each design to 1,000 people, you figure out which one is the best on average. Now the "testing period" is over, and the best design is shown to everyone from that point forward. That is the "old" method.

    The "new" method is to dispense with the testing period. Instead, you show the first person one design at random. If it works (e.g. they click on the ad), it gets bonus points. If it doesn't work, it gets a penalty. At any time, you show the design with the most points; if it is bad, it will lose points over time and eventually stop being shown.

    The goal of the "new" method is to avoid showing bad designs to thousands of people just to find out which one is best.
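    What the summary calls the "new" method is essentially an epsilon-greedy multi-armed bandit. Here's a minimal sketch in Python (the class name, the click-through scoring, and the 10% exploration rate are illustrative assumptions, not lifted from TFA):

    ```python
    import random

    class EpsilonGreedy:
        """Show the best-scoring design most of the time, and a
        randomly chosen design a small fraction of the time."""

        def __init__(self, designs, epsilon=0.1):
            self.designs = list(designs)
            self.epsilon = epsilon                   # fraction of random trials
            self.shows = {d: 0 for d in self.designs}
            self.clicks = {d: 0 for d in self.designs}

        def score(self, design):
            # Click-through rate; a never-shown design scores an
            # optimistic 1.0, so it is guaranteed to get a turn.
            if self.shows[design] == 0:
                return 1.0
            return self.clicks[design] / self.shows[design]

        def choose(self):
            if random.random() < self.epsilon:
                return random.choice(self.designs)    # explore
            return max(self.designs, key=self.score)  # exploit

        def record(self, design, clicked):
            self.shows[design] += 1
            if clicked:
                self.clicks[design] += 1
    ```

    Each visit, call choose() to pick a design and record() with whether the visitor clicked; the best design ends up shown most of the time while the losers fade out on their own.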

    If you care about the details then you should probably read the article. This summary is just an approximation for those who can't be bothered or who object to slashvertisements on principle.

  • Re:Translation (Score:4, Informative)

    by spazdor ( 902907 ) on Thursday May 31, 2012 @07:42PM (#40174409)

    The "new" method has the problem of immediately favoring the first design to get a positive response.

    No it doesn't. The designs are ranked by what percentage of their responses have been positive so far, not by the total number of positive responses. The first design to get a positive response will get shown more, and thus will accumulate both more positive and more negative responses, so its percentage converges to its true rate.

  • Re:Translation (Score:5, Informative)

    by swillden ( 191260 ) on Thursday May 31, 2012 @08:59PM (#40175159) Homepage Journal

    No.... I'm suggesting that the algorithm presented above, which only ever displays the single highest-scoring design, is biased against designs that haven't yet been viewed by anybody: they have had no opportunity to earn a positive response while people are already showing favor towards the others.

    What you're missing is the implied assumption that all of the options will fail most of the time, and that all options are initialized with maximum scores. The goal is to find the design that best motivates the user to take some action (e.g. click a link), and the assumption is that most of the time the user will not take that action. By starting all of the choices at a high value, they will all gradually converge downward to their true effectiveness rates, at which point the most effective will be chosen nearly all of the time. During the convergence process, the "leader" may change, but if the current leader isn't the true best, then as it gets driven towards its true rate it will eventually dip under one of the others.
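    The optimistic-initialization idea can be seen in a couple of lines: give every design one phantom show and one phantom success, so all untried designs start tied at a score of 1.0, and any miss drags the current leader below them until each has had a turn (the design names here are illustrative):

    ```python
    # Optimistic initialization: one phantom show and one phantom click
    # per design, so every untried design starts at a score of 1.0.
    shows  = {"A": 1, "B": 1, "C": 1}
    clicks = {"A": 1, "B": 1, "C": 1}

    def score(d):
        return clicks[d] / shows[d]

    # All designs start tied at the maximum score.
    assert all(score(d) == 1.0 for d in shows)

    # One show of A with no click drops A below the untried designs,
    # so B and C are guaranteed to be tried before A is exploited again.
    shows["A"] += 1
    assert score("A") < score("B")
    ```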

    If, by chance, a more effective option has a really bad run early on and gets pushed below the true effectiveness rate of another option, it would never recover, which is why the author includes an occasional randomly-selected choice. If there is a large difference between the effectiveness of the options this is very unlikely to happen, but in the rare event that it does, the randomization will eventually fix it. The author also covers a way to handle audience preferences drifting over time: "forget" old input via simple exponential decay.
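    That "forgetting" mechanism can be sketched as simple exponential decay: multiply both counters by a factor just under 1 before adding each new observation, so stale evidence fades (the 0.99 factor is an assumed value for illustration, not from the article):

    ```python
    DECAY = 0.99  # assumed decay factor; closer to 1.0 means a longer memory

    def update(shows, clicks, clicked):
        # Exponentially decay the old evidence, then add the new observation.
        shows = shows * DECAY + 1.0
        clicks = clicks * DECAY + (1.0 if clicked else 0.0)
        return shows, clicks

    # One design with a strong history: 100 shows, 5 clicks (a 5% rate).
    shows, clicks = 100.0, 5.0

    # A long run of misses pulls the estimated rate down quickly,
    # because the old successes are progressively forgotten.
    for _ in range(200):
        shows, clicks = update(shows, clicks, clicked=False)
    ```

    With plain counters the same 200 misses would only dilute the old 5% rate; with decay the estimate tracks the audience's current behavior instead of its entire history.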

    The only really bad thing about this approach is that it assumes you don't have a lot of repeat visitors. If you do, they'll be annoyed by seeing different versions, apparently at random (from their perspective).
