Forgot your password?
typodupeerror
Internet Explorer Microsoft Programming

Schooling Microsoft On Random Browser Selection 436

Posted by kdawson
from the let-me-show-it-you dept.
Rob Weir got wind that a Slovakian tech site had been discussing the non-randomness of Microsoft's intended-to-be-random browser choice screen, which went into effect on European Windows 7 systems last week. He did some testing and found that indeed the order in which the five browser choices appear on the selection screen is far from random — though probably not intentionally slanted. He then proceeds to give Microsoft a lesson in random-shuffle algorithms. "This computational problem has been known since the earliest days of computing. There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Microsoft appears to have picked the bad approach. But I do not believe there is some nefarious intent to this bug. It is more in the nature of a 'naive algorithm,' like the bubble sort, that inexperienced programmers inevitably will fall upon when solving a given problem. I bet if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake. But with education and experience, one learns about these things. And one of the things one learns early on is to reach for Knuth. ... The lesson here is that getting randomness on a computer cannot be left to chance. You cannot just throw Math.random() at a problem and stir the pot and expect good results."
This discussion has been archived. No new comments can be posted.

Schooling Microsoft On Random Browser Selection

Comments Filter:
  • Re:Good enough (Score:4, Informative)

    by EvanED (569694) <evaned@nOspAm.gmail.com> on Sunday February 28, 2010 @03:53PM (#31308242)

    Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK

    This problem has nothing to do with how the PRNG is seeded.

    The word "seed" doesn't even appear in TFA at all.

  • by Anonymous Coward on Sunday February 28, 2010 @03:54PM (#31308260)

    You are missing the point.

    As the author of the article pointed out, this technique can cause an infinite loop.

  • Re:What? Why not? (Score:5, Informative)

    by Tapewolf (1639955) on Sunday February 28, 2010 @03:55PM (#31308264)
    I thought it was originally going to be that they forgot to seed the random number generator or something.

    What the problem actually seems to be is not that they're using random, but how they're using it.

    What they essentially did was use something like qsort() to sort the list, but in their comparator function, instead of returning which of the two strings comes first, they return some random crap instead. qsort() or whatever they used was designed to have results that are consistent, and with input like that it could potentially abort and leave the list entirely unsorted. Or if you're really unlucky it could sit there sorting them forever.

  • by sopssa (1498795) * <sopssa@email.com> on Sunday February 28, 2010 @04:02PM (#31308326) Journal

    Is picking a worse random number generation function (the default one in C and JS) really fucking up?

    And btw, it looks like their choice promotes all other browsers than IE almost 2x more!

    Position I.E. Firefox Opera Chrome Safari
    1 1304 2099 2132 2595 1870
    2 1325 2161 2036 2565 1913
    3 1105 2244 1374 3679 1598
    4 1232 2248 1916 590 4014
    5 5034 1248 2542 571 605

    I can already see all the comments how MS would be favoring IE with this (summary conveniently left that one out), but as it is they're promoting the other browsers almost double more.

  • by Zocalo (252965) on Sunday February 28, 2010 @04:06PM (#31308348) Homepage
    Frankly, I have no problem with this result. Quite the opposite in fact, since I think you can probably group the users into three sets, only one of which really matters in connection with the Browser Selection Screen:
    1. Those that equate "Internet" with a blue "e" and as such will pick IE regardless of its position
    2. Those that prefer another browser to IE and will pick another option regardless of positioning
    3. Those that have no clue about browsers and pick essentially at random, or belong to one of the above groups and click in error.

    The only people where the selection and any possible bias inherent in the implementation of the random() function are the last group, which is also quite possibly the smallest of the three sets. In an ideal world, you are going to get a bellcurve with the optimum position being in the middle and the least favourable at the sides, so if Microsoft wants to implement random() so that IE is more likely to end up in the position that is joint least likely to be picked of the big five browsers, that's fine by me.

    It's also apparently fine for Google's Chrome as well, so if you are anywhere near Ballmer's office in the next few days then I'd keep an eye out for any flying chairs, just in case...

  • by GIL_Dude (850471) on Sunday February 28, 2010 @04:08PM (#31308362) Homepage
    Right; the point of "random" was just that Microsoft didn't PICK the order that the browsers would show up in. For the business purpose required, the solutions selected was adequate and met the requirement. This guy is talking about trying to get highly random stuff like you might need in encryption or quantum physics simulations. While interesting, those higher randomness generators aren't required by the need here.
  • Re:Milliseconds (Score:3, Informative)

    by SpinyNorman (33776) on Sunday February 28, 2010 @04:18PM (#31308444)

    It's not an issue of how to get a truly random number, or of seeding a random number generator, but rather how to you use a source of random numbers to randomly order a list.

    Some "reasonable sounding" methods don't actually work - e.g. attach random numbers to each list item and sort the list by these numbers (1). Microsoft used a similar method of sorting using a random comparator.

    Some simple methods that DO work are picking a random permutation or executing a bunch of random swap operations on the list.

    (1) http://okmij.org/ftp/Haskell/perfect-shuffle.txt [okmij.org]

  • Re:What? Why not? (Score:2, Informative)

    by Tapewolf (1639955) on Sunday February 28, 2010 @04:28PM (#31308506)

    Out of curiosity - why would they need to seed the random number generator?

    Wouldn't -not- seeding it be better?

    I can't speak for Javascript but in most C libraries - at least in the DOS era when I last made this mistake - the library would default to a static seed of its own during program initialisation, probably 0 or something.

    As a result, if you made a program that started, output 15 random numbers and then quit, each time you ran it you would get the same 15 numbers.
    What you're supposed to do to avoid that is seed the RNG at the program start, using the current time since the epoch as the seed. This is what I was referring to, not using a static seed :P

  • by DamonHD (794830) <d@hd.org> on Sunday February 28, 2010 @04:45PM (#31308636) Homepage

    Yes, indeed, you've hit the nail on the head.

    A very sloppy shallow unworkmanlike solution to a very high profile legal issue.

    Almost as clever as the 'messing up pages shown to Opera' issue, though I suspect that that was deliberate and this not.

    Rgds

    Damon

  • by Anonymous Coward on Sunday February 28, 2010 @05:44PM (#31309086)

    Really? For example, hearing the same artist twice within some period of time is a situation where the birthday paradox applies. Even with a very large number of songs your brain will probably find *some* pattern, and you'll be an idiot and say the shuffle wasn't truly random. People experience a shuffle algorithm that avoids artists that it has recently played to be more random than a truly random shuffle.

  • by Anonymous Coward on Sunday February 28, 2010 @05:49PM (#31309134)

    On the other hand, searching for 'javascript array shuffle' returns the correct solution in almost every hit in the first page in Google (wrong answer comes on top in Bing, but the second hit is correct).

  • by cgenman (325138) on Sunday February 28, 2010 @06:12PM (#31309324) Homepage

    The relevant code is in their Javascript: [browserchoice.eu]

    aBrowserOrderTop5.sort(RandomSort); (they repeat this twice for some reason) ...
    function RandomSort (a,b)
    {
            return (0.5 - Math.random());
    }

    This takes the browser's built-in sorting function, tells it to sort by an essentially random criteria, and hopes that it all works out. Unfortunately, this is highly dependent on the implementation of the built-in sort function, and that's up to the browser designer to create. The only constraint on the structure of sort is that it must successfully order comprehensible data, which does not mean that it will properly randomize data when provided. Essentially, they overloaded a black-box function that wasn't designed for randomization in the hopes that it would work.

    For an instance of why this wouldn't work, consider the case of the last item. Say that you're sorting a list of 5 letters. Now say that you're most of the way through the list, having properly sorted the first 4 letters into "A, B, D, E", with just the 5th letter C left. So you step through.
    Does C come before E? Yes.
    Does C come before D? Yes.
    Does C come before B? No.
    C must go between B and D, and the list will look like "A , B , C , D , E." It will be sorted correctly every time.

    Now let's throw that randomization into the middle there. Let's start again with the list, though since we're randomizing let's call them item 1, 2, 3, and 4. If we're properly randomly sorting the last item 5 into the list, it should have equal chances of showing up everywhere. But remember, we're still using the sorting algorithm from above, we're just flipping a coin at each question instead of actually comparing. So what we get is:

    Does 5 come before 4? 50% yes, or stop
    Does 5 come before 3? 50% yes, or stop
    Does 5 come before 2? 50% yes, or stop

    etc. But because it's iterative, those 50% chances stack. You only get to the second question half of the time, so you only get to the third question half of that half of the time. And essentially what you wind up with is a % chance of the last number being sorted into each of the slots as: 3%, 6%, 12%, 25%, 50%. This is obviously not a random distribution curve.

    This is not necessarily the sort algorithm that Microsoft uses in I.E. (The 50% chance of staying as the last element is a bit suspicious, though, as is running their code twice). But it does point out unequivocably that you can't overload an off-the-shelf sort algorithm with a randomizing comparator and expect the outcome to follow a genuinely random distribution curve. They really ought to have an in-house random sort algorithm that their developers can pull from.

    (Thanks to another poster for finding the first google hit [slashdot.org] that describes this method.).

  • by jgrahn (181062) on Sunday February 28, 2010 @08:01PM (#31310244)

    Is picking a worse random number generation function (the default one in C and JS) really fucking up?

    The default rand(3) in C doesn't have to suck -- it doesn't on Linux or the BSDs. I see no evidence that it does in *any* major systems today, though apparently it did back in the 1980s when it got that reputation.

  • Re:Malice? (Score:3, Informative)

    by Bigjeff5 (1143585) on Sunday February 28, 2010 @08:49PM (#31310592)

    It starts off as a list dumbass, the browsers to sort have to come from somewhere.

    Would you be upset if the first item on the list were FireFox, so FF popped up first momentarily?

    Seriously, get a life.

  • by asaz989 (901134) on Sunday February 28, 2010 @09:40PM (#31310904)
    Are you running Firefox? One of the things that the article points out is that the specific type of non-randomness that sort gives in this case is implementation-dependent (meaning browser-dependent). IE being pushed to the end is what happens in the Internet Explorer implementation of Javascript; the version of Firefox that he tested disproportionately pushes IE to the front, and presumably other browsers would give a different distribution.
  • by skelterjohn (1389343) on Sunday February 28, 2010 @10:30PM (#31311288)

    Statisticians tend to draw a distinction between "random" and "stochastic". Random means from a uniform distribution - every possibility is as likely as every other. Stochastic just means drawn from a distribution.

If you are good, you will be assigned all the work. If you are real good, you will get out of it.

Working...