Schooling Microsoft On Random Browser Selection 436
Rob Weir got wind that a Slovakian tech site had been discussing the non-randomness of Microsoft's intended-to-be-random browser choice screen, which went into effect on European Windows 7 systems last week. He did some testing and found that indeed the order in which the five browser choices appear on the selection screen is far from random — though probably not intentionally slanted. He then proceeds to give Microsoft a lesson in random-shuffle algorithms. "This computational problem has been known since the earliest days of computing. There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Microsoft appears to have picked the bad approach. But I do not believe there is some nefarious intent to this bug. It is more in the nature of a 'naive algorithm,' like the bubble sort, that inexperienced programmers inevitably will fall upon when solving a given problem. I bet if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake. But with education and experience, one learns about these things. And one of the things one learns early on is to reach for Knuth. ... The lesson here is that getting randomness on a computer cannot be left to chance. You cannot just throw Math.random() at a problem and stir the pot and expect good results."
Re:Good enough (Score:4, Informative)
Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK
This problem has nothing to do with how the PRNG is seeded.
The word "seed" doesn't even appear in TFA at all.
Re:What's the problem? (Score:1, Informative)
You are missing the point.
As the author of the article pointed out, this technique can cause an infinite loop.
Re:What? Why not? (Score:5, Informative)
What the problem actually seems to be is not that they're using random, but how they're using it.
What they essentially did was use something like qsort() to sort the list, but in their comparator function, instead of returning which of the two strings comes first, they return some random crap instead. qsort() or whatever they used was designed to have results that are consistent, and with input like that it could potentially abort and leave the list entirely unsorted. Or if you're really unlucky it could sit there sorting them forever.
Re:damned faintly praising? (Score:5, Informative)
Is picking a worse random number generation function (the default one in C and JS) really fucking up?
And btw, it looks like their choice promotes all other browsers than IE almost 2x more!
Position I.E. Firefox Opera Chrome Safari
1 1304 2099 2132 2595 1870
2 1325 2161 2036 2565 1913
3 1105 2244 1374 3679 1598
4 1232 2248 1916 590 4014
5 5034 1248 2542 571 605
I can already see all the comments how MS would be favoring IE with this (summary conveniently left that one out), but as it is they're promoting the other browsers almost double more.
I have no problem with this at all (Score:2, Informative)
The only people where the selection and any possible bias inherent in the implementation of the random() function are the last group, which is also quite possibly the smallest of the three sets. In an ideal world, you are going to get a bellcurve with the optimum position being in the middle and the least favourable at the sides, so if Microsoft wants to implement random() so that IE is more likely to end up in the position that is joint least likely to be picked of the big five browsers, that's fine by me.
It's also apparently fine for Google's Chrome as well, so if you are anywhere near Ballmer's office in the next few days then I'd keep an eye out for any flying chairs, just in case...
Re:damned faintly praising? (Score:3, Informative)
Re:Milliseconds (Score:3, Informative)
It's not an issue of how to get a truly random number, or of seeding a random number generator, but rather how to you use a source of random numbers to randomly order a list.
Some "reasonable sounding" methods don't actually work - e.g. attach random numbers to each list item and sort the list by these numbers (1). Microsoft used a similar method of sorting using a random comparator.
Some simple methods that DO work are picking a random permutation or executing a bunch of random swap operations on the list.
(1) http://okmij.org/ftp/Haskell/perfect-shuffle.txt [okmij.org]
Re:What? Why not? (Score:2, Informative)
Out of curiosity - why would they need to seed the random number generator?
Wouldn't -not- seeding it be better?
I can't speak for Javascript but in most C libraries - at least in the DOS era when I last made this mistake - the library would default to a static seed of its own during program initialisation, probably 0 or something.
As a result, if you made a program that started, output 15 random numbers and then quit, each time you ran it you would get the same 15 numbers. :P
What you're supposed to do to avoid that is seed the RNG at the program start, using the current time since the epoch as the seed. This is what I was referring to, not using a static seed
Re:damned faintly praising? (Score:3, Informative)
Yes, indeed, you've hit the nail on the head.
A very sloppy shallow unworkmanlike solution to a very high profile legal issue.
Almost as clever as the 'messing up pages shown to Opera' issue, though I suspect that that was deliberate and this not.
Rgds
Damon
Re:damned faintly praising? (Score:1, Informative)
Really? For example, hearing the same artist twice within some period of time is a situation where the birthday paradox applies. Even with a very large number of songs your brain will probably find *some* pattern, and you'll be an idiot and say the shuffle wasn't truly random. People experience a shuffle algorithm that avoids artists that it has recently played to be more random than a truly random shuffle.
Re:The top hit on Google... (Score:2, Informative)
On the other hand, searching for 'javascript array shuffle' returns the correct solution in almost every hit in the first page in Google (wrong answer comes on top in Bing, but the second hit is correct).
Re:damned faintly praising? (Score:5, Informative)
The relevant code is in their Javascript: [browserchoice.eu]
aBrowserOrderTop5.sort(RandomSort); (they repeat this twice for some reason) ...
function RandomSort (a,b)
{
return (0.5 - Math.random());
}
This takes the browser's built-in sorting function, tells it to sort by an essentially random criteria, and hopes that it all works out. Unfortunately, this is highly dependent on the implementation of the built-in sort function, and that's up to the browser designer to create. The only constraint on the structure of sort is that it must successfully order comprehensible data, which does not mean that it will properly randomize data when provided. Essentially, they overloaded a black-box function that wasn't designed for randomization in the hopes that it would work.
For an instance of why this wouldn't work, consider the case of the last item. Say that you're sorting a list of 5 letters. Now say that you're most of the way through the list, having properly sorted the first 4 letters into "A, B, D, E", with just the 5th letter C left. So you step through.
Does C come before E? Yes.
Does C come before D? Yes.
Does C come before B? No.
C must go between B and D, and the list will look like "A , B , C , D , E." It will be sorted correctly every time.
Now let's throw that randomization into the middle there. Let's start again with the list, though since we're randomizing let's call them item 1, 2, 3, and 4. If we're properly randomly sorting the last item 5 into the list, it should have equal chances of showing up everywhere. But remember, we're still using the sorting algorithm from above, we're just flipping a coin at each question instead of actually comparing. So what we get is:
Does 5 come before 4? 50% yes, or stop
Does 5 come before 3? 50% yes, or stop
Does 5 come before 2? 50% yes, or stop
etc. But because it's iterative, those 50% chances stack. You only get to the second question half of the time, so you only get to the third question half of that half of the time. And essentially what you wind up with is a % chance of the last number being sorted into each of the slots as: 3%, 6%, 12%, 25%, 50%. This is obviously not a random distribution curve.
This is not necessarily the sort algorithm that Microsoft uses in I.E. (The 50% chance of staying as the last element is a bit suspicious, though, as is running their code twice). But it does point out unequivocably that you can't overload an off-the-shelf sort algorithm with a randomizing comparator and expect the outcome to follow a genuinely random distribution curve. They really ought to have an in-house random sort algorithm that their developers can pull from.
(Thanks to another poster for finding the first google hit [slashdot.org] that describes this method.).
Re:damned faintly praising? (Score:3, Informative)
The default rand(3) in C doesn't have to suck -- it doesn't on Linux or the BSDs. I see no evidence that it does in *any* major systems today, though apparently it did back in the 1980s when it got that reputation.
Re:Malice? (Score:3, Informative)
It starts off as a list dumbass, the browsers to sort have to come from somewhere.
Would you be upset if the first item on the list were FireFox, so FF popped up first momentarily?
Seriously, get a life.
Re:damned faintly praising? (Score:4, Informative)
Re:Bad Article, Bad Summary (Score:1, Informative)
Statisticians tend to draw a distinction between "random" and "stochastic". Random means from a uniform distribution - every possibility is as likely as every other. Stochastic just means drawn from a distribution.