


Schooling Microsoft On Random Browser Selection 436
Rob Weir got wind that a Slovakian tech site had been discussing the non-randomness of Microsoft's intended-to-be-random browser choice screen, which went into effect on European Windows 7 systems last week. He did some testing and found that indeed the order in which the five browser choices appear on the selection screen is far from random — though probably not intentionally slanted. He then proceeds to give Microsoft a lesson in random-shuffle algorithms. "This computational problem has been known since the earliest days of computing. There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Microsoft appears to have picked the bad approach. But I do not believe there is some nefarious intent to this bug. It is more in the nature of a 'naive algorithm,' like the bubble sort, that inexperienced programmers inevitably will fall upon when solving a given problem. I bet if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake. But with education and experience, one learns about these things. And one of the things one learns early on is to reach for Knuth. ... The lesson here is that getting randomness on a computer cannot be left to chance. You cannot just throw Math.random() at a problem and stir the pot and expect good results."
LAST (Score:5, Funny)
Hmm, there's a nice shuffle implementation in Java that Microsoft could use... Oh, wait...
Rgds
Damon
Bad Article, Bad Summary (Score:5, Interesting)
Both the article and the summary mixes up the concepts. Randomness and bias are related but different things. Think of a biased coin loaded in favor of heads - the heads may appear twice as often as the tails, but the distribution is still random. Here too, contrary to the summary's claim of "far from random", the results are random, just biased, and biased against IE, if I may add, which is an important fact the summary omitted.
Re:damned faintly praising? (Score:5, Informative)
Is picking a worse random number generation function (the default one in C and JS) really fucking up?
And btw, it looks like their choice promotes all other browsers than IE almost 2x more!
Position I.E. Firefox Opera Chrome Safari
1 1304 2099 2132 2595 1870
2 1325 2161 2036 2565 1913
3 1105 2244 1374 3679 1598
4 1232 2248 1916 590 4014
5 5034 1248 2542 571 605
I can already see all the comments how MS would be favoring IE with this (summary conveniently left that one out), but as it is they're promoting the other browsers almost double more.
Re:damned faintly praising? (Score:3, Informative)
Re:damned faintly praising? (Score:5, Insightful)
No, the point was that no one browser got unfairly pushed to the top all the time. This algorithm does push a certain browser higher more often than not, and hence is not fit for it's job.
Re:damned faintly praising? (Score:5, Insightful)
If you had bothered to read the article, you'd see that the author has done JUST that. Not only did he prove (using proper statistical methods) that the results are significantly not random, he also dug up the exact javascript source code that does the shuffling and explained why it is faulty. RTFA!
Re:damned faintly praising? (Score:5, Insightful)
Even with a very high quality entropy source, the algorithm Microsoft used will result in a very non-uniform distribution.
Clearly, Microsoft didn't care about this enough to assign one of their experienced coders to it, which is odd given the legal involvement. Either the technical side of MS ignored the legal department's explanation of the importance of the browser ballot to MS's ability to do business on a particularly profitable continent, or someone powerful in MS decided to spite the EU by assigning low quality programmers to the project.
Re:damned faintly praising? (Score:3, Informative)
Yes, indeed, you've hit the nail on the head.
A very sloppy shallow unworkmanlike solution to a very high profile legal issue.
Almost as clever as the 'messing up pages shown to Opera' issue, though I suspect that that was deliberate and this not.
Rgds
Damon
Re:damned faintly praising? (Score:4, Insightful)
If someone on my team returned that piece of code and insisted that it met the requirements, I would find another team member. A random shuffle is supposed to give ballpark equal positions. This algorithm gave Internet Explorer the rightmost position in the list a full %50 of the time. It's not like he's complaining that the algorithm be up to encryption grade randomness, but rather that it fails even the human eyeball test. %10 statistical variation? Sure, whatever. But getting a particular slot a full %250 more than you should, when you're ordered by the court to make something random? That's really poor coding.
And the sad thing is, with just FIVE things to sort and no real pressure for speed or RAM, there is no reason why it should be this poor. There is essentially unlimited computing power and RAM, and it fails to produce even casually random results. It's just an inexperienced coder and an inexperienced team making freshman mistakes. Considering this was part of an EU directive, I would have expected at least a few higher level eyeballs would have caught this.
Re:damned faintly praising? (Score:5, Informative)
The relevant code is in their Javascript: [browserchoice.eu]
aBrowserOrderTop5.sort(RandomSort); (they repeat this twice for some reason) ...
function RandomSort (a,b)
{
return (0.5 - Math.random());
}
This takes the browser's built-in sorting function, tells it to sort by an essentially random criteria, and hopes that it all works out. Unfortunately, this is highly dependent on the implementation of the built-in sort function, and that's up to the browser designer to create. The only constraint on the structure of sort is that it must successfully order comprehensible data, which does not mean that it will properly randomize data when provided. Essentially, they overloaded a black-box function that wasn't designed for randomization in the hopes that it would work.
For an instance of why this wouldn't work, consider the case of the last item. Say that you're sorting a list of 5 letters. Now say that you're most of the way through the list, having properly sorted the first 4 letters into "A, B, D, E", with just the 5th letter C left. So you step through.
Does C come before E? Yes.
Does C come before D? Yes.
Does C come before B? No.
C must go between B and D, and the list will look like "A , B , C , D , E." It will be sorted correctly every time.
Now let's throw that randomization into the middle there. Let's start again with the list, though since we're randomizing let's call them item 1, 2, 3, and 4. If we're properly randomly sorting the last item 5 into the list, it should have equal chances of showing up everywhere. But remember, we're still using the sorting algorithm from above, we're just flipping a coin at each question instead of actually comparing. So what we get is:
Does 5 come before 4? 50% yes, or stop
Does 5 come before 3? 50% yes, or stop
Does 5 come before 2? 50% yes, or stop
etc. But because it's iterative, those 50% chances stack. You only get to the second question half of the time, so you only get to the third question half of that half of the time. And essentially what you wind up with is a % chance of the last number being sorted into each of the slots as: 3%, 6%, 12%, 25%, 50%. This is obviously not a random distribution curve.
This is not necessarily the sort algorithm that Microsoft uses in I.E. (The 50% chance of staying as the last element is a bit suspicious, though, as is running their code twice). But it does point out unequivocably that you can't overload an off-the-shelf sort algorithm with a randomizing comparator and expect the outcome to follow a genuinely random distribution curve. They really ought to have an in-house random sort algorithm that their developers can pull from.
(Thanks to another poster for finding the first google hit [slashdot.org] that describes this method.).
Re:damned faintly praising? (Score:3, Insightful)
This is obviously not a random distribution curve.
I believe you meant to say uniform rather than random.
Re:damned faintly praising? (Score:4, Informative)
Re:damned faintly praising? (Score:2)
Did you actually read TFA - at least the raw data at the TOP of the article ?!
Microsoft IE was in 5th place (out of 5) 50%, as opposed to a random 20%, of the time !!
Hard to say whether that's in their favor or not ... last place is just as special a position as 1st place.
Re:damned faintly praising? (Score:4, Insightful)
Is picking a worse random number generation function (the default one in C and JS) really fucking up?
There's no problem with the function they're using; the problem is how they're using it. If 'rand()' were perfect, their technique would still suck.
I can already see all the comments how MS would be favoring IE with this (summary conveniently left that one out), but as it is they're promoting the other browsers almost double more.
I do think the summary should have mentioned that bias, but I don't think it's quite as good a position as you convey. I bet the far right position is better than #3 and #4 at least.
(If I wanted to put on my conspiracy hat -- which I don't, I don't really believe this -- I'd say that MS wanted to bias it towards them and decided that biasing it toward #1 would be too blatant, but that #5 was "good enough".)
Re:damned faintly praising? (Score:3, Informative)
The default rand(3) in C doesn't have to suck -- it doesn't on Linux or the BSDs. I see no evidence that it does in *any* major systems today, though apparently it did back in the 1980s when it got that reputation.
Re:damned faintly praising? (Score:2)
Hah! The mp3 shuffle thing isn't a case of poor randomisation - humans just tend to pick a poor definition of random here. Usually people expect "play stuff I haven't heard recently, and in a completely new order that I haven't heard before". True random shuffle will give you songs and orders you've already heard, and your brain is good at picking them up.
Re:damned faintly praising? (Score:2)
Re:damned faintly praising? (Score:3, Insightful)
True random shuffle will give you songs and orders you've already heard --- just as likely as any other song and order combination.
Yes, but people forget most of the sequence... they just notice the times when it is the same artist in a row. Thus, the part of the elections evaluated when thinking "this isn't random" is extremely biased. Humans are good at seeing patterns.
Adding randomness... (Score:4, Funny)
pickabrowser() {
if (rand()>0.05) {
use IE
} else {
pickabrowser()
}
}
Nobody said anything about bias.
Re:Adding randomness... (Score:2)
I think that the EU *might* take a view about that algorithm... %-P
Rgds
Damon
Re:Adding randomness... (Score:5, Funny)
What's the problem? (Score:3, Insightful)
What's the problem? It's random enough for a browser selection screen.
This isn't an application where a statistically random shuffle is required.
Re:What's the problem? (Score:2, Funny)
Re:What's the problem? (Score:2)
There's no such thing. That number cannot be represented in IEEE binary floating point.
He's just bitching (Score:4, Insightful)
It is probably a combination of two things:
1) Hate for MS. MS is doing what some have said they've needed to do in giving users browser choice, and they've done so as to try not to promote any given one. While that makes proponents of choice happy, it makes MS haters mad. The more MS does to try and accommodate users and play fair, the less there is to hate on them for legitimately. As such haters are going to try and find nit picks to bitch about.
2) General geek pedantry. Many geeks seem to love to be exceedingly pedantic about every little thing. If a definition isn't 100% perfect, at least in their mind, they jump all over it. I think it is a "Look at how smart I am!" kind of move. They want to show that they noticed that it wasn't 100% perfect and thus show how clever they are.
Doesn't matter, it is what it is and as you said, random enough. This guy can whine all he likes.
Re:He's just bitching (Score:5, Insightful)
Re:He's just bitching (Score:3, Insightful)
one would think that a company such as Microsoft that has been owning the software market for decades now would know how to implement a randomizing algorithm correctly.
Wrong: a software company such as Microsoft that has been owning the software market for decades now knows how to use programmer time and resources effectively. Spending the extra programmer time and effort to turn a "99.99% random" process into a "100% random" one is an utter waste of both on something this trivial. Hate to break it to you, and to look-at-me-I'm-so-much-smarter-than-evil-Microsoft Rob Weir, but they're not making any mistake here.
Re:He's just bitching (Score:2)
To be fair, this is more like 60% random, not 99.99%.
Re:He's just bitching (Score:4, Interesting)
I don't know what you consider "99.99% random", but the difference between 20% (probability of IE turning up last in a real random shuffle) and ca. 50% (probability of IE showing up last in the implemented "random shuffle") is certainly significant enough that you can't call it 99.99% random." You might argue that it is "random enough for this," but that's of course a matter of opinion, and therefore debatable (there's no objective definition of "random enough").
Re:He's just bitching (Score:2)
Of course there is: "sufficiently random for the task in question". It's objective, but useless, because it depends on "sufficient" and "the task in question" to be specified, and neither are.
Re:He's just bitching (Score:3, Insightful)
It is in the second-most (or most, depending on circumstances) desirable position, because users pay a disproportionate amount of attention to visible ends of a list, particularly a horizontal list. The least desirable position is any middle position on the second screen. This is a consequence of the simplicity of a horizontal list, and user attention shifts accordingly as content grows in height (reading across, down, across, down).
Doubtful. The mere fact that the order places certain items in certain positions a disproportionate amount of the time would raise considerable doubt that Microsoft acted in good faith. This would be sufficient reason to introduce user test data which would demonstrate that the last position is not the least desirable.
Re:He's just bitching (Score:3, Insightful)
They made the stupid mistake of not assigning an experienced or well-educated programmer to a project that was necessary for them to legally do business in Europe. Somewhere along the way, the legal department had to have sent the technical managers an email containing a phrase very similar to "don't fuck this up!!", and the manager ignored it and assigned a programmer who didn't go to a good CS school and thus has never heard of a Fisher-Yates shuffle or something equivalent.
It's very understandable that some of Microsoft's programmers are of such low quality. What is odd is that their legal department can't make their technical managers understand "do this right or we lose the right to do business on the second most profitable continent."
Re:He's just bitching (Score:5, Funny)
> ...the manager ignored it...
Or he decided that it was so important that he had to do it himself.
Re:He's just bitching (Score:2)
Re:He's just bitching (Score:2)
Agreed. The blogger's article is a bit pointless: he can't show a bias in favor of any one browser; he only shows that there is a non-random distribution. Perhaps with a bit more analysis he could have determined if any of the browsers in fact did enjoyed a bias. That would have been interesting, but as it stands, the article is little more than supercilious pedantry.
Re:He's just bitching (Score:5, Insightful)
While in Microsoft's native browser (which would happen the first time), Internet Explorer is given a full %64 chance of receiving one of the coveted 2 edge positions. Considering that antitrust courts were involved in the creation of this screen, you'd think that getting "random" right would be a development priority, especially considering it should have taken a competent programmer exactly the same amount of time to do it right as to do it wrong. If this takes even one hour of lawyer time to ponder, it would have been much cheaper to send the programmer back to fix it.
A 50% chance of getting a particular slot that should be %20 is not "99.99% random." It's just wrong. And when you're talking about the cost of antitrust regulation, it's really, really wrong.
I'm glad this is being brought up on Slashdot. There is a lot of misunderstanding about how to create randomness in systems. Even on a basic level, people frequently ask for "random" when they actually want jukebox random. In this case, though, it just seems like a basic misunderstanding of statistics, which is not surprising given the moderate code complexity and likelihood this screen was given to an intern or jr programmer.
Re:He's just bitching (Score:5, Interesting)
Re:He's just bitching (Score:3, Insightful)
And as the article says, if you use statistics to determine if the program is random, then the answer will be ... not random. So please, don't call this random, and if you do, please, let me know which software you work on, so I can avoid it.
I agree that all this does not matter for the ballot screen. But at least on slashdot we can expect some higer standards than a
return (0.5 - Math.random());
comparison function.
Re:He's just bitching (Score:5, Insightful)
Re:He's just bitching (Score:3, Funny)
On the other hand, the devil is in the details, and one would think that a company such as Microsoft that has been owning the software market for decades now would know how to implement a randomizing algorithm correctly.
Sure!
10 RANDOMIZE TIMER
20 PRINT INT(RND * 5)
30 GOTO 20
Re:He's just bitching (Score:2, Insightful)
It's paranoia and naiveté like yours that led me to stop hanging around here so much.
Paranoia in that everything company X does is evil or has an inappropriate or immoral ulterior motive. Naiveté in that you don't stop to recognize that the not all of the developers who work for an institution are going to output code of the caliber of its most senior, experienced and/or knowledgeable developers, nor can code review and automated tests catch all of the problems and gotchas known to computer science, academia and the body of professional programmers.
So can the "the devil is in the details" crap; you don't know what you're talking about. Building a complex software package that takes into account every possible detail in both process and implementation is impossible in any environment currently available for consumer software and general computing hardware. Just when you think you've got everything covered, nature builds a vendor builds a buggy component, security specialists discover a flaw in the way you learned to write your software, nature builds a better idiot, or a piece of a radioactive isotope in a memory module emits a beta particle, just to ruin your day.
Re:He's just bitching (Score:3, Insightful)
Re:He's just bitching (Score:2)
I'm not convinced it doesn't benefit them. I'm betting that eye-tracking tests would show a disproportionate amount of attention to the last position, at least over the middle positions but possibly over the first as well. This hunch is founded on a tendency in UI design to place useful things (eg search and navigation) on the right and an expectation that users have grown accustomed to this.
Re:He's just bitching (Score:2)
First of all, let me say that I fall in on Microsoft's side on this one - So they didn't use a shuffle that would pass muster for a licensed video poker system - So what? Totally irrelevant, they satisfied the obligation to randomize the list, end of story.
However, as to your comment about geek pedantry - I often get accused of doing exactly what you say, and can say comfortably that I don't do it to "look smart". I do it because if you bother defining a term, then immediately contradict yourself, we have a real problem. To a lesser degree, if you use a normally well-defined term incorrectly, then again, we have a problem.
I think geeks pick nits about such matters more for clarity than for self gratification. And I defend that stance by the fact that I demand similar precision from myself... If, in a conversation, I start down a path that leads to a contradiction, I will apologize, see if my intended point still holds despite the misstart, and proceed from there.
The fact that this habit tends to annoy others, I attribute to the fact that, in demanding such rigor from ourselves, we get fairly good at not making such errors all that often - Thus we seem to pick nits about others, but not ourselves.
You can't artificially put down competition (Score:5, Insightful)
Here's the problem - consider the results again. Safari will almost always (almost 50% of the time) be put in the bottom two elements. In fact depending on the algorithm used it's 40-50% chance of being put in one exact slot (either choice four or five).
When the whole point of the list is promote browser competition, it makes no sense to accept a list which is that skewed for ANY browser result from the list. You need to have it properly shuffled so that no one browser has a statistical advantage or disadvantage - if you are going to claim it doesn't matter then why not let Microsoft set an arbitrary fixed order for the list?
That is not what the legal injunction against them says they can do, therefore the randomness of the results DO matter. Just as in most things in life, correctness of results is actually important.
Re:You can't artificially put down competition (Score:5, Funny)
Safari will almost always (almost 50% of the time) be put in the bottom two elements [out of five].
And how well did you do in statistics class?
Re:You can't artificially put down competition (Score:2)
Wow... the bottom 2 elements out of a possible 5 ALMOST 50% of the time.. talk about a massive skew by those evil MS guys....
BTW... did you ever see the Dilbert cartoon with the PHB complaining that a full 40% of employee sick days were taken on Mondays and Fridays? You weren't possibly the same PHB were you?
Re:You can't artificially put down competition (Score:3, Insightful)
I don't know how you managed to get 'insightful' mods for this.
For a completely random algorithm: Out of 5 slots, EACH item has a 20% chance to be put in any single slot. For any 2 slots, that's 40%.
Let's look at your statement again: "Safari will almost always (almost 50% of the time) be put in the bottom two elements. In fact depending on the algorithm used it's 40-50% chance of being put in one exact slot (either choice four or five)."
Wow. So you're saying that it's working perfectly.
Re:What's the problem? (Score:2)
And although some might shrug as the rightmost slot being somehow "bad", there are well known tendencies for people to remember the first and last thing in a list. The point of this whole ballot screen is to even the playing field to avoid this psychology stuff, and they missed the mark. Not to mention how bad it looks that they supposedly hire the best and brightest, but can't code a shuffle properly for a key component of a major lawsuit. WTF.
Re:What's the problem? (Score:2)
Re:What's the problem? (Score:3, Insightful)
As the author of the article pointed out, this technique can cause an infinite loop.
For certain, pretty crappy definitions of "can". First, you'll notice he also points out that that "depends on the sorting algorithm used". I don't think that the most likely choices (Quicksort in particular) fall victim to this. Second, the other poster is right: the probability that it's actually an infinite loop is 0.
Re:What's the problem? (Score:5, Funny)
Yeah right (Score:3, Insightful)
Re:Yeah right (Score:2)
1 - You didn't read the article, did you?
2 - Although their method for generating uniformly-random permutations is incorrect, they are doing nothing to favor IE.
Re:Yeah right (Score:2)
Re:Yeah right (Score:2)
I suspect it would have been faster to look up a correct algorithm and implement it than think up this hack. Who ever came up with this was just incompetent.
Milliseconds (Score:3, Interesting)
They could as well just have used the last millisecond to show the browser. I mean, it's a screen shown only once to a user. What's more random, and uniform, than the time the screen appears in milliseconds modulo 5?
Re:Milliseconds (Score:3, Informative)
It's not an issue of how to get a truly random number, or of seeding a random number generator, but rather how to you use a source of random numbers to randomly order a list.
Some "reasonable sounding" methods don't actually work - e.g. attach random numbers to each list item and sort the list by these numbers (1). Microsoft used a similar method of sorting using a random comparator.
Some simple methods that DO work are picking a random permutation or executing a bunch of random swap operations on the list.
(1) http://okmij.org/ftp/Haskell/perfect-shuffle.txt [okmij.org]
Re:Milliseconds (Score:2)
Ok, sorry, had read only the summary when posting this. Should have read the full article first...
Re:Milliseconds (Score:3, Insightful)
Re:Milliseconds (Score:2)
That method does work as long as you guarantee the uniqueness of the random numbers--in other words, a tie shouldn't mean leaving elements in place, but rather repeating the algorithm on each subset of the list where the random numbers came up equal.
I see no advantage to it over the Fisher-Yates shuffle (which should be the default solution to the problem), but it can be made to work.
Re:Milliseconds (Score:2)
Instead, FTFA one of the good solutions is to iterate backwards through the list, swapping the current element with a randomly chosen element earlier in the list.
The top hit on Google... (Score:5, Interesting)
A Google search on:
gives exactly the bogus answer that Microsoft used in the top hit. [javascriptkit.com]
Unfortunately for Microsoft, a bing search gives the same top hit.
Microsoft problem solving (Score:2, Funny)
There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Microsoft appears to have picked the bad approach.
We are still talking here about the random selection of browsers, or something more broad?
do not fix! (Score:2)
Re:do not fix! (Score:4, Interesting)
I don't think it comes off worst at all. People usually look towards the right of the screen for the "go away and stop bugging me" button, and that's where Internet Explorer is 50% of the time.
Remember that the yellow exclamation mark in the system tray telling people they need to reboot is an annoyance when they just want to get on with their work. Then when the computer finally does reboot and they really really want to start doing whatever it was they turned on the computer to do, they get this annoying thing about web browsers.
I have no problem with this at all (Score:2, Informative)
The only people where the selection and any possible bias inherent in the implementation of the random() function are the last group, which is also quite possibly the smallest of the three sets. In an ideal world, you are going to get a bellcurve with the optimum position being in the middle and the least favourable at the sides, so if Microsoft wants to implement random() so that IE is more likely to end up in the position that is joint least likely to be picked of the big five browsers, that's fine by me.
It's also apparently fine for Google's Chrome as well, so if you are anywhere near Ballmer's office in the next few days then I'd keep an eye out for any flying chairs, just in case...
Re:I have no problem with this at all (Score:2)
3 Those that have no clue about browsers and pick essentially at random, or belong to one of the above groups and click in error.
It won't be a random choice.
They will choose the browser associated with their operating system:
Microsoft Windows = Microsoft IE 8 for Windows.
The only people where the selection and any possible bias inherent in the implementation of the random() function are the last group, which is also quite possibly the smallest of the three sets.
It is more likely the largest of the three sets.
These aren't geeks shopping for tech, these are ordinary people looking for a familiar and trusted brand name.
The ballot is a bureaucratic perversity that will play to Microsoft's advantage.
Re:I have no problem with this at all (Score:2)
Personally, I think this whole concept is a stupid idea that's right up there with some of the other brain dead excuses for legislative rulings that have come out of the EC, and won't change the status quo by meaningful amount. Once you've deducted the users who will pick a non-IE browser anyway and those who will pick IE based on either its icon or the name "Microsoft" regardless of its position on the screen, there are not too many users left. I doubt very much that when the next round of browser usage figures come in that there is going to be a significant shift towards non-IE browsers that could be attributed to the introduction of the Browser Selection Screen.
Re:I have no problem with this at all (Score:2)
Those who prefer another browser to IE will have already installed one, and they won't see this screen.
Seems like the right solution to me (Score:2, Insightful)
One solutions takes 3 seconds, can be done by an intern, and makes the company no money. The other solution takes a little bit of time, maybe some reading or prior knowledge and still makes the company no money. The results yielded for each solution are acceptable for the situation. Given the cost to profit it seems like Microsoft chose EXACTLY the right solution.
This is like your community telling you that you must have a fenced in yard for your dog to be off the leash and then setting up a cheap 6-foot standard wooden fence and then the local anti government militia guy laughing at your ignorance because everyone who knows anything about fences knows you choose the solution that's 12 feet high with curved top to prevent climbing and a sunken base of 3 feet to prevent dog-tunneling.
Re:Seems like the right solution to me (Score:2)
Well, as it happens (if you read TFA) Microsoft's solution ends up being biased against Microsoft rather than for them. It hard to say which is worse, though; in one case you work against your own market share, in the other you end up wasting time and money as the EU hauls you back in court for unfair practices.
It seems the least cost option for Microsoft would have been to have gotten this right first time (as is always the case with software).
Bug in the story (Score:2)
According to the description, it returns either 0 or < 0.
That's not the way the comparator function works in any implementation I've seen. Hopefully, the test code the author wrote wasn't as random as the documentation.
Random (Score:4, Funny)
Anywhoo... So what you're telling me is that Microsoft's programmers made a mistake for a production system that 99% of freshmen CS students wouldn't make? In this case, I think you're actually giving too much credit to Freshman CS students...
On the up side: it's a webpage. On the down side: (Score:2)
On the down side.. it's a webpage.
To cover the first bit.. it's a webpage - if a potential problem has been found, it can be fixed. I.e. this random selection thing? They can implement a better one.
Though I do wish they would start by fixing the page's use of javascript to sort the results -after- their display.
Turn off Javascript.. the order appears to always be: IE, FF, Opera, Chrome, Safari.
In fact - if you have a slow machine like I do.. leave javascript on, and just refresh the page.. look closely, and you'll find that -that- is the order of display *before* it gets shuffled by the javascript.
On the down side... if they were to detect the user not using javascript, and would switch server side-shuffled results instead, people would wonder how fair -those- are.. and since they can't peek inside the server, there's no way to tell.
Which leads to another down side... it's on a server out of our control - it's entirely possible, albeit unlikely, that one out of N hits favors browser X by design.
note: I don't really care much.. I don't think this should have been pushed through at all. But if they -have- to do it, I'd rather they do it well, just because it would have prevented the flurry of articles on this and other issues (such as the rollout not being instant, but stretched out through April(?) - probably to keep media attention / panicked users calling their tech support to a minimum), even in times to come.
Is it really about 1-5? (Score:2)
Isn't the *middle* choice going to be more important than the leftmost one?
We have a screen-centred dialog here that shows five browsers. One is going to end up smack-middle on the screen - and guess what - Windows centres the mouse pointer when it boots, which is when the screen will appear.
People often go for middle ground. Then they go to the direction they read... So in my mind, the 3rd choice is going to be the most important, followed by 4 for LTR cultures, and 2 for RTL.
But I'm talking out of my ass here... Any research into the importance of elements in a short horizontal list?
This is why we get outsourced (Score:2)
I'm a senior programmer, over 10 years experience. I don't think I would have EVER imagined a solution to randomizing that would involve a sort comparison function returning a random result. This is because I have an understanding of the implicit (and in some languages, explicitly documented [sun.com]) total ordering contract imposed on the comparator function.
But you know what? If Microsoft doesn't care for nice things like correctness, why would any big software development company? Excepting cases like Wolfram Research's Mathematica and Autodesk's AutoCAD, of course. And yet they have the balls to call their people software engineers?
I'm not disagreeing with the conclusion that the quick and cheap way is sufficient and satisfactory from both the business standpoint and for this problem the technical standpoint. I'm simply saying that, as a profession, the field is so compromised with crap like that this that it's no wonder most software sucks. It's easy to see how an experienced and knowledgeable developer can be replaced by a couple of best-low-bid hacks.
Of course management rarely sees or understands the difference between throwing a hacked up function at an unimportant problem like this one, and doing the same thing when designing the firmware for a car, a risk calculation algorithm for derivative investments, or the math portion of a CPU. As a result, they'll promote the hack who spewed out this junk to a position working on something important, and the same bad algorithm will get used when it matters, and off we go with the people dying and the lawsuits and the congressional investigations.
Malice? (Score:4, Interesting)
Re:Malice? (Score:3, Informative)
It starts off as a list dumbass, the browsers to sort have to come from somewhere.
Would you be upset if the first item on the list were FireFox, so FF popped up first momentarily?
Seriously, get a life.
Reach for Knuth? (Score:3, Insightful)
And one of the things one learns early on is to reach for Knuth
Knuth is for computer scientists. Not everybody who writes code meets that definition. A lot of us (and I include myself) don't even qualify as "engineers".
For most programmers, the best way to write good "select random x from 1..n is not to brush up on our algorithmics. That's like fabricating a car part instead of going to the auto supply. (Hey, there's a good reason the car analogy keeps popping up!) You need to rely on standard, well-tested libraries. Josh Bloch even refers to this use case as an example of why you should rely on library code [sun.com].
Re:Reach for Knuth? (Score:3, Insightful)
Why would you not know what the library call does? Presumably it's documented. Knuth might help you how the library call works, but more likely you'd refer to Knuth to write your own version of the code. And, as Josh Bloch argues, rewriting tried and tested code is a mistake.
Randomness vs Uniformity (Score:3, Insightful)
Those of you that are computer scientists should take a moment to consider that randomness is not the same as uniformity (as an insightful reader commented in TFA and triggered me to respond there).
Just because the only way to produce an algorithm for uniformity is via a random number generator, this does not mean that there aren't other non-statistical approaches. Here's one:
"The computer upon Windows installation contacts a MS site that uses a global installation counter - each new installation would increase the counter from N to (N+1) and then present a browser order according to (N modulo 5!). This is a totally deterministic process, with no randomness at all (statistical tests for randomness would fail because of the autocorrelation), which however would lead to perfect uniformity: at any given time instant, each browser would have been placed in each of the 5 positions with a percentage of precisely 20%, as required. The same kind of uniformity could be produced by using the installation serial number (licence) of Windows: since the licence key space is well-defined, the order of browsers could be also well (uniformly) defined from the serial number itself. There might be a problem with volume licences, but VLKs are a small percentage of total installations.
However, on a single offline computer, with no knowledge of history (what ballot was presented globally) or without a licence key, programmers have to resort to mathematics in order to produce
uniform (not necessarily random) distributions. This is an application of the law of large numbers: if the ballot is uniform on the same computer, it will be uniform globally." (using quotes because I'm quoting myself).
In conclusion, we should not care if the distribution is not "random" but whether it is uniform (i.e. all possible permutations of 5 browsers appear with equal frequencies).
The root of the problem... (Score:4, Insightful)
To be perfectly honest, this is exactly what I would have done too.
It takes 5 minutes to dig out my copy of Knuth.
It takes 1 minute to pirate Knuth and search through the pdf.
But it only takes 10 seconds to copy and paste this one-liner from the first google hit.
That probably explains my lack of success in the job market for the past decade...
Quality Standards (Score:3, Funny)
I bet if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake.
Well, that seems an awfully high standard for Microsoft to hold itself to.
Microsoft: Ranked 1st percentile with Freshmen CS Majors
Re:Cheating at online poker (Score:2)
they did of Party Poker's site.
Shit, PlanetPoker, not Party Poker. Sorry.
Re:What? Why not? (Score:5, Insightful)
Why not? Is the author suggesting that random functions in use today are somewhat deficient? What is his solution?
You know, it's really too bad that the author of the article the summary linked to didn't write up an article answering exactly that. Then maybe Slashdot could have linked to it.
(In a nutshell, the answers are, respectively: "because plopping a 'rand()' into your code doesn't mean that what you'll get out is uniform", "no", and "use a shuffling algorithm that works.")
Re:What? Why not? (Score:5, Informative)
What the problem actually seems to be is not that they're using random, but how they're using it.
What they essentially did was use something like qsort() to sort the list, but in their comparator function, instead of returning which of the two strings comes first, they return some random crap instead. qsort() or whatever they used was designed to have results that are consistent, and with input like that it could potentially abort and leave the list entirely unsorted. Or if you're really unlucky it could sit there sorting them forever.
Re:What? Why not? (Score:2)
Re:What? Why not? (Score:2)
As another poster said there is a slight error. A fix would be to use a vector holding the remaining browsers to randomly select as the next item in the shuffle list. Or you could use probability theory to shift the list of randomly generated indices so only remaining indices as selectable in each iteration.
MS should have kept it simple. There's no need to overcomplicate something so trivial.
Re:What? Why not? (Score:2)
I'm almost certain he's suggesting you splice out the array item chosen each time so that the remaining indices are enumerated as 1-X at all times. I'm pretty sure that would work.
I'm not convinced it is simpler, although I see a couple people on slashdot suggesting it is. Adding a random comparator to an existing sort function is dead simple and obvious (wrong, but simple and obvious), and I'd be surprised if it weren't a lot more than 1% of freshmen that come upon that one, particularly when the top search results actually suggest that broken algorithm, unless these freshmen had never encountered sort functions before.
Interesting that random compare is not random... (Score:2)
What they essentially did was use something like qsort() to sort the list, but in their comparator function, instead of returning which of the two strings comes first, they return some random crap instead.
That algorithm seems really wrong, for the reason you mentioned - that potentially you get an infinite loop.
However after further thought, I don't think that result is possible, and I find it more interesting to think about why this does not work as intended. After all, if you are randomly comparing random numbers why is the end result not still random?
First, why it is not going to ever result in an infinite loop - because pretty much any sorting algorithm is not going to engage in an infinite number of comparisons. Pretty much any technique I can think of is always going to be left with an assumption about a growing number of elements and where they fit into place, so even though the comparison result is totally random the sort will always complete (the results will just always be wrong, if you can call any ordering based on a broken comparator wrong).
I think that the non-randomness of the result is related, in that that assumption is made as to where a value falls and then the remaining elements fall into line after that result... which might explain how one element like Safari was so out of whack when the others were also out of balance but to a lesser degree, as the initial list is probably always fed in the same order and the algorithm works on one particular element first to start the sort (which might also explain the browser variation, having slightly different sorting algorithms).
Re:What? Why not? (Score:2)
Out of curiosity - why would they need to seed the random number generator?
Wouldn't -not- seeding it be better? That way MS don't have to include some seed number in their script and have everybody thinking they're manipulating the results by using seed numbers that they know would generate a certain ordering in various versions of IE.
I actually found this bit...
Although I can't verify this, as the test page provided... http://www.robweir.com/blog/shuffle.html [robweir.com] ...won't actually run in IE8 here :D
That said.. odd method of getting random results out of an array.. I'm sure it's more efficient but KISS would dictate you grab one random result out of the array, delete the one you chose, pick a new one, delete, etc. until the array is down to 1 and you stick that last one at the end. Can't really go wrong with that unless the random number generator itself is hosed, and even non-mathematicians can understand how that one works. It's not like it's a high-performance application.
Re:What? Why not? (Score:2, Informative)
Out of curiosity - why would they need to seed the random number generator?
Wouldn't -not- seeding it be better?
I can't speak for Javascript but in most C libraries - at least in the DOS era when I last made this mistake - the library would default to a static seed of its own during program initialisation, probably 0 or something.
As a result, if you made a program that started, output 15 random numbers and then quit, each time you ran it you would get the same 15 numbers. :P
What you're supposed to do to avoid that is seed the RNG at the program start, using the current time since the epoch as the seed. This is what I was referring to, not using a static seed
Re:What? Why not? (Score:2)
A-ha... crystal clear - thanks for clarifying :)
Just to check - IE8 certainly seems to do this - whether it does it properly, I wouldn't know, but it's certainly not the same results each time I fire up IE8 with a test page (just printing out 10 random numbers).
Certainly not ;) But if MS were to seed the number, they'd still have to get that seed out of somewhere.. though I suppose they could use javascript date/time functions for that.
Re:What? Why not? (Score:5, Insightful)
Re:Good enough (Score:4, Informative)
Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK
This problem has nothing to do with how the PRNG is seeded.
The word "seed" doesn't even appear in TFA at all.
Re:Good enough (Score:5, Insightful)
Given that each person will only lose one cent per lifetime, I propose to move $0.01 from each bank account in the world to my own account.
Read the results, not good enough (Score:3, Insightful)
Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK.
A) That was not the problem.
B) Consider the result instead of the algorithm is it OK to have your "random" list just about always present any one choice in the bottom two elements? Because that is what happened for Safari.
If you aren't going to insist on a list that's even close to random then you should not make randomness a requirement.
Re:Good enough (Score:2)
Tangential question (not nitpicking) - isn't it really once per user? Those of us that actually use multiple users may have to answer it multiple times. (I'm on OS X, but I'm still curious.)
Re:I have been fired for less. (Score:2)
You need to stop working for this guy [nytimes.com].
And yes, I'm also seriously disturbed by careless attitude of many in the Bay Area and Silicon Valley --- I've had a chance to get to know the area better lately.