Schooling Microsoft On Random Browser Selection

Posted by kdawson
from the let-me-show-it-you dept.
Rob Weir got wind that a Slovakian tech site had been discussing the non-randomness of Microsoft's intended-to-be-random browser choice screen, which went into effect on European Windows 7 systems last week. He did some testing and found that indeed the order in which the five browser choices appear on the selection screen is far from random — though probably not intentionally slanted. He then proceeds to give Microsoft a lesson in random-shuffle algorithms. "This computational problem has been known since the earliest days of computing. There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Microsoft appears to have picked the bad approach. But I do not believe there is some nefarious intent to this bug. It is more in the nature of a 'naive algorithm,' like the bubble sort, that inexperienced programmers inevitably will fall upon when solving a given problem. I bet if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake. But with education and experience, one learns about these things. And one of the things one learns early on is to reach for Knuth. ... The lesson here is that getting randomness on a computer cannot be left to chance. You cannot just throw Math.random() at a problem and stir the pot and expect good results."

  • by Anonymous Coward on Sunday February 28, 2010 @02:46PM (#31308172)

    What's the problem? It's random enough for a browser selection screen.

    This isn't an application where a statistically random shuffle is required.

  • Yeah right (Score:3, Insightful)

    by CSHARP123 (904951) on Sunday February 28, 2010 @02:46PM (#31308184)
    Showing a browser selection has been imposed on them and these geeks think MS is going to select the best approach possible for randomness. No wonder none of you are a success in business.
  • Good enough (Score:1, Insightful)

    by TBoon (1381891) on Sunday February 28, 2010 @02:47PM (#31308194)
    Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK. Can't see why you would need more randomness than that in this particular situation. Just make sure that the distribution of browsers evens out for all seeds...
  • Re:What? Why not? (Score:5, Insightful)

    by EvanED (569694) <evaned@ g m ail.com> on Sunday February 28, 2010 @02:50PM (#31308226)

    Why not? Is the author suggesting that random functions in use today are somewhat deficient? What is his solution?

    You know, it's really too bad that the author of the article the summary linked to didn't write up an article answering exactly that. Then maybe Slashdot could have linked to it.

    (In a nutshell, the answers are, respectively: "because plopping a 'rand()' into your code doesn't mean that what you'll get out is uniform", "no", and "use a shuffling algorithm that works.")

  • He's just bitching (Score:4, Insightful)

    by Sycraft-fu (314770) on Sunday February 28, 2010 @03:02PM (#31308322)

    It is probably a combination of two things:

    1) Hate for MS. MS is doing what some have said they've needed to do in giving users browser choice, and they've done so as to try not to promote any given one. While that makes proponents of choice happy, it makes MS haters mad. The more MS does to try and accommodate users and play fair, the less there is to hate on them for legitimately. As such haters are going to try and find nit picks to bitch about.

    2) General geek pedantry. Many geeks seem to love to be exceedingly pedantic about every little thing. If a definition isn't 100% perfect, at least in their mind, they jump all over it. I think it is a "Look at how smart I am!" kind of move. They want to show that they noticed that it wasn't 100% perfect and thus show how clever they are.

    Doesn't matter, it is what it is and as you said, random enough. This guy can whine all he likes.

  • by magsol (1406749) on Sunday February 28, 2010 @03:06PM (#31308352) Journal
    On the other hand, the devil is in the details, and one would think that a company such as Microsoft that has been owning the software market for decades now would know how to implement a randomizing algorithm correctly.
  • Re:What? Why not? (Score:5, Insightful)

    by Anonymous Coward on Sunday February 28, 2010 @03:07PM (#31308354)
    No, Math.random is not the problem, the problem is how it is used. They used it as random input to a sorting algorithm without considering how the sorting algorithm works. The assumption that any sorting algorithm with inconsistently random input = random order is wrong. If they had assigned a random value to each element and sorted by that value the result would have been truly random as the value associated with each element would have been consistent.
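
The fix described in the comment above (one random value per element, then sort by that value) can be sketched roughly as follows. This is only an illustration: `shuffleByRandomKey` is a hypothetical name, and none of this is taken from Microsoft's ballot page. The point is that each key is drawn once, so the comparator gives consistent answers.

```javascript
// Sketch: shuffle by giving every element one fixed random key, then sorting by key.
// `shuffleByRandomKey` is a hypothetical helper, not code from the ballot page.
function shuffleByRandomKey(items, random = Math.random) {
  return items
    .map((value) => ({ value, key: random() })) // key is drawn exactly once per element
    .sort((a, b) => a.key - b.key)              // comparator is now consistent
    .map((entry) => entry.value);               // strip the keys again
}

const browsers = ["Chrome", "Firefox", "IE", "Opera", "Safari"];
console.log(shuffleByRandomKey(browsers));
```

This costs a sort (O(n log n)) where a linear-time shuffle would do, which hardly matters for five items.
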
  • Re:Good enough (Score:5, Insightful)

    by mangu (126918) on Sunday February 28, 2010 @03:08PM (#31308366)

    Given that each user is only going to see this screen once per computer

    Given that each person will only lose one cent per lifetime, I propose to move $0.01 from each bank account in the world to my own account.

  • by EvanED (569694) <evaned@ g m ail.com> on Sunday February 28, 2010 @03:08PM (#31308368)

    Is picking a worse random number generation function (the default one in C and JS) really fucking up?

    There's no problem with the function they're using; the problem is how they're using it. If 'rand()' were perfect, their technique would still suck.

    I can already see all the comments about how MS would be favoring IE with this (summary conveniently left that one out), but as it is they're promoting the other browsers almost twice as much.

    I do think the summary should have mentioned that bias, but I don't think it's quite as good a position as you convey. I bet the far right position is better than #3 and #4 at least.

    (If I wanted to put on my conspiracy hat -- which I don't, I don't really believe this -- I'd say that MS wanted to bias it towards them and decided that biasing it toward #1 would be too blatant, but that #5 was "good enough".)

  • by SuperKendall (25149) on Sunday February 28, 2010 @03:11PM (#31308380)

    Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK.

    A) That was not the problem.

    B) Consider the result instead of the algorithm: is it OK to have your "random" list just about always present any one choice in the bottom two elements? Because that is what happened for Safari.

    If you aren't going to insist on a list that's even close to random then you should not make randomness a requirement.

  • by kevinNCSU (1531307) on Sunday February 28, 2010 @03:12PM (#31308388)

    One solution takes 3 seconds, can be done by an intern, and makes the company no money. The other solution takes a little bit of time, maybe some reading or prior knowledge and still makes the company no money. The results yielded for each solution are acceptable for the situation. Given the cost to profit it seems like Microsoft chose EXACTLY the right solution.

    This is like your community telling you that you must have a fenced in yard for your dog to be off the leash and then setting up a cheap 6-foot standard wooden fence and then the local anti government militia guy laughing at your ignorance because everyone who knows anything about fences knows you choose the solution that's 12 feet high with curved top to prevent climbing and a sunken base of 3 feet to prevent dog-tunneling.

  • by SuperKendall (25149) on Sunday February 28, 2010 @03:16PM (#31308428)

    Here's the problem - consider the results again. Safari will almost always (almost 50% of the time) be put in the bottom two elements. In fact depending on the algorithm used it's 40-50% chance of being put in one exact slot (either choice four or five).

    When the whole point of the list is to promote browser competition, it makes no sense to accept a list which is that skewed for ANY browser result from the list. You need to have it properly shuffled so that no one browser has a statistical advantage or disadvantage - if you are going to claim it doesn't matter then why not let Microsoft set an arbitrary fixed order for the list?

    That is not what the legal injunction against them says they can do, therefore the randomness of the results DOES matter. Just as in most things in life, correctness of results is actually important.

  • by AdmiralXyz (1378985) on Sunday February 28, 2010 @03:19PM (#31308456)

    one would think that a company such as Microsoft that has been owning the software market for decades now would know how to implement a randomizing algorithm correctly.

    Wrong: a software company such as Microsoft that has been owning the software market for decades now knows how to use programmer time and resources effectively. Spending the extra programmer time and effort to turn a "99.99% random" process into a "100% random" one is an utter waste of both on something this trivial. Hate to break it to you, and to look-at-me-I'm-so-much-smarter-than-evil-Microsoft Rob Weir, but they're not making any mistake here.

  • by TerranFury (726743) on Sunday February 28, 2010 @03:22PM (#31308468)
    Whatever. They offloaded what looked like a menial task to some low-level programmer, who ran it a few times, saw it was "random" (without doing any statistical tests), and went home happy. He probably should have known the Knuth shuffle algorithm -- I remember studying it in high school CS, even -- but honestly it's not that huge a deal.
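
For readers who have not met the Knuth (Fisher-Yates) shuffle mentioned above, a minimal sketch looks like this. The function name and the injectable `random` parameter are choices made for this illustration only; the sketch assumes `random()` returns a value in [0, 1), as `Math.random()` does.

```javascript
// Fisher-Yates (Knuth) shuffle: walk from the last index down, swapping each
// element with a uniformly chosen element at or before it.
// Returns a shuffled copy; the input array is left untouched.
function fisherYatesShuffle(items, random = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // uniform integer with 0 <= j <= i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

console.log(fisherYatesShuffle(["Chrome", "Firefox", "IE", "Opera", "Safari"]));
```

Each of the n! permutations corresponds to exactly one sequence of choices for `j`, which is why the result is uniform when the underlying random source is.
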
  • by EvanED (569694) <evaned@ g m ail.com> on Sunday February 28, 2010 @03:29PM (#31308526)

    As the author of the article pointed out, this technique can cause an infinite loop.

    For certain, pretty crappy definitions of "can". First, you'll notice he also points out that that "depends on the sorting algorithm used". I don't think that the most likely choices (Quicksort in particular) fall victim to this. Second, the other poster is right: the probability that it's actually an infinite loop is 0.

  • by DavidShor (928926) <supergeek717@gmail.BOYSENcom minus berry> on Sunday February 28, 2010 @03:35PM (#31308566) Homepage
    Did you read the god-damn article? The results are insanely non-uniform. With a sample-size of 10,000, IE ends up in 5th place 50% of the time. I mean, it's not evil, it doesn't benefit them. But it makes them look insanely dumb.
  • by beelsebob (529313) on Sunday February 28, 2010 @03:37PM (#31308586)

    No, the point was that no one browser got unfairly pushed to the top all the time. This algorithm does push a certain browser higher more often than not, and hence is not fit for its job.

  • by Short Circuit (52384) <mikemol@gmail.com> on Sunday February 28, 2010 @03:37PM (#31308592) Homepage Journal

    It's paranoia and naiveté like yours that led me to stop hanging around here so much.

    Paranoia in that everything company X does is evil or has an inappropriate or immoral ulterior motive. Naiveté in that you don't stop to recognize that not all of the developers who work for an institution are going to output code of the caliber of its most senior, experienced and/or knowledgeable developers, nor can code review and automated tests catch all of the problems and gotchas known to computer science, academia and the body of professional programmers.

    So can the "the devil is in the details" crap; you don't know what you're talking about. Building a complex software package that takes into account every possible detail in both process and implementation is impossible in any environment currently available for consumer software and general computing hardware. Just when you think you've got everything covered, a vendor builds a buggy component, security specialists discover a flaw in the way you learned to write your software, nature builds a better idiot, or a piece of a radioactive isotope in a memory module emits a beta particle, just to ruin your day.

  • by 644bd346996 (1012333) on Sunday February 28, 2010 @03:37PM (#31308594)

    Even with a very high quality entropy source, the algorithm Microsoft used will result in a very non-uniform distribution.

    Clearly, Microsoft didn't care about this enough to assign one of their experienced coders to it, which is odd given the legal involvement. Either the technical side of MS ignored the legal department's explanation of the importance of the browser ballot to MS's ability to do business on a particularly profitable continent, or someone powerful in MS decided to spite the EU by assigning low quality programmers to the project.

  • Re:do not fix! (Score:1, Insightful)

    by Anonymous Coward on Sunday February 28, 2010 @03:39PM (#31308602)

    looking at the outcome IE comes off the worst with the current algorithm, please keep it that way. Thanks from all the Web Developers.

    Exactly. And the Apple people here managing to interpret this as a plot against Safari are just amazing. MS would represent IE the worst, and Chrome and Firefox the best, just to get Safari. Yeah, right. Talk about delusions of grandeur.

  • by 644bd346996 (1012333) on Sunday February 28, 2010 @03:45PM (#31308644)

    They made the stupid mistake of not assigning an experienced or well-educated programmer to a project that was necessary for them to legally do business in Europe. Somewhere along the way, the legal department had to have sent the technical managers an email containing a phrase very similar to "don't fuck this up!!", and the manager ignored it and assigned a programmer who didn't go to a good CS school and thus has never heard of a Fisher-Yates shuffle or something equivalent.

    It's very understandable that some of Microsoft's programmers are of such low quality. What is odd is that their legal department can't make their technical managers understand "do this right or we lose the right to do business on the second most profitable continent."

  • by cgenman (325138) on Sunday February 28, 2010 @03:49PM (#31308680) Homepage

    If someone on my team returned that piece of code and insisted that it met the requirements, I would find another team member. A random shuffle is supposed to give ballpark equal positions. This algorithm gave Internet Explorer the rightmost position in the list a full 50% of the time. It's not like he's complaining that the algorithm be up to encryption grade randomness, but rather that it fails even the human eyeball test. 10% statistical variation? Sure, whatever. But getting a particular slot a full 250% more than you should, when you're ordered by the court to make something random? That's really poor coding.

    And the sad thing is, with just FIVE things to sort and no real pressure for speed or RAM, there is no reason why it should be this poor. There is essentially unlimited computing power and RAM, and it fails to produce even casually random results. It's just an inexperienced coder and an inexperienced team making freshman mistakes. Considering this was part of an EU directive, I would have expected at least a few higher level eyeballs would have caught this.

  • by teg (97890) on Sunday February 28, 2010 @04:08PM (#31308830) Homepage

    True random shuffle will give you songs and orders you've already heard --- just as likely as any other song and order combination.

    Yes, but people forget most of the sequence... they just notice the times when it is the same artist in a row. Thus, the part of the selections evaluated when thinking "this isn't random" is extremely biased. Humans are good at seeing patterns.

  • by cgenman (325138) on Sunday February 28, 2010 @04:11PM (#31308846) Homepage

    While in Microsoft's native browser (which would happen the first time), Internet Explorer is given a full 64% chance of receiving one of the coveted 2 edge positions. Considering that antitrust courts were involved in the creation of this screen, you'd think that getting "random" right would be a development priority, especially considering it should have taken a competent programmer exactly the same amount of time to do it right as to do it wrong. If this takes even one hour of lawyer time to ponder, it would have been much cheaper to send the programmer back to fix it.

    A 50% chance of getting a particular slot that should be 20% is not "99.99% random." It's just wrong. And when you're talking about the cost of antitrust regulation, it's really, really wrong.

    I'm glad this is being brought up on Slashdot. There is a lot of misunderstanding about how to create randomness in systems. Even on a basic level, people frequently ask for "random" when they actually want jukebox random. In this case, though, it just seems like a basic misunderstanding of statistics, which is not surprising given the moderate code complexity and likelihood this screen was given to an intern or jr programmer.

  • by bmcage (785177) on Sunday February 28, 2010 @04:12PM (#31308854)
    Do you have any mathematical or logical inclination?? Well I have, and seeing such a stupid way to randomize 5 entries just makes me weep! Be truthful here, would _you_ program it that way?? What are they smoking in Seattle?

    And as the article says, if you use statistics to determine if the program is random, then the answer will be ... not random. So please, don't call this random, and if you do, please, let me know which software you work on, so I can avoid it.

    I agree that all this does not matter for the ballot screen. But at least on slashdot we can expect some higher standards than a

    return (0.5 - Math.random());

    comparison function.
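
To see the effect of that comparison function for yourself, a rough experiment along these lines can help. It is a sketch, not the article's test harness: `tallyPositions` and `randomComparatorShuffle` are hypothetical names, and the exact skew you observe will depend on which sort algorithm your JavaScript engine uses, so no expected output is shown.

```javascript
// Tally how often each item lands in each position under a given shuffle.
// Both function names here are hypothetical, chosen for this sketch.
function tallyPositions(shuffle, items, trials) {
  const counts = items.map(() => items.map(() => 0)); // counts[itemIndex][position]
  for (let t = 0; t < trials; t++) {
    shuffle(items.slice()).forEach((item, position) => {
      counts[items.indexOf(item)][position]++;
    });
  }
  return counts;
}

// The flawed approach: the comparator answers at random on every call, so it is
// not a consistent ordering and the outcome depends on the sort algorithm.
const randomComparatorShuffle = (arr) => arr.sort(() => 0.5 - Math.random());

const browsers = ["Chrome", "Firefox", "IE", "Opera", "Safari"];
const trials = 10000;
tallyPositions(randomComparatorShuffle, browsers, trials).forEach((row, i) => {
  const percentages = row.map((c) => ((100 * c) / trials).toFixed(1) + "%");
  console.log(browsers[i].padEnd(8), percentages.join("  "));
});
```

A fair shuffle should print values near 20% in every cell; large, repeatable departures from that are the symptom being discussed in this thread.
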

  • by Aladrin (926209) on Sunday February 28, 2010 @04:12PM (#31308858)

    I don't know how you managed to get 'insightful' mods for this.

    For a completely random algorithm: Out of 5 slots, EACH item has a 20% chance to be put in any single slot. For any 2 slots, that's 40%.

    Let's look at your statement again: "Safari will almost always (almost 50% of the time) be put in the bottom two elements. In fact depending on the algorithm used it's 40-50% chance of being put in one exact slot (either choice four or five)."

    Wow. So you're saying that it's working perfectly.
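
The uniform baseline quoted in the comment above (20% for any single slot, 40% for any two slots) can be checked by brute force over all 120 orderings. This is a small sketch with a hypothetical `permutations` helper; it says nothing about what the ballot page actually produces.

```javascript
// Enumerate every ordering of a small list (fine for 5 items: 5! = 120).
function permutations(items) {
  if (items.length <= 1) return [items.slice()];
  const out = [];
  items.forEach((item, i) => {
    const rest = [...items.slice(0, i), ...items.slice(i + 1)];
    for (const tail of permutations(rest)) out.push([item, ...tail]);
  });
  return out;
}

const all = permutations(["Chrome", "Firefox", "IE", "Opera", "Safari"]);
const inLast = all.filter((p) => p[4] === "Safari").length;
const inLastTwo = all.filter((p) => p[3] === "Safari" || p[4] === "Safari").length;

// Under a uniform shuffle every ordering is equally likely, so these ratios
// are the probabilities a fair ballot should approach.
console.log(all.length, inLast / all.length, inLastTwo / all.length); // 120 0.2 0.4
```

So a browser landing in one particular slot 40-50% of the time is two to two-and-a-half times the fair rate, not a confirmation of it.
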

  • Reach for Knuth? (Score:3, Insightful)

    by fm6 (162816) on Sunday February 28, 2010 @05:00PM (#31309202) Homepage Journal

    And one of the things one learns early on is to reach for Knuth

    Knuth is for computer scientists. Not everybody who writes code meets that definition. A lot of us (and I include myself) don't even qualify as "engineers".

    For most programmers, the best way to write good "select random x from 1..n" code is not to brush up on our algorithmics. That's like fabricating a car part instead of going to the auto supply. (Hey, there's a good reason the car analogy keeps popping up!) You need to rely on standard, well-tested libraries. Josh Bloch even refers to this use case as an example of why you should rely on library code [sun.com].

  • by fm6 (162816) on Sunday February 28, 2010 @06:35PM (#31309980) Homepage Journal

    Why would you not know what the library call does? Presumably it's documented. Knuth might help you understand how the library call works, but more likely you'd refer to Knuth to write your own version of the code. And, as Josh Bloch argues, rewriting tried and tested code is a mistake.

  • by omfgnosis (963606) on Sunday February 28, 2010 @06:36PM (#31309984)

    with IE in the least desirable position about 50% of the time.

    It is in the second-most (or most, depending on circumstances) desirable position, because users pay a disproportionate amount of attention to visible ends of a list, particularly a horizontal list. The least desirable position is any middle position on the second screen. This is a consequence of the simplicity of a horizontal list, and user attention shifts accordingly as content grows in height (reading across, down, across, down).

    Microsoft actually punished themselves extra by using this function, thereby making another lawsuit on this particular matter impossible to win against them.

    Doubtful. The mere fact that the order places certain items in certain positions a disproportionate amount of the time would raise considerable doubt that Microsoft acted in good faith. This would be sufficient reason to introduce user test data which would demonstrate that the last position is not the least desirable.

  • by CAIMLAS (41445) on Sunday February 28, 2010 @06:43PM (#31310060) Homepage

    Exactly.

    From these results, we can assume one of two things:

    1) Incompetence
    2) Malice

    There may be an off chance of both incompetence and malice given Microsoft's history, but consider that this action was performed solely to meet legal requirements set forth by the EU to inhibit Microsoft's monopolistic behaviors.

    Regardless of which it was, the end result will (likely) be one of two things: the EU will say "not good enough" and another year+ long trial will go on before any actual change gets made, or the EU will let it slide and Microsoft will reap the benefit of whatever they intended with this algorithm.

    To my eyes, it looks like Microsoft is giving preference to 1st place proportionately to each browser's current market name recognition - to the exception of their own browser. I don't know if this is intentional.

    However, also consider how dialog boxes typically work, and how people have been conditioned (on Windows and pretty much everywhere else) to immediately look to the left hand side for their "get past this irritating prompt" button. It's a technique used to install all sorts of insidious malware, so evidently it's a technique that works. By having IE hold closest position to that 'visual cue' area, they are giving it preference. Also consider the impact that having the IE logo branding (or any logo, for that matter) on your desktop for a decade will have.

    I would not be surprised to see an article on statistics resulting from this browser selector showing up in a couple months, showing the profound popularity of IE. I'd wager at least 50%.

  • by UBfusion (1303959) on Sunday February 28, 2010 @06:54PM (#31310180)

    Those of you that are computer scientists should take a moment to consider that randomness is not the same as uniformity (as an insightful reader commented in TFA and triggered me to respond there).

    Just because the only way to produce an algorithm for uniformity is via a random number generator, this does not mean that there aren't other non-statistical approaches. Here's one:

    "The computer upon Windows installation contacts a MS site that uses a global installation counter - each new installation would increase the counter from N to (N+1) and then present a browser order according to (N modulo 5!). This is a totally deterministic process, with no randomness at all (statistical tests for randomness would fail because of the autocorrelation), which however would lead to perfect uniformity: at any given time instant, each browser would have been placed in each of the 5 positions with a percentage of precisely 20%, as required. The same kind of uniformity could be produced by using the installation serial number (licence) of Windows: since the licence key space is well-defined, the order of browsers could be also well (uniformly) defined from the serial number itself. There might be a problem with volume licences, but VLKs are a small percentage of total installations.

    However, on a single offline computer, with no knowledge of history (what ballot was presented globally) or without a licence key, programmers have to resort to mathematics in order to produce uniform (not necessarily random) distributions. This is an application of the law of large numbers: if the ballot is uniform on the same computer, it will be uniform globally." (using quotes because I'm quoting myself).

    In conclusion, we should not care if the distribution is not "random" but whether it is uniform (i.e. all possible permutations of 5 browsers appear with equal frequencies).
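
The counter-based scheme described above (order chosen by N modulo 5!) needs a way to turn a number into a permutation. One way to sketch that is the factorial number system; `nthPermutation` is a hypothetical helper, and the global installation counter itself is assumed to exist elsewhere.

```javascript
// Map a counter value n to one of the items.length! orderings, deterministically.
// Consecutive values of n cycle through every permutation exactly once per cycle.
function nthPermutation(items, n) {
  const pool = items.slice();
  let total = 1;
  for (let k = 2; k <= pool.length; k++) total *= k; // total = items.length!
  let index = ((n % total) + total) % total;         // wrap n into 0 .. total-1
  const result = [];
  for (let remaining = pool.length; remaining > 0; remaining--) {
    total /= remaining;                      // now (remaining - 1)!
    const pick = Math.floor(index / total);  // which of the remaining items comes next
    index %= total;
    result.push(pool.splice(pick, 1)[0]);
  }
  return result;
}

const browsers = ["Chrome", "Firefox", "IE", "Opera", "Safari"];
console.log(nthPermutation(browsers, 0));   // same order as the input
console.log(nthPermutation(browsers, 119)); // the input reversed
```

After every full cycle of 120 installations, each browser has occupied each position exactly 24 times, which is the perfect uniformity the comment describes, with no randomness involved.
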

  • by SuperKendall (25149) on Sunday February 28, 2010 @07:25PM (#31310412)

    Obviously I didn't explain what was going on very well, I (stupidly) assumed people would read the actual article and the data there. But hey, this is Slashdot so I guess I better fill you in.

    It's not two slots. It's one slot (look at the results). The "bottom two" comes from the fact that in each browser test, a SINGLE SLOT was used either 40% or 50% of the time (depends on the browser). The exact NUMBER of the slot depended on which browser was being used. Thus we are talking about 40-50% vs 20% (which would be random).

    Furthermore the 50% is still out of variance by a decently large factor, even if we were talking about two whole slots instead of one.

    So, "wow" yourself. Look at the data next time before you leap to conclusions, or state that an utterly broken algorithm is "working perfectly".

  • Re:Milliseconds (Score:3, Insightful)

    by socsoc (1116769) on Sunday February 28, 2010 @08:56PM (#31311038)
    Yep
  • by zill (1690130) on Sunday February 28, 2010 @08:57PM (#31311046)
    This error can be easily traced back to the first google result [google.ca] (Actually it should be the first bing result [www.bing.ca] in this case).

    To be perfectly honest, this is exactly what I would have done too.
    It takes 5 minutes to dig out my copy of Knuth.
    It takes 1 minute to pirate Knuth and search through the pdf.
    But it only takes 10 seconds to copy and paste this one-liner from the first google hit.

    That probably explains my lack of success in the job market for the past decade...
  • by Phantasmagoria (1595) <<loban.rahman+slashdot> <at> <gmail.com>> on Sunday February 28, 2010 @09:20PM (#31311224)

    If you had bothered to read the article, you'd see that the author has done JUST that. Not only did he prove (using proper statistical methods) that the results are significantly not random, he also dug up the exact javascript source code that does the shuffling and explained why it is faulty. RTFA!
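
The article's own statistics are not reproduced in this thread, so as an assumption, here is what one standard check, a chi-squared goodness-of-fit test against a uniform distribution, might look like. The skewed counts below are made-up illustrative numbers, not the article's data; 9.488 is the usual 5% critical value for 4 degrees of freedom.

```javascript
// Chi-squared statistic for observed counts against a uniform expectation.
function chiSquareUniform(observed) {
  const total = observed.reduce((sum, c) => sum + c, 0);
  const expected = total / observed.length;
  return observed.reduce((sum, c) => sum + (c - expected) ** 2 / expected, 0);
}

// 5 positions give 4 degrees of freedom; 5% critical value.
const CRITICAL_DF4_P05 = 9.488;

// HYPOTHETICAL counts of one browser's position over 10,000 loads (not real data).
const skewed = [1000, 1250, 1250, 1500, 5000];
const statistic = chiSquareUniform(skewed);
console.log(statistic, statistic > CRITICAL_DF4_P05); // 5687.5 true
```

A statistic far above the critical value means the hypothesis "every position is equally likely" can be rejected; a value below it only means the test found no evidence of bias.
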

  • by darkshadow88 (776678) on Sunday February 28, 2010 @11:31PM (#31311946)

    This is obviously not a random distribution curve.

    I believe you meant to say uniform rather than random.
