
Researchers Test Developer Biometrics To Predict Buggy Code

Posted by Soulskill
from the subject-was-asleep-when-this-code-was-checked-in dept.
rjmarvin writes: Microsoft Research is testing a new method for predicting errors and bugs while developers write code: biometrics. By measuring developers' eye movements and other physical and mental characteristics as they code, the researchers tracked alertness and stress levels to predict the difficulty of a given task relative to the coder's abilities. In a paper entitled "Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development," the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor to 15 developers as they worked on various programming tasks. Biometrics predicted task difficulty for a new developer 64.99% of the time. For subsequent tasks with the same developer, the researchers found biometrics to be 84.38% accurate. They suggest using the information to mark places in code that developers find particularly difficult, and then reviewing or refactoring those sections later.

  • by i kan reed (749298) on Tuesday July 22, 2014 @01:57PM (#47509659) Homepage Journal

    Every time I hear about a terrifyingly invasive means of "improving performance", it's targeted at developers. Is it just selection bias, or does the world actually hate us?

    • by dave562 (969951) on Tuesday July 22, 2014 @02:05PM (#47509711) Journal

      The world hates putting up with buggy code.

      • by dave562 (969951)

        On a more serious note, a single developer mistake can potentially affect millions of end users (in the case of an application like Windows). Therefore it makes sense to focus on the developers. "With great power comes great responsibility" and all that.

      • by K. S. Kyosuke (729550) on Tuesday July 22, 2014 @02:20PM (#47509807)
        And what about managers who steer the development effort in a direction highly likely to produce buggy code, those won't get measured?
        • by dave562 (969951)

          Of course they get measured. In the long term if they deliver too many screwed up projects, their superiors stop giving them projects.

          Ultimately it is the developer's responsibility to push back against stupid managers and give them honest feedback about what can and cannot be done.

          • by Anonymous Coward

            I would like to know where the entrance to this magically meritocratic land you speak of is located. It is obviously not Earth.

            • And what about managers who steer the development effort in a direction highly likely to produce buggy code, those won't get measured?

              Of course they get measured. In the long term if they deliver too many screwed up projects, their superiors stop giving them projects.

              Ultimately it is the developer's responsibility to push back against stupid managers and give them honest feedback about what can and cannot be done.

              I would like to know where the entrance is located to this magically meritocratic land you

          • by gstoddart (321705)

            Of course they get measured. In the long term if they deliver too many screwed up projects, their superiors stop giving them projects.

            Sadly, in my experience, they blame it on their team, and get promoted.

            Bad managers are surprisingly good at making it look like someone else's fault.

    • by netsavior (627338) on Tuesday July 22, 2014 @02:21PM (#47509821)
      Because the average judge/jury/CEO/consumer/manager has no idea how to write code.

      They can understand how a toilet is cleaned, how a sale is made, how a 1099 is filled out, how a fire drill works, how a sandwich is put together, how oil is changed, etc... but coding might as well be a dark art.

      Developers are part of a very narrow segment that has no reliable Key Performance Indicators. [wikipedia.org]
      Part of that is that developers are smart enough to game any system, because they can think in algorithms.

      Want to track productivity on lines of code? Fine: developers can do NO WORK and produce TONS of code.
      Want to track productivity on number of defects introduced? Fine: doing NO WORK is the baseline for perfect.
      Want to track productivity on number of defects fixed? Fine: through the magic of hand-waving, defects can be found and fixed with no actual work happening.

      Compare that to well-defined Key Performance Indicators for sales... Bring in X dollars of sales, your performance is X.

      CEOs HATE things they cannot measure... which means CEOs are a natural enemy of Developers.
      • by Anonymous Coward

        Your simplification of gaming the system can be applied to salespeople too. It is not unheard of for some product manufacturers to sell stuff to distribution channels and count it as sold, even though it is still sitting in a warehouse or shop. Or you can be an excellent talker and make people pay more for stuff they don't need (oh, this phone? no, you want the 5" screen!), but this can have detrimental effects: maybe the user pays more, feels cheated, and never buys from the brand again despite other products matching

        • I am thinking of a company* that sold customizable software to large customers who demanded the ability to customize. It also sold new features, which would have to be created by the development staff. At one point, the sales and development staff wound up in different cities, so the technical people weren't able to sit on the overenthusiastic salespeople.

          The sales people found that they could sell new features, and (a) it made it easier to close the sale, and (b) custom development cost money, so it

      • You are assuming a mutual exclusion that doesn't exist, as well as knowledge of the methodology used. If you apply all three, none of those workarounds work. Then, one could also do "blind metrics analysis", where different methods of analysis are used during different, secret/unannounced windows of time.

        You see, as smart as the typical developer thinks he is, it doesn't take much for someone with real skills to bury them ... deep.
      • They can understand how a toilet is cleaned, how a sale is made, how a 1099 is filled out, how a fire drill works, how a sandwich is put together, how oil is changed, etc... but Coding might as well be a dark art.

        Disclaimer: I am in hardware myself and may completely miss the point here. However, our software/firmware folks do agile programming, which involves dividing programming problems into pieces that are assigned to programmers, tracked on large whiteboards and discussed daily in "scrum meetings" etc. (I may be confusing some concepts here, but that is of less importance.) The point being that your statement, that programming is some sort of unique dark-art-which-cannot-be-measured-by-managers, appears unt

    • by znrt (2424692)

      i hit 50 last sunday and been a developer since i can remember, and i still love my profession but the "guild" has changed an awful lot, from once being a peculiar bunch to the herd it mostly is today. most of my colleagues are much younger than me and ... what can i say ... they are often so brainwashed with corporate bullshit they break my heart almost daily. holy shit, they even blog about it! it's so depressing, it makes me wanna cry. :'(

      sign'o'times, i guess. i can perfectly believe many of those sheep

      • by turp182 (1020263)

        I'm 40 now. I remember the late 1990s when I was young, as was everyone around me, and at a non-public facing reinsurance company, we had extra staff just doing pie-in-the-sky stuff no one was ever going to see. We got a lot done via inherent competence, I realize now that we were lucky, and we had budget.

        In the early 2000s I led the design and development of an SOA rewrite of an existing VB6 app. We had an IDesign consultant come in for a week to get us started, which was invaluable; but it was through lu

    • Every time I hear about a terrifyingly invasive means of "improving performance" its targeted at developers. Is it just selection bias, or does the world actually hate us?

      Mostly because they are a newer profession and a trickier one to quantify.

      Time and motion studies, along with 'scientific management', were already a serious hit in terrifyingly invasive performance enhancement for blue-collar labor around the turn of the 20th century (Taylor and the Gilbreths being the poster children, with many successors). The workers who haven't been replaced by robots yet are likely still subject to a descendant of it. Though less amenable to automation, service sector jobs are also r

    • by ranton (36917)

      Every time I hear about a terrifyingly invasive means of "improving performance" its targeted at developers. Is it just selection bias, or does the world actually hate us?

      You really think developers are singled out here? There is an entire industry built upon business process improvement and operational management. I guess there probably are more invasive measures against professions like developers, but that is only because it has been hard to completely replace the profession with machines.

      When you start having your bathroom breaks timed by your manager like some retail workers then you have it rough.

  • the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor

    I'm sure being hooked up to something similar to a polygraph doesn't make a developer more stressed at all. Was the fact that they had all this equipment hooked up to them a factor in their statistics?

    • Checking the methodology section of the paper, they didn't feel it was necessary to include any sort of experimental control.

      Now, it can be hard to come up with controls for this sort of experiment when you're testing an algorithm that looks for fairly nuanced data, like "where in this block of code might there be bugs?", but the conclusion should at least have mentioned that the approach wasn't compared against other methods.

    • by Jeremi (14640)

      Was the fact that they had all this equipment hooked up to them a factor in their statistics?

      Yes. When they compared their measurements against the measurements they gathered from the people they didn't measure, they took that bias into account.

      :^P

  • by Anonymous Coward

    I'm writing a multithreaded networking program in ASM that needs to use very little memory, hence complicated register magic and blatant violation of calling conventions.
    The sensors are going to freak the fuck out.

  • Now my boss is going to be watching the developer's eye movements instead of testing code... This will not end well.

    There is no magic bullet, and while this might find the sections of code that your developer finds difficult to understand, it still isn't going to give you any idea about the quality of the code they produce. All you will know is how hard they concentrated when producing it.

    I remember when we watched SLOC, but it was of marginal value. Then it was logical edges and complexity which was somet

  • How about instead of spending a small fortune solving the handful of bugs caused by programmer typos, you spend that money on better requirements gathering, keeping specifications from changing constantly, and giving programmers time to actually unit-test and document their code?

    If you want some fancy tech so you can write a paper on it, make an "electro-stimulus behavior moderation band", strap it onto clients/managers, and give them fifty thousand volts whenever they say or do something stupid.

  • by nimbius (983462) on Tuesday July 22, 2014 @02:17PM (#47509781) Homepage
    When your developers cringe at a project, when they encounter a subroutine or callback that literally makes them groan, you've found exactly what needs to be refactored. If you find a python wrapper around a godforsaken class, or find expletives cursing a dead god's name in a forgotten universe, that's the code that needs your attention. Project managers, section leaders, whoever has direct line-of-sight communication with the dev pit needs to pay more attention.

    The problem is that 'refactoring' is a lie. As a DevOps (christ, I hate that fucking word) engineer, I've faced rotting, festering codebases on a daily basis for years. The issue is business priorities interfering with good coding practices. Two junior devs and I might want to go rip up a few thousand lines of horror-code to make everyone more productive, but we get denied. Why?:

    1. downtime is unacceptable for this application. this code controls so much, does so many things, and is so obscure (say it with me, payments processing subsystem) that to do ANYTHING to it is literally worse than pistol whipping the CEO's daughter.
    2. New New NEW! we need to get in those swim lanes and stand up in those scrums nice and straight so we can deliver optimum ROI to our dear customers! who cares if the system crashes 5 times a month because this module is Satan's petrified asscrack, google just launched their new $app so our new $cloud_app_pro needs to go live NOW!
    3. we had the resources, but uber elite coders in our ranks were ganked to other projects months ago. they haven't seen the code in 3 months, and we're sure they'll be along to help us again once they put in their 2 weeks and show up in flip flops for the knowledge transfer.
    4. you were ganked from the refactor project and are now plugging away at an irritating new web 9.0 cash money matic piece of code that marketing won't stop skullfucking and your boss can't deliver fast enough. Catch this rabbit though and you'll be able to sit down and think through...wait....what was the refactoring project about again? oh christ is that CVS?

    what this technology will get used for
    efficiency sampling in your dev groups. eye tracking and biometrics will now subtly be included in SCRUM/ITIL/six sigma/devops/management wankfest.
    • by drinkypoo (153816)

      1. downtime is unacceptable for this application. this code controls so much, does so many things, and is so obscure (say it with me, payments processing subsystem) that to do ANYTHING to it is literally worse than pistol whipping the CEO's daughter.

      Then you can't afford not to have a backup server and a development server. This point needs expansion :p

    • Once again we have some big sister/brother company/government claiming that they can do the impossible with biometric data. They don't address the primary source of the problems, which you lay out in detail.

      Why was security skimped on in the code? Funding.
      Why did funding get dropped? So that someone could get a bonus.
      Who was the person that had the demo code for security? Canned to save budget.
      Can't our outsourcer code it? Not in their contract or business statement.

      None of those issues are the coder

  • by DrJimbo (594231) on Tuesday July 22, 2014 @02:18PM (#47509793)

    They tested 16 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Dr. Spock on Star Trek?

    • by bobbied (2522392)

      They tested 16 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Dr. Spock on Star Trek?

      Naw, they just used a really accurate ruler, made each measurement 10 times and averaged their results...

      You make an excellent point. There is no indication in the fine article about how accurate their results could be statistically, and given their really small sample size it doesn't seem likely 4 significant digits is justified.

    • by Anonymous Coward on Tuesday July 22, 2014 @03:26PM (#47510269)

      I think the extra bogus precision has something to do with the conversion between Imperial and Metric developers.
      (Oh, and something something dark side.)

    • While I agree the extra precision is misleading, the real problem is that they don't provide confidence intervals. Even a rounded 80% could be misleading if it could fall between 60% and 90% due to sparse data.

      On the other hand, if in their report details they said "84.38% with a 99% confidence interval of 72.27% to 94.49%", then the extra precision is no longer necessarily misleading (it is just the calculational result of the model used) and, although it is a little pedantic and redundant, I would hav
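To make the confidence-interval point concrete, here is a minimal Python sketch of the Wilson score interval for a proportion. The paper's actual number of classified tasks isn't given in this thread, so the 27-out-of-32 figure below is purely hypothetical (chosen only because it rounds to 84.38%); the point is how wide the interval gets at a sample size of that order.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# HYPOTHETICAL sample: 27 correct classifications out of 32 (= 84.375%).
low, high = wilson_interval(27, 32)
print(f"point estimate 84.38%, 95% CI roughly {low:.0%} to {high:.0%}")
```

With those hypothetical numbers the 95% interval runs from roughly 68% to 93%, so the extra digits in "84.38%" carry little information on their own.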

  • The core implication here is that developers are the source of mistakes, and those mistakes must be minimized. Never mind that developers are also a source of productivity and innovation, and that dehumanizing them decreases both.
    • by gnoshi (314933)

      They are saying developers are the source of bugs (mistakes?), but not in the way you are suggesting. Developers are the source of bugs in that they write the code which includes the bug, and so it is not particularly surprising that you can read biometrics that indicate when developers are likely to produce code with bugs.
      For example, if the developer hasn't slept in two days and so has saggy eyes and wildly drifting eye movement then that's a pretty good indicator that there will be some bugs, and indeed

      • by J-1000 (869558)

        In summary: I don't necessarily think it is offensive to say that bugs are coded by developers, because they are. However, it is offensive to say that they are responsible for the bugs without taking into account the broader context in which they are working (and indeed, saying they are responsible for the bugs still doesn't necessarily mean that they are in some way wrong or deficient for entering a bug. People - even brilliant people - can and do make mistakes, and that is why review processes do (or shou

  • Should measure the interaction between developers and product managers before coding even starts - see how the developer responds to impossible requests, contradictory requirements, and meaningless buzzword filled descriptions...
  • Microsoft Research should also track how far the individual is working away from the main office of his company, because that has far more of an effect on bugs than any biometric reading. I recommend that they develop a special laser and a series of geostationary satellites and ground repeater stations. The total round trip time of the laser pulse will be a measure of how buggy the developer's code is.

    1) Microsoft Research is wasting an awful lot of money to conclude that the reason why Microsoft's software

  • Using the same methodology, it'd be interesting to analyze differences in performance/productivity between developers. I'd expect to see something like a normal or log-normal curve.
  • by Charliemopps (1157495) on Tuesday July 22, 2014 @02:59PM (#47510093)

    Example? [businessinsider.com]

    That's where we're at?
    I see this garbage all the time... and I never understand it. Growing up my father ran a factory and was damned good at it. His people showed up on time, did great work, had low scrap and were very productive. How did he manage this amazing feat? Minders that followed you everywhere in the factory? Discreet blood samples taken hourly? No...

    They had stats. That's it. You produce X parts per week. Go way above that and you get a bonus; go way below that and you get fired. If everyone was getting bonuses, the stats would rise... if everyone was getting low stats, they'd first check for procedural problems that might be hindering people's work and, barring that, they'd lower their expectations. It worked marvelously well... people would think of ways to go faster and bring them up... because it meant a bonus until everyone else caught on. Anything that made production harder was immediately reported because people wanted their bonus.

    Damn near every successful factory in the country works this way. Do the same thing for code. What my father always said was "They could spend 7hrs in the bathroom every day... I don't care... if they can come out and hit 1000% efficiency that last hour to make up the difference it's fine with me. But they better keep in mind their peers are going to eventually figure out how they pulled it off and change the curve."

    • The problem is that while widget A can be treated as fully equivalent to widget B, it's hard to compare one line of code to another line of code from a completely different program.

      If "good programmer" just means "spits out more lines of code than anyone else", I can write all sorts of gobbledygook that adds lots more code without really contributing to the actual purpose of the program.

    • by jcochran (309950)

      And what stats do you apply to code development?
      Because quite frankly, that is the gist of the problem.

      • by mythosaz (572040)

        You simply use a longer timeline.

        Nobody worth their skin judges a poker player by who won the last pot -- but over time I can measure quite well who the good ones are.

        • by jcochran (309950)

          And you've still avoided naming any metrics....

          Number of lines of code? As mentioned earlier, one can easily inflate LOC with trash.
          Also how do you evaluate a programmer who actually reduces the lines of code in a program? By the LOC metric, said programmer is counter productive. Then again you get the beautiful quote by Ken Thompson... "One of my most productive days was throwing away 1000 lines of code."

          Code quality? Once again, how do you judge it?
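As a rough illustration of the LOC objection, here is a hypothetical Python sketch of a naive "net lines of code" metric built on the standard library's `difflib.ndiff`. The function name `loc_delta` and the sample snippets are made up for illustration; the point is only that a cleanup which replaces a hand-written loop with an equivalent one-liner scores as negative output, while padding scores as positive.

```python
import difflib

def loc_delta(before: str, after: str) -> int:
    """Naive 'productivity' metric: lines added minus lines removed."""
    added = removed = 0
    # ndiff prefixes each line with "+ " (added), "- " (removed),
    # "  " (unchanged) or "? " (intraline hints, ignored here).
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return added - removed

# A verbose loop replaced by an equivalent one-liner.
before = "total = 0\nfor v in values:\n    total = total + v\n"
after = "total = sum(values)\n"
print(loc_delta(before, after))  # -2: the cleanup counts as negative output
```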

          • by mythosaz (572040)

            I'm not avoiding naming metrics.

            More analogies. You can judge an entry level grade-school musician by notes missed. He gets credit for showing up to class and not tooting his horn (literally) during the rests. Advancement beyond that requires someone who knows what a good musician sounds like, the subtleties and nuances in his play and style.

            You can judge entry-level programmers by lines of code, errors in their code, and other simple metrics. You take those numbers day by day, and like the poker player ab

    • Writing code is nothing like working in a factory producing identical components. It's more like designing a house, followed by an office building, a bridge, a power drill, a pacemaker, a roller coaster, a lunar lander, ...

      If every developer is doing their job perfectly, they will literally *never* write the same code twice. Every single task they do will be different from all of their previous tasks. So how do you measure their output?

  • by jeffb (2.718) (1189693) on Tuesday July 22, 2014 @03:15PM (#47510173)

    1) Arrogance. You know that average developers have a hard time with some kinds of code, but you're a superprogrammer, and you don't have those problems. If someone decides later that there's something wrong with your code, well, they should've gotten their requirements straightened out before they told you to go and build it. The only time you lose your cool is when you have to deal with idiot managers, analysts, or users.

    2) Complacency. You've been pounding on this code forever, and you just don't care any more. Yeah, there'll be bugs, people will yell, they'll get fixed. That's just the way development goes. Why get worked up about it?

  • lol what the f*ck!
    I quote from the Microsoft research paper:
    "In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult."

    15 developers is enough to reach a
  • To be hooked up to some device while we work to measure how likely we are to wind up at the top of the stack rank. It could be completely automated, if it determines you just wrote sucky code it generates a pink slip email and a robot carries you out the door. Instant better code!
  • While this information can be useful to software developers to help them optimize their own performance, it will likely prove detrimental to provide these metrics to managers or production supervisors, as they typically choose only to intensify the workload to improve efficiency. I would like to see more people tested, less emphasis on productivity, and more emphasis on how we can use this data to improve the experience of coding for the programmer. That would probably result in better code more rea
  • ...how many swear words are in the comments.
  • *Microsoft* is working on intrusive software to predict buggy code? I can do that without software - just point at the Microsoft campus, and any and all products....

                            mark

  • So, Microsoft Research has developed a method to tell when a programmer is in a condition that tends to create bugs. That's nice. What happens with this?

    I already know when I'm in a condition that tends to create bugs. It won't help there. It could be passed on to others, such as management.

    Now, is management going to take action to reduce the amount of time I'm more vulnerable to causing bugs, by improving the office environment or discouraging overtime or making reasonable deadlines? Or is manag

  • It's simple: Just install a program on your developers' computers that tracks how often (how many times in general, and for how long) the developer switches focus away from their IDE. If they're constantly googling, looking up reference docs or algorithms, etc., chances are they are doing something that's new, untested, uncharted territory for them. If they're just rattling off hundreds of SLOC at a time, while only needing IntelliSense as an aide, chances are most of it will work on the first attempt.

    Progr
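The focus-tracking idea above could be sketched roughly as follows in Python. Everything here is a hypothetical assumption for illustration: the `FocusEvent` record, the application name `"ide"`, and the session end time. Actually capturing focus changes would require OS-specific APIs that are not shown. The sketch only counts switches away from the IDE and the total seconds spent elsewhere.

```python
from dataclasses import dataclass

@dataclass
class FocusEvent:
    t: float   # seconds since session start when this app gained focus (hypothetical)
    app: str   # name of the application that gained focus (hypothetical)

def away_from_ide(events: list[FocusEvent], session_end: float,
                  ide: str = "ide") -> tuple[int, float]:
    """Return (switches away from the IDE, total seconds spent outside it).

    Assumes events are sorted by time and each app keeps focus
    until the next event (or until session_end for the last one).
    """
    switches = 0
    away = 0.0
    for i, ev in enumerate(events):
        end = events[i + 1].t if i + 1 < len(events) else session_end
        if ev.app != ide:
            away += end - ev.t
            # Count only transitions that leave the IDE.
            if i > 0 and events[i - 1].app == ide:
                switches += 1
    return switches, away

log = [FocusEvent(0.0, "ide"), FocusEvent(100.0, "browser"),
       FocusEvent(160.0, "ide"), FocusEvent(400.0, "docs"),
       FocusEvent(430.0, "ide")]
print(away_from_ide(log, session_end=600.0))
```

For this made-up log the developer left the IDE twice (60 s in a browser, 30 s in docs), so the result is two switches and 90 seconds away; how such numbers would map to "uncharted territory" is the commenter's conjecture, not something this sketch establishes.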

