Researchers Test Developer Biometrics To Predict Buggy Code
rjmarvin writes: Microsoft Research is testing a new method for predicting errors and bugs while developers write code: biometrics. By measuring a developer's eye movements, physical and mental characteristics as they code, the researchers tracked alertness and stress levels to predict the difficulty of a given task with respect to the coder's abilities. In a paper entitled "Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development," the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor to 15 developers as they worked on various programming tasks. Biometrics predicted task difficulty for a new developer 64.99% of the time. For subsequent tasks with the same developer, the researchers found biometrics to be 84.38% accurate. They suggest using the information to mark places in code that developers find particularly difficult, and then reviewing or refactoring those sections later.
Why is it always developers? (Score:3)
Every time I hear about a terrifyingly invasive means of "improving performance" it's targeted at developers. Is it just selection bias, or does the world actually hate us?
Re:Why is it always developers? (Score:4, Insightful)
The world hates putting up with buggy code.
Re: (Score:2)
On a more serious note, a single developer mistake can potentially affect millions of end users (in the case of an application like Windows). Therefore it makes sense to focus on the developers. "With great power comes great responsibility" and all that.
Re:Why is it always developers? (Score:5, Interesting)
Re: (Score:2)
Of course they get measured. In the long term if they deliver too many screwed up projects, their superiors stop giving them projects.
Ultimately it is the developer's responsibility to push back against stupid managers and give them honest feedback about what can and cannot be done.
Re: (Score:1)
I would like to know where the entrance is located to this magically meritocratic land you speak of. It is obviously not Earth.
Re: (Score:3)
Re: (Score:2)
Sadly, in my experience, they blame it on their team, and get promoted.
Bad managers are surprisingly good at making it look like someone else's fault.
Re: (Score:3)
It's because software sucks, and no one has any real idea what to do about it.
You are more right than you know. Where writing software is a skill that most can develop, the really good developers are more a cross between engineers and artists. They are more like architects, where the form and function are both of high importance because having software that "works" (in that it does everything required [engineering]) and having software that is "workable" (in that it is easy to use [artist]) are worlds apart. Finding developers that do both engineering and art is rare.
It's not just
Developers as novelists (Score:2)
Agreed.
When talking with non-developers about developers, I use the simile that developers are like novelists, who work out stories in their heads, and commit those stories to paper.
A novel contains a set of symbols which, taken collectively, and written correctly, form an impressive body of knowledge that can change the world. (Tolstoy's "War and Peace" is my usual example.)
But if the symbols are faulty -- if the book is badly wri
who the hell wrote this crap?!! (Score:5, Funny)
oh wait, that was my commit
Re: (Score:2)
Re:Why is it always developers? (Score:5, Insightful)
They can understand how a toilet is cleaned, how a sale is made, how a 1099 is filled out, how a fire drill works, how a sandwich is put together, how oil is changed, etc... but Coding might as well be a dark art.
Developers are part of a very narrow segment which has no reliable Key Performance Indicators. [wikipedia.org]
Part of that is developers are smart enough to game any system, because they can think in algorithms.
Want to track productivity on Lines of code? Fine, Developers can do NO WORK, and produce TONS of code
Want to track productivity on Number of defects introduced? Fine, doing NO WORK is the baseline for perfect.
Want to track productivity on Number of defects fixed? Fine, through the magic of hand wavery, defects can be found and fixed with no actual work happening
Compare that to well-defined Key Performance Indicators for sales... Bring in X dollars of sales, your performance is X.
CEOs HATE things they cannot measure... which means CEOs are a natural enemy of Developers.
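The lines-of-code point is easy to demonstrate. Here is a minimal sketch, assuming a crude SLOC counter and two made-up implementations of the same function; `count_sloc`, `load_total`, `CONCISE` and `PADDED` are hypothetical names invented for this illustration, not anything from the study.

```python
import textwrap


def count_sloc(source: str) -> int:
    """Count non-blank lines that are not pure comments (a crude SLOC measure)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


# Two hypothetical implementations with identical behavior.
CONCISE = textwrap.dedent("""
    def total(values):
        return sum(values)
""")

PADDED = textwrap.dedent("""
    def total(values):
        result = 0
        index = 0
        while index < len(values):
            current = values[index]
            result = result + current
            index = index + 1
        return result
""")


def load_total(source: str):
    """Execute a snippet and return the `total` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


if __name__ == "__main__":
    for name, src in (("concise", CONCISE), ("padded", PADDED)):
        print(name, "SLOC:", count_sloc(src), "total([1, 2, 3]):", load_total(src)([1, 2, 3]))
```

Both versions return the same answer, but the padded one scores four times higher on a SLOC metric, which is the kind of gaming described above.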
Re: (Score:3)
You do realize that we've done that over and over. We keep coming up with programs that do code generation. We introduced automatic programmers that represented machine codes with mnemonics and kept track of memory locations (these are now called assemblers). We introduced programs that would take scientific calculations (FORTRAN) and business logic (COBOL). We introduced "fourth generation languages" in the 1980s. All of these, and many more, were honest attempts to remove the need for programmers.
Re: (Score:1)
Your simplification of gaming the system can be applied to sales people too. It is not unheard of how some product manufacturers sell stuff to distribution channels and count it as sold, despite living in a warehouse or shop. Or how you can be an excellent talker and make people pay more for stuff they don't need (oh, this phone? no, you want the 5" screen!), but this can have detrimental effects, maybe the user pays more, feels cheated, and never comes to buy from the brand despite other products matching
Re: (Score:2)
I am thinking of a company* that sold customizable software to large customers who demanded the ability to customize. It also sold new features, which would have to be created by the development staff. At one point, the sales and development staff wound up in different cities, so the technical people weren't able to sit on the overenthusiastic sales people.
The sales people found that they could sell new features, and (a) it made it easier to close the sale, and (b) custom development cost money, so it
Re: (Score:2)
You see, as smart as the typical developer thinks he is, it doesn't take much for someone with real skills to bury them
Programming CAN be judged (Score:3)
They can understand how a toilet is cleaned, how a sale is made, how a 1099 is filled out, how a fire drill works, how a sandwich is put together, how oil is changed, etc... but Coding might as well be a dark art.
Disclaimer: I am in hardware myself and may completely miss the point here. However, our software/firmware folks do agile programming involving dividing programming problems into pieces which are assigned to programmers, followed-up on large whiteboards and being daily discussed in "scrum meetings" etc. (I may be confusing some concepts here but that is of less importance). The point being that your statement, that programming is some sort of unique dark-art-which-cannot-be-measured-by-managers, appears unt
Re: (Score:3)
i hit 50 last sunday and been a developer since i can remember, and i still love my profession but the "guild" has changed an awful lot, from once being a peculiar bunch to the herd it mostly is today. most of my colleagues are much younger than me and ... what can i say ... they are often so brainwashed with corporate bullshit they break my heart almost daily. holy shit, they even blog about it! it's so depressing, it makes me wanna cry. :'(
sign'o'times, i guess. i can perfectly believe many of those sheep
Re: (Score:2)
I'm 40 now. I remember the late 1990s when I was young, as was everyone around me, and at a non-public facing reinsurance company, we had extra staff just doing pie-in-the-sky stuff no one was ever going to see. We got a lot done via inherent competence, I realize now that we were lucky, and we had budget.
In the early 2000s I led the design and development of a SOA rewrite of an existing VB6 app. We had an IDesign consultant come in for a week to get us started, which was invaluable; but it was through lu
Re: (Score:3)
Every time I hear about a terrifyingly invasive means of "improving performance" it's targeted at developers. Is it just selection bias, or does the world actually hate us?
Mostly because they are a newer profession and a trickier one to quantify.
Time and motion studies, along with 'scientific management', were already a serious hit in terrifyingly invasive performance enhancement for blue collar labor around the turn of the 20th century (Taylor and the Gilbreths being the poster children, with many successors). The workers who haven't been replaced by robots yet are likely still subject to a descendant of it. Though less amenable to automation, service sector jobs are also r
Re: (Score:2)
I'm truly sorry, but an IT union isn't happening, until at least my generational cohort is out of the system.
A. Too many libertarians.
B. Too many people convinced of their own prowess and respect
C. None of us are at much physical risk
D. We get quite a bit more than a living wage, in general
Those factors add up to an insurmountable barrier, even if I personally think the idea is wise.
Re: (Score:2)
Is there any company on earth that treats its customers with more contempt than Microsoft?
Comcast? AT&T? Anyone associated with the MPAA/RIAA?
Re: (Score:2)
Every time I hear about a terrifyingly invasive means of "improving performance" it's targeted at developers. Is it just selection bias, or does the world actually hate us?
You really think developers are singled out here? There is an entire industry built upon business process improvement and operational management. I guess there probably are more invasive measures against professions like developers, but that is only because it has been hard to completely replace the profession with machines.
When you start having your bathroom breaks timed by your manager like some retail workers then you have it rough.
Hooked up to all the equipment (Score:2)
the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor
I'm sure being hooked up to something similar to a polygraph doesn't make a developer more stressed at all. Was the fact that they had all this equipment hooked up to them a factor in their statistics?
Re: (Score:2)
Checking the methodology section of the paper, they didn't feel it was necessary to include any sort of experimental control.
Now it can be hard to come up with controls for this sort of experiment, when you test the ability of an algorithm that looks for fairly nuanced data, like "where in this block of code might there be bugs?", but the conclusion should at least have mentioned that the method wasn't compared against other methods.
Re: (Score:2)
Was the fact that they had all this equipment hooked up to them a factor in their statistics?
Yes. When they compared their measurements against the measurements they gathered from the people they didn't measure, they took that bias into account.
Calibration (Score:1)
I'm writing a multithreaded networking program in ASM that needs to use very little memory, hence complicated register magic and blatant violations of calling conventions.
The sensors are going to freak the fuck out.
Re: (Score:2)
Wonderful....This won't be good... (Score:2)
Now my boss is going to be watching the developer's eye movements instead of testing code... This will not end well.
There is no magic bullet and where this might find the sections of code that your developer finds difficult to understand, it still isn't going to give you any idea about the quality of the code they produce. All you will know is how hard they concentrated when producing it.
I remember when we watched SLOC, but it was of marginal value. Then it was logical edges and complexity which was somet
Here's a better idea (Score:2)
How about instead of spending a small fortune solving the handful of bugs caused by programmer typos, you spend that money on better requirements gathering, keeping specifications from changing constantly, and giving programmers time to actually unit-test and document their code?
If you want some fancy tech so you can write a paper on it, make an "electro-stimulus behavior moderation band", strap it onto clients/managers, and give them fifty thousand volts whenever they say or do something stupid.
Comment removed (Score:5, Insightful)
Re: (Score:2)
1. downtime is unacceptable for this application. this code controls so much, does so many things, and is so obscure (say it with me, payments processing subsystem) that to do ANYTHING to it is literally worse than pistol whipping the CEO's daughter.
Then you can't afford not to have a backup server and a development server. This point needs expansion :p
This and more (Score:2)
Once again we have some big sister/brother company/government claiming that they can do the impossible with biometric data. They don't address the primary source of the problems, which you lay out in detail.
Why was security skimped on in the code? Funding.
Why did funding get dropped? So that someone could get a bonus.
Who was the person that had the demo code for security? Canned to save budget.
Can't our Outsource code it? Not in their contract or business statement.
None of those issues are the coder
64.99%, 84.38%, Really? (Score:5, Interesting)
They tested 15 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Mr. Spock on Star Trek?
Re: (Score:3)
They tested 15 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Mr. Spock on Star Trek?
Naw, they just used a really accurate ruler, made each measurement 10 times and averaged their results...
You make an excellent point. There is no indication in the fine article about how accurate their results could be statistically, and given their really small sample size it doesn't seem likely 4 significant digits is justified.
Re:64.99%, 84.38%, Really? (Score:4, Funny)
I think the extra bogus precision has something to do with the conversion between Imperial and Metric developers.
(Oh, and something something dark side.)
Re: (Score:2)
While I agree the extra precision is misleading, the real problem is that they don't provide confidence intervals. Even a rounded 80% could be misleading if it could fall between 60% and 90% due to sparse data.
On the other hand, if in their report details they said "84.38% with a 99% confidence interval of 72.27% to 94.49%", then the extra precision is no longer necessarily misleading (it is just the calculational result of the model used) and, although it is a little pedantic and redundant, I would hav
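To make the confidence-interval point concrete, here is a minimal sketch of a Wilson score interval for a proportion. The counts are an assumption: 27 correct out of 32 is used only because it rounds to 84.38%, and the thread does not say how many predictions the paper's figure was based on. `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, trials: int, confidence: float = 0.95):
    """Return (low, high) bounds of the Wilson score interval for a proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 27/32 = 0.84375, which rounds to 84.38%.
    low, high = wilson_interval(27, 32, confidence=0.95)
    print(f"point estimate: {27 / 32:.2%}, 95% interval: {low:.2%} to {high:.2%}")
```

With a sample this small the interval spans tens of percentage points, so the third and fourth significant figures carry no real information on their own.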
Re: (Score:2)
I highly doubt you can write a non-trivial C or C++ program without bugs, or really any language for that matter. I'm not talking about thousands of lines either. 100 or so should do it. The fact that you don't mention any kind of requirements spec, perhaps with the aid of some CASE tools, or at least a testing and feedback method, c
This is offensive (Score:2)
Re: (Score:2)
They are saying developers are the source of bugs (mistakes?), but not in the way you are suggesting. Developers are the source of bugs in that they write the code which includes the bug, and so it is not particularly surprising that you can read biometrics that indicate when developers are likely to produce code with bugs.
For example, if the developer hasn't slept in two days and so has saggy eyes and wildly drifting eye movement then that's a pretty good indicator that there will be some bugs, and indeed
Re: (Score:2)
when developers talk to product managers? (Score:1)
Foreign workers (Score:2)
Microsoft Research should also track how far the individual is working away from the main office of his company, because that has far more of an effect on bugs than any biometric reading. I recommend that they develop a special laser and a series of geostationary satellites and ground repeater stations. The total round trip time of the laser pulse will be a measure of how buggy the developer's code is.
1) Microsoft Research is wasting an awful lot of money to conclude that the reason why Microsoft's software
Analyze difference in performance/productivity (Score:1)
uh huh (Score:3)
Example? [businessinsider.com]
That's where we're at?
I see this garbage all the time... and I never understand it. Growing up my father ran a factory and was damned good at it. His people showed up on time, did great work, had low scrap and were very productive. How did he manage this amazing feat? Minders that followed you everywhere in the factory? Discreet blood samples taken hourly? No...
They had stats. That's it. You produce X parts per week. Go way above that, get a bonus; go way below that, you get fired. If everyone is getting bonuses the stats would rise... if everyone was getting low stats they'd first check for procedural problems that might be hindering people's work and, barring that, they'd lower their expectations. It worked marvelously well... people would think of ways to go faster and bring them up... because it meant a bonus until everyone else caught on. Anything that made production harder was immediately reported because people wanted their bonus.
Damn near every successful factory in the country works this way. Do the same thing for code. What my father always said was "They could spend 7hrs in the bathroom every day... I don't care... if they can come out and hit 1000% efficiency that last hour to make up the difference it's fine with me. But they better keep in mind their peers are going to eventually figure out how they pulled it off and change the curve."
Re: (Score:2)
The problem is that while widget A can be treated as fully equivalent to widget B, it's hard to compare one line of code to another line of code from a completely different program.
If good programmer just means "spits out more lines of code than anyone else", I can write all sorts of gobbledygook that adds lots more code without really participating in the actual purpose of the program.
Re: (Score:2)
And what stats do you apply to code development?
Because quite frankly, that is the gist of the problem.
Re: (Score:2)
You simply use a longer timeline.
Nobody worth their skin judges a poker player by who won the last pot -- but over time I can measure quite well who the good ones are.
Re: (Score:2)
And you've still avoided naming any metrics....
Number of lines of code? As mentioned earlier, one can easily inflate LOC with trash.
Also how do you evaluate a programmer who actually reduces the lines of code in a program? By the LOC metric, said programmer is counter productive. Then again you get the beautiful quote by Ken Thompson... "One of my most productive days was throwing away 1000 lines of code."
Code quality? Once again, how do you judge it?
Re: (Score:2)
I'm not avoiding naming metrics.
More analogies. You can judge an entry level grade-school musician by notes missed. He gets credit for showing up to class and not tooting his horn (literally) during the rests. Advancement beyond that requires someone who knows what a good musician sounds like, the subtleties and nuances in his play and style.
You can judge entry-level programmers by lines of code, errors in their code, and other simple metrics. You take those numbers day by day, and like the poker player ab
Re: (Score:2)
Writing code is nothing like working in a factory producing identical components. It's more like designing a house, followed by an office building, a bridge, a power drill, a pacemaker, a roller coaster, a lunar lander, ...
If every developer is doing their job perfectly, they will literally *never* write the same code twice. Every single task they do will be different from all of their previous tasks. So how do you measure their output?
This misses two of the biggest developer problems: (Score:4, Interesting)
1) Arrogance. You know that average developers have a hard time with some kinds of code, but you're a superprogrammer, and you don't have those problems. If someone decides later that there's something wrong with your code, well, they should've gotten their requirements straightened out before they told you to go and build it. The only time you lose your cool is when you have to deal with idiot managers, analysts, or users.
2) Complacency. You've been pounding on this code forever, and you just don't care any more. Yeah, there'll be bugs, people will yell, they'll get fixed. That's just the way development goes. Why get worked up about it?
Fixed It For You (Score:2)
lol (Score:1)
I quote from the Microsoft research paper:
"In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult."
15 developers is enough to reach a
Just what we need (Score:1)
Interesting but misguided (Score:1)
My metric to predict buggy code... (Score:1)
Let me get this straight... (Score:2)
*Microsoft* is working on intrusive software to predict buggy code? I can do that without software - just point at the Microsoft campus, and any and all products....
mark
And now what? (Score:2)
So, Microsoft Research has developed a method to tell when a programmer is in a condition that tends to create bugs. That's nice. What happens with this?
I already know when I'm in a condition that tends to create bugs. It won't help there. It could be passed on to others, such as management.
Now, is management going to take action to reduce the amount of time I'm more vulnerable to causing bugs, by improving the office environment or discouraging overtime or making reasonable deadlines? Or is manag
You don't need biometrics for that (Score:1)
It's simple: Just install a program on your developers' computers that tracks how often (how many times in general, and for how long) the developer switches focus away from their IDE. If they're constantly googling, looking up reference docs or algorithms, etc., chances are they are doing something that's new, untested, uncharted territory for them. If they're just rattling off hundreds of SLOC at a time, while only needing IntelliSense as an aid, chances are most of it will work on the first attempt.
Progr
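The focus-tracking idea above could be sketched roughly as follows. This only covers the counting step, assuming some other tool has already produced a time-ordered log of which window gained focus; `FocusEvent`, `focus_stats` and the window names are hypothetical, and no real OS focus-tracking API is used here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FocusEvent:
    timestamp: float  # seconds since the start of the session
    window: str       # application that gained focus at this moment


def focus_stats(events: List[FocusEvent], ide_name: str, session_end: float) -> Tuple[int, float]:
    """Return (switches away from the IDE, total seconds spent outside it).

    Assumes `events` is sorted by timestamp and that each window keeps
    focus until the next event (or until `session_end` for the last one).
    """
    switches_away = 0
    seconds_away = 0.0
    for i, current in enumerate(events):
        end = events[i + 1].timestamp if i + 1 < len(events) else session_end
        if current.window != ide_name:
            seconds_away += end - current.timestamp
            # Count a switch only when focus moved directly from the IDE.
            if i > 0 and events[i - 1].window == ide_name:
                switches_away += 1
    return switches_away, seconds_away


if __name__ == "__main__":
    # Made-up session log for illustration only.
    log = [
        FocusEvent(0, "ide"),
        FocusEvent(60, "browser"),
        FocusEvent(90, "ide"),
        FocusEvent(200, "browser"),
        FocusEvent(210, "docs"),
        FocusEvent(260, "ide"),
    ]
    switches, away = focus_stats(log, ide_name="ide", session_end=300)
    print(f"switched away {switches} times, {away:.0f}s outside the IDE")
```

Whether a high switch count actually predicts bugs is the commenter's conjecture; the sketch only shows that the two numbers are cheap to compute once a focus log exists.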