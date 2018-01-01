Which Programming Languages Are Most Prone to Bugs? (i-programmer.info) 93
An anonymous reader writes: The i-Programmer site revisits one of its top stories of 2017, about researchers who used data from GitHub for a large-scale empirical investigation into static typing versus dynamic typing. The team investigated 20 programming languages, using GitHub code repositories for the top 50 projects written in each language, examining 18 years of code involving 29,000 different developers, 1.57 million commits, and 564,625 bug fixes.
The results? "The languages with the strongest positive coefficients - meaning associated with a greater number of defect fixes are C++, C, and Objective-C, also PHP and Python. On the other hand, Clojure, Haskell, Ruby and Scala all have significant negative coefficients implying that these languages are less likely than average to result in defect fixing commits."
Or, in the researchers' words, "Language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages static typing is also somewhat better than dynamic typing."
Brainfuck
I have never heard of a large-scale production problem happening in an application written in brainfuck. So by that metric it is not really error prone.
In before Fractal of Bad Design (Score:3)
You already have to be a genius to understand functional languages, so of course those people make fewer mistakes.
I love it when functional fans insist it's more analogous to how the brain really thinks. That's why so few people can figure out how to do things that way.
I don't know about that. As originally an assembly language programmer, I had greater difficulty with Pascal than with Scheme.
I think the concept of code and data being equivalent was an easier transition from assembly to Scheme.
Assembly to C was quite easy as well.
Pascal and ANSI C are very similar, but pre-ANSI C is a completely different beast, far more similar to BCPL. In fact, ANSI C could almost be described as Pascal with C syntax.
Pre-ANSI C didn't have prototypes - it assumed any undeclared name was an external function. It didn't automatically convert int to long if the function expected it, etc. - you had to explicitly cast. You had to be careful to cast results of functions correctly, too. All it had was a set of rules for how argument types were stacked, and it was up to you not to pass something a function wasn't expecting. This is closer to assembly language programming than Pascal or ANSI C.
And one way around it was to declare the function before it was used - that did help a bit sometimes. Not always, but sometimes - and depending on compiler.
But Pascal, C, Java, C++ and Ada all belong to the same programming paradigm, with or without the object orientation twist.
Then we have Basic and Fortran which sometimes have the paradigms from Pascal et al. but also other ways not related to them.
Cobol is in turn an animal unto itself.
And then you can turn to Erlang, Haskell and Prolog for yet another way.
... Pascal, whose sole flaw is using verbose keywords like "begin/end" instead of the concise "{ }".
ROTFL!
http://www.lysator.liu.se/c/bw... [lysator.liu.se]
Compared to how horrible the original C compilers were (no type checking of parameters), Pascal is a huge improvement.
Let's look at the issues discussed in the paper:
That is very much like saying that because Microsoft Visual Studio lets you write in C++.NET, there has been a lot of unfair badmouthing of C++. Pointing to one vendor's proprietary extensions (which essentially make a new language) doesn't make the general complaint wrong.
So you are saying that the years since BWK wrote that article have given us even more reasons to dislike Pascal, such as the fact that the only versions that are useful are either dead for 20+ years (Turbo Pascal) or need vendor-proprietary extensions (Delphi)?
Re: (Score:3)
Which is hardly surprising, since C is just PDP11 assembler tidied up a bit.
Which is interesting considering the PDP11 was "a hardware Fortran machine", and the i386 architecture is a close copy of the PDP11 - the 286 even copied the early (variable page size) PDP11 memory management scheme that was the fashion before people figured out it was not really good for virtual memory, and then the 386 copied the later (fixed page size) PDP11 MM which is needed for virtual
Nope.
http://csapp.cs.cmu.edu/3e/doc... [cmu.edu]
The i386 architecture is not a close copy of the PDP11. You might be thinking of the 68000 which is a more plausible candidate.
C has been ported to most recent processors mainly because it was needed. The fact that it is relatively easy to port gives the lie to your assertion that it is "Macro-11 tidied up".
And, by the way, RISC sort of has displaced CISC. Modern CISC processors like the x86 and
Different languages allow for different language-specific bugs.
However, the worst bugs aren't the language-specific bugs. They are system design bugs, where the designer doesn't understand the business case to be solved.
The programming language bugs are usually found either by hand or by tools like FindBugs, Splint and other similar tools. But system design bugs are all in the brain of people - and some people have a very strange brain wiring resulting in "perfect solutions to nonexistent problems".
Functional programming is not more complicated than imperative programming styles.
But unfortunately functional languages like Haskell often have a strange syntax, that is all.
And my 50 line Perl version is much more concise than both of those, but my co-workers keep complaining that it's actually modem line noise!
Because being concise in a programming language is a secondary benefit of maintaining software. The primary benefit is the ability to come back and modify it, without having to start an effort as great as that of creating the software new. Perl is fast in the wrong parts (the typing) and slow in the wrong parts (thinking). That's why it sucks, and when you get tired of code golf, you too can join the rest of the software developers.
That was the joke. If you were less of a humorless scold, maybe you would have realized that the bit about co-worker complaints was a flag to indicate where the costs of that tradeoff come in.
One occasionally does have use for write-only source code, where the source code is small enough that changing the purpose of the program would inherently require rewriting such a large proportion of the code that it is just as efficient to start from a blank file, but that is seldom going to be the case for even a 50
Most people need unnecessary small changes in program flow with copious context, and preferably description in intermediate variables and functions. This was at the heart of the Perl exodus and Lisp excommunication.
Not sure about Perl but Lisp thrives on intermediate variables and functions. You most likely don't want to write large expressions in Lisp without intermediate variables, unless those expressions are auto-generated (which, admittedly, is a programming style in itself).
2018
Is that really the case? GNAT is part of GCC isn't it, which allows you to use the output and the libraries without counting as a derived work.
Mainstream languages, duh (Score:1, Insightful)
" On the other hand, Clojure, Haskell, Ruby and Scala"
Yeah, those are niche languages. Good, autistic programmers seek out niche languages and write clean code since they're the only ones working on it.
The mainstream languages are the ones you do at work, with a couple shitty coworkers, and an endless amount of scope creep and impossible deadlines that creates a spaghetti nightmare.
I know this is /. and RTFA is simply not done, but if you read the sample, you would see that Ruby was 2nd in developer count (after C), 4th in project counts (JS, Python, C) and 5th in commits. Ruby represented 9.6k of the 28k developers; about a third. So, not so niche. In fact, C and Ruby together represented 22.4k of the 28k.
Java had about a third as many developers as Ruby with the same project count.
What is interesting is that Typescript had a 0.15 coefficient compared to Javascript's 0.03, meaning it
Yep, that was my thought when I read the article. Software used by a wide audience will generally require a lot more bug fixes than similar software used by only a few. Users find the damnedest ways to use and abuse software. The more users, the more things they want changed and the more actual bugs they identify.
Face it -- most of the world runs on C, C++ and, increasingly, Python. Of course there are lots of bug fixes. And BTW I loathe C++. IMO C++ code is almost always unreadable except possibly by i
Complexity (Score:2, Insightful)
Or could it be that the software written in C++ usually tends to be large, complex software where performance is important along with various other complicating factors, while the software written in Ruby, for example, tends to be simpler?
Sounds like this 'study' started with a conclusion already in mind.
That was my first thought. It turns out that they ranked popularity by number of stars.
When you think of popular pieces of open source written in each language, what do you think of? Here are the ones from the paper:
C projects: Linux, git, php-src.
Python projects: Flask, django, reddit.
JavaScript projects: Bootstrap, jquery, node.
Java projects: Storm, elasticsearch, ActionBarSherlock
If you didn't know, ActionBarSherlock is a piece of Android infrastructure. Given that, these all seem very reasonable.
How abo
ActionBarSherlock is a replacement for a piece of Android architecture to backport functionality that existed in 4.0+ to 2.2+ (or thereabouts). It's been deprecated for years, as nobody writes for those old versions, and Google has been releasing their own backporting libraries for years. So no, it's not reasonable.
Fair enough, I didn't know that. Still, as Meat Loaf said, 2/3 ain't bad.
Re: (Score:3)
1993 just called and wants its C vs. C++ flamewar back.
Be fair. The argument still made sense before C++11 or so.
C++ is generally faster than C these days, though of course you can fuck it up like anything in C++ if you don't know what you are doing, or do and are just trying to "prove" your incorrect point.
Re: (Score:3)
Probably because a huge amount of C++ code is in massive embedded systems which are closed source and you can't see the code. I can't either in the general case, but in the cases I have seen, the quality of code is roughly what you would expect from the quality of the team. (In all probability, larger teams will need one or more specialists to do the more complex
a huge amount of C++ code is in massive embedded systems
You mean the systems that use "C++" instead of C++? As in, with significant restrictions? I'm reasonably sure the quality of those has more to do with the discipline of the programmers than with anything else.
Or could it be that the software written in C++ usually tends to be large, complex software where performance is important along with various other complicating factors, while the software written in Ruby, for example, tends to be simpler?
Sounds like this 'study' started with a conclusion already in mind.
Yeah. Another possible conclusion from this data is that C++ is more commonly or easily debugged, and thus more bugs are found and fixed, where they are left unfixed in the other languages.
No one else has pointed out the old saw that there are three kinds of lies (lies, damned lies, and statistics)? Or that one could -- and someone actually did -- literally write a book on "How to Lie with Statistics"?
Any reasonably aged language (Score:1)
Despite every example online... it is possible to write clean Perl. That said, I've not seen any reasonable DBI example. It's all the same bad garbage over and over. This led all the Python followers to declare there is no good Perl and no bad Python.
Several years later, indeed there is now lots of bad Python in existence! As time marches on there are even more examples of this behavior.
I would argue any language of sufficient age will have an equally large sum of poor examples.
Languages can sing and it tak
Haskell and C++ programmers are different. (Score:4, Interesting)
Something the linked article didn't seem to address is that the population for each language will differ. The average Haskell programmer is going to be very different from the average C++ programmer, or, god forbid, the average Python programmer.
Also, while they did try to address problem domains, I don't think they addressed systemic issues. For historical reasons, there are many projects which use C or C++ simply because of what they need to interface with to get the job done. For instance, there simply aren't going to be that many browser projects which aren't written in C++.
Personally, I think the interesting take-home is not the difference between languages, it's how small the number of commits for security and memory issues was.
*Difficulty of task* (Score:1)
More like difficulty of task.
If you're coding in C, you can pretty much guarantee you're doing something low level, complex, threaded and difficult.
C++ and it could be some major app.
If you're using a nice fluffy wrapped language, it will often be used for some office 'form' style application.
The idea that bugs stem from the programming language and not from the complexity of the task being tackled is bogus.
Haskell and C++ programmers are different.
Also, while they did try to address problem domains, I don't think they addressed systemic issues.
I don't think they do: none of them have things like zero overhead abstractions, zero cost memory allocation and so on. And some of them (like Go) lack the kind of abstractions present in many modern languages.
For instance, there simply aren't going to be that many browser projects which aren't written in C++.
Of the three remaining extant engines (Firefox, WebKit/Blink, and Edge/Trident), all except Firefox are written in C++. Firefox is partly Rust now.
Rust I think is one of the very very few languages aimed at the same problem domain as C++ by people who understand enough C++ to know what the problem domain was. Look for example at Pike's rants on Go and how it was designed to replace C++ and didn't: many C++ programmers skimmed the features and said something like "oh that'll make my program slower, more verbose, buggier and harder to write". Rust on the other hand is the same machine model as C++ but with a very very different type system.
It's never going to replace C++ across the board, that's for sure, but it's proven capable of replacing C++ in a niche where formerly there were no contenders.
I'd venture that the small number of commits for security issues is because many developers 1) don't mark issues as security issues (security not being foremost in their mind) and 2) can't recognize issues as affecting security (which is even scarier).
Conclusions only valid on Open Source Projects?
This is an interesting study, but I don't know if the results can be extrapolated to include closed source software.
My problem with this is that I don't see any evidence of:
a) Projects in the study have a published project plan with somebody managing it at a high level (I would think the Linux Kernel could be thought of as having a plan with strong central management). I tend to believe that projects to which multiple individuals contribute (with varying levels of understanding of the software, the app's background and issues experienced during development) would be at a much lower quality level than something managed by a strong, continuous team. This doesn't seem to be a consideration when I RTFA (popularity of projects seems to be a bigger issue).
b) Different development tools used by different developers. In terms of the C/C++ typing issues, Windows software developed and built in Visual Studio, Eclipse Text Editor with MinGW or something like Komodo Edit with Cygwin and user written make files will identify different typing issues and may generate code that works differently, especially in regards to identifying and handling typing issues. I would like to know how many bug fixes are the result of something that isn't flagged and works fine on VS and doesn't work when built in MinGW, leading to a fix.
b.1) I'm not 100% sure of the methodology used in this study, but wouldn't a file that originally had tabs for indentation, which an editor automatically converts to spaces, be misidentified as a "fix" if it's uploaded back into the repository? This is a combination of b) and c).
c) Different coding styles. I know of several Open Source projects in which a developer has re-formatted code simply because they don't think it's in the "correct" style and they have difficulty reading it resulting in them changing it so they can follow it better. To be fair, I'm sure a lot of us have done that because some people have very different and strongly felt ideas about how code should be formatted.
d) Lack of formal testing methodologies. I don't think many Open Source projects have strong, automated regression testing processes and methodologies before allowing a new release.
e) Difference in functional use of different languages. I would think that methods written in C, C++ and Objective-C would be providing more low-level functionality than Clojure, Haskell or Scala. Ruby probably fits somewhere between the two groups.
Comments?
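On point b.1), the thread does not say how the study classified commits, so the following is only a rough sketch of how one might screen out whitespace-only changes before counting a commit as a fix. The function name `is_whitespace_only_change` is made up for illustration.

```python
def is_whitespace_only_change(before: str, after: str) -> bool:
    """Return True if two versions of a file differ only in whitespace.

    Assumption for this sketch: whitespace inside a line never matters.
    That is wrong for indentation-sensitive languages such as Python,
    so treat the result as a heuristic, not a verdict.
    """
    def normalize(text):
        # Collapse every run of whitespace on each line, so tabs vs.
        # spaces and trailing blanks compare as equal.
        return [" ".join(line.split()) for line in text.splitlines()]

    return before != after and normalize(before) == normalize(after)
```

For example, a file whose only change is `\treturn 1` becoming four spaces plus `return 1` would be flagged, while a change from `return 1` to `return 2` would not.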
2) C++
3) PHP
4) JavaScript-based frameworks
5) Anything used to write an Excel or Word macro by the HR department
This is an unfair comparison: PHP specifically targets producing buggy products, and in the unlikely event that an HR department gets anything to work, it is even more unlikely to involve a computer.
Do JavaScript problems actually get fixed? Based on my experience with our ever deteriorating Internet, bugs in JavaScript "programs" live on forever.
Python
I know I'll get flamed for this, but Python is really error-prone in a particular area, and that's its ridiculously weak name resolution rules. In a language like C, Perl, or even PHP, names are resolved during the compile phase. The compiler knows which definition of a name is going to be used at any point. Python doesn't have this - when it runs across a name, it walks up the scope hierarchy looking for a candidate.
This means that code can run happily for months or even years, until it just crashes with an undefined name error. This could be because of a rarely-used code path with a typo in it, botched refactoring of a rarely-used code path, or a particular set of rare circumstances where a global name isn't set before the code gets to a certain place.
The usual response is that unit tests should catch this. But let's face it, 100% unit test coverage is pretty rare, particularly for the kind of fast turnaround stuff that Python's frequently used for. Also, unit testing isn't necessarily going to simulate a corner case where a global doesn't get set before code that uses it executes. It also makes refactoring more risky because there's no point where the compiler can tell you you're referencing a name that's no longer defined, or no longer has a certain method/field.
This is the kind of area where it's really useful if the compiler can help you, and Python's ridiculously weak name resolution rules make that completely impossible.
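A minimal sketch of the failure mode described above, assuming a hypothetical `handle` function with a typo on a rarely used branch. All names here are illustrative only.

```python
def handle(request_kind):
    """Return a status string; the rare branch contains a typo."""
    if request_kind == "common":
        status = "ok"
        return status
    # Rarely used path: 'statsu' is a typo for 'status'. Python looks the
    # name up only when this line executes, so importing the module and
    # calling the common path both succeed.
    return statsu


def try_handle(request_kind):
    """Call handle() and report a NameError as text instead of crashing."""
    try:
        return handle(request_kind)
    except NameError as exc:
        return f"NameError: {exc}"
```

Calling `try_handle("common")` returns `"ok"`, while `try_handle("rare")` hits the undefined name only at that moment. Static checkers such as pyflakes can flag a plain undefined name like this one before runtime, but they generally cannot tell whether a global that is assigned somewhere will have been assigned by the time a given line runs.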
I don't think Python's name resolution is a major source of bugs. But I don't disagree that its name resolution sucks. At least for me using Python 2. I've always assumed that the problem is that I'm too damn dumb to understand whatever brilliant scheme underlies its name resolution/namespace logic.
age of code
Did they control for age of code? For older code you are likely to get more bugs as it becomes more likely no one exactly remembers how it works and the person modifying it introduces some logic error for a weird corner case. Theoretically tests should prevent this, but most projects lack this kind of comprehensive testing and usually just have main case scenario covered with perhaps some common special cases.
Re: age of code
FTFA:
"Project age is included as older projects will generally have a greater number of defect fixes; the number of developers involved and the raw size of the project are also expected to affect the number of bugs and finally the number of commits is bound to."
APL Still A Write Only Language
source http://wiki.c2.com/?AplLanguag... [c2.com]
[6] L(L':')L,L drop To:
[7] LLJUST VTOM',',L mat with one entry per row
[8] S1++/\L'(' length of address
[9] X0/S
[10] LS((L)+0,X)L align the (names)
[11] A((1L),X)L address
[12] N0 1DLTB(0,X)L names)
[13] N,'',N
[14] N[(N='_')/N]' ' change _ to blank
[15] N0 1RJUST VTOM N names
[16] S+/\' 'N length of last word in name
As mangl
Actually thinking about it, it was always easier to just code a new function than try to read someone else's old stuff.
For whatever reason APL always reminds me of Arthur C Clarke's classic story "The Nine Billion Names of God." If anyone ever writes a readable APL program perhaps the stars in the sky will, without any fuss, go out.
Languages aren't error prone. Programmers are
The base concept is bulls**it on its own.
It's more like spoken or written human languages to me:
You need to study, learn and practice before being proficient.
If you think that you need a fast solution, then the language you know the best is among the right solutions.
Assembly isn't more error prone than English.
It just depends on whether or not you are an idiotic programmer or an easy-going speaker.
"Errors should never pass silently"
Python programs can be very self-diagnostic. Something goes wrong, it presents as an exception traceback from an uncaught exception.
A lot of bug reports I get go like this: Someone sends me a screenshot with a traceback, I look up the line of the error, find that the error is obvious, fix it, commit the fix, and I still have time for a cup of coffee before 5 minutes have passed. The reporter may not be happy because they can't get on with their work until I cut a new version, but other than that this sort of bug is of very little consequence: no data files have been corrupted or anything like that.
Then there's the other kind of bug, the subtle kind where everything seems to be working fine, but someone checked the output and it just isn't right: the totals on the report don't add up or something. These are the hard ones. And then you have to dig in and hypothesise and experiment and bisect and so on. Of course those bugs happen in Python programs as well.
But I bet the kind of bugs that put Python over average are the first kind, and that Python is below average on the second kind. Which is a good tradeoff.
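To illustrate the two kinds of bug contrasted above, here is a small sketch with made-up report data. `loud_total` and `quiet_total` are hypothetical names, not taken from any real codebase.

```python
# Made-up report rows; the last key is deliberately misspelled.
rows = [{"amount": 10}, {"amount": 5}, {"amuont": 7}]


def loud_total(rows):
    """First kind: the bad row raises KeyError, so the bug announces itself."""
    return sum(row["amount"] for row in rows)


def quiet_total(rows):
    """Second kind: a default hides the bad row and the total is silently wrong."""
    return sum(row.get("amount", 0) for row in rows)
```

`loud_total(rows)` stops with a `KeyError` pointing at the lookup, which is the five-minute fix. `quiet_total(rows)` returns 15 instead of the intended 22, and nothing complains until someone checks the report.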
Even worse are horrid bugs (think "buffer overflow") that in practice result in minor performance degradation (still well within the requirements).
Or, my favorite so far - using an uninitialized variable that by complete coincidence is always zero at this point in this compile run, and zero is the value t
No, that is a shitty tradeoff. If easily spotting a bug when someone tells you where to look is the most common situation, you are writing too much obviously bad code, and you should look at every line you write as a possible bug and write more (or better) unit tests.
In the group I work with, if you look at code that gets committed to shared repositories, the most common Python errors involve runtime detection of errors on code paths that only run in uncommon situations: in particular, use of undefined fun
Stupid comparison.
Comparing PHP with Scala is like comparing "Game of Thrones" with "Ulysses".
Any n00b can program something useful in PHP within an hour. That's the whole point of PHP. That's why we have such amazingly feature complete systems like WordPress. Granted, the architecture of these PHP systems is so bizarre any reasonably seasoned programmer will not believe his eyes when he looks at the actual code - but it does work (most of the time) and it is useful.
Scala is a programming language that forces you to know what
C, Perl, PHP, JavaScript, C++
Not counting auto* script clusterf*ck
;-)
Are you complaining about the auto keyword or is "auto script clusterf*ck" some sort of distributed brainfuck?
typing
Anyone that does integration work and has to glue together a dozen different systems written in a variety of languages will lean towards strong typing. Most of my co-workers that start out favoring weak-typing or duck-typing quickly realize maintaining code written by multiple people in weak-typing is a nightmare. Especially in an enterprise environment where every project "should have been delivere
18 years worth of code?
That in and of itself makes the results next to useless.
In particular, considering C++ pre 2011 and after (c++11) as the same language from a prone-to-bugs POV is ridiculous. Sure, since it's backwards compatible you can continue to shoot yourself in the foot like it's 2010 (or 2000 - sheesh!) if you really want to, but if you're using C++ nowadays and having problems like memory leaks or dangling pointers then YOU are the problem, not the language.
I'm sure other languages have similar issues - if you don't use