Why Scientists Are Still Using FORTRAN in 2014
New submitter InfoJunkie777 (1435969) writes "When you go to any place where 'cutting edge' scientific research is going on, strangely the computer language of choice is FORTRAN (FORmula TRANslation), the first commonly used computer language, invented in the 1950s. No language since has been able to match its speed. But three new contenders are explored here. Your thoughts?"
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:5, Interesting)
What amused me about the article was the Fortran versions it spoke about. F95? F03? F08? Let's be real: just about every Fortran code I've heard of is still limited to F77 (with some F90 if you're lucky). It just won't build cleanly under later standards, porting is deemed not worth the effort, and so the entire codebase stays stuck on nearly 40-year-old code.
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:2, Interesting)
Seconded. And the legacy isn't necessarily just the source code. Many of the engineering industries using such codes have relatively low turnover, meaning the older engineers and researchers with the most experience stick around for decades. Most of these folks have used Fortran since college. It works for them, and they aren't interested in any "new-fangled" languages that offer more features. Another reason I hear from these folks is that Fortran has powerful array slicing and indexing syntax not found in C, making big data manipulation simpler. Newer languages like Python have packages such as NumPy that offer similar capabilities, but it's often a nightmare to translate hundreds of thousands of lines of legacy code simply to "escape" Fortran. And there are decent Fortran bindings for many parallel computing packages (MPI), which means even less incentive to move up.
Newer folks entering the field often work under the mentorship of this older generation, and so Fortran sticks around. Python is gaining usage in the scientific communities, often coupled with mixed-language wrapping tools like f2py or SWIG to reach legacy Fortran for the heavy number-crunching. I've seen this recipe used successfully in parallel computing to move some of the "administrative" aspects of scientific code into newer languages.
Key Reason (Score:5, Interesting)
Huge libraries of FORTRAN code have been formally proven, and new FORTRAN code can be formally proven. Due to the limitations of the language, it is possible to put the code through formal processes that prove it correct. Those same limitations also make FORTRAN code very easy to auto-parallelize.
We're Not (Score:2, Interesting)
I saw this link bait the other day...
We're NOT using Fortran anymore...
Many of us at the National Labs do modern, object-oriented C/C++... Like the project I'm in charge of: http://www.mooseframework.org/ [mooseframework.org]
There are whole labs that have completely expunged Fortran in favor of C++... Like Sandia (http://trilinos.sandia.gov), which went through a period in the late 90s and early 2000s where it systematically replaced all of its largest Fortran computational science codes with C++.
Those places that don't use C++ use C like the awesome PETSc library from Argonne ( http://www.mcs.anl.gov/petsc/ [anl.gov] ) which actually employs an object-oriented scheme in C.
The big name modern codes that are getting run on the biggest machines are generally done in C and C++.
I don't see that situation changing anytime soon, as there is simply a massive amount of C and C++ library code that will continue to provide the engine for tomorrow's codes. The trend I see happening most often is C and C++ libraries with Python glue for everything that doesn't need raw speed... I think that trend will continue.
Re:We're Not (Score:2, Interesting)
If you're using C++ for scientific math, then you deserve to have whatever credentials you possess revoked immediately. No language should be used for scientific math that can produce different results depending on the library version or platform it is compiled against.
You also cannot prove C++ code is good. You just can't. C++ is not deterministic, again, because the outcome depends on platform/library versions, compiler options, time of day, the alignment of the planets, and many other factors. There is no way to say for certain that "Yes, this code will produce the correct results under all conditions."
The big name modern codes that are getting run on the biggest machines are generally done in C and C++ and producing incorrect results.
I have PoC code demonstrating that C++ can produce incorrect results based on factors other than the code itself, with discrepancies as large as 10^-15. That is a completely unacceptable level of inaccuracy for scientific exploration.
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:5, Interesting)
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:4, Interesting)
F77+extensions, usually DEC extensions. Very very few people ever used strict F77 with no extensions.
Some of the issues this causes are irritating, bordering on unnerving. That's how we discovered that g77 doesn't care for treating INTEGER as LOGICAL. There used to be no other way to specify bit operations; now the trick is precluded. Everybody's code has it, and there's nothing intrinsically wrong or difficult to understand about it, but it was technically non-standard (although everyone's extensions permitted it) and it won't work on g77 - maybe only with the infamous -fugly flag.
Re:Because C and C++ multidimensional arrays suck (Score:4, Interesting)
Easily fixed with libraries like Eigen ( http://eigen.tuxfamily.org/ind [tuxfamily.org]... ) and many others.
That's the problem. There's no one way to represent a multidimensional array in C++. There are many ways. Which means math libraries using different ones are incompatible with each other. The last time I did a big number-crunching job in C++, I had four different array representations forced on me by different libraries.
Because the compiler has no clue what those array libraries are doing, you don't get basic loop optimizations that FORTRAN has had for 50 years.
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:5, Interesting)
Re:It's the right tool for the job (Score:3, Interesting)
The Python code I tried ran at half the speed of my C++ code for machine learning (mostly matrix crunching). The situation got worse for Python once I could push C++ compute steps into compile time. Scientific modeling seems to need a lot of number crunching.
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:4, Interesting)
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:0, Interesting)
blah blah... allowing people to write random loops and then...blah blah
This comment is either ad hominem or completely ignorant. It's math, and there are no random loops. Loops are intentionally designed and required; perhaps you missed out on several years of Calculus and Linear Algebra.
The reason Fortran is still around is that it's exceptional for math: both the ease of writing code to solve equations and its precision when the code is run.
Sure, Pascal had some advantages in certain expressions. Pascal was also horrible for I/O, so it was great if you never wanted to do anything but solve an equation in memory. C is easier for some things too, but not when the equations get massive or require repetition.
Blabbing on and on about vector-based GPUs is idiocy, because not everything uses trig, where vector-based processing is beneficial. I have no confidence you have ever seen math-intensive code, based on what you are talking about. Nile, by its own page, is "The Nile Programming Language: Declarative Stream Processing for Media Applications" and NOT a language for math.
Legacy Programmers (Score:5, Interesting)
Also "legacy training". Student learns from prof. Student becomes prof. Cycle repeats.
Not really - even when I was a student we ditched F77 whenever we possibly could and used C or C++. The issue is more legacy programmers. Often the person in charge of a project is an older person who knows FORTRAN and does not want to spend the time learning a new language like C (or even C++!). Hence they fall back on something more comfortable.
However, by now even this is not the case. The software in particle physics is almost exclusively C++ and/or Python. The only things I am aware of that are still FORTRAN are some Monte Carlo event generators written by theorists. My guess is that experimentalists, even older colleagues, have to learn C++ and Python to use and program modern hardware, while theorists can get by using any language they want and so are slower to change. It has certainly been at least 15 years since I wrote any FORTRAN myself, and even then what I wrote was the code needed to test the F77 interface to a rapid C I/O framework for events, which was ~1-200 times faster than the F77 code it replaced.
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:2, Interesting)
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:2, Interesting)
However, most of the new and "cool" languages I've seen in the last ten years are all basic scripting languages.
True point.
Re:Q: Why Are Scientists Still Using FORTRAN in 20 (Score:5, Interesting)
Wow, faster AND more accurate. They must use some mystical floating-point instructions that only Fortran compiler writers know about.
On PPC implementations, head-tail ("double-double") floating point is typically used for "long double"; this leads to inaccuracies in calculations. 80-bit Intel floating point is also inaccurate. So are SSE "vector" instructions, since denormals, NaNs, INFs, and -0 are always suspect unless your compiler emits an extra instruction to trigger the "next instruction after" signalling of the condition, and even then NaNs remain somewhat suspect.
If it isn't IEEE-754 compliant, you pretty much can't trust it. FORTRAN goes way the heck out of its way, including issuing additional instructions and introducing pipeline stalls, in order to force IEEE-754 compliance.
This accuracy pretty much only matters if you are doing Science(tm); if you are doing graphics, you are generally willing to eat the occasional FP-induced artifact, because what you typically care about is the frame rate in your game rather than being 100% accurate.
So, in closing, they're not using "some mystical floating-point instructions", they are just using accurate floating point rather than approximate floating point.
Re:Popular has a lot to do with installed base... (Score:5, Interesting)
Isn't the main performance benefit Fortran has always claimed over C/C++ the guarantee that array arguments don't alias one another, so the compiler doesn't have to re-read data from memory into registers each time you want to do something with the array? A capability that was formally added to C in C99 (and pretty much universally informally added to C++) with the restrict keyword?
Correct me if I'm wrong here, as I'm not a Fortran programmer.