Why Scientists Are Still Using FORTRAN in 2014

Posted by timothy
from the why-change dept.
New submitter InfoJunkie777 (1435969) writes "When you go to any place where 'cutting edge' scientific research is going on, strangely the computer language of choice is FORTRAN, the first computer language commonly used, invented in the 1950s. Meaning FORmula TRANslation, no language since has been able to match its speed. But three new contenders are explored here. Your thoughts?"
This discussion has been archived. No new comments can be posted.

  • by Nemyst (1383049) on Friday May 09, 2014 @09:05PM (#46963955) Homepage
    This. I have many friends in the physics dept and the reason they're doing Fortran at all is that they're basing their own stuff off of existing Fortran stuff.

    What amused me about the article was actually the Fortran versions they spoke about. F95? F03? F08? Let's be real: just about every Fortran code I've heard of is still limited to F77 (with some F90 if you're lucky). It just won't work on later versions, and it's deemed not worth porting over, so the entire codebase is stuck on almost 40-year-old code.
  • by Anonymous Coward on Friday May 09, 2014 @09:06PM (#46963967)

    Seconded. And the legacy isn't necessarily just the source code. Many of the engineering industries using such codes have a relatively low turnover rate, meaning an older group of engineers and researchers with the most experience stick around for decades. Most of these folks have used Fortran since college. It works for them, and they aren't concerned with any "new-fangled" languages that offer more features. Another reason I hear from these folks is that Fortran has powerful array slicing and indexing syntax not found in C, making big data manipulation simpler. Newer programming languages like Python have packages like NumPy which offer similar capabilities, but it's often a nightmare to translate hundreds of thousands of legacy code lines simply to "escape" Fortran. And there are decent bindings to Fortran that can be leveraged for many parallel computing packages (MPI), which means even less incentive to move up.

    Newer folks entering the field often work under the tutelage or mentoring of these folks, and Fortran sticks around. Python is gaining usage in the scientific communities, and it's often coupled with mixed-language wrapping code like f2py or SWIG to access any legacy Fortran code for heavy number-crunching work. I've seen this recipe used successfully in parallel computing to detach some of the "administrative" aspects of scientific code into newer languages.
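
For readers who have not used Fortran's array sections, here is a minimal Python sketch of what a section such as a(2:5:2) selects. The helper name fortran_section is hypothetical, it assumes Fortran's default lower bound of 1, and it uses plain lists rather than NumPy so that it runs without extra packages.

```python
def fortran_section(values, lower, upper, stride=1):
    """Return the elements Fortran would select with values(lower:upper:stride).

    Fortran sections are 1-based with an inclusive upper bound, while Python
    slices are 0-based with an exclusive upper bound. Only positive strides
    are handled in this sketch.
    """
    if stride < 1:
        raise ValueError("this sketch only handles positive strides")
    # Shift the lower bound down by one; the inclusive 1-based upper bound
    # happens to equal the exclusive 0-based upper bound.
    return values[lower - 1:upper:stride]


samples = [10, 20, 30, 40, 50, 60]
every_other = fortran_section(samples, 2, 5, 2)  # like samples(2:5:2)
first_half = fortran_section(samples, 1, 3)      # like samples(1:3)
print(every_other, first_half)
```

NumPy offers the same slice notation on multidimensional arrays, which is roughly the capability the comment refers to.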

  • Key Reason (Score:5, Interesting)

    by stox (131684) on Friday May 09, 2014 @09:16PM (#46964015) Homepage

    Huge libraries of FORTRAN code have been formally proven. New FORTRAN code can be formally proven. Due to the limitations of the language, it is possible to put the code through formal processes to prove the code is correct. In addition, again, as a benefit of those limitations, it is very easy to auto-parallelize FORTRAN code.

  • We're Not (Score:2, Interesting)

    by friedmud (512466) on Friday May 09, 2014 @09:39PM (#46964121)

    I saw this link bait the other day...

    We're NOT using Fortran anymore...

    Many of us at the National Labs do modern, object-oriented C/C++... Like the project I'm in charge of: http://www.mooseframework.org/ [mooseframework.org]

    There are whole labs that have completely expunged Fortran in favor of C++... Like Sandia (http://trilinos.sandia.gov), which actually went through a period in the late 90s and early 2000s where they systematically replaced all of their largest Fortran computational science codes with C++.

    Those places that don't use C++ use C like the awesome PETSc library from Argonne ( http://www.mcs.anl.gov/petsc/ [anl.gov] ) which actually employs an object-oriented scheme in C.

    The big name modern codes that are getting run on the biggest machines are generally done in C and C++.

    I don't see that situation changing anytime soon, as there is simply a massive number of C and C++ libraries that will continue to provide the engine for tomorrow's codes. The trend I see happening most often is using C and C++ libraries with Python glue for everything that doesn't need raw speed.... I think that trend will continue.

  • Re:We're Not (Score:2, Interesting)

    by Anonymous Coward on Friday May 09, 2014 @09:47PM (#46964151)

    If you're using C++ for scientific math, then you deserve to have whatever credentials you may possess revoked immediately. No language should be used for scientific math that can produce different results based upon the version of library or platform it is compiled against.

    You also cannot prove C++ code is good. You just can't. C++ is not deterministic, again, because the outcome depends on platform/library versions, compiler options, time of day, alignment of the planets, and many other factors. There is no way to say for certain that "Yes, this code will produce the correct results under all conditions."

    The big name modern codes that are getting run on the biggest machines are generally done in C and C++ and producing incorrect results.

    I have PoC code that I have used to prove that C++ can produce incorrect results based on factors other than the code itself, and at the level of significance as high as 10^-15. That is a completely unacceptable level of inaccuracy for scientific exploration.
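
The kind of discrepancy described here can be seen in miniature, although the sketch below is in Python rather than C++ and is not the poster's PoC. It assumes IEEE-754 doubles, which CPython uses on common platforms, and shows that the grouping of additions (something that optimization settings such as fast-math modes are allowed to change) shifts the result by roughly 1e-16.

```python
# The same three numbers summed with two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two results differ in the last bit of the double.
gap = abs(left_to_right - right_to_left)

print(repr(left_to_right), repr(right_to_left), gap)
```

Floating-point addition is not associative in any of the languages discussed in this thread, so the grouping matters regardless of which one is used.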

  • by K. S. Kyosuke (729550) on Friday May 09, 2014 @10:04PM (#46964217)
    APL-style languages should be even more optimizable, since they use higher-order array operators that make the control flow and data flow highly explicit without the need to recover information from loopy code using auto-vectorizers, and easily yield parallel code. By this logic, in our era of cheap vector/GPU hardware, APL-family languages should be even more popular than Fortran!
  • by Brett Buck (811747) on Friday May 09, 2014 @10:12PM (#46964237)

    F77+extensions, usually DEC extensions. Very very few people ever used strict F77 with no extensions.

    Some of the issues this causes are irritating bordering on unnerving. This week we discovered that g77 didn't care for treating INTEGER as LOGICAL. It used to be that there was no other way to specify bit operations; now it is precluded. Everybody's code has that, and there's really nothing intrinsically wrong or difficult to understand about it, but it was technically non-standard (although everyone's extensions permitted it) and it won't work on g77 - maybe only with the infamous -fugly flag.

  • by Animats (122034) on Friday May 09, 2014 @10:33PM (#46964357) Homepage

    Easily fixed with libraries like Eigen ( http://eigen.tuxfamily.org/ind [tuxfamily.org]... ) and many others.

    That's the problem. There's no one way to represent a multidimensional array in C++. There are many ways. Which means math libraries using different ones are incompatible with each other. The last time I did a big number-crunching job in C++, I had four different array representations forced on me by different libraries.

    Because the compiler has no clue what those array libraries are doing, you don't get basic loop optimizations that FORTRAN has had for 50 years.
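
To make the incompatibility concrete, here is a minimal Python sketch of two common flat-memory layouts for the same 2x3 matrix: row-major (the C convention) and column-major (the Fortran convention). The function names are hypothetical and flat lists stand in for raw memory; real libraries differ in further ways (strides, padding, ownership) that this sketch ignores.

```python
def row_major_index(row, col, n_rows, n_cols):
    """Offset of element (row, col) when rows are stored contiguously."""
    return row * n_cols + col


def column_major_index(row, col, n_rows, n_cols):
    """Offset of element (row, col) when columns are stored contiguously."""
    return col * n_rows + row


def to_column_major(flat_row_major, n_rows, n_cols):
    """Copy a row-major buffer into a new column-major buffer."""
    converted = [None] * (n_rows * n_cols)
    for row in range(n_rows):
        for col in range(n_cols):
            source = row_major_index(row, col, n_rows, n_cols)
            target = column_major_index(row, col, n_rows, n_cols)
            converted[target] = flat_row_major[source]
    return converted


# The matrix [[1, 2, 3], [4, 5, 6]] stored row by row.
matrix_c_order = [1, 2, 3, 4, 5, 6]
matrix_f_order = to_column_major(matrix_c_order, 2, 3)
print(matrix_f_order)
```

Passing one buffer to code that expects the other layout silently transposes or scrambles the data, which is why mixing libraries tends to require explicit conversion copies like the one above.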

  • by K. S. Kyosuke (729550) on Friday May 09, 2014 @10:37PM (#46964371)
    Well, we live in a somewhat different world today, given that suitable HW for that is virtually everywhere. But just to be clear, I'm not suggesting anyone should adopt APL's "syntax". It's more about the array language design principles. Syntax-wise, I'd personally like something along the lines of Nile [githubusercontent.com], with math operators where suitable, and with some type inference and general "in-language intelligence" thrown into the mix to make it concise. I realize that depriving people of their beloved imperative loops might seem cruel, but designing the language in a way that would make obvious coding styles easily executed on vector machines seems a bit saner to me than allowing people to write random loops and then either hope that the vectorizer will sort it out (they're still very finicky about their input) or provide people with examples what they should and shouldn't be writing if they want it to run fast.
  • by jbo5112 (154963) on Friday May 09, 2014 @11:02PM (#46964473)

    The Python code I tried ran at half the speed of my C++ code for machine learning (mostly matrix crunching). The situation got worse for Python when I could push C++ compute steps into compile time. Scientific modeling seems to need a lot of number crunching.

  • by K. S. Kyosuke (729550) on Friday May 09, 2014 @11:15PM (#46964543)
    Blitz++ is hardly the pinnacle of what should be possible with proper array languages. Think of what you could do with higher-order operators - for example, interprocedural loop fusion becomes trivial, and one could probably come up with many other operations optimizable across procedure/function/subroutine (whatever you want to call it) boundaries as well. Blitz++ was neat but it can't beat a dedicated compiler for an array language (by which I most certainly don't mean stateful loopy Fortran). Although I agree that the C++/C interoperability is a huge plus.
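
A minimal sketch of the loop-fusion idea, in Python and with hypothetical names (compose, scale, shift): when each step is expressed as an element-wise operator rather than a hand-written loop, two passes over the data can be merged into one without inspecting loop bodies. An array-language compiler would do this automatically; here the fusion is done by hand to show the equivalence.

```python
def compose(*stages):
    """Fuse several element-wise stages into a single function."""
    def fused(x):
        for stage in stages:
            x = stage(x)
        return x
    return fused


def scale(x):
    return 2.0 * x


def shift(x):
    return x + 1.0


data = [1.0, 2.0, 3.0]

# Unfused: two passes over the data and one intermediate array.
intermediate = [scale(x) for x in data]
unfused = [shift(x) for x in intermediate]

# Fused: one pass, no intermediate array.
fused = list(map(compose(scale, shift), data))

print(unfused, fused)
```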
  • by Anonymous Coward on Friday May 09, 2014 @11:35PM (#46964615)

    blah blah... allowing people to write random loops and then...blah blah

    This comment is either an ad hominem or completely ignorant. It's math, and there are no random loops. Loops are intentionally designed and required, or perhaps you missed out on several years of Calculus and Linear Algebra.

    The reason Fortran is still around is that it's exceptional for math, both for the ease of writing code to solve equations and for its precision when the code is run.

    Sure, Pascal had some advantage in certain expressions. Pascal was also horrible for I/O so great if you never wanted to do anything but solve an equation in memory. C is easier for some things too, but not when the equations get massive or require repetition.

    Blabbing on and on about vector based GPUs is idiocy, because not everything uses trig where vector based processing is beneficial. I have no confidence you have ever seen math intensive code based on what you are talking about. Nile, from their own page, is "The Nile Programming Language: Declarative Stream Processing for Media Applications" and NOT a language for Math.

  • Legacy Programmers (Score:5, Interesting)

    by Roger W Moore (538166) on Saturday May 10, 2014 @12:01AM (#46964717) Journal

    Also "legacy training". Student learns from prof. Student becomes prof. Cycle repeats.

    Not really - even when I was a student we ditched F77 whenever we possibly could and used C or C++. The issue is more legacy programmers. Often the person in charge of a project is an older person who knows FORTRAN and does not want to spend the time to learn a new language like C (or even C++!). Hence they fall back into something more comfortable.

    However by now even this is not the case. The software in particle physics is almost exclusively C++ and/or Python. The only things that I am aware of which are still FORTRAN are some Monte-Carlo event generators which are written by theorists. My guess is that as experimentalists even older colleagues have to learn C++ and Python to use and program modern hardware. Theorists can get by using any language they want and so are slower to change. Certainly it has probably been at least 15 years since I wrote any FORTRAN myself and even then what I wrote was the code needed to test the F77 interface to a rapid C I/O framework for events which was ~1-200 times faster than the F77 code it replaced.

  • by phantomfive (622387) on Saturday May 10, 2014 @12:05AM (#46964727) Journal
    I think the problem is the keyboard. They don't make 'em anymore.
  • by phantomfive (622387) on Saturday May 10, 2014 @12:05AM (#46964731) Journal

    However most of the new and "cool" languages I've seen in the last ten years are all basic scripting languages

    True point.

  • by tlambert (566799) on Saturday May 10, 2014 @03:52AM (#46965283)

    Wow, faster AND more accurate. They must use some mystical floating-point instructions that only Fortran compiler writers know about.

    On PPC implementations, head-tail floating point is typically used for "long double"; this leads to inaccuracies in calculations. 80 bit Intel floating point is also inaccurate. So are SSE "vector" instructions, since denormals, NaNs, INFs, and -0 are always suspect unless your compiler emits an extra instruction in order to trigger the "next instruction after" signalling of the condition, and for NaNs, you are still somewhat suspect there.

    If it isn't IEEE-754 compliant, you pretty much can't trust it. FORTRAN goes way the heck out of its way, including issuing additional instructions and introducing pipeline stalls, in order to force IEEE-754 compliance.

    Pretty much this accuracy only matters if you are doing Science(tm); if you are doing graphics, you are generally willing to eat the occasional FP induced artifact, because what you typically care about is the frame rate in your game, rather than being 100% accurate.

    So, in closing, they're not using "some mystical floating-point instructions", they are just using accurate floating point, rather than approximate floating point.
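
For readers unfamiliar with the special values named above (denormals, NaNs, INFs, and -0), here is a small Python sketch of how they behave. It assumes CPython floats are IEEE-754 doubles, which holds on common platforms, and it only illustrates the values themselves; it says nothing about what any particular Fortran or C compiler emits.

```python
import math
import sys

# Negative zero compares equal to zero but keeps its sign bit.
negative_zero = -0.0
sign_of_negative_zero = math.copysign(1.0, negative_zero)

# NaN is the only value that is not equal to itself.
not_a_number = float("nan")

# Denormal (subnormal) values fill the gap between zero and the
# smallest normal double, at reduced precision.
smallest_normal = sys.float_info.min
subnormal = smallest_normal / 4.0

# The smallest positive subnormal; halving it underflows to zero.
tiniest = 5e-324
underflowed = tiniest / 2.0

# Multiplying past the largest finite double yields infinity.
overflowed = sys.float_info.max * 2.0

print(sign_of_negative_zero, not_a_number, subnormal, underflowed, overflowed)
```

Hardware modes that flush subnormals to zero or skip NaN handling change results in exactly these corner cases, which is the trade-off between speed and strict conformance that the comment describes.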

  • by Rei (128717) on Saturday May 10, 2014 @05:42AM (#46965469) Homepage

    Isn't the main performance benefit that Fortran has always claimed over C/C++ the fact that an array is guaranteed to only be used from one thread at a time, and thus you don't have to re-read from memory to registers each time you want to do something with the data in the array? A capability that was formally added to C in C99 (and pretty much universally informally added to C++) with the restrict keyword?

    Correct me if I'm wrong here, as I'm not a Fortran programmer.
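
The guarantee usually cited here concerns aliasing rather than threads: Fortran's argument rules let the compiler assume that a dummy array being written does not overlap another argument, and C99's restrict makes the same promise by hand. Below is a minimal Python sketch of why that assumption matters. The function names are hypothetical and Python lists stand in for the arrays; the "hoisted" version plays the role of a compiler keeping a value in a register.

```python
def add_first_reloading(dst, src):
    """Re-read src[0] on every iteration, as a compiler must if dst may alias src."""
    for i in range(len(src)):
        dst[i] = src[i] + src[0]


def add_first_hoisted(dst, src):
    """Read src[0] once, as a compiler may if it knows dst and src are distinct."""
    first = src[0]
    for i in range(len(src)):
        dst[i] = src[i] + first


# With distinct arrays, both versions agree.
separate_a, separate_b = [0.0] * 3, [0.0] * 3
add_first_reloading(separate_a, [1.0, 2.0, 3.0])
add_first_hoisted(separate_b, [1.0, 2.0, 3.0])

# With the same array passed twice, the write to element 0 changes what
# later iterations read, so the two versions diverge.
aliased_reloading = [1.0, 2.0, 3.0]
add_first_reloading(aliased_reloading, aliased_reloading)
aliased_hoisted = [1.0, 2.0, 3.0]
add_first_hoisted(aliased_hoisted, aliased_hoisted)

print(separate_a, separate_b, aliased_reloading, aliased_hoisted)
```

Because the two versions only agree when the arrays do not overlap, a C compiler without restrict has to assume the slower reloading form, while a Fortran compiler can use the hoisted one.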
