Python Government United States

Python Gets a Big Data Boost From DARPA

Posted by Soulskill
from the from-unclesam-import-money dept.
itwbennett writes "DARPA (the U.S. Defense Advanced Research Projects Agency) has awarded $3 million to software provider Continuum Analytics to help fund the development of Python's data processing and visualization capabilities for big data jobs. The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries, which are widely used by programmers for mathematical and scientific calculations, respectively. The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data."
This discussion has been archived. No new comments can be posted.

  • by solidraven (1633185) on Wednesday February 06, 2013 @03:18AM (#42806195)
    You're dead wrong; nothing quite beats Fortran in speed when it comes to number crunching. If you need to go through hundreds of gigabytes of data and performance is important, there's only one realistic choice: Fortran. Python isn't fit to run on a large cluster to simulate things; there's too much overhead. And let's not forget what sort of efficiency you can get if you use a good compiler (Intel Composer). You won't find Fortran on the way out over here; it's here to stay!
  • by Anonymous Coward on Wednesday February 06, 2013 @03:46AM (#42806261)
    Short answer, Fortran has stricter aliasing rules so the compiler has more optimization opportunities. Long answer, see Stack Overflow [stackoverflow.com].
  • Re:Python 2 or 3? (Score:5, Informative)

    by SQL Error (16383) on Wednesday February 06, 2013 @03:53AM (#42806279)

    Both. The prebuilt "Anaconda" distro defaults to Python 2.7, but it also works with 3.3 and 2.6.

  • Re:Wrong language (Score:4, Informative)

    by SQL Error (16383) on Wednesday February 06, 2013 @04:09AM (#42806337)

    DARPA runs a lot of these research seed programs, putting a couple of million dollars into a bunch of different but related research projects. In this case the program budget is $100 million in total, and Continuum got $3 million for their Python work (Numba, Blaze, etc). Some of the program money may have gone to R as well; there's a couple of dozen research groups, but I don't have a full list.

  • by Anonymous Coward on Wednesday February 06, 2013 @04:10AM (#42806345)

    I guess the problem is that people who speak about Fortran actually think about FORTRAN. The last FORTRAN standard was from 1977, and that shows. After that, there was no new standard and little new development until the Fortran 90 standard (note the different capitalization). Fortran 90 got rid of the old punch-card-based restrictions by introducing completely new, much more reasonable code parsing rules (it still accepts old-form code for backwards compatibility, but you cannot mix both forms in one file because they are too different), gave the language a full set of properly nesting flow control statements (actually, that was one thing already commonly available as a non-standard extension to FORTRAN), and added very powerful array processing, operator overloading, and modules (and probably a few other things I don't remember right now). Later versions even added object orientation (and probably a whole set of other things; I haven't really followed Fortran development beyond Fortran 90).

  • by Chrisq (894406) on Wednesday February 06, 2013 @06:09AM (#42806799)

    The entire point of Fortran is that it has difficult-to-deal-with aliasing rules that make the compiler more free to produce optimized code. That's why it is suitable for things that require every last bit of performance you can wring out of it. Today probably you can get the same thing with C or C++ provided you are prepared to use things like restrict, but it used to be you couldn't, so Fortran ruled certain topics.

    Python is an easy-to-use system with abysmal performance - expect 10-100x slowdown for code that runs in pure Python over a similar C version. If you can get things set up so Python is only gluing other C components together and the data never has to touch native Python data structures or loops, then performance will be fine, but now you aren't really coding in Python any more.

    The point is, the purpose of Fortran and the purpose of Python are entirely opposed. They are exactly the opposite of each other. So it boggles the mind how you can think that Python can be Fortran "done right". So much so that now I suspect I got trolled. Well done, sir.

    Yes, I understand, and many people made the same point. However, for a lot of scientists and engineers Fortran was the hammer to crack any nut. It was used for simple "try outs" where performance wasn't needed, simply because it was the language that engineers knew. I think the same thing is happening with Python now: it is the first and sometimes only language that many engineers know. As for the performance issue, Python will not give the best performance, but packages like SciPy and NumPy do give very good performance (arguably, by using these libraries you are just using Python to string C functions together, but it is properly integrated). Tests show that you are getting about a third of the performance of Fortran [nasa.gov] (with the exception of the Fortran DGEMM matrix multiply, which greatly outperforms Python and other Fortran variants). The typical engineering reaction to performance needs is to throw hardware at the problem, then optimise your algorithm, and only change language if absolutely necessary!

  • Re:Great. Just Great (Score:5, Informative)

    by sdaug (681230) on Wednesday February 06, 2013 @07:54AM (#42807227)

    Frankly, I'd hope that Continuum Analytics open sources their development because it might be useful to the larger community

    Open sourcing is a requirement of the XDATA program.
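
The aliasing point several commenters raise can be shown with a minimal sketch. Python is used here only to illustrate the semantics; it says nothing about how any Fortran or C compiler actually optimizes. The function names `add_twice` and `add_doubled` are hypothetical. The two functions agree when their arguments are distinct lists but disagree when both names refer to the same list, which is why a compiler may only rewrite the first form into the second if it is allowed to assume the arguments do not alias (an assumption Fortran's argument rules permit and C's `restrict` promises).

```python
def add_twice(a, b):
    """Add b into a twice, element by element, re-reading b each time."""
    for i in range(len(a)):
        a[i] += b[i]
        a[i] += b[i]


def add_doubled(a, b):
    """The 'optimized' form: read b once and add twice its value."""
    for i in range(len(a)):
        a[i] += 2 * b[i]


# Distinct lists: both forms give the same result.
x, y = [1, 2, 3], [10, 20, 30]
add_twice(x, y)
u, v = [1, 2, 3], [10, 20, 30]
add_doubled(u, v)
no_alias_equal = (x == u)

# Same list passed twice (aliasing): the results diverge.
p = [1, 2, 3]
add_twice(p, p)      # each element is doubled, then doubled again
q = [1, 2, 3]
add_doubled(q, q)    # each element is tripled
alias_equal = (p == q)
```

With distinct lists the two functions are interchangeable; with an aliased list they are not, so the rewrite is only safe under a no-aliasing guarantee.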
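
The claim in the thread that Python performs acceptably when the inner loop runs in compiled code can be tried with a rough, standard-library-only sketch. It is a stand-in for the NumPy case, not a reproduction of it: the built-in `sum` still creates a Python float for each element, so it captures only part of what NumPy avoids. The names `python_loop_sum` and `builtin_sum` are hypothetical, and timings depend on the machine and interpreter, so none are quoted.

```python
import timeit
from array import array

# One million doubles stored in a compact, C-level buffer.
data = array("d", range(1_000_000))


def python_loop_sum(values):
    """Sum in pure Python: the interpreter executes every iteration."""
    total = 0.0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Sum with the built-in: the loop itself runs inside C code."""
    return sum(values)


loop_result = python_loop_sum(data)
builtin_result = builtin_sum(data)

# Time a few repetitions of each version and report both.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=3)
builtin_time = timeit.timeit(lambda: builtin_sum(data), number=3)
print(f"pure Python loop: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")
```

Both functions return the same total; only where the loop executes differs.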

