
Python Gets a Big Data Boost From DARPA

Posted by Soulskill
from the from-unclesam-import-money dept.
itwbennett writes "DARPA (the U.S. Defense Advanced Research Projects Agency) has awarded $3 million to software provider Continuum Analytics to help fund the development of Python's data processing and visualization capabilities for big data jobs. The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries, which are widely used by programmers for mathematical and scientific calculations, respectively. The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data."
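The summary names NumPy as the workhorse library this work builds on. As a minimal illustration of the kind of bulk numerics involved (the array size and distribution here are invented for the example, not taken from the article), summarising a million simulated sensor readings looks like this:

```python
import numpy as np

# One million simulated "sensor" readings (illustrative values only).
rng = np.random.default_rng(42)
readings = rng.normal(loc=20.0, scale=3.0, size=1_000_000)

# The heavy lifting runs in compiled code, not the Python interpreter.
print(f"mean={readings.mean():.2f}")
print(f"std ={readings.std():.2f}")
print(f"p99 ={np.percentile(readings, 99):.2f}")
```

SciPy layers statistical and scientific routines on top of arrays like these; the DARPA-funded work is described as extending beyond what both libraries currently offer.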
  • Great. Just Great (Score:1, Insightful)

    by Anonymous Coward on Wednesday February 06, 2013 @03:31AM (#42806011)

    The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data.

Yeah, the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrantless wiretaps and reading of everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.

For that matter, anybody who trusts the govt and thinks the govt is your friend is pretty damn naive. Yeah, I would like to believe that too. No, I won't ignore the mountains of evidence to the contrary. I won't treat all the counterexamples as isolated cases. I see them for what they are: an amazingly consistent pattern. The rule, not the exception. Govt positions are really attractive to sociopath types who love power and control and the feeling that they are important, and they get that feeling by imposing their will on us.

  • by Kwyj1b0 (2757125) on Wednesday February 06, 2013 @04:07AM (#42806151)

Yeah, the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrantless wiretaps and reading of everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.

For that matter, anybody who trusts the govt and thinks the govt is your friend is pretty damn naive. Yeah, I would like to believe that too. No, I won't ignore the mountains of evidence to the contrary. I won't treat all the counterexamples as isolated cases. I see them for what they are: an amazingly consistent pattern. The rule, not the exception. Govt positions are really attractive to sociopath types who love power and control and the feeling that they are important, and they get that feeling by imposing their will on us.

So what you are saying is that DARPA funds will be used to further the goals of DARPA and the government? Shocking. I haven't read anything that says which agencies will or won't have access to these tools, so I'd hazard a guess that any department that wants them can have them (including the famous three-letter agencies).

FYI, Continuum Analytics is a company built around providing high-performance Python-based computing to clients. Any packages they release will either be open source (and can be checked) or closed source (in which case you don't have to use them). They aren't hijacking the NumPy/SciPy libraries; they are developing libraries and tools for a client (who happens to be DARPA). (Frankly, I'd hope Continuum Analytics open sources the work, because it might be useful to the larger community.) You do know that DARPA funds also go to improving robotics, that DARPA supported ARPANET, and that a lot of its space programs were later transferred to NASA?

Basically, I have no idea what you are ranting about. One government organization funded a project; it happens all the time. Do you rant about NSF/NIH/NASA money as well? If so, you'd better live in a cave: a lot of government-sponsored research has gone into almost every modern convenience we take for granted.

  • by LourensV (856614) on Wednesday February 06, 2013 @06:06AM (#42806555)

You're probably right, but you're also missing the point. Most scientists are not programmers who specialise in numerical methods and software optimisation. Just getting something that does what they want is hard enough for them, which is why they use high-level languages like Matlab and R. If things are too slow, they learn to rewrite their computations in matrix form, so that the work gets deferred to the built-in linear algebra libraries (which are written in C or Fortran), which usually gets them to within an order of magnitude of those low-level languages.
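The "rewrite in matrix form" trick described above can be sketched in Python with NumPy (the array size here is arbitrary, chosen just for illustration): the loop and the vector expression compute the same sum of squares, but the second one runs in compiled code.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# Naive form: an interpreted Python loop over every element.
total_loop = 0.0
for v in x:
    total_loop += v * v

# Matrix/vector form: the same sum of squares, deferred to the
# compiled linear-algebra routines underneath NumPy.
total_vec = float(x @ x)

print(f"loop={total_loop:.6f}, vectorised={total_vec:.6f}")
```

On a typical interpreter the vectorised form is one to two orders of magnitude faster, which is exactly the gap the parent describes.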

If that still isn't good enough, they can either 1) choose a smaller data set and limit the scope of their investigation until things fit, 2) buy or rent a (virtual) machine with more CPU power and memory, or 3) hire a programmer to re-implement everything in a low-level language so that it can run in parallel on a cluster. The third option is rarely chosen: it's expensive, good programmers are difficult to find, and in the course of research the software has to be updated often as the research questions and hypotheses evolve (scientific programming is like rapid prototyping, not like software engineering), which makes option 3 even more expensive and time-consuming.

    So yes, operational weather forecasts and big well-funded projects that can afford to use it will continue to use Fortran and benefit from faster software. But for run-of-the-mill science, in which the data sets are currently growing rapidly, having a freely available "proper" programming language that is capable of relatively efficiently processing gigabytes of data while being easy enough to learn for an ordinary computer user is a godsend. R and Matlab and clones aren't it, but Python is pretty close, and this new library would be a welcome addition for many people.

  • by nadaou (535365) on Wednesday February 06, 2013 @07:41AM (#42806929) Homepage

    You're probably right, but you're also missing the point. Most scientists are not programmers who specialise in numerical methods and software optimisation.

Which is exactly why FORTRAN is an excellent choice for them over something else fast (close to assembler) like C/C++, and why so many of the top fluid dynamics models continue to use it. It is simple (perhaps a function of its age), and because of that it is simple to do things like break up a calculation for MPI, or tell the compiler to "vectorize this" or "automatically make it multi-threaded", in a way that is still a long way from maturity in other languages.

Can you guess which language MATLAB was originally written in? You know that funny row, column order on indexes? Any idea about the history of that?

R is great and all, and is brilliant in its niche, but how's that RAM limitation going? It's not a solution for everything.

MATLAB is pretty good too, as are Octave and Scilab, and it has gotten a whole lot faster recently, but ever try much disk I/O, or resizing arrays in something which couldn't be vectorized? It becomes slow as molasses.
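The array-resizing pitfall isn't MATLAB-specific; the same thing bites in NumPy. A small sketch (the sizes are invented) of growing an array step by step versus writing into preallocated storage:

```python
import numpy as np

n = 20_000

# Growing an array one element at a time: every np.append allocates a
# fresh array and copies the old contents, so the loop is O(n^2).
grown = np.empty(0)
for i in range(n):
    grown = np.append(grown, i)

# Preallocating once and writing in place: O(n), and often replaceable
# by a single vectorised expression like np.arange(n).
pre = np.empty(n)
for i in range(n):
    pre[i] = i

assert np.array_equal(grown, pre)
```

Both loops produce identical arrays; only the allocation pattern differs, and that difference is what turns an un-vectorizable loop into molasses in any of these environments.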

    If that still isn't good enough, they can either 1) choose a smaller data set and limit the scope of their investigations until things fit,

    heh. I don't think you know these people.

    2) buy or rent a (virtual) machine with more CPU and more memory,

Many problems are I/O limited and require real machines with high-speed, low-latency networking. VMs just don't cut it for many parallelized tasks which need to pass messages quickly.

Forgive me if I'm wrong, but your post sounds a bit like you think you're pretty good on the old computers but don't know the first thing about FORTRAN, are feeling a bit defensive about that, and are attacking something out of ignorance.

  • Re:Python? (Score:2, Insightful)

    by Anonymous Coward on Wednesday February 06, 2013 @01:21PM (#42810171)

Okay, look. I used Octave for a long time on Linux and on Windows. On Linux (Ubuntu) it generally worked rather well and I used it for classwork where possible. On Windows, it works well as long as you don't need to plot anything. I can't tell you the number of times I installed and uninstalled various versions of Octave on Windows only to find that the plotting was broken in some way. MATLAB is great until you run into licensing issues.

Then I found out about the combination of IPython/NumPy/SciPy/Matplotlib, which now all seems to fall under the name "SciPy". It runs circles around Octave in just about every way, except that the syntax doesn't try to be MATLAB-compatible. The plotting isn't as good as MATLAB's for large data sets, but for 99% of use cases it works quite well, and for the other 1% I've been able to reduce my data set or view the data differently.

    Where "SciPy" destroys Octave and MATLAB is that in the same language I use for scientific computing, I have access to database libraries, asynchronous networking, good HDF5 support, GUI toolkits, multithreading, multiprocessing, etc. This is because Python is a language that makes it easy to integrate or "glue" things together, to the point that people created and glued together some really good numerical processing and plotting libraries. Saying "fine, then use Octave" is ridiculous, because it ignores how much better "SciPy" is than Octave. Also, with Anaconda CE you get a bunch of useful packages installed by default, available as 64-bit builds on every major OS.

    I understand that Octave is maintained by volunteers and that NumPy/SciPy have some degree of financial backing, but they're both open source, and I'm going to use the open source option that is more polished. If you don't explicitly care about adhering to MATLAB syntax (which MathWorks continually tries to break anyway), I don't know why anyone would choose Octave over SciPy.
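The "glue" point is easy to demonstrate, since the standard library and the numerics live in the same process. A small sketch (the table name and values are made up for the example) pulling query results out of SQLite straight into a NumPy array:

```python
import sqlite3

import numpy as np

# In-memory database standing in for a real measurement store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (t REAL, value REAL)")
db.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(i * 0.1, float(np.sin(i * 0.1))) for i in range(100)],
)

# Query results drop straight into a NumPy array for analysis;
# there is no export/import step between the database and the numerics.
rows = db.execute("SELECT value FROM samples WHERE t >= 4.95").fetchall()
values = np.array(rows).ravel()
print(f"{values.size} samples, mean={values.mean():.3f}")
```

Doing the equivalent from Octave or MATLAB typically means shelling out to another tool or buying a toolbox; here it's two imports.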
