Open Source Programming Python

Comparing R, Octave, and Python for Data Analysis

Here is a breakdown of R, Octave and Python, and how analysts can rely on open-source software and online learning resources to bring data-mining capabilities into their companies. The article breaks down which of the three is easiest to use, which do well with visualizations, which handle big data the best, etc. The lack of a budget shouldn't prevent you from experiencing all the benefits of a top-shelf data analysis package, and each of these options brings its own set of strengths while being much cheaper to implement than the typical proprietary solutions.

  • by eldavojohn ( 898314 ) * <eldavojohnNO@SPAMgmail.com> on Wednesday May 23, 2012 @03:11PM (#40092849) Journal
    So, you're linking a SlashdotBI article to the Slashdot front page?

    Well then [imgur.com].
    • If people thought Idle was bad, the Business Intelligence section takes Slashdot an order of magnitude lower.

      How long until the BI editors demand outright access to the frontpage?

      • by Anonymous Coward

        It's even more moronic when you consider that the article's comments had more useful content than the actual article!

        It's no wonder that taco left... /. It was a nice ride but you've really fallen by the wayside in the last few years as in nearly irrelevant with late story postings and garbage like this one.

  • by ACK!! ( 10229 ) on Wednesday May 23, 2012 @03:24PM (#40093013) Journal
    The whole article was not much more than a high-level review. The graphic naturally draws attention to the parameters the writer wanted to cover, but he did not back it up with any serious textual review, at least not in any detail, of what he felt were the weaknesses or advantages of the different programming languages.
    • by Ruie ( 30480 ) on Wednesday May 23, 2012 @03:32PM (#40093099) Homepage

      The whole article was not much more than a high-level review. The graphic naturally draws attention to the parameters the writer wanted to cover, but he did not back it up with any serious textual review, at least not in any detail, of what he felt were the weaknesses or advantages of the different programming languages.

      And what he has is flawed as well. For example, he marked R as having issues with big data, which is quite wrong: I routinely analyze multi-GB datasets in memory, and my databases go into TB. Of the three languages, R is the only one with a native format (data.frame) that interfaces easily to database queries. Both Octave (Matlab) and Python have to use compound types, which makes addressing difficult.

      Also, I found R easier to master than either Octave or Python, but this is probably because I am familiar with Lisp.

      • by csirac ( 574795 ) on Wednesday May 23, 2012 @10:19PM (#40096487)

        Through pandas [sourceforge.net], for a start. The SciPy/NumPy stack is quite nifty, I'm especially interested in how to apply it for working with irregular time series data.

        Not to say anybody should ditch R, I still support our researchers most weeks at work in using it. But it's not as clear-cut as you seem to think it is, especially in terms of memory efficiency.
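
        For what it's worth, the pandas route mentioned above can be sketched roughly as follows. This is a minimal sketch, not anyone's production setup: the in-memory SQLite table `readings`, its columns, and the sample rows are all made up for illustration. It shows a query result landing in a DataFrame and an irregular time series being put onto a regular daily grid.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2012-05-21 09:03:00", "a", 1.0),
        ("2012-05-21 17:40:00", "a", 3.0),
        ("2012-05-23 11:15:00", "a", 5.0),
        ("2012-05-23 11:20:00", "b", 7.0),
    ],
)

# The query result becomes a DataFrame with named, typed columns.
df = pd.read_sql_query(
    "SELECT ts, sensor, value FROM readings ORDER BY ts",
    conn,
    parse_dates=["ts"],
)
conn.close()

# Irregular timestamps are averaged into daily bins; a day with no
# observations shows up as NaN instead of being silently dropped.
daily = df.set_index("ts")["value"].resample("D").mean()

# Columns are addressed by name, much like an R data.frame.
by_sensor = df.groupby("sensor")["value"].mean()

print(daily)
print(by_sensor)
```

        Whether this is more or less memory efficient than the equivalent R code depends on the data and the versions involved; the sketch only shows that the workflow exists.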

      • by Anonymous Coward

        And what he has is flawed as well. For example, he marked R as having issues with big data, which is quite wrong: I routinely analyze multi-GB datasets in memory, and my databases go into TB.

        Dude. That's not what people mean when they say big data. HP and Dell will both quite happily sell you machines with 2TB of main memory, and SGI will go to 16TB, and anything which can fit in memory on a single machine without custom hardware isn't big data. It's only big data once you get up to a few hundred terabytes.
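
        To make the in-memory versus out-of-memory distinction concrete: once data no longer fits in RAM, statistics have to be computed in a streaming pass rather than on a loaded table. The following is a minimal sketch using Welford's online algorithm with only the Python standard library. The function name `streaming_mean_std`, the column name `value`, and the tiny inline CSV are assumptions for illustration; a real job at the scale described would be distributed across machines, which this does not attempt.

```python
import csv
import io
import math


def streaming_mean_std(rows, column):
    """Return (count, mean, sample std) in one pass with O(1) memory.

    Uses Welford's online update, so no row is kept after it is read.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        return n, mean, float("nan")
    return n, mean, math.sqrt(m2 / (n - 1))


# Stand-in for a file too large to load; csv.DictReader yields one row at a time.
data = io.StringIO("value\n2\n4\n4\n4\n5\n5\n7\n9\n")
n, mean, std = streaming_mean_std(csv.DictReader(data), "value")
print(n, mean, std)
```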

        • by Ruie ( 30480 )

          And what he has is flawed as well. For example, he marked R as having issues with big data, which is quite wrong: I routinely analyze multi-GB datasets in memory, and my databases go into TB.

          Dude. That's not what people mean when they say big data. HP and Dell will both quite happily sell you machines with 2TB of main memory, and SGI will go to 16TB, and anything which can fit in memory on a single machine without custom hardware isn't big data. It's only big data once you get up to a few hundred terabytes.

          Heh ! I am sure I can use R on such hardware, as long as I have access to it ;)

      • by plopez ( 54068 )

        If you know Lisp and OOP, R is easy. Unfortunately Lisp has become arcane, and most programmers I have met did not understand OOP.

    • by Anonymous Coward

      he did not back up his graphic with any sort of serious textual review

      She [slashdot.org] is Geeknet's "Senior Director of Analytics".

    • by Anrego ( 830717 ) *

      Indeed. This is high-level "meeting for the suits" bullshit. I can picture this showing up in a PowerPoint presentation.

      Here are your three options.. this is the one that sucks, this is the one that sucks for a different reason, and this is the one I want you to go with. Oh, and here is a chart with some pretty checkmarks and stuff to help clarify! Let's do lunch!

    • It's full of puff pieces and press releases.

      I think a lot of Slashdot readers (me included) would be interested in an introduction to various practical aspects of analytics, especially with Open Source tools we can experiment with ourselves. SlashBI could be a good gateway for that. So far every article I have read there has seemed like a waste of time.
    • by ceoyoyo ( 59147 )

      It wasn't even that. It came down to one of the last paragraphs:

      "In my [limited and misleading] experience...."

      Python isn't good at visualization? I guess the author has never used VTK-Python or Matplotlib. R isn't good with big data? I suppose that comes from R not having great database interactivity... so just feed it data via Python using rpy2.
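
      As a point of reference for the Matplotlib claim, a basic labeled line plot takes only a few lines. This is a minimal sketch: the sine data is made up, and the `Agg` backend is chosen only so the script runs without a display. VTK and rpy2 are not shown here.

```python
import io
import math

import matplotlib

matplotlib.use("Agg")  # headless backend, so no window or display is needed
import matplotlib.pyplot as plt

# Made-up data: one period of a sine wave sampled at 63 points.
xs = [i / 10 for i in range(63)]
ys = [math.sin(x) for x in xs]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(xs, ys, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("Minimal Matplotlib line plot")
ax.legend()

# Render to an in-memory PNG; fig.savefig("plot.png") would write a file instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
```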

  • by Anonymous Coward on Wednesday May 23, 2012 @03:26PM (#40093031)

    I wish there was also a column for availability of learning resources: tutorials, free books, example code, etc.

  • by vlm ( 69642 ) on Wednesday May 23, 2012 @03:28PM (#40093059)

    how analysts can rely on open-source software

    I've done that kind of stuff at work and those criteria are NEVER how a package is selected.

    If I need a commercial product I need all manner of signoffs requiring at least weeks of delay and massive IT involvement so they can insert it into windoze images automatically or whatever it is they do.

    If I'm doing FOSS it just ... gets done that day. No agony. And it just works, and instead of a call center script reader in India who can only tell me to reinstall the software over and over, with FOSS the "whole internet" is my support system and they as in the whole internet know what they're doing.

    Nothing about this has changed in about 15 years, so I'm not sure how this is "news". This would have been a good "news" story in the early/mid nineties.

    • by Anonymous Coward

      Even with proprietary software, the "whole internet" can support your system, it is bunk to say that only happens with FOSS.

      And to say it just works is bunk too, I see plenty of problems with FOSS where the "whole internet" has no f*cking clue other than, go to the source and figure it out yourself - not always a trivial exercise.

      But go ahead and keep believing your own bullshit.

      • ... except the 'whole internet' often says "too bad, you'll have to wait for a fix" with proprietary software whereas "Oh, try this patch over here" often happens on FOSS instead.

    • This is a thinly veiled attempt to put Python on the same level as R. /shakes head/

      • by seanzig ( 834642 )
        Absolutely - we all know that Python is much greater than R. ;-) Seriously though, I know where he's coming from, but it really should have had better explanations regarding his ratings for each language. For example, if one uses the Visualization Toolkit (VTK, www.vtk.org), it has Python bindings. I think the author simply doesn't know about that.
      • Both! (Score:4, Insightful)

        by Kludge ( 13653 ) on Wednesday May 23, 2012 @06:32PM (#40094987)

        The best option is to use python and R, through rpy for example.
        R rocks for statistical libraries and good documentation.
        Python rocks for everything else.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      Besides, in research, using something opensource (or at the very least gratis) makes it that much easier for others to replicate what you did. Getting SAS scripts just isn't fun.

    • by Anonymous Coward on Wednesday May 23, 2012 @05:24PM (#40094369)

      I'm an astronomer. At this point in my career, I move to a new research institution every couple of years. Each institution may have a site licence for some piece of commercial software like IDL or Matlab, but I use free software (Python, in my case) because I know that I can keep using it, rather than rewriting all my scripts for a new language every time I move.

  • by Anonymous Coward

    n/t

  • More crap from /. (Score:5, Insightful)

    by NoMaster ( 142776 ) on Wednesday May 23, 2012 @03:35PM (#40093139) Homepage Journal

    "Here is a breakdown of R, Octave and Python ..."

    No there isn't - there is not much more than a shitty 'feature' table, too high-level to be anything other than facile, which is "Based on [the author's] own user experience and research".

    As a student user of all 3 I would have been interested in reading a good comparative review or explanation aimed at outsiders. This ain't it; it's just more slashvertising.

    • by Anonymous Coward

      Yes, but the advantage of the author's approach is that it'd be real easy to extend the review to include Scilab.

  • by Anonymous Coward

    Sage math http://www.sagemath.org/

  • Julia? (Score:4, Informative)

    by Chrisq ( 894406 ) on Wednesday May 23, 2012 @04:05PM (#40093473)
    There was a previous article about Julia [slashdot.org] which looked cool. I wonder how it measures up.
  • Oh.. (Score:3, Insightful)

    by Anrego ( 830717 ) * on Wednesday May 23, 2012 @05:02PM (#40094089)

    Now that's just desperation.

    Come on .. keep this shit in bi. Either it takes off or it doesn't.

  • My suggestion is to try all three, and see which offering’s toolbox solves your specific problems.

    Well no **** Sherlock!

  • I don't understand (Score:5, Informative)

    by utkonos ( 2104836 ) on Wednesday May 23, 2012 @08:33PM (#40095821)
    This article compares three languages that have different purposes. R's purpose is statistical analysis and visualization. Octave is a general mathematical analysis and visualization language. Python is a generalist language with its own focus on code readability, among other things.

    These languages also have a target audience. R is for statisticians and scientists. Octave is for mathematicians, and Python is for programmers.
  • I still don't get it. How can you compare specialized statistical and number crunching languages with a general purpose programming language?
