
 



Programming Stats Math Open Source Python

R Throwdown Challenge

theodp (442580) writes "'R beats Python!' screams the headline at Prof. Norm Matloff's Mad (Data) Scientist blog. 'R beats Julia! Anyone else wanna challenge R?' Not that he has anything against Python, Matloff adds, but he just doesn't believe that Python or Julia will become 'the new R' anytime soon, or ever. Why? 'R is written by statisticians, for statisticians,' explains Matloff. 'It matters. An Argentinian chef, say, who wants to make Japanese sushi may get all the ingredients right, but likely it just won't work out quite the same. Similarly, a Pythonista could certainly cook up some code for some statistical procedure by reading a statistics book, but it wouldn't be quite same. It would likely be missing some things of interest to the practicing statistician. And R is Statistically Correct.'"
This discussion has been archived. No new comments can be posted.


  • Bad analogy (Score:5, Insightful)

    by Florian Weimer ( 88405 ) <fw@deneb.enyo.de> on Sunday May 25, 2014 @08:39AM (#47086797) Homepage

    An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

    You generally want to use programming languages designed by experienced programmers (even better, experienced language designers) who work closely with subject matter experts. Left to their own devices, experts are likely to get a lot of things wrong, and if the language is sufficiently popular, you are stuck with their mistakes for a long time to come.

  • R itself is okay, but even as a long-time user I don't think the language or environment itself is all that much to brag about. What makes it great for statistics is just that statisticians use it, which means that a lot of the packages are written by statisticians. That makes a big difference: recent papers often have R implementations, standard problems have well-maintained R packages with all the bells and whistles, etc. As Matloff notes, this means they often have everything that statisticians are looking for, while the straightforward textbook implementations you find in other languages often aren't nearly as thorough in how they handle the statistical models, or only handle some special cases (though there are some really good packages in other languages, just not as many).

    But I don't think that has much to do with R itself being uniquely suited to statisticians. It's used for historical reasons: Bell Labs S was influential in the field way back when nothing like Python or Julia existed, and statisticians started using it because it was a lot nicer than Fortran, which is what other areas of science mostly used back then. GNU R is essentially a free-software workalike for Bell's S, and it's kept most of the community on board through a mixture of existing packages, familiarity, and inertia.

  • by nurb432 ( 527695 ) on Sunday May 25, 2014 @09:25AM (#47086907) Homepage Journal

    Use the right tool for the job and stop bashing other tools that were designed for different jobs.

  • by jythie ( 914043 ) on Sunday May 25, 2014 @09:46AM (#47086955)
    Hrm. I never thought about the whitespace requirements in Python from an accessibility perspective.
  • Re:Bad analogy (Score:5, Insightful)

    by professionalfurryele ( 877225 ) on Sunday May 25, 2014 @11:14AM (#47087297)

    Sorry, but I use both R and Python in my work as a biomechanist, and while I love working with Python and hate working in R, R is not only less verbose for this task but also more consistent, more intuitive and better documented. Very few languages beat Python for simple, easy-to-read code, but it is not up to the task of doing general-purpose statistics. To see why, consider a problem with that blog post: all the diagnostic plots I need to check the regression are missing. No Q-Q plot, no Cook's distance, not even something simple like residuals vs. fitted. Now consider what happens when I notice that, while the fit is decent, the residuals depend on which subject I'm looking at and I need to vary the error term. Or when I need to switch to a mixed-effects model because the intercept clearly depends on the subject.
    Seriously, when I say I hate R, I mean it. The code is ugly, it can be hard to read, and woe betide the poor git who makes the mistake of needing a plot more complicated than something lattice can do. It is still better than Python for statistics.
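
    For readers unfamiliar with the diagnostics named above, here is a rough sketch of the numbers behind a residuals-vs-fitted plot and Cook's distance, written in plain Python for a one-predictor regression. It is not taken from the blog post or from any package; the function name simple_ols_diagnostics and the sample data are made up for illustration, and it assumes the textbook formulas for leverage and Cook's distance in simple linear regression.

```python
from math import fsum


def simple_ols_diagnostics(x, y):
    """Fit y = intercept + slope * x by least squares and return diagnostics.

    Hypothetical helper for illustration only; it covers a single predictor.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_bar = fsum(x) / n
    y_bar = fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # These two lists are what a residuals-vs-fitted plot displays.
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    p = 2  # number of estimated coefficients (intercept and slope)
    mse = fsum(e * e for e in residuals) / (n - p)

    # Leverage of each point (diagonal of the hat matrix).
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Cook's distance: D_i = e_i^2 / (p * mse) * h_i / (1 - h_i)^2.
    cooks = []
    for e, h in zip(residuals, leverage):
        if mse == 0:
            cooks.append(0.0)           # perfect fit: no point is influential
        elif h >= 1:
            cooks.append(float("inf"))  # the fit is forced through this point
        else:
            cooks.append(e * e / (p * mse) * h / (1 - h) ** 2)

    return {
        "slope": slope,
        "intercept": intercept,
        "fitted": fitted,
        "residuals": residuals,
        "leverage": leverage,
        "cooks": cooks,
    }


# Made-up data: the last observation is deliberately out of line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 20.0]
diag = simple_ols_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: diag["cooks"][i])
```

    Even this much covers only one predictor and produces no plots, which is roughly the commenter's point about how much R provides out of the box.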

  • DSLs (Score:4, Insightful)

    by jbolden ( 176878 ) on Sunday May 25, 2014 @01:35PM (#47088051) Homepage

    He's probably right. All other things being equal, a good Domain Specific Language will crush a General Purpose Language in its domain. Even if Julia were much faster than R and that were unfixable, it would still be far easier to write a library in Julia that is accessible from R than to train R users in all of Julia's concepts.

    General purpose languages can sometimes get close to DSLs in effectiveness, and then the greater diversity of users creates an economy of scale and deep entrenchment which drives DSLs away. But with a large and highly diverse user base, the General Purpose Language isn't able to adapt rapidly, so DSLs spring up to fill niches. Some of those DSLs become incredibly successful and start to move into other domains, diversifying their purpose and user base to become General Purpose Languages, and the cycle repeats.

  • by jonnyj ( 1011131 ) on Sunday May 25, 2014 @04:46PM (#47088985)

    Completely right.

    We use R extensively at work. Programmers talk about R's libraries, but that's not the real reason we use it. The killer blow is that the _documentation_ is written by statisticians. That means that it's reliable, easy to understand, and honestly tells you the pitfalls of the techniques you're using.

    We're financial guys who are doing stuff in consumer finance that has rarely, if ever, been done in our field. The statistics aren't particularly advanced, but it's impossible to hire someone who understands the industry and knows the statistics already. Statistics textbooks tend to be either so basic that you already know what they say, or so advanced that you need a PhD to understand them. On the other hand, much of the R documentation is beautifully simple to read, and comes with brilliant worked examples - albeit from fields that are very different from our own. Whenever we're researching potential new statistical approaches, we find blogs stuffed full of examples written in R.

    In short, the R ecosystem makes you a better statistician. Julia and Python can't offer that.
