
The Power of the R Programming Language

Posted by samzenpus
from the much-better-than-Q dept.
BartlebyScrivener writes "The New York Times has an article on the R programming language. The Times describes it as: 'a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.'"

  • by Anonymous Coward on Wednesday January 07, 2009 @08:49PM (#26366395)

    I guess I was thinking of analysts using Excel to develop "complicated" statistical analyses. Sure, Excel is unbeatable at handling small, tabular datasets and doing basic or even considerable arithmetic with them.

    When it comes to doing more elaborate analysis, using Excel IS reinventing the wheel. Plus, it is IMPOSSIBLE to understand later.

  • by refactored (260886) <cyent&xnet,co,nz> on Wednesday January 07, 2009 @09:02PM (#26366533) Homepage Journal
    I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.

    Actually that wasn't why I used R, just a fun addendum. The reason to use R is the huge body of statistics, data mining and graphics facilities. Superb.

    Of course, the problem with any statistical library is you have to turn your brain on first. Nothing produces "Garbage in Garbage out" quite like statistical analysis.

    With R you tend to need to spend far more time thinking about why you are doing something, and what the answer means than in say vanilla C/Ruby programming.

    Which is actually not a Bad Thing at all.

    The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.

  • by idiot900 (166952) * on Wednesday January 07, 2009 @09:15PM (#26366665)

    Actually it may not suck. But having used it on and off over the past few years while not being a statistics pro, I find the R language bletcherous and annoying. <- as an assignment operator?

  • by colinrichardday (768814) <colin.day.6@hotmail.com> on Wednesday January 07, 2009 @09:32PM (#26366845)

    Has Microsoft corrected its percentile function? Or does it still put the largest datum in the 100th percentile, as well as assign fractional percentiles?
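To make the percentile complaint concrete, here is a minimal Python sketch contrasting two common percent-rank conventions. The function names are hypothetical, and neither function is claimed to reproduce Excel's exact formula. The inclusive convention, without ties, puts the largest datum at exactly 1.0 (the 100th percentile); the mid-rank convention never reaches 1.0.

```python
from bisect import bisect_left, bisect_right


def percent_rank_inclusive(data, x):
    """Inclusive rank: (values below x) / (n - 1). Without ties, the max maps to 1.0."""
    s = sorted(data)
    below = bisect_left(s, x)
    return below / (len(s) - 1)


def percent_rank_midrank(data, x):
    """Mid-rank: (values below x + half the values equal to x) / n. Never reaches 1.0."""
    s = sorted(data)
    below = bisect_left(s, x)
    equal = bisect_right(s, x) - below
    return (below + 0.5 * equal) / len(s)


scores = [10, 20, 30, 40, 50]
print(percent_rank_inclusive(scores, 50), percent_rank_midrank(scores, 50))
```

For `[10, 20, 30, 40, 50]`, the inclusive version ranks 50 at 1.0 while the mid-rank version gives 0.9. Which is "right" depends on the definition you need, which is why it matters that a spreadsheet documents its choice.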

  • by garcia (6573) on Wednesday January 07, 2009 @09:39PM (#26366905) Homepage

    My request is to those that are in the know to show me some example code, that does something useful. Then later, compare that code to code from other languages to accomplish the same task.

    Would you ask someone who utilizes SAS or SPSS to do the same thing? Because that's more or less what R is -- a free version of SAS or SPSS. I work in SAS all day long and I have been planning on using R to automate some of my personal website statistics/graphing that I run regularly because I don't really like doing the queries in MySQL on the console, copying the data to Excel, and graphing the results.

    As anyone knows, you should use the best tool for the particular job you're doing. There's no sense in reinventing the wheel in C or Perl or Foo when R, SAS, SPSS, or whatever does stats, mining, and graphing well.
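The query-then-copy-to-Excel-then-graph routine described above can be scripted end to end. This is a minimal sketch, assuming a hypothetical `hits(day, page)` table. It uses Python's built-in `sqlite3` as a stand-in for MySQL and draws a plain-text bar chart instead of a real graph, so it does not represent the commenter's actual schema or tools (R, SAS).

```python
import sqlite3


def daily_hits(conn):
    """Return [(day, hit_count), ...] ordered by day from the hypothetical hits table."""
    return conn.execute(
        "SELECT day, COUNT(*) FROM hits GROUP BY day ORDER BY day"
    ).fetchall()


def ascii_chart(rows, width=40):
    """Render (label, count) rows as a text bar chart scaled to the largest count."""
    if not rows:
        return ""
    peak = max(count for _, count in rows)
    lines = []
    for day, count in rows:
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{day} {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # In-memory sample data; a real script would connect to the live database instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (day TEXT, page TEXT)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        [("2009-01-06", "/"), ("2009-01-06", "/about"), ("2009-01-07", "/")],
    )
    print(ascii_chart(daily_hits(conn)))
```

Once the steps are a script, running them from cron replaces the manual console-to-spreadsheet copying.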

  • by fm6 (162816) on Wednesday January 07, 2009 @09:47PM (#26366977) Homepage Journal

    I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.

    That's a feature of functional languages, a class that also includes Scheme and XSLT. The basic idea is that programs should not have state, because state makes them harder to debug. A for or while loop, by definition, has state, so you have to do your iteration some other way, namely Tail Recursion [wikipedia.org].

    I suppose that makes sense, but I've never been able to teach myself to think that way. It's the main reason I never managed to get through The Wizard Book [mit.edu].
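Here is a minimal Python sketch of the loop-versus-tail-recursion idea in the comment above. The first function mutates an accumulator inside a `for` loop. The second passes the same state through its arguments instead. The function names are illustrative. CPython does not eliminate tail calls, so the recursive version only works for inputs shorter than the recursion limit. It shows the style, not an efficient Python idiom.

```python
def total_loop(xs):
    """Sum with an explicit loop: `acc` is mutable state updated on every pass."""
    acc = 0
    for x in xs:
        acc += x
    return acc


def total_tail(xs, acc=0, i=0):
    """Sum by tail recursion: the running total and position are passed as arguments."""
    if i == len(xs):
        return acc
    # The recursive call is the last thing evaluated, i.e. a tail call.
    return total_tail(xs, acc + xs[i], i + 1)


print(total_loop([1, 2, 3, 4]), total_tail([1, 2, 3, 4]))
```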

  • Re:Based on S (Score:5, Interesting)

    by Anonymous Coward on Wednesday January 07, 2009 @10:30PM (#26367289)

    I wish it had a more googleable name. It's hard to search for help. The signal to noise ratio is low.

  • by Anonymous Coward on Wednesday January 07, 2009 @11:12PM (#26367651)

    The thing is.. C is way too verbose for non-programmers. R, like Matlab/Octave, makes it more straightforward to work if you don't care much about the compiler/memory management/etc. details that software developers usually care about.

    Sure, you can do whatever you want in C... heck, even assembler ;) But you have to understand that some people don't really care about data structures and complicated ways to declare objects: as always, one should use the level of abstraction one feels comfortable with to deal with the specific "algorithmic needs", no?

  • by radtea (464814) on Wednesday January 07, 2009 @11:32PM (#26367769)

    (a) it allows for easy access of Fortran and C library routines

    (b) it allows you to pass large blobs of data by name

    (c) it makes it easy to pass data to and from your own compiled C and Fortran routines

    So, it's exactly like Python, except with an outdated 1970s syntax that was frankly pretty weird to start with :-)

    I've used R and found it useful for some of its relatively esoteric capabilities, but now I use it almost exclusively via rpy, the Python binding to R.

    Furthermore, I've been using it less in recent years as the native statistical capability of Python has continued to improve. I can appreciate that people who work strictly in data analysis could find R an appropriate tool, but as someone whose work spans multiple areas, from analysis to application design and development, R is too limiting a tool, and using it always feels a little alien and weird.
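Point (c), handing a block of data to compiled C code, is also what Python's standard `ctypes` module does. This is a minimal sketch, assuming a POSIX system where the C library's `qsort` can be loaded. It copies a Python list into a C array of doubles, sorts it in place with `qsort`, and reads it back. It illustrates only the Python side of the comparison. The names `c_sort` and `CMP_DOUBLE` are hypothetical.

```python
import ctypes
import ctypes.util

# Load the C runtime. find_library may return None on some systems; on POSIX,
# CDLL(None) then falls back to symbols already loaded into the process (libc included).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# qsort's comparator is int (*)(const void *, const void *); specialised here to doubles.
CMP_DOUBLE = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_double), ctypes.POINTER(ctypes.c_double)
)

libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMP_DOUBLE]
libc.qsort.restype = None


def _compare(a, b):
    # Return negative, zero, or positive, as qsort expects.
    x, y = a[0], b[0]
    return (x > y) - (x < y)


def c_sort(values):
    """Copy values into a C double array, sort it with libc's qsort, return a list."""
    n = len(values)
    arr = (ctypes.c_double * n)(*values)
    callback = CMP_DOUBLE(_compare)  # keep a reference alive for the duration of the call
    libc.qsort(arr, n, ctypes.sizeof(ctypes.c_double), callback)
    return list(arr)


print(c_sort([3.5, -1.0, 2.25]))
```

The array is sorted in C memory, not in Python, which is the "pass a large blob of data to compiled code" pattern the comment lists.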

  • by Peaquod (1200623) on Wednesday January 07, 2009 @11:52PM (#26367941)

    I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.

    I'm sure you had plenty of loops in your code. They were just hidden inside built-in functions. Not that that's a bad thing.... just saying. You have to understand the mechanics of the calculations to use them properly, and over-reliance on built-in functions can make it too easy to talk out of your ass.
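This Python sketch illustrates the point about loops hidden inside built-ins. The hand-written functions spell out the loops that `statistics.mean`, `statistics.pvariance` and `statistics.variance` perform internally. It also shows why the mechanics matter: the two library variance functions differ only in the divisor (n versus n - 1), and calling the wrong one silently gives a different answer. The `*_by_hand` names are illustrative.

```python
import statistics


def mean_by_hand(xs):
    total = 0.0
    for x in xs:  # the loop a built-in like statistics.mean hides
        total += x
    return total / len(xs)


def variance_by_hand(xs, sample=True):
    """Sample variance divides by n - 1; population variance divides by n."""
    m = mean_by_hand(xs)
    ss = 0.0
    for x in xs:
        ss += (x - m) ** 2
    return ss / (len(xs) - 1 if sample else len(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_by_hand(data), statistics.mean(data))
print(variance_by_hand(data, sample=False), statistics.pvariance(data))
print(variance_by_hand(data), statistics.variance(data))
```

For this data the population variance is 4.0 and the sample variance is 32/7 (about 4.57).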

  • by emok (162266) on Thursday January 08, 2009 @12:45AM (#26368301)

    You mentioned it towards the end of your post, but R's plotting functions should really be emphasized. Since I discovered them, I really can't stand to make plots using any other software.

  • Re:FUD from SAS (Score:3, Interesting)

    by belmolis (702863) <billposer.alum@mit@edu> on Thursday January 08, 2009 @01:04AM (#26368393) Homepage

    I guess nobody told her about how proprietary Excel is inferior to libre Gnumeric in having quite a few errors in its statistical functions and how when apprised of errors the Gnumeric folks fixed them quickly while the Excel folks either never fixed them, did it slowly, or introduced new errors? See the report [csdassn.org] by Drexel University statistician B. D. McCullough.
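The one-pass "textbook" variance formula is a classic example of a numerically fragile statistics routine. It is offered here as an illustration of the kind of error such accuracy reports look for, not as a summary of what the report found in any particular Excel version. This minimal Python sketch compares it with the two-pass formula, which subtracts the mean before squaring. The function names are illustrative.

```python
def variance_one_pass(xs):
    """'Calculator' formula: (sum of squares - sum**2 / n) / (n - 1). Numerically fragile."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Subtract the mean first, then square: far less cancellation."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]  # true sample variance is 30
print(variance_one_pass(data), variance_two_pass(data))
```

The two-pass function returns 30.0 for this data. On IEEE-754 doubles, the one-pass formula comes out far off, and in this case negative, because both large sums are rounded before they are subtracted. On small, well-scaled data such as `[1.0, 2.0, 3.0, 4.0]` the two agree.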

  • Re:Based on S (Score:2, Interesting)

    by Rhabarber (1020311) on Thursday January 08, 2009 @02:16AM (#26368771)
    Or that [google.com] if you don't have Javascript enabled.
  • by Undead Waffle (1447615) on Thursday January 08, 2009 @04:46AM (#26369435)

    I use LabView on a daily basis. I hate it.

    My coworkers like it and what they seem to have in common is that they either don't know any other languages or aren't proficient in them.

    It is a language that aims to be very simple by removing as much typed code as possible. Because of this you will spend stupid amounts of time moving little wires around and trying to make your code not look like a tangled mess. And good luck changing it later.

    Since there are no functions and the only way to reuse code is to put it in a different file, people tend not to do this. So if you want to use part of someone else's code, you will usually have to copy and paste it into a different file and spend a bunch of time reconnecting wires and dealing with references to variables you won't have access to in the new file.

    The visual style is also, in my opinion, much harder to read than typed code. If I'm trying to figure out some sort of formula it's easier to read it as text than try to figure out where all these wires are coming from that are connected to little "+" and "-" terminals. Also, since comments take space they tend to be short and are usually missing in more complicated sections because it's harder to route the wires around them. And control structures quickly make code virtually unreadable.

    There's also the part about writing most of your code with a mouse. Do you really enjoy having to navigate through a series of menus to do anything?

  • by Undead Waffle (1447615) on Thursday January 08, 2009 @05:23AM (#26369575)

    It has plenty of other annoying behaviors.

    If you try to access an array element out of range it just gives you the default value for that data type rather than giving some indication that something is wrong.

    There is an option to automatically build an array as the output of a loop, but no way to make it *not* add a value to the array, for example when you hit a terminating condition for the loop or some value you want to skip. If you have these situations, you either have to modify the array afterwards or build the array manually.
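This Python sketch contrasts the two behaviors described above with ordinary text-based code. It does not model LabVIEW itself. `get_silent` mimics the default-on-out-of-range read the comment reports, and `get_checked` fails loudly instead. `collect` builds an output array in a loop while skipping some values and stopping at a terminating condition, which the comment says LabVIEW's auto-built loop output cannot do. All three names are hypothetical.

```python
def get_silent(arr, i, default=0):
    """Out-of-range reads quietly return a default, as the comment says LabVIEW does."""
    return arr[i] if 0 <= i < len(arr) else default


def get_checked(arr, i):
    """Out-of-range reads fail loudly instead."""
    if not 0 <= i < len(arr):
        raise IndexError(f"index {i} out of range for length {len(arr)}")
    return arr[i]


def collect(readings, limit):
    """Build an output array from a loop, skipping some values and stopping early."""
    out = []
    for r in readings:
        if r is None:  # a value we want to skip
            continue
        if r > limit:  # terminating condition: stop without appending
            break
        out.append(r)
    return out


print(get_silent([1, 2, 3], 5), collect([1, None, 2, 99, 3], limit=10))
```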

  • by syntaxglitch (889367) on Thursday January 08, 2009 @07:55AM (#26370225)

    Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.

    Not that difficult, really. The main problem is sequencing, which is provided by things like function composition. The problem is the unwieldy nature in languages like Scheme of specifying sequencing using pure functions, while also handling data that doesn't require sequencing; but this is a syntactic problem, not a practical one.

    The issue is handled admirably by the language Haskell, using a mathematical construct called a "monad" to allow an elegant way of handling sequencing--even a syntactic sugar "do" notation that looks vaguely imperative--while remaining 100% pure, unlike Scheme.
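Haskell's `do` notation can't be shown directly in Python. The idea underneath it, sequencing pure state-passing functions with a `bind` operation, can be sketched, though. This is a minimal hand-rolled state monad in Python. The names `unit`, `bind`, `push` and `pop` are our own, not a library API. Each step takes a stack (an immutable tuple) and returns a result plus a new stack, so nothing is mutated.

```python
from typing import Any, Callable, Tuple

# A stateful step is a pure function: old state -> (result, new state).
Step = Callable[[Any], Tuple[Any, Any]]


def unit(value) -> Step:
    """Wrap a plain value as a step that leaves the state unchanged."""
    return lambda state: (value, state)


def bind(step: Step, next_step: Callable[[Any], Step]) -> Step:
    """Run `step`, feed its result to `next_step`, and thread the new state through."""
    def run(state):
        value, new_state = step(state)
        return next_step(value)(new_state)
    return run


def push(x) -> Step:
    return lambda stack: (None, stack + (x,))


def pop() -> Step:
    return lambda stack: (stack[-1], stack[:-1])


# Roughly: do { push 1; push 2; top <- pop; return (top * 10) }
program = bind(push(1), lambda _: bind(push(2), lambda _: bind(pop(), lambda top: unit(top * 10))))
print(program(()))
```

`program(())` returns `(20, (1,))`. The pushes and the pop happen in order even though every function involved is pure, which is the sequencing the comment attributes to monads.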

  • by dargaud (518470) <slashdot2&gdargaud,net> on Thursday January 08, 2009 @09:04AM (#26370575) Homepage
    I agree with your and the GP's analysis of LabView, which is why I use LabWindows: it's basically LabView's user interface connected to standard C, hence totally deterministic. But two things about LabView ensure its future: it's close to the way electronics engineers think, and it deploys very well on multicore processors (and probably on distributed architectures as well).
  • by golodh (893453) on Thursday January 08, 2009 @09:38AM (#26370853)
    Well yes and no.

    I agree with the first part of your post: to me R is something to code in when you have to, and to keep the resulting code as short and simple as possible. If I ever had to code a real application with a GUI that needed the statistical strengths of R, I would almost certainly not use R.

    On the other hand I'd probably use Java and link to R as a server (see my other post about R and Java) instead of using Python.

  • by zippthorne (748122) on Thursday January 08, 2009 @02:14PM (#26374607) Journal

    I was overly harsh. It has some good points. Just not compelling enough to bother to learn a whole new kind of language for, especially if you know about the powerful things in the matlab toolbox.

    But... depending on what you're actually using it for next semester, I would venture to guess that you might not actually have to learn LabView at all. My introduction to it was in a lab class where it was used to interface with a kind of generic electronics-projects daughter-board, and for the whole semester we never used it for anything more complicated than paring input to a log file (maybe ONE function filter) and output from a script.

  • by TheCouchPotatoFamine (628797) on Thursday January 08, 2009 @04:31PM (#26376525)
    I once made a living reaching into Excel spreadsheets from Python to do math because Excel was bone slow. A process that took *three* hours in Excel took Python about 45 seconds. Maybe my loops were smarter (hint), but it was still a very, very dramatic experience. Just sayin'.
