The Power of the R Programming Language
BartlebyScrivener writes "The New York Times has an article on the R programming language. The Times describes it as: 'a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.'"
Only for certain kind of analyst... (Score:2, Insightful)
... most others keep thinking that M$ Excel is the silver bullet.
Sad, but f****** true.
popular? no (Score:5, Insightful)
Growing in use? sure.
Re:Only for certain kind of analyst... (Score:5, Insightful)
... most others keep thinking that M$ Excel is the silver bullet.
The folks I know who use Excel for analysis use it because it's the package that everyone in their organization gets, there's a shitload of material on the web that uses Excel, there are plenty of add-ons for it (no need to reinvent the wheel), and when sharing data and analysis, everyone is familiar with it. An engineer I know who uses Excel chose it because it was the fastest way to connect to his testing equipment. R is relatively new, and as more folks who know it come into the workforce, we'll see it replace Excel for the functions it is better suited for.
Show me some example code (Score:5, Insightful)
My request to those in the know: show me some example code that does something useful. Then compare it to code in other languages that accomplishes the same task.
Include reasons to support the notion that the R language is [necessarily] better at what it does.
Re:Show me some example code (Score:5, Insightful)
FTA
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Seriously, does this person know what she is talking about?
1. Yes, CFD and structural analysis software is increasingly written using open source tools and runs on open source operating systems (Linux running on clusters).
2. SAS is not used to design any part of the aircraft.
I have noticed SAS uses the same kind of FUD to counter R as M$ uses to counter Linux.
Free as in beer (Score:3, Insightful)
Very true. This is what I try to explain to people when they can't understand why some software is given away gratis. Because if they charged for it, given the current attitudes of the market, they wouldn't stand a chance and wouldn't ever get any market share to begin with.
Re:Show me some example code (Score:5, Insightful)
Let's see, Director of technology product marketing. I'm gonna go with a big NO.
FUD from SAS (Score:4, Insightful)
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Wow... talk about FUD. Does SAS indemnify against plane crashes?
Re:Not a language, really (Score:1, Insightful)
Calling R a programming language is like calling Mathematica or Matlab a language. R is a system for statistical tasks that has a language and syntax, but it is not capable of producing stand-alone executables that do not require the entire R environment.
So, you're saying java, js, python, perl, and ruby aren't programming languages?
Re:Not a language, really (Score:5, Insightful)
Are you kidding me? Are you really *(*$@#ing, Grade A kidding me?
Python/Perl/Ruby require interpreters. Scheme and Lisp are frequently run within interpreters. "Stand-alone executables" require HARDWARE. Any programming system requires *something* underneath it unless you are programming in a purely physical system like an automated abacus with mechanical gears that buzz and whirr.
Programming languages are defined by their Turing completeness: can they do things repeatedly, can they assign values to memory locations and perform some basic set of operations (nand works nicely), can they make decisions. Everything else is fluff.
Perl has "fluff" that handles regular expressions very well.
Python (and others) have "fluff" that makes networking and database ops easy.
R has "fluff" that makes it terribly convenient to work with data.
Matlab has "fluff" that makes it very easy to do numerical methods programming.
Mathematica has "fluff" that makes it very easy to do symbolic computation.
Each and every one of these, and most well-known languages, with all their warts and beauty marks are Turing complete and are deserving of the term "programming language".
Regards,
Mark
Re:Show me some example code (Score:5, Insightful)
I have no idea how i would start to code that in C, python, etc. in a way that's remotely efficient ;)
How about:
#include <clapack.h>
dgesdd( argument list );
This sort of thing is a feature of libraries, not an inherent advantage of one language.
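To make the library point concrete from the R side, here is a minimal sketch (the matrix is made up purely for illustration). As far as I can tell from R's own documentation, svd() is a wrapper around the LAPACK routines DGESDD and ZGESDD, so the heavy lifting is the same library call as in the C snippet above.

```r
# Hypothetical 3x2 matrix, chosen only for illustration
m <- matrix(c(2, 0, 1, 1, 3, 0), nrow = 3)

# Singular value decomposition: m = U %*% diag(d) %*% t(V)
s <- svd(m)

# Rebuild m from its factors to check the decomposition
rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
print(all.equal(m, rebuilt))
```

The difference between the two is mostly how much argument plumbing you write yourself, not what gets computed.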
Re:Only for certain kind of analyst... (Score:3, Insightful)
Do analysts who use R get better returns than those who use Excel?
Re:Show me some example code (Score:3, Insightful)
But we already have a language that does vectors correctly. It's called Matlab and it's based on Fortran, which I guess technically also does vectors correctly, if you want to bother to learn it.
Re:Not a language, really (Score:4, Insightful)
Your comment is absolutely wrong.
http://en.wikipedia.org/wiki/Programming_language [wikipedia.org]
R is a Turing complete programming language. The fact that it requires an interpreter is completely irrelevant.
Re:Show me some example code (Score:5, Insightful)
One big advantage R has over Matlab (er, besides the fact that R is OSS, but of course there's Octave for those who want an OSS Matlab alternative) is that R handles non-matrix data structures much, much better than Matlab does. Trying to work with anything that isn't a vector or a matrix in Matlab is an exercise in pain.
Re:Not a language, really (Score:3, Insightful)
I would argue that GP is confusing "programming language" with "general-purpose programming language".
I bet even SQL is Turing-complete, but I wouldn't want to do more than database operations with it.
Re:Show me some example code (Score:3, Insightful)
Okay, I'll take you up on that... here's some code that takes in a vector of genotypes (as a factor with levels AA,AC,CC,XX), and a matrix of columns to be used for different bootstraps, and spits out a list of genotype counts for those bootstraps:
## matmap -- maps a vector onto a matrix of indexes to the vector
## (a hack to get round something that R doesn't seem to do by default)
matmap <- function(vector.in, matrix.indices){
    res <- vector.in[matrix.indices]
    if(is.null(dim(matrix.indices))){
        ## a plain vector of indexes becomes a one-column matrix
        dim(res) <- c(length(matrix.indices), 1)
    } else {
        dim(res) <- dim(matrix.indices)
    }
    return(res)
}
## generate table based on genotype frequencies
GTcounts <- function(in.genotypes, columns.pop){
    ## one column of counts per bootstrap, one row per genotype level
    gt.table <- apply(matmap(in.genotypes, columns.pop), 2, tabulate, nbins = 4)
    rownames(gt.table) <- levels(in.genotypes)
    return(gt.table)
}
Out of the [imperative] languages I know, only octave/matlab have a chance of doing better than that in terms of lines of code. And when you're writing code, being able to avoid duplication and mindless for loops is a really useful feature.
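For anyone who wants to try the idea without the helper functions, here is a self-contained sketch of the same technique. The genotype data, the seed and the number of bootstraps are all made up for illustration, and this version converts the factor to integer codes explicitly instead of going through matmap, so treat it as an approximation of the code above rather than a drop-in replacement.

```r
# Hypothetical genotype calls, invented for illustration
genotypes <- factor(c("AA", "AC", "CC", "AA", "XX", "AC"),
                    levels = c("AA", "AC", "CC", "XX"))

# Three bootstraps: each column holds indexes resampled with replacement
set.seed(1)
idx <- replicate(3, sample(length(genotypes), replace = TRUE))

# Index the integer codes with the whole matrix at once, then restore its shape
codes <- matrix(as.integer(genotypes)[idx], nrow = nrow(idx))

# Count each genotype level per bootstrap column, no explicit loop
counts <- apply(codes, 2, tabulate, nbins = nlevels(genotypes))
rownames(counts) <- levels(genotypes)
print(counts)
```

Each column of counts sums to the number of samples, which is a quick sanity check that nothing got dropped along the way.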
Re:Freak your colleagues out with "no loop" code.. (Score:1, Insightful)
That sounds extremely weird: if a program has a stack, then it has a state - the location on the stack is still state. Thus, if you use recursion, you still have state. I mean, you can try to hide the fact that you have state, but I don't see how you can have a program without state.
Even the wizard book appears to have a chapter on state: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-19.html#%25_chap_3 [mit.edu] , but, unlike your description, instead of talking about a program without state, it considers two kinds of state: the state of objects, or the state of streams of data.
Do you happen to have a link to what you mean by "a program should not have state"? Because, I mean, that seems antithetic to the nature of a program.
Nothing really (Score:4, Insightful)
Labview is well designed for its intent. So someone with minimal programming skills can sit down and get something done in a short amount of time. Would I use it for crunching numbers or collecting terabytes of data? Probably not. But it's sure damn handy if you want to interface test equipment and get results. It's all about the best tool for the job.
You have to play with it. (Score:3, Insightful)
You have to play with it. As with APL you'll either love it or hate it.
If you like the idea of a language that includes relational tables as a primitive data type, that extends most operators to do the right thing when you feed them vectors and matrices, that has linear regression and equation solving built-in, you'll probably like R.
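A minimal sketch of what that looks like in practice, using base R only. The numbers are made up for illustration, and names like df, fit, A and b are just placeholders.

```r
# A data frame is R's built-in table type (hypothetical data)
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(3, 5, 7, 9, 11))

# Operators work on whole vectors: this doubles every element, no loop
doubled <- df$x * 2

# Linear regression is built in: fit y as a linear function of x
fit <- lm(y ~ x, data = df)
print(coef(fit))

# Equation solving is built in too: find z such that A %*% z equals b
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)
z <- solve(A, b)
print(z)
```

Whether that feels natural or alien is pretty much the love-it-or-hate-it test.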
Re:Only for certain kind of analyst... (Score:3, Insightful)
Pfft. Matlab is the fastest way to connect to his testing equipment.
One of MATLAB's few redeeming features is the Instrument Control Toolbox, especially since it works well with most of the top-end Agilent/Tektronix kit. It's nice to be able to automate acquisition and analysis of instrument data from a single environment.
Re:Only for certain kind of analyst... (Score:3, Insightful)
Since there are no functions and the only way to reuse code is to put it in a different file, people tend not to do this.
Oh yeah, and when you do so, you have to draw an icon to represent the function instead of just giving it a name, and many people don't do this (even though the icon could just be text in a box). And since the data-flow nature of the language also eliminates most intermediate variables, you end up with code that is nothing but unlabeled lines drawn between generic-looking boxes. In other words, the semi-self-documenting nature of function and variable names is lost because those names don't exist, or are not shown.
Just another example of how LabView makes it easy for people to write bad code while being far more time-consuming to write good code than other languages.
Re:Show me some example code (Score:1, Insightful)