The Power of the R Programming Language
BartlebyScrivener writes "The New York Times has an article on the R programming language. The Times describes it as: 'a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.'"
Only for certain kind of analyst... (Score:2, Insightful)
... most others keep thinking that M$ Excel is the silver bullet.
Sad, but f****** true.
Re:Only for certain kind of analyst... (Score:5, Insightful)
... most others keep thinking that M$ Excel is the silver bullet.
The folks I know who use Excel for analysis use it because it's the package that everyone gets in their organization, there's a shit load of material on the web that uses excel, there's plenty of add-ons for it (no need to reinvent the wheel), and when sharing data and analysis, everyone is familiar with it. An engineer I know who uses excel chose it because it was the fastest way to connect to his testing equipment. R is relatively new and as more folks come into the workforce who know it, we'll see it replace Excel for functions that it is better suited for.
Re:Only for certain kind of analyst... (Score:4, Interesting)
I guess I was thinking of analysts using Excel to develop "complicated" statistical analyses. Sure, Excel is unbeatable at handling small, tabular datasets and doing basic or even considerable arithmetic with them.
When it comes to doing more elaborate analysis, using Excel IS reinventing the wheel. Plus, it is IMPOSSIBLE to understand later.
Re:Only for certain kind of analyst... (Score:4, Informative)
Re: (Score:3, Interesting)
Re:Only for certain kind of analyst... (Score:5, Informative)
Sorry, but R is not relatively new. It's been around for at least 10 years; I was taught how to use R at university back in 2001. S and later S+ (of which R is a FOSS version) have been around even longer, since the mid-'70s.
Re:Only for certain kind of analyst... (Score:4, Funny)
--
This is where my sig would go if I had one...
Re:Only for certain kind of analyst... (Score:5, Funny)
So we can the financial crisis on idiots who don't understand that GIGO applies in EVERY computer language?
No, but we can the dropping of verbs on idiots who don't understand that they apply to EVERY sentence!
Re:Only for certain kind of analyst... (Score:4, Interesting)
Has Microsoft corrected its percentile function? Or does it still put the largest datum in the 100th percentile, as well as assign fractional percentiles?
Re:Only for certain kind of analyst... (Score:5, Informative)
Pfft. Matlab is the fastest way to connect to his testing equipment.
Well.. Labview, actually, but no one in their right mind would want to actually use it. Anyway, simulink gets you a lot of the graphical programming features if you need that.
Re: (Score:3, Insightful)
Pfft. Matlab is the fastest way to connect to his testing equipment.
One of MATLAB's few redeeming features is the Instrument Control Toolbox, especially since it works well with most of the top-end Agilent/Tektronix kit. It's nice to be able to automate acquisition and analysis of instrument data from a single environment.
Re:Only for certain kind of analyst... (Score:5, Insightful)
Labview sucks the most (Score:5, Informative)
Labview is utterly non-deterministic in its execution. The execution order of blocks does NOT follow the data flow of the lines joining them if there are more than a handful of blocks present. In fact, the execution sequence becomes random, and changes randomly when block positions are changed (even without changing the data connectivity). This forces the use of explicit sequence structures in any non-trivial function, increasing its complexity and opacity. Just try synchronizing shared data between asynchronous loops. Even their Knowledgebase admits that there's no way to do it properly.
And let's not get started on the crappy content of Labview's documentation. It's organized and formatted tolerably well, but the content is vacuous. Hardly any functions have any suggestion of their behaviour when faulty data arrives (e.g. a NaN), for example.
Re:Labview sucks the most (Score:5, Interesting)
It has plenty of other annoying behaviors.
If you try to access an array element out of range it just gives you the default value for that data type rather than giving some indication that something is wrong.
There is an option to automatically build an array as the output of a loop, but no way to make it *not* add a value to the array. Like when you hit a terminating condition for the loop or some value you want to skip. If you have these situations you either have to modify the array afterwards or build the array manually.
Re:Labview sucks the most (Score:5, Interesting)
Nothing really (Score:4, Insightful)
Labview is well designed for its intent: someone with minimal programming skills can sit down and get something done in a short amount of time. Would I use it for crunching numbers or collecting terabytes of data? Probably not. But it's sure damn handy if you want to interface test equipment and get results. It's all about the best tool for the job.
It is a pain in the ass to change. (Score:5, Informative)
Say you realize that you need to check for another corner case that you forgot, or need to extend a function for another purpose, or whatever. In any other language, you would type a few lines of code and be done with it. Not with labview. With labview you have to move things around to make room for the new code, disconnect wires and reconnect them. NI has added stuff into the newer version to help with this (auto growing, etc) but it still turns into a mess in short order.
Other things, like equations, are just easier to type than to draw, and also easier to read as text than as a schematic. So much so that they have added the ability to type portions of the code, but the amount of setup that you need to do with a code block often defeats the time benefit you get from using it.
As someone who likes "clean code" I find LabView much more tedious and time consuming to keep neat, and when dealing with other coders that are not as picky, I find that their LabView code is much messier and harder to read than Java or C code by the same developer.
Re:Only for certain kind of analyst... (Score:5, Interesting)
I use LabView on a daily basis. I hate it.
My coworkers like it and what they seem to have in common is that they either don't know any other languages or aren't proficient in them.
It is a language that aims to be very simple by removing as much typed code as possible. Because of this you will spend stupid amounts of time moving little wires around and trying to make your code not look like a tangled mess. And good luck changing it later.
Since there are no functions and the only way to reuse code is to put it in a different file people tend not to do this. So if you want to use part of someone else's code you will usually have to copy and paste into a different file and spend a bunch of time reconnecting wires and dealing with references to variables you won't have access to in the new file.
The visual style is also, in my opinion, much harder to read than typed code. If I'm trying to figure out some sort of formula it's easier to read it as text than try to figure out where all these wires are coming from that are connected to little "+" and "-" terminals. Also, since comments take space they tend to be short and are usually missing in more complicated sections because it's harder to route the wires around them. And control structures quickly make code virtually unreadable.
There's also the part about writing most of your code with a mouse. Do you really enjoy having to navigate through a series of menus to do anything?
Re: (Score:3, Insightful)
Since there are no functions and the only way to reuse code is to put it in a different file people tend not to do this.
Oh yeah, and when you do so, you have to draw an icon to represent the function instead of just giving it a name, and many people don't do this (even though the icon could just be text in a box). And since the data-flow nature of the language also eliminates most intermediate variables, you end up with code that is nothing but unlabeled lines drawn between generic-looking boxes. In other words, the semi-self-documenting nature of function and variable names is lost because those names don't exist, or are not
Re: (Score:3, Interesting)
I was overly harsh. It has some good points. Just not compelling enough to bother to learn a whole new kind of language for, especially if you know about the powerful things in the matlab toolbox.
But.. depending on what you're actually using it for next semester, I would venture to guess that you might not actually have to learn labview at all. My introduction to it was in a lab class where it was used to interface with a kind of generic electronic projects daughter-board, and for the whole semester we neve
Re: (Score:3, Informative)
>The folks I know who use Excel for analysis use it because it's the package that everyone gets in their organization, there's a shit load of material on the web that uses excel, there's plenty of add-ons for it (no need to reinvent the wheel), and when sharing data and analysis, everyone is familiar with it
Back when I was in grad school, ten years ago, Excel was the preferred data analysis tool for most physical and biological scientists that I knew; even when they had high end analysis tools installed
Re: (Score:3, Funny)
Why don't they just use Word if they need a database??
http://www.neopoleon.com/home/blogs/neo/archive/2003/09/29/5458.aspx [neopoleon.com]
Re: (Score:3, Insightful)
Do analysts who use R get better returns than those who use Excel?
What's a pirate's favorite programming language? (Score:5, Funny)
R!
Sure about that? (Score:4, Funny)
Re:Sure about that? (Score:4, Funny)
not to mention that, because of reason 3, it is the most drunk-friendly programming language there is.
Re:Sure about that? (Score:5, Funny)
Re:What's a pirate's favorite programming language (Score:5, Funny)
http://www.arrrrrr.com/corsair.jpg [arrrrrr.com]
Re: (Score:3, Funny)
No, it's P. It's like R but it's missing a leg!
popular? no (Score:5, Insightful)
Growing in use? sure.
Two links? (Score:2)
Show me some example code (Score:5, Insightful)
My request is to those that are in the know to show me some example code, that does something useful. Then later, compare that code to code from other languages to accomplish the same task.
Include reasons to support the notion that the R language is [necessarily] better at what it does.
Re:Show me some example code (Score:5, Insightful)
FTA
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.""
Seriously, does this person know what she is talking about?
1. Yes, CFD and Structural Analysis software is increasingly written using open source tools and run on open source OS (Linux running on clusters)
2. SAS is not used to design any part of the aircraft.
I have noticed SAS uses the same kind of FUD to counter R as M$ uses to counter Linux.
Re:Show me some example code (Score:5, Insightful)
Let's see, Director of technology product marketing. I'm gonna go with a big NO.
Re:Show me some example code (Score:4, Insightful)
Re: (Score:3, Informative)
Production side: I would agree. However, statistical differential equations? SAS is good for predefined "statistical analysis", not for solving partial differential equations. Almost all mechanical problems in aerospace (read fluids, solids, thermal, electro) are expressed as partial differential equations. Solutions of these (barring a few special cases) require numerical methods. The most common of these methods are finite element, finite difference and finite volume.
And each one of these has it numerous "s
Freak your colleagues out with "no loop" code... (Score:5, Interesting)
Actually that wasn't why I used R, just a fun addendum. The reason to use R is the huge body of statistics, data mining and graphics facilities. Superb.
Of course, the problem with any statistical library is you have to turn your brain on first. Nothing produces "Garbage in Garbage out" quite like statistical analysis.
With R you tend to need to spend far more time thinking about why you are doing something, and what the answer means than in say vanilla C/Ruby programming.
Which is actually not a Bad Thing at all.
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
Re:Freak your colleagues out with "no loop" code.. (Score:5, Informative)
"The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal"
Try searching from http://rseek.org/ [rseek.org] instead of directly from Google.
Wow! The Google Star of R has risen... (Score:2)
I retract my sole criticism of R.
Re:Freak your colleagues out with "no loop" code.. (Score:5, Interesting)
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
That's a feature of functional languages, a class that also includes Scheme and XSLT. The basic idea is that programs should not have state, because state makes them harder to debug. A for or while loop, by definition, has state, so you have to do your iteration some other way, namely Tail Recursion [wikipedia.org].
I suppose that makes sense, but I've never been able to teach myself to think that way. It's the main reason I never managed to get through The Wizard Book [mit.edu].
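For what it's worth, loop-free R code usually seems to lean on vectorized arithmetic and fold/apply-style helpers rather than hand-written tail recursion. A minimal sketch of the same computation written three ways; the data and variable names are made up for illustration:

```r
# Hypothetical data: ten measurements (values invented for illustration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9, 1, 3)

# Explicit loop: sum of squared deviations from the mean
total <- 0
for (xi in x) {
  total <- total + (xi - mean(x))^2
}

# Loop-free: arithmetic operators act element-wise on the whole vector
total_vec <- sum((x - mean(x))^2)

# Functional style: Reduce() folds a two-argument function over the vector,
# starting from the initial accumulator value 0
total_fold <- Reduce(function(acc, xi) acc + (xi - mean(x))^2, x, 0)

print(c(loop = total, vectorized = total_vec, fold = total_fold))
```

All three give the same number; the vectorized form is the one you would normally see in R code.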
Re: (Score:3, Informative)
Do you happen to have a link to what you mean by "a program should not have state"? Because, I mean, that seems antithetic to the nature of a program.
Of course there is a state, you're using a standard computer to run the program, so there must be a state somewhere. Still, the point is that even if the language implementation works by changing the computer's memory state, the abstraction you use to program isn't state-based. In a pure functional programming language, you don't program by manipulating a state, but by computing the results of functions.
Regarding the SICP book, like most functional programming languages, Scheme isn't a pure functional langu
Re: (Score:3, Interesting)
Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.
Not that difficult, really. The main problem is sequencing, which is provided by things like function composition. The problem is the unwieldy nature in languages like Scheme of specifying sequencing using pure functions, while also handling data that doesn't require sequencing; but this is a syntactic problem, not a practical one.
The issue is handled admirably by the language Haskell, using a mathematical construct called a "monad" to allow an elegant way of handling sequencing--even a syntactic sugar "do"
Re:Freak your colleagues out with "no loop" code.. (Score:2)
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
Yep. There are a couple of dedicated R search engines that can help with that: http://www.dangoldstein.com/search_r.html [dangoldstein.com] and http://www.rseek.org/ [rseek.org]. It may also sometimes be useful to Google on "Splus (whatever)" since most R and S+ code is pretty much interchangeable.
Re: (Score:2, Informative)
x = vector(mode="list")
x[["joe"]] = y
x[["bob"]] = z #z can be a function!
x = list(joe=y)
x$bob = z
r-project.org (Score:4, Informative)
The language is very well documented online and the mailing lists contain thousands of examples. It is primarily for statistical analysis, and the libraries available for doing such analysis are unparalleled.
Re:r-project.org (Score:5, Funny)
the libraries available for doing such analysis are unparalleled.
With multi-core processors becoming more and more prevalent, R's developers should remedy this as soon as possible.
Re:r-project.org (Score:5, Informative)
With multi-core processors becoming more and more prevalent, R's developers should remedy this as soon as possible.
Already done. There's an R package called SNOW [www.sfu.ca] that allows you to handle code running in parallel.
Re:Show me some example code (Score:4, Informative)
It may not be "better" in the sense of "calculating stuff with higher efficiency" (i reckon you can do the same stuff in C, given the right libraries :P), but for statistical and data mining/visualization purposes it is a quite simple object-oriented functional language with many useful built-in procedures and lots of freely available packages/libraries that is simple enough for "non-programmers" and, so far, it does what i want it to do fast enough and.. it's free.
So.. probably not the best all-purpose programming language, but fits nicely in the "statistical software environment/language" niche and, unlike SPSS et al., it's free (as in "libre", as in "everyone can independently verify your results without having to shell out cash", which is useful in academia).
Example code:
results <- prcomp(datamatrix)
This does a PCA (Principal Component Analysis [wikipedia.org]) on the data contained in "datamatrix" and dumps the results into the "results" variable.
I have no idea how i would start to code that in C, python, etc. in a way that's remotely efficient ;)
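To make that one-liner runnable, here is a small sketch that uses the USArrests table shipped with base R as a stand-in for "datamatrix". The choice of dataset and the scale. = TRUE setting are my assumptions, not part of the post above:

```r
# USArrests ships with base R; it stands in for "datamatrix" here
datamatrix <- as.matrix(USArrests)

# scale. = TRUE standardizes each column first, since the columns
# are measured in different units
results <- prcomp(datamatrix, scale. = TRUE)

# Standard deviation of each principal component
print(results$sdev)

# Proportion of the total variance explained by each component
var_explained <- results$sdev^2 / sum(results$sdev^2)
print(round(var_explained, 3))

# Coordinates of the first few observations on the first two components
print(head(results$x[, 1:2]))
```

The returned object also carries the loadings in results$rotation, so summary(results) and biplot(results) work on it directly.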
Re: (Score:2)
I have no idea how i would start to code that in C, python, etc. in a way that's remotely efficient ;)
I'd go with
#include "prcomp.h"
Once someone did the algorithm for you, any programming language is easy. I think the point of the language would be, if said algorithm was orders of magnitude easier to code, represent, argue about, etc. in R, than it would be in "C, Python, etc."
Re: (Score:2)
Re: (Score:3, Insightful)
But we already have a language that does vectors correctly. It's called Matlab and it's based on Fortran, which I guess technically also does vectors correctly, if you want to bother to learn it.
Re:Show me some example code (Score:5, Insightful)
One big advantage R has over Matlab (er, besides the fact that R is OSS, but of course there's Octave for those who want an OSS Matlab alternative) is that R handles non-matrix data structures much, much better than Matlab does. Trying to work with anything that isn't a vector or a matrix in Matlab is an exercise in pain.
Re:Show me some example code (Score:4, Informative)
Re:Show me some example code (Score:4, Informative)
I used to use Matlab quite a lot (mostly for prototyping simulations and for visualization; I use C for my "real" simulations which take a lot of CPU time, since they run so much faster in C). I learned R about 2 years ago, and found that it can do pretty much everything Matlab can that I need for my own research.
Anyway, I wrote up a "Matlab / R Reference" that translates the basics between the two packages. It doesn't have highly specialized stuff, but many people have found it handy. I use my own reference quite a bit myself, since these days I mix up commands between the two packages quite a bit. It's available at:
http://www.math.umaine.edu/faculty/hiebeler/comp/matlabR.html [umaine.edu]
Re:Show me some example code (Score:5, Insightful)
I have no idea how i would start to code that in C, python, etc. in a way that's remotely efficient ;)
How about:
#include <clapack.h>
dgesdd( argument list );
This sort of thing is a feature of libraries, not an inherent advantage of one language.
Re:Show me some example code (Score:5, Informative)
It's been a while since I worked with it and I don't have code examples with me at the moment, but think of it as the Matlab/Octave of statistics, including the preference for "function over each row/column" instead of loops.
Compared to other languages, R makes it easy to do statistical analysis tasks like Matlab/Octave makes it easy to do linear algebra tasks.
Plus, as other posts stated above, there's excellent documentation and tons of useful libraries (take a peek at the libraries available at the Debian repositories), Bioconductor being the finest example.
Oh, and nice emacs integration. :)
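A quick sketch of the "function over each row/column" style mentioned above, using a made-up matrix:

```r
# Hypothetical 3 x 4 matrix (rows = samples, columns = variables)
m <- matrix(1:12, nrow = 3, ncol = 4)

# MARGIN = 1 runs the function once per row, MARGIN = 2 once per column
row_ranges <- apply(m, 1, function(r) max(r) - min(r))
col_means  <- apply(m, 2, mean)

print(row_ranges)
print(col_means)

# For common cases there are dedicated helpers that do the same job
print(colMeans(m))
```

No index variables and no explicit loop: the function is simply handed to apply() along with the dimension to walk over.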
Re:Show me some example code (Score:5, Informative)
I use R a great deal. Think of it as an alternative to MATLAB, or Excel, rather than C or perl or lisp or whatever you like to use as a general purpose language. So, compared to MATLAB, functions are first class objects (rather like lisp), so, you can write functions that take functions as arguments, and return them as well, just as though they were simple variables. It handles vectors rather easily, and has decent plotting tools.
#quick example
# function, which, given numerical arguments a and b, and a function g, returns a function of x
f <- function(a,b, g){
function(x){ a * x + g(b * x)}
}
f1 <- f(1,2.5,sin)
x <- seq(-pi,pi,length.out=100)
plot(x,f1(x),type='l')
Re: (Score:2, Informative)
Re:Show me some example code (Score:4, Informative)
i'm a PhD student in biostatistics at a fairly prestigious american university. we use R almost exclusively, because it is better than other statistical software options. reasons for its superiority are i) it's free ii) it's open source and iii) it's considerably more powerful than STATA, SPSS, SAS, etc.
it is true that other languages can be quicker for many tasks. proficiency in C is desirable, but C is not geared toward statistics, where many built-in libraries and user-contributed packages for R implement complex methodologies.
i'm not as versed in C as i am in R, so i can't provide a direct comparison of the languages, but i have included a sample below. it's a function that fits a simple linear model, taking the outcome data and input data (as a matrix) and a couple of other parameters as inputs. it returns a variety of values, including the model coefficients and fitted values. there is an R function that does this exact thing, but we have to do something for homework.
lm=function(y,x,returnHat=FALSE,addInt=FALSE){
if(addInt){
x=cbind(matrix(1,nrow(x),1),x)
}
#use range around 0, for roundoff error
if(-1e-5 <= det(t(x)%*%x) && det(t(x)%*%x) <= 1e-5){stop("x'x not invertible",call.=F)}
beta=solve(t(x) %*% x) %*% t(x) %*% y
sigma = as.numeric(sqrt(var(y-(x%*%beta))))
varbeta=sigma^2 * (solve(t(x)%*% x))
fitted=x %*% beta
residuals=y-fitted
if(!returnHat){
output=list(beta,sigma,varbeta,fitted,residuals)
names(output)=c("beta","sigma","varbeta","fitted","residuals")
}
if(returnHat){
hat=x%*% solve(t(x) %*% x) %*% t(x)
output=list(beta,sigma,varbeta,fitted,residuals,hat)
names(output)=c("beta", "sigma", "varbeta", "fitted", "residuals", "hat matrix")
}
output
}
i'd also say that i'm glad to see some press for R. it's popular in some circles, but not as accepted by companies and some academics because it is open source. the idea is that software you have to pay a licensing fee for must be more reliable because, well, you paid for it (thinking i'm sure you're familiar with).
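The coefficient formula in the homework function above can be cross-checked against R's built-in fitter. A minimal sketch on simulated data; the seed, sample size and true coefficients are arbitrary choices of mine, and stats::lm is spelled out in full because the function in the post reuses the name lm:

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 50
x <- cbind(1, rnorm(n), rnorm(n))  # design matrix with an intercept column
y <- as.numeric(x %*% c(2, -1, 0.5) + rnorm(n, sd = 0.1))

# Normal equations, as in the post: beta = (X'X)^{-1} X'y
beta_manual <- solve(t(x) %*% x) %*% t(x) %*% y

# Built-in fit; "- 1" drops R's automatic intercept because x already has one
fit <- stats::lm(y ~ x - 1)

# The two sets of coefficients should agree to numerical precision
print(cbind(manual = as.numeric(beta_manual), builtin = as.numeric(coef(fit))))
```

The built-in version avoids forming the inverse of X'X explicitly, which tends to behave better when columns are nearly collinear.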
Re: (Score:3, Interesting)
My request is to those that are in the know to show me some example code, that does something useful. Then later, compare that code to code from other languages to accomplish the same task.
Would you ask someone who utilizes SAS or SPSS to do the same thing? Because that's more or less what R is -- a free version of SAS or SPSS. I work in SAS all day long and I have been planning on using R to automate some of my personal website statistics/graphing that I run regularly because I don't really like doing the
Re: (Score:2, Informative)
x = 1:10 #integers from 1 to 10
#set all even elts of x that are less than 7 to -1
x[(x < 7)&(x %% 2 == 0)] = -1
#y is some big array with several dimensions
#I and J are vectors of integers
z = y[I,,J,,, drop = F]
#'z' is now a sub array
z = y[I,2,J,1,]
#now z is a subarray with fewer dimensions
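Since y, I and J are left undefined above, here is a self-contained variant of the same indexing ideas. The array shape and index values are invented for illustration:

```r
# Hypothetical 3-dimensional array: 4 x 3 x 2, filled with 1..24
y <- array(1:24, dim = c(4, 3, 2))
I <- c(1, 3)   # rows to keep

# Default behaviour: dimensions of length 1 are dropped
z1 <- y[I, 2, ]
print(dim(z1))   # 2 2

# drop = FALSE keeps every dimension, even those of length 1
z2 <- y[I, 2, , drop = FALSE]
print(dim(z2))   # 2 1 2

# Logical indexing combined with assignment
x <- 1:10
x[(x < 7) & (x %% 2 == 0)] <- -1
print(x)
```

The drop = FALSE form matters in functions that expect an array of fixed rank, since a selection that happens to pick a single column would otherwise silently collapse to a lower-dimensional object.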
Re: (Score:3, Insightful)
Okay, I'll take you up on that... here's some code that takes in a vector of genotypes (as a factor with levels AA,AC,CC,XX), and a matrix of columns to be used for different bootstraps, and spits out a list of genotype counts for those bootstraps:
## matmap -- maps a vector onto a matrix of indexes to the vector
## (a hack to get round something that R doesn't seem to do by default)
matmap <- function(vector.in, matrix.indices){
res <- vector.in[matrix.indices];
if(is.null(di
ARRRRRR (Score:2)
Oh god... cue pirate jokes.
Free as in beer (Score:3, Insightful)
Very true. This is what I try to explain to people when they can't understand why some software is given away gratis. Because if they charged for it, given the current attitudes of the market, they wouldn't stand a chance and wouldn't ever get any market share to begin with.
SAS strikes out ^H^H^H er, "back" (Score:5, Informative)
Good thing Boeing's not using free software for aircraft simulation tools [sourceforge.net], space station labs [nasa.gov], sub hunters [com.com], or moon rockets [popularmechanics.com] ;-)
Re:SAS strikes out ^H^H^H er, "back" (Score:5, Informative)
FUD from SAS (Score:4, Insightful)
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Wow...talk about FUD. Does SAS indemnify against plane crashes?
Re: (Score:3, Interesting)
I guess nobody told her about how proprietary Excel is inferior to libre Gnumeric in having quite a few errors in its statistical functions and how when apprised of errors the Gnumeric folks fixed them quickly while the Excel folks either never fixed them, did it slowly, or introduced new errors? See the report [csdassn.org] by Drexel University statistician B. D. McCullough.
R sucks as a language (Score:3, Interesting)
Actually it may not suck. But having used it on and off over the past few years while not being a statistics pro, I find the R language bletcherous and annoying. - as an assignment operator?
Re: (Score:2)
Well, crap, hit Submit instead of Preview. Meant to say, <- as an assignment operator (I know = works now, but still...)? Bizarre data frame and object semantics? R is quite useful but I really dislike writing anything nontrivial in it.
Re: (Score:3, Informative)
The R language is optimized for writing statistical code. It's going to seem a little weird, especially if you have a traditional programming background. Once you spend some serious time writing R code, however, you will probably begin to appreciate many of the things that initially seemed odd.
For example, consider the way R handles function calls [moertel.com]:
Embedded FUD (Score:2)
Anne H. Milley, director of technology product marketing at SAS ... adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
High Level Lingua Franca (Score:2)
The "smart set" needs such a high-level lingua franca to express infinite precision financial models of no accuracy whatsoever!
Not a programming language! (Score:2)
The R language and its uses (Score:5, Informative)
The R language (yes, it's a language; an interpreted language is a language too) has become the main computer language of statisticians (both academics and sundry statistical researchers) around the world. It is used in those cases where researchers feel the need for customized computations rather than the use of a package like SAS or SPSS.
R has become popular through a snowball effect and history. It started as a FOSS re-implementation-from-scratch of the "S" language designed for statistical work at Bell Labs (see http://en.wikipedia.org/wiki/S_(programming_language) [wikipedia.org]). Some academics and researchers of repute used the S language because at that time (1975) it was very innovative and far better than most alternatives, and others followed. The S language gained a measure of acceptance among statisticians. Then when R became available the cycle intensified because of the much improved availability of the interpreter and its libraries. This cycle continued to the point that by now probably most professional statisticians use it.
As far as I can see, the R language isn't especially sophisticated or elegant, and may strike people used to more modern languages as a bit repugnant. It does however excel in three respects:
(a) it allows for easy access of Fortran and C library routines
(b) it allows you to pass large blobs of data by name
(c) it makes it easy to pass data to and from your own compiled C and Fortran routines
The first reason is particularly important because it allows one to use e.g. pre-compiled linear algebra package like LAPACK, or Fourier Transforms, or special function evaluations and thereby gain execution speeds comparable to C despite being an interpreted language (just like Matlab, Octave, Scilab, Gauss, Ox and suchlike): the hard work is carried out by a compiled library routine which is made easily accessible through the interpreted language. Any algorithm needed in statistics that's available as C or Fortran code can be linked in and called without too much effort.
The second reason is important because it slows down execution much less than eagerly copying every argument would: as far as I know, R copies an object only when a function actually modifies it, and the caller's copy stays unchanged.
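A small sketch of how argument passing behaves in practice, assuming current R semantics (arguments are copied lazily, only when modified). The helper name zero_first is made up for this example:

```r
# Hypothetical helper: overwrites the first element of its argument
zero_first <- function(v) {
  v[1] <- 0      # this modifies a local copy only
  v
}

big <- c(5, 6, 7)
changed <- zero_first(big)

print(big)       # 5 6 7: the caller's vector is untouched
print(changed)   # 0 6 7: the modified copy is what gets returned
```

So passing a large object into a function that only reads it costs no copy, and a function that needs to update the caller's data returns the new value for the caller to assign back.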
The third reason is particularly important because it helps researchers be more productive. Reading in your data, examining it, graphing it, tracing outliers and cleaning them up is best done in an interactive environment in an interpreted language. Coding such things in C or Fortran is an awful waste of time, and besides, researchers aren't code-monkeys and don't enjoy coding inane for-loops to read, clean, and display data. Vector and matrix primitives are far more powerful, and usually preferable unless they are so inefficient that you have to wait for the result. However, there are times when you just need to carry out standard algorithms (linear algebra, calculation of mathematical or statistical functions) or simply time-consuming repetitive algorithms that run so much faster in a genuine compiled language. You could start out by coding the algorithm in an interpreted language to check if it's working, and then isolate the computationally expensive part and code it up in C or Fortran. Using R (or Matlab or Scilab) you can *call* the compiled subroutine, pass it your (cleaned) data, and get the result back in an environment where you can easily analyze it.
That's why languages like R, Matlab, Scilab, Octave, Gauss, and Ox are so productive: you get the best of both worlds, with the convenience, interactivity, and terseness of a high-level interpreted language and the speed of compiled languages.
So why R, and why not Gauss or Matlab or whatever?
Well, part of that is cultural. If you're an econometrician you'll have been weane
Re:The R language and its uses (Score:5, Interesting)
(a) it allows for easy access of Fortran and C library routines
(b) it allows you to pass large blobs of data by name
(c) it makes it easy to pass data to and from your own compiled C and Fortran routines
So, it's exactly like Python, except with an outdated 1970's syntax that was frankly pretty weird to start with :-)
I've used R, and found it useful for some of its relatively esoteric capabilities, but now use it almost exclusively via rpy, the Python binding to R.
Furthermore, I've been using it less in recent years as the native statistical capability of Python has continued to improve. I can appreciate that people who work strictly in data analysis could find R an appropriate tool, but as someone whose work spans multiple areas, from analysis to application design and development, R is too limiting a tool, and using it always feels a little alien and weird.
Re:The R language and its uses (Score:5, Informative)
I second that. R is terribly useful for the wide variety of libraries available and esoteric statistical procedures. But you would *never* want to write a long/complex program in R.
As you say, it's most convenient to work in some other language that's actually designed to be scalable, object-oriented, and easy to debug. It's usually straightforward to call R libraries when you need them. I find that python+scipy+rpy is an almost ideal environment for day to day scientific programming.
Coding large applications in R (Score:3, Interesting)
I agree with the first part of your post: to me R is something to code in when you have to, and to keep the resulting code as short and simple as possible. If I ever had to code a real application with a GUI that needed the statistical strengths of R, I would almost certainly not use R.
On the other hand I'd probably use Java and link to R as a server (see my other post about R and Java) instead of using Python.
Fine-tuning financial models (Score:2)
I think we all know how well that's turned out, eh? So is that the fault of the language or programmer error?
You have to play with it. (Score:3, Insightful)
You have to play with it. As with APL you'll either love it or hate it.
If you like the idea of a language that includes relational tables as a primitive data type, that extends most operators to do the right thing when you feed them vectors and matrices, that has linear regression and equation solving built-in, you'll probably like R.
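For anyone who hasn't seen that style, here is a rough Python imitation of two of those ideas: operators that work element-wise on whole vectors, and a straight-line least-squares fit written in terms of them. This is only a sketch. Vec and fit_line are hypothetical names invented here, not anything from R, and relational tables are left out. In R itself none of this scaffolding is needed, since vector arithmetic is built in and lm() does the regression.

```python
class Vec:
    """Toy vector whose arithmetic operators work element-wise (hypothetical helper)."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def _lift(self, other):
        # Accept another Vec of the same length, or broadcast a plain number.
        if isinstance(other, Vec):
            if len(other.values) != len(self.values):
                raise ValueError("length mismatch")
            return other.values
        return [float(other)] * len(self.values)

    def __add__(self, other):
        return Vec(a + b for a, b in zip(self.values, self._lift(other)))

    def __sub__(self, other):
        return Vec(a - b for a, b in zip(self.values, self._lift(other)))

    def __mul__(self, other):
        return Vec(a * b for a, b in zip(self.values, self._lift(other)))

    def sum(self):
        return sum(self.values)

    def mean(self):
        return self.sum() / len(self.values)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, with no explicit loops."""
    dx = x - x.mean()
    dy = y - y.mean()
    slope = (dx * dy).sum() / (dx * dx).sum()
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


x = Vec([1, 2, 3, 4])
y = x * 2 + 1  # whole-vector arithmetic, no for-loop in sight
intercept, slope = fit_line(x, y)
```

The point is the shape of fit_line: once the operators understand vectors, the formula reads almost like the textbook version.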
Well... (Score:5, Funny)
...if at first you don't succeed, then skydiving is not for you.
Don't be ridiculous (Score:3, Funny)
You didn't have any friends in the 3rd grade.
Re: (Score:2)
The point is S is popular, but expensive. R gets some popularity from that. One could also hope S is made free at some point.
And I also don't know why it is called R... maybe because sounds starting with a vowel sound better.
Re: (Score:3, Informative)
And i also dont know why it is called R
The guys who originally wrote it both had first names that started with R and being the jokers that they were, they thought it would be funny to give it a name very similar to S.
Re: (Score:3, Informative)
"And I also don't know why it is called R"
"The guys who originally wrote both had first names that started with R and being the jokers that they were, they thought it would be funny to give it a name very similar to S."
Additionally, in statistics r is the letter used to denote the Pearson product-moment correlation coefficient [wikipedia.org].
Re:Based on S (Score:5, Interesting)
I wish it had a more googleable name. It's hard to search for help. The signal to noise ratio is low.
Re:Based on S (Score:5, Informative)
http://www.rseek.org/ [rseek.org]
Re: (Score:3, Informative)
are you talking about R or S? searching for "R" on google returns pretty good results [google.com]--the first 6 links are all related to R. and 4 of the results on the next page are also related to R. searching for "S" on the other hand doesn't immediately come up with any relevant results.
i'd say it's fairly easy to find info on R using google considering its limited popularity relative to other languages. obviously you're not going to find a ton of information on it since it's a somewhat obscure niche language. but if
Re:Based on S (Score:5, Funny)
Re:Based on S (Score:4, Funny)
I tried. The search results are extremely relevant to my interests. Of course, I don't use the natural language processing system "Lolita".
Re:Based on S (Score:4, Funny)
That "Post Anonymously" button is kind of hard to miss these days.
Re:Based on S (Score:5, Funny)
Re: hence: when he restored unto Moses fled from m (Score:4, Funny)
Re:Not a language, really (Score:5, Insightful)
Are you kidding me? Are you really *(*$@#ing, Grade A kidding me?
Python/Perl/Ruby require interpreters. Scheme and Lisp are frequently run within interpreters. "Stand-alone executables" require HARDWARE. Any programming system requires *something* underneath it unless you are programming in a purely physical system like an automated abacus with mechanical gears that buzz and whirr.
Programming languages are defined by their Turing completeness: can they do things repeatedly, can they assign values to memory locations and perform some basic set of operations (nand works nicely), can they make decisions. Everything else is fluff.
Perl has "fluff" that handles regular expressions very well.
Python (and others) have "fluff" that makes networking and database ops easy.
R has "fluff" that makes it terribly convenient to work with data.
Matlab has "fluff" that makes it very easy to do numerical methods programming.
Mathematica has "fluff" that makes it very easy to do symbolic computation.
Each and every one of these, and most well-known languages, with all their warts and beauty marks are Turing complete and are deserving of the term "programming language".
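On the "nand works nicely" remark: NAND is functionally complete, meaning every other Boolean operation can be built from it. A small Python sketch of that claim follows. The function names are made up for illustration, and this covers only the "basic set of operations" part of the argument; repetition and memory are separate requirements and are not shown.

```python
def nand(a, b):
    """The only primitive: true unless both inputs are true."""
    return not (a and b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor_(a, b):
    shared = nand(a, b)
    return nand(nand(a, shared), nand(b, shared))


def half_adder(a, b):
    """Adds two bits using nothing but NAND underneath: returns (sum, carry)."""
    return xor_(a, b), and_(a, b)


# Exhaustively compare the NAND-built gates with Python's own operators.
all_match = all(
    not_(a) == (not a)
    and and_(a, b) == (a and b)
    and or_(a, b) == (a or b)
    and xor_(a, b) == (a != b)
    for a in (False, True)
    for b in (False, True)
)
```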
Regards,
Mark
Re: (Score:3, Insightful)
I would argue that GP is confusing "programming language" with "general-purpose programming language".
I bet even SQL is Turing-complete, but I wouldn't want to do more than database operations with it.
Re: (Score:2)
Re:Not a language, really (Score:4, Insightful)
Your comment is absolutely wrong.
http://en.wikipedia.org/wiki/Programming_language [wikipedia.org]
R is a Turing complete programming language. The fact that it requires an interpreter is completely irrelevant.
Re: (Score:3, Informative)
Actually, R is a real (Turing-complete) programming language like Perl, Python, Ruby, etc. It just happens to have lots of statistical libraries and matrix-oriented functions.
You put #!/usr/bin/Rscript in your first line and it can work just like any other scripting language, with command-line arguments, etc. I use it all the time as a replacement for other scripting languages (think PDL+Perl or Numpy+Python).
R is an excellent language for any scientist. The syntax and semantics of the language are very w
Re:Not a language, really (Score:4, Insightful)
Re: (Score:3, Informative)