This series introduces you to R, a rich statistical environment released as free software. It includes a programming language, an interactive shell, and extensive graphing capabilities. What’s more, R comes with a spectacular collection of functions for mathematical and statistical manipulation, with still more capabilities available in optional packages.
Can we be a bit more creative than single-letter names?
R is based on the S language (and its commercial implementation, S-PLUS), hence the single-letter name.
I couldn’t care less how many letters are in the name, as long as it is excellent software, which R is. Many academics have adopted it as a de facto standard in place of S-PLUS, precisely because it is free. Authors include R code in their manuscripts, so readers can immediately download the software and replicate any results they find confusing or troubling.
Those who have trouble with single-letter software names can write the next package, called ‘Pedant’, which I will shorten to just P!
I really, really hope that was just a troll and not stupidity.
That was more likely sarcasm directed at the recent attacks on OSNews against command-line-oriented interfaces. R is awesome work, and it is even better because of its interface choices.
I can’t believe they still make software with CLI programs. A research scientist wants to get work done, not type arcane commands. I wouldn’t recommend R for anybody. Instead, I suggest using MS Excel.
If you are going to go where nobody has gone before, you need more flexibility than programs like Excel can offer.
For example, if you are a real scientist, you build a statistical model of your problem, perform your experiment, then apply your data to the model to see whether your hypotheses can be rejected and, if so, at what level of significance. For this, a CLI is excellent.
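The workflow described above (model, experiment, test, significance level) takes only a few lines to script. As a hedged illustration, here is a minimal Python sketch (Python rather than R only so it runs on its own) of one such test: a two-sided permutation test for a difference in means. The function name `permutation_test`, the sample values, the number of resamples, and the 0.05 level are hypothetical choices, not anything from the post.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both samples come from the same distribution,
    so the group labels are exchangeable and can be shuffled freely.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)            # fixed seed for reproducibility
    observed = abs(mean(a) - mean(b))    # test statistic on the real data
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)              # randomly relabel the observations
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    control = [1, 2, 3, 4, 5]
    treated = [101, 102, 103, 104, 105]
    alpha = 0.05                         # hypothetical significance level
    p = permutation_test(control, treated)
    print(f"p = {p:.4f}, reject null: {p < alpha}")
```

A permutation test is used here only because it needs no distribution tables. A t-test or a fitted model follows the same pattern: a statistic, its distribution under the null hypothesis, a p-value, and a decision.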
Excel is good for lab technicians who run their standard ANOVA without even checking whether the data meet the requirements of the test. (This happens far too often.)
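Checking a requirement takes little more code than running the test itself. Below is a hedged Python sketch (for illustration, not R) that computes the one-way ANOVA F statistic together with the ratio of the largest to the smallest group variance, as a rough check of the equal-variance assumption. The function names and the cutoff of 4 are assumptions. The cutoff corresponds to the commonly quoted rule of thumb that the largest standard deviation should be at most twice the smallest. A formal test such as Levene's would be more careful. Normality and independence also need checking and are not covered here.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def max_variance_ratio(groups):
    """Largest sample variance divided by the smallest one."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def anova_with_check(groups, max_ratio=4.0):
    """Return (F, variance ratio, equal-variance plausible?).

    max_ratio is a rule-of-thumb cutoff, not a formal test.
    """
    ratio = max_variance_ratio(groups)
    return one_way_anova_f(groups), ratio, ratio <= max_ratio
```

If the check fails, Welch's ANOVA, which does not assume equal variances, is one common alternative.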
We were introduced to this statistical program by our lecturer in a statistics module, and it was pretty easy to use. The CLI really makes it more powerful, as my professor said. It easily beats proprietary statistics packages like SPSS with its functionality and unbeatable price (free!).
Some links for the interested:
R for Beginners
http://www.stat.nus.edu.sg/~statwy/ST2238/Rnotes/rdebuts_en.pdf
R: A self-learn tutorial
http://www.stat.nus.edu.sg/~statwy/ST2238/Rnotes/selftutorial.pdf
Using R for Data Analysis and Graphics
http://www.stat.nus.edu.sg/~statwy/ST2238/Rnotes/usingR.pdf
An introduction to R for dynamic modeling
http://www.stat.nus.edu.sg/~statwy/ST2238/Rnotes/RIntro.pdf
Thumbs up, that was a good one!
“I can’t believe they still make software with CLI programs. A research scientist wants to get work done, not type arcane commands. I wouldn’t recommend R for anybody. Instead, I suggest using MS Excel.”
Try calculating 1 + 1 with Excel and with psi (http://www.chez.com/spinecho/pypsi/pagpypsi.htm).
I work professionally in market research and use SPSS on a daily basis. I just checked out “R”, which was previously unknown to me, and once again I am astonished by the lack of quality in statistics software.
Everything is about presentation, something “R” COMPLETELY LACKS. Any output just looks completely horrid. So does SPSS, in fact, but at least it works well in combination with Excel and PowerPoint.
Besides that, R seems like a typical student program which has no business in any commercial company at all. Time is of the essence, and I’m confident you can do everything R can do in SPSS in less than 1/5 of the time.
On a final note, this is simply a dead product IBM gives away in order to “look good” but in fact has no market whatsoever.
R is simply excellent.
I would also highly recommend:
MuPAD: http://www.mupad.com/ – symbolic/algebraic computation
Scilab: http://www.scilab.org – similar domain to R
One of the most useful programs for data presentation is SciGraphica. It is free, produces publication-quality output, and works similarly to Microcal Origin. However, development seems to have died. Vendors still package it and make it work, but it is falling behind in terms of library dependencies (GtkExtra?).
SciGraphica: http://scigraphica.sourceforge.net/
I hope this picks up, as I owe much to this software. Have a look at the PDF at
http://www.cs.bris.ac.uk/home/tr1690/documentation/fuzzy_clustering…
Most of the diagrams and plots in it were made using SciGraphica.
QtiPlot is a program for data analysis and plotting. The home page says it is a free clone of Origin. The program seems to be under active development. You can download it from:
http://www.kde-apps.org/content/show.php?content=14826
If you really want to do “interesting things”, CLI programs are the way to go. SPSS (and/or Excel, Statistica, Statgraphics…) can do a lot of things, but after a while, when you try to do the hard stuff (new methods, or just batch-processing tons of data), you hit their limits.
New statistical methods can be programmed easily in R (or MATLAB/Octave) with only a few lines of code, and then tweaked and refined easily.
For example, in time series analysis you can’t get a good fit in “normal graphical” software for some of the more complex series (e.g. a mixture of more than one non-linear series). You are constrained to what the programmers added in your version. (Yeah, I know, some packages have some kind of scripting language, but then you are in CLI land again :-))
Check the SciGraphica mailing list: the port to GTK2 began a few months ago.
It seems GtkExtra has already been ported; see its CVS: http://cvs.sourceforge.net/viewcvs.py/gtkextra/gtkextra-2/
Hopefully there will be a release soon.
I work professionally in market research and use SPSS on a daily basis. I just checked out “R”, which was previously unknown to me, and once again I am astonished by the lack of quality in statistics software.
Everything is about presentation, something “R” COMPLETELY LACKS. Any output just looks completely horrid. So does SPSS, in fact, but at least it works well in combination with Excel and PowerPoint.
It’s the calculations and flexibility, stupid!
Everything is NOT about presentation. Since you talk about research, I presume you are a marketing drone, doing simple-ass statistical analysis. Well, R is for SCIENTISTS.
Besides that, R seems like a typical student program which has no business in any commercial company at all.
You wouldn’t be able to tell a student program from a powerhouse if it bit you in the posterior. You are right, though: it has no business in the hands of a commercial company doing MARKET RESEARCH or anything. Better leave it to scientists, and be content with toys bent on “presentation”, which, after all, “is everything”.
By the way, have you noticed that one of the authors presenting the “typical student program” in the article is a Ph.D. at the University of Colorado?
Time is of the essence, and I’m confident you can do everything R can do in SPSS in less than 1/5 of the time.
You are welcome.
On a final note, this is simply a dead product IBM gives away in order to “look good” but in fact has no market whatsoever.
IBM has nothing to do with R. It only hosts, on its website, an article that presents R. IBM did not make it, and does not sell it or give it away. But it’s not surprising you could not even figure this out.
Besides that, R seems like a typical student program which has no business in any commercial company at all.
Time is of the essence, and I’m confident you can do everything R can do in SPSS in less than 1/5 of the time.
On a final note, this is simply a dead product IBM gives away in order to “look good” but in fact has no market whatsoever.
Incidentally, here is a list of contributors to this “joke” program:
http://www.r-project.org/nosvn/foundation/memberlist.html
* Burns Statistics Ltd., London, U.K.
* Department of Statistics, Brigham Young University, Utah, USA
* Institute of Mathematical Statistics (IMS), Ohio, USA
* MedAnalytics, Inc., Minnesota, USA
* Merck and Co., Inc., USA
* Boehringer Ingelheim Austria GmbH, Vienna, Austria
* Breast Center at Baylor College of Medicine, Houston, Texas, USA
* Dana-Farber Cancer Institute, Boston, USA
* Department of Biostatistics, Johns Hopkins University, Maryland, USA
* Department of Biostatistics, Vanderbilt University School of Medicine, USA
* Department of Economics, Stockholm University, Sweden
* Department of Statistics, University of Wisconsin-Madison, Wisconsin, USA
* Department of Statistics & Actuarial Science, University of Iowa, USA
* Division of Biostatistics, University of California, Berkeley, USA
* Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, University of Augsburg, Germany
* Loyalty Matrix Inc., California, USA
* Norwegian Institute of Marine Research, Bergen, Norway
* Spotfire, Massachusetts, USA
* TERRA Lab, University of Regina – Department of Geography, Canada
* Università Ca’ Foscari Venezia, Italy
QtiPlot is promising, although it still lacks many Origin features (e.g. layers and data analysis).
R is great! It is available for multiple platforms, has an active user community, and offers lots of user-contributed packages for specialty applications. The CLI scares a lot of users, but add-on packages such as R Commander give it a GUI.
Windows users will probably want to look at SciViews (http://www.sciviews.org/main.htm), which incorporates R, with a GUI, as one component of the project.
For power beyond well-known packages like SPSS, check out Trellis graphics in S and R (the lattice package).
R is well worth the time and effort to learn.
How do you think R compares with SAS, whose name could be re-arranged into a one-word review of it as a scripting language?
I work professionally in market research and use SPSS on a daily basis. I just checked out “R”, which was previously unknown to me, and once again I am astonished by the lack of quality in statistics software.
You might be interested to know how your competitors are using R. Check out
http://loyaltymatrix.com
which does customer intelligence with R. Need proof? See:
http://r.loyaltymatrix.com/files/R4CI_Porzak.PDF
Everything is about presentation, something “R” COMPLETELY LACKS. Any output just looks completely horrid. So does SPSS, in fact, but at least it works well in combination with Excel and PowerPoint.
In the statistics community there is fairly widespread disdain for using Excel for much more than simple statistics. For one example, see
http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf
Besides that, R seems like a typical student program which has no business in any commercial company at all. Time is of the essence, and I’m confident you can do everything R can do in SPSS in less than 1/5 of the time.
R is the “engine” behind Bioconductor, an open-source software project for bioinformatics. Can SPSS do this?
http://www.bioconductor.org/Screenshots/index.html
On a final note, this is simply a dead product IBM gives away in order to “look good” but in fact has no market whatsoever.
This “joke” software, this “dead product”, seems to be used by many more people than IBM, as evidenced by the useR! 2004 conference:
http://www.ci.tuwien.ac.at/Conferences/useR-2004/abstracts/
SPSS has its place. So does R. The opinions of the previous poster appear to be based on a very cursory inspection of the software.
Everything is about presentation, something “R” COMPLETELY LACKS. Any output just looks completely horrid. So does SPSS, in fact, but at least it works well in combination with Excel and PowerPoint.
Are you implying that Excel and PowerPoint have good output? Is that how low the bar has fallen these days? Have people gotten so used to the poor quality (and often incorrect output) of MS Office that it is the new “high standard”?
If that’s the case, it is a sad thing. The standard for quality is still to use a proper computational tool for calculations and a proper typesetting tool for presentation. In R, this can be achieved with its LaTeX integration (Sweave).
Ah, my experience with this program was OK during my statistics class. The coding in some ways reminds me of MATLAB.
However, I wonder if it’s worth spending time learning a new language (though a rather easy and intuitive one) for R, which, unlike MATLAB, doesn’t seem to be an industry-standard program.
How do you think R compares with SAS, whose name could be re-arranged into a one-word review of it as a scripting language?
For my first four years as a Unix system administrator, anything that was beyond my shell scripting capabilities I did in SAS. It’s got all sorts of nifty data parsing routines. And once your data is parsed, it’s easy to generate pointless bits of information (let’s regress disk quota usage against middle initial!)
Excel for statistical work? Someone hasn’t done their research about the dangers of using it for statistics… Mainly, it can make some pretty serious rounding errors because of the limited precision with which it stores numbers.
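One concrete way limited precision bites is the textbook one-pass variance formula, (Σx² − (Σx)²/n)/(n − 1), which subtracts two huge, nearly equal numbers when the data sit far from zero. The Python sketch below is illustrative only. It does not claim to reproduce the internals of any particular Excel version. It compares that formula against the two-pass formula on hypothetical data whose exact sample variance is 30, using ordinary IEEE 754 double precision for both.

```python
def variance_one_pass(xs):
    """Textbook 'computational' formula: prone to catastrophic cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Subtract the mean first, then square: numerically much safer."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


if __name__ == "__main__":
    # Hypothetical data: 4, 7, 13, 16 shifted by one billion.
    # The exact sample variance is 30 regardless of the shift.
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print("one-pass:", variance_one_pass(data))  # badly wrong
    print("two-pass:", variance_two_pass(data))  # 30.0
```

The squares here are around 10^18, where adjacent doubles are hundreds apart, so the true difference of 90 in the numerator is lost to rounding. Well-engineered statistical code typically uses the two-pass form or an updating method such as Welford's algorithm for this reason.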
I’ve looked at R, and want to play with it more sometime. Right now I use SAS for all my statistical work.
“Everything is NOT about presentation. Since you talk about research, I presume you are a marketing drone, doing simple-ass statistical analysis. Well, R is for SCIENTISTS. ”
Well, as a research scientist, I have to disagree here. A lot of it IS about presentation, because once you have results you need to publish them: “publish or perish”, as the saying goes in the scientific world these days.
But when it comes to producing publication-quality graphs and charts, R is no better or worse than most other statistical packages. Hence the existence of programs like SigmaPlot, which are made specifically for producing charts once you have already analyzed the data.
I agree with Jeffrey. SAS is an absolutely excellent package for performing regressions. It’s not the same thing as R, as far as I can tell from a ten-second review of it.
I’ve done quite a bit of econometrics (way too much, in fact) in school with SAS, Stata, and Excel. SAS has a slightly worse command syntax than Stata, but I think the added power more than makes up for it. Excel, well, let’s say I prefer SAS or Stata.
-Erwos
I’ve been using R for over five years now. First, I would like to mention that, contrary to some earlier ‘well-informed’ remarks, the graphical capabilities of R are unrivaled, though they do involve a learning curve.
Second, packages like SAS / S-PLUS / R are ‘power tools’ that serve a different target group, so any comparison with SPSS is not useful. SPSS is ‘what you click is all you get’, which is quite sufficient for some analyses, but not for others.
The *BIG* advantage of R (I’m talking to the S-PLUS / SAS users here) is that it is open source. This means it has a lot of academic clout & momentum behind it. Not because it is free (please don’t mention this to the ‘everything that is free must be crap’ managers), but because it gives you more freedom. You can use the work of others who already solved similar problems, you can contribute your work so others don’t need to reinvent the wheel, and you can stay up to date in an ever-evolving world without having to beg the budget department.
In the end we all gain from it, even the
data | SPSS | cut & paste report | insight > /dev/null
guys. They pay a small fortune in software overhead, which they need to charge to their clients, who are mainly ‘everything that is free must be crap’ managers.
“Not because it is free (please don’t mention this to the ‘everything that is free must be crap’ managers), but because it gives you more freedom. You can use the work of others who already solved similar problems, you can contribute your work so others don’t need to reinvent the wheel, and you can stay up to date in an ever-evolving world without having to beg the budget department.”
Well, first of all, nothing stops you from distributing your SAS programs to others, so you can still use the work of others.
As for cost, it isn’t really an issue in an academic environment, since a typical university will have an academic license for SAS, from which individual departments can buy licenses at huge discounts. What I really do not like about SAS, however, is their subscription-based model. I have to pay a yearly licensing fee for SAS, even if I haven’t upgraded. And then I have to install a new key that SAS sends me by email every year, or else SAS stops working.