What is the R language
Reference sites:
http://machinelearningmastery.com/what-is-r/ (introduction and mini-course download)
https://www.zhihu.com/question/19611094 (Zhihu: What are the strengths and weaknesses of R?)
http://blog.csdn.net/qq_17478541/article/details/51201726 (CSDN: installing R)
http://www.cnblogs.com/jamesf/p/4751598.html (RStudio, the best IDE for R)
What is R?
R is an open source environment for statistical programming and visualization.
R is one of the most popular platforms among professional data scientists for applied machine learning.
R is a number of things, which might be confusing at first:
- R is a computer language. It is an implementation of the S language whose semantics were influenced by Scheme (a Lisp dialect), and you can write programs in it.
- R is an interpreter. It can parse and execute R scripts (programs) that are typed in directly or loaded from a file with a .R extension.
- R is a platform. It can create graphics to be displayed on the screen or saved to file. It can also prepare models that can be queried and updated.
You may want to write R scripts in files and run them in batch mode using the R interpreter to get results such as tables or graphics. You may want to open the R interpreter and type in commands to load data, explore and model it in an ad hoc manner.
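As a minimal sketch of the batch-mode workflow described above: the script below fits a model, prints a table, saves a graphic to a file, and queries the model. The file name analyze_cars.R and the output file name are assumptions made for illustration; `cars` is a small data set that ships with R.

```r
# analyze_cars.R: hypothetical script name, used only for illustration.
# Run in batch mode from a shell with:  Rscript analyze_cars.R

# `cars` is bundled with R (speed and stopping distance of cars).
fit <- lm(dist ~ speed, data = cars)

# Print a table of the fitted coefficients.
print(coef(summary(fit)))

# Save a scatter plot with the fitted line to a PDF file in a temp directory.
plot_file <- file.path(tempdir(), "cars_fit.pdf")
pdf(plot_file)
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
abline(fit)
invisible(dev.off())

# Query the prepared model for a new observation.
predicted <- predict(fit, newdata = data.frame(speed = 15))
print(predicted)
```

The same lines can be typed one at a time into the R console for the ad hoc, interactive style.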
There are graphical environments, but the simplest and most common usage of R is from the R console (like a REPL). If you are just starting out with R, I would recommend learning R on the console.
What is REPL?
A read–eval–print loop (REPL), also known as an interactive toplevel or language shell, is a simple, interactive computer programming environment that takes single user inputs (i.e. single expressions), evaluates them, and returns the result to the user; a program written in a REPL environment is executed piecewise. The term is most often used for programming interfaces similar to the classic Lisp machine interactive environment. Common examples include command line shells and similar environments for programming languages; the technique is particularly characteristic of scripting languages.
In a REPL, the user enters one or more expressions (rather than an entire compilation unit) and the REPL evaluates them and displays the results. The name read–eval–print loop comes from the names of the Lisp primitive functions which implement this functionality:
- The read function accepts an expression from the user, and parses it into a data structure in memory. For instance, the user may enter the s-expression (+ 1 2 3), which is parsed into a linked list containing four data elements.
- The eval function takes this internal data structure and evaluates it. In Lisp, evaluating an s-expression beginning with the name of a function means calling that function on the arguments that make up the rest of the expression. So the function + is called on the arguments 1 2 3, yielding the result 6.
- The print function takes the result yielded by eval, and prints it out to the user. If it is a complex expression, it may be pretty-printed to make it easier to understand. In this example, though, the number 6 does not need much formatting to print.
The development environment then returns to the read state, creating a loop, which terminates when the program is closed.
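The three steps above can be sketched in R itself, since R exposes `parse`, `eval`, and `print` as ordinary functions. This is only a toy illustration: `tiny_repl` is a hypothetical helper, it reads from a fixed character vector instead of the keyboard, and unlike the real R console it prints every value, including the result of an assignment.

```r
# A toy read-eval-print loop over a fixed vector of input lines.
# `tiny_repl` is a hypothetical name, not part of R.
tiny_repl <- function(lines, env = new.env(parent = globalenv())) {
  results <- list()
  for (line in lines) {
    exprs <- parse(text = line)            # read: text -> expression objects
    for (expr in exprs) {
      value <- eval(expr, envir = env)     # eval: evaluate in a private environment
      print(value)                         # print: show the result to the user
      results[[length(results) + 1]] <- value
    }
  }
  invisible(results)                       # the loop ends when input runs out
}

# The R counterpart of the s-expression (+ 1 2 3), followed by two more inputs.
out <- tiny_repl(c("1 + 2 + 3", "x <- 10", "x * 2"))
```

Here `env` plays the role of the session state: `x` assigned by the second input is still visible to the third.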
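Imagine a website that runs code online: we just open a browser, go to the page, type in the code, click the run button, and see the result. How satisfying that would be!
This kind of interactive development environment is called a REPL (Read-Eval-Print Loop). I have collected some online REPLs (some of the sites may be blocked by the firewall); additions in the comments are welcome:
http://blog.csdn.net/redraiment/article/details/6941121
Online REPL for multiple languages: http://www.shucunwang.com/RunCode/r/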
For an interesting and detailed treatment of the history of R, check out the technical report R: Past and Future History (PDF).
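What are the strengths and weaknesses of R?
R's strengths are that it is free, open source, and easy to use; its weakness is that it is slow, slow, slow!
The advantages are basically that it is open source, convenient, quickly updated, and feature-rich. The biggest drawback is, naturally, speed.
R's memory management is unsatisfactory, and its code is interpreted rather than compiled to native machine code, so it is slow.
Small but powerful, with outstanding plotting capabilities.
ggplot2 draws static plots, plotly draws interactive plots, shiny provides interaction through web pages, and ggmap draws map-based visualizations.
Its powerful plotting and extensibility have kept me hooked.
The strength: R was developed by statisticians. The weakness: R was developed by statisticians.
It has very strong user groups, and its most important advantage is the packages that have already been written. Avoid obscure, poorly maintained packages. The packages are both the strength and the weakness: immensely powerful to use, and a real headache to learn.
RStudio is really pleasant to use; the interface is very friendly, and it is an excellent IDE.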
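R learning materials
For R, here are some materials I have collected personally, including commonly used introductory books and R blogs: [http://github.com/pjpan/DataScience]
In addition, the packages most commonly used for machine learning are listed below; mastering these is basically enough. There is also a link that collects a great deal of R-related content:
GitHub - qinwf/awesome-R: A curated list of awesome R packages, frameworks and software.
- data.table (fread loads data the fastest; one drawback is that it sometimes fails to recognize certain numeric variables)
- readr
- caret
- dplyr
- tidyr
- lubridate (the best package for handling dates; date and time handling in R is really complicated)
- ggplot2 (the king of plotting)
- plotly (interactive plots; hovering the mouse over the plot shows the values)
- ggmap (often requires getting around the firewall because it relies on Google Maps; alternatives can be found online)
- gbm (being replaced; xgboost has already taken over in industry)
- xgboost (the current king of algorithms; handles regression, classification, and ranking alike)
- randomForest (too slow; you can go have afternoon tea while the model runs)
- glmnet (handles L1 and L2 regularization with ease)
- libsvm (via e1071) (requires a lot of patience; training is very slow)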
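CRAN
CRAN is short for the Comprehensive R Archive Network. In addition to hosting R's downloadable binaries, source code, and documentation, it also hosts a wide range of user-contributed packages. At present there are more than one hundred CRAN mirror sites worldwide.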
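As a rough sketch of how packages are typically fetched from CRAN: the snippet below checks which of a few packages from the list above are not yet installed and shows the `install.packages` call that would download them. The helper name `missing_packages` and the choice of packages are assumptions for illustration; the install line is left commented out because it needs network access.

```r
# A few names taken from the package list above; adjust as needed.
wanted <- c("data.table", "dplyr", "ggplot2", "xgboost")

# Return the subset of `pkgs` that is not installed in the current library paths.
# `missing_packages` is a hypothetical helper name.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to download from a CRAN mirror (requires network access).
# install.packages(to_install, repos = "https://cloud.r-project.org")

# After installation, a package is attached for the session with library(), e.g.:
# library(data.table)
```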
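Obtaining and installing R
cran.r-project.org: the site for downloading R
Obtaining and installing RStudio
rstudio.com: the RStudio website
RStudio is an IDE, a user interface for R that makes working with R much more convenient. You can also choose other IDEs such as StatET (for Eclipse) or ESS (for Emacs); they are not covered here.
Learning materials
"Statistical Modeling and R Software" (《统计建模与R软件》) http://pan.baidu.com/share/link?shareid=1945527438&uk=170195656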
Machine Learning With R Mini-Course
http://machinelearningmastery.com/what-is-r/
Super Fast Crash Course in R (for developers)
http://machinelearningmastery.com/r-crash-course-for-developers/