First load the data table, which is stored in the file msleep_ggplot2.csv:
> setwd("F:/研究生/课程/哈佛视频课/1")
> tab= read.csv("msleep_ggplot2.csv")
> class(tab)
[1] "data.frame"
> head(tab)
> dim(tab)
[1] 83 11
> View(tab)
> c(tab$sleep_total,1000)
> plot(tab$brainwt,tab$sleep_total)
> plot(tab$brainwt,tab$sleep_total,log="x")
Summary statistics for the sleep_total column:
> summary(tab$sleep_total)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.90 7.85 10.10 10.43 13.75 19.90
All columns for rows 1 and 2 of the table (the index before the comma selects rows):
> tab[c(1,2),]
All columns for the animals whose sleep_total is greater than 18:
> tab[tab$sleep_total>18,]
The sleep_total of the animals in the first two rows:
> tab$sleep_total[c(1,2)]
[1] 12.1 17.0
The mean sleep_total among the animals whose sleep_total is greater than 18:
> mean(tab$sleep_total[tab$sleep_total>18])
[1] 19.275
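The subsetting steps above all rely on a logical vector used as an index. A minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical, not taken from msleep_ggplot2.csv):

```r
# Hypothetical toy data standing in for the real table
toy <- data.frame(
  name = c("a", "b", "c", "d"),
  sleep_total = c(12.0, 19.0, 8.5, 20.0)
)

# The comparison yields one TRUE/FALSE per row
keep <- toy$sleep_total > 18

# Used before the comma, it keeps the matching rows and all columns
long_rows <- toy[keep, ]

# Used on a single column, it keeps the matching values
long_sleepers <- toy$sleep_total[keep]
mean(long_sleepers)  # mean of 19 and 20
```

The same logical vector can be reused for rows of the data frame and for a single column, which is why the two forms above always agree.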
Use which() to find the rows (that is, the positions) where sleep_total > 18:
> which(tab$sleep_total>18)
[1] 22 37 43 62
The sleep_total value of the first animal with sleep_total > 18:
> tab$sleep_total[which(tab$sleep_total>18)[1]]
[1] 19.7
What is the row number of the animal which has more than 18 hours of total sleep and less than 3 hours of REM sleep? In R, two conditions are combined with the & operator, not with the word "and":
> which(tab$sleep_total>18 & tab$sleep_rem<3)
[1] 43
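A small sketch of how & and which() behave together, including a missing value. The data frame `toy` and its numbers are made up for illustration:

```r
# Hypothetical toy data; the last animal has no REM measurement
toy <- data.frame(
  sleep_total = c(19, 20, 10, 18.5),
  sleep_rem   = c(2.0, 4.0, 1.0, NA)
)

# & works element by element and returns one value per row.
# TRUE & NA is NA, so row 4 is neither TRUE nor FALSE.
cond <- toy$sleep_total > 18 & toy$sleep_rem < 3

# which() returns only the positions that are TRUE; NA is skipped
hits <- which(cond)
hits
```

Because which() drops NA, it is often safer than indexing with the raw logical vector when a column contains missing values.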
sort() returns the values themselves, sorted from smallest to largest:
> sort(tab$sleep_total)
order() also works from smallest to largest, but it returns positions rather than values: the index, in the original vector, of the smallest value, then of the next smallest, and so on:
> order(tab$sleep_total)
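tab$sleep_total[order(tab$sleep_total)] returns the values, so it is equivalent to sort(tab$sleep_total).
rank() returns, for each element of the original vector, its position in the sorted order (tied values receive their average rank by default):
> rank(tab$sleep_total)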
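The relationship between the three functions is easier to see on a short vector. A minimal sketch with made-up numbers (the vector `x` is hypothetical):

```r
x <- c(12.1, 3.0, 19.7, 8.4)

sorted <- sort(x)   # the values in increasing order
idx    <- order(x)  # where each sorted value sits in x
r      <- rank(x)   # the rank of each element of x

sorted  # 3.0 8.4 12.1 19.7
idx     # 2 4 1 3
r       # 3 1 4 2

# Indexing with order() reproduces sort()
identical(x[idx], sorted)
```

In short, order() answers "which position comes first?", while rank() answers "what place did this element finish in?".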
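match() looks up each of the given names, in the order they are given, and returns the row in which each one first appears in tab$name: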
> match(c("Cow","Owl monkey","Cheetah"),tab$name)
Find the row number for "Cotton rat" in the tab data frame, that is, look up where a given item is located:
> match(c("Cotton rat"),tab$name)
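A short sketch of match() on a made-up name vector (`names_vec` is hypothetical and much shorter than tab$name), including what happens when a name is absent:

```r
# Hypothetical stand-in for tab$name
names_vec <- c("Cheetah", "Owl monkey", "Cow", "Cotton rat")

# Positions come back in the order of the query, not of the table
found <- match(c("Cow", "Owl monkey", "Cheetah"), names_vec)
found  # 3 2 1

# A name that does not occur gives NA rather than an error
missing <- match("Tiger", names_vec)
missing
```

The result can be used directly as a row index, for example to pull out rows in a chosen order.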
> vec=c("red","blue","green","green","yellow","orange")
> fac=factor(vec)
> fac
[1] red blue green green yellow orange
Levels: blue green orange red yellow
> levels(fac)
[1] "blue" "green" "orange" "red" "yellow"
> vec=="blue"
[1] FALSE TRUE FALSE FALSE FALSE FALSE
> fac2=factor(vec,levels=c("blue","green","yellow","orange","red"))
> fac2
[1] red blue green green yellow orange
Levels: blue green yellow orange red
> levels(fac2)
[1] "blue" "green" "yellow" "orange" "red"
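The factor session above shows that the levels, not the data, change when levels= is supplied. One way to see the effect is to look at the integer codes stored inside each factor. This sketch reuses the same vector; the variable names `codes_default` and `codes_custom` are introduced here for illustration:

```r
vec  <- c("red", "blue", "green", "green", "yellow", "orange")
fac  <- factor(vec)   # levels sorted alphabetically
fac2 <- factor(vec, levels = c("blue", "green", "yellow", "orange", "red"))

# Each element is stored as the position of its level
codes_default <- as.integer(fac)
codes_custom  <- as.integer(fac2)

codes_default  # 4 1 2 2 5 3
codes_custom   # 5 1 2 2 3 4

# The labels themselves are unchanged
identical(as.character(fac), as.character(fac2))
```

The level order matters later because functions such as table() and split() report their groups in level order.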
table() counts how often each value occurs:
> table(tab$order)
split() is a function which takes a vector and splits it into a list, by grouping the vector according to a factor.
Group the sleep_total values by the order column:
> s=split(tab$sleep_total,tab$order)
The mean sleep_total for Rodentia:
> mean(s[["Rodentia"]])
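To make the shape of the split() result concrete, here is a minimal sketch on made-up values (the vectors `sleep` and `grp` are hypothetical stand-ins for tab$sleep_total and tab$order):

```r
# Hypothetical toy data
sleep <- c(10, 12, 4, 14, 6)
grp   <- c("Rodentia", "Rodentia", "Artiodactyla", "Rodentia", "Artiodactyla")

# One list element per group, named after the group
s_toy <- split(sleep, grp)
names(s_toy)  # "Artiodactyla" "Rodentia"

# [[ ]] extracts the plain numeric vector for one group
rodent_mean <- mean(s_toy[["Rodentia"]])
rodent_mean   # mean of 10, 12 and 14
```

Double brackets are needed here because s_toy["Rodentia"] would return a list of length one, which mean() cannot average.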
lapply() and sapply() are useful functions for applying a function repeatedly to a vector or list. lapply() returns a list, while sapply() tries to "simplify", returning a vector if possible:
> lapply(s,mean)
$Afrosoricida
[1] 15.6
$Artiodactyla
[1] 4.516667
$Carnivora
[1] 10.11667
$Cetacea
[1] 4.5
$Chiroptera
[1] 19.8
$Cingulata
[1] 17.75
$Didelphimorphia
[1] 18.7
$Diprotodontia
[1] 12.4
$Erinaceomorpha
[1] 10.2
$Hyracoidea
[1] 5.666667
$Lagomorpha
[1] 8.4
$Monotremata
[1] 8.6
$Perissodactyla
[1] 3.466667
$Pilosa
[1] 14.4
$Primates
[1] 10.5
$Proboscidea
[1] 3.6
$Rodentia
[1] 12.46818
$Scandentia
[1] 8.9
$Soricomorpha
[1] 11.1
> sapply(s,mean)
Afrosoricida Artiodactyla Carnivora Cetacea Chiroptera Cingulata
15.600000 4.516667 10.116667 4.500000 19.800000 17.750000
Didelphimorphia Diprotodontia Erinaceomorpha Hyracoidea Lagomorpha Monotremata
18.700000 12.400000 10.200000 5.666667 8.400000 8.600000
Perissodactyla Pilosa Primates Proboscidea Rodentia Scandentia
3.466667 14.400000 10.500000 3.600000 12.468182 8.900000
Soricomorpha
11.100000
tapply() is more concise: it is equivalent to combining split() and sapply() in a single call:
> tapply(tab$sleep_total,tab$order,mean)
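A quick check of that equivalence on made-up values (the vectors `sleep` and `grp` are hypothetical, not the real columns):

```r
# Hypothetical toy data
sleep <- c(10, 12, 4, 14, 6)
grp   <- c("Rodentia", "Rodentia", "Artiodactyla", "Rodentia", "Artiodactyla")

# Two steps: group, then summarise each group
via_split  <- sapply(split(sleep, grp), mean)

# One step: same grouping and summary
via_tapply <- tapply(sleep, grp, mean)

# Same group names and same numbers; tapply() returns a
# one-dimensional array rather than a plain named vector
identical(names(via_split), names(via_tapply))
all.equal(unname(via_split), as.vector(via_tapply))
```

The values agree, but the return types differ slightly, which occasionally matters when the result is passed on to other functions.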
The standard deviation of sleep_total for "Primates":
Use either lapply(s, sd), sapply(s, sd) or tapply(tab$sleep_total, tab$order, sd), then read off the "Primates" entry.
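A sketch of reading off one group's standard deviation, again on made-up values (`sleep` and `grp` are hypothetical, so the number below is not the answer for the real data):

```r
# Hypothetical toy data
sleep <- c(9, 11, 10, 3, 5)
grp   <- c("Primates", "Primates", "Primates", "Cetacea", "Cetacea")

# Standard deviation per group as a named vector
sds <- sapply(split(sleep, grp), sd)

# Pick one group by name
primates_sd <- sds[["Primates"]]
primates_sd  # sd of 9, 11 and 10

# The same value computed directly on that group
direct_sd <- sd(sleep[grp == "Primates"])
```

Computing the value directly on the subset is a useful cross-check when only one group is of interest.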