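When the input data mixes strings and numbers, a data frame is the better way to hold it. Data frames also work with functions such as ncol() and nrow(), and individual values can be extracted by index.
Note, however, that in R versions before 4.0.0, data.frame() converts string columns to factors by default (from R 4.0.0 on, the stringsAsFactors argument defaults to FALSE).
For example: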
Died.At <- c(22,40,72,41)
Writer.At <- c(16, 18, 36, 36)
First.Name <- c("John", "Edgar", "Walt", "Jane")
Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
Sex <- c("MALE", "MALE", "MALE", "FEMALE")
Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18")
Next, you just combine the vectors that you made with the data.frame() function:
writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)
Remember that data frames must have variables of the same length. Check that you have put an equal number of arguments in all the c() functions that you assign to the vectors and that you have enclosed strings in quotation marks ("").
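To make the equal-length requirement concrete, here is a minimal sketch of what happens when two columns differ in length. The vector names `ages` and `names_vec` are made up for this illustration, and tryCatch() is used only so that the error is captured rather than stopping the script.

```r
# Hypothetical vectors of unequal length (3 values versus 4).
ages <- c(22, 40, 72)
names_vec <- c("John", "Edgar", "Walt", "Jane")

# data.frame() refuses columns whose lengths cannot be recycled to match;
# tryCatch() returns the error message as a string instead of halting.
result <- tryCatch(
  data.frame(ages, names_vec),
  error = function(e) conditionMessage(e)
)
print(result)

# A simple length check to run before building the data frame.
same_length <- length(ages) == length(names_vec)
print(same_length)
```

Checking lengths up front, as in the last two lines, is usually easier to debug than reading the error after the fact.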
Note that when you use the data.frame() function, character variables are imported as factors or categorical variables. Use the str() function to get to know more about your data frame.
str(writers_df)
For the variables First.Name and Second.Name, you don't want this. You can use the I() function to insulate them.
You can keep the Sex vector as a factor, because there are only a limited number of possible values that this variable can have.
Also for the variable Date.Of.Death you don't want to have a factor. It would be better if the values were registered as dates. You can apply the as.Date() function to this variable to make sure this happens.
str(writers_df)
## 'data.frame': 4 obs. of 6 variables:
## $ Died.At : num 22 40 72 41
## $ Writer.At : num 16 18 36 36
## $ First.Name : Factor w/ 4 levels "Edgar","Jane",..: 3 1 4 2
## $ Second.Name : Factor w/ 4 levels "Austen","Doe",..: 2 3 4 1
## $ Sex : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
## $ Date.Of.Death: Factor w/ 4 levels "1817-07-18","1849-10-07",..: 4 2 3 1
Died.At <- c(22,40,72,41)
Writer.At <- c(16, 18, 36, 36)
First.Name <- I(c("John", "Edgar", "Walt", "Jane"))
Second.Name <- I(c("Doe", "Poe", "Whitman", "Austen"))
Sex <- c("MALE", "MALE", "MALE", "FEMALE")
Date.Of.Death <- as.Date(c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18"))
writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)
str(writers_df)
## 'data.frame': 4 obs. of 6 variables:
## $ Died.At : num 22 40 72 41
## $ Writer.At : num 16 18 36 36
## $ First.Name :Class 'AsIs' chr [1:4] "John" "Edgar" "Walt" "Jane"
## $ Second.Name :Class 'AsIs' chr [1:4] "Doe" "Poe" "Whitman" "Austen"
## $ Sex : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
## $ Date.Of.Death: Date, format: "2015-05-10" "1849-10-07" ...
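The I() function insulates strings so that they are kept as ordinary character variables rather than being converted to factors.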
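As an alternative to wrapping individual columns in I(), the stringsAsFactors argument of data.frame() can be set explicitly, which also makes the result independent of the R version's default. The sketch below uses hypothetical lower-case names (`first_name`, `sex`, `df_chr`, `df_fct`) so that it does not overwrite the vectors above.

```r
first_name <- c("John", "Edgar", "Walt", "Jane")
sex <- c("MALE", "MALE", "MALE", "FEMALE")

# Keep every string column as character, then convert only sex to a factor.
df_chr <- data.frame(first_name, sex, stringsAsFactors = FALSE)
df_chr$sex <- factor(df_chr$sex)

# Force the older behaviour explicitly: strings become factors,
# except the column protected by I().
df_fct <- data.frame(first_name = I(first_name), sex, stringsAsFactors = TRUE)

print(sapply(df_chr, class))
print(class(df_fct$first_name))
print(class(df_fct$sex))
```

Setting stringsAsFactors explicitly is a matter of taste; I() is more selective, while the argument applies to every string column at once.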
You can also retrieve the names with the names() function:
names(writers_df)
## [1] "Died.At" "Writer.At" "First.Name" "Second.Name"
## [5] "Sex" "Date.Of.Death"
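Alongside names(), the nrow() and ncol() functions mentioned at the start report the size of a data frame, and a single value can be picked out by row and column. The following is a small self-contained sketch; the data frame `writers` is a reduced, hypothetical copy of the example data.

```r
# A reduced copy of the example data with three columns.
writers <- data.frame(
  Died.At = c(22, 40, 72, 41),
  Writer.At = c(16, 18, 36, 36),
  Sex = c("MALE", "MALE", "MALE", "FEMALE")
)

print(nrow(writers))            # number of rows
print(ncol(writers))            # number of columns
print(dim(writers))             # both at once: rows, then columns
print(names(writers))           # column names

# Extract one value by position or by column name.
print(writers[2, 1])            # row 2, column 1
print(writers[2, "Writer.At"])  # row 2 of the Writer.At column
```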
How To Remove Columns And Rows From A Data Frame
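# NULL can only remove a whole column; assigning it to one cell, as in writers_df[1,3] <- NULL, raises an error.
writers_copy <- writers_df  # work on a copy (name chosen here) so writers_df keeps all six columns
writers_copy[3] <- NULL     # drops the third column, First.Name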
rows_to_keep <- c(TRUE, FALSE, TRUE, FALSE)
> limited_writers_df <- writers_df[rows_to_keep,]
> limited_writers_df
  Died.At Writer.At First.Name Second.Name  Sex Date.Of.Death
1      22        16       John         Doe MALE    2015-05-10
3      72        36       Walt     Whitman MALE    1892-03-26
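Besides a logical vector of rows to keep, rows and columns can be removed in a few other ways. The sketch below is illustrative only; `writers`, `kept_rows`, `no_names`, and `older` are hypothetical names, and the data is a reduced copy of the example.

```r
writers <- data.frame(
  Died.At = c(22, 40, 72, 41),
  Writer.At = c(16, 18, 36, 36),
  Second.Name = c("Doe", "Poe", "Whitman", "Austen")
)

# Negative indices drop rows: remove rows 2 and 4.
kept_rows <- writers[-c(2, 4), ]

# Assigning NULL drops a whole column (it cannot drop a single cell).
no_names <- writers
no_names$Second.Name <- NULL

# A logical condition keeps only matching rows: died after age 40.
older <- writers[writers$Died.At > 40, ]

print(kept_rows)
print(names(no_names))
print(older)
```

Note that the original row numbers are kept as row names after subsetting, which is why the output above shows rows 1 and 3.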
> X <- data.frame()
> temps = data.frame(day=1:10,
+ min = c(50.7,52.8,48.6,53.0,49.9,47.9,54.1,47.6,43.6,45.5),
+ max = c(59.5,55.7,57.3,71.5,69.8,68.8,67.5,66.0,66.1,61.7))
> head(temps)
  day  min  max
1   1 50.7 59.5
2   2 52.8 55.7
3   3 48.6 57.3
4   4 53.0 71.5
5   5 49.9 69.8
6   6 47.9 68.8
> sapply(temps,mode)
      day       min       max
"numeric" "numeric" "numeric"
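The mode() of a column describes only its basic storage kind, so integer and double columns both report "numeric". A quick sketch comparing it with class(), using a shortened, hypothetical three-row version of temps:

```r
# Shortened version of the temps data (first three days only).
temps <- data.frame(
  day = 1:3,
  min = c(50.7, 52.8, 48.6),
  max = c(59.5, 55.7, 57.3)
)

modes <- sapply(temps, mode)     # basic kind of each column
classes <- sapply(temps, class)  # finer-grained class of each column

print(modes)
print(classes)
```

Here day was created with 1:3, so class() reports it as integer even though its mode is numeric.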
Accessing the variables of a data frame
> temps[,3]
[1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7
When you use a single subscript with a data frame, it refers to a data frame consisting
of just that column. R also provides a special subscripting method (double brackets)
to extract the actual data (in this case a vector) from the data frame:
> temps['max']
    max
1  59.5
2  55.7
3  57.3
4  71.5
5  69.8
6  68.8
7  67.5
8  66.0
9  66.1
10 61.7
> temps[['max']]
[1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7
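The difference between single and double brackets is easiest to see by checking the class of each result. This is a minimal sketch on a shortened, hypothetical three-row version of temps; it also shows that the $ operator and the [, 3] form return the same vector as double brackets.

```r
temps <- data.frame(
  day = 1:3,
  min = c(50.7, 52.8, 48.6),
  max = c(59.5, 55.7, 57.3)
)

single <- temps["max"]    # still a data frame with one column
double <- temps[["max"]]  # the underlying numeric vector
dollar <- temps$max       # shorthand for the double-bracket form

print(class(single))
print(class(double))
print(identical(double, dollar))
print(identical(double, temps[, 3]))
```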
Suppose we want to convert our minimum and maximum temperatures to centigrade, and then calculate the difference between them. Using with, we can write:
> with(temps,5/9*(max-32) - 5/9*(min-32))
[1] 4.888889 1.611111 4.833333 10.277778 11.055556 11.611111 7.444444
[8] 10.222222 12.500000 9.000000
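For comparison, the same calculation can be written without with() by prefixing each column with the data frame name. The sketch below uses a shortened three-row version of temps; the helper `to_celsius` and the column name `range_c` are hypothetical names introduced here.

```r
temps <- data.frame(
  day = 1:3,
  min = c(50.7, 52.8, 48.6),
  max = c(59.5, 55.7, 57.3)
)

# Hypothetical helper: Fahrenheit to centigrade.
to_celsius <- function(f) 5 / 9 * (f - 32)

# with() evaluates the expression using the columns of temps as variables.
diff_with <- with(temps, to_celsius(max) - to_celsius(min))

# The equivalent form with explicit $ references.
diff_dollar <- to_celsius(temps$max) - to_celsius(temps$min)

print(all.equal(diff_with, diff_dollar))

# Store the result as a new column of the data frame.
temps$range_c <- diff_with
print(temps)
```

The with() form mainly saves typing; both versions give the same numbers.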