• Some Data Structures in R


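    When the data you enter mixes strings and numbers, a data frame is usually the better container. Data frames also work with functions such as ncol() and nrow(), and with indexing to pull out a single value.

    Note, however, that when strings go into a data frame they are converted to factors by default (in R versions before 4.0.0).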
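    As a quick illustration of the helper functions mentioned above, here is a minimal sketch. The data frame people and its columns are made-up names used only for this example.

```r
# Hypothetical example data: one character column and one numeric column.
people <- data.frame(
  name = c("Ann", "Bob", "Cy"),
  age  = c(31, 45, 27)
)

nrow(people)       # number of rows (observations)
ncol(people)       # number of columns (variables)
people[2, "age"]   # the single value in row 2 of column "age"
```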

    For example:

    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- c("John", "Edgar", "Walt", "Jane")
    Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18")
    

    Next, you just combine the vectors that you made with the data.frame() function:

     writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)

    Remember that the variables in a data frame must have the same length. Check that you have passed the same number of arguments to every c() call that builds a vector, and that you have enclosed strings in quotation marks ("").
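    The length requirement can be seen in a small experiment. The sketch below uses illustrative values only; tryCatch() captures the error so the script keeps running. Note that data.frame() does recycle a shorter vector when the longer length is an exact multiple of it, so the example deliberately uses lengths 3 and 2.

```r
# Vectors of length 3 and 2 cannot be lined up into rows, so data.frame() stops.
result <- tryCatch(
  data.frame(x = c(1, 2, 3), y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
print(result)   # the captured error message

# Checking lengths up front before building the data frame.
cols <- list(x = c(1, 2, 3), y = c("a", "b", "c"))
lens <- lengths(cols)
stopifnot(length(unique(lens)) == 1)   # all columns have the same length
ok_df <- data.frame(cols)
nrow(ok_df)
```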

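    Note that when you use the data.frame() function in R versions before 4.0.0, character variables are imported as factors (categorical variables) by default; from R 4.0.0 onward they stay character unless you pass stringsAsFactors = TRUE. Use the str() function to get to know more about your data frame.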

     str(writers_df)

    For the variables First.Name and Second.Name, you don't want this. You can use the I() function to insulate them.

    You can keep the Sex vector as a factor, because this variable can take only a limited number of possible values.

    You also don't want the variable Date.Of.Death to be a factor. It would be better if the values were stored as dates. You can apply the as.Date() function to this variable to make sure this happens.

    str(writers_df)

    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   : Factor w/ 4 levels "Edgar","Jane",..: 3 1 4 2
    ##  $ Second.Name  : Factor w/ 4 levels "Austen","Doe",..: 2 3 4 1
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Factor w/ 4 levels "1817-07-18","1849-10-07",..: 4 2 3 1


    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- I(c("John", "Edgar", "Walt", "Jane"))
    Second.Name <- I(c("Doe", "Poe", "Whitman", "Austen"))
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- as.Date(c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18"))
    writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)
    str(writers_df)


    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   :Class 'AsIs'  chr [1:4] "John" "Edgar" "Walt" "Jane"
    ##  $ Second.Name  :Class 'AsIs'  chr [1:4] "Doe" "Poe" "Whitman" "Austen"
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Date, format: "2015-05-10" "1849-10-07" ...

    The I() function insulates the strings, so they are kept as ordinary character values instead of being converted to factors.
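    An alternative to wrapping each vector in I() is the stringsAsFactors argument of data.frame(). The following is a minimal sketch, not the tutorial's own method; writers_df2 is a name introduced here for illustration, and only three of the columns are reproduced.

```r
# Keep all strings as character, then convert only Sex to a factor.
writers_df2 <- data.frame(
  First.Name    = c("John", "Edgar", "Walt", "Jane"),
  Sex           = c("MALE", "MALE", "MALE", "FEMALE"),
  Date.Of.Death = as.Date(c("2015-05-10", "1849-10-07", "1892-03-26", "1817-07-18")),
  stringsAsFactors = FALSE
)
writers_df2$Sex <- factor(writers_df2$Sex)

str(writers_df2)
```

    With this approach the name column is plain character (without the AsIs class that I() adds), and the choice of which columns become factors is made explicitly.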


    You can also retrieve the names with the names() function:

    names(writers_df)
    ## [1] "Died.At"       "Writer.At"     "First.Name"    "Second.Name"
    ## [5] "Sex"           "Date.Of.Death"

    How To Remove Columns And Rows From A Data Frame

    # A single cell cannot be set to NULL; drop the whole column instead (returns a copy):
    writers_df[, -3]

     

    > rows_to_keep <- c(TRUE, FALSE, TRUE, FALSE)
    > limited_writers_df <- writers_df[rows_to_keep,]
    > limited_writers_df
      Died.At Writer.At First.Name Second.Name  Sex Date.Of.Death
    1      22        16       John         Doe MALE    2015-05-10
    3      72        36       Walt     Whitman MALE    1892-03-26
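    The common ways of removing columns and rows can be sketched together as follows. The data frame df is a cut-down stand-in for writers_df, and the names no_first, without_row2 and older are hypothetical.

```r
df <- data.frame(
  Died.At     = c(22, 40, 72, 41),
  First.Name  = c("John", "Edgar", "Walt", "Jane"),
  Second.Name = c("Doe", "Poe", "Whitman", "Austen"),
  stringsAsFactors = FALSE
)

# Remove a column by assigning NULL to it (done on a copy here).
no_first <- df
no_first$First.Name <- NULL
names(no_first)

# Remove a row with a negative index.
without_row2 <- df[-2, ]

# Keep only the rows that satisfy a condition.
older <- df[df$Died.At > 40, ]
older$First.Name
```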

     

    > X <- data.frame()   # an empty data frame

    > temps = data.frame(day=1:10,
    + min = c(50.7,52.8,48.6,53.0,49.9,47.9,54.1,47.6,43.6,45.5),
    + max = c(59.5,55.7,57.3,71.5,69.8,68.8,67.5,66.0,66.1,61.7))
    > head(temps)
      day  min  max
    1   1 50.7 59.5
    2   2 52.8 55.7
    3   3 48.6 57.3
    4   4 53.0 71.5
    5   5 49.9 69.8
    6   6 47.9 68.8

    > sapply(temps,mode)
          day       min       max
    "numeric" "numeric" "numeric"

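    mode() reports the broad storage mode, so it does not distinguish integers from doubles. As a small sketch on the same temps data, class() gives a finer view:

```r
temps <- data.frame(
  day = 1:10,
  min = c(50.7, 52.8, 48.6, 53.0, 49.9, 47.9, 54.1, 47.6, 43.6, 45.5),
  max = c(59.5, 55.7, 57.3, 71.5, 69.8, 68.8, 67.5, 66.0, 66.1, 61.7)
)

col_modes   <- sapply(temps, mode)    # "numeric" for all three columns
col_classes <- sapply(temps, class)   # day is "integer" because 1:10 creates integers
col_modes
col_classes
```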
    Accessing Variables in a Data Frame

    > temps[,3]
    [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

    When you use a single subscript with a data frame, it refers to a data frame consisting
    of just that column. R also provides a special subscripting method (double brackets)
    to extract the actual data (in this case a vector) from the data frame:

    > temps['max']
        max
    1  59.5
    2  55.7
    3  57.3
    4  71.5
    5  69.8
    6  68.8
    7  67.5
    8  66.0
    9  66.1
    10 61.7
    > temps[['max']]
    [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7
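    The difference between the subscripting forms is easiest to see by checking what each one returns. A short sketch on the same temps data:

```r
temps <- data.frame(
  day = 1:10,
  min = c(50.7, 52.8, 48.6, 53.0, 49.9, 47.9, 54.1, 47.6, 43.6, 45.5),
  max = c(59.5, 55.7, 57.3, 71.5, 69.8, 68.8, 67.5, 66.0, 66.1, 61.7)
)

class(temps["max"])      # single brackets: a one-column data frame
class(temps[["max"]])    # double brackets: the underlying numeric vector

# $ and [row, column] indexing return the same vector as [[ ]].
identical(temps[["max"]], temps$max)
identical(temps[, 3], temps$max)
```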

    The with() function evaluates an expression using the columns of a data frame as variables. Suppose we want to convert our minimum
    and maximum temperatures to centigrade, and then calculate the difference between
    them. Using with, we can write:
    > with(temps,5/9*(max-32) - 5/9*(min-32))
     [1]  4.888889  1.611111  4.833333 10.277778 11.055556 11.611111  7.444444
     [8] 10.222222 12.500000  9.000000
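    For comparison, the same calculation can be written with explicit $ references. This is a sketch only; to_celsius is a hypothetical helper introduced for this example and is not part of the original text.

```r
temps <- data.frame(
  day = 1:10,
  min = c(50.7, 52.8, 48.6, 53.0, 49.9, 47.9, 54.1, 47.6, 43.6, 45.5),
  max = c(59.5, 55.7, 57.3, 71.5, 69.8, 68.8, 67.5, 66.0, 66.1, 61.7)
)

# Hypothetical helper: Fahrenheit to centigrade.
to_celsius <- function(f) 5 / 9 * (f - 32)

# with() lets the expression refer to the columns by bare name.
range_with <- with(temps, to_celsius(max) - to_celsius(min))

# The same calculation with explicit temps$ prefixes.
range_dollar <- to_celsius(temps$max) - to_celsius(temps$min)

all.equal(range_with, range_dollar)
```

    Because the offset of 32 cancels, the result also equals 5/9 * (max - min), which is a handy sanity check.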

  • Original article: https://www.cnblogs.com/yupeter007/p/5329250.html