• R 分组计算描述性统计量


    统计学区内各个小区的房价均值

    数据格式

    id|community_name|house_area|house_structure|house_total|house_avg|agency_name|house_floor_curr|house_floor_total|house_floor_type

    6328500962692431872|尚东花园|77.0|32|285.0|37013.0|利众置业|5|5|多层

    6328500979813580800|赛世香樟园|93.0|22|265.0|28495.0|苏商房产仙林店|9|11|小高层

    导入数据

    house<- read.table("house_data.txt", header = TRUE, sep='|',fileEncoding ="UTF-8",

                      stringsAsFactors = FALSE,

                      colClasses = c("character","character","numeric",

                                     "character","numeric","numeric","character",

                                     "numeric","numeric","character"))

    houseXQ<- sqldf("select * from house where  community_name!='东郊小镇' ",row.names=TRUE)

    选择列

    selectedColumns<- c("community_name","house_avg")

    将小区名转换成因子

    communityFactor<- factor(houseXQ$community_name, order=FALSE)

    将因子列整合到数据框中

    houseXQ <-cbind(houseXQ, communityFactor)

    重新选择列

    selectedColumns<- c("communityFactor","house_avg")

    看一下数据

    head(houseXQ[selectedColumns])

    按小区名分组计算均值

    aggregate(houseXQ[selectedColumns], by=list(communityFactor=houseXQ$communityFactor),mean)

    结果:

     

    自定义函数计算统计量

    funcMystats<- function(x, na.omit= FALSE){

      if(na.omit){

        x<- x[!is.na(x)]

      }

      m<- mean(x)

      n<- length(x)

      s<- sd(x)

      skew<- sum((x-m)^3/s^3)/n

      kurt<- sum((x-m)^4/s^4)/n-3

      return (c(n=n,mean=m, stdev=s, skew=skew, kurtosis=kurt))

      }

     

    funcDstats <- function(x) sapply(x, funcMystats)#对于每个X调用 funcMystats 函数

     

    by(houseXQ[selectedColumns], houseXQ$community_name, funcDstats)
    #对于数据框 houseXQ,选择 selectedColumns 的列,根据 houseXQ$community_name 进行分组,对每组调用 funcDstats函数计算统计量

     

    结果(部分)

    houseXQ$community_name: 东方天郡
             communityFactor     house_avg
    n                     51    51.0000000
    mean                  NA 38255.8039216
    stdev                  0  2145.6443696
    skew                  NA    -0.4395676
    kurtosis              NA     0.6015383
    ------------------------------------------------------------ 
    houseXQ$community_name: 康桥圣菲
             communityFactor     house_avg
    n                      9     9.0000000
    mean                  NA 34359.0000000
    stdev                  0  1567.1059313
    skew                  NA    -0.9804274
    kurtosis              NA    -0.8342473
    ------------------------------------------------------------ 
    houseXQ$community_name: 南师大茶苑
             communityFactor house_avg
    n                      1         1
    mean                  NA     31691
    stdev                 NA        NA
    skew                  NA        NA
    kurtosis              NA        NA
    ------------------------------------------------------------ 
    houseXQ$community_name: 赛世香樟园
             communityFactor     house_avg
    n                      3     3.0000000
    mean                  NA 28938.3333333
    stdev                  0  1654.1733081
    skew                  NA     0.2487582
    kurtosis              NA    -2.3333333
    ------------------------------------------------------------ 
    houseXQ$community_name: 三味公寓
             communityFactor  house_avg
    n                      2     2.0000
    mean                  NA 28662.0000
    stdev                  0   576.9991
    skew                  NA     0.0000
    kurtosis              NA    -2.7500
    ------------------------------------------------------------ 
    houseXQ$community_name: 尚东花园
             communityFactor house_avg
    n                      1         1
    mean                  NA     37013
    stdev                 NA        NA
    skew                  NA        NA
    kurtosis              NA        NA
    ------------------------------------------------------------ 

    使用 psych 包中的 describeBy()分组计算概述统计量

    library(psych)
    selectedColumns<- c("communityFactor","house_avg")
    describeBy(houseXQ[selectedColumns], list(houseXQ$communityFactor))

    结果如下(部分)

     
  • 相关阅读:
    hdu 1254 推箱子(双重bfs)
    hdu 1495 非常可乐 (广搜)
    [leetcode-687-Longest Univalue Path]
    [leetcode-686-Repeated String Match]
    POJ 2287 田忌赛马 贪心算法
    [leetcode-304-Range Sum Query 2D
    [leetcode-682-Baseball Game]
    [leetcode-299-Bulls and Cows]
    [leetcode-319-Bulb Switcher]
    [leetcode-680-Valid Palindrome II]
  • 原文地址:https://www.cnblogs.com/quietwalk/p/8253089.html
Copyright © 2020-2023  润新知