• Correlation Analysis


    1.   Computing correlation coefficients

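    (1)     The cor() function can compute the following three correlation coefficients:

    (2)     Pearson product-moment correlation coefficient: the degree of linear association between two continuous variables.

    (3)     Spearman rank correlation coefficient: the degree of association between rank-ordered variables.

    (4)     Kendall rank correlation coefficient (Kendall's tau): a nonparametric measure of rank correlation.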

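    (5)     Syntax: cor(x, use = , method = )

    x: a matrix or data frame;

    use: how missing data are handled.

      all.obs: assumes there are no missing data; an error is raised if any are encountered.

      everything: any correlation that involves missing data is set to NA (this is the default);

      complete.obs: listwise deletion (rows containing any missing value are dropped);

      pairwise.complete.obs: pairwise deletion (for each pair of variables, only rows with a missing value in that pair are dropped).

    method: the type of correlation coefficient: "pearson" (the default), "spearman", or "kendall".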
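    The method and use arguments can be compared side by side on the built-in state.x77 data. This is a minimal sketch rather than part of the original example: the missing value is injected purely for illustration, and the variable names (with_na, r_everything, and so on) are made up here. Note that the value R accepts for pairwise deletion is "pairwise.complete.obs".

```r
# Built-in data set, same columns as the original example
states <- state.x77[, 1:6]

# One pair of variables under the three supported methods
pearson  <- cor(states[, "Income"], states[, "Murder"], method = "pearson")
spearman <- cor(states[, "Income"], states[, "Murder"], method = "spearman")
kendall  <- cor(states[, "Income"], states[, "Murder"], method = "kendall")

# Inject a single missing value (illustration only) to compare `use` options
with_na <- states
with_na[1, "Income"] <- NA

# NA for every coefficient that involves Income
r_everything <- cor(with_na, use = "everything")
# Row 1 is dropped for every pair of variables
r_complete <- cor(with_na, use = "complete.obs")
# Row 1 is dropped only for pairs that involve Income
r_pairwise <- cor(with_na, use = "pairwise.complete.obs")

# all.obs stops with an error because a missing value is present
all_obs_failed <- inherits(
  try(cor(with_na, use = "all.obs"), silent = TRUE),
  "try-error"
)

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
print(r_everything["Income", "Murder"])
print(all_obs_failed)
```

    With pairwise deletion, coefficients for pairs that do not involve Income are computed from all 50 rows, so they match the values obtained from the complete data.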

    Original example

    > states<- state.x77[, 1:6]

    > x<- states[,c("Population", "Income", "Illiteracy","HS Grad")]

    > y<-states[,c("Life Exp","Murder")]

    > cor(x,y)

    Result:

                  Life Exp     Murder

    Population -0.06805195  0.3436428

    Income      0.34025534 -0.2300776

    Illiteracy -0.58847793  0.7029752

    HS Grad     0.58221620 -0.4879710
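    Calling cor(x, y) with two matrices returns a rectangular block whose rows come from the columns of x and whose columns come from the columns of y. As a small sketch (the names full and block are illustrative), the same block can be cut out of the full square correlation matrix:

```r
states <- state.x77[, 1:6]

x <- states[, c("Population", "Income", "Illiteracy", "HS Grad")]
y <- states[, c("Life Exp", "Murder")]

# Full 6 x 6 symmetric correlation matrix of all variables
full <- cor(states)

# The 4 x 2 slice: rows are the x variables, columns are the y variables
block <- full[colnames(x), colnames(y)]

print(round(block, 3))
print(all.equal(block, cor(x, y)))
```

    Requesting only the block keeps the output focused on the cross-correlations of interest instead of the full matrix.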

    Exploring the correlation between a house's unit price and its area, current floor, and total number of floors

    Data preparation

    > house<- read.table("house_data.txt", header = TRUE, sep='|',fileEncoding ="UTF-8",

    +                    stringsAsFactors = FALSE,

    +                    colClasses = c("character","character","numeric",

    +                                   "character","numeric","numeric","character",

    +                                   "numeric","numeric","character"))

    >

    > library(sqldf)  # load sqldf before calling it; otherwise R reports: could not find function "sqldf"

    Loading required package: gsubfn

    Loading required package: proto

    Loading required package: RSQLite

    > houseXQ<- sqldf("select * from house where  community_name!='东郊小镇' ",row.names=TRUE)

    > communityFactor<- factor(houseXQ$community_name, ordered=FALSE)

    > houseXQ <-cbind(houseXQ, communityFactor)
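    The same filtering step can also be done without sqldf. The sketch below uses base R's subset() on a tiny, hypothetical stand-in for the house data frame: the column names follow the post, but the rows and values are invented, because house_data.txt is not available here.

```r
# Hypothetical stand-in for `house`; rows and values are invented
house <- data.frame(
  community_name = c("东郊小镇", "Community A", "Community B", "Community A"),
  house_area     = c(89, 120, 75, 98),
  house_total    = c(95, 160, 88, 130),
  stringsAsFactors = FALSE
)

# Equivalent of: select * from house where community_name != '东郊小镇'
houseXQ <- subset(house, community_name != "东郊小镇")

# Add the community as an unordered factor column
houseXQ$communityFactor <- factor(houseXQ$community_name, ordered = FALSE)

str(houseXQ)
```

    Like the SQL comparison, subset() drops rows where the condition evaluates to NA, so rows with a missing community name are excluded in both versions.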

    Correlation of total price with area, current floor, total floors, and unit price

    > x<- houseXQ [,c("house_total")]

    > y<- houseXQ [,c("house_area","house_floor_curr","house_floor_total","house_avg")]

    > cor(x,y)

    Result:

         house_area house_floor_curr house_floor_total house_avg

    [1,]  0.9450675      -0.02058832        0.03570221 0.4395242

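    Total price is highly correlated with area (r = 0.945); its correlations with current floor and total floors are close to zero.

    Significance test of the correlation coefficient: the output below (p-value < 2.2e-16) shows that the correlation between total price and area is statistically significant.

    > cor.test(houseXQ[, c("house_total")],houseXQ[, c("house_area")] )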

    Pearson's product-moment correlation

     

    data:  houseXQ[, c("house_total")] and houseXQ[, c("house_area")]

    t = 39.537, df = 187, p-value < 2.2e-16

    alternative hypothesis: true correlation is not equal to 0

    95 percent confidence interval:

     0.9274393 0.9585053

    sample estimates:

          cor

    0.9450675
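    The object returned by cor.test() is a list, so the numbers in the printout can be pulled out individually. The sketch below uses the built-in state.x77 data as a stand-in, since the housing file is not available here. It also recomputes the t statistic from r and the degrees of freedom, t = r * sqrt(df / (1 - r^2)), which is how the Pearson test statistic relates to the coefficient. The names res, t_manual, and t_reported are illustrative.

```r
states <- state.x77[, 1:6]

res <- cor.test(states[, "Illiteracy"], states[, "Murder"])

r  <- unname(res$estimate)   # sample correlation coefficient
df <- unname(res$parameter)  # degrees of freedom, n - 2

# Recompute the t statistic from r and df
t_manual <- r * sqrt(df / (1 - r^2))

print(res$statistic)  # t statistic reported by cor.test()
print(t_manual)       # same value, computed by hand
print(res$p.value)    # p-value of the two-sided test
print(res$conf.int)   # 95 percent confidence interval for the correlation

# The same formula applied to the housing output above (r = 0.9450675, df = 187)
t_reported <- 0.9450675 * sqrt(187 / (1 - 0.9450675^2))
print(t_reported)
```

    Applying the formula to the reported r and df reproduces, up to rounding, the t value shown in the housing output.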

    Correlation of unit price with area, current floor, total floors, and total price

    > x<- houseXQ [,c("house_avg")]

    > y<- houseXQ [,c("house_area","house_floor_curr","house_floor_total","house_total")]

    > cor(x,y)

    Result:

         house_area house_floor_curr house_floor_total house_total

    [1,]  0.1659645        0.2139952         0.3024903   0.4395242
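    Coefficients of this size are easier to judge alongside a significance test for each pair. The following is a sketch under stated assumptions, not the original author's method: cor_with_tests is a hypothetical helper written for this illustration, and it is demonstrated on the built-in state.x77 data because the housing file is not available here.

```r
# Hypothetical helper (not from the original post): correlate one target
# column with several other numeric columns and collect r and the p-value.
cor_with_tests <- function(data, target, others, method = "pearson") {
  t(sapply(others, function(col) {
    res <- cor.test(data[, target], data[, col], method = method)
    c(r = unname(res$estimate), p.value = res$p.value)
  }))
}

# Stand-in data: the built-in state.x77 matrix as a data frame
states <- as.data.frame(state.x77[, 1:6])

tests <- cor_with_tests(
  states,
  target = "Murder",
  others = c("Population", "Income", "Illiteracy", "HS Grad")
)
print(tests)

# For the housing data the call would look like this (assuming houseXQ is loaded):
# cor_with_tests(houseXQ, "house_avg",
#                c("house_area", "house_floor_curr", "house_floor_total", "house_total"))
```

    Each row of the result holds the coefficient and the p-value for one variable, which makes it straightforward to see which of the weaker correlations are distinguishable from zero.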

  • Original article: https://www.cnblogs.com/quietwalk/p/8301342.html