• Advanced statistical methods


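    Notes from an elective statistics course I took: Advanced statistical methods

    A tool-oriented course, light on theory and heavy on practice. After finishing it you can analyze data directly in R.

    Course outline: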

    1. Introduction to R
    2. Regression model in R
    3. Applied regression I
    4. Applied regression II
    5. Applied regression III
    6. Conditional logistic regression and propensity score method
    7. Inverse probability weighting and meta analysis
    8. Instrumental variable analysis

    Course assessment criteria:

    1. Appropriate analytic method
    2. Accurate numerical results
    3. Clear presentation of the results and choice of methods
    4. Interpretation of the results relevant to the public health context

    1 Introduction to R

    1. Use R to perform basic algebraic operations
    2. Work with variables, vectors and matrices in R
    3. Produce clear and well formatted graphs in R
    4. Install and load R packages for specific needs

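    Basic data operations

    Basic operators: + - * / ^

    Basic math functions: sqrt, exp, log, abs, round

    Help: ?, ??

    Basic data functions: rep, seq, length, sum, mean, sd, median, min, max, var, sort, order, which, summary, sample, runif

    Matrix operations: %*%, solve, t, colSums, colMeans, dim, cbind, rbind

    Logical operators: ! & |

    Tests: is.na, is.factor

    Type conversion: as.factor

    Data reshaping: aggregate, the plyr package, melt, table, prop.table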
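    As a minimal sketch of how a few of these functions fit together, the block below uses only base R. The toy vector, the matrix and the object names are made up for illustration.

```r
# Vectors built with seq() and rep()
x <- seq(1, 10, by = 1)            # 1, 2, ..., 10
g <- rep(c("a", "b"), each = 5)    # grouping labels, five of each

# Summary statistics
x_mean <- mean(x)
x_sd <- sd(x)

# which() returns the positions where a condition holds
big <- which(x > 7)

# Matrix operations: solve() inverts, %*% multiplies
A <- matrix(c(2, 1, 1, 3), nrow = 2)
A_inv <- solve(A)
I2 <- A %*% A_inv                  # numerically the 2 x 2 identity matrix

# Grouped means with aggregate(), counts and proportions with table()
df <- data.frame(x = x, g = as.factor(g))
group_means <- aggregate(x ~ g, data = df, FUN = mean)
counts <- table(df$g)
props <- prop.table(counts)
```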

    Reading files

    Writing files

    Plotting

    Basic plotting

    plot

    pairs

    hist

    boxplot

    points

    lines

    text

    abline

    polygon

    legend

    title

    axis

    par

    windows

    layout

    pdf

    dev.off
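    The base graphics functions above are usually combined: open a device, set the layout, draw, annotate, then close the device. The sketch below assumes simulated data and a hypothetical output file name in the session's temporary directory.

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Hypothetical output file in the temporary directory
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file, width = 8, height = 4)   # open a PDF device
par(mfrow = c(1, 2))                   # two panels side by side

# Left panel: scatter plot with a fitted line and a legend
plot(x, y, pch = 16, xlab = "x", ylab = "y", main = "Scatter plot")
abline(lm(y ~ x), col = "red", lwd = 2)
legend("topleft", legend = "least-squares line", col = "red", lwd = 2, bty = "n")

# Right panel: histogram
hist(y, main = "Histogram of y", xlab = "y")

dev.off()                              # close the device so the file is written
```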

    Advanced plotting

    ggplot2

    cowplot

    2 Regression model in R

    Generating random data:

    runif - uniform distribution

    rbinom

    rnorm

    sample

    set.seed

    cut

    factor

    relevel
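    A short sketch of how these functions can be used to build a simulated data set. The variable names, distributions and cut points are assumptions chosen for illustration.

```r
set.seed(123)                              # make the random draws reproducible
n <- 200

age <- round(runif(n, 20, 80))             # uniform on [20, 80], rounded
male <- rbinom(n, size = 1, prob = 0.5)    # 0/1 indicator
bmi <- rnorm(n, mean = 24, sd = 3)         # normal
group <- sample(c("low", "mid", "high"), n, replace = TRUE)

# cut() bins a numeric variable into a factor
age_grp <- cut(age, breaks = c(19, 40, 60, 80),
               labels = c("20-40", "41-60", "61-80"))

# factor() fixes the level order; relevel() changes the reference level
group <- factor(group, levels = c("low", "mid", "high"))
group2 <- relevel(group, ref = "mid")

table(age_grp)
levels(group2)
```

    The reference level matters later on: in a regression, the coefficients of a factor are contrasts against its first level.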

    Linear regression

    simple linear regression

    multiple linear regression

    interactions

    lm

    summary

    Residuals - the difference between the actual observed response values and the response values predicted by the model

    Coefficients

    [You must understand every statistic in the summary output and what it means]

    CI

    confint
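    The following sketch fits a simple, a multiple and an interaction model on simulated data, then pulls out the pieces of the summary output and the confidence intervals. The data and the names x1, x2 and dat are hypothetical.

```r
set.seed(42)
n <- 100
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y <- 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

fit_simple <- lm(y ~ x1, data = dat)       # simple linear regression
fit_multi <- lm(y ~ x1 + x2, data = dat)   # multiple linear regression
fit_inter <- lm(y ~ x1 * x2, data = dat)   # expands to x1 + x2 + x1:x2

s <- summary(fit_inter)
s$coefficients    # estimate, standard error, t value and p-value per term
s$sigma           # residual standard error
s$r.squared       # proportion of variance explained

# 95% confidence intervals for the coefficients
ci <- confint(fit_inter, level = 0.95)
ci
```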

    QUICK GUIDE: INTERPRETING SIMPLE LINEAR MODEL OUTPUT IN R

    3 Applied regression I

    Use a model appropriate to the data at hand

    • Apply Poisson and negative binomial regression models to count data
    • Identify and apply suitable model to overdispersed data

    count data

    • Nonnegative
    • positively skewed
    • Variance tends to increase with mean
    • Violates homoscedasticity and normality

    Generalized Linear Model (GLM)

    maximum likelihood

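    At first sight it looks odd to regress on 1: summary(glm(deaths ~ 1, data=horse, family=poisson)). This is an intercept-only model, and its single coefficient is the log of the estimated mean count.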

    Dispersion parameter for poisson family taken to be 1

    Interpreting the glm summary output
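    To see what an intercept-only Poisson model estimates, the sketch below fits one to simulated counts. The horse data are not reproduced here, so the data frame sim is a stand-in and its rate of 0.6 is an assumption.

```r
set.seed(1)
# Simulated stand-in for the course data set
sim <- data.frame(deaths = rpois(200, lambda = 0.6))

fit0 <- glm(deaths ~ 1, data = sim, family = poisson)

# The model uses a log link, so exp(intercept) is the estimated mean count
lambda_hat <- exp(coef(fit0)[["(Intercept)"]])

# For the intercept-only model this matches the sample mean
c(lambda_hat = lambda_hat, sample_mean = mean(sim$deaths))
```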

    Model checking

    compare the observed event counts to data that we might have expected, under a Poisson(0.61) model

    Formal model goodness-of-fit

    residual deviance/df should not be too much bigger than 1
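    Both checks can be sketched in a few lines: tabulate the observed counts next to those expected under the fitted Poisson model, then look at the residual deviance relative to its degrees of freedom. The data are simulated, and the chi-squared p-value is only a rough approximation when counts are small.

```r
set.seed(2)
# Simulated stand-in data
sim <- data.frame(deaths = rpois(200, lambda = 0.6))
fit0 <- glm(deaths ~ 1, data = sim, family = poisson)
lambda_hat <- exp(coef(fit0)[[1]])

# Observed frequency of 0, 1, 2, ... events versus the Poisson expectation
k <- 0:max(sim$deaths)
observed <- as.vector(table(factor(sim$deaths, levels = k)))
expected <- nrow(sim) * dpois(k, lambda_hat)
cbind(k, observed, expected = round(expected, 1))

# Residual deviance divided by residual degrees of freedom
ratio <- deviance(fit0) / df.residual(fit0)

# Approximate chi-squared goodness-of-fit test on the residual deviance
p_gof <- pchisq(deviance(fit0), df.residual(fit0), lower.tail = FALSE)
c(ratio = ratio, p_gof = p_gof)
```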

    A Poisson model with covariates in R

    summary(glm(deaths~corps, data=horse, family=poisson))

    Incidence rate ratios (IRR) / relative risks

    Poisson regression with offsets
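    A sketch of a Poisson model with an offset, reporting incidence rate ratios as exponentiated coefficients. The data are simulated, the names exposed, person_years and events are hypothetical, and the intervals are Wald intervals from confint.default rather than profile-likelihood intervals.

```r
set.seed(3)
n <- 300
exposed <- rbinom(n, 1, 0.5)
person_years <- runif(n, 50, 500)

# Event rate of 0.01 per person-year, multiplied by exp(0.5) in the exposed
events <- rpois(n, lambda = person_years * 0.01 * exp(0.5 * exposed))
sim <- data.frame(events, exposed, person_years)

# offset(log(person_years)) turns the model for counts into a model for rates
fit <- glm(events ~ exposed + offset(log(person_years)),
           data = sim, family = poisson)

irr <- exp(coef(fit))                  # incidence rate ratios
irr_ci <- exp(confint.default(fit))    # Wald 95% confidence intervals
cbind(IRR = irr, irr_ci)
```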

    Overdispersion - Negative Binomial model

    the variance (823.475) is much larger than the mean (28.41)

    summary(glm.nb(y~1, data=epilepsy))

    Comparing models

    A lower AIC indicates a ‘better’ model
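    The comparison can be sketched on simulated overdispersed counts. This assumes the MASS package is installed (it ships with standard R distributions); the epilepsy data are not reproduced, so sim is a stand-in.

```r
library(MASS)   # provides glm.nb()

set.seed(4)
# Negative binomial counts: variance is well above the mean
sim <- data.frame(y = rnbinom(200, size = 1, mu = 20))
c(mean = mean(sim$y), variance = var(sim$y))

fit_pois <- glm(y ~ 1, data = sim, family = poisson)
fit_nb <- glm.nb(y ~ 1, data = sim)

# Compare the two fits by AIC
AIC(fit_pois, fit_nb)

# Estimated dispersion parameter of the negative binomial model
fit_nb$theta
```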

    4 Applied regression II

    1. Apply Poisson and negative binomial regression models to count data
    2. Identify and apply suitable model to overdispersed data
    3. Identify influential observations (points whose removal substantially changes the fitted model)
    4. Perform model diagnostics
    5. Understand and deal with multicollinearity

    hatvalues(mvc.r.lm)

    sort(round(cooks.distance(mvc.r.lm),2), decreasing=T)
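    The model mvc.r.lm is not shown in these notes, so the sketch below uses simulated data with one deliberately unusual point. The cutoff of twice the mean leverage is a common rule of thumb, not a value taken from the course.

```r
set.seed(5)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
x[n] <- 6      # last point: extreme x value ...
y[n] <- -5     # ... with a response far from the trend
dat <- data.frame(x, y)

fit <- lm(y ~ x, data = dat)
h <- hatvalues(fit)         # leverage of each observation
d <- cooks.distance(fit)    # influence of each observation on the fit

head(sort(round(d, 2), decreasing = TRUE))
which(h > 2 * mean(h))      # high-leverage points by the rule of thumb

# Refit without the most influential point and compare coefficients
top <- which.max(d)
fit_drop <- lm(y ~ x, data = dat[-top, ])
rbind(all_points = coef(fit), without_top = coef(fit_drop))
```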

    Model diagnostics

    Estimation method and statistical tests are based on model assumptions

    • potential violated assumptions
    • extent of violation
    • Acknowledge limitation
    • alternative statistical model

    Assumptions of linear regression model

    • Linearity
    • Homoscedasticity
    • Normality of the errors
    • Independence

    Residual plot against fitted values

    Q-Q Plot

    P-P Plot

    ACF plot
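    The four plots can be drawn with base R, as in this sketch on simulated data. The output file name is hypothetical, and the P-P plot is built by hand from the standardized residuals since base R has no dedicated function for it.

```r
set.seed(6)
n <- 100
x <- runif(n, 0, 10)
y <- 3 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
r <- residuals(fit)
rs <- rstandard(fit)    # standardized residuals

out_file <- file.path(tempdir(), "diagnostics.pdf")   # hypothetical file name
pdf(out_file, width = 8, height = 8)
par(mfrow = c(2, 2))

# Linearity and homoscedasticity: residuals against fitted values
plot(fitted(fit), r, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normality: Q-Q plot
qqnorm(rs)
qqline(rs)

# Normality: P-P plot (normal probabilities of sorted residuals vs plotting positions)
plot(ppoints(n), sort(pnorm(rs)), xlab = "Theoretical probability",
     ylab = "Empirical probability", main = "P-P plot")
abline(0, 1, lty = 2)

# Independence: autocorrelation of the residuals
acf(r, main = "ACF of residuals")

dev.off()
```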

    Multicollinearity

    VIF
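    The variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand in base R. The helper vif_manual is hypothetical and not part of any package, and the data are simulated so that x1 and x2 are strongly correlated.

```r
set.seed(7)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # strongly correlated with x1
x3 <- rnorm(n)                  # unrelated to the others
y <- 1 + x1 + x2 + x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# Hypothetical helper: VIF of each predictor from its auxiliary regression
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_manual(dat, c("x1", "x2", "x3"))
round(vifs, 2)
```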

    5 Applied regression III

    1. Identify and handle multicollinearity
    2. Account for confounding factors in regression model
    3. Assess potential effect modifiers in regression model
    4. Perform basic mediation analysis


    6 Conditional logistic regression and propensity score method

    1. Fit conditional logistic regression model to data from case control study
    2. Understand the assumptions of the propensity score method
    3. Interpret results from propensity score method
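    A minimal sketch of a conditional logistic regression for 1:1 matched case control data, assuming the survival package is installed (it ships with standard R distributions). The data are simulated and the names match_id, case and exposed are hypothetical.

```r
library(survival)   # provides clogit() and strata()

set.seed(8)
n_sets <- 100

# Each matched set contains one case (case = 1) and one control (case = 0)
dat <- data.frame(match_id = rep(seq_len(n_sets), each = 2),
                  case = rep(c(1, 0), times = n_sets))

# Exposure is simulated to be more common among cases
dat$exposed <- rbinom(nrow(dat), 1, ifelse(dat$case == 1, 0.6, 0.4))

# strata() tells the model which observations belong to the same matched set
fit <- clogit(case ~ exposed + strata(match_id), data = dat)

or <- exp(coef(fit))         # odds ratio for exposure
or_ci <- exp(confint(fit))   # 95% confidence interval
cbind(OR = or, or_ci)
```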


    7 Inverse probability weighting and meta analysis

    1. Appreciate the use of inverse probability weighting
    2. Apply inverse probability weighting for analysis of missing data
    3. Perform meta analysis to obtain overall estimate of an intervention effect from multiple studies
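    Inverse probability weighting for missing data can be sketched as follows: model the probability that an outcome is observed, then weight each complete case by the inverse of that probability. The data are simulated so that missingness depends on a fully observed covariate x, and all names are hypothetical.

```r
set.seed(9)
n <- 2000
x <- rnorm(n)
y_full <- 2 + x + rnorm(n)               # true mean of y is 2

# The outcome is more likely to be observed when x is large
p_obs <- plogis(0.5 + 1.5 * x)
observed <- rbinom(n, 1, p_obs)
dat <- data.frame(x, y = ifelse(observed == 1, y_full, NA), observed)

# Step 1: model the probability of being observed
fit_obs <- glm(observed ~ x, data = dat, family = binomial)

# Step 2: weight = 1 / estimated probability of being observed
dat$w <- 1 / fitted(fit_obs)

# Step 3: compare the complete-case mean with the weighted mean
cc <- dat[dat$observed == 1, ]
mean_cc <- mean(cc$y)                    # ignores the missingness mechanism
mean_ipw <- weighted.mean(cc$y, cc$w)    # reweights toward the full sample
c(complete_case = mean_cc, ipw = mean_ipw)
```

    The weighting only corrects the bias if the model for being observed is correctly specified and missingness depends only on variables that are themselves observed.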


    8 Instrumental variable analysis

    1. Estimate treatment effect using instrumental variable analysis for noncontrolled experiment
    2. Understand the assumptions of instrumental variable analysis
    3. Interpret results from instrumental variable analysis
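    A sketch of the idea using two-stage least squares done by hand in base R. The data are simulated with an unmeasured confounder u and a binary instrument z, and all names are hypothetical. The standard errors of the second-stage lm are not valid for inference, so only the point estimate is used here.

```r
set.seed(10)
n <- 5000
z <- rbinom(n, 1, 0.5)                # instrument: affects y only through treatment
u <- rnorm(n)                         # unmeasured confounder
treat <- 0.8 * z + u + rnorm(n)       # treatment depends on z and u
y <- 1 + 0.5 * treat + u + rnorm(n)   # true treatment effect is 0.5

# Naive regression is confounded by u
naive <- coef(lm(y ~ treat))[["treat"]]

# Stage 1: predict treatment from the instrument
stage1 <- lm(treat ~ z)
treat_hat <- fitted(stage1)

# Stage 2: regress the outcome on the predicted treatment
iv_est <- coef(lm(y ~ treat_hat))[["treat_hat"]]

# With a single instrument this equals the ratio of covariances
wald <- cov(y, z) / cov(treat, z)

c(naive = naive, iv = iv_est, wald = wald)
```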

    Basic concepts:

    RR

    OR and β (estimated coefficients)
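    The link between these quantities can be shown on a 2 x 2 table: RR is a ratio of risks, OR is a ratio of odds, and in logistic regression OR = exp(β). The counts below are hypothetical and chosen only for illustration.

```r
# Hypothetical 2 x 2 counts, for illustration only
tab <- matrix(c(30, 70,
                15, 85), nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("exposed", "unexposed"),
                              outcome = c("event", "no event")))

# Relative risk: ratio of the event risks in the two groups
risk <- tab[, "event"] / rowSums(tab)
rr <- risk[["exposed"]] / risk[["unexposed"]]

# Odds ratio: cross-product ratio of the table
or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# The same odds ratio from logistic regression on the grouped counts
dat <- data.frame(exposed = c(1, 0),
                  events = tab[, "event"],
                  non_events = tab[, "no event"])
fit <- glm(cbind(events, non_events) ~ exposed, data = dat, family = binomial)
or_glm <- exp(coef(fit)[["exposed"]])

c(RR = rr, OR = or, OR_from_glm = or_glm)
```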

    Final exam

    An investigator conducted a retrospective analysis on the association between statin therapy and psychological disorders, based on a database of medical records. The analysis adjusted for potential confounders such as age, sex, BMI and comorbidity.


    Variable names

    • Id
    • Male
    • Age
    • Bmi
    • comorbid.s, Charlson comorbidity index
    • Statin, Statin users
    • Psych, Psychological disorder

    First rows of the data:

      id male age  bmi comorbid.s statin psych
    1  1    0  54 20.9          1      0     0
    2  2    0  42 19.1          0      0     0
    3  3    1  46 23.9          1      1     0
    4  4    1  58 23.5          0      0     1
    5  5    1  43 28.7          1      1     0
    6  6    1  46 26.6          0      1     0

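    Questions:

    (A) Carry out a standard regression analysis to estimate the effect of statin therapy on psychological disorder, adjusting for sex, age, BMI and comorbidity. Present the odds ratios with 95% confidence intervals for the variables in a table. [10%] (Note: a standard regression model. Because the outcome is binary and odds ratios are requested, this means logistic regression rather than an ordinary linear model.)

    The investigator also decided to carry out a propensity score analysis. (For the propensity score analysis, refer to Assignment 2.)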
    (B) Fit a propensity score model to predict statin use. You may consider main effects only (even when not all patient characteristics can be satisfactorily balanced). Present and interpret the model results. [8%]
    (C) Based on your propensity score model, how well the patient characteristics were balanced across statin users and non-users with similar propensity scores? [6%]
    (D) State the key assumptions of propensity score analysis and assess if they are satisfied. [6%]
    (E) Do you think it is appropriate to use propensity score analysis in this setting? Briefly explain why. [4%]
    (F) Estimate the effect of statin therapy (and the corresponding 95% CI) on psychological disorder and compare with the results in (A). [8%]
    (G) Based on the results in (A) - (F), summarize and interpret the main findings from the analyses. [8%]

    Approach:

    1. Candidate models: standard linear regression; GLM: Poisson, NB; clogit, etc.
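    One possible way to set up parts (A), (B), (C) and (F) is sketched below. The exam data are not available, so the data frame is simulated with the same column names. Stratifying on propensity score quintiles is an assumption made here, not necessarily the method used in Assignment 2, and only the balance of age is checked for brevity.

```r
set.seed(11)
n <- 1000
# Simulated stand-in with the exam's column names
dat <- data.frame(id = seq_len(n),
                  male = rbinom(n, 1, 0.5),
                  age = round(runif(n, 40, 80)),
                  bmi = round(rnorm(n, 24, 3), 1),
                  comorbid.s = rpois(n, 1))
dat$statin <- rbinom(n, 1, plogis(-4 + 0.04 * dat$age + 0.05 * dat$bmi +
                                    0.3 * dat$comorbid.s))
dat$psych <- rbinom(n, 1, plogis(-3 + 0.02 * dat$age + 0.3 * dat$comorbid.s -
                                   0.3 * dat$statin))

# (A) Logistic regression; odds ratios with Wald 95% confidence intervals
fit_a <- glm(psych ~ statin + male + age + bmi + comorbid.s,
             data = dat, family = binomial)
or_table <- round(exp(cbind(OR = coef(fit_a), confint.default(fit_a))), 2)

# (B) Propensity score model: probability of statin use given covariates
fit_ps <- glm(statin ~ male + age + bmi + comorbid.s,
              data = dat, family = binomial)
dat$ps <- fitted(fit_ps)

# (C) Stratify on propensity score quintiles and compare mean age within strata
dat$ps_q <- cut(dat$ps, breaks = quantile(dat$ps, 0:5 / 5),
                include.lowest = TRUE, labels = 1:5)
balance_age <- aggregate(age ~ ps_q + statin, data = dat, FUN = mean)

# (F) Statin effect adjusted for propensity score stratum
fit_f <- glm(psych ~ statin + ps_q, data = dat, family = binomial)
or_f <- exp(c(OR = coef(fit_f)[["statin"]],
              confint.default(fit_f)["statin", ]))

or_table
balance_age
or_f
```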

  • Original post: https://www.cnblogs.com/leezx/p/12649855.html