• MAGMA | gene analysis and generalized gene-set analysis of GWAS data


    Gene-set enrichment analysis based on GWAS results.

    Basic commands:

    # download sumstats file from https://ctg.cncr.nl/software/summary_statistics
    # download NCBI37 and g1000_eur from https://ctg.cncr.nl/software/magma
    # download msigdb.v7.2.entrez.gmt from http://www.gsea-msigdb.org/gsea/downloads.jsp
    
    # Annotation: map SNPs to genes
    ../magma --annotate --snp-loc g1000_eur/g1000_eur.bim --gene-loc NCBI37/NCBI37.3.gene.loc --out test

    # Gene analysis - SNP p-values
    ../magma --bfile g1000_eur/g1000_eur --pval sumstats/SESA_neuro_clus_sumstats.txt N=449484 --gene-annot test.genes.annot --out test.gene

    # Drop the description column (field 2) of the GMT file, keeping the set name and gene IDs
    cat msigdb.v7.2.entrez.gmt | cut -f1,3- > formated.msigdb.v7.2.entrez.gmt

    # Gene-set analysis
    ../magma --gene-results test.gene.genes.raw --set-annot geneSet/formated.msigdb.v7.2.entrez.gmt --out GS_test
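
    The `cut -f1,3-` step is needed because a GMT row holds the set name, a description (usually a URL), and then the gene IDs, while MAGMA's `--set-annot` expects the set name followed directly by the gene IDs. A minimal sketch of what the step does, using two made-up rows (the set names `SET_A`/`SET_B` and the URLs are hypothetical; the helper name `strip_gmt_description` is mine):

```shell
# Remove the description column (field 2) from tab-separated GMT rows on stdin.
strip_gmt_description() {
  cut -f1,3-
}

# Hypothetical GMT rows: <set name> <description> <gene ID> <gene ID> ...
printf 'SET_A\thttp://example.org/SET_A\t1855\t8514\nSET_B\thttp://example.org/SET_B\t2899\n' |
  strip_gmt_description
```

    Each output row starts with the set name and continues with Entrez gene IDs only.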

      

    Annotation output

    Each SNP ID is assigned to the gene it falls in. Genes are identified by NCBI (Entrez) Gene IDs.

    # window_up = 0
    # window_down = 0
    79501   1:69091:70008   rs140739101     rs200505207     rs527512746     rs200676709     rs570453814
    100996442       1:142447:174392 rs371474651     rs547574544     rs564576411
    148398  1:859993:879961 rs568015169     rs542412615     rs61464428      rs57465118      rs78033073      rs57924093      rs60837925      rs57816
    26155   1:879583:894679 rs544040354     rs6605067       rs2839  rs553373622     rs143853699     rs528505070     rs557736773     rs370120035
    339451  1:895967:901099 rs113034360     rs112905931     rs532448472     rs144174542     rs562855695     rs531742319     rs28393498      rs56856
    84069   1:901872:910488 rs116147894     rs201730138     rs62639981      rs199941754     rs28499371      rs530698349     rs561236606     rs77359
    84808   1:910579:917473 rs556054629     rs532303174     rs4970429       rs182213575     rs2340592       rs533147788     rs3748588       rs14146
    57801   1:934342:936608 rs186167017     rs569070395     rs201391049     rs200512735     rs113602214     rs531408504     rs2298214       rs14625
    9636    1:948847:949920 rs4615788       rs15842 rs28491407      rs140198533     rs2465124       rs568580969     rs371949419     rs148041041
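
    A quick sanity check on the annotation is to count how many SNPs landed in each gene. The sketch below assumes the layout shown above (gene ID, `chr:start:stop`, then one SNP ID per field); the function name `count_snps_per_gene` is my own. The demo rows are copied from the listing as displayed.

```shell
# Count the SNPs assigned to each gene in a MAGMA .genes.annot file.
# Reads the annotation on stdin; prints "<gene ID><TAB><SNP count>".
count_snps_per_gene() {
  # Skip '#' header lines; fields 1 and 2 are the gene ID and chr:start:stop.
  awk '!/^#/ && NF >= 2 { printf "%s\t%d\n", $1, NF - 2 }'
}

# Demo on two rows from the output above.
count_snps_per_gene <<'EOF'
# window_up = 0
# window_down = 0
79501   1:69091:70008   rs140739101     rs200505207     rs527512746     rs200676709     rs570453814
100996442       1:142447:174392 rs371474651     rs547574544     rs564576411
EOF
```

    On a real run this would be `count_snps_per_gene < test.genes.annot`.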
    

     

    Gene analysis output

    Similar to GWAS summary statistics, but with one row per gene. The key column is the P-value.

    GENE       CHR      START       STOP  NSNPS  NPARAM       N        ZSTAT            P
    148398       1     859993     879961     92      21  449484      0.33571      0.36854
    26155        1     879583     894679     72      13  449484      0.22321      0.41168
    339451       1     895967     901099     27       8  449484      0.68675      0.24612
    84069        1     901872     910488     36      17  449484       1.9254     0.027088
    84808        1     910579     917473     24       6  449484      0.94576      0.17214
    57801        1     934342     936608     15       5  449484      0.98676      0.16188
    9636         1     948847     949920      9       4  449484      0.55709      0.28873
    375790       1     955503     991499    193      32  449484       1.0999      0.13569
    401934       1    1007126    1009687      9       4  449484       2.4592    0.0069632
    54991        1    1017198    1051736    123      17  449484      0.38241      0.35108
    254173       1    1109286    1133315    154      20  449484       1.4226     0.077429
    8784         1    1138888    1142163     15       4  449484       1.9105     0.028037
    7293         1    1146706    1149703     10       4  449484       1.7823     0.037349
    51150        1    1152288    1167447     96       6  449484       0.1506      0.44014
    126792       1    1167629    1170421      9       3  449484       2.1349     0.016383
    388581       1    1177826    1182102     13       3  449484      0.49731      0.30948
    118424       1    1189292    1209234     66      12  449484       1.5713     0.058059
    6339         1    1215816    1227409     50      18  449484     0.069082      0.47246
    116983       1    1227764    1243304     76      19  449484      0.33408      0.36916
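
    To pull out the significant genes, a common choice (my assumption here, not something MAGMA does for you in this file) is a Bonferroni threshold of alpha divided by the number of genes tested. A minimal sketch, with a hypothetical helper name `bonferroni_genes`, that locates the `P` column from the header:

```shell
# Print genes whose P-value passes a Bonferroni threshold (alpha / number of genes).
# Reads a MAGMA .genes.out table on stdin; $1 is alpha (default 0.05).
bonferroni_genes() {
  awk -v alpha="${1:-0.05}" '
    /^#/ { next }                        # skip any comment lines
    !pcol {                              # first remaining line is the header
      for (i = 1; i <= NF; i++) if ($i == "P") pcol = i
      next
    }
    { n++; gene[n] = $1; p[n] = $pcol }  # remember every tested gene
    END {
      if (n == 0) exit
      thr = alpha / n
      for (i = 1; i <= n; i++)
        if (p[i] + 0 < thr) printf "%s\t%s\n", gene[i], p[i]
    }'
}

# Demo on five rows from the table above: threshold = 0.05 / 5 = 0.01.
bonferroni_genes 0.05 <<'EOF'
GENE       CHR      START       STOP  NSNPS  NPARAM       N        ZSTAT            P
84069        1     901872     910488     36      17  449484       1.9254     0.027088
401934       1    1007126    1009687      9       4  449484       2.4592    0.0069632
8784         1    1138888    1142163     15       4  449484       1.9105     0.028037
126792       1    1167629    1170421      9       3  449484       2.1349     0.016383
6339         1    1215816    1227409     50      18  449484     0.069082      0.47246
EOF
```

    With only these five rows, gene 401934 is the only one below the threshold. On the full file (`bonferroni_genes 0.05 < test.gene.genes.out`) the threshold is far stricter because thousands of genes are tested.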
    

      

    Gene-set analysis output

    Summary statistics with one row per pathway (gene set).

    # TOTAL_GENES = 18152
    # TEST_DIRECTION = one-sided, positive (set), two-sided (covar)
    # CONDITIONED_INTERNAL = gene size, gene density, inverse mac, log(gene size), log(gene density), log(inverse mac)
    VARIABLE                               TYPE  NGENES         BETA     BETA_STD           SE            P FULL_NAME
    BIOCARTA_FEEDER_PATHWAY                 SET       9     -0.24616     -0.00548      0.21239      0.87677 BIOCARTA_FEEDER_PATHWAY
    BIOCARTA_PROTEASOME_PATHWAY             SET      17    -0.076691   -0.0023459      0.22729       0.6321 BIOCARTA_PROTEASOME_PATHWAY
    BIOCARTA_KREB_PATHWAY                   SET       8     -0.51618    -0.010834      0.40188      0.90049 BIOCARTA_KREB_PATHWAY
    ST_INTERFERON_GAMMA_PATHWAY             SET      10      0.53983     0.012667      0.32686      0.04932 ST_INTERFERON_GAMMA_PATHWAY
    ST_WNT_CA2_CYCLIC_GMP_PATHWAY           SET      20      0.14579    0.0048366      0.19018      0.22167 ST_WNT_CA2_CYCLIC_GMP_PATHWAY
    ST_DIFFERENTIATION_PATHWAY_I...         SET      44    -0.037477    -0.001843      0.14289      0.60345 ST_DIFFERENTIATION_PATHWAY_IN_PC12_CELL
    ST_TUMOR_NECROSIS_FACTOR_PAT...         SET      28      0.11224    0.0044051      0.17192      0.25692 ST_TUMOR_NECROSIS_FACTOR_PATHWAY
    ST_ERK1_ERK2_MAPK_PATHWAY               SET      29     -0.12764   -0.0050977      0.17231      0.77057 ST_ERK1_ERK2_MAPK_PATHWAY
    ST_GA12_PATHWAY                         SET      22      0.11411    0.0039702      0.21219      0.29537 ST_GA12_PATHWAY
    ST_G_ALPHA_S_PATHWAY                    SET      16       0.1042    0.0030923      0.22091      0.31858 ST_G_ALPHA_S_PATHWAY
    ST_G_ALPHA_I_PATHWAY                    SET      36      0.15883    0.0070663      0.14975      0.14444 ST_G_ALPHA_I_PATHWAY
    ST_IL_13_PATHWAY                        SET       5     0.048314   0.00080176       0.4724      0.45927 ST_IL_13_PATHWAY
    ST_P38_MAPK_PATHWAY                     SET      36     -0.16636   -0.0074015      0.15474      0.85882 ST_P38_MAPK_PATHWAY
    ST_JAK_STAT_PATHWAY                     SET       9      0.28402    0.0063229      0.34345      0.20413 ST_JAK_STAT_PATHWAY
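
    Since the file is not sorted, ranking the sets by P-value is usually the first thing to do. A small sketch, assuming the eight-column layout shown above (P in column 7, FULL_NAME in column 8); `top_gene_sets` is a name I made up:

```shell
# Rank gene sets by P-value. Reads a MAGMA .gsa.out table on stdin;
# $1 is the number of sets to keep (default 10).
top_gene_sets() {
  # Drop '#' comment lines and the header row, keep FULL_NAME and P.
  awk '!/^#/ && $1 != "VARIABLE" { printf "%s\t%s\n", $8, $7 }' |
    sort -g -k2,2 |        # -g so values like 1.46129e-06 sort numerically
    head -n "${1:-10}"
}

# Demo on three rows from the table above, keeping the best two.
top_gene_sets 2 <<'EOF'
# TOTAL_GENES = 18152
VARIABLE                               TYPE  NGENES         BETA     BETA_STD           SE            P FULL_NAME
BIOCARTA_FEEDER_PATHWAY                 SET       9     -0.24616     -0.00548      0.21239      0.87677 BIOCARTA_FEEDER_PATHWAY
ST_INTERFERON_GAMMA_PATHWAY             SET      10      0.53983     0.012667      0.32686      0.04932 ST_INTERFERON_GAMMA_PATHWAY
ST_G_ALPHA_I_PATHWAY                    SET      36      0.15883    0.0070663      0.14975      0.14444 ST_G_ALPHA_I_PATHWAY
EOF
```

    The demo lists ST_INTERFERON_GAMMA_PATHWAY first, then ST_G_ALPHA_I_PATHWAY.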
    

    Per-gene details for the genes in each pathway

    # ALPHA = 0.05
    # NUMBER_OF_TESTS = 31071
    
    # _SET1_  VARIABLE = GO_DISTAL_AXON (set)
    # _SET1_  NGENES = 290
    # _SET1_  P-VALUE = 1.46129e-06
    _SET1_   GENE     CHR      START       STOP  NSNPS  NPARAM       N        ZSTAT            P  ZFITTED_BASE  ZRESID_BASE
    _SET1_   116983     1    1227764    1243304     76      19  449484   -0.0069549      0.36916    2.1164e-16   -0.0069549
    _SET1_   1855       1    1270658    1284509     53      14  449484       1.3402     0.050325    2.2204e-16       1.3402
    _SET1_   127262     1    3541556    3546695     20       6  449484      0.78373      0.12976    2.2204e-16      0.78373
    _SET1_   8514       1    6052358    6161253    345      59  449484       2.4031     0.001132             0       2.4031
    _SET1_   65018      1   20959948   20978004     89      15  449484     0.064676      0.28667    2.0817e-16     0.064676
    _SET1_   10236      1   23635953   23671143     84      16  449484       1.5148      0.02933    2.2204e-16       1.5148
    _SET1_   4985       1   29138654   29190208    231      37  449484      0.46618      0.14046    2.2204e-16      0.46618
    _SET1_   2899       1   37261128   37499844    562      68  449484       2.5742   0.00053178             0       2.5742
    _SET1_   1996       1   50513686   50669442    258      42  449484      0.97766     0.059361    2.2204e-16      0.97766
    _SET1_   85440      1   62920397   63154039    399      19  449484     0.064887      0.26601    2.0817e-16     0.064887
    _SET1_   4919       1   64239690   64647181   1179      91  449484      -1.5572       0.7101    2.2204e-16      -1.5572
    _SET1_   58155      1   97187161   97280605    303      28  449484        1.435     0.022342    2.2204e-16        1.435
    _SET1_   22911      1  109512835  109584860    241      32  449484       1.1109      0.04975    2.2204e-16       1.1109
    _SET1_   3749       1  110753336  110776674     40      14  449484     -0.78973      0.60279    2.2204e-16     -0.78973
    _SET1_   3737       1  111136202  111174096    110      15  449484      0.59466      0.15466    2.2204e-16      0.59466
    _SET1_   3738       1  111196182  111217655     49      10  449484      -1.3769      0.81738    2.2204e-16      -1.3769
    _SET1_   9860       1  113615792  113667824    127      22  449484     0.032914      0.31657    2.0817e-16     0.032914
    _SET1_   57657      1  155247218  155259639     34       8  449484      0.15263      0.28912    2.2204e-16      0.15263
    _SET1_   23208      1  155829260  155854990     45      12  449484     0.093267      0.30148    2.0817e-16     0.093267
    _SET1_   1314       1  160258377  160313354    113      23  449484       1.2371      0.03771    2.2204e-16       1.2371
    _SET1_   55811      1  167778357  167883608    487      74  449484      0.39334       0.1405    2.2204e-16      0.39334
    _SET1_   63923      1  175036994  175117202    333      37  449484      0.96139     0.050889    2.2204e-16      0.96139
    _SET1_   2752       1  182350839  182361341     40      10  449484      0.27951      0.22106    2.2204e-16      0.27951
    _SET1_   10092      1  183595328  183605076     47       3  449484      0.29785      0.33574    2.2204e-16      0.29785
    _SET1_   23046      1  200938514  200992828    204      29  449484      0.70264      0.10325    2.2204e-16      0.70264
    _SET1_   134        1  203096833  203136533    140      29  449484      0.40025      0.16777    2.2204e-16      0.40025
    

      

    Core principles:

    Gene analysis

    The gene analysis in MAGMA is based on a multiple linear principal components regression [18] model, using an F-test to compute the gene p-value.

    This model first projects the SNP matrix for a gene onto its principal components (PC), pruning away PCs with very small eigenvalues, and then uses those PCs as predictors for the phenotype in the linear regression model.

    This improves power by removing redundant parameters, and guarantees that the model is identifiable in the presence of highly collinear SNPs. By default only 0.1% of the variance in the SNP data matrix is pruned away.
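
    In symbols, the model described above can be sketched as follows. The notation is my own shorthand rather than a quote from the paper, and covariates are left out for brevity: Y is the phenotype vector for N individuals, X_g^* is the N x K matrix of retained principal components for gene g, and RSS_0 and RSS_1 are the residual sums of squares of the intercept-only and full models.

```latex
% Principal components regression for gene g
Y = \alpha_{0g}\,\mathbf{1} + X_g^{*}\,\alpha_g + \varepsilon_g,
\qquad \varepsilon_g \sim \mathcal{N}(0, \sigma_g^2 I)

% Gene-level test of H_0 : \alpha_g = 0 with an F-test
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/K}{\mathrm{RSS}_1/(N - K - 1)}
\;\sim\; F_{K,\,N-K-1} \quad \text{under } H_0
```

    One caveat: this regression needs raw genotype data. The commands above pass SNP p-values through `--pval`, where, as far as I understand the manual, MAGMA instead uses a SNP-wise model that combines the per-SNP statistics using LD from the reference panel.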

      

    Gene-set analysis

    To perform the gene-set analysis, for each gene g the gene p-value $p_g$ computed with the gene analysis is converted to a Z-value $z_g = \Phi^{-1}(1 - p_g)$, where $\Phi^{-1}$ is the probit function. This yields a roughly normally distributed variable Z with elements $z_g$ that reflects the strength of the association each gene has with the phenotype, with higher values corresponding to stronger associations.
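
    The conversion can be checked against the gene analysis output above, where ZSTAT and P appear side by side. The second part sketches the regression of Z on set membership. Its notation is my own: S_s is a 0/1 indicator for membership in set s, and C holds the covariates listed in the CONDITIONED_INTERNAL header (gene size, gene density, and so on). The one-sided alternative matches the TEST_DIRECTION header.

```latex
% Probit conversion, checked on genes 84069 and 401934
z_g = \Phi^{-1}(1 - p_g)
\Phi^{-1}(1 - 0.027088) = \Phi^{-1}(0.972912) \approx 1.9254
\Phi^{-1}(1 - 0.0069632) = \Phi^{-1}(0.9930368) \approx 2.4592

% Gene-set regression for set s
Z = \beta_0\,\mathbf{1} + S_s\,\beta_s + C\,\beta_C + \varepsilon,
\qquad H_0 : \beta_s = 0 \quad \text{vs.} \quad H_A : \beta_s > 0
```

    A positive, significant $\beta_s$ means genes in the set are on average more strongly associated with the phenotype than genes outside it.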

     

    References:

    https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004219

    https://ctg.cncr.nl/software/magma

    https://ctg.cncr.nl/software/MAGMA/doc/manual_v1.08.pdf

    ~/softwares/magma_v1.08b/test_data/

  • Original post: https://www.cnblogs.com/leezx/p/14309121.html