• highly variable gene | 高变异基因的选择 | feature selection | 特征选择


    在做单细胞的时候,有很多基因属于noise,就是变化没有规律,或者无显著变化的基因。在后续分析之前,我们需要把它们去掉。

    以下是一种找出highly variable gene的方法:

    The feature selection procedure is based on the largest difference between the observed coefficient of variation (CV) and the predicted CV (estimated by a non-linear noise model learned from the data) See Figure S1C. In particular, Support Vector Regression (SVR, Smola and Vapnik, 1997) was used for this purpose (scikit-learn python implementation, default parameters with gamma = 0.06; Pedregosa et al., 2011).

    #Pre-filtering
    df_f = df_merge.copy()
    df_f = df_f.ix[sum(df_f>=1, 1)>=5,:] # is at least 1 in X cells
    df_f = df_f.ix[sum(df_f>=2, 1)>=2,:] # is at least 2 in X cells
    df_f = df_f.ix[sum(df_f>=3, 1)>=1,:] # is at least 2 in X cells
    
    #Fitting
    mu = df_f.mean(1).values
    sigma = df_f.std(1, ddof=1).values
    cv = sigma/mu
    score, mu_linspace, cv_fit , params = fit_CV(mu,cv, 'SVR', svr_gamma=0.005)
    
    #Plotting
    def plot_cvmean():
        figure()
        scatter(log2(mu),log2(cv), marker='o', edgecolor ='none',alpha=0.1, s=5)
        mu_sorted = mu[argsort(score)[::-1]]
        cv_sorted = cv[argsort(score)[::-1]]
        scatter(log2(mu_sorted[:thrs]),log2(cv_sorted[:thrs]), marker='o', edgecolor ='none',alpha=0.15, s=8, c='r')
        plot(mu_linspace, cv_fit,'-k', linewidth=1, label='$Fit$')
        plot(linspace(-9,7), -0.5*linspace(-9,7), '-r', label='$Poisson$')
        ylabel('log2 CV')
        xlabel('log2 mean')
        grid(alpha=0.3)
        xlim(-8.6,6.5)
        ylim(-2,6.5)
        legend(loc=1, fontsize='small')
        gca().set_aspect(1.2)
        
    plot_cvmean()
    
    #Adjusting plot
    

    对每一个基因在不同细胞中的表达量的mean和CV散点图,通过SVR拟合出noise的曲线。

    通过the largest difference between the observed coefficient of variation (CV) and the predicted CV (estimated by a non-linear noise model learned from the data)就能找出highly variable gene了。

      

  • 相关阅读:
    Spoj-ANTP Mr. Ant & His Problem
    Spoj-BIPCSMR16 Team Building
    cf575A Fibonotci
    cf660E Different Subsets For All Tuples
    cf468C Hack it!
    cf468B Two Sets
    cf701E Connecting Universities
    Spoj-BLMIRINA Archery Training
    Spoj-BITDIFF Bit Difference
    Spoj-VISIBLEBOX Decreasing Number of Visible Box
  • 原文地址:https://www.cnblogs.com/leezx/p/8631812.html
Copyright © 2020-2023  润新知