• Feature selection


    # -*- coding: utf-8 -*-
    """
    Created on Wed Aug 10 20:26:15 2016
    
    @author: qqhfeng
    """
    
    # Module 1: select features with VarianceThreshold
    '''
    Feature selector that removes all low-variance features. 
    This feature selection algorithm looks only at the features (X), 
    not the desired outputs (y), and can thus be used for unsupervised learning.
    
    VarianceThreshold is a simple baseline approach to feature selection. 
    It removes all features whose variance doesn’t meet some threshold.
    By default, it removes all zero-variance features, i.e. 
    features that have the same value in all samples. 
    As an example, suppose that we have a dataset with boolean features, 
    and we want to remove all features that are either one or zero (on or off) 
    in more than 80% of the samples. Boolean features are Bernoulli random variables,
    and the variance of such variables is given by Var[X] = p(1 - p),
    so we can select using the threshold .8 * (1 - .8).
    '''
    
    from sklearn.feature_selection import VarianceThreshold
    X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
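    # Uncomment to drop boolean features that take the same value in more than 80% of samples:
    # sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
    # Default threshold (0.0) removes only zero-variance features
    sel = VarianceThreshold()
    print(sel.fit_transform(X))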
    
    
    
    
    # Module 2: keep the most important features; SelectKBest removes all but the k highest-scoring features
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import chi2
    iris = load_iris()
    X, y = iris.data, iris.target
    print(X.shape)
    X_new = SelectKBest(chi2, k=2).fit_transform(X, y)  # chi2 is one way of scoring feature importance
    print(X_new.shape)
    
    
    
    # Module 3: recursive feature elimination (RFE)
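
The listing names recursive feature elimination as module 3 but stops before showing any code for it. The following is only a minimal sketch of what such a step might look like, not the author's version. It assumes scikit-learn's `RFE` class wrapped around a `LogisticRegression` estimator on the same iris data; the choice of estimator, `max_iter=1000`, and `n_features_to_select=2` are assumptions made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Assumption: logistic regression as the base estimator. RFE needs an
# estimator that exposes coef_ or feature_importances_ after fitting.
estimator = LogisticRegression(max_iter=1000)

# Repeatedly fit the estimator and drop the weakest feature (step=1 by
# default) until only n_features_to_select features remain.
selector = RFE(estimator, n_features_to_select=2)
X_rfe = selector.fit_transform(X, y)

print(X_rfe.shape)        # two columns remain
print(selector.support_)  # boolean mask of the kept features
print(selector.ranking_)  # 1 = kept; larger numbers were eliminated earlier
```

Unlike `VarianceThreshold` and `SelectKBest`, which score each feature once, RFE refits the model after every elimination, so the result depends on the estimator that is plugged in.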
  • Original article: https://www.cnblogs.com/qqhfeng/p/5758354.html