• Python for Data Science


    Chapter 5 - Dimensionality Reduction Methods

    Segment 1 - Exploratory factor analysis

    Factor Analysis

    A method that explores a dataset to find the root causes that explain why the data behaves the way it does

    Factors (or latent variables): variables that are meaningful but are inferred rather than directly observed

    Factor Analysis Assumptions

    • Features are metric
    • Features are continuous or ordinal
    • Features are correlated with one another at r > 0.3
    • You have > 100 observations and > 5 observations per feature
    • Sample is homogeneous
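
Two of the assumptions above (the correlation threshold and the observation counts) can be checked directly. The following is a minimal sketch on the Iris data, assuming Pearson correlation is the intended measure of r and that "correlation between the features" means each feature has at least one other feature with |r| > 0.3. The names `df`, `abs_corr`, and `max_abs_corr` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

# Load the Iris features into a DataFrame for convenience
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

n_obs, n_features = df.shape

# Sample-size checks: more than 100 observations, more than 5 per feature
enough_observations = n_obs > 100
enough_per_feature = (n_obs / n_features) > 5

# Pearson correlation matrix with the diagonal (self-correlation) masked out
off_diagonal = ~np.eye(n_features, dtype=bool)
abs_corr = df.corr().abs().where(off_diagonal)

# Strongest absolute correlation each feature has with any other feature
max_abs_corr = abs_corr.max()

print("Observations:", n_obs, "| features:", n_features)
print("More than 100 observations:", enough_observations)
print("More than 5 observations per feature:", enough_per_feature)
print(max_abs_corr.round(3))
```

The metric, continuous/ordinal, and homogeneity assumptions are about how the data was collected and cannot be verified by a check like this.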

    The Iris Dataset

    Iris flowers (labels):

    • Setosa
    • Versicolour
    • Virginica

    Attributes (predictive features):

    • Sepal length
    • Sepal width
    • Petal length
    • Petal width
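
The labels and attributes listed above can be read straight from the dataset object. This is a small sketch using scikit-learn's bundled copy of Iris; note that scikit-learn spells the second label `versicolor`. The variable names `labels` and `features` are illustrative only.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Class labels and feature names as plain Python strings
labels = [str(name) for name in iris.target_names]
features = list(iris.feature_names)

print(labels)
print(features)
print(iris.data.shape)  # (number of samples, number of features)
```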

    Factor Loading

    • ~ -1 or 1 = Factor has a strong influence on the variable
    • ~0 = Factor has a weak influence on the variable
    • > 1 = The factors are highly correlated

    import pandas as pd
    import numpy as np
    
    import sklearn
    from sklearn.decomposition import FactorAnalysis
    
    from sklearn import datasets
    

    Factor analysis on the Iris dataset

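    # Load the Iris dataset bundled with scikit-learn
    iris = datasets.load_iris()
    
    # Feature matrix (150 samples x 4 features) and the feature names
    X = iris.data
    variable_names = iris.feature_names
    
    # Preview the first 10 rows
    X[0:10, :]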
    
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           [5. , 3.4, 1.5, 0.2],
           [4.4, 2.9, 1.4, 0.2],
           [4.9, 3.1, 1.5, 0.1]])
    
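    # Fit the model; with n_components left unset, one factor per feature is estimated
    factor = FactorAnalysis().fit(X)
    
    # Rows are factors, columns are the original features
    DF = pd.DataFrame(factor.components_, columns=variable_names)
    print(DF)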
    
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    0           0.706989         -0.158005           1.654236           0.70085
    1           0.115161          0.159635          -0.044321          -0.01403
    2          -0.000000          0.000000           0.000000           0.00000
    3          -0.000000          0.000000           0.000000          -0.00000
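
In the output above, factors 2 and 3 are all zeros, and the loadings are expressed in the units of the raw features, which is one reason a value such as 1.654 can exceed 1. The following is a minimal sketch of one way to make the loadings easier to compare. It assumes that standardizing the features first and keeping only two factors is acceptable; neither step is part of the original walkthrough. `StandardScaler`, `n_components=2`, and the `factor_1`/`factor_2` row labels are choices made here for illustration.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Standardize each feature to zero mean and unit variance (an assumption of this sketch)
X_scaled = StandardScaler().fit_transform(iris.data)

# Keep two factors, matching the two non-zero rows in the original output
fa = FactorAnalysis(n_components=2, random_state=0).fit(X_scaled)

# Rows are factors, columns are the original features
loadings = pd.DataFrame(
    fa.components_,
    columns=iris.feature_names,
    index=["factor_1", "factor_2"],
)
print(loadings.round(3))

# For each factor, the feature with the largest absolute loading
strongest = loadings.abs().idxmax(axis=1)
print(strongest)
```

The sign of each factor is arbitrary, so compare the absolute size of the loadings rather than their direction.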
  • Original post: https://www.cnblogs.com/keepmoving1113/p/14321001.html