• 机器学习实战-学习笔记-第十三章


    1.通过Window10“所有应用”->"Anaconda(64bit)"->"Anaconda Command Prompt",启动pyhon,

    2.在Python REPL中输入脚本

    def loadDataSet(fileName, delim='	'):
        fr = open(fileName)
        stringArr = [line.strip().split(delim) for line in fr.readlines()]
        datArr = [map(float,line) for line in stringArr]
        return mat(datArr)
    
    def pca(dataMat, topNfeat=9999999):
        meanVals = mean(dataMat, axis=0)
        meanRemoved = dataMat - meanVals #remove mean
        covMat = cov(meanRemoved, rowvar=0)
        eigVals,eigVects = linalg.eig(mat(covMat))
        eigValInd = argsort(eigVals)            #sort, sort goes smallest to largest
        eigValInd = eigValInd[:-(topNfeat+1):-1]  #cut off unwanted dimensions
        redEigVects = eigVects[:,eigValInd]       #reorganize eig vects largest to smallest
        lowDDataMat = meanRemoved * redEigVects#transform data into new dimensions
        reconMat = (lowDDataMat * redEigVects.T) + meanVals
        return lowDDataMat, reconMat

    运行结果如下:

    Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://binstar.org
    >>> from numpy import *
    >>>
    >>> def loadDataSet(fileName, delim='	'):
    ...     fr = open(fileName)
    ...     stringArr = [line.strip().split(delim) for line in fr.readlines()]
    ...     datArr = [map(float,line) for line in stringArr]
    ...     return mat(datArr)
    ...
    >>> def pca(dataMat, topNfeat=9999999):
    ...     meanVals = mean(dataMat, axis=0)
    ...     meanRemoved = dataMat - meanVals #remove mean
    ...     covMat = cov(meanRemoved, rowvar=0)
    ...     eigVals,eigVects = linalg.eig(mat(covMat))
    ...     eigValInd = argsort(eigVals)            #sort, sort goes smallest to largest
    ...     eigValInd = eigValInd[:-(topNfeat+1):-1]  #cut off unwanted dimensions
    ...     redEigVects = eigVects[:,eigValInd]       #reorganize eig vects largest to smallest
    ...     lowDDataMat = meanRemoved * redEigVects#transform data into new dimensions
    ...     reconMat = (lowDDataMat * redEigVects.T) + meanVals
    ...     return lowDDataMat, reconMat

    验证:

    >>> dataMat = loadDataSet('F:\studio\MachineLearningInAction\ch13\testSet.txt')
    >>> shape(dataMat)
    (1000L, 2L)
    >>> lowMat, reconMat = pca(dataMat, 1)
    >>> shape(lowMat)
    (1000L, 1L)
    >>> import matplotlib
    >>> import matplotlib.pyplot as plt>>> fig = plt.figure()
    >>> ax = fig.add_subplot(111)
    >>> import matplotlib.pyplot as plt
    >>> fig = plt.figure()
    >>> ax = fig.add_subplot(111)
    >>> ax.scatter(dataMat[:,0].flatten().A[0], dataMat[:,1].flatten().A[0], marker='^', s=90)
    <matplotlib.collections.PathCollection object at 0x0000000009F556D8>
    >>> ax.scatter(reconMat[:,0].flatten().A[0], reconMat[:,1].flatten().A[0], marker='^', s=90)
    <matplotlib.collections.PathCollection object at 0x0000000009F55710>
    >>> plt.show()
  • 相关阅读:
    JMeter使用正则表达式提取相应信息
    Python的configparser生成配置文件,以及相关操作
    Python函数主要的作用
    Python函数传参位置关系总结
    Python中的作用域
    Python高阶函数解析
    Python中set集合的增加,update和add的区别
    Oracle 11g服务详细介绍及哪些服务是必须开启的?
    javacc jjtree 写法 以及 jj写法 基本语法 以及应用
    Python 3.3 try catch所有的错误Error,不包括Exception。关键在于 sys.exc_info()
  • 原文地址:https://www.cnblogs.com/littlesuccess/p/5096193.html
Copyright © 2020-2023  润新知