• python numpy notes 2: PCA


    Notes on the procedure for carrying out the PCA algorithm.

    https://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html#numpy.cov

    Computing the covariance of the data $X = [x_1, \ldots, x_N]^T$:

    >>> import numpy as np
    >>> x = np.array([[0, 2], [1, 1], [2, 0]]).T
    >>> x
    array([[0, 1, 2],
           [2, 1, 0]])
    >>> np.cov(x)   # covariance matrix of the two variables x0, x1 (the rows)
    array([[ 1., -1.],
           [-1.,  1.]])

    To perform PCA on the data:

    (1) First normalize the data, then compute the covariance matrix of the normalized data.

    (2) Compute the eigenvalues and eigenvectors of the covariance matrix, and sort them in descending order of the eigenvalues.

    (3) Project the data onto the selected principal subspace to obtain the projected vectors, and measure the mean squared error between the reconstruction and the original data.

    1. Normalize the data.

    def normalize(X):
        """Normalize the given dataset X
        Args:
            X: ndarray, dataset
        
        Returns:
            (Xbar, mean, std): ndarray, Xbar is the normalized dataset
            with mean 0 and standard deviation 1; mean and std are the 
            mean and standard deviation respectively.
        
        Note:
            You will encounter dimensions where the standard deviation is
        zero; for those, naive normalization would produce NaN. Handle
        this by using `std = 1` for those dimensions when normalizing.
        """
        mu = np.mean(X, axis=0)     # column-wise mean
        std = np.std(X, axis=0)     # column-wise standard deviation
        std_filled = std.copy()
        std_filled[std == 0] = 1.   # set zero std to 1 to avoid division by zero
        Xbar = (X - mu) / std_filled
        return Xbar, mu, std
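    A quick usage sketch (the toy data below is made up for illustration); the constant second column exercises the zero-std branch:

    >>> X = np.array([[1., 5.], [2., 5.], [3., 5.]])
    >>> Xbar, mu, std = normalize(X)
    >>> mu
    array([2., 5.])
    >>> std
    array([0.81649658, 0.        ])
    >>> Xbar                      # the constant column becomes 0 instead of NaN
    array([[-1.22474487,  0.        ],
           [ 0.        ,  0.        ],
           [ 1.22474487,  0.        ]])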

    2. np.argsort() 

    x = np.array([3, 1, 2])
    print(np.argsort(x))    # ascending order, returns indices
    # output: [1 2 0]
    # ========= descending order
    print(np.argsort(-x))
    # output: [0 2 1]

    3. Compute the eigenvalues and eigenvectors

    def eig(S):
        """Compute the eigenvalues and corresponding eigenvectors 
            for the covariance matrix S.
        Args:
            S: ndarray, covariance matrix
        
        Returns:
            (eigvals, eigvecs): ndarray, the eigenvalues and eigenvectors
    
        Note:
            the eigenvals and eigenvecs SHOULD BE sorted in descending
            order of the eigen values
            
            Hint: take a look at np.argsort for how to sort in numpy.
        """
        w, v = np.linalg.eigh(S)           # eigh: for symmetric matrices, eigenvalues in ascending order
        sorted_indices = np.argsort(-w)    # indices that sort the eigenvalues in descending order
        W = w[sorted_indices]              # reorder the eigenvalues
        V = v[:, sorted_indices]           # reorder the eigenvector columns to match
        return (W, V)

    Once sorted, the principal (PCA) subspace is easily read off from the leading eigenvectors.
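    For example, on the covariance matrix computed at the top of this post, whose eigenvalues are 2 and 0:

    >>> S = np.array([[1., -1.], [-1., 1.]])
    >>> eig_vals, eig_vecs = eig(S)
    >>> np.allclose(eig_vals, [2., 0.])     # returned in descending order
    True
    >>> np.allclose(S @ eig_vecs[:, 0], eig_vals[0] * eig_vecs[:, 0])   # a valid eigenpair
    True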

    4. Compute the projection matrix

    def projection_matrix(B):
        """Compute the projection matrix onto the space spanned by `B`
        Args:
            B: ndarray of dimension (D, M), the basis for the subspace
        
        Returns:
            P: the projection matrix
        """
        P = np.dot(B, B.T)   # valid because the columns of B (eigenvectors from eigh) are orthonormal
        return P
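    Note that $P = BB^T$ is the orthogonal projection onto the column space of $B$ only when the columns of $B$ are orthonormal (which holds for eigenvectors returned by `np.linalg.eigh`); for a general basis one would use $P = B(B^TB)^{-1}B^T$. A quick sanity check with a hand-picked orthonormal basis, verifying the defining property $P^2 = P$:

    >>> B = np.array([[1., 0.], [0., 1.], [0., 0.]])   # orthonormal basis of a 2-D subspace of R^3
    >>> P = projection_matrix(B)
    >>> np.allclose(P @ P, P)              # a projection matrix is idempotent
    True
    >>> P @ np.array([1., 2., 3.])         # the component outside the subspace is dropped
    array([1., 2., 0.])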

    5. The PCA algorithm

    def PCA(X, num_components):
        """
        Args:
            X: ndarray of size (N, D), where D is the dimension of the data,
               and N is the number of datapoints
            num_components: the number of principal components to use.
        Returns:
            X_reconstruct: ndarray of the reconstruction
            of X from the first `num_components` principal components.
        """
        # Compute the data covariance matrix S
        S = np.cov(X.T)
        
        # Next find eigenvalues and corresponding eigenvectors for S by implementing eig().
        eig_vals, eig_vecs = eig(S)
        eig_vals = eig_vals[ : num_components]
        eig_vecs = eig_vecs[:, : num_components]
        
        # Reconstruct the data from the lower-dimensional representation.
        # To do this we first need the projection matrix (implemented above),
        # which projects the data onto the subspace spanned by the top eigenvectors.
        P = projection_matrix(eig_vecs) # projection matrix
        
        # Then each data point x_i (a row of X) is projected onto the eigenbasis;
        # since P is symmetric, projecting all rows at once is simply X @ P.
        X_reconstruct = np.dot(X, P)
        return X_reconstruct
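    A usage sketch on random data (the shapes and seed are arbitrary): the reconstruction error should shrink as `num_components` grows.

    >>> np.random.seed(0)                   # arbitrary seed, for reproducibility
    >>> X = np.random.randn(100, 10)        # 100 points in 10 dimensions
    >>> Xbar, mu, std = normalize(X)
    >>> err2 = np.mean((Xbar - PCA(Xbar, 2)) ** 2)   # keep 2 components
    >>> err8 = np.mean((Xbar - PCA(Xbar, 8)) ** 2)   # keep 8 components
    >>> err2 > err8                         # more components, lower reconstruction error
    True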

    6. An optimized PCA for high-dimensional data (N < D)

    Suppose the normalized data matrix is $X \in \mathbb{R}^{N \times D}$ with $D > N$. Eigendecomposing the $D \times D$ covariance matrix directly is expensive, so PCA can instead be carried out via the following steps.

    (1) Compute the eigenvalues/eigenvectors of the much smaller $N \times N$ matrix $\frac{1}{N} X X^T$, i.e. solve for $\lambda_i, c_i$ in

    $\frac{1}{N} X X^T c_i = \lambda_i c_i$

    (2) The original problem is stated for the eigenvectors $b_i$ of the covariance matrix: $\frac{1}{N} X^T X b_i = \lambda_i b_i$.

    (3) Left-multiply (1) by $X^T$: $\frac{1}{N} X^T X (X^T c_i) = \lambda_i (X^T c_i)$

    Hence $b_i = X^T c_i$ is an eigenvector of the covariance matrix $S$, with the same eigenvalue $\lambda_i$.
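    A quick numeric check of this duality (a sketch using the `eig` helper above; the shapes and seed are arbitrary):

    >>> np.random.seed(1)
    >>> X = np.random.randn(5, 20)          # N=5 points, D=20 dimensions, so N < D
    >>> N = X.shape[0]
    >>> lam, C = eig(X @ X.T / N)           # solve the small N x N problem
    >>> b = X.T @ C[:, 0]                   # candidate eigenvector of the covariance matrix
    >>> S = X.T @ X / N                     # the full D x D covariance
    >>> np.allclose(S @ b, lam[0] * b)      # S b = lambda b indeed holds
    True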

    def PCA_high_dim(X, num_components):
        """Compute PCA for small sample size. 
        Args:
            X: ndarray of size (N, D), where D is the dimension of the data,
               and N is the number of data points in the training set. You may assume the input 
               has been normalized.
            num_components: the number of principal components to use.
        Returns:
            X_reconstruct: (N, D) ndarray. the reconstruction
            of X from the first `num_components` principal components.
        """
        N, D = X.shape
        M = np.dot(X, X.T) / N              # the small N x N matrix (1/N) X X^T
        eig_vals, eig_vecs = eig(M)         # eigendecomposition of M, sorted descending
        U = eig_vecs[:, : num_components]   # top eigenvectors c_i of M, shape (N, num_components)
        # Projecting with U in this N-dimensional sample space yields the same
        # reconstruction as projecting onto the eigenvectors b_i = X^T c_i of S.
        P = projection_matrix(U)            # N x N projection matrix
        X_reconstruct = np.dot(P, X)        # result has shape (N, D)
        return X_reconstruct
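    On normalized data with N < D, PCA_high_dim should agree with the plain PCA above up to floating-point error; a quick consistency check (shapes and seed are arbitrary):

    >>> np.random.seed(2)
    >>> X = np.random.randn(10, 50)         # N=10 < D=50
    >>> Xbar, mu, std = normalize(X)
    >>> np.allclose(PCA(Xbar, 3), PCA_high_dim(Xbar, 3))
    True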





