IDL只能通过调用envi的二次接口做图像的变换,但是对于普通的数据没有提供函数。根据主成分变换的原理,用IDL写出来了,这样就不用每次再去用matlab的princomp去做了。主成分变化的基本过程:
(1)把原始数据中每个样本用一个向量表示,然后把所有样本组合起来构成一个矩阵。当然了,为了避免样本的单位的影响,样本集需要标准化。
(2)求该矩阵的协防差矩阵
(3)求步骤2中得到的协方差矩阵的特征值和特征向量。
(4)将求出的特征向量按照特征值的大小进行组合形成一个映射矩阵。
(5)用步骤4的映射矩阵对标准化后的原始数据进行映射。
IDL代码:
;+
; :AUTHOR: Cao zhigang
; :Copyright:CAS-NIGLAS
; :email:zhigang_niglas@163.com
; :blog:blog.sina.com.cn/ahnucao
;-
PRO PRINCOMP,IN_DATA = in_data,LOADINGS = loadings,SCORES = scores,LATENT = latent
; Principal component analysis (PCA) on data
;
; IN_DATA: n*p matrix as input,n-observations, p-variables.
; LOADINGS is a p-by-p matrix
; LATENT: a vector containing the eigenvalues of the covariance matrix of IN_DATA
; SCORES: the principal component scores
; Attention: the structure of array in IDL is: col * row, must transpose the matrix according to actual situation.
;
;
; To determin the deminsions of in_data
IF N_ELEMENTS(SIZE(in_data,/dimension)) NE 2 THEN BEGIN
PRINT,'Input data must be 2-d form.'
RETURN
ENDIF
;
; Get the col and row
;
dims = SIZE(in_data,/dimensions)
col = dims[0]
row = dims[1]
;------------------------PCA----------------------------------------------
;
;1. Normalize the data,i.e., to minus the mean of every column
avg = FLTARR(col)
avg = MEAN(in_data,dimension = 2); Get the mean values of every column
nor_data = FLTARR(col,row)
; Normalize
FOR i=0,col-1 DO BEGIN
nor_data[i,*] = in_data[i,*] - avg[i]
ENDFOR
;2 Get covariance matrix
cov = IMSL_COVARIANCES(TRANSPOSE(nor_data))
;3 Calc eigvalues and eig-vector by IMSL Advanced Math and Statistics
eig = IMSL_EIG(cov,vector = eig_matrix)
;
scores = nor_data##FLOAT(eig_matrix)
;
latent = FLOAT(eig)
loadings =FLOAT(eig_matrix)
END
测试:
PRO TEST_PRINCOMP
;
x1 = [2.5,0.5,2.2,1.9,3.1,2.3,2,1,1.5,1.1]
x2 = [2.4,0.7,2.9,2.2,3.0,2.7,1.6,1.1,1.6,0.9]
x = [[x1],[x2]]
PRINCOMP,in_data = TRANSPOSE(x),loadings = loadings,scores= scores,latent= latent
HELP,loadings,scores,latent
PRINT,loadings,STRING(13b)
PRINT,scores,STRING(13b)
PRINT,latent,STRING(13b)
END
输出结果:
0.677873 0.735179
0.735179 -0.677873
0.827970 0.175115
-1.77758 -0.142857
0.992198 -0.384375
0.274210 -0.130417
1.67580 0.209498
0.912949 -0.175282
-0.0991096 0.349825
-1.14457 -0.0464174
-0.438046 -0.0177647
-1.22382 0.162675
1.28403 0.0490834
和matlab的结果对比,基本一样。但是大量数据的测试还未进行。