1. The scale function
R's base package ships with a data-standardization function, scale. Its help page reads as follows:
Usage
scale(x, center = TRUE, scale = TRUE)
Arguments
x: a numeric matrix(like object).
center: either a logical value or a numeric vector of length equal to the number of columns of x.
scale: either a logical value or a numeric vector of length equal to the number of columns of x.
Details
The value of center determines how column centering is performed. If center is a numeric vector with length equal to the number of columns of x, then each column of x has the corresponding value from center subtracted from it. If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.
The value of scale determines how column scaling is performed (after centering). If scale is a numeric vector with length equal to the number of columns of x, then each column of x is divided by the corresponding value from scale. If scale is TRUE then scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and the root mean square otherwise. If scale is FALSE, no scaling is done.
The root-mean-square for a (possibly centered) column is defined as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing values and n is the number of non-missing values. In the case center = TRUE, this is the same as the standard deviation, but in general it is not. (To scale by the standard deviations without centering, use scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE)).)
Value
For scale.default, the centered, scaled matrix. The numeric centering and scalings used (if any) are returned as attributes "scaled:center" and "scaled:scale".
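To make the Details above concrete, here is a small sketch of what happens when center = FALSE. The matrix m and the variable names are only illustrative. It shows that the default scaling then uses the root mean square rather than the standard deviation, and that only the attributes for steps actually performed are attached.

```r
# One column with values 1, 2, 3 (n = 3 non-missing values).
m <- matrix(c(1, 2, 3), ncol = 1)

# center = FALSE, scale = TRUE: each column is divided by its root mean square,
# sqrt(sum(x^2) / (n - 1)) = sqrt(14 / 2) = sqrt(7).
rms_scaled <- scale(m, center = FALSE)
rms <- attr(rms_scaled, "scaled:scale")

# center = FALSE, but scaled by the standard deviation instead,
# using the recipe quoted in the Details section.
sd_scaled <- scale(m, center = FALSE, scale = apply(m, 2, sd, na.rm = TRUE))

# No centering was done, so no "scaled:center" attribute is recorded.
has_center <- !is.null(attr(rms_scaled, "scaled:center"))
```

Here sd(c(1, 2, 3)) is 1, so sd_scaled keeps the values 1, 2, 3, while rms_scaled divides them by sqrt(7).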
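By default, scale performs z-score standardization: it first subtracts the column mean, then divides by the column standard deviation.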
z-score standardization (zero-mean normalization)
Also known as standard-deviation standardization, this method standardizes the data using the mean and standard deviation of the original data.
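The transformed data have mean 0 and standard deviation 1. (They follow a standard normal distribution only if the original data were themselves normally distributed.) The transformation is:
$$x^{*} = \frac{x - \mu}{\sigma}$$
where μ is the mean of all sample data and σ is the standard deviation of all sample data.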
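The z-score transformation can be checked by hand against scale. The sketch below uses an arbitrary illustrative vector x and assumes σ is the sample standard deviation (the n - 1 version returned by sd), which is what scale uses when center = TRUE.

```r
# Arbitrary illustrative data.
x <- c(1, 2, 3, 4, 10)

mu <- mean(x)     # sample mean
sigma <- sd(x)    # sample standard deviation (divides by n - 1)

# z-score computed directly from the formula.
z_manual <- (x - mu) / sigma

# The same values from scale(); as.numeric() drops the matrix shape and attributes.
z_scale <- as.numeric(scale(x))
```

Both vectors agree, and the result has mean 0 and standard deviation 1 up to floating-point error.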
2. The unscale function
The unscale function in the DMwR package restores the original data from an object returned by scale.
Usage
unscale(vals, norm.data, col.ids)
Arguments
vals: A numeric matrix with the values to un-scale
norm.data: A numeric and scaled matrix. This should be an object to which the function scale() was applied.
col.ids: The columns of the vals matrix that are to be un-scaled (defaults to all of them).
Value
An object with the same dimension as the parameter vals.
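If DMwR is not installed, the same idea can be sketched in base R. The helper below, unscale_base, is a hypothetical name and is not the DMwR implementation. It only assumes that norm.data carries the "scaled:center" and "scaled:scale" attributes set by scale, and it does not support a col.ids argument.

```r
# Hypothetical helper (not DMwR's code): undo scale() using the
# attributes stored on a scaled matrix.
unscale_base <- function(vals, norm.data) {
  centers <- attr(norm.data, "scaled:center")
  scales  <- attr(norm.data, "scaled:scale")
  vals <- as.matrix(vals)
  # Reverse the two steps of scale(): multiply by the scale, then add the center.
  if (!is.null(scales))  vals <- sweep(vals, 2, scales,  `*`)
  if (!is.null(centers)) vals <- sweep(vals, 2, centers, `+`)
  # The result is no longer scaled, so drop the scaling attributes.
  attr(vals, "scaled:center") <- NULL
  attr(vals, "scaled:scale") <- NULL
  vals
}

original <- matrix(c(1, 2, 3, 2, 4, 6), ncol = 2,
                   dimnames = list(NULL, c("x", "y")))
scaled <- scale(original)
restored <- unscale_base(scaled, scaled)
```

Here restored matches original, because each column is multiplied by its stored scale and then shifted back by its stored center.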
3. Usage examples
> df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6), z = c(3, 6, 9))
> df
  x y z
1 1 2 3
2 2 4 6
3 3 6 9
> scaledData <- scale(df)
> scaledData
      x  y  z
[1,] -1 -1 -1
[2,]  0  0  0
[3,]  1  1  1
attr(,"scaled:center")
x y z
2 4 6
attr(,"scaled:scale")
x y z
1 2 3
> library(DMwR)
> unscale(scaledData, scaledData)
     x y z
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
> ndf <- data.frame(x = c(1, 2), y = c(2, 4), z = c(3, 6))
> ndf
  x y z
1 1 2 3
2 2 4 6
> scale(ndf, center = attr(scaledData, "scaled:center"), scale = attr(scaledData, "scaled:scale"))
      x  y  z
[1,] -1 -1 -1
[2,]  0  0  0
attr(,"scaled:center")
x y z
2 4 6
attr(,"scaled:scale")
x y z
1 2 3
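The last call in the session reuses the center and scale learned from the first data frame. The sketch below wraps that pattern in a helper and contrasts it with rescaling the new rows on their own. The name apply_scaling is hypothetical, and the data frames simply repeat the values used above.

```r
train <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6), z = c(3, 6, 9))
new_rows <- data.frame(x = c(1, 2), y = c(2, 4), z = c(3, 6))

scaled_train <- scale(train)

# Hypothetical helper: scale new data with the parameters stored on `fitted`.
apply_scaling <- function(newdata, fitted) {
  scale(newdata,
        center = attr(fitted, "scaled:center"),
        scale  = attr(fitted, "scaled:scale"))
}

# Same parameters as the training data: rows map to -1 and 0.
reused <- apply_scaling(new_rows, scaled_train)

# Parameters re-estimated from the two new rows only: rows map to
# -1/sqrt(2) and 1/sqrt(2), which is not comparable with scaled_train.
refit <- scale(new_rows)
```

Reusing the stored attributes keeps new observations on the same scale as the original data, which is why the session passes them explicitly instead of calling scale(ndf) directly.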