• Linear Regression Algorithms - 4. Multiple Linear Regression


    Multiple Linear Regression

    With multiple features instead of a single feature $x^{(i)}$, the fitted function is no longer the simple $y^{(i)} = ax^{(i)} + b$, but:
    $$\hat y^{(i)} = \theta_0 x^{(i)}_0 + \theta_1 x^{(i)}_1 + \theta_2 x^{(i)}_2 + \dots + \theta_n x^{(i)}_n, \quad x^{(i)}_0 \equiv 1$$
    Note: the superscript $i$ denotes the $i$-th sample, and the subscripts $1$ to $n$ index the feature values of sample $i$.

    From the above we get:

    $$\theta = (\theta_0, \theta_1, \theta_2, \dots, \theta_n)^T$$

    $$X^{(i)} = (X^{(i)}_0, X^{(i)}_1, X^{(i)}_2, \dots, X^{(i)}_n)$$

    $$\hat y^{(i)} = X^{(i)} \cdot \theta$$

    Generalizing the above to all $m$ samples:

    $$X_b = \begin{bmatrix} 1 & x^{(1)}_1 & x^{(1)}_2 & \dots & x^{(1)}_n \\ 1 & x^{(2)}_1 & x^{(2)}_2 & \dots & x^{(2)}_n \\ \dots & & & & \dots \\ 1 & x^{(m)}_1 & x^{(m)}_2 & \dots & x^{(m)}_n \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \dots \\ \theta_n \end{bmatrix}$$

    $$\hat y = X_b \cdot \theta$$

    This yields the closed-form expression for $\theta$ (the full derivation is somewhat involved; a brief sketch is given below):

    $$\theta = (X^{T}_bX_b)^{-1}X^{T}_by$$

    The expression above is the normal equation (Normal Equation) solution for multiple linear regression.
    Drawback: its time complexity is high, $O(n^3)$.
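
    For reference, a brief sketch of that derivation (the loss function $J(\theta)$ below is introduced here for illustration and is not part of the original post): minimize the sum of squared residuals

    $$J(\theta) = (y - X_b\theta)^T(y - X_b\theta)$$

    Setting the gradient with respect to $\theta$ to zero gives

    $$\nabla_\theta J = -2X_b^T(y - X_b\theta) = 0 \;\Rightarrow\; X_b^T X_b\,\theta = X_b^T y \;\Rightarrow\; \theta = (X^{T}_bX_b)^{-1}X^{T}_by$$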

    Implementing Multiple Linear Regression

    Wrapping the algorithm as a LineRegression class

    import numpy
    from .metrics import r2_score

    class LineRegression(object):
        """Multiple linear regression solved with the normal equation."""

        def __init__(self):
            self.coef_ = None          # coefficients theta_1 ... theta_n
            self.interception_ = None  # intercept theta_0
            self._theta = None         # full parameter vector theta

        def fit_normal(self, X_train, y_train):
            """Fit theta by solving the normal equation on the training set."""
            assert X_train.shape[0] == y_train.shape[0], \
                "size of X_train must be equal to the size of y_train"

            # prepend a column of ones so that theta_0 acts as the intercept
            X_b = numpy.hstack([numpy.ones((len(X_train), 1)), X_train])
            # theta = (X_b^T X_b)^(-1) X_b^T y
            self._theta = numpy.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

            self.coef_ = self._theta[1:]         # coefficients
            self.interception_ = self._theta[0]  # intercept
            return self

        def predict(self, X_predict):
            """Predict targets for X_predict using the fitted theta."""
            assert self.coef_ is not None and self.interception_ is not None, \
                "must be fit before predict"
            assert X_predict.shape[1] == len(self.coef_), \
                "the feature number of X_predict must be equal to X_train"

            X_b = numpy.hstack([numpy.ones((len(X_predict), 1)), X_predict])
            y_predict = X_b.dot(self._theta)
            return y_predict

        def score(self, x_test, y_test):
            """Return the R^2 score on the given test set."""
            y_predict = self.predict(x_test)
            return r2_score(y_test, y_predict)

        def __repr__(self):
            return "LineRegression"
    

    Calling the algorithm

    import numpy
    import matplotlib.pyplot as plt
    from sklearn import datasets
    
    # load the Boston housing dataset
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    
    # drop samples whose price is capped at 50 (censored target values)
    X = X[y<50]
    y = y[y<50]
    

    from mylib.model_selection import train_test_split
    from mylib.LineRegression import LineRegression
    
    X_train,X_test,y_train,y_test = train_test_split(X,y,seed=666)
    
    reg = LineRegression()
    reg.fit_normal(X_train,y_train)
    # reg.predict(X_test)
    reg.score(X_test,y_test)
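
    Note that train_test_split here comes from the author's own mylib package rather than scikit-learn. For readers who want a self-contained run, a minimal sketch of such a helper (an assumed implementation matching the call above) could be:

    import numpy

    def train_test_split(X, y, test_ratio=0.2, seed=None):
        # shuffle the sample indices, then carve off a test portion
        if seed is not None:
            numpy.random.seed(seed)
        shuffled_indexes = numpy.random.permutation(len(X))
        test_size = int(len(X) * test_ratio)
        test_indexes = shuffled_indexes[:test_size]
        train_indexes = shuffled_indexes[test_size:]
        return X[train_indexes], X[test_indexes], y[train_indexes], y[test_indexes]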
    

    LinearRegression in scikit-learn

    from sklearn.linear_model import LinearRegression
    line_reg = LinearRegression()
    line_reg.fit(X,y)
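
    To mirror the held-out evaluation used with the custom class above, the scikit-learn estimator can be split, fit, and scored the same way; train_test_split, coef_, intercept_, and score below are standard scikit-learn API:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

    line_reg = LinearRegression()
    line_reg.fit(X_train, y_train)
    print(line_reg.coef_)                   # fitted coefficients, one per feature
    print(line_reg.intercept_)              # fitted intercept
    print(line_reg.score(X_test, y_test))   # R^2 on the held-out test set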
    