• Python Machine Learning -- Regression


    • Linear regression

    # -*- coding: utf-8 -*-
    """
    Created on Wed Aug 30 19:55:37 2017
    
    @author: Administrator
    """
    
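    '''
    Background: besides the unit price, the size of a house is closely tied to
    its sale price. We can fit a linear regression on known sale prices and
    house sizes, then predict the sale price of a house whose size is known
    but whose price is not.
    '''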
    
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
     
     
    # Read the data set: each line of prices.txt is "area,price"
    datasets_X = []
    datasets_Y = []
    # Raw string with no trailing backslash; a trailing "\" would escape the closing quote
    fpath = r'F:\RANJIEWEN\MachineLearning\Python机器学习实战_mooc\data\回归'
    with open(fpath + '\\prices.txt', 'r') as fr:
        for line in fr:
            items = line.strip().split(',')
            datasets_X.append(int(items[0]))
            datasets_Y.append(int(items[1]))
     
    length = len(datasets_X)
    datasets_X = np.array(datasets_X).reshape([length,1])
    datasets_Y = np.array(datasets_Y)
     
    # Use the array's own min/max so np.arange receives scalars, not 1-element arrays
    minX = datasets_X.min()
    maxX = datasets_X.max()
    X = np.arange(minX, maxX).reshape([-1, 1])
     
     
    linear = linear_model.LinearRegression()
    linear.fit(datasets_X, datasets_Y)
     
    # Plot the data points and the fitted line
    plt.scatter(datasets_X, datasets_Y, color = 'red')
    plt.plot(X, linear.predict(X), color = 'blue')
    plt.xlabel('Area')
    plt.ylabel('Price')
    plt.show()
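
The fitted model can then price a house of known area, which is the goal stated in the background note. A minimal sketch, assuming a small made-up data set in place of prices.txt (the areas, prices, and the 1500 query are illustrative values, not the original data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical (area, price) pairs standing in for prices.txt.
areas = np.array([1000, 1200, 1400, 1600, 1800]).reshape(-1, 1)
prices = np.array([200, 240, 280, 320, 360])

model = LinearRegression()
model.fit(areas, prices)

# coef_ holds the slope (price per unit of area), intercept_ the offset.
slope = model.coef_[0]
intercept = model.intercept_

# Predict the price of a house whose area is known but whose price is not.
predicted = model.predict(np.array([[1500]]))[0]
print(slope, intercept, predicted)
```

Note that predict() expects a 2-D array with one row per sample, which is why the single query is written as [[1500]].
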
    • Polynomial regression

    # -*- coding: utf-8 -*-
    """
    Created on Wed Aug 30 20:24:09 2017
    
    @author: Administrator
    """
    
    
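    '''
    Earlier we fitted a linear regression on known sale prices and house sizes
    so that we could predict the sale price of a house whose size is known but
    whose price is not. In practice such a fit is often not good enough, so here
    we apply polynomial regression to the same data set.
    '''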
    
    
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.preprocessing import PolynomialFeatures
     
     
    # Read the data set: each line of prices.txt is "area,price"
    datasets_X = []
    datasets_Y = []
    
    # Raw string with no trailing backslash; a trailing "\" would escape the closing quote
    fpath = r'F:\RANJIEWEN\MachineLearning\Python机器学习实战_mooc\data\回归'
    with open(fpath + '\\prices.txt', 'r') as fr:
        for line in fr:
            items = line.strip().split(',')
            datasets_X.append(int(items[0]))
            datasets_Y.append(int(items[1]))
     
    length = len(datasets_X)
    datasets_X = np.array(datasets_X).reshape([length,1])
    datasets_Y = np.array(datasets_Y)
     
    # Use the array's own min/max so np.arange receives scalars, not 1-element arrays
    minX = datasets_X.min()
    maxX = datasets_X.max()
    X = np.arange(minX, maxX).reshape([-1, 1])
     
     
    poly_reg = PolynomialFeatures(degree = 2)
    X_poly = poly_reg.fit_transform(datasets_X)
    lin_reg_2 = linear_model.LinearRegression()
    lin_reg_2.fit(X_poly, datasets_Y)
     
    # Plot the data points and the fitted curve
    plt.scatter(datasets_X, datasets_Y, color = 'red')
    # transform() reuses the transformer already fitted on datasets_X
    plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color = 'blue')
    plt.xlabel('Area')
    plt.ylabel('Price')
    plt.show()
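
PolynomialFeatures does not change the estimator. It only expands each input x into the columns 1, x, x^2, and the linear model is then fitted on those columns. A sketch on made-up quadratic data (the values are assumptions for illustration), comparing the R^2 of the straight-line fit and the degree-2 fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data following y = 0.5 * x^2 + 3 exactly.
x = np.arange(1, 11).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + 3

poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)  # columns: 1, x, x^2
print(x_poly[:3])

linear_fit = LinearRegression().fit(x, y)
quadratic_fit = LinearRegression().fit(x_poly, y)

# score() returns R^2; transform() reuses the already fitted transformer.
r2_linear = linear_fit.score(x, y)
r2_quadratic = quadratic_fit.score(poly.transform(x), y)
print(r2_linear, r2_quadratic)
```

Because the data here is exactly quadratic, the degree-2 fit should reach an R^2 very close to 1 while the straight line falls short of it.
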
    • Ridge regression

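    • Another motivation is that the plain model overfits easily; ridge regression addresses this by adding an L2 regularization term.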
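
The L2 term penalizes the squared size of the coefficients, so a larger alpha shrinks them toward zero. A sketch on synthetic data (the data, seed, degree, and alpha values are all assumptions), comparing the coefficient norm of ordinary least squares with Ridge at two penalty strengths:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)

# Hypothetical noisy samples of a smooth curve.
x = np.sort(rng.uniform(-1, 1, 30)).reshape(-1, 1)
y = np.sin(3 * x.ravel()) + rng.normal(scale=0.2, size=30)

# A high-degree expansion gives the model room to overfit.
x_poly = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)

ols = LinearRegression().fit(x_poly, y)
ridge_weak = Ridge(alpha=0.1).fit(x_poly, y)
ridge_strong = Ridge(alpha=10.0).fit(x_poly, y)

# The L2 norm of the weights falls as alpha grows.
norm_ols = np.linalg.norm(ols.coef_)
norm_weak = np.linalg.norm(ridge_weak.coef_)
norm_strong = np.linalg.norm(ridge_strong.coef_)
print(norm_ols, norm_weak, norm_strong)
```
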

    # -*- coding: utf-8 -*-
    """
    Created on Wed Aug 30 20:33:00 2017
    
    @author: Administrator
    """
    
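    '''
    Data:
    Traffic-flow monitoring data for one intersection, recording hourly vehicle
    counts over a full year.
    Goal:
    Build polynomial features from the existing data and use a ridge regression
    model instead of a plain linear model to run polynomial regression on the
    traffic-flow data.
    '''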
    
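    import numpy as np
    import pandas as pd  # needed for pd.read_csv below
    
    from sklearn.linear_model import Ridge
    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import PolynomialFeatures
    
    # Path separators restored to match the data directory used in the earlier scripts
    fpath = r'F:\RANJIEWEN\MachineLearning\Python机器学习实战_mooc\data\回归\岭回归.csv'
    
    data = pd.read_csv(fpath, encoding='gbk', parse_dates=[0], index_col=0)
    
    #data.sort_index(0,ascending=True,inplace=True)
    
    X = data.iloc[:, :4]  # iloc syntax: the first four columns are the features
    y = data.iloc[:, 4]   # the fifth column is the target
    poly = PolynomialFeatures(6)  # highest polynomial degree
    X = poly.fit_transform(X)
    
    # test_size is the fraction held out for testing; random_state is the random seed
    train_set_X, test_set_X, train_set_y, test_set_y = train_test_split(
        X, y, test_size=0.3, random_state=0)
    
    clf = Ridge(alpha=1.0, fit_intercept=True)
    clf.fit(train_set_X, train_set_y)
    print(clf.score(test_set_X, test_set_y))  # R^2 on the test set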
    
    
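    # Plot real vs. predicted values for samples 200 to 299
    start=200
    end=300
    y_pre=clf.predict(X)
    time=np.arange(start,end)
    plt.plot(time,y.iloc[start:end],'b',label='real')  # .iloc makes the slice explicitly positional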
    plt.plot(time,y_pre[start:end],'r',label='predict')
    plt.legend(loc='upper left')
    plt.show()
    • Lasso regression adds an L1 regularization term and yields sparse solutions.
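
Sparse here means that the L1 penalty can drive some coefficients to exactly zero. A sketch on synthetic data where only two of ten features matter (the data, seed, and alpha=0.1 are assumptions), counting the zero coefficients for Lasso and for Ridge:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)

# Hypothetical data: 10 features, but y depends only on the first two.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso can set irrelevant weights to exactly 0; Ridge only shrinks them.
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))
print(lasso.coef_)
print(lasso_zeros, ridge_zeros)
```

The count of exact zeros is what separates the two penalties: Ridge leaves small but nonzero weights on the irrelevant features.
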
  • Original article: https://www.cnblogs.com/ranjiewen/p/7458697.html