• Python data analysis: the relationship between city climate and the sea, plus machine learning [worked example]


    A study of the relationship between city climate and the sea

     

    Import packages

    In [2]:
    import numpy as np
    import pandas as pd
    from pandas import Series,DataFrame
    
    import matplotlib.pyplot as plt
    
    
    from pylab import mpl
    mpl.rcParams['font.sans-serif'] = ['FangSong'] # set the default font (FangSong, needed only for CJK labels)
    mpl.rcParams['axes.unicode_minus'] = False # keep the minus sign '-' from rendering as a box in saved figures
    
     

    Load the weather data for each city

    In [4]:
    # ignore_index=True discards each file's row index and renumbers the combined rows from 0
    ferrara1 = pd.read_csv('./ferrara_150715.csv')
    ferrara2 = pd.read_csv('./ferrara_250715.csv')
    ferrara3 = pd.read_csv('./ferrara_270615.csv')
    ferrara=pd.concat([ferrara1,ferrara2,ferrara3],ignore_index=True)
    
    torino1 = pd.read_csv('./torino_150715.csv')
    torino2 = pd.read_csv('./torino_250715.csv')
    torino3 = pd.read_csv('./torino_270615.csv')
    torino = pd.concat([torino1,torino2,torino3],ignore_index=True) 
    
    mantova1 = pd.read_csv('./mantova_150715.csv')
    mantova2 = pd.read_csv('./mantova_250715.csv')
    mantova3 = pd.read_csv('./mantova_270615.csv')
    mantova = pd.concat([mantova1,mantova2,mantova3],ignore_index=True) 
    
    milano1 = pd.read_csv('./milano_150715.csv')
    milano2 = pd.read_csv('./milano_250715.csv')
    milano3 = pd.read_csv('./milano_270615.csv')
    milano = pd.concat([milano1,milano2,milano3],ignore_index=True) 
    
    ravenna1 = pd.read_csv('./ravenna_150715.csv')
    ravenna2 = pd.read_csv('./ravenna_250715.csv')
    ravenna3 = pd.read_csv('./ravenna_270615.csv')
    ravenna = pd.concat([ravenna1,ravenna2,ravenna3],ignore_index=True)
    
    asti1 = pd.read_csv('./asti_150715.csv')
    asti2 = pd.read_csv('./asti_250715.csv')
    asti3 = pd.read_csv('./asti_270615.csv')
    asti = pd.concat([asti1,asti2,asti3],ignore_index=True)
    
    bologna1 = pd.read_csv('./bologna_150715.csv')
    bologna2 = pd.read_csv('./bologna_250715.csv')
    bologna3 = pd.read_csv('./bologna_270615.csv')
    bologna = pd.concat([bologna1,bologna2,bologna3],ignore_index=True)
    
    piacenza1 = pd.read_csv('./piacenza_150715.csv')
    piacenza2 = pd.read_csv('./piacenza_250715.csv')
    piacenza3 = pd.read_csv('./piacenza_270615.csv')
    piacenza = pd.concat([piacenza1,piacenza2,piacenza3],ignore_index=True)
    
    cesena1 = pd.read_csv('./cesena_150715.csv')
    cesena2 = pd.read_csv('./cesena_250715.csv')
    cesena3 = pd.read_csv('./cesena_270615.csv')
    cesena = pd.concat([cesena1,cesena2,cesena3],ignore_index=True)
    
    faenza1 = pd.read_csv('./faenza_150715.csv')
    faenza2 = pd.read_csv('./faenza_250715.csv')
    faenza3 = pd.read_csv('./faenza_270615.csv')
    faenza = pd.concat([faenza1,faenza2,faenza3],ignore_index=True)
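The thirty near-identical `read_csv` calls above can be folded into a loop. The sketch below is one way to do it, assuming the same `<city>_<date>.csv` naming as in the cell above; the helper name `load_city` and its `data_dir` parameter are hypothetical, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Date suffixes and city names taken from the file names used above.
DATES = ("150715", "250715", "270615")
CITIES = ("ferrara", "torino", "mantova", "milano", "ravenna",
          "asti", "bologna", "piacenza", "cesena", "faenza")


def load_city(name, data_dir=".", dates=DATES):
    """Read every '<name>_<date>.csv' file in data_dir and stack them into one DataFrame.

    Hypothetical helper: it mirrors the read_csv + concat pattern above.
    """
    frames = [pd.read_csv(Path(data_dir) / f"{name}_{date}.csv") for date in dates]
    # ignore_index=True renumbers the combined rows 0..n-1.
    return pd.concat(frames, ignore_index=True)
```

With the CSV files in the working directory, `cities = {name: load_city(name) for name in CITIES}` would give a dict keyed by city name, and `list(cities.values())` would play the role of `city_list`.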
    
     

    Drop unused columns

    In [9]:
    cesena.head(5)
    
    Out[9]:
        temp  humidity  pressure    description          dt  wind_speed  wind_deg    city                  day  dist
    0  29.15        83      1015  moderate rain  1436863101        3.62    94.001  Cesena  2015-07-14 10:38:21    14
    1  29.37        74      1015  moderate rain  1436866691        3.60    20.000  Cesena  2015-07-14 11:38:11    14
    2  29.51        78      1015  moderate rain  1436870392        3.60    70.000  Cesena  2015-07-14 12:39:52    14
    3  29.88        70      1016  moderate rain  1436874000        4.60    60.000  Cesena  2015-07-14 13:40:00    14
    4  30.12        70      1016  moderate rain  1436877549        4.10    70.000  Cesena  2015-07-14 14:39:09    14
    In [7]:
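    city_list = [ferrara,torino,mantova,milano,ravenna,asti,bologna,piacenza,cesena,faenza]
    # 'Unnamed: 0' is the unnamed first column of each CSV (most likely a saved row index); it carries no weather data
    for city in city_list:
        city.drop(labels='Unnamed: 0',axis=1,inplace=True)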
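`drop(..., inplace=True)` raises a `KeyError` if the cell is run a second time, because the column is already gone. A minimal sketch of two alternatives follows, using a small made-up frame (`demo` is hypothetical data, not taken from the CSV files): pass `errors='ignore'` to `drop`, or read the file with `index_col=0` so the unnamed column never becomes a data column.

```python
import io

import pandas as pd

# Made-up two-row frame written to CSV text with its index, which is how a
# column named 'Unnamed: 0' typically arises when the file is read back.
demo = pd.DataFrame({"temp": [29.15, 29.37], "dist": [14, 14]})
csv_text = demo.to_csv()

# Option 1: drop the column, tolerating the case where it is already absent.
raw = pd.read_csv(io.StringIO(csv_text))
cleaned = raw.drop(columns="Unnamed: 0", errors="ignore")
cleaned_again = cleaned.drop(columns="Unnamed: 0", errors="ignore")  # no KeyError

# Option 2: treat the first CSV column as the index while reading.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(cleaned.columns))
print(list(indexed.columns))
```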
    
     

    Show the relationship between maximum temperature and distance from the sea (across several cities)

    In [10]:
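    city_max_temp = []  # maximum temperature recorded in each city
    city_dist = []      # each city's distance from the sea
    for city in city_list:
        max_temp = city['temp'].max()
        city_max_temp.append(max_temp)
        dist = city['dist'][0]  # take dist from the first row (the code assumes it is the same on every row of a city)
        city_dist.append(dist)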
    
    In [11]:
    city_max_temp
    
    Out[11]:
    [33.43000000000001,
     34.69,
     34.18000000000001,
     34.81,
     32.79000000000002,
     34.31,
     33.850000000000016,
     33.920000000000016,
     32.81,
     32.74000000000001]
    In [12]:
    city_dist
    
    Out[12]:
    [47, 357, 121, 250, 8, 315, 71, 200, 14, 37]
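Two parallel lists are easy to misread. As a sketch, the values printed in Out[11] and Out[12] can be put side by side in one DataFrame and sorted by distance; the city names follow the order of `city_list`, and the temperatures are rounded to two decimals here.

```python
import pandas as pd

# Values copied from Out[11] and Out[12] (temperatures rounded to 2 decimals),
# in the same order as city_list.
summary = pd.DataFrame({
    "city": ["ferrara", "torino", "mantova", "milano", "ravenna",
             "asti", "bologna", "piacenza", "cesena", "faenza"],
    "dist": [47, 357, 121, 250, 8, 315, 71, 200, 14, 37],
    "max_temp": [33.43, 34.69, 34.18, 34.81, 32.79,
                 34.31, 33.85, 33.92, 32.81, 32.74],
})

# Sort so that the cities closest to the sea come first.
summary = summary.sort_values("dist").reset_index(drop=True)
print(summary)
```

In the notebook itself the same frame could be built from `city_dist` and `city_max_temp` directly.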
    In [14]:
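    plt.scatter(city_dist,city_max_temp)
    plt.xlabel('Distance from the sea')
    plt.ylabel('Maximum temperature')
    plt.title('Maximum temperature vs. distance from the sea')
    
    Out[14]:
    Text(0.5,1,'Maximum temperature vs. distance from the sea')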
     
     

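    The plot suggests that the cities near the sea fall roughly along one straight line, and the cities far from the sea fall roughly along another.

    - Using 100 km and 50 km as the cut-offs, split the data into a near-sea group and a far-from-sea group (near: less than 100 km; far: more than 50 km). The two groups overlap between 50 and 100 km.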
    In [16]:
    # Select all near-sea cities (temperature and distance)
    np_city_dist = np.array(city_dist)  # convert to NumPy arrays, which support boolean masking and reshape
    np_city_max_temp = np.array(city_max_temp)
    
    In [20]:
    near_condition = np_city_dist < 100
    near_city_dist = np_city_dist[near_condition]
    near_city_max_temp = np_city_max_temp[near_condition]
    
    In [21]:
    plt.scatter(near_city_dist,near_city_max_temp)
    
    Out[21]:
    <matplotlib.collections.PathCollection at 0x8950320>
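The cells above build only the near-sea group. A sketch of the matching far-from-sea selection, using the more-than-50 km cut-off stated earlier, could look like this; the names `far_condition`, `far_city_dist` and `far_city_max_temp` are assumptions that mirror the `near_*` names, and the arrays are re-created from Out[11] and Out[12] so the block runs on its own.

```python
import numpy as np

# Distances and maximum temperatures copied from Out[12] and Out[11] (rounded).
np_city_dist = np.array([47, 357, 121, 250, 8, 315, 71, 200, 14, 37])
np_city_max_temp = np.array([33.43, 34.69, 34.18, 34.81, 32.79,
                             34.31, 33.85, 33.92, 32.81, 32.74])

# Boolean masks: True where the city belongs to the group.
near_condition = np_city_dist < 100
far_condition = np_city_dist > 50

near_city_dist = np_city_dist[near_condition]
far_city_dist = np_city_dist[far_condition]
far_city_max_temp = np_city_max_temp[far_condition]

# A city between 50 and 100 km from the sea lands in both groups.
in_both = np_city_dist[near_condition & far_condition]
print(near_city_dist, far_city_dist, in_both)
```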
     
     

    Machine learning

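    - Algorithm model object: a special kind of object that already encapsulates an equation (one whose parameters have not yet been solved for).
    - Purpose of a model object: use that equation to make predictions or classifications.
    - Sample data (DataFrame, NumPy array):
        - Feature data: the independent variables
        - Target (label) data: the dependent variable
    - Categories of models:
        - Supervised learning: the sample data contains both features and targets
        - Unsupervised learning: the sample data contains features only
        - Semi-supervised learning: part of the sample data has features and targets, and the rest has features only
    - The sklearn module: packages many kinds of model objects, ready to use.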
    
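    Example: in the listings below, area, daylight rate and floor are the features, and sale price is the target.

    Area   Daylight rate   Floor   Sale price
    100    30%             18      330,000
    80     80%             3       1,330,000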
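To make the terms concrete, here is a sketch that encodes the two example listings above as a feature matrix `X` and a target vector `y`, then passes them to one supervised and one unsupervised scikit-learn estimator. The variable names and the choice of `KMeans` as the unsupervised example are assumptions for illustration; two rows are far too few to learn anything meaningful.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Features (independent variables): area, daylight rate, floor.
X = np.array([[100, 0.30, 18],
              [80, 0.80, 3]])
# Target (dependent variable): sale price, in units of 10,000.
y = np.array([33, 133])

print(X.shape)  # one row per sample, one column per feature
print(y.shape)  # one target value per sample

# Supervised: fit() receives both the features and the target.
supervised = LinearRegression().fit(X, y)

# Unsupervised: fit() receives the features only.
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(unsupervised.labels_)
```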
     

    Import sklearn and create a linear regression model object

    In [22]:
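    # 1. Import the estimator class
    from sklearn.linear_model import LinearRegression
    
    In [23]:
    # 2. Instantiate the model object
    linner = LinearRegression()
    
    In [ ]:
    # 3. Extract the sample data (near_city_dist and near_city_max_temp, prepared above)
    
    In [25]:
    # 4. Train the model; reshape(-1,1) turns the 1-D array into n rows x 1 column (n samples, one feature)
    linner.fit(near_city_dist.reshape(-1,1),near_city_max_temp)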
    
    Out[25]:
    LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
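scikit-learn estimators expect the feature data as a 2-D array of shape (n_samples, n_features), which is why the 1-D distance array is reshaped before `fit`. A small sketch of what `reshape(-1, 1)` does, with the near-sea distances re-created from Out[12]:

```python
import numpy as np

# Near-sea distances (those under 100 in Out[12]).
near_city_dist = np.array([47, 8, 71, 14, 37])

column = near_city_dist.reshape(-1, 1)  # -1 lets NumPy infer the row count

print(near_city_dist.shape)  # (5,)  : a flat, 1-D array
print(column.shape)          # (5, 1): 5 samples, 1 feature each
```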
    In [26]:
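    # 5. Predict the maximum temperature at 38 km; predict() expects a 2-D array of shape (n_samples, n_features)
    linner.predict([[38]])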
    
    Out[26]:
    array([33.16842645])
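A fitted `LinearRegression` stores the solved equation in its `coef_` (slope) and `intercept_` attributes, so the prediction can be reproduced by hand. The sketch below refits the model on the near-sea values re-created from Out[11] and Out[12] (rounded to two decimals); the variable name `model` is used here in place of `linner`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

near_city_dist = np.array([47, 8, 71, 14, 37])
near_city_max_temp = np.array([33.43, 32.79, 33.85, 32.81, 32.74])

model = LinearRegression()
model.fit(near_city_dist.reshape(-1, 1), near_city_max_temp)

slope = model.coef_[0]
intercept = model.intercept_

# predict() takes a 2-D array: one row per sample, one column per feature.
predicted = model.predict(np.array([[38]]))[0]
by_hand = slope * 38 + intercept  # same straight-line equation, evaluated manually

print(slope, intercept)
print(predicted, by_hand)
```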
    In [27]:
    # Score the model; for a regressor, score() returns R^2 (the coefficient of determination)
    linner.score(near_city_dist.reshape(-1,1),near_city_max_temp)
    
    Out[27]:
    0.77988083971852
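For a regressor, `score` returns the coefficient of determination R², defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. A sketch that computes it by hand and compares it with `score`, again on the near-sea values re-created from Out[11] and Out[12]:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([47, 8, 71, 14, 37]).reshape(-1, 1)
y = np.array([33.43, 32.79, 33.85, 32.81, 32.74])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2_by_hand = 1 - ss_res / ss_tot

print(r2_by_hand, model.score(X, y))
```

Note that this score is measured on the same five points the model was trained on, so it says how well the line fits them, not how well it would predict an unseen city.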
    In [28]:
    # Plot the regression line
    x = np.linspace(10,70,num=100) # linspace: 100 evenly spaced values (an arithmetic sequence) from 10 to 70
    y = linner.predict(x.reshape(-1,1))
    
    In [33]:
    plt.scatter(near_city_dist,near_city_max_temp)
    plt.scatter(x,y,marker=1)# marker=1 selects the tick-right marker shape; it does not set the point size (use s= for that)
    
    Out[33]:
    <matplotlib.collections.PathCollection at 0xaaf7940>
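Since the fitted relationship is a straight line, `plt.plot` draws it more directly than a second `scatter`. The sketch below also fits a second model to the far-from-sea group (more than 50 km), which the notebook describes but does not fit; the names `fit_line`, `near_model` and `far_model` are hypothetical, the data is re-created from Out[11] and Out[12], and the `Agg` backend is selected only so the block runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

dist = np.array([47, 357, 121, 250, 8, 315, 71, 200, 14, 37])
max_temp = np.array([33.43, 34.69, 34.18, 34.81, 32.79,
                     34.31, 33.85, 33.92, 32.81, 32.74])


def fit_line(mask):
    """Fit max_temp against dist for the cities selected by a boolean mask."""
    return LinearRegression().fit(dist[mask].reshape(-1, 1), max_temp[mask])


near_model = fit_line(dist < 100)
far_model = fit_line(dist > 50)

fig, ax = plt.subplots()
ax.scatter(dist, max_temp, label="cities")
for model, (lo, hi), label in [(near_model, (0, 100), "near-sea fit"),
                               (far_model, (50, 400), "far-from-sea fit")]:
    x = np.linspace(lo, hi, num=100)
    ax.plot(x, model.predict(x.reshape(-1, 1)), label=label)
ax.set_xlabel("Distance from the sea (km)")
ax.set_ylabel("Maximum temperature")
ax.legend()
```

In a notebook the figure renders at the end of the cell. With only five and six points per group, both lines are rough descriptions of this sample and should not be read as general results.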
     
  • Original post: https://www.cnblogs.com/bilx/p/11644738.html