• [Official pandas Documentation for Beginners]: 2. Ten Getting-Started Questions


    1. What kind of data does pandas handle?

    Key points:

    • Import the package: import pandas as pd

    • A data table is stored as a DataFrame

    • Each column in a DataFrame is a Series

    • You do things by applying a method to a DataFrame or a Series

    pandas is suited to tabular data. In pandas, a table is called a DataFrame.

    1.1. Importing the pandas package

    • pd is the industry-standard abbreviation for pandas.
    # first example
    
    import pandas as pd
    

    1.2. Handling a table with a DataFrame

    • When data is supplied as a dictionary, the keys become the column names and the values become each column's contents.
    • Different columns can hold values of different types.
    df = pd.DataFrame(
         {
             "Name": [
                 "Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth",
             ],
             "Age": [22, 35, 58],
             "Sex": ["male", "male", "female"],
         }
    )
    df
    

    1.3. Each column of a DataFrame is a Series

    df['Age']
    
    0    22
    1    35
    2    58
    Name: Age, dtype: int64
    
    • Selecting a single column of a DataFrame returns a Series
    • Use DataFrame['column name'] to select a column

    A Series can also be created directly:

    ages = pd.Series([22, 35, 58], name="Age")
    ages
    
    0    22
    1    35
    2    58
    Name: Age, dtype: int64
    

    A Series has no column label, but it does have row labels.
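    A minimal sketch of this distinction, reusing the `ages` Series from above (`to_frame` is the standard way to turn a Series back into a one-column DataFrame):

```python
import pandas as pd

# A Series carries a row index and an optional name, but no column labels.
ages = pd.Series([22, 35, 58], name="Age")
print(ages.index)   # the row labels: RangeIndex(start=0, stop=3, step=1)
print(ages.name)    # 'Age'

# to_frame() promotes the Series name to a column label of a DataFrame.
df_from_series = ages.to_frame()
print(df_from_series.columns)
```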

    1.4. Methods can be called on DataFrame and Series

    A few examples:

    df["Age"].max()
    
    58
    
    ages.max()
    
    58
    
    df.describe()
    
                 Age
    count   3.000000
    mean   38.333333
    std    18.230012
    min    22.000000
    25%    28.500000
    50%    35.000000
    75%    46.500000
    max    58.000000

    2. How do I read and write tabular data?

    • pandas supports many data formats: csv, excel, sql, json, parquet ...

    • Reading: read_*

    • Writing: to_*

    import pandas as pd
    titanic_data_path = "../data/titanic.csv"
    titanic_df = pd.read_csv(titanic_data_path)
    
    # usage of df.head() / df.tail()
    # titanic_df.head(8)
    titanic_df.tail(8)
    
    PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
    883 884 0 2 Banfield, Mr. Frederick James male 28.0 0 0 C.A./SOTON 34068 10.500 NaN S
    884 885 0 3 Sutehall, Mr. Henry Jr male 25.0 0 0 SOTON/OQ 392076 7.050 NaN S
    885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39.0 0 5 382652 29.125 NaN Q
    886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.000 NaN S
    887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.000 B42 S
    888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.450 NaN S
    889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.000 C148 C
    890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.750 NaN Q

    A DataFrame has many methods, as well as many attributes that can be accessed directly.

    titanic_df.dtypes
    
    PassengerId      int64
    Survived         int64
    Pclass           int64
    Name            object
    Sex             object
    Age            float64
    SibSp            int64
    Parch            int64
    Ticket          object
    Fare           float64
    Cabin           object
    Embarked        object
    dtype: object
    
    titanic_df.to_excel('ddd.xlsx', sheet_name = 'sheet1', index=False)
    
    titanic_df.info  # without parentheses this returns the bound method itself; call titanic_df.info() for the summary
    
    <bound method DataFrame.info of      PassengerId  Survived  Pclass  \
    0              1         0       3   
    1              2         1       1   
    2              3         1       3   
    3              4         1       1   
    4              5         0       3   
    ..           ...       ...     ...   
    886          887         0       2   
    887          888         1       1   
    888          889         0       3   
    889          890         1       1   
    890          891         0       3   
    
                                                      Name     Sex   Age  SibSp  \
    0                              Braund, Mr. Owen Harris    male  22.0      1   
    1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
    2                               Heikkinen, Miss. Laina  female  26.0      0   
    3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
    4                             Allen, Mr. William Henry    male  35.0      0   
    ..                                                 ...     ...   ...    ...   
    886                              Montvila, Rev. Juozas    male  27.0      0   
    887                       Graham, Miss. Margaret Edith  female  19.0      0   
    888           Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1   
    889                              Behr, Mr. Karl Howell    male  26.0      0   
    890                                Dooley, Mr. Patrick    male  32.0      0   
    
         Parch            Ticket     Fare Cabin Embarked  
    0        0         A/5 21171   7.2500   NaN        S  
    1        0          PC 17599  71.2833   C85        C  
    2        0  STON/O2. 3101282   7.9250   NaN        S  
    3        0            113803  53.1000  C123        S  
    4        0            373450   8.0500   NaN        S  
    ..     ...               ...      ...   ...      ...  
    886      0            211536  13.0000   NaN        S  
    887      0            112053  30.0000   B42        S  
    888      2        W./C. 6607  23.4500   NaN        S  
    889      0            111369  30.0000  C148        C  
    890      0            370376   7.7500   NaN        Q  
    
    [891 rows x 12 columns]>
    
    titanic_df.__class__
    
    pandas.core.frame.DataFrame 
    

    3. How do I select a subset of a table?

    pandas can:

    • select or filter specific rows/columns

    • filter data by condition

    • slice, select, extract...

    Key points:

    • Use square brackets [] to select a subset

    • Inside the brackets you can put:

      • a single row/column label

      • a list of row/column labels

      • a slice

      • a condition

    • With loc, use row/column label names

    • With iloc, use row/column positions (integer indices, end position excluded)

    • loc/iloc can also be used to assign new values to a DataFrame.
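    The loc/iloc points above can be sketched on a tiny made-up frame (the string row labels are purely for illustration):

```python
import pandas as pd

# A tiny frame with string row labels, made up for illustration
df = pd.DataFrame(
    {"Name": ["Braund", "Allen", "Bonnell"], "Age": [22, 35, 58]},
    index=["a", "b", "c"],
)

# loc is label-based; the end label of a slice IS included
sub_loc = df.loc["a":"b", "Age"]     # rows 'a' and 'b'

# iloc is position-based; the end position is EXCLUDED, like Python slices
sub_iloc = df.iloc[0:2, 1]           # the same two rows, by position

# loc/iloc are also valid assignment targets
df.loc["c", "Age"] = 60
```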

    3.1 Using square brackets []

    import pandas as pd
    
    titanic = pd.read_csv('data/titanic.csv')
    titanic.head()
    
    PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
    0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
    1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
    2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
    3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
    4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
    # a single column
    ages = titanic['Age']
    ages.head(),ages.shape, ages.dtype,type(ages)
    
    (0    22.0
     1    38.0
     2    26.0
     3    35.0
     4    35.0
     Name: Age, dtype: float64,
     (891,),
     dtype('float64'),
     pandas.core.series.Series)
    

    Each column of a DataFrame is a Series.

    # multiple columns
    age_sex = titanic[['Age','Sex']]  # the inner brackets hold a list of column names
    age_sex.head(), age_sex.shape, age_sex.columns, type(age_sex)
    
    (    Age     Sex
     0  22.0    male
     1  38.0  female
     2  26.0  female
     3  35.0  female
     4  35.0    male,
     (891, 2),
     Index(['Age', 'Sex'], dtype='object'),
     pandas.core.frame.DataFrame)
    

    Multiple columns give a DataFrame.

    # filtering rows
    above_35 = titanic[titanic['Age'] > 35 ]
    above_35.head(), above_35.shape
    
    (    PassengerId  Survived  Pclass  \
     1             2         1       1   
     6             7         0       1   
     11           12         1       1   
     13           14         0       3   
     15           16         1       2   
    
                                                      Name     Sex   Age  SibSp  \
     1   Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
     6                             McCarthy, Mr. Timothy J    male  54.0      0   
     11                           Bonnell, Miss. Elizabeth  female  58.0      0   
     13                        Andersson, Mr. Anders Johan    male  39.0      1   
     15                   Hewlett, Mrs. (Mary D Kingcome)   female  55.0      0   
    
         Parch    Ticket     Fare Cabin Embarked  
     1       0  PC 17599  71.2833   C85        C  
     6       0     17463  51.8625   E46        S  
     11      0    113783  26.5500  C103        S  
     13      5    347082  31.2750   NaN        S  
     15      0    248706  16.0000   NaN        S  ,
     (217, 12))
    
    type(titanic['Age'] > 35)  # the expression inside the brackets is itself a Series of booleans
    
    pandas.core.series.Series
    
    condition = titanic['Age'] > 35
    condition
    
    0      False
    1       True
    2      False
    3      False
    4      False
           ...  
    886    False
    887    False
    888    False
    889    False
    890    False
    Name: Age, Length: 891, dtype: bool
    
    # membership test
    titanic[titanic['Pclass'].isin([2,3])].head()  # keep rows whose Pclass is 2 or 3
    
    PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
    0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
    2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
    4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
    5 6 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q
    7 8 0 3 Palsson, Master. Gosta Leonard male 2.0 3 1 349909 21.0750 NaN S
    titanic[(titanic['Pclass'] == 2) | (titanic['Pclass'] == 3)].head()
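    Two more selection idioms worth knowing, sketched on a made-up mini frame: `notna()` for dropping missing values, and `loc` for combining a row condition with a column selection in one step:

```python
import pandas as pd

# Made-up mini frame to illustrate the remaining selection idioms
df = pd.DataFrame({"Age": [22.0, None, 58.0], "Sex": ["male", "male", "female"]})

# notna() filters out rows with missing values
with_age = df[df["Age"].notna()]

# loc combines a row condition and a column selection in a single step
older_sex = df.loc[df["Age"] > 35, "Sex"]
```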
    

    4. How do I create plots?

    pandas uses Matplotlib to draw all kinds of plots:

    • scatter plots

    • bar charts

    • box plots

    • and more

    Key points:

    • The plot.* methods work on both Series and DataFrame
    • By default, each column is drawn as a separate element
    • pandas plots return Matplotlib objects

    Data

    import pandas as pd
    import matplotlib.pyplot as plt
    
    data_source = "./data/air_quality_no2.csv"
    air_quality = pd.read_csv(data_source, index_col=0, parse_dates=True) 
    air_quality.head()
    
    • index_col defines the first column as the index;
    • parse_dates controls whether date strings are parsed into timestamps

    4.1. Plotting the whole table with defaults

    # plot
    # all columns
    air_quality.plot()
    
    <AxesSubplot:xlabel='datetime'>
    

    4.2. Plotting a specific column

    # a single column
    air_quality['station_paris'].plot()
    
    <AxesSubplot:xlabel='datetime'>
    

    4.3. Scatter plot

    # scatter plot
    air_quality.plot.scatter(x='station_london', y= 'station_paris',alpha=0.5)
    
    <AxesSubplot:xlabel='station_london', ylabel='station_paris'>
    

    4.4. All plot methods

    for i in dir(air_quality.plot):
        if not i.startswith("_"):
            print(i)
    # the list comprehension below reads more nicely
    # ----------------------------------------------------------       
    [
        method_name 
        for method_name in dir(air_quality.plot) 
        if not method_name.startswith("_")
    ]
    
    area
    bar
    barh
    box
    density
    hexbin
    hist
    kde
    line
    pie
    scatter
    
    
    
    
    
    ['area',
     'bar',
     'barh',
     'box',
     'density',
     'hexbin',
     'hist',
     'kde',
     'line',
     'pie',
     'scatter']
    
    # box plot
    air_quality.plot.box()
    
    <AxesSubplot:>
    

    4.5. Subplots

    # separate subplots, one per column
    axs = air_quality.plot.area(figsize=(12,4), subplots=True) 
    
    type(axs)
    
    numpy.ndarray
    
    # 1. Create an empty matplotlib Figure and Axes object
    fig, axs = plt.subplots(figsize= (12,4)) 
    # 2. Use pandas to put the area plot on the prepared Figure/Axes
    air_quality.plot.area(ax=axs)     
    # 3. Do any matplotlib customization you like   
    axs.set_ylabel("NO$_2$ concentration")
    # 4. Save the Figure/Axes using the existing matplotlib method.
    fig.savefig('4.1-no2_concentrations.png') 
    

    5. How do I create new columns derived from existing columns?

    • No loop over rows is needed

    • Calculations operate on whole columns at once

    For example, adding a new column to the original table.

    Key points:

    • Create a new column by assigning to DataFrame['new column name']
    • No row-by-row loop is needed for the new column
    • Rows and columns can be renamed with a mapping dictionary or a function

    Data

    import pandas as pd
    data_source = "data/air_quality_no2.csv"
    air_quality = pd.read_csv(data_source, index_col=0, parse_dates=True)
    air_quality.head()
    
                         station_antwerp  station_paris  station_london
    datetime
    2019-05-07 02:00:00              NaN            NaN            23.0
    2019-05-07 03:00:00             50.5           25.0            19.0
    2019-05-07 04:00:00             45.0           27.7            19.0
    2019-05-07 05:00:00              NaN           50.4            16.0
    2019-05-07 06:00:00              NaN           61.9             NaN

    5.1 Adding a new column

    • no need to loop over each row
    air_quality['london_mg_per_cubic'] = air_quality['station_london'] * 1.882
    air_quality.head()
    
                         station_antwerp  station_paris  station_london  london_mg_per_cubic
    datetime
    2019-05-07 02:00:00              NaN            NaN            23.0               43.286
    2019-05-07 03:00:00             50.5           25.0            19.0               35.758
    2019-05-07 04:00:00             45.0           27.7            19.0               35.758
    2019-05-07 05:00:00              NaN           50.4            16.0               30.112
    2019-05-07 06:00:00              NaN           61.9             NaN                  NaN
    air_quality["ratio_paris_antwerp"] = (
        air_quality["station_paris"] / air_quality["station_antwerp"])  # invalid operations simply yield NaN
    air_quality.head()
    
                         station_antwerp  station_paris  station_london  london_mg_per_cubic  ratio_paris_antwerp
    datetime
    2019-05-07 02:00:00              NaN            NaN            23.0               43.286                  NaN
    2019-05-07 03:00:00             50.5           25.0            19.0               35.758             0.495050
    2019-05-07 04:00:00             45.0           27.7            19.0               35.758             0.615556
    2019-05-07 05:00:00              NaN           50.4            16.0               30.112                  NaN
    2019-05-07 06:00:00              NaN           61.9             NaN                  NaN                  NaN

    Mathematical and logical operations are all element-wise.
    More complex cases can use the apply() function.
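    A minimal sketch of both cases on made-up data: element-wise arithmetic needs no loop, and apply() handles per-row logic that is awkward to vectorize (the column names and threshold here are illustrative):

```python
import pandas as pd

air = pd.DataFrame({"station_london": [23.0, 19.0, 16.0]})

# element-wise: no loop needed
air["london_mg_per_cubic"] = air["station_london"] * 1.882

# apply() runs a Python function per row (axis=1)
air["level"] = air.apply(
    lambda row: "high" if row["station_london"] > 20 else "low", axis=1
)
```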

    5.2 Renaming

    • with a dictionary
    • with a function
    # dictionary
    air_quality_renamed = air_quality.rename(
        columns={
            "station_antwerp": "BETR801",
            "station_paris": "FR04014",
            "station_london": "London Westminster",
        }
    )
    air_quality_renamed.head()
    
                         BETR801  FR04014  London Westminster  london_mg_per_cubic  ratio_paris_antwerp
    datetime
    2019-05-07 02:00:00      NaN      NaN                23.0               43.286                  NaN
    2019-05-07 03:00:00     50.5     25.0                19.0               35.758             0.495050
    2019-05-07 04:00:00     45.0     27.7                19.0               35.758             0.615556
    2019-05-07 05:00:00      NaN     50.4                16.0               30.112                  NaN
    2019-05-07 06:00:00      NaN     61.9                 NaN                  NaN                  NaN
    air_quality_renamed = air_quality_renamed.rename(columns=str.lower)
    air_quality_renamed.head()
    
                         betr801  fr04014  london westminster  london_mg_per_cubic  ratio_paris_antwerp
    datetime
    2019-05-07 02:00:00      NaN      NaN                23.0               43.286                  NaN
    2019-05-07 03:00:00     50.5     25.0                19.0               35.758             0.495050
    2019-05-07 04:00:00     45.0     27.7                19.0               35.758             0.615556
    2019-05-07 05:00:00      NaN     50.4                16.0               30.112                  NaN
    2019-05-07 06:00:00      NaN     61.9                 NaN                  NaN                  NaN

    6. How do I calculate summary statistics?

    • Basic statistics are easy to compute, and apply to:

      • the entire dataset

      • a sliding window of the data

      • data grouped by category

    • pandas also supports "split-apply-combine" (divide and conquer)

    Key points:

    • Aggregations can be computed over whole rows and columns

    • groupby implements the split-apply-combine pattern

    • value_counts is a shortcut for counting each category of a variable

    6.1 Aggregating statistics

    import pandas as pd
    
    data_source = "data/titanic.csv"
    titanic = pd.read_csv(data_source)
    titanic.head()
    
    PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
    0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
    1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
    2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
    3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
    4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
    titanic['Age'].mean()
    
    29.69911764705882
    
    titanic[['Age' ,'Fare']].median()
    
    Age     28.0000
    Fare    14.4542
    dtype: float64
    
    titanic[['Age' ,'Fare']].describe()
    
                  Age        Fare
    count  714.000000  891.000000
    mean    29.699118   32.204208
    std     14.526497   49.693429
    min      0.420000    0.000000
    25%     20.125000    7.910400
    50%     28.000000   14.454200
    75%     38.000000   31.000000
    max     80.000000  512.329200
    # agg() lets you pick specific statistics for each column instead of the defaults.
    
    titanic.agg({
        "Age": ["min", "max","median","skew"],
        "Fare": ["min", "max", "median", "mean"],
    })
    
                  Age        Fare
    min      0.420000    0.000000
    max     80.000000  512.329200
    median  28.000000   14.454200
    skew     0.389108         NaN
    mean          NaN   32.204208

    6.2. Grouped statistics by category

    titanic[['Age' ,'Sex']].groupby('Sex').mean()
    
                  Age
    Sex
    female  27.915709
    male    30.726645

    Computing a statistic for each category in a column is a very common operation. The groupby method can be used for exactly this.

    This is the so-called:

    • split-apply-combine pattern
      • Split the data into groups
      • Apply a computation to each group independently
      • Combine the results into a single data structure

    In pandas, the apply and combine steps are executed together.

    If no columns are specified, as in the example above, the statistic is computed for every numeric column.

    titanic.groupby("Sex").mean()
    
            PassengerId  Survived    Pclass        Age     SibSp     Parch       Fare
    Sex
    female   431.028662  0.742038  2.159236  27.915709  0.694268  0.649682  44.479818
    male     454.147314  0.188908  2.389948  30.726645  0.429809  0.235702  25.523893

    Square-bracket selection also works on grouped data.

    titanic.groupby("Sex")["Age"].mean()
    
    Sex
    female    27.915709
    male      30.726645
    Name: Age, dtype: float64
    


    For columns whose numeric values merely encode categories, such as Pclass in this example, computing a mean of the class numbers 1, 2, 3 is meaningless. For such cases, pandas provides the Categorical data type.
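    A minimal sketch of the Categorical dtype on illustrative Pclass-like values: counting remains meaningful, and an explicit order can be attached for ordinal categories:

```python
import pandas as pd

# Pclass-like values stored as a categorical (illustrative data)
pclass = pd.Series([3, 1, 3, 2], dtype="category")

counts = pclass.value_counts()   # counting categories is meaningful

# an explicit order can be attached for ordinal categories
pclass = pclass.cat.set_categories([1, 2, 3], ordered=True)
lowest = pclass.min()            # min/max respect the declared order
```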

    titanic.groupby(["Sex","Pclass"])["Fare"].mean() 
    
    Sex     Pclass
    female  1         106.125798
            2          21.970121
            3          16.118810
    male    1          67.226127
            2          19.741782
            3          12.661633
    Name: Fare, dtype: float64
    

    Statistics can be computed for combinations of several grouping columns in one pass.

    6.3. Counting by category

    When different values in a column encode different categories, counts can be obtained with value_counts()

    titanic["Pclass"].value_counts()  # count passengers in each of the 3 classes
    
    3    491
    1    216
    2    184
    Name: Pclass, dtype: int64
    
    titanic.groupby('Pclass')["Pclass"].count()  # group by Pclass, then count() within each group
    
    Pclass
    1    216
    2    184
    3    491
    Name: Pclass, dtype: int64
    
    titanic.groupby('Pclass')["Pclass"].size()  # how does this differ from count()?
    
    Pclass
    1    216
    2    184
    3    491
    Name: Pclass, dtype: int64
    

    The differences between these methods:

    • size() includes missing values; it is the number of rows.
    • count() ignores missing values
    • value_counts() takes a dropna parameter to state explicitly whether NaN should be counted
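    The three differences above can be checked on a tiny Series with one missing value (data made up for illustration; note that on a plain Series, size is an attribute rather than a method):

```python
import pandas as pd

s = pd.Series([3, 1, 3, None])

n_count = s.count()                 # 3 -- NaN is ignored
n_size = s.size                     # 4 -- NaN is included
vc = s.value_counts(dropna=False)   # NaN kept as its own entry
```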

    7. How do I reshape a table?

    • pivot() reshapes from long to wide

    • melt() reshapes from wide to long

    Key points:

    • sort_values sorts by one or more columns

    • pivot is a pure reshaping of the data; pivot_table additionally supports aggregation

    • pivot (long to wide format) is the inverse operation of melt (wide to long format)

    import pandas as pd 
    
    data1 = "data/titanic.csv"
    data2 = "data/air_quality_long.csv"
    
    titanic = pd.read_csv(data1)
    air_quality = pd.read_csv(data2, index_col= "date.utc", parse_dates=True)
    
    titanic.head(), air_quality.head()
    
    (   PassengerId  Survived  Pclass  \
     0            1         0       3   
     1            2         1       1   
     2            3         1       3   
     3            4         1       1   
     4            5         0       3   
    
                                                     Name     Sex   Age  SibSp  \
     0                            Braund, Mr. Owen Harris    male  22.0      1   
     1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
     2                             Heikkinen, Miss. Laina  female  26.0      0   
     3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
     4                           Allen, Mr. William Henry    male  35.0      0   
    
        Parch            Ticket     Fare Cabin Embarked  
     0      0         A/5 21171   7.2500   NaN        S  
     1      0          PC 17599  71.2833   C85        C  
     2      0  STON/O2. 3101282   7.9250   NaN        S  
     3      0            113803  53.1000  C123        S  
     4      0            373450   8.0500   NaN        S  ,
                                     city country location parameter  value   unit
     date.utc                                                                     
     2019-06-18 06:00:00+00:00  Antwerpen      BE  BETR801      pm25   18.0  µg/m³
     2019-06-17 08:00:00+00:00  Antwerpen      BE  BETR801      pm25    6.5  µg/m³
     2019-06-17 07:00:00+00:00  Antwerpen      BE  BETR801      pm25   18.5  µg/m³
     2019-06-17 06:00:00+00:00  Antwerpen      BE  BETR801      pm25   16.0  µg/m³
     2019-06-17 05:00:00+00:00  Antwerpen      BE  BETR801      pm25    7.5  µg/m³)
    

    7.1. Sorting rows

    • sort_values()
    • sort_index()
    titanic.sort_values(by='Age').head()
    
    PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
    803 804 1 3 Thomas, Master. Assad Alexander male 0.42 0 1 2625 8.5167 NaN C
    755 756 1 2 Hamalainen, Master. Viljo male 0.67 1 1 250649 14.5000 NaN S
    644 645 1 3 Baclini, Miss. Eugenie female 0.75 2 1 2666 19.2583 NaN C
    469 470 1 3 Baclini, Miss. Helene Barbara female 0.75 2 1 2666 19.2583 NaN C
    78 79 1 2 Caldwell, Master. Alden Gates male 0.83 0 2 248738 29.0000 NaN S
    titanic.sort_values(by=['Pclass','Age'],ascending=False).head()
    
    PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
    851 852 0 3 Svensson, Mr. Johan male 74.0 0 0 347060 7.7750 NaN S
    116 117 0 3 Connors, Mr. Patrick male 70.5 0 0 370369 7.7500 NaN Q
    280 281 0 3 Duane, Mr. Frank male 65.0 0 0 336439 7.7500 NaN Q
    483 484 1 3 Turkula, Mrs. (Hedwig) female 63.0 0 0 4134 9.5875 NaN S
    326 327 0 3 Nysveen, Mr. Johan Hansen male 61.0 0 0 345364 6.2375 NaN S

    7.2. Long to wide

    • Long data: the variables are not fully separated out; at least one variable's values repeat heavily and can be grouped into a few categories. The table is long and narrow overall: few variables, many observations.
    • Wide data: every variable has its own clearly separated column, and values do not repeat cyclically. The table is short and wide overall: many variables, fewer observations.
    • pivot()
      • parameters
        • columns: the column whose values become the new column headers
        • values: the column that supplies the cell values
    no2 = air_quality[air_quality['parameter']=='no2']
    no2_subset = no2.sort_index().groupby(['location']).head(2)
    no2_subset
    
    no2_subset.pivot(columns='location',values="value")
    
    location                   BETR801  FR04014  London Westminster
    date.utc
    2019-04-09 01:00:00+00:00     22.5     24.4                 NaN
    2019-04-09 02:00:00+00:00     53.5     27.4                67.0
    2019-04-09 03:00:00+00:00      NaN      NaN                67.0
    no2.pivot(columns='location',values='value').plot()
    
    <AxesSubplot:xlabel='date.utc'>
    
    • pivot_table()
      • use when multiple values need to be aggregated
    air_quality.pivot_table(
        values='value', index='location', columns='parameter',aggfunc="mean"
    )
    
    parameter                 no2       pm25
    location
    BETR801             26.950920  23.169492
    FR04014             29.374284        NaN
    London Westminster  29.740050  13.443568
    air_quality.pivot_table(
        values='value', 
        index='location', 
        columns='parameter',
        aggfunc="mean",
        margins=True,
        margins_name="Total"
    )
    
    parameter                 no2       pm25      Total
    location
    BETR801             26.950920  23.169492  24.982353
    FR04014             29.374284        NaN  29.374284
    London Westminster  29.740050  13.443568  21.491708
    Total               29.430316  14.386849  24.222743
    air_quality.groupby(['parameter','location']).mean()
    
                                       value
    parameter  location
    no2        BETR801             26.950920
               FR04014             29.374284
               London Westminster  29.740050
    pm25       BETR801             23.169492
               London Westminster  13.443568

    7.3. Wide to long

    no2_pivoted = no2.pivot(columns='location',values='value').reset_index()
    no2_pivoted
    
    location date.utc BETR801 FR04014 London Westminster
    0 2019-04-09 01:00:00+00:00 22.5 24.4 NaN
    1 2019-04-09 02:00:00+00:00 53.5 27.4 67.0
    2 2019-04-09 03:00:00+00:00 54.5 34.2 67.0
    3 2019-04-09 04:00:00+00:00 34.5 48.5 41.0
    4 2019-04-09 05:00:00+00:00 46.5 59.5 41.0
    ... ... ... ... ...
    1700 2019-06-20 20:00:00+00:00 NaN 21.4 NaN
    1701 2019-06-20 21:00:00+00:00 NaN 24.9 NaN
    1702 2019-06-20 22:00:00+00:00 NaN 26.5 NaN
    1703 2019-06-20 23:00:00+00:00 NaN 21.8 NaN
    1704 2019-06-21 00:00:00+00:00 NaN 20.0 NaN

    1705 rows × 4 columns

    no_2 = no2_pivoted.melt(id_vars='date.utc')
    no_2.head()
    
    date.utc location value
    0 2019-04-09 01:00:00+00:00 BETR801 22.5
    1 2019-04-09 02:00:00+00:00 BETR801 53.5
    2 2019-04-09 03:00:00+00:00 BETR801 54.5
    3 2019-04-09 04:00:00+00:00 BETR801 34.5
    4 2019-04-09 05:00:00+00:00 BETR801 46.5
    no_2 = no2_pivoted.melt(
        id_vars='date.utc',
        value_vars=["BETR801", "FR04014", "London Westminster"],
        value_name="NO_2",
        var_name="id_location",
    )
    no_2.head()
    

    8. How do I combine data from multiple tables?

    • concat() stacks tables along an axis

    • merge() joins tables on common columns

    Key points:

    • concat(): combines multiple tables row-wise or column-wise

    • merge(): use it for database-style join operations

    8.1 concat()

    import pandas as pd
    
    data1 = "data/air_quality_no2_long.csv"
    data2 = "data/air_quality_pm25_long.csv"
    
    air_quality_no2 = pd.read_csv(data1)
    air_quality_pm25 = pd.read_csv(data2)
    
    air_quality = pd.concat([air_quality_no2,air_quality_pm25], axis=0)
    air_quality.head()
    
    city country date.utc location parameter value unit
    0 Paris FR 2019-06-21 00:00:00+00:00 FR04014 no2 20.0 µg/m³
    1 Paris FR 2019-06-20 23:00:00+00:00 FR04014 no2 21.8 µg/m³
    2 Paris FR 2019-06-20 22:00:00+00:00 FR04014 no2 26.5 µg/m³
    3 Paris FR 2019-06-20 21:00:00+00:00 FR04014 no2 24.9 µg/m³
    4 Paris FR 2019-06-20 20:00:00+00:00 FR04014 no2 21.4 µg/m³
    print('Shape of the ``air_quality_pm25`` table: ', air_quality_pm25.shape)
    
    Shape of the ``air_quality_pm25`` table:  (1110, 7)
    
    print('Shape of the ``air_quality_no2`` table: ', air_quality_no2.shape)
    
    Shape of the ``air_quality_no2`` table:  (2068, 7)
    
    print('Shape of the resulting ``air_quality`` table: ', air_quality.shape)
    
    Shape of the resulting ``air_quality`` table:  (3178, 7)
    
    air_quality = air_quality.sort_values('date.utc')
    air_quality.head()
    
    city country date.utc location parameter value unit
    1109 London GB 2019-05-07 01:00:00+00:00 London Westminster pm25 8.0 µg/m³
    2067 London GB 2019-05-07 01:00:00+00:00 London Westminster no2 23.0 µg/m³
    1098 Antwerpen BE 2019-05-07 01:00:00+00:00 BETR801 no2 50.5 µg/m³
    1003 Paris FR 2019-05-07 01:00:00+00:00 FR04014 no2 25.0 µg/m³
    100 Antwerpen BE 2019-05-07 01:00:00+00:00 BETR801 pm25 12.5 µg/m³
    air_quality_ = pd.concat(
        [air_quality_pm25, air_quality_no2], 
        keys=["PM25", "NO2"]
    )
    air_quality_.head()
    
    city country date.utc location parameter value unit
    PM25 0 Antwerpen BE 2019-06-18 06:00:00+00:00 BETR801 pm25 18.0 µg/m³
    1 Antwerpen BE 2019-06-17 08:00:00+00:00 BETR801 pm25 6.5 µg/m³
    2 Antwerpen BE 2019-06-17 07:00:00+00:00 BETR801 pm25 18.5 µg/m³
    3 Antwerpen BE 2019-06-17 06:00:00+00:00 BETR801 pm25 16.0 µg/m³
    4 Antwerpen BE 2019-06-17 05:00:00+00:00 BETR801 pm25 7.5 µg/m³
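    The extra index level added by keys= makes each source table addressable afterwards; a minimal sketch with two tiny stand-in tables (values made up):

```python
import pandas as pd

# Two small stand-ins for the pm25/no2 tables (values made up)
pm25 = pd.DataFrame({"value": [18.0, 6.5]})
no2 = pd.DataFrame({"value": [20.0, 21.8]})

combined = pd.concat([pm25, no2], keys=["PM25", "NO2"])

# keys= adds an outer index level, so each source table stays addressable
pm25_part = combined.loc["PM25"]
```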

    8.2 Joining tables using a common identifier

    stations_coord = pd.read_csv("data/air_quality_stations.csv")
    stations_coord.head()
    
    location coordinates.latitude coordinates.longitude
    0 BELAL01 51.23619 4.38522
    1 BELHB23 51.17030 4.34100
    2 BELLD01 51.10998 5.00486
    3 BELLD02 51.12038 5.02155
    4 BELR833 51.32766 4.36226
    air_quality = pd.merge(air_quality, stations_coord, how="left", on="location")
    air_quality.head()
    
    city country date.utc location parameter value unit coordinates.latitude coordinates.longitude
    0 London GB 2019-05-07 01:00:00+00:00 London Westminster pm25 8.0 µg/m³ 51.49467 -0.13193
    1 London GB 2019-05-07 01:00:00+00:00 London Westminster no2 23.0 µg/m³ 51.49467 -0.13193
    2 Antwerpen BE 2019-05-07 01:00:00+00:00 BETR801 no2 50.5 µg/m³ 51.20966 4.43182
    3 Paris FR 2019-05-07 01:00:00+00:00 FR04014 no2 25.0 µg/m³ 48.83724 2.39390
    4 Paris FR 2019-05-07 01:00:00+00:00 FR04014 no2 25.0 µg/m³ 48.83722 2.39390
    air_quality_parameters = pd.read_csv("data/air_quality_parameters.csv")
    air_quality_parameters.head()
    
    id description name
    0 bc Black Carbon BC
    1 co Carbon Monoxide CO
    2 no2 Nitrogen Dioxide NO2
    3 o3 Ozone O3
    4 pm10 Particulate matter less than 10 micrometers in... PM10
    air_quality = pd.merge(air_quality, air_quality_parameters,
                           how='left', left_on='parameter', right_on='id')
    air_quality.head()
    
    city country date.utc location parameter value unit coordinates.latitude coordinates.longitude id description name
    0 London GB 2019-05-07 01:00:00+00:00 London Westminster pm25 8.0 µg/m³ 51.49467 -0.13193 pm25 Particulate matter less than 2.5 micrometers i... PM2.5
    1 London GB 2019-05-07 01:00:00+00:00 London Westminster no2 23.0 µg/m³ 51.49467 -0.13193 no2 Nitrogen Dioxide NO2
    2 Antwerpen BE 2019-05-07 01:00:00+00:00 BETR801 no2 50.5 µg/m³ 51.20966 4.43182 no2 Nitrogen Dioxide NO2
    3 Paris FR 2019-05-07 01:00:00+00:00 FR04014 no2 25.0 µg/m³ 48.83724 2.39390 no2 Nitrogen Dioxide NO2
    4 Paris FR 2019-05-07 01:00:00+00:00 FR04014 no2 25.0 µg/m³ 48.83722 2.39390 no2 Nitrogen Dioxide NO2

    9. How do I handle time series data?

    pandas handles time series very well.

    Key points

    • Valid date strings can be converted to datetime objects with the to_datetime function, or as part of a read_* function.

    • Datetime objects in pandas support calculations, logical operations and convenient date-related properties via the dt accessor.

    • A DatetimeIndex contains these date-related properties and supports convenient slicing.

    • Resampling is a powerful way to change the frequency of a time series.

    Data:

    import pandas as pd
    import matplotlib.pyplot as plt
    
    air_quality = pd.read_csv("data/air_quality_no2_long.csv")
    air_quality = air_quality.rename(columns={'date.utc': "datetime"})
    air_quality.head()
    
    city country datetime location parameter value unit
    0 Paris FR 2019-06-21 00:00:00+00:00 FR04014 no2 20.0 µg/m³
    1 Paris FR 2019-06-20 23:00:00+00:00 FR04014 no2 21.8 µg/m³
    2 Paris FR 2019-06-20 22:00:00+00:00 FR04014 no2 26.5 µg/m³
    3 Paris FR 2019-06-20 21:00:00+00:00 FR04014 no2 24.9 µg/m³
    4 Paris FR 2019-06-20 20:00:00+00:00 FR04014 no2 21.4 µg/m³
    air_quality.city.unique()
    
    array(['Paris', 'Antwerpen', 'London'], dtype=object)
    

    9.1. Using pandas datetime properties

    The datetime column is plain text; convert it to datetime objects

    air_quality['datetime'] = pd.to_datetime(air_quality['datetime'])
    air_quality['datetime'].dtype
    
    datetime64[ns, UTC]
    

    When reading the file, the parse_dates= parameter converts the textual dates into timestamps right away, so they can be handled as datetime objects.

    aq = pd.read_csv("data/air_quality_no2_long.csv", parse_dates=["date.utc"])
    aq['date.utc'].dtypes
    aq = aq.rename(columns={'date.utc': "datetime"})  # rename returns a new DataFrame, so assign it back
    aq['datetime']
    
    0      2019-06-21 00:00:00+00:00
    1      2019-06-20 23:00:00+00:00
    2      2019-06-20 22:00:00+00:00
    3      2019-06-20 21:00:00+00:00
    4      2019-06-20 20:00:00+00:00
                      ...           
    2063   2019-05-07 06:00:00+00:00
    2064   2019-05-07 04:00:00+00:00
    2065   2019-05-07 03:00:00+00:00
    2066   2019-05-07 02:00:00+00:00
    2067   2019-05-07 01:00:00+00:00
    Name: datetime, Length: 2068, dtype: datetime64[ns, UTC]
    

    Once converted to timestamps, time-based operations become available.

    For example, the start and end of the series:

    air_quality["datetime"].min(), air_quality["datetime"].max()
    
    (Timestamp('2019-05-07 01:00:00+0000', tz='UTC'),
     Timestamp('2019-06-21 00:00:00+0000', tz='UTC'))
    
    air_quality["datetime"].max() - air_quality["datetime"].min()
    
    Timedelta('44 days 23:00:00')
    

    The result is a pandas.Timedelta object, similar to datetime.timedelta from the standard Python library, which defines a time span.
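    A small sketch of Timedelta arithmetic, using the same start/end timestamps as the output above:

```python
import pandas as pd

start = pd.Timestamp("2019-05-07 01:00", tz="UTC")
end = pd.Timestamp("2019-06-21 00:00", tz="UTC")

span = end - start                 # a pandas.Timedelta
hours = span / pd.Timedelta("1h")  # express the span in hours
```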

    Remember the split-apply-combine pattern that groupby provides, from the statistics tutorial? Here we want a given statistic (e.g. the mean NO_2) for each weekday and each measurement location. To group on weekdays we use the weekday property of the pandas Timestamp (Monday=0, Sunday=6), which is also accessible through the dt accessor. Grouping on both location and weekday splits the mean calculation over each of these combinations.

    air_quality.groupby(
        [air_quality['datetime'].dt.weekday, 'location']
    )['value'].mean()
    
    datetime  location          
    0         BETR801               27.875000
              FR04014               24.856250
              London Westminster    23.969697
    1         BETR801               22.214286
              FR04014               30.999359
              London Westminster    24.885714
    2         BETR801               21.125000
              FR04014               29.165753
              London Westminster    23.460432
    3         BETR801               27.500000
              FR04014               28.600690
              London Westminster    24.780142
    4         BETR801               28.400000
              FR04014               31.617986
              London Westminster    26.446809
    5         BETR801               33.500000
              FR04014               25.266154
              London Westminster    24.977612
    6         BETR801               21.896552
              FR04014               23.274306
              London Westminster    24.859155
    Name: value, dtype: float64
    

    Similar to the previous case, we want to calculate a given statistic (e.g. the mean NO_2) for each hour of the day, and we can use the split-apply-combine approach again. For this case, we use the hour property of the pandas Timestamp, which is also accessible via the dt accessor.

    # Plot the figure (matplotlib is required)
    import matplotlib.pyplot as plt

    fig, axs = plt.subplots(figsize=(12, 4))
    
    air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(
        kind='bar', rot=0, ax=axs
    )
    
    plt.xlabel("Hour of the day")
    plt.ylabel("$NO_2 (µg/m^3)$")
    
    Text(0, 0.5, '$NO_2 (µg/m^3)$')
    

    9.2. Datetime as index

    In the reshaping tutorial ('7. reshaping'), pivot() was used to reshape the data table with each of the measurement locations as a separate column:

    no_2 = air_quality.pivot(index='datetime',columns='location',values='value')
    no_2.head()
    
    location                   BETR801  FR04014  London Westminster
    datetime
    2019-05-07 01:00:00+00:00     50.5     25.0                23.0
    2019-05-07 02:00:00+00:00     45.0     27.7                19.0
    2019-05-07 03:00:00+00:00      NaN     50.4                19.0
    2019-05-07 04:00:00+00:00      NaN     61.9                16.0
    2019-05-07 05:00:00+00:00      NaN     72.4                 NaN

    By pivoting the data, the datetime information became the index of the table. In general, setting a column as the index can be achieved with the set_index function.
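    A minimal sketch of set_index on a made-up two-row frame (the "datetime" and "value" columns here are illustrative, not the air-quality data):

```python
import pandas as pd

# Hypothetical miniature frame, used only to illustrate set_index.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2019-05-07 01:00", "2019-05-07 02:00"]),
        "value": [50.5, 45.0],
    }
)
indexed = df.set_index("datetime")  # the column becomes a DatetimeIndex
print(type(indexed.index))       # a pandas DatetimeIndex
print(list(indexed.index.hour))  # time attributes work directly on the index
```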

    Working with a datetime index (i.e. a DatetimeIndex) provides powerful functionality. For example, we do not need the dt accessor to get the time-series properties, but have these properties available on the index directly:

    no_2.index.year, no_2.index.weekday
    
    (Int64Index([2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
                 ...
                 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019],
                dtype='int64', name='datetime', length=1033),
     Int64Index([1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                 ...
                 3, 3, 3, 3, 3, 3, 3, 3, 3, 4],
                dtype='int64', name='datetime', length=1033))
    

    Other advantages are the convenient subsetting of time periods and the adapted time scale on plots. For example:

    no_2["2019-05-20":"2019-05-21"].plot();
    

    By providing a string that parses to a datetime, a specific subset of the data can be selected on a DatetimeIndex.

    9.3. Resample a time series to another frequency

    monthly_max = no_2.resample("M").max()
    monthly_max
    
    location                   BETR801  FR04014  London Westminster
    datetime
    2019-05-31 00:00:00+00:00     74.5     97.0                97.0
    2019-06-30 00:00:00+00:00     52.5     84.7                52.0

    The resample() method is similar to a groupby operation:

    • it provides a time-based grouping, by using a string (e.g. M, 5H, …) that defines the target frequency

    • it requires an aggregation function such as mean, max, …
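    The two bullets can be seen in a tiny sketch (the hourly values below are made up for illustration):

```python
import pandas as pd

# Four made-up hourly measurements starting at 01:00
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2019-05-07 01:00", periods=4, freq="h"),
)

# "2h" is the target-frequency string; mean() is the aggregation
# applied within each 2-hour bin.
print(s.resample("2h").mean())
```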

    When defined, the frequency of the time series is provided by the freq attribute:

    monthly_max.index.freq
    
    <MonthEnd>
    
    no_2.resample("D").mean().plot(style="-o", figsize=(10, 5));
    

    10. How to manipulate textual data?

    pandas has strong capabilities for manipulating string data as well.

    Key points:

    • String methods are available via the str accessor.

    • String methods work element-wise and can be used for conditional indexing.

    • The replace method is a convenient way to convert values according to a given mapping.

    The data:

    import pandas as pd
    
    titanic = pd.read_csv("data/titanic.csv")
    titanic.head()
    
       PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
    0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
    1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833   C85        C
    2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
    3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
    4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S

    Worked examples

    Q1: Make all name characters lowercase.

    titanic["Name"].str.lower()
    
    0                                braund, mr. owen harris
    1      cumings, mrs. john bradley (florence briggs th...
    2                                 heikkinen, miss. laina
    3           futrelle, mrs. jacques heath (lily may peel)
    4                               allen, mr. william henry
                                 ...                        
    886                                montvila, rev. juozas
    887                         graham, miss. margaret edith
    888             johnston, miss. catherine helen "carrie"
    889                                behr, mr. karl howell
    890                                  dooley, mr. patrick
    Name: Name, Length: 891, dtype: object
    

    To make each of the strings in the Name column lowercase, select the Name column (see the tutorial on subsetting data), add the str accessor and apply the lower method. As such, each of the strings is converted element-wise.

    Similar to datetime objects in the time-series tutorial having a dt accessor, a number of specialized string methods are available when using the str accessor. These methods generally have names matching the equivalent built-in string methods for single elements, but are applied element-wise on each of the values of the column (remember element-wise calculations?).

    Q2: Create a new column Surname that contains the surname of the passengers by extracting the part before the comma.

    titanic["Name"].str.split(",")
    
    0                             [Braund,  Mr. Owen Harris]
    1      [Cumings,  Mrs. John Bradley (Florence Briggs ...
    2                              [Heikkinen,  Miss. Laina]
    3        [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]
    4                            [Allen,  Mr. William Henry]
                                 ...                        
    886                             [Montvila,  Rev. Juozas]
    887                      [Graham,  Miss. Margaret Edith]
    888          [Johnston,  Miss. Catherine Helen "Carrie"]
    889                             [Behr,  Mr. Karl Howell]
    890                               [Dooley,  Mr. Patrick]
    Name: Name, Length: 891, dtype: object
    

    Using the Series.str.split() method, each of the values is returned as a list of 2 elements: the first element is the part before the comma and the second element is the part after the comma.

    titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)
    titanic['Surname']
    
    0         Braund
    1        Cumings
    2      Heikkinen
    3       Futrelle
    4          Allen
             ...    
    886     Montvila
    887       Graham
    888     Johnston
    889         Behr
    890       Dooley
    Name: Surname, Length: 891, dtype: object
    

    As we are only interested in the first part representing the surname (element 0), we can again use the str accessor and apply Series.str.get() to extract the relevant part. Indeed, these string functions can be chained to combine multiple steps at once!
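    A short sketch of such chaining on two made-up names (not the Titanic data):

```python
import pandas as pd

names = pd.Series(["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"])

# Three str-accessor calls chained in one expression:
# split on the comma, take element 0, then uppercase it.
surnames = names.str.split(",").str.get(0).str.upper()
print(surnames.tolist())  # ['BRAUND', 'ALLEN']
```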

    Q3: Extract the passenger data about the countess on board of the Titanic.

    titanic["Name"].str.contains("Countess")
    
    0      False
    1      False
    2      False
    3      False
    4      False
           ...  
    886    False
    887    False
    888    False
    889    False
    890    False
    Name: Name, Length: 891, dtype: bool
    
    titanic[titanic['Name'].str.contains('Countess')]
    
         PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch  Ticket  Fare Cabin Embarked Surname
    759          760         1       1  Rothes, the Countess. of (Lucy Noel Martha Dye...  female  33.0      0      0  110152  86.5   B77        S  Rothes

    The string method Series.str.contains() checks each of the values in the column Name and returns True (Countess is part of the name) or False (Countess is not part of the name) for each value. This output can be used to subselect the data using conditional (boolean) indexing, introduced in the subsetting of data tutorial. As there was only one countess on the Titanic, we get one row as a result.

    The Series.str.extract() method also accepts regular expressions.
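    A small sketch of extract (the pattern and names below are illustrative assumptions, not from the tutorial): each capture group in the regular expression becomes a column of the result.

```python
import pandas as pd

names = pd.Series(["Braund, Mr. Owen Harris", "Bonnell, Miss. Elizabeth"])

# One capture group: the title between ", " and the following "."
titles = names.str.extract(r",\s(\w+)\.")
print(titles[0].tolist())  # ['Mr', 'Miss']
```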

    Q4: Which passenger of the Titanic has the longest name?

    titanic['Name'].str.len()
    
    0      23
    1      51
    2      22
    3      44
    4      24
           ..
    886    21
    887    28
    888    40
    889    21
    890    19
    Name: Name, Length: 891, dtype: int64
    

    To get the longest name we first have to get the length of each of the names in the Name column. By using pandas string methods, the Series.str.len() function is applied to each of the names individually (element-wise).

    titanic['Name'].str.len().idxmax()
    
    307
    

    Next, we need to get the corresponding location, preferably the index label, in the table for which the name length is the largest. The idxmax() method does exactly that. It is not a string method but is applied to the integer lengths, so no str is used.

    titanic.loc[titanic['Name'].str.len().idxmax(),'Name']
    
    'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)'
    

    Based on the index name of the row (307) and the column (Name), we can make a selection using the loc operator, introduced in the subsetting tutorial.

    Q5: In the 'Sex' column, replace values of 'male' by 'M' and values of 'female' by 'F'.

    titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})
    titanic['Sex_short']
    
    0      M
    1      F
    2      F
    3      F
    4      M
          ..
    886    M
    887    F
    888    F
    889    M
    890    M
    Name: Sex_short, Length: 891, dtype: object
    

    Whereas replace() is not a string method, it provides a convenient way to use mappings or vocabularies to translate certain values. It requires a dictionary to define the mapping {from: to}.
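    For comparison, the element-wise string method str.replace could be chained to the same end, but it substitutes substrings, so the order of replacements matters here; a sketch (values made up to match the column):

```python
import pandas as pd

sex = pd.Series(["male", "female", "female", "male"])

# Dict-based replace matches whole values, so substring overlap is harmless.
short = sex.replace({"male": "M", "female": "F"})
print(short.tolist())  # ['M', 'F', 'F', 'M']

# str.replace substitutes substrings: replacing "male" first turns
# "female" into "feM", which the second replacement no longer matches.
wrong = sex.str.replace("male", "M", regex=False).str.replace("female", "F", regex=False)
print(wrong.tolist())  # ['M', 'feM', 'feM', 'M']
```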

  • Original article: https://www.cnblogs.com/watalo/p/16319286.html