• 3-1 Pandas: Overview


     The data used in the Pandas chapters can be downloaded from the following link:

    https://files.cnblogs.com/files/AI-robort/Titanic_Data-master.zip

               Pandas: a library for data analysis and processing

    In [1]:
    import pandas as pd
    
    In [4]:
    df=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
    
     

    .head(): returns the first rows of the data (5 by default); pass a number to choose how many rows to return

    In [5]:
    df.head(6)
    
    Out[5]:
     
       PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
    0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
    1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
    2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
    3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
    4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
    5  6            0         3       Moran, Mr. James                                   male    NaN   0      0      330877            8.4583   NaN    Q
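
    A minimal sketch of how .head() behaves, using a small hand-made DataFrame (the `toy` frame and its values are made up for illustration and are not part of the Titanic data). With no argument it returns the first 5 rows; with an argument n it returns the first n rows.

```python
import pandas as pd

# Hypothetical 8-row frame, used only to illustrate head()
toy = pd.DataFrame({'id': range(1, 9), 'score': [3, 1, 4, 1, 5, 9, 2, 6]})

first_default = toy.head()        # no argument: first 5 rows
first_three = toy.head(3)         # first 3 rows
all_but_last_two = toy.head(-2)   # negative n: everything except the last 2 rows

print(len(first_default), len(first_three), len(all_but_last_two))
```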
     

    .info(): prints a summary of the DataFrame (index range, column names, non-null counts, dtypes, and memory usage)

    In [6]:
    df.info()
    
     
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
    PassengerId    891 non-null int64
    Survived       891 non-null int64
    Pclass         891 non-null int64
    Name           891 non-null object
    Sex            891 non-null object
    Age            714 non-null float64
    SibSp          891 non-null int64
    Parch          891 non-null int64
    Ticket         891 non-null object
    Fare           891 non-null float64
    Cabin          204 non-null object
    Embarked       889 non-null object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 83.6+ KB
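
    The non-null counts above show which columns contain missing values: Age has 891 - 714 = 177 missing entries, Cabin 687, and Embarked 2. As a sketch, the same counts can be computed directly with isna().sum() and count(); the `toy` frame below is a made-up stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0],
    'Cabin': [None, 'C85', None, 'C123'],
    'Fare': [7.25, 71.2833, 7.925, 53.1],
})

missing_per_column = toy.isna().sum()   # number of missing values per column
non_null_per_column = toy.count()       # the non-null counts that info() reports

print(missing_per_column)
```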
    
     

    Inspect the table's attributes and details

    In [7]:
    df.index  # the index (row labels)
    
    Out[7]:
    RangeIndex(start=0, stop=891, step=1)
    In [8]:
    df.columns  # the name of each column
    
    Out[8]:
    Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
           'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
          dtype='object')
    In [9]:
    df.dtypes  # the data type of each column
    
    Out[9]:
    PassengerId      int64
    Survived         int64
    Pclass           int64
    Name            object
    Sex             object
    Age            float64
    SibSp            int64
    Parch            int64
    Ticket          object
    Fare           float64
    Cabin           object
    Embarked        object
    dtype: object
    In [10]:
    df.values  # the data as a NumPy array, one inner array per row
    
    Out[10]:
    array([[1, 0, 3, ..., 7.25, nan, 'S'],
           [2, 1, 1, ..., 71.2833, 'C85', 'C'],
           [3, 1, 3, ..., 7.925, nan, 'S'],
           ...,
           [889, 0, 3, ..., 23.45, nan, 'S'],
           [890, 1, 1, ..., 30.0, 'C148', 'C'],
           [891, 0, 3, ..., 7.75, nan, 'Q']], dtype=object)
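
    The attributes above can be tried on any DataFrame. The following sketch uses a made-up two-column frame (`toy` is hypothetical) and also shows `shape` and `to_numpy()`, which returns the same data as `.values`.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the attributes
toy = pd.DataFrame({'name': ['a', 'b', 'c'], 'value': [1.5, 2.5, 3.5]})

n_rows, n_cols = toy.shape             # (number of rows, number of columns)
column_names = list(toy.columns)       # column labels as a plain list
dtype_of_value = toy.dtypes['value']   # dtype of a single column
as_array = toy.to_numpy()              # same data as toy.values

print(n_rows, n_cols, column_names, dtype_of_value, as_array.shape)
```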
     

    Create a DataFrame yourself

    In [11]:
    data={'country':['aaa','bbb','ccc'],'population':[10,12,14]}
    df_data=pd.DataFrame(data)
    df_data
    
    Out[11]:
     
       country  population
    0  aaa      10
    1  bbb      12
    2  ccc      14
    In [12]:
    df_data.info()
    
     
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 2 columns):
    country       3 non-null object
    population    3 non-null int64
    dtypes: int64(1), object(1)
    memory usage: 128.0+ bytes
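
    A dict of columns is only one way to build a DataFrame. As a sketch with the same sample values, the frame can also be built from a list of row dicts, or with explicit row labels passed through the `index` argument (the names `records`, `df_records` and `df_labeled` are chosen here for illustration).

```python
import pandas as pd

# One dict per row instead of one list per column
records = [
    {'country': 'aaa', 'population': 10},
    {'country': 'bbb', 'population': 12},
    {'country': 'ccc', 'population': 14},
]
df_records = pd.DataFrame(records)

# Same data with the country names used as row labels
df_labeled = pd.DataFrame(
    {'population': [10, 12, 14]},
    index=['aaa', 'bbb', 'ccc'],
)

print(df_labeled.loc['bbb', 'population'])
```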
    
    In [15]:
    age = df['Age']  # select a single column
    age[:5]  # show the first 5 rows
    
    Out[15]:
    0    22.0
    1    38.0
    2    26.0
    3    35.0
    4    35.0
    Name: Age, dtype: float64
     

    Series: a single row or column of a DataFrame

    In [16]:
    age.index
    
    Out[16]:
    RangeIndex(start=0, stop=891, step=1)
    In [17]:
    age.values[:5]
    
    Out[17]:
    array([22., 38., 26., 35., 35.])
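
    To make the row/column statement concrete, here is a small sketch on a made-up frame (`toy` is hypothetical): selecting a column gives a Series indexed by the row labels, while selecting a row with `.iloc` gives a Series indexed by the column names.

```python
import pandas as pd

# Hypothetical frame with two numeric columns
toy = pd.DataFrame({'Age': [22.0, 38.0, 26.0], 'Fare': [7.25, 71.2833, 7.925]})

column = toy['Age']   # one column -> Series indexed by the row labels
row = toy.iloc[0]     # one row    -> Series indexed by the column names

print(type(column).__name__, list(column.index))
print(type(row).__name__, list(row.index))
```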
    In [18]:
    df.head()
    
    Out[18]:
     
       PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
    0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
    1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
    2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
    3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
    4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
    In [19]:
    df['Age'][:5]
    
    Out[19]:
    0    22.0
    1    38.0
    2    26.0
    3    35.0
    4    35.0
    Name: Age, dtype: float64
     

    Change the index

    In [20]:
    df=df.set_index('Name')
    df.head()
    
    Out[20]:
     
                                                         PassengerId  Survived  Pclass  Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
    Name
    Braund, Mr. Owen Harris                              1            0         3       male    22.0  1      0      A/5 21171         7.2500   NaN    S
    Cumings, Mrs. John Bradley (Florence Briggs Thayer)  2            1         1       female  38.0  1      0      PC 17599          71.2833  C85    C
    Heikkinen, Miss. Laina                               3            1         3       female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
    Futrelle, Mrs. Jacques Heath (Lily May Peel)         4            1         1       female  35.0  1      0      113803            53.1000  C123   S
    Allen, Mr. William Henry                             5            0         3       male    35.0  0      0      373450            8.0500   NaN    S
    In [21]:
    df['Age'][:5]
    
    Out[21]:
    Name
    Braund, Mr. Owen Harris                                22.0
    Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
    Heikkinen, Miss. Laina                                 26.0
    Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
    Allen, Mr. William Henry                               35.0
    Name: Age, dtype: float64
    In [25]:
    age=df['Age']
    age[:5]
    
    Out[25]:
    Name
    Braund, Mr. Owen Harris                                22.0
    Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
    Heikkinen, Miss. Laina                                 26.0
    Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
    Allen, Mr. William Henry                               35.0
    Name: Age, dtype: float64
    In [26]:
    age['Allen, Mr. William Henry']  # look up the value by its index label
    
    Out[26]:
    35.0
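
    A short sketch of the same index change on a made-up frame (the names and ages in `toy` are hypothetical). It shows that `set_index` returns a new DataFrame, that `.loc` looks up values by label, and that `reset_index` turns the index back into an ordinary column.

```python
import pandas as pd

# Hypothetical frame; the names are placeholders
toy = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid'],
    'Age': [22.0, 38.0, 26.0],
})

by_name = toy.set_index('Name')       # returns a new frame; toy is unchanged
bob_age = by_name.loc['Bob', 'Age']   # label-based lookup: row 'Bob', column 'Age'
restored = by_name.reset_index()      # move 'Name' back into a regular column

print(bob_age)
```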
    In [27]:
    age=age+10
    age[:5]
    
    Out[27]:
    Name
    Braund, Mr. Owen Harris                                32.0
    Cumings, Mrs. John Bradley (Florence Briggs Thayer)    48.0
    Heikkinen, Miss. Laina                                 36.0
    Futrelle, Mrs. Jacques Heath (Lily May Peel)           45.0
    Allen, Mr. William Henry                               45.0
    Name: Age, dtype: float64
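
    Note that `age = age + 10` builds a new Series and rebinds the name `age`; the Age column inside `df` keeps its original values. A minimal sketch on a made-up frame (`toy` and `older` are hypothetical names):

```python
import pandas as pd

# Hypothetical frame with a single numeric column
toy = pd.DataFrame({'Age': [22.0, 38.0, 26.0]})

age = toy['Age']
older = age + 10          # element-wise addition; the result is a new Series

print(list(older))        # values shifted by 10
print(list(toy['Age']))   # the DataFrame column is untouched
```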
     

    Summary statistics of the values (age here is the Series after 10 was added above)

    In [28]:
    age.mean()
    
    Out[28]:
    39.69911764705882
    In [29]:
    age.max()
    
    Out[29]:
    90.0
    In [30]:
    age.min()
    
    Out[30]:
    10.42
    In [31]:
    df.describe()  # basic summary statistics for every numeric column at once
    
    Out[31]:
     
           PassengerId  Survived    Pclass      Age         SibSp       Parch       Fare
    count  891.000000   891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
    mean   446.000000   0.383838    2.308642    29.699118   0.523008    0.381594    32.204208
    std    257.353842   0.486592    0.836071    14.526497   1.102743    0.806057    49.693429
    min    1.000000     0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
    25%    223.500000   0.000000    2.000000    20.125000   0.000000    0.000000    7.910400
    50%    446.000000   0.000000    3.000000    28.000000   0.000000    0.000000    14.454200
    75%    668.500000   1.000000    3.000000    38.000000   1.000000    0.000000    31.000000
    max    891.000000   1.000000    3.000000    80.000000   8.000000    6.000000    512.329200
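
    In the table above the Age count is 714 rather than 891 because missing values are skipped, and text columns such as Name or Sex do not appear because describe() summarizes only numeric columns by default. A minimal sketch of both behaviours on a made-up frame (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing age and one text column
toy = pd.DataFrame({
    'Age': [20.0, 30.0, np.nan, 40.0],
    'Sex': ['male', 'female', 'female', 'male'],
})

summary = toy.describe()                  # numeric columns only by default
age_count = summary.loc['count', 'Age']   # the NaN entry is not counted
age_mean = summary.loc['mean', 'Age']     # mean of the non-missing values

print(age_count, age_mean)
```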
  • Original article: https://www.cnblogs.com/AI-robort/p/11636703.html