• Learning Data Analysis with Python (Study Notes, Part 4)


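    Source material:

    Python for Data Analysis, 2nd Edition        

    This is the book.

    Working through it entirely in English is tiring, and whether these notes are correct depends on Baidu Translate.

    Previously:

    A roundup of assorted methods:

      https://www.cnblogs.com/baili-luoyun/p/10250177.html

        Summary: this post covers the basics of getting started with pandas.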

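          I: pandas has two main data structures

            Series and DataFrame

          II: Series

            1: Definition

      A Series is a one-dimensional array-like object. It consists of a sequence of values (of any NumPy data type) and an associated array of data labels, called its index.

            2: Representation

      The string representation of a Series shows the index on the left and the values on the right.

            3: Creating a one-dimensional Series

    import numpy as np
    import pandas as pd

    obj = pd.Series([4, 5, 6, 7, 8])        # create a one-dimensional Series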
    print(obj)
    
    
    print(obj.index)
    print(obj.values)
    >>>>>>>>>
    0    4
    1    5
    2    6
    3    7
    4    8
    dtype: int64
    RangeIndex(start=0, stop=5, step=1)
    [4 5 6 7 8]

            4: Accessing values through the index

              1>: Single label

    obj1 = pd.Series([4, 6, -7, -8], index=['d', 'a', 'b', 'c'])  # supply custom index labels
    print(obj1)
    # select a single value by its index label
    print(obj1['d'])
    >>>>
    d    4
    a    6
    b   -7
    c   -8
    dtype: int64
    4

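              2>: Multiple labels

    # select several values with a list of labels
    print(obj1)  # the full Series, for comparison
    print(obj1[['d', 'a', 'c']])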
    >>>>
    d    4
    a    6
    b   -7
    c   -8
    dtype: int64
    d    4
    a    6
    c   -8
    dtype: int64

              3>: Boolean filtering

    print(obj1)  # the full Series, for comparison
    print(obj1[obj1 < 0])
    >>>>
    d    4
    a    6
    b   -7
    c   -8
    dtype: int64
    b   -7
    c   -8
    dtype: int64
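
The filter above works in two steps, which can be made explicit. The following is a minimal sketch that rebuilds the same `obj1`; the names `mask` and `negatives` are my own and do not come from the book.

```python
import pandas as pd

obj1 = pd.Series([4, 6, -7, -8], index=['d', 'a', 'b', 'c'])

# The comparison yields a boolean Series with the same index as obj1.
mask = obj1 < 0
print(mask)

# Indexing with the mask keeps only the entries where the mask is True.
negatives = obj1[mask]
print(negatives)
```

Keeping the mask in its own variable makes it easy to inspect or reuse before filtering.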

              4>: Multiplication by a scalar

    print(obj1)  # the full Series, for comparison
    print(obj1 * 2)
    >>>>>>>>>>
    d    4
    a    6
    b   -7
    c   -8
    dtype: int64
    d     8
    a    12
    b   -14
    c   -16
    dtype: int64

             5>: Applying NumPy functions

    print(obj1)  # the full Series, for comparison
    print(np.exp(obj1))
    >>>>>
    d    4
    a    6
    b   -7
    c   -8
    dtype: int64
    d     54.598150
    a    403.428793
    b      0.000912
    c      0.000335
    dtype: float64

            6>: The index as a mapping (membership tests)

    print(obj1)  # the full Series, for comparison
    print('b' in obj1)
    print('e' in obj1)
    
    >>>>>
    d    4
    a    6
    b   -7
    c   -8
    dtype: int64
    True
    False
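
One thing worth checking here is that `in` tests the index labels, not the values. A small sketch, assuming the same `obj1` as above (the variable names are illustrative only):

```python
import pandas as pd

obj1 = pd.Series([4, 6, -7, -8], index=['d', 'a', 'b', 'c'])

label_found = 'b' in obj1          # 'b' is an index label
value_as_label = 4 in obj1         # 4 is a value, not a label
value_found = 4 in obj1.values     # search the underlying NumPy array instead

print(label_found, value_as_label, value_found)
```

This mirrors how `in` on a Python dict looks at keys rather than values.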

            5: Creating a Series from a dict

              1>: Building the Series

    sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
    obj3 = pd.Series(sdata)
    print(obj3)
    
    >>>>
    
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64

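              2>: Passing an explicit index along with the dict

    sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
    obj3 = pd.Series(sdata)
    print(obj3)
    # Pass an explicit index: labels absent from sdata (California) get NaN,
    # and keys not listed in the index (Utah) are left out.
    states = ['California', 'Ohio', 'Oregon', 'Texas']
    obj4 = pd.Series(sdata, index=states)
    
    print(obj4)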
    
    >>>>>>>>>>>>>>
    
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
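
The NaN for California comes from label alignment: pandas matches values by index label rather than by position. The same alignment also applies to arithmetic between two Series, as this sketch illustrates with the `obj3` and `obj4` defined above (the name `total` is mine).

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=states)

# Values are matched by label; a label missing from either side yields NaN.
total = obj3 + obj4
print(total)
```

California is missing from `obj3` and Utah from `obj4`, so both come out as NaN in the sum.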

              3>: Detecting missing data

    l = pd.isnull(obj4)     # True where the value is missing
    print(l)
    l2 = pd.notnull(obj4)   # True where the value is present
    print(l2)
    
    >>>>>>>>>>>>
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
    California    False
    Ohio           True
    Oregon         True
    Texas          True
    dtype: bool
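
The same checks are also available as methods on the Series itself, and the boolean result can be summed to count the gaps. A minimal sketch, rebuilding `obj4` so that it runs on its own (`missing` and `missing_count` are names I chose):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Method form, equivalent to pd.isnull(obj4).
missing = obj4.isnull()

# True counts as 1 when summed, so this is the number of missing entries.
missing_count = int(missing.sum())
print(missing_count)
```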

              4>: Naming the Series and its index

    obj4.name = 'population'
    obj4.index.name = 'state'
    print(obj4)
    >>>>>>>>>
    state
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    Name: population, dtype: float64

              5>: Replacing the index by assignment

    obj = pd.Series([4, 7, -6, 3])
    print(obj)
    obj.index = ['bob', 'Steve', 'jeff', 'Ryan']   # replace the index labels in place
    print(obj)
    >>>>>>>>>
    0    4
    1    7
    2   -6
    3    3
    dtype: int64
    
    
    bob      4
    Steve    7
    jeff    -6
    Ryan     3
    dtype: int64
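
Assigning to `index` only works when the number of new labels matches the number of values. The sketch below is my own check of that behaviour rather than an example from the book; `mismatch_rejected` is a hypothetical helper variable.

```python
import pandas as pd

obj = pd.Series([4, 7, -6, 3])

try:
    obj.index = ['bob', 'Steve']      # only 2 labels for 4 values
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print('rejected:', exc)

print(mismatch_rejected)
```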

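             III: DataFrame

        1: Definition

      

    A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, and so on). It has both a row index and a column index, and can be thought of as a dict of Series that all share the same index. Internally, the data is stored as one or more two-dimensional blocks rather than as a list, dict, or some other collection of one-dimensional arrays.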
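
The "dict of Series sharing one index" view can be tried out directly. This is a small sketch with made-up labels (`index`, `year`, `state`, and `year_col` are illustrative names, not from the book):

```python
import pandas as pd

index = ['one', 'two', 'three']
year = pd.Series([2000, 2001, 2002], index=index)
state = pd.Series(['Ohio', 'Ohio', 'Nevada'], index=index)

# Each dict key becomes a column; the shared labels become the row index.
frame = pd.DataFrame({'year': year, 'state': state})
print(frame)

# Pulling a column back out returns a Series carrying the frame's row index.
year_col = frame['year']
print(type(year_col).__name__)
```

The two columns hold different value types (integers and strings), which is the per-column typing described above.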
        2: Creation

    # build a DataFrame from a dict of equal-length lists
    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002, 2003],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
    frame = pd.DataFrame(data)
    print(frame)
    >>>>>>>>>
        state  year  pop
    0    Ohio  2000  1.5
    1    Ohio  2001  1.7
    2    Ohio  2002  3.6
    3  Nevada  2001  2.4
    4  Nevada  2002  2.8
    5  Nevada  2003  3.2
               2.1: head() returns only the first 5 rows
    print(frame.head())
    
    
    >>>>>>>
    
        state  year  pop
    0    Ohio  2000  1.5
    1    Ohio  2001  1.7
    2    Ohio  2002  3.6
    3  Nevada  2001  2.4
    4  Nevada  2002  2.8

        

         2.2: Setting the column order with the columns argument


    print(pd.DataFrame(data,columns=['year','pop','state']))
    >>>>>>>>
       year  pop   state
    0  2000  1.5    Ohio
    1  2001  1.7    Ohio
    2  2002  3.6    Ohio
    3  2001  2.4  Nevada
    4  2002  2.8  Nevada
    5  2003  3.2  Nevada

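          2.3: A requested column that is not in the data is filled with missing values (NaN)

    # a column name not found in data ('debt') is filled with NaN
    # columns: column labels
    # index: row labels
    frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                          index=['one', 'two', 'three', 'four', 'five', 'six'])
    print(frame2)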
    >>>>>>>>>>>>
           year   state  pop debt
    one    2000    Ohio  1.5  NaN
    two    2001    Ohio  1.7  NaN
    three  2002    Ohio  3.6  NaN
    four   2001  Nevada  2.4  NaN
    five   2002  Nevada  2.8  NaN
    six    2003  Nevada  3.2  NaN

       2.4: Getting the column labels

        

    print(frame2.columns)
    >>>>>>>>
    Index(['year', 'state', 'pop', 'debt'], dtype='object')

        2.5: Retrieving a column by dict-style key or by attribute

    # retrieve a single column
    print(frame2['state'])
    print(frame2.year)
    print('>>>>>>>>>>>>>>>>>>')
    print(frame2['year'])
    
    >>>>>>>>>>>>>>
    one        Ohio
    two        Ohio
    three      Ohio
    four     Nevada
    five     Nevada
    six      Nevada
    Name: state, dtype: object
    one      2000
    two      2001
    three    2002
    four     2001
    five     2002
    six      2003
    Name: year, dtype: int64
    >>>>>>>>>>>>>>>>>>
    one      2000
    two      2001
    three    2002
    four     2001
    five     2002
    six      2003
    Name: year, dtype: int64
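
Attribute access is convenient, but it only reaches a column whose name does not clash with an existing DataFrame attribute. In this data the `pop` column is such a clash, because `pop` is also a DataFrame method. A minimal sketch of the difference (variable names are mine):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop'])

# 'year' does not clash with any DataFrame attribute, so both forms agree.
same_year = frame2.year.equals(frame2['year'])

# 'pop' is also the name of a DataFrame method, so attribute access
# returns that method rather than the column.
pop_attr = frame2.pop
pop_column = frame2['pop']     # the bracket form always returns the column

print(same_year, callable(pop_attr), type(pop_column).__name__)
```

For that reason the bracket form is the safer default when column names are not known in advance.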

        2.6: Retrieving a whole row with the loc attribute

    print(frame2.loc['three'])
    >>>>>>>>>>
    year     2002
    state    Ohio
    pop       3.6
    debt      NaN
    Name: three, dtype: object

        2.7: Modifying a column by assignment

    frame2['debt'] = 16.5   # a scalar is broadcast to every row
    print(frame2)
    >>>>>>>>
           year   state  pop  debt
    one    2000    Ohio  1.5  16.5
    two    2001    Ohio  1.7  16.5
    three  2002    Ohio  3.6  16.5
    four   2001  Nevada  2.4  16.5
    five   2002  Nevada  2.8  16.5
    six    2003  Nevada  3.2  16.5

       2.8: Assigning an array of values to a column

    frame2['debt'] = np.arange(6.)   # the array length must match the number of rows
    print(frame2)
    >>>>>>>>>>
           year   state  pop  debt
    one    2000    Ohio  1.5   0.0
    two    2001    Ohio  1.7   1.0
    three  2002    Ohio  3.6   2.0
    four   2001  Nevada  2.4   3.0
    five   2002  Nevada  2.8   4.0
    six    2003  Nevada  3.2   5.0
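
When a list or array is assigned to a column, its length has to equal the number of rows. The sketch below is my own check on a smaller, made-up frame (`frame` and `short_array_rejected` are hypothetical names):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001, 2002], 'pop': [1.5, 1.7, 3.6]})

frame['debt'] = np.arange(3.)        # 3 values for 3 rows: accepted

try:
    frame['debt'] = np.arange(2.)    # 2 values for 3 rows
    short_array_rejected = False
except ValueError as exc:
    short_array_rejected = True
    print('rejected:', exc)

print(short_array_rejected)
```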

      2.9: Assigning a Series to a column

    print(frame2)
    print(">>>>>>>>>>>>")
    val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
    print(val)
    print(">>>>>>>>>>>>>>")
    frame2['debt'] = val   # values are aligned to the row labels; unmatched rows get NaN
    print(frame2)
    >>>>>>>>>>>>>>>>>>>>>
           year   state  pop debt
    one    2000    Ohio  1.5  NaN
    two    2001    Ohio  1.7  NaN
    three  2002    Ohio  3.6  NaN
    four   2001  Nevada  2.4  NaN
    five   2002  Nevada  2.8  NaN
    six    2003  Nevada  3.2  NaN
    >>>>>>>>>>>>
    two    -1.2
    four   -1.5
    five   -1.7
    dtype: float64
    >>>>>>>>>>>>>>
           year   state  pop  debt
    one    2000    Ohio  1.5   NaN
    two    2001    Ohio  1.7  -1.2
    three  2002    Ohio  3.6   NaN
    four   2001  Nevada  2.4  -1.5
    five   2002  Nevada  2.8  -1.7
    six    2003  Nevada  3.2   NaN
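
The alignment also works in the other direction: a label in the Series that has no matching row in the DataFrame is ignored, as far as I can tell. A minimal sketch on a smaller, made-up frame (the label 'seven' and the value 9.9 are invented for the test):

```python
import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001, 2002]},
                     index=['one', 'two', 'three'])

# 'two' exists in the frame; 'seven' does not.
val = pd.Series([-1.2, 9.9], index=['two', 'seven'])
frame['debt'] = val

print(frame)
```

Only row 'two' receives a value; the other rows are NaN and no 'seven' row is created.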

       2.10: Creating a column from a boolean expression

    frame2['eastern'] = frame2.state == 'Ohio'   # True where state is 'Ohio'
    print(frame2)
    >>>>>>>>
           year   state  pop debt  eastern
    one    2000    Ohio  1.5  NaN     True
    two    2001    Ohio  1.7  NaN     True
    three  2002    Ohio  3.6  NaN     True
    four   2001  Nevada  2.4  NaN    False
    five   2002  Nevada  2.8  NaN    False
    six    2003  Nevada  3.2  NaN    False
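
A column added this way can be removed again with `del`, in the same way as deleting a key from a dict. A small sketch on a made-up two-row frame (`before` and `after` are names I chose):

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'year': [2000, 2001]})

frame['eastern'] = frame['state'] == 'Ohio'   # boolean column, as above
before = frame.columns.tolist()

del frame['eastern']                          # remove the column in place
after = frame.columns.tolist()

print(before)
print(after)
```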

      

      

  • Original post: https://www.cnblogs.com/baili-luoyun/p/10268364.html