• Python pandas DataFrame operations


    1. Create a DataFrame from a dictionary

    >>> import pandas as pd
    >>> dict1 = {'col1':[1,2,5,7],'col2':['a','b','c','d']}
    >>> df = pd.DataFrame(dict1)
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     5    c
    3     7    d

    2. Create a DataFrame from lists (first combine the lists into a dictionary, then convert the dictionary to a DataFrame)

    >>> lista = [1,2,5,7]
    >>> listb = ['a','b','c','d']
    >>> df = pd.DataFrame({'col1':lista,'col2':listb})
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     5    c
    3     7    d
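
    As an alternative to building a dictionary first, the two lists can be zipped into rows. This is a minimal sketch, not part of the original post; the variable name `rows` is an assumption made for illustration.

```python
import pandas as pd

lista = [1, 2, 5, 7]
listb = ['a', 'b', 'c', 'd']

# zip pairs the i-th elements of both lists, so each tuple becomes one row
rows = list(zip(lista, listb))
df = pd.DataFrame(rows, columns=['col1', 'col2'])
print(df)
```

    The result has the same shape and column names as the dictionary-based version above.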

    3. Create a DataFrame from lists, specifying data and columns

    >>> a = ['001','zhangsan','M']
    >>> b = ['002','lisi','F']
    >>> c = ['003','wangwu','M']
    >>> df = pd.DataFrame(data=[a,b,c],columns=['id','name','sex'])
    >>> df
        id      name sex
    0  001  zhangsan   M
    1  002      lisi   F
    2  003    wangwu   M

    4. Rename the columns from ['id','name','sex'] to ['Id','Name','Sex']

    >>> df.columns = ['Id','Name','Sex']
    >>> df
        Id      Name Sex
    0  001  zhangsan   M
    1  002      lisi   F
    2  003    wangwu   M
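
    Assigning to `df.columns` replaces every name at once and requires a list of the full length. When only some columns need renaming, `DataFrame.rename` with a mapping may be more convenient. The sketch below is an illustration, not the original author's code; the name `renamed` is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['001', 'zhangsan', 'M'], ['002', 'lisi', 'F'], ['003', 'wangwu', 'M']],
    columns=['id', 'name', 'sex'],
)

# rename returns a new DataFrame; columns missing from the mapping keep their names
renamed = df.rename(columns={'id': 'Id', 'name': 'Name'})
print(renamed.columns.tolist())
print(df.columns.tolist())  # the original DataFrame is unchanged
```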

    5. Reorder DataFrame columns and start the index numbering from 1

    See: http://www.cnblogs.com/huahuayu/p/8324755.html
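
    The linked post is not reproduced here. As a rough sketch of one common approach (assuming the heading refers to the row index, and using the hypothetical names `reordered` and `renumbered`), columns can be reordered by selecting them in the desired order, and the index can be replaced with a range that starts at 1:

```python
import pandas as pd

df = pd.DataFrame(
    data=[['001', 'zhangsan', 'M'], ['002', 'lisi', 'F'], ['003', 'wangwu', 'M']],
    columns=['Id', 'Name', 'Sex'],
)

# Selecting with a list of column names returns the columns in that order
reordered = df[['Name', 'Id', 'Sex']]

# Work on a copy, then replace the default 0-based index with 1..n
renumbered = reordered.copy()
renumbered.index = range(1, len(renumbered) + 1)
print(renumbered)
```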

    6. Generate a DataFrame of random int data with 10 rows and 4 columns

    >>> import pandas
    >>> import numpy
    >>> # randint(0, 100) draws integers from 0 (inclusive) to 100 (exclusive);
    >>> # size=(10, 4) gives 10 rows and 4 columns; columns sets the column names
    >>> df = pandas.DataFrame(numpy.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))
    >>> df
        A   B   C   D
    0  67  28  37  66
    1  21  27  43  37
    2  73  54  98  85
    3  40  78   4  93
    4  99  60  63  16
    5  48  46  24  61
    6  59  52  62  28
    7  20  74  36  64
    8  14  13  46  60
    9  18  44  70  36
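
    Because the values are random, each run prints different numbers. If repeatable output is wanted, one option is a seeded NumPy generator. This is a sketch only; the seed value 42 and the helper name `make_df` are arbitrary assumptions, not from the original post.

```python
import numpy as np
import pandas as pd

def make_df(seed):
    # A generator created with the same seed yields the same sequence of integers
    rng = np.random.default_rng(seed)
    # integers(0, 100) draws from 0 (inclusive) to 100 (exclusive)
    return pd.DataFrame(rng.integers(0, 100, size=(10, 4)), columns=list('ABCD'))

df1 = make_df(42)
df2 = make_df(42)
print(df1.equals(df2))  # True: both frames hold identical values
```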

    7. Use a time series as the index

    >>> df # originally the index is the auto-generated 0~9
        A   B   C   D
    0  31  25  45  67
    1  62  12  61  88
    2  79  36  20  97
    3  26  57  50  44
    4  24  12  50   1
    5   4  61  99  62
    6  40  47  52  27
    7  83  66  71   4
    8  58  59  25  62
    9  38  81  60   8
    >>> import pandas
    >>> dates = pandas.date_range('20180121',periods=10)
    >>> dates # 10 consecutive days starting from 2018-01-21
    DatetimeIndex(['2018-01-21', '2018-01-22', '2018-01-23', '2018-01-24',
                   '2018-01-25', '2018-01-26', '2018-01-27', '2018-01-28',
                   '2018-01-29', '2018-01-30'],
                  dtype='datetime64[ns]', freq='D')
    >>> df.index = dates # assign dates to the index
    >>> df
                 A   B   C   D
    2018-01-21  31  25  45  67
    2018-01-22  62  12  61  88
    2018-01-23  79  36  20  97
    2018-01-24  26  57  50  44
    2018-01-25  24  12  50   1
    2018-01-26   4  61  99  62
    2018-01-27  40  47  52  27
    2018-01-28  83  66  71   4
    2018-01-29  58  59  25  62
    2018-01-30  38  81  60   8
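
    Once the index holds dates, rows can be selected by date. The sketch below uses sequential values instead of random ones so the result is easy to check; the names `one_day` and `window` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180121', periods=10)
# Values 0..39 laid out in 10 rows of 4, indexed by the dates
df = pd.DataFrame(np.arange(40).reshape(10, 4), index=dates, columns=list('ABCD'))

# A single row, looked up by its timestamp label
one_day = df.loc[pd.Timestamp('2018-01-23')]

# A label-based date slice; both endpoints are included
window = df.loc['2018-01-22':'2018-01-24']
print(window)
```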

    8. SQL-like operations on a DataFrame

    Official pandas documentation: Comparison with SQL

    https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html

    [Python in Practice] Pandas: do data analysis as if you were writing SQL (Part 1)

    https://www.cnblogs.com/en-heng/category/778194.html 
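
    The linked pages cover this topic in depth. As a small illustration (a sketch, not taken from either link), a few common SQL statements and rough pandas equivalents are shown below, reusing the id/name/sex data from section 3. The result names `males`, `counts`, and `first_two` are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['001', 'zhangsan', 'M'], ['002', 'lisi', 'F'], ['003', 'wangwu', 'M']],
    columns=['id', 'name', 'sex'],
)

# SELECT id, name FROM df WHERE sex = 'M'
males = df.loc[df['sex'] == 'M', ['id', 'name']]

# SELECT sex, COUNT(*) FROM df GROUP BY sex
counts = df.groupby('sex').size()

# SELECT * FROM df ORDER BY name LIMIT 2
first_two = df.sort_values('name').head(2)

print(males)
print(counts)
print(first_two)
```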

  • Original article: https://www.cnblogs.com/huahuayu/p/8227494.html