• Data selection in pandas


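    pandas offers three ways to index data: [], .loc, and .iloc. Note: .ix has been deprecated since 0.20.0 and should no longer be used (it was removed entirely in pandas 1.0).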

    import pandas as pd
    import numpy as np
    

    In [5]:

    dates = pd.date_range("20170101",periods=6)
    df1 = pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=["A","B","C","D"])
    df1
    

    Out[5]:

    A B C D
    2017-01-01 0 1 2 3
    2017-01-02 4 5 6 7
    2017-01-03 8 9 10 11
    2017-01-04 12 13 14 15
    2017-01-05 16 17 18 19
    2017-01-06 20 21 22 23

    In [6]:

    Get a DataFrame column as a Series

    df1["A"]  # get a DataFrame column as a Series
    

    Out[6]:

    2017-01-01     0
    2017-01-02     4
    2017-01-03     8
    2017-01-04    12
    2017-01-05    16
    2017-01-06    20
    Freq: D, Name: A, dtype: int32
    

    In [7]:

    df1.A  # alternative: attribute access returns the same Series
    

    Out[7]:

    2017-01-01     0
    2017-01-02     4
    2017-01-03     8
    2017-01-04    12
    2017-01-05    16
    2017-01-06    20
    Freq: D, Name: A, dtype: int32
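
The two column accessors above are not always interchangeable. The sketch below uses a small made-up frame (the column names "count" and "unit price" are illustrative assumptions, not part of the original post) to show where attribute access stops working and bracket access is the safer choice.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one ordinary column name, one that collides with a
# DataFrame method, and one that is not a valid Python identifier.
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    columns=["A", "count", "unit price"],
)

by_bracket = df["A"]  # bracket access works for any column name
by_attr = df.A        # works here: "A" is a valid identifier with no name clash

# df.count resolves to the DataFrame.count method, not to the column.
count_is_method = callable(df.count)
count_column = df["count"]

# "unit price" contains a space, so only bracket access can reach it.
unit_price = df["unit price"]
```

Bracket access is also the only form that can create a new column, so it is a reasonable default whenever column names come from external data.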
    

    In [8]:

    Slice to get the first 2 rows

    df1[0:2]  # slice: get the first 2 rows
    

    Out[8]:

    A B C D
    2017-01-01 0 1 2 3
    2017-01-02 4 5 6 7

    In [9]:

    Select rows by index label

    df1["20170102":"20170104"]  # select rows by index label (both endpoints included)
    

    Out[9]:

    A B C D
    2017-01-02 4 5 6 7
    2017-01-03 8 9 10 11
    2017-01-04 12 13 14 15
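
The two slices above follow different endpoint rules: a positional slice excludes its stop, while a label slice includes both endpoints. A minimal sketch that rebuilds the same df1 and compares the two, with the explicit .iloc and .loc spellings added for comparison:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20170101", periods=6)
df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"]
)

by_position = df1[0:2]                  # positions 0 and 1; the stop is excluded
by_label = df1["20170102":"20170104"]   # labels; both endpoints are included

# The explicit indexers make the intent visible and return the same rows.
explicit_position = df1.iloc[0:2]
explicit_label = df1.loc["20170102":"20170104"]

print(len(by_position))  # 2
print(len(by_label))     # 3
```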

    In [11]:

    Select data by label

    # select data by label
    df1.loc["20170102"]
    

    Out[11]:

    A    4
    B    5
    C    6
    D    7
    Name: 2017-01-02 00:00:00, dtype: int32
    

    In [12]:

    Extract specific columns from a given row

    df1.loc["20170102",["A","C"]]  # extract specific columns from a given row
    

    Out[12]:

    A    4
    C    6
    Name: 2017-01-02 00:00:00, dtype: int32
    

    In [13]:

    df1.loc[:,["A","B"]]
    

    Out[13]:

    A B
    2017-01-01 0 1
    2017-01-02 4 5
    2017-01-03 8 9
    2017-01-04 12 13
    2017-01-05 16 17
    2017-01-06 20 21
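
Beyond single labels and lists, .loc also accepts label slices on both axes and can return a single cell. The following is a small sketch on the same df1; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20170101", periods=6)
df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"]
)

# Label slices include both endpoints, for rows and for columns.
block = df1.loc["20170102":"20170103", "B":"D"]

# A row label plus a column label yields a scalar.
single_value = df1.loc[pd.Timestamp("2017-01-02"), "B"]

print(block.shape)        # (2, 3)
print(int(single_value))  # 5
```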

    In [14]:

    Select data by position

    # select data by position
    df1.iloc[2]  # extract the third row (positions start at 0)
    

    Out[14]:

    A     8
    B     9
    C    10
    D    11
    Name: 2017-01-03 00:00:00, dtype: int32
    

    In [15]:

    df1.iloc[1:3,2:4]
    

    Out[15]:

    C D
    2017-01-02 6 7
    2017-01-03 10 11

    In [18]:

    Extract non-contiguous rows and columns

    # extract non-contiguous rows and columns
    df1.iloc[[1,2,4],[1,3]]
    

    Out[18]:

    B D
    2017-01-02 5 7
    2017-01-03 9 11
    2017-01-05 17 19
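
Positional indexing follows the usual Python conventions, so negative positions and stepped slices also work. A brief sketch on the same df1, with .iat shown as the scalar counterpart of .iloc:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20170101", periods=6)
df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"]
)

last_row = df1.iloc[-1]          # last row, returned as a Series
every_other = df1.iloc[::2, :2]  # rows at positions 0, 2, 4; first two columns
cell = df1.iat[1, 2]             # scalar at row position 1, column position 2

print(last_row.tolist())  # [20, 21, 22, 23]
print(int(cell))          # 6
```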

    In [20]:

    # mixed label and position selection (deprecated .ix)
    df1.ix[2:4,["A","C"]]
    
    c:\users\wuzs\appdata\local\programs\python\python36-32\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: 
    .ix is deprecated. Please use
    .loc for label based indexing or
    .iloc for positional indexing
    
    See the documentation here:
    http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
      
    c:\users\wuzs\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexing.py:808: FutureWarning: 
    .ix is deprecated. Please use
    .loc for label based indexing or
    .iloc for positional indexing
    
    See the documentation here:
    http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
      retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    

    Out[20]:

    A C
    2017-01-03 8 10
    2017-01-04 12 14

    In [23]:

    df1.ix["20170102":"20170104",2:4]
    
    c:\users\wuzs\appdata\local\programs\python\python36-32\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: 
    .ix is deprecated. Please use
    .loc for label based indexing or
    .iloc for positional indexing
    
    See the documentation here:
    http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
      """Entry point for launching an IPython kernel.
    

    Out[23]:

    C D
    2017-01-02 6 7
    2017-01-03 10 11
    2017-01-04 14 15
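
Because .ix was removed in pandas 1.0, the two cells above fail on current versions. One possible way to get the same results with .loc and .iloc is sketched below; the variable names are illustrative, and other spellings are equally valid.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20170101", periods=6)
df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"]
)

# Stand-in for df1.ix[2:4, ["A", "C"]]: positional rows, labelled columns.
pos_rows_label_cols = df1.iloc[2:4][["A", "C"]]

# The same selection in a single indexer: translate labels to positions first.
col_positions = df1.columns.get_indexer(["A", "C"])
same_with_iloc = df1.iloc[2:4, col_positions]

# Stand-in for df1.ix["20170102":"20170104", 2:4]: labelled rows,
# positional columns (translate positions to labels first).
label_rows_pos_cols = df1.loc["20170102":"20170104", df1.columns[2:4]]
```

Translating one axis so that both use the same kind of key keeps each selection unambiguous, which is the ambiguity that led to .ix being deprecated.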

    In [24]:

    Test the values in a column against a threshold

    # test which values in column A are greater than 6
    df1.A > 6
    

    Out[24]:

    2017-01-01    False
    2017-01-02    False
    2017-01-03     True
    2017-01-04     True
    2017-01-05     True
    2017-01-06     True
    Freq: D, Name: A, dtype: bool
    

    In [25]:

    df1[df1.A > 6]  # build a new DataFrame from the rows where the condition is True
    

    Out[25]:

    A B C D
    2017-01-03 8 9 10 11
    2017-01-04 12 13 14 15
    2017-01-05 16 17 18 19
    2017-01-06 20 21 22 23
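
Boolean masks like the one above can be combined and passed to .loc together with a column selection. A short sketch on the same df1; the thresholds are arbitrary values chosen for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20170101", periods=6)
df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"]
)

# Combine conditions with & (and) or | (or). Each comparison needs its own
# parentheses because & and | bind more tightly than > and <.
mask = (df1["A"] > 6) & (df1["B"] < 20)

# .loc accepts the mask for rows and a label list for columns.
filtered = df1.loc[mask, ["A", "D"]]

either = df1[(df1["A"] < 4) | (df1["D"] > 20)]  # first and last row
in_set = df1[df1["A"].isin([0, 8])]             # rows whose A is 0 or 8

print(filtered.values.tolist())  # [[8, 11], [12, 15], [16, 19]]
```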

  • Original post: https://www.cnblogs.com/mrwuzs/p/11325021.html