• Handling NaN values in pandas data


    Use dropna() to drop rows or columns that contain NaN

    import numpy as np
    import pandas as pd

    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    # Set two cells to NaN
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # axis=0 drops rows; how='any' drops a row if it contains at least one NaN
    print(df.dropna(axis=0, how='any'))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                 A     B     C   D
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
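
    The example above drops rows. As a minimal sketch of the other common dropna() options (axis=1, how='all' and subset), the code below rebuilds the same frame. It is created as float here, which is an assumption made only so that assigning NaN needs no dtype change. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data, so that assigning NaN does not change the dtype.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# axis=1 drops columns instead of rows: B and C each contain a NaN.
cols_dropped = df.dropna(axis=1, how='any')

# how='all' drops a row only when every value in it is NaN (no such row here).
rows_all = df.dropna(axis=0, how='all')

# subset restricts the NaN check to the listed columns.
rows_subset = df.dropna(subset=['B'])

print(cols_dropped.columns.tolist())  # ['A', 'D']
print(len(rows_all))                  # 6
print(len(rows_subset))               # 5
```

    dropna() returns a new DataFrame in each case; df itself is left unchanged.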

    Use fillna() to replace NaN values

    import numpy as np
    import pandas as pd

    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    # Set two cells to NaN
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # Replace every NaN with 0
    print(df.fillna(value=0))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                 A     B     C   D
    2018-03-10   0   0.0   2.0   3
    2018-03-11   4   5.0   0.0   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
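
    A single constant is not the only option for fillna(). The sketch below shows three alternatives on the same frame: a per-column dict, a forward fill with ffill(), and filling with each column's mean. The replacement values -1 and -2 are arbitrary, chosen only for illustration, and the frame is built as float as an assumption to avoid a dtype change.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data, so that assigning NaN does not change the dtype.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# A dict gives a different replacement value per column (keys are column names).
filled_dict = df.fillna(value={'B': -1, 'C': -2})

# Forward fill copies the last valid value down each column.
# B's NaN is in the first row, so it has nothing to copy and stays NaN.
filled_ffill = df.ffill()

# Passing a Series fills each column with the matching value, here the column mean.
filled_mean = df.fillna(df.mean())

print(filled_dict.loc['2018-03-10', 'B'])   # -1.0
print(filled_ffill.loc['2018-03-11', 'C'])  # 2.0
print(filled_mean.loc['2018-03-10', 'B'])   # 13.0
```

    Which strategy is appropriate depends on the data: forward fill suits ordered series such as this date index, while a mean fill ignores ordering.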

    Use isnull() to check whether data is missing

    import numpy as np
    import pandas as pd

    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    # Set two cells to NaN
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # The result is a boolean frame: True where the value is NaN, False otherwise
    print(pd.isnull(df))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                    A      B      C      D
    2018-03-10  False   True  False  False
    2018-03-11  False  False   True  False
    2018-03-12  False  False  False  False
    2018-03-13  False  False  False  False
    2018-03-14  False  False  False  False
    2018-03-15  False  False  False  False
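
    Because True counts as 1 when summed, the boolean frame from isnull() can be turned into missing-value counts. The following is a small sketch on the same data (built as float, an assumption made to avoid a dtype change); notnull() is the complement of isnull().

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data, so that assigning NaN does not change the dtype.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# Number of NaN values in each column.
nan_per_column = df.isnull().sum()

# Total number of NaN values in the whole frame.
nan_total = int(nan_per_column.sum())

# Number of valid (non-NaN) values in each column.
valid_per_column = df.notnull().sum()

print(nan_per_column.tolist())    # [0, 1, 1, 0]
print(nan_total)                  # 2
print(valid_per_column.tolist())  # [6, 5, 5, 6]
```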

    Check whether the data contains any NaN values

    import numpy as np
    import pandas as pd

    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    # Set two cells to NaN
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # True if at least one value anywhere in the frame is NaN
    print(np.any(df.isnull()))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
    True
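
    The same check can be narrowed to find where the NaN values are. As a sketch on the same data (built as float, an assumption made to avoid a dtype change), the code below reduces the boolean mask over the underlying NumPy array, per column, and per row. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Assumption: float data, so that assigning NaN does not change the dtype.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

mask = df.isnull()

# Whole frame: reduce over the underlying NumPy array to get one boolean.
has_nan = bool(mask.values.any())

# Per column: which columns contain at least one NaN.
cols_with_nan = mask.any(axis=0)

# Per row: select only the rows that contain at least one NaN.
rows_with_nan = df[mask.any(axis=1)]

print(has_nan)                 # True
print(cols_with_nan.tolist())  # [False, True, True, False]
print(len(rows_with_nan))      # 2
```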

  • Original article: https://www.cnblogs.com/sea-stream/p/10319470.html