• Handling NaN values in data with pandas


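    Use the dropna() function to drop rows or columns that contain NaN

    import numpy as np
    import pandas as pd

    # Six consecutive dates serve as the row index
    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    # Introduce two missing values
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # Drop every row (axis=0) that contains at least one NaN (how='any')
    print(df.dropna(axis=0, how='any'))

    Output: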

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                 A     B     C   D
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
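
The call above only covers the row-wise `how='any'` case. As a minimal sketch of the other common options of `dropna()`, the block below rebuilds the same frame (created as float up front, an assumption made here so that assigning NaN does not have to change any column's dtype) and tries `axis=1`, `how='all'`, and `subset`. The variable names `cols_dropped`, `rows_all`, and `rows_subset` are illustrative only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
# Float data from the start, so the NaN assignments below fit the dtype.
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# axis=1 drops columns instead of rows: B and C each contain a NaN.
cols_dropped = df.dropna(axis=1, how='any')

# how='all' drops a row only if every value in it is NaN; no such row exists here.
rows_all = df.dropna(axis=0, how='all')

# subset restricts the check to the listed columns: only the row with NaN in B is dropped.
rows_subset = df.dropna(subset=['B'])

print(cols_dropped.columns.tolist())  # ['A', 'D']
print(rows_all.shape)                 # (6, 4)
print(rows_subset.shape)              # (5, 4)
```

In all three cases `dropna()` returns a new DataFrame and leaves `df` itself unchanged.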

    Use the fillna() function to replace NaN values

    import numpy as np
    import pandas as pd

    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # Replace every NaN with 0
    print(df.fillna(value=0))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                 A     B     C   D
    2018-03-10   0   0.0   2.0   3
    2018-03-11   4   5.0   0.0   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
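
Filling with a single constant is not always what you want. The sketch below shows two other forms that `fillna()` accepts: a dict giving a different fill value per column, and a Series (here the column means from `df.mean()`) aligned on column names. The fill values `-1.0` and `0.0` are arbitrary choices for illustration, and the frame is created as float as an assumption of this sketch.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# A dict maps column name -> fill value, so each column can get its own replacement.
filled_const = df.fillna(value={'B': -1.0, 'C': 0.0})

# df.mean() skips NaN by default; passing the resulting Series fills each
# column with that column's own mean.
filled_mean = df.fillna(df.mean())

print(float(filled_const.iloc[0, 1]))  # -1.0
print(float(filled_const.iloc[1, 2]))  # 0.0
print(float(filled_mean.iloc[0, 1]))   # 13.0 (mean of 5, 9, 13, 17, 21)
```

Like `dropna()`, `fillna()` returns a new DataFrame by default, so `df` still contains its two NaN values afterwards.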

    Use the isnull() function to check whether values are missing

    import numpy as np
    import pandas as pd

    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # Boolean frame of the same shape: True where the value is NaN, False otherwise
    print(pd.isnull(df))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
                    A      B      C      D
    2018-03-10  False   True  False  False
    2018-03-11  False  False   True  False
    2018-03-12  False  False  False  False
    2018-03-13  False  False  False  False
    2018-03-14  False  False  False  False
    2018-03-15  False  False  False  False
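
The boolean frame is most useful when it is reduced or used as a filter. As a rough sketch (the names `nan_per_column`, `total_nan`, and `rows_with_nan` are illustrative, and the frame is created as float as an assumption of this example), the mask can be summed to count missing values and combined with `any(axis=1)` to select the affected rows:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

mask = df.isnull()

# True counts as 1 when summed, so this gives the number of NaN per column.
nan_per_column = mask.sum()

# Summing again collapses the per-column counts into one total.
total_nan = int(nan_per_column.sum())

# any(axis=1) is True for rows with at least one NaN; use it as a row filter.
rows_with_nan = df[mask.any(axis=1)]

print(nan_per_column.tolist())  # [0, 1, 1, 0]
print(total_nan)                # 2
print(len(rows_with_nan))       # 2
```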

    Check whether the data contains any NaN values

    import numpy as np
    import pandas as pd

    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
    df.iloc[0, 1] = np.nan
    df.iloc[1, 2] = np.nan
    print(df)
    # True if at least one value anywhere in the frame is NaN
    print(np.any(df.isnull()))

    Output:

                 A     B     C   D
    2018-03-10   0   NaN   2.0   3
    2018-03-11   4   5.0   NaN   7
    2018-03-12   8   9.0  10.0  11
    2018-03-13  12  13.0  14.0  15
    2018-03-14  16  17.0  18.0  19
    2018-03-15  20  21.0  22.0  23
    True
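
The same yes/no check can also be written with pandas and NumPy array methods only. The sketch below is one possible alternative, not the original author's approach: it reduces the underlying boolean array with `.values.any()`, or chains two `any()` calls (columns first, then the resulting Series). The variable names and the NaN-free comparison frame `clean` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# Reduce the underlying NumPy boolean array to a single value.
has_nan = bool(df.isnull().values.any())

# Equivalent chained form: per-column any(), then any() over that Series.
has_nan_chained = bool(df.isnull().any().any())

# A frame without missing values for comparison.
clean = df.fillna(0)
has_nan_clean = bool(clean.isnull().values.any())

print(has_nan)          # True
print(has_nan_chained)  # True
print(has_nan_clean)    # False
```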

  • Original article: https://www.cnblogs.com/sea-stream/p/10319470.html