• Data Cleaning and Preparation


    1. Sampling:

    import numpy as np
    import pandas as pd
    
    choices = pd.Series([5,7,-1,6,4])
    # replace=True samples with replacement, so n may exceed len(choices)
    draws = choices.sample(n=10, replace=True)
    draws
    

     OUT:

    0    5
    1    7
    3    6
    2   -1
    4    4
    4    4
    4    4
    2   -1
    3    6
    2   -1
    dtype: int64
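
Because `sample` draws at random, the output above will differ from run to run. The following is a minimal sketch of making the draw reproducible and of sampling without replacement. The seed value 42 is an arbitrary assumption and is not from the original post.

```python
import pandas as pd

choices = pd.Series([5, 7, -1, 6, 4])

# The same seed gives the same draw on every call (42 is an arbitrary choice).
draws_a = choices.sample(n=10, replace=True, random_state=42)
draws_b = choices.sample(n=10, replace=True, random_state=42)
print(draws_a.equals(draws_b))

# Without replacement, n == len(choices) yields a random permutation.
shuffled = choices.sample(n=len(choices), random_state=42)
print(sorted(shuffled.tolist()))

# Asking for more rows than exist without replacement raises ValueError.
try:
    choices.sample(n=10)
    oversample_failed = False
except ValueError as exc:
    oversample_failed = True
    print("ValueError:", exc)
```

Fixing `random_state` is mainly useful for tests and tutorials. For real resampling you would usually leave it unset or pass a seeded generator.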

    2. Splitting:
    x = 'a|b|c'
    x.split('|')

    ['a', 'b', 'c']
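
The example above splits a single Python string. When the strings live in a pandas Series, the same operation is available through the `.str` accessor. This is a small sketch, and the sample values in `s` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not from the original post.
s = pd.Series(['a|b|c', 'd|e|f'])

# Each element becomes a list of parts.
parts = s.str.split('|')
print(parts.tolist())

# expand=True spreads the parts across the columns of a DataFrame.
wide = s.str.split('|', expand=True)
print(wide)
```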

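    3. Getting unique values:
    l1 = ['a','a', 'c','b',  'b', 'c','c']
    # wrap the list in a Series: recent pandas versions deprecate passing a plain list to pd.unique
    pd.unique(pd.Series(l1))
    
    array(['a', 'c', 'b'], dtype=object)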
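
Note that the result above keeps values in order of first appearance rather than sorted. The sketch below contrasts this with `np.unique`, which sorts, and uses `value_counts` to show how often each value occurs.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'a', 'c', 'b', 'b', 'c', 'c'])

# pandas keeps values in order of first appearance.
in_order = list(s.unique())
print(in_order)

# NumPy returns the unique values sorted.
in_sorted_order = list(np.unique(s.to_numpy()))
print(in_sorted_order)

# value_counts reports how often each value occurs.
counts = s.value_counts()
print(counts['c'])
```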

    4. Selecting values by index:
    data = pd.DataFrame(np.arange(16).reshape((4,4)), index=['one','two','three','four'], columns=['a','b','c','d'])
    data

            a   b   c   d
    one     0   1   2   3
    two     4   5   6   7
    three   8   9  10  11
    four   12  13  14  15

    # get_indexer maps column labels to their integer positions
    data.columns.get_indexer(['c','a', 'b' ])

    array([2, 0, 1])

    data.iloc[1,data.columns.get_indexer(['c','a', 'b' ])] =88
    data

            a   b   c   d
    one     0   1   2   3
    two    88  88  88   7
    three   8   9  10  11
    four   12  13  14  15

    value = data.iloc[:2,data.columns.get_indexer(['c','a', 'b' ])]
    value

          c   a   b
    one   2   0   1
    two  88  88  88

    value2 = data.loc[['one','two'],['c','a', 'b' ]]
    value2

          c   a   b
    one   2   0   1
    two  88  88  88
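
One thing to watch when combining `get_indexer` with `iloc`: a label that is not found is reported as -1, and `iloc` treats -1 as "the last column" instead of raising an error. Here is a minimal sketch of guarding against that. The label `'z'` is a hypothetical missing column used only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['one', 'two', 'three', 'four'],
                    columns=['a', 'b', 'c', 'd'])

# 'z' is a hypothetical label that is not among the columns.
positions = data.columns.get_indexer(['c', 'z', 'a'])
print(positions)  # -1 marks the label that was not found

# Passing -1 to iloc would silently select the LAST column,
# so drop the misses before indexing.
found = positions[positions >= 0]
subset = data.iloc[:2, found]
print(subset)

# get_loc is the single-label counterpart and raises KeyError on a miss.
c_position = data.columns.get_loc('c')
print(c_position)
```

When the labels are known to exist, `data.loc[rows, labels]` is usually the simpler choice, as in the `value2` example above.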

    5. Filtering rows and columns

    data = pd.DataFrame(np.arange(16).reshape((4,4)), index=['one','two','three','four'], columns=['a','b','c','d'])
    data

            a   b   c   d
    one     0   1   2   3
    two     4   5   6   7
    three   8   9  10  11
    four   12  13  14  15

    data > 5

               a      b      c      d
    one    False  False  False  False
    two    False  False   True   True
    three   True   True   True   True
    four    True   True   True   True

    data[data>5]

              a     b     c     d
    one     NaN   NaN   NaN   NaN
    two     NaN   NaN   6.0   7.0
    three   8.0   9.0  10.0  11.0
    four   12.0  13.0  14.0  15.0

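
Indexing with a boolean DataFrame keeps the original shape and masks the non-matching cells with NaN, which is why the integers above are shown as floats. A short sketch of the equivalent `where` call follows. The fill value 0 is an arbitrary assumption used to show how NaN can be avoided.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['one', 'two', 'three', 'four'],
                    columns=['a', 'b', 'c', 'd'])

# data[data > 5] gives the same result as data.where(data > 5).
masked = data.where(data > 5)
print(masked.equals(data[data > 5]))

# Supplying a replacement value fills the masked cells instead of NaN
# (0 is an arbitrary choice here).
filled = data.where(data > 5, 0)
print(filled)
```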
    data[(data>5).any(axis=1)]  # along axis 1: select the rows that have at least one value greater than 5

            a   b   c   d
    two     4   5   6   7
    three   8   9  10  11
    four   12  13  14  15

    (data>5).any(axis=0)  # along axis 0: whether each column has any value greater than 5

    a    True
    b    True
    c    True
    d    True
    dtype: bool


    data.loc[:,(data>5).any(axis=0)]  # select the columns that have at least one value greater than 5

            a   b   c   d
    one     0   1   2   3
    two     4   5   6   7
    three   8   9  10  11
    four   12  13  14  15
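
In the example above every column contains a value greater than 5, so the column filter returns the whole frame. The sketch below uses a higher threshold so that the masks actually narrow the result, and contrasts `any` with `all`. The threshold 13 is chosen only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['one', 'two', 'three', 'four'],
                    columns=['a', 'b', 'c', 'd'])

mask = data > 13  # 13 is an illustrative threshold

# Columns with at least one value above 13.
cols = data.loc[:, mask.any(axis=0)]
print(cols)

# Rows with at least one value above 13.
rows = data[mask.any(axis=1)]
print(rows)

# all(axis=1) keeps only rows where EVERY value exceeds 5.
all_rows = data[(data > 5).all(axis=1)]
print(all_rows)

# Row and column masks can be combined in a single loc call.
both = data.loc[mask.any(axis=1), mask.any(axis=0)]
print(both)
```

Passing `axis` by keyword, as done here, also works on recent pandas versions where `any` and `all` no longer accept it positionally.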
  • Original article: https://www.cnblogs.com/djlbolgs/p/12507162.html