• Selecting multiple DataFrame rows by value (not by index)


    In many cases we want to look up rows by the values a DataFrame holds rather than by its index.

    First, create a DataFrame:

    >>> import pandas as pd
    >>> col = ["id","name","sex","age"]
    
    >>> name = {1:"chen",2:"wang",3:"hu",4:"lee",5:"liu"}
    >>> id = range(1,6)
    >>> sex = {1:1,2:0,3:1,4:1,5:0}
    >>> age = {1:20,2:18,3:21,4:20,5:18}
    >>> data = {"id":id,"name":name,"sex":sex,"age":age}
    
    >>> data
    {'sex': {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}, 'age': {1: 20, 2: 18, 3: 21, 4: 20, 5: 18}, 'name': {1: 'chen', 2: 'wang', 3: 'hu', 4: 'lee', 5: 'liu'}, 'id': range(1, 6)}
    
    >>> df = pd.DataFrame(data,columns=col,index=id)
    >>> df
       id  name  sex  age
    1   1  chen    1   20
    2   2  wang    0   18
    3   3    hu    1   21
    4   4   lee    1   20
    5   5   liu    0   18
    
    
    >>> df = df.set_index("id")
    
    >>> df
        name  sex  age
    id
    1   chen    1   20
    2   wang    0   18
    3     hu    1   21
    4    lee    1   20
    5    liu    0   18
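
The same table can also be built in one step from plain lists, which avoids mixing dicts with a range and avoids shadowing the built-in `id`. This is only a sketch of an equivalent construction, not the original author's code; the 1 = male, 0 = female coding is taken from the examples that follow.

```python
import pandas as pd

# Column-oriented data as plain lists; "id" is moved into the index.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "name": ["chen", "wang", "hu", "lee", "liu"],
        "sex": [1, 0, 1, 1, 0],  # 1 = male, 0 = female
        "age": [20, 18, 21, 20, 18],
    }
).set_index("id")

print(df)
```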

    Selecting everyone aged 20 or older is straightforward:

    >>> df[df["age"]>=20]
        name  sex  age
    id
    1   chen    1   20
    3     hu    1   21
    4    lee    1   20

    Selecting all the women (sex == 0) is just as easy:

    >>> df[df["sex"]==0]
        name  sex  age
    id
    2   wang    0   18
    5    liu    0   18
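
The two filters above can also be written with `.loc` or with `DataFrame.query`. The sketch below rebuilds the example frame so that it runs on its own; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["chen", "wang", "hu", "lee", "liu"],
        "sex": [1, 0, 1, 1, 0],
        "age": [20, 18, 21, 20, 18],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)

# Boolean mask passed to .loc: selects the same rows as df[df["age"] >= 20].
adults = df.loc[df["age"] >= 20]

# The same conditions written as query strings.
adults_q = df.query("age >= 20")
females_q = df.query("sex == 0")

print(adults)
print(females_q)
```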

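    where also works, but it is less convenient and rarely used this way, because non-matching rows are kept and filled with NaN: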

    >>> df.where(df["sex"]==0)
        name  sex   age
    id
    1    NaN  NaN   NaN
    2   wang  0.0  18.0
    3    NaN  NaN   NaN
    4    NaN  NaN   NaN
    5    liu  0.0  18.0
    >>> df.where(df["age"]>=20)
        name  sex   age
    id
    1   chen  1.0  20.0
    2    NaN  NaN   NaN
    3     hu  1.0  21.0
    4    lee  1.0  20.0
    5    NaN  NaN   NaN
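
If `where` is used anyway, the all-NaN rows can be dropped afterwards and the integer columns cast back. This is a sketch under the assumption that the original data contains no missing values of its own; otherwise `dropna` could remove or fail to remove the wrong rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["chen", "wang", "hu", "lee", "liu"],
        "sex": [1, 0, 1, 1, 0],
        "age": [20, 18, 21, 20, 18],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)

# where() keeps the shape: rows failing the condition become all-NaN,
# and the integer columns are upcast to float so they can hold NaN.
masked = df.where(df["sex"] == 0)

# Drop the all-NaN rows, then restore the integer dtypes.
recovered = masked.dropna(how="all").astype({"sex": "int64", "age": "int64"})

print(recovered)
```

Plain boolean indexing, `df[df["sex"] == 0]`, gives the same rows without the round trip through float.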

    Selecting by a list of names does not work this way, though; use the .isin() method instead.

    >>> select_name = ["chen","lee","liu"]
    
    >>> df[df["name"]==select_name]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "E:\Python3\lib\site-packages\pandas\core\ops.py", line 855, in wrapper
        res = na_op(values, other)
      File "E:\Python3\lib\site-packages\pandas\core\ops.py", line 759, in na_op
        result = _comp_method_OBJECT_ARRAY(op, x, y)
      File "E:\Python3\lib\site-packages\pandas\core\ops.py", line 737, in _comp_method_OBJECT_ARRAY
        result = lib.vec_compare(x, y, op)
      File "pandas\lib.pyx", line 868, in pandas.lib.vec_compare (pandas\lib.c:15418)
    ValueError: Arrays were different lengths: 5 vs 3
    # == compares element by element, so a 5-row column cannot be matched against a 3-item list
    
    
    >>> df[df["name"].isin(select_name)]
        name  sex  age
    id
    1   chen    1   20
    4    lee    1   20
    5    liu    0   18
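
The mask returned by `.isin()` is an ordinary boolean Series, so it can be negated with `~` to select everyone not in the list. A minimal sketch, rebuilding the example frame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["chen", "wang", "hu", "lee", "liu"],
        "sex": [1, 0, 1, 1, 0],
        "age": [20, 18, 21, 20, 18],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)
select_name = ["chen", "lee", "liu"]

# True for rows whose name appears in select_name.
in_list = df["name"].isin(select_name)

listed = df[in_list]    # rows 1, 4, 5
others = df[~in_list]   # the complement: rows 2, 3

print(listed)
print(others)
```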

    To select the rows whose name is in the list and whose sex is male (sex == 1):

    >>> df[df["name"].isin(select_name) & (df["sex"]==1)]  # parentheses needed: & binds tighter than ==
        name  sex  age
    id
    1   chen    1   20
    4    lee    1   20
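
When combining conditions, keep in mind that Python's `&` has higher precedence than `==`, so each comparison needs its own parentheses or its own variable. The sketch below names the masks first, which sidesteps the precedence question; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["chen", "wang", "hu", "lee", "liu"],
        "sex": [1, 0, 1, 1, 0],
        "age": [20, 18, 21, 20, 18],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)
select_name = ["chen", "lee", "liu"]

# Build each condition as its own boolean Series.
in_list = df["name"].isin(select_name)
is_male = df["sex"] == 1

# Combine with & (and), | (or), ~ (not).
listed_males = df[in_list & is_male]

# Inline comparisons must be wrapped in parentheses.
listed_females = df[in_list & (df["sex"] == 0)]

print(listed_males)
print(listed_females)
```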

    If we instead call isin on the whole DataFrame with a dict:

    >>> df.isin({"name":select_name,"sex":[1]})
         name    sex    age
    id
    1    True   True  False
    2   False  False  False
    3   False   True  False
    4    True   True  False
    5    True  False  False
    
    >>> df[df.isin({"name":select_name,"sex":[1]})] # the value must be the list [1], not the scalar 1
        name  sex  age
    id
    1   chen  1.0  NaN
    2    NaN  NaN  NaN
    3    NaN  1.0  NaN
    4    lee  1.0  NaN
    5    liu  NaN  NaN

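    This is not very useful: the mask is applied cell by cell, so no rows are dropped and every non-matching cell becomes NaN.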
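
The cell-by-cell mask from `DataFrame.isin` can still be turned into a row filter by restricting it to the columns named in the dict and reducing it with `.all(axis=1)`. This is one possible approach, sketched on the same example data, not something the original post does.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["chen", "wang", "hu", "lee", "liu"],
        "sex": [1, 0, 1, 1, 0],
        "age": [20, 18, 21, 20, 18],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)
select_name = ["chen", "lee", "liu"]
criteria = {"name": select_name, "sex": [1]}

# Check only the columns that have criteria, then require every
# checked cell in a row to match.
row_mask = df[list(criteria)].isin(criteria).all(axis=1)

result = df[row_mask]
print(result)
```

The result is the same as combining the two conditions with `&`: rows 1 and 4.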

  • Original article: https://www.cnblogs.com/cymwill/p/8494302.html