• Searching in Elasticsearch


    Bulk-importing data

    ES provides an API called bulk for performing many operations in a single request.

    Create a new file in the ES installation directory. The file name is up to you; here it is player.

    The file contents are as follows:

    {"index":{"_index":"nba","_type":"_doc","_id":"1"}}
    {"countryEn":"United States","teamName":"老鹰","birthDay":831182400000,"country":"美国","teamCityEn":"Atlanta","code":"jaylen_adams","displayAffiliation":"United States","displayName":"杰伦 亚当斯","schoolType":"College","teamConference":"东部","teamConferenceEn":"Eastern","weight":"86.2 公斤","teamCity":"亚特兰大","playYear":1,"jerseyNo":"10","teamNameEn":"Hawks","draft":2018,"displayNameEn":"Jaylen Adams","heightValue":1.88,"birthDayStr":"1996-05-04","position":"后卫","age":23,"playerId":"1629121"}
    {"index":{"_index":"nba","_type":"_doc","_id":"2"}}
    {"countryEn":"New Zealand","teamName":"雷霆","birthDay":743140800000,"country":"新西兰","teamCityEn":"Oklahoma City","code":"steven_adams","displayAffiliation":"Pittsburgh/New Zealand","displayName":"斯蒂文 亚当斯","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"120.2 公斤","teamCity":"俄克拉荷马城","playYear":6,"jerseyNo":"12","teamNameEn":"Thunder","draft":2013,"displayNameEn":"Steven Adams","heightValue":2.13,"birthDayStr":"1993-07-20","position":"中锋","age":26,"playerId":"203500"}

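    Note: the file must end with a newline (an empty line after the last document), because the bulk API requires the final line of the body to be newline-terminated.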

    Run the following command to bulk-import the data in the file:

    curl -X POST "localhost:9200/_bulk" -H "Content-Type: application/json" --data-binary @player
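
    If you would rather generate the player file than type it by hand, the two-lines-per-document format can be produced programmatically. The Python sketch below is only an illustration. build_bulk_body is a hypothetical helper, and the documents carry just a subset of the fields shown above. Each document becomes an action line plus a source line, and the body ends with the trailing newline that the bulk API requires.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body: an action line followed by a source line per doc."""
    lines = []
    for doc_id, source in docs:
        # Action/metadata line. _type is omitted: mapping types are deprecated in ES 7.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line; ensure_ascii=False keeps Chinese values readable in the file.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the last line to end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body("nba", [
    ("1", {"displayNameEn": "Jaylen Adams", "teamNameEn": "Hawks", "age": 23}),
    ("2", {"displayNameEn": "Steven Adams", "teamNameEn": "Thunder", "age": 26}),
])
print(body, end="")
```

    Writing body to a file named player and running the curl command above should import the same documents, minus the fields left out of this sketch.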

    Term-level queries in ES

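    Term-level queries: these queries are typically used on structured data such as numbers, dates, and keywords, rather than on text fields. A full-text query first analyzes (tokenizes) the query text. A term-level query does no analysis: it looks up the exact term directly in the field's inverted index. This is why term-level queries are generally used on numeric, date, and keyword fields.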
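
    The difference between analyzed and exact lookup can be shown with a toy model. In this Python sketch, simple_analyze is a hypothetical stand-in that only lowercases the text and splits it on non-word characters; the real standard analyzer does more. The model shows why a term query for "Steven" misses a text field indexed as "steven", while a match query finds it. A keyword field stores the whole value unchanged.

```python
import re


def simple_analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def term_match(indexed_terms, query_term):
    # Term-level query: the query string is NOT analyzed, it is compared as-is.
    return query_term in indexed_terms


def match_query(indexed_terms, query_text):
    # Full-text match query (default OR): the query text IS analyzed first.
    return any(token in indexed_terms for token in simple_analyze(query_text))


text_terms = set(simple_analyze("Steven Adams"))  # "text" field -> {"steven", "adams"}
keyword_terms = {"Thunder"}                       # "keyword" field keeps the whole value

print(term_match(text_terms, "Steven"))      # False: the indexed term is "steven"
print(match_query(text_terms, "Steven"))     # True
print(term_match(keyword_terms, "Thunder"))  # True
```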

    Preparation

    • Delete the nba index (DELETE localhost:9200/nba)
    • Create the nba index (PUT localhost:9200/nba)
    • POST:localhost:9200/nba/_mapping
      
      {
          "properties":{
              "birthDay":{
                  "type":"date"
              },
              "birthDayStr":{
                  "type":"keyword"
              },
              "age":{
                  "type":"integer"
              },
              "code":{
                  "type":"text"
              },
              "country":{
                  "type":"text"
              },
              "countryEn":{
                  "type":"text"
              },
              "displayAffiliation":{
                  "type":"text"
              },
              "displayName":{
                  "type":"text"
              },
              "displayNameEn":{
                  "type":"text"
              },
              "draft":{
                  "type":"long"
              },
              "heightValue":{
                  "type":"float"
              },
              "jerseyNo":{
                  "type":"text"
              },
              "playYear":{
                  "type":"long"
              },
              "playerId":{
                  "type":"keyword"
              },
              "position":{
                  "type":"text"
              },
              "schoolType":{
                  "type":"text"
              },
              "teamCity":{
                  "type":"text"
              },
              "teamCityEn":{
                  "type":"text"
              },
              "teamConference":{
                  "type":"keyword"
              },
              "teamConferenceEn":{
                  "type":"keyword"
              },
              "teamName":{
                  "type":"keyword"
              },
              "teamNameEn":{
                  "type":"keyword"
              },
              "weight":{
                  "type":"text"
              }
          }
      }
    • Bulk-import the data (the player file)

    Term query

    Exact-match query (find players whose jersey number is 23)

    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "jerseyNo": "23"
            }
        }
    }

    Exists Query

    Finds documents that have a non-null value in the given field (find players whose team name is not empty)

    POST:localhost:9200/nba/_search
    {
        "query": {
            "exists": {
                "field": "teamNameEn"
            }
        }
    }

    Prefix Query

    Finds documents whose field contains a term that starts with the given prefix (find players whose team name starts with Rock)

    POST:localhost:9200/nba/_search
    {
        "query": {
            "prefix": {
                "teamNameEn": "Rock"
            }
        }
    }

    Wildcard Query

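    Supports wildcard patterns: * matches any sequence of zero or more characters, and ? matches any single character (find Rockets players)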

    POST:localhost:9200/nba/_search
    {
        "query": {
            "wildcard": {
                "teamNameEn": "Ro*s"
            }
        }
    }

    Regexp Query

    Regular-expression query (find Rockets players)

    POST:localhost:9200/nba/_search
    {
        "query": {
            "regexp": {
                "teamNameEn": "Ro.*s"
            }
        }
    }
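
    Both the wildcard and the regexp patterns above must match the whole indexed term, not a substring of it. The following Python sketch illustrates that anchoring. wildcard_to_regex and term_matches are hypothetical helpers, and Python's re.fullmatch stands in for Lucene's matcher. Lucene's regexp syntax differs from Python's in several respects, so the sketch sticks to constructs the two share (literal characters and .*).

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern (* and ?) into an equivalent Python regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")          # any sequence of characters
        elif ch == "?":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def term_matches(regex, term):
    # fullmatch: the pattern has to cover the entire term, as in ES
    return re.fullmatch(regex, term) is not None


teams = ["Rockets", "Raptors", "Lakers", "Rocket"]
print([t for t in teams if term_matches(wildcard_to_regex("Ro*s"), t)])
print([t for t in teams if term_matches("Ro.*s", t)])
```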

    Ids Query

    ID query (find the players whose IDs are 1 and 2)

    POST:localhost:9200/nba/_search
    {
        "query": {
            "ids": {
                "values": ["1", "2"]
            }
        }
    }

    Range queries in ES

    Finds documents whose field contains a value (date, number, or string) within the given range.

    Find players who have played in the NBA for 2 to 10 years (both bounds inclusive)
    POST:localhost:9200/nba/_search
    {
        "query": {
            "range": {
                "playYear": {
                    "gte": 2,
                    "lte": 10
                }
            }
        }
    }
    
    
    Find players born between 1980 and 1999
    POST:localhost:9200/nba/_search
    {
        "query": {
            "range": {
                "birthDay": {
                    "gte": "01/01/1980",
                    "lte": "31/12/1999",
                    "format": "dd/MM/yyyy||yyyy"
                }
            }
        }
    }
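
    ES stores date fields such as birthDay as UTC epoch milliseconds. The format parameter only controls how the gte/lte strings are parsed, and the comparison itself is numeric. The Python sketch below reproduces that comparison for the 1980 to 1999 range using the two birthDay values from the player file. epoch_millis and in_range are hypothetical helpers. The upper bound is set to the last millisecond of 31/12/1999 so that every birthday in 1999 is included.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(dt):
    """Milliseconds since the Unix epoch, computed with integer arithmetic."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def in_range(value, gte=None, lte=None):
    # gte/lte are inclusive bounds; a missing bound leaves that side open
    return (gte is None or value >= gte) and (lte is None or value <= lte)


players = {"Jaylen Adams": 831182400000, "Steven Adams": 743140800000}
start_1980 = epoch_millis(datetime(1980, 1, 1, tzinfo=timezone.utc))
end_1999 = epoch_millis(datetime(1999, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc))

print([name for name, born in players.items() if in_range(born, start_1980, end_1999)])
print(in_range(1, gte=2, lte=10), in_range(10, gte=2, lte=10))
```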

    Boolean queries in ES

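    • must: the clause must match, and it contributes to the score
    • filter: the clause must match, but it is not scored
    • must_not: the clause must not match
    • should: the clause should match; matching raises the score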

    must

    Find players named James

    POST:localhost:9200/nba/_search
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "displayNameEn": "james"
                        }
                    }
                ]
            }
        }
    }

    filter

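    Same effect as must, but without scoring (for example, find players named James by placing the match clause under filter instead of must)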

    must_not

    Find players named James in the Western Conference (that is, excluding the Eastern Conference)

    POST:localhost:9200/nba/_search
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "displayNameEn": "james"
                        }
                    }
                ],
                "must_not": [
                    {
                        "term": {
                            "teamConferenceEn": {
                                "value": "Eastern"
                            }
                        }
                    }
                ]
            }
        }
    }

    should

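    When the bool query also has must or filter clauses, documents that match none of the should clauses are still returned; they just receive a lower score.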

    Find Western Conference players named James, preferring (scoring higher) those who have played 11 to 20 years

    POST:localhost:9200/nba/_search
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "displayNameEn": "james"
                        }
                    }
                ],
                "must_not": [
                    {
                        "term": {
                            "teamConferenceEn": {
                                "value": "Eastern"
                            }
                        }
                    }
                ],
                "should": [
                    {
                        "range": {
                            "playYear": {
                                "gte": 11,
                                "lte": 20
                            }
                        }
                    }
                ]
            }
        }
    }

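    With minimum_should_match set to 1, at least one should clause has to match, so the query returns only Western Conference players named James who have played 11 to 20 years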

    POST:localhost:9200/nba/_search
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "displayNameEn": "james"
                        }
                    }
                ],
                "must_not": [
                    {
                        "term": {
                            "teamConferenceEn": {
                                "value": "Eastern"
                            }
                        }
                    }
                ],
                "should": [
                    {
                        "range": {
                            "playYear": {
                                "gte": 11,
                                "lte": 20
                            }
                        }
                    }
                ],
                "minimum_should_match": 1
            }
        }
    }
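
    The interplay of the four clause types and minimum_should_match can be modelled in a few lines. In this Python sketch, bool_query is a hypothetical helper, every clause is a plain predicate, and the documents are made-up examples rather than rows from the nba index. Scoring is deliberately simplified to +1 per matching must or should clause instead of BM25. The default for minimum_should_match follows ES: 1 when there are should clauses but no must or filter clauses, otherwise 0.

```python
def bool_query(docs, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    """Toy bool query: each clause is a predicate doc -> bool; returns (name, score) pairs.

    Scoring is simplified to +1 per matching must/should clause (ES uses BM25),
    and filter clauses never add to the score.
    """
    if minimum_should_match is None:
        # ES default: 1 if there are should clauses and no must/filter clause, else 0
        minimum_should_match = 1 if (should and not must and not filter_) else 0
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filter_):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        matched_should = sum(1 for clause in should if clause(doc))
        if matched_should < minimum_should_match:
            continue
        hits.append((doc["name"], len(must) + matched_should))
    # Highest score first; sorted() is stable, so ties keep their input order
    return sorted(hits, key=lambda hit: -hit[1])


# Hypothetical documents, not taken from the real nba index
docs = [
    {"name": "James A", "conf": "Western", "playYear": 16},
    {"name": "James B", "conf": "Western", "playYear": 10},
    {"name": "James C", "conf": "Eastern", "playYear": 12},
    {"name": "Curry D", "conf": "Western", "playYear": 11},
]


def is_james(doc):
    return "james" in doc["name"].lower()


def eastern(doc):
    return doc["conf"] == "Eastern"


def veteran(doc):
    return 11 <= doc["playYear"] <= 20


print(bool_query(docs, must=[is_james], must_not=[eastern], should=[veteran]))
print(bool_query(docs, must=[is_james], must_not=[eastern], should=[veteran],
                 minimum_should_match=1))
```

    With the default, James B is still returned but ranks below James A. Setting minimum_should_match to 1 drops James B entirely, which mirrors the two requests above.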

    Sorting in ES

    Rockets players sorted by years played, from most to fewest

    POST:localhost:9200/nba/_search
    {
        "query": {
            "match": {
                "teamNameEn": "Rockets"
            }
        },
        "sort": [
            {
                "playYear": {
                    "order": "desc"
                }
            }
        ]
    }

    Rockets players sorted by years played (most to fewest) and, when years played are equal, by height (tallest to shortest)

    POST:localhost:9200/nba/_search
    {
        "query": {
            "match": {
                "teamNameEn": "Rockets"
            }
        },
        "sort": [
            {
                "playYear": {
                    "order": "desc"
                }
            },
            {
                "heightValue": {
                    "order": "desc"
                }
            }
        ]
    }
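
    A multi-field sort compares on the first key and falls back to the next key only on ties. The Python sketch below reproduces the "playYear desc, then heightValue desc" ordering with a tuple sort key. The three players are hypothetical sample data.

```python
# Hypothetical sample data, not taken from the nba index
players = [
    {"name": "P1", "playYear": 10, "heightValue": 1.96},
    {"name": "P2", "playYear": 15, "heightValue": 2.01},
    {"name": "P3", "playYear": 10, "heightValue": 2.08},
]

# Both keys descending: negate the numbers so the default ascending sort reverses them.
# The second key is only consulted when playYear is equal (P1 vs P3).
ordered = sorted(players, key=lambda p: (-p["playYear"], -p["heightValue"]))
print([p["name"] for p in ordered])  # ['P2', 'P3', 'P1']
```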

    ES aggregations: metric aggregations

    What is aggregation analysis in ES?

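    Aggregation is an important database feature. It computes aggregates over the data set that a query matches, such as the maximum or minimum of a field (or of a computed expression), or its sum or average. ES is both a search engine and a database, and it provides powerful aggregation capabilities as well.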

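    Aggregations that compute metrics such as max, min, sum, and avg over a data set are called metric aggregations in ES. Relational databases offer aggregate functions too, and they can also group the query results with GROUP BY and then compute metrics for each group. In ES, this grouping is called a bucket aggregation.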

    max/min/sum/avg

    Compute the average age of Rockets players
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "avgAge": {
                "avg": {
                    "field": "age"
                }
            }
        },
        "size": 0
    }

    value_count

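    Counts the values of a field in the matching documents; for a single-valued field this equals the number of documents in which the field is not null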

    Count the Rockets players whose playYear is not null
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "countPlayerYear": {
                "value_count": {
                    "field": "playYear"
                }
            }
        },
        "size": 0
    }
    
    Find how many players the Rockets have (no aggregation is needed; read hits.total in the response)
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        }
    }

    Cardinality

    Counts distinct values (ES computes an approximate count)

    Count the number of distinct ages among Rockets players
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "countAge": {
                "cardinality": {
                    "field": "age"
                }
            }
        },
        "size": 0
    }
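
    The difference between value_count and cardinality is easy to see on a small example. value_count counts non-null values, and cardinality counts distinct ones. This Python sketch computes both exactly over hypothetical documents, where None stands for a missing field. On a real cluster, cardinality is an approximation, although for small sets like this it is normally exact.

```python
# Hypothetical Rockets documents; None marks a missing/null field
rockets = [
    {"age": 30, "playYear": 10},
    {"age": 30, "playYear": None},
    {"age": 25, "playYear": 3},
    {"age": 30, "playYear": 5},
]

# value_count: how many non-null playYear values exist
value_count = sum(1 for d in rockets if d.get("playYear") is not None)
# cardinality: how many distinct age values exist
cardinality = len({d["age"] for d in rockets if d.get("age") is not None})
print(value_count, cardinality)  # 3 2
```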

    stats

    Returns five values at once: count, max, min, avg, and sum

    Get age stats for Rockets players
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "statsAge": {
                "stats": {
                    "field": "age"
                }
            }
        },
        "size": 0
    }

    Extended stats

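    Returns four more results than stats: the sum of squares, the variance, the standard deviation, and the bounds of the interval at the average plus or minus two standard deviations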

    Get extended stats for the ages of Rockets players
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "extendStatsAge": {
                "extended_stats": {
                    "field": "age"
                }
            }
        },
        "size": 0
    }
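
    The stats and extended_stats fields are easiest to understand by computing them by hand. This Python sketch is a hypothetical helper that mirrors the field names. It uses the population variance, which is what the variance field reports, and it places the bounds at avg ± 2σ, matching the default of two standard deviations described above. The ages are sample input only.

```python
import math


def extended_stats(values, sigma=2.0):
    """Compute stats + extended_stats style fields for a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    total = sum(values)
    avg = total / n
    sum_sq = sum(v * v for v in values)
    variance = sum((v - avg) ** 2 for v in values) / n  # population variance
    std = math.sqrt(variance)
    return {
        "count": n, "min": min(values), "max": max(values), "avg": avg, "sum": total,
        "sum_of_squares": sum_sq,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


print(extended_stats([24, 26, 28, 30]))
```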

    Percentiles

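    Returns the value at each percentile; by default it returns the values at the 1st, 5th, 25th, 50th, 75th, 95th, and 99th percentiles (ES computes these approximately)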

    Get age percentiles for Rockets players
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "percentAge": {
                "percentiles": {
                    "field": "age"
                }
            }
        },
        "size": 0
    }
    
    Get age percentiles for Rockets players (with custom percentiles)
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "percentAge": {
                "percentiles": {
                    "field": "age",
                    "percents": [
                        20,
                        50,
                        75
                    ]
                }
            }
        },
        "size": 0
    }
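
    To see what a value such as "50.0": 27 in the response means, here is an exact percentile calculation in Python using linear interpolation between the closest ranks. This is a swapped-in technique. ES estimates percentiles with an approximate algorithm (TDigest), so its numbers need not match this sketch exactly, especially on small or skewed data. The percentile helper and the ages are hypothetical.

```python
def percentile(values, p):
    """Exact p-th percentile (0-100) with linear interpolation between closest ranks."""
    xs = sorted(values)
    if len(xs) == 1:
        return float(xs[0])
    rank = p / 100 * (len(xs) - 1)   # fractional position in the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(xs) - 1)
    frac = rank - lower
    return xs[lower] + (xs[upper] - xs[lower]) * frac


ages = [22, 24, 25, 27, 30, 31, 34]  # hypothetical ages
print({p: percentile(ages, p) for p in (20, 50, 75)})
```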

    ES aggregations: bucket aggregations

    Terms Aggregation

    Groups documents into buckets by the distinct values (terms) of a field

    Group Rockets players by age
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "aggsAge": {
                "terms": {
                    "field": "age",
                    "size": 10
                }
            }
        },
        "size": 0
    }

    order

    Sorting the buckets of a grouping aggregation

    Group Rockets players by age, with the buckets sorted by age from oldest to youngest (sort by the bucket key)
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "aggsAge": {
                "terms": {
                    "field": "age",
                    "size": 10,
                    "order": {
                        "_key": "desc"
                    }
                }
            }
        },
        "size": 0
    }
    
    
    Group Rockets players by age, with the buckets sorted by document count, largest first (sort by document count)
    POST:localhost:9200/nba/_search
    {
        "query": {
            "term": {
                "teamNameEn": {
                    "value": "Rockets"
                }
            }
        },
        "aggs": {
            "aggsAge": {
                "terms": {
                    "field": "age",
                    "size": 10,
                    "order": {
                        "_count": "desc"
                    }
                }
            }
        },
        "size": 0
    }
    
    Group players by team and sort the teams by the average age of their players (sort by a sub-aggregation metric)
    POST:localhost:9200/nba/_search
    {
        "aggs": {
            "aggsTeamName": {
                "terms": {
                    "field": "teamNameEn",
                    "size": 30,
                    "order": {
                        "avgAge": "desc"
                    }
                },
                "aggs": {
                    "avgAge": {
                        "avg": {
                            "field": "age"
                        }
                    }
                }
            }
        },
        "size": 0
    }
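
    A terms aggregation behaves roughly like GROUP BY with an ORDER BY on the bucket key, the bucket count, or a per-bucket metric. The Python sketch below models the three orderings above. terms_agg and its avg_sub_agg parameter are hypothetical, the players are made-up sample data, and ties are not broken the way ES breaks them. Real terms aggregations can also return approximate counts when an index has several shards.

```python
from collections import defaultdict


def terms_agg(docs, field, size=10, order=("_count", "desc"), avg_sub_agg=None):
    """Toy terms aggregation.

    avg_sub_agg is an optional (name, field) pair such as ("avgAge", "age"); it adds
    an avg metric to every bucket, and that name can then be used in order.
    """
    buckets = defaultdict(list)
    for doc in docs:
        if doc.get(field) is not None:
            buckets[doc[field]].append(doc)
    rows = []
    for key, members in buckets.items():
        row = {"key": key, "doc_count": len(members)}
        if avg_sub_agg:
            name, metric_field = avg_sub_agg
            row[name] = sum(m[metric_field] for m in members) / len(members)
        rows.append(row)
    # "_key"/"_count" refer to the bucket key and doc count; other names are sub-aggs
    sort_by = {"_key": "key", "_count": "doc_count"}.get(order[0], order[0])
    rows.sort(key=lambda r: r[sort_by], reverse=(order[1] == "desc"))
    return rows[:size]


# Hypothetical sample documents
players = [
    {"teamNameEn": "Rockets", "age": 30},
    {"teamNameEn": "Rockets", "age": 30},
    {"teamNameEn": "Rockets", "age": 24},
    {"teamNameEn": "Lakers", "age": 34},
    {"teamNameEn": "Lakers", "age": 28},
]

print(terms_agg(players, "age", order=("_key", "desc")))
print(terms_agg(players, "age", order=("_count", "desc"))[0])
print(terms_agg(players, "teamNameEn", size=30, order=("avgAge", "desc"),
                avg_sub_agg=("avgAge", "age")))
```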

    Filtering the buckets of a grouping aggregation

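    Group the Lakers and Rockets and sort them by average team age (exact value lists; Warriors is listed in include but removed again by exclude)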
    POST:localhost:9200/nba/_search
    {
        "aggs": {
            "aggsTeamName": {
                "terms": {
                    "field": "teamNameEn",
                    "include": [
                        "Lakers",
                        "Rockets",
                        "Warriors"
                    ],
                    "exclude": [
                        "Warriors"
                    ],
                    "size": 30,
                    "order": {
                        "avgAge": "desc"
                    }
                },
                "aggs": {
                    "avgAge": {
                        "avg": {
                            "field": "age"
                        }
                    }
                }
            }
        },
        "size": 0
    }
    
    Group the Lakers and Rockets and sort them by average team age (regular-expression patterns)
    POST:localhost:9200/nba/_search
    {
        "aggs": {
            "aggsTeamName": {
                "terms": {
                    "field": "teamNameEn",
                    "include": "Lakers|Ro.*|Warriors.*",
                    "exclude": "Warriors",
                    "size": 30,
                    "order": {
                        "avgAge": "desc"
                    }
                },
                "aggs": {
                    "avgAge": {
                        "avg": {
                            "field": "age"
                        }
                    }
                }
            }
        },
        "size": 0
    }
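
    Both requests above rely on the fact that exclude takes precedence over include. They also rely on the fact that a regex in include or exclude must match the entire term. The following Python sketch models that filtering step only. keep_term is a hypothetical helper, and Python's re.fullmatch stands in for the Lucene regex engine, which uses a different syntax in general.

```python
import re


def keep_term(term, include=None, exclude=None):
    """Decide whether a term becomes a bucket; exclude wins over include."""
    def matches(spec, t):
        if isinstance(spec, str):           # regex: must match the whole term
            return re.fullmatch(spec, t) is not None
        return t in spec                    # list of exact values
    if include is not None and not matches(include, term):
        return False
    if exclude is not None and matches(exclude, term):
        return False
    return True


teams = ["Lakers", "Rockets", "Warriors", "Celtics"]
print([t for t in teams if keep_term(t, ["Lakers", "Rockets", "Warriors"], ["Warriors"])])
print([t for t in teams if keep_term(t, "Lakers|Ro.*|Warriors.*", "Warriors")])
```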

    Range Aggregation

    Groups documents into buckets by value ranges

    Group NBA players by age into three ranges: under 20, 20 to 35, and 35 and over
    POST:localhost:9200/nba/_search
    {
        "aggs": {
            "ageRange": {
                "range": {
                    "field": "age",
                    "ranges": [
                        {
                            "to": 20
                        },
                        {
                            "from": 20,
                            "to": 35
                        },
                        {
                            "from": 35
                        }
                    ]
                }
            }
        },
        "size": 0
    }
    
    Group NBA players by age into three ranges: under 20, 20 to 35, and 35 and over (with custom bucket keys)
    POST:localhost:9200/nba/_search
    {
        "aggs": {
            "ageRange": {
                "range": {
                    "field": "age",
                    "ranges": [
                        {
                            "to": 20,
                            "key": "A"
                        },
                        {
                            "from": 20,
                            "to": 35,
                            "key": "B"
                        },
                        {
                            "from": 35,
                            "key": "C"
                        }
                    ]
                }
            }
        },
        "size": 0
    }
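
    In a range aggregation, from is inclusive and to is exclusive. A 20-year-old therefore lands in the 20 to 35 bucket, and a 35-year-old lands in the last one. The Python sketch below shows this bucketing. range_agg is a hypothetical helper and the ages are sample input. When no key is given, it formats keys in the same style as ES's default keys (for example "*-20.0").

```python
def range_agg(values, ranges):
    """Toy range aggregation: 'from' is inclusive, 'to' is exclusive."""
    out = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        default_key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        n = sum(1 for v in values if (lo is None or v >= lo) and (hi is None or v < hi))
        out.append({"key": r.get("key", default_key), "doc_count": n})
    return out


ages = [19, 20, 23, 34, 35, 38]  # hypothetical ages
print(range_agg(ages, [{"to": 20}, {"from": 20, "to": 35}, {"from": 35}]))
print(range_agg(ages, [{"to": 20, "key": "A"},
                       {"from": 20, "to": 35, "key": "B"},
                       {"from": 35, "key": "C"}]))
```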

    Date Range Aggregation

    Groups documents into buckets by date ranges

    Group NBA players into ranges of birth date (month and year)
    POST:localhost:9200/nba/_search
    {
        "aggs": {
            "birthDayRange": {
                "date_range": {
                    "field": "birthDay",
                    "format": "MM-yyyy",
                    "ranges": [
                        {
                            "to": "01-1989"
                        },
                        {
                            "from": "01-1989",
                            "to": "01-1999"
                        },
                        {
                            "from": "01-1999",
                            "to": "01-2009"
                        },
                        {
                            "from": "01-2009"
                        }
                    ]
                }
            }
        },
        "size": 0
    }

    Date Histogram Aggregation

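    Date histogram aggregation: buckets documents by time units such as day, month, or year. The calendar intervals are year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), and minute (1m); seconds (1s) are available as a fixed interval.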

    Group NBA players by birth year
    POST:localhost:9200/nba/_search
    {
        "aggs": {
            "birthday_aggs": {
                "date_histogram": {
                    "field": "birthDay",
                    "format": "yyyy",
                    "calendar_interval": "year"
                }
            }
        },
        "size": 0
    }
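
    A yearly date histogram turns each epoch-millisecond value into its calendar year (in UTC) and counts documents per year. The Python sketch below does this for the two birthDay values from the player file. yearly_histogram is a hypothetical helper. Unlike ES, the sketch does not emit the empty buckets for the years between the first and last non-empty bucket; ES returns those gap buckets by default.

```python
from collections import Counter
from datetime import datetime, timezone


def yearly_histogram(epoch_millis_values):
    """Toy date_histogram with a 1-year calendar interval (UTC), keyed like format 'yyyy'."""
    years = Counter(
        datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y")
        for ms in epoch_millis_values
    )
    return sorted(years.items())


print(yearly_histogram([831182400000, 743140800000]))  # [('1993', 1), ('1996', 1)]
```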

    query_string queries in ES

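    If you are familiar with Lucene query syntax, a query_string query lets you write the query directly as a Lucene query string. When ES receives the request, its query parser parses the string and builds the corresponding query.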

    Querying a single field

    POST:localhost:9200/nba/_search
    {
        "query": {
            "query_string": {
                "default_field": "displayNameEn",
                "query": "james OR curry"
            }
        },
        "size": 100
    }
    
    POST:localhost:9200/nba/_search
    {
        "query": {
            "query_string": {
                "default_field": "displayNameEn",
                "query": "james AND harden"
            }
        },
        "size": 100
    }

    Querying multiple fields

    POST:localhost:9200/nba/_search
    {
        "query": {
            "query_string": {
                "fields": [
                    "displayNameEn",
                    "teamNameEn"
                ],
                "query": "James AND Rockets"
            }
        },
        "size": 100
    }
  • Original article: https://www.cnblogs.com/jwen1994/p/12639827.html
Copyright © 2020-2023  润新知