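Continuing from the previous post, this post uses Python's elasticsearch-dsl library to run queries against Elasticsearch.
7. Queries
Elasticsearch is a powerful search engine, and its main purpose is to retrieve the data you need quickly.
Query categories:
- Basic queries: use Elasticsearch's built-in query types
- Compound queries: combine several queries into one
- Filters: narrow the results with filter conditions while querying, without affecting the relevance score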
7.1 Basic queries
- Create an index (the "table") before querying. The `job` mapping type below applies to Elasticsearch 6.x and earlier.
    PUT chaxun
    {
      "mappings": {
        "job": {
          "properties": {
            "title": {
              "store": true,
              "type": "text",
              "analyzer": "ik_max_word"
            },
            "company_name": {
              "store": true,
              "type": "keyword"
            },
            "desc": {
              "type": "text"
            },
            "comments": {
              "type": "integer"
            },
            "add_time": {
              "type": "date",
              "format": "yyyy-MM-dd"
            }
          }
        }
      }
    }
- match query
    GET chaxun/job/_search
    {
      "query": {
        "match": {
          "title": "python"
        }
      }
    }

    from elasticsearch_dsl import Search

    s = Search(index='chaxun').query('match', title='python')
    response = s.execute()
- term query
A term query does not analyze (tokenize) the search value; it is looked up in the index exactly as given.
    GET chaxun/job/_search
    {
      "query": {
        "term": {
          "title": "python爬虫"
        }
      }
    }

    s = Search(index='chaxun').query('term', title='python爬虫')
    response = s.execute()
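To make the difference between term and match concrete, here is a minimal pure-Python sketch. It does not use Elasticsearch or the ik analyzer: `toy_analyze`, `build_index`, `term_query`, and `match_query` are hypothetical helpers, and the toy analyzer only lowercases the text and splits it into ASCII and CJK runs. It shows why a term lookup with an unanalyzed value can miss a document that a match query finds.

```python
import re

def toy_analyze(text):
    """Toy analyzer (not ik): lowercase, then split into ASCII-word and CJK runs."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in toy_analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_query(index, value):
    """Look the value up as-is, with no analysis."""
    return index.get(value, set())

def match_query(index, text):
    """Analyze the query text first, then return docs containing any token."""
    hits = set()
    for token in toy_analyze(text):
        hits |= index.get(token, set())
    return hits

# Hypothetical sample titles
docs = {1: "Python爬虫工程师", 2: "Django系统开发"}
index = build_index(docs)

print(term_query(index, "python爬虫"))   # set(): no single token equals the raw value
print(match_query(index, "python爬虫"))  # {1}: the token "python" matches document 1
print(term_query(index, "python"))       # {1}: the value equals an indexed token
```

In practice this is why term queries are usually aimed at keyword fields such as company_name, while analyzed text fields such as title are searched with match.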
- terms query
    GET chaxun/job/_search
    {
      "query": {
        "terms": {
          "title": ["工程师", "django", "系统"]
        }
      }
    }

    s = Search(index='chaxun').query('terms', title=['django', '工程师', '系统'])
    response = s.execute()
- Limiting the number of results returned
    GET chaxun/job/_search
    {
      "query": {
        "term": {
          "title": "python"
        }
      },
      "from": 1,
      "size": 2
    }

    # "from": 1 with "size": 2 corresponds to the Python slice [1:3]
    s = Search(index='chaxun').query('term', title='python')[1:3]
    response = s.execute()
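elasticsearch-dsl expresses pagination with Python slices, so it helps to see how a slice maps to the from and size parameters. The helpers below are a small illustrative sketch (`slice_to_from_size` and `from_size_to_slice` are hypothetical names, not part of the library).

```python
def slice_to_from_size(start, stop):
    """Convert a Python slice [start:stop] into Elasticsearch from/size values."""
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    return {"from": start, "size": stop - start}

def from_size_to_slice(from_, size):
    """Convert from/size back into the (start, stop) bounds of a slice."""
    return (from_, from_ + size)

print(slice_to_from_size(1, 3))  # {'from': 1, 'size': 2}
print(from_size_to_slice(1, 2))  # (1, 3)
```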
- match_all: return all documents
    GET chaxun/job/_search
    {
      "query": {
        "match_all": {}
      }
    }

    s = Search(index='chaxun').query('match_all')
    response = s.execute()
- match_phrase: phrase query
    GET chaxun/job/_search
    {
      "query": {
        "match_phrase": {
          "title": {
            "query": "python系统",
            "slop": 3
          }
        }
      }
    }

    s = Search(index='chaxun').query('match_phrase', title={"query": "python系统", "slop": 3})
    response = s.execute()
Note: the query text "python系统" is analyzed into ["python", "系统"], and a document must contain every one of these terms to match. "slop" sets the maximum distance allowed between the terms; a document matches only if the distance does not exceed slop. For example, "python打造推荐引擎系统" does not match if slop is less than 6.
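The role of slop can be sketched in plain Python. This is a simplified model, not Lucene's actual implementation: `phrase_distance` and `phrase_matches` are hypothetical helpers that work on a ready-made token list, and the distance is taken as the spread of each term's position relative to its place in the phrase. The real distance depends on the token positions the analyzer produces (ik_max_word may emit extra overlapping tokens), so the numbers below only apply to the hypothetical token list used here.

```python
from itertools import product

def phrase_distance(doc_tokens, phrase_tokens):
    """Smallest slop that lets the phrase match, or None if a term is missing."""
    positions = []
    for term in phrase_tokens:
        found = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not found:
            return None
        positions.append(found)
    best = None
    for combo in product(*positions):
        # Shift each position by the term's offset in the phrase;
        # an exact phrase gives identical shifted values (distance 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        distance = max(shifted) - min(shifted)
        if best is None or distance < best:
            best = distance
    return best

def phrase_matches(doc_tokens, phrase_tokens, slop=0):
    """True if every phrase term occurs and the distance is within slop."""
    distance = phrase_distance(doc_tokens, phrase_tokens)
    return distance is not None and distance <= slop

# Hypothetical, already-tokenized title
tokens = ["python", "web", "backend", "系统"]
print(phrase_distance(tokens, ["python", "系统"]))         # 2
print(phrase_matches(tokens, ["python", "系统"], slop=1))  # False
print(phrase_matches(tokens, ["python", "系统"], slop=3))  # True
```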
- multi_match query
    GET chaxun/job/_search
    {
      "query": {
        "multi_match": {
          "query": "python",
          "fields": ["title^3", "desc"]
        }
      }
    }

    from elasticsearch_dsl import Q, Search

    # "title^3" keeps the same boost as the REST request above
    q = Q('multi_match', query="python", fields=["title^3", "desc"])
    s = Search(index='chaxun').query(q)
    response = s.execute()
Note: multi_match searches several fields at once; "^3" gives "title" three times the weight of "desc".
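When the field weights come from configuration rather than being hard-coded, the "field^boost" strings can be generated. The following is a small sketch under that assumption; `boosted_fields` and `multi_match_body` are hypothetical helpers that only build the request body as a plain dict.

```python
def boosted_fields(weights):
    """Turn {"title": 3, "desc": 1} into ["title^3", "desc"]."""
    fields = []
    for name, weight in weights.items():
        # A weight of 1 is the default, so no "^" suffix is needed
        fields.append(name if weight == 1 else f"{name}^{weight}")
    return fields

def multi_match_body(query, weights):
    """Build a multi_match request body as a plain dict."""
    return {
        "query": {
            "multi_match": {"query": query, "fields": boosted_fields(weights)}
        }
    }

body = multi_match_body("python", {"title": 3, "desc": 1})
print(body["query"]["multi_match"]["fields"])  # ['title^3', 'desc']
```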
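- Specifying which fields to return
    GET chaxun/job/_search
    {
      "stored_fields": ["title", "company_name"],
      "query": {
        "match": {
          "title": "python"
        }
      }
    }

    # .source() filters the returned _source document rather than reading stored fields;
    # to send stored_fields as in the request above, .extra(stored_fields=[...]) should work instead
    s = Search(index='chaxun').query('match', title='python').source(['title', 'company_name'])
    response = s.execute()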
- Sorting results with sort
    GET chaxun/job/_search
    {
      "query": {
        "match_all": {}
      },
      "sort": [
        {
          "comments": {
            "order": "desc"
          }
        }
      ]
    }

    s = Search(index='chaxun').query('match_all').sort({"comments": {"order": "desc"}})
    response = s.execute()
- range query
    GET chaxun/job/_search
    {
      "query": {
        "range": {
          "comments": {
            "gte": 10,
            "lte": 50,
            "boost": 2.0
          }
        }
      }
    }

    s = Search(index='chaxun').query('range', comments={"gte": 10, "lte": 50, "boost": 2.0})
    response = s.execute()
Note: "boost" sets the weight of this clause.
- wildcard query
    GET chaxun/job/_search
    {
      "query": {
        "wildcard": {
          "title": {
            "value": "pyth*n",
            "boost": 2
          }
        }
      }
    }

    s = Search(index='chaxun').query('wildcard', title={"value": "pyth*n", "boost": 2})
    response = s.execute()
7.2 Compound queries
- Create a new index for these queries
- bool query
- The format is as follows
    bool: {
      "filter": [],
      "must": [],
      "should": [],
      "must_not": []
    }
- The simplest filter query
    select * from testdb where salary=20

    GET bool/testdb/_search
    {
      "query": {
        "bool": {
          "must": {
            "match_all": {}
          },
          "filter": {
            "term": {
              "salary": 20
            }
          }
        }
      }
    }

    from elasticsearch_dsl import Q, Search

    s = Search(index='bool').query('bool', filter=[Q('term', salary=20)])
    response = s.execute()
- Viewing the analyzer's output (tokenization)
    GET _analyze
    {
      "analyzer": "ik_max_word",
      "text": "成都电子科技大学"
    }
Note: "ik_max_word" performs fine-grained tokenization; "ik_smart" performs coarse-grained tokenization.
- Combined bool filter query
    select * from testdb where (salary=20 or title=python) and (salary !=30)

    GET bool/testdb/_search
    {
      "query": {
        "bool": {
          "should": [
            {"term": {"salary": 20}},
            {"term": {"title": "python"}}
          ],
          "must_not": [
            {"term": {"salary": 30}}
          ]
        }
      }
    }

    from elasticsearch_dsl import Q, Search

    q = Q('bool',
          should=[Q('term', salary=20), Q('term', title='python')],
          must_not=[Q('term', salary=30)])
    s = Search(index='bool').query(q)  # build the search from q before executing
    response = s.execute()
- Nested bool query
    select * from testdb where title=python or (title=django and salary=30)

    GET bool/testdb/_search
    {
      "query": {
        "bool": {
          "should": [
            {"term": {"title": "python"}},
            {"bool": {
              "must": [
                {"term": {"title": "django"}},
                {"term": {"salary": 30}}
              ]
            }}
          ]
        }
      }
    }

    from elasticsearch_dsl import Q, Search

    q = Q('bool', should=[
        Q('term', title='python'),
        Q('bool', must=[Q('term', title='django'), Q('term', salary=30)]),
    ])
    s = Search(index='bool').query(q)
    response = s.execute()
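To check that the bool bodies above really express the SQL-style conditions, the clauses can be evaluated against a few sample rows in plain Python. This is a toy evaluator under stated assumptions: `matches` and `as_list` are hypothetical helpers, only term and bool are supported, term is an exact equality test (no analysis or scoring), and the sample rows are invented. It follows the rule that when a bool query has should clauses but no must or filter clauses, at least one should clause has to match.

```python
def as_list(clauses):
    """Accept a missing value, a single clause (dict), or a list of clauses."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]

def matches(query, doc):
    """Evaluate a tiny subset of the query DSL (term, bool) against one dict."""
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        return doc.get(field) == value
    if kind == "bool":
        required = as_list(body.get("must")) + as_list(body.get("filter"))
        should = as_list(body.get("should"))
        if any(matches(q, doc) for q in as_list(body.get("must_not"))):
            return False
        if not all(matches(q, doc) for q in required):
            return False
        # With no must/filter clauses, at least one should clause must match
        if should and not required:
            return any(matches(q, doc) for q in should)
        return True
    raise ValueError(f"unsupported query type: {kind}")

# Hypothetical sample rows
rows = {
    1: {"title": "python", "salary": 10},
    2: {"title": "django", "salary": 20},
    3: {"title": "django", "salary": 30},
    4: {"title": "python", "salary": 30},
}

combined = {"bool": {
    "should": [{"term": {"salary": 20}}, {"term": {"title": "python"}}],
    "must_not": [{"term": {"salary": 30}}],
}}
nested = {"bool": {"should": [
    {"term": {"title": "python"}},
    {"bool": {"must": [{"term": {"title": "django"}}, {"term": {"salary": 30}}]}},
]}}

print([i for i, row in rows.items() if matches(combined, row)])  # [1, 2]
print([i for i, row in rows.items() if matches(nested, row)])    # [1, 3, 4]
```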
- Filtering null and non-null values
- Create the test data
    POST null/testdb2/_bulk
    {"index":{"_id":1}}
    {"tags":["search"]}
    {"index":{"_id":2}}
    {"tags":["search", "python"]}
    {"index":{"_id":3}}
    {"other_field":["some data"]}
    {"index":{"_id":4}}
    {"tags":null}
    {"index":{"_id":5}}
    {"tags":["search", null]}
- Handling null values
    select tags from testdb2 where tags is not NULL

    GET null/testdb2/_search
    {
      "query": {
        "bool": {
          "filter": {
            "exists": {
              "field": "tags"
            }
          }
        }
      }
    }

    from elasticsearch_dsl import Q, Search

    s = Search(index='null').query('bool', filter=[Q('exists', field='tags')])
    response = s.execute()
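Which of the five test documents the exists filter returns can be worked out with a short pure-Python sketch. `field_exists` is a hypothetical helper that mirrors the rule that a field counts as existing only if it holds at least one non-null value, so a missing field, null, and an array containing only nulls do not match.

```python
def field_exists(doc, field):
    """True if the field holds at least one non-null value."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True

# The same five documents as in the bulk request above
docs = {
    1: {"tags": ["search"]},
    2: {"tags": ["search", "python"]},
    3: {"other_field": ["some data"]},
    4: {"tags": None},
    5: {"tags": ["search", None]},
}

matching = sorted(i for i, doc in docs.items() if field_exists(doc, "tags"))
print(matching)  # [1, 2, 5]
```

Document 5 still matches because its array contains one non-null value alongside the null.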
7.3 Aggregations
To be continued...