一. Search using commands
1》URI search (see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html)
A URI search places the query and the action to perform directly in the URI parameters.
To make the examples easier to follow, add some test data (/root/my.json):
{"index":{"_id":"1"}}
{"id":"1","country":"美国","provice":"加利福尼亚州","city":"旧金山","age":"30","name":"John","desc":"John is come from austrina John,s Dad is Johh Super"}
{"index":{"_id":"2"}}
{"id":"2","country":"美国","provice":"加利福尼亚州","city":"好莱坞","age":"40","name":"Mike","desc":"Mike is come from austrina Mike,s Dad is Mike Super"}
{"index":{"_id":"3"}}
{"id":"3","country":"美国","provice":"加利福尼亚州","city":"圣地牙哥","age":"50","name":"Cherry","desc":"Cherry is come from austrina Cherry,s Dad is Cherry Super"}
{"index":{"_id":"4"}}
{"id":"4","country":"美国","provice":"德克萨斯州","city":"休斯顿","age":"60","name":"Miya","desc":"Miya is come from austrina Miya,s Dad is Miya Super"}
{"index":{"_id":"5"}}
{"id":"5","country":"美国","provice":"德克萨斯州","city":"大学城","age":"70","name":"fubos","desc":"fubos is come from austrina fubos,s Dad is fubos Super"}
{"index":{"_id":"6"}}
{"id":"6","country":"美国","provice":"德克萨斯州","city":"麦亚伦","age":"20","name":"marry","desc":"marry is come from austrina marry,s Dad is marry Super"}
{"index":{"_id":"7"}}
{"id":"7","country":"中国","provice":"湖南省","city":"长沙市","age":"18","name":"张三","desc":"张三来自长沙市 是公务员一名"}
{"index":{"_id":"8"}}
{"id":"8","country":"中国","provice":"湖南省","city":"岳阳市","age":"15","name":"李四","desc":"李四来自岳阳市 是一名清洁工"}
{"index":{"_id":"9"}}
{"id":"9","country":"中国","provice":"湖南省","city":"株洲市","age":"33","name":"李光四","desc":"李光四 老家岳阳市 来自株洲 是李四的侄子"}
{"index":{"_id":"10"}}
{"id":"10","country":"中国","provice":"广东省","city":"深圳市","age":"67","name":"王五","desc":"王五来自深圳市 是来自深圳的一名海关缉私精英"}
{"index":{"_id":"11"}}
{"id":"11","country":"中国","provice":"广东省","city":"广州市","age":"89","name":"王冠宇","desc":"王冠宇是王五的儿子"}
Use the bulk API to import it in one batch:
cd /root && curl -XPOST '192.168.58.147:9200/user/info/_bulk?pretty' --data-binary @my.json
Documents can then be searched with the _search API, in the form q=field:value, for example:
[root@node1 ~]# curl -XGET 'http://192.168.58.147:9200/_search?q=name:marry&pretty'
{
"took" : 42,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 10,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.6127073,
"hits" : [
{
"_index" : "user",
"_type" : "info",
"_id" : "6",
"_score" : 1.6127073,
"_source" : {
"id" : "6",
"country_s" : "美国",
"provice_s" : "德克萨斯州",
"city_s" : "麦亚伦",
"age_i" : "20",
"name_s" : "marry",
"desc_s" : "marry is come from austrina marry,s Dad is marry Super"
}
}
]
}
}
Query the user index for documents whose age is 50:
curl -XGET 'http://192.168.58.147:9200/user/_search?q=age:50&pretty'
Query several indices or types at once (separate them with commas), e.g. the info and money types for age 30:
curl -XGET 'http://192.168.58.147:9200/student/info,money/_search?q=age:30&pretty'
Query every index (_all) for documents of type info with age 30:
curl -XGET 'http://192.168.58.147:9200/_all/info/_search?q=age:30&pretty'
Use from to set the starting offset and size the number of results to return; the defaults are from=0 and size=10:
curl -XGET 'http://192.168.58.147:9200/user/_search?q=age:30&from=0&size=2&pretty'
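As a side note, the q, from and size parameters can be assembled and percent-encoded programmatically. A minimal Python sketch (the field/value pairs are just the examples above; `uri_search_params` is a hypothetical helper, not an ES API):

```python
from urllib.parse import urlencode

def uri_search_params(field, value, from_=0, size=10):
    """Build the query-string part of a URI search: q=field:value plus paging."""
    return urlencode({"q": f"{field}:{value}", "from": from_, "size": size})

qs = uri_search_params("age", "30", from_=0, size=2)
# Non-ASCII values are percent-encoded as UTF-8 (e.g. 张 becomes %E5%BC%A0)
qs_cn = uri_search_params("name", "张三")
```

Proper percent-encoding matters as soon as the query value contains non-ASCII text.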
The commonly used query parameters are:

Name | Description |
---|---|
q | The query string |
df | The default field prefix to use when no field prefix is defined within the query |
analyzer | The analyzer name to use for analyzing the query string |
default_operator | The default operator to be used, AND or OR; defaults to OR |
explain | For each hit, an explanation of how its score was computed |
_source | Set to false to disable retrieval of the _source field; _source_include and _source_exclude can retrieve part of the document |
fields | The fields to return for each hit |
sort | Sorting to perform, as fieldName, fieldName:asc, or fieldName:desc; fieldName can be an actual field or _score; several sort parameters may be given |
track_scores | When sorting, set to true to return relevance scores as well |
timeout | No timeout by default |
from | Defaults to 0 |
size | Defaults to 10 |
search_type | The type of search execution: dfs_query_then_fetch, dfs_query_and_fetch, query_then_fetch, query_and_fetch, count, scan; defaults to query_then_fetch |
lowercase_expanded_terms | Whether terms are automatically lowercased; defaults to true |
analyze_wildcard | Whether wildcard and prefix queries are analyzed; defaults to false |
terminate_after | The maximum number of documents to collect per shard; once reached, query execution terminates early. If set, the response has a boolean field terminated_early. Defaults to no terminate_after |
Querying with a Chinese name returns no data either way; this looks like a tokenization problem in the analyzer:
curl -XGET 'http://192.168.58.147:9200/user/info/_search?q=name:张&pretty'
curl -XGET 'http://192.168.58.147:9200/user/info/_search?q=name:张三&pretty'
Test the default analyzer's tokenization:
[root@node1 ~]# curl -XPOST 'http://192.168.58.147:9200/_analyze?pretty' -d '
> {
> "tokenizer": "standard",
> "text": "我是饺子"
> }';
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "饺",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "子",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
}
]
}
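The per-character splitting visible in this output can be mimicked in a couple of lines. A toy sketch of the standard analyzer's CJK behavior (a simplification, not the real Lucene implementation):

```python
def standard_cjk_tokens(text):
    # The standard analyzer emits every CJK ideograph as its own token,
    # with character offsets, which is exactly the shape of the output above.
    return [{"token": ch, "start_offset": i, "end_offset": i + 1}
            for i, ch in enumerate(text)]

tokens = [t["token"] for t in standard_cjk_tokens("我是饺子")]
```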
Since the text is split character by character, q=name:张 ought to match. One suspect is that Chinese passed as a URL parameter gets garbled over HTTP, so try a request body instead of URL parameters:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"term" : { "name" : "三" }
}
}'
With the request body, searching for 张 or for 三 returns results, but 张三 does not. This is indeed how the standard analyzer behaves: it emits each CJK character as its own token, whereas 张三 is really one word. The IK analyzer is recommended here because the tokens it produces are meaningful words (e.g. 我是中国人 is split into 我, 是, 中国人, 中国 and other meaningful terms).
Set up the IK analyzer (plugin: https://github.com/medcl/elasticsearch-analysis-ik).
My Elasticsearch is 5.6.4, so download the matching 5.6.4 IK release:
cd /home/es/elasticsearch-5.6.4 &&
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.4/elasticsearch-analysis-ik-5.6.4.zip
After installation, check the plugins directory:
[es@node1 elasticsearch-5.6.4]$ cd plugins/
[es@node1 plugins]$ ll
total 4
drwxr-xr-x 2 es es 4096 Dec 5 18:49 analysis-ik
Restart Elasticsearch:
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name -Enetwork.host=192.168.58.147
IK ships with two analyzers:
ik_max_word: splits the text at the finest granularity, producing as many words as possible
ik_smart: performs the coarsest-granularity split; text already claimed by one word is not reused by another
Test:
[es@node1 plugins]$ curl -XGET 'http://192.168.58.147:9200/_analyze?pretty&analyzer=ik_max_word' -d '我是中国人'
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中国人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "中国",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "国人",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 4
}
]
}
[es@node1 plugins]$ curl -XGET 'http://192.168.58.147:9200/_analyze?pretty&analyzer=ik_smart' -d '我是中国人'
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中国人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
}
]
}
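The difference between the two modes can be sketched with a toy dictionary segmenter (a deliberate simplification: the real IK plugin ships large dictionaries and extra heuristics, and `DICT` here is just enough vocabulary for this example):

```python
DICT = {"我", "是", "中国", "国人", "中国人"}  # toy dictionary

def ik_max_word_like(text):
    """Finest granularity: emit every dictionary word found at every position."""
    out = []
    for i in range(len(text)):
        for j in range(len(text), i, -1):  # longer matches first, as IK does
            if text[i:j] in DICT:
                out.append(text[i:j])
    return out

def ik_smart_like(text):
    """Coarsest granularity: greedy longest match; consumed text is not reused."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in DICT:
                out.append(text[i:j])
                i = j
                break
        else:
            i += 1  # character not in the dictionary; skip it
    return out
```

This reproduces the two outputs above: max-word enumerates 中国人, 中国 and 国人, while smart keeps only 中国人.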
Delete the user index and recreate it:
curl -XDELETE 'http://192.168.58.147:9200/user?pretty'
curl -XPUT 'http://192.168.58.147:9200/user?pretty' -d '{
"settings" : {
"analysis" : {
"analyzer" : {
"ik" : {
"tokenizer" : "ik_max_word"
}
}
}
},
"mappings" : {
"info" : {
"dynamic" : true,
"properties" : {
"name" : {
"type" : "string",
"analyzer" : "ik_max_word"
},"desc" : {
"type" : "string",
"analyzer" : "ik_max_word"
}
}
}
}
}';
The request above sets up an ik analyzer for the user index; the mappings section defines the type, its properties and their analyzers: the name and desc fields of the info type use ik_max_word.
Go back to /root and import the data file again:
cd /root && curl -XPOST '192.168.58.147:9200/user/info/_bulk?pretty' --data-binary @my.json
Test the analyzer's effect: now 张 no longer matches anything, and only 张三 does:
[root@node1 ~]# curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"term" : { "name" : "张" }
}
}';
{
"took" : 22,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
2》Request body search
》》term query
Chinese text can only be searched reliably through a request body; URI search fails for it, so request-body search is the usual approach and URI search is kept for quick tests. The request body is written in the query DSL, for example:
Query all documents:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
}
}';
Query documents whose name field contains the term 三:
"query" : {
"term" : { "name" : "三" }
}
The following parameters are supported at the same level as query:
Name | Description |
---|---|
timeout | No timeout by default |
from | Defaults to 0 |
size | Defaults to 10 |
search_type | The type of search execution: dfs_query_then_fetch, dfs_query_and_fetch, query_then_fetch, query_and_fetch, count, scan; defaults to query_then_fetch |
query_cache | Whether to cache query results when ?search_type=count |
terminate_after | The maximum number of documents to collect per shard; once reached, query execution terminates early. If set, the response has a boolean field terminated_early. Defaults to no terminate_after |
from and size control pagination; sort controls ordering. (Note: do not use tabs instead of spaces in the request body, or the request fails.)
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"from":0,
"size":5,
"sort":[
{"_score":{"order":"desc"}},
{"age":{"order":"desc"}}
],
"query" : {
"term" : { "desc" : "来自" }
}
}';
This sorts by each document's score descending, then by age descending within equal scores. Because age was imported earlier as a quoted string, this normally throws: "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on
By default string fields do not allow sorting or aggregation; fielddata must be set to true. Modify the mapping created earlier and give age fielddata=true:
"properties" : {
"name" : {
"type" : "string",
"analyzer" : "ik_max_word"
},"desc" : {
"type" : "string",
"analyzer" : "ik_max_word"
},"age" : {
"type" : "string",
"fielddata" : true
}
}
Or change the type to integer:
"age" : {
"type" : "integer"
}
_source selects which fields to return:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"term" : { "desc" : "来自" }
},
"_source":["name", "desc"]
}';
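What _source filtering does can be shown locally in a few lines; a sketch over one of the sample documents (`source_filter` is a hypothetical helper, not an ES API):

```python
doc = {"id": "8", "country": "中国", "provice": "湖南省", "city": "岳阳市",
       "age": "15", "name": "李四", "desc": "李四来自岳阳市 是一名清洁工"}

def source_filter(source, includes):
    # Keep only the requested fields, as "_source": ["name", "desc"] does
    return {k: v for k, v in source.items() if k in includes}

filtered = source_filter(doc, ["name", "desc"])
```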
》》terms query
terms is similar to term but searches for multiple terms at once:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"terms" : {
"desc" : ["来自","com"]
}
}
}';
》》match query
Compare term with match:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"term" : { "desc" : "李 来自" }
}
}';
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"match" : { "desc" : "李 来自" }
}
}';
term treats 李 来自 as one literal term to match, so it finds nothing; match analyzes the text into the terms 李 and 来自 and returns the union of documents containing either.
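The behavioral difference can be modeled with a toy inverted index (hypothetical postings, assuming desc was analyzed into the tokens shown; not the real ES data structures):

```python
# token -> set of matching doc ids (hypothetical postings for the desc field)
index = {"李": {"8", "9"}, "来自": {"7", "9", "10"}}

def term_query(token):
    # term: the input is NOT analyzed; it must equal one indexed token exactly
    return index.get(token, set())

def match_query(text):
    # match: the input IS analyzed (here: split on spaces), results are OR-ed
    hits = set()
    for token in text.split():
        hits |= index.get(token, set())
    return hits
```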
There is also match_phrase, which analyzes the text but requires the terms to appear together in order, so like term it effectively treats 李 来自 as a single unit:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": { "match_phrase": { "desc" : "李 来自" } }
}'
》》bool query
A bool query combines sub-queries. must requires every clause to match; here the document must contain both 李 and 来自:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"bool": {
"must": [
{ "match": { "desc": "李" } },
{ "match": { "desc": "来自" } }
]
}
}
}'
should matches when any one of the clauses matches:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"bool": {
"should": [
{ "match": { "desc": "李" } },
{ "match": { "desc": "来自" } }
]
}
}
}';
must_not requires that none of the clauses match:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"bool": {
"must_not": [
{ "match": { "desc": "李" } },
{ "match": { "desc": "来自" } }
]
}
}
}';
The clauses can be combined; here the document must contain 来自 and must not contain 李:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"bool": {
"must_not": [
{ "match": { "desc": "李" } }
],
"must":[{ "match": { "desc": "来自" } }]
}
}
}';
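The three clause types can be sketched over per-document token sets (toy data and a hypothetical `bool_query` helper; real ES also scores the hits):

```python
docs = {  # doc id -> analyzed tokens (toy data)
    "7": {"张三", "来自"},
    "8": {"李", "来自"},
    "9": {"李", "来自", "株洲"},
    "10": {"王五", "来自"},
}

def bool_query(docs, must=(), should=(), must_not=()):
    hits = set()
    for doc_id, tokens in docs.items():
        if any(t in tokens for t in must_not):
            continue            # must_not: none of these may match
        if not all(t in tokens for t in must):
            continue            # must: every clause has to match
        if should and not must and not any(t in tokens for t in should):
            continue            # should: at least one clause must match
        hits.add(doc_id)
    return hits
```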
》》regexp query
Supports regular-expression patterns, for example:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"regexp" : { "desc" : "李.*" }
}
}';
》》prefix query
Matches terms starting with a given prefix (the prefix is matched against single terms):
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"prefix" : { "desc" : "李" }
}
}';
》》multi_match query
Searches the same text across multiple fields:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"multi_match" : {
"query" : "李" ,
"fields":["name","desc"]
}
}
}';
》》range filter
The range operators are:
- gt: greater than
- gte: greater than or equal
- lt: less than
- lte: less than or equal
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"range" : {
"age" : {"gt":20}
}
}
}';
To express age >= 20 and age <= 30, use range:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query": {
"range" : {
"age" : {"gte":20,"lte":"30"}
}
}
}';
Alternatively use a bool query (must takes an array of clauses):
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query":{
"bool":{
"must": [
{ "range" : { "age" : {"gte":20} } },
{ "range" : { "age" : {"lte":30} } }
]
}
}
}';
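The four range operators map directly onto comparisons. A quick local sketch over the sample ages (toy data, not an ES call; `range_filter` is a hypothetical helper):

```python
import operator

OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

def range_filter(values, **bounds):
    """Keep values satisfying every bound, e.g. range_filter(ages, gte=20, lte=30)."""
    return [v for v in values
            if all(OPS[op](v, bound) for op, bound in bounds.items())]

ages = [30, 40, 50, 60, 70, 20, 18, 15, 33, 67, 89]  # ages of the 11 sample docs
```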
》》exists and missing filters
exists and missing find documents that do or do not contain a given field. For example, all docs that have an age field:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query":{
"exists":{"field":"age"}
}
}';
Query all docs without an age field. missing is deprecated; use bool + exists instead:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query":{
"bool":{
"must_not": {
"exists" : {
"field":"age"
}
}
}
}
}';
3》Filtering
Every search response carries a _score per hit, a relative measure of how well the document matches the query: the higher the score, the more relevant the document; the lower, the less relevant.
Every query in Elasticsearch triggers this relevance computation. For scenarios where scores are not needed, Elasticsearch provides filters. Filters are conceptually similar to queries but execute much faster, for two reasons:
filters do not compute relevance scores, so they are computationally cheaper; and filters can be cached in memory, making repeated searches far faster than the equivalent queries.
For example, to find docs that have an age field, a query works, but so does a filter; the filter goes inside bool:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query":{
"bool":{
"filter": {
"exists" : {
"field":"age"
}
}
}
}
}';
Every hit in the result comes back with _score 0.0, for example:
{
"_index" : "user",
"_type" : "info",
"_id" : "7",
"_score" : 0.0,
"_source" : {
"id" : "7",
"country" : "中国",
"provice" : "湖南省",
"city" : "长沙市",
"age" : "18",
"name" : "张三",
"desc" : "张三来自长沙市 是公务员一名"
}
}
4》Aggregations
Numeric types (integer, float, long, etc.) can be aggregated directly; aggregating a string field requires fielddata=true on that field. Suppose we aggregate by country (a group by):
First modify the mapping to add these settings (delete the index, recreate it, and re-import the data):
"country" : {
"type" : "string",
"analyzer" : "ik_max_word",
"fielddata":true
},"provice" : {
"type" : "string",
"analyzer" : "ik_max_word",
"fielddata":true
},"city" : {
"type" : "string",
"analyzer" : "ik_max_word",
"fielddata":true
}
Run a simple aggregation (group by a field and count the documents in each group). Note: choose the analyzer carefully, otherwise the field is tokenized and grouping happens token by token:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_country": {
"terms": {
"field": "country"
}
}
}
}';
The aggregation part of the result:
"aggregations" : {
"group_by_country" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "美国",
"doc_count" : 6
},
{
"key" : "中国",
"doc_count" : 5
}
]
}
}
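Conceptually the terms aggregation is just a group-by count; it can be reproduced locally from the sample data's country values:

```python
from collections import Counter

countries = ["美国"] * 6 + ["中国"] * 5   # country field of the 11 sample docs

buckets = [{"key": k, "doc_count": n}
           for k, n in Counter(countries).most_common()]  # sorted by count desc
```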
This is analogous to the SQL statement SELECT country AS key, COUNT(*) AS doc_count FROM user_info GROUP BY country ORDER BY COUNT(*) DESC
group_by_country above simply names the grouping. Several groupings can be requested at once; the result then contains both, independent of each other:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_country": {
"terms": {
"field": "country"
}
},
"group_by_provice": {
"terms": {
"field": "provice"
}
}
}
}';
size: 0 above means only the aggregation results are returned, similar to facets in Solr.
If size is set to a positive number, the response also carries that many matching documents alongside the aggregations.
Aggregations can be nested: for example, group by country first, then group each country's documents by city:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_country": {
"terms": {
"field": "country"
},
"aggs":{
"group_by_age":{
"terms":{
"field":"city"
}
}
}
}
}
}';
Aggregation functions can be used inside a nested aggregation to compute an average, maximum, minimum, etc. (max, min, sum and others are supported). For example, the average age of each country group:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_country": {
"terms": {
"field": "country"
},
"aggs":{
"avg_age":{
"avg":{
"field":"age"
}
}
}
}
}
}';
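The nested avg can be reproduced locally from the sample (country, age) pairs:

```python
from collections import defaultdict

pairs = [("美国", 30), ("美国", 40), ("美国", 50), ("美国", 60), ("美国", 70), ("美国", 20),
         ("中国", 18), ("中国", 15), ("中国", 33), ("中国", 67), ("中国", 89)]

groups = defaultdict(list)
for country, age in pairs:
    groups[country].append(age)

# avg sub-aggregation: mean age per country bucket
avg_age = {country: sum(ages) / len(ages) for country, ages in groups.items()}
```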
Grouping by value ranges is also supported, e.g. group by country, then bucket each country's documents by age range:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_country": {
"terms": {
"field": "country"
},
"aggs":{
"group_by_age_range":{
"range": {
"field": "age",
"ranges": [
{
"from": 10,
"to": 20
},
{
"from": 20,
"to": 40
},
{
"from": 40,
"to": 100
}
]
}
}
}
}
}
}';
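In the range aggregation, from is inclusive and to is exclusive. A local sketch over all 11 sample ages (ignoring the country nesting for brevity):

```python
ranges = [(10, 20), (20, 40), (40, 100)]
ages = [30, 40, 50, 60, 70, 20, 18, 15, 33, 67, 89]

# "from" is inclusive, "to" is exclusive, matching ES range-agg semantics
buckets = {f"{lo}-{hi}": sum(1 for a in ages if lo <= a < hi) for lo, hi in ranges}
```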
二. Searching with the Java API
The Java search API, using the same data:
package es;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.StringTerms;
import org.elasticsearch.search.aggregations.bucket.terms.StringTerms.Bucket;
import org.elasticsearch.search.aggregations.metrics.avg.Avg;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
/**
* Uses the earlier /user/info data as the example
* @author jiaozi
*
*/
public class Search {
/**
* Query all docs:
* [root@node1 ~]# curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
> {
> "query":{}
> }'
{
"hits" : {
"total" : 1,
"max_score" : 0.5377023,
"hits" : [
{
"_index" : "user",
"_type" : "info",
"_id" : "7",
"_score" : 0.5377023,
"_source" : {
"id" : "7",
"country" : "中国",
"provice" : "湖南省",
"city" : "长沙市",
"age" : "18",
"name" : "张三",
"desc" : "张三来自长沙市 是公务员一名"
}
}
]
}
}
The response nests a hits array inside hits; the code mirrors this, and each hit exposes its source.
*/
public static void search(){
SearchResponse searchResponse = client.prepareSearch("user")
.setTypes("info")
.addSort("age", SortOrder.ASC)
.setFrom(0)
.setSize(5)
.get();
SearchHit[] hits = searchResponse.getHits().getHits();
for (int i = 0; i < hits.length; i++) {
System.out.println(hits[i].getSource());
}
}
/**
* curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query" : {
"term" : { "desc" : "来自" }
},
"_source":["name", "desc"]
}';
*/
public static void searchTerms(){
SearchResponse searchResponse = client.prepareSearch("user")
.setTypes("info")
.addSort("age", SortOrder.ASC)
//QueryBuilders can also build match, match_phrase, regexp, prefix and other query types
.setQuery(QueryBuilders.termQuery("desc", "来自")) //query: hits are scored
//.setQuery(QueryBuilders.rangeQuery("age").lte(30)) //range query
//.setQuery(QueryBuilders.regexpQuery("desc", "来自"))//regexp
//.setQuery(QueryBuilders.prefixQuery("desc", "张三"))//prefix
//.setQuery(QueryBuilders.matchQuery("desc", "张三 来自"))//match
//.existsQuery("desc") //exists: does the field exist
//.setPostFilter(QueryBuilders.termQuery("desc", "来自")) //filter: hits are not scored
.setFrom(0)
.setSize(5)
.get();
SearchHit[] hits = searchResponse.getHits().getHits();
for (int i = 0; i < hits.length; i++) {
System.out.println(hits[i].getSource());
}
}
/**
* Equivalent to:
* curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"query":{
"bool":{
"must": {
"range" : {
"age" : {"gte":20}
}
},
"must": {
"range" : {
"age" : {"lte":30}
}
}
}
}
}';
*/
public static void searchBool(){
SearchResponse searchResponse = client.prepareSearch("user")
.setTypes("info")
.setQuery(QueryBuilders
.boolQuery()
.must(QueryBuilders.rangeQuery("age").lte(30))
.must(QueryBuilders.rangeQuery("age").gte(20))
//.mustNot(queryBuilder)
//.should(queryBuilder)
)
.get();
SearchHit[] hits = searchResponse.getHits().getHits();
for (int i = 0; i < hits.length; i++) {
System.out.println(hits[i].getSource());
}
}
/**
* Aggregation:
* curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_country": {
"terms": {
"field": "country"
}
}
}
}';
*/
public static void aggs(){
SearchResponse searchResponse = client.prepareSearch("user")
.addAggregation(AggregationBuilders.terms("group_by_country").field("country"))
.setSize(4)
.get();
StringTerms terms = searchResponse.getAggregations().get("group_by_country");
List<Bucket> buckets = terms.getBuckets();
for (int i = 0; i < buckets.size(); i++) {
Bucket bucket = buckets.get(i);
System.out.println(bucket.getKey()+"----"+bucket.getDocCount());
}
SearchHit[] hits = searchResponse.getHits().getHits();
for (int i = 0; i < hits.length; i++) {
System.out.println(hits[i].getSource());
}
}
/**
* curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_country": {
"terms": {
"field": "country"
},
"aggs":{
"avg_age":{
"avg":{
"field":"age"
}
}
}
}
}
}';
*/
public static void aggsAvgs(){
SearchResponse searchResponse = client.prepareSearch("user")
.addAggregation(AggregationBuilders
.terms("group_by_country")
.field("country")
.subAggregation(AggregationBuilders.avg("avg_age").field("age"))
)
.get();
StringTerms terms = searchResponse.getAggregations().get("group_by_country");
List<Bucket> buckets = terms.getBuckets();
for (int i = 0; i < buckets.size(); i++) {
Bucket bucket = buckets.get(i);
Avg st=bucket.getAggregations().get("avg_age");
System.out.println(bucket.getKey()+"----"+bucket.getDocCount());
System.out.println(st.getValue());
}
}
public static void main(String[] args) {
aggsAvgs();
}
static TransportClient client;
static{
try {
client=getClient();
} catch (UnknownHostException e) {
e.printStackTrace();
}
}
/**
* Obtain the transport client
* @return
* @throws UnknownHostException
*/
public static TransportClient getClient() throws UnknownHostException{
Settings settings = Settings.builder()
.put("cluster.name", "my_cluster_name")
//.put("index.analysis.analyzer.default.type","ik_max_word")
.build();
TransportClient client = new PreBuiltTransportClient(settings)
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("192.168.58.147"), 9300));
return client;
}
}