• Getting Started with Elasticsearch, Part 2 (Queries and Spring Boot Integration)


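    Initialization

    First, load the JSON file downloaded from the official site into ES and check that the index exists, using the following commands: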

    curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
    curl 'localhost:9200/_cat/indices?v'
    

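    search API

    Now we can start querying. There are two ways to run a search: pass the parameters in the request URI, or send them in the request body. The request body form is more expressive and easier to read.
    First the URI form. The following request returns all documents in the bank index: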

    GET /bank/_search?q=*&sort=account_number:asc&pretty
    

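    In the parameter list, q=* matches all documents, sort=account_number:asc sorts the results by account_number in ascending order, and pretty returns the response as formatted JSON.
    The response looks like this, with explanations added as comments: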

    {
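      // time the search took, in milliseconds
      "took" : 63,
      // whether the search timed out
      "timed_out" : false,
      // how many shards were searched, and how many succeeded, were skipped, or failed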
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      // search results
      "hits" : {
        // total number of documents matching the query
        "total" : 1000,
        "max_score" : null,
        // array of hits; only the first 10 are returned by default
        "hits" : [ {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "0",
          // sort value of this hit (here the account_number)
          "sort": [0],
          "_score" : null,
          "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "1",
          "sort": [1],
          "_score" : null,
          "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
        }, ...
        ]
      }
    }
    

    The same search can be sent as a request body:

    GET /bank/_search
    {
      "query": { "match_all": {} },
      "sort": [
        { "account_number": "asc" }
      ]
    }
    

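    The result is the same.
    The size and from parameters control how many hits are returned and where the returned window starts:

    // return a single hit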
    GET /bank/_search
    {
      "query": { "match_all": {} },
      "size": 1
    }
    
    // from is a zero-based offset: skip the first 10 hits and return the next 10 (hits 11 through 20)
    GET /bank/_search
    {
      "query": { "match_all": {} },
      "from": 10,
      "size": 10
    }
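
    The from/size arithmetic is easy to get off by one, so here is a minimal sketch of how a 1-based page number could be turned into those two parameters. The class PageWindow and its methods are hypothetical helpers for illustration, not part of any Elasticsearch client.

```java
// Sketch: translate a 1-based page number into the "from"/"size" pair of a search body.
final class PageWindow {
    /** Zero-based offset of the first hit on the given 1-based page. */
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    /** Renders a match_all request body for that page. */
    static String body(int page, int size) {
        return "{ \"query\": { \"match_all\": {} }, \"from\": " + from(page, size)
                + ", \"size\": " + size + " }";
    }

    public static void main(String[] args) {
        // Page 2 with 10 hits per page starts at offset 10, i.e. at the 11th hit.
        System.out.println(body(2, 10));
    }
}
```

    With 10 hits per page, page 2 yields from = 10 and size = 10, which is the second request shown above.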
    

    The following sorts the results by balance in descending order:

    GET /bank/_search
    {
      "query": { "match_all": {} },
      "sort": { "balance": { "order": "desc" } }
    }
    

    By default, _source contains the complete document. To return only some of its fields, list them in _source:

    GET /bank/_search
    {
      "query": { "match_all": {} },
      "_source": ["account_number", "balance"]
    }
    

    The response then looks like this:

    {
        "took": 11,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 999,
            "max_score": 1,
            "hits": [
                {
                    "_index": "bank",
                    "_type": "account",
                    "_id": "25",
                    "_score": 1,
                    "_source": {
                        "account_number": 25,
                        "balance": 40540
                    }
                }
            ]
        }
    }
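
    To make the effect of _source filtering concrete, the sketch below imitates it on the client side with a plain map. This is only an analogy under stated assumptions: ES does the filtering on the server, SourceFilter is a hypothetical class, and the city value in the sample document is made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: keep only the listed top-level fields of a document, like "_source": [...].
final class SourceFilter {
    /** Returns a copy of the document that contains only the requested fields. */
    static Map<String, Object> keep(Map<String, Object> source, List<String> fields) {
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (String field : fields) {
            if (source.containsKey(field)) {
                filtered.put(field, source.get(field));
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("account_number", 25);
        doc.put("balance", 40540);
        doc.put("city", "Nowhere"); // made-up value, only here to be filtered out
        // prints {account_number=25, balance=40540}
        System.out.println(keep(doc, Arrays.asList("account_number", "balance")));
    }
}
```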
    

    Next, filtering by field value. The following returns the account whose account_number is 20:

    GET /bank/_search
    {
      "query": { "match": { "account_number": 20 } }
    }
    

    The following returns accounts whose address contains the term mill or the term lane:

    GET /bank/_search
    {
      "query": { "match": { "address": "mill lane" } }
    }
    

    To return only addresses that contain the phrase mill lane, use match_phrase:

    GET /bank/_search
    {
      "query": { "match_phrase": { "address": "mill lane" } }
    }
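
    The difference between match and match_phrase can be sketched in a few lines of plain Java. This is a simplification, not how ES is implemented: the tokens method is a crude stand-in for the analyzer (lowercase, split on whitespace), MatchSketch is a hypothetical class, and "198 Mill Lane" is an illustrative address.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: "match" needs any query term, "match_phrase" needs the terms adjacent and in order.
final class MatchSketch {
    /** Crude stand-in for an analyzer: lowercase, then split on whitespace. */
    static List<String> tokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** match (default OR operator): at least one query term occurs in the field. */
    static boolean match(String field, String query) {
        List<String> fieldTokens = tokens(field);
        for (String term : tokens(query)) {
            if (fieldTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    /** match_phrase: the query terms occur consecutively and in the same order. */
    static boolean matchPhrase(String field, String query) {
        List<String> queryTokens = tokens(query);
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokens(field), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(match("880 Holmes Lane", "mill lane"));       // true: contains "lane"
        System.out.println(matchPhrase("880 Holmes Lane", "mill lane")); // false: no "mill lane" phrase
        System.out.println(matchPhrase("198 Mill Lane", "mill lane"));   // true
    }
}
```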
    

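    Next, the bool query.
    The bool query below combines two match queries and returns accounts whose address contains both mill and lane. Unlike match_phrase, the two terms do not have to be adjacent: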

    GET /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }
    

    must means that every listed query has to match. Now look at the next statement:

    GET /bank/_search
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }
    

    should means that at least one of the listed queries has to match.
    The next statement returns accounts whose address contains neither mill nor lane:

    GET /bank/_search
    {
      "query": {
        "bool": {
          "must_not": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }
    

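    must_not means that none of the listed queries may match.
    The clauses can be combined. The following returns accounts whose age is 40 and whose state is not ID: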

    GET /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } }
          ],
          "must_not": [
            { "match": { "state": "ID" } }
          ]
        }
      }
    }
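
    The way the three clauses combine can be sketched as ordinary boolean logic. The sketch assumes each clause is a single-term match on the address, uses a whitespace split instead of a real analyzer, and models only the default behaviour where should becomes mandatory when there is no must clause. BoolSketch is a hypothetical class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: must = all terms, must_not = no term, should = any term (when there is no must).
final class BoolSketch {
    /** Simplified single-term match: the address contains the term as a whole word. */
    static boolean hasTerm(String address, String term) {
        List<String> words = Arrays.asList(address.toLowerCase(Locale.ROOT).split("\\s+"));
        return words.contains(term.toLowerCase(Locale.ROOT));
    }

    static boolean bool(String address, List<String> must, List<String> should, List<String> mustNot) {
        for (String term : must) {
            if (!hasTerm(address, term)) {
                return false; // every must clause has to match
            }
        }
        for (String term : mustNot) {
            if (hasTerm(address, term)) {
                return false; // no must_not clause may match
            }
        }
        if (must.isEmpty() && !should.isEmpty()) {
            for (String term : should) {
                if (hasTerm(address, term)) {
                    return true; // one matching should clause is enough
                }
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("mill", "lane");
        List<String> none = Collections.emptyList();
        System.out.println(bool("880 Holmes Lane", terms, none, none)); // false: "mill" is missing
        System.out.println(bool("880 Holmes Lane", none, terms, none)); // true: "lane" matches
        System.out.println(bool("880 Holmes Lane", none, none, terms)); // false: "lane" is excluded
    }
}
```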
    

    A filter can restrict the results to accounts whose balance is between 20000 and 30000, inclusive:

    GET /bank/_search
    {
      "query": {
        "bool": {
          "must": { "match_all": {} },
          "filter": {
            "range": {
              "balance": {
                "gte": 20000,
                "lte": 30000
              }
            }
          }
        }
      }
    }
    

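    Note that a match query does not accept gte or lte; range conditions need a range query as shown above.
    Aggregations: besides the hits, ES can return an additional aggregations section that groups the matching documents as described in the request. Setting size to 0 omits the hits so that only the aggregation results come back. For example: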

    GET /bank/_search
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state.keyword"
          }
        }
      }
    }
    

    The statement above is roughly equivalent to the following SQL:

    SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
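
    For readers more at home in Java than in SQL, the same grouping can be sketched with the streams API. This runs on an in-memory list rather than against ES, TermsAggSketch is a hypothetical class, and the sample states are illustrative. Unlike this sketch, a terms aggregation returns only the top 10 buckets unless its size is changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: count documents per state and order the buckets by count, largest first.
final class TermsAggSketch {
    static List<Map.Entry<String, Long>> countByState(List<String> states) {
        Map<String, Long> counts = states.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        List<Map.Entry<String, Long>> buckets = new ArrayList<>(counts.entrySet());
        // Order by doc count descending; break ties by key so the output is deterministic.
        buckets.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return buckets;
    }

    public static void main(String[] args) {
        // prints [IL=3, CO=2, ID=1]
        System.out.println(countByState(Arrays.asList("IL", "CO", "IL", "ID", "IL", "CO")));
    }
}
```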
    

    The following statement groups the accounts by state and computes the average balance of each group:

    GET /bank/_search
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
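
    As a rough in-memory analogy, a metric nested inside a terms aggregation behaves like a downstream collector of groupingBy. The sketch below assumes a hypothetical Row type with made-up balances; it shows the shape of the computation, not how ES executes it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: average balance per state, i.e. an avg metric nested in a terms bucket.
final class AvgAggSketch {
    /** Hypothetical minimal view of an account: only the two fields the aggregation uses. */
    static final class Row {
        final String state;
        final long balance;

        Row(String state, long balance) {
            this.state = state;
            this.balance = balance;
        }
    }

    static Map<String, Double> averageByState(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Row r) -> r.state,                            // outer "terms" bucket
                TreeMap::new,                                  // sorted keys for stable output
                Collectors.averagingLong((Row r) -> r.balance) // inner "avg" metric
        ));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("IL", 100), new Row("IL", 300), new Row("CO", 50));
        // prints {CO=50.0, IL=200.0}
        System.out.println(averageByState(rows));
    }
}
```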
    

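    Note that aggs appears twice: the avg aggregation is nested inside the terms aggregation, so it is computed once per bucket. Whenever a value has to be computed for each group, nest aggs in this way and read the value from the corresponding bucket in the response.
    The next example nests aggregations one level deeper: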

    GET /bank/_search
    {
      "size": 0,
      "aggs": {
        "group_by_age": {
          "range": {
            "field": "age",
            "ranges": [
              {
                "from": 20,
                "to": 30
              },
              {
                "from": 30,
                "to": 40
              },
              {
                "from": 40,
                "to": 50
              }
            ]
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender.keyword"
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
      }
    }
    

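    This request first groups the accounts into age ranges, then groups each range by gender, and finally computes the average balance of each group. The response: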

    {
      "took": 8,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 999,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "group_by_age": {
          "buckets": [
            {
              "key": "20.0-30.0",
              "from": 20,
              "to": 30,
              "doc_count": 450,
              "group_by_gender": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "M",
                    "doc_count": 231,
                    "average_balance": {
                      "value": 27400.982683982686
                    }
                  },
                  {
                    "key": "F",
                    "doc_count": 219,
                    "average_balance": {
                      "value": 25341.260273972603
                    }
                  }
                ]
              }
            },
            {
              "key": "30.0-40.0",
              "from": 30,
              "to": 40,
              "doc_count": 504,
              "group_by_gender": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "F",
                    "doc_count": 253,
                    "average_balance": {
                      "value": 25670.869565217392
                    }
                  },
                  {
                    "key": "M",
                    "doc_count": 251,
                    "average_balance": {
                      "value": 24288.239043824702
                    }
                  }
                ]
              }
            },
            {
              "key": "40.0-50.0",
              "from": 40,
              "to": 50,
              "doc_count": 45,
              "group_by_gender": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "M",
                    "doc_count": 24,
                    "average_balance": {
                      "value": 26474.958333333332
                    }
                  },
                  {
                    "key": "F",
                    "doc_count": 21,
                    "average_balance": {
                      "value": 27992.571428571428
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
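
    The two-level bucketing in this response can be mimicked in memory to see how the pieces fit. The sketch assumes a hypothetical Person type and made-up sample values; it reproduces the bucket keys seen above and the range semantics, where from is inclusive and to is exclusive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: age range bucket -> gender bucket -> average balance.
final class NestedAggSketch {
    static final class Person {
        final int age;
        final String gender;
        final long balance;

        Person(int age, String gender, long balance) {
            this.age = age;
            this.gender = gender;
            this.balance = balance;
        }
    }

    /** Bucket key for ages 20 to 49; returns null for ages outside every range. */
    static String ageBucket(int age) {
        if (age < 20 || age >= 50) {
            return null;
        }
        int from = (age / 10) * 10; // age 30 lands in 30.0-40.0, not in 20.0-30.0
        return from + ".0-" + (from + 10) + ".0";
    }

    static Map<String, Map<String, Double>> averageBalance(List<Person> people) {
        // range key -> gender -> {sum of balances, number of documents}
        Map<String, Map<String, long[]>> totals = new TreeMap<>();
        for (Person p : people) {
            String key = ageBucket(p.age);
            if (key == null) {
                continue;
            }
            long[] acc = totals.computeIfAbsent(key, k -> new TreeMap<>())
                    .computeIfAbsent(p.gender, k -> new long[2]);
            acc[0] += p.balance;
            acc[1]++;
        }
        Map<String, Map<String, Double>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, long[]>> range : totals.entrySet()) {
            Map<String, Double> byGender = new TreeMap<>();
            for (Map.Entry<String, long[]> g : range.getValue().entrySet()) {
                byGender.put(g.getKey(), (double) g.getValue()[0] / g.getValue()[1]);
            }
            result.put(range.getKey(), byGender);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(25, "M", 100), new Person(29, "M", 300),
                new Person(30, "F", 50), new Person(55, "F", 999));
        // prints {20.0-30.0={M=200.0}, 30.0-40.0={F=50.0}}
        System.out.println(averageBalance(people));
    }
}
```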
    

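    Integrating Elasticsearch with Spring Boot

    Spring Boot relies on spring-data-elasticsearch, and at the time of writing its latest release did not yet support ES 5, so this part uses an older ES version for testing. The versions used are ES 2.3.2, spring-data-elasticsearch 2.1.0, Spring Boot 1.5.1, and spring-boot-starter-data-elasticsearch 1.5.1.RELEASE.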

    • pom.xml
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
                <version>1.5.1.RELEASE</version>
            </dependency>
    
    • application.properties
    # ES
    spring.data.elasticsearch.repositories.enabled = true
    spring.data.elasticsearch.cluster-nodes = 127.0.0.1:9300
    
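    • Entity class (Account)

    Note that neither indexName nor type may contain uppercase letters; otherwise an error is raised.

    import java.io.Serializable;

    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;

    // Maps to documents of type "account" in the "bank" index.
    @Document(indexName = "bank", type = "account")
    public class Account implements Serializable {

        @Id
        private Long id;

        // Field names mirror the keys in accounts.json.
        private Integer account_number;

        private Long balance;

        private String firstname;

        private String lastname;

        private Integer age;

        private String gender;

        private String address;

        private String employer;

        private String email;

        private String city;

        private String state;

        // getters and setters omitted
    }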
    
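    • Repository for accessing ES

    This is very simple: extending ElasticsearchRepository is all that is needed.

    import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

    // Account is the entity type, Long is the type of its id.
    public interface AccountRepository extends ElasticsearchRepository<Account, Long> {

    }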
    
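    • Service

    Note that when a document is saved and its index does not exist yet, ES creates the index automatically. The id has to be set explicitly before saving; otherwise the document is stored with null as its id.

    import java.util.List;

    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.domain.Page;
    import org.springframework.data.domain.PageRequest;
    import org.springframework.data.domain.Pageable;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
    import org.springframework.data.elasticsearch.core.query.SearchQuery;
    import org.springframework.stereotype.Service;

    @Service
    public class AccountServiceEsImpl {

        @Autowired
        private AccountRepository accountRepository;

        /**
         * Saves an account and returns its id.
         */
        public Long save(Account account) {
            Account accountSaved = accountRepository.save(account);
            return accountSaved.getId();
        }

        /**
         * Filters accounts by address.
         * @return the first 10 accounts whose address matches "Beijing"
         */
        public List<Account> queryByAddress() {
            // First page (index 0) with 10 results per page
            Pageable page = new PageRequest(0, 10);
            // bool query with a single must clause: a match on the address field
            BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
            queryBuilder.must(QueryBuilders.matchQuery("address", "Beijing"));
            SearchQuery query =
                    new NativeSearchQueryBuilder().withQuery(queryBuilder).withPageable(page).build();
            Page<Account> pages = accountRepository.search(query);
            return pages.getContent();
        }
    }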
    
  • Original article: https://www.cnblogs.com/hlhdidi/p/7976447.html