Clusters and nodes:
A node is a running instance of Elasticsearch, and a cluster is a group of nodes that share the same cluster.name. You should replace the default cluster.name with a suitable name of your own; this prevents a freshly started node on the same network from accidentally joining your cluster.

cluster.name: es_cluster
node.name: node01
path.data: /elk/elasticsearch/data
path.logs: /elk/elasticsearch/logs
network.host: 192.168.32.80
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.32.80", "192.168.32.81"]

http://192.168.32.81:9200/_count?pretty
GET
{
  "query": {
    "match_all": {}
  }
}

Response:
{
  "count": 500,
  "_shards": {
    "total": 21,
    "successful": 21,
    "failed": 0
  }
}

Document-oriented:
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
index -> type -> document -> field
An Elasticsearch cluster can contain multiple indices; each index can contain multiple types, each type holds multiple documents, and each document has multiple fields.

So, to build the employee directory, we will:
1. Index a document for each employee, where each document contains everything we know about that employee.
2. Give every document the type employee.
3. Put the employee type in the megacorp index.
4. Store the megacorp index in the Elasticsearch cluster.

http://192.168.32.81:9200/megacorp/employee/1
PUT
{
  "first_name" : "John",
  "last_name" : "Smith",
  "age" : 25,
  "about" : "I love to go rock climbing",
  "interests" : [ "sports", "music" ]
}

Notice that the path /megacorp/employee/1 carries three pieces of information:

Name      Meaning
megacorp  the index name
employee  the type name
1         this employee's ID

Let's add some more employees to the directory:

PUT /megacorp/employee/2
{
  "first_name" : "Jane",
  "last_name" : "Smith",
  "age" : 32,
  "about" : "I like to collect rock albums",
  "interests" : [ "music" ]
}

PUT /megacorp/employee/3
{
  "first_name" : "Douglas",
  "last_name" : "Fir",
  "age" : 35,
  "about" : "I like to build cabinets",
  "interests" : [ "forestry" ]
}

Retrieving a document:
http://192.168.32.80:9200/megacorp/employee/1
GET
{
  "_index": "megacorp",
  "_type": "employee",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": [ "sports", "music" ]
  }
}

We retrieve a document with the HTTP GET method. In the same way, we can delete a document with DELETE, and check whether a document exists with HEAD. To update an existing document, we simply PUT it again.

Simple search:
A GET request is simple---you can easily fetch the document you want. Let's try something more ambitious, like a simple search!
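The index -> type -> id -> document hierarchy and the PUT/GET/HEAD/DELETE semantics described above can be pictured as nested dictionaries. The following standalone Python sketch is purely illustrative (it does not use any Elasticsearch client; all function names are our own), but it mirrors how the REST paths map onto the hierarchy:

```python
# A toy in-memory model of Elasticsearch's index -> type -> id -> document
# hierarchy; it only illustrates the REST semantics, not real Elasticsearch.
store = {}

def put(index, type_, id_, doc):
    """PUT /{index}/{type}/{id} - create or fully replace a document."""
    store.setdefault(index, {}).setdefault(type_, {})[id_] = doc

def get(index, type_, id_):
    """GET /{index}/{type}/{id} - retrieve a document (None if missing)."""
    return store.get(index, {}).get(type_, {}).get(id_)

def head(index, type_, id_):
    """HEAD /{index}/{type}/{id} - does the document exist?"""
    return get(index, type_, id_) is not None

def delete(index, type_, id_):
    """DELETE /{index}/{type}/{id} - remove a document."""
    store.get(index, {}).get(type_, {}).pop(id_, None)

put("megacorp", "employee", "1", {
    "first_name": "John", "last_name": "Smith", "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
})
print(get("megacorp", "employee", "1")["first_name"])  # John
print(head("megacorp", "employee", "2"))               # False
```

Note that, just as in Elasticsearch, putting the same id again simply replaces the whole document, which is exactly the "to update, PUT again" behaviour above.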
Let's try the simplest possible search, for all employees:
http://192.168.32.80:9200/megacorp/employee/_search

You can see that we still use the megacorp index and the employee type, but instead of a document ID we append the keyword _search. The hits array in the response contains all three of our documents; by default, a search returns the top 10 results.

Next, let's search for employees whose last name contains "Smith". For this we'll use the lightweight search method known as query-string search, so called because we pass the query like a URL parameter, for example:

curl localhost:9200/films/md/_search?q=tag:good

demo:/root# curl http://192.168.32.81:9200/megacorp/employee/_search?q=last_name:lee
{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"megacorp","_type":"employee","_id":"3","_score":0.30685282,"_source":{"first_name":"Jane","last_name":"lee","age":32,"about":"I like to collect rock albums","interests":["music"]}}]}}
demo:/root#

http://192.168.32.81:9200/megacorp/employee/_search?q=last_name:lee
GET
{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "3",
        "_score": 0.30685282,
        "_source": {
          "first_name": "Jane",
          "last_name": "lee",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}

Searching with the query DSL:
The DSL is expressed as a JSON request body. We can represent the earlier "Smith" query like this, sent as a POST request:
http://192.168.32.81:9200/megacorp/employee/_search
POST
{
  "query" : {
    "match" : {
      "last_name" : "Smith"
    }
  }
}

More complex search:
Let's make the search a little more complex. We still want to find employees with the last name "Smith", but we only want those older than 30. Our query will gain a filter, which lets us execute a structured lookup efficiently:

http://192.168.32.81:9200/megacorp/employee/_search
POST
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "age" : { "gt" : 30 }     <1>
        }
      },
      "query" : {
        "match" : {
          "last_name" : "smith"     <2>
        }
      }
    }
  }
}

Response:
{
  "took": 29,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.30685282,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}

<1> This part of the query is a range filter, which finds all records with an age greater than 30.
<2> This part is the same match query as before.

Full-text search:
The searches so far have been simple: matching on specific names, filtering on age. Let's try a more advanced kind of search, full-text search---something that traditional databases would find very hard to do.

http://192.168.32.80:9200/megacorp/employee/_search
POST
{
  "query" : {
    "match" : {
      "about" : "rock climbing"
    }
  }
}

Response:
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 0.16273327,
    "hits": [
      { "_index": "megacorp", "_type": "employee", "_id": "1", "_score": 0.16273327,
        "_source": { "first_name": "John", "last_name": "Smith", "age": 25,
                     "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } },
      { "_index": "megacorp", "_type": "employee", "_id": "2", "_score": 0.016878016,
        "_source": { "first_name": "Jane", "last_name": "Smith", "age": 32,
                     "about": "I like to collect rock albums", "interests": [ "music" ] } },
      { "_index": "megacorp", "_type": "employee", "_id": "3", "_score": 0.016878016,
        "_source": { "first_name": "Jane", "last_name": "lee", "age": 32,
                     "about": "I like to collect rock albums", "interests": [ "music" ] } }
    ]
  }
}

By default, Elasticsearch sorts the result set by relevance score, where the relevance score measures how well a document matches the query. Unsurprisingly, the top-ranked John Smith has an about field that explicitly says "rock climbing".
But why does Jane Smith appear in the results as well? The reason is that "rock" is mentioned in her about field. Because only "rock" is mentioned and "climbing" is not, her _score is lower than John's.
This example nicely illustrates how Elasticsearch runs a full-text search across text fields and returns the most relevant results. The concept of relevance is central to Elasticsearch, and it has no real counterpart in a traditional relational database, where a record either matches a query or it doesn't.

Phrase search:
So far we have been able to search for individual words in a field, which is fine, but sometimes you want to match several words or an exact phrase. For example, we want records where "rock" and "climbing" occur together (and next to each other). To do this, we simply change the match query to a match_phrase query:

http://192.168.32.80:9200/megacorp/employee/_search
POST
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  }
}
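The difference between match and match_phrase boils down to "any query term, scored by overlap" versus "all terms, adjacent and in order". Here is a rough local simulation in Python (whitespace tokenization and term counting stand in for Elasticsearch's real analyzers and TF/IDF scoring, so this is purely illustrative):

```python
# The three employees' "about" fields from this example.
docs = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to collect rock albums",
}

def match(query, text):
    """'match' semantics: any query term present is a hit; more overlap, higher score."""
    q, t = query.lower().split(), text.lower().split()
    return sum(term in t for term in q)  # crude stand-in for real relevance scoring

def match_phrase(query, text):
    """'match_phrase' semantics: all terms present, adjacent, and in order."""
    q, t = query.lower().split(), text.lower().split()
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

hits = {id_: s for id_, text in docs.items() if (s := match("rock climbing", text))}
phrase_hits = [id_ for id_, text in docs.items() if match_phrase("rock climbing", text)]
print(hits)         # {'1': 2, '2': 1, '3': 1} - doc 1 matches both terms, the others only "rock"
print(phrase_hits)  # ['1'] - only doc 1 contains the exact adjacent phrase
```

This mirrors the results above: all three documents match the match query (with John's scoring highest), while only John's document will satisfy match_phrase.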
The match_phrase query returns only one hit:
{
  "took": 15,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      { "_index": "megacorp", "_type": "employee", "_id": "1", "_score": 0.23013961,
        "_source": { "first_name": "John", "last_name": "Smith", "age": 25,
                     "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } }
    ]
  }
}

Analytics:
Finally, we have one more requirement to fulfil: allow managers to run analytics over the employees. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.

http://192.168.32.80:9200/megacorp/employee/_search
POST
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

Response:
{
  "took": 8,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "megacorp", "_type": "employee", "_id": "2", "_score": 1,
        "_source": { "first_name": "Douglas", "last_name": "Fir", "age": 35,
                     "about": "I like to build cabinets", "interests": [ "forestry" ] } },
      { "_index": "megacorp", "_type": "employee", "_id": "1", "_score": 1,
        "_source": { "first_name": "John", "last_name": "Smith", "age": 25,
                     "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } },
      { "_index": "megacorp", "_type": "employee", "_id": "3", "_score": 1,
        "_source": { "first_name": "Jane", "last_name": "lee", "age": 32,
                     "about": "I like to collect rock albums", "interests": [ "music" ] } }
    ]
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "music", "doc_count": 2 },
        { "key": "forestry", "doc_count": 1 },
        { "key": "sports", "doc_count": 1 }
      ]
    }
  }
}

We can see that two employees are interested in music, one in forestry, and one in sports. These figures are not precomputed; they are calculated on the fly from the documents that match the current query. If we want to know the most common interests among everyone named "Smith", we just add the appropriate query clause (here we first update employee 3 so that there is a second "smith"):

PUT /megacorp/employee/3
{
  "first_name" : "Douglas",
  "last_name" : "smith",
  "age" : 35,
  "about" : "I like to build cabinets",
  "interests" : [ "music" ]
}

http://192.168.32.80:9200/megacorp/employee/_search
POST
{
  "query": {
    "match": { "last_name": "smith" }
  },
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

Aggregations also allow hierarchical rollups. For example, let's find the average age of the employees who share each interest:

http://192.168.32.80:9200/megacorp/employee/_search
POST
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" },
      "aggs" : {
        "avg_age" : {
          "avg" : { "field" : "age" }
        }
      }
    }
  }
}

Distributed by nature:
Elasticsearch strives to hide the complexity of distributed systems. All of the following happens automatically, under the hood:
- Partitioning your documents into different containers, or shards, which can live on one or more nodes.
- Spreading shards evenly across the nodes, to balance the indexing and search load.
- Duplicating each shard, to protect your data against loss in case of hardware failure.
- Routing a request from any node in the cluster to the node that holds the relevant data.
- Migrating shards seamlessly as nodes are added or removed, so the cluster scales in and out.
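The nested terms/avg aggregation above amounts to grouping documents by interest and averaging their ages per group. As a plain-Python illustration of what it computes (the field names follow the example; the grouping logic is a sketch, not Elasticsearch's actual implementation):

```python
from collections import defaultdict

# The three employees from the directory, reduced to the fields the aggregation uses.
employees = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Fir",   "age": 35, "interests": ["forestry"]},
    {"last_name": "lee",   "age": 32, "interests": ["music"]},
]

# Equivalent of: "aggs": {"all_interests": {"terms": {"field": "interests"},
#                         "aggs": {"avg_age": {"avg": {"field": "age"}}}}}
ages_by_interest = defaultdict(list)
for doc in employees:
    for interest in doc["interests"]:
        ages_by_interest[interest].append(doc["age"])

# Terms buckets are ordered by doc_count (descending), ties broken by key here.
buckets = [
    {"key": key, "doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
    for key, ages in sorted(ages_by_interest.items(),
                            key=lambda kv: (-len(kv[1]), kv[0]))
]
print(buckets)
# music appears in 2 docs (avg age 28.5); forestry and sports in 1 doc each
```

Adding a "query" clause, as in the "smith" example, simply restricts which documents feed this grouping step, which is why the aggregation is always computed live against the matching documents.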