• Getting started


    Clusters and nodes:
    
    
    A node is a running Elasticsearch instance. A cluster is a group of nodes that share the same cluster.name.
    
    
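    You should replace the default cluster.name with a name of your own. This prevents a newly started node on the same network from joining your cluster by accident.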
    
    
    cluster.name: es_cluster
    node.name: node01
    path.data: /elk/elasticsearch/data
    path.logs: /elk/elasticsearch/logs
    network.host: 192.168.32.80
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["192.168.32.80", "192.168.32.81"]
    
    
    http://192.168.32.81:9200/_count?pretty
    
                                 GET
    
    {
    "query": {
    "match_all": {}
    }
    }
    
    Response:
    
    {
    
        "count": 500,
        "_shards": {
            "total": 21,
            "successful": 21,
            "failed": 0
        }
    
    }
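As a rough sketch, the same count request can be prepared from Python with only the standard library. The helper names `build_count_request` and `parse_count` are made up for this illustration, and the sample text simply repeats the response shown above. Nothing is sent over the network here.

```python
import json
import urllib.request


def build_count_request(base_url):
    """Build (but do not send) a GET /_count request with a match_all body."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_count?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def parse_count(response_text):
    """Return (count, failed_shards) from a _count response body."""
    payload = json.loads(response_text)
    return payload["count"], payload["_shards"]["failed"]


req = build_count_request("http://192.168.32.81:9200")
# Sample body copied from the response shown in the text.
sample = '{"count": 500, "_shards": {"total": 21, "successful": 21, "failed": 0}}'
print(req.get_method(), req.full_url)
print(parse_count(sample))
```

To actually run the request against a live node, the prepared object could be passed to `urllib.request.urlopen(req)`, assuming the node is reachable from your machine.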
    
    
    Document-oriented:
    
    Relational DB -> Databases -> Tables -> Rows -> Columns
    
    Elasticsearch -> Indices -> Types -> Documents -> Fields
    
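    In other words, an index plays the role of a database, a type of a table, a document of a row, and a field of a column.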
    
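    An Elasticsearch cluster can contain multiple indices, and each index can contain multiple types.
    
    Each type contains multiple documents, and each document contains multiple fields.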
    
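    So, to build an employee directory, we will do the following:
    
    1. Index a document for each employee. Each document contains all the information about that employee.
    
    2. Each document is of type employee.
    
    3. The employee type belongs to the index megacorp.
    
    4. The megacorp index is stored in the Elasticsearch cluster.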
    
    
    
    http://192.168.32.81:9200/megacorp/employee/1/
                                             PUT
    
    {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
    }
    
    The path /megacorp/employee/1 contains three pieces of information:
    
    Name      Description
    
    megacorp  index name
    
    employee  type name
    
    1         ID of this employee
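The three-part path can be illustrated with a small sketch that splits it back into its components. `DocPath` and `parse_doc_path` are hypothetical names used only for this example; they are not part of any Elasticsearch client.

```python
from typing import NamedTuple


class DocPath(NamedTuple):
    index: str
    doc_type: str
    doc_id: str


def parse_doc_path(path):
    """Split '/index/type/id' into its three named parts."""
    parts = [part for part in path.split("/") if part]
    if len(parts) != 3:
        raise ValueError(f"expected /index/type/id, got {path!r}")
    return DocPath(*parts)


parsed = parse_doc_path("/megacorp/employee/1")
print(parsed.index, parsed.doc_type, parsed.doc_id)
```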
    
    
    
    Let's add a few more employees to the directory:
    
    PUT /megacorp/employee/2
    {
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests": [ "music" ]
    }
    
    PUT /megacorp/employee/3
    {
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets",
    "interests": [ "forestry" ]
    }
    
    
    
    An Elasticsearch cluster can contain multiple indices.
    
    
    
    Retrieving a document:
    
    http://192.168.32.80:9200/megacorp/employee/1/
                                             GET
    {
    
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_version": 1,
        "found": true,
        "_source": {
            "first_name": "John",
            "last_name": "Smith",
            "age": 25,
            "about": "I love to go rock climbing",
            "interests": [
                "sports"
                ,
                "music"
            ]
        }
    
    }
    
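    We retrieve a document with the HTTP GET method. In the same way, we can delete a document with the DELETE method
    
    and check whether a document exists with the HEAD method. To update an existing document, we simply PUT it again.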
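The mapping from operations to HTTP methods can be sketched with the standard library as follows. `doc_request` and `BASE_URL` are names invented for this illustration; the requests are only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://192.168.32.80:9200"  # node address used in the examples above


def doc_request(method, index, doc_type, doc_id, source=None):
    """Build a request for a single document; `source` is only used for PUT."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    data = None
    headers = {}
    if source is not None:
        data = json.dumps(source).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


john = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

requests_by_action = {
    "retrieve": doc_request("GET", "megacorp", "employee", 1),
    "exists": doc_request("HEAD", "megacorp", "employee", 1),
    "delete": doc_request("DELETE", "megacorp", "employee", 1),
    "update": doc_request("PUT", "megacorp", "employee", 1, source=john),
}

for action, req in requests_by_action.items():
    print(action, req.get_method(), req.full_url)
```

Note that a PUT replaces the whole stored document, so the body should carry every field, not just the changed ones.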
    
    
    
    Simple search:
    
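    A GET request is very simple: you can easily fetch the document you want. Let's try something a bit more advanced, such as a simple search.
    
    We start with the simplest possible search, a request for all employees: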
    
    http://192.168.32.80:9200/megacorp/employee/_search/
    
    
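    As you can see, we still use the megacorp index and the employee type, but the path now ends with the keyword _search
    
    instead of a document ID. The hits array in the response contains all three of our documents. By default, a search returns the top 10 results.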
    
    
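    Next, let's search for employees whose last name contains "Smith". To do this, we will use a lightweight search method from the command line.
    
    This method is called a query string search, because we pass the query the same way we pass URL parameters.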
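A minimal sketch of how such a URL can be assembled, assuming the `q=field:value` form described above. The function name `query_string_search_url` is hypothetical; `urlencode` takes care of escaping spaces and other special characters in the value.

```python
from urllib.parse import urlencode


def query_string_search_url(base_url, index, doc_type, field, value):
    """Build a _search URL that carries the query in the `q` URL parameter."""
    # Keep ':' readable; everything else unsafe is percent-encoded.
    query = urlencode({"q": f"{field}:{value}"}, safe=":")
    return f"{base_url}/{index}/{doc_type}/_search?{query}"


url = query_string_search_url(
    "http://192.168.32.81:9200", "megacorp", "employee", "last_name", "Smith"
)
print(url)
```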
    
    
    
    curl localhost:9200/films/md/_search?q=tag:good 
    
    demo:/root# curl http://192.168.32.81:9200/megacorp/employee/_search?q=last_name:lee
    {"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":
    
    {"total":1,"max_score":0.30685282,"hits":
    
    [{"_index":"megacorp","_type":"employee","_id":"3","_score":0.30685282,"_source":
    
    {"first_name":"Jane","last_name":"lee","age":32,"about":"I like to collect rock albums","interests":["music"]}}]}}
    
    demo:/root# 
    
    
    http://192.168.32.81:9200/megacorp/employee/_search?q=last_name:lee
                                             GET
    
    {
    
        "took": 7,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 0.30685282,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "lee",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                }
            ]
        }
    
    }
    
    
    
    Querying with the DSL:
    
    
    The DSL takes the form of a JSON request body. We can express the earlier "Smith" query like this:
    
    
    Elasticsearch also accepts GET with a request body, but many HTTP clients cannot send one, so use POST:
    
    http://192.168.32.81:9200/megacorp/employee/_search/
                
                                               POST
    
    {
    "query" : {
    "match" : {
    "last_name" : "Smith"
    }
    }
    }
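For illustration, the same request body can be built and attached to a POST request in Python. `match_query` and `search_request` are hypothetical helper names; the request is only prepared here, not sent.

```python
import json
import urllib.request


def match_query(field, text):
    """Return a DSL body with a single match clause on `field`."""
    return {"query": {"match": {field: text}}}


def search_request(base_url, index, doc_type, body):
    """Wrap a DSL body in a POST request to the _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = match_query("last_name", "Smith")
req = search_request("http://192.168.32.81:9200", "megacorp", "employee", body)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```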
    
    
    More complex searches:
    
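      Let's make the search a little more complex. We still want to find employees whose last name is "Smith", but we only want
    
    those who are older than 30. Our query adds a filter, which lets us run a structured search efficiently: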
    
    
    http://192.168.32.81:9200/megacorp/employee/_search/
         
                                                 POST
    
    
    {
    "query" : {
    "filtered" : {
    "filter" : {
    "range" : {
    "age" : { "gt" : 30 } 
    }
    },
    "query" : {
    "match" : {
    "last_name" : "smith" 
    }
    }
    }
    }
    }
    
    
    Response:
    
    {
    
        "took": 29,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 0.30685282,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                }
            ]
        }
    
    }
    
    The "filter" part of the query is a range filter. It finds all documents whose age is greater than 30.
    
    
    The "query" part is the same match query that we used before.
    
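One caveat: the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0, where a `bool` query with `must` and `filter` clauses takes its place. The sketch below rewrites the request body above into that form. `filtered_to_bool` is a hypothetical helper written for this note and only handles the simple shape used in this example.

```python
def filtered_to_bool(body):
    """Rewrite {"query": {"filtered": {...}}} as an equivalent bool query."""
    filtered = body["query"]["filtered"]
    bool_clause = {}
    if "query" in filtered:
        bool_clause["must"] = filtered["query"]      # scored part
    if "filter" in filtered:
        bool_clause["filter"] = filtered["filter"]   # non-scoring part
    rewritten = {key: value for key, value in body.items() if key != "query"}
    rewritten["query"] = {"bool": bool_clause}
    return rewritten


old_body = {
    "query": {
        "filtered": {
            "filter": {"range": {"age": {"gt": 30}}},
            "query": {"match": {"last_name": "smith"}},
        }
    }
}
new_body = filtered_to_bool(old_body)
print(new_body)
```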
    
    
    Full-text search:
    
    
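    So far the searches have been simple: looking up a specific name and filtering by age. Let's try something more advanced,
    
    full-text search, which is hard to do well in a traditional database.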
    
    
    
    
    http://192.168.32.80:9200/megacorp/employee/_search/
    
                                                 POST
    {
    "query" : {
    "match" : {
    "about" : "rock climbing"
    }
    }
    }
    
    
    Response:
    
    {
    
        "took": 6,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 0.16273327,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "1",
                    "_score": 0.16273327,
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": 25,
                        "about": "I love to go rock climbing",
                        "interests": [
                            "sports"
                            ,
                            "music"
                        ]
                    }
                }
                ,
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 0.016878016,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                }
                ,
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 0.016878016,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "lee",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                }
            ]
        }
    
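    }
    
    By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. Clearly, the top-ranked John Smith mentions "rock climbing" explicitly in his about field.
    But why does Jane Smith appear in the results too? Because "rock" is mentioned in her about field.
    Since only "rock" is mentioned and "climbing" is not, her _score is lower than John's.
    This example shows how Elasticsearch performs full-text search on text fields and returns the most relevant results first.
    The concept of relevance is very important in Elasticsearch, and it is foreign to traditional relational databases,
    where a record either matches a query or does not.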
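To make the ranking idea concrete, here is a deliberately simplified toy that scores a document by how many distinct query terms appear in its about field. This is not how Elasticsearch computes _score (the real formula also weighs term frequency, term rarity, and field length); it only illustrates why a document matching both words outranks one matching a single word. The names and sample texts come from the employee documents indexed earlier.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_score(query, field_text):
    """Number of distinct query terms that occur in the field (toy metric)."""
    return len(set(tokenize(query)) & set(tokenize(field_text)))


abouts = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
    "Douglas Fir": "I like to build cabinets",
}

ranked = sorted(
    ((toy_score("rock climbing", text), name) for name, text in abouts.items()),
    reverse=True,
)
# Documents with a score of 0 do not match at all and are dropped.
hits = [(name, score) for score, name in ranked if score > 0]
print(hits)
```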
    
    
    
    
    
    Phrase search:
    
    
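    Searching for individual words in a field is fine, but sometimes you want to match an exact sequence of words, that is, a phrase.
    
    
    For example, we may want employee records that contain both "rock" and "climbing", with the two words next to each other.
    
    
    To do this, we only need to change the match query into a match_phrase query: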
    
    
    
    http://192.168.32.80:9200/megacorp/employee/_search/
     
                                                POST
    {
    "query" : {
    "match_phrase" : {
    "about" : "rock climbing"
    }
    }
    }
    
    {
    
        "took": 15,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.23013961,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "1",
                    "_score": 0.23013961,
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": 25,
                        "about": "I love to go rock climbing",
                        "interests": [
                            "sports"
                            ,
                            "music"
                        ]
                    }
                }
            ]
        }
    
    }
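The difference between match and match_phrase can be sketched with a toy check that requires the query words to appear consecutively and in order. This is only an approximation written for illustration (Elasticsearch works on analyzed term positions in its index); `toy_match_phrase` is a made-up name.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_match_phrase(phrase, field_text):
    """True if the phrase tokens appear next to each other, in order."""
    wanted = tokenize(phrase)
    tokens = tokenize(field_text)
    size = len(wanted)
    if size == 0:
        return False
    return any(
        tokens[start:start + size] == wanted
        for start in range(len(tokens) - size + 1)
    )


print(toy_match_phrase("rock climbing", "I love to go rock climbing"))
print(toy_match_phrase("rock climbing", "I like to collect rock albums"))
```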
    
    
    Analytics:
    
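    Finally, we have one more requirement to meet: letting managers run analytics over the employee directory.
    
    Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to
    
    GROUP BY in SQL, but much more powerful.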
    
    
    
    http://192.168.32.80:9200/megacorp/employee/_search/
    
                                                POST
    
    
    {
    "aggs": {
    "all_interests": {
    "terms": { "field": "interests" }
    }
    }
    }
    
    Response:
    
    {
    
        "took": 8,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 1,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 1,
                    "_source": {
                        "first_name": "Douglas",
                        "last_name": "Fir",
                        "age": 35,
                        "about": "I like to build cabinets",
                        "interests": [
                            "forestry"
                        ]
                    }
                }
                ,
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "1",
                    "_score": 1,
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": 25,
                        "about": "I love to go rock climbing",
                        "interests": [
                            "sports"
                            ,
                            "music"
                        ]
                    }
                }
                ,
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 1,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "lee",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                }
            ]
        },
        "aggregations": {
            "all_interests": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "music",
                        "doc_count": 2
                    }
                    ,
                    {
                        "key": "forestry",
                        "doc_count": 1
                    }
                    ,
                    {
                        "key": "sports",
                        "doc_count": 1
                    }
                ]
            }
        }
    
    }
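What the terms aggregation computes can be mimicked in a few lines of Python: count, for each distinct value of a field, how many documents contain it. The sketch below uses the three employee documents as originally indexed and is an in-memory illustration only, not how Elasticsearch executes the aggregation. `terms_aggregation` is a hypothetical name.

```python
from collections import Counter

employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_aggregation(docs, field):
    """Return buckets of {key, doc_count}, largest bucket first."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))  # a document counts once per distinct value
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


for bucket in terms_aggregation(employees, "interests"):
    print(bucket)
```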
    
    
    
    
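    We can see that two employees are interested in music, one in forestry, and one in sports. These numbers are not precalculated.
    They are computed on the fly from the documents that match the query. If we want to know the most popular
    interests among everyone named "Smith", we only need to add the appropriate query: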
    
    
    PUT /megacorp/employee/3
    {
    "first_name" : "Douglas",
    "last_name" : "smith",
    "age" : 35,
    "about": "I like to build cabinets",
    "interests": [ "music" ]
    }
    
    
    
    http://192.168.32.80:9200/megacorp/employee/_search/
                                              
                                               POST
    
    {
    "query": {
    "match": {
    "last_name": "smith"
    }
    },
    "aggs": {
    "all_interests": {
    "terms": {
    "field": "interests"
    }
    }
    }
    }
    
    
    Aggregations also allow hierarchical rollups. For example, let's find the average age of the employees who share each interest:
    
    http://192.168.32.80:9200/megacorp/employee/_search/
                                          POST
    
    {
    "aggs" : {
    "all_interests" : {
    "terms" : { "field" : "interests" },
    "aggs" : {
    "avg_age" : {
    "avg" : { "field" : "age" }
    }
    }
    }
    }
    }
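As a rough in-memory analogue of this nested aggregation, the sketch below first groups the three original employee documents by interest and then averages the ages inside each group. `avg_age_by_interest` is a hypothetical helper used only to show the shape of the computation.

```python
from collections import defaultdict

employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """Outer level: bucket by interest. Inner level: average age per bucket."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    return {interest: sum(values) / len(values) for interest, values in ages.items()}


for interest, avg_age in sorted(avg_age_by_interest(employees).items()):
    print(interest, avg_age)
```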
    
    
    Distributed nature:
    
    
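    Elasticsearch tries hard to hide the complexity of distributed systems. The following operations all happen automatically under the hood:
    
    1. Partitioning your documents into different containers, or shards, which can live on one or more nodes.
    
    2. Spreading the shards evenly across the nodes to balance the indexing and search load.
    
    3. Keeping a redundant copy of every shard to prevent data loss when hardware fails.
    
    4. Routing a request from any node in the cluster to the node that holds the relevant data.
    
    5. Expanding and migrating shards seamlessly whenever nodes are added or removed.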
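The partitioning and routing steps in the list above rest on a simple idea: a document's shard is derived from a hash of its routing value (by default its ID) modulo the number of primary shards. The sketch below illustrates that idea with CRC32 as a stand-in hash. That choice is an assumption made for this example; to my understanding Elasticsearch uses a Murmur3-based hash internally, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Toy version of: shard = hash(routing) % number_of_primary_shards."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # CRC32 is only a stand-in for the real hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The responses in this article report 5 shards for the megacorp index.
for doc_id in ["1", "2", "3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the result depends on the shard count, the same document would land on a different shard if the number of primary shards changed, which is why that number is fixed when an index is created.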
    
    
    
    
    
    
    
    
    
    
    
    
    
    

  • Original article: https://www.cnblogs.com/hzcya1995/p/13350451.html