• Elasticsearch Search Module: Cross Cluster Search


    Introduction to Cross Cluster Search

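    The cross-cluster search feature allows any node to act as a federated client across multiple clusters. Unlike a tribe node, a cross-cluster search node does not join the remote cluster. Instead, it connects to the remote cluster in a lightweight way in order to execute federated searches.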

    Remote clusters

    Before using cross-cluster search, you need to understand remote clusters.

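    A remote cluster is referenced by a name and a list of seed nodes. When a remote cluster is registered, its cluster state is retrieved from one of the seed nodes so that, by default, up to three eligible nodes can be selected as gateway nodes. Each node in the local cluster that has remote clusters configured connects to one or more gateway nodes and uses them to send federated search requests to the remote cluster.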

    Remote clusters can be specified globally using cluster settings (which can be updated dynamically), or individually on each node in elasticsearch.yml.

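    If a node configures a remote cluster in its elasticsearch.yml, only that node can connect to the remote cluster. In other words, federated search requests must be sent to that node in order to reach the remote cluster. A remote cluster registered through the cluster settings API can be reached from every node in the cluster (more precisely, from every node with cluster.remote.connect: true).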

    Configuring in elasticsearch.yml:

    cluster:
        remote:
            cluster_one: 
                seeds: 127.0.0.1:9300
            cluster_two: 
                seeds: 127.0.0.1:9301

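    cluster_one and cluster_two are arbitrary cluster aliases representing the connection to each cluster. These names are later used to distinguish local indices from remote indices.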

    Configuring with the cluster settings API:

    PUT _cluster/settings
    {
      "persistent": {
        "cluster": {
          "remote": {
            "cluster_one": {
              "seeds": [
                "127.0.0.1:9300"
              ]
            },
            "cluster_two": {
              "seeds": [
                "127.0.0.1:9301"
              ]
            },
            "cluster_three": {
              "seeds": [
                "127.0.0.1:9302"
              ]
            }
          }
        }
      }
    }

    Deleting a remote cluster:

    PUT _cluster/settings
    {
      "persistent": {
        "cluster": {
          "remote": {
            "cluster_three": {
              "seeds": null 
            }
          }
        }
      }
    }

    This deletes cluster_three while keeping cluster_one and cluster_two.
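    The register and delete requests above share one body shape: persistent.cluster.remote.<alias>.seeds, where a list of seeds registers the alias and null removes it. The following is a minimal sketch, assuming you build the request body in Python before sending it with whatever HTTP client you use; the helper name remote_cluster_settings is hypothetical and not part of any Elasticsearch client library.

```python
import json


def remote_cluster_settings(add=None, remove=()):
    """Build a body for PUT _cluster/settings that registers and/or deletes remote clusters.

    add:    mapping of cluster alias -> list of "host:port" seed addresses
    remove: aliases to delete; their seeds are set to None, serialized as JSON null
    """
    remote = {}
    for alias, seeds in (add or {}).items():
        remote[alias] = {"seeds": list(seeds)}
    for alias in remove:
        # A null seed list tells Elasticsearch to drop this remote cluster.
        remote[alias] = {"seeds": None}
    return {"persistent": {"cluster": {"remote": remote}}}


# Same body as the "delete cluster_three" request above.
print(json.dumps(remote_cluster_settings(remove=["cluster_three"]), indent=2))
```

    Passing both add and remove in one call yields a single request that adds some aliases and deletes others; aliases that are not mentioned are left unchanged by the settings update.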

    Remote cluster settings:

    cluster.remote.connections_per_cluster

    The number of gateway nodes to connect to per remote cluster. Defaults to 3.

    cluster.remote.initial_connect_timeout

    The time to wait for remote connections to be established when the node starts. Defaults to 30s.

    cluster.remote.node.attr

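    A node attribute used to filter which nodes in the remote cluster are eligible as gateway nodes. For example, with cluster.remote.node.attr: gateway, only nodes with the attribute node.attr.gateway: true are selected.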

    cluster.remote.connect

    By default, any node in the cluster can act as a federated client and connect to remote clusters. Set cluster.remote.connect to false (the default is true) to prevent specific nodes from connecting to remote clusters.

    cluster.remote.${cluster_alias}.skip_unavailable

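    A per-cluster setting that lets cross-cluster searches skip the remote cluster with the given alias when it is unavailable. Defaults to false.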
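    To make the interaction between connections_per_cluster and node.attr concrete, here is a deliberately simplified model of gateway node selection. It only captures the two rules stated above (keep at most connections_per_cluster nodes, and, if an attribute name is configured, keep only nodes where that attribute is "true"). Elasticsearch itself may apply further checks, such as node roles or version compatibility, and its selection order is not modeled here; the RemoteNode type and pick_gateway_nodes function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteNode:
    name: str
    attributes: dict = field(default_factory=dict)  # node.attr.* values of a remote node


def pick_gateway_nodes(nodes, connections_per_cluster=3, node_attr=None):
    """Simplified model: choose up to connections_per_cluster gateway nodes.

    If node_attr is set (e.g. "gateway", mirroring cluster.remote.node.attr),
    only nodes whose node.attr.<node_attr> value is "true" are eligible.
    """
    eligible = []
    for node in nodes:
        if node_attr is not None:
            value = str(node.attributes.get(node_attr, "")).lower()
            if value != "true":
                continue  # attribute missing or not true: not a gateway candidate
        eligible.append(node)
    return eligible[:connections_per_cluster]


nodes = [
    RemoteNode("a", {"gateway": "true"}),
    RemoteNode("b"),
    RemoteNode("c", {"gateway": "true"}),
]
print([n.name for n in pick_gateway_nodes(nodes, node_attr="gateway")])
```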

    Searching with cross-cluster search

    To search the twitter index on the remote cluster cluster_one, separate the cluster alias and the index name with a colon:

    GET /cluster_one:twitter/_search
    {
      "query": {
        "match": {
          "user": "kimchy"
        }
      }
    }

    Unlike the tribe node feature, cross-cluster search can also search indices that have the same name on different clusters:

    GET /cluster_one:twitter,twitter/_search
    {
      "query": {
        "match": {
          "user": "kimchy"
        }
      }
    }
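    The path in these requests is just a comma-separated list in which remote indices are written as alias:index and local indices have no prefix. A small sketch of assembling that list in Python, assuming the targets are given as (alias, index) pairs with None meaning the local cluster; the function name is hypothetical.

```python
def cross_cluster_index_expression(targets):
    """Join (cluster_alias, index) pairs into a cross-cluster index expression.

    A cluster_alias of None means the index lives on the local cluster.
    """
    parts = []
    for alias, index in targets:
        parts.append(index if alias is None else f"{alias}:{index}")
    return ",".join(parts)


expr = cross_cluster_index_expression([("cluster_one", "twitter"), (None, "twitter")])
path = f"/{expr}/_search"
print(path)
```

    The resulting path matches the second request above.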

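    Search results are disambiguated in the same way that indices are disambiguated in the request. Even when index names are identical, the indices are treated as different indices when results are merged. Every hit retrieved from a remote index has its _index prefixed with the remote cluster's name: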

    {
      "took": 150,
      "timed_out": false,
      "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0,
        "skipped": 0
      },
      "_clusters": {
        "total": 2,
        "successful": 2,
        "skipped": 0
      },
      "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
          {
            "_index": "cluster_one:twitter",
            "_type": "_doc",
            "_id": "0",
            "_score": 1,
            "_source": {
              "user": "kimchy",
              "date": "2009-11-15T14:12:12",
              "message": "trying out Elasticsearch",
              "likes": 0
            }
          },
          {
            "_index": "twitter",
            "_type": "_doc",
            "_id": "0",
            "_score": 2,
            "_source": {
              "user": "kimchy",
              "date": "2009-11-15T14:12:12",
              "message": "trying out Elasticsearch",
              "likes": 0
            }
          }
        ]
      }
    }
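    Because remote hits carry the alias:index prefix in _index, a client can tell the two twitter documents apart after the merge. A minimal sketch, assuming the response has already been parsed into a Python dict and that local index names never contain a colon; split_index and group_hits_by_cluster are hypothetical helpers.

```python
from collections import defaultdict


def split_index(qualified_index):
    """Split "cluster_one:twitter" into ("cluster_one", "twitter").

    A name without a colon is treated as a local index (alias None); this assumes
    local index names never contain ":".
    """
    alias, sep, index = qualified_index.partition(":")
    if not sep:
        return None, qualified_index
    return alias, index


def group_hits_by_cluster(response):
    """Map cluster alias (None = local cluster) -> list of (index, _id) pairs."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        alias, index = split_index(hit["_index"])
        groups[alias].append((index, hit["_id"]))
    return dict(groups)


# Trimmed-down version of the response above.
response = {"hits": {"hits": [
    {"_index": "cluster_one:twitter", "_id": "0"},
    {"_index": "twitter", "_id": "0"},
]}}
print(group_hits_by_cluster(response))
```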

    Skipping disconnected clusters:

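    By default, every remote cluster targeted by a cross-cluster search must be available when the search request is executed. Otherwise the whole request fails and no search results are returned, even if some of the clusters are available. A remote cluster can be made optional with the skip_unavailable setting, which defaults to false.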

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.remote.cluster_two.skip_unavailable": true 
      }
    }

    cluster_two is now optional:
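    The rule above (an unavailable cluster fails the whole request unless its skip_unavailable is true) can be written down as a small decision function. This is a simplified model for illustration, not Elasticsearch's implementation: it assumes you already know which targeted clusters are reachable, it counts the local cluster among the targets (labeled "local" here, a hypothetical key), and it returns a summary shaped like the response's _clusters section.

```python
def cluster_outcome(available, skip_unavailable):
    """Simplified model of skip_unavailable handling.

    available:        alias -> bool, whether each targeted cluster (local included) is reachable
    skip_unavailable: alias -> bool, the cluster.remote.<alias>.skip_unavailable value
    Returns a _clusters-like summary, or raises if the whole request would fail.
    """
    total = len(available)
    skipped = 0
    for alias, is_up in available.items():
        if is_up:
            continue
        if skip_unavailable.get(alias, False):  # the setting defaults to false
            skipped += 1
        else:
            raise RuntimeError(f"remote cluster [{alias}] is unavailable, so the whole search fails")
    return {"total": total, "successful": total - skipped, "skipped": skipped}


print(cluster_outcome(
    {"local": True, "cluster_one": True, "cluster_two": False},
    {"cluster_two": True},
))
```

    With cluster_two down and skip_unavailable enabled for it, the model yields the same counts as the _clusters section of the response that follows.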

    GET /cluster_one:twitter,cluster_two:twitter,twitter/_search 
    {
      "query": {
        "match": {
          "user": "kimchy"
        }
      }
    }

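    This searches the twitter index on the local cluster, on cluster_one, and on cluster_two. In the response below, the _clusters section shows that one cluster was unavailable and was skipped:

    {
      "took": 150,
      "timed_out": false,
      "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0,
        "skipped": 0
      },
      "_clusters": {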
        "total": 3,
        "successful": 2,
        "skipped": 1
      },
      "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
          {
            "_index": "cluster_one:twitter",
            "_type": "_doc",
            "_id": "0",
            "_score": 1,
            "_source": {
              "user": "kimchy",
              "date": "2009-11-15T14:12:12",
              "message": "trying out Elasticsearch",
              "likes": 0
            }
          },
          {
            "_index": "twitter",
            "_type": "_doc",
            "_id": "0",
            "_score": 2,
            "_source": {
              "user": "kimchy",
              "date": "2009-11-15T14:12:12",
              "message": "trying out Elasticsearch",
              "likes": 0
            }
          }
        ]
      }
    }
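    Since a skipped cluster no longer causes an error, a client may want to check _clusters explicitly to know whether it received partial results. A minimal sketch, assuming the response is already parsed into a dict; the function name and the wording of its messages are hypothetical.

```python
def describe_cluster_coverage(response):
    """Summarize the _clusters section of a cross-cluster search response."""
    clusters = response.get("_clusters")
    if not clusters:
        return "no _clusters section in the response"
    total = clusters["total"]
    ok = clusters["successful"]
    skipped = clusters["skipped"]
    if skipped:
        return f"partial results: {ok}/{total} clusters searched, {skipped} skipped"
    return f"complete: all {total} clusters searched"


# _clusters section from the response above.
print(describe_cluster_coverage({"_clusters": {"total": 3, "successful": 2, "skipped": 1}}))
```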
  • Original article: https://www.cnblogs.com/37yan/p/9989841.html
Copyright © 2020-2023  润新知