• Low-level implementation of the routing algorithm in the latest ES 5.0


    http://www.cnblogs.com/bonelee/p/6078947.html analyzed the ES bulk implementation, which contains this routing code:

    ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, request.id(), request.routing()).shardId();

    It is implemented in https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/cluster/routing/OperationRouting.java

        public ShardIterator indexShards(ClusterState clusterState, String index, String id, @Nullable String routing) {
            return shards(clusterState, index, id, routing).shardsIt();
        }
    
        protected IndexShardRoutingTable shards(ClusterState clusterState, String index, String id, String routing) {
            int shardId = generateShardId(indexMetaData(clusterState, index), id, routing);
            return clusterState.getRoutingTable().shardRoutingTable(index, shardId);
        }
    
        static int generateShardId(IndexMetaData indexMetaData, String id, @Nullable String routing) {
            final int hash;
            if (routing == null) {
                hash = Murmur3HashFunction.hash(id);
            } else {
                hash = Murmur3HashFunction.hash(routing);
            }
            // we don't use IMD#getNumberOfShards since the index might have been shrunk such that we need to use the size
            // of original index to hash documents
            return Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();
        }

    As you can see, the latest ES code computes the target shard as:

    Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();
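    The choice of Math.floorMod over the plain % operator matters because a hash can be negative. The following is a minimal sketch, not the ES code: the class name FloorModSketch and its helper shardFor are hypothetical, and the hash values are made up for illustration.

```java
// Hypothetical sketch: why a floor modulus is used when mapping a hash to a shard.
final class FloorModSketch {

    // Maps a (possibly negative) hash onto the range [0, shards).
    static int shardFor(int hash, int shards) {
        return Math.floorMod(hash, shards);
    }

    public static void main(String[] args) {
        int hash = -7;   // made-up negative hash value
        int shards = 5;  // made-up shard count

        // The % operator keeps the sign of the dividend, so it can go negative.
        System.out.println(hash % shards);            // -2
        // floorMod always lands in [0, shards) for a positive shard count.
        System.out.println(shardFor(hash, shards));   // 3
    }
}
```

    A negative result from % could not be used as a shard number, which is why the remainder has to be normalized into the non-negative range.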

    The implementation of getRoutingFactor can be found in https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/cluster/metadata/IndexMetaData.java :

        /**
         * Returns the routing factor for this index. The default is <tt>1</tt>.
         *
         * @see #getRoutingFactor(IndexMetaData, int) for details
         */
        public int getRoutingFactor() {
            return routingFactor;
        }

    The constructor contains:

            assert numberOfShards * routingFactor == routingNumShards :  routingNumShards + " must be a multiple of " + numberOfShards;

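    The default routing factor is 1, so routingNumShards equals numberOfShards and the formula reduces to Math.floorMod(hash, numberOfShards): every shard of the index is a possible routing target. The factor only becomes larger than 1 when the index has been shrunk, as the comment in generateShardId indicates.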
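    To make the role of the routing factor concrete, here is a small sketch under stated assumptions: the class ShrinkRoutingSketch and its method shardId are hypothetical, the hash is passed in directly instead of being computed by Murmur3HashFunction, and the routing factor is derived as routingNumShards / numberOfShards, which is what the constructor assertion implies.

```java
// Hypothetical sketch of the routing arithmetic for an index that may have been shrunk.
final class ShrinkRoutingSketch {

    // Mirrors floorMod(hash, routingNumShards) / routingFactor, with the factor
    // derived from the invariant numberOfShards * routingFactor == routingNumShards.
    static int shardId(int hash, int routingNumShards, int numberOfShards) {
        if (routingNumShards % numberOfShards != 0) {
            throw new IllegalArgumentException(
                routingNumShards + " must be a multiple of " + numberOfShards);
        }
        int routingFactor = routingNumShards / numberOfShards;
        return Math.floorMod(hash, routingNumShards) / routingFactor;
    }

    public static void main(String[] args) {
        int hash = 13; // made-up hash value

        System.out.println(shardId(hash, 8, 8)); // factor 1: 13 mod 8 = 5
        System.out.println(shardId(hash, 8, 4)); // factor 2: 5 / 2 = 2
        System.out.println(shardId(hash, 8, 2)); // factor 4: 5 / 4 = 1
    }
}
```

    Because the hash is always reduced modulo the original shard count, a document that lived in shard 5 of the 8-shard index is still found after shrinking: it maps to shard 2 of the 4-shard index and shard 1 of the 2-shard index.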

    Note that the routing implementation in ES 2.4 is different: https://github.com/elastic/elasticsearch/blob/2.4/core/src/main/java/org/elasticsearch/cluster/routing/

        @SuppressForbidden(reason = "Math#abs is trappy")
        private int generateShardId(ClusterState clusterState, String index, String type, String id, @Nullable String routing) {
            IndexMetaData indexMetaData = clusterState.metaData().index(index);
            if (indexMetaData == null) {
                throw new IndexNotFoundException(index);
            }
            final Version createdVersion = indexMetaData.getCreationVersion();
            final HashFunction hashFunction = indexMetaData.getRoutingHashFunction();
            final boolean useType = indexMetaData.getRoutingUseType();
    
            final int hash;
            if (routing == null) {
                if (!useType) {
                    hash = hash(hashFunction, id);
                } else {
                    hash = hash(hashFunction, type, id);
                }
            } else {
                hash = hash(hashFunction, routing);
            }
            if (createdVersion.onOrAfter(Version.V_2_0_0_beta1)) {
                return MathUtils.mod(hash, indexMetaData.getNumberOfShards());
            } else {
                return Math.abs(hash % indexMetaData.getNumberOfShards());
            }
        }
        @Deprecated
        protected int hash(HashFunction hashFunction, String type, String id) {
            if (type == null || "_all".equals(type)) {
                throw new IllegalArgumentException("Can't route an operation with no type and having type part of the routing (for backward comp)");
            }
            return hashFunction.hash(type, id);
        }
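    The version check in generateShardId exists because the two modulus styles disagree for negative hashes. The sketch below illustrates this; the class LegacyModSketch and both helper names are hypothetical, and floorStyle assumes that MathUtils.mod behaves like a floor modulus, which is not confirmed by the code quoted above.

```java
// Hypothetical sketch comparing the two ways the 2.4 code turns a hash into a shard number.
final class LegacyModSketch {

    // Assumed behaviour of MathUtils.mod: remainder shifted into [0, shards).
    static int floorStyle(int hash, int shards) {
        int r = hash % shards;
        return r < 0 ? r + shards : r;
    }

    // The older style from the else branch: absolute value of the remainder.
    static int absStyle(int hash, int shards) {
        return Math.abs(hash % shards);
    }

    public static void main(String[] args) {
        // Non-negative hash: both styles agree.
        System.out.println(floorStyle(7, 5) + " " + absStyle(7, 5));   // 2 2
        // Negative hash: the styles pick different shards.
        System.out.println(floorStyle(-7, 5) + " " + absStyle(-7, 5)); // 3 2
    }
}
```

    Since the two branches can select different shards for the same document, an index has to keep using the style it was created with, otherwise existing documents could no longer be located.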

    The hash function has three implementations:

    DjbHashFunction.java

    SimpleHashFunction.java

    Murmur3HashFunction.java
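    For orientation, this is what the textbook DJB2 string hash looks like, which is the algorithm the name DjbHashFunction suggests. It is only a sketch: the class Djb2Sketch is hypothetical, and whether the ES class matches it step for step is an assumption.

```java
// Hypothetical sketch of the classic DJB2 string hash (hash * 33 + c, seeded with 5381).
final class Djb2Sketch {

    static int djb2(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            // (hash << 5) + hash is hash * 33
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        // Truncate to 32 bits; the result may be negative.
        return (int) hash;
    }

    public static void main(String[] args) {
        System.out.println(djb2(""));   // 5381
        System.out.println(djb2("a"));  // 177670
    }
}
```

    The truncated result can be negative for longer strings, which is one more reason the shard computation needs a modulus that handles negative input.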

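    The hash-related settings are:

    # number of shards
    index.number_of_shards
    # number of replicas
    index.number_of_replicas

    # routing rule for the index: which hash function to use; the default is Murmur3, and an ordinary hash algorithm is also available
    index.legacy.routing.hash.type
    # whether the routing computation uses the type; this internal way of computing the shard id is deprecated, so it is best left unset (the default is false)
    index.legacy.routing.use_type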
  • Original article: https://www.cnblogs.com/bonelee/p/6078956.html