Low-level implementation of the routing algorithm in the latest ES 5.0

http://www.cnblogs.com/bonelee/p/6078947.html analyzes the ES bulk implementation, which contains the following routing code:

ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, request.id(), request.routing()).shardId();

Its implementation: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/cluster/routing/OperationRouting.java

    public ShardIterator indexShards(ClusterState clusterState, String index, String id, @Nullable String routing) {
        return shards(clusterState, index, id, routing).shardsIt();
    }

    protected IndexShardRoutingTable shards(ClusterState clusterState, String index, String id, String routing) {
        int shardId = generateShardId(indexMetaData(clusterState, index), id, routing);
        return clusterState.getRoutingTable().shardRoutingTable(index, shardId);
    }

    static int generateShardId(IndexMetaData indexMetaData, String id, @Nullable String routing) {
        final int hash;
        if (routing == null) {
            hash = Murmur3HashFunction.hash(id);
        } else {
            hash = Murmur3HashFunction.hash(routing);
        }
        // we don't use IMD#getNumberOfShards since the index might have been shrunk such that we need to use the size
        // of original index to hash documents
        return Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();
    }

As the code shows, the latest ES implementation computes the target shard as:

Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();
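The expression above can be tried in isolation. The following is a minimal sketch, not the Elasticsearch source: the class name `RoutingFormulaSketch`, the helper `shardId`, and the sample hash and shard count are all made up for illustration. It shows why a floor modulo is used: a hash can be negative, and Java's `%` operator would then give a negative result.

```java
// Illustrative sketch only; not the Elasticsearch source.
final class RoutingFormulaSketch {

    // Same shape as the expression quoted above: a floor modulo followed by
    // an integer division by the routing factor.
    static int shardId(int hash, int routingNumShards, int routingFactor) {
        return Math.floorMod(hash, routingNumShards) / routingFactor;
    }

    public static void main(String[] args) {
        int hash = -7;            // hypothetical hash value
        int routingNumShards = 5; // hypothetical shard count

        // The % operator keeps the sign of the dividend, so it can go negative.
        System.out.println(hash % routingNumShards);               // -2
        // Math.floorMod takes the sign of the divisor, so it stays in [0, 5).
        System.out.println(Math.floorMod(hash, routingNumShards)); // 3
        // With a routing factor of 1 the division changes nothing.
        System.out.println(shardId(hash, routingNumShards, 1));    // 3
    }
}
```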

The getRoutingFactor implementation can be found in https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/cluster/metadata/IndexMetaData.java :

    /**
     * Returns the routing factor for this index. The default is <tt>1</tt>.
     *
     * @see #getRoutingFactor(IndexMetaData, int) for details
     */
    public int getRoutingFactor() {
        return routingFactor;
    }

The constructor contains:

        assert numberOfShards * routingFactor == routingNumShards :  routingNumShards + " must be a multiple of " + numberOfShards;

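The routing factor defaults to 1. By the assertion above, routingNumShards then equals numberOfShards, so the formula reduces to Math.floorMod(hash, numberOfShards) and every shard of the index is a possible routing target.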
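To make the role of the routing factor concrete, here is a small sketch under stated assumptions. The class `RoutingFactorSketch`, its two helpers, and the shard counts (an index hashed over 8 shards and then shrunk to 4) are hypothetical. The only things taken from the source are the relation numberOfShards * routingFactor == routingNumShards and the floorMod-then-divide formula.

```java
// Illustrative sketch only; names and numbers are hypothetical.
final class RoutingFactorSketch {

    // Derived from the asserted relation:
    // numberOfShards * routingFactor == routingNumShards.
    static int routingFactor(int routingNumShards, int numberOfShards) {
        if (routingNumShards % numberOfShards != 0) {
            throw new IllegalArgumentException(
                routingNumShards + " must be a multiple of " + numberOfShards);
        }
        return routingNumShards / numberOfShards;
    }

    // Hash against the original shard count, then collapse groups of
    // routingFactor adjacent original shards into one current shard.
    static int shardId(int hash, int routingNumShards, int numberOfShards) {
        int factor = routingFactor(routingNumShards, numberOfShards);
        return Math.floorMod(hash, routingNumShards) / factor;
    }

    public static void main(String[] args) {
        int[] hashes = {-13, 0, 5, 22}; // hypothetical hash values
        for (int hash : hashes) {
            int original = shardId(hash, 8, 8); // factor 1: plain floor modulo
            int shrunk = shardId(hash, 8, 4);   // factor 2: shards 2k and 2k+1 map to k
            System.out.println(hash + ": original=" + original + ", shrunk=" + shrunk);
        }
    }
}
```

In this sketch the shrunk shard is always the original shard divided by the factor, which is consistent with the code comment about hashing against the size of the original index.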

Note that ES 2.4 implements routing differently: https://github.com/elastic/elasticsearch/blob/2.4/core/src/main/java/org/elasticsearch/cluster/routing/

    @SuppressForbidden(reason = "Math#abs is trappy")
    private int generateShardId(ClusterState clusterState, String index, String type, String id, @Nullable String routing) {
        IndexMetaData indexMetaData = clusterState.metaData().index(index);
        if (indexMetaData == null) {
            throw new IndexNotFoundException(index);
        }
        final Version createdVersion = indexMetaData.getCreationVersion();
        final HashFunction hashFunction = indexMetaData.getRoutingHashFunction();
        final boolean useType = indexMetaData.getRoutingUseType();

        final int hash;
        if (routing == null) {
            if (!useType) {
                hash = hash(hashFunction, id);
            } else {
                hash = hash(hashFunction, type, id);
            }
        } else {
            hash = hash(hashFunction, routing);
        }
        if (createdVersion.onOrAfter(Version.V_2_0_0_beta1)) {
            return MathUtils.mod(hash, indexMetaData.getNumberOfShards());
        } else {
            return Math.abs(hash % indexMetaData.getNumberOfShards());
        }
    }

    @Deprecated
    protected int hash(HashFunction hashFunction, String type, String id) {
        if (type == null || "_all".equals(type)) {
            throw new IllegalArgumentException("Can't route an operation with no type and having type part of the routing (for backward comp)");
        }
        return hashFunction.hash(type, id);
    }
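The two return branches of generateShardId above do not agree for negative hashes, which would explain why the index creation version has to be checked. The sketch below compares them. It assumes that MathUtils.mod is a non-negative modulo and uses Math.floorMod as a stand-in; the class name, method names, and sample values are hypothetical.

```java
// Illustrative sketch only; Math.floorMod stands in for MathUtils.mod,
// which is ASSUMED here to be a non-negative modulo.
final class LegacyModuloSketch {

    // Shape of the branch for indices created before 2.0.0-beta1.
    static int legacyShardId(int hash, int numberOfShards) {
        return Math.abs(hash % numberOfShards);
    }

    // Shape of the branch for indices created on or after 2.0.0-beta1.
    static int currentShardId(int hash, int numberOfShards) {
        return Math.floorMod(hash, numberOfShards);
    }

    public static void main(String[] args) {
        // Positive hashes: both branches agree.
        System.out.println(legacyShardId(7, 5));   // 2
        System.out.println(currentShardId(7, 5));  // 2
        // Negative hashes: the branches pick different shards.
        System.out.println(legacyShardId(-7, 5));  // 2
        System.out.println(currentShardId(-7, 5)); // 3
        // Why Math.abs is considered trappy in general: it can return a negative value.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE); // true
    }
}
```

Since a document must keep resolving to the shard it was indexed into, an index created with the old formula presumably has to keep using it.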

There are three implementations of this hash function:

DjbHashFunction.java

SimpleHashFunction.java

Murmur3HashFunction.java
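For readers unfamiliar with string hashing, the following sketch shows how an id string can be turned into an int and then into a shard number. It uses the textbook DJB2 algorithm (start at 5381, multiply by 33, add each character); it is not claimed to match the code in DjbHashFunction.java, and the class name, method name, and sample id are hypothetical.

```java
// Illustrative sketch only: textbook DJB2, NOT the Elasticsearch DjbHashFunction.
final class Djb2Sketch {

    static int djb2(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            // (hash << 5) + hash is hash * 33
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash; // truncate to 32 bits; the result may be negative
    }

    public static void main(String[] args) {
        String id = "doc-1";    // hypothetical document id
        int numberOfShards = 5; // hypothetical shard count
        int hash = djb2(id);
        // A floor modulo keeps the shard number non-negative even for negative hashes.
        System.out.println(id + " -> shard " + Math.floorMod(hash, numberOfShards));
    }
}
```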

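The hash-related settings are:

# Number of shards
index.number_of_shards
# Number of replicas
index.number_of_replicas

# Hash function used for routing within this index; the default is Murmur3, and a plain hash algorithm is also available
index.legacy.routing.hash.type

# Whether the routing calculation uses the type; this way of computing the shard id is deprecated, so leave it unset and keep the default of false
index.legacy.routing.use_type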

Original post: https://www.cnblogs.com/bonelee/p/6078956.html