• Dubbo with a Redis registry: ReconnectTimerTask keeps reconnecting to the provider


         Problem: with Redis as the registry, the Dubbo consumer keeps trying to reconnect to a Dubbo provider and logs the following error:

    [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [10.1.1.12:0 -> /10.1.1.228:20888]], dubbo version: 2.7.3, current host: 10.1.1.12
    2019-08-30 20:33:52.283 [/] httpwrapper [dubbo-client-idleCheck-thread-1] ERROR o.a.d.r.e.s.h.ReconnectTimerTask - [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [10.1.1.12:0 -> /10.1.1.228:20888]], dubbo version: 2.7.3, current host: 10.1.1.12 
    org.apache.dubbo.remoting.RemotingException: client(url: dubbo://10.1.1.228:20888/com.cxq56.service.GeoService?actives=0&anyhost=true&application=httpwrapper&async=false&bean.name=providers:dubbo:com.cxq56.service.GeoService&check=false&cluster=failover&codec=dubbo&default.deprecated=false&default.dynamic=false&default.register=true&default.retries=1&default.timeout=10000&deprecated=false&dubbo=2.0.2&dynamic=false&generic=false&heartbeat=60000&interface=com.cxq56.service.GeoService&lazy=false&loadbalance=random&methods=createForbiddenGeo,calculatedDistance,createSiteInfo,getSiteAndDistance,getAllGeoByCityId,searchForPOI,createGeo&pid=1&qos.enable=false&register=true&register.ip=10.1.1.12&release=2.7.1&remote.application=geo-provider&retries=0&revision=1.0-SNAPSHOT&shutwait=40000&side=consumer&sticky=false&timeout=3000&timestamp=1567049198218&validation=false) failed to connect to server /10.1.1.228:20888 client-side timeout 3000ms (elapsed: 3000ms) from netty client 10.1.1.12 using dubbo version 2.7.3
    at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:171)
    at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:190)
    at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:246)
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:155)
    at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49)
    at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:648)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:727)
    at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:449)
    at java.lang.Thread.run(Thread.java:748)
    

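    Conclusion first. This exception shows up in two situations:

    1. When the Dubbo consumer starts, it picks up provider registrations that have already expired and tries to reconnect to them.
    2. While both the Dubbo consumer and the Dubbo provider are running normally, the provider suddenly shuts down abnormally, and the consumer keeps trying to reconnect. (An abnormal shutdown is, of course, generally not acceptable.)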

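    Background:

    1. Dubbo has its own mechanism for checking service availability. It relies mainly on ReconnectTimerTask, which reconnects on a timer to keep the service reachable.
    2. Separately, when Dubbo registers with a Redis registry it writes a hash into Redis. The key is built from the service interface name, e.g. "/dubbo/com.sample.configmgmt.api.CustomerServiceApi" (the toCategoryName(key) call in the RedisRegistry code shown later indicates that the full key also carries a category segment). The hash field is the provider URL with the basic information about the interface, e.g. "dubbo://127.0.0.1:20881/com.sample.configmgmt.api.OrgServiceApi?accepts=0&accesslog=true&anyhost=true&application=configmgmt-service&bean.name=ServiceBean:com.cxq56.configmgmt.api.OrgServiceApi&cluster=failover&deprecated=false&dubbo=2.0.2&dump.directory=/tmp&dynamic=true&generic=false&interface=com.cxq56.configmgmt.api.OrgServiceApi&loadbalance=random&methods=createOrg,getOrg,updateOrg,listOrg,pageOrg&pid=1&register=true&release=2.7.3&retries=0&revision=1.0-SNAPSHOT&shutwait=40000&side=provider&timeout=3000&timestamp=1567416954354&token=true". The value is a timestamp that a dedicated thread refreshes periodically. When another service looks up available Dubbo services through the registry, an entry whose timestamp is earlier than the current time is treated as expired and unavailable.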
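
The expiry scheme in item 2 can be modelled in a few lines. The sketch below is only an illustration: an in-memory map stands in for the Redis hash, and the class name `ExpiryModel`, its method names, and the 60-second expire period are assumptions made for this example, not Dubbo's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one hash per service, field = provider URL, value = expiry timestamp (ms).
class ExpiryModel {
    // Assumed expire period, chosen for the example only.
    static final long EXPIRE_PERIOD_MS = 60_000L;

    private final Map<String, Long> hash = new HashMap<>();

    // Called at registration and then periodically by a background thread:
    // each call pushes the stored expiry timestamp forward.
    void refresh(String providerUrl, long now) {
        hash.put(providerUrl, now + EXPIRE_PERIOD_MS);
    }

    // An entry counts as expired once its stored timestamp is earlier than the current time.
    boolean isExpired(String providerUrl, long now) {
        Long expiresAt = hash.get(providerUrl);
        return expiresAt == null || expiresAt < now;
    }

    public static void main(String[] args) {
        ExpiryModel registry = new ExpiryModel();
        String url = "dubbo://127.0.0.1:20881/com.sample.DemoService"; // made-up URL
        registry.refresh(url, 0L);
        System.out.println(registry.isExpired(url, 30_000L));  // checked shortly after a refresh
        System.out.println(registry.isExpired(url, 120_000L)); // checked after refreshes have stopped
    }
}
```

If the provider dies without unregistering, nothing calls `refresh` any more, so the stale field stays in the hash and only the timestamp comparison tells a reader that it is dead.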
    

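        For the second case: when a new provider registers and sends a notify message, the Notify thread runs a series of operations and RegistryDirectory refreshes its list of currently available providers (refreshInvoker). This cancels the ReconnectTimerTask that belongs to the expired Dubbo service.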
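
The effect of that refresh can be pictured as a set difference between the old and the new provider list. The following is a simplified sketch, not RegistryDirectory's real implementation; `ProviderRefresh` and `toClose` are hypothetical names, and providers are reduced to plain address strings.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ProviderRefresh {
    // Providers that were known before the notification but are absent from the new list.
    // Per the description above, these are the ones whose reconnect task gets cancelled.
    static Set<String> toClose(List<String> oldProviders, List<String> newProviders) {
        Set<String> stale = new LinkedHashSet<>(oldProviders);
        stale.removeAll(newProviders);
        return stale;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("10.1.1.228:20888", "10.1.1.229:20888");
        List<String> after = Arrays.asList("10.1.1.229:20888", "10.1.1.230:20888");
        System.out.println(toClose(before, after));
    }
}
```

The cleanup only happens when a notification arrives, which is why this path covers the second case but does nothing for stale entries that are already present at consumer startup.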

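        On inspection, our problem was the first case. When the Dubbo consumer started, it still tried to reconnect to providers obtained from the registry even though the timestamps of some entries had already expired. This was the opposite of what I expected, and the cause was a key parameter that was set incorrectly on the Dubbo provider side.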

    HeaderExchangeClient.java: starts the heartbeat and reconnect mechanisms

        private final Client client;
        private final ExchangeChannel channel;
    
        private static final HashedWheelTimer IDLE_CHECK_TIMER = new HashedWheelTimer(
                new NamedThreadFactory("dubbo-client-idleCheck", true), 1, TimeUnit.SECONDS, TICKS_PER_WHEEL);
        private HeartbeatTimerTask heartBeatTimerTask;
        private ReconnectTimerTask reconnectTimerTask;
    
        public HeaderExchangeClient(Client client, boolean startTimer) {
            Assert.notNull(client, "Client can't be null");
            this.client = client;
            this.channel = new HeaderExchangeChannel(client);
    
            if (startTimer) {
                URL url = client.getUrl();
                startReconnectTask(url);
                startHeartBeatTask(url);
            }
        }
    

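        As shown, a HashedWheelTimer drives the periodic checks. When the reconnect task fails it logs the exception we saw, and it does not stop after a failure; it keeps retrying indefinitely. That raises a question: does the consumer try to reconnect to every historical registration left in Redis?
        To find out, we set a breakpoint and traced upward. When the Dubbo consumer starts, it registers its own consumer information in Redis and also fetches all provider registrations by interface name. These are then filtered in RedisRegistry.java (I am using the Redis registry), as follows:
    RedisRegistry.java: implements register, unregister, subscribe, and unsubscribe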

    private void doNotify(Jedis jedis, Collection<String> keys, URL url, Collection<NotifyListener> listeners) {
            if (keys == null || keys.isEmpty()
                    || listeners == null || listeners.isEmpty()) {
                return;
            }
            long now = System.currentTimeMillis();
            List<URL> result = new ArrayList<>();
            List<String> categories = Arrays.asList(url.getParameter(CATEGORY_KEY, new String[0]));
            String consumerService = url.getServiceInterface();
            for (String key : keys) {
                if (!ANY_VALUE.equals(consumerService)) {
                    String providerService = toServiceName(key);
                    if (!providerService.equals(consumerService)) {
                        continue;
                    }
                }
                String category = toCategoryName(key);
                if (!categories.contains(ANY_VALUE) && !categories.contains(category)) {
                    continue;
                }
                List<URL> urls = new ArrayList<>();
                Map<String, String> values = jedis.hgetAll(key);
                if (CollectionUtils.isNotEmptyMap(values)) {
                    for (Map.Entry<String, String> entry : values.entrySet()) {
                        URL u = URL.valueOf(entry.getKey());
                // keep this registered URL (it will be reconnected to later) if dynamic is false or its expiry time is not earlier than now
                        if (!u.getParameter(DYNAMIC_KEY, true)
                                || Long.parseLong(entry.getValue()) >= now) {
                            if (UrlUtils.isMatch(url, u)) {
                                urls.add(u);
                            }
                        }
                    }
                }
                if (urls.isEmpty()) {
                    urls.add(URLBuilder.from(url)
                            .setProtocol(EMPTY_PROTOCOL)
                            .setAddress(ANYHOST_VALUE)
                            .setPath(toServiceName(key))
                            .addParameter(CATEGORY_KEY, category)
                            .build());
                }
                result.addAll(urls);
                if (logger.isInfoEnabled()) {
                    logger.info("redis notify: " + key + " = " + urls);
                }
            }
            if (CollectionUtils.isEmpty(result)) {
                return;
            }
            for (NotifyListener listener : listeners) {
                notify(url, listener, result);
            }
        }
    

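        A provider registration is filtered out only when dynamic is true and its expiry time is earlier than the current time. Old registration data almost always has an expiry time earlier than now (such entries are dirty data; graceful shutdown and the dubbo monitor can both remove them), so the root cause lies in dynamic. The provider here runs Dubbo 2.7.1, which has a bug: dynamic defaults to false, and that directly causes the current problem. The official documentation describes dynamic roughly as: "Whether the service is registered dynamically. If set to false, the service shows as disabled after registration and must be enabled manually; when the provider stops, it is not unregistered automatically and must be disabled manually." It does not say that the consumer will keep reconnecting.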
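
To make the filter concrete, the condition inside doNotify can be reduced to a small pure function. The sketch below is a standalone restatement for illustration only; `NotifyFilter` and `keep` are hypothetical names, and the UrlUtils.isMatch check and the rest of the real method are left out.

```java
class NotifyFilter {
    // Mirrors the condition `!dynamic || expiresAt >= now` from doNotify:
    // a non-dynamic entry is always kept, a dynamic one only while it has not expired.
    static boolean keep(boolean dynamic, long expiresAt, long now) {
        return !dynamic || expiresAt >= now;
    }

    public static void main(String[] args) {
        long now = 1_567_416_954_354L;   // arbitrary "current time" for the example
        long stale = now - 86_400_000L;  // an entry that expired one day earlier
        System.out.println("dynamic=true,  stale entry kept: " + keep(true, stale, now));
        System.out.println("dynamic=false, stale entry kept: " + keep(false, stale, now));
    }
}
```

With dynamic=false the expiry timestamp is never consulted, so a stale entry passes the filter and the consumer goes on to connect to a provider that no longer exists.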

    The reconnect code is as follows:

    
    
    /**
     * ReconnectTimerTask
     */
    public class ReconnectTimerTask extends AbstractTimerTask {
    
        private static final Logger logger = LoggerFactory.getLogger(ReconnectTimerTask.class);
    
        private final int idleTimeout;
    
        public ReconnectTimerTask(ChannelProvider channelProvider, Long heartbeatTimeoutTick, int idleTimeout) {
            super(channelProvider, heartbeatTimeoutTick);
            this.idleTimeout = idleTimeout;
        }
        
        // in 2.7.3 this runs once a minute by default
        @Override
        protected void doTask(Channel channel) {
            try {
                Long lastRead = lastRead(channel);
                Long now = now();
    
                // Rely on reconnect timer to reconnect when AbstractClient.doConnect fails to init the connection
                // if the channel is already disconnected, reconnect immediately
                if (!channel.isConnected()) {
                    try {
                        logger.info("Initial connection to " + channel);
                        ((Client) channel).reconnect();
                    } catch (Exception e) {
                        logger.error("Fail to connect to " + channel, e);
                    }
                // check pong at client
                // the channel is still connected, but nothing has been read from it for longer than idleTimeout
                } else if (lastRead != null && now - lastRead > idleTimeout) {
                    logger.warn("Reconnect to channel " + channel + ", because heartbeat read idle time out: "
                            + idleTimeout + "ms");
                    try {
                        ((Client) channel).reconnect();
                    } catch (Exception e) {
                        logger.error(channel + "reconnect failed during idle time.", e);
                    }
                }
            } catch (Throwable t) {
                logger.warn("Exception when reconnect to remote channel " + channel.getRemoteAddress(), t);
            }
        }
    }
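
The branching in doTask can be restated as a pure function, which makes it easier to see why a dead provider produces an error on every tick. This is a sketch for illustration, not Dubbo code: `ReconnectDecision`, `Action`, and `decide` are hypothetical names, and the idle timeout used in `main` is an assumed value.

```java
class ReconnectDecision {
    enum Action { RECONNECT_DISCONNECTED, RECONNECT_IDLE, NONE }

    // Restates the two branches of doTask.
    // lastRead may be null when no read timestamp has been recorded for the channel.
    static Action decide(boolean connected, Long lastRead, long now, int idleTimeout) {
        if (!connected) {
            return Action.RECONNECT_DISCONNECTED;
        }
        if (lastRead != null && now - lastRead > idleTimeout) {
            return Action.RECONNECT_IDLE;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        int idleTimeout = 180_000; // assumed value for the example
        System.out.println(decide(false, null, 0L, idleTimeout));    // channel never connected
        System.out.println(decide(true, 0L, 200_000L, idleTimeout)); // connected but silent too long
        System.out.println(decide(true, 0L, 60_000L, idleTimeout));  // connected and recently read
    }
}
```

A provider that is gone never reaches the connected state, so each run takes the first branch, the reconnect attempt fails, and the "Fail to connect" error from the log at the top is written again.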
    
    

    -----------------------------------------------------------------------------------------------------------end------------------------------------------------------------------------

  • Original article: https://www.cnblogs.com/IC1101/p/11437139.html