• [Azure Event Hub] A consumer built on the Event Hub Java SDK stops consuming data from one partition, and IdleTimerExpired errors appear in the log


    Problem

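    An Event Hub consumer application written with the Java SDK intermittently stopped consuming messages from one partition. The logs showed IdleTimerExpired errors. After the application was restarted, it connected to Event Hub normally and resumed consuming data. The open question: with the retry mechanism enabled, why was the partition's connection not re-established after it dropped?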

    Error message:

    {"time":"2020-09-21 05:11:19.578", "level":"ERROR", "thread":"bounded-71", "appName":"events-service", "traceId":"", "spanId":"", "url":"", "clientIp":"", 
    "method":"", "elapse":"", "code":"", "message":"", "class":"c.h.socialhub.eventhub.EventHub",
    "line":"EventHub.java:150",
    "msg":"Error occurred while processing events The connection was inactive for more than the allowed 240000 milliseconds and is closed by container 'cd8a74181e68151dde4_G28'.,
    errorContext[NAMESPACE: shprod-member.servicebus.chinacloudapi.cn,
    PATH: xxxx/ConsumerGroups/$default/Partitions/1, REFERENCE_ID: 2_xxxxxxxx LINK_CREDIT: 253]
    "}

    Consumer code:

            eventProcessorClient = new EventProcessorClientBuilder()
                    .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
                    .connectionString(currentEventHubProperty.getConnectionString(), this.topic)
                    .retry(retryOptions)
                    .checkpointStore(new BlobCheckpointStore(blobContainerAsyncClient))
                    .processEvent(eventContext -> {
                        String currentData = "";
                        try {
                                EventData event = eventContext.getEventData();
                                PartitionContext partitionContext = eventContext.getPartitionContext();
    
                                EventMessage eventMessage = new EventMessage();
                                currentData = new String(event.getBody(), Charset.defaultCharset());
                                eventMessage.setContent(currentData);
                                eventMessage.setPartitionId(partitionContext.getPartitionId());
                                eventMessage.setSequenceNumber(event.getSequenceNumber());
                                log.info("Topic: {} - Partition: {} - Sequence: {} - EnqueuedTime: {}", this.topic,  partitionContext.getPartitionId(), event.getSequenceNumber(),event.getEnqueuedTime());
    
                                eventContext.updateCheckpoint();
                            } catch (Exception e) {
                                String msg = e.getMessage();
                                if (StringUtils.isBlank(msg)) {
                                    // toString() on the array itself only prints its type and hash code
                                    msg = java.util.Arrays.toString(e.getStackTrace());
                                }
                                log.error("Error occurred while do works with events[{}] : {}, data: {} ", this.topic, msg, currentData);
                            }
                    })
                    .processError(errorContext -> log.error("Error occurred while processing events " + errorContext.getThrowable().getMessage()))
                    .buildEventProcessorClient();

    Root cause analysis

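    First, use the logs to determine whether the partition was idle for 240 seconds around the time of the problem, that is, whether no data entered the partition during that window. If the logs already record the time each message entered the queue (the enqueued time), analyze them directly. If not, add a log statement like the following so that the next occurrence can be diagnosed straight from the logs: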

     log.info("Topic: {} - Partition: {} - Sequence: {} - EnqueuedTime: {}", this.topic,  partitionContext.getPartitionId(), event.getSequenceNumber(),event.getEnqueuedTime());
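
    Once enqueued times are in the logs, the 240-second check can be automated. The following is a minimal sketch, assuming the enqueued times for one partition have already been parsed out of the log into Instant values. The class and method names (IdleGapFinder, findIdleGaps) and the sample timestamps are hypothetical and not part of the Event Hubs SDK.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: finds windows in which a partition received no events
// for longer than a threshold (240000 ms in the error message above).
final class IdleGapFinder {

    /** A window between two consecutive events that exceeded the threshold. */
    static final class Gap {
        final Instant from;
        final Instant to;

        Gap(Instant from, Instant to) {
            this.from = from;
            this.to = to;
        }

        Duration length() {
            return Duration.between(from, to);
        }
    }

    /** Returns every gap between consecutive enqueued times that is longer than the threshold. */
    static List<Gap> findIdleGaps(List<Instant> enqueuedTimes, Duration threshold) {
        // Sort a copy so the caller's list is left untouched.
        List<Instant> sorted = new ArrayList<>(enqueuedTimes);
        Collections.sort(sorted);

        List<Gap> gaps = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            Instant previous = sorted.get(i - 1);
            Instant current = sorted.get(i);
            if (Duration.between(previous, current).compareTo(threshold) > 0) {
                gaps.add(new Gap(previous, current));
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Made-up sample data for one partition.
        List<Instant> times = Arrays.asList(
                Instant.parse("2020-09-21T05:00:00Z"),
                Instant.parse("2020-09-21T05:01:00Z"),
                Instant.parse("2020-09-21T05:11:00Z"));

        for (Gap gap : findIdleGaps(times, Duration.ofMillis(240_000))) {
            System.out.println("Idle from " + gap.from + " to " + gap.to
                    + " (" + gap.length().getSeconds() + " s)");
        }
    }
}
```

    If a gap longer than 240 seconds lines up with the timestamp of the IdleTimerExpired error, the partition really was idle at that point. If there is no such gap, data was arriving and the consumer was not reading it.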

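    For an incident that has already happened, if the events are still within the Event Hub's configured retention period, you can use the SDK's receiveFromPartition method to read the relevant range of events and inspect their enqueued times. (Note: create a separate consumer group for this instead of using $Default, so that the diagnostic reader does not fail to connect.) Sample code, quoted from the SDK documentation: https://azuresdkdocs.blob.core.windows.net/$web/java/azure-messaging-eventhubs/5.2.0/index.html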

    Consume events from an Event Hub partition

    To consume events, create an EventHubConsumerAsyncClient or EventHubConsumerClient for a specific consumer group. In addition, a consumer needs to specify where in the event stream to begin receiving events.

    Consume events with EventHubConsumerAsyncClient

    In the snippet below, we create an asynchronous consumer that receives events from partitionId and only listens to newest events that get pushed to the partition. Developers can begin receiving events from multiple partitions using the same EventHubConsumerAsyncClient by calling receiveFromPartition(String, EventPosition) with another partition id.

    EventHubConsumerAsyncClient consumer = new EventHubClientBuilder()
        .connectionString("<< CONNECTION STRING FOR SPECIFIC EVENT HUB INSTANCE >>")
        .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
        .buildAsyncConsumerClient();
    
    // Receive newly added events from partition with id "0". EventPosition specifies the position
    // within the Event Hub partition to begin consuming events.
    consumer.receiveFromPartition("0", EventPosition.latest()).subscribe(event -> {
        // Process each event as it arrives.
    });
    // add sleep or System.in.read() to receive events before exiting the process.
    

    Consume events with EventHubConsumerClient

    Developers can create a synchronous consumer that returns events in batches using an EventHubConsumerClient. In the snippet below, a consumer is created that starts reading events from the beginning of the partition's event stream.

    EventHubConsumerClient consumer = new EventHubClientBuilder()
        .connectionString("<< CONNECTION STRING FOR SPECIFIC EVENT HUB INSTANCE >>")
        .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
        .buildConsumerClient();
    
    String partitionId = "<< EVENT HUB PARTITION ID >>";
    
    // Get the first 15 events in the stream, or as many events as can be received within 40 seconds.
    IterableStream<PartitionEvent> events = consumer.receiveFromPartition(partitionId, 15,
        EventPosition.earliest(), Duration.ofSeconds(40));
    for (PartitionEvent event : events) {
        System.out.println("Event: " + event.getData().getBodyAsString());
    }

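    That is as far as the analysis goes. It did not establish whether the problem lies in the application logic or in the SDK. Judging from many similar reports on GitHub, the Java SDK is the more likely cause. Further updates on the following GitHub issue are needed: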

    AmqpEventHubConsumer.IdleTimerExpired in Java EventHubConsumer SDK:https://github.com/Azure/azure-sdk-for-java/issues/11233
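
    Until that issue is resolved, it can help to at least notice a stalled partition without waiting for someone to read the logs. The following is a minimal sketch, assuming the application calls recordEvent from its processEvent callback and checks stalledPartitions on a timer. The PartitionWatchdog class, its method names, and the sample timestamps are hypothetical and not part of the Event Hubs SDK. Note that a silent partition may simply have no incoming data, so a hit here is a prompt to investigate, not proof of a broken link.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers when each partition last delivered an event
// and reports partitions that have been silent for longer than a threshold.
final class PartitionWatchdog {

    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

    /** Call for every received event, e.g. with Instant.now() as the timestamp. */
    void recordEvent(String partitionId, Instant when) {
        // Keep the most recent timestamp even if calls arrive out of order.
        lastSeen.merge(partitionId, when,
                (previous, latest) -> latest.isAfter(previous) ? latest : previous);
    }

    /** Returns the IDs of partitions whose last event is older than the threshold, sorted. */
    List<String> stalledPartitions(Instant now, Duration threshold) {
        List<String> stalled = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastSeen.entrySet()) {
            if (Duration.between(entry.getValue(), now).compareTo(threshold) > 0) {
                stalled.add(entry.getKey());
            }
        }
        Collections.sort(stalled);
        return stalled;
    }

    public static void main(String[] args) {
        PartitionWatchdog watchdog = new PartitionWatchdog();
        // Made-up sample data: partition "0" went quiet, partition "1" is still active.
        watchdog.recordEvent("0", Instant.parse("2020-09-21T05:00:00Z"));
        watchdog.recordEvent("1", Instant.parse("2020-09-21T05:06:00Z"));

        Instant now = Instant.parse("2020-09-21T05:07:00Z");
        System.out.println("Stalled partitions: "
                + watchdog.stalledPartitions(now, Duration.ofSeconds(240)));
    }
}
```

    A partition that has never delivered an event is not tracked by this sketch. Seeding the map with the known partition IDs at startup would cover that case.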

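    When facing a problem in a complex environment, the way to investigate it is this: let muddy water settle and it slowly clears; let stillness stir and it slowly comes to life. In the cloud, this is exactly the case!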

  • Original article: https://www.cnblogs.com/lulight/p/13704655.html