• Using the Ozone Insight Tool


    Preface


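    The runtime behavior of a distributed system is far more complex than that of a typical enterprise application. It involves calls across many services and intricate parallel processing logic, so investigating problems in such a system is not easy. Any way of seeing what is happening inside it, such as key metrics, helps a great deal. Metrics are the information most systems expose to the outside world, but they are not always convenient to query. A system that offers a direct command for retrieving these indicators is clearly more usable. Ozone addresses this with a dedicated implementation: the insight command, built specifically to improve its observability. In this article I take a brief look at the insight tool.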

    Insight in Ozone


    Before introducing the ozone insight command, let's first clarify what "insight" actually refers to inside Ozone.

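    To make the system more observable from the outside, Ozone implements endpoints for each of its key internal service modules and exposes useful information through them. These endpoints exist at the process level and also at the level of internal threads and protocols. The information includes: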

    • (Real-time) logs of key services
    • Metrics of key services
    • Configuration of key services
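    To make the idea concrete, here is a minimal Python sketch of what an insight point bundles. It assumes each point is just a name, a description, and the loggers, metrics and configuration keys it covers. The class and field names are hypothetical and do not correspond to Ozone's actual Java classes. The sample values are taken from the command output shown later in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InsightPoint:
    """Hypothetical model of one insight point (not Ozone's real Java API)."""
    name: str                                             # e.g. "scm.node-manager"
    description: str                                      # one-line summary, as printed by `list`
    loggers: List[str] = field(default_factory=list)      # classes whose logs are followed
    metric_groups: Dict[str, List[str]] = field(default_factory=dict)  # section -> metric names
    config_keys: List[str] = field(default_factory=list)  # related configuration keys


class InsightRegistry:
    """Looks up insight points by name, as the list/log/metric/config commands would need to."""

    def __init__(self) -> None:
        self._points: Dict[str, InsightPoint] = {}

    def register(self, point: InsightPoint) -> None:
        self._points[point.name] = point

    def get(self, name: str) -> InsightPoint:
        if name not in self._points:
            raise KeyError(f"unknown insight point: {name}")
        return self._points[name]

    def names(self) -> List[str]:
        return sorted(self._points)


registry = InsightRegistry()
registry.register(InsightPoint(
    name="scm.node-manager",
    description="SCM Datanode management related information.",
    loggers=["org.apache.hadoop.hdds.scm.node.SCMNodeManager"],
))
registry.register(InsightPoint(
    name="scm.replica-manager",
    description="SCM closed container replication manager",
    config_keys=["hdds.scm.replication.thread.interval",
                 "hdds.scm.replication.event.timeout"],
))
print(registry.names())  # ['scm.node-manager', 'scm.replica-manager']
```

    The key design point is that a single name ties together all three kinds of information. Each subcommand then only needs the point's name to know which loggers, metrics or keys to show.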

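    I described the implementation in an earlier article, "How to Improve the Observability of Distributed Systems: Introducing the Insight Tool" [1]. Interested readers can find the implementation details there, so I won't repeat them here.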

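    Some may say there is nothing special about these three kinds of information, since an ordinary system can provide them too. That is true, but Ozone turns these queries into tool commands that users can run directly, which is a fairly novel approach. Let's look at how the insight commands are used, and the convenience will become clear.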

    Using the Ozone insight commands


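    First, we can ask for help to list all subcommands available under insight. Note that `-help`, as typed below, is not a recognized option, which is why an "Unknown option" line appears at the top. The usage text shows the correct forms, `-h` or `--help`: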

    [hdfs@lyq bin]$ ./ozone insight -help
    Unknown option: -elp (while processing option: '-help')
    Usage: ozone insight [-hV] [--verbose] [-conf=<configurationPath>]
                         [-D=<String=String>]... [COMMAND]
    Show debug information about a selected Ozone component
          --verbose   More verbose output. Show the stack trace of the errors.
          -conf=<configurationPath>
    
      -D, --set=<String=String>
    
      -h, --help      Show this help message and exit.
      -V, --version   Print version information and exit.
    Commands:
      list             Show available insight points.
      log, logs        Show log4j events related to the insight point
      metrics, metric  Show available metrics.
      config           Show configuration for a specific subcomponents
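    The `Unknown option: -elp` line is explained by short-option clustering. The usage line `[-hV]` shows that single-letter flags can be combined, so `-help` is read as the flag `-h` followed by a leftover `elp`. That leftover is then reported as the unknown option `-elp`. The Python sketch below reproduces that splitting. It is a simplified illustration under this assumption, not the code of Ozone's actual command-line parser, and `split_cluster` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Flags that take no value, taken from the usage line "[-hV]" above.
BOOLEAN_SHORT = {"h", "V"}


def split_cluster(arg: str) -> Tuple[List[str], Optional[str]]:
    """Split a single-dash argument into recognized short flags and an unknown remainder."""
    flags: List[str] = []
    rest = arg[1:]  # drop the leading '-'
    # Consume known single-letter flags from the front of the cluster.
    while rest and rest[0] in BOOLEAN_SHORT:
        flags.append("-" + rest[0])
        rest = rest[1:]
    # Whatever is left over is reported as an unknown option.
    return flags, ("-" + rest if rest else None)


print(split_cluster("-help"))  # (['-h'], '-elp')
print(split_cluster("-hV"))    # (['-h', '-V'], None)
```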
    

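    Before using the subcommands, we need to know which insight points are currently available. An insight point is a key service point, such as a key service thread or a key protocol operation.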

    [hdfs@lyq bin]$ ./ozone insight list
    Available insight points:
    
      scm.node-manager                     SCM Datanode management related information.
      scm.replica-manager                  SCM closed container replication manager
      scm.event-queue                      Information about the internal async event delivery
      scm.protocol.block-location          SCM Block location protocol endpoint
      om.key-manager                       OM Key Manager
      om.protocol.client                   Ozone Manager RPC endpoint
    

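    As the list shows, insight points are already very fine-grained: each one targets a single subsystem or protocol endpoint within SCM or OM rather than a whole process.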
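    Because the list output has a simple two-column layout, a script can consume it directly. The sketch below assumes the layout stays exactly as shown above: a point name, whitespace, then a description. It parses captured output and groups the points by the component prefix before the first dot (`scm`, `om`). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

SAMPLE = """Available insight points:

  scm.node-manager                     SCM Datanode management related information.
  scm.replica-manager                  SCM closed container replication manager
  om.key-manager                       OM Key Manager
  om.protocol.client                   Ozone Manager RPC endpoint
"""


def parse_insight_list(output: str) -> List[Tuple[str, str]]:
    """Return (point name, description) pairs from captured `ozone insight list` output."""
    points = []
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines and the "Available insight points:" header.
        if not line or line.startswith("Available insight points"):
            continue
        # Assumed layout: the name is the first token, the description follows.
        name, _, description = line.partition(" ")
        points.append((name, description.strip()))
    return points


def group_by_component(points: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group point names by the prefix before the first dot (e.g. "scm", "om")."""
    groups: Dict[str, List[str]] = {}
    for name, _ in points:
        groups.setdefault(name.split(".", 1)[0], []).append(name)
    return groups


print(group_by_component(parse_insight_list(SAMPLE)))
# {'scm': ['scm.node-manager', 'scm.replica-manager'], 'om': ['om.key-manager', 'om.protocol.client']}
```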

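    Now let's try the three subcommands in turn. First is log, which captures in real time the log output of the classes associated with the target insight point. Here is the log output for the point scm.node-manager: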

    [hdfs@lyq apache]$ ozone/bin/ozone insight log scm.node-manager
    [SCM] 2019-12-13 21:04:46,966 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=lyq-xxx.com]
    [SCM] 2019-12-13 21:05:14,998 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=lyq-xxx.com]
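    Each log line carries a component tag, a timestamp, the level, the logger class and a message, so the stream is easy to post-process. The sketch below parses lines with a regular expression whose layout is inferred only from the two sample lines above, so treat it as an assumption. It also measures the gap between consecutive node reports. The meaning of the third bracketed field (here the simple class name `SCMNodeManager`) is also an assumption, and the names `parse_lines` and `short_name` are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Iterator

# Assumed layout, inferred from the sample output:
# [COMPONENT] yyyy-MM-dd HH:mm:ss,SSS [LEVEL|logger|short name] message
LOG_RE = re.compile(
    r"^\[(?P<component>[^\]]+)\] "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>[A-Z]+)\|(?P<logger>[^|]+)\|(?P<short_name>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def parse_lines(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield parsed records one line at a time, skipping lines that do not match."""
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            record: Dict[str, object] = dict(m.groupdict())
            # Milliseconds after the comma are parsed by %f.
            record["time"] = datetime.strptime(m["timestamp"], "%Y-%m-%d %H:%M:%S,%f")
            yield record


SAMPLE = [
    "[SCM] 2019-12-13 21:04:46,966 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=lyq-xxx.com]",
    "[SCM] 2019-12-13 21:05:14,998 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=lyq-xxx.com]",
]

records = list(parse_lines(SAMPLE))
gap = (records[1]["time"] - records[0]["time"]).total_seconds()
print(records[0]["level"], records[0]["short_name"], gap)  # DEBUG SCMNodeManager 28.032
```

    Because `parse_lines` consumes lines lazily, it could also process a live stream, for example by iterating over `sys.stdin` while piping the log command's output into the script.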
    

    Next is metric. These metrics are essentially the same ones we normally read from the JMX page, but here they are regrouped by insight point and organized into labeled sections.

    
    [hdfs@lyq apache]$ ozone/bin/ozone insight metric om.protocol.client
    Metrics for `om.protocol.client` (Ozone Manager RPC endpoint)
    
    RPC connections
    
      Open connections: 0
      Dropped connections: 0
      Received bytes: 2037
      Sent bytes: 1760
    
    
    RPC queue
    
      RPC average queue time: 0.5
      RPC call queue length: 0
    
    
    RPC performance
    
      RPC processing time average: 8.0
      Number of slow calls: 0
    
    
    Message type counters
    
      Number of CreateVolume: 1
      Number of SetVolumeProperty: 0
      Number of CheckVolumeAccess: 0
      Number of InfoVolume: 2
      Number of DeleteVolume: 0
      Number of ListVolume: 0
      Number of CreateBucket: 0
      Number of InfoBucket: 0
      Number of SetBucketProperty: 0
      Number of DeleteBucket: 0
      Number of ListBuckets: 0
      Number of CreateKey: 0
      Number of LookupKey: 0
      Number of RenameKey: 0
      Number of DeleteKey: 0
      Number of ListKeys: 0
      Number of CommitKey: 0
      Number of AllocateBlock: 0
      Number of CreateS3Bucket: 0
      Number of DeleteS3Bucket: 0
      Number of InfoS3Bucket: 0
      Number of ListS3Buckets: 0
      Number of InitiateMultiPartUpload: 0
      Number of CommitMultiPartUpload: 0
      Number of CompleteMultiPartUpload: 0
      Number of AbortMultiPartUpload: 0
      Number of GetS3Secret: 0
      Number of ListMultiPartUploadParts: 0
      Number of ServiceList: 4
      Number of DBUpdates: 0
      Number of GetDelegationToken: 0
      Number of RenewDelegationToken: 0
      Number of CancelDelegationToken: 0
      Number of GetFileStatus: 0
      Number of CreateDirectory: 0
      Number of CreateFile: 0
      Number of LookupFile: 0
      Number of ListStatus: 0
      Number of AddAcl: 0
      Number of RemoveAcl: 0
      Number of SetAcl: 0
      Number of GetAcl: 1
      Number of PurgeKeys: 0
      Number of ListMultipartUploads: 0
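    The regrouping described above can be sketched as a mapping from a flat, JMX-style metric map to titled sections with human-readable labels. The section titles, labels and values below come from the output above. The flat metric names (`NumOpenConnections`, `RpcQueueTimeAvgTime` and so on) are assumed Hadoop RPC metric names used only for illustration, and `LAYOUT` and `render` are hypothetical names.

```python
from typing import Dict, List, Tuple, Union

Number = Union[int, float]

# Hypothetical display layout: section title -> [(label, assumed flat metric name)].
LAYOUT: Dict[str, List[Tuple[str, str]]] = {
    "RPC connections": [
        ("Open connections", "NumOpenConnections"),
        ("Dropped connections", "NumDroppedConnections"),
        ("Received bytes", "ReceivedBytes"),
        ("Sent bytes", "SentBytes"),
    ],
    "RPC queue": [
        ("RPC average queue time", "RpcQueueTimeAvgTime"),
        ("RPC call queue length", "CallQueueLength"),
    ],
    "RPC performance": [
        ("RPC processing time average", "RpcProcessingTimeAvgTime"),
        ("Number of slow calls", "RpcSlowCalls"),
    ],
}


def render(flat: Dict[str, Number]) -> str:
    """Regroup a flat metric map into titled, labeled sections."""
    lines: List[str] = []
    for title, entries in LAYOUT.items():
        lines.append(title)
        for label, key in entries:
            # A metric missing from the flat map is shown as "n/a".
            lines.append(f"  {label}: {flat.get(key, 'n/a')}")
        lines.append("")
    return "\n".join(lines)


FLAT = {
    "NumOpenConnections": 0, "NumDroppedConnections": 0,
    "ReceivedBytes": 2037, "SentBytes": 1760,
    "RpcQueueTimeAvgTime": 0.5, "CallQueueLength": 0,
    "RpcProcessingTimeAvgTime": 8.0, "RpcSlowCalls": 0,
}
print(render(FLAT))
```

    Keeping the layout as data rather than code means each insight point can declare its own sections without changing the rendering logic.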
    

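    The last subcommand, config, retrieves configuration values. It returns the values currently loaded and in use by the running system, not the values in a local configuration file. The values the system is actually using are the ones we really want to know.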

    [hdfs@lyq bin]$ ./ozone insight config scm.replica-manager
    Configuration for `scm.replica-manager` (SCM closed container replication manager)
      hdds.scm.replication.thread.interval
           default: 300s
           current: 300s
    
    When a heartbeat from the data node arrives on SCM, It is queued for processing with the time stamp of when the heartbeat arrived. There is a heartbeat processing thread inside SCM that runs at a specified interval. This value controls how frequently this thread is run.
    
    There are some assumptions build into SCM such as this value should allow the heartbeat processing thread to run at least three times more frequently than heartbeats and at least five times more than stale node detection time. If you specify a wrong value, SCM will gracefully refuse to run. For more info look at the node manager tests in SCM.
    
    In short, you don't need to change this.
    
    
      hdds.scm.replication.event.timeout
           default: 10m
           current: 10m
    
    Timeout for the container replication/deletion commands sent  to datanodes. After this timeout the command will be retried.
    

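    The config output above presents the configuration related to the insight point in a user-friendly way. For each key it shows both the current value and the default value, along with a description of the setting.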
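    Since every key is printed with both its default and current value, a small script can flag settings on the running service that differ from their defaults. The sketch below assumes the layout shown above: a dotted key alone on an indented line, followed by indented `default:` and `current:` lines. The sample is abridged from that output, and `parse_config` and `overridden` are hypothetical names.

```python
import re
from typing import Dict, List, Optional

# Assumed layout: an indented dotted key, then indented "default:" / "current:" lines.
KEY_RE = re.compile(r"^\s+([a-z0-9]+(?:\.[a-z0-9-]+)+)\s*$")
VALUE_RE = re.compile(r"^\s+(default|current):\s*(.*?)\s*$")

SAMPLE = """Configuration for `scm.replica-manager` (SCM closed container replication manager)
  hdds.scm.replication.thread.interval
       default: 300s
       current: 300s

In short, you don't need to change this.

  hdds.scm.replication.event.timeout
       default: 10m
       current: 10m

Timeout for the container replication/deletion commands sent  to datanodes.
"""


def parse_config(output: str) -> Dict[str, Dict[str, str]]:
    """Map each configuration key to its printed default and current values."""
    entries: Dict[str, Dict[str, str]] = {}
    key: Optional[str] = None
    for line in output.splitlines():
        k = KEY_RE.match(line)
        if k:
            key = k.group(1)
            entries[key] = {}
            continue
        v = VALUE_RE.match(line)
        if v and key is not None:
            entries[key][v.group(1)] = v.group(2)
    return entries


def overridden(entries: Dict[str, Dict[str, str]]) -> List[str]:
    """Keys whose printed current value differs from the printed default."""
    return [k for k, vals in entries.items() if vals.get("default") != vals.get("current")]


config = parse_config(SAMPLE)
print(config["hdds.scm.replication.event.timeout"])  # {'default': '10m', 'current': '10m'}
print(overridden(config))  # []
```

    The comparison is purely textual. An equal value written in a different unit (for example `300s` versus `5m`) would therefore also be flagged, which is usually acceptable for a quick "what was changed?" check.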

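    Having used the tool, I have to say that Ozone's insight tool is very practical. Its core idea is to define insight points on key services and expose their logs, metrics and configuration externally.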

    References


    [1] https://blog.csdn.net/Androidlushangderen/article/details/100824677
    [2] HDDS-1935: Improve the visibility with Ozone Insight tool. https://issues.apache.org/jira/browse/HDDS-1935

  • Original article: https://www.cnblogs.com/bianqi/p/12183500.html