• Prometheus + Grafana (8): System Monitoring for Kafka


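    Preface

    There are two common ways to monitor Kafka with Prometheus.

    Option 1: JMX

    https://help.aliyun.com/document_detail/141108.html?spm=a2c4g.11186623.6.621.12bb4dea7EyM9F

    Option 2: kafka_exporter

    This article uses the second option. Compared with JMX, kafka_exporter does not consume JVM resources, and metric collection time drops from minutes to seconds, which makes it better suited to monitoring large clusters.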

    Architecture

    Image source: https://zhuanlan.zhihu.com/p/57704357

    Installing kafka_exporter

    Note: one Kafka cluster needs only one exporter. Deploy it on any one server in the cluster.

    • Download and extract

    Download the kafka_exporter-1.2.0.linux-amd64.tar.gz package from https://github.com/danielqsj/kafka_exporter and extract it into /usr/local:

    wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.2.0/kafka_exporter-1.2.0.linux-amd64.tar.gz
    tar -xzvf kafka_exporter-1.2.0.linux-amd64.tar.gz -C /usr/local
    cd /usr/local/kafka_exporter-1.2.0.linux-amd64/
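    • Configuration

    Use the default configuration.

    • Start

    Change to the installation directory and run:

    cd /usr/local/kafka_exporter-1.2.0.linux-amd64
    nohup ./kafka_exporter --kafka.server=172.16.10.93:9092 &

    Once it has started, open http://172.16.10.93:9308/metrics (replace the IP and port with those of your environment).

    The page lists the metrics the exporter has collected.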

     

    Prometheus configuration

    • Configuration

    Edit prometheus.yml and add a scrape job for Kafka monitoring:

    vi /usr/local/prometheus-2.15.1/prometheus.yml
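
    The exact entry depends on your setup. As a minimal sketch, assuming the exporter runs on 172.16.10.93 with its default port 9308 as in the start step, the scrape job might look like this (the job name kafka_exporter is an illustrative choice, not taken from the original configuration):

    ```yaml
    # prometheus.yml (fragment): merge this job into the existing scrape_configs section
    scrape_configs:
      - job_name: 'kafka_exporter'           # assumed name; any unique job name works
        static_configs:
          - targets: ['172.16.10.93:9308']   # host:port where kafka_exporter is listening
    ```

    If scrape_configs already exists in the file, add only the job entry under it rather than a second scrape_configs key.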

     

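    • Start and verify

    Kill the running Prometheus process, restart it with the commands below, then check the Targets page:

    cd /usr/local/prometheus-2.15.1
    nohup ./prometheus --config.file=prometheus.yml &

    Note: State=UP means the target is being scraped successfully.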

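    Grafana configuration

    • Import a dashboard template

    Open http://<Grafana server IP>:3000 in a browser.

    Add a data source: choose Prometheus, enter the IP and port of the Prometheus server, and click Save.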
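
    The same data source can also be declared in a Grafana provisioning file instead of through the UI. This is only a sketch of that alternative: the file location (typically under Grafana's provisioning/datasources directory), the data source name, and port 9090 (the Prometheus default) are assumptions, and the placeholder host must be replaced with your Prometheus server.

    ```yaml
    # Grafana data source provisioning file (sketch, e.g. provisioning/datasources/prometheus.yaml)
    apiVersion: 1
    datasources:
      - name: Prometheus                           # assumed display name
        type: prometheus
        access: proxy                              # Grafana backend proxies the queries
        url: http://<prometheus-server-ip>:9090    # placeholder; 9090 is the Prometheus default port
        isDefault: true
    ```

    Grafana reads provisioning files at startup, so a restart is needed after adding the file.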

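    Import the dashboard.

    Enter dashboard ID 7589, then move the focus out of the input field so that Grafana loads the template.

    The panels then start showing data.

    The final dashboard is the imported one, adjusted to our own business needs.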

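    • Alert metrics

    | No. | Alert name                       | Alert rule                                                           | Description |
    | 1   | Broker count alert               | Alert when the number of brokers falls below the threshold [<3]      |             |
    | 2   | Consumer lag alert               | Alert when the message backlog exceeds the threshold [>1000]         |             |
    | 3   | Under-replicated partition alert | Alert when the number of under-replicated partitions is above [>0]   |             |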
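
    The three thresholds above could be expressed as Prometheus alerting rules roughly as follows. This is a sketch, not the rules used in the original environment: the group name, alert names, and the 1m hold time are illustrative, and it assumes the kafka_exporter metrics kafka_brokers, kafka_consumergroup_lag, and kafka_topic_partition_under_replicated_partition are being scraped.

    ```yaml
    # kafka_alerts.yml (sketch): reference it from rule_files in prometheus.yml
    groups:
      - name: kafka.threshold.rules          # assumed group name
        rules:
          - alert: KafkaBrokerCountLow       # table row 1: broker count < 3
            expr: kafka_brokers < 3
            for: 1m
            labels:
              severity: critical
          - alert: KafkaConsumerLagHigh      # table row 2: backlog > 1000 messages
            expr: sum(kafka_consumergroup_lag) by (consumergroup, topic) > 1000
            for: 1m
            labels:
              severity: warning
          - alert: KafkaUnderReplicatedPartitions   # table row 3: under-replicated partitions > 0
            expr: sum(kafka_topic_partition_under_replicated_partition) by (topic) > 0
            for: 1m
            labels:
              severity: critical
    ```

    Tune the thresholds and severities to the size of your cluster and the traffic of your topics.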

    • Grafana dashboard references:

    1. https://grafana.com/grafana/dashboards/7589 (recommended)
    2. https://grafana.com/grafana/dashboards/9018 (reference, newer)
    3. https://grafana.com/grafana/dashboards/9947 (reference, newer)
    4. https://grafana.com/grafana/dashboards/10973 (JMX, Alibaba Cloud)
    5. https://www.menina.cn/article/88
    6. https://cloud.tencent.com/developer/news/377416
     

    Other

    • Register kafka_exporter as a system service that starts at boot
    ## Prepare the unit file
    cat <<\EOF >/etc/systemd/system/kafka_exporter.service
    [Unit]
    Description=Kafka exporter for Prometheus
    Documentation=https://github.com/danielqsj/kafka_exporter
    
    [Service]
    # Adjust the binary path and the broker address to match your environment
    ExecStart=/usr/local/kafka_exporter/kafka_exporter --kafka.server=192.168.50.16:9092
    
    [Install]
    WantedBy=multi-user.target
    EOF
    
    
    ## Start the service and enable it at boot
    systemctl daemon-reload
    systemctl enable kafka_exporter.service
    systemctl stop kafka_exporter.service
    systemctl start kafka_exporter.service
    systemctl status kafka_exporter.service




    Alert rules:

    cat kafka_prometheusRule.yaml 
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: kafka-prometheus-rules
      namespace: monitoring
    spec:
      groups:
      - name: kafka.rules
        rules:
        - alert: KafkaTopicsReplicas
          expr: sum(kafka_topic_partition_in_sync_replica) by (topic) < 1
          for: 1m
          labels:
            severity: critical
          annotations:
            title: 'Kafka topic in-sync replicas less than 1'
            description: "Topic: {{ $labels.topic }} has fewer than 1 in-sync replica, Current Value: {{ $value }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    
        - alert: KafkaConsumersGroupLag
          expr: sum(kafka_consumergroup_lag) by (consumergroup) > 50
          for: 1m
          labels:
            severity: critical
          annotations:
            title: 'Kafka consumer group lag'
            description: "Kafka consumer group is lagging (Lag > 50), Lag: {{ $value }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
            
        - alert: KafkaConsumersTopicLag
          expr: sum(kafka_consumergroup_lag) by (topic) > 50
          for: 1m
          labels:
            severity: critical
          annotations:
            title: 'Kafka topic consumption lag'
            description: "Kafka topic consumption is lagging (Lag > 50), Lag: {{ $value }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    

      

  • Original article: https://www.cnblogs.com/weifeng1463/p/16877908.html