• Building Flink monitoring with Grafana + Prometheus


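    First, an architecture diagram:

    Flink App: sends its metrics out through a metrics reporter

    Pushgateway: an important tool in the Prometheus ecosystem; it accepts metrics pushed by jobs and exposes them for Prometheus to scrape

    Prometheus: an open-source system monitoring and alerting framework (see "Prometheus: Getting Started and Practice")

    Grafana: a cross-platform, open-source metrics analysis and visualization tool; it queries the collected data, displays it visually, and can send timely notifications (see "Visualization Tool Grafana: Introduction and Installation")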

    Node_exporter: like Pushgateway, a Prometheus component; it collects host metrics such as CPU, memory, and disk usage
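
    To make the data flow above concrete, here is a minimal sketch of what a push to Pushgateway looks like on the wire: one metric rendered in the Prometheus text exposition format, plus the `/metrics/job/<job>` URL it is sent to. The helper names and the sample metric `demo_queue_size` are hypothetical. Flink's reporter does this internally through the Prometheus client library, not with this code, and label-value escaping is left out for brevity.

```python
from urllib.parse import quote


def pushgateway_url(host: str, port: int, job: str) -> str:
    """URL for pushing a metric group for one job to Pushgateway."""
    return f"http://{host}:{port}/metrics/job/{quote(job, safe='')}"


def render_gauge(name: str, labels: dict, value: float) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    # Labels are sorted only to make the output deterministic.
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"# TYPE {name} gauge\n{name}{{{label_text}}} {value}\n"


if __name__ == "__main__":
    # "venn", 9091 and "myJob" mirror the values used later in this post.
    print(pushgateway_url("venn", 9091, "myJob"))
    print(render_gauge("demo_queue_size", {"host": "venn"}, 3), end="")
```

    Prometheus then scrapes Pushgateway like any other target, which is why the application never needs to be reachable from Prometheus directly.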

    The installation below mostly follows this blog post: https://www.cnblogs.com/xiao987334176/p/9930517.html#autoid-0-0-0

    1. Pull the Docker images

    docker pull prom/node-exporter
    docker pull prom/pushgateway
    docker pull prom/prometheus
    docker pull grafana/grafana

    List the downloaded images

    $ docker images
    REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
    prom/prometheus      latest              d5b9d7ed160a        2 weeks ago         138MB
    grafana/grafana      latest              a6e14b4109af        2 weeks ago         253MB
    prom/pushgateway     latest              20e6dcae675f        4 weeks ago         19.2MB
    prom/node-exporter   latest              e5a616e4b9cf        2 months ago        22.9MB

    2. Edit prometheus.yml and create the Grafana data directory

    $ mkdir /opt/grafana-storage  # Grafana data directory

    $ cat /opt/prometheus/prometheus.yml  # Prometheus configuration
    global:
      scrape_interval:     60s
      evaluation_interval: 60s
     
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ['localhost:9090']
            labels:
              instance: prometheus
     
      - job_name: linux
        static_configs:
          - targets: ['venn:9100']
            labels:
              instance: localhost
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['venn:9091']
            labels:
              instance: 'pushgateway'

    3. Start the components

    docker run -d -p 3000:3000   --name=grafana   -v /opt/grafana-storage:/var/lib/grafana   grafana/grafana
    docker run -d  -v "/proc:/host/proc:ro"  -v "/sys:/host/sys:ro"  -v "/:/rootfs:ro"  --net="host"  prom/node-exporter
    docker run -d -p 9090:9090  -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml  prom/prometheus
    docker run -d -p 9091:9091 prom/pushgateway

    Check the running containers

    $ docker ps
    CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                    NAMES
    4a689cf48e10        prom/pushgateway     "/bin/pushgateway"       5 days ago          Up 5 days           0.0.0.0:9091->9091/tcp   infallible_goldstine
    fcc40433bf75        grafana/grafana      "/run.sh"                5 days ago          Up 5 days           0.0.0.0:3000->3000/tcp   grafana
    8ba942d0cf35        prom/prometheus      "/bin/prometheus --c…"   5 days ago          Up 5 days           0.0.0.0:9090->9090/tcp   quizzical_colden
    b84b0f4be2b2        prom/node-exporter   "/bin/node_exporter"     5 days ago          Up 5 days                                    fervent_poitras

    Check the ports

    $ netstat -apn | grep -E '9091|3000|9090|9100'
    (Not all processes could be identified, non-owned process info
     will not be shown, you would have to be root to see it all.)
    tcp        0      0 172.17.0.1:39028        172.17.0.4:9091         ESTABLISHED -                   
    tcp6       0      0 :::9100                 :::*                    LISTEN      -                   
    tcp6       0      0 :::3000                 :::*                    LISTEN      -                   
    tcp6       0      0 :::9090                 :::*                    LISTEN      -                   
    tcp6       0      0 :::9091                 :::*                    LISTEN      -                   
    tcp6       0      0 192.168.229.129:45864   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45856   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45824   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45874   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45854   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45836   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45814   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.128:9100    192.168.229.1:13405     ESTABLISHED -                   
    tcp6       0      0 192.168.229.129:45826   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45844   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.128:9091    172.17.0.2:53930        ESTABLISHED -                   
    tcp6       0      0 192.168.229.129:45846   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.128:9100    172.17.0.2:54776        ESTABLISHED -                   
    tcp6       0      0 192.168.229.129:45816   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45876   192.168.229.128:9091    ESTABLISHED 40846/java          
    tcp6       0      0 192.168.229.129:45834   192.168.229.128:9091    TIME_WAIT   -                   
    tcp6       0      0 192.168.229.129:45866   192.168.229.128:9091    TIME_WAIT   -   
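
    The same check can be scripted instead of reading netstat output by eye. The following is a small sketch that tries a TCP connection to each published port; the host `localhost` and the helper name `port_open` are assumptions for illustration.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the docker run commands above; the host is assumed.
    components = [("grafana", 3000), ("prometheus", 9090),
                  ("pushgateway", 9091), ("node-exporter", 9100)]
    for name, port in components:
        state = "open" if port_open("localhost", port) else "closed"
        print(f"{name:14s} {port} {state}")
```

    An open port only shows that something is listening; the component pages in the next step confirm that the right service is answering.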

    4. Check the component pages

    node_exporter: ip:9100/metrics

    Prometheus: ip:9090/targets

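    If a target's State is not UP yet, wait a moment; with scrape_interval set to 60s, it can take about a minute for the first scrape to mark it UP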
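
    Target health can also be read from Prometheus' HTTP API rather than from the /targets page. Below is a hedged sketch against the `/api/v1/targets` endpoint; the function names are made up for this example, and the base URL passed in is whatever address your Prometheus container is reachable at.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload: dict) -> list:
    """Return (job, scrape URL, last error) for each active target that is not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", ""),
         t.get("scrapeUrl", ""),
         t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url: str) -> dict:
    """Fetch the target list from the /api/v1/targets endpoint."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

    For example, `unhealthy_targets(fetch_targets("http://localhost:9090"))` should return an empty list once the prometheus, linux, and pushgateway jobs have all been scraped successfully.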

    Grafana: ip:3000

    Default username/password: admin/admin

    The details are not repeated here; for configuring the data source and creating a system load dashboard, see this blog post: https://www.cnblogs.com/xiao987334176/p/9930517.html#autoid-0-0-0
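
    For readers who prefer to script the data source step instead of clicking through the UI, here is a rough sketch that builds a request for Grafana's `/api/datasources` HTTP endpoint. The function name, the data source name "Prometheus", and the two URLs in the usage note are assumptions; the credentials default to the admin/admin login mentioned above. The sketch only builds the request and does not send it.

```python
import base64
import json
from urllib.request import Request


def datasource_request(grafana_url: str, prometheus_url: str,
                       user: str = "admin", password: str = "admin") -> Request:
    """Build (but do not send) a POST that registers a Prometheus data source."""
    body = {
        "name": "Prometheus",      # display name, chosen for this example
        "type": "prometheus",
        "url": prometheus_url,     # must be reachable from the Grafana container
        "access": "proxy",         # Grafana's backend issues the queries
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

    Passing the result to `urllib.request.urlopen`, for instance `urlopen(datasource_request("http://localhost:3000", "http://venn:9090"))`, would send it. Note that `localhost` inside the Grafana container is the container itself, so the Prometheus URL should use a host name such as `venn` that resolves from there.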

    5. Configure the Flink metrics reporter:

    Add the following to the Flink configuration file flink-conf.yaml:

    ##metrics
    metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
    metrics.reporter.promgateway.host: venn
    metrics.reporter.promgateway.port: 9091
    metrics.reporter.promgateway.jobName: myJob
    metrics.reporter.promgateway.randomJobNameSuffix: true
    metrics.reporter.promgateway.deleteOnShutdown: false
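
    Every key above follows the pattern `metrics.reporter.<name>.<option>`, where `promgateway` is just the name chosen for this reporter. The sketch below pulls the options for one reporter out of the configuration text, which is a quick way to spot a misspelled key; `reporter_options` and `SAMPLE` are hypothetical names, and the parsing is deliberately simple (one `key: value` pair per line) rather than a full YAML parser.

```python
def reporter_options(conf_text: str, reporter: str) -> dict:
    """Collect the metrics.reporter.<reporter>.* options from flink-conf text."""
    prefix = f"metrics.reporter.{reporter}."
    options = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "##metrics"
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith(prefix):
            options[key[len(prefix):]] = value.strip()
    return options


SAMPLE = """\
##metrics
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: venn
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: myJob
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
"""

if __name__ == "__main__":
    opts = reporter_options(SAMPLE, "promgateway")
    for name, value in sorted(opts.items()):
        print(f"{name} = {value}")
```

    The host and port here must match the Pushgateway container started earlier (venn:9091), otherwise nothing will arrive for Prometheus to scrape.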

    Start a job (the late-data processing example from the previous post):

    flink run -m yarn-cluster -ynm LateDataProcess -yn 1 -c com.venn.stream.api.sideoutput.lateDataProcess.LateDataProcess jar/flinkDemo-1.0.jar

    Check the job's web UI:

    PS: the job has already been running for a while

    6. Configure Flink monitoring in Grafana

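    Because the Flink reporter, Pushgateway, and Prometheus have already been configured above, and the Prometheus data source has already been added in Grafana, Grafana picks up the Flink job's metrics automatically.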
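
    Before building a dashboard, it can help to confirm that Flink metrics have actually reached Prometheus. The sketch below lists metric names through the `/api/v1/label/__name__/values` endpoint and keeps those starting with `flink_`, the prefix the Prometheus reporters put on Flink metric names. The function names are assumptions made for this example.

```python
import json
from urllib.request import urlopen


def flink_metric_names(payload: dict) -> list:
    """Keep only the metric names that carry the flink_ prefix."""
    return sorted(n for n in payload.get("data", []) if n.startswith("flink_"))


def fetch_metric_names(base_url: str) -> dict:
    """Ask Prometheus for every metric name it currently knows."""
    with urlopen(f"{base_url}/api/v1/label/__name__/values", timeout=5) as resp:
        return json.load(resp)
```

    If `flink_metric_names(fetch_metric_names("http://localhost:9090"))` comes back empty, the likely places to look are the reporter settings in the Flink configuration and the pushgateway job in prometheus.yml.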

    On the Grafana home page, click New dashboard to create a new dashboard

    Once a metric is selected, the corresponding graph appears

    At this point, the Flink metrics are displayed in Grafana

    Flink metric names are rather long. The displayed text can be configured in the Legend field: in {{key}}, replace key with the label you want to show, for example {{job_name}}, {{operator_name}}
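
    The legend template works by substituting each `{{label}}` placeholder with that label's value for the series being drawn. The following sketch imitates that substitution to show the effect; it is an illustration only, not Grafana's implementation, and the label values are made up.

```python
import re


def render_legend(template: str, labels: dict) -> str:
    """Replace each {{label}} placeholder with that label's value."""
    def substitute(match):
        # In this sketch, labels missing from the series render as "".
        return str(labels.get(match.group(1), ""))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


if __name__ == "__main__":
    # Hypothetical label set for one series of a Flink operator metric.
    series_labels = {"job_name": "LateDataProcess", "operator_name": "Map"}
    print(render_legend("{{job_name}} / {{operator_name}}", series_labels))
```

    With a template like this, each line in the graph is labelled by job and operator instead of the full metric name with every label attached.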

    The result looks like this:

    Save, and you're done.

  • Original article: https://www.cnblogs.com/Springmoon-venn/p/11445023.html