• prometheus集群方案-thanos


     场景:

    随着监控数据的增长,单个prometheus采集数据性能无法满足,即使100G+内存,也会出现OOM现象。

    解决思路:

    1.减少prometheus驻留内存的数据量,将数据持久化到tsdb或对象存储;

    2.根据业务切割成多个prometheus,分模块存储数据。若需要进行多个promenade之间的汇聚,利用thanos的query实现。

    搭建thanos前提假设:

    1.已经安装docker和docker compose(本例子通过docker-compose进行安装部署)

    2.通过2个prometheus验证thanos的可用性

    安装步骤:

    1.

    #定义2个prometheus的存储路径

    mkdir -p /home/dockerdata/prometheus   
    
    mkdir -p /home/dockerdata/prometheus2

    #定义minio(用于对象存储)和docker-compose的路径

    mkdir -p /home/dockerfile/thanos
    mkdir -p /home/minio/data

    2.minio配置文件(位于/home/dockerfile/thanos/bucket_config.yaml)

    type: S3
    config:
      bucket: "thanos"
      endpoint: 'minio:9000'
      access_key: "danny"
      insecure: true  #是否使用安全协议http或https
      signature_version2: false
      encrypt_sse: false
      secret_key: "xxxxxxxx" #设置s3密码,保证8位以上的长度
      put_user_metadata: {}
      http_config:
        idle_conn_timeout: 90s
        response_header_timeout: 2m
        insecure_skip_verify: false
      trace:
        enable: false
      part_size: 134217728

    3.prometheus配置文件(位于/home/dockerfile/thanos/prometheus.yml和/home/dockerfile/thanos/prometheus2.yml)。有2个prometheus配置文件主要是区分端口和extends label。必须定义extends label,用于区分相同metrics的数据源

    prometheus:

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
      external_labels:
        monitor: 'danny-ecs'
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
        #- targets: ['localhost:9090']
        - targets: ['node-exporter:9100']

    prometheus2.yml

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
      external_labels:
        monitor: 'prometheus2'
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
        #- targets: ['localhost:9091']
        - targets: ['node-exporter:9100']

    4.配置Dockerfile(docker-compose用到的,不确定是否必须。位于:/home/dockerfile/thanos)

    FROM quay.io/prometheus/busybox:latest
    LABEL maintainer="danny"
    
    COPY /thanos_tmp_for_docker /bin/thanos
    
    ENTRYPOINT [ "/bin/thanos" ]

    5.docker-compose配置文件定义(基本所有内容都在这里了)

    version: '2'
    services:
      prometheus1:
        container_name: prometheus1
        image: prom/prometheus
        ports: 
          - 9090:9090
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus:/prometheus
        - /home/dockerfile/thanos/prometheus.yml:/etc/prometheus/prometheus.yml  
        command:
          - --web.enable-lifecycle
          - --web.enable-admin-api
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --web.console.libraries=/usr/share/prometheus/console_libraries
          - --web.console.templates=/usr/share/prometheus/consoles
          - --storage.tsdb.min-block-duration=30m # small just to not wait hours to test :)
          - --storage.tsdb.max-block-duration=30m # small just to not wait hours to test :)
        depends_on:
        - minio
    
      sidecar1:
        container_name: sidecar1
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus:/var/prometheus
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - sidecar
        - --tsdb.path=/var/prometheus
        - --prometheus.url=http://prometheus1:9090
        - --objstore.config-file=/bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --grpc-address=0.0.0.0:19090
        depends_on:
        - minio
        - prometheus1
    
      prometheus2:
        container_name: prometheus2
        image: prom/prometheus
        ports:
          - 9091:9090
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus2:/prometheus
        - /home/dockerfile/thanos/prometheus2.yml:/etc/prometheus/prometheus.yml
        command:
          - --web.enable-lifecycle
          - --web.enable-admin-api
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --web.console.libraries=/usr/share/prometheus/console_libraries
          - --web.console.templates=/usr/share/prometheus/consoles
          - --storage.tsdb.min-block-duration=30m 
          - --storage.tsdb.max-block-duration=30m 
        depends_on:
        - minio
    
      sidecar2:
        container_name: sidecar2
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus2:/var/prometheus
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - sidecar
        - --tsdb.path=/var/prometheus
        - --prometheus.url=http://prometheus2:9090
        - --objstore.config-file=/bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --grpc-address=0.0.0.0:19090
        depends_on:
        - minio
        - prometheus2
    
      grafana:
        container_name: grafana
        image: grafana/grafana
        ports: 
        - "3000:3000"
    
      # to search on old metrics
      storer:
        container_name: storer
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        volumes:
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - store
        - --data-dir=/var/thanos/store
        - --objstore.config-file=bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --grpc-address=0.0.0.0:19090
        depends_on:
        - minio  
    
      # downsample metrics on the bucket
      compactor:
        container_name: compactor
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        volumes:
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - compact
        - --data-dir=/var/thanos/compact
        - --objstore.config-file=bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --wait
        depends_on:
        - minio 
    
      # querier component which can be scaled
      querier:
        container_name: querier
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        labels:
        - "traefik.enable=true"
        - "traefik.port=19192"
        - "traefik.frontend.rule=PathPrefix:/"
        ports: 
        - "19192:19192"
        command:
        - query
        - --http-address=0.0.0.0:19192
        - --store=sidecar1:19090
        - --store=sidecar2:19090
        - --store=storer:19090
        - --query.replica-label=replica
    
      minio:
        image: minio/minio:latest
        container_name: minio
        ports:
          - 9000:9000
        volumes: 
          - "/home/minio/data:/data"
        environment:
          MINIO_ACCESS_KEY: "danny"
          MINIO_SECRET_KEY: "xxxxxxxx" #输入8位以上的密码
        command: server /data
        restart: always
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
    
      node-exporter:
        image: prom/node-exporter:latest
        container_name: node-exporter
        ports:
          - '9100:9100'

    6.启动。第一次up的时候,storer会启动失败,需要单独重启一次storer

    cd /home/dockerfile/thanos
    docker-compose up -d
    docker-compose up -d storer

    7.验证

    docker ps -a

    证明所有容器已经启动成功

    8.promenade 的block数据成功上传到minio验证

    访问地址:http://ip:9000/minio/login

    账号密码在bucket_config.yaml中定义的

    创建bucket:thanos

    如果正常运行,block数据会成功上传到thanos这个bucket(即sidecar组建安装成功):

    9.验证query组建和store组建是否安装成功。

    访问query页面(跟promenade的页面基本一致)

    http://ip:19192

    输入metrices:node_uname_info

    安装部署参考链接:https://www.cnblogs.com/rongfengliang/p/11319933.html

    对thanos架构理解参考链接:http://www.dockone.io/article/10035

  • 相关阅读:
    vim初试(Hello World)
    CSP201809-2 买菜(超简单的方法!!)
    CSP202006-2 稀疏向量
    CSP202012-2 期末预测之最佳阈值
    浮点数表示
    结构体
    全排列-康托展开及逆展开
    CA-031 上手Games101环境 Games101环境怎么配置
    计算机图形学 实验四 AET算法
    计算机图形学 实验三 梁氏裁剪算法
  • 原文地址:https://www.cnblogs.com/danny-djy/p/13230529.html
Copyright © 2020-2023  润新知