prometheus集群方案-thanos

场景：

随着监控数据的增长，单个prometheus采集数据性能无法满足，即使100G+内存，也会出现OOM现象。

解决思路：

1.减少prometheus驻留内存的数据量，将数据持久化到tsdb或对象存储；

2.根据业务切割成多个prometheus，分模块存储数据。若需要进行多个promenade之间的汇聚，利用thanos的query实现。

搭建thanos前提假设：

1.已经安装docker和docker compose（本例子通过docker-compose进行安装部署）

2.通过2个prometheus验证thanos的可用性

安装步骤：

#定义2个prometheus的存储路径

mkdir -p /home/dockerdata/prometheus   

mkdir -p /home/dockerdata/prometheus2

#定义minio（用于对象存储）和docker-compose的路径

mkdir -p /home/dockerfile/thanos
mkdir -p /home/minio/data

2.minio配置文件（位于/home/dockerfile/thanos/bucket_config.yaml）

type: S3
config:
  bucket: "thanos"
  endpoint: 'minio:9000'
  access_key: "danny"
  insecure: true  #是否使用安全协议http或https
  signature_version2: false
  encrypt_sse: false
  secret_key: "xxxxxxxx" #设置s3密码，保证8位以上的长度
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: false
  trace:
    enable: false
  part_size: 134217728

3.prometheus配置文件（位于/home/dockerfile/thanos/prometheus.yml和/home/dockerfile/thanos/prometheus2.yml）。有2个prometheus配置文件主要是区分端口和extends label。必须定义extends label，用于区分相同metrics的数据源

prometheus：

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    monitor: 'danny-ecs'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    #- targets: ['localhost:9090']
    - targets: ['node-exporter:9100']

prometheus2.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    monitor: 'prometheus2'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    #- targets: ['localhost:9091']
    - targets: ['node-exporter:9100']

4.配置Dockerfile（docker-compose用到的，不确定是否必须。位于：/home/dockerfile/thanos）

FROM quay.io/prometheus/busybox:latest
LABEL maintainer="danny"

COPY /thanos_tmp_for_docker /bin/thanos

ENTRYPOINT [ "/bin/thanos" ]

5.docker-compose配置文件定义（基本所有内容都在这里了）

version: '2'
services:
  prometheus1:
    container_name: prometheus1
    image: prom/prometheus
    ports: 
      - 9090:9090
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus:/prometheus
    - /home/dockerfile/thanos/prometheus.yml:/etc/prometheus/prometheus.yml  
    command:
      - --web.enable-lifecycle
      - --web.enable-admin-api
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
      - --storage.tsdb.min-block-duration=30m # small just to not wait hours to test :)
      - --storage.tsdb.max-block-duration=30m # small just to not wait hours to test :)
    depends_on:
    - minio

  sidecar1:
    container_name: sidecar1
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus:/var/prometheus
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - sidecar
    - --tsdb.path=/var/prometheus
    - --prometheus.url=http://prometheus1:9090
    - --objstore.config-file=/bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --grpc-address=0.0.0.0:19090
    depends_on:
    - minio
    - prometheus1

  prometheus2:
    container_name: prometheus2
    image: prom/prometheus
    ports:
      - 9091:9090
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus2:/prometheus
    - /home/dockerfile/thanos/prometheus2.yml:/etc/prometheus/prometheus.yml
    command:
      - --web.enable-lifecycle
      - --web.enable-admin-api
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
      - --storage.tsdb.min-block-duration=30m 
      - --storage.tsdb.max-block-duration=30m 
    depends_on:
    - minio

  sidecar2:
    container_name: sidecar2
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus2:/var/prometheus
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - sidecar
    - --tsdb.path=/var/prometheus
    - --prometheus.url=http://prometheus2:9090
    - --objstore.config-file=/bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --grpc-address=0.0.0.0:19090
    depends_on:
    - minio
    - prometheus2

  grafana:
    container_name: grafana
    image: grafana/grafana
    ports: 
    - "3000:3000"

  # to search on old metrics
  storer:
    container_name: storer
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    volumes:
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - store
    - --data-dir=/var/thanos/store
    - --objstore.config-file=bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --grpc-address=0.0.0.0:19090
    depends_on:
    - minio  

  # downsample metrics on the bucket
  compactor:
    container_name: compactor
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    volumes:
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - compact
    - --data-dir=/var/thanos/compact
    - --objstore.config-file=bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --wait
    depends_on:
    - minio 

  # querier component which can be scaled
  querier:
    container_name: querier
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    labels:
    - "traefik.enable=true"
    - "traefik.port=19192"
    - "traefik.frontend.rule=PathPrefix:/"
    ports: 
    - "19192:19192"
    command:
    - query
    - --http-address=0.0.0.0:19192
    - --store=sidecar1:19090
    - --store=sidecar2:19090
    - --store=storer:19090
    - --query.replica-label=replica

  minio:
    image: minio/minio:latest
    container_name: minio
    ports:
      - 9000:9000
    volumes: 
      - "/home/minio/data:/data"
    environment:
      MINIO_ACCESS_KEY: "danny"
      MINIO_SECRET_KEY: "xxxxxxxx" #输入8位以上的密码
    command: server /data
    restart: always
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - '9100:9100'

6.启动。第一次up的时候，storer会启动失败，需要单独重启一次storer

cd /home/dockerfile/thanos
docker-compose up -d
docker-compose up -d storer

7.验证

docker ps -a

证明所有容器已经启动成功

8.promenade 的block数据成功上传到minio验证

访问地址：http://ip:9000/minio/login

账号密码在bucket_config.yaml中定义的

创建bucket：thanos

如果正常运行，block数据会成功上传到thanos这个bucket（即sidecar组建安装成功）：

9.验证query组建和store组建是否安装成功。

访问query页面（跟promenade的页面基本一致）

http://ip:19192

输入metrices：node_uname_info

安装部署参考链接：https://www.cnblogs.com/rongfengliang/p/11319933.html

对thanos架构理解参考链接：http://www.dockone.io/article/10035

相关阅读:
CodeIgniter自定义配置文件
 js中opener和parent的区别
 更改Apache默认起始(索引)页面：DirectoryIndex
基于知识管理的协同办公解决方案
 奥远新思创实用型办公自动化解决方案[1]
OA与公文交换平台的接口解决方案
 金思维OA解决方案
 致力协同电力行业OA办公自动化解决方案[1]
OA与公文交换平台的接口解决方案
 万户OA助力红豆集团信息化建设方案
原文地址：https://www.cnblogs.com/danny-djy/p/13230529.html