• Prometheus+Grafana+Alertmanager monitoring and alerting (part 6)


    redis-exporter

    Reference for cluster monitoring configuration: https://www.cnblogs.com/fsckzy/p/12053604.html

    Installation:

    wget https://github.com/oliver006/redis_exporter/releases/download/v0.15.0/redis_exporter-v0.15.0.linux-amd64.tar.gz
    tar -zxvf redis_exporter-v0.15.0.linux-amd64.tar.gz -C /usr/local/

    Start command:

    nohup /usr/local/redis_exporter -web.listen-address=:9121 -redis.addr 10.3.225.66:6379 -redis.password xxx > /dev/null 2>&1 &
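
    To confirm that the exporter is up and can reach Redis, its metrics page can be fetched and inspected. The following is a minimal Python sketch, not part of the original setup. The helper names (fetch_metrics, metric_values, exporter_sees_redis) are hypothetical, the /metrics path is assumed to be the exporter's default, and the parser assumes samples carry no trailing timestamp.

```python
from urllib.request import urlopen


def fetch_metrics(url, timeout=5):
    """Download the plain-text metrics page from an exporter."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_values(text, name):
    """Return the value of every sample of `name` in exposition text.

    Assumption: each sample line ends with the value (no timestamp).
    """
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        sample, _, value = line.rpartition(" ")
        # A sample is either "name" or "name{label=...}".
        if sample == name or sample.startswith(name + "{"):
            values.append(float(value))
    return values


def exporter_sees_redis(text):
    """True if at least one redis_up sample exists and all of them are 1."""
    values = metric_values(text, "redis_up")
    return bool(values) and all(v == 1.0 for v in values)
```

    For the exporter started above, exporter_sees_redis(fetch_metrics("http://10.3.225.66:9121/metrics")) should return True while Redis is reachable.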

    Configuring Redis auto-discovery on k8s, building on the node auto-discovery set up earlier

    [root@gw_qa_250_10 test]# ls
    host_discovery_data  main.py
    [root@gw_qa_250_10 test]# ls host_discovery_data/
    redis-server.list
    [root@gw_qa_250_10 test]# cat host_discovery_data/redis-server.list 
    10.3.225.66     redis03-066
    [root@gw_qa_250_10 test]# python main.py 
    ^CTraceback (most recent call last):
      File "main.py", line 67, in <module>
        run()
      File "main.py", line 64, in run
        time.sleep(10)
    KeyboardInterrupt
    [root@gw_qa_250_10 test]# cat host_discovery_data/redis-server.json 
    [{"labels": {"cluster": "redis-server", "alias": "redis03-066", "instance": "10.3.225.66"}, "targets": ["10.3.225.66:9121"]}][root@gw_qa_250_10 test]# 

    redis-server.list holds the list of Redis servers; main.py converts that list into JSON that Prometheus can read
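
    Before pointing Prometheus at a generated file, it can help to check that the JSON has the expected shape: an array of objects, each with a "targets" list and a "labels" map of strings. The sketch below is illustrative only. The function name check_target_groups is hypothetical, and the host:port check reflects this setup (exporter on port 9121) rather than a rule Prometheus itself enforces.

```python
import json


def check_target_groups(json_text):
    """Return a list of problems found in a target-group JSON document."""
    groups = json.loads(json_text)
    if not isinstance(groups, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, group in enumerate(groups):
        if not isinstance(group, dict):
            problems.append("entry %d is not an object" % i)
            continue
        targets = group.get("targets")
        if not isinstance(targets, list):
            problems.append("entry %d: 'targets' must be a list" % i)
        else:
            for target in targets:
                # Assumption for this setup: every target is host:port.
                host, sep, port = str(target).rpartition(":")
                if not (sep and host and port.isdigit()):
                    problems.append(
                        "entry %d: target %r is not host:port" % (i, target))
        labels = group.get("labels", {})
        if not isinstance(labels, dict) or not all(
                isinstance(k, str) and isinstance(v, str)
                for k, v in labels.items()):
            problems.append(
                "entry %d: 'labels' must map strings to strings" % i)
    return problems
```

    An empty result means no problems were found; the redis-server.json shown above passes this check.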

    main.py

    # -*- coding: utf-8 -*-
    import os
    import logging
    import json
    import time
    
    
    def log_level(level):
    
        if level == "DEBUG":
            logging.basicConfig(level=logging.DEBUG,
                                format='%(asctime)s - %(levelname)s %(message)s',
                                datefmt='%Y-%m-%d,%H:%M:%S',
                                )
            logging.info("log_level:%s", level)
            logging.info("Debug mode")
        else:
            logging.basicConfig(level=logging.INFO,
                                format='%(asctime)s - %(levelname)s %(message)s',
                                datefmt='%Y-%m-%d,%H:%M:%S',
                                )
    
    
    def read_list_file():
        dirnames = 'host_discovery_data'
        read_file_names = [name for name in os.listdir(dirnames)
                if name.endswith('.list')]
        for read_file_name in read_file_names:
    
            node_dict_list = list()
            # Read all lines first so the file handle is closed promptly.
            with open(dirnames + '/' + read_file_name, "r") as fo:
                fo_lines = fo.readlines()
            for fo_line in fo_lines:
                fo_line = fo_line.rstrip("\n")
                # for fo_line_element in fo_line.split():
                #     logging.debug("file_name: %s - fo_line_element: %s", read_file_name, fo_line_element)
                fields = fo_line.split()
                if len(fields) < 2:
                    # Skip blank or malformed lines instead of raising IndexError.
                    continue
                node_ip, node_alias = fields[0], fields[1]
                # logging.debug("node_ip: %s - node_alias: %s", node_ip, node_alias)
    
                node_dict = dict()
                node_labels_dict = dict()
                node_labels_dict['cluster'] = read_file_name.replace('.list', '')
                node_dict['targets'] = [node_ip + ':9121']
                node_labels_dict['instance'] = node_ip
                node_labels_dict['alias'] = node_alias
                node_dict['labels'] = node_labels_dict
                node_dict_list.append(node_dict)
                # logging.debug("node_dict_list: %s - type %s", node_dict_list, type(node_dict_list))
            node_json = json.dumps(node_dict_list)
            logging.debug("node_json: %s - type %s", node_json, type(node_json))
            # Write the JSON to a file
            node_write_filename = dirnames + '/' + read_file_name.replace('.list', '') + '.json'
            with open(node_write_filename, "w") as fw:
                fw.write(node_json)
    
    
    def run():
        # Set the log level
        level = "INFO"
        log_level(level)
        # Run continuously
        while True:
            read_list_file()
            time.sleep(10)
    
    if __name__ == '__main__':
        run()
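
    main.py rewrites each JSON file in place every 10 seconds, so a reader could in principle see a partially written file. One common way to avoid that is to write to a temporary file and rename it over the target. The sketch below is a possible variant, not the original author's code: write_json_atomically is a hypothetical helper, it assumes Python 3.3+ (for os.replace), and the 0644 permission is an assumption about how Prometheus is deployed.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write `data` as JSON to `path` via a temporary file and a rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file as 0600; loosen it so a Prometheus
        # process running as another user can read it (assumption).
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Remove the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

    In read_list_file(), the final write could then become write_json_atomically(node_write_filename, node_dict_list). The ".tmp" suffix keeps the temporary file from matching a "*.json" discovery pattern.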

    Alerting rules (redis.rules):

    groups:
    - name:  Redis.rules
      rules: 
        - alert: RedisDown
          expr: redis_up  == 0
          for: 5m
          labels:
            severity: error
          annotations:
            summary: "Redis down (instance {{ $labels.instance }})"
            description: "Redis is down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: ReplicationBroken
          expr: delta(redis_connected_slaves[1m]) < 0
          for: 5m
          labels:
            severity: error
          annotations:
            summary: "Replication broken (instance {{ $labels.instance }})"
            description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: TooManyConnections
          expr: redis_connected_clients > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Too many connections (instance {{ $labels.instance }})"
            description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: RejectedConnections
          expr: increase(redis_rejected_connections_total[1m]) > 0
          for: 5m
          labels:
            severity: error
          annotations:
            summary: "Rejected connections (instance {{ $labels.instance }})"
            description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
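
    The ReplicationBroken and RejectedConnections rules rely on delta() and increase() over a 1-minute window. The sketch below is a deliberately simplified model of what those functions compute, ignoring the extrapolation to window boundaries that Prometheus applies; the function names and sample data are made up for illustration. Samples are (timestamp, value) pairs from one window.

```python
def simple_delta(samples):
    """Last value minus first value of a gauge within one window."""
    if len(samples) < 2:
        return None  # fewer than two samples: no result
    return samples[-1][1] - samples[0][1]


def simple_increase(samples):
    """Total growth of a counter within one window, tolerating resets."""
    if len(samples) < 2:
        return None
    total = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A drop means the counter restarted from zero.
        total += value if value < prev else value - prev
        prev = value
    return total


# Replica count drops from 2 to 1 inside the window: delta is negative,
# so the ReplicationBroken expression would match for this window.
replicas = [(0, 2), (15, 2), (30, 1), (45, 1)]
print(simple_delta(replicas) < 0)
```

    Both functions need at least two samples in the window, so with a 1m range the scrape interval has to be well under one minute for these expressions to return anything.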

    Reference for importing the dashboard:

    https://www.cnblogs.com/zeq912/p/10614019.html

  • Original article: https://www.cnblogs.com/litzhiai/p/15438038.html