• Container monitoring: cAdvisor


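    To inspect the runtime status of Docker containers, you can use Docker's stats command. It reports statistics for the containers running on the current host, including CPU utilization, memory usage, total network I/O, and total disk I/O.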
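
    As a rough illustration of the stats command described above, the sketch below wraps `docker stats` in a small shell function that prints a one-off snapshot instead of the default live view. The function name `container_stats` is made up for this example; `--no-stream` and the `--format` placeholders are standard Docker CLI options.

```shell
# Print a single snapshot of per-container resource usage.
# container_stats is a hypothetical helper name, not part of Docker.
container_stats() {
  docker stats --no-stream \
    --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}' \
    "$@"
}

# Usage: all running containers, or only the named ones.
#   container_stats
#   container_stats cadvisor
```

    The columns chosen here correspond to the CPU, memory, network I/O, and disk I/O figures mentioned in the text.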

    Besides the command line, you can also retrieve detailed container monitoring statistics through the HTTP API that Docker provides.
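
    A minimal sketch of the HTTP route, assuming the Docker daemon listens on its default Unix socket and `curl` is installed: the function below requests a single stats sample for one container from the Docker Engine API. The helper name `container_stats_json` and the `DOCKER_SOCK` variable are illustrative only.

```shell
# Fetch one JSON stats sample for a container from the Docker Engine API.
# container_stats_json is a hypothetical helper; the socket path is the Docker default.
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"

container_stats_json() {
  # $1: container name or ID
  curl --silent --fail --unix-socket "$DOCKER_SOCK" \
    "http://localhost/containers/$1/stats?stream=false"
}

# Usage:
#   container_stats_json cadvisor
```

    With `stream=false` the API returns one sample and closes the connection; without it the response is a continuous stream.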

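    cAdvisor is an open-source tool from Google for displaying and analyzing the runtime state of containers. Running cAdvisor on a host makes it easy to collect runtime statistics for the containers on that host and present them as charts.
    Running cAdvisor locally is also simple; just run the following command: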

    docker run \
      --volume=/:/rootfs:ro \
      --volume=/var/run:/var/run:rw \
      --volume=/sys:/sys:ro \
      --volume=/var/lib/docker/:/var/lib/docker:ro \
      --publish=8080:8080 \
      --detach=true \
      --name=cadvisor \
      google/cadvisor:latest
    

    However, port 8080 on the host was already in use, so the command above was changed to the following:

    docker run \
      --volume=/:/rootfs:ro \
      --volume=/var/run:/var/run:rw \
      --volume=/sys:/sys:ro \
      --volume=/var/lib/docker/:/var/lib/docker:ro \
      --publish=9095:9095 \
      --detach=true \
      --name=cadvisor \
      google/cadvisor:latest
    

    After startup, however, inspecting the container shows two ports: 8080 and 9095.
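
    One likely reading of the two ports: according to the help output, cAdvisor listens on port 8080 by default, so `--publish=9095:9095` forwards host port 9095 to a container port that nothing is listening on, while 8080 is still listed, presumably because the image declares it. A minimal sketch under that assumption maps a free host port to container port 8080 instead. This variant was not tried in the original walkthrough, and `run_cadvisor` is a hypothetical helper name.

```shell
# Start cAdvisor with a chosen host port mapped to the container's default port 8080.
# run_cadvisor is a hypothetical helper; the other flags are copied from the command above.
run_cadvisor() {
  host_port="${1:-8080}"
  docker run \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish="${host_port}:8080" \
    --detach=true \
    --name=cadvisor \
    google/cadvisor:latest
}

# Usage: serve the UI on host port 9095.
#   run_cadvisor 9095
```

    If a container named cadvisor is left over from an earlier attempt, it has to be removed first (for example with `docker rm -f cadvisor`), because the name must be unique.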

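    Entering the Docker container with the steps below and listing the command's options shows a -port parameter, which the official documentation also describes explicitly: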

    In practice, however, this parameter could not be used.

    The Docker deployment was therefore abandoned in favor of running the binary directly.

    Enter the container and view the command options

    # docker exec -it cadvisor /bin/sh
    / # cd /usr/bin/
    /usr/bin # ./cadvisor --help
    Usage of ./cadvisor:
      -allow_dynamic_housekeeping
            Whether to allow the housekeeping interval to be dynamic (default true)
      -alsologtostderr
            log to standard error as well as files
      -application_metrics_count_limit int
            Max number of application metrics to store (per container) (default 100)
      -boot_id_file string
            Comma-separated list of files to check for boot-id. Use the first one that exists. (default "/proc/sys/kernel/random/boot_id")
      -bq_account string
            Service account email
      -bq_credentials_file string
            Credential Key file (pem)
      -bq_id string
            Client ID
      -bq_project_id string
            Bigquery project ID
      -bq_secret string
            Client Secret (default "notasecret")
      -collector_cert string
            Collector's certificate, exposed to endpoints for certificate based authentication.
      -collector_key string
            Key for the collector's certificate
      -container_hints string
            location of the container hints file (default "/etc/cadvisor/container_hints.json")
      -containerd string
            containerd endpoint (default "unix:///var/run/containerd.sock")
      -disable_metrics metrics
            comma-separated list of metrics to be disabled. Options are 'disk', 'network', 'tcp', 'udp', 'percpu', 'sched', 'process'. Note: tcp and udp are disabled by default due to high CPU usage. (default process,tcp,udp,sched)
      -docker string
            docker endpoint (default "unix:///var/run/docker.sock")
      -docker-tls
            use TLS to connect to docker
      -docker-tls-ca string
            path to trusted CA (default "ca.pem")
      -docker-tls-cert string
            path to client certificate (default "cert.pem")
      -docker-tls-key string
            path to private key (default "key.pem")
      -docker_env_metadata_whitelist string
            a comma-separated list of environment variable keys that needs to be collected for docker containers
      -docker_only
            Only report docker containers in addition to root stats
      -docker_root string
            DEPRECATED: docker root is read from docker info (this is a fallback, default: /var/lib/docker) (default "/var/lib/docker")
      -enable_load_reader
            Whether to enable cpu load reader
      -event_storage_age_limit string
            Max length of time for which to store events (per type). Value is a comma separated list of key values, where the keys are event types (e.g.: creation, oom) or "default" and the value is a duration. Default is applied to all non-specified event types (default "default=24h")
      -event_storage_event_limit string
            Max number of events to store (per type). Value is a comma separated list of key values, where the keys are event types (e.g.: creation, oom) or "default" and the value is an integer. Default is applied to all non-specified event types (default "default=100000")
      -global_housekeeping_interval duration
            Interval between global housekeepings (default 1m0s)
      -housekeeping_interval duration
            Interval between container housekeepings (default 1s)
      -http_auth_file string
            HTTP auth file for the web UI
      -http_auth_realm string
            HTTP auth realm for the web UI (default "localhost")
      -http_digest_file string
            HTTP digest file for the web UI
      -http_digest_realm string
            HTTP digest file for the web UI (default "localhost")
      -listen_ip string
            IP to listen on, defaults to all IPs
      -log_backtrace_at value
            when logging hits line file:N, emit a stack trace
      -log_cadvisor_usage
            Whether to log the usage of the cAdvisor container
      -log_dir string
            If non-empty, write log files in this directory
      -log_file string
            If non-empty, use this log file
      -logtostderr
            log to standard error instead of files
      -machine_id_file string
            Comma-separated list of files to check for machine-id. Use the first one that exists. (default "/etc/machine-id,/var/lib/dbus/machine-id")
      -max_housekeeping_interval duration
            Largest interval to allow between container housekeepings (default 1m0s)
      -max_procs int
            max number of CPUs that can be used simultaneously. Less than 1 for default (number of cores).
      -mesos_agent string
            Mesos agent address (default "127.0.0.1:5051")
      -mesos_agent_timeout duration
            Mesos agent timeout (default 10s)
      -port int
            port to listen (default 8080)
      -profiling
            Enable profiling via web interface host:port/debug/pprof/
      -prometheus_endpoint string
            Endpoint to expose Prometheus metrics on (default "/metrics")
      -skip_headers
            If true, avoid header prefixes in the log messages
      -stderrthreshold value
            logs at or above this threshold go to stderr (default 2)
      -storage_driver driver
            Storage driver to use. Data is always cached shortly in memory, this controls where data is pushed besides the local cache. Empty means none. Options are: <empty>, bigquery, elasticsearch, influxdb, kafka, redis, statsd, stdout
      -storage_driver_buffer_duration duration
            Writes in the storage driver will be buffered for this duration, and committed to the non memory backends as a single transaction (default 1m0s)
      -storage_driver_db string
            database name (default "cadvisor")
      -storage_driver_es_enable_sniffer
            ElasticSearch uses a sniffing process to find all nodes of your cluster by default, automatically
      -storage_driver_es_host string
            ElasticSearch host:port (default "http://localhost:9200")
      -storage_driver_es_index string
            ElasticSearch index name (default "cadvisor")
      -storage_driver_es_type string
            ElasticSearch type name (default "stats")
      -storage_driver_host string
            database host:port (default "localhost:8086")
      -storage_driver_influxdb_retention_policy string
            retention policy
      -storage_driver_kafka_broker_list string
            kafka broker(s) csv (default "localhost:9092")
      -storage_driver_kafka_ssl_ca string
            optional certificate authority file for TLS client authentication
      -storage_driver_kafka_ssl_cert string
            optional certificate file for TLS client authentication
      -storage_driver_kafka_ssl_key string
            optional key file for TLS client authentication
      -storage_driver_kafka_ssl_verify
            verify ssl certificate chain (default true)
      -storage_driver_kafka_topic string
            kafka topic (default "stats")
      -storage_driver_password string
            database password (default "root")
      -storage_driver_secure
            use secure connection with database
      -storage_driver_table string
            table name (default "stats")
      -storage_driver_user string
            database username (default "root")
      -storage_duration duration
            How long to keep data stored (Default: 2min). (default 2m0s)
      -store_container_labels
            convert container labels and environment variables into labels on prometheus metrics for each container. If flag set to false, then only metrics exported are container name, first alias, and image name (default true)
      -v value
            log level for V logs
      -version
            print cAdvisor version and exit
      -vmodule value
            comma-separated list of pattern=N settings for file-filtered logging
    

    Deploying with the binary

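    mkdir -p /home/cadvisor-0.37.0 && cd /home/cadvisor-0.37.0
    wget https://github.com/google/cadvisor/releases/download/v0.37.0/cadvisor
    chmod +x cadvisor   # a file fetched with wget is not executable by default
    # Plain local run: ./cadvisor -port=8080 &>>/var/log/cadvisor.log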
    

    Managing the program as a systemd service

    # chown -R prometheus:prometheus /home/cadvisor-0.37.0
    # chmod -R 777 /home/cadvisor-0.37.0   # works around this startup error (attributed here to SELinux): Failed at step EXEC spawning /home/cadvisor-0.37.0/cadvisor: Permission denied
    
    # vim /usr/lib/systemd/system/cadvisor.service
    [Unit]
    Description=cadvisor
    Documentation=https://github.com/google/cadvisor/tree/master/docs
    After=network.target
    
    [Service]
    Type=simple
    User=prometheus
    ExecStart=/home/cadvisor-0.37.0/cadvisor -port 9096
    Restart=on-failure
    
    [Install]
    WantedBy=multi-user.target
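
    The unit file only takes effect once systemd has reloaded it. The following sketch shows the usual sequence, assuming the unit was saved as cadvisor.service as above and the commands run as root; `enable_cadvisor_service` is a hypothetical helper name.

```shell
# Register, start, and verify the cadvisor unit defined above.
# enable_cadvisor_service is a hypothetical helper; run it as root.
enable_cadvisor_service() {
  systemctl daemon-reload || return 1    # pick up the new unit file
  systemctl enable cadvisor || return 1  # start automatically on boot
  systemctl start cadvisor || return 1   # start it now
  systemctl --no-pager status cadvisor   # show whether it is running
}

# Usage:
#   enable_cadvisor_service
```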
    

    Visit http://localhost:9096 to view the runtime state of the containers on the current host.
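
    Without a browser, the same instance can be checked from the shell. The sketch below counts the `container_` samples on the `/metrics` path, which the help output lists as the default Prometheus endpoint. The helper name `count_container_metrics` is made up, and the default URL assumes the port 9096 from the unit file.

```shell
# Count the container_* samples exposed on cAdvisor's Prometheus endpoint.
# count_container_metrics is a hypothetical helper; 9096 matches the unit file above.
count_container_metrics() {
  base_url="${1:-http://localhost:9096}"
  curl --silent --fail "${base_url}/metrics" | grep -c '^container_'
}

# Usage:
#   count_container_metrics
#   count_container_metrics http://localhost:8080
```

    A count of 0 suggests that cAdvisor is not reachable on that port or is not seeing any containers.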

    The table below lists some typical monitoring metrics exposed by cAdvisor:

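    Metric name                              Type      Meaning
    container_cpu_load_average_10s           gauge     Average CPU load of the container over the past 10 seconds
    container_cpu_usage_seconds_total        counter   Cumulative CPU time consumed by the container on each CPU core (seconds)
    container_cpu_system_seconds_total       counter   Cumulative system CPU time (seconds)
    container_cpu_user_seconds_total         counter   Cumulative user CPU time (seconds)
    container_fs_usage_bytes                 gauge     Filesystem usage of the container (bytes)
    container_fs_limit_bytes                 gauge     Total filesystem capacity available to the container (bytes)
    container_fs_reads_bytes_total           counter   Cumulative amount of data read by the container (bytes)
    container_fs_writes_bytes_total          counter   Cumulative amount of data written by the container (bytes)
    container_memory_max_usage_bytes         gauge     Maximum memory usage of the container (bytes)
    container_memory_usage_bytes             gauge     Current memory usage of the container (bytes)
    container_spec_memory_limit_bytes        gauge     Memory usage limit of the container (bytes)
    machine_memory_bytes                     gauge     Total memory of the current host (bytes)
    container_network_receive_bytes_total    counter   Cumulative amount of data received over the container's network (bytes)
    container_network_transmit_bytes_total   counter   Cumulative amount of data transmitted over the container's network (bytes)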
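
    The counter metrics in the table only ever increase, so on their own they say little about current load. What usually matters is the increase between two samples divided by the elapsed time, which is roughly what Prometheus computes with rate(). The worked example below does that arithmetic for two made-up samples; `counter_rate` is a hypothetical helper and the sample values are invented for illustration.

```shell
# Average per-second increase of a counter between two samples.
# counter_rate is a hypothetical helper.
# Arguments: previous value, current value, seconds elapsed between the samples.
counter_rate() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" \
    'BEGIN { printf "%.4f\n", (curr - prev) / secs }'
}

# Made-up samples of container_cpu_usage_seconds_total taken 15 seconds apart:
#   counter_rate 120.0 123.0 15
```

    For these samples the result is 0.2000: the container used 3 CPU-seconds in 15 seconds, or about a fifth of one core on average.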

    Integrating with Prometheus

    Edit /etc/prometheus/prometheus.yml and add cAdvisor as a scrape job under scrape_configs:

    - job_name: cadvisor
      static_configs:
      - targets:
        - localhost:9096
    

    Restart the Prometheus service and check the result.
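
    A possible way to carry out this step from the shell is sketched below. It assumes that Prometheus runs as a systemd unit named prometheus, that its HTTP API is reachable on the default http://localhost:9090, and that `promtool` is installed; none of this is stated in the text above. The helper names `restart_prometheus` and `cadvisor_up` are made up.

```shell
# Base URL of the Prometheus server (assumed default port).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Validate the config first and restart only if validation passes.
# Assumes a systemd unit named "prometheus".
restart_prometheus() {
  promtool check config /etc/prometheus/prometheus.yml || return 1
  systemctl restart prometheus
}

# Query the built-in "up" metric for the cadvisor job via the HTTP API.
# A sample value of "1" in the JSON response means the last scrape succeeded.
cadvisor_up() {
  curl --silent --fail --get "${PROM_URL}/api/v1/query" \
    --data-urlencode 'query=up{job="cadvisor"}'
}

# Usage:
#   restart_prometheus && sleep 15 && cadvisor_up
```

    Checking the config before restarting avoids taking Prometheus down with a YAML indentation mistake in the new job.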

  • Original article: https://www.cnblogs.com/sanduzxcvbnm/p/13597205.html