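I. Installation and configuration on physical nodes (basic setup; alerting and Grafana dashboards are not covered)
1. Download and install Prometheus from the official release
Download and install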
# pwd
/usr/local/src
# wget https://github.com/prometheus/prometheus/releases/download/v2.12.0/prometheus-2.12.0.linux-amd64.tar.gz
# tar xvf prometheus-2.12.0.linux-amd64.tar.gz
# ln -sv /usr/local/src/prometheus-2.12.0.linux-amd64 /usr/local/prometheus
# cd /usr/local/prometheus
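The version number appears in the download URL, the tarball name, and the extracted directory name, so the three can easily drift apart. A minimal sketch of one way to keep them in sync is shown here; the helper names (prom_tarball, prom_dir, prom_url) and the PROM_VERSION variable are hypothetical, and only the URL layout is taken from the release link above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive every file name from a single version string.
PROM_VERSION="${PROM_VERSION:-2.12.0}"

# Directory name inside the tarball, e.g. prometheus-2.12.0.linux-amd64
prom_dir() { printf 'prometheus-%s.linux-amd64\n' "$1"; }

# Tarball file name
prom_tarball() { printf '%s.tar.gz\n' "$(prom_dir "$1")"; }

# Full download URL, following the GitHub release layout used above
prom_url() {
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/%s\n' \
    "$1" "$(prom_tarball "$1")"
}

# Possible usage (run as root in /usr/local/src):
#   wget "$(prom_url "$PROM_VERSION")"
#   tar xvf "$(prom_tarball "$PROM_VERSION")"
#   ln -sv "/usr/local/src/$(prom_dir "$PROM_VERSION")" /usr/local/prometheus
```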
Service unit file (systemd)
# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus/
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
Configure the nodes to be monitored
# cd /usr/local/prometheus
# grep -v "#" prometheus.yml | grep -v "^$"
global:
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['192.168.7.110:9100','192.168.7.111:9100']
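Because YAML is indentation-sensitive, it can help to validate prometheus.yml before the service is restarted. The release tarball ships a promtool binary next to prometheus, and its "check config" subcommand does this. The wrapper below is only a sketch: the function name check_config is hypothetical, and the default paths assume the /usr/local/prometheus symlink created earlier.

```shell
#!/usr/bin/env bash
# check_config [PROMTOOL] [CONFIG]
# Hypothetical wrapper; defaults assume the /usr/local/prometheus symlink.
check_config() {
  local promtool="${1:-/usr/local/prometheus/promtool}"
  local config="${2:-/usr/local/prometheus/prometheus.yml}"
  if [ ! -x "$promtool" ]; then
    echo "promtool not found or not executable: $promtool" >&2
    return 2
  fi
  # promtool exits non-zero when the file fails validation.
  "$promtool" check config "$config"
}

# Possible usage:
#   check_config && systemctl restart prometheus
```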
The service must be restarted after the configuration file is modified
Start
# systemctl daemon-reload
# systemctl restart prometheus
# systemctl enable prometheus
Check that the port is listening (9090 for Prometheus, as in the scrape target above)
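The notes do not say which command was used for this check. On the server itself, something like "ss -tnlp" is the usual choice; the sketch below instead tests from any machine whether a TCP connection can be opened, using bash's /dev/tcp redirection. The function name port_open and the 2-second timeout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 2 seconds.
# Relies on bash's built-in /dev/tcp and on coreutils' timeout.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Possible usage:
#   port_open localhost 9090 && echo "listening" || echo "not listening"
```

The same function can be pointed at port 9100 on each monitored node once node_exporter is running.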
2. Install node_exporter on each monitored node
# pwd
/usr/local/src
# wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
# tar xvf node_exporter-0.18.1.linux-amd64.tar.gz
# ln -sv /usr/local/src/node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
# cd /usr/local/node_exporter
Service unit file (systemd)
# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/usr/local/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
Start
# systemctl daemon-reload
# systemctl restart node-exporter
# systemctl enable node-exporter
Check that the port is listening (9100 for node_exporter), and disable the firewall and SELinux
3. Monitoring k8s
Reference: https://github.com/NVIDIA/gpu-monitoring-tools/tree/master/exporters/prometheus-dcgm
Run the GPU-specific container to do the monitoring
Process monitoring
https://www.cnblogs.com/bigberg/p/10174222.html
Port monitoring
https://my.oschina.net/54188zz/blog/3147626
blackbox_exporter is one of the official Prometheus exporters; it can collect monitoring data over HTTP, DNS, TCP, and ICMP.
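blackbox_exporter is driven through an HTTP endpoint: each probe is a request to /probe with a module name and a target. The sketch below only builds such a URL. The function name probe_url is hypothetical, localhost:9115 assumes the exporter runs locally on its default port, and tcp_connect is a module from the sample blackbox.yml shipped with the exporter; none of these details appear in the notes above.

```shell
#!/usr/bin/env bash
# probe_url EXPORTER MODULE TARGET
# Builds a blackbox_exporter probe URL. The target is not URL-encoded,
# which is fine for plain host:port values but not for arbitrary URLs.
probe_url() {
  local exporter="$1" module="$2" target="$3"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Possible usage (assumes blackbox_exporter is running on localhost:9115):
#   curl -s "$(probe_url localhost:9115 tcp_connect 192.168.7.110:9100)" | grep probe_success
```

In the exporter's output, a probe_success value of 1 indicates that the port accepted the connection.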