Prometheus works quite efficiently as a monitoring system. Most importantly, its alert templates are flexible:
an alert message can include the values of other metrics in addition to the value of the metric that triggered the alert, which is quite convenient. For example,
groups:
- name: example
  rules:
  - alert: Load alert
    expr: node_load1 > 1
    for: 5s
    labels:
      severity: page
    annotations:
      title: 'load1: {{ $value }}, load5: {{ printf `node_load5{instance="%s"}` $labels.instance | query | first | value }}, load15: {{ printf `node_load15{instance="%s"}` $labels.instance | query | first | value }}'
      summary: High load
After configuring Alertmanager and adding webhook_configs, I can capture the alert payload as follows:
{"receiver":"default","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"Load alert","instance":"127.0.0.1:9100","job":"prometheus","severity":"page"},"annotations":{"summary":"High load","title":"load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"},"startsAt":"2018-07-15T22:59:09.508199934+08:00","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://bogon:9090/graph?g0.expr=node_load1+%3E+1\u0026g0.tab=1"}],"groupLabels":{},"commonLabels":{"alertname":"Load alert","instance":"127.0.0.1:9100","job":"prometheus","severity":"page"},"commonAnnotations":{"summary":"High load","title":"load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"},"externalURL":"http://bogon:9093","version":"4","groupKey":"{}:{}"}
The load average values appear in the title annotation:
load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375
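If the receiving side needs these numbers rather than the text, the title string can be split back into values. The following is only a minimal sketch in Python, assuming the title keeps the "name: number" layout used in the rule above; the function name parse_load_title is hypothetical and not part of Prometheus or Alertmanager.

```python
import re
from typing import Dict

# Matches "name: number" pairs such as "load1: 60.1494140625".
# The exponent part is optional, in case a value is rendered like 1e+06.
_PAIR = re.compile(r"(\w+):\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_load_title(title: str) -> Dict[str, float]:
    """Turn 'load1: 60.1, load5: 38.0' into {'load1': 60.1, 'load5': 38.0}."""
    return {name: float(value) for name, value in _PAIR.findall(title)}


title = "load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"
loads = parse_load_title(title)
print(loads["load1"], loads["load5"], loads["load15"])
```

A value that is not a plain number (for example NaN) would simply be skipped by this pattern, so a real consumer may want to check that all three keys are present.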
After receiving the message, we know the machine's load averages in detail.
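A payload like the one shown earlier could be captured with a small HTTP endpoint that Alertmanager posts to. The sketch below uses only the Python standard library and is an illustration, not the setup used for this post: the names extract_titles, AlertHandler, and run, the bind address, and port 5001 are all assumptions, and the url in webhook_configs would have to point at whatever address is actually chosen.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


def extract_titles(body: bytes) -> List[str]:
    """Return the 'title' annotation of every alert in a webhook payload."""
    payload = json.loads(body)
    return [
        alert.get("annotations", {}).get("title", "")
        for alert in payload.get("alerts", [])
    ]


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Alertmanager sends the JSON document as the POST body.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        for title in extract_titles(body):
            print(title)
        # Reply 200 so the notification is treated as delivered.
        self.send_response(200)
        self.end_headers()


def run(port: int = 5001) -> None:
    # Port 5001 is an arbitrary choice for this sketch.
    HTTPServer(("127.0.0.1", port), AlertHandler).serve_forever()
```

Calling run() starts the listener; each firing alert would then print its title line, such as the load1/load5/load15 text above.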