Rules

ansible managed alert rules

Last Evaluation: 9.692s ago

Evaluation Time: 1.783ms

Columns: Rule, State, Error, Last Evaluation, Evaluation Time
Rule:
  alert: Watchdog
  expr: vector(1)
  for: 10m
  labels:
    severity: warning
  annotations:
    description: This is an alert meant to ensure that the entire alerting pipeline is functional. This alert is always firing, therefore it should always be firing in Alertmanager and always fire against a receiver. There are integrations with various notification mechanisms that send a notification when this alert is not firing. For example the "DeadMansSnitch" integration in PagerDuty.
    summary: Ensure entire alerting pipeline is functional
State: ok
Last Evaluation: 9.693s ago
Evaluation Time: 452.5us
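The Watchdog rule works as a dead man's switch: the receiving side raises the alarm when the always-firing alert stops arriving. A minimal sketch of that receiving-side check follows. The function name `pipeline_is_healthy` and the 15-minute `max_silence` window are hypothetical and do not come from the rule above or from any particular integration.

```python
from datetime import datetime, timedelta


def pipeline_is_healthy(last_watchdog_seen: datetime,
                        now: datetime,
                        max_silence: timedelta = timedelta(minutes=15)) -> bool:
    """Return True if a Watchdog notification arrived within max_silence.

    max_silence is an assumed tolerance, not a value taken from the rule.
    """
    return now - last_watchdog_seen <= max_silence


now = datetime(2024, 1, 1, 12, 0, 0)

# A heartbeat two minutes ago: the alerting pipeline looks functional.
recent = pipeline_is_healthy(now - timedelta(minutes=2), now)

# No heartbeat for forty minutes: something between Prometheus and the
# receiver is probably broken, so the receiver should notify a human.
stale = pipeline_is_healthy(now - timedelta(minutes=40), now)

print(recent, stale)
```

The point of the inversion is that silence becomes the signal, so a failure anywhere in the chain is noticed even though no regular alert can be delivered.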
Rule:
  alert: InstanceDown
  expr: up == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
    summary: Instance {{ $labels.instance }} down
State: ok
Last Evaluation: 9.693s ago
Evaluation Time: 234.5us
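The `for: 5m` clause means the expression has to stay true across consecutive evaluations for five minutes before the alert moves from pending to firing. The sketch below models that behaviour for a single series. The function name `alert_state` and the 60-second evaluation interval in the example data are assumptions made for illustration.

```python
def alert_state(samples, for_seconds=300):
    """Return the alert state after the last evaluation.

    samples: list of (eval_time_seconds, up_value) tuples in time order.
    The alert is active while up == 0; it fires once it has been
    continuously active for at least for_seconds.
    """
    active_since = None
    state = "inactive"
    for t, up in samples:
        if up == 0:
            if active_since is None:
                active_since = t  # first evaluation where the target is down
            state = "firing" if t - active_since >= for_seconds else "pending"
        else:
            active_since = None  # any recovery resets the hold timer
            state = "inactive"
    return state


# Target down for 4 minutes (evaluations at 0s, 60s, ..., 240s): still pending.
four_minutes = alert_state([(t, 0) for t in range(0, 241, 60)])

# Target down for 5 minutes: the hold duration is reached.
five_minutes = alert_state([(t, 0) for t in range(0, 301, 60)])

# A single successful scrape in between restarts the count.
flapping = alert_state([(0, 0), (60, 0), (120, 1), (180, 0)])

print(four_minutes, five_minutes, flapping)
```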
Rule:
  alert: CriticalCPULoad
  expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{job="node",mode="idle"}[5m])) * 100) > 96
  for: 2m
  labels:
    severity: critical
  annotations:
    description: '{{ $labels.instance }} of job {{ $labels.job }} has Critical CPU load for more than 2 minutes.'
    summary: Instance {{ $labels.instance }} - Critical CPU load
State: ok
Last Evaluation: 9.694s ago
Evaluation Time: 255.3us
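The CriticalCPULoad expression turns the idle-time counter into a busy percentage: `irate` takes the per-second increase between the last two samples of each core, `avg by(instance)` averages the cores, and the result is subtracted from 100. Because the aggregation keeps only the `instance` label, `{{ $labels.job }}` in the description is likely to render empty. The sketch below reproduces the arithmetic on made-up samples; the helper names and the 15-second scrape spacing are assumptions, and counter resets are ignored.

```python
def irate(prev, last):
    """Per-second rate from the last two (timestamp, counter_value) samples."""
    (t0, v0), (t1, v1) = prev, last
    return (v1 - v0) / (t1 - t0)


def cpu_busy_percent(idle_samples_per_core):
    """100 minus the average idle rate across cores, expressed in percent."""
    rates = [irate(prev, last) for prev, last in idle_samples_per_core]
    return 100 - (sum(rates) / len(rates)) * 100


# Two cores, each with its last two idle-counter samples (hypothetical data).
cores = [
    ((0, 1000.0), (15, 1000.3)),  # idle for 0.3s out of 15s
    ((0, 2000.0), (15, 2000.6)),  # idle for 0.6s out of 15s
]

busy = cpu_busy_percent(cores)
exceeds_threshold = busy > 96  # same comparison as in the rule

print(round(busy, 2), exceeds_threshold)
```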
Rule:
  alert: CriticalRAMUsage
  expr: (1 - ((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes)) * 100 > 98
  for: 5m
  labels:
    severity: critical
  annotations:
    description: '{{ $labels.instance }} has Critical Memory Usage more than 5 minutes.'
    summary: Instance {{ $labels.instance }} has Critical Memory Usage
State: ok
Last Evaluation: 9.694s ago
Evaluation Time: 261.3us
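The CriticalRAMUsage expression counts free, buffer, and cache memory as available and reports everything else as used. A worked instance of that formula is sketched below; the function name `ram_used_percent` and the sample byte counts are hypothetical.

```python
MIB = 1024 ** 2


def ram_used_percent(mem_free, buffers, cached, mem_total):
    """(1 - (free + buffers + cached) / total) * 100, as in the rule."""
    return (1 - (mem_free + buffers + cached) / mem_total) * 100


# Hypothetical host with 16 GiB of RAM and 192 MiB reclaimable or free.
used = ram_used_percent(
    mem_free=64 * MIB,
    buffers=32 * MIB,
    cached=96 * MIB,
    mem_total=16384 * MIB,
)
exceeds_threshold = used > 98  # same comparison as in the rule

print(used, exceeds_threshold)
```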
Rule:
  alert: CriticalDiskSpace
  expr: node_filesystem_free_bytes{fstype!~"(squashfs|fuse.*)",job="node",mountpoint!~"^/run(/.*|$)"} / node_filesystem_size_bytes{job="node"} < 0.1
  for: 4m
  labels:
    severity: critical
  annotations:
    description: '{{ $labels.instance }} of job {{ $labels.job }} has less than 10% space remaining.'
    summary: Instance {{ $labels.instance }} - Critical disk space usage
State: ok
Last Evaluation: 9.694s ago
Evaluation Time: 385.2us
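The CriticalDiskSpace expression first drops filesystems whose `fstype` or `mountpoint` matches the negative regex matchers, then compares free space to total size. Prometheus label matchers are fully anchored, which `re.fullmatch` approximates here. This is a sketch on hypothetical filesystem records; the `job="node"` selector is left out and the function name `low_disk` is an assumption.

```python
import re

# Same patterns as in the rule's label matchers.
FSTYPE_EXCLUDE = re.compile(r"(squashfs|fuse.*)")
MOUNT_EXCLUDE = re.compile(r"^/run(/.*|$)")


def low_disk(filesystems, threshold=0.1):
    """Return mountpoints whose free/size ratio is below the threshold."""
    hits = []
    for fs in filesystems:
        # fullmatch mimics the implicit anchoring of =~ and !~ matchers.
        if FSTYPE_EXCLUDE.fullmatch(fs["fstype"]):
            continue
        if MOUNT_EXCLUDE.fullmatch(fs["mountpoint"]):
            continue
        if fs["free_bytes"] / fs["size_bytes"] < threshold:
            hits.append(fs["mountpoint"])
    return hits


sample = [
    {"mountpoint": "/", "fstype": "ext4", "free_bytes": 5, "size_bytes": 100},
    {"mountpoint": "/home", "fstype": "ext4", "free_bytes": 50, "size_bytes": 100},
    {"mountpoint": "/run", "fstype": "tmpfs", "free_bytes": 1, "size_bytes": 100},
    {"mountpoint": "/run/user/1000", "fstype": "tmpfs", "free_bytes": 1, "size_bytes": 100},
    {"mountpoint": "/snap/core", "fstype": "squashfs", "free_bytes": 0, "size_bytes": 100},
    {"mountpoint": "/runtime", "fstype": "ext4", "free_bytes": 2, "size_bytes": 100},
]

alerts = low_disk(sample)
print(alerts)
```

Note that `/runtime` is not excluded: the mountpoint pattern only matches `/run` itself or paths beneath it.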
Rule:
  alert: RebootRequired
  expr: node_reboot_required > 0
  labels:
    severity: warning
  annotations:
    description: '{{ $labels.instance }} requires a reboot.'
    summary: Instance {{ $labels.instance }} - reboot required
State: ok
Last Evaluation: 9.694s ago
Evaluation Time: 54.28us
Rule:
  alert: ClockSkewDetected
  expr: abs(node_timex_offset_seconds) * 1000 > 30
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Clock skew detected on {{ $labels.instance }}. Ensure NTP is configured correctly on this host.
    summary: Instance {{ $labels.instance }} - Clock skew detected
State: ok
Last Evaluation: 9.694s ago
Evaluation Time: 119.1us