Skip to content

System Monitoring

Garot Conklin edited this page Feb 6, 2025 · 1 revision

System Monitoring Dashboard

Example configuration for a comprehensive system monitoring dashboard.

Basic Configuration

version: "1.0"
dashboards:
  - name: "System Monitoring"
    description: "System performance and health metrics"
    layout_type: "ordered"
    widgets:
      - title: "CPU Usage"
        type: "timeseries"
        query: "avg:system.cpu.user{*} by {host}"

      - title: "Memory Usage"
        type: "timeseries"
        query: "avg:system.mem.used{*} by {host}"

      - title: "Disk Usage"
        type: "query_value"
        query: "avg:system.disk.used{*} by {host}"

      - title: "Network Traffic"
        type: "timeseries"
        query: "avg:system.net.bytes_rcvd{*} by {host}"

Advanced Configuration

version: "1.0"
defaults:
  refresh_interval: 300
  tags:
    - "env:production"
    - "team:infrastructure"

dashboards:
  - name: "Advanced System Monitoring"
    description: "Detailed system metrics with alerts"
    layout_type: "ordered"
    template_variables:
      - name: "env"
        prefix: "env"
        default: "prod"
    widgets:
      - title: "CPU Utilization"
        type: "timeseries"
        query: "avg:system.cpu.user{env:$env} by {host}"
        conditional_formats:
          - comparator: ">"
            value: 80
            palette: "red"
          - comparator: ">"
            value: 60
            palette: "yellow"

      - title: "Memory Usage %"
        type: "query_value"
        query: "avg:system.mem.used{env:$env} by {host}"
        precision: 2

Widget Types

  1. Timeseries

    • Historical data visualization
    • Trend analysis
    • Multiple metrics comparison
  2. Query Value

    • Current value display
    • Threshold indicators
    • Status monitoring
  3. Heat Map

    • Resource utilization patterns
    • Performance bottlenecks
    • Anomaly detection

Best Practices

  • Use appropriate time windows
  • Set meaningful thresholds
  • Group related metrics
  • Include host-level granularity
  • Implement proper tagging

Related Resources

Clone this wiki locally