Troubleshooting

Garot Conklin edited this page Feb 6, 2025 · 1 revision

Troubleshooting Guide

This guide helps you diagnose and resolve common issues when using the DataDog Dashboard Deployer.

Common Issues

1. Authentication Errors

Error: Unauthorized. Please check your API/Application keys

Solution:

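  1. Verify that the environment variables are set correctly:
    echo $DATADOG_API_KEY
    echo $DATADOG_APP_KEY
  2. Check the key permissions in DataDog; the Application key must be allowed to read and write dashboards
  3. Ensure the keys are valid and have not been revoked or expired
  4. Verify that you are using the API endpoint for your DataDog site (for example, api.datadoghq.eu instead of api.datadoghq.com for the EU site)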
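
The checks above can also be scripted. The following is a minimal sketch, not part of the deployer itself: it calls the DataDog `GET /api/v1/validate` endpoint with the standard library to confirm that an API key is accepted by a given site. The function names and the `api_host` parameter are hypothetical and exist only for this illustration. The endpoint validates the API key only, not the Application key.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key, api_host="api.datadoghq.com"):
    """Build a GET request for the key validation endpoint of one site."""
    url = f"https://{api_host}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def api_key_is_valid(api_key, api_host="api.datadoghq.com", timeout=10):
    """Return True if the site accepts the key, False if it rejects it."""
    request = build_validate_request(api_key, api_host)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return bool(json.load(response).get("valid"))
    except urllib.error.HTTPError as err:
        # 401/403 mean the key was rejected; anything else is unexpected
        if err.code in (401, 403):
            return False
        raise
```

For example, `api_key_is_valid(os.environ["DATADOG_API_KEY"], "api.datadoghq.eu")` checks the key against the EU site. A key that is valid on one site is rejected on the others, which is a common cause of the Unauthorized error.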

2. Configuration Validation Errors

Error: Invalid configuration in dashboard.yaml

Common Causes:

  1. Invalid YAML syntax

    # Invalid
    dashboards:
      name: "My Dashboard"  # Missing hyphen for list
    
    # Valid
    dashboards:
      - name: "My Dashboard"
  2. Missing required fields

    # Invalid - missing required fields
    dashboards:
      - name: "My Dashboard"
    
    # Valid
    dashboards:
      - name: "My Dashboard"
        description: "System metrics dashboard"
        widgets: []
  3. Invalid widget configuration

    # Invalid widget type
    widgets:
      - title: "CPU Usage"
        type: "unknown_type"
    
    # Valid widget type
    widgets:
      - title: "CPU Usage"
        type: "timeseries"
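
The three causes above can be caught before deployment with a small pre-flight check. This is a sketch under assumptions: it works on a configuration that has already been parsed into a Python dict, it treats `name`, `description`, and `widgets` as required because the valid example above includes them, and the set of widget types is an assumed subset that should be adjusted to whatever the deployer actually accepts. `find_config_problems` is a hypothetical helper, not a deployer function.

```python
REQUIRED_DASHBOARD_FIELDS = ("name", "description", "widgets")
# Assumed subset of widget types; extend to match what the deployer accepts
KNOWN_WIDGET_TYPES = {"timeseries", "query_value", "toplist", "heatmap"}


def find_config_problems(config):
    """Return a list of human-readable problems found in a parsed config."""
    if not isinstance(config, dict):
        return ["top level of the configuration must be a mapping"]
    dashboards = config.get("dashboards")
    if not isinstance(dashboards, list):
        return ["'dashboards' must be a list (is the leading '-' missing?)"]

    problems = []
    for index, dashboard in enumerate(dashboards):
        if not isinstance(dashboard, dict):
            problems.append(f"dashboards[{index}] must be a mapping")
            continue
        label = dashboard.get("name", f"dashboards[{index}]")
        for field in REQUIRED_DASHBOARD_FIELDS:
            if field not in dashboard:
                problems.append(f"{label}: missing required field '{field}'")
        widgets = dashboard.get("widgets", [])
        if not isinstance(widgets, list):
            problems.append(f"{label}: 'widgets' must be a list")
            continue
        for widget in widgets:
            widget_type = widget.get("type") if isinstance(widget, dict) else None
            if widget_type not in KNOWN_WIDGET_TYPES:
                problems.append(f"{label}: unknown widget type {widget_type!r}")
    return problems
```

An empty list means none of the three problems was found; it does not guarantee that the deployer will accept the file.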

3. Deployment Failures

Error: Failed to deploy dashboard "System Overview"

Troubleshooting Steps:

  1. Run with debug logging:

    datadog-dashboard-deploy --debug config.yaml
  2. Use dry-run mode:

    datadog-dashboard-deploy --dry-run config.yaml
  3. Check API rate limits:

    # Print only the response headers and look for the X-RateLimit-* values
    curl -s -o /dev/null -D - \
         -H "DD-API-KEY: ${DATADOG_API_KEY}" \
         -H "DD-APPLICATION-KEY: ${DATADOG_APP_KEY}" \
         "https://api.datadoghq.com/api/v1/dashboard"

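When checking rate limits from a script rather than from curl, the same headers can be read programmatically. The sketch below assumes the response headers are available as a plain mapping and reads the `X-RateLimit-Limit`, `X-RateLimit-Period`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers that the DataDog API returns. Both function names are hypothetical.

```python
RATE_LIMIT_HEADERS = {
    "limit": "X-RateLimit-Limit",
    "period": "X-RateLimit-Period",
    "remaining": "X-RateLimit-Remaining",
    "reset": "X-RateLimit-Reset",
}


def parse_rate_limit(headers):
    """Extract rate limit values as integers; missing headers become None."""
    lowered = {str(key).lower(): str(value) for key, value in headers.items()}
    result = {}
    for field, header in RATE_LIMIT_HEADERS.items():
        raw = lowered.get(header.lower())
        result[field] = int(raw) if raw is not None and raw.strip().isdigit() else None
    return result


def seconds_to_wait(headers):
    """Return how long to pause before the next call, or 0 if quota remains."""
    info = parse_rate_limit(headers)
    if info["remaining"] == 0 and info["reset"] is not None:
        return info["reset"]
    return 0
```

Header names are matched case-insensitively because HTTP clients differ in how they normalize them.
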
4. Query Issues

# Common query problems
queries:
  # Too many group by terms
  bad: "avg:system.cpu.user{*} by {host,region,az,service,version}"

  # Better - limited cardinality
  good: "avg:system.cpu.user{service:web} by {host}"

Solutions:

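  1. Reduce query complexity
  2. Limit the number of group by terms to keep cardinality low
  3. Use an appropriate time aggregation, such as .rollup(avg, 300)
  4. Add relevant filters, such as {service:web}, instead of {*}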
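
A quick way to spot queries with too many group by terms is to count them before deploying. This sketch is an illustration only: the regular expression handles the simple `by {a,b,c}` form shown above, and the default threshold of two terms is an arbitrary assumption, not a DataDog limit.

```python
import re

# Matches the "by {tag1,tag2}" clause of a metric query
GROUP_BY_PATTERN = re.compile(r"\bby\s*\{([^}]*)\}")


def group_by_terms(query):
    """Return the list of tags a query groups by (empty if it has none)."""
    match = GROUP_BY_PATTERN.search(query)
    if not match:
        return []
    return [term.strip() for term in match.group(1).split(",") if term.strip()]


def has_too_many_groups(query, max_terms=2):
    """Flag queries whose group by clause exceeds the chosen threshold."""
    return len(group_by_terms(query)) > max_terms
```

Applied to the two queries above, the first yields five terms and is flagged, while the second yields only `host` and passes.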

Performance Issues

1. Slow Dashboard Loading

Causes:

  • Too many widgets
  • Complex queries
  • High cardinality metrics

Solutions:

# Optimize widget count
dashboards:
  - name: "System Overview"
    widgets:
      # Limit to essential metrics
      - title: "CPU Usage"
        type: "timeseries"
        query: "avg:system.cpu.user{service:$service}.rollup(avg, 300)"
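
The causes listed above can be checked mechanically on a parsed dashboard definition. The following sketch assumes the flat `title`/`type`/`query` widget layout used in this guide; the widget budget of 20 is an arbitrary placeholder, and `dashboard_load_warnings` is a hypothetical helper.

```python
def dashboard_load_warnings(dashboard, max_widgets=20):
    """Return warnings about settings that tend to slow a dashboard down."""
    warnings = []
    widgets = dashboard.get("widgets", [])
    if len(widgets) > max_widgets:
        warnings.append(f"{len(widgets)} widgets exceeds the budget of {max_widgets}")
    for widget in widgets:
        query = widget.get("query", "")
        # Without an explicit rollup the backend picks the interval itself
        if query and ".rollup(" not in query:
            title = widget.get("title", "untitled")
            warnings.append(f"widget {title!r} has no explicit rollup")
    return warnings
```

Treat the output as hints for review rather than hard failures; a missing rollup is often fine for low-volume metrics.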

2. High API Usage

Monitoring:

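# Print the response headers (including X-RateLimit-*) for a GET request.
# The usage summary endpoint expects a start_month parameter (YYYY-MM).
curl -s -o /dev/null -D - \
     -H "DD-API-KEY: ${DATADOG_API_KEY}" \
     -H "DD-APPLICATION-KEY: ${DATADOG_APP_KEY}" \
     "https://api.datadoghq.com/api/v1/usage/summary?start_month=$(date -u +%Y-%m)"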

Solutions:

  1. Implement rate limiting
  2. Batch updates
  3. Use caching where appropriate
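
One way to apply the caching suggestion is to skip dashboards whose definition has not changed since the last run. This is a sketch of that idea, not a feature of the deployer: it fingerprints each dashboard with SHA-256 over a canonical JSON encoding and compares it with fingerprints saved from a previous deployment. Both function names, and the idea of storing fingerprints keyed by dashboard name, are assumptions for illustration.

```python
import hashlib
import json


def config_fingerprint(dashboard):
    """Return a stable hash of a dashboard definition, independent of key order."""
    canonical = json.dumps(dashboard, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dashboards_to_deploy(dashboards, previous_fingerprints):
    """Return only the dashboards whose content differs from the last run."""
    changed = []
    for dashboard in dashboards:
        if previous_fingerprints.get(dashboard["name"]) != config_fingerprint(dashboard):
            changed.append(dashboard)
    return changed
```

Deploying only the changed dashboards reduces the number of API calls per run, which also helps with batching.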

Error Messages

Common Error Codes

Code  Description   Solution
401   Unauthorized  Check API/App keys
403   Forbidden     Check permissions
429   Rate Limited  Implement backoff
500   Server Error  Retry with exponential backoff

Error Handling

try:
    deployer.deploy("config.yaml")
except AuthError:
    # Handle authentication errors
    check_credentials()
except RateLimitError:
    # Handle rate limiting
    implement_backoff()
except ValidationError as e:
    # Handle configuration errors
    print(f"Configuration error: {e}")
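
The backoff step for rate limit and server errors could look like the sketch below. It is a generic retry helper, not the deployer's own implementation: `RetryableError` is a hypothetical stand-in for whichever exceptions should be retried (for example a rate limit error), and the delay values are assumptions. The delay doubles on each attempt, is capped at `max_delay`, and gets random jitter so that parallel jobs do not retry in lockstep.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for errors worth retrying (for example 429 or 500)."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on RetryableError."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% jitter on top of the computed delay
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the helper can be tested without waiting. Authentication and validation errors should not be retried, since repeating the call cannot fix them.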

Debugging

1. Enable Debug Logging

# Enable debug output
export DATADOG_DASHBOARD_DEBUG=1
datadog-dashboard-deploy config.yaml

2. Use Validation Mode

# Validate configuration
datadog-dashboard-deploy --validate config.yaml

# Dry run deployment
datadog-dashboard-deploy --dry-run config.yaml

3. Check API Responses

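# Example debug code
response = api.create_dashboard(config)
print(f"Status: {response.status_code}")
print(f"Headers: {response.headers}")
try:
    print(f"Body: {response.json()}")
except ValueError:
    # Error responses are not always JSON; fall back to the raw text
    print(f"Body: {response.text}")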

Getting Help

  1. Check logs for detailed error messages
  2. Review Configuration Guide
  3. Search GitHub Issues
  4. Contact support:

Related Resources
