Commit dbace73

Merge pull request #21 from rhythmictech/ENG-4561
created acm certificate renewal failure monitor
2 parents f07c5f4 + 167ed76 commit dbace73

7 files changed: 192 additions, 1 deletion

aws/acm/.terraform.lock.hcl

Lines changed: 44 additions & 0 deletions
Some generated files are not rendered by default.

aws/acm/README.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+<!-- BEGIN_TF_DOCS -->
+## Requirements
+
+| Name | Version |
+|------|---------|
+| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.5 |
+| <a name="requirement_datadog"></a> [datadog](#requirement\_datadog) | >= 3.37 |
+| <a name="requirement_null"></a> [null](#requirement\_null) | >= 3.1.0 |
+
+## Providers
+
+| Name | Version |
+|------|---------|
+| <a name="provider_datadog"></a> [datadog](#provider\_datadog) | 3.76.0 |
+
+## Modules
+
+No modules.
+
+## Resources
+
+| Name | Type |
+|------|------|
+| [datadog_monitor.certificate_renewal_failure_check](https://registry.terraform.io/providers/datadog/datadog/latest/docs/resources/monitor) | resource |
+
+## Inputs
+
+| Name | Description | Type | Default | Required |
+|------|-------------|------|---------|:--------:|
+| <a name="input_additional_tags"></a> [additional\_tags](#input\_additional\_tags) | Additional tags (key:value format) to add to this type of check (combined with `local.tags` and `var.base_tags`) | `list(string)` | `[]` | no |
+| <a name="input_alert_critical_priority"></a> [alert\_critical\_priority](#input\_alert\_critical\_priority) | Priority for alerts within critical threshold (P1-P5, uses monitor defaults if not specified) | `string` | `null` | no |
+| <a name="input_alert_message"></a> [alert\_message](#input\_alert\_message) | Message to prepend to alert notifications | `string` | `"Alert"` | no |
+| <a name="input_alert_nodata_priority"></a> [alert\_nodata\_priority](#input\_alert\_nodata\_priority) | Priority for alerts with no data (P1-P5, uses monitor defaults if not specified) | `string` | `null` | no |
+| <a name="input_base_tags"></a> [base\_tags](#input\_base\_tags) | Base tags (key:value format) to add to this type of check (combined with `local.tags` and `var.additional_tags`, generally you should not change this) | `list(string)` | <pre>[<br/> "resource:acm"<br/>]</pre> | no |
+| <a name="input_certificate_renewal_failure_check_enabled"></a> [certificate\_renewal\_failure\_check\_enabled](#input\_certificate\_renewal\_failure\_check\_enabled) | Whether to enable the certificate renewal failure check | `bool` | `true` | no |
+| <a name="input_cost_center"></a> [cost\_center](#input\_cost\_center) | Cost Center of the monitored resource (leave blank to omit tag) | `string` | `null` | no |
+| <a name="input_dashboard_link"></a> [dashboard\_link](#input\_dashboard\_link) | Dashboard link to include in message | `string` | `null` | no |
+| <a name="input_env"></a> [env](#input\_env) | Environment the monitored resource is in (leave blank to omit tag) | `string` | `null` | no |
+| <a name="input_evaluation_delay"></a> [evaluation\_delay](#input\_evaluation\_delay) | Monitor evaluation delay (see [Datadog Docs](https://docs.datadoghq.com/monitors/configuration/?tab=thresholdalert#set-alert-conditions)) | `number` | `900` | no |
+| <a name="input_group_by"></a> [group\_by](#input\_group\_by) | List of tags to group by | `list(string)` | <pre>[<br/> "name",<br/> "aws_account",<br/> "env",<br/> "datadog_managed"<br/>]</pre> | no |
+| <a name="input_monitor_exclude_tags"></a> [monitor\_exclude\_tags](#input\_monitor\_exclude\_tags) | Tags to be excluded in the monitoring query. Specify in key:value format | `list(string)` | `[]` | no |
+| <a name="input_monitor_include_tags"></a> [monitor\_include\_tags](#input\_monitor\_include\_tags) | Tags to be included in the monitoring query. Specify in key:value format | `list(string)` | `[]` | no |
+| <a name="input_new_group_delay"></a> [new\_group\_delay](#input\_new\_group\_delay) | Delay in seconds before generating alerts for a new resource | `number` | `300` | no |
+| <a name="input_notify_alert_override"></a> [notify\_alert\_override](#input\_notify\_alert\_override) | List of notifications for alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
+| <a name="input_notify_crit_override"></a> [notify\_crit\_override](#input\_notify\_crit\_override) | List of notifications for 24x7 alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
+| <a name="input_notify_default"></a> [notify\_default](#input\_notify\_default) | List of alert notifications (can be overridden based on alert type) | `list(string)` | n/a | yes |
+| <a name="input_notify_no_data"></a> [notify\_no\_data](#input\_notify\_no\_data) | Alert if no matching data is found | `bool` | `false` | no |
+| <a name="input_notify_nodata_override"></a> [notify\_nodata\_override](#input\_notify\_nodata\_override) | List of notifications for no data (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
+| <a name="input_notify_nonprod_override"></a> [notify\_nonprod\_override](#input\_notify\_nonprod\_override) | List of notifications for non-prod alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
+| <a name="input_notify_prod_override"></a> [notify\_prod\_override](#input\_notify\_prod\_override) | List of notifications for 12x5 prod alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
+| <a name="input_notify_recovery_override"></a> [notify\_recovery\_override](#input\_notify\_recovery\_override) | List of notifications for alert recovery (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
+| <a name="input_notify_warn_override"></a> [notify\_warn\_override](#input\_notify\_warn\_override) | List of notifications for alerts in warning threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
+| <a name="input_renotify_interval"></a> [renotify\_interval](#input\_renotify\_interval) | Interval in minutes to re-send notifications about an alert | `number` | `60` | no |
+| <a name="input_runbook_link"></a> [runbook\_link](#input\_runbook\_link) | Runbook link to include in message | `string` | `null` | no |
+| <a name="input_service"></a> [service](#input\_service) | Service associated with the monitored resource (leave blank to omit tag) | `string` | `null` | no |
+| <a name="input_team"></a> [team](#input\_team) | Team supporting the monitored resource (leave blank to omit tag) | `string` | `null` | no |
+| <a name="input_timeout_h"></a> [timeout\_h](#input\_timeout\_h) | Auto-resolve alert in specified hours if condition no longer matches | `number` | `0` | no |
+| <a name="input_title_prefix"></a> [title\_prefix](#input\_title\_prefix) | Prefix all alerts with specified value in brackets | `string` | `null` | no |
+| <a name="input_title_suffix"></a> [title\_suffix](#input\_title\_suffix) | Suffix all alerts with specified value in parentheses | `string` | `null` | no |
+| <a name="input_warn_priority"></a> [warn\_priority](#input\_warn\_priority) | Priority for alerts within warning threshold (P1-P5, uses monitor defaults if not specified) | `string` | `null` | no |
+
+## Outputs
+
+No outputs.
+<!-- END_TF_DOCS -->
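
For orientation, here is a minimal sketch of how a caller might consume this module. The module label, the `source` path, and the notification handle are assumptions made for illustration; only the input names are taken from the Inputs table above. A configured `datadog` provider in the root module is also assumed.

```hcl
# Hypothetical caller configuration, not part of this commit.
module "acm_monitors" {
  source = "./aws/acm" # assumed path; point this at wherever the module actually lives

  # The only required input according to the Inputs table.
  notify_default = ["@slack-ops-alerts"] # hypothetical Datadog notification handle

  # Optional inputs; omitted ones fall back to the defaults listed in the table.
  env          = "prod"
  title_prefix = "ACM"

  certificate_renewal_failure_check_enabled = true
}
```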

aws/acm/common.tf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../../common/common.tf

aws/acm/main.tf

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
locals {
2+
# these must be defined but do not need to be overridden
3+
monitor_alert_default_priority = null
4+
monitor_warn_default_priority = null
5+
monitor_nodata_default_priority = null
6+
7+
title_prefix = var.title_prefix == null ? "" : "[${var.title_prefix}]"
8+
title_suffix = var.title_suffix == null ? "" : " (${var.title_suffix})"
9+
}
10+
11+
resource "datadog_monitor" "certificate_renewal_failure_check" {
12+
count = var.certificate_renewal_failure_check_enabled ? 1 : 0
13+
14+
name = join("", [local.title_prefix, "ACM - Certificate Renewal Failure", local.title_suffix])
15+
type = "event-v2 alert"
16+
message = local.event_alert_base_message
17+
tags = concat(local.common_tags, var.base_tags, var.additional_tags)
18+
include_tags = false
19+
20+
evaluation_delay = var.evaluation_delay
21+
new_group_delay = var.new_group_delay
22+
23+
query = <<-EOQ
24+
events("source:amazon_acm").rollup("count").by("@aggregation_key,env").last("5m") > 0
25+
EOQ
26+
27+
monitor_thresholds {
28+
critical = 0
29+
}
30+
}
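
Because the monitor is gated by `count`, any reference to it has to handle the case where zero instances exist. The README lists no outputs, so the output below is purely hypothetical; it sketches one way the monitor ID could be exposed safely using Terraform's built-in `one()` function.

```hcl
# Hypothetical output, not part of this commit.
output "certificate_renewal_failure_monitor_id" {
  description = "ID of the ACM renewal failure monitor, or null when the check is disabled"

  # The splat expression yields a list with zero or one IDs;
  # one() returns the single element, or null if the list is empty.
  value = one(datadog_monitor.certificate_renewal_failure_check[*].id)
}
```

Indexing with `[0]` instead would raise an error whenever `certificate_renewal_failure_check_enabled` is `false`, which is why the splat plus `one()` form is used here.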

aws/acm/variables.tf

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
########################################
2+
# Global variables
3+
########################################
4+
variable "additional_tags" {
5+
default = []
6+
description = "Additional tags (key:value format) to add to this type of check (combined with `local.tags` and `var.base_tags`)"
7+
type = list(string)
8+
}
9+
10+
variable "base_tags" {
11+
default = ["resource:acm"]
12+
description = "Base tags (key:value format) to add to this type of check (combined with `local.tags` and `var.additional_tags`, generally you should not change this)"
13+
type = list(string)
14+
}
15+
16+
########################################
17+
# Certificate Renewal Failure Check
18+
########################################
19+
variable "certificate_renewal_failure_check_enabled" {
20+
default = true
21+
description = "Whether to enable the certificate renewal failure check"
22+
type = bool
23+
}
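
Variables like these are typically set from a `.tfvars` file. The sketch below uses a hypothetical file name and hypothetical tag values; `base_tags` is left at its default, as its description recommends.

```hcl
# example.auto.tfvars (hypothetical file name and values)

# Appended after local.common_tags and var.base_tags by the concat() call in main.tf.
additional_tags = ["owner:platform", "tier:edge"]

# Set to false to skip creating the monitor entirely (count becomes 0).
certificate_renewal_failure_check_enabled = false
```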

aws/acm/versions.tf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../../common/versions.tf

common/common.tf

Lines changed: 28 additions & 1 deletion
@@ -340,7 +340,34 @@ END
 ${local.alert_context}
 **Alert Information**
 {{#is_alert}} ${local.notify_on_alert} {{/is_alert}}
-{{#is_recovery}} ${local.notify_on_recovery} {{/is_recovery}}
+END
+
+  event_alert_base_message = <<END
+${local.alert_context}
+
+**Alert Information**
+* **Event Tags**: {{event.tags}}
+* **Event Text**: {{event.text}}
+{{#is_alert}}
+Current value: {{value}}
+Threshold: {{threshold}}
+
+Environment: {{env.name}}
+
+{{#is_match "env.name" "prod" "prd"}}
+{{#is_match "event.tags.datadog_managed" "critical"}}
+${local.notify_on_crit}
+{{/is_match}}
+{{#is_match "event.tags.datadog_managed" "true"}}
+${local.notify_on_prod}
+{{/is_match}}
+{{/is_match}}
+{{^is_match "env.name" "prod" "prd"}}
+${local.notify_on_nonprod}
+{{/is_match}}
+
+Please investigate and take necessary actions.
+{{/is_alert}}
 END
 
   service_group_by = join(",", formatlist("\"%s\"", var.group_by))
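
The nested `is_match` blocks in `event_alert_base_message` route an alert to one of three notification locals, or to none. The sketch below mirrors that routing in plain HCL so the branches are easier to follow. The `example_*` locals are hypothetical stand-ins for values Datadog substitutes at alert time, the tag is assumed to hold a single value, and `strcontains` (Terraform 1.5+) is used on the assumption that `is_match` performs substring matching.

```hcl
locals {
  # Hypothetical stand-ins for {{env.name}} and {{event.tags.datadog_managed}}.
  example_env     = "prod"
  example_managed = "critical"

  # Mirrors {{#is_match "env.name" "prod" "prd"}}: true if any listed string matches.
  example_is_prod = anytrue([
    for s in ["prod", "prd"] : strcontains(local.example_env, s)
  ])

  # Name of the notification local the template would expand for these inputs.
  example_route = (
    !local.example_is_prod ? "notify_on_nonprod" :
    strcontains(local.example_managed, "critical") ? "notify_on_crit" :
    strcontains(local.example_managed, "true") ? "notify_on_prod" :
    "none"
  )
}
```

Two consequences are worth checking against Datadog's template documentation. A prod alert whose `datadog_managed` tag matches neither `critical` nor `true` resolves to `none`, so no notification handle is expanded. And if `is_match` is indeed a substring match, an environment named `nonprod` would take the prod branch.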
