9 changes: 9 additions & 0 deletions docs/severity.md
@@ -59,6 +59,7 @@
- [integration_azure-service-bus](#integration_azure-service-bus)
- [integration_azure-sql-database](#integration_azure-sql-database)
- [integration_azure-sql-elastic-pool](#integration_azure-sql-elastic-pool)
- [integration_azure-sql-managed-instances](#integration_azure-sql-managed-instances)
- [integration_azure-storage-account-blob](#integration_azure-storage-account-blob)
- [integration_azure-storage-account-capacity](#integration_azure-storage-account-capacity)
- [integration_azure-storage-account](#integration_azure-storage-account)
@@ -675,6 +676,14 @@
|Azure SQL Elastic Pool dtu consumption|X|X|-|-|-|


## integration_azure-sql-managed-instances

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|Azure SQL Managed Instances cpu|X|X|-|-|-|
|Azure SQL Managed Instances storage usage|X|X|-|-|-|


## integration_azure-storage-account-blob

|Detector|Critical|Major|Minor|Warning|Info|
111 changes: 111 additions & 0 deletions modules/integration_azure-sql-managed-instances/README.md
@@ -0,0 +1,111 @@
# AZURE-SQL-MANAGED-INSTANCES SignalFx detectors

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
:link: **Contents**

- [How to use this module?](#how-to-use-this-module)
- [What are the available detectors in this module?](#what-are-the-available-detectors-in-this-module)
- [How to collect required metrics?](#how-to-collect-required-metrics)
- [Metrics](#metrics)
- [Related documentation](#related-documentation)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## How to use this module?

This directory defines a [Terraform](https://www.terraform.io/)
[module](https://www.terraform.io/language/modules/syntax) you can use in your
existing [stack](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#stack) by adding a
`module` configuration and setting its `source` parameter to the URL of this folder:

```hcl
module "signalfx-detectors-integration-azure-sql-managed-instances" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_azure-sql-managed-instances?ref={revision}"

  environment                      = var.environment
  notifications                    = local.notifications
  storage_usage_threshold_critical = 42
  storage_usage_threshold_major    = 42
}
```

Note the following parameters:

* `source`: Use this parameter to specify the URL of the module. The double slash (`//`) is intentional and required.
Terraform uses it to specify subfolders within a Git repo (see [module
sources](https://www.terraform.io/language/modules/sources)). The `ref` parameter selects a specific Git tag of
this repository. It is recommended to use the latest "pinned" version in place of `{revision}`. Avoid using a branch
like `master` except for testing purposes. Note that all modules in this repository are also available on the Terraform
[registry](https://registry.terraform.io/modules/claranet/detectors/signalfx) and we recommend using it as the source
instead of `git`, which is more flexible but less future-proof.

* `environment`: Use this parameter to specify the
[environment](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#environment) used by this
instance of the module.
Its value will be added to the `prefixes` list at the start of the [detector
name](https://github.com/claranet/terraform-signalfx-detectors/wiki/Templating#example).
In general, it will also be used in the `filtering` internal sub-module to [apply
filters](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance#filtering) based on our default
[tagging convention](https://github.com/claranet/terraform-signalfx-detectors/wiki/Tagging-convention).

* `notifications`: Use this parameter to define where alerts should be sent depending on their severity. It consists
of a Terraform [object](https://www.terraform.io/language/expressions/type-constraints#object) where each key represents an available
[detector rule severity](https://docs.splunk.com/observability/alerts-detectors-notifications/create-detectors-for-alerts.html#severity)
and its value is a list of recipients. Every recipient must follow the [detector notification
format](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#notification-format).
Check the [notification binding](https://github.com/claranet/terraform-signalfx-detectors/wiki/Notifications-binding)
documentation to understand the recommended role of each severity.
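
As a minimal sketch, the `local.notifications` object referenced in the usage example could look like the following. It assumes one key per severity shown in the detectors table (critical, major, minor, warning, info); the PagerDuty credential ID and the email address are placeholders, not real recipients.

```hcl
locals {
  # One key per detector rule severity; each value is a list of recipients
  # written in the SignalFx detector notification format ("Type,target").
  notifications = {
    critical = ["PagerDuty,placeholderCredentialId"] # hypothetical integration ID
    major    = ["Email,oncall@example.com"]           # hypothetical address
    minor    = []
    warning  = []
    info     = []
  }
}
```

Empty lists simply mean that no one is notified for that severity.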

These three parameters, along with all variables defined in [common-variables.tf](common-variables.tf), are common to all
[modules](../) in this repository. Other variables, specific to this module, are available in
[variables-gen.tf](variables-gen.tf).
In general, the default configuration "works", but all of these Terraform
[variables](https://www.terraform.io/language/values/variables) make it possible to
customize the detectors' behavior to better fit your needs.
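
For example, the sketch below raises the CPU thresholds and routes storage alerts to a dedicated recipient. The variable names come from the generated detector code of this module; the values and the recipient are illustrative, and passing `storage_usage_notifications` as a map keyed by severity is an assumption based on how the code reads it (`lookup(..., "critical", [])`). The storage rules define no threshold in their configuration, so both are set explicitly. Keep each critical threshold above its major one: the major rule is suppressed while the critical condition holds.

```hcl
module "signalfx-detectors-integration-azure-sql-managed-instances" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_azure-sql-managed-instances?ref={revision}"

  environment   = var.environment
  notifications = local.notifications

  # CPU in percent; the module configuration uses 90 (critical) and 80 (major)
  cpu_threshold_critical = 95
  cpu_threshold_major    = 85

  # Storage in megabytes: placeholder values, keep critical above major
  storage_usage_threshold_critical = 950000
  storage_usage_threshold_major    = 900000

  # Override recipients for the storage detector only (hypothetical address)
  storage_usage_notifications = {
    critical = ["Email,dba@example.com"]
  }
}
```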

Most of them implement the usual tips and rules detailed in the
[guidance](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance) documentation and are listed in the
dedicated [variables](https://github.com/claranet/terraform-signalfx-detectors/wiki/Variables) documentation.

Feel free to explore the [wiki](https://github.com/claranet/terraform-signalfx-detectors/wiki) for more information about
general usage of this repository.

## What are the available detectors in this module?

This module creates the following SignalFx detectors, each of which can contain one or more alerting rules:

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|Azure SQL Managed Instances cpu|X|X|-|-|-|
|Azure SQL Managed Instances storage usage|X|X|-|-|-|

## How to collect required metrics?

This module deploys detectors using metrics reported by the
[Azure integration](https://docs.splunk.com/Observability/gdi/get-data-in/connect/azure/azure.html) configurable
with [this Terraform module](https://github.com/claranet/terraform-signalfx-integrations/tree/master/cloud/azure).
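
If you manage the integration directly with the SignalFx provider instead of the linked module, a minimal sketch could look like this. It assumes the provider's `signalfx_azure_integration` resource, input variables declared elsewhere, and that `microsoft.sql/managedinstances` (the lowercased resource type the detectors filter on) is an accepted `services` value. Check the provider documentation before relying on that service name.

```hcl
resource "signalfx_azure_integration" "azure" {
  name        = "Azure SQL Managed Instances"
  enabled     = true
  environment = "azure"

  # Azure AD application credentials (hypothetical variables)
  app_id     = var.azure_app_id
  secret_key = var.azure_secret_key
  tenant_id  = var.azure_tenant_id

  subscriptions = [var.azure_subscription_id]

  # Assumption: lowercased form of 'Microsoft.Sql/managedInstances'
  services = ["microsoft.sql/managedinstances"]
}
```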


Check the [Related documentation](#related-documentation) section for more detailed and specific information about this module's dependencies.

### Metrics


Here is the list of required metrics for detectors in this module.

* `avg_cpu_percent`
* `storage_space_used_mb`

## Related documentation

* [Terraform SignalFx provider](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs)
* [Terraform SignalFx detector](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector)
* [Splunk Observability integrations](https://docs.splunk.com/Observability/gdi/get-data-in/integrations.html)
* [Azure Monitor metrics](https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-sql-managedinstances-metrics)
* [Splunk Observability metrics](https://docs.splunk.com/observability/en/gdi/get-data-in/connect/azure/azure-metrics.html#azure-sql-managedinstances-metrics)
@@ -0,0 +1,22 @@
---
module: "Azure SQL Managed Instances"
name: "CPU"
filtering: "filter('resource_type', 'Microsoft.Sql/managedInstances') and filter('primary_aggregation_type', 'true')"
aggregation: ".mean(by=['azure_resource_name', 'azure_resource_group_name', 'azure_region'])"
value_unit: "%"
transformation: true
signals:
  signal:
    metric: "avg_cpu_percent"
    rollup: max
rules:
  critical:
    threshold: 90
    comparator: ">"
    lasting_duration: '15m'
  major:
    threshold: 80
    comparator: ">"
    lasting_duration: '15m'
    dependency: critical
...
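
The `aggregation` setting above and the `transformation: true` flag appear to surface as the `cpu_aggregation_function` and `cpu_transformation_function` variables: the generated detector code appends both verbatim, in that order, to the `data('avg_cpu_percent', ...)` stream, so an override must be a SignalFlow method call starting with a dot. A minimal sketch with illustrative values: it averages CPU per instance name only, which assumes instance names are unique across resource groups and regions, then smooths the result over 5 minutes.

```hcl
module "signalfx-detectors-integration-azure-sql-managed-instances" {
  # source, environment, notifications and storage thresholds as in the usage example

  # Rendered as: data('avg_cpu_percent', ...).mean(by=['azure_resource_name']).mean(over='5m')
  cpu_aggregation_function    = ".mean(by=['azure_resource_name'])"
  cpu_transformation_function = ".mean(over='5m')"
}
```
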
@@ -0,0 +1,17 @@
---
module: "Azure SQL Managed Instances"
name: "storage usage"
filtering: "filter('resource_type', 'Microsoft.Sql/managedInstances') and filter('primary_aggregation_type', 'true')"
aggregation: ".mean(by=['azure_resource_name', 'azure_resource_group_name', 'azure_region'])"
value_unit: "Megabyte"
transformation: true
signals:
  signal:
    metric: "storage_space_used_mb"

Review comment (Contributor): Must be compared to `reserved_storage_mb` metric to monitor the percentage instead of fixed value.

Reply (Author): Thanks for the suggestion. It's better now :)

rules:
  critical:
    comparator: ">"
  major:
    comparator: ">"
    dependency: critical
...
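
The review comment above suggests monitoring storage as a percentage of the `reserved_storage_mb` metric rather than a fixed megabyte value. As a hypothetical sketch (not this module's implementation), a standalone detector doing so could look like the following. The resource name, the 90% threshold and the 15-minute duration are illustrative assumptions; both streams are grouped by the same dimensions so the division pairs series one-to-one.

```hcl
# Hypothetical percentage-based storage detector: used space / reserved space * 100
resource "signalfx_detector" "storage_usage_percent" {
  name = "Azure SQL Managed Instances storage usage percent"

  program_text = <<-EOF
    base_filtering = filter('resource_type', 'Microsoft.Sql/managedInstances') and filter('primary_aggregation_type', 'true')
    used = data('storage_space_used_mb', filter=base_filtering).mean(by=['azure_resource_name', 'azure_resource_group_name', 'azure_region'])
    reserved = data('reserved_storage_mb', filter=base_filtering).mean(by=['azure_resource_name', 'azure_resource_group_name', 'azure_region'])
    signal = (used / reserved).scale(100).publish('signal')
    detect(when(signal > 90, lasting='15m')).publish('CRIT')
  EOF

  rule {
    description  = "is too high > 90%"
    severity     = "Critical"
    detect_label = "CRIT"
  }
}
```
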
@@ -0,0 +1,5 @@
documentations:
  - name: Azure Monitor metrics
    url: 'https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-sql-managedinstances-metrics'
  - name: Splunk Observability metrics
    url: 'https://docs.splunk.com/observability/en/gdi/get-data-in/connect/azure/azure-metrics.html#azure-sql-managedinstances-metrics'
@@ -0,0 +1,92 @@
resource "signalfx_detector" "cpu" {
  name = format("%s %s", local.detector_name_prefix, "Azure SQL Managed Instances cpu")

  authorized_writer_teams = var.authorized_writer_teams
  teams                   = try(coalescelist(var.teams, var.authorized_writer_teams), null)
  tags                    = compact(concat(local.common_tags, local.tags, var.extra_tags))

  viz_options {
    label        = "signal"
    value_suffix = "%"
  }

  program_text = <<-EOF
    base_filtering = filter('resource_type', 'Microsoft.Sql/managedInstances') and filter('primary_aggregation_type', 'true')
    signal = data('avg_cpu_percent', filter=base_filtering and ${module.filtering.signalflow}, rollup='max')${var.cpu_aggregation_function}${var.cpu_transformation_function}.publish('signal')
    detect(when(signal > ${var.cpu_threshold_critical}%{if var.cpu_lasting_duration_critical != null}, lasting='${var.cpu_lasting_duration_critical}', at_least=${var.cpu_at_least_percentage_critical}%{endif})).publish('CRIT')
    detect(when(signal > ${var.cpu_threshold_major}%{if var.cpu_lasting_duration_major != null}, lasting='${var.cpu_lasting_duration_major}', at_least=${var.cpu_at_least_percentage_major}%{endif}) and (not when(signal > ${var.cpu_threshold_critical}%{if var.cpu_lasting_duration_critical != null}, lasting='${var.cpu_lasting_duration_critical}', at_least=${var.cpu_at_least_percentage_critical}%{endif}))).publish('MAJOR')
EOF

  rule {
    description           = "is too high > ${var.cpu_threshold_critical}%"
    severity              = "Critical"
    detect_label          = "CRIT"
    disabled              = coalesce(var.cpu_disabled_critical, var.cpu_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.cpu_notifications, "critical", []), var.notifications.critical), null)
    runbook_url           = try(coalesce(var.cpu_runbook_url, var.runbook_url), "")
    tip                   = var.cpu_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  rule {
    description           = "is too high > ${var.cpu_threshold_major}%"
    severity              = "Major"
    detect_label          = "MAJOR"
    disabled              = coalesce(var.cpu_disabled_major, var.cpu_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.cpu_notifications, "major", []), var.notifications.major), null)
    runbook_url           = try(coalesce(var.cpu_runbook_url, var.runbook_url), "")
    tip                   = var.cpu_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  max_delay = var.cpu_max_delay
}

resource "signalfx_detector" "storage_usage" {
  name = format("%s %s", local.detector_name_prefix, "Azure SQL Managed Instances storage usage")

  authorized_writer_teams = var.authorized_writer_teams
  teams                   = try(coalescelist(var.teams, var.authorized_writer_teams), null)
  tags                    = compact(concat(local.common_tags, local.tags, var.extra_tags))

  viz_options {
    label        = "signal"
    value_suffix = "Megabyte"
  }

  program_text = <<-EOF
    base_filtering = filter('resource_type', 'Microsoft.Sql/managedInstances') and filter('primary_aggregation_type', 'true')
    signal = data('storage_space_used_mb', filter=base_filtering and ${module.filtering.signalflow})${var.storage_usage_aggregation_function}${var.storage_usage_transformation_function}.publish('signal')
    detect(when(signal > ${var.storage_usage_threshold_critical}%{if var.storage_usage_lasting_duration_critical != null}, lasting='${var.storage_usage_lasting_duration_critical}', at_least=${var.storage_usage_at_least_percentage_critical}%{endif})).publish('CRIT')
    detect(when(signal > ${var.storage_usage_threshold_major}%{if var.storage_usage_lasting_duration_major != null}, lasting='${var.storage_usage_lasting_duration_major}', at_least=${var.storage_usage_at_least_percentage_major}%{endif}) and (not when(signal > ${var.storage_usage_threshold_critical}%{if var.storage_usage_lasting_duration_critical != null}, lasting='${var.storage_usage_lasting_duration_critical}', at_least=${var.storage_usage_at_least_percentage_critical}%{endif}))).publish('MAJOR')
EOF

  rule {
    description           = "is too high > ${var.storage_usage_threshold_critical}Megabyte"
    severity              = "Critical"
    detect_label          = "CRIT"
    disabled              = coalesce(var.storage_usage_disabled_critical, var.storage_usage_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.storage_usage_notifications, "critical", []), var.notifications.critical), null)
    runbook_url           = try(coalesce(var.storage_usage_runbook_url, var.runbook_url), "")
    tip                   = var.storage_usage_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  rule {
    description           = "is too high > ${var.storage_usage_threshold_major}Megabyte"
    severity              = "Major"
    detect_label          = "MAJOR"
    disabled              = coalesce(var.storage_usage_disabled_major, var.storage_usage_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.storage_usage_notifications, "major", []), var.notifications.major), null)
    runbook_url           = try(coalesce(var.storage_usage_runbook_url, var.runbook_url), "")
    tip                   = var.storage_usage_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  max_delay = var.storage_usage_max_delay
}
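
The storage usage configuration defines no `lasting_duration`, and the `%{if ... != null}` directives in its program text only add `lasting` and `at_least` to `when()` when a duration is set, so presumably these rules fire as soon as the signal crosses the threshold. A sketch that requires the condition to hold over a window, with illustrative values. In SignalFlow, `at_least` is a fraction between 0 and 1, so `0.9` means 90% of the window.

```hcl
module "signalfx-detectors-integration-azure-sql-managed-instances" {
  # source, environment, notifications and storage thresholds as in the usage example

  # Critical rule renders as: when(signal > <threshold>, lasting='30m', at_least=0.9)
  storage_usage_lasting_duration_critical    = "30m"
  storage_usage_at_least_percentage_critical = 0.9

  # Major rule, same window
  storage_usage_lasting_duration_major    = "30m"
  storage_usage_at_least_percentage_major = 0.9
}
```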

10 changes: 10 additions & 0 deletions modules/integration_azure-sql-managed-instances/outputs.tf
@@ -0,0 +1,10 @@
output "cpu" {
  description = "Detector resource for cpu"
  value       = signalfx_detector.cpu
}

output "storage_usage" {
  description = "Detector resource for storage_usage"
  value       = signalfx_detector.storage_usage
}
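
Since these outputs expose the whole detector resources, a calling stack can read their attributes. A minimal sketch, assuming the module instance name from the usage example; the output names are hypothetical, while `id` and `url` are attributes exported by the `signalfx_detector` resource.

```hcl
# Link to the CPU detector in the Splunk Observability UI
output "azure_sql_mi_cpu_detector_url" {
  value = module.signalfx-detectors-integration-azure-sql-managed-instances.cpu.url
}

# ID of the storage usage detector, e.g. for use in other resources
output "azure_sql_mi_storage_usage_detector_id" {
  value = module.signalfx-detectors-integration-azure-sql-managed-instances.storage_usage.id
}
```
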

3 changes: 3 additions & 0 deletions modules/integration_azure-sql-managed-instances/tags.tf
@@ -0,0 +1,3 @@
locals {
  tags = ["integration", "azure-sql-managed-instances"]
}
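
Each detector builds its tags with `compact(concat(local.common_tags, local.tags, var.extra_tags))`, so the module tags above are combined with the shared common tags and any `extra_tags` you pass. Likewise, `teams` falls back to `authorized_writer_teams` when it is empty. A sketch with placeholder values; both variables are assumed to be lists of strings.

```hcl
module "signalfx-detectors-integration-azure-sql-managed-instances" {
  # source, environment, notifications and storage thresholds as in the usage example

  # Appended after the common tags and ["integration", "azure-sql-managed-instances"]
  extra_tags = ["team-dba"]

  # Placeholder team ID; also used as teams when var.teams is empty
  authorized_writer_teams = ["PLACEHOLDER_TEAM_ID"]
}
```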