Guidance
Here is a guide to using the detectors from this repository. Keep in mind this collection of detectors is split into separate Terraform modules, all based on the same model but each different, and all using the detector resource from the SignalFx provider.
This repository is a collection of fully independent and autonomous modules, without any "root" module as an entry point. However, from the point of view of the Terraform registry, a repository is a module which could have (and use) several sub-modules. To match this logic we created a fake root module, and all detector modules are available and listed as sub-modules of this repository in the registry.
To use a module, simply follow its local README.md file. It should contain all the information required to set it up correctly in your existing Terraform stack, including:
- a guideline common to every module
- some generic resources like related docs, contextualized to the module
- specific notes and requirements depending on the purpose and dependencies of the module
Please explore the modules list, pick the modules fitting your needs, and follow their instructions to import and use them.
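As a sketch of what importing one of these sub-modules can look like (the source path, version constraint, and input names below are illustrative assumptions; the chosen module's README.md and variables.tf are authoritative):

```hcl
# Illustrative only: importing one detector sub-module from the Terraform
# registry using the "//" sub-module syntax. Verify the real module path,
# version, and inputs in the module's own README.md.
module "system_detectors" {
  source  = "claranet/detectors/signalfx//modules/smart-agent_system-common"
  version = "~> 4.0"

  environment = "production"

  # Notification recipients per severity (exact format depends on the
  # SignalFx provider's notification string syntax).
  notifications = {
    critical = ["PagerDuty,credentialId"]
    major    = ["Slack,credentialId,channelName"]
    minor    = []
    warning  = []
    info     = []
  }
}
```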
Keep in mind that one module does not always correspond to one "target" to monitor. This is true for a basic use case like nginx, for example, but sometimes a service is split into multiple modules for different known use cases and for better flexibility. Indeed, you can compose your monitoring with one, some, or all of the available "fragments" depending on your situation. There are plenty of reasons to do that:
- reduce a big service into small pieces: Kubernetes, for example, is a huge one. It provides a plethora of features which are often not all used, or some metrics could simply not be available depending on the installation. In this case and for others we have a `common` module which should work for every situation, and which therefore does not contain any detector related to the "master" nodes and control plane components, which are simply not visible for managed Kubernetes clusters like GCP GKE.
- different versions: RDS MySQL or PostgreSQL
- handle different use cases: sometimes the same software could be used for different purposes, and depending on that you will not monitor it the same way: redis (cache, queue, database)

Every use case of a service could depend on the same data (i.e. metrics), but the point to understand is that you need to explore the available modules to select the ones matching your deployment, usage, configuration, etc.
- modules split to meet different scenarios: import each fragment depending on your needs
- follow the module README to deploy, configure, and learn about any specific information
- tagging convention
  - monitoring config should be as generic as possible and rely on metadata from sources
  - every module implements a default tagging convention (in general from user inputs)
  - the usual logic is to filter in on:
    - `sfx_monitored:true`, a common flag to enable alerting on a resource (or ignore some of them)
    - `env`, set from the module's common `environment` variable by the user
  - this could differ depending on the source of metrics, either because dimensions are prefixed (`aws_tag`...) or not collected / available (newrelic, aws vpn, gcp)
  - if constraints do not allow you to match this convention, feel free to override it with a custom one
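A minimal sketch of overriding the default convention (the `filtering_custom` input name is an assumption based on the modules' common pattern; check the module's variables.tf for the actual override mechanism):

```hcl
# Hypothetical override of the default filtering convention for a metrics
# source whose dimensions are prefixed (e.g. AWS tags). The module source
# and the "filtering_custom" variable name are illustrative assumptions.
module "nginx_detectors" {
  source = "claranet/detectors/signalfx//modules/smart-agent_nginx"

  environment = "production"

  # Default (illustrative) convention applied by the module:
  #   filter('sfx_monitored', 'true') and filter('env', 'production')
  # Custom filtering when dimensions carry an AWS tag prefix:
  filtering_custom = "filter('aws_tag_sfx_monitored', 'true') and filter('aws_tag_env', 'production')"
}
```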
- aggregation
  - do not aggregate: evaluates every single MTS separately, considering every available combination of dimension values. Advantage: applies to every reporting resource, no matter the situation and without having to know them. Drawback: sensitive to any dimension change (because of a new granularity or the disappearance of an MTS).
  - aggregate on a set of dimension(s): allows "grouping" multiple MTS into one, restricting the evaluation to these dimensions only (i.e. MTS without one of the dimension keys will be ignored). Advantage: the behavior is always the same (grouping and granularity do not change). Drawback: we must know a valid and available set of dimensions to define the right group, but they highly depend on the environment, deployment, configuration, etc.
  - do not consider dimension(s): only possible by aggregating on other ones (impossible without aggregation).
In general, "do not aggregate" is the most generic and easy way, but:
- sometimes we want to evaluate at a "higher" level (i.e. not by host but for the entire cluster)
- some use cases could be very sensitive to dimension / grouping changes (i.e. heartbeat)
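As a sketch of switching a detector's evaluation from per-MTS to per-cluster (the `aggregation_function` input name follows the usual pattern of these modules but should be verified in the module's variables.tf):

```hcl
# Illustrative: passing an aggregation to a detector module so that MTS
# are grouped before evaluation. Source path and variable name are
# assumptions; check the module's own inputs.
module "system_detectors" {
  source = "claranet/detectors/signalfx//modules/smart-agent_system-common"

  environment = "production"

  # Empty string (the usual default): do not aggregate, every MTS is
  # evaluated on its own.
  # With the line below: group by the "cluster" dimension; MTS without a
  # "cluster" dimension key are ignored.
  aggregation_function = ".mean(by=['cluster'])"
}
```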
- heartbeat
  - perfect for healthchecks, as it will fire an alert for every group which does not report anymore
  - but highly dependent on the aggregation, which defines the groups to evaluate and consider as "unhealthy"
  - "do not aggregate" makes the implementation generic but will lead to an alert for every single disappearing MTS (a simple dimension change will remove the old MTS and create a new one)
  - on the other side, the aggregation group to define is not always the same and cannot be a universal default
  - indeed, dimensions could change depending on environment and configuration, and even with the same dimensions the user could want a different alerting granularity (by host, by cluster ..)
  - as much as possible, modules do not use aggregation, which works for basic scenarios
  - some modules use aggregation because the monitor provides an irrelevant, too high granularity for heartbeat (i.e. the `database` dimension on `postgresql` would lead to an alert for every dropped database)
  - in both cases, we highly recommend defining the aggregation adapted to your scenario, depending on the available dimensions and what you expect to monitor
- it is also possible to configure
  - VM states are filtered out automatically to support downscaling on GCP, AWS and Azure
  - when an MTS (no aggregation) or a group of MTS (with aggregation) disappears and leads to a heartbeat alert, you need to wait 36h for SignalFx to consider it as inactive and stop raising alerts on it
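A sketch of adapting the heartbeat granularity to your own scenario (the `heartbeat_aggregation_function` and `heartbeat_timeframe` variable names are assumptions based on the modules' naming pattern; verify them in the module's variables.tf):

```hcl
# Illustrative: tuning the heartbeat detector so it alerts per host rather
# than per MTS (or per database, which would fire on a simple DROP
# DATABASE). Source path and variable names are assumptions.
module "postgresql_detectors" {
  source = "claranet/detectors/signalfx//modules/smart-agent_postgresql"

  environment = "production"

  # Group the heartbeat by host: an alert fires when a whole host stops
  # reporting, not when any single MTS disappears.
  heartbeat_aggregation_function = ".mean(by=['host'])"

  # How long a group must be silent before being considered down.
  heartbeat_timeframe = "25m"
}
```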
- notifications
- severities
  - levels definition / best practices
  - how to map them, with examples
- agent config tips:
  - standard deployment and others (disableHostDimensions + extraDimensions)
  - disableEndpointDimensions
  - dimensionTransformations
  - datapointsToExclude (whitelist filtering)
  - service discovery
  - kubernetes 2 deployments
- either the detector lives in a common module but is disabled by default, with a note in the readme explaining that it makes sense for this or that usage (I believe I made some like that in redis)
- or a better split of the modules (a common module which works everywhere, then one per use case)
- or a "custom" tag used in the signalflow to identify a usage type, to make sure that if the detector is enabled by default it only applies to the SQS queues identified for this use case