Commit cefc13e

Update index pages for EDOT troubleshooting documentation (#4184)
## Summary

This PR audits and improves the IA and interlinking of the EDOT troubleshooting documentation. It rewrites the original index pages and adds contextual links to improve navigation.

Closes #3426

## Generative AI disclosure

Did you use a generative AI (GenAI) tool to assist in creating this contribution?

- [x] Yes
- [ ] No

If you answered "Yes" to the previous question, please specify the tool(s) and model(s) used (e.g., Google Gemini, OpenAI ChatGPT-4, etc.).

Tool(s) and model(s) used: Cursor AI (Claude Sonnet 4.5)
1 parent 038cd77 commit cefc13e

26 files changed: 163 additions & 74 deletions

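Most hunks in this commit replace page-relative links (such as `contact-support.md` or `../edot-sdks/enable-debug-logging.md`) with root-relative paths that start with `/troubleshoot/ingest/opentelemetry/`. That rewrite is ordinary POSIX path resolution against the directory of the page that contains the link. The following Python sketch reproduces the mapping, which can help when checking or scripting similar changes. `to_root_relative` is a hypothetical helper, not part of the docs build tooling, and it assumes page paths are given relative to the docs repository root, as in the file headers of this commit. Links with a scheme, such as the cross-repository `opentelemetry://` links, are left untouched.

```python
import posixpath


def to_root_relative(page_path: str, link: str) -> str:
    """Resolve a link found in page_path to a root-relative docs path.

    page_path is relative to the docs repository root, for example
    "troubleshoot/ingest/opentelemetry/connectivity.md".
    """
    # Leave links with a scheme (https://, opentelemetry://) and links that
    # are already root-relative unchanged.
    if "://" in link or link.startswith("/"):
        return link
    # Resolve against the directory containing the page, collapsing "..".
    base_dir = posixpath.dirname(page_path)
    resolved = posixpath.normpath(posixpath.join(base_dir, link))
    if resolved.startswith(".."):
        raise ValueError(f"{link!r} escapes the docs root from {page_path!r}")
    return "/" + resolved


if __name__ == "__main__":
    pairs = [
        ("troubleshoot/ingest/opentelemetry/429-errors-motlp.md", "contact-support.md"),
        ("troubleshoot/ingest/opentelemetry/connectivity.md",
         "../opentelemetry/edot-sdks/proxy.md"),
        ("troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md",
         "../edot-sdks/enable-debug-logging.md"),
    ]
    for page, link in pairs:
        print(f"{page}: {link} -> {to_root_relative(page, link)}")
```

The three sample pairs are taken from hunks in this commit and resolve to the same root-relative targets that the commit writes out by hand.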
troubleshoot/ingest/opentelemetry/429-errors-motlp.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -51,7 +51,7 @@ A 429 status means that the rate of requests sent to the Managed OTLP endpoint h
 Refer to the [Rate limiting section](opentelemetry://reference/motlp.md#rate-limiting) in the mOTLP reference documentation for details.
 
 * In {{ech}}, the {{es}} capacity for your deployment might be underscaled for the current ingest rate.
-* In {{serverless-full}}, rate limiting should not result from {{es}} capacity, since the platform automatically scales ingest capacity. If you suspect a scaling issue, [contact Elastic Support](contact-support.md).
+* In {{serverless-full}}, rate limiting should not result from {{es}} capacity, since the platform automatically scales ingest capacity. If you suspect a scaling issue, [contact Elastic Support](/troubleshoot/ingest/opentelemetry/contact-support.md).
 * Multiple Collectors or SDKs are sending data concurrently without load balancing or backoff mechanisms.
 
 ## Resolution
@@ -62,7 +62,7 @@ To resolve 429 errors, identify whether the bottleneck is caused by ingest limit
 
 If you’ve confirmed that your ingest configuration is stable but still encounter 429 errors:
 
-* {{serverless-full}}: [Contact Elastic Support](contact-support.md) to request an increase in ingest limits.
+* {{serverless-full}}: [Contact Elastic Support](/troubleshoot/ingest/opentelemetry/contact-support.md) to request an increase in ingest limits.
 * {{ech}} (ECH): Increase your {{es}} capacity by scaling or resizing your deployment:
 * [Scaling considerations](../../../deploy-manage/production-guidance/scaling-considerations.md)
 * [Resize deployment](../../../deploy-manage/deploy/cloud-enterprise/resize-deployment.md)
@@ -106,7 +106,7 @@ exporters:
 enabled: true
 ```
 
-This ensures the Collector buffers data locally while waiting for the ingest endpoint to recover from throttling.
+This ensures the Collector buffers data locally while waiting for the ingest endpoint to recover from throttling. For more information on export failures and queue configuration, refer to [Export failures when sending telemetry data](/troubleshoot/ingest/opentelemetry/edot-collector/trace-export-errors.md).
 
 ## Best practices
 
````

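The 429 guidance relies on the Collector buffering and retrying instead of resending immediately, and it lists missing backoff as one cause of throttling. As a rough illustration of how backoff spreads retries out, the following Python sketch computes a capped exponential retry schedule. The parameter names are loosely modeled on the Collector's `retry_on_failure` settings, but `backoff_schedule` is hypothetical, omits the random jitter that real implementations add, and the example values are illustrative rather than the Collector's defaults.

```python
def backoff_schedule(
    initial_interval: float,
    multiplier: float,
    max_interval: float,
    max_elapsed_time: float,
) -> list[float]:
    """Return the successive wait times (seconds) before each retry.

    Each wait grows by `multiplier` and is capped at `max_interval`; retries
    stop once the cumulative wait would exceed `max_elapsed_time`.
    Jitter is deliberately omitted so the output is deterministic.
    """
    if initial_interval <= 0 or max_interval <= 0 or multiplier < 1:
        raise ValueError("intervals must be positive and multiplier >= 1")
    waits: list[float] = []
    elapsed = 0.0
    interval = initial_interval
    while True:
        wait = min(interval, max_interval)
        if elapsed + wait > max_elapsed_time:
            break  # giving up here is when data would be dropped
        waits.append(wait)
        elapsed += wait
        interval *= multiplier
    return waits
```

For example, `backoff_schedule(5, 1.5, 30, 120)` returns `[5, 7.5, 11.25, 16.875, 25.3125, 30]`: waits grow until they reach the 30-second cap, and retries stop before the 120-second budget is exceeded. Real implementations randomize each wait so that many Collectors throttled at the same moment do not retry in lockstep.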
troubleshoot/ingest/opentelemetry/connectivity.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -2,7 +2,7 @@
 navigation_title: Connectivity issues
 description: Troubleshoot connectivity issues between EDOT SDKs, the EDOT Collector, and Elastic.
 applies_to:
-serverless: all
+serverless: ga
 product:
 edot_collector: ga
 products:
@@ -75,14 +75,14 @@ Connectivity errors usually trace back to one of the following issues:
 Errors can look similar whether they come from an SDK or the Collector. Identifying the source helps you isolate the problem.
 
 :::{note}
-Note: Some SDKs support setting a proxy directly (for example, using `HTTPS_PROXY`). Refer to [Proxy settings for EDOT SDKs](../opentelemetry/edot-sdks/proxy.md) for details.
+Note: Some SDKs support setting a proxy directly (for example, using `HTTPS_PROXY`). Refer to [Proxy settings for EDOT SDKs](/troubleshoot/ingest/opentelemetry/edot-sdks/proxy.md) for details.
 :::
 
 #### SDK
 
 Application logs report failures when the SDK cannot send data to the Collector or directly to Elastic. These often appear as `connection refused` or `timeout` messages. If seen, verify that the Collector endpoint is reachable.
 
-For guidance on enabling logs in your SDK, see [Enable SDK debug logging](../opentelemetry/edot-sdks/enable-debug-logging.md).
+For guidance on enabling logs in your SDK, refer to [Enable SDK debug logging](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md).
 
 Example (Java SDK):
 
@@ -154,6 +154,6 @@ If basic checks and configuration look correct but issues persist, collect more
 
 * Review proxy settings. For more information, refer to [Proxy settings](opentelemetry://reference/edot-collector/config/proxy.md).
 
-* If ports are confirmed open but errors persist, [enable debug logging in the SDK](../opentelemetry/edot-sdks/enable-debug-logging.md) or [in the Collector](../opentelemetry/edot-collector/enable-debug-logging.md) for more detail.
+* If ports are confirmed open but errors persist, [enable debug logging in the SDK](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md) or [in the Collector](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md) for more detail.
 
 * Contact your network administrator with test results if you suspect firewall restrictions.
````

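The connectivity page asks you to verify that the Collector endpoint is reachable and that the relevant ports are open. Where `telnet` or `nc` is not installed, a minimal Python sketch of a TCP reachability check can stand in. It only confirms that a TCP handshake succeeds; it does not validate TLS, authentication, or the OTLP protocol itself. The function name is hypothetical, and 4317 (the conventional OTLP/gRPC port) is used only as an example.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError
        # subclass), and DNS resolution failures (socket.gaierror).
        return False
```

For example, `tcp_reachable("localhost", 4317)` returns `False` if nothing is listening on the OTLP/gRPC port of the local host. A `False` result from an application host combined with `True` from the Collector host itself usually points at a firewall or routing problem rather than the Collector.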
troubleshoot/ingest/opentelemetry/contact-support.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -77,7 +77,7 @@ To help Elastic Support investigate the problem efficiently, please include the
 
 ### Logs and diagnostics
 
-* Recent Collector logs with relevant errors or warning messages
+* Recent Collector logs with relevant errors or warning messages. For guidance on enabling debug logging, refer to [Enable debug logging for the EDOT Collector](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md) or [Enable debug logging for EDOT SDKs](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md).
 * Output from:
 
 ```bash
@@ -92,7 +92,7 @@ To help Elastic Support investigate the problem efficiently, please include the
 
 ### Data and UI symptoms
 
-* Are traces, metrics, or logs missing from the UI?
+* Are traces, metrics, or logs missing from the UI? For troubleshooting steps, refer to [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md) or [No application-level telemetry visible in {{kib}}](/troubleshoot/ingest/opentelemetry/edot-sdks/missing-app-telemetry.md).
 * Are you using the [Elastic Managed OTLP endpoint](https://www.elastic.co/docs/observability/apm/otel/managed-otel-ingest/)?
 * If data is missing or incomplete, consider enabling the [debug exporter](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/debugexporter/README.md) to inspect the raw signal data emitted by the Collector.
 
````

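The support checklist asks for recent Collector logs with relevant errors or warning messages. When logs are long, it can help to extract just those entries before attaching them. The following Python sketch filters lines by level. It assumes console-formatted logs in which the level appears as the first or second whitespace-separated token (after an optional timestamp), as in the `error internal/queue_sender.go:128 Exporting failed. Dropping data.` line quoted on the Kube-Stack page; if your deployment emits JSON logs, parse the `level` field instead. `relevant_lines` is a hypothetical helper, and the sample lines are synthetic.

```python
from typing import Iterable, Iterator

# Levels treated as worth sending to support (warn and more severe).
RELEVANT_LEVELS = {"warn", "error", "dpanic", "panic", "fatal"}


def relevant_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines whose level is warn or more severe."""
    for line in lines:
        tokens = line.split()
        # Only inspect the leading tokens so that an info message that merely
        # mentions the word "error" is not picked up.
        if any(tok.lower() in RELEVANT_LEVELS for tok in tokens[:2]):
            yield line.rstrip("\n")


if __name__ == "__main__":
    sample = [  # synthetic lines for illustration only
        "2024-05-01T10:00:00Z\tinfo\tEverything is ready.",
        "2024-05-01T10:00:05Z\terror\tExporting failed. Dropping data.",
        "warn\tsending queue is filling up",
    ]
    for line in relevant_lines(sample):
        print(line)
```

In practice you would pass an open log file, for example `relevant_lines(open("collector.log"))`, and keep a few lines of surrounding context when sharing the result.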
troubleshoot/ingest/opentelemetry/edot-collector/collector-not-starting.md

Lines changed: 4 additions & 2 deletions

````diff
@@ -66,7 +66,7 @@ If you're deploying the EDOT Collector in a standalone configuration, try to:
 ./otelcol --set=service.telemetry.logs.level=debug
 ```
 
-This is especially helpful for diagnosing configuration parsing issues or startup errors.
+This is especially helpful for diagnosing configuration parsing issues or startup errors. For more information on enabling debug logging, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).
 
 
 * Confirm required components are defined
@@ -95,7 +95,7 @@ If you're deploying the EDOT Collector in a standalone configuration, try to:
 lsof -i :4317
 ```
 
-If needed, adjust your configuration or free up the port.
+If needed, adjust your configuration or free up the port. For network connectivity issues, refer to [Connectivity issues](/troubleshoot/ingest/opentelemetry/connectivity.md).
 
 ### Kubernetes EDOT Collector
 
@@ -117,6 +117,8 @@ If you're deploying the EDOT Collector using the Elastic Helm charts, try to:
 
 Common issues include volume mount errors, image pull failures, or misconfigured environment variables.
 
+If the Collector starts but no data appears in {{kib}}, refer to [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md) for additional troubleshooting steps.
+
 ## Resources
 
 * [Collector configuration documentation](https://opentelemetry.io/docs/collector/configuration/)
````

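The startup page uses `lsof -i :4317` to find port conflicts. On hosts without `lsof`, a rough equivalent is to try binding the port yourself: if the bind fails with "address already in use", another process (possibly a second Collector) already holds it. The following Python sketch is hypothetical and checks a single address only, so test the same address the Collector is configured to listen on.

```python
import errno
import socket

# errno values that mean "address already in use" (POSIX, plus Windows if defined).
_ADDR_IN_USE = {errno.EADDRINUSE, getattr(errno, "WSAEADDRINUSE", errno.EADDRINUSE)}


def port_in_use(host: str, port: int) -> bool:
    """Return True if binding host:port fails because the address is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # No SO_REUSEADDR: the bind should fail if anything holds the port.
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno in _ADDR_IN_USE:
                return True
            raise  # other errors, such as permission denied, are re-raised
    return False
```

For example, `port_in_use("127.0.0.1", 4317)` returns `True` while another listener owns that port. Unlike `lsof`, this check does not tell you which process holds the port.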
troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md

Lines changed: 4 additions & 0 deletions

````diff
@@ -17,6 +17,8 @@ products:
 
 If your EDOT Collector pods terminate with an `OOMKilled` status, this usually indicates sustained memory pressure or potentially a memory leak due to an introduced regression or a bug. You can use the Performance Profiler (`pprof`) extension to collect and analyze memory profiles, helping you identify the root cause of the issue.
 
+If you're running the Collector in Kubernetes and experiencing resource allocation issues, refer to [Insufficient resources in Kubernetes](/troubleshoot/ingest/opentelemetry/edot-collector/insufficient-resources-kubestack.md) for troubleshooting steps.
+
 ## Symptoms
 
 These symptoms typically indicate that the EDOT Collector is experiencing a memory-related failure:
@@ -25,6 +27,8 @@ These symptoms typically indicate that the EDOT Collector is experiencing a memo
 - Memory usage steadily increases before the crash.
 - The Collector's logs don't show clear errors before termination.
 
+For more detailed diagnostics, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).
+
 ## Resolution
 
 Turn on runtime profiling using the `pprof` extension and then gather memory heap profiles from the affected pod:
````

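The Resolution step gathers heap profiles through the `pprof` extension, which serves Go's standard `net/http/pprof` handlers, so a heap profile is exposed at the `/debug/pprof/heap` path of the extension's endpoint. The following Python sketch downloads one. It assumes the extension listens on `localhost:1777` (its usual default; confirm the `endpoint` setting in your configuration) and that the port has been forwarded from the affected pod, for example with `kubectl port-forward <pod-name> 1777:1777`. `save_heap_profile` is a hypothetical helper.

```python
import urllib.request
from pathlib import Path

# Assumption: the pprof extension is reachable here through a port-forward.
DEFAULT_PPROF_BASE_URL = "http://localhost:1777"


def save_heap_profile(
    dest: Path, base_url: str = DEFAULT_PPROF_BASE_URL, timeout: float = 30.0
) -> int:
    """Fetch /debug/pprof/heap from base_url, write it to dest, return its size."""
    url = base_url.rstrip("/") + "/debug/pprof/heap"
    # An empty ProxyHandler bypasses any HTTP proxy set in the environment,
    # which would otherwise intercept requests to a local port-forward.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # Non-2xx responses raise urllib.error.HTTPError.
    with opener.open(url, timeout=timeout) as response:
        data = response.read()
    dest.write_bytes(data)
    return len(data)
```

For example, call `save_heap_profile(Path("heap-1.pprof"))`, wait while memory grows, call it again for `heap-2.pprof`, and compare the two with `go tool pprof -base heap-1.pprof heap-2.pprof` to see which allocations keep increasing.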
troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -88,4 +88,4 @@ Debug logging for the Collector is not currently configurable through {{fleet}}.
 
 ## Resources
 
-To learn how to enable debug logging for the EDOT SDKs, refer to [Enable debug logging for EDOT SDKs](../edot-sdks/enable-debug-logging.md).
+To learn how to enable debug logging for the EDOT SDKs, refer to [Enable debug logging for EDOT SDKs](/troubleshoot/ingest/opentelemetry/edot-sdks/enable-debug-logging.md).
````

troubleshoot/ingest/opentelemetry/edot-collector/index.md

Lines changed: 33 additions & 15 deletions

````diff
@@ -13,18 +13,36 @@ products:
 
 # Troubleshoot the EDOT Collector
 
-Perform these checks when troubleshooting common Collector issues:
-
-* Check logs: Review the Collector’s logs for error messages.
-* Validate configuration: Use the `--dry-run` option to test configurations.
-* Enable debug logging: Run the Collector with `--log-level=debug` for detailed logs.
-* Check service status: Ensure the Collector is running with `systemctl status <collector-service>` (Linux) or `tasklist` (Windows).
-* Test connectivity: Use `telnet <endpoint> <port>` or `curl` to verify backend availability.
-* Check open ports: Run netstat `-tulnp or lsof -i` to confirm the Collector is listening.
-* Monitor resource usage: Use top/htop (Linux) or Task Manager (Windows) to check CPU & memory.
-* Validate exporters: Ensure exporters are properly configured and reachable.
-* Verify pipelines: Use `otelctl` diagnose (if available) to check pipeline health.
-* Check permissions: Ensure the Collector has the right file and network permissions.
-* Review recent changes: Roll back recent config updates if the issue started after changes.
-
-For in-depth details on troubleshooting refer to the [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/).
+Use the topics in this section to troubleshoot issues with the EDOT Collector.
+
+If you're not sure where to start, review the Collector's logs for error messages and validate your configuration using the `--dry-run` option. For more detailed diagnostics, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).
+
+## Resource issues
+
+* [Collector out of memory](/troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md): Diagnose and resolve out-of-memory issues in the EDOT Collector using Go's Performance Profiler.
+
+* [Insufficient resources in {{k8s}}](/troubleshoot/ingest/opentelemetry/edot-collector/insufficient-resources-kubestack.md): Troubleshoot resource allocation issues when running the EDOT Collector in {{k8s}} environments.
+
+## Configuration issues
+
+* [Collector doesn't start](/troubleshoot/ingest/opentelemetry/edot-collector/collector-not-starting.md): Resolve startup failures caused by invalid configuration, port conflicts, or missing components.
+
+* [Missing or incomplete traces due to Collector sampling](/troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md): Troubleshoot missing or incomplete traces caused by sampling configuration.
+
+* [Collector doesn't propagate client metadata](/troubleshoot/ingest/opentelemetry/edot-collector/metadata.md): Learn why the Collector doesn't extract custom attributes and how to propagate such values using EDOT SDKs.
+
+## Connectivity and export issues
+
+* [Export failures when sending telemetry data](/troubleshoot/ingest/opentelemetry/edot-collector/trace-export-errors.md): Resolve export failures caused by `sending_queue` overflow and {{es}} exporter timeouts.
+
+## Debugging
+
+* [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md): Learn how to enable debug logging for the EDOT Collector in supported environments.
+
+## See also
+
+* [EDOT SDKs troubleshooting](/troubleshoot/ingest/opentelemetry/edot-sdks/index.md): For end-to-end issues that may involve both the Collector and SDKs.
+
+* [Troubleshoot EDOT](/troubleshoot/ingest/opentelemetry/index.md): Overview of all EDOT troubleshooting resources.
+
+For in-depth details on troubleshooting, refer to the contrib [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/).
````

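Both the old and the new index text start with configuration validation, and the startup page asks you to confirm that required components are defined. One mistake that stops the Collector from starting is a pipeline that references a receiver, processor, or exporter that is not declared in the top-level section of the same name. The following Python sketch performs only that cross-reference check on a configuration already loaded into a dictionary (for example with a YAML parser, which is not part of the Python standard library). `undefined_components` is hypothetical and far narrower than the Collector's own validation; it treats connectors as valid exporters and receivers because a connector links two pipelines.

```python
from typing import Any


def undefined_components(config: dict[str, Any]) -> list[str]:
    """List pipeline references that have no matching top-level definition.

    Only cross-references are checked; component settings are not validated.
    """
    def declared(section: str) -> set[str]:
        # YAML sections such as `receivers:` may load as None when empty.
        return set((config.get(section) or {}).keys())

    connectors = declared("connectors")
    allowed = {
        # A connector is an exporter in one pipeline and a receiver in another.
        "receivers": declared("receivers") | connectors,
        "processors": declared("processors"),
        "exporters": declared("exporters") | connectors,
    }
    problems: list[str] = []
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for pipeline_id, pipeline in pipelines.items():
        for kind, known in allowed.items():
            for component_id in (pipeline or {}).get(kind) or []:
                if component_id not in known:
                    # kind[:-1] turns "receivers" into "receiver", and so on.
                    problems.append(
                        f"{pipeline_id}: {kind[:-1]} '{component_id}' is not defined"
                    )
    return problems
```

For example, a `traces` pipeline that lists `memory_limiter` under `processors` without a matching `processors.memory_limiter` entry produces `traces: processor 'memory_limiter' is not defined`.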
troubleshoot/ingest/opentelemetry/edot-collector/insufficient-resources-kubestack.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -25,6 +25,8 @@ These symptoms are common when the Kube-Stack chart is deployed with insufficien
 - Cluster or Daemon pods are unable to export data to the Gateway collector due being `OOMKilled` (high memory usage).
 - Pods have logs similar to: `error internal/queue_sender.go:128 Exporting failed. Dropping data.`
 
+For detailed diagnostics on OOMKilled issues, refer to [Collector out of memory](/troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md). For more information on enabling debug logging, refer to [Enable debug logging](/troubleshoot/ingest/opentelemetry/edot-collector/enable-debug-logging.md).
+
 ## Resolution
 
 Follow these steps to resolve the issue.
````

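Pods in this state drop data because exporter buffering is bounded: when a downstream Collector or backend cannot keep up, batches are eventually discarded rather than buffered indefinitely. The following Python sketch models one such path, a fixed-capacity queue that rejects new items when full and counts the drops. It is a conceptual illustration, not the Collector's sending-queue implementation, and `BoundedQueue` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class BoundedQueue(Generic[T]):
    """Fixed-capacity FIFO that rejects new items instead of growing."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.dropped = 0
        self._items: Deque[T] = deque()

    def offer(self, item: T) -> bool:
        """Enqueue item; return False and count a drop if the queue is full."""
        if len(self._items) >= self.capacity:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    def poll(self) -> Optional[T]:
        """Remove and return the oldest item, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    q: BoundedQueue[str] = BoundedQueue(capacity=2)
    # The producer outpaces the consumer, so the third batch is rejected.
    accepted = [q.offer(f"batch-{i}") for i in range(3)]
    print(accepted, "dropped:", q.dropped)
```

A larger capacity only delays the drops when the consumer is persistently slower than the producer, which is why addressing the underlying resource shortage matters more than queue size alone.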
troubleshoot/ingest/opentelemetry/edot-collector/metadata.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -62,7 +62,7 @@ This will not work, as the Collector doesn't automatically extract such values f
 
 ## Resolution
 
-If you want to propagate customer IDs or project names into spans or metrics, you must instrument this in your code using one of the SDKs.
+If you want to propagate customer IDs or project names into spans or metrics, you must instrument this in your code using one of the SDKs. For SDK-specific troubleshooting guidance, refer to [EDOT SDKs troubleshooting](/troubleshoot/ingest/opentelemetry/edot-sdks/index.md).
 
 Use `span.set_attribute` in your application code, where OpenTelemetry spans are created. For example:
 
````

troubleshoot/ingest/opentelemetry/edot-collector/misconfigured-sampling-collector.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -2,7 +2,7 @@
 navigation_title: Collector sampling issues
 description: Learn how to troubleshoot missing or incomplete traces in the EDOT Collector caused by sampling configuration.
 applies_to:
-serverless: all
+serverless: ga
 product:
 edot_collector: ga
 products:
@@ -12,11 +12,11 @@ products:
 
 # Missing or incomplete traces due to Collector sampling
 
-If traces or spans are missing in {{kib}}, the issue might be related to the Collectors sampling configuration.
+If traces or spans are missing in {{kib}}, the issue might be related to the Collector's sampling configuration. For general troubleshooting when no data appears in {{kib}}, refer to [No data visible in {{kib}}](/troubleshoot/ingest/opentelemetry/no-data-in-kibana.md).
 
 {applies_to}`stack: ga 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped.
 
-Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. See [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) for more information.
+Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. Refer to [Missing or incomplete traces due to SDK sampling](/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md) for more information.
 
 ## Symptoms
 
@@ -79,4 +79,4 @@ Follow these steps to resolve sampling configuration issues:
 
 - [Tail sampling processor (Collector)](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor)
 - [OpenTelemetry sampling concepts - contrib documentation](https://opentelemetry.io/docs/concepts/sampling/)
-- [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md)
+- [Missing or incomplete traces due to SDK sampling](/troubleshoot/ingest/opentelemetry/edot-sdks/misconfigured-sampling-sdk.md)
````

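Tail-based sampling decides per trace, after the spans of that trace have been collected. The following Python sketch shows the shape of such a decision with two rules: keep traces that contain an error, and keep traces slower than a threshold. The rules are loosely analogous to the `status_code` and `latency` policy types of the tail sampling processor, but the names and thresholds are hypothetical, and the real processor also waits a configurable `decision_wait` for spans to arrive and supports many more policy types. The sketch makes the failure mode on this page concrete: any trace that matches no rule is discarded, so overly strict policies silently drop traces.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    start_ms: int
    end_ms: int
    is_error: bool = False


def keep_trace(spans: list[Span], latency_threshold_ms: int) -> bool:
    """Decide whether to keep one complete trace (all spans share a trace_id)."""
    if not spans:
        return False
    # Rule 1: keep any trace that contains an error span.
    if any(s.is_error for s in spans):
        return True
    # Rule 2: keep traces whose overall duration exceeds the threshold.
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    return duration > latency_threshold_ms


def sample(spans: list[Span], latency_threshold_ms: int) -> set[str]:
    """Group buffered spans by trace and return the IDs of traces to keep."""
    by_trace: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)
    return {
        trace_id
        for trace_id, group in by_trace.items()
        if keep_trace(group, latency_threshold_ms)
    }
```

With a 500 ms threshold, a fast trace without errors is dropped while an erroring trace and a 700 ms trace are kept; raising the threshold to 1000 ms also drops the 700 ms trace, which is how a tightened latency policy can make expected traces disappear from {{kib}}.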