
Commit 960fdbc

added qtelemetry to release notes
1 parent 58eb486 commit 960fdbc

1 file changed: +106 -0 lines changed

doc/markdown/manual/release-notes/03_major_enhancements.md

Lines changed: 106 additions & 0 deletions
@@ -2,6 +2,112 @@

## v9.0.5

### qtelemetry (Developer Preview)

This release introduces **qtelemetry**, a new metrics exporter for Gridware Cluster Scheduler (GCS). It lets administrators collect and expose cluster metrics for monitoring and observability.

**Features:**

- Simple integration with Prometheus and Grafana
- Export of cluster metrics, including:
  - Host metrics (CPU load, GPU availability, memory usage, and more)
  - Job metrics (queued, running, errored, waiting time, and more)
  - qmaster statistics (CPU and memory usage of `sge_qmaster`, spooling filesystem information)
- Optional per-job metric export for detailed insights (recommended only for very small workloads)
- Built-in support for a pre-configured Grafana dashboard:
  - [Grafana dashboard example](https://grafana.com/grafana/dashboards/23208-gridware-cluster-scheduler-org/)

**Quick Start:**

By default, `qtelemetry` serves metrics on port `9464` at the `/metrics` endpoint:

```shell
./qtelemetry start
```
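
To confirm that the exporter is reachable after starting it, the endpoint can be queried with `curl`. The following is a minimal sketch that assumes `qtelemetry` runs on the local host with the default port and no authentication or TLS; `metrics_url` and `check_metrics` are hypothetical helper names, not part of qtelemetry:

```shell
# Build the scrape URL from the defaults described above
# (port 9464, path /metrics). Host and port can be overridden.
metrics_url() {
  mu_host="${1:-localhost}"
  mu_port="${2:-9464}"
  printf 'http://%s:%s/metrics\n' "$mu_host" "$mu_port"
}

# Fetch the endpoint and print the first 20 lines of the response.
# Returns non-zero if curl cannot reach the exporter within 5 seconds.
check_metrics() {
  cm_body="$(curl -fsS --max-time 5 "$(metrics_url "$@")")" || return 1
  printf '%s\n' "$cm_body" | head -n 20
}
```

Calling `check_metrics` with no arguments queries `http://localhost:9464/metrics`; `check_metrics myhost 9000` queries a different host and port.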

Enable additional metric sources with command-line flags:

```shell
# Export exec host and qmaster metrics
./qtelemetry start --enableExecd --enableMaster

# Export individual job-level metrics (for smaller systems)
./qtelemetry start --singleJobs
```

#### Advanced Configuration (Environment Variables):

- **Change the default binding address or port:**

  Set `OTEL_EXPORTER_PROMETHEUS_ENDPOINT`.

  Example:

  ```shell
  export OTEL_EXPORTER_PROMETHEUS_ENDPOINT=":9000" # port only
  export OTEL_EXPORTER_PROMETHEUS_ENDPOINT="1.2.3.4:9000" # IP and port
  ```

- **Enable Basic Authentication:**

  Set `QTELEMETRY_USERNAME` and `QTELEMETRY_PASSWORD`.

  ```shell
  export QTELEMETRY_USERNAME="your-user"
  export QTELEMETRY_PASSWORD="your-password"
  ```

- **Enable TLS:**

  Set the paths to the TLS certificate and key with `QTELEMETRY_TLS_CERT` and `QTELEMETRY_TLS_KEY`.

  ```shell
  export QTELEMETRY_TLS_CERT="/path/to/cert.pem"
  export QTELEMETRY_TLS_KEY="/path/to/key.pem"
  ```
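
When these settings are combined, a client has to use the matching address, scheme, and credentials. The sketch below derives a client URL from the same environment variables. It assumes that the endpoint is served over HTTPS once both TLS variables are set and that the Basic Authentication credentials apply to `/metrics`; neither is stated explicitly above, so verify both against your installation. `qtelemetry_url` and `qtelemetry_probe` are hypothetical helper names:

```shell
# Derive the client-side URL from the variables that configure qtelemetry.
# Assumption: HTTPS is served when both TLS variables are set.
qtelemetry_url() {
  qu_endpoint="${OTEL_EXPORTER_PROMETHEUS_ENDPOINT:-:9464}"
  case "$qu_endpoint" in
    :*) qu_endpoint="localhost$qu_endpoint" ;;  # port-only form such as ":9000"
  esac
  if [ -n "${QTELEMETRY_TLS_CERT:-}" ] && [ -n "${QTELEMETRY_TLS_KEY:-}" ]; then
    qu_scheme="https"
  else
    qu_scheme="http"
  fi
  printf '%s://%s/metrics\n' "$qu_scheme" "$qu_endpoint"
}

# Query the endpoint, sending credentials only if a username is configured.
qtelemetry_probe() {
  if [ -n "${QTELEMETRY_USERNAME:-}" ]; then
    curl -fsS --max-time 5 \
      -u "${QTELEMETRY_USERNAME}:${QTELEMETRY_PASSWORD:-}" "$(qtelemetry_url)"
  else
    curl -fsS --max-time 5 "$(qtelemetry_url)"
  fi
}
```

With a self-signed certificate, `curl` additionally needs the certificate passed through `--cacert` before it accepts the connection.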

#### Recommended Monitoring Stack Setup:

1. Start `qtelemetry` with the endpoint exposed (as above).
2. Configure Prometheus to scrape metrics from `qtelemetry`.
3. Set up Grafana and add your Prometheus instance as a data source.
4. Import the provided [Grafana dashboard](https://grafana.com/grafana/dashboards/23208-gridware-cluster-scheduler-org/) for immediate insights.
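
Step 2 can be sketched as a small Prometheus configuration. The example below writes a minimal `prometheus.yml` that scrapes a `qtelemetry` instance on the local host at the default port. The file name, job name, target, and scrape interval are illustrative assumptions rather than values prescribed by qtelemetry:

```shell
# Write a minimal Prometheus configuration for a local qtelemetry instance.
# Job name, target, and interval are placeholders; adjust them to your site.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: "qtelemetry"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:9464"]
EOF
```

Prometheus can then be started with `prometheus --config.file=prometheus.yml`. If Basic Authentication or TLS is enabled on the exporter, the scrape job would also need Prometheus' `basic_auth` and `scheme: https` settings.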

**Supported Platforms:**

- Linux (`lx-amd64` and `lx-arm64`)

**Important Note:**

This release of qtelemetry is a Developer Preview. Metric structure, naming, and availability may change in future releases based on development progress and user feedback. We strongly encourage feedback and suggestions to shape this tool's evolution.

### Out of the Box Support of various MPI Distributions

The `$SGE_ROOT/mpi` directory contains templates of the PE configuration for the following MPI distributions:
