Commit 14fb38f

Merge branch 'hpc-gridware:master' into master
2 parents e3b3401 + 111099c commit 14fb38f

9 files changed, +190 -92 lines changed


doc/markdown/manual/release-notes/03_major_enhancements.md

Lines changed: 80 additions & 7 deletions
@@ -1,14 +1,87 @@
 # Major Enhancements
 
-## qconf support to add/modify/delete/show complex entries individually
+## v9.0.1beta
+
+### Utilization of additional data stores and activation of new thread pools
+
+Beginning with patch v9.0.1 the new internal architecture of `sge_qmaster` is activated so that the component can
+utilize additional data stores by starting new thread pools.
+
+* Listener thread pool: The listener thread pool was already activated in v9.0.0. It is used to handle incoming
+requests from clients and to distribute them to the corresponding processing components. Additionally, this pool
+of threads utilizes a new data store to answer authentication requests. Beginning with v9.0.1 this data store
+is used for even more requests to relieve other internal components within `sge_qmaster`.
+
+* Reader thread pool: The reader thread pool is activated and can now utilize a corresponding data store.
+This will boost the performance of clusters in large environments where users also tend to request the status of the
+system very often by using client commands like `qstat`, `qhost` or other commands that send read-only requests
+to `sge_qmaster`. The additional data store needs to be enabled manually by setting the following qmaster parameter in the
+*qmaster_params* of the cluster configuration:
+
+```
+> qconf -mconf
+...
+qmaster_params ...,DISABLE_SECONDARY_DS_READER=false
+...
+```
+
+Please note that requests answered by the reader thread pool might deliver slightly outdated data compared to
+requests answered with data from the main data store, because both data stores can be slightly out of sync. The
+maximum deviation can be configured by setting `MAX_DS_DEVIATION` in milliseconds within the `qmaster_params`.
+
+```
+> qconf -mconf
+...
+qmaster_params ...,MAX_DS_DEVIATION=1000
+...
+```
+
+The default value is 1000 milliseconds. The value should be chosen carefully to balance the performance gain with
+the accuracy of the data.
+
+With one of the upcoming patches we will introduce an additional concept of automatic sessions that will allow
+the data stores to be synchronized more efficiently, so that client commands can be forced to get the most recent data.
+
+* Enhanced monitoring: The monitoring of `sge_qmaster` has been enhanced to provide more detailed information about
+the utilization of the different thread pools. As in the past, monitoring is enabled by setting the monitor
+time:
+
+```
+> qconf -mconf
+...
+qmaster_params ...,MONITOR_TIME=10
+...
+```
+
+`qping` will then show statistics about the handled requests per thread.
+
+```
+qping -i 1 -f <master_host> $SGE_QMASTER_PORT qmaster 1
+...
+10/11/2024 12:54:53 | reader: runs: 261.04r/s (GDI (a:0.00,g:2871.45,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s
+OTHER (ql:0))
+out: 261.04m/s APT: 0.0007s/m idle: 80.88% wait: 0.01% time: 9.99s
+10/11/2024 12:54:53 | reader: runs: 279.50r/s (GDI (a:0.00,g:3074.50,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s
+OTHER (ql:0))
+out: 279.50m/s APT: 0.0007s/m idle: 79.08% wait: 0.01% time: 10.00s
+10/11/2024 12:54:53 | listener: runs: 268.65r/s (in (g:268.34 a:0.00 e:0.00 r:0.30)/s
+GDI (g:0.00,t:0.00,p:0.00)/s)
+out: 0.00m/s APT: 0.0001s/m idle: 98.42% wait: 0.00% time: 9.99s
+10/11/2024 12:54:53 | listener: runs: 255.37r/s (in (g:255.37 a:0.00 e:0.00 r:0.00)/s GDI (g:0.00,t:0.00,p:0.00)/s)
+out: 0.00m/s APT: 0.0001s/m idle: 98.54% wait: 0.00% time: 10.00s
+```
+
+## v9.0.0
+
+### qconf support to add/modify/delete/show complex entries individually
 
 `qconf` also allows you to add, modify, delete and display complexes individually using the new `-ace`, `-Ace`,
 `-mce`, `-Mce`, `-sce` and `-scel` switches. Previously this was only possible as a group command for the whole
 complex set with `-mc`. More information can be found in the qconf(1) man page or by running `qconf -help`.
 
 (Available in Open Cluster Scheduler and Gridware Cluster Scheduler)
 
-## Added support to supplementary group IDs in user, operator and manager lists.
+### Added support to supplementary group IDs in user, operator and manager lists.
 
 In addition to user and primary group names, it is now possible to specify supplementary group IDs in user, operator,
 and manager lists. User lists can be specified in host, queue, configuration, and parallel environment objects to allow
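
The reader data store and the monitoring described in this hunk are controlled through comma-separated `NAME=value` entries in `qmaster_params`. As a rough illustration of how such a list can be interpreted, here is a minimal C++ sketch. `QmasterParams` and `parse_qmaster_params()` are hypothetical names, not the parser `sge_qmaster` actually uses. The defaults follow the release notes where they state one (reader data store off until `DISABLE_SECONDARY_DS_READER=false`, `MAX_DS_DEVIATION` of 1000 ms); treating `MONITOR_TIME=0` as "monitoring off" is an assumption.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical container for the qmaster_params entries discussed in the release notes.
struct QmasterParams {
    bool disable_secondary_ds_reader = true;  // reader data store is opt-in
    long max_ds_deviation_ms = 1000;          // documented default: 1000 ms
    long monitor_time_s = 0;                  // assumption: 0 means monitoring is off
};

// Splits "A=1,B=x,..." at commas and picks out the known keys.
// Unknown entries are ignored; malformed numbers keep the default.
QmasterParams parse_qmaster_params(const std::string &params) {
    QmasterParams result;
    std::istringstream in(params);
    std::string entry;
    while (std::getline(in, entry, ',')) {
        const auto eq = entry.find('=');
        if (eq == std::string::npos) {
            continue;  // entry without a value, not relevant here
        }
        const std::string key = entry.substr(0, eq);
        const std::string value = entry.substr(eq + 1);
        if (key == "DISABLE_SECONDARY_DS_READER") {
            // only an explicit false/0 enables the secondary reader data store
            result.disable_secondary_ds_reader = !(value == "false" || value == "FALSE" || value == "0");
        } else if (key == "MAX_DS_DEVIATION" || key == "MONITOR_TIME") {
            char *end = nullptr;
            const long number = std::strtol(value.c_str(), &end, 10);
            if (end == value.c_str() || *end != '\0' || number < 0) {
                continue;  // not a complete non-negative number
            }
            if (key == "MAX_DS_DEVIATION") {
                result.max_ds_deviation_ms = number;
            } else {
                result.monitor_time_s = number;
            }
        }
    }
    return result;
}
```

Treating anything other than an explicit `false` or `0` as "disabled" keeps the secondary reader data store off unless an administrator deliberately enables it, which matches the opt-in wording of the notes.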
@@ -30,7 +103,7 @@ in a production environment. Enabling caching services like `nscd` can help redu
 
 (Available in Gridware Cluster Scheduler only)
 
-## New internal architecture to support multiple Data Stores
+### New internal architecture to support multiple Data Stores
 
 The internal data architecture of `sge_qmaster` has been changed to support multiple data stores. This change
 does not have a major impact currently and is not visible to the user. However, it is a prerequisite for future
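
This hunk ends the section on supplementary group IDs in user, operator and manager lists, whose header line recommends caching services like `nscd`. The sketch below illustrates why: checking a user against numeric group IDs means resolving all of the user's groups, which is a name-service lookup on every call. It assumes the Linux/glibc `getgrouplist(3)` signature (on macOS the group arguments are plain `int`), and `groups_of_user()` and `user_in_gid_list()` are hypothetical helpers, not Gridware Cluster Scheduler code.

```cpp
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>

#include <algorithm>
#include <string>
#include <vector>

// Returns all group IDs of `user` (primary + supplementary) via getgrouplist(3).
// Each call may hit the name service, which is where nscd-style caching helps.
std::vector<gid_t> groups_of_user(const std::string &user) {
    const passwd *pw = getpwnam(user.c_str());
    if (pw == nullptr) {
        return {};  // unknown user
    }
    const gid_t primary = pw->pw_gid;
    int count = 16;
    std::vector<gid_t> groups(count);
    while (getgrouplist(user.c_str(), primary, groups.data(), &count) == -1) {
        // count now holds the required size; grow at least by a factor of two as a safeguard
        count = std::max(count, static_cast<int>(groups.size()) * 2);
        groups.resize(count);
    }
    groups.resize(count);  // on success, count is the number of groups found
    return groups;
}

// Hypothetical check: is the user covered by a list of numeric group IDs,
// e.g. the supplementary group IDs configured in a user or manager list?
bool user_in_gid_list(const std::string &user, const std::vector<gid_t> &gid_list) {
    const std::vector<gid_t> groups = groups_of_user(user);
    return std::any_of(groups.begin(), groups.end(), [&](gid_t gid) {
        return std::find(gid_list.begin(), gid_list.end(), gid) != gid_list.end();
    });
}
```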
@@ -40,7 +113,7 @@ enhancing the performance of the cluster in large environments.
 
 (Available in Open Cluster Scheduler and Gridware Cluster Scheduler)
 
-## New RSMAP (Resource Map) complex type
+### New RSMAP (Resource Map) complex type
 
 Resource Maps are a new complex type that allows administrators to define a list of special resources which
 are available on a host, e.g. GPU devices, networking devices, lists of network ports, or other special resources.
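
Unlike a plain numeric consumable, a resource map hands out concrete IDs such as GPU device numbers or port numbers. A minimal sketch of the per-host bookkeeping, assuming one pool of string IDs per host and complex, where a request for n units reserves n free IDs. `RsmapPool` is hypothetical, and handing out IDs in sorted order is an assumption, not the scheduler's real selection policy.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-host pool for one RSMAP complex, e.g. GPU IDs or port numbers.
class RsmapPool {
public:
    explicit RsmapPool(const std::vector<std::string> &ids) : free_(ids.begin(), ids.end()) {}

    // Reserves `amount` IDs; returns nothing if not enough IDs are free.
    // IDs are taken in sorted order (an assumption made for this sketch).
    std::optional<std::vector<std::string>> allocate(std::size_t amount) {
        if (amount > free_.size()) {
            return std::nullopt;
        }
        std::vector<std::string> granted;
        auto it = free_.begin();
        for (std::size_t i = 0; i < amount; ++i) {
            granted.push_back(*it);
            it = free_.erase(it);
        }
        return granted;
    }

    // Returns IDs to the pool, e.g. when the job finishes.
    void release(const std::vector<std::string> &ids) {
        free_.insert(ids.begin(), ids.end());
    }

    std::size_t free_count() const { return free_.size(); }

private:
    std::set<std::string> free_;  // IDs currently not granted to any job
};
```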
@@ -111,7 +184,7 @@ qrsh -l port_numbers=8 env | grep SGE_HGR
 SGE_HGR_port_number=65000 65001 65002 65003 65004 65005 65006 65007
 ```
 
-## Per HOST complex variables
+### Per HOST complex variables
 
 The definition of complex variables contains the attribute `consumable` which could so far have the following values:
 
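
The `qrsh` example at the start of this hunk shows how granted RSMAP IDs reach the job: as a space-separated list in an `SGE_HGR_<name>` environment variable. A job-side C++ sketch for reading that list; only the variable prefix and the space-separated format are taken from the example, and `granted_rsmap_ids()` is a hypothetical helper.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Reads a granted RSMAP list such as SGE_HGR_port_number="65000 65001 ..."
// and returns the individual IDs. Returns an empty vector if the variable is unset.
std::vector<std::string> granted_rsmap_ids(const std::string &complex_name) {
    const std::string var = "SGE_HGR_" + complex_name;
    const char *value = std::getenv(var.c_str());
    std::vector<std::string> ids;
    if (value == nullptr) {
        return ids;
    }
    std::istringstream in(value);
    std::string id;
    while (in >> id) {  // operator>> splits on whitespace
        ids.push_back(id);
    }
    return ids;
}
```

A job could, for example, iterate over `granted_rsmap_ids("port_number")` and bind one listener to each returned port.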
@@ -125,7 +198,7 @@ of parallel jobs multiple tasks are running on the same host, the requested amou
 once. E.g. multiple tasks of a parallel job can share the same GPU.
 
 
-## One-line JSON format for accounting and reporting files
+### One-line JSON format for accounting and reporting files
 
 The accounting and reporting files contain one line per record.
 The format of the records used to be a column-based format with a fixed number of columns,
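
The first line of this hunk ends the description of the new `HOST` consumable: the requested amount is debited once per host, so several tasks of a parallel job on the same host share it. A simplified model contrasting it with the existing per-slot (`YES`) and per-job (`JOB`) consumables, assuming the classic semantics that `YES` is debited per slot and `JOB` once per job on the master host. `debit_per_host()` is a hypothetical illustration, not scheduler code.

```cpp
#include <map>
#include <string>

enum class Consumable { YES, JOB, HOST };  // "NO" omitted: nothing is debited

// Simplified model: amount of a host-level consumable debited on each host for a
// parallel job that requested `amount`, given the job's slot distribution.
std::map<std::string, double> debit_per_host(Consumable type, double amount,
                                             const std::map<std::string, int> &slots_per_host,
                                             const std::string &master_host) {
    std::map<std::string, double> debit;
    for (const auto &[host, slots] : slots_per_host) {
        switch (type) {
        case Consumable::YES:   // per slot: every task on the host consumes the amount
            debit[host] = amount * slots;
            break;
        case Consumable::JOB:   // once per job, accounted on the master host only
            debit[host] = (host == master_host) ? amount : 0.0;
            break;
        case Consumable::HOST:  // once per host, shared by all tasks on that host
            debit[host] = amount;
            break;
        }
    }
    return debit;
}
```

With one GPU requested and four tasks on a host, `YES` would need four GPUs there while `HOST` needs one, which is the sharing case the notes describe.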
@@ -202,7 +275,7 @@ as extensions to the accounting and reporting records (e.g. more exact timestamp
 additional usage values like maxrss) are only done in the new format.
 
 
-## Resource and queue requests per scope (global, master, slave) for parallel jobs
+### Resource and queue requests per scope (global, master, slave) for parallel jobs
 
 In former product versions resource requests for parallel jobs were only possible on the global level.
 Resource requests for parallel jobs were applied to all tasks of the job, both master task (the job script)
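
Resource requests for parallel jobs can now be given per scope. A small model of which requests a task sees, under the assumption that global requests apply to every task, master requests only to the master task (the job script) and slave requests only to the slave tasks. The types and `requests_for_task()` are hypothetical, the resource strings are illustrative, and the sketch assumes each resource is requested in only one scope.

```cpp
#include <string>
#include <vector>

enum class Scope { GLOBAL, MASTER, SLAVE };
enum class TaskRole { MASTER, SLAVE };

struct Request {
    Scope scope;
    std::string resource;  // illustrative, e.g. "gpu=1"
};

// Hypothetical helper: the requests of a parallel job that apply to a task with the given role.
std::vector<std::string> requests_for_task(const std::vector<Request> &requests, TaskRole role) {
    std::vector<std::string> applicable;
    for (const Request &r : requests) {
        const bool applies = r.scope == Scope::GLOBAL ||
                             (r.scope == Scope::MASTER && role == TaskRole::MASTER) ||
                             (r.scope == Scope::SLAVE && role == TaskRole::SLAVE);
        if (applies) {
            applicable.push_back(r.resource);
        }
    }
    return applicable;
}
```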

source/daemons/qmaster/sge_c_gdi.cc

Lines changed: 3 additions & 9 deletions
@@ -269,23 +269,17 @@ sge_c_gdi_process_in_listener(sge_gdi_packet_class_t *packet, sge_gdi_task_class
 sge_pack_buffer *pb = &(packet->pb);
 switch (operation) {
 case SGE_GDI_TRIGGER:
-#if 0
-MONITOR_GDI_TRIG(monitor);
-#endif
+MONITOR_LIS_GDI_TRIG(monitor);
 sge_c_gdi_trigger_in_listener(packet, task, monitor);
 sge_gdi_packet_pack_task(packet, task, answer_list, pb);
 DRETURN(true);
 case SGE_GDI_PERMCHECK:
-#if 0
-MONITOR_GDI_PERM(monitor);
-#endif
+MONITOR_LIS_GDI_PERM(monitor);
 sge_c_gdi_permcheck(packet, task, monitor);
 sge_gdi_packet_pack_task(packet, task, answer_list, pb);
 DRETURN(true);
 case SGE_GDI_GET:
-#if 0
-MONITOR_GDI_GET(monitor);
-#endif
+MONITOR_LIS_GDI_GET(monitor);
 sge_c_gdi_get_in_listener(ao, packet, task, monitor);
 sge_gdi_packet_pack_task(packet, task, answer_list, pb);
 DRETURN(true);
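
The listener no longer skips monitoring for trigger, permission-check and get requests; each case now bumps a listener-specific counter (`MONITOR_LIS_GDI_TRIG`, `MONITOR_LIS_GDI_PERM`, `MONITOR_LIS_GDI_GET`). The macro bodies are not part of this diff, so the following C++ sketch only models the idea: count per monitoring interval, then print per-second rates in the layout of the extended `MSG_UTI_MONITOR_LISEXT_FFFFFFF` message. All names are hypothetical, and mapping `g`/`t`/`p` to get/trigger/permcheck is an assumption based on the order of the labels.

```cpp
#include <cstdio>
#include <string>

// Hypothetical per-thread counters, reset after each monitoring interval.
struct ListenerCounters {
    // incoming messages, labelled g/a/e/r in the message format (meaning not shown in this diff)
    unsigned long in_g = 0, in_a = 0, in_e = 0, in_r = 0;
    // GDI operations handled directly in the listener: get, trigger, permcheck
    unsigned long gdi_get = 0, gdi_trig = 0, gdi_perm = 0;
};

enum class GdiOp { TRIGGER, PERMCHECK, GET };

// Mirrors the three cases of the switch in sge_c_gdi_process_in_listener().
void count_listener_gdi(ListenerCounters &c, GdiOp op) {
    switch (op) {
    case GdiOp::TRIGGER:   ++c.gdi_trig; break;
    case GdiOp::PERMCHECK: ++c.gdi_perm; break;
    case GdiOp::GET:       ++c.gdi_get;  break;
    }
}

// Renders the counters as per-second rates using the same layout as the
// format string of MSG_UTI_MONITOR_LISEXT_FFFFFFF in msg_utilib.h.
std::string format_listener_rates(const ListenerCounters &c, double interval_s) {
    const double d = interval_s > 0.0 ? interval_s : 1.0;  // avoid division by zero
    char buf[256];
    std::snprintf(buf, sizeof(buf), "in (g:%.2f a:%.2f e:%.2f r:%.2f)/s GDI (g:%.2f,t:%.2f,p:%.2f)/s",
                  c.in_g / d, c.in_a / d, c.in_e / d, c.in_r / d,
                  c.gdi_get / d, c.gdi_trig / d, c.gdi_perm / d);
    return buf;
}
```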

source/daemons/qmaster/sge_thread_reader.cc

Lines changed: 22 additions & 6 deletions
@@ -27,9 +27,12 @@
 #include "uti/sge_os.h"
 #include "uti/sge_profiling.h"
 #include "uti/sge_rmon_macros.h"
+#include "uti/sge_time.h"
 
 #include "sgeobj/ocs_DataStore.h"
 
+#include "sge_thread_ctrl.h"
+
 #ifdef OBSERVE
 # include "cull/cull_observe.h"
 #endif
@@ -143,7 +146,7 @@ sge_reader_main(void *arg) {
 
 // init monitoring
 cl_thread_func_startup(thread_config);
-sge_monitor_init(p_monitor, thread_config->thread_name, GDI_EXT, MT_WARNING, MT_ERROR);
+sge_monitor_init(p_monitor, thread_config->thread_name, GDI_EXT, RT_WARNING, RT_ERROR);
 sge_qmaster_thread_init(QMASTER, READER_THREAD, true);
 
 /* register at profiling module */
@@ -164,7 +167,8 @@ sge_reader_main(void *arg) {
 
 MONITOR_SET_QLEN(p_monitor, sge_tq_get_task_count(ReaderRequestQueue));
 
-if (packet != nullptr) {
+// handle the packet only if it is not nullptr and the shutdown has not started
+if (packet != nullptr && !sge_thread_has_shutdown_started()) {
 sge_gdi_task_class_t *task;
 bool is_only_read_request = true;
 
@@ -216,7 +220,7 @@ sge_reader_main(void *arg) {
 
 // handle the request (GDI/Report/Ack ...
 if (packet->request_type == PACKET_GDI_REQUEST) {
-// sge_usleep(3000000);
+//sge_usleep(1000000);
 
 task = packet->first_task;
 while (task != nullptr) {
@@ -229,6 +233,7 @@ sge_reader_main(void *arg) {
 sge_c_report(packet, task, packet->host, packet->commproc, packet->commproc_id, task->data_list, p_monitor);
 } else if (packet->request_type == PACKET_ACK_REQUEST) {
 task = packet->first_task;
+// @TODO: This could be done by listener already?
 sge_c_ack(packet, task, p_monitor);
 } else {
 DPRINTF("unknown request type %d\n", packet->request_type);
@@ -298,14 +303,25 @@ sge_reader_main(void *arg) {
 thread_output_profiling("reader thread profiling summary:\n", &next_prof_output);
 
 sge_monitor_output(p_monitor);
-} else {
-int execute = 0;
+}
 
+// pass the cancellation point at least once or stay here if shutdown was triggered
+bool shutdown_started = false;
+do {
 // pthread cancellation point
+int execute = 0;
 pthread_cleanup_push(sge_reader_cleanup_monitor, static_cast<void *>(p_monitor));
 cl_thread_func_testcancel(thread_config);
 pthread_cleanup_pop(execute); // cleanup monitor
-}
+
+// shutdown in process?
+shutdown_started = sge_thread_has_shutdown_started();
+
+// if we will wait here then do not eat up all cpu time
+if (shutdown_started) {
+sge_usleep(25000);
+}
+} while (shutdown_started);
 }
 
 // Don't add cleanup code here. It will never be executed. Instead, register a cleanup function with
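
The reader thread now ignores packets once shutdown has started and, instead of looping back for new work, keeps passing the cancellation point with a 25 ms sleep (`sge_usleep(25000)`) until it is cancelled. A portable C++ sketch of that control flow follows. It substitutes `std::atomic<bool>` flags for `sge_thread_has_shutdown_started()` and for pthread cancellation via `cl_thread_func_testcancel()`, since `std::thread` has no cancellation; the globals and `reader_iteration()` are hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> shutdown_started{false};  // stands in for sge_thread_has_shutdown_started()
std::atomic<bool> cancel_requested{false};  // stands in for pthread_cancel() + testcancel
std::atomic<int> handled_packets{0};

// One iteration of the reader loop; returns false where the real thread would be cancelled.
bool reader_iteration(bool got_packet) {
    // handle the packet only if there is one and shutdown has not started
    if (got_packet && !shutdown_started.load()) {
        ++handled_packets;
    }

    // pass the "cancellation point" at least once, or stay here while shutdown is in progress
    bool waiting = false;
    do {
        if (cancel_requested.load()) {
            return false;  // with pthreads, the thread would end here
        }
        waiting = shutdown_started.load();
        if (waiting) {
            // sleep instead of spinning so the waiting thread does not eat up CPU time
            std::this_thread::sleep_for(std::chrono::milliseconds(25));
        }
    } while (waiting);
    return true;
}
```

One reading of this design is that, once shutdown begins, the thread must stop touching request data but still remain cancellable; the sleep keeps that waiting loop cheap.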

source/daemons/qmaster/sge_thread_worker.cc

Lines changed: 11 additions & 8 deletions
@@ -41,9 +41,12 @@
 #include "uti/sge_os.h"
 #include "uti/sge_profiling.h"
 #include "uti/sge_rmon_macros.h"
+#include "uti/sge_time.h"
 
 #include "sgeobj/ocs_DataStore.h"
 
+#include "sge_thread_ctrl.h"
+
 #ifdef OBSERVE
 # include "cull/cull_observe.h"
 #endif
@@ -193,7 +196,7 @@ sge_worker_main(void *arg) {
 
 // init monitoring
 cl_thread_func_startup(thread_config);
-sge_monitor_init(p_monitor, thread_config->thread_name, GDI_EXT, MT_WARNING, MT_ERROR);
+sge_monitor_init(p_monitor, thread_config->thread_name, GDI_EXT, WT_WARNING, WT_ERROR);
 sge_qmaster_thread_init(QMASTER, WORKER_THREAD, true);
 
 /* register at profiling module */
@@ -214,6 +217,7 @@ sge_worker_main(void *arg) {
 
 MONITOR_SET_QLEN(p_monitor, sge_tq_get_task_count(GlobalRequestQueue));
 
+// handle the packet only if it is not nullptr and the shutdown has not started
 if (packet != nullptr) {
 sge_gdi_task_class_t *task;
 bool is_only_read_request = true;
@@ -346,14 +350,13 @@ sge_worker_main(void *arg) {
 thread_output_profiling("worker thread profiling summary:\n", &next_prof_output);
 
 sge_monitor_output(p_monitor);
-} else {
-int execute = 0;
-
-// pthread cancellation point
-pthread_cleanup_push(sge_worker_cleanup_monitor, static_cast<void *>(p_monitor));
-cl_thread_func_testcancel(thread_config);
-pthread_cleanup_pop(execute); // cleanup monitor
 }
+
+// pthread cancellation point
+int execute = 0;
+pthread_cleanup_push(sge_worker_cleanup_monitor, static_cast<void *>(p_monitor));
+cl_thread_func_testcancel(thread_config);
+pthread_cleanup_pop(execute); // cleanup monitor
 }
 
 // Don't add cleanup code here. It will never be executed. Instead, register a cleanup function with
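
Both the reader and the worker loop track `is_only_read_request`, and the release notes say read-only requests can be answered by the reader pool once `DISABLE_SECONDARY_DS_READER=false` is set. A hedged sketch of how a dispatcher could combine the two when choosing between the reader queue and the global worker queue. The enums and functions are hypothetical, the command set is a reduced illustration, and the real routing in `sge_qmaster` is not part of this diff.

```cpp
#include <vector>

enum class GdiCommand { GET, ADD, MOD, DEL };  // reduced, illustrative command set
enum class TargetQueue { READER, WORKER };

// A packet qualifies for the reader pool only if every task in it is read-only.
bool is_only_read_request(const std::vector<GdiCommand> &tasks) {
    for (GdiCommand cmd : tasks) {
        if (cmd != GdiCommand::GET) {
            return false;
        }
    }
    return !tasks.empty();
}

TargetQueue choose_queue(const std::vector<GdiCommand> &tasks, bool disable_secondary_ds_reader) {
    if (!disable_secondary_ds_reader && is_only_read_request(tasks)) {
        return TargetQueue::READER;  // answered from the secondary, possibly slightly outdated, data store
    }
    return TargetQueue::WORKER;      // modifying requests, or reader data store disabled
}
```

Sending a mixed packet to the worker as a whole keeps all of its tasks on the main data store, which avoids answering part of one request from slightly outdated data.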

source/dist/inst_sge

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@
 # set -x
 
 SCRIPT_VERSION="9"
-SGE_VERSION="9.0.0"
+SGE_VERSION="9.0.1beta"
 
 #Reset PATH to a safe value
 #

source/libs/gdi/version.cc

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
 
 #include "gdi/version.h"
 
-const char GDI_VERSION[] = "9.0.0";
+const char GDI_VERSION[] = "9.0.1beta";
 
 // TODO: Add also an entry to the table further down below when you change this
 // And change SGE_VERSION in dist/inst_sge
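
The TODO in `version.cc` asks that `SGE_VERSION` in `dist/inst_sge` be changed together with `GDI_VERSION`, and this commit bumps both to `9.0.1beta`. A sketch of a consistency check a test could run, extracting the quoted value from the `SGE_VERSION="..."` line of the script; `shell_assignment_value()` and `versions_match()` are hypothetical helpers.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Extracts the value of a shell assignment like SGE_VERSION="9.0.1beta" from one line.
std::optional<std::string> shell_assignment_value(const std::string &line, const std::string &name) {
    const std::string prefix = name + "=\"";
    if (line.rfind(prefix, 0) != 0) {
        return std::nullopt;  // line does not start with NAME="
    }
    const auto end = line.find('"', prefix.size());
    if (end == std::string::npos) {
        return std::nullopt;  // unterminated quote
    }
    return line.substr(prefix.size(), end - prefix.size());
}

// Scans a script such as source/dist/inst_sge and compares SGE_VERSION with the GDI version.
bool versions_match(const std::string &script_path, const std::string &gdi_version) {
    std::ifstream script(script_path);
    std::string line;
    while (std::getline(script, line)) {
        if (auto value = shell_assignment_value(line, "SGE_VERSION")) {
            return *value == gdi_version;
        }
    }
    return false;  // file missing or no SGE_VERSION assignment found
}
```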

source/libs/uti/msg_utilib.h

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@
 #define MSG_UTI_MONITOR_MEMERROREXT _MESSAGE(59133, _("not enough memory for monitor extension"))
 #define MSG_UTI_MONITOR_TETEXT_FF _MESSAGE(59134, _("pending: %.2f executed: %.2f/s"))
 #define MSG_UTI_MONITOR_EDTEXT_FFFFFFFF _MESSAGE(59135, _("clients: %.2f mod: %.2f/s ack: %.2f/s blocked: %.2f busy: %.2f | events: %.2f/s added: %.2f/s skipt: %.2f/s"))
-#define MSG_UTI_MONITOR_LISEXT_FFFF _MESSAGE(59136, _("in (g:%.2f a:%.2f e:%.2f r:%.2f)/s"))
+#define MSG_UTI_MONITOR_LISEXT_FFFFFFF _MESSAGE(59136, _("in (g:%.2f a:%.2f e:%.2f r:%.2f)/s GDI (g:%.2f,t:%.2f,p:%.2f)/s"))
 #define MSG_UTI_MONITOR_SCHEXT_UUUUUUUUUU _MESSAGE(59137, _("malloc: arena(" sge_U32CFormat ") |ordblks(" sge_U32CFormat ") | smblks(" sge_U32CFormat ") | hblksr(" sge_U32CFormat ") | hblhkd(" sge_U32CFormat ") usmblks(" sge_U32CFormat ") | fsmblks(" sge_U32CFormat ") | uordblks(" sge_U32CFormat ") | fordblks(" sge_U32CFormat ") | keepcost(" sge_U32CFormat ")"))
 #define MSG_UTI_DAEMONIZE_CANT_PIPE _MESSAGE(59140, _("can't create pipe"))
 #define MSG_UTI_DAEMONIZE_CANT_FCNTL_PIPE _MESSAGE(59141, _("can't set daemonize pipe to not blocking mode"))
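
With the listener message extended, scripts that post-process `qping` monitoring output may want a few key figures per thread. A sketch that extracts the thread name, the `runs` rate and the `idle` percentage from one record, assuming the layout of the sample in the release notes (`| <thread>: runs: <x>r/s ... idle: <y>%`). That sample wraps each record over several lines, so the caller is assumed to join them first. `ThreadStats` and `parse_monitor_record()` are hypothetical.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

struct ThreadStats {
    std::string thread;   // e.g. "reader" or "listener"
    double runs_per_s;    // value of "runs: <x>r/s"
    double idle_percent;  // value of "idle: <y>%"
};

// Hypothetical parser for one joined qping monitoring record.
std::optional<ThreadStats> parse_monitor_record(const std::string &rec) {
    const auto bar = rec.find("| ");
    const auto runs = rec.find("runs: ");
    const auto idle = rec.find("idle: ");
    if (bar == std::string::npos || runs == std::string::npos || idle == std::string::npos) {
        return std::nullopt;
    }
    const auto colon = rec.find(':', bar);  // ends the thread name after "| "
    if (colon == std::string::npos) {
        return std::nullopt;
    }
    try {
        ThreadStats s;
        s.thread = rec.substr(bar + 2, colon - (bar + 2));
        s.runs_per_s = std::stod(rec.substr(runs + 6));    // stod stops at the "r/s" suffix
        s.idle_percent = std::stod(rec.substr(idle + 6));  // stod stops at '%'
        return s;
    } catch (const std::logic_error &) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```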
