Deadlock when process killed #109

When a subprocess managed by taskgraph is killed by the operating system, the pool automatically spawns a replacement process, but it does not recreate the events that were allocated to the killed process.  As a result, the graph deadlocks waiting on events that will never be triggered.

To reproduce:
1. Create a new graph in multiprocessed mode (`n_workers >= 1`)
2. Execute a task
3. Kill the worker process running that task before the task completes
4. Observe that the graph hangs
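
The steps above can be sketched without taskgraph at all.  This is only an analogy, not taskgraph code: it assumes the graph's pool behaves like the standard library's `multiprocessing.Pool`, which replaces a killed worker but never completes the task that worker was running.  The function name `reproduce_lost_task` and the timings are made up for illustration, and the sketch is POSIX-only because it uses `SIGKILL` and the `fork` start method.

```python
import multiprocessing
import os
import signal
import time


def reproduce_lost_task(timeout=3.0):
    """Return True if a task is lost after its worker process is killed.

    Hypothetical helper for illustration; it uses multiprocessing.Pool
    directly as a stand-in for the pool that taskgraph manages.
    """
    # Use the 'fork' start method explicitly (POSIX only) so the sketch
    # behaves the same regardless of the platform default.
    ctx = multiprocessing.get_context('fork')
    pool = ctx.Pool(processes=1)
    try:
        # With a single worker, this PID is the process that will run
        # the next task as well.
        worker_pid = pool.apply(os.getpid)

        # Step 2: execute a task that would normally finish in 1 second.
        result = pool.apply_async(time.sleep, (1,))
        time.sleep(0.2)  # give the worker time to pick up the task

        # Step 3: kill the worker before the task completes, as the OS
        # would when a memory limit is exceeded.
        os.kill(worker_pid, signal.SIGKILL)

        # Step 4: the pool respawns a worker, but the result never arrives.
        try:
            result.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            return True
        return False
    finally:
        pool.terminate()
        pool.join()


if __name__ == '__main__':
    print('task lost after worker was killed:', reproduce_lost_task())
```

Without the `timeout` argument, `result.get()` would block indefinitely, which is the same shape of hang as the graph waiting on its events.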

A practical way to trigger this is to run in a memory-constrained environment such as Sherlock.  On Sherlock, make sure at least one task uses more memory than was requested for the SLURM job, so that its process is killed.

Although it might be ideal to recreate the appropriate events so the graph can continue to execute, I think it would be better to simply detect that the process has been terminated and then terminate the graph.
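
For comparison, the standard library's `concurrent.futures.ProcessPoolExecutor` already takes this detect-and-fail approach: when a worker dies abruptly, pending futures raise `BrokenProcessPool` instead of blocking forever.  The sketch below is not a proposed taskgraph patch, only an illustration of what the behavior could look like from the caller's side.  The function name `detect_killed_worker` is hypothetical, and the sketch is POSIX-only because it uses `SIGKILL` and the `fork` start method.

```python
import multiprocessing
import os
import signal
import time
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def detect_killed_worker():
    """Return True if a killed worker surfaces as an error, not a hang.

    Hypothetical helper for illustration; it does not use taskgraph.
    """
    # 'fork' is chosen explicitly (POSIX only) so the sketch does not
    # depend on the platform's default start method.
    ctx = multiprocessing.get_context('fork')
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        # With a single worker, this PID is the process that will run
        # the next task as well.
        worker_pid = executor.submit(os.getpid).result()

        future = executor.submit(time.sleep, 1)
        time.sleep(0.2)  # give the worker time to pick up the task

        # Simulate the operating system killing the worker.
        os.kill(worker_pid, signal.SIGKILL)

        try:
            # The timeout is only a safety net; the expected outcome is
            # an exception raised as soon as the dead worker is noticed.
            future.result(timeout=10)
        except BrokenProcessPool:
            return True
        return False


if __name__ == '__main__':
    print('killed worker detected:', detect_killed_worker())
```

A graph built on this kind of signal could stop scheduling and raise to the caller once a worker is known to be dead, rather than waiting on events that will never be set.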
