Bug report
When using a `ProcessPoolExecutor` with forked child processes, if one of the child processes dies suddenly (a segmentation fault, not a Python exception) while data is simultaneously being sent into the call queue, the parent process hangs forever.
Reproduction
```python
import ctypes
from concurrent.futures import ProcessPoolExecutor

def segfault():
    ctypes.string_at(0)

def func(i, data):
    print(f"Start {i}.")
    if i == 1:
        segfault()
    print(f"Done {i}.")
    return i

data = list(range(100_000_000))
count = 10
with ProcessPoolExecutor(2) as pool:
    list(pool.map(func, range(count), [data] * count))
print("OK")
```
In Python 3.8.10 this raises a `BrokenProcessPool` exception, whereas in 3.9.13 and 3.10.5 it hangs.
Analysis
When a crash happens in a child process, all workers are terminated and they stop reading from the communication pipes. However, if data is being sent into the call queue, the queue's feeder thread (`multiprocessing.queues.Queue._feed`), which writes data from the buffer to the pipe, can get stuck in `send_bytes(obj)` when the Unix pipe it is writing to is full. `_ExecutorManagerThread` is then blocked in `self.join_executor_internals()` on the line

```python
self.call_queue.join_thread()
```

(called from `self.terminate_broken()`). The main thread itself is blocked on

```python
self._executor_manager_thread.join()
```

coming from the `__exit__` method of the Executor.
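The pipe-full condition at the root of the hang can be observed directly: on Linux a pipe's buffer holds a fixed number of bytes (typically 64 KiB), and a blocking write stalls once it is full. A minimal sketch (a plain OS pipe stands in for the call queue's pipe, and a non-blocking write end is used so the demo itself does not hang):

```python
import os
import fcntl

# Create a plain OS pipe and make its write end non-blocking, so that
# instead of blocking (as Queue._feed's send_bytes does), the write
# raises once the pipe buffer is full and we can measure its capacity.
r, w = os.pipe()
fcntl.fcntl(w, fcntl.F_SETFL, os.O_NONBLOCK)

total = 0
try:
    while True:
        total += os.write(w, b"x" * 4096)
except BlockingIOError:
    pass  # pipe buffer is full; a blocking write would hang here forever

print(f"pipe capacity: {total} bytes")
os.close(r)
os.close(w)
```

With a blocking write end and no reader on the other side, the last `os.write` would never return, which is exactly the state the feeder thread is stuck in.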
Proposed solution
Drain the call queue buffer, either in the `terminate_broken` method before calling `join_executor_internals`, or in the queue's `close` method.
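As a toy illustration of the drain-before-join idea (relying on `Queue` internals such as `_buffer` and `_notempty`, which are CPython implementation details and not a public API; the actual patch may look different):

```python
import multiprocessing

q = multiprocessing.Queue()
q.put("pending item")  # first put() starts the feeder thread

# Drop whatever the feeder thread has not yet written to the pipe.
# _buffer (a deque) and _notempty (a Condition) are CPython
# implementation details of multiprocessing.queues.Queue.
with q._notempty:
    q._buffer.clear()

q.close()        # tells the feeder thread to finish
q.join_thread()  # joins promptly instead of hanging on a full pipe
print("feeder thread joined")
```

The key point is that with the buffer emptied, the feeder thread has nothing left to push into a pipe that nobody will ever read, so `join_thread()` can complete.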
I will create a pull request with a possible implementation.
Your environment
- CPython versions tested on: reproduced in 3.9.13 and 3.10.5 (works as expected in 3.8.10: `BrokenProcessPool` exception)
- Operating system and architecture: Linux, x86_64
Linked PRs