Processes not cleaned for reuse
I stumbled upon a problem with ProcessPoolExecutor, where processes access
data they should not be able to. Let me explain:
I have a situation similar to the example below: I have several runs to start,
each with different arguments. They compute their stuff in parallel and have no
reason to interact with each other. Now, as I understand it, when a process
forks, it duplicates itself. The child process has the same (memory) data as
its parent, but should it change anything, it does so on its own copy. If I
wanted the changes to survive the lifetime of the child process, I would
use queues, pipes and other IPC mechanisms.
But I actually don't! Each process manipulates its own data, which
should not carry over to any of the other runs. The example below shows
otherwise, though. Later runs (not the ones running in parallel) can access the
data of the previous run in the same process, implying that the data has not
been scrubbed from the process.
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import current_process, set_start_method


class Static:
    integer: int = 0


def inprocess(run: int) -> None:
    cp = current_process()
    # Print current state
    print(f"[{run:2d} {cp.pid} {cp.name}] int: {Static.integer}", flush=True)
    # Check value
    if Static.integer != 0:
        raise Exception(f"[{run:2d} {cp.pid} {cp.name}] Variable already set!")
    # Update value
    Static.integer = run + 1


def pooling():
    cp = current_process()
    # Get master's pid
    print(f"[{cp.pid} {cp.name}] Start")
    with ProcessPoolExecutor(max_workers=2) as executor:
        for i, _ in enumerate(executor.map(inprocess, range(4))):
            print(f"run #{i} finished", flush=True)


if __name__ == '__main__':
    set_start_method("fork")  # enforce fork
    pooling()
[1998 MainProcess] Start
[ 0 2020 Process-1] int: 0
[ 2 2020 Process-1] int: 1
[ 1 2021 Process-2] int: 0
[ 3 2021 Process-2] int: 2
run #0 finished
run #1 finished
Traceback (most recent call last):
File "/usr/lib/python3.6/concurrent/futures/", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3.6/concurrent/futures/", line 153, in _process_chunk
return [fn(*args) for args in chunk]
File "/usr/lib/python3.6/concurrent/futures/", line 153, in <listcomp>
return [fn(*args) for args in chunk]
File "<stdin>", line 14, in inprocess
Exception: [ 2 2020 Process-1] Variable already set!
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 29, in <module>
File "<stdin>", line 24, in pooling
File "/usr/lib/python3.6/concurrent/futures/", line 366, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib/python3.6/concurrent/futures/", line 586, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.6/concurrent/futures/", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/", line 384, in __get_result
raise self._exception
Exception: [ 2 2020 Process-1] Variable already set!
This behaviour can also be reproduced with max_workers=1, as the process is
re-used. The start-method has no influence on the error (though only "fork"
seems to use more than one process).
So to summarise: I want each new run to start in a process with all the data
inherited from the parent, but no data from any of the other runs. Is that
possible? How would I achieve it? Why does the above not do exactly that?
I found multiprocessing.pool.Pool, where one can set maxtasksperchild=1 so that
a worker process is destroyed when its task is finished. But I dislike the
multiprocessing interface; the ProcessPoolExecutor is more comfortable to
use. Additionally, the whole idea of the pool is to save process setup time,
which would be lost when killing the hosting process after each run.
Brand new processes in Python do not share memory state. However, ProcessPoolExecutor reuses process instances; it is a pool of active processes, after all. I assume this is done to avoid the OS overhead of stopping and starting processes all the time.
You see the same behavior in other distribution technologies like celery where, if you're not careful, you can bleed global state between executions.
I recommend you manage your namespace better to encapsulate your data. Using your example, you could encapsulate your code and data in a class which you instantiate in inprocess(), instead of storing it in a shared namespace like a static field in a class or directly in a module. That way the object will ultimately be cleaned up by the garbage collector:
class State:
    def __init__(self):
        self.integer: int = 0

    def do_stuff(self):
        self.integer += 42


def use_global_function(state):
    state.integer -= 1664


def inprocess(run: int) -> None:
    cp = current_process()
    state = State()  # fresh state for every run
    print(f"[{run:2d} {cp.pid} {cp.name}] int: {state.integer}", flush=True)
    if state.integer != 0:
        raise Exception(f"[{run:2d} {cp.pid} {cp.name}] Variable already set!")
    state.integer = run + 1
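If the state cannot easily be moved out of the class (for example because other code writes to it), another option is to restore the class attributes after every task. The following is only a minimal sketch under that assumption: fresh_state is a hypothetical helper name, not part of concurrent.futures, and it only undoes rebinding of attributes that existed before the call. In-place mutation of lists or dicts, and attributes created during the call, are not reverted.

```python
import functools


class Static:
    integer: int = 0


def fresh_state(cls):
    """Hypothetical helper: restore cls's non-dunder attributes after each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Shallow snapshot of the class attributes before the task runs.
            snapshot = {
                name: value
                for name, value in vars(cls).items()
                if not name.startswith("__")
            }
            try:
                return fn(*args, **kwargs)
            finally:
                # Rebind every attribute to its pre-task value.
                for name, value in snapshot.items():
                    setattr(cls, name, value)
        return wrapper
    return decorator


@fresh_state(Static)
def inprocess(run: int) -> int:
    if Static.integer != 0:
        raise Exception("Variable already set!")
    Static.integer = run + 1
    return Static.integer
```

Because the wrapper keeps the name inprocess at module level, it should still be usable with executor.map(inprocess, range(4)); the worker process is reused, but each run starts from Static.integer == 0.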
I have been facing some potentially similar problems and saw some interesting posts, such as "High Memory Usage Using Python Multiprocessing", which points towards using gc.collect(); however, in your case it did not work. So I thought about how the Static class is initialized. Some points:
1. Unfortunately, I cannot reproduce your minimal example; it raises a ValueError:
ValueError: cannot find context for 'fork'
2. Considering 1, I use set_start_method("spawn").
3. A quick fix then could be to instantiate the Static class every time, as below:
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import current_process, set_start_method


class Static:
    integer: int = 0

    def __init__(self):
        pass


def inprocess(run: int) -> None:
    cp = current_process()
    # Print current state
    print(f"[{run:2d} {cp.pid} {cp.name}] int: {Static().integer}", flush=True)
    # Check value
    if Static().integer != 0:
        raise Exception(f"[{run:2d} {cp.pid} {cp.name}] Variable already set!")
    # Update value (this sets it on a temporary instance only)
    Static().integer = run + 1


def pooling():
    cp = current_process()
    # Get master's pid
    print(f"[{cp.pid} {cp.name}] Start")
    with ProcessPoolExecutor(max_workers=2) as executor:
        for i, _ in enumerate(executor.map(inprocess, range(4))):
            print(f"run #{i} finished", flush=True)


if __name__ == "__main__":
    # set_start_method("fork")  # enforce fork, ValueError: cannot find context for 'fork'
    set_start_method("spawn")  # Alternative
    pooling()
This returns:
[ 0 1424 SpawnProcess-2] int: 0
[ 1 1424 SpawnProcess-2] int: 0
run #0 finished
[ 2 17956 SpawnProcess-1] int: 0
[ 3 1424 SpawnProcess-2] int: 0
run #1 finished
run #2 finished
run #3 finished
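Note why every run above prints 0: Static().integer = run + 1 assigns to a temporary instance, which is discarded immediately, so the class attribute is never changed. A small sketch of the difference (the two function names are made up for illustration):

```python
class Static:
    integer: int = 0


def set_on_instance(value):
    """Assign through an instance: only that instance is affected."""
    obj = Static()
    obj.integer = value  # creates an instance attribute that shadows the class one
    return obj.integer, Static.integer


def set_on_class(value):
    """Assign on the class: visible to every later run in the same process."""
    Static.integer = value
    return Static().integer


print(set_on_instance(5))  # (5, 0)
print(set_on_class(7))     # 7
```

So this variant avoids the exception only because nothing is stored between calls; data that has to live for the duration of one run still needs a per-run object.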
I am trying to follow a ROS2 testing tutorial which tests a topic listener to understand how ROS2 testing works. Here is a screenshot of the related code at 21:15
I have a node target_control_node which subscribes to the topic turtle1/pose and then moves the turtle to a new random pose.
import math
import random

import rclpy
from geometry_msgs.msg import Twist
from rclpy.node import Node
from turtlesim.msg import Pose


class TargetControlNode(Node):
    def __init__(self):
        super().__init__("target_control_node")
        self.get_logger().info("target_control_node")
        self._target_pose = None
        self._cmd_vel_publisher = self.create_publisher(Twist, "turtle1/cmd_vel", 10)
        self.create_subscription(Pose, "turtle1/pose", self.subscribe_target_pose, 10)
        self.create_timer(1.0, self.control_loop)

    def subscribe_target_pose(self, msg):
        self._target_pose = msg

    def control_loop(self):
        # Nothing to do until the first pose has arrived
        if self._target_pose is None:
            return
        target_x = random.uniform(0.0, 10.0)
        target_y = random.uniform(0.0, 10.0)
        dist_x = target_x - self._target_pose.x
        dist_y = target_y - self._target_pose.y
        distance = math.sqrt(dist_x**2 + dist_y**2)
        msg = Twist()
        # position
        msg.linear.x = 1.0 * distance
        # orientation
        goal_theta = math.atan2(dist_y, dist_x)
        diff = goal_theta - self._target_pose.theta
        if diff > math.pi:
            diff -= 2 * math.pi
        elif diff < -math.pi:
            diff += 2 * math.pi
        msg.angular.z = 2 * diff
        self._cmd_vel_publisher.publish(msg)


def main(args=None):
    rclpy.init(args=args)
    node = TargetControlNode()
    rclpy.spin(node)
    rclpy.shutdown()


if __name__ == "__main__":
    main()
I am trying to write a simple test for the subscription part based on the tutorial above to understand how it works.
Here is my initial test code. Note that inside I am using expected_output=str(msg); however, it is wrong, and I am not sure what to put there.
import pathlib
import random
import sys
import time
import unittest
import uuid

import launch
import launch_ros
import launch_testing
import pytest
import rclpy
import std_msgs.msg
from geometry_msgs.msg import Twist
from turtlesim.msg import Pose


def generate_test_description():
    src_path = pathlib.Path(__file__).parent.parent
    target_control_node = launch_ros.actions.Node(
        executable=sys.executable,
        # path of the node script (the file name is cut off in the post)
        arguments=[str(src_path / "turtle_robot/")],
        additional_env={"PYTHONUNBUFFERED": "1"},
    )
    return (
        launch.LaunchDescription(
            [target_control_node, launch_testing.actions.ReadyToTest()]
        ),
        {"target_control_node": target_control_node},
    )


class TestTargetControlNodeLink(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        rclpy.init()

    @classmethod
    def tearDownClass(cls):
        rclpy.shutdown()

    def setUp(self):
        self.node = rclpy.create_node("target_control_test_node")

    def tearDown(self):
        self.node.destroy_node()

    def test_target_control_node(self, target_control_node, proc_output):
        pose_pub = self.node.create_publisher(Pose, "turtle1/pose", 10)
        msg = Pose()
        msg.x = random.uniform(0.0, 10.0)
        msg.y = random.uniform(0.0, 10.0)
        msg.theta = 0.0
        msg.linear_velocity = 0.0
        msg.angular_velocity = 0.0
        pose_pub.publish(msg)
        success = proc_output.waitFor(
            # `str(msg)` is wrong, however, I am not sure what to put here.
            expected_output=str(msg), process=target_control_node, timeout=1.0
        )
        assert success
When I run launch_test src/turtle_robot/test/, it only prints this, without telling me what the actual output is:
[INFO] [launch]: All log files can be found below /home/parallels/.ros/log/2023-01-02-16-37-27-631032-ubuntu-linux-22-04-desktop-1439830
[INFO] [launch]: Default logging verbosity is set to INFO
test_target_control_node (test_target_control_node.TestTargetControlNodeLink) ... [INFO] [python3-1]: process started with pid [1439833]
[python3-1] [INFO] [1672706247.877402445] [target_control_node]: target_control_node
FAIL: test_target_control_node (test_target_control_node.TestTargetControlNodeLink)
Traceback (most recent call last):
File "/my-ros/src/turtle_robot/test/", line 91, in test_target_control_node
assert success
Ran 1 test in 1.061s
FAILED (failures=1)
[INFO] [python3-1]: sending signal 'SIGINT' to process[python3-1]
[python3-1] Traceback (most recent call last):
[python3-1] File "/my-ros/src/turtle_robot/turtle_robot/", line 59, in <module>
[python3-1] main()
[python3-1] File "/my-ros/src/turtle_robot/turtle_robot/", line 53, in main
[python3-1] rclpy.spin(node)
[python3-1] File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/", line 222, in spin
[python3-1] executor.spin_once()
[python3-1] File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/", line 705, in spin_once
[python3-1] handler, entity, node = self.wait_for_ready_callbacks(timeout_sec=timeout_sec)
[python3-1] File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/", line 691, in wait_for_ready_callbacks
[python3-1] return next(self._cb_iter)
[python3-1] File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/", line 588, in _wait_for_ready_callbacks
[python3-1] wait_set.wait(timeout_nsec)
[python3-1] KeyboardInterrupt
[ERROR] [python3-1]: process has died [pid 1439833, exit code -2, cmd '/usr/bin/python3 /my-ros/src/turtle_robot/turtle_robot/ --ros-args'].
Ran 0 tests in 0.000s
I checked the source code of waitFor, but I still have no clue.
Is there a way to print the actual output so that I can give the correct expected_output? Thanks!
To answer your question (and give some more general tips): you can always print out the msg inside the node. That said, the reason you're getting an error is that msg is a ROS message object, so str(msg) gives the string representation of that object, which your node never writes to its output; waitFor therefore never finds it. Also, since it appears you're just testing a subscriber, it doesn't usually make sense to have a unit test that expects stdout. I would also note that you're publishing the message before actually waiting for the result; this should always fail, because by the time your subscriber has started, the message has already been published and is gone.
Finally, you shouldn't have a publisher included in any unit tests. It adds an extra dependency, and since it's a prepackaged transport layer, testing whether it works is irrelevant. Instead, to fix your problem, you should simply call the individual methods of your node directly: import the script, instantiate the object, and don't try to deal with the whole node lifecycle.
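As a rough sketch of what testing the logic directly could look like: if the arithmetic in control_loop is moved into a plain function, it can be checked without rclpy, a publisher, or a launch file. The function name compute_command and its signature are assumptions made for this illustration; the formulas are copied from the node in the question.

```python
import math


def compute_command(pose_x, pose_y, pose_theta, target_x, target_y):
    """Return (linear_x, angular_z) using the same formulas as control_loop."""
    dist_x = target_x - pose_x
    dist_y = target_y - pose_y
    distance = math.sqrt(dist_x**2 + dist_y**2)

    goal_theta = math.atan2(dist_y, dist_x)
    diff = goal_theta - pose_theta
    # Wrap the heading error back towards [-pi, pi].
    if diff > math.pi:
        diff -= 2 * math.pi
    elif diff < -math.pi:
        diff += 2 * math.pi

    return 1.0 * distance, 2 * diff


def test_compute_command():
    # A 3-4-5 triangle gives a distance of exactly 5.
    linear, angular = compute_command(0.0, 0.0, 0.0, 3.0, 4.0)
    assert math.isclose(linear, 5.0)
    assert math.isclose(angular, 2 * math.atan2(4.0, 3.0))


test_compute_command()
```

The node's control_loop would then only draw the random target, call this function, and publish the result, which leaves very little untested code inside the ROS callbacks.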
Edit based on comments
Looking at your subscriber code, you'll need to actually print something out. Unfortunately, because it's not a std_msgs type (i.e. it has more fields than just .data), you'll need to decide how you want to confirm the data is right. You could simply look at one field, or at all of them in order. For example, you might have in your test:
success = proc_output.waitFor(
    expected_output=str(msg.x),
    process=target_control_node, timeout=1.0)
And in your control node:
def subscribe_target_pose(self, msg):
    self._target_pose = msg
    print(msg.x)
That said, this IO handling method doesn't seem like the best to me. Mainly because it relies on stdout which isn't something you always want.
I have a function which runs twice, in two parallel processes. Let's call it parentFunction().
Each process ends with a dictionary, which is added to a common list, giving a list of two dictionaries. I solved this by using a shared list created with a manager.
Now, inside parentFunction(), I would like to run two parallel processes, each of which adds one variable to the dictionary. I tried to do this with a shared dictionary created with a manager.
At the end I'm converting the list of dictionaries to a pandas data frame.
from multiprocessing import Process, Manager

def I(D, a):
    D["a"] = a

def II(D, b):
    D["a"] = b

def task(L, x):
    x = 0
    a = 1
    b = 2
    manager = Manager()
    D = manager.dict()  # <-- can be shared between processes.
    pI = Process(target=I, args=(D, 0))
    pII = Process(target=II, args=(D, 0))
    pI.start()
    pII.start()
    pI.join()
    pII.join()
    L.append(D)

if __name__ == "__main__":
    with Manager() as manager:
        L = manager.list()  # <-- can be shared between processes.
        p1 = Process(target=task, args=(L, 0))  # Passing the list
        p2 = Process(target=task, args=(L, 0))  # Passing the list
        p1.start()
        p2.start()
        p1.join()
        p2.join()
returns error:
TypeError: task() missing 1 required positional argument: 'L'
Traceback (most recent call last):
File "C:\Users\user\AppData\Roaming\JetBrains\PyCharmCE2021.2\scratches\", line 88, in <module>
File "<string>", line 2, in __getitem__
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\multiprocessing\", line 810, in _callmethod
kind, result = conn.recv()
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\multiprocessing\", line 256, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\multiprocessing\", line 934, in RebuildProxy
return func(token, serializer, incref=incref, **kwds)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\multiprocessing\", line 784, in __init__
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\multiprocessing\", line 838, in _incref
conn = self._Client(self._token.address, authkey=self._authkey)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\multiprocessing\", line 505, in Client
c = PipeClient(address)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\multiprocessing\", line 707, in PipeClient
_winapi.WaitNamedPipe(address, 1000)
FileNotFoundError: [WinError 2] The system cannot find the file specified
The source you posted does not seem to match your stack trace. You would only get a FileNotFoundError when the main process tries to enumerate the objects within list L with a statement such as print(list(L)), which I see in the stack trace but not in your code. It helps when you post the actual code causing the exception. But here is the cause of your problem:
When you create a new manager with the call manager = Manager(), a new process is created, and any objects that are created via the manager "live" in the address space of that manager process. You are creating two manager processes, once in the main process and once in the child process task. It is in the latter that the dictionary D is created. When that process terminates, its manager process terminates too, along with any objects created by that manager. So when the main process attempts to print the list L, the proxy object within it, D, no longer points to an existing object. The solution is to have the main process create the dictionary D and pass it to the task child process:
from multiprocessing import Process, Manager

def I(D, a):
    D["a"] = a

def II(D, b):
    D["a"] = b

def task(L, D, x):
    x = 0
    a = 1
    b = 2
    pI = Process(target=I, args=(D, 0))
    pII = Process(target=II, args=(D, 0))
    pI.start()
    pII.start()
    pI.join()
    pII.join()
    L.append(D)

if __name__ == "__main__":
    with Manager() as manager:
        L = manager.list()  # <-- can be shared between processes.
        D = manager.dict()  # <-- can be shared between processes.
        p = Process(target=task, args=(L, D, 0))  # Passing the list and the dictionary
        p.start()
        p.join()
        print(D)  # printed while the manager is still running
{'a': 0}
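Related to the last step in the question (building a pandas data frame): the proxies are only usable while the manager is running, so it seems safest to copy them into plain dictionaries before the with block ends. A minimal sketch, assuming one manager dictionary per row; fill and collect_rows are hypothetical names used only for this illustration.

```python
from multiprocessing import Manager, Process


def fill(shared, key, value):
    """Worker: store one value in the (shared) dictionary."""
    shared[key] = value


def collect_rows(n_rows=2):
    """Run one worker per row and return plain dict copies of the results."""
    with Manager() as manager:
        rows = [manager.dict() for _ in range(n_rows)]
        workers = [
            Process(target=fill, args=(row, "a", i)) for i, row in enumerate(rows)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Copy while the manager process is still alive; after the with block
        # the proxies can no longer reach their referents.
        return [dict(row) for row in rows]


if __name__ == "__main__":
    print(collect_rows())
```

The returned list of plain dictionaries can then be handed to pandas after the manager has shut down.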
I have an iterator that retrieves a varying number of lines from a very large (>20GB) file, depending on some features. The iterator works fine, but I can only use 1 thread to process the result. I would like to feed the value from each iteration to multiple threads / processes.
I'm using a text file with 9 lines to mimic my data; here is my code. I've been struggling with how to create the feedback so that when one process has finished, it goes and retrieves the next iteration:
from multiprocessing import Process, Manager
import time


# Iterator
class read_file(object):
    def __init__(self, filePath):
        self.file = open(filePath, 'r')

    def __iter__(self):
        return self

    def __next__(self):
        line = self.file.readline()
        if line:
            return line
        raise StopIteration


# worker for one process
def print_worker(a, n, stat):
    print(a)
    stat[n] = True  # Set the finished status as True
    return None


# main
def main():
    file_path = 'tst_mp.txt'  # the txt file with 9 lines
    n_worker = 2
    file_handle = read_file(file_path)
    workers = []
    # Create shared list to store dereplicated dict and progress counter
    manager = Manager()
    status = manager.list([False] * 2)  # list of dictionary for each thread
    # Initiate the workers
    for i in range(n_worker):
        workers.append(Process(target=print_worker, args=(file_handle.__next__(), i, status,)))
    for worker in workers:
        worker.start()
    block = file_handle.__next__()  # The next block (line)
    while block:  # continue if there is still a block left
        time.sleep(1)  # for every second
        for i in range(2):
            if status[i]:  # Worker i finished
                # workers[i].close()
                workers[i] = Process(target=print_worker, args=(block, i, status,))
                status[i] = False  # Set worker i as busy (False)
                workers[i].start()  # Start worker i
                try:  # try to get the next item in the iterator
                    block = file_handle.__next__()
                except StopIteration:
                    block = False


if __name__ == '__main__':
    main()
The code is clumsy. It did print out the sequence, but also with some errors when I ran the code twice:
Process Process-10:
Traceback (most recent call last):
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 802, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 315, in _bootstrap
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/zewei/share/paf_depth/", line 31, in print_worker
stat[n] = True # Set the finished status as True
File "<string>", line 2, in __setitem__
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 806, in _callmethod
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 794, in _connect
dispatch(conn, None, 'accept_connection', (name,))
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 90, in dispatch
kind, result = c.recv()
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 255, in recv
buf = self._recv_bytes()
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 419, in _recv_bytes
buf = self._recv(4)
File "/home/zewei/mambaforge/lib/python3.9/multiprocessing/", line 384, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
This is where I'm stuck. I was wondering if there is any fix, or a more elegant way to do this?
Here's a better way to do what you are doing, using a Pool:
from multiprocessing import Pool
import time


# worker for one process
def print_worker(a):
    print(a)
    return None


def main():
    file_path = r''  # the txt file with 9 lines
    n_worker = 2
    file_handle = read_file(file_path)  # the read_file iterator from the question
    results = []
    with Pool(n_worker) as pool:
        for result in pool.imap(print_worker, file_handle):
            results.append(result)


if __name__ == '__main__':
    main()
Here, the imap function lazily iterates over the iterator, so that the whole file won't be read into memory. Pool handles spreading the tasks across the number of processes you started (using n_worker) automatically so that you don't have to manage it yourself.
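If single lines are too small a unit of work, a common variation is to group them into batches before handing them to imap, which cuts down on inter-process traffic. This is only a sketch: batched_lines and count_chars are hypothetical helpers, and an in-memory io.StringIO stands in for the real file.

```python
import io
import itertools
from multiprocessing import Pool


def batched_lines(lines, batch_size):
    """Lazily yield lists of up to batch_size lines from any iterable."""
    iterator = iter(lines)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_chars(batch):
    """Stand-in worker: total number of characters in one batch."""
    return sum(len(line) for line in batch)


def main():
    fake_file = io.StringIO("".join(f"line {i}\n" for i in range(9)))
    with Pool(2) as pool:
        # imap pulls batches from the generator and yields results in input order.
        for result in pool.imap(count_chars, batched_lines(fake_file, 4)):
            print(result)


if __name__ == "__main__":
    main()
```

The batch size is a tuning knob: larger batches mean fewer, bigger messages between processes, at the cost of coarser load balancing.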
I had an issue with os.path.exists() when working with UNC and network paths in general.
Some servers tend to die in a weird fashion: instead of returning an error, they hang for 130 seconds and then return False (I guess Samba has some weird timeout that I was unable to find and configure).
So my question is: how can I time out such (atomic) operations?
I just need it to finish in under 2 seconds. I've tried using threading and a mutable object for returning the value, like this:
import time
import threading
import os.path


class ValueContainer(object):
    ''' Generic mutable type
    '''
    def __init__(self, value=None):
        self.value = value


def timed_out_function(target, timeout, *args, **kwargs):
    ''' Times out function execution
    '''
    val = ValueContainer()

    # Inner function that passes result to mutable type
    def inner(retval, *args, **kwargs):
        retval.value = target(*args, **kwargs)

    # Create thread with this function
    t = threading.Thread(target=inner, args=(val, ) + args, kwargs=kwargs)
    t.daemon = True
    t.start()
    # Just wait for it
    t.join(timeout)
    if t.is_alive():
        raise Exception('Timeout')
    return val.value


print(timed_out_function(os.path.exists, 2, ''))
print(timed_out_function(os.path.exists, 2, 'Nope nope nope'))
timed_out_function(time.sleep, 2, 5)
# True
# False
# Traceback (most recent call last):
# File "D:\tmp\", line 39, in <module>
# timed_out_function(time.sleep, 2, 5)
# File "D:\tmp\", line 32, in timed_out_function
# raise Exception('Timeout')
# Exception: Timeout
But I'm not sure whether that will create too many parallel IO requests (there's a continuous stream of requests, one every 2 seconds, each hanging for 130), too many threads, or some similar issue.
Do you have any experience with this kind of workaround?
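One way to keep the number of hanging calls bounded is to run them on a fixed-size thread pool and wait on the future with a timeout. This is a sketch under assumptions, not a tested fix for the Samba hang: call_with_timeout and the pool size of 4 are made up for illustration, and a timed-out call is not cancelled; it merely keeps occupying one of the pool's threads until it returns.

```python
import concurrent.futures
import os.path
import time

# At most 4 calls can be running (or hanging) at any time; further
# submissions queue up instead of creating new threads.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func on the shared pool.

    Raises concurrent.futures.TimeoutError if no result is available
    within timeout seconds.
    """
    future = _executor.submit(func, *args, **kwargs)
    return future.result(timeout=timeout)


if __name__ == "__main__":
    print(call_with_timeout(os.path.exists, 2, "."))
    try:
        call_with_timeout(time.sleep, 0.1, 0.5)
    except concurrent.futures.TimeoutError:
        print("Timeout")
```

Two caveats with this design: the timeout also covers any time a call spends waiting in the queue when all threads are busy, and the interpreter waits for the pool's threads when it exits.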
Apologies in advance, but I am unable to post a fully working example (too much overhead in this code to distill to a runnable snippet). I will post as much explanatory detail as I can, and please do let me know if anything critical seems missing.
Running Python 2.7.5 through IDLE
I am writing a program to compare two text files. Since the files can be large (~500MB) and each row comparison is independent, I would like to implement multiprocessing to speed up the comparison. This is working pretty well, but I am getting stuck on a pseudo-random Bad file descriptor error. I am new to multiprocessing, so I guess there is a technical problem with my implementation. Can anyone point me in the right direction?
Here is the code causing the trouble (specifically the pool.map call):
# open files
csvReaderTest = csv.reader(open(testpath, 'r'))
csvReaderProd = csv.reader(open(prodpath, 'r'))
compwriter = csv.writer(open(outpath, 'wb'))
pool = Pool()
num_chunks = 3
chunksTest = itertools.groupby(csvReaderTest, keyfunc)
chunksProd = itertools.groupby(csvReaderProd, keyfunc)
while True:
    # make a list of num_chunks chunks
    groupsTest = [list(chunk) for key, chunk in itertools.islice(chunksTest, num_chunks)]
    groupsProd = [list(chunk) for key, chunk in itertools.islice(chunksProd, num_chunks)]
    # merge the two lists (pair off comparison rows)
    groups_combined = zip(groupsTest, groupsProd)
    if groups_combined:
        a_args = groups_combined  # a list - set of combinations to be tested
        second_arg = True
        worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg), a_args))
Here is the full error output. (This error only occurs sometimes; at other times the comparison runs to completion without problems):
Traceback (most recent call last):
File "H:/<PATH_SNIP>/", line 407, in <module>
main(fileTest, fileProd, fileout, stringFields, checkFileLengths)
File "H:/<PATH_SNIP>/", line 306, in main
worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg),a_args))
File "C:\Python27\lib\multiprocessing\", line 250, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Python27\lib\multiprocessing\", line 554, in get
raise self._value
IOError: [Errno 9] Bad file descriptor
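For reference, the batching scheme used in the loop above (group both inputs by key, then take num_chunks groups at a time and pair them off) can be sketched in isolation. This version is written for Python 3 with in-memory rows instead of csv readers; paired_batches is a hypothetical name, and the sketch does not address the Bad file descriptor error itself.

```python
import itertools


def paired_batches(rows_a, rows_b, keyfunc, num_chunks):
    """Yield lists of (group_a, group_b) pairs, num_chunks pairs at a time."""
    groups_a = itertools.groupby(rows_a, keyfunc)
    groups_b = itertools.groupby(rows_b, keyfunc)
    while True:
        # Materialise each group right away; groupby invalidates a group
        # as soon as the next one is requested.
        batch_a = [list(g) for _, g in itertools.islice(groups_a, num_chunks)]
        batch_b = [list(g) for _, g in itertools.islice(groups_b, num_chunks)]
        pairs = list(zip(batch_a, batch_b))
        if not pairs:
            return
        yield pairs


if __name__ == "__main__":
    test_rows = [("k1", "x"), ("k1", "y"), ("k2", "z"), ("k3", "w")]
    prod_rows = [("k1", "x"), ("k1", "q"), ("k2", "z"), ("k3", "w")]
    for batch in paired_batches(test_rows, prod_rows, lambda row: row[0], 2):
        print(batch)
```

In Python 3 zip returns an iterator, so it has to be turned into a list before its truthiness says anything about whether pairs remain; in Python 2.7 zip already returns a list.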
If it helps, here are the functions called by pool.map:
def worker_mini(flag, chunk):
    row_comp = []
    for entry, entry2 in zip(chunk[0][0], chunk[1][0]):
        if entry == entry2:
            temp_comp = entry
        else:
            temp_comp = '%s|%s' % (entry, entry2)
        row_comp.append(temp_comp)
    return True, row_comp


# takes a single tuple argument and unpacks the tuple to multiple arguments
def worker_mini_star(flag_chunk):
    """Convert `f([1,2])` to `f(1,2)` call."""
    return worker_mini(*flag_chunk)
def main():