Get current process id when using `concurrent.futures.ProcessPoolExecutor` - python

I am using concurrent.futures.ProcessPoolExecutor as a high-level API to multiprocessing.
I want to identify the current process inside the worker functions.
With the low-level multiprocessing API I can do it like this:
import multiprocessing

def worker():
    print(multiprocessing.current_process())
Is there an equivalent of current_process() when using workers with ProcessPoolExecutor()?

Since each execution happens in a separate worker process, you can simply do

import os

def worker():
    # Get the process ID of the current worker process
    pid = os.getpid()
    ...  # do something with pid
For example,

from concurrent.futures import ProcessPoolExecutor
import os
import time

def task():
    time.sleep(1)
    print("Executing on Process {}".format(os.getpid()))

def main():
    with ProcessPoolExecutor(max_workers=3) as executor:
        for i in range(3):
            executor.submit(task)

if __name__ == '__main__':
    main()
➜ python3.9 so.py
Executing on Process 71137
Executing on Process 71136
Executing on Process 71138
Note that if the task at hand is small and finishes fast enough, the pool may reuse the same worker for every task, so the pid can stay the same. Try it out by removing the time.sleep call from my example.
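As a side note, multiprocessing.current_process() itself also works inside ProcessPoolExecutor workers, since the pool's workers are started through multiprocessing. A minimal sketch:

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import current_process
import os
import time

def task():
    time.sleep(1)
    # prints e.g. "SpawnProcess-1 (pid 71136)"; the name depends on the start method
    print("{} (pid {})".format(current_process().name, os.getpid()))

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=3) as executor:
        for _ in range(3):
            executor.submit(task)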

Related

Real time multiprocess stdout monitoring

Right now, I'm using subprocess to run a long-running job in the background. For multiple reasons (PyInstaller + AWS CLI) I can't use subprocess anymore.
Is there an easy way to achieve the same thing as below? That is, running a long-running Python function in a multiprocessing pool (or something else) while processing its stdout/stderr in real time?
import subprocess

process = subprocess.Popen(
    ["python", "long-job.py"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    shell=True,
)
while True:
    out = process.stdout.read(2000).decode()
    if not out:
        err = process.stderr.read().decode()
    else:
        err = ""
    if (out == "" or err == "") and process.poll() is not None:
        break
    live_stdout_process(out)
Thanks
Getting this to work cross-platform is messy: for a start, the Windows implementation of non-blocking pipes is neither user friendly nor portable.
One option is to have your application read its own command line arguments and conditionally execute a file; that way you can keep using subprocess, since you will simply be launching yourself with a different argument.
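A rough sketch of that option (the flag name is just a placeholder): the application inspects sys.argv and either runs the long job itself or relaunches itself as a child and streams the child's output line by line.

import subprocess
import sys

def long_job():
    # placeholder for the actual long-running work
    print("doing the long job")

if __name__ == "__main__":
    if "--run-long-job" in sys.argv:
        long_job()                      # child mode: just do the work
    else:
        # parent mode: relaunch this same script in child mode;
        # for a frozen (PyInstaller) build, relaunch sys.executable with the flag alone
        process = subprocess.Popen(
            [sys.executable, sys.argv[0], "--run-long-job"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        for line in process.stdout:     # real-time, line-by-line stdout
            print("child:", line, end="")
        process.wait()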
But to keep it to multiprocessing:
The output must be sent to queues instead of pipes.
You need the child to execute a python file; this can be done with runpy, executing the file as __main__.
This runpy call should run in a multiprocessing child, and the child must first redirect its stdout and stderr in the initializer.
When an error happens, your main application must catch it. But if the main process is busy reading the output, it cannot also wait for the error, so a child thread has to start the process pool and wait for the error.
The main process creates the queues, launches the child thread, and reads the output.
Putting it all together:
import multiprocessing
from multiprocessing import Queue
import sys
import concurrent.futures
import threading
import traceback
import runpy
import time

class StdoutQueueWrapper:
    def __init__(self, queue: Queue):
        self._queue = queue

    def write(self, text):
        self._queue.put(text)

    def flush(self):
        pass

def function_to_run():
    # runpy.run_path("long-job.py", run_name="__main__")  # run long-job.py
    print("hello")    # print something
    raise ValueError  # error out

def initializer(stdout_queue: Queue, stderr_queue: Queue):
    sys.stdout = StdoutQueueWrapper(stdout_queue)
    sys.stderr = StdoutQueueWrapper(stderr_queue)

def thread_function(child_stdout_queue, child_stderr_queue):
    with concurrent.futures.ProcessPoolExecutor(
            1, initializer=initializer,
            initargs=(child_stdout_queue, child_stderr_queue)) as pool:
        result = pool.submit(function_to_run)
        try:
            result.result()
        except Exception as e:
            child_stderr_queue.put(traceback.format_exc())

if __name__ == "__main__":
    child_stdout_queue = multiprocessing.Queue()
    child_stderr_queue = multiprocessing.Queue()

    child_thread = threading.Thread(
        target=thread_function,
        args=(child_stdout_queue, child_stderr_queue),
        daemon=True)
    child_thread.start()

    while True:
        while not child_stdout_queue.empty():
            var = child_stdout_queue.get()
            print(var, end='')
        while not child_stderr_queue.empty():
            var = child_stderr_queue.get()
            print(var, end='')
        if not child_thread.is_alive():
            break
        time.sleep(0.01)  # check output every 0.01 seconds
Note that a direct consequence of running under multiprocessing is that if the child hits a segmentation fault or some other unrecoverable error, the parent dies with it; hence running yourself under subprocess may be the better option if segfaults are expected.

Limit number of threads used by a child process launched with `multiprocessing.Process`

I'm trying to launch a function (my_function) and stop its execution after a certain time is reached.
So I tried the multiprocessing library, and everything works well. Here is the code, where my_function() has been changed to only create a dummy message.
from multiprocessing import Queue, Process
from multiprocessing.queues import Empty
import time

timeout = 1
# timeout = 3

def my_function(something):
    time.sleep(2)
    return f'my message: {something}'

def wrapper(something, queue):
    message = "too late..."
    try:
        message = my_function(something)
        return message
    finally:
        queue.put(message)

try:
    queue = Queue()
    params = ("hello", queue)
    child_process = Process(target=wrapper, args=params)
    child_process.start()
    output = queue.get(timeout=timeout)
    print(f"ok: {output}")
except Empty:
    timeout_message = f"Timeout {timeout}s reached"
    print(timeout_message)
finally:
    if 'child_process' in locals():
        child_process.kill()
You can test and verify that, depending on timeout=1 or timeout=3, the timeout error is triggered or not.
My main problem is that the real my_function() is a torch model inference for which I would like to limit the number of threads (to 4, let's say).
One can easily do so when my_function runs in the main process, but in my example I tried a lot of tricks to limit it in the child process without any success (using threadpoolctl.threadpool_limits(4), torch.set_num_threads(4), os.environ["OMP_NUM_THREADS"]=4, os.environ["MKL_NUM_THREADS"]=4).
I'm completely open to other solutions that can monitor the execution time of a function while limiting the number of threads used by that function.
thanks
Regards
You can limit the number of simultaneous processes with Pool (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool).
You can also set the maximum number of tasks done per child with maxtasksperchild. Check it out.
Here is a sample from superfastpython.com by Jason Brownlee:
# SuperFastPython.com
# example of limiting the number of tasks per child in the process pool
from time import sleep
from multiprocessing.pool import Pool
from multiprocessing import current_process

# task executed in a worker process
def task(value):
    # get the current process
    process = current_process()
    # report a message
    print(f'Worker is {process.name} with {value}', flush=True)
    # block for a moment
    sleep(1)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool(2, maxtasksperchild=3) as pool:
        # issue tasks to the process pool
        for i in range(10):
            pool.apply_async(task, args=(i,))
        # close the process pool
        pool.close()
        # wait for all tasks to complete
        pool.join()
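As for the thread-limit part of the question, such limits generally have to be applied inside the child process before the numeric library creates its thread pool, and a pool initializer is one place to do that. A minimal sketch (the 4-thread figure and the dummy inference function are placeholders, not from the question's real code):

import os
from multiprocessing.pool import Pool

def limit_threads():
    # Runs once in each worker process, before any task.
    # Environment values must be strings, and they only take effect if set
    # before OpenMP/MKL spin up their thread pools in this process.
    os.environ["OMP_NUM_THREADS"] = "4"
    os.environ["MKL_NUM_THREADS"] = "4"
    # With torch available, torch.set_num_threads(4) could also be called here.

def inference(something):
    # stand-in for the real my_function() / model inference
    return f"my message: {something}"

if __name__ == "__main__":
    with Pool(2, initializer=limit_threads) as pool:
        print(pool.map(inference, ["hello", "world"]))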

How to manage the exit of a process without blocking its thread in Python?

I'm trying to code a kind of task manager in Python. It's based on a job queue; the main thread is in charge of adding jobs to this queue. I have made this class to handle the queued jobs, limit the number of concurrent processes, and handle the output of the finished processes.
Here comes the problem: in the _check_jobs function, the returncode value of each process never gets updated. Regardless of its status (running, finished...), job.returncode is always None, so the if statement never matches and I can't remove jobs from the processing job list.
I know it can be done with process.communicate() or process.wait(), but I don't want to block the thread that launches the processes. Is there any other way to do it, maybe using a ProcessPoolExecutor? The queue can receive jobs at any time and I need to be able to handle them.
Thank you all for your time and support :)
from queue import Queue
import subprocess
from threading import Thread
from time import sleep

class JobQueueManager(Queue):
    def __init__(self, maxsize: int):
        super().__init__(maxsize)
        self.processing_jobs = []
        self.process = None
        self.jobs_launcher = Thread(target=self._worker_job)
        self.processing_jobs_checker = Thread(target=self._check_jobs_status)
        self.jobs_launcher.start()
        self.processing_jobs_checker.start()

    def _worker_job(self):
        while True:
            # Run at most 3 jobs concurrently
            if self.not_empty and len(self.processing_jobs) < 3:
                # Get job from queue
                job = self.get()
                # Execute a task without blocking the thread
                self.process = subprocess.Popen(job)
                self.processing_jobs.append(self.process)
                # useful if queue.join() is used to block the queue
                self.task_done()
            else:
                print("Waiting 4s for jobs")
                sleep(4)

    def _check_jobs_status(self):
        while True:
            # Check if jobs are finished
            for job in self.processing_jobs:
                # Successfully completed
                if job.returncode == 0:
                    self.processing_jobs.remove(job)
            # Wait 4 seconds and repeat
            sleep(4)

def main():
    q = JobQueueManager(100)
    task = ["stress", "--cpu", "1", "--timeout", "20"]
    for i in range(10):  # put 10 tasks in the queue
        q.put(task)
    q.join()  # block until all tasks are done

if __name__ == "__main__":
    main()
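As a side note, Popen.returncode is only set once the child has been reaped by poll(), wait(), or communicate(); poll() is non-blocking, so a checker along these lines would see the exit codes without stalling anything. A minimal standalone sketch, using the sleep command in place of real jobs:

import subprocess
from time import sleep

# three dummy jobs; poll() reaps each one as it finishes and sets returncode
processing_jobs = [subprocess.Popen(["sleep", str(n)]) for n in (1, 2, 3)]

while processing_jobs:
    for job in list(processing_jobs):
        if job.poll() is not None:  # None means the job is still running
            print(f"PID {job.pid} finished with return code {job.returncode}")
            processing_jobs.remove(job)
    sleep(0.5)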
To answer my own question: I have come up with a working solution. The JobExecutor class manages the pool of processes in a custom way, and the watch_completed_tasks function watches and handles the output of the tasks when they are done. This way everything is done with only two threads, and the main thread is not blocked when submitting processes.
import subprocess
from threading import Timer
from concurrent.futures import ProcessPoolExecutor, as_completed
import logging

def launch_job(job):
    process = subprocess.Popen(job, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(f"launching {process.pid}")
    return [process.pid, process.stdout.read(), process.stderr.read()]

class JobExecutor(ProcessPoolExecutor):
    def __init__(self, max_workers: int):
        super().__init__(max_workers)
        self.futures = []
        self.watch_completed_tasks()

    def submit(self, command):
        future = super().submit(launch_job, command)
        self.futures.append(future)
        return future

    def watch_completed_tasks(self):
        # Manage tasks completion
        for completed_task in as_completed(self.futures):
            print(f"FINISHED task with PID {completed_task.result()[0]}")
            self.futures.remove(completed_task)
        # call this function every 5 seconds
        timer_thread = Timer(5.0, self.watch_completed_tasks)
        timer_thread.name = "TasksWatcher"
        timer_thread.start()

def main():
    executor = JobExecutor(max_workers=5)
    for i in range(10):
        task = ["stress",
                "--cpu", "1",
                "--timeout", str(i + 5)]
        executor.submit(task)

if __name__ == "__main__":
    main()
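An alternative worth mentioning (an assumption on my part, not part of the solution above) is to attach a done-callback to each future instead of polling with a Timer; the executor then notifies you as soon as a task finishes:

import subprocess
from concurrent.futures import ProcessPoolExecutor

def launch_job(job):
    # run the job to completion in the worker and capture its output
    completed = subprocess.run(job, capture_output=True)
    return completed.returncode, completed.stdout, completed.stderr

def on_done(future):
    # called in the main process as soon as the task finishes
    returncode, out, err = future.result()
    print(f"FINISHED task with return code {returncode}")

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=5) as executor:
        for i in range(10):
            task = ["stress", "--cpu", "1", "--timeout", str(i + 5)]
            future = executor.submit(launch_job, task)
            future.add_done_callback(on_done)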

Python threading.Timer() not working in my process

import os
import sys
from multiprocessing import Process, Queue
import threading

class Test:
    def __init__(self):
        print '__init__ is called'

    def say_hello_again_and_again(self):
        print 'Hello :D'
        threading.Timer(1, self.say_hello_again_and_again).start()

test = Test()
#test.say_hello_again_and_again()
process = Process(target=test.say_hello_again_and_again)
process.start()
This is the test code.
The result:
pi#raspberrypi:~/Plant2 $ python test2.py
__init__ is called
Hello :D
If I use test.say_hello_again_and_again(), "Hello :D" is printed repeatedly.
But the process is not working as I expected. Why is "Hello :D" not being printed repeatedly in my process?
What's happening in my process?
There are two problems with your code:
First: you start a process with start(). This does a fork, which means you now have two processes, the parent and the child, running side by side. The parent process then immediately exits, because after start() the program is over. To wait until the child has finished (which in your case is never), you have to add process.join().
"I did test your suggestion, but it does not work."
Indeed, there is a second issue: you start a new thread with threading.Timer(1, ...).start() but then immediately end the process. You don't wait until your thread has started, because the underlying process dies right away. You would also need to wait until the thread has stopped, with join().
Now this is how your program would look:
from multiprocessing import Process
import threading

class Test:
    def __init__(self):
        print '__init__ is called'

    def say_hello_again_and_again(self):
        print 'Hello :D'
        timer = threading.Timer(1, self.say_hello_again_and_again)
        timer.start()
        timer.join()

test = Test()
process = Process(target=test.say_hello_again_and_again)
process.start()
process.join()
But this is suboptimal at best, because you are mixing multiprocessing (which uses fork to start independent processes) and threading (which starts a thread within a process). While this is not really a problem, it makes debugging a lot harder (one problem with the code above, for example, is that you can't stop it with Ctrl-C because, for some reason, the spawned process is inherited by the OS and kept running). Why don't you just do this?
from multiprocessing import Process, Queue
import time

class Test:
    def __init__(self):
        print '__init__ is called'

    def say_hello_again_and_again(self):
        while True:
            print 'Hello :D'
            time.sleep(1)

test = Test()
process = Process(target=test.say_hello_again_and_again)
process.start()
process.join()
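If the Ctrl-C behaviour mentioned above matters, one option (a sketch in Python 3 syntax, not part of the original answer) is to catch KeyboardInterrupt in the parent and terminate the child explicitly:

from multiprocessing import Process
import time

def say_hello_again_and_again():
    while True:
        print('Hello :D')
        time.sleep(1)

if __name__ == '__main__':
    process = Process(target=say_hello_again_and_again)
    process.start()
    try:
        process.join()       # wait for the child (forever, in this case)
    except KeyboardInterrupt:
        process.terminate()  # on Ctrl-C, stop the child explicitly
        process.join()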

Allow Greenlet to finish work at end of main module execution

I'm making a library that uses gevent to do some work asynchronously. I'd like to guarantee that the work is completed, even if the main module finishes execution.
class separate_library(object):
    def __init__(self):
        import gevent.monkey; gevent.monkey.patch_all()

    def do_work(self):
        from gevent import spawn
        spawn(self._do)

    def _do(self):
        from gevent import sleep
        sleep(1)
        print 'Done!'

if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()
If you run this, you'll notice the program ends immediately, and Done! doesn't get printed.
Now, the main module doesn't know, or care, how separate_library actually accomplishes the work (or even that gevent is being used), so it's unreasonable to require joining there.
Is there any way separate_library can detect certain types of program exits, and stall until the work is done? Keyboard interrupts, SIGINTs, and sys.exit() should end the program immediately, as that is probably the expected behaviour.
Thanks!
Try spawning your gevent greenlets from a new thread that is not a daemon thread. Your program will not exit while this non-daemon thread is still running.
import gevent
import threading

class separate_library(object):
    def __init__(self):
        import gevent.monkey; gevent.monkey.patch_all()

    def do_work(self):
        t = threading.Thread(target=self.spawn_gthreads)
        t.setDaemon(False)
        t.start()

    def spawn_gthreads(self):
        from gevent import spawn
        gthreads = [spawn(self._do, x) for x in range(10)]
        gevent.joinall(gthreads)

    def _do(self, sec):
        from gevent import sleep
        sleep(sec)
        print 'Done!'

if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()
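Another possibility, sketched here only as an assumption (Python 3 syntax, and it presumes the gevent hub is still usable at interpreter exit), is for the library to register an atexit hook that joins any outstanding greenlets. Note that atexit also runs on sys.exit(), which may or may not be the behaviour you want:

import atexit
import gevent
from gevent import spawn

class separate_library(object):
    def __init__(self):
        import gevent.monkey; gevent.monkey.patch_all()
        self._pending = []
        # join whatever is still running when the interpreter exits normally
        atexit.register(lambda: gevent.joinall(self._pending))

    def do_work(self):
        self._pending.append(spawn(self._do))

    def _do(self):
        gevent.sleep(1)
        print('Done!')

if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()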
