Code written using multiprocessing module is not giving any output - python

So I am trying to learn the multiprocessing module and have written the code below, where 4 processes are generated and assigned 8 jobs (in the processor function), and each job just runs a sleep call (in the exampleJob function). I wrote similar code with the threading module and it worked fine, but here it is not outputting anything. Please help.
from multiprocessing import Process, Lock
import multiprocessing
import time

print_lock = Lock()

def exampleJob(worker): # function simulating some computation
    time.sleep(.5)
    print_lock.acquire()
    print(multiprocessing.current_process.pid,worker)
    print_lock.release()

def processor(): # function where process pick up the job
    while True:
        worker = q.get()
        exampleJob(worker)
        q.task_done()

q = multiprocessing.JoinableQueue()
process = []

for x in range(4):
    p = multiprocessing.Process(target=processor)
    process.append(p)

for i in range(0,len(process)):
    process[i].start

start = time.time()

for worker in range(8):
    q.put(worker)

q.join()
print('Entire job took:',time.time() - start)

The first problem is that start needs to be start().
Also, separate processes have separate global variables, so print_lock = Lock() is a different lock in each process. You have to create the lock once and pass it to the individual processes. This goes for the queue as well.
A JoinableQueue isn't really needed. What's needed is a sentinel flag to tell the processes to exit, and join the processes.
Working example with other fixes:
import multiprocessing as mp
import time

def exampleJob(print_lock, worker):  # function simulating some computation
    time.sleep(.5)
    with print_lock:
        print(mp.current_process().name, worker)

def processor(print_lock, q):  # function where process picks up the job
    while True:
        worker = q.get()
        if worker is None:  # flag to exit the process
            break
        exampleJob(print_lock, worker)

# This "if" is required for portability in some OSes.
# Windows for example creates new Python processes and imports the original script.
# Without this the below code would run again in each child process.
if __name__ == '__main__':
    print_lock = mp.Lock()
    q = mp.Queue()
    processes = [mp.Process(target=processor, args=(print_lock, q)) for _ in range(4)]
    for process in processes:
        process.start()  # OP code didn't *call* the start method.
    start = time.time()
    for worker in range(8):
        q.put(worker)
    for process in processes:
        q.put(None)  # quit indicator
    for process in processes:
        process.join()
    print('Entire job took:', time.time() - start)
Output:
Process-2 2
Process-1 0
Process-3 1
Process-4 3
Process-3 6
Process-1 5
Process-2 4
Process-4 7
Entire job took: 1.1350018978118896
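For completeness, the JoinableQueue approach from the question can also be made to work; the following is a minimal sketch (not from the original answer) that passes the lock and queue to the workers and relies on daemon processes plus q.join() instead of a sentinel:

import multiprocessing as mp
import time

def exampleJob(print_lock, worker):
    time.sleep(.5)
    with print_lock:
        print(mp.current_process().name, worker)

def processor(print_lock, q):
    while True:
        worker = q.get()
        exampleJob(print_lock, worker)
        q.task_done()  # lets q.join() know this item has been handled

if __name__ == '__main__':
    print_lock = mp.Lock()
    q = mp.JoinableQueue()
    for _ in range(4):
        # daemon workers are terminated automatically when the main process exits
        mp.Process(target=processor, args=(print_lock, q), daemon=True).start()
    start = time.time()
    for worker in range(8):
        q.put(worker)
    q.join()  # blocks until task_done() has been called for every queued item
    print('Entire job took:', time.time() - start)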

Related

How to manage the exit of a process without blocking its thread in Python?

I'm trying to code a kind of task manager in Python. It's based on a job queue; the main thread is in charge of adding jobs to this queue. I have made this class to handle the queued jobs, limit the number of concurrent processes, and handle the output of the finished processes.
Here comes the problem: in the _check_jobs function the returncode value of each process never gets updated, regardless of its status (running, finished...). job.returncode is always None, so the if statement never fires and jobs are never removed from the processing job list.
I know it can be done with process.communicate() or process.wait(), but I don't want to block the thread that launches the processes. Is there any other way to do it, maybe using a ProcessPoolExecutor? The queue can receive jobs at any time and I need to be able to handle them.
Thank you all for your time and support :)
from queue import Queue
import subprocess
from threading import Thread
from time import sleep

class JobQueueManager(Queue):
    def __init__(self, maxsize: int):
        super().__init__(maxsize)
        self.processing_jobs = []
        self.process = None
        self.jobs_launcher = Thread(target=self._worker_job)
        self.processing_jobs_checker = Thread(target=self._check_jobs_status)
        self.jobs_launcher.start()
        self.processing_jobs_checker.start()

    def _worker_job(self):
        while True:
            # Run at most 3 jobs concurrently
            if self.not_empty and len(self.processing_jobs) < 3:
                # Get job from queue
                job = self.get()
                # Execute a task without blocking the thread
                self.process = subprocess.Popen(job)
                self.processing_jobs.append(self.process)
                # useful if queue.join() is used to block the queue
                self.task_done()
            else:
                print("Waiting 4s for jobs")
                sleep(4)

    def _check_jobs_status(self):
        while True:
            # Check if jobs are finished
            for job in self.processing_jobs:
                # Successfully completed
                if job.returncode == 0:
                    self.processing_jobs.remove(job)
            # Wait 4 seconds and repeat
            sleep(4)

def main():
    q = JobQueueManager(100)
    task = ["stress", "--cpu", "1", "--timeout", "20"]
    for i in range(10):  # put 10 tasks in the queue
        q.put(task)
    q.join()  # block until all tasks are done

if __name__ == "__main__":
    main()
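A side note on the code above: Popen.returncode is only refreshed when poll(), wait() or communicate() is called, which is why it stays None here. A minimal sketch of a non-blocking check in the watcher loop (reusing the names from the question) could look like this:

    def _check_jobs_status(self):
        while True:
            # poll() returns None while the child is still running and
            # sets/returns returncode once it has exited, without blocking
            for job in list(self.processing_jobs):
                if job.poll() is not None and job.returncode == 0:
                    self.processing_jobs.remove(job)
            sleep(4)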
I answer myself: I have come up with a working solution. The JobExecutor class handles the pool of processes in a custom way. The watch_completed_tasks function watches and handles the output of the tasks when they are done. This way everything is done with only two threads, and the main thread is not blocked when submitting processes.
import subprocess
from threading import Timer
from concurrent.futures import ProcessPoolExecutor, as_completed
import logging

def launch_job(job):
    process = subprocess.Popen(job, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(f"launching {process.pid}")
    return [process.pid, process.stdout.read(), process.stderr.read()]

class JobExecutor(ProcessPoolExecutor):
    def __init__(self, max_workers: int):
        super().__init__(max_workers)
        self.futures = []
        self.watch_completed_tasks()

    def submit(self, command):
        future = super().submit(launch_job, command)
        self.futures.append(future)
        return future

    def watch_completed_tasks(self):
        # Manage task completion
        for completed_task in as_completed(self.futures):
            print(f"FINISHED task with PID {completed_task.result()[0]}")
            self.futures.remove(completed_task)
        # call this function every 5 seconds
        timer_thread = Timer(5.0, self.watch_completed_tasks)
        timer_thread.setName("TasksWatcher")
        timer_thread.start()

def main():
    executor = JobExecutor(max_workers=5)
    for i in range(10):
        task = ["stress",
                "--cpu", "1",
                "--timeout", str(i + 5)]
        executor.submit(task)
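An alternative sketch (my own illustration, not part of the answer above) that avoids the polling Timer altogether: concurrent.futures lets you attach a callback that fires as soon as a future finishes, so completed jobs can be handled without a watcher thread.

from concurrent.futures import ProcessPoolExecutor
import subprocess

def launch_job(job):
    # same idea as launch_job in the answer above
    process = subprocess.Popen(job, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = process.communicate()  # safe to block here: we are inside a pool worker
    return process.pid, out, err

def on_done(future):
    # called in the submitting process as soon as the task completes
    pid, out, err = future.result()
    print(f"FINISHED task with PID {pid}")

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=5) as executor:
        for i in range(10):
            task = ["stress", "--cpu", "1", "--timeout", str(i + 5)]
            executor.submit(launch_job, task).add_done_callback(on_done)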

multiprocessing.Process calls main function from process creation line to end of line repetitively?

I have two files. One creates and returns a process. The other creates multiple processes asynchronously.
The problem I have is: why has the line print("all process created") (line 13) in the second file executed 3 times?
process1.py
import time
import multiprocessing

def wait(s):
    print(f"Waiting for {s} seconds...")
    time.sleep(s)
    print(f"Done Waiting for {s} seconds...")

def create_process(sec):
    p1 = multiprocessing.Process(target=wait, args=(sec, ))
    p1.start()
    return p1
main_file.py
from process1 import create_process
import time

procs = []

def many_process():
    global procs
    if __name__ == "__main__":
        for i in range(1,4):
            print(f"creating process to sleep {i}")
            p = create_process(i)
            procs += [p]
    print("all process created")

many_process()
for p in procs:
    p.join()
output:
creating process to sleep 1
creating process to sleep 2
creating process to sleep 3
all process created
all process created
Waiting for 1 seconds...
all process created
Waiting for 3 seconds...
all process created
Waiting for 2 seconds...
Done Waiting for 1 seconds...
Done Waiting for 2 seconds...
Done Waiting for 3 seconds...
Multiprocessing can fork or spawn processes. Since Windows doesn't support fork, spawn is its only option. When spawning, a new Python instance is created and must be initialized to both load the worker code and build the environment for it to execute. That includes importing modules, including the script that started it all.
For this to be successful, the modules must be import safe. That is, a mere import doesn't run more code than you want it to. In your case, the extra code was reasonably benign. Since you used an if to keep many_process from creating processes on import, all that happened is that the print, which should also have been inside the if, spat out incorrect information.
But really, the if should be higher up in the code than that. The create_process function should not have been run at all.
main_file.py
from process1 import create_process
import time

def many_process():
    global procs
    for i in range(1,4):
        print(f"creating process to sleep {i}")
        p = create_process(i)
        procs += [p]
    print("all process created")

if __name__ == "__main__":
    # emulating windows on other platforms for test, remove in real code
    import multiprocessing
    multiprocessing.set_start_method("spawn")

    procs = []
    many_process()
    for p in procs:
        p.join()
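To see the re-import happening, here is a minimal sketch (not from the original answer): under the spawn start method the main script is imported again in every child, where __name__ is "__mp_main__" rather than "__main__", which is exactly why module-level prints run once per process.

import multiprocessing
import time

# This line runs once in the parent AND once in every spawned child.
print(f"module imported, __name__ = {__name__}")

def work():
    time.sleep(0.1)

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")
    procs = [multiprocessing.Process(target=work) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()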

killing Finished threads in python

My multi-threading script is raising this error:
thread.error : can't start new thread
when it reaches 460 threads:
threading.active_count() = 460
I assume the old threads keep stacking up, since the script didn't kill them. This is my code:
import threading
import Queue
import time
import os
import csv

def main(worker):
    #Do Work
    print worker
    return

def threader():
    while True:
        worker = q.get()
        main(worker)
        q.task_done()

def main_threader(workers):
    global q
    global city
    q = Queue.Queue()
    for x in range(20):
        t = threading.Thread(target=threader)
        t.daemon = True
        print "\n\nthreading.active_count() = " + str(threading.active_count()) + "\n\n"
        t.start()
    for worker in workers:
        q.put(worker)
    q.join()
How do I kill the old threads when their job is done? (Is return not enough?)
I'm sure the old threads' work is done, as I'm printing the results, but I'm not sure why they are still active afterwards. Is there any direct way to kill a thread after it finishes its work?
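No answer is included above, but the likely cause is that every call to main_threader starts 20 fresh daemon threads whose while True loop never returns, so they stay alive for the lifetime of the program. Threads cannot be killed from outside; they have to leave their target function. A minimal sketch of the usual fix, using the same sentinel pattern as the first answer on this page (written in Python 3; names mirror the question):

import threading
import queue

def main(worker):
    # Do Work
    print(worker)

def threader(q):
    while True:
        worker = q.get()
        if worker is None:   # sentinel: time for this thread to exit
            break
        main(worker)
        q.task_done()

def main_threader(workers, num_threads=20):
    q = queue.Queue()
    threads = [threading.Thread(target=threader, args=(q,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for worker in workers:
        q.put(worker)
    q.join()                 # wait until every queued item is processed
    for _ in threads:
        q.put(None)          # one sentinel per thread
    for t in threads:
        t.join()             # threads actually terminate here
    print("active threads:", threading.active_count())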

Not able to exchange object/ timeout a child process using multiprocessing.Process() Python

From the main process I am spawning a new process using multiprocessing.Process.
My aim is to do a heavy CPU-intensive task in the child process and, if the task takes too long to finish (using the timeout_in variable), terminate it with a response; otherwise compute and get back the result of this task from the child process.
I am able to terminate it if it takes too long, but I am not able to get the object (result) back when the child process is not forcibly terminated.
from multiprocessing import Process, Queue

def do_threading(function, argument, timeout_in=1):
    # Making a queue for data exchange
    q = Queue()
    # Start function as a process
    p = Process(target=function, args=(argument, q,))
    p.start()
    # Wait for 10 seconds or until process finishes
    p.join(timeout_in)
    # If thread is still active
    if p.is_alive():
        print("running... let's kill it...")
        # print(q.get())
        # Terminate
        p.terminate()
        p.join()

def do_big_job(argument, q):
    # Do something with passed argument
    print(argument)
    # heavy computation
    result = 2**1234567
    # print("in child thread ", result)
    # Putting result in the queue for exchange
    q.put(result)

def main_2():
    print("Main thread starting...")
    do_threading(do_big_job, "Child thread starting...", timeout_in=10)

if __name__ == '__main__':
    main_2()
I think the problem comes from the fact that you create the Queue inside do_threading. So when your calculation runs normally (no timeout), the function terminates and the queue with it.
Here is alternative code that works if there is no timeout:
from multiprocessing import Process, Queue

def do_threading(q, function, argument, timeout_in=1):
    # Start function as a process
    p = Process(target=function, args=(argument, q,))
    p.start()
    # Wait for 10 seconds or until process finishes
    p.join(timeout_in)
    print("time out")
    # If thread is still active
    if p.is_alive():
        print("running... let's kill it...")
        # print(q.get())
        # Terminate
        p.terminate()
        print("terminate")
        p.join()

def do_big_job(argument, q):
    # Do something with passed argument
    print(argument)
    # heavy computation
    result = 2**123
    # print("in child thread ", result)
    # Putting result in the queue for exchange
    q.put(result)

if __name__ == '__main__':
    q = Queue()  # Creating the queue in the main allows you to access it anytime
    print("Main thread starting...")
    do_threading(q, do_big_job, "Child thread starting...", timeout_in=10)
    if q.empty():
        pass
    else:
        print(q.get())  # get your result here.
Try to catch the timeout exception on the queue instead of the process, for example:
...
from multiprocessing.queues import Empty
...

def do_threading(q, function, argument, timeout_in=1):
    # Start function as a process
    p = Process(target=function, args=(argument, q,))
    p.start()
    try:
        print(q.get(True, timeout_in))
    except Empty:
        print("time out")
        p.terminate()
    p.join()
or you can get the result in an else branch in your original code:
...
# If thread is still active
if p.is_alive():
    print("running... let's kill it...")
    # Terminate
    p.terminate()
else:
    print(q.get())
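For comparison only (not from either answer above): concurrent.futures offers the same "result or timeout" pattern in a compact form, though note that, unlike p.terminate(), a timed-out future does not kill the worker process, it only stops waiting for it.

from concurrent.futures import ProcessPoolExecutor, TimeoutError

def do_big_job(argument):
    print(argument)
    return 2**123  # heavy computation stands in here

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(do_big_job, "Child process starting...")
        try:
            print(future.result(timeout=10))  # raises TimeoutError if too slow
        except TimeoutError:
            print("time out")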

Synchronize pool of workers - Python and multiprocessing

I want to make a synchronized simulation of graph coloring. To create the graph (tree) I am using the igraph package, and for synchronization I am using the multiprocessing package for the first time. I built a graph where each node has the attributes: label, color and parentColor. To color the tree I execute the following function (I am not giving the full code because it is very long, and I think it is not necessary to solve my problem):
def sixColor(self):
    root = self.graph.vs.find("root")
    root["color"] = self.takeColorFromList(root["label"])
    self.sendToChildren(root)
    lista = []
    for e in self.graph.vs():
        lista.append(e.index)
    p = multiprocessing.Pool(len(lista))
    p.map(fun, zip([self]*len(lista), lista), chunksize=300)

def process_sixColor(self, id):
    v = self.graph.vs.find(id)
    if not v["name"] == "root":
        while True:
            if v["received"] == True:
                v["received"] = False
                #------------Part 1-----------
                self.sendToChildren(v)
                self.printInfo()
                #-----------Part 2-------------
                diffIdx = self.compareLabelWithParent(v)
                if not diffIdx == -1:
                    diffIdxStr = str(bin(diffIdx))[2:]
                    charAtPos = (v["label"][::-1])[diffIdx]
                    newLabel = diffIdxStr + charAtPos
                    v["label"] = newLabel
                    self.sendToChildren(v)
                    colorNum = int(newLabel, 2)
                    if colorNum in sixColorList:
                        v["color"] = self.takeColorFromList(newLabel)
                        self.printGraph()
                        break
I want each node (except the root) to call process_sixColor in parallel and not evaluate Part 2 before Part 1 has been done by all nodes. But I notice that this is not working properly: some nodes evaluate Part 2 before every other node has executed Part 1. How can I solve that problem?
You can use a combination of a multiprocessing.Queue and a multiprocessing.Event object to synchronize the workers. Make the main process create a Queue and an Event and pass both to all the workers. The Queue will be used by the workers to let the main process know that they are finished with part 1. The Event will be used by the main process to let all the workers know that all the workers are finished with part 1. Basically,
the workers will call queue.put() to let the main process know that they have reached part 2 and then call event.wait() to wait for the main process to give the green light.
the main process will repeatedly call queue.get() until it receives as many messages as there are workers in the worker pool and then call event.set() to give the green light for the workers to start with part 2.
This is a simple example:
from __future__ import print_function
from multiprocessing import Event, Process, Queue
def worker(identifier, queue, event):
# Part 1
print("Worker {0} reached part 1".format(identifier))
# Let the main process know that we have finished part 1
queue.put(identifier)
# Wait for all the other processes
event.wait()
# Start part 2
print("Worker {0} reached part 2".format(identifier))
def main():
queue = Queue()
event = Event()
processes = []
num_processes = 5
# Create the worker processes
for identifier in range(num_processes):
process = Process(target=worker, args=(identifier, queue, event))
processes.append(process)
process.start()
# Wait for "part 1 completed" messages from the processes
while num_processes > 0:
queue.get()
num_processes -= 1
# Set the event now that all the processes have reached part 2
event.set()
# Wait for the processes to terminate
for process in processes:
process.join()
if __name__ == "__main__":
main()
If you want to use this in a production environment, you should think about how to handle errors that occur in part 1. Right now if an exception happens in part 1, the worker will never call queue.put() and the main process will block indefinitely waiting for the message from the failed worker. A production-ready solution should probably wrap the entire part 1 in a try..except block and then send a special error signal in the queue. The main process can then exit immediately if the error signal is received in the queue.
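A minimal sketch of that error handling (my own illustration, not part of the answer): wrap part 1 in try..except, push a special error value into the queue, and let the main process bail out as soon as it sees one.

def worker(identifier, queue, event):
    try:
        print("Worker {0} reached part 1".format(identifier))
        # ... part 1 work would go here ...
    except Exception as exc:
        queue.put(("error", identifier, str(exc)))  # special error signal
        return
    queue.put(("ok", identifier))
    event.wait()
    print("Worker {0} reached part 2".format(identifier))

def collect_part1_results(queue, event, num_processes):
    # main-process side: wait for every worker, abort on the first error
    for _ in range(num_processes):
        message = queue.get()
        if message[0] == "error":
            raise RuntimeError("worker {1} failed in part 1: {2}".format(*message))
    event.set()  # green light for part 2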
