Nested multiprocessing: How to terminate properly?

I have a python script that implements two levels of multiprocessing:
from multiprocessing import Process, Queue, Lock

if __name__ == '__main__':
    pl1s = []
    for ip1 in range(10):
        pl1 = Process(target=process_level1, args=(...))
        pl1.start()
        pl1s.append(pl1)

    # do something for a while, e.g. for 24 hours
    # time to terminate
    # ISSUE: this terminates the process_level1 processes
    #        but not the process_level2 ones
    for pl1 in pl1s:
        pl1.terminate()
def process_level1(...):
    # subscribe to external queue
    with queue.open(name_of_external_queue, 'r') as subq:
        qInternal = Queue()
        pl2s = []
        for ip2 in range(3):
            pl2 = Process(target=process_level2, args=(qInternal,))
            pl2.start()
            pl2s.append(pl2)
        # grab messages from the external queue and push them to
        # the process_level2 processes to process
        while True:
            message = subq.read()
            qInternal.put(message)
def process_level2(qInternal):
    while True:
        message = qInternal.get()
        # do something with the data from the message
So in main I launch a bunch of process_level1 subprocesses, each of which launches a bunch of its own process_level2 subprocesses. Main is supposed to run for a predefined amount of time (e.g. 24 hrs) and then terminate everything. The problem is that the code above terminates the first layer of subprocesses but not the second.
How should I do this to terminate both layers at the same time?
(Maybe an important) caveat: I guess one approach would be to set up an internal queue to communicate from main to process_level1 and then send a signal to each process_level1 subprocess to terminate its respective subprocesses. The problem is that process_level1 runs an infinite loop reading messages from an external queue. So I am not sure where and how I would check for the terminate signal from main.
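A minimal sketch of one possible approach (not from the original question): share a multiprocessing.Event with every process_level1 worker, read the external queue with a timeout so the loop regularly notices the flag, and have each process_level1 shut down its own process_level2 children with sentinels before exiting. The read_external_queue_with_timeout() helper below is hypothetical and stands in for whatever the external queue client provides.
from multiprocessing import Process, Queue, Event
import time

def process_level2(q_internal):
    while True:
        message = q_internal.get()
        if message is None:                  # sentinel: time to exit
            break
        # ... process the message ...

def process_level1(shutdown_event):
    q_internal = Queue()
    pl2s = [Process(target=process_level2, args=(q_internal,)) for _ in range(3)]
    for p in pl2s:
        p.start()
    while not shutdown_event.is_set():
        # hypothetical helper: read the external queue with a timeout so this
        # loop re-checks the shutdown flag between messages
        message = read_external_queue_with_timeout()
        if message is not None:
            q_internal.put(message)
    for _ in pl2s:                           # one sentinel per level-2 worker
        q_internal.put(None)
    for p in pl2s:
        p.join()

if __name__ == '__main__':
    shutdown_event = Event()
    pl1s = [Process(target=process_level1, args=(shutdown_event,)) for _ in range(10)]
    for p in pl1s:
        p.start()
    time.sleep(24 * 60 * 60)                 # run for 24 hours
    shutdown_event.set()                     # ask every level-1 process to wind down
    for p in pl1s:
        p.join()
With this shape, terminate() is never needed: both layers exit on their own once the event is set, so no grandchild processes are orphaned.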

Related

Python multiprocessing without blocking parent process

I am attempting to create a simple application which continuously monitors an inbox, then calls various functions as child processes, after categorising incoming mail.
I would like the parent process to continue its while loop without waiting for the child process to complete.
For example:
def main():
    while 1:
        mail = checkForMail()
        if mail:
            if mail['type'] == 'Type1':
                process1()
                '''
                spawn process1, as long as no other process1 process running,
                however it's fine for a process2 to be currently running
                '''
            elif mail['type'] == 'Type2':
                process2()
                '''
                spawn process2, as long as no other process2 process running,
                however it's fine for a process1 to be currently running
                '''
        # Wait a bit, then continue loop regardless of whether child processes have finished or not
        time.sleep(10)

if __name__ == '__main__':
    main()
As commented above, there should never be more than one concurrent child process instance for a given function, however processes can run concurrently if they are running different functions.
Is this possible to do with the multiprocessing package?
Following on from pdeubel's answer, which was very helpful, the completed skeleton script is as follows:
import time
from multiprocessing import Process, Queue

def func1(todo):
    # do stuff with current todo item from queue1
    pass

def func2(todo):
    # do stuff with current todo item from queue2
    pass

def listenQ1(q):
    while 1:
        # Fetch jobs from queue1
        todo = q.get()
        func1(todo)

def listenQ2(q):
    while 1:
        # Fetch jobs from queue2
        todo = q.get()
        func2(todo)

def main(queue1, queue2):
    while 1:
        mail = checkForMail()
        if mail:
            if mail['type'] == 'Type1':
                # Add job to queue1
                queue1.put('do q1 stuff')
            elif mail['type'] == 'Type2':
                # Add job to queue2
                queue2.put('do q2 stuff')
        time.sleep(10)

if __name__ == '__main__':
    # Create 2 multiprocessing queues
    queue1 = Queue()
    queue2 = Queue()

    # Create and start two new processes, with separate targets and queues
    p1 = Process(target=listenQ1, args=(queue1,))
    p1.start()
    p2 = Process(target=listenQ2, args=(queue2,))
    p2.start()

    # Start main while loop and check for mail
    main(queue1, queue2)

    p1.join()
    p2.join()
You could use two Queues, one for mails of Type1 and one for mails of Type2, and likewise two Processes, again one for each mail type.
Start by creating these Queues. Then create the Processes and give the first Queue to the first Process and the second Queue to the second Process. Both Process objects need a target parameter, which is the function that the Process executes. Depending on the logic you will probably need two functions (again, one for each type). Inside each function you want something like an infinite loop that takes items from the Queue (i.e. the mails) and then acts on them according to your logic. The main function would also consist of an infinite loop where the mails are retrieved and, depending on their type, placed on the correct Queue.
So start the two Processes before the main loop, then start the main loop and the mails should get put on the Queues where they get picked up in the subprocesses.
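One detail worth noting: because main() loops forever, the p1.join()/p2.join() calls at the bottom of the skeleton are never actually reached. A minimal way to shut the listeners down cleanly (not part of the original answer; the None sentinel is an assumed convention) would be:
def listenQ1(q):
    while 1:
        todo = q.get()
        if todo is None:       # sentinel: stop listening
            break
        func1(todo)

# ... and when main() decides to stop (e.g. after a KeyboardInterrupt):
queue1.put(None)
queue2.put(None)
p1.join()
p2.join()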

In python multi-producer & multi-consumer threading, may queue.join() be unreliable?

Pseudocode for a Python multi-producer and multi-consumer threading setup:
import queue

def threadProducer():
    while upstreams_not_done:
        data = do_some_work()
        queue_of_data.put(data)

def threadConsumer():
    while True:
        data = queue_of_data.get()
        do_other_work()
        queue_of_data.task_done()

queue_of_data = queue.Queue()
list_of_producers = create_and_start_producers()
list_of_consumers = create_and_start_consumers()

queue_of_data.join()
# is now all work done?
Here queue_of_data.task_done() is called for each item taken from the queue.
When the producers work more slowly than the consumers, is it possible that queue_of_data.join() stops blocking at a moment when no producer has put new data yet, but all consumers have already called task_done() for everything currently in the queue?
And if Queue.join() is unreliable in this way, how can I check whether all the work is done?
The usual way is to put a sentinel value (like None) on the queue, one for each consumer thread, when the producers are done. Consumers are then written to exit their thread when they pull None from the queue.
So, e.g., in the main program:
for t in list_of_producers:
    t.join()
# Now we know all producers are done.
for t in list_of_consumers:
    queue_of_data.put(None)  # tell a consumer we're done
for t in list_of_consumers:
    t.join()
and consumers look like:
def threadConsumer():
    while True:
        data = queue_of_data.get()
        if data is None:
            break
        do_other_work()
Note: if producers can overwhelm consumers, create the queue with a maximum size. Then queue.put() will block when the queue reaches that size, until a consumer removes something from the queue.
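For concreteness, here is a small self-contained version of the sentinel plus bounded-queue pattern described above (a sketch; the producer and consumer bodies are stand-ins for do_some_work/do_other_work):
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)                        # blocks if the queue is full (maxsize)

def consumer(q):
    while True:
        data = q.get()
        if data is None:                   # sentinel: no more work
            break
        # do_other_work(data) would go here

if __name__ == '__main__':
    q = queue.Queue(maxsize=10)            # bounded so producers can't run away
    producers = [threading.Thread(target=producer, args=(q, range(100)))
                 for _ in range(2)]
    consumers = [threading.Thread(target=consumer, args=(q,)) for _ in range(4)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()                           # all producers finished
    for _ in consumers:
        q.put(None)                        # one sentinel per consumer
    for t in consumers:
        t.join()                           # now all work is done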

multithreading spawn new process when worker has finished

I would like to define a pool of n workers and have each execute tasks held in a rabbitmq queue. When a task finishes (fails or succeeds) I want the worker to execute another task from the queue.
I can see in the docs how to spawn a pool of workers and have them all wait for their siblings to complete. I would like something different though: I would like a buffer of n tasks where, when one worker finishes, another task is added to the buffer (so that no more than n tasks are in the buffer). I'm having difficulty finding this in the docs.
For context, my non-multithreading code is this:
while True:
    message = get_frame_from_queue()  # get message from rabbitmq
    do_task(message.body)             # body defines urls to download file
    acknowledge_complete(message)     # tell rabbitmq the message is acknowledged
At this stage my "multithreading" implementation will look like this:
#receives('ask_for_a_job')
def get_a_task():
    # this function is executed when the `ask_for_a_job` signal is fired
    message = get_frame_from_queue()
    do_task(message)

def do_task(task_info):
    try:
        # do stuff
        pass
    finally:
        # once the "worker" has finished, start another
        fire_signal('ask_for_a_job')

# start the "workers"
for i in range(5):
    fire_signal('ask_for_a_job')
I don't want to reinvent the wheel. Is there a more built-in way to achieve this?
Note: get_frame_from_queue is not thread-safe.
You should be able to have each subprocess/thread consume directly from the queue, and then within each thread, simply process from the queue exactly as you would synchronously.
from threading import Thread

def do_task(msg):
    # Do stuff here
    pass

def consume():
    while True:
        message = get_frame_from_queue()
        do_task(message.body)
        acknowledge_complete(message)

if __name__ == "__main__":
    threads = []
    for i in range(5):
        t = Thread(target=consume)
        t.start()
        threads.append(t)
This way, you'll always have N messages from the queue being processed simultaneously, without any need for signaling to occur between threads.
The only "gotcha" here is the thread-safety of the rabbitmq library you're using. Depending on how it's implemented, you may need a separate connection per thread, or possibly one connection with a channel per thread, etc.
One solution is to leverage the multiprocessing.Pool object. Use an outer loop to get N items from RabbitMQ. Feed the items to the Pool, waiting until the entire batch is done. Then loop through the batch, acknowledging each message. Lastly continue the outer loop.
source
import multiprocessing

def worker(word):
    return bool(word == 'whiskey')

if __name__ == '__main__':
    messages = ['syrup', 'whiskey', 'bitters']
    BATCHSIZE = 2
    pool = multiprocessing.Pool(BATCHSIZE)
    while messages:
        # take the first few messages, one per worker
        batch, messages = messages[:BATCHSIZE], messages[BATCHSIZE:]
        print('BATCH:', end=' ')
        for res in pool.imap_unordered(worker, batch):
            print(res, end=' ')
        print()
        # TODO: acknowledge msgs in 'batch'
output
BATCH: False True
BATCH: False
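The TODO could be filled in by keeping the original RabbitMQ frames alongside each batch and acknowledging them once the pool has finished (a sketch using the question's get_frame_from_queue/acknowledge_complete helpers; the exact wiring is an assumption):
# hypothetical wiring: pull a batch of frames, process their bodies, then ack
frames = [get_frame_from_queue() for _ in range(BATCHSIZE)]
results = pool.map(worker, [f.body for f in frames])   # blocks until the batch is done
for frame in frames:
    acknowledge_complete(frame)                         # ack only after processing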

Python Spawn a Thread with Threading and kill when main finishes

I have a script that does a bunch of things and I want to spawn a thread that monitors the cpu and memory usage of what's happening.
The monitoring portion is:
import psutil
import time
import datetime

def MonitorProcess():
    procname = "firefox"
    while True:
        output_sys = open("/tmp/sysstats_counter.log", 'a')
        for proc in psutil.process_iter():
            if proc.name == procname:
                p = proc
        p.cmdline
        proc_rss, proc_vms = p.get_memory_info()
        proc_cpu = p.get_cpu_percent(1)
        scol1 = str(proc_rss / 1024)
        scol2 = str(proc_cpu)
        now = str(datetime.datetime.now())
        output_sys.write(scol1)
        output_sys.write(", ")
        output_sys.write(scol2)
        output_sys.write(", ")
        output_sys.write(now)
        output_sys.write("\n")
        output_sys.close()
        time.sleep(1)
I'm sure there's a better way to do the monitoring but I don't care at this point.
The main script calls:
RunTasks() # which runs the forground tasks
MonitorProcess() # Which is intended to monitor the tasks CPU and Memory Usage over time
I want to run both functions simultaneously. To do this I assume that I have to use the threading library. Is the approach then to do something like:
thread = threading.Thread(target=MonitorProcess())
thread.start
Or am I way off?
Also, when the RunTasks() function finishes, how do I get MonitorProcess() to stop automatically? I assume I could test whether the main process is still present and, if it's not, kill the function?
It sounds like you want a daemon thread. From the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
In your code:
thread = threading.Thread(target=MonitorProcess)
thread.daemon = True
thread.start()
The program will exit when main exits, even if the daemon thread is still active. You will want to run your foreground tasks after you set up and start your monitoring thread.
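Putting it together, the main script would look something like this (a sketch using the names from the question):
import threading

if __name__ == '__main__':
    monitor = threading.Thread(target=MonitorProcess, daemon=True)
    monitor.start()       # monitoring runs in the background
    RunTasks()            # foreground work
    # when RunTasks() returns and main exits, the daemon thread is stopped too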

Communication between parent child processes

I'm trying to create a Python 3 program that has one or more child processes.
The parent process spawns the child processes and then goes on with its own business; now and then I want to send a message to a specific child process, which catches it and takes action.
Also, the child process must not be blocked while waiting for a message; it will run its own loop maintaining a server connection and pass any received messages on to the parent.
I'm currently looking at the multiprocessing, threading and subprocess modules in Python but have not been able to find a solution.
What I'm trying to achieve is a main part of the program that interacts with the user, handling user input and presenting information to the user.
This will run asynchronously from the child parts, which talk to different servers, receiving messages from the servers and sending the right messages from the user to the servers.
The child processes will then send information back to the main part, where it will be presented to the user.
My questions are:
1. Am I going about this the wrong way?
2. Which module would be the best to use?
2.1 How would I set this up?
See Doug Hellmann's "Communication Between Processes", part of his Python Module of the Week series on multiprocessing. It is fairly simple to use a dictionary or list to communicate with a process.
import time
from multiprocessing import Process, Manager

def test_f(test_d):
    """ first process to run
        exit this process when dictionary's 'QUIT' == True
    """
    test_d['2'] = 2    ## change to test this
    while not test_d["QUIT"]:
        print("test_f", test_d["QUIT"])
        test_d["ctr"] += 1
        time.sleep(1.0)

def test_f2(name):
    """ second process to run. Runs until the for loop exits
    """
    for j in range(0, 10):
        print(name, j)
        time.sleep(0.5)
    print("second process finished")

if __name__ == '__main__':
    ##--- create a dictionary via Manager
    manager = Manager()
    test_d = manager.dict()
    test_d["ctr"] = 0
    test_d["QUIT"] = False

    ##--- start first process and send dictionary
    p = Process(target=test_f, args=(test_d,))
    p.start()

    ##--- start second process
    p2 = Process(target=test_f2, args=('P2',))
    p2.start()

    ##--- sleep 3 seconds and then change dictionary
    ##    to exit first process
    time.sleep(3.0)
    print("\n terminate first process")
    test_d["QUIT"] = True
    print("test_d changed")
    print("data from first process", test_d)

    time.sleep(5.0)
    p.terminate()
    p2.terminate()
It sounds like you might be familiar with multiprocessing in general, just not with Python.
os.pipe will supply you with pipes to connect parent and child, and semaphores can be used to coordinate/signal between parent and child processes. You might also want to consider queues for passing messages.
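As a concrete illustration of the pipe/queue suggestion (a minimal sketch, not part of the answer above): multiprocessing.Pipe gives each side a connection object, and poll() lets the child check for messages from the parent without blocking its own loop.
from multiprocessing import Process, Pipe
import time

def child(conn):
    while True:
        # non-blocking check for a message from the parent
        if conn.poll():
            msg = conn.recv()
            if msg == 'quit':
                break
            conn.send(f'child got: {msg}')   # report back to the parent
        # ... maintain the server connection, do other work here ...
        time.sleep(0.1)

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send('hello')    # message a specific child
    print(parent_conn.recv())    # read the child's reply
    parent_conn.send('quit')     # ask the child to exit
    p.join()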
