I am attempting to create a simple application which continuously monitors an inbox, then calls various functions as child processes, after categorising incoming mail.
I would like the parent process to continue its while loop without waiting for the child process to complete.
For example:
import time

def main():
    while 1:
        checkForMail()
        if mail:
            if mail['type'] == 'Type1':
                process1()
                '''
                spawn process1, as long as no other process1 process is running;
                however, it's fine for a process2 to be currently running
                '''
            elif mail['type'] == 'Type2':
                process2()
                '''
                spawn process2, as long as no other process2 process is running;
                however, it's fine for a process1 to be currently running
                '''
        # Wait a bit, then continue the loop regardless of whether child processes have finished or not
        time.sleep(10)

if __name__ == '__main__':
    main()
As commented above, there should never be more than one concurrent child process instance for a function; however, processes can run concurrently if they are running different functions.
Is this possible to do with the multiprocessing package?
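It is possible. A minimal sketch of the direct approach, assuming process1, process2, checkForMail and mail are the question's own placeholders: keep a handle to the last Process spawned per type and spawn a new one only if the previous one is no longer alive. (The queue-based skeleton below avoids respawning by using long-lived workers instead.)
import time
from multiprocessing import Process

# process1, process2, checkForMail and mail are the question's own placeholders
running = {'Type1': None, 'Type2': None}
targets = {'Type1': process1, 'Type2': process2}

def maybe_spawn(mail_type):
    p = running[mail_type]
    if p is None or not p.is_alive():  # at most one live instance per type
        running[mail_type] = Process(target=targets[mail_type])
        running[mail_type].start()

def main():
    while 1:
        checkForMail()
        if mail and mail['type'] in targets:
            maybe_spawn(mail['type'])
        # continue regardless of whether child processes have finished
        time.sleep(10)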
Following on from pdeubel's answer which was very helpful, the completed skeleton script is as follows:
import time
from multiprocessing import Process, Queue

def func1(todo):
    # do stuff with current todo item from queue1
    pass

def func2(todo):
    # do stuff with current todo item from queue2
    pass

def listenQ1(q):
    while 1:
        # Fetch jobs from queue1
        todo = q.get()
        func1(todo)

def listenQ2(q):
    while 1:
        # Fetch jobs from queue2
        todo = q.get()
        func2(todo)

def main(queue1, queue2):
    while 1:
        checkForMail()
        if mail:
            if mail['type'] == 'Type1':
                # Add job to queue1
                queue1.put('do q1 stuff')
            elif mail['type'] == 'Type2':
                # Add job to queue2
                queue2.put('do q2 stuff')
        time.sleep(10)

if __name__ == '__main__':
    # Create 2 multiprocessing queues
    queue1 = Queue()
    queue2 = Queue()
    # Create and start two new processes, with separate targets and queues
    p1 = Process(target=listenQ1, args=(queue1,))
    p1.start()
    p2 = Process(target=listenQ2, args=(queue2,))
    p2.start()
    # Start main while loop and check for mail
    main(queue1, queue2)
    p1.join()
    p2.join()
You could use two Queues, one for mails of Type1 and one for mails of Type2, and two Processes, again one for each type.
Start by creating these Queues. Then create the Processes and give the first Queue to the first Process and the second Queue to the second Process. Both Process objects need a target parameter, which is the function that the Process executes. Depending on your logic you will probably need two functions (again, one for each type). Inside each function you want something like an infinite loop that takes items from the Queue (i.e. the mails) and then acts on them according to your logic. The main function would also consist of an infinite loop where the mails are retrieved and, depending on their type, placed on the correct Queue.
So start the two Processes before the main loop, then start the main loop and the mails should get put on the Queues where they get picked up in the subprocesses.
I have a basic question regarding the Python multiprocessing module: how can different processes, which use queues to transfer data, be started optimally?
For that I use a simple example where
Data is received
Data is processed
Data is sent
All of the above steps should happen in parallel through three different processes.
Here is the example code:
import multiprocessing
import keyboard
import time

def getData(queue_raw):
    for num in range(1000):
        queue_raw.put(num)
        print("getData: put " + str(num) + " in queue_raw")
    while True:
        if keyboard.read_key() == "s":
            break

def calcFeatures(queue_raw, queue_features):
    while not queue_raw.empty():
        data = queue_raw.get()
        queue_features.put(data**2)
        print("calcFeatures: put " + str(data**2) + " in queue_features")

def sendFeatures(queue_features):
    while not queue_features.empty():
        feature = queue_features.get()
        print("sendFeatures: put " + str(feature) + " out")

if __name__ == "__main__":
    queue_raw = multiprocessing.Queue()
    queue_features = multiprocessing.Queue()

    processes = [
        multiprocessing.Process(target=getData, args=(queue_raw,)),
        multiprocessing.Process(target=calcFeatures, args=(queue_raw, queue_features,)),
        multiprocessing.Process(target=sendFeatures, args=(queue_features,))
    ]

    processes[0].start()
    time.sleep(0.1)
    processes[1].start()
    time.sleep(0.1)
    processes[2].start()

    #for p in processes:
    #    p.start()

    for p in processes:
        p.join()
This program works, but my question is regarding the start of the different processes.
Ideally process[1] should start only once process[0] has put data in queue_raw, while process[2] should start only once process[1] has put the calculated features in queue_features.
Right now I do that with the time.sleep() function, which is suboptimal, since I don't necessarily know how long the processes will take.
I also tried something like:
processes[0].start()

while queue_raw.empty():
    time.sleep(0.5)

processes[1].start()
But it won't work, since this only gauges the start of the first process. Is there a method to implement such dependent process starts?
@moooeeeep pointed out the right approach in a comment:
Checking with while not queue.empty(): does not wait until data is actually in the queue!
An approach via a sentinel object (here None) and a while True loop enforces that the process waits until the other processes put data in the queue:
FLAG_STOP = False
while FLAG_STOP is False:
    data = queue_raw.get()  # get() will block until data is available
    if data is None:
        # sentinel received: finish analysis
        FLAG_STOP = True
    else:
        # work with data
        pass
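For completeness, here is a minimal runnable sketch of the whole pipeline with chained sentinels; names follow the question's code, and the keyboard handling from the original is left out:
import multiprocessing

SENTINEL = None

def getData(queue_raw):
    for num in range(1000):
        queue_raw.put(num)
    queue_raw.put(SENTINEL)  # signal downstream that no more data will come

def calcFeatures(queue_raw, queue_features):
    while True:
        data = queue_raw.get()  # blocks until data is available
        if data is SENTINEL:
            queue_features.put(SENTINEL)  # pass the sentinel down the chain
            break
        queue_features.put(data**2)

def sendFeatures(queue_features):
    while True:
        feature = queue_features.get()
        if feature is SENTINEL:
            break
        print("sendFeatures:", feature)

if __name__ == "__main__":
    queue_raw = multiprocessing.Queue()
    queue_features = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=getData, args=(queue_raw,)),
        multiprocessing.Process(target=calcFeatures, args=(queue_raw, queue_features)),
        multiprocessing.Process(target=sendFeatures, args=(queue_features,)),
    ]
    # All processes can be started immediately; the blocking get() calls
    # make each stage wait for its input, so no sleep() is needed.
    for p in processes:
        p.start()
    for p in processes:
        p.join()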
I have some Python code where the main process creates a child process. There is a shared queue between the two processes. The child process writes some data to this shared queue. The main process join()s on the child process.
If the data in the queue is not removed with get(), the child process does not terminate and the main process blocks at join(). Why is this so?
Following is the code that I used:
from multiprocessing import Process, Queue
from time import *

def f(q):
    q.put([42, None, 'hello', [x for x in range(100000)]])
    print(q.qsize())
    #q.get()
    print(q.qsize())

q = Queue()
print(q.qsize())
p = Process(target=f, args=(q,))
p.start()
sleep(1)
#print(q.get())
print('bef join')
p.join()
print('aft join')
At present the q.get() calls are commented out, and so the output is:
0
1
1
bef join
and then the code is blocked.
But if I uncomment one of the q.get() invocations, then the code runs completely with the following output :
0
1
0
bef join
aft join
Well, if you look at the multiprocessing documentation, it explicitly says that a process which has put items on a Queue will wait before terminating until all the buffered items have been flushed to the underlying pipe. It seems logical to me that join() blocks your program if you don't empty the Queue.
To me, it seems you need to learn about the philosophy of multiprocessing. You have several tasks to do that don't need each other to run, and your program at the moment is too slow for you. You need to use multiprocessing!
But don't forget there will (trust me) come a time when you need to wait until some parallel computations are all done, because you need all of their results for your next task. And that's where, in your case, join() comes in. You are basically saying: I was doing things asynchronously. But now my next task needs to be synced with the different items I computed before. Let's wait here until they are all ready.
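Concretely, a minimal sketch of the fix for the code above: drain the queue before joining, so the child's feeder thread can flush and the child can terminate.
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello', [x for x in range(100000)]])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())  # drain the queue first, so the child's feeder thread can flush
    p.join()        # now join() returns, because the child can terminate
    print('aft join')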
So I want to run a function which can either search for information on the web or retrieve it directly from my own MySQL database.
The first process will be time-consuming, the second relatively fast.
With this in mind I create a process which starts this compound search (find_compound_view). If the process finishes relatively fast, it means the compound is present in the database, so I can render the results immediately. Otherwise, I will render "drax_retrieving_data.html".
The stupid solution I came up with was to run the function twice, once to check if the process takes a long time, and a second time to actually get the return values of the function. This is pretty much because I don't know how to return the values of my find_compound_view function. I've tried googling but I can't seem to find how to return values from the Process class specifically.
p = Process(target=find_compound_view, args=(form,))
p.start()
is_running = p.is_alive()
start_time = time.time()
while is_running:
    time.sleep(0.05)
    is_running = p.is_alive()
    if time.time() - start_time > 10:
        print('Timer exceeded, DRAX is retrieving info!', time.time() - start_time)
        return render(request, 'drax_internal_dbs/drax_retrieving_data.html')

compound = find_compound_view(form, use_email=False)
if compound:
    data = *****
    return render(request, 'drax_internal_dbs/result.html', data)
You will need a multiprocessing.Pipe or a multiprocessing.Queue to send the results back to your parent process. If you just do I/O, you should use a Thread instead of a Process, since it's more lightweight and most time will be spent on waiting. I'm showing you how it's done for Processes and Threads in general.
Process with Queue
The multiprocessing queue is built on top of a pipe, and access is synchronized with locks/semaphores. Queues are thread- and process-safe, meaning you can use one queue for multiple producer/consumer processes and even multiple threads in these processes. Adding the first item to the queue will also start a feeder thread in the calling process. The additional overhead of a multiprocessing.Queue makes using a pipe for single-producer/single-consumer scenarios preferable and more performant.
Here's how to send and retrieve a result with a multiprocessing.Queue:
from multiprocessing import Process, Queue

SENTINEL = 'SENTINEL'

def sim_busy(out_queue, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)

if __name__ == '__main__':
    out_queue = Queue()
    p = Process(target=sim_busy, args=(out_queue, 150e6))  # 150e6 == 150000000.0
    p.start()
    for result in iter(out_queue.get, SENTINEL):  # sentinel breaks the loop
        print(result)
The queue is passed as an argument into the function, results are .put() on the queue and the parent .get()s from the queue. .get() is a blocking call; execution does not resume until there is something to get (specifying a timeout parameter is possible). Note the work sim_busy does here is cpu-intensive; that's when you would choose processes over threads.
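As a usage note, a .get() with a timeout raises queue.Empty if nothing arrives in time (this applies to multiprocessing.Queue as well):
from queue import Empty  # multiprocessing.Queue also raises queue.Empty on timeout

try:
    result = out_queue.get(timeout=10)
except Empty:
    result = None  # no result arrived within 10 seconds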
Process & Pipe
For one-to-one connections a pipe is enough. The setup is nearly identical, just the methods are named differently and a call to Pipe() returns two connection objects. In duplex mode, both objects are read-write ends, with duplex=False (simplex) the first connection object is the read-end of the pipe, the second is the write-end. In this basic scenario we just need a simplex-pipe:
from multiprocessing import Process, Pipe

SENTINEL = 'SENTINEL'

def sim_busy(write_conn, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    write_conn.send(result)
    # If all results are sent, send a sentinel-value to let the parent know
    # no more results will come.
    write_conn.send(SENTINEL)

if __name__ == '__main__':
    # duplex=False because we just need one-way communication in this case.
    read_conn, write_conn = Pipe(duplex=False)
    p = Process(target=sim_busy, args=(write_conn, 150e6))  # 150e6 == 150000000.0
    p.start()
    for result in iter(read_conn.recv, SENTINEL):  # sentinel breaks the loop
        print(result)
Thread & Queue
For use with threading, you want to switch to queue.Queue. queue.Queue is built on top of a collections.deque, adding some locks to make it thread-safe. Unlike with multiprocessing's queue and pipe, objects put on a queue.Queue won't get pickled. Since threads share the same memory address space, serialization for memory copying is unnecessary; only pointers are transmitted.
from threading import Thread
from queue import Queue
import time

SENTINEL = 'SENTINEL'

def sim_io(out_queue, query):
    time.sleep(1)
    result = query + '_result'
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)

if __name__ == '__main__':
    out_queue = Queue()
    p = Thread(target=sim_io, args=(out_queue, 'my_query'))
    p.start()
    for result in iter(out_queue.get, SENTINEL):  # sentinel-value breaks the loop
        print(result)
Read here why for result in iter(out_queue.get, SENTINEL): should be preferred over a while True...break setup, where possible.
Read here why you should use if __name__ == '__main__': in all your scripts and especially in multiprocessing.
More about get()-usage here.
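Applied to the question's scenario, a hedged sketch might look like the following; compound_worker and compound_search are hypothetical names, while find_compound_view, form and the templates come from the question, and the 10-second cutoff mirrors the original polling loop:
from multiprocessing import Process, Queue
from queue import Empty

def compound_worker(out_queue, form):
    # wrapper so the result of the question's find_compound_view ends up on the queue
    out_queue.put(find_compound_view(form, use_email=False))

def compound_search(request, form):
    out_queue = Queue()
    p = Process(target=compound_worker, args=(out_queue, form))
    p.start()
    try:
        # fast database hits return within the timeout; slow web searches don't
        compound = out_queue.get(timeout=10)
    except Empty:
        return render(request, 'drax_internal_dbs/drax_retrieving_data.html')
    return render(request, 'drax_internal_dbs/result.html', compound)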
I have a Python script that implements two levels of multiprocessing:
from multiprocessing import Process, Queue, Lock

if __name__ == '__main__':
    pl1s = []
    for ip1 in range(10):
        pl1 = Process(target=process_level1, args=(...))
        pl1.start()
        pl1s.append(pl1)

    # do something for a while, e.g. for 24 hours

    # time to terminate
    # ISSUE: this terminates process_level1 processes
    # but not process_level2 ones
    for pl1 in pl1s:
        pl1.terminate()

def process_level1(...):
    # subscribe to external queue
    with queue.open(name_of_external_queue, 'r') as subq:
        qInternal = Queue()

        pl2s = []
        for ip2 in range(3):
            pl2 = Process(target=process_level2, args=(qInternal,))
            pl2.start()
            pl2s.append(pl2)

        # grab messages from the external queue and push them to
        # process_level2 processes to process
        while True:
            message = subq.read()
            qInternal.put(message)

def process_level2(qInternal):
    while True:
        message = qInternal.get()
        # do something with the data from message
So in main I launch a bunch of slave subprocesses process_level1, each of which launches a bunch of its own subprocesses process_level2. Main is supposed to run for a predefined amount of time (e.g. 24 hrs) and then terminate everything. The problem is that the code above terminates the 1st layer of subprocesses but not the 2nd one.
How should I do this to terminate both layers at the same time?
(Maybe an important) caveat: I guess one approach would be to set up an internal queue to communicate from main to process_level1 and then send a signal to each process_level1 subprocess to terminate its respective subprocesses. The problem is that process_level1 runs an infinite loop reading messages from an external queue, so I am not sure where and how I would check for the terminate signal from main.
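A hedged sketch of that approach, using a shared multiprocessing.Event instead of an internal queue. Here subq stands in for the question's external queue, and the sketch assumes its read() returns periodically or accepts a timeout so the loop can re-check the event:
from multiprocessing import Process, Queue, Event

def process_level2(qInternal):
    while True:
        message = qInternal.get()
        # do something with the data from message

def process_level1(stop_event):
    # (the question's external-queue subscription is elided; subq is its placeholder)
    qInternal = Queue()
    pl2s = [Process(target=process_level2, args=(qInternal,)) for _ in range(3)]
    for pl2 in pl2s:
        pl2.start()
    while not stop_event.is_set():
        message = subq.read()  # assumption: returns periodically or takes a timeout
        qInternal.put(message)
    # stop_event was set by main: terminate our own children, then exit
    for pl2 in pl2s:
        pl2.terminate()

if __name__ == '__main__':
    stop_event = Event()
    pl1s = [Process(target=process_level1, args=(stop_event,)) for _ in range(10)]
    for pl1 in pl1s:
        pl1.start()
    # ... run for 24 hours ...
    stop_event.set()
    for pl1 in pl1s:
        pl1.join()  # each level-1 process tears down its level-2 children itself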
I'm trying to create a Python 3 program that has one or more child processes.
The parent process spawns the child processes and then goes on with its own business; now and then I want to send a message to a specific child process that catches it and takes action.
Also, the child process needs to be non-blocked while waiting for a message; it will run its own loop maintaining a server connection and send any received messages on to the parent.
I'm currently looking at the multiprocessing, threading and subprocess modules in Python but have not been able to find any solution.
What I'm trying to achieve is to have a main part of the program that interacts with the user, taking care of user inputs and presenting information to the user.
This will be asynchronous from the child parts that talk with different servers, receiving messages from the servers and sending the correct messages from the user to the servers.
The child processes will then send information back to the main part, where it will be presented to the user.
My questions are:
1. Am I going at this in the wrong way?
2. Which module would be the best to use?
2.1. How would I set this up?
See Doug Hellmann's (multiprocessing) "Communication Between Processes". Part of his Python Module of the Week series. It is fairly simple to use a dictionary or list to communicate with a process.
import time
from multiprocessing import Process, Manager

def test_f(test_d):
    """ first process to run
        exit this process when dictionary's 'QUIT' == True
    """
    test_d['2'] = 2  ## change to test this
    while not test_d["QUIT"]:
        print("test_f", test_d["QUIT"])
        test_d["ctr"] += 1
        time.sleep(1.0)

def test_f2(name):
    """ second process to run. Runs until the for loop exits
    """
    for j in range(0, 10):
        print(name, j)
        time.sleep(0.5)
    print("second process finished")

if __name__ == '__main__':
    ##--- create a dictionary via Manager
    manager = Manager()
    test_d = manager.dict()
    test_d["ctr"] = 0
    test_d["QUIT"] = False

    ##--- start first process and send dictionary
    p = Process(target=test_f, args=(test_d,))
    p.start()

    ##--- start second process
    p2 = Process(target=test_f2, args=('P2',))
    p2.start()

    ##--- sleep 3 seconds and then change dictionary
    ##    to exit first process
    time.sleep(3.0)
    print("\n terminate first process")
    test_d["QUIT"] = True
    print("test_d changed")
    print("data from first process", test_d)

    time.sleep(5.0)
    p.terminate()
    p2.terminate()
Sounds like you might be familiar with multi-processing, just not with Python.
os.pipe will supply you with pipes to connect parent and child processes, and semaphores can be used to coordinate/signal between parent and child processes. You might want to consider queues for passing messages.
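A minimal sketch of os.pipe-based signalling from parent to child; POSIX only because of os.fork, and plain bytes only (for structured messages, a multiprocessing.Queue as in the answer above is usually more convenient):
import os

# os.pipe() returns a (read_fd, write_fd) pair of file descriptors.
read_fd, write_fd = os.pipe()

pid = os.fork()  # POSIX only; on Windows use multiprocessing instead
if pid == 0:
    # Child: close the unused write end, then block until the parent writes.
    os.close(write_fd)
    message = os.read(read_fd, 1024)
    print("child received:", message.decode())
    os._exit(0)
else:
    # Parent: close the unused read end, send a message, then reap the child.
    os.close(read_fd)
    os.write(write_fd, b"reload config")
    os.close(write_fd)
    os.waitpid(pid, 0)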