Python Multiprocessing With (LIFO) Queues

I'm trying to use multiprocessing in Python to have a function keep getting called within a loop, and subsequently access the latest return value from the function (by storing the values in a LIFO Queue).
Here is a code snippet from the main program
q = Queue.LifoQueue()
while True:
    p = multiprocessing.Process(target=myFunc, args=(q,))
    p.daemon = True
    p.start()
    if not q.empty():
        # do something with q.get()
And here's a code snippet from myFunc
def myFunc(q):
    x = calc()
    q.put(x)
The problem is, the main loop thinks that q is empty. However, I've checked to see if myFunc() is placing values into q (by putting a q.empty() check right after the q.put(x)) and the queue shouldn't be empty.
What can I do so that the main loop can see the values placed in the queue? Or am I going about this in an inefficient way? (I do need myFunc and the main loop to be run separately though, since myFunc is a bit slow and the main loop needs to keep performing its task)

Queue.LifoQueue is not fit for multiprocessing; only multiprocessing.Queue is, as it is specifically designed for this use case. That means that values put into a Queue.LifoQueue will only be available to the local process, as the queue is not shared between subprocesses.
A possibility would be to use a shared list from a SyncManager (SyncManager.list()) instead. When used with only append and pop, a list behaves just like a LIFO queue.
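A minimal sketch of that idea, assuming the question's calc() function (stubbed out here) and keeping the question's spawn-in-a-loop structure:

import multiprocessing
import time

def calc():
    # stand-in for the question's calc(); replace with the real computation
    return time.time()

def myFunc(shared_list):
    shared_list.append(calc())            # append/pop on the shared list gives LIFO order

if __name__ == '__main__':
    manager = multiprocessing.Manager()   # a SyncManager; manager.list() is shared
    results = manager.list()
    while True:
        p = multiprocessing.Process(target=myFunc, args=(results,))
        p.daemon = True
        p.start()
        if len(results) > 0:
            latest = results.pop()        # most recently appended value
            # do something with latest
        time.sleep(0.1)                   # keep the loop from spawning processes too fast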

Related

Python - Running Function in Separate Thread, Then Accessing It

So I have Python code running with one very expensive function that gets executed at times on demand, but its result is not needed straight away (it can be delayed by a few cycles).
def heavy_function(arguments):
    return calc_obtained_from_arguments

def main():
    a = None
    if some_condition:
        a = heavy_function(x)
    else:
        do_something_with(a)
The thing is that whenever I calculate heavy_function, the rest of the program hangs. However, I need it to keep running with an empty a value, or better, to know that a is being processed separately and should not be accessed yet. How can I move heavy_function to a separate process and keep calling the main function the whole time until heavy_function is done executing, then read the obtained a value and use it in the main function?
You could use a simple queue.
Put your heavy_function inside a separate process that idles as long as there is no input in the input queue. Use Queue.get(block=True) to do so. Put the result of the computation in another queue.
Run your normal process with the empty a-value and check emptiness of the output queue from time to time. Maybe use while Queue.empty(): here.
If an item becomes available because your heavy_function has finished, switch to a calculation with the value a from your output queue.
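A rough sketch of that layout; heavy_function is stubbed out, and the worker/queue names are just placeholders:

import multiprocessing

def heavy_function(arguments):
    # stand-in for the question's expensive computation
    return arguments * 2

def worker(in_queue, out_queue):
    while True:
        args = in_queue.get(block=True)      # idle until work arrives
        out_queue.put(heavy_function(args))

if __name__ == '__main__':
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(in_q, out_q))
    p.daemon = True
    p.start()

    a = None
    in_q.put(21)                  # request one expensive computation
    while True:
        if a is None and not out_q.empty():
            a = out_q.get()       # result is ready; pick it up
        # ... keep doing the main loop's regular work here ...
        if a is not None:
            break                 # for the sketch, stop once we have a value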

Function within Worker/Child instance does not return, freezes program

I am using the multiprocessing module in python. Here is a sample of the code I am using:
import multiprocessing as mp

def function(fun_var1, fun_var2):
    b = fun_var1 + fun_var2
    # and more computationally intensive stuff happens here
    return b
    # my program freezes after the return command

class Worker(mp.Process):
    def __init__(self, queue_obj, func_var1, func_var2):
        mp.Process.__init__(self)
        self.queue_obj = queue_obj
        self.func_var1 = func_var1
        self.func_var2 = func_var2

    def run(self):
        self.var = function(self.func_var1, self.func_var2)
        self.queue_obj.put(self.var)

if __name__ == '__main__':
    mp.freeze_support()
    queue_list = []
    processes = []
    result = []
    for i in range(2):
        queue_list.append(mp.Queue())
        processes.append(Worker(queue_list[i], var1, var2))
        processes[i].start()
    for i in range(2):
        processes[i].join()
        result.append(queue_list[i].get())
During runtime of the program two instances of the Worker class are generated, which work simultaneously. One instance finishes after about 2 minutes and the other takes about 7 minutes. The first instance returns its results fine. However, the second instance freezes the program when the function() that is called within the run() method returns its value. No error is thrown; the program just does not continue to execute. The console also indicates that it is busy and does not display the >>> prompt. I am completely clueless why this behavior occurs.

The same code works fine for slightly different inputs in the two Worker instances. The only difference I can make out is that the workloads are more equal when it executes correctly. Could the time difference cause trouble? Does anyone have experience with this kind of behavior? Also note that if I run a serial setup of the program, in which function() is just called twice by the main program, the code executes flawlessly. Could there be some timeout involved in the Worker instance that makes it impossible for function() to return its value? The return value of function() is actually a fairly small list that contains about 100 float values.

Any suggestions are welcome!
This is a bit of an educated guess without actually seeing what's going on in worker, but is it possible that your child has put items into the Queue that haven't been consumed? The documentation has a warning about this:
Warning
As mentioned above, if a child process has put items on a queue (and
it has not used JoinableQueue.cancel_join_thread), then that process
will not terminate until all buffered items have been flushed to the
pipe.
This means that if you try joining that process you may get a deadlock
unless you are sure that all items which have been put on the queue
have been consumed. Similarly, if the child process is non-daemonic
then the parent process may hang on exit when it tries to join all its
non-daemonic children.
Note that a queue created using a manager does not have this issue.
See Programming guidelines.
It might be worth trying to create your Queue objects using mp.Manager().Queue() and seeing if the issue goes away.
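For illustration, a sketch of how the main block above could change; it reuses the question's Worker class and var1/var2, switches to a manager-backed queue, and drains each queue before joining:

import multiprocessing as mp

if __name__ == '__main__':
    mp.freeze_support()
    manager = mp.Manager()
    queue_list = []
    processes = []
    result = []
    for i in range(2):
        # a manager-backed queue does not have the flush-on-join issue quoted above
        queue_list.append(manager.Queue())
        processes.append(Worker(queue_list[i], var1, var2))
        processes[i].start()
    for i in range(2):
        result.append(queue_list[i].get())   # drain the queue before joining
        processes[i].join()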

Output Queue of a Python multiprocessing is providing more results than expected

From the following code I would expect the length of the resulting list to be the same as that of the range of items with which the working queue is fed:
import multiprocessing as mp

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() is True:
            break  # this is supposed to end the process.
        else:
            picked = working_queue.get()
            if picked % 2 == 0:
                output_queue.put(picked)
            else:
                working_queue.put(picked + 1)
    return

if __name__ == '__main__':
    static_input = xrange(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
        if output_q.empty() is True:
            break
        else:
            results_bank.append(output_q.get())
    # length of this list should be equal to that of static_input, the range used to populate
    # the input queue; this tells whether all the items placed for processing were actually processed.
    print len(results_bank)
    results_bank.sort()
    print results_bank
Does anyone have any idea how to make this code run properly?
This code will never stop:
Each worker gets an item from the queue as long as it is not empty:
picked = working_queue.get()
and puts a new one for each that it got:
working_queue.put(picked+1)
As a result the queue will never be empty, except when the timing between the processes happens to be such that the queue is empty at the moment one of them calls empty(). Because the queue initially holds 100 items and you have as many processes as cpu_count(), I would be surprised if this ever stops on any realistic system.
Well, executing the code with a slight modification proves me wrong: it does stop at some point, which actually surprises me. Executing the code with one process, there seems to be a bug, because after some time the process freezes but does not return. With multiple processes the result varies.
Adding a short sleep period in the loop iteration makes the code behave as I expected and explained above. There seems to be some timing issue between Queue.put, Queue.get and Queue.empty, although they are supposed to be thread-safe. Removing the empty test also gives the expected result (without ever getting stuck at an empty queue).
Found the reason for the varying behaviour. The objects put on the queue are not flushed immediately. Therefore empty() might return True even though items have already been put on the queue but are still waiting to be flushed.
From the documentation:
Note: When an object is put on a queue, the object is pickled and a
background thread later flushes the pickled data to an underlying
pipe. This has some consequences which are a little surprising, but
should not cause any practical difficulties – if they really bother
you then you can instead use a queue created with a manager.
After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising Queue.Empty.
If multiple processes are enqueuing objects, it is possible for the objects to be received at the other end out-of-order. However, objects enqueued by the same process will always be in the expected order with respect to each other.
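As an illustration of the last point (removing the empty() test), here is a sketch of the question's worker rewritten to block on get() with a timeout instead of polling empty(); the timeout value is arbitrary:

import multiprocessing as mp
import Queue  # the stdlib module is called queue in Python 3

def worker(working_queue, output_queue):
    while True:
        try:
            # block briefly instead of polling empty(); this sidesteps the flush delay
            picked = working_queue.get(timeout=0.5)
        except Queue.Empty:
            return  # nothing arrived for a while, assume the work is done
        if picked % 2 == 0:
            output_queue.put(picked)
        else:
            working_queue.put(picked + 1)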

Multiprocessing at multiple steps: How to save returned objects?

I am new to multiprocessing and have some very basic queries.
I have three functions (fun1(<list>), fun2(<dict>,<int>), fun3(<dict>,<dict>)) that can be parallelized. The output of fun1 (a dictionary) is the input for fun2, and so on.
I have to consolidate the output of all the workers running fun1 before passing it on to fun2 (similarly for fun2 -> fun3 transition).
Consider this code:
if __name__ == '__main__':
    process1 = []
    for i in range(args.numcores):
        p1 = Process(target=fun1, args=(m[i],))
        process1.append(p1)
        p1.start()
    for p in process1:
        p.join()

    process2 = []
    for i in range(args.numcores):
        p2 = Process(target=fun2, args=(g, j))
        process2.append(p2)
        p2.start()
    for p in process2:
        p.join()
I can merge the dictionaries returned by the different workers but how do I save those return values in the first place (in other words, where are the return objects saved)?
Since p2.start() follows p1.join(), does this mean that process2 will start after process1 terminates?
(2.) Yes, your program will not continue past the join() until the process is finished.
(1.) You can use a queue or an array protected by a lock. That way you can add the return data to a multiprocessing.Queue (or to an array) from your functions; in the case of an array or any non-multiprocessing type, use a lock to ensure it is not accessed simultaneously. Then you can read the values from the queue/array afterwards.
Check out Python's multiprocessing.Queue class for consolidating the outputs. The general idea is that you wrap the functions in another function that appends the result of each function to a Queue. Then, you pull from the queue as functions terminate.
See Using Queue in python for a decent example (albeit with threads instead of processes)
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Queue
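A rough sketch of that wrapping idea for the fun1 stage, reusing the question's fun1, m and args.numcores (all assumed to exist) and draining the queue before joining:

from multiprocessing import Process, Queue

def fun1_wrapper(chunk, out_queue):
    out_queue.put(fun1(chunk))        # fun1 returns a dict, per the question

if __name__ == '__main__':
    out_q = Queue()
    process1 = []
    for i in range(args.numcores):
        p1 = Process(target=fun1_wrapper, args=(m[i], out_q))
        process1.append(p1)
        p1.start()

    merged = {}
    for _ in process1:
        # consolidate the partial dicts; draining before join avoids the
        # queue-flush deadlock noted earlier
        merged.update(out_q.get())
    for p in process1:
        p.join()

    # merged is now the consolidated dict to pass to the fun2 stage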

Return whichever expression returns first

I have two different functions f, and g that compute the same result with different algorithms. Sometimes one or the other takes a long time while the other terminates quickly. I want to create a new function that runs each simultaneously and then returns the result from the first that finishes.
I want to create that function with a higher order function
h = firstresult(f, g)
What is the best way to accomplish this in Python?
I suspect that the solution involves threading. I'd like to avoid discussion of the GIL.
I would simply use a Queue for this. Start the threads and the first one which has a result ready writes to the queue.
Code
from threading import Thread
from time import sleep
from Queue import Queue

def firstresult(*functions):
    queue = Queue()
    threads = []
    for f in functions:
        def thread_main(f=f):  # bind f now to avoid the late-binding closure pitfall
            queue.put(f())
        thread = Thread(target=thread_main)
        threads.append(thread)
        thread.start()
    result = queue.get()
    return result

def slow():
    sleep(1)
    return 42

def fast():
    return 0

if __name__ == '__main__':
    print firstresult(slow, fast)
Live demo
http://ideone.com/jzzZX2
Notes
Stopping the threads is an entirely different topic. For this you need to add some state variable to the threads which is checked at regular intervals. As I want to keep this example short, I simply skipped that part and assumed that all workers get the time to finish their work even though the result is never read.
Skipping the discussion about the GIL as requested by the questioner. ;-)
Now, unlike my suggestion on the other answer, this piece of code does exactly what you are requesting:
from multiprocessing import Process, Queue
import random
import time

def firstresult(func1, func2):
    queue = Queue()
    proc1 = Process(target=func1, args=(queue,))
    proc2 = Process(target=func2, args=(queue,))
    proc1.start(); proc2.start()
    result = queue.get()
    proc1.terminate(); proc2.terminate()
    return result

def algo1(queue):
    time.sleep(random.uniform(0, 1))
    queue.put("algo 1")

def algo2(queue):
    time.sleep(random.uniform(0, 1))
    queue.put("algo 2")

print firstresult(algo1, algo2)
Run each function in a new worker thread; the two worker threads send the result back to the main thread via a one-item queue or something similar. When the main thread receives the result from the winner, it kills (do Python threads support kill yet? lol) both worker threads to avoid wasting time (one function may take hours while the other only takes a second).
Replace the word thread with process if you want.
You will need to run each function in another process (with multiprocessing) or in a different thread.
If both are CPU bound, multithreading won't help much (exactly due to the GIL), so multiprocessing is the way to go.
If the return value is a pickleable (serializable) object, I have this decorator I created that simply runs the function in the background, in another process:
https://bitbucket.org/jsbueno/lelo/src
It is not exactly what you want, as both calls are non-blocking and start executing right away. The trick with this decorator is that it blocks (and waits for the function to complete) only when you try to use the return value.
But on the other hand, it is just a decorator that does all the work.
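For a rough idea of what such a decorator can look like, here is a toy sketch (not the linked library's code); it assumes a Unix fork start method and a pickleable return value:

from multiprocessing import Process, Queue

def _run_and_enqueue(func, q, args, kwargs):
    q.put(func(*args, **kwargs))          # the return value must be pickleable

class _DeferredResult(object):
    # placeholder object that blocks only when the value is first read
    def __init__(self, queue, proc):
        self._queue = queue
        self._proc = proc
        self._has_value = False
        self._value = None

    def get(self):
        if not self._has_value:
            self._value = self._queue.get()   # wait for the background process here
            self._proc.join()
            self._has_value = True
        return self._value

def in_background(func):
    # run func in another process; the caller gets a _DeferredResult immediately
    def wrapper(*args, **kwargs):
        q = Queue()
        p = Process(target=_run_and_enqueue, args=(func, q, args, kwargs))
        p.start()
        return _DeferredResult(q, p)
    return wrapper

Used as result = in_background(f)(x): the call returns immediately, and result.get() blocks only at the point where the value is actually needed.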
