So I have Python code running with one very expensive function that gets executed on demand from time to time, but its result is not needed straight away (it can be delayed by a few cycles):
def heavy_function(arguments):
    return calc_obtained_from_arguments

def main():
    a = None
    if some_condition:
        a = heavy_function(x)
    else:
        do_something_with(a)
The problem is that whenever I call heavy_function, the rest of the program hangs. Instead, I need the program to keep running with an empty a, or better, to know that a is being computed separately and therefore should not be accessed yet. How can I move heavy_function to a separate process, keep calling the main function until heavy_function has finished, and then read the resulting a and use it in the main function?
You could use a simple queue.
Put your heavy_function inside a separate process that idles as long as there is no input in the input queue. Use Queue.get(block=True) to do so. Put the result of the computation in another queue.
Run your normal process with the empty a-value and check emptiness of the output queue from time to time. Maybe use while Queue.empty(): here.
If an item becomes available because your heavy_function has finished, switch to a calculation with the value a from your output queue.
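A minimal sketch of the two-queue approach described above, assuming a hypothetical `heavy_function` that just sleeps and squares its argument, and a main loop whose own work is simulated by a short sleep. The worker blocks on `get(block=True)` until a job arrives; the main loop polls the output queue with `get_nowait()` rather than `while Queue.empty():`, because the Python documentation notes that `empty()` is not reliable across processes. `main_loop`, `max_cycles` and the `None` stop signal are illustrative choices, not part of the answer.

```python
import multiprocessing as mp
import queue  # multiprocessing.Queue raises queue.Empty
import time


def heavy_function(x):
    # Hypothetical stand-in for the expensive calculation.
    time.sleep(0.5)
    return x * x


def worker(inbox, outbox):
    # Blocks (idles) until a job arrives; None is used as a stop signal.
    while True:
        x = inbox.get(block=True)
        if x is None:
            break
        outbox.put(heavy_function(x))


def main_loop(x, max_cycles=1000):
    inbox, outbox = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(inbox, outbox))
    proc.start()

    a = None
    cycles_without_a = 0
    inbox.put(x)  # hand the heavy work to the background process
    for _ in range(max_cycles):
        try:
            a = outbox.get_nowait()  # non-blocking check for a finished result
            break  # a is ready; from here on the main loop can use it
        except queue.Empty:
            cycles_without_a += 1  # keep running with a still set to None
        time.sleep(0.01)  # stand-in for one cycle of the main loop's own work

    inbox.put(None)  # ask the worker to exit
    proc.join()
    return a, cycles_without_a


if __name__ == "__main__":
    a, waited = main_loop(7)
    print(f"a = {a} after {waited} cycles without it")
```

The `if __name__ == "__main__":` guard matters on platforms that start children with the spawn method, since the child re-imports the main module.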
Related
I have a loop (all this is being done in Python 3.10) that runs relatively fast compared to a function that needs to consume data from the loop. I don't want to slow down the data, and I am trying to find a way to run the function asynchronously but only execute it again after the previous call has completed. Basically:
import multiprocessing
import time

queue=[]

def flow():
    thing=queue[0]
    time.sleep(.5)
    print(str(thing))
    delete=queue.pop(0)

p1 = multiprocessing.Process(target=flow)

while True:
    print('stream')
    time.sleep(.25)
    if len(queue)<1:
        print('emptyQ')
        queue.append('flow')
        p1.start()
I've tried running the function in a thread and in a process, and both seem to try to start another instance while the function is still running. I tried using a queue to pass the data and as a semaphore, by not removing the item until the end of the function and only adding an item and starting the thread or process if the queue was empty, but that didn't seem to work either.
EDIT: to add an explicit question:
Is there a way to execute a function asynchronously without executing it multiple times simultaneously?
EDIT2: Updated with functional test code that accurately reproduces the failure, since the real code is a bit more substantial. I have noticed that it seems to work on the first execution of the function (although the print doesn't work inside the function), but on the next execution it fails for whatever reason. I assume it tries to load the process twice?
The error I get is: RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase...
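The question above is left unanswered here, so what follows is only a hedged sketch of one common pattern, not the asker's code. Two known facts explain the symptoms: a `multiprocessing.Process` object can only be started once (a second `p1.start()` raises an error), and under the spawn start method (the default on Windows and macOS) the child re-imports the main module, so an unguarded `while True:` loop at module level produces the "bootstrapping phase" RuntimeError. The sketch keeps at most one job in flight by holding on to the `Future` returned by `concurrent.futures` and submitting a new job only once the previous one reports `done()`. It passes each item to `flow` as an argument instead of sharing a list, and uses a thread pool because this `flow` only sleeps; `ProcessPoolExecutor` offers the same `submit`/`done` interface for CPU-bound work, but then the loop must sit under `if __name__ == "__main__":`. `stream` and `max_ticks` are hypothetical names so the example terminates.

```python
import concurrent.futures as cf
import time


def flow(thing):
    # Slow consumer: takes longer than one tick of the loop below.
    time.sleep(0.5)
    return str(thing)


def stream(max_ticks=8):
    """Run a fast loop; start flow() only when no earlier call is running."""
    started, finished = 0, []
    future = None
    with cf.ThreadPoolExecutor(max_workers=1) as executor:
        for tick in range(max_ticks):
            time.sleep(0.25)  # the fast loop's own work
            if future is not None and future.done():
                finished.append(future.result())  # collect the completed job
                future = None
            if future is None:
                # Nothing in flight, so it is safe to start the next job.
                future = executor.submit(flow, f"flow-{tick}")
                started += 1
        if future is not None:
            finished.append(future.result())  # wait for the last job
    return started, finished


if __name__ == "__main__":
    print(stream())
```

Checking `done()` is what prevents a backlog: with `max_workers=1` alone, jobs would never overlap, but they would pile up in the executor's queue.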
I have a loop with a highly time-consuming process in each iteration. Instead of waiting for each process to complete before moving to the next iteration, is it possible to start the process and move on to the next iteration without waiting for it to complete?
Example: given a text, the script should try to find matching links on the Web and matching files on the local disk. Both searches simply return a list of links or paths.
for proc in (web_search, file_search):
    results = proc(text)
    yield from results
The solution I have in mind is to use a timer while doing the job. If the time exceeds the allowed waiting time, the process should be moved to a "tray" and continue working from there, while I go on to the next iteration and repeat the same. After my loop is over, I will collect the results from the processes moved to the tray.
For simple cases, where the objective is to let each process run simultaneously, we can use the Thread class from the threading module.
So we can tackle the issue like this: we run each task in its own Thread and ask it to put its result in a list or some other collection. The code is given below:
from threading import Thread

results = []

def add_to_collection(proc, args, collection):
    '''proc is the function, args are the arguments to pass to it.
    collection is our container (here it is the list results) for
    collecting results.'''
    result = proc(*args)
    collection.append(result)
    print("Completed:", proc)

# Now we do our time-consuming tasks
threads = []
for proc in (web_search, file_search):
    # We assume proc takes no arguments
    t = Thread(target=add_to_collection, args=(proc, (), results))
    t.start()
    threads.append(t)

# After the loop, wait for every thread before reading results
for t in threads:
    t.join()
For complex tasks, as mentioned in the comments, it's better to go with multiprocessing.pool.Pool.
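For the `multiprocessing.pool.Pool` route mentioned above, a minimal sketch: `apply_async` returns an `AsyncResult` immediately, so the loop moves on without waiting, and the results are collected with `.get()` afterwards, which is roughly the "tray" the question describes. `web_search` and `file_search` here are hypothetical stand-ins that sleep and return a one-element list, and `search_all` is an illustrative name.

```python
import multiprocessing as mp
import time


def web_search(text):
    # Hypothetical stand-in for a slow web lookup.
    time.sleep(0.3)
    return [f"https://example.com/?q={text}"]


def file_search(text):
    # Hypothetical stand-in for a slow local-disk search.
    time.sleep(0.3)
    return [f"/tmp/{text}.txt"]


def search_all(text):
    with mp.Pool(processes=2) as pool:
        # apply_async does not block: both searches start right away.
        pending = [pool.apply_async(proc, (text,)) for proc in (web_search, file_search)]
        # ... the caller is free to do other work here ...
        results = []
        for async_result in pending:
            results.extend(async_result.get(timeout=30))  # collect from the "tray"
    return results


if __name__ == "__main__":
    print(search_all("report"))
```

Results are read before leaving the `with` block because `Pool.__exit__` terminates the workers.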
I'm trying to use multiprocessing in Python to have a function keep getting called within a loop, and subsequently access the latest return value from the function (by storing the values in a LIFO Queue).
Here is a code snippet from the main program
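import multiprocessing
import Queue  # Python 2 module name; it is queue in Python 3

q = Queue.LifoQueue()
while True:
    p = multiprocessing.Process(target=myFunc, args=(q,))
    p.daemon = True
    p.start()
    if not q.empty():
        latest = q.get()  # do something with the latest value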
And here's a code snippet from myFunc
def myFunc(q):
    x = calc()
    q.put(x)
The problem is, the main loop thinks that q is empty. However, I've checked to see if myFunc() is placing values into q (by putting a q.empty() check right after the q.put(x)) and the queue shouldn't be empty.
What can I do so that the main loop can see the values placed in the queue? Or am I going about this in an inefficient way? (I do need myFunc and the main loop to be run separately though, since myFunc is a bit slow and the main loop needs to keep performing its task)
Queue.LifoQueue is not fit for multiprocessing; only multiprocessing.Queue is, as it is specially designed for this use case. That means that values put into a Queue.LifoQueue will only be available to the local process, as the queue is not shared between subprocesses.
A possibility would be to use a shared list from a SyncManager (SyncManager.list()) instead. When used with only append and pop, a list behaves just like a LIFO queue.
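A minimal sketch of the shared-list idea above, assuming a hypothetical producer `my_func` (standing in for the question's `myFunc`) that appends a new value every 50 ms. `multiprocessing.Manager()` returns a started SyncManager, and its `list()` proxy is shared between processes; calling `pop()` with no index returns the newest item, which gives the LIFO behaviour. `latest_value` and `poll_latest` are illustrative names.

```python
import multiprocessing as mp
import time


def my_func(shared, n):
    # Hypothetical stand-in for the slow calc(): publish each value as produced.
    for i in range(n):
        time.sleep(0.05)
        shared.append(i)


def latest_value(shared):
    # pop() removes and returns the newest item, like a LIFO queue.
    # Only the main process pops, so the length check cannot go stale.
    return shared.pop() if len(shared) > 0 else None


def poll_latest(n=5):
    seen = []
    with mp.Manager() as manager:  # Manager() starts a SyncManager
        shared = manager.list()
        p = mp.Process(target=my_func, args=(shared, n))
        p.start()
        while p.is_alive() or len(shared) > 0:
            value = latest_value(shared)
            if value is not None:
                seen.append(value)
            time.sleep(0.01)  # stand-in for the main loop's own task
        p.join()
    return seen


if __name__ == "__main__":
    print(poll_latest())
```

Unlike the question's loop, this starts the producer once and lets it keep publishing, rather than starting a new process on every iteration.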
I am using the multiprocessing module in python. Here is a sample of the code I am using:
import multiprocessing as mp

def function(fun_var1, fun_var2):
    b = fun_var1 + fun_var2
    # and more computationally intensive stuff happens here
    return b
    # my program freezes after the return command

class Worker(mp.Process):
    def __init__(self, queue_obj, func_var1, func_var2):
        mp.Process.__init__(self)
        self.queue_obj = queue_obj
        self.func_var1 = func_var1
        self.func_var2 = func_var2

    def run(self):
        self.var = function( self.func_var1, self.func_var2 )
        self.queue_obj.put(self.var)

if __name__ == '__main__':
    mp.freeze_support()
    queue_list = []
    processes = []
    result = []
    for i in range(2):
        queue_list.append(mp.Queue())
        processes.append(Worker(queue_list[i], var1, var2))
        processes[i].start()
    for i in range(2):
        processes[i].join()
        result.append(queue_list[i].get())
During runtime of the program two instances of the worker class are generated which work simultaneously. One instance finishes after about 2 minutes and the other would take about 7 minutes. The first instance returns its results fine. However, the second instance freezes the program when the function() that is called within the run() method returns its value. No error is being thrown, the program just does not continue to execute. The console also indicates that it is busy but not displaying the >>> prompt. I am completely clueless why this behavior occurs. The same code works fine for slightly different inputs in the two Worker instances. The only difference I can make out is that the work loads are more equal when it executes correctly. Could the time difference cause trouble? Does anyone have experience with this kind of behavior? Also note that if I run a serial setup of the program in which function() is just called twice by the main program, the code executes flawlessly. Could there be some timeout involved in the worker instance that makes it impossible for function() to return its value to the Worker instance? The return value of function() is actually a list that is fairly small. It contains about 100 float values.
Any suggestions are welcomed!
This is a bit of an educated guess without actually seeing what's going on in worker, but is it possible that your child has put items into the Queue that haven't been consumed? The documentation has a warning about this:
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
It might be worth trying to create your Queue object with mp.Manager().Queue() and seeing whether the issue goes away.
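Besides switching to a manager queue, the warning above suggests a second fix: consume each queue before joining its process. A minimal sketch of that ordering, assuming a hypothetical `function` that returns about 100 floats like the question's, and using a plain target function in place of the `Worker` subclass; `run_workers` is an illustrative name.

```python
import multiprocessing as mp


def function(var1, var2):
    # Hypothetical stand-in for the long computation: ~100 floats.
    return [float(var1 + var2 + k) for k in range(100)]


def worker(queue_obj, var1, var2):
    queue_obj.put(function(var1, var2))


def run_workers(inputs):
    queues = [mp.Queue() for _ in inputs]
    processes = [mp.Process(target=worker, args=(q, a, b))
                 for q, (a, b) in zip(queues, inputs)]
    for p in processes:
        p.start()
    # Consume every queue BEFORE joining: a child with buffered items that
    # have not been flushed to the pipe cannot exit, so join-then-get can deadlock.
    results = [q.get() for q in queues]
    for p in processes:
        p.join()
    return results


if __name__ == "__main__":
    print([r[:3] for r in run_workers([(1, 2), (3, 4)])])
```

With `mp.Manager().Queue()` instead of `mp.Queue()`, the documentation says the join ordering no longer matters, at the cost of routing every item through the manager process.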
I am using multiprocessing's Process and Queue.
I start several functions in parallel and most behave nicely: they finish, their output goes to their Queue, and they show up as .is_alive() == False. But for some reason a couple of functions are not behaving. They always show .is_alive() == True, even after the last line in the function (a print statement saying "Finished") is complete. This happens regardless of the set of functions I launch, even if there's only one. If not run in parallel, the functions behave fine and return normally. What kind of thing might be the problem?
Here's the generic function I'm using to manage the jobs. All I'm not showing is the functions I'm passing to it. They're long, often use matplotlib, sometimes launch some shell commands, but I cannot figure out what the failing ones have in common.
def runFunctionsInParallel(listOf_FuncAndArgLists):
    """
    Take a list of lists like [function, arg1, arg2, ...]. Run those functions in parallel, wait for them all to finish, and return the list of their return values, in order.
    """
    from multiprocessing import Process, Queue

    def storeOutputFFF(fff,theArgs,que): #add a argument to function for assigning a queue
        print 'MULTIPROCESSING: Launching %s in parallel '%fff.func_name
        que.put(fff(*theArgs)) #we're putting return value into queue
        print 'MULTIPROCESSING: Finished %s in parallel! '%fff.func_name
        # We get this far even for "bad" functions
        return

    queues=[Queue() for fff in listOf_FuncAndArgLists] #create a queue object for each function
    jobs = [Process(target=storeOutputFFF,args=[funcArgs[0],funcArgs[1:],queues[iii]]) for iii,funcArgs in enumerate(listOf_FuncAndArgLists)]
    for job in jobs: job.start() # Launch them all
    import time
    from math import sqrt
    n=1
    while any([jj.is_alive() for jj in jobs]): # debugging section shows progress updates
        n+=1
        time.sleep(5+sqrt(n)) # Wait a while before next update. Slow down updates for really long runs.
        print('\n---------------------------------------------------\n'+ '\t'.join(['alive?','Job','exitcode','Func',])+ '\n---------------------------------------------------')
        print('\n'.join(['%s:\t%s:\t%s:\t%s'%(job.is_alive()*'Yes',job.name,job.exitcode,listOf_FuncAndArgLists[ii][0].func_name) for ii,job in enumerate(jobs)]))
        print('---------------------------------------------------\n')
    # I never get to the following line when one of the "bad" functions is running.
    for job in jobs: job.join() # Wait for them all to finish... Hm, Is this needed to get at the Queues?
    # And now, collect all the outputs:
    return([queue.get() for queue in queues])
Alright, it seems that the pipe used to fill the Queue gets plugged when the output of a function is too big (that is my crude understanding; is this the unresolved/closed bug http://bugs.python.org/issue8237?). I have modified the code in my question so that there is some buffering (queues are regularly emptied while processes are running), which solves all my problems. So now this takes a collection of tasks (functions and their arguments), launches them, and collects the outputs. I wish it were simpler and cleaner looking.
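The modified code itself is not reproduced here, so the following is only a minimal Python 3 sketch of the buffering idea: keep pulling from each queue with a short timeout while the jobs run, and join only once every result has arrived. `big_output`, `store_output`, `run_functions_in_parallel` and `poll_seconds` are hypothetical names; the 200,000-character result is chosen to exceed a typical pipe buffer.

```python
import multiprocessing as mp
import queue  # multiprocessing.Queue raises queue.Empty


def big_output(n):
    # Hypothetical job whose result is larger than a typical pipe buffer.
    return "x" * n


def store_output(func, args, que):
    que.put(func(*args))


def run_functions_in_parallel(func_and_args, poll_seconds=0.1):
    """Run [func, arg1, ...] jobs in parallel, draining queues while they run."""
    queues = [mp.Queue() for _ in func_and_args]
    jobs = [mp.Process(target=store_output, args=(fa[0], fa[1:], q))
            for fa, q in zip(func_and_args, queues)]
    for job in jobs:
        job.start()
    outputs = [None] * len(jobs)
    received = [False] * len(jobs)
    while not all(received):
        for i, q in enumerate(queues):
            if received[i]:
                continue
            try:
                outputs[i] = q.get(timeout=poll_seconds)  # empties the pipe
                received[i] = True
            except queue.Empty:
                # A nonzero exit code means the job died before putting a result.
                if jobs[i].exitcode not in (None, 0):
                    received[i] = True
    for job in jobs:
        job.join()  # safe now: nothing is left buffered in the queues
    return outputs


if __name__ == "__main__":
    out = run_functions_in_parallel([[big_output, 200_000], [big_output, 10]])
    print([len(o) for o in out])
```

Joining before reading, as in the original function, is exactly the pattern the documentation warns can deadlock once a result no longer fits in the pipe.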
Edit (2014 Sep; update 2017 Nov: rewritten for readability): I'm updating the code with the enhancements I've made since. The new code (same function, but better features) is here:
https://gitlab.com/cpbl/cpblUtilities/blob/master/parallel.py
The calling description is also below.
def runFunctionsInParallel(*args, **kwargs):
    """ This is the main/only interface to class cRunFunctionsInParallel. See its documentation for arguments.
    """
    return cRunFunctionsInParallel(*args, **kwargs).launch_jobs()

###########################################################################################
###
class cRunFunctionsInParallel():
    ###
    #######################################################################################
    """Run any list of functions, each with any arguments and keyword-arguments, in parallel.
    The functions/jobs should return (if anything) pickleable results. In order to avoid processes getting stuck due to the output queues overflowing, the queues are regularly collected and emptied.
    You can now pass os.system, etc. to this as the function, in order to parallelize at the OS level, with no need for a wrapper: I made use of hasattr(builtinfunction,'func_name') to check for a name.

    Parameters
    ----------
    listOf_FuncAndArgLists : a list of lists
        List of up-to-three-element-lists, like [function, args, kwargs],
        specifying the set of functions to be launched in parallel. If an
        element is just a function, rather than a list, then it is assumed
        to have no arguments or keyword arguments. Thus, possible formats
        for elements of the outer list are:
            function
            [function, list]
            [function, list, dict]

    kwargs: dict
        One can also supply the kwargs once, for all jobs (or for those
        without their own non-empty kwargs specified in the list)

    names: an optional list of names to identify the processes.
        If omitted, the function name is used, so if all the functions are
        the same (i.e. merely with different arguments), then they would be
        named indistinguishably.

    offsetsSeconds: int or list of ints
        delay some functions' start times

    expectNonzeroExit: True/False
        Normal behaviour is to not proceed if any function exits with a
        failed exit code. This can be used to override this behaviour.

    parallel: True/False
        Whenever the list of functions is longer than one, functions will
        be run in parallel unless this parameter is passed as False

    maxAtOnce: int
        If nonzero, this limits how many jobs will be allowed to run at
        once. By default, this is set according to how many processors
        the hardware has available.

    showFinished : int
        Specifies the maximum number of successfully finished jobs to show
        in the text interface (before the last report, which should always
        show them all).

    Returns
    -------
    Returns a tuple of (return codes, return values), each a list in order of the jobs provided.

    Issues
    -------
    Only tested on POSIX OSes.

    Examples
    --------
    See the testParallel() method in this module
    """