I have a class A with a function foo() that logs information indefinitely.
I would like to execute this function for 30 seconds and retrieve those logs. To capture the logs, which are produced at the C level, I based my approach on this article.
So, in addition to the code from that article, I wrote the following snippet, which stops the execution of the function after 30 seconds.
if __name__ == '__main__':
    f = io.BytesIO()
    with stdout_redirector(f):
        p = multiprocessing.Process(target=A.foo, name="myfunc")
        p.start()
        # Cleanup
        p.join(30)
        if p.is_alive():
            # Terminate foo
            p.terminate()
            p.join()
    data = f.getvalue().decode('utf-8')
This works fine as is.
However, I can't get this portion of code to work inside a FastAPI endpoint: no matter what I try, errors around multiprocessing appear. Either the endpoint returns nothing, or a pickle error is raised... I don't know what to do!
Here I use multiprocessing only to stop foo() after a while; maybe there is another way that avoids the problems with FastAPI. Does anyone have a way to fix my problem?
EDIT #1
Based on Brandt's suggestion, I wrote the following function (I'm on Windows, so I can't use signals):
@timeout_decorator.timeout(30, use_signals=False)
def run_func(func):
    f = io.BytesIO()
    with stdout_redirector(f):
        func()
    return f.getvalue().decode('utf-8')
And the following endpoint:
@app.get('/foo')
def get_foo():
    data = run_func(A.foo)
    return {'data': data}
but EOFError: Ran out of input is raised by the timeout_decorator module.
You can use the 'timeout_decorator' package:
https://pypi.org/project/timeout-decorator/
Back in the day, it gave me the solution to a similar issue; I was/am not using FastAPI, but it was pretty much the same situation (AFAIU).
Basically, you just decorate the function you want to stop if it exceeds some timeout of T seconds. Here is the code where I used it:
https://github.com/chbrandt/eada/blob/master/eada/vo/conesearch.py#L57
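For reference, here is a minimal sketch of how the decorator is typically applied; slow_function is just a hypothetical stand-in for the long-running call, and with use_signals=False the package falls back to multiprocessing, so the decorated function has to be picklable:

import time
import timeout_decorator

@timeout_decorator.timeout(30, use_signals=False)  # use_signals=False is required on Windows
def slow_function():
    # hypothetical stand-in for a call that may run forever
    time.sleep(60)

if __name__ == '__main__':
    try:
        slow_function()
    except timeout_decorator.TimeoutError:
        print("slow_function exceeded the 30-second timeout")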
Related
I'm developing a standard Python script (no servers, no async, no multiprocessing, ...), i.e. a classic data science program where I load data, process it as dataframes, and so on. Everything is synchronous.
At some point, I need to call a function from an external library that is a complete black box to me (I have no control over it and don't know how it does what it does), like:
def inside_my_function(...):
    # My code
    result = the_function(params)
    # Other code
Now, this the_function sometimes never terminates (I don't know why; probably there are bugs or some conditions that make it get stuck, but it's completely random), and when that happens my program gets stuck as well.
Since I have to use it and it cannot be modified, I would like to know if there is a way to wrap it in another function that calls the_function and waits for some timeout: if the_function returns before the timeout, the result is returned; otherwise the_function is somehow killed, aborted, skipped, whatever, and retried up to n times.
I realise that in order to execute the_function and check for the timeout at the same time, something like multithreading will be needed, but I'm not sure it makes sense or how to implement it correctly without resorting to bad practices.
How would you proceed?
EDIT: I would rather avoid multiprocessing because of its overhead and because I don't want to overcomplicate things with serializability and so on.
Thank you
import time
import random
import threading

def func_that_waits():
    # Watchdog: give the unreliable function at most 3 seconds.
    t1 = time.time()
    while (time.time() - t1) <= 3:
        time.sleep(1)
        if check_unreliable_func.worked:
            break
    if not check_unreliable_func.worked:
        print("unreliable function has been working for too long, it's killed.")

def check_unreliable_func(func):
    # Decorator that records whether func managed to return.
    check_unreliable_func.worked = False
    def inner(*args, **kwargs):
        func(*args, **kwargs)
        check_unreliable_func.worked = True
    return inner

def unreliable_func():
    working_time = random.randint(1, 6)
    time.sleep(working_time)
    print(f"unreliable_func has been working for {working_time} seconds")

to_wait = threading.Thread(target=func_that_waits)
main_func = threading.Thread(target=check_unreliable_func(unreliable_func), daemon=True)

main_func.start()
to_wait.start()
unreliable_func - the function we don't know whether it will return in time.
check_unreliable_func(func) - a decorator whose only purpose is to let the to_wait thread know that unreliable_func has returned, so there is no reason for to_wait to keep waiting.
The main thing to understand is that main_func is a daemon thread: once all non-daemon threads (here, the main thread and to_wait) have finished, the process exits and every daemon thread is terminated automatically, no matter what it was doing at that moment.
Of course this is really far from best practice; I'm just showing how it can be done. How it should be done - I myself would be glad to see that too.
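For comparison, here is a minimal sketch of the same idea packaged as a reusable wrapper with the retry behaviour the question asks for (the name call_with_timeout is mine, not from any library). The caveat is unchanged: a thread cannot be killed, so a hung call keeps running in the background; as a daemon thread it is simply abandoned when the program exits.

import threading

def call_with_timeout(func, args=(), kwargs=None, timeout=30, retries=3):
    kwargs = kwargs or {}
    for attempt in range(1, retries + 1):
        result = []  # the worker appends func's return value here when it finishes
        worker = threading.Thread(
            target=lambda: result.append(func(*args, **kwargs)),
            daemon=True,
        )
        worker.start()
        worker.join(timeout)  # stop waiting after `timeout` seconds
        if result:            # the call finished in time
            return result[0]
        print(f"attempt {attempt} timed out")
    raise TimeoutError(f"all {retries} attempts timed out")

Note that an exception raised inside func is printed by the worker thread and counted as a failed attempt, which may or may not be the behaviour you want.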
Hi, I don't feel like I have quite understood multiprocessing in Python correctly.
I want to run a function called run_worker (which is simply code that runs and manages a subprocess) 20 times in parallel and wait for all of them to complete. Each run_worker should run on a separate core/thread. I don't mind in what order the processes complete, hence I used the async variant, and since I have no return value I used map.
I thought that I should use:
if __name__ == "__main__":
num_workers = 20
param_map = []
for i in range(num_workers):
param_map += [experiment_id]
pool = mp.Pool(processes= num_workers)
pool.map_async(run_worker, param_map)
pool.close()
pool.join()
However, this code exits straight away and doesn't appear to execute run_worker properly. Also, do I really have to create a param_map of the same experiment_id to pass to the workers? It feels like a hack just to control how many run_workers are created. Ideally I would like to run a function with no parameters and no return value over multiple cores.
Note: I am using Windows Server 2019 on AWS.
Edit: added run_worker, which calls a subprocess that writes to a file:
def run_worker(experiment_id):
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment.id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Update the experiment object
        experiment = conn.experiments(experiment_id).fetch()
It seems that for this simple purpose you are better off using pool.map instead of pool.map_async. Both run the work in parallel, but pool.map blocks until all operations are finished (see also this question). pool.map_async is especially meant for situations like this:
result = pool.map_async(func, iterable)
while not result.ready():
    # do some work while map_async is running
    pass
# blocking call to get the result
out = result.get()
Regarding your question about the parameters: the fundamental idea of a map operation is to map the values of one list/array/iterable to a new list of values of the same size. As far as I can see in the docs, multiprocessing does not provide any way to run a function repeatedly without parameters.
If you share your run_worker function as well, that might help you get better answers. It might also clear up why you want to run a function without arguments or return values through a map operation in the first place.
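If you really want to run a no-argument worker a fixed number of times, one sketch (assuming run_worker truly needs no input) is to submit it explicitly with apply_async instead of going through a map operation:

import multiprocessing as mp

def run_worker():
    # hypothetical no-argument worker
    print("working")

if __name__ == "__main__":
    num_workers = 20
    with mp.Pool(processes=num_workers) as pool:
        results = [pool.apply_async(run_worker) for _ in range(num_workers)]
        for r in results:
            r.get()  # blocks until that task is done and re-raises any exception from the worker

Calling get() on each result also surfaces exceptions raised inside the workers, which often explains why a pool seems to exit straight away without doing anything.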
There have been some questions discussing this, but none have the set of constraints that I have, so maybe someone will come up with a good idea.
Basically I need to set up a timeout for a Python function under the following constraints:
Cross-platform (i.e. no signal.ALARM)
Not Python 3 (I can assume Python >= 2.7.9)
Only the function needs to be timed out; I can't just exit the entire program.
I have absolutely no control over the called function, i.e. it's a callback using an abstract interface (using derived classes and overrides). Other people will be writing these callback functions and the assumption is that they're idiots.
Example code:
class AbstractInterface(object):
    def Callback(self):
        # This will be overridden by derived classes.
        # Assume the implementation cannot be controlled or modified.
        pass

...

def RunCallbacks(listofcallbacks):
    # This is the function I can control and modify
    for cb in listofcallbacks:
        # The following call should not be allowed to execute
        # for more than X seconds. If it does, the callback should
        # be terminated but not the entire iteration.
        cb.Callback()
Any ideas will be greatly appreciated.
Other people will be writing these callback functions and the assumption is that they're idiots.
You really shouldn't execute code from people you consider 'idiots'.
However, I came up with one possibility, shown below (only tested in Python 3, but it should work in Python 2 with minor modifications).
Warning: this runs every callback in a new process, which is terminated after the specified timeout.
from multiprocessing import Process
import time

def callback(i):
    while True:
        print("I'm process {}.".format(i))
        time.sleep(1)

if __name__ == '__main__':
    for i in range(1, 11):
        p = Process(target=callback, args=(i,))
        p.start()
        time.sleep(2)  # Timeout
        p.terminate()
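Applied to RunCallbacks from the question, a sketch along the same lines could look like this (the timeout parameter stands in for the X seconds mentioned in the comments; each Callback runs in its own process, so it must be picklable and cannot share state with the parent):

from multiprocessing import Process

def RunCallbacks(listofcallbacks, timeout=10):
    for cb in listofcallbacks:
        p = Process(target=cb.Callback)
        p.start()
        p.join(timeout)   # wait at most `timeout` seconds for this callback
        if p.is_alive():  # still running: terminate it and move on to the next one
            p.terminate()
            p.join()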
I am trying to execute a time-consuming back-end job triggered by a front-end call. This back-end job should execute a callback method when it completes, which will release a semaphore. The front end shouldn't have to wait for the long process to finish in order to get a response from the call that kicks off the job.
I'm trying to use the Pool class from the multiprocessing library to solve this, but I'm running into some issues. Namely, it seems like the only way to actually execute the function passed into apply_async is to call the .get() method on the ApplyResult object that apply_async returns.
In order to solve this, I thought of creating a Process object with the target being apply_result.get. But this doesn't seem to work.
Is there a basic understanding that I'm missing here? What would you folks suggest to solve this issue?
Here is a snippet example of what I have right now:
p = Pool(1)
result = p.apply_async(long_process, args=(config, requester), callback=complete_long_process)
Process(target=result.get).start()
response = {'status': 'success', 'message': 'Job started for {0}'.format(requester)}
return jsonify(response)
Thanks for the help in advance!
I don't quite understand why you would need a Process object here. Look at this snippet:
#!/usr/bin/python
from multiprocessing import Pool
from multiprocessing.managers import BaseManager
from itertools import repeat
from time import sleep

def complete_long_process(foo):
    print "completed", foo

def long_process(a, b):
    print a, b
    sleep(10)

p = Pool(1)
result = p.apply_async(long_process, args=(1, 42),
                       callback=complete_long_process)
print "submitted"
sleep(20)
If I understand what you are trying to achieve, this does exactly that. As soon as you call apply_async, it launches the long_process function and execution of the main program continues. As soon as it completes, complete_long_process is called. There is no need to use the get method to execute long_process, and the code does not block or wait for anything.
If your long_process does not appear to run, I assume the problem is somewhere inside long_process.
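Adapted to the web context of the question, a minimal sketch could look like the following (Flask is assumed from the jsonify call; long_process and complete_long_process are stand-ins, and the pool is created at module level so it outlives each request):

from multiprocessing import Pool
from time import sleep
from flask import Flask, jsonify

app = Flask(__name__)
pool = Pool(1)  # created once, so it survives across requests

def complete_long_process(result):
    print("completed", result)  # release your semaphore here

def long_process(config, requester):
    sleep(10)  # stand-in for the time-consuming job
    return requester

@app.route('/start/<requester>')
def start_job(requester):
    pool.apply_async(long_process, args=({}, requester), callback=complete_long_process)
    return jsonify({'status': 'success', 'message': 'Job started for {0}'.format(requester)})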
Hannu
I am designing a Python app that calls a C++ DLL; I have posted the interaction between my DLL and Python 3.4 here. Now I need to do some streaming processing involving a threading-based model, and my callback function appears to queue up all the prints: only when my streaming has ended is all the info printed.
def callbackU(OutList, ConList, nB):
    for i in range(nB):
        out_list_item = cast(OutList[i], c_char_p).value
        print("{}\t{}".format(ConList[i], out_list_item))
    return 0
I have tried the following approaches, but they all behave the same way:
from threading import Lock

print_lock = Lock()

def save_print(*args, **kwargs):
    with print_lock:
        print(*args, **kwargs)

def callbackU(OutList, ConList, nB):
    for i in range(nB):
        out_list_item = cast(OutList[i], c_char_p).value
        save_print(out_list_item)
    return 0
and:
import sys

def callbackU(OutList, ConList, nB):
    for i in range(nB):
        a = cast(OutList[i], c_char_p).value
        sys.stdout.write(a)
        sys.stdout.flush()
    return 0
I would like my callback to print its message when it is called, not when the whole process ends.
I found what the problem was. I am using a thread-based process that needs to stay alive for an indefinite time before being ended. In C++ I was using getchar() to wait until the process had to be ended; when I pressed the Enter key, the process jumped to the releasing part. I also tried sleep()s of 0.5 s in a while loop until a definite time had passed, to test whether that could help, but it didn't. Both methods behaved the same way in my Python application: the values that I needed to receive as a stream were put in a queue first, and they were not printed until the process ended.
The solution was to make two functions: the first one initializes the thread-based model, and the second one ends the process. By doing so I needed neither a getchar() nor a sleep(). This works pretty well for me, thanks for your attention!