I am trying to execute a time-consuming back-end job, triggered by a front-end call. This back-end job should execute a callback method when it completes, which will release a semaphore. The front end shouldn't have to wait for the long process to finish in order to get a response from the call that kicks off the job.
I'm trying to use the Pool class from the multiprocessing library to solve this, but I'm running into some issues. Namely, it seems the only way to actually execute the method passed to apply_async is to call the .get() method on the ApplyResult object returned by the apply_async call.
To work around this, I thought I'd create a Process object with the target set to apply_result.get, but this doesn't seem to work.
Is there a basic understanding that I'm missing here? What would you folks suggest to solve this issue?
Here is a snippet example of what I have right now:
p = Pool(1)
result = p.apply_async(long_process, args=(config, requester), callback=complete_long_process)
Process(target=result.get).start()
response = {'status': 'success', 'message': 'Job started for {0}'.format(requester)}
return jsonify(response)
Thanks for the help in advance!
I don't quite understand why you would need a Process object here. Look at this snippet:
#!/usr/bin/python
from multiprocessing import Pool
from time import sleep

def complete_long_process(foo):
    print("completed", foo)

def long_process(a, b):
    print(a, b)
    sleep(10)

p = Pool(1)
result = p.apply_async(long_process, args=(1, 42),
                       callback=complete_long_process)
print("submitted")
sleep(20)
If I understand what you are trying to achieve, this does exactly that. As soon as you call apply_async, it launches the long_process function and execution of the main program continues. As soon as it completes, complete_long_process is called. There is no need to use the get method to execute long_process, and the code does not block and wait for anything.
If your long_process does not appear to run, I assume the problem is somewhere within long_process itself.
Hannu
I'm developing a standard python script file (no servers, no async, no multiprocessing, ...) i.e. a classic data science program where I load data, process it as dataframes, and so on. Everything is synchronous.
At some point, I need to call a function from an external library which is totally opaque to me (I have no control over it, and I don't know how it does what it does), like
def inside_my_function(...):
    # My code
    result = the_function(params)
    # Other code
Now, this the_function sometimes never terminates (I don't know why; probably there are bugs or some condition that makes it get stuck, but it's completely random), and when that happens my program gets stuck as well.
Since I have to use it and it cannot be modified, I would like to know if there is a way, for example, to wrap it in another function which calls the_function and waits for some timeout; if the_function returns before the timeout, the result is returned, otherwise the_function is somehow killed, aborted, skipped, whatever, and retried up to n times.
I realise that in order to execute the_function and check for the timeout at the same time, some form of multithreading will be needed, but I'm not sure it makes sense or how to implement it correctly without resorting to bad practices.
How would you proceed?
EDIT: I would avoid multiprocessing because of the significant overhead, and because I don't want to overcomplicate things with serializability and so on.
Thank you
import time
import random
import threading

def func_that_waits():
    t1 = time.time()
    while (time.time() - t1) <= 3:
        time.sleep(1)
        if check_unreliable_func.worked:
            break
    if not check_unreliable_func.worked:
        print("unreliable function has been working for too long, it's killed.")

def check_unreliable_func(func):
    check_unreliable_func.worked = False
    def inner(*args, **kwargs):
        func(*args, **kwargs)
        check_unreliable_func.worked = True
    return inner

def unreliable_func():
    working_time = random.randint(1, 6)
    time.sleep(working_time)
    print(f"unreliable_func has been working for {working_time} seconds")

to_wait = threading.Thread(target=func_that_waits)
main_func = threading.Thread(target=check_unreliable_func(unreliable_func), daemon=True)

main_func.start()
to_wait.start()
unreliable_func - the function we do not know whether it will finish.
check_unreliable_func(func) - a decorator whose only purpose is to let the to_wait thread know that unreliable_func has returned, so there is no point in to_wait running any further.
The main thing to understand is that main_func is a daemon thread: once to_wait (the last non-daemon thread) terminates, the program exits, and all daemon threads are terminated automatically no matter what they are doing at that moment.
Of course it's really far from best practice; I'm just showing how it can be done. As for how it should be done - I myself would be glad to see that too.
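For comparison, here is a sketch of the same timeout-and-retry wrapper built on the standard library's concurrent.futures (call_with_timeout is a made-up name; the same caveat applies: Python cannot forcibly kill a thread, so a truly stuck call keeps running in its abandoned thread and can delay interpreter exit):

```python
import concurrent.futures

def call_with_timeout(func, *args, timeout=3.0, retries=2):
    # Try up to retries + 1 times; a call that exceeds `timeout` seconds
    # is abandoned (not killed) and retried.
    for attempt in range(retries + 1):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            future = pool.submit(func, *args)
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            pass  # timed out; fall through and retry
        finally:
            pool.shutdown(wait=False)  # do not block on a stuck worker
    raise TimeoutError(f"gave up after {retries + 1} attempts")
```

Usage would be e.g. call_with_timeout(the_function, params, timeout=3, retries=2), returning the result on success and raising TimeoutError if every attempt got stuck.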
I have a class A with a function foo() that logs information for an infinite time.
I would like to execute this function for 30 sec, retrieving these logs. To recover the logs, which are produced at the C level, I based my approach on this article.
So, in addition to the code from that article, I wrote this portion of code, which stops the execution of the function after 30 seconds.
if __name__ == '__main__':
    f = io.BytesIO()
    with stdout_redirector(f):
        p = multiprocessing.Process(target=A.foo, name="myfunc")
        p.start()

        # Cleanup
        p.join(30)
        if p.is_alive():
            # Terminate foo
            p.terminate()
            p.join()
    data = f.getvalue().decode('utf-8')
This works fine as is.
However, I can't get this portion of code into a FastAPI endpoint. No matter what I try, errors around the multiprocessing appear: either the endpoint returns nothing, or a pickle error appears... I don't know what to do!
Here I use multiprocessing only to stop foo() after a while; maybe there is another way that avoids the problems with FastAPI. Does anyone have a way to fix my problem?
EDIT #1
Based on Brandt's suggestion, I wrote the following function (I'm on Windows, so I can't use signals):

@timeout_decorator.timeout(30, use_signals=False)
def run_func(func):
    f = io.BytesIO()
    with stdout_redirector(f):
        func()
    return f.getvalue().decode('utf-8')
And the following endpoint:

@app.get('/foo')
def get_foo():
    data = run_func(A.foo)
    return {'data': data}

but the EOFError: Ran out of input is raised by the timeout_decorator module.
You can use the 'timeout_decorator' package:
https://pypi.org/project/timeout-decorator/
Back in the day, it provided the solution for a similar issue of mine; I was/am not using FastAPI, but it's pretty much the same thing (AFAIU).
Basically, you just decorate the function you want to stop once it surpasses some "T-seconds" timeout. Here is the code where I used it:
https://github.com/chbrandt/eada/blob/master/eada/vo/conesearch.py#L57
I know there is something called a thread, but I am confused by all the complex information out there on Google. myFunc() takes a little time (it is not computationally expensive - say it plays a short mp3 file).
What I want to do is call myFunc() and not have to wait for it to return before running the following lines of code. Furthermore, I don't need to keep anything related to myFunc(arg); I only need it to be executed.
while(True):
    ......
    myFunc(arg)
    ###some
    ###lines
    ###of
    ###code
Sorry for my bad English. Cheers!
from threading import Thread

def myFunc(arg):
    pass  # run code here

while(True):
    thread = Thread(target=myFunc, args=(arg,))
    thread.start()  # starts the thread and executes the function
    ###some
    ###lines
    ###of
    ###code
    thread.join()  # wait for myFunc to finish
You can do the same with processes instead of threads.
You might want to take a look at pools if you want to apply the same function to a list of arguments. You can call imap, iterate over the results, and run the rest of the code as each result arrives.
I have a function get_data(request) that requests some data from a server. Every time this function is called, it requests the data from a different server. All of them should return the same response.
I would like to get the response as soon as possible, so I need to create a function that calls get_data several times and returns the first response it gets.
EDIT:
I came up with the idea of using multiprocessing.Pipe(), but I have the feeling this is a very bad way to solve it. What do you think?:

def get_data(request, pipe):
    data = ...  # makes the request to a server; this can take a random amount of time
    pipe.send(data)

def multiple_requests(request, num_servers):
    my_pipe, his_pipe = multiprocessing.Pipe()
    for i in range(num_servers):
        Thread(target=get_data, args=(request, his_pipe)).start()
    return my_pipe.recv()

multiple_requests("the_request_string", 6)
I think this is a bad way of doing it because I am passing the same pipe to all threads, and I don't really know, but I guess that has to be very unsafe.
I think Redis RQ would be a good fit for this: get_data is a job that you put in the queue six times. Jobs execute asynchronously, and the docs also explain how to work with the results.
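If you want to stay in-process, the same first-response-wins idea can be sketched with queue.Queue, which is thread-safe by design, instead of a shared Pipe (get_data here is a stub standing in for the real server request):

```python
import threading
import queue
import random
import time

def get_data(request, server, results):
    # stub for the real request; takes a random amount of time
    time.sleep(random.uniform(0.01, 0.1))
    results.put(f"response to {request} from server {server}")

def multiple_requests(request, num_servers):
    results = queue.Queue()  # safe to share across threads
    for i in range(num_servers):
        threading.Thread(target=get_data,
                         args=(request, i, results),
                         daemon=True).start()
    return results.get()  # blocks only until the first response arrives

print(multiple_requests("the_request_string", 6))
```

The slower threads finish in the background and their responses are simply never read.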
I'm trying to use a new thread or multiprocessing to run a function.
The function is called like this:
Actualize_files_and_folders(self)
I've read a lot about multiprocessing and threads and looked through the questions on Stack Overflow, but I can't make it work... It would be great to have some help >.<
I'm calling the function with a button:
def on_button_act_clicked(self, menuitem, data=None):
    self.window_waiting.show()
    Actualize_files_and_folders(self)
    self.window_waiting.hide()
In window_waiting I have a button called 'cancel'; it would be great to have a command/function that kills the thread.
I've tried a lot of things, for example:
self.window_waiting.show()
from multiprocessing import Process
a = Process(target=Actualize_files_and_folders(self))
a.start()
a.join()
self.window_waiting.hide()
But the window still freezes, and window_waiting only appears after Actualize_files_and_folders(self) has finished, as if I had called a normal function.
Thanks so much for any help!!
It looks like the worker function is being called rather than passed as the target of the process:
process = Process(target=actualize_files_and_folders(self))
This is essentially equivalent to:
tmp = actualize_files_and_folders(self)
process = Process(target=tmp)
So the worker function is called in the main thread, blocking it. The result of that call is then passed to the Process as the target, which does nothing if it is None. You need to pass the function itself as the target, not the result of calling it:
process = Process(target=actualize_files_and_folders, args=[self])
process.start()
See: https://docs.python.org/2/library/multiprocessing.html