Python 3.7+: Wait until result is produced - api/task system - python

In the following code, an API gives a task to a task broker, who puts it in a queue, where it is picked up by a worker. The worker will then execute the task and notify the task broker (using a redis message channel) that he is done, after which the task broker will remove it from its queue. This works.
What I'd like is that the task broker is then able to return the result of the task to the API. But I'm unsure on how to do so since it is asynchronous code and I'm having difficulty figuring it out. Can you help?
Simplified the code is roughly as follows, but incomplete.
The API code:
#router.post('', response_model=BaseDocument)
async def post_document(document: BaseDocument):
"""Create the document with a specific type and an optional name given in the payload"""
task = DocumentTask({ <SNIP>
})
task_broker.give_task(task)
result = await task_broker.get_task_result(task)
return result
The task broker code, first part is giving the task, the second part is removing the task and the final part is what I assume should be a blocking call on the status of the removed task
def give_task(self, task_obj):
self.add_task_to_queue(task_obj)
<SNIP>
self.message_channel.publish(task_obj)
# ...
def remove_task_from_queue(self, task):
id_task_to_remove = task.id
for i in range(len(task_queue)):
if task_queue[i]["id"] == id_task_to_remove:
removed_task = task_queue.pop(i)
logger.debug(
f"[TaskBroker] Task with id '{id_task_to_remove}' succesfully removed !"
)
removed_task["status"] = "DONE"
return
# ...
async def get_task_result(self, task):
return task.result
My intuition would like to implement a way in get_task_result that blocks on task.result until it is modified, where I would modify it in remove_task_from_queue when it is removed from the queue (and thus done).
Any idea in how to do this, asynchronously?

Related

ThreadPoolExecutor communication with separate asyncio loop

I have a task that is IO bound running in a loop. This task does a lot of work and is often times hogging the loop (Is that the right word for it?). My plan is to run it in a separate process or thread using run_in_executor with ProcessPoolExecutor or ThreadPoolExecutor to run it separately and allow the main loop to do its work. Currently for communication between tasks I use asyncio.PriorityQueue() and asyncio.Event() for communication and would like to reuse these, or something with the same interface, if possible.
Current code:
# Getter for events and queues so communication can happen
send, receive, send_event, receive_event = await process_obj.get_queues()
# Creates task based off the process object
future = asyncio.create_task(process_obj.main())
Current process code:
async def main():
while True:
#does things that hogs loop
What I want to do:
# Getter for events and queues so communication can happen
send, receive, send_event, receive_event = await process_obj.get_queues()
# I assume I could use Thread or Process executors
pool = concurrent.futures.ThreadPoolExecutor()
result = await loop.run_in_executor(pool, process_obj.run())
New process code:
def run():
asyncio.create_task(main())
async def main():
while True:
#does things that hogs loop
How do I communicate between this new thread and the original loop like I could originally?
There is not much I could reproduce your code. So please consider this code from YouTube Downloader as example and I hope that will help you to understand how to get result from thread function:
example code:
def on_download(self, is_mp3: bool, is_mp4: bool, url: str) -> None:
if is_mp3 == False and is_mp4 == False:
self.ids.info_lbl.text = 'Please select a type of file to download.'
else:
self.ids.info_lbl.text = 'Downloading...'
self.is_mp3 = is_mp3
self.is_mp4 = is_mp4
self.url = url
Clock.schedule_once(self.schedule_download, 2)
Clock.schedule_interval(self.start_progress_bar, 0.1)
def schedule_download(self, dt: float) -> None:
'''
Callback method for the download.
'''
pool = ThreadPool(processes=1)
_downloader = Downloader(self.d_path)
self.async_result = pool.apply_async(_downloader.download,
(self.is_mp3, self.is_mp4, self.url))
Clock.schedule_interval(self.check_process, 0.1)
def check_process(self, dt: float) -> None:
'''
Check if download is complete.
'''
if self.async_result.ready():
resp = self.async_result.get()
if resp[0] == 'Error. Download failed.':
self.ids.info_lbl.text = resp[0]
# progress bar gray if error
self.stop_progress_bar(value=0)
else:
# progress bar blue if success
self.stop_progress_bar(value=100)
self.ids.file_name.text = resp[0]
self.ids.info_lbl.text = 'Finished downloading.'
self.ids.url_input.text = ''
Clock.unschedule(self.check_process)
Personally I prefer from multiprocessing.pool import ThreadPool and now it looks like your code 'hogs up' because you are awaiting for result. So obviously until there is result program will wait (and that may be long). If you look in my example code:
on_download will schedule and event schedule download and this one will schedule another event check process. I can't tell if you app is GUI app or terminal as there is pretty much no code in your question but what you have to do, in your loop you have to schedule an event of check process.
If you look on my check process: if self.async_result.ready(): that will only return when my result is ready.
Now you are waiting for the result, here everything is happening in the background and every now and then the main loop will check for the result (it won't hog up as if there is no result the main loop will carry on doing what it have to rather than wait for it).
So basically you have to schedule some events (especially the one for the result) in your loop rather than going line by line and waiting for one. Does that make sense and does my example code is helpful? Sorry I am really bad at explaining what is in my head ;)
-> mainloop
-> new Thread if there is any
-> check for result if there is any Threads
-> if there is a result
-> do something
-> mainloop keeps running
-> back to top
When you execute the while True in your main coroutine, it doesn't hog the loop but blocks the loop not accepting the rest task to do their jobs. Running a process in your event-based application is not the best solution as the processes are not much friendly in data sharing.
It is possible to do all concurrently without using parallelism. All you need is to execute a await asyncio.sleep(0) at the end of while True. It yields back to the loop and allows the rest tasks to be executed. So we do not exit from the coroutine.
In the following example, I have a listener that uses while True and handles the data added by emitter to the queue.
import asyncio
from queue import Empty
from queue import Queue
from random import choice
queue = Queue()
async def listener():
while True:
try:
# data polling from the queue
data = queue.get_nowait()
print(data) # {"type": "event", "data": {...}}
except (Empty, Exception):
pass
finally:
# the magic action
await asyncio.sleep(0)
async def emitter():
# add a data to the queue
queue.put({"type": "event", "data": {...}})
async def main():
# first create a task for listener
running_loop = asyncio.get_running_loop()
running_loop.create_task(listener())
for _ in range(5):
# create tasks for emitter with random intervals to
# demonstrate that the listener is still running in
# the loop and handling the data put into the queue
running_loop.create_task(emitter())
await asyncio.sleep(choice(range(2)))
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Celery wait for a task from another request

Let's say there is a long task that takes 1 minute. When a user makes a request /get-info and waiting for the response it should return a result. I'm using delay(), wait() and everything works. Now I want if another 5 users make same request /get-info I want them 'connect' to the same task and get result once the task is finished. I'm trying to save task id in redis. But so far I'm having 2 problems.
If I use AsyncResult() and wait() the second request hangs.
If I use AsyncResult() and state, the first request hangs. How can I implement that?
#main.route('/get-info', methods=['POST'])
def get_info():
if redis.exists('getInfoTaskId'):
taks_id = redis.get('getInfoTaskId')
task = add_together.AsyncResult(taks_id)
result = task.wait()
# result = task.state - if uncomment and comment the line above the first req hangs
else:
task = add_together.delay(23, 42)
redis.set('getInfoTaskId', task.id, ex=600)
result = task.wait()
redis.delete('getInfoTaskId')
return f"task result is {result}"

Celery AsyncResult - Not working

I am new to celery but failing at what should be simple:
Backend and broker are both configured for RabbitMQ
Task as follows:
#app.task
def add(x, y):
return x + y
Test Code:
File 1:
from tasks import add
from celery import uuid
task_id = uuid()
result = add.delay(7, 2)
task_id = result.task_id
print task_id
# output =
05f3f783-a538-45ed-89e3-c836a2623e8a
print result.get()
# output =
9
File 2:
from tasks import add
from celery.result import AsyncResult
res = AsyncResult('05f3f783-a538-45ed-89e3-c836a2623e8a')
print res.state
# output =
pending
print ('Result = %s' %res.get())
My understanding is file 2 should retrieve the value success and 9.
I have installed flower:
This reports success and 9 for the result.
Help. This is driving me nuts.
Thank you
Maybe you should read the FineManual and think twice ?
RPC Result Backend (RabbitMQ/QPid)
The RPC result backend (rpc://) is special as it doesn’t actually store the states, but rather sends
them as messages. This is an important difference as it means that a
result can only be retrieved once, and only by the client that
initiated the task. Two different processes can’t wait for the same
result.
(...)
The messages are transient (non-persistent) by default, so the results
will disappear if the broker restarts. You can configure the result
backend to send persistent messages using the result_persistent
setting.

How can I notify RxPY observers on separate threads using asyncio?

(Note: The background for this problem is pretty verbose, but there's an SSCCE at the bottom that can be skipped to)
Background
I'm trying to develop a Python-based CLI to interact with a web service. In my codebase I have a CommunicationService class that handles all direct communication with the web service. It exposes a received_response property that returns an Observable (from RxPY) that other objects can subscribe to in order to be notified when responses are received back from the web service.
I've based my CLI logic on the click library, where one of my subcommands is implemented as below:
async def enabled(self, request: str, response_handler: Callable[[str], Tuple[bool, str]]) -> None:
self._generate_request(request)
if response_handler is None:
return None
while True:
response = await self.on_response
success, value = response_handler(response)
print(success, value)
if success:
return value
What's happening here (in the case that response_handler is not None) is that the subcommand is behaving as a coroutine that awaits responses from the web service (self.on_response == CommunicationService.received_response) and returns some processed value from the first response it can handle.
I'm trying to test the behaviour of my CLI by creating test cases in which CommunicationService is completely mocked; a fake Subject is created (which can act as an Observable) and CommunicationService.received_response is mocked to return it. As part of the test, the subject's on_next method is invoked to pass mock web service responses back to the production code:
#when('the communications service receives a response from TestCube Web Service')
def step_impl(context):
context.mock_received_response_subject.on_next(context.text)
I use a click 'result callback' function that gets invoked at the end of the CLI invocation and blocks until the coroutine (the subcommand) is done:
#cli.resultcallback()
def _handle_command_task(task: Coroutine, **_) -> None:
if task:
loop = asyncio.get_event_loop()
result = loop.run_until_complete(task)
loop.close()
print('RESULT:', result)
Problem
At the start of the test, I run CliRunner.invoke to fire off the whole shebang. The problem is that this is a blocking call and will block the thread until the CLI has finished and returned a result, which isn't helpful if I need my test thread to carry on so it can produce mock web service responses concurrently with it.
What I guess I need to do is run CliRunner.invoke on a new thread using ThreadPoolExecutor. This allows the test logic to continue on the original thread and execute the #when step posted above. However, notifications published with mock_received_response_subject.on_next do not seem to trigger execution to continue within the subcommand.
I believe the solution would involve making use of RxPY's AsyncIOScheduler, but I'm finding the documentation on this a little sparse and unhelpful.
SSCCE
The snippet below captures what I hope is the essence of the problem. If it can be modified to work, I should be able to apply the same solution to my actual code to get it to behave as I want.
import asyncio
import logging
import sys
import time
import click
from click.testing import CliRunner
from rx.subjects import Subject
web_response_subject = Subject()
web_response_observable = web_response_subject.as_observable()
thread_loop = asyncio.new_event_loop()
#click.group()
def cli():
asyncio.set_event_loop(thread_loop)
#cli.resultcallback()
def result_handler(task, **_):
loop = asyncio.get_event_loop()
result = loop.run_until_complete(task) # Should block until subject publishes value
loop.close()
print(result)
#cli.command()
async def get_web_response():
return await web_response_observable
def test():
runner = CliRunner()
future = thread_loop.run_in_executor(None, runner.invoke, cli, ['get_web_response'])
time.sleep(1)
web_response_subject.on_next('foo') # Simulate reception of web response.
time.sleep(1)
result = future.result()
print(result.output)
logging.basicConfig(
level=logging.DEBUG,
format='%(threadName)10s %(name)18s: %(message)s',
stream=sys.stderr,
)
test()
Current Behaviour
The program hangs when run, blocking at result = loop.run_until_complete(task).
Acceptance Criteria
The program terminates and prints foo on stdout.
Update 1
Based on Vincent's help I've made some changes to my code.
Relay.enabled (the subcommand that awaits responses from the web service in order to process them) is now implemented like this:
async def enabled(self, request: str, response_handler: Callable[[str], Tuple[bool, str]]) -> None:
self._generate_request(request)
if response_handler is None:
return None
return await self.on_response \
.select(response_handler) \
.where(lambda result, i: result[0]) \
.select(lambda result, index: result[1]) \
.first()
I wasn't quite sure how await would behave with RxPY observables - would they return execution to the caller on each element generated, or only when the observable has completed (or errored?). I now know it's the latter, which honestly feels like the more natural choice and has allowed me to make the implementation of this function feel a lot more elegant and reactive.
I've also modified the test step that generates mock web service responses:
#when('the communications service receives a response from TestCube Web Service')
def step_impl(context):
loop = asyncio.get_event_loop()
loop.call_soon_threadsafe(context.mock_received_response_subject.on_next, context.text)
Unfortunately, this will not work as it stands, since the CLI is being invoked in its own thread...
#when('the CLI is run with "{arguments}"')
def step_impl(context, arguments):
loop = asyncio.get_event_loop()
if 'async.cli' in context.tags:
context.async_result = loop.run_in_executor(None, context.cli_runner.invoke, testcube.cli, arguments.split())
else:
...
And the CLI creates its own thread-private event loop when invoked...
def cli(context, hostname, port):
_initialize_logging(context.meta['click_log.core.logger']['level'])
# Create a new event loop for processing commands asynchronously on.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
...
What I think I need is a way to allow my test steps to invoke the CLI on a new thread and then fetch the event loop it's using:
#when('the communications service receives a response from TestCube Web Service')
def step_impl(context):
loop = _get_cli_event_loop() # Needs to be implemented.
loop.call_soon_threadsafe(context.mock_received_response_subject.on_next, context.text)
Update 2
There doesn't seem to be an easy way to get the event loop that a particular thread creates and uses for itself, so instead I took Victor's advice and mocked asyncio.new_event_loop to return an event loop that my test code creates and stores:
def _apply_mock_event_loop_patch(context):
# Close any already-existing exit stacks.
if hasattr(context, 'mock_event_loop_exit_stack'):
context.mock_event_loop_exit_stack.close()
context.test_loop = asyncio.new_event_loop()
print(context.test_loop)
context.mock_event_loop_exit_stack = ExitStack()
context.mock_event_loop_exit_stack.enter_context(
patch.object(asyncio, 'new_event_loop', spec=True, return_value=context.test_loop))
I change my 'mock web response received' test step to do the following:
#when('the communications service receives a response from TestCube Web Service')
def step_impl(context):
loop = context.test_loop
loop.call_soon_threadsafe(context.mock_received_response_subject.on_next, context.text)
The great news is that I'm actually getting the Relay.enabled coroutine to trigger when this step gets executed!
The only problem now is the final test step in which I await the future I got from executing the CLI in its own thread and validate that the CLI is sending this on stdout:
#then('the CLI should print "{output}"')
def step_impl(context, output):
if 'async.cli' in context.tags:
loop = asyncio.get_event_loop() # main loop, not test loop
result = loop.run_until_complete(context.async_result)
else:
result = context.result
assert_that(result.output, equal_to(output))
I've tried playing around with this but I can't seem to get context.async_result (which stores the future from loop.run_in_executor) to transition nicely to done and return the result. With the current implementation, I get an error for the first test (1.1) and indefinite hanging for the second (1.2):
#mock.comms #async.cli #wip
Scenario Outline: Querying relay enable state -- #1.1 # testcube/tests/features/relay.feature:45
When the user queries the enable state of relay 0 # testcube/tests/features/steps/relay.py:17 0.003s
Then the CLI should query the web service about the enable state of relay 0 # testcube/tests/features/steps/relay.py:48 0.000s
When the communications service receives a response from TestCube Web Service # testcube/tests/features/steps/core.py:58 0.000s
"""
{'module':'relays','path':'relays[0].enabled','data':[True]}'
"""
Then the CLI should print "True" # testcube/tests/features/steps/core.py:94 0.003s
Traceback (most recent call last):
File "/Users/davidfallah/testcube_env/lib/python3.5/site-packages/behave/model.py", line 1456, in run
match.run(runner.context)
File "/Users/davidfallah/testcube_env/lib/python3.5/site-packages/behave/model.py", line 1903, in run
self.func(context, *args, **kwargs)
File "testcube/tests/features/steps/core.py", line 99, in step_impl
result = loop.run_until_complete(context.async_result)
File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
return future.result()
File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/futures.py", line 274, in result
raise self._exception
File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/concurrent/futures/thread.py", line 55, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/davidfallah/testcube_env/lib/python3.5/site-packages/click/testing.py", line 299, in invoke
output = out.getvalue()
ValueError: I/O operation on closed file.
Captured stdout:
RECEIVED WEB RESPONSE: {'module':'relays','path':'relays[0].enabled','data':[True]}'
<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncio/futures.py:431]>
#mock.comms #async.cli #wip
Scenario Outline: Querying relay enable state -- #1.2 # testcube/tests/features/relay.feature:46
When the user queries the enable state of relay 1 # testcube/tests/features/steps/relay.py:17 0.005s
Then the CLI should query the web service about the enable state of relay 1 # testcube/tests/features/steps/relay.py:48 0.001s
When the communications service receives a response from TestCube Web Service # testcube/tests/features/steps/core.py:58 0.000s
"""
{'module':'relays','path':'relays[1].enabled','data':[False]}'
"""
RECEIVED WEB RESPONSE: {'module':'relays','path':'relays[1].enabled','data':[False]}'
Then the CLI should print "False" # testcube/tests/features/steps/core.py:94
Chapter 3: Finale
Screw all this asynchronous multi-threaded stuff, I'm too dumb for it.
First off, instead of describing the scenario like this...
When the user queries the enable state of relay <relay_id>
Then the CLI should query the web service about the enable state of relay <relay_id>
When the communications service receives a response from TestCube Web Service:
"""
{"module":"relays","path":"relays[<relay_id>].enabled","data":[<relay_enabled>]}
"""
Then the CLI should print "<relay_enabled>"
We describe it like this:
Given the communications service will respond to requests:
"""
{"module":"relays","path":"relays[<relay_id>].enabled","data":[<relay_enabled>]}
"""
When the user queries the enable state of relay <relay_id>
Then the CLI should query the web service about the enable state of relay <relay_id>
And the CLI should print "<relay_enabled>"
Implement the new given step:
#given('the communications service will respond to requests')
def step_impl(context):
response = context.text
def publish_mock_response(_):
loop = context.test_loop
loop.call_soon_threadsafe(context.mock_received_response_subject.on_next, response)
# Configure the mock comms service to publish a mock response when a request is made.
instance = context.mock_comms.return_value
instance.send_request.on_next.side_effect = publish_mock_response
BOOM
2 features passed, 0 failed, 0 skipped
22 scenarios passed, 0 failed, 0 skipped
58 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.111s
I can see two problems with your code:
asyncio is not thread-safe, unless you use call_soon_threadsafe or run_coroutine_threadsafe. RxPy doesn't use any of those in Observable.to_future, so you have to access RxPy objects in the same thread that runs the asyncio event loop.
RxPy sets the result of the future when on_completed is called, so that awaiting for an observable returns the last object emitted. This means you have to call both on_next and on_completed to get await to return.
Here is a working example:
import click
import asyncio
from rx.subjects import Subject
from click.testing import CliRunner
web_response_subject = Subject()
web_response_observable = web_response_subject.as_observable()
main_loop = asyncio.get_event_loop()
#click.group()
def cli():
pass
#cli.resultcallback()
def result_handler(task, **_):
future = asyncio.run_coroutine_threadsafe(task, main_loop)
print(future.result())
#cli.command()
async def get_web_response():
return await web_response_observable
def test():
runner = CliRunner()
future = main_loop.run_in_executor(
None, runner.invoke, cli, ['get_web_response'])
main_loop.call_later(1, web_response_subject.on_next, 'foo')
main_loop.call_later(2, web_response_subject.on_completed)
result = main_loop.run_until_complete(future)
print(result.output, end='')
if __name__ == '__main__':
test()

Store HttpResponse in dictionary

I have a trouble with long calculations in django. I am not able to install Celery because of idiocy of my company, so I have to "reinvent the wheel".I am trying to make all calculations in TaskQueue class, which stores all calculations in dictionary "results". Also, I am trying to make "Please Wait" page, which will asks this TaskQueue if task with provided key is ready.
And the problem is that the results somehow disappear.
I have some view with long calculations.
def some_view(request):
...
uuid = task_queue.add_task(method_name, params) #method_name(params) returns HttpResponse
return redirect('/please_wait/?uuid={0}'.format(uuid))
And please_wait view:
def please_wait(request):
uuid = request.GET.get('uuid','0')
ready = task_queue.task_ready(uuid)
if ready:
return task_queue.task_result(uuid)
elif ready == None:
return render_to_response('admin/please_wait.html',{'not_found':True})
else:
return render_to_response('admin/please_wait.html',{'not_found':False})
And last code, my TaskQueue:
class TaskQueue:
def __init__(self):
self.pool = ThreadPool()
self.results = {}
self.lock = Lock()
def add_task(self, method, params):
self.lock.acquire()
new_uuid = self.generate_new_uuid()
while self.results.has_key(new_uuid):
new_uuid = self.generate_new_uuid()
self.results[new_uuid] = self.pool.apply_async(func=method, args=params)
self.lock.release()
return new_uuid
def generate_new_uuid(self):
return uuid.uuid1().hex[0:8]
def task_ready(self, task_id):
if self.results.has_key(task_id):
return self.results[task_id].ready()
else:
return None
def task_result(self, task_id):
if self.task_ready(task_id):
return self.results[task_id].get()
else:
return None
global task_queue = TaskQueue()
After task addition I could log result providing it's uuid for some seconds, and then it says that task doesn't ready. Here is my log: (I am outputting task_queue.results)
[INFO] 2013-10-01 16:04:52,782 logger: {'ade5d154': <multiprocessing.pool.ApplyResult object at 0x1989906c>}
[INFO] 2013-10-01 16:05:05,740 logger: {}
Help me, please! Why the hell result disappears?
UPD: #freakish helped me to find out some new information. This result doesn't disappear forever, it disappears sometimes if I will repeat my tries to log it.
[INFO] 2013-10-01 16:52:41,743 logger: {}
[INFO] 2013-10-01 16:52:45,775 logger: {}
[INFO] 2013-10-01 16:52:48,855 logger: {'ade5d154': <multiprocessing.pool.ApplyResult object at 0x1989906c>}
OK, so we've established that you are running 4 processes of Django. In that case your queue won't be shared between them. Actually there are two possible solutions AFAIK:
Use a shared queueing server. You can write your own (see for example this entry) but using a proper one (like Celery) will be a lot easier (if you can't convince your employer to install it, then quit the job ;)).
Use database to store results inside it and let each server do the calculations (via processes or threads). It does not have to be a proper database server. You can use sqlite3 for example. This is more secure and reliable way but less efficient. I think this is easier then queueing mechanism. You simply create table with columns: id, state, result. When you create job you update entry with state=processing, when you finish the job you update entry with state=done and result=result (for example as JSON string). This is easy and reliable (you actually don't need a queue here at all, the order of jobs doesn't matter unless I'm missing something).
Of course you won't be able to use this .ready() functions with it (you should store results inside these storages) unless you pickle results but that is an unnecessary overhead.

Categories

Resources