I was following a guide at http://tavendo.com/blog/post/going-asynchronous-from-flask-to-twisted-klein/ to create an asynchronous web service.
In my code, I have a function that sends out the request like this:
@inlineCallbacks
def query(text):
    resp = yield treq.get("http://api.Iwanttoquery")
    content = yield treq.content(resp)
    returnValue(content)

@inlineCallbacks
def caller():
    output1 = yield query("one")
    output2 = yield query("two")
Since each query to the API usually takes about 3 seconds, with my current code the result comes back after 6 seconds. Is there a way to send out the two queries at the same time, so that after 3 seconds I have the content of both output1 and output2? Thanks.
What you need to do is use a DeferredList instead of inlineCallbacks. Basically you provide a list of deferreds and after each one completes, a final callback with the results of all the deferreds is executed.
import treq
from twisted.internet import defer, reactor

def query(text):
    get = treq.get('http://google.com')
    get.addCallback(treq.content)
    return get

output1 = query('one')
output2 = query('two')
final = defer.DeferredList([output1, output2])  # wait for both queries to finish
final.addCallback(print)  # print the results from all the queries in the list
reactor.run()
Each call to query() starts its request and returns a Deferred almost immediately, so output1 and output2 are effectively in flight at the same time. You then put the Deferreds (i.e. output1 and output2) in a list and pass it to DeferredList, which itself returns a Deferred. Finally, you add a callback to the DeferredList to do something with the results (in this case I just print them). This is all done without the use of threads, which is the best part in my opinion! Hope this makes sense and please comment if it doesn't.
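For readers more familiar with the standard library, the same "start both, then wait for both" idea can be sketched with asyncio. Note that this is asyncio, not Twisted: `asyncio.gather` plays the role that DeferredList plays above, and the `query` coroutine here is a stand-in that sleeps instead of making a real HTTP request.

```python
import asyncio
import time

async def query(text):
    # Stand-in for the real HTTP call: each "request" takes 0.2 seconds.
    await asyncio.sleep(0.2)
    return f"response to {text}"

async def caller():
    # Both coroutines are scheduled together; gather waits until both finish
    # and returns their results in the order they were passed in.
    return await asyncio.gather(query("one"), query("two"))

start = time.monotonic()
output1, output2 = asyncio.run(caller())
elapsed = time.monotonic() - start
print(output1, output2, f"{elapsed:.2f}s")
```

Because the two sleeps overlap, the elapsed time should be close to one query's duration rather than the sum of both.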
PS
If you need further help with Klein, I'm working on revamping the documentation here https://github.com/notoriousno/klein-basics (hopefully I'll make a blog post one of these days). Please take a look at some of the docs (the files with .rst). My shameless plug is now concluded :D
Related
I am trying to set up a fastAPI app doing the following:
Accept messages as post requests and put them in a queue;
A background job is, from time to time, pulling messages (up to a certain batch size) from the queue, processing them in a batch, and storing results in a dictionary;
The app is retrieving results from the dictionary and sending them back "as soon as" they are done.
To do so, I've set up a background job with apscheduler communicating via a queue trying to make a simplified version of this post: https://levelup.gitconnected.com/fastapi-how-to-process-incoming-requests-in-batches-b384a1406ec. Here is the code of my app:
import queue
import uuid
from asyncio import sleep

import uvicorn
from pydantic import BaseModel
from fastapi import FastAPI
from apscheduler.schedulers.asyncio import AsyncIOScheduler

app = FastAPI()
app.input_queue = queue.Queue()
app.output_dict = {}
app.queue_limit = 2

def upper_messages():
    for i in range(app.queue_limit):
        try:
            obj = app.input_queue.get_nowait()
            app.output_dict[obj['request_id']] = obj['text'].upper()
        except queue.Empty:
            pass

app.scheduler = AsyncIOScheduler()
app.scheduler.add_job(upper_messages, 'interval', seconds=5)
app.scheduler.start()

async def get_result(request_id):
    while True:
        if request_id in app.output_dict:
            result = app.output_dict[request_id]
            del app.output_dict[request_id]
            return result
        await sleep(0.001)

class Payload(BaseModel):
    text: str

@app.post('/upper')
async def upper(payload: Payload):
    request_id = str(uuid.uuid4())
    app.input_queue.put({'text': payload.text, 'request_id': request_id})
    return await get_result(request_id)

if __name__ == "__main__":
    uvicorn.run(app)
However, it's not really running asynchronously. If I invoke the following test script:
from time import time
import requests

texts = [
    'text1',
    'text2',
    'text3',
    'text4'
]
time_start = time()
for text in texts:
    result = requests.post('http://127.0.0.1:8000/upper', json={'text': text})
    print(result.text, time() - time_start)
the messages do get processed, but the whole processing takes 15-20 seconds, the output being something like:
"TEXT1" 2.961090087890625
"TEXT2" 7.96642279624939
"TEXT3" 12.962305784225464
"TEXT4" 17.96261429786682
I was instead expecting the whole processing to take 5-10 seconds (after less than 5 seconds the first two messages should be processed, and the other two more or less exactly 5 seconds later). It seems instead that the second message is not being put to the queue until the first one is processed - i.e. the same as if I were just using a single thread.
Questions:
Does anyone know how to modify the code above so that all the incoming messages are put to the queue immediately upon receiving them?
[bonus question 1]: The above holds true if I run the script (say, debug_app.py) from the command line via uvicorn debug_app:app. But if I run it with python3 debug_app.py no message is returned at all. Messages are received (doing CTRL+C results in Waiting for connections to close. (CTRL+C to force quit)) but never processed.
[bonus question 2]: Another thing I don't understand is why, if I remove the line await sleep(0.001) inside the definition of get_result, the behaviour gets even worse: no matter what I do, the app freezes, I cannot terminate it (i.e. neither CTRL+C nor kill work), I have to send a sigkill (kill -9) to stop it.
Background
If you are wondering why I am doing this: as in the blog post linked above, the purpose is efficient deep learning inference. The model I have takes (roughly) the same time to process one request or a dozen requests at once, so batching can dramatically increase throughput. I first tried setting up a FastAPI frontend + RabbitMQ + Flask backend pipeline, and it worked, but the overhead of the complicated setup (and/or my inability to work with it) was heavier than the time it took to run the model, nullifying the gain... so I'm first trying to get a minimalistic version to work. The upper_messages function in this toy example will become either a direct invocation of the model (if this computationally heavier step does not block incoming connections too much) or an async call to another process that actually does the computation. I'll see about that later...
... after looking better into it, it looks like the application was indeed working as I wanted it to, my error was in the way I tested it...
Indeed, when sending a POST request to the uvicorn server, the client is left waiting for an answer to come, which is the intended behaviour. However, this also means that the next request is not sent until the first answer is collected. So the server is not batching them because there's nothing to batch!
To test this correctly, I slightly altered the test.py script to:
from time import time
import requests
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('prefix')
args = parser.parse_args()
texts = [
    'text1',
    'text2',
    'text3',
    'text4'
]
texts = [args.prefix + '_' + t for t in texts]
time_start = time()
for text in texts:
    result = requests.post('http://127.0.0.1:8000/upper', json={'text': text})
    print(result.text, time() - time_start)
And run it in multiple processes via:
python3 test.py user1 & python3 test.py user2 & python3 test.py user3
The output is now as expected, with pairs of messages (from different users!) being processed in a batch (and the exact order is a bit randomized, although the same user gets, of course, answers in the order of the requests it made):
"USER1_TEXT1" 4.340522766113281
"USER3_TEXT1" 4.340718030929565
"USER2_TEXT1" 9.334393978118896
"USER1_TEXT2" 9.340892553329468
"USER3_TEXT2" 14.33926010131836
"USER2_TEXT2" 14.334421396255493
"USER1_TEXT3" 19.339791774749756
"USER3_TEXT3" 19.33999013900757
"USER1_TEXT4" 24.33989715576172
"USER2_TEXT3" 24.334784030914307
"USER3_TEXT4" 29.338693857192993
"USER2_TEXT4" 29.333901166915894
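Launching several processes is one way to get overlapping requests; a single script can do the same with a thread pool. The sketch below is only illustrative: `send_all` and `fake_send` are names made up here, and `fake_send` is a stand-in for the real `requests.post(...)` call so that the example runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_all(texts, send, max_workers=8):
    """Call send(text) for every text concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, texts))

def fake_send(text):
    # Hypothetical stand-in for the HTTP call, so the sketch needs no server.
    time.sleep(0.1)
    return text.upper()

texts = [f"user{u}_text{t}" for u in (1, 2, 3) for t in (1, 2)]
results = send_all(texts, fake_send)
print(results)
```

To exercise the real endpoint, one would pass something like `lambda t: requests.post('http://127.0.0.1:8000/upper', json={'text': t}).text` as `send`, so that several requests are waiting on the server at once and can land in the same batch.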
I'm leaving the question open (and not accepting my own answer) because for the "bonus questions" above (about the application becoming frozen) I still don't have an answer.
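On bonus question 2, one likely explanation (inferred from how asyncio schedules coroutines, not verified against this exact app): a coroutine only hands control back to the event loop at an `await`. Without `await sleep(0.001)`, the `while True` loop in get_result never yields, so the scheduler job that fills the output dictionary and the server's shutdown handling never get a chance to run. The sketch below shows the cooperative version with made-up names (`results`, `producer`, `wait_for`): the polling loop yields on every iteration, which lets the background task finish.

```python
import asyncio

results = {}

async def producer():
    # Plays the role of the background job that eventually stores a result.
    await asyncio.sleep(0.05)
    results["job"] = "DONE"

async def wait_for(key):
    while key not in results:
        # This await is what lets the event loop run other tasks.
        # Removing it would spin forever and starve producer().
        await asyncio.sleep(0.001)
    return results.pop(key)

async def main():
    task = asyncio.create_task(producer())
    value = await wait_for("job")
    await task
    return value

value = asyncio.run(main())
print(value)
```

A design that avoids polling altogether would hand each request its own `asyncio.Future` and have the batch job set its result, but that is a larger change than the question asks about.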
I am using Celery to asynchronously perform a group of operations. There are a lot of these operations and each may take a long time, so rather than send the results back in the return value of the Celery worker function, I'd like to send them back one at a time as custom state updates. That way the caller can implement a progress bar with a change state callback, and the return value of the worker function can be of constant size rather than linear in the number of operations.
Here is a simple example in which I use the Celery worker function add_pairs_of_numbers to add a list of pairs of numbers, sending back a custom status update for every added pair.
#!/usr/bin/env python
"""
Run worker with:
celery -A tasks worker --loglevel=info
"""
from celery import Celery

app = Celery("tasks", broker="pyamqp://guest@localhost//", backend="rpc://")

@app.task(bind=True)
def add_pairs_of_numbers(self, pairs):
    for x, y in pairs:
        self.update_state(state="SUM", meta={"x":x, "y":y, "x+y":x+y})
    return len(pairs)

def handle_message(message):
    if message["status"] == "SUM":
        x = message["result"]["x"]
        y = message["result"]["y"]
        print(f"Message: {x} + {y} = {x+y}")

def non_looping(*pairs):
    task = add_pairs_of_numbers.delay(pairs)
    result = task.get(on_message=handle_message)
    print(result)

def looping(*pairs):
    task = add_pairs_of_numbers.delay(pairs)
    print(task)
    while True:
        pass

if __name__ == "__main__":
    import sys
    if sys.argv[1:] and sys.argv[1] == "looping":
        looping((3,4), (2,7), (5,5))
    else:
        non_looping((3,4), (2,7), (5,5))
If you run just ./tasks.py it executes the non_looping function. This does the standard Celery thing: makes a delayed call to the worker function and then uses get to wait for the result. A handle_message callback function prints each message, and the number of pairs added is returned as the result. This is what I want.
$ ./tasks.py
Message: 3 + 4 = 7
Message: 2 + 7 = 9
Message: 5 + 5 = 10
3
Though the non-looping scenario is sufficient for this simple example, the real world task I'm trying to accomplish is processing a batch of files instead of adding pairs of numbers. Furthermore the client is a Flask REST API and therefore cannot contain any blocking get calls. In the script above I simulate this constraint with the looping function. This function starts the asynchronous Celery task, but does not wait for a response. (The infinite while loop that follows simulates the web server continuing to run and handle other requests.)
If you run the script with the argument "looping" it runs this code path. Here it immediately prints the Celery task ID then drops into the infinite loop.
$ ./tasks.py looping
a39c54d3-2946-4f4e-a465-4cc3adc6cbe5
The Celery worker logs show that the add operations are performed, but the caller doesn't define a callback function, so it never gets the results.
(I realize that this particular example is embarrassingly parallel, so I could use chunks to divide this up into multiple tasks. However, in my non-simplified real-world case I have tasks that cannot be parallelized.)
What I want is to be able to specify a callback in the looping scenario. Something like this.
def looping(*pairs):
    task = add_pairs_of_numbers.delay(pairs, callback=handle_message)  # There is no such callback.
    print(task)
    while True:
        pass
In the Celery documentation and all the examples I can find online (for example this), there is no way to define a callback function as part of the delay call or its apply_async equivalent. You can only specify one as part of a get callback. That's making me think this is an intentional design decision.
In my REST API scenario I can work around this by having the Celery worker process send a "status update" back to the Flask server in the form of an HTTP post, but this seems weird because I'm starting to replicate messaging logic in HTTP that already exists in Celery.
Is there any way to write my looping scenario so that the caller receives callbacks without making a blocking call, or is that explicitly forbidden in Celery?
It's a pattern that is not supported by celery although you can (somewhat) trick it out by posting custom state updates to your task as described here.
Use update_state() to update a task’s state:
@app.task(bind=True)
def upload_files(self, filenames):
    for i, file in enumerate(filenames):
        if not self.request.called_directly:
            self.update_state(state='PROGRESS',
                              meta={'current': i, 'total': len(filenames)})
The reason that Celery does not support such a pattern is that task producers (callers) are strongly decoupled from task consumers (workers). The only communication channels between the two are the broker, which carries messages from producers to consumers, and the result backend, which carries them from consumers to producers. The closest you can get currently is polling a task's state, or writing a custom result backend that lets you post events via either AMQP RPC or Redis subscriptions.
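The polling option can be sketched as a single non-blocking check that a web handler could call on each status request. This is an illustration under assumptions: `check_progress` and `FakeResult` are names made up here, `result` is assumed to behave like a `celery.result.AsyncResult` (whose `state` and `info` attributes expose the current state and the meta passed to update_state()), and a result backend that stores task state is assumed. The "SUM" state is the custom one from the question.

```python
def check_progress(result):
    """Return a JSON-friendly snapshot of a task without blocking.

    `result` is assumed to look like celery.result.AsyncResult:
    `state` is the current state name, `info` is the meta dict from
    update_state() or the return value once the task has finished.
    """
    state = result.state
    info = result.info
    if state == "SUM":
        return {"state": state, "progress": info}
    if state == "SUCCESS":
        return {"state": state, "result": info}
    return {"state": state}


class FakeResult:
    # Hypothetical stand-in so the sketch runs without a broker or worker.
    def __init__(self, state, info=None):
        self.state = state
        self.info = info


snapshot = check_progress(FakeResult("SUM", {"x": 3, "y": 4, "x+y": 7}))
print(snapshot)
```

Polling only sees whatever state is current at the moment of the check, so intermediate updates that arrive between two polls can be missed.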
I am trying to execute a time-consuming back-end job, executed by a front-end call. This back-end job should execute a callback method when it is completed, which will release a semaphore. The front end shouldn't have to wait for the long process to finish in order to get a response from the call to kick off the job.
I'm trying to use the Pool class from the multiprocessing library to solve this issue, but I'm running into some issues. Namely that it seems like the only way to actually execute the method passed into apply_async is to call the .get() method in the ApplyResult object that is returned by the apply_async call.
In order to solve this, I thought to create a Process object with the target being apply_result.get. But this doesn't seem to work.
Is there something basic that I'm missing here? What would you folks suggest to solve this issue?
Here is a snippet example of what I have right now:
p = Pool(1)
result = p.apply_async(long_process, args=(config, requester), callback=complete_long_process)
Process(target=result.get).start()
response = {'status': 'success', 'message': 'Job started for {0}'.format(requester)}
return jsonify(response)
Thanks for the help in advance!
I don't quite understand why you would need a Process object here. Look at this snippet:
#!/usr/bin/python
from multiprocessing import Pool
from multiprocessing.managers import BaseManager
from itertools import repeat
from time import sleep

def complete_long_process(foo):
    # Called in the parent process with long_process's return value.
    print("completed", foo)

def long_process(a, b):
    print(a, b)
    sleep(10)

p = Pool(1)
result = p.apply_async(long_process, args=(1, 42),
                       callback=complete_long_process)
print("submitted")
sleep(20)
If I understand what you are trying to achieve, this does exactly that. As soon as you call apply_async, it launches the long_process function and execution of the main program continues. As soon as it completes, complete_long_process is called. There is no need to call the get method to execute long_process, and the code does not block or wait for anything.
If your long_process does not appear to run, I assume your problem is somewhere within long_process.
Hannu
I have a function get_data(request) that requests some data from a server. Every time this function is called, it requests the data from a different server. All of them should return the same response.
I would like to get the response as soon as possible. I need to create a function that calls get_data several times, and returns the first response it gets.
EDIT:
I came up with the idea of using multiprocessing.Pipe(), but I have the feeling this is a very bad way to solve it. What do you think?
import multiprocessing
from threading import Thread

def get_data(request, pipe):
    data = ...  # placeholder: makes the request to a server, this can take a random amount of time
    pipe.send(data)

def multiple_requests(request, num_servers):
    my_pipe, his_pipe = multiprocessing.Pipe()
    for i in range(num_servers):
        Thread(target=get_data, args=(request, his_pipe)).start()
    return my_pipe.recv()

multiple_requests("the_request_string", 6)
I think this is a bad way of doing it because you are passing the same pipe to all threads, and I don't really know but I guess that has to be very unsafe.
I think Redis RQ would be a good fit for this. get_data is a job that you put in the queue six times. Jobs execute asynchronously, and the docs also explain how to work with the results.
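If a queue service is more than the problem needs, the "first response wins" idea can also be sketched with the standard library alone. This is a different technique from RQ: it uses `concurrent.futures.wait` with `FIRST_COMPLETED`. The `get_data` body below is a stand-in that sleeps for a random time instead of contacting a server, and `first_response` is a name made up for this example.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def get_data(request, server_id):
    # Stand-in for the real network call; each "server" answers after a random delay.
    time.sleep(random.uniform(0.01, 0.05))
    return f"{request} answered by server {server_id}"

def first_response(request, num_servers):
    pool = ThreadPoolExecutor(max_workers=num_servers)
    futures = [pool.submit(get_data, request, i) for i in range(num_servers)]
    # Block only until at least one future has finished.
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    # Do not wait for the slower servers; their threads finish in the background.
    pool.shutdown(wait=False)
    return next(iter(done)).result()

response = first_response("the_request_string", 6)
print(response)
```

One caveat of this sketch: if the first future to finish raised an exception, `result()` re-raises it, so a real version would probably loop until it finds a successful response.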
I want to move to ndb, and have been wondering whether to use async urlfetch tasklets. I'm not sure I fully understand how it works, as the documentation is somewhat poor, but it seems quite promising for this particular use case.
Currently I use async urlfetch like this. It is far from actual threading or parallel code, but it has still improved performance quite significantly, compared to just sequential requests.
from google.appengine.api import urlfetch

def http_get(url):
    rpc = urlfetch.create_rpc(deadline=3)
    urlfetch.make_fetch_call(rpc, url)
    return rpc

rpcs = []
urls = [...]  # hundreds of urls
# start with 10 requests in flight
while len(rpcs) < 10:
    rpcs.append(http_get(urls.pop()))
while rpcs:
    rpc = rpcs.pop(0)
    result = rpc.get_result()
    if result.status_code == 200:
        # append another item to rpcs
        # process result
        pass
    else:
        # re-append same item to rpcs
        pass
Please note that this code is simplified. The actual code catches exceptions, has some additional checks, and only tries to re-append the same item a few times. It makes no difference for this case.
I should add that processing the result does not involve any db operations.
Actually yes, it's a good idea to use async urlfetch here. How it works (rough explanation):
- your code reaches the point of the async call. It triggers a long background task and doesn't wait for its result, but continues to execute.
- the task works in the background, and when the result is ready, it is stored somewhere until you ask for it.
Simple example:
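from google.appengine.ext import ndb

def get_fetch_all():
    urls = ["http://www.example.com/", "http://mirror.example.com/"]
    ctx = ndb.get_context()
    futures = [ctx.urlfetch(url) for url in urls]
    # wait_all() only blocks until every future is done; read the values afterwards
    ndb.Future.wait_all(futures)
    results = [future.get_result() for future in futures]
    # do something with results here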
If you want to store the result in ndb and make this more efficient, it's a good idea to write a custom tasklet for it.
@ndb.tasklet
def get_data_and_store(url):
    ctx = ndb.get_context()
    # until the result arrives, this function is "paused", allowing other
    # parallel tasks to work. once the data is fetched, control returns here
    result = yield ctx.urlfetch(url)
    if result.status_code == 200:
        store = Storage(data=result.content)
        # async job to put data
        yield store.put_async()
        raise ndb.Return(True)
    else:
        raise ndb.Return(False)
You can use this tasklet combined with the loop in the first sample. You should get a list of True/False values indicating the success of each fetch.
I'm not sure how much this will boost overall performance (it depends on Google's side), but it should help.