I am experimenting with several of GAE's features.
I've built a dynamic Backend, but I am having trouble getting it to work without task queues.
Backend code:
import webapp2
from google.appengine.ext.webapp.util import run_wsgi_app

class StartHandler(webapp2.RequestHandler):
    def get(self):
        # ... do stuff ...
        pass

if __name__ == '__main__':
    _handlers = [(r'/_ah/start', StartHandler)]
    run_wsgi_app(webapp2.WSGIApplication(_handlers))
The Backend is dynamic, so whenever it receives a call it does its work and then stops.
Everything works fine when I use the following inside my handlers:
url = backends.get_url('worker') + '/_ah/start'
urlfetch.fetch(url)
But I want this call to be asynchronous, because the Backend might take up to 10 minutes to finish its work.
So I changed the above code to:
url = backends.get_url('worker') + '/_ah/start'
rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(rpc, url)
But then the Backend does not start. I am not interested in the completion of the request or in getting any data out of it.
What am I missing or implementing wrong?
Thank you all
Using an RPC for an async call without calling get_result() on the rpc object does not guarantee that the urlfetch will be made. Once your handler exits, any pending async calls that have not completed are aborted.
The only way to make the handler async is to queue the code in a push queue.
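For illustration, here is a minimal sketch of the push-queue approach (the /work URL and the queue name are assumptions for illustration, not part of the original setup):
from google.appengine.api import taskqueue

class KickoffHandler(webapp2.RequestHandler):
    def post(self):
        # Enqueue a task; App Engine delivers it to the backend and retries
        # it on failure, unlike a fire-and-forget urlfetch RPC.
        taskqueue.add(
            url='/work',           # a regular task handler on the backend (hypothetical)
            target='worker',       # route the task to the 'worker' backend
            queue_name='default',  # assumed queue; adjust to your queue.yaml
        )
        self.response.write('queued')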
Related
I want to create a third-party chatbot API which is asynchronous and replies "ok" after a 10-second pause.
import time

def wait():
    time.sleep(10)
    return "ok"

# views.py
def api(request):
    return wait()
I have tried Celery for the same, as follows, where I am waiting for the Celery result in the view itself:
import time

from celery import shared_task
from celery.result import AsyncResult

@shared_task
def wait():
    time.sleep(10)
    return "ok"

# views.py
def api(request):
    a = wait.delay()
    work = AsyncResult(a.id)
    while True:
        if work.ready():
            return work.get(timeout=1)
But this solution works synchronously and makes no difference. How can we make it asynchronous, without asking the user to keep polling until the result is received?
As mentioned in @Blusky's answer:
An asynchronous API will exist in Django 3.x, not before.
If this is not an option, then the answer is just no.
Please note as well that even with Django 3.x, any Django code that accesses the database will not be asynchronous; it has to be executed in a thread (thread pool).
Celery is intended for background or deferred tasks, but Celery will never return an HTTP response, as it never received the HTTP request to which it should respond. Celery is also not asyncio friendly.
You might have to think of changing your architecture / implementation.
Look at your overall problem and ask yourself whether you really need an asynchronous API with Django.
Is this API intended for browser applications or for machine to machine applications?
Could your clients use web sockets and wait for the answer?
Could you separate blocking and non-blocking parts on your server side?
Use Django for everything non-blocking and for everything periodic / deferred (Django + Celery), and implement the asynchronous part with web server plugins, Python ASGI code, or web sockets.
Some ideas
Use Django + nginx nchan (if your web server is nginx)
Link to nchan: https://nchan.io/
Your API call would create a task id, start a Celery task, and immediately return the task id or a polling URL.
The polling URL would be handled, for example, via an nchan long-polling channel.
Your client connects to the URL corresponding to an nchan long-polling channel, and Celery unblocks it whenever your task is finished (the 10 s are over); a sketch follows below.
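As a rough sketch of the Celery side of this idea (the nchan publisher location and the channel naming are assumptions for illustration, not from the original answer):
import time
import requests
from celery import shared_task

NCHAN_PUBLISH_URL = 'http://localhost/publish/{channel}'  # assumed nchan publisher location

@shared_task
def slow_chatbot_reply(task_id):
    time.sleep(10)  # the slow work (the 10 s pause from the question)
    # Publish the result; nchan pushes it to the long-polling client.
    requests.post(NCHAN_PUBLISH_URL.format(channel=task_id), data='ok')
    return 'ok'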
Use Django + an ASGI server + one hand-coded view and use a strategy similar to nginx nchan
Same logic as above, but instead of nginx nchan you implement it yourself.
Use an ASGI server + a non-blocking framework (or just some hand-coded ASGI views) for all blocking URLs and Django for the rest.
The two parts might exchange data via the database, local files, or local HTTP requests.
Just stay blocking and throw enough worker processes / threads at your server
This is probably the worst suggestion, but if it is just for personal use
and you know how many requests you will have in parallel, then just make sure you have enough Django workers that you can afford to be blocking.
In this case you would block an entire Django worker for each slow request.
Use web sockets, e.g. with the channels module for Django
Web sockets can be implemented with earlier versions of Django (>= 2.2) via the django channels module (pip install channels): https://github.com/django/channels
You need an ASGI server to serve the asynchronous part. You could use, for example, Daphne or uvicorn (the channels documentation explains this rather well); a minimal consumer sketch follows below.
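A minimal sketch of such a consumer (the consumer name and its routing are illustrative assumptions); it waits 10 seconds and then replies "ok", matching the question:
# consumers.py -- hypothetical minimal django channels consumer
import asyncio
from channels.generic.websocket import AsyncWebsocketConsumer

class ChatbotConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        # Non-blocking pause: other connections keep being served meanwhile.
        await asyncio.sleep(10)
        await self.send(text_data='ok')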
Addendum 2020-06-01: simple async example calling synchronous django code
The following code uses the starlette module, as it seems quite simple and small.
miniasyncio.py
import asyncio
import concurrent.futures
import os

import django
from starlette.applications import Starlette
from starlette.responses import Response
from starlette.routing import Route

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'pjt.settings')
django.setup()

from django_app.xxx import synchronous_func1
from django_app.xxx import synchronous_func2

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

async def simple_slow(request):
    """Simple function that sleeps in an async manner."""
    await asyncio.sleep(5)
    return Response('hello world')

async def call_slow_dj_funcs(request):
    """Slow django code will be called in a thread pool."""
    loop = asyncio.get_running_loop()
    # Run the blocking django functions in the executor so that the
    # event loop is not blocked while they execute.
    func1_result = await loop.run_in_executor(executor, synchronous_func1)
    func2_result = await loop.run_in_executor(executor, synchronous_func2)
    response_txt = "OK"
    return Response(response_txt, media_type="text/plain")

routes = [
    Route("/simple", endpoint=simple_slow),
    Route("/slow_dj_funcs", endpoint=call_slow_dj_funcs),
]

app = Starlette(debug=True, routes=routes)
You could, for example, run this code with:
pip install uvicorn
uvicorn --port 8002 miniasyncio:app
Then, on your web server, route these specific URLs to uvicorn and not to your Django application server.
The best option is to use the future async API, which will be introduced in the Django 3.1 release (already available in alpha):
https://docs.djangoproject.com/en/dev/releases/3.1/#asynchronous-views-and-middleware-support
(however, you will need to use an ASGI Web Worker to make it work properly)
Check out Django 3 ASGI (Asynchronous Server Gateway Interface) support:
https://docs.djangoproject.com/en/3.0/howto/deployment/asgi/
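For reference, a minimal sketch of what an async view looks like from Django 3.1 on (the view name and the 10-second pause are just illustrative):
# views.py -- requires Django >= 3.1 running under an ASGI server
import asyncio
from django.http import JsonResponse

async def chatbot_reply(request):
    # The event loop keeps serving other requests during this pause.
    await asyncio.sleep(10)
    return JsonResponse({'message': 'ok'})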
I have a web endpoint for users to upload a file.
When the endpoint receives the request, I want to run a background job to process the file.
Since the job would take time to complete, I wish to return the job_id to the user to track the status of the request while the job is running in background.
I am wondering if asyncio would help in this case.
import asyncio
import uuid

from flask import Flask, request, jsonify

app = Flask(__name__)

@asyncio.coroutine
def process_file(job_id, file_obj):
    ...  # <process the file and dump results in db>

@app.route('/file-upload', methods=['POST'])
def upload_file():
    job_id = uuid.uuid4().hex
    process_file(job_id, request.files['file'])  # I want this call to be async without any await
    return jsonify(message='Request received. Track the status using: ' + job_id)
With the above code, the process_file method is never called, and I am not able to understand why.
I am not sure if this is the right way to do it, though; please help if I am missing something.
Flask doesn't support async calls yet.
To create and execute heavy tasks in the background you can use the Celery library: https://flask.palletsprojects.com/en/1.1.x/patterns/celery/
You can use this for reference:
Making an asynchronous task in Flask
Official documentation:
http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#installing-celery
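A rough sketch of that pattern applied to this question (task and route names, the file location, and the Redis broker URL are assumptions for illustration):
# tasks.py
from celery import Celery

celery_app = Celery('tasks',
                    broker='redis://localhost:6379/0',    # assumed broker
                    backend='redis://localhost:6379/0')   # assumed result backend

@celery_app.task
def process_file(job_id, file_path):
    # ... process the file and dump results in the db ...
    return 'done'

# app.py
import uuid
from flask import Flask, request, jsonify
from tasks import process_file

app = Flask(__name__)

@app.route('/file-upload', methods=['POST'])
def upload_file():
    job_id = uuid.uuid4().hex
    path = '/tmp/' + job_id              # save the upload where the worker can read it
    request.files['file'].save(path)
    process_file.delay(job_id, path)     # returns immediately; a Celery worker does the work
    return jsonify(message='Request received.', job_id=job_id), 202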
Even though you wrapped the function with @asyncio.coroutine, it is never awaited, and awaiting is what tells a coroutine to actually run and return its result.
Asyncio is not good for this kind of task, because the work is blocking; asyncio is usually used to make calls that return results fast.
As @py_dude mentioned, Flask does not support async calls. If you are looking for a library that functions and feels similar to Flask but is asynchronous, I recommend checking out Sanic. Here is some sample code:
from sanic import Sanic
from sanic.response import json

app = Sanic()

@app.route("/")
async def test(request):
    return json({"hello": "world"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
Updating your database asynchronously shouldn't be an issue; refer here to find asyncio-supported database drivers. For processing your file, check out aiohttp. You can run your server extremely fast on a single thread without any hiccup if you do so asynchronously; a sketch of kicking off the processing as a Sanic background task follows below.
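For instance, a minimal sketch (names here are illustrative assumptions, not from the original answer) that returns a job_id immediately while Sanic processes the upload in a background task:
import uuid
from sanic import Sanic
from sanic.response import json

app = Sanic('upload_app')

async def process_file(job_id, body):
    # ... process the file and dump results in the db ...
    pass

@app.route('/file-upload', methods=['POST'])
async def upload_file(request):
    job_id = uuid.uuid4().hex
    # add_task schedules the coroutine on the event loop and returns immediately
    app.add_task(process_file(job_id, request.body))
    return json({'message': 'Request received.', 'job_id': job_id}, status=202)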
I am working on an application which is based on GAE with Python 2.7.13. What I want to do is make a bunch of async API calls inside a handler, something like this:
class MakeRequests(webapp2.RequestHandler):
    def post(self, *v, **kv):
        # do an async api call #1
        # do an async api call #2
        # do an async api call #3
        # wait for responses from all of the above api requests
        # build the response: e.g. if call #1 fails, set its expected
        # attributes in the response to None; if call #2 succeeds, add its
        # attributes to the response, etc. This is just an example.
For that purpose, I have tried libraries like asyncio, grequests, requests and simple-requests, but they don't seem to work because they are not compatible with either GAE or Python 2.7.13.
Can anyone help me here?
Urlfetch, which is bundled by default with GAE, has a way of making asynchronous calls:
from google.appengine.api import urlfetch

def post(self, *v, **kv):
    rpcs = []
    for url in urls:
        rpc = urlfetch.create_rpc()
        urlfetch.make_fetch_call(rpc, url)
        rpcs.append(rpc)

    results = [rpc.get_result() for rpc in rpcs]
    # do stuff with results
If, for some reason, you don't want to use urlfetch, you can parallelize the requests manually by using threading and a synchronized Queue to read the results, e.g. as sketched below.
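A minimal sketch of that manual approach under Python 2.7 (function and variable names are assumptions for illustration):
import threading
import Queue      # standard library queue module in Python 2.7
import urllib2

def fetch_all(urls):
    results = Queue.Queue()

    def worker(url):
        try:
            results.put((url, urllib2.urlopen(url).read()))
        except Exception as exc:
            results.put((url, exc))   # record the failure instead of the body

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # One (url, result-or-exception) pair per request
    return dict(results.get() for _ in urls)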
I am writing a web application which does some heavy work. With that in mind, I thought of running the tasks as background tasks (non-blocking) so that other requests are not blocked by the previous ones.
I went with daemonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished. Now if a user sends a request, my code immediately tells them that their request is in progress; it runs in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request, abort
import threading

class threadClass:
    def __init__(self):
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True    # Daemonize thread
        thread.start()          # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():                    # the route decorator needs a view function
    try:
        begin = threadClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production)
Could I have done this better? Did I miss anything? I was going through Python's multiprocessing package and found this:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I daemonize a process using multiprocessing? How can I do better than what I have with the threading module?
EDIT
I went through the multiprocessing package of Python; it is similar to threading.
from flask import Flask
from flask import request, abort
from multiprocessing import Process

class processClass:
    def __init__(self):
        p = Process(target=self.run, args=())
        p.daemon = True     # Daemonize it
        p.start()           # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():                # the route decorator needs a view function
    try:
        begin = processClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
Does the above approach look good?
Best practice
The best way to implement background tasks in Flask is with Celery, as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
Crazy way: Build your own decorator
As @MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his PyCon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The code below is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long-running function call in your /foo endpoint. I mock this with a 10 second sleep timer. If you call the endpoint three times, it will take 30 seconds to finish.
Miguel Grinberg's decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with @flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps

from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError

app = Flask(__name__)
tasks = {}


def flask_async(f):
    """
    This decorator transforms a sync route into an asynchronous one
    by running it in a background thread.
    """
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(app, environ):
            # Create a request context similar to that of the original request
            with app.request_context(environ):
                try:
                    # Run the route function and record the response
                    tasks[task_id]['result'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['result'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['result'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened, so reraise
                        raise

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(), request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with an id that the client can use to obtain task status
        return {'TaskId': task_id}, 202

    return wrapped


@app.route('/foo')
@flask_async
def foo():
    time.sleep(10)
    return {'Result': True}


@app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
    """
    Return results of an asynchronous task.
    If this request returns a 202 status code, it means that the task hasn't finished yet.
    """
    task = tasks.get(task_id)
    if task is None:
        abort(404)
    if 'result' not in task:
        return {'TaskID': task_id}, 202
    return task['result']


if __name__ == '__main__':
    app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests

task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
            for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
           for task_id in task_ids]
# [{'Result': True}, {'Result': True}]
Since nobody has provided a solution to this post, and since I desperately need a workaround, here is my situation and some abstract solutions/ideas for debate.
My stack:
Tornado
Celery
MongoDB
Redis
RabbitMQ
My problem: find a way for Tornado to dispatch a Celery task (solved) and then asynchronously gather the result (any ideas?).
Scenario 1: (request/response hack plus webhook)
Tornado receives a (user)request, then saves in local memory (or in Redis) a { jobID : (user)request} to remember where to propagate the response, and fires a celery task with jobID
When celery completes the task, it performs a webhook at some url and tells tornado that this jobID has finished ( plus the results )
Tornado retrieves the (user)request and forwards a response to the (user)
Can this happen? Does it make sense?
Scenario 2: (tornado plus long-polling)
Tornado dispatches the celery task and returns some primary json data to the client (jQuery)
jQuery does some long-polling upon receipt of the primary json, say, every x microseconds, and Tornado replies according to some database flag. When the Celery task completes, the database flag is set to True and the jQuery "loop" finishes.
Is this efficient?
Any other ideas/schemas?
My solution involves polling from tornado to celery:
class CeleryHandler(tornado.web.RequestHandler):

    @tornado.web.asynchronous
    def get(self):
        task = yourCeleryTask.delay(**kwargs)

        def check_celery_task():
            if task.ready():
                self.write({'success': True})
                self.set_header("Content-Type", "application/json")
                self.finish()
            else:
                tornado.ioloop.IOLoop.instance().add_timeout(
                    datetime.timedelta(0.00001), check_celery_task)

        tornado.ioloop.IOLoop.instance().add_timeout(
            datetime.timedelta(0.00001), check_celery_task)
Here is a post about it.
Here is our solution to the problem. Since we look for the result in several handlers in our application, we made the Celery lookup a mixin class.
This also makes the code more readable with the tornado.gen pattern.
from functools import partial

import tornado.gen
import tornado.ioloop
import tornado.web


class CeleryResultMixin(object):
    """
    Adds a callback function which can wait for the result asynchronously.
    """
    def wait_for_result(self, task, callback):
        if task.ready():
            callback(task.result)
        else:
            # TODO: Is this going to be too demanding on the result backend?
            # Probably there should be a timeout before each add_callback.
            tornado.ioloop.IOLoop.instance().add_callback(
                partial(self.wait_for_result, task, callback)
            )


class ARemoteTaskHandler(CeleryResultMixin, tornado.web.RequestHandler):
    """Execute a task asynchronously over a celery worker.

    Wait for the result without blocking.
    When the result is available, send it back.
    """

    @tornado.web.asynchronous
    @tornado.web.authenticated
    @tornado.gen.engine
    def post(self):
        """Test the provided Magento connection."""
        task = expensive_task.delay(
            self.get_argument('somearg'),
        )
        result = yield tornado.gen.Task(self.wait_for_result, task)

        self.write({
            'success': True,
            'result': result.some_value,
        })
        self.finish()
I stumbled upon this question, and hitting the results backend repeatedly did not look optimal to me. So I implemented a mixin similar to your Scenario 1 using Unix sockets.
It notifies Tornado as soon as the task finishes (to be accurate, as soon as the next task in the chain runs) and only hits the results backend once. Here is the link.
Now, https://github.com/mher/tornado-celery comes to the rescue...
from tornado import gen, web
from tornado.web import asynchronous

import tasks  # module defining the celery task used here (e.g. tasks.sleep)


class GenAsyncHandler(web.RequestHandler):
    @asynchronous
    @gen.coroutine
    def get(self):
        response = yield gen.Task(tasks.sleep.apply_async, args=[3])
        self.write(str(response.result))
        self.finish()