In my Python Flask web app, which runs on port 5001, I need to create an endpoint that executes all the other endpoints' view functions in parallel and then aggregates their individual responses, all within the same request lifecycle.
For example,
The routes in the Flask app contain the following view functions:
@app.route('/amazon')
def amazon():
    return "amazon"

@app.route('/flipkart')
def flipkart():
    return "flipkart"

@app.route('/snapdeal')
def sd():
    return "snapdeal"
Note: all three of the above endpoints involve a significant amount of network I/O.
I am creating another endpoint from which all the other endpoints' implementations have to be called collectively.
### This is my endpoint
@app.route('/all')
def do_all():
    # execute the amazon, flipkart, and snapdeal implementations
I am considering the following two approaches for the above scenario.
Approach-1 (multiprocessing):
Write the worker task as a separate function, invoke one child process per source via the Python multiprocessing module, and collect the responses.
import multiprocessing

def do_all():
    def worker(name):
        # do the network I/O task for the given source name
        pass

    jobs = []
    for name in ['amazon', 'flipkart', 'snapdeal']:
        p = multiprocessing.Process(target=worker, args=(name,))
        jobs.append(p)
        p.start()

    # Terminate all the processes explicitly, since they freeze after execution completes
    for j in jobs:
        j.terminate()

    return '', 200
Here I invoke each worker in its own child process, and at the end I terminate all the child processes explicitly, since they otherwise hang around after execution completes; I presume this is because they are tied to the WSGI threads.
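As written, Approach-1 never actually returns anything from the workers. A minimal sketch of a variant that does collect the responses could use a process pool, assuming worker is moved to module level so it can be pickled:

from multiprocessing import Pool

def worker(name):
    # network I/O for the given source name; return its response
    return name

def do_all():
    # one process per source; map() blocks until every result is in
    with Pool(processes=3) as pool:
        results = pool.map(worker, ['amazon', 'flipkart', 'snapdeal'])
    return {'results': results}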
Approach-2 (grequests):
Call each endpoint explicitly using python-grequests, so that each endpoint residing in the same app is called in parallel, and collect the responses.
def do_all():
    reqs = [grequests.post("http://localhost:5001/amazon", data={}),
            grequests.post("http://localhost:5001/flipkart", data={}),
            grequests.post("http://localhost:5001/snapdeal", data={})]
    # map() actually sends the requests in parallel and collects the responses
    responses = grequests.map(reqs)
This will be executed via WSGI threads, one spawned per request. Here I have no idea whether multiple processes get spawned and are left unterminated after execution.
Both may be similar, but which one would be more seamless to implement, and why? Please also point out any alternative way to solve this scenario.
Maybe you could simplify it using another approach:
Return an immediate response to the user:
def do_all():
    return 'amazon, ... being processed'
Invoke all relevant methods in the background.
Let the invoked background methods send signals:

from blinker import Namespace

background = Namespace()
amazon_finished = background.signal('amazon-finished')

def amazon_bg():
    amazon_finished.send(**interesting)
Subscribe to the signals from the background workers:

def do_all():
    amazon_finished.connect(amazon_logic)
Display the return values of the background workers to the user. This would be a new request (maybe in the same route).
def amazon_logic(sender, **extra):
    sender
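To make that last step concrete, here is a minimal, equally untested sketch; the results dict and the /status route are my own hypothetical additions, not part of the original suggestion:

results = {}  # hypothetical store for finished background work

def amazon_logic(sender, **extra):
    # remember whatever the background worker sent along with the signal
    results['amazon'] = extra

@app.route('/status')
def status():
    # a later request reads out whatever has finished so far
    return results  # Flask 1.1+ JSON-encodes a dict return value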
The advantages would be that on each request the user gets an immediate response about the status of the background workers, and error handling would be much easier.
I have not tested it, so you should look up the blinker API for yourself.
Related
I am writing a Flask application and I am trying to add a multi-threaded implementation for certain server-related features. I noticed this weird behavior, so I wanted to understand why it is happening and how to solve it. I have the following code:
from flask import Blueprint
from flask_login import current_user, login_required
import threading

posts = Blueprint('posts', __name__)

@posts.route("/foo")
@login_required
def foo():
    print(current_user)
    thread = threading.Thread(target=goo)
    thread.start()
    thread.join()
    return ''

def goo():
    print(current_user)
    # ...
The main thread correctly prints the current_user, while the child thread prints None:
User('Username1', 'email1@email.com', 'Username1-ProfilePic.jpg')
None
Why is this happening? How can I obtain the current_user in the child thread as well? I tried passing it as an argument to goo, but I still get the same behavior.
I found this post but I can't understand how to ensure the context is not changing in this situation, so I tried providing a simpler example.
A partially working workaround
I also tried passing as a parameter a newly created User object populated with the data from current_user:
def foo():
    # ...
    user = User.query.filter_by(username=current_user.username).first_or_404()
    thread = threading.Thread(target=goo, args=[user])
    # ...

def goo(user):
    print(user)
    # ...
This correctly prints the information of the current user. But since I am also performing database operations inside goo, I get the following error:
RuntimeError: No application found. Either work inside a view function
or push an application context. See
http://flask-sqlalchemy.pocoo.org/contexts/.
So as I suspected I assume it's a problem of context.
I also tried inserting this inside goo, as suggested by the error:
def goo():
    from myapp import create_app
    app = create_app()
    app.app_context().push()
    # ... database access
But I still get the same errors and if I try to print current_user I get None.
How can I pass the old context to the new thread? Or should I create a new one?
This is because Flask uses thread-local variables to store this data for each request's thread. That simplifies things in many cases, but makes it hard to use multiple threads. See https://flask.palletsprojects.com/en/1.1.x/design/#thread-local.
If you want to use multiple threads to handle a single request, Flask might not be the best choice. You can always interact with Flask exclusively on the initial thread if you want and then forward anything you need on other threads back and forth yourself through a shared object of some kind. For database access on secondary threads, you can use a thread-safe database library with multiple threads as long as Flask isn't involved in its usage.
In summary, treat Flask as single threaded. Any extra threads shouldn't interact directly with Flask to avoid problems. You can also consider either not using threads at all and run everything sequentially or trying e.g. Tornado and asyncio for easier concurrency with coroutines depending on the needs.
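To illustrate the "forward anything you need yourself" idea, here is a minimal sketch (names are hypothetical) that resolves the current_user proxy on the request thread and hands only a plain value to the worker thread:

import threading

from flask import Blueprint
from flask_login import current_user, login_required

posts = Blueprint('posts', __name__)

@posts.route("/foo")
@login_required
def foo():
    # resolve the proxy while we are still on the request thread
    user_id = current_user.get_id()

    def goo(uid):
        # no Flask objects in here, only the plain value we were given
        print(f"working for user {uid}")

    thread = threading.Thread(target=goo, args=(user_id,))
    thread.start()
    thread.join()
    return ''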
Your server serves multiple users, each of whom is handled by a thread of their own.
flask_login was not designed for extra threading inside it; that's why the child thread prints None.
I suggest you use the database to pass variables between users, and run an additional Docker container if you need a separate process.
That is because current_user is implemented as a context-local resource:
https://github.com/maxcountryman/flask-login/blob/main/flask_login/utils.py#L26
Read:
https://werkzeug.palletsprojects.com/en/1.0.x/local/#module-werkzeug.local
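A tiny standalone demonstration of that behavior (my own sketch, not taken from the linked sources): a value stored in a werkzeug Local by one thread is simply absent in another thread:

import threading

from werkzeug.local import Local

local = Local()
local.user = "Username1"  # set in the main thread

def show():
    # this thread never set `user`, so nothing is found here
    print(getattr(local, "user", None))  # -> None

t = threading.Thread(target=show)
t.start()
t.join()
print(local.user)  # -> "Username1", still visible in the main thread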
I am on Python 3.7 with Django 2.2.3. I want a solution with asyncio so that the API can just call the async function and return the response without waiting, the way we do things with jQuery promises. The definition of my_coro is just an example; I will actually be running moviepy functions that usually take 40-50 seconds to complete, and I don't want the API to wait that long before sending out the response. I am also confused about how to handle the thread pool, because I intend to make the moviepy iterations fast as well. How do I create a pool for handling my_coro calls?
import asyncio

from rest_framework.views import APIView
from rest_framework.response import Response

async def my_coro(n):
    print(f"The answer is {n}.")

async def main():
    await asyncio.gather(my_coro(1), my_coro(2), my_coro(3), my_coro(4))

class SaveSnaps(APIView):
    def post(self, request, format=None):
        if request.user.is_anonymous:
            return Response({"response": "FORBIDDEN"}, status=403)
        else:
            try:
                asyncio.run(main())
                return Response({"response": "success"}, status=200)
            except Exception as e:
                return Response({"response": str(e)}, status=400)
Update:
I tried using celery, but since I won't be using periodic tasks, and since the method I need to make asynchronous receives a blob array as a parameter, celery's task.delay gives me an error (tasks expect serializable params). So I am back to square one on this. I am not sure whether I should stick to the threading solution or try something else.
Update: I forgot to share what I did in the end. I shifted to celery, but since celery's task.delay expects serializable params, I moved the blob-saving part into a synchronous method which, after completion, hands the moviepy work over to the celery task.
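A rough sketch of that final shape (function and path names are mine, not from the post): save the unserializable blob synchronously, then pass only a serializable file path across the broker:

from celery import shared_task

@shared_task
def process_video(path):
    # the slow moviepy work runs in the worker, not in the request cycle
    ...

def save_snaps(blob):
    path = "/tmp/snap.mp4"         # hypothetical location for the blob
    with open(path, "wb") as f:    # the blob is saved synchronously
        f.write(blob)
    process_video.delay(path)      # only the path crosses the broker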
As far as I know, asyncio is going to process your code concurrently, but your application is still going to wait for asyncio to finish, and only then continue execution inside your app.
If you want your app to run the code in the background and continue execution, similar to Promises in JS, you need to consider job scheduling, using Celery for example, or something similar.
This is a simple Guide for using Django with Celery.
I've been struggling with this issue, and this is what I've found.
The compatibility issues between DRF and asyncio are mostly at the Router layer of DRF.
If you override as_view in a ViewSet, you will run into errors where the router expects either a callable or a list of callables.
This means there are two categories of solutions:
Skip / customize the DRF routers.
Wrap the router's URLPatterns.
This is what I've had the most success with so far; perhaps others can provide improvements on it.
import asyncio

from asgiref.sync import sync_to_async
from django.conf.urls import url

router = OptionalSlashRouter()
router.register( ... )

def _async_wrapper(func):
    async def wrapped(*args, **kwargs):
        loop = asyncio.get_event_loop()
        kwargs['loop'] = loop
        return await sync_to_async(func)(*args, **kwargs)
    return wrapped

urlpatterns = [
    url(urlpattern.pattern,
        _async_wrapper(urlpattern.callback),
        name=urlpattern.name)
    for urlpattern in router.urls
]
I previously tried just wrapping the callables in the sync_to_async function, but this didn't seem to allow the views themselves to get the event loop.
However, with this pattern, I was able to call loop.create_task(some_coroutine) from inside a DRF viewset.
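For illustration, a hypothetical viewset method receiving that injected loop kwarg might look like this (all names are mine; since the view body runs on a worker thread under sync_to_async, I use asyncio's thread-safe scheduling call rather than loop.create_task directly):

import asyncio

from rest_framework import viewsets
from rest_framework.response import Response

async def some_coroutine():
    ...  # the actual async work

class SnapViewSet(viewsets.ViewSet):
    def create(self, request, *args, loop=None, **kwargs):
        # hand the coroutine to the event loop from this worker thread
        asyncio.run_coroutine_threadsafe(some_coroutine(), loop)
        return Response({"response": "accepted"}, status=202)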
There are a lot of downsides to this pattern, of course. For one thing, you have to pass the event loop around if you want to use it.
I may end up just removing the drf routers entirely. But this is obviously a drastic step in most projects as there are often a lot of urls managed by DRF routers.
Imagine you have a background processing daemon which could be controlled by a web interface.
So, the app is an object with some methods responsible for handling requests, and one special method that needs to be called repeatedly from time to time, regardless of request state.
When using aiohttp, the web part is fairly straightforward: you just instantiate an application instance, and set things up as per aiohttp.web.run_app source. Everything is clear. Now let's say your app instance has that special method, call it app.process, which is structured like so:
async def process(self):
    while self._is_running:
        await self._process_single_job()
With this approach you could call loop.run_until_complete(app.process()), but it blocks, obviously, and thus leaves no opportunity to set up the web part. Surely, I could have split these two responsibilities into separate processes and establish their communication by means of a database, but that would complicate things, so I would prefer to avoid this way if at all possible.
So, how do I make the event loop call some method repeatedly while still running a web app?
You have to schedule the execution of app.process() as a task using loop.create_task:
import asyncio

from aiohttp import web

class MyApp(web.Application):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.process_task = self.loop.create_task(self.process())
        self.on_shutdown.append(lambda app: app.process_task.cancel())

    async def process(self):
        while True:
            print(await asyncio.sleep(1, result='ping'))

if __name__ == '__main__':
    web.run_app(MyApp())
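Note that newer aiohttp releases deprecate Application.loop; a sketch of the same idea using the startup and cleanup signals (along the lines of aiohttp's background-tasks recipe) would be:

import asyncio

from aiohttp import web

async def process(app):
    while True:
        print(await asyncio.sleep(1, result='ping'))

async def start_background(app):
    # on_startup handlers run inside the event loop, so create_task is safe
    app['process_task'] = asyncio.get_event_loop().create_task(process(app))

async def stop_background(app):
    app['process_task'].cancel()

app = web.Application()
app.on_startup.append(start_background)
app.on_cleanup.append(stop_background)
web.run_app(app)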
The Bottle app (behind CherryPy) that I'm working on receives a request for a resource from an HTTP client which results in an execution of a task that can take a few hours to finish. I'd like to send an early HTTP response (e.g., 202 Accepted) and continue processing the task. Is there a way to achieve this without using MQ libraries and using Python/Bottle alone?
For example:
from bottle import route, HTTPResponse

@route('/task')
def f():
    longRunningTask()  # <-- any way to make this asynchronous?
    return HTTPResponse(status=202)
I know this question is several years old but I found @ahmed's answer so unbelievably unhelpful that I thought I would at least share how I solved this problem in my application.
All I did was make use of Python's existing threading library, as below:
from bottle import route, HTTPResponse
from threading import Thread

@route('/task')
def f():
    # create a thread that will execute your longRunningTask() function
    task_thread = Thread(target=longRunningTask)
    # daemonize it so it won't keep the process alive on shutdown
    task_thread.setDaemon(True)
    task_thread.start()  # launch the thread
    return HTTPResponse(status=202)
Using threads allows you to maintain a consistent response time while still having relatively complex or time-consuming functions.
I used uWSGI, so do make sure you enable threading in your uWSGI application config if that's the way you went.
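For reference, a minimal sketch of the relevant uWSGI setting (your config layout will differ):

[uwsgi]
; without this, uWSGI will not start the Python threads your app spawns
enable-threads = true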
Is there any way in celery by which, if a task execution fails, I can automatically put it into another queue?
For example, if the task is running in a queue x, on exception enqueue it to another queue named error_x.
Edit:
Currently I am using celery==3.0.13 along with Django 1.4, with RabbitMQ as the broker.
Sometimes the task fails. Is there a way in celery to add messages to an error queue and process them later?
The problem when a celery task fails is that I don't have access to the message queue name, so I can't use self.retry to put it onto a different error queue.
Well, you cannot use the retry mechanism if you want to route the task to another queue. From the docs:
retry() can be used to re-execute the task, for example in the event
of recoverable errors.
When you call retry it will send a new message, using the same
task-id, and it will take care to make sure the message is delivered
to the same queue as the originating task.
You'll have to relaunch the task yourself and route it manually to the queue you want when an exception is raised. This seems like a good job for error callbacks.
The main issue is that we need to get the task name in the error callback in order to relaunch the task. Also, we may not want to add the callback every time we launch a task, so a decorator is a good way to add the right callback automatically.
from functools import partial, wraps

import celery

@celery.shared_task
def error_callback(task_id, task_name, retry_queue, retry_routing_key):
    # We must retrieve the task object itself.
    # `tasks` is a dict of 'task_name': celery_task_object
    task = celery.current_app.tasks[task_name]
    # Relaunch the task in the specified queue.
    task.apply_async(queue=retry_queue, routing_key=retry_routing_key)

def retrying_task(retry_queue, retry_routing_key):
    """Decorates a function to automatically add the error callback."""
    def retrying_decorator(func):
        @celery.shared_task
        @wraps(func)  # just to keep the original task name
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        # Monkey-patch the apply_async method to add the callback.
        wrapper.apply_async = partial(
            wrapper.apply_async,
            link_error=error_callback.s(wrapper.name, retry_queue, retry_routing_key)
        )
        return wrapper
    return retrying_decorator

# Usage:
@retrying_task(retry_queue='another_queue', retry_routing_key='another_routing_key')
def failing_task():
    print("Hi, I will fail!")
    raise Exception("I'm failing!")

failing_task.apply_async()
You can adjust the decorator to pass whatever parameters you need.
I had a similar problem and solved it, maybe not in the most efficient way, but my solution is as follows:
I created a Django model to keep all my celery task ids; it is capable of checking each task's state.
Then I created another celery task that runs in an infinite cycle and checks the actual state of all tasks recorded as 'RUNNING'; if a task's state is 'FAILED', it simply reruns it. I'm not actually changing the queue for the tasks I rerun, but I think you can implement some custom logic to decide where to put every task you rerun this way, as in the sketch below.
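A rough sketch of that idea (the model, its fields, and the rerun logic are my own guesses at the shape, not the answerer's actual code):

import celery
from celery import shared_task
from celery.result import AsyncResult
from django.db import models

class TrackedTask(models.Model):
    task_id = models.CharField(max_length=64)
    task_name = models.CharField(max_length=128)
    state = models.CharField(max_length=16, default='RUNNING')

@shared_task
def monitor_tasks():
    # compare our recorded state with the result backend's actual state
    for t in TrackedTask.objects.filter(state='RUNNING'):
        if AsyncResult(t.task_id).state == 'FAILURE':
            # rerun the failed task; custom logic could pick another queue here
            celery.current_app.tasks[t.task_name].apply_async()
            t.state = 'RERUN'
            t.save()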