How to maintain multiple run_forever handlers simultaneously? - python

Imagine you have a background processing daemon which can be controlled by a web interface.
So, the app is an object with some methods responsible for handling requests, and one special method that needs to be called repeatedly from time to time, regardless of the request state.
When using aiohttp, the web part is fairly straightforward: you just create an application instance and set things up as per the aiohttp.web.run_app source. Everything is clear. Now let's say your app instance has that special method, call it app.process, which is structured like so:
async def process(self):
    while self._is_running:
        await self._process_single_job()
With this approach you could call loop.run_until_complete(app.process()), but it blocks, obviously, and thus leaves no opportunity to set up the web part. Surely, I could split these two responsibilities into separate processes and establish communication between them by means of a database, but that would complicate things, so I would prefer to avoid that route if at all possible.
So, how do I make the event loop call some method repeatedly while still running the web app?

You have to schedule the execution of app.process() as a task using loop.create_task:
import asyncio

from aiohttp import web

class MyApp(web.Application):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.process_task = self.loop.create_task(self.process())
        self.on_shutdown.append(lambda app: app.process_task.cancel())

    async def process(self):
        while True:
            print(await asyncio.sleep(1, result='ping'))

if __name__ == '__main__':
    web.run_app(MyApp())
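Note that Application.loop is deprecated in newer aiohttp releases, so the snippet above may warn or break there. A rough equivalent using the on_startup/on_cleanup hooks that aiohttp 3.x documents for background tasks (a sketch; the task key and function names are illustrative):

import asyncio

from aiohttp import web

async def process(app):
    while True:
        print(await asyncio.sleep(1, result='ping'))

async def start_background_tasks(app):
    # on_startup runs inside the event loop, so create_task is safe here
    app['process_task'] = asyncio.create_task(process(app))

async def cancel_background_tasks(app):
    app['process_task'].cancel()

app = web.Application()
app.on_startup.append(start_background_tasks)
app.on_cleanup.append(cancel_background_tasks)

if __name__ == '__main__':
    web.run_app(app)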

Related

Why can't child threads access the current_user variable in flask_login?

I am writing a Flask application and am trying to add a multi-threaded implementation for certain server-related features. I noticed this weird behavior, so I wanted to understand why it is happening and how to solve it. I have the following code:
import threading

from flask import Blueprint
from flask_login import current_user, login_required

posts = Blueprint('posts', __name__)

@posts.route("/foo")
@login_required
def foo():
    print(current_user)
    thread = threading.Thread(target=goo)
    thread.start()
    thread.join()
    return ""

def goo():
    print(current_user)
    # ...
The main thread correctly prints the current_user, while the child thread prints None:
User('Username1', 'email1@email.com', 'Username1-ProfilePic.jpg')
None
Why is this happening? How can I obtain the current_user in the child thread as well? I tried passing it as an argument to goo but I still get the same behavior.
I found this post but I can't understand how to ensure the context is not changing in this situation, so I tried providing a simpler example.
A partially working workaround
I also tried passing, as a parameter, a newly created User object populated with the data from current_user:
def foo():
    # ...
    user = User.query.filter_by(username=current_user.username).first_or_404()
    thread = threading.Thread(target=goo, args=[user])
    # ...

def goo(user):
    print(user)
    # ...
And it correctly prints the information of the current user. But since inside goo I am also performing database operations, I get the following error:
RuntimeError: No application found. Either work inside a view function
or push an application context. See
http://flask-sqlalchemy.pocoo.org/contexts/.
So, as I suspected, it seems to be a context problem.
I also tried inserting this inside goo, as suggested by the error:

def goo():
    from myapp import create_app
    app = create_app()
    app.app_context().push()
    # ... database access
But I still get the same errors and if I try to print current_user I get None.
How can I pass the old context to the new thread? Or should I create a new one?
This is because Flask uses thread-local variables to store this information for each request's thread. That simplifies things in many cases, but makes it hard to use multiple threads. See https://flask.palletsprojects.com/en/1.1.x/design/#thread-local.
If you want to use multiple threads to handle a single request, Flask might not be the best choice. You can always interact with Flask exclusively on the initial thread if you want and then forward anything you need on other threads back and forth yourself through a shared object of some kind. For database access on secondary threads, you can use a thread-safe database library with multiple threads as long as Flask isn't involved in its usage.
In summary, treat Flask as single-threaded. Any extra threads shouldn't interact directly with Flask, to avoid problems. You can also consider either not using threads at all and running everything sequentially, or trying e.g. Tornado and asyncio for easier concurrency with coroutines, depending on your needs.
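For example, a minimal sketch of that "shared object" idea, reusing the blueprint from the question (the worker body is hypothetical): extract plain data from Flask on the request thread, and let the worker thread consume it without ever touching Flask.

import threading
from queue import Queue

from flask_login import current_user, login_required

jobs = Queue()

def worker():
    while True:
        username = jobs.get()  # plain data, no Flask proxies involved
        # ... do Flask-free work with username ...
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

@posts.route("/foo")
@login_required
def foo():
    # extract what you need from current_user on the request thread
    jobs.put(current_user.username)
    return ""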
Your server serves multiple users, which are threads in themselves. flask_login was not designed for extra threading inside it, which is why the child thread prints None.
I suggest using a database to transmit variables between users, and running an additional Docker container if you need a separate process.
That is because current_user is implemented as a thread-local resource:
https://github.com/maxcountryman/flask-login/blob/main/flask_login/utils.py#L26
Read:
https://werkzeug.palletsprojects.com/en/1.0.x/local/#module-werkzeug.local
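Building on that, a common workaround (a sketch, not from the original answers) is to unwrap the proxies on the request thread and push an application context in the worker thread, so that database extensions work there too:

import threading

from flask import current_app
from flask_login import current_user, login_required

@posts.route("/foo")
@login_required
def foo():
    app = current_app._get_current_object()    # the real app behind the proxy
    user = current_user._get_current_object()  # the real user object
    thread = threading.Thread(target=goo, args=(app, user))
    thread.start()
    thread.join()
    return ""

def goo(app, user):
    with app.app_context():  # lets extensions like Flask-SQLAlchemy find the app
        print(user)
        # ... database operations ...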

How to use asyncio with Django REST Framework the way jQuery promises work, so that responses don't need to wait

I am on Python 3.7 with Django 2.2.3. I want a solution with asyncio so that the API can just call the async function and return the response without waiting, the way we do things with jQuery promises. The definition of my_coro is just an example; I will actually be running moviepy functions that usually take 40-50 seconds to complete, and I don't want the API to wait that long before sending out the response. I am also confused about how to handle the thread pool: how do I use a thread pool here? I intend to make the moviepy iterations fast as well, so how do I create a pool for handling my_coro calls?
import asyncio

from rest_framework.response import Response
from rest_framework.views import APIView

async def my_coro(n):
    print(f"The answer is {n}.")

async def main():
    await asyncio.gather(my_coro(1), my_coro(2), my_coro(3), my_coro(4))

class SaveSnaps(APIView):
    def post(self, request, format=None):
        if request.user.is_anonymous:
            return Response({"response": "FORBIDDEN"}, status=403)
        else:
            try:
                asyncio.run(main())
                return Response({"response": "success"}, status=200)
            except Exception as e:
                return Response({'response': str(e)}, status=400)
Update: I tried using Celery, but since I won't be using periodic tasks, and the method I need to make asynchronous actually receives a blob array as a parameter, Celery's task.delay gives me an error, because tasks expect serializable params. So I am back to square one on this. I am not sure whether I should stick with the threading solution or something else.
Update: I forgot to share what I did in the end. I shifted to Celery, but since Celery's task.delay expects serialized params, I moved the blob-saving part into a synchronous method which, after completion, hands the moviepy work over to the Celery task.
As far as I know, asyncio will process your code concurrently, but your application will still wait for asyncio's execution to finish before continuing.
If you want your app to run the code in the background and continue execution, similar to Promises in JS, you need to consider job scheduling, using Celery for example, or something similar.
This is a simple Guide for using Django with Celery.
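For reference, a minimal sketch of the Celery hand-off described in the question's final update (the task and the save_blobs helper are hypothetical names): save the blobs synchronously, then pass only serializable arguments to the task.

from celery import shared_task
from rest_framework.response import Response
from rest_framework.views import APIView

@shared_task
def process_video(snap_paths):
    # run the slow moviepy processing on the saved files
    ...

class SaveSnaps(APIView):
    def post(self, request, format=None):
        paths = save_blobs(request.data)  # hypothetical synchronous blob-saving helper
        process_video.delay(paths)        # returns immediately; work runs in the Celery worker
        return Response({"response": "accepted"}, status=202)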
I've been struggling with this issue, and this is what I've found.
The compatibility issues between DRF and asyncio are mostly at the Router layer of DRF.
If you override the as_view in a ViewSet you will run into errors where the Router is expecting either a callable or a list of callables.
This means there are 2 categories of solutions:
Skip / Customize DRF routers.
Wrap the router's URLPatterns.
This is what I've had the most success with so far, perhaps others can provide improvements on it.
import asyncio

from asgiref.sync import sync_to_async
from django.conf.urls import url

router = OptionalSlashRouter()
router.register( ... )

def _async_wrapper(func):
    async def wrapped(*args, **kwargs):
        loop = asyncio.get_event_loop()
        kwargs['loop'] = loop
        return await sync_to_async(func)(*args, **kwargs)
    return wrapped

urlpatterns = [
    url(urlpattern.pattern,
        _async_wrapper(urlpattern.callback),
        name=urlpattern.name)
    for urlpattern in router.urls
]
I previously tried just wrapping the callables in sync_to_async, but that didn't seem to let the views themselves get the event loop.
However, with this pattern, I was able to call loop.create_task(some_coroutine) from inside a drf viewset.
There are a lot of downsides to this pattern, of course. For one thing, you have to pass the event loop around if you want to use it.
I may end up just removing the drf routers entirely. But this is obviously a drastic step in most projects as there are often a lot of urls managed by DRF routers.
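For completeness, a hedged sketch (hypothetical viewset, untested) of how a view wrapped this way can use the injected loop to fire off a coroutine without blocking the response:

from rest_framework import viewsets
from rest_framework.response import Response

class SnapViewSet(viewsets.ViewSet):
    def create(self, request, loop=None, **kwargs):
        if loop is not None:
            # schedule background work on the loop injected by _async_wrapper;
            # some_coroutine is a placeholder for your own coroutine
            loop.create_task(some_coroutine())
        return Response(status=202)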

How to do parallel execution inside a flask view function

In my python-flask web app, which runs at port 5001, I need to create an endpoint where all the other endpoints' view functions are executed in parallel, and then all the individual responses are aggregated and returned within the same request life cycle.
For example, the routes in the flask app contain the following view functions:
@app.route('/amazon')
def amazon():
    return "amazon"

@app.route('/flipkart')
def flipkart():
    return "flipkart"

@app.route('/snapdeal')
def sd():
    return "snapdeal"
Note: all three endpoints involve a significant amount of network I/O.
Now I am creating another endpoint from which all the other endpoint implementations have to be called collectively.
# This is my endpoint
@app.route('/all')
def do_all():
    # execute all amazon, flipkart, snapdeal implementations
    ...
I am considering two approaches for the above scenario.
Approach 1 (multiprocessing):
Write the worker task as a separate function, call each one via the python-multiprocessing module, and collect the responses.
def do_all():
    def worker(name):
        # doing the network-I/O task for the given source name
        pass

    jobs = []
    for name in ['amazon', 'flipkart', 'snapdeal']:
        p = multiprocessing.Process(target=worker, args=(name,))
        jobs.append(p)
        p.start()

    # Terminating all the processes explicitly, since they freeze after
    # execution completes
    for j in jobs:
        j.terminate()
    return "", 200
Here I invoke a child process to call each worker; finally, all the child processes are terminated explicitly, since I presume they are also WSGI threads.
Approach 2 (grequests):
Call each endpoint explicitly using python-grequests, so each endpoint residing in the same app is called in parallel, and collect the responses.
def do_all():
    grequests.post("http://localhost:5001/amazon", data={})
    grequests.post("http://localhost:5001/flipkart", data={})
    grequests.post("http://localhost:5001/snapdeal", data={})
These will be executed via the WSGI threads spawned for each request. Here I have no idea whether multiple processes are spawned and left unterminated after execution.
Both could be similar, but which one would be more seamless to implement? Please also tell me if there is an alternative way to solve this scenario, and why.
Maybe you could simplify it using another approach:
Return an immediate response to the user:

def do_all():
    return 'amazon, ... being processed'

Invoke all relevant methods in the background.
Let the invoked background methods send signals:
from blinker import Namespace

background = Namespace()
amazon_finished = background.signal('amazon-finished')

def amazon_bg():
    amazon_finished.send(**interesting)
Subscribe to the signals from the background workers.
def do_all():
    amazon_finished.connect(amazon_logic)
Display the return values of the background workers to the user. This would be a new request (maybe in the same route).
def amazon_logic(sender, **extra):
    sender  # the sender carries the worker's result
The advantage is that the user gets an immediate response about the status of the background workers on each request, and error handling becomes much easier.
I have not tested it, so you should look up the blinker API for yourself.
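Putting the fragments together, an assembled sketch of the blinker flow above (hypothetical names, and untested, as the answer itself warns):

import threading

from blinker import Namespace

background = Namespace()
amazon_finished = background.signal('amazon-finished')
results = {}

def amazon_bg():
    # ... network I/O for amazon ...
    amazon_finished.send('amazon', payload="amazon")

def amazon_logic(sender, payload=None, **extra):
    results[sender] = payload  # store for a later status request

amazon_finished.connect(amazon_logic)

@app.route('/all')
def do_all():
    threading.Thread(target=amazon_bg, daemon=True).start()
    return 'amazon, ... being processed'

@app.route('/all/status')
def status():
    return results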

where to cache stuff for RequestHandler?

I intend to use tornado to write a handler implementing an autocomplete service, demonstrated below:
import json

import tornado.web

class AutoCompleteHandler(tornado.web.RequestHandler):
    def initialize(self, indexbuilder):
        self.indexbuilder = indexbuilder
        self.index = indexbuilder.build_merged_index()

    def get(self):
        query = self.get_argument('q')
        result = self.index[query]
        self.set_header("Content-Type", 'application/json')
        self.set_header('charset', "utf-8")
        self.write(json.dumps(result))
The facts and requirements
indexbuilder.build_merged_index is an extremely slow method and is planned to run every 24h to refresh the index list.
According to the tornado documentation, a new RequestHandler object is created on each request, so having the builder as an instance attribute won't work for me.
The problem
In short, how can I do this right?
Where should I cache the index while still keeping tornado's non-blocking behavior? I guess I could put indexbuilder in the same module as AutoCompleteHandler, build the index into a global variable, and spawn a separate thread to do the refreshing task, but that doesn't look right to me, and I think the work could be done with tornado itself, making the structure much more elegant.
A global variable sounds like the best/simplest solution to me. You can also attach it to the Application object (which is accessible as self.application within the RequestHandler) if you'd prefer to avoid the global. Or you could cache it on the indexbuilder itself, or pass some cache object in the initialize dict.
In any case you probably want to do the refresh in a separate thread to avoid blocking the IOLoop while it is running.
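A sketch along those lines (class and route names are illustrative): cache the index on the Application object and rebuild it on a plain background thread, so the slow build never blocks the IOLoop. Reads stay safe because the handler only ever sees an atomic rebinding of the attribute.

import threading
import time

import tornado.web

class AutoCompleteApp(tornado.web.Application):
    def __init__(self, indexbuilder, **kwargs):
        handlers = [(r'/complete', AutoCompleteHandler)]
        super().__init__(handlers, **kwargs)
        self.indexbuilder = indexbuilder
        self.index = indexbuilder.build_merged_index()
        threading.Thread(target=self._refresh_forever, daemon=True).start()

    def _refresh_forever(self):
        while True:
            time.sleep(24 * 3600)                               # refresh every 24h
            new_index = self.indexbuilder.build_merged_index()  # slow, off the IOLoop
            self.index = new_index                              # atomic rebind

class AutoCompleteHandler(tornado.web.RequestHandler):
    def get(self):
        # write() on a dict serializes to JSON and sets the content type
        self.write({'result': self.application.index[self.get_argument('q')]})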

Bottle: execute a long running function asynchronously and send an early response to the client?

The Bottle app (running behind CherryPy) that I'm working on receives a request for a resource from an HTTP client, which triggers the execution of a task that can take a few hours to finish. I'd like to send an early HTTP response (e.g., 202 Accepted) and continue processing the task. Is there a way to achieve this without MQ libraries, using Python/Bottle alone?
For example:
from bottle import route, HTTPResponse

@route('/task')
def f():
    longRunningTask()  # <-- Any way to make this asynchronous?
    return HTTPResponse(status=202)
I know this question is several years old, but I found @ahmed's answer so unbelievably unhelpful that I thought I would at least share how I solved this problem in my application.
All I did was make use of Python's existing threading libraries, as below:

from threading import Thread

from bottle import route, HTTPResponse

@route('/task')
def f():
    # create a thread that will execute your longRunningTask() function
    task_thread = Thread(target=longRunningTask)
    task_thread.setDaemon(True)  # daemon thread: it won't keep the process alive on exit
    task_thread.start()          # launch the thread
    return HTTPResponse(status=202)
Using threads allows you to maintain a consistent response time while still having relatively complex or time-consuming functions.
I used uWSGI, so do make sure you enable threading in your uWSGI application config if that's the way you went.
