I want to create a third-party chatbot API which is asynchronous and replies "ok" after a 10-second pause.
import time

def wait():
    time.sleep(10)
    return "ok"

# views.py
def api(request):
    return wait()
I have tried Celery as follows, where I am waiting for the Celery result in the view itself:
import time
from celery import shared_task
from celery.result import AsyncResult

@shared_task
def wait():
    time.sleep(10)
    return "ok"

# views.py
def api(request):
    a = wait.delay()
    work = AsyncResult(a.id)
    while True:
        if work.ready():
            return work.get(timeout=1)
But this solution still works synchronously and makes no difference. How can I make it asynchronous without asking the user to keep polling until the result is received?
As mentioned in @Blusky's answer:
The asynchronous API will exist in Django 3.x, not before.
If this is not an option, then the answer is just no.
Please note as well that even with Django 3.x, any Django code that accesses the database will not be asynchronous; it has to be executed in a thread (thread pool).
Celery is intended for background or deferred tasks, but Celery will never return an HTTP response, as it never received the HTTP request to which it should respond. Celery is also not asyncio friendly.
You might have to think of changing your architecture / implementation.
Look at your overall problem and ask yourself whether you really need an asynchronous API with Django.
Is this API intended for browser applications or for machine to machine applications?
Could your clients use web sockets and wait for the answer?
Could you separate blocking and non-blocking parts on your server side?
Use Django for everything non-blocking and for everything periodic / deferred (Django + Celery), and implement the asynchronous part with web server plugins, Python ASGI code, or web sockets.
Some ideas
Use Django + nginx nchan (if your web server is nginx)
Link to nchan: https://nchan.io/
Your API call would create a task id, start a Celery task, and immediately return the task id or a polling URL.
The polling URL would be handled, for example, via an nchan long-polling channel.
Your client connects to the URL corresponding to an nchan long-polling channel, and Celery unblocks it whenever your task is finished (the 10 s are over).
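A minimal sketch of that flow, reusing the wait task from the question (the nchan publishing helper and the /poll/ URL are assumptions about your nchan setup, not standard APIs):

# tasks.py - minimal sketch of the task-id approach
import time
from celery import shared_task

@shared_task
def wait(channel_id):
    time.sleep(10)                           # the slow work
    # publish_to_channel(channel_id, "ok")   # hypothetical helper: push the result to the nchan channel
    return "ok"

# views.py
import uuid
from django.http import JsonResponse

def api(request):
    channel_id = str(uuid.uuid4())
    wait.delay(channel_id)                   # start the task, do not wait for it
    # the client long-polls this URL, which nginx nchan serves
    return JsonResponse({"task_id": channel_id, "poll_url": "/poll/" + channel_id})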
Use Django + an ASGI server + one hand-coded view, with a strategy similar to nginx nchan
Same logic as above, but instead of nginx nchan you implement the long-polling endpoint yourself.
Use an ASGI server + a non-blocking framework (or just some hand-coded ASGI views) for all blocking URLs and Django for the rest.
The two parts might exchange data via the database, local files, or local HTTP requests.
Just stay blocking and throw enough worker processes / threads at your server
This is probably the worst suggestion, but if it is just for personal use
and you know how many requests you will have in parallel, then just make sure you have enough Django workers so that you can afford to be blocking.
In this case you would block an entire Django worker for each slow request.
Use websockets, e.g. with the channels module for Django
Websockets can be implemented with earlier versions of Django (>= 2.2) via the django channels module (pip install channels, https://github.com/django/channels ).
You need an ASGI server to serve the asynchronous part. You could use for example Daphne or uvicorn (the channels documentation explains this rather well).
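A rough sketch of the channels side, assuming a consumer wired up in your websocket routing (the names are illustrative):

# consumers.py - minimal sketch of a channels consumer that answers "ok"
# after 10 seconds without blocking other connections
import asyncio
from channels.generic.websocket import AsyncWebsocketConsumer

class WaitConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        await asyncio.sleep(10)          # non-blocking pause
        await self.send(text_data="ok")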
Addendum 2020-06-01: simple async example calling synchronous django code
The following code uses the starlette module, as it seems quite simple and small.
miniasyncio.py
import asyncio
import concurrent.futures
import os

import django
from starlette.applications import Starlette
from starlette.responses import Response
from starlette.routing import Route

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'pjt.settings')
django.setup()

from django_app.xxx import synchronous_func1
from django_app.xxx import synchronous_func2

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

async def simple_slow(request):
    """ simple function, that sleeps in an async manner """
    await asyncio.sleep(5)
    return Response('hello world')

async def call_slow_dj_funcs(request):
    """ slow django code will be called in a thread pool """
    loop = asyncio.get_running_loop()
    # run the blocking Django functions in the thread pool so that
    # the event loop is not blocked while they execute
    func1_result = await loop.run_in_executor(executor, synchronous_func1)
    func2_result = await loop.run_in_executor(executor, synchronous_func2)
    response_txt = "OK"
    return Response(response_txt, media_type="text/plain")

routes = [
    Route("/simple", endpoint=simple_slow),
    Route("/slow_dj_funcs", endpoint=call_slow_dj_funcs),
]

app = Starlette(debug=True, routes=routes)
You could for example run this code with:
    pip install uvicorn
    uvicorn --port 8002 miniasyncio:app
Then, on your web server, route these specific URLs to uvicorn and not to your Django application server.
The best option is to use the future async API, which will be introduced in the Django 3.1 release (already available in alpha):
https://docs.djangoproject.com/en/dev/releases/3.1/#asynchronous-views-and-middleware-support
(however, you will need to use an ASGI Web Worker to make it work properly)
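With Django 3.1 async views, the original example could look roughly like this (served by an ASGI server such as uvicorn or Daphne):

# views.py - minimal sketch of a Django 3.1 async view
import asyncio
from django.http import HttpResponse

async def api(request):
    await asyncio.sleep(10)   # non-blocking pause instead of time.sleep
    return HttpResponse("ok")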
Check out Django 3 ASGI (Asynchronous Server Gateway Interface) support:
https://docs.djangoproject.com/en/3.0/howto/deployment/asgi/
Related
I am using FastAPI to build a RESTful web service, and uvicorn to run it. On startup of uvicorn, I want to execute my Python script, which will make a database call and cache some data so that it can be reused for as long as uvicorn is running. I looked through the uvicorn documentation but did not find any reference.
Is there any way to achieve this?
Use the FastAPI startup event. From FastAPI docs:
from fastapi import FastAPI

app = FastAPI()

items = {}

@app.on_event("startup")
async def startup_event():
    items["foo"] = {"name": "Fighters"}
    items["bar"] = {"name": "Tenders"}
In this case, the startup event handler function will initialize the items "database" (just a dict) with some values.
You can add more than one event handler function.
And your application won't start receiving requests until all the startup event handlers have completed.
(this feature is actually implemented in starlette - the ASGI framework that FastAPI is built on)
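Applied to the original question (caching database data when uvicorn starts), a sketch could look like this; load_rows_from_db stands in for your real database call:

from fastapi import FastAPI

app = FastAPI()
cache = {}

def load_rows_from_db():
    # placeholder for the real database query
    return {"config": "value"}

@app.on_event("startup")
async def warm_cache():
    # runs once, before uvicorn starts accepting requests
    cache.update(load_rows_from_db())

@app.get("/cached")
async def read_cached():
    return cache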
Setup:
The language I am using is Python.
I am running a Flask app with threaded=True, and inside the Flask app, when an endpoint is hit, it starts a thread and returns "thread started successfully" with status code 200.
Inside the thread, a re.search happens, which takes 30-40 seconds (worst case), and once the thread is completed it hits a callback URL, completing one request.
Ideally, the Flask app should be able to handle concurrent requests.
Issue:
When the re.search is happening inside the thread, the Flask app is not accepting concurrent requests. I am assuming some kind of thread locking is happening, but I am unable to figure it out.
Question:
Is it OK to do threading (or use multiprocessing) inside a Flask app which has threaded=True?
When a regex is running, does it hold any kind of thread lock?
Code snippet: hit_callback does a POST request to another API, which is not relevant to this issue.
import threading
from flask import *
import re

app = Flask(__name__)

@app.route("/text")
def temp():
    text = request.json["text"]

    def extract_company(text):
        attrib_list = ["llc", "ltd"]  # real one is 700 in length
        entity_attrib = r"((\s|^)" + r"(.)?(\s|$)|(\s|^)".join(attrib_list) + r"(\s|$))"
        raw_client = re.search("(.*(?:" + entity_attrib + "))", text, re.I)
        hit_callback(raw_client)

    extract_thread = threading.Thread(target=extract_company, args=(text,))
    extract_thread.start()
    return jsonify({"Response": True}), 200

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=4557, threaded=True)
Please read up on the GIL - basically, Python can only execute ONE piece of Python code at a time, even across threads. So your re.search runs and blocks all other threads until it runs to completion.
Solution: use Gunicorn or similar to run multiple Flask processes - do not attempt to do everything in one Flask process.
To add something: your design is also problematic - 40 seconds or so for an HTTP answer is way too long. A better design would have at least two services: a web service and a search service. The web service would register a search and give back an id. The search service would communicate asynchronously with the web service and return the result + id when it is ready. Your clients could poll the web service until they get a result.
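A sketch of that register/poll interface; for brevity the "search service" here is a process pool inside the same program, whereas the answer suggests a genuinely separate service:

# sketch: register a search, let clients poll for the result
import re
import uuid
from concurrent.futures import ProcessPoolExecutor
from flask import Flask, jsonify, request

app = Flask(__name__)
executor = ProcessPoolExecutor(max_workers=2)
jobs = {}                      # job_id -> Future

def slow_search(text):
    # stands in for the 30-40 second regex work
    return bool(re.search(r"llc|ltd", text, re.I))

@app.route("/search", methods=["POST"])
def register_search():
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(slow_search, request.json["text"])
    return jsonify({"job_id": job_id}), 202

@app.route("/search/<job_id>")
def poll_search(job_id):
    future = jobs.get(job_id)
    if future is None:
        return jsonify({"error": "unknown job"}), 404
    if not future.done():
        return jsonify({"status": "pending"}), 202
    return jsonify({"status": "done", "result": future.result()})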
I have a web endpoint for users to upload a file.
When the endpoint receives the request, I want to run a background job to process the file.
Since the job would take time to complete, I wish to return the job_id to the user to track the status of the request while the job is running in the background.
I am wondering if asyncio would help in this case.
import asyncio
import uuid

from flask import Flask, request, jsonify

app = Flask(__name__)

@asyncio.coroutine
def process_file(job_id, file_obj):
    # <process the file and dump results in db>
    ...

@app.route('/file-upload', methods=['POST'])
def upload_file():
    job_id = str(uuid.uuid4())
    process_file(job_id, request.files['file'])  # I want this call to be async without any await
    return jsonify({'message': 'Request received. Track the status using: ' + job_id})
With the above code, the process_file method is never called, and I am not able to understand why.
I am not sure if this is the right way to do it either; please help if I am missing something.
Flask doesn't support async calls yet.
To create and execute heavy tasks in the background you can use the Celery library: https://flask.palletsprojects.com/en/1.1.x/patterns/celery/
You can use this for reference:
Making an asynchronous task in Flask
Official documentation:
http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#installing-celery
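A minimal sketch of that Celery pattern for the upload endpoint (the broker URL and the task body are placeholders for your own setup):

# minimal sketch: Flask view handing the work to a Celery task
import uuid
from celery import Celery
from flask import Flask, request, jsonify

celery_app = Celery("tasks", broker="redis://localhost:6379/0")

@celery_app.task
def process_file(job_id, path):
    # <process the file at `path` and dump results in db>
    pass

app = Flask(__name__)

@app.route("/file-upload", methods=["POST"])
def upload_file():
    job_id = str(uuid.uuid4())
    path = "/tmp/" + job_id
    request.files["file"].save(path)   # persist the upload for the worker
    process_file.delay(job_id, path)   # hand the work to Celery and return at once
    return jsonify({"message": "Request received. Track the status using: " + job_id}), 202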
Even though you wrapped the function in @asyncio.coroutine, it is never awaited, and awaiting is what tells a coroutine to actually run and return its result.
Asyncio is not a good fit for this kind of task, because it is blocking I/O. Asyncio is usually used to make many I/O-bound calls and return results quickly.
As @py_dude mentioned, Flask does not support async calls. If you are looking for a library that functions and feels similar to Flask but is asynchronous, I recommend checking out Sanic. Here is some sample code:
from sanic import Sanic
from sanic.response import json

app = Sanic()

@app.route("/")
async def test(request):
    return json({"hello": "world"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
Updating your database asynchronously shouldn't be an issue; refer to here to find asyncio-supported database drivers. For processing your file, check out aiohttp. You can run your server extremely fast on a single thread without any hiccup if you do so asynchronously.
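For the upload scenario from the question, a Sanic sketch could schedule the processing coroutine and answer immediately; process_file and the application name are placeholders:

import asyncio
import uuid
from sanic import Sanic
from sanic.response import json

app = Sanic("upload_app")

async def process_file(job_id, body):
    # placeholder: process `body` and dump results in the db with an asyncio driver
    await asyncio.sleep(0)

@app.route("/file-upload", methods=["POST"])
async def upload_file(request):
    job_id = str(uuid.uuid4())
    # schedule the coroutine on the running loop without awaiting it
    asyncio.ensure_future(process_file(job_id, request.body))
    return json({"message": "Request received. Track the status using: " + job_id})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)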
The API should allow arbitrary HTTP GET requests containing URLs the user wants scraped, and then Flask should return the results of the scrape.
The following code works for the first HTTP request, but after the Twisted reactor stops, it won't restart. I may not even be going about this the right way, but I just want to put a RESTful Scrapy API up on Heroku, and what I have so far is all I can think of.
Is there a better way to architect this solution? Or how can I allow scrape_it to return without stopping the Twisted reactor (which can't be started again)?
from flask import Flask
import os
import sys
import json

from n_grams.spiders.n_gram_spider import NGramsSpider

# scrapy api
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals

app = Flask(__name__)

def scrape_it(url):
    items = []

    def add_item(item):
        items.append(item)

    runner = CrawlerRunner()

    d = runner.crawl(NGramsSpider, [url])
    d.addBoth(lambda _: reactor.stop())  # <<< TROUBLES HERE ???

    dispatcher.connect(add_item, signal=signals.item_passed)

    reactor.run(installSignalHandlers=0)  # the script will block here until the crawling is finished

    return items

@app.route('/scrape/<path:url>')
def scrape(url):
    ret = scrape_it(url)
    return json.dumps(ret, ensure_ascii=False, encoding='utf8')

if __name__ == '__main__':
    PORT = os.environ['PORT'] if 'PORT' in os.environ else 8080
    app.run(debug=True, host='0.0.0.0', port=int(PORT))
I think there is no good way to create a Flask-based API for Scrapy. Flask is not the right tool for that because it is not based on an event loop. To make things worse, the Twisted reactor (which Scrapy uses) can't be started/stopped more than once in a single thread.
Let's assume there is no problem with the Twisted reactor and you can start and stop it. It won't make things much better, because your scrape_it function may block for an extended period of time, and so you will need many threads/processes.
I think the way to go is to create an API using an async framework like Twisted or Tornado; it will be more efficient than a Flask-based (or Django-based) solution because the API will be able to serve requests while Scrapy is running a spider.
Scrapy is based on Twisted, so using twisted.web or https://github.com/twisted/klein can be more straightforward. But Tornado is also not hard, because you can make it use the Twisted event loop.
There is a project called ScrapyRT which does something very similar to what you want to implement - it is an HTTP API for Scrapy. ScrapyRT is based on Twisted.
As an example of Scrapy-Tornado integration, check Arachnado - here is an example of how to integrate Scrapy's CrawlerProcess with Tornado's Application.
If you really want a Flask-based API, then it could make sense to start crawls in separate processes and/or use a queue solution like Celery. This way you're losing most of the Scrapy efficiency; if you go this way you can use requests + BeautifulSoup as well.
I have been working on a similar project last week; it's an SEO service API, and my workflow was like this:
The client sends a request to the Flask-based server with a URL to scrape and a callback URL to notify the client when scraping is done (the client here is another web app).
Run Scrapy in the background using Celery. The spider will save the data to the database.
The background service will notify the client by calling the callback URL when the spider is done.
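A rough sketch of that workflow; the broker URL, spider name, and endpoints are illustrative assumptions, not taken from the project:

# sketch: Flask accepts the job, Celery runs the spider and hits the callback
import subprocess
import requests
from celery import Celery
from flask import Flask, request, jsonify

celery_app = Celery("seo_api", broker="redis://localhost:6379/0")   # placeholder broker

@celery_app.task
def crawl_and_notify(url, callback_url):
    # run the spider in its own process so the Twisted reactor is fresh each time;
    # the spider itself saves its items to the database
    subprocess.run(["scrapy", "crawl", "seo_spider", "-a", "start_url=" + url])
    requests.post(callback_url, json={"url": url, "status": "done"})

app = Flask(__name__)

@app.route("/scrape", methods=["POST"])
def scrape():
    payload = request.json
    crawl_and_notify.delay(payload["url"], payload["callback_url"])
    return jsonify({"status": "accepted"}), 202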
I am experimenting with several of GAE's features.
I've built a dynamic backend, but I am having several issues getting this thing to work without task queues.
Backend code:
import webapp2
from google.appengine.ext.webapp.util import run_wsgi_app

class StartHandler(webapp2.RequestHandler):
    def get(self):
        # ... do stuff ...
        pass

if __name__ == '__main__':
    _handlers = [(r'/_ah/start', StartHandler)]
    run_wsgi_app(webapp2.WSGIApplication(_handlers))
The backend is dynamic, so whenever it receives a call it does its stuff and then stops.
Everything is working fine when I use this inside my handlers:
url = backends.get_url('worker') + '/_ah/start'
urlfetch.fetch(url)
But I want this call to be async, because the backend might take up to 10 minutes to finish its work.
So I changed the above code to:
url = backends.get_url('worker') + '/_ah/start'
rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(rpc, url)
But then the backend does not start. I am not interested in the completion of the request or getting any data out of it.
What am I missing or implementing wrong?
Thank you all
Using RPC for an async call without calling get_result() on the rpc object will not guarantee that the urlfetch will be made. Once your code exits, any pending async calls that were not completed will be aborted.
The only way to make the handler async is to queue the code in a push queue.
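A sketch of what that could look like with the task queue API (the handler name is illustrative):

# sketch: enqueue the backend work instead of urlfetch'ing it directly
import webapp2
from google.appengine.api import taskqueue

class KickoffHandler(webapp2.RequestHandler):
    def get(self):
        # the push queue delivers the request to the backend (and retries on failure);
        # this handler returns immediately instead of waiting up to 10 minutes
        taskqueue.add(url='/_ah/start', target='worker', method='GET')
        self.response.write('queued')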