I have an endpoint in Django that kicks off a function which takes a really long time to complete. I don't want the request to wait until this function has finished.
from rest_framework.views import APIView
from rest_framework.response import Response

class MyRequest(APIView):
    def get(self, request, *args, **kwargs):
        a_function_which_takes_really_long_time()  # blocks the request until it finishes
        return Response({"message": "We're working on it."})
I tried asyncio with Django's asynchronous support, and also Python threading, but all of these still make the request wait until the function has completed.
I know we could easily achieve this with Celery, but that approach would require a message broker such as Redis or RabbitMQ, which I'm not supposed to use.
In my opinion, using Celery with Django is really easy and recommended. I'm not sure why you are against using a message broker; both of the options you mentioned are stable and widely used projects.
But if you really do have that broker constraint, I can recommend APScheduler, which doesn't need an intermediate broker.
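For illustration, here is a minimal sketch of that idea, assuming APScheduler 3.x: a BackgroundScheduler runs jobs in its own thread pool inside the Django process, so no broker is needed. The view and function names come from the question above.

from apscheduler.schedulers.background import BackgroundScheduler
from rest_framework.views import APIView
from rest_framework.response import Response

# a_function_which_takes_really_long_time is assumed to be importable from your code
scheduler = BackgroundScheduler()
scheduler.start()  # start once, at process startup

class MyRequest(APIView):
    def get(self, request, *args, **kwargs):
        # With no trigger given, the job runs once, as soon as possible,
        # outside the request/response cycle.
        scheduler.add_job(a_function_which_takes_really_long_time)
        return Response({"message": "We're working on it."})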
Related
In one of the views in my Django application I need to perform a relatively lengthy network I/O operation. The problem is that other requests must wait for this request to complete even though they have nothing to do with it.
I did some research and stumbled upon Celery, but as I understand it, it is used to perform background tasks independent of the request (so I cannot use the result of the task in the response to the request).
Is there a way to process views asynchronously in Django, so that while the network request is pending other requests can be processed?
Edit: What I forgot to mention is that my application is a web service using Django REST framework, so the result of a view is a JSON response, not a page that I can later modify using AJAX.
The usual solution here is to offload the task to Celery and return a "please wait" response from your view. If you want, you can then use an AJAX call to periodically hit a view that reports whether the result is ready, and redirect when it is.
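A rough sketch of that pattern follows; the task and view names here are illustrative, not from the question.

from celery.result import AsyncResult
from django.http import JsonResponse

from myapp.tasks import generate_report  # a hypothetical Celery task

def start_report(request):
    # Queue the work and return immediately with a "please wait" payload.
    async_result = generate_report.delay(request.user.id)
    return JsonResponse({"task_id": async_result.id, "status": "please wait"})

def report_status(request, task_id):
    # The view your AJAX call polls until the result is ready.
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({"status": "done", "result": result.result})
    return JsonResponse({"status": "pending"})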
You want to maintain that HTTP connection for an extended period of time but still allow other requests to be handled, right? There's no simple solution to this problem, and any solution will sit a level away from Django, because it depends on how you process requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past: I was using uWSGI to provide the WSGI interface between my Python application and nginx. In uWSGI I used the asynchronous functions to suspend my long-running connection whenever it was waiting on I/O. Those functions let you suspend things until there is something to read or write, and allow other connections to be serviced in the meantime.
The async calls mentioned above use "green threads". They are much lighter weight than regular threads, and you have control over when you move from thread to thread.
I am not saying this is a good solution for your scenario[1], but the simple answer is to use the following pattern:
async_result = some_task.delay(arg1)  # queue the task on the broker and return immediately
result = async_result.get()           # block this view until the worker has finished
Check the documentation for the get method. Instead of delay you can use anything that returns an AsyncResult (such as the apply_async method).
[1] Why might it be a bad idea? Keeping a connection open and waiting for a long time is bad for Django (it is not designed for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere), and may be treated as a timeout by the browser. So it seems like a Bad Idea[TM] to use this pattern in a Django REST Framework view.
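For completeness, a sketch of the apply_async variant mentioned above, with a timeout on get() so the view does not hang forever; some_task is assumed to be a registered Celery task.

async_result = some_task.apply_async(args=(arg1,))
# Raises celery.exceptions.TimeoutError if the worker has not finished in time.
result = async_result.get(timeout=30)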
I'm building a website which provides some information to visitors. This information is aggregated in the background by polling a couple of external APIs every 5 seconds. The way I have it working now is with APScheduler jobs. I initially preferred APScheduler because it makes the whole system easier to port (since I don't need to set up cron jobs on the new machine). I start the polling functions as follows:
from apscheduler.scheduler import Scheduler

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()
    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)
This kinda works, but there are a couple of problems with it:
For starters, it means the interval jobs run outside the Flask application context. So far this hasn't been much of a problem, but when calling an endpoint fails I want the system to send me an email (saying "hey, calling API X failed"). Because the jobs don't run within the Flask context, however, Flask-Mail complains that it cannot be executed (RuntimeError: working outside of application context).
Secondly, I wonder how this is going to behave when I no longer use the Flask built-in debug server but a production server with, let's say, 4 workers. Will it start every job four times then?
All in all I feel that there should be a better way of running these recurring tasks, but I'm unsure how. Does anybody out there have an interesting solution to this problem? All tips are welcome!
[EDIT]
I've just been reading about Celery and its schedules. Although I don't really see how Celery is different from APScheduler, and whether it could therefore solve my two points, I wonder if anyone reading this thinks I should investigate Celery further?
[CONCLUSION]
About two years later I'm reading this, and I thought I'd let you guys know what I ended up with. I figured that @BluePeppers was right in saying that I shouldn't be tied so closely to the Flask ecosystem, so I opted for regular cron jobs running every minute, set up using Ansible. Although this makes it a bit more complex (I needed to learn Ansible and convert some code so that running it every minute is enough), I think it is more robust.
I'm currently using the awesome python-rq for queueing async jobs (checking APIs and sending emails). I just found out about rq-scheduler. I haven't tested it yet, but it seems to do precisely what I needed in the first place, so maybe it's a tip for future readers of this question.
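For future readers, an untested sketch of what rq-scheduler usage might look like, based on its README; check_api stands in for one of the polling functions.

from datetime import datetime

from redis import Redis
from rq_scheduler import Scheduler

scheduler = Scheduler(connection=Redis())
scheduler.schedule(
    scheduled_time=datetime.utcnow(),  # first run
    func=check_api,
    interval=60,    # then repeat every 60 seconds
    repeat=None,    # forever
)
# Run the rqscheduler process to move due jobs onto an RQ queue,
# and an rq worker to execute them.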
For the rest, I just wish all of you a beautiful day!
(1)
You can use the app.app_context() context manager to set the application context. I imagine usage would go something like this:
from apscheduler.scheduler import Scheduler

def checkSecondAPI():
    with app.app_context():
        # Do whatever you were doing to check the second API
        ...

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()
    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)
Alternatively, you could use a decorator:
import functools

def with_application_context(app):
    def inner(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with app.app_context():
                return func(*args, **kwargs)
        return wrapper
    return inner

@with_application_context(app)
def checkFirstAPI():
    # Check the first API as before
    ...
(2)
Yes, it will still work. The sole (significant) difference is that your application will not be communicating directly with the world; it will be going through a reverse proxy or similar via FastCGI/uWSGI/whatever. The only concern is that if you have multiple instances of the app starting, then multiple schedulers will be created. To manage this, I would suggest you move your backend tasks out of the Flask application and use a tool designed for running tasks regularly (i.e. Celery). The downside to this is that you won't be able to use things like Flask-Mail, but imo it's not great to be so closely tied to the Flask ecosystem anyway; what are you gaining by using Flask-Mail over a standard, non-Flask mail library?
Also, breaking up your application makes it much easier to scale individual components as capacity is required, compared to having one monolithic web application.
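For example, here is a minimal sketch of moving the polling out of Flask and into Celery beat; the broker URL, module name, and task names are assumptions, not something from the question.

# celery_app.py
from celery import Celery

celery = Celery('poller', broker='redis://localhost:6379/0')

@celery.task
def check_first_api():
    ...  # poll the first external API and store the aggregated result

celery.conf.beat_schedule = {
    'check-first-api-every-5s': {
        'task': 'celery_app.check_first_api',
        'schedule': 5.0,  # seconds
    },
}
# Run a worker and the scheduler as separate processes:
#   celery -A celery_app worker
#   celery -A celery_app beat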
I have a REST API, and now I want to create a website that will use this API as its only and primary data source. The system is distributed: the REST API runs on one group of machines and the site will run on another.
I'm expecting quite a lot of load, so I'd like to make requests as efficient as possible.
Do I need an asynchronous HTTP request library, or will any HTTP client library work?
The API is built with Flask; the website will also be built with Flask, using Jinja as the template engine.
You could use gevent with Flask to get asynchronous I/O from normally synchronous libraries. See this question for an example of someone getting help with doing that.
You could also run Flask behind gunicorn, which has support for spawning multiple workers (threads, processes, or greenlets) for handling concurrent requests. If you were to take that approach, Flask would remain completely synchronous, and gunicorn would handle creating multiple Flask instances to handle concurrent requests.
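A minimal sketch of the gevent route, assuming the Flask app lives in app.py as app; monkey.patch_all() makes blocking socket calls cooperative so one greenlet can serve other requests while another waits on network I/O. fetch_from_api is a hypothetical helper.

# app.py
from gevent import monkey
monkey.patch_all()  # must run before anything that imports socket/ssl

from flask import Flask

app = Flask(__name__)

@app.route('/data')
def data():
    # Calls to the upstream REST API here yield to other greenlets
    # while waiting on the network.
    return fetch_from_api()  # hypothetical helper using e.g. requests

# For the gunicorn route no code changes are needed; for example:
#   gunicorn -k gevent -w 4 app:app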
Start simple and use whichever approach seems easiest for you. Consider optimizing later, and only if needed.
Async libraries become helpful once you are handling thousands of requests a second. Long before that, you are likely to hit performance problems related to the database (if you use one), and those are not solved by async magic.
Actually, your API is on a separate machine. Even if you make your client calls asynchronous, it will have no impact on the server; it only means the calling thread in the client will not wait for the response. The server reacts the same whether the call is sync or async.
If you do want to make your calls async, check http://stackandqueue.com/?p=57, which uses unirest to make both GET and POST calls asynchronously.
I have a Django website, and one page has a button (or link) that when clicked will launch a somewhat long running task. Obviously I want to launch this task as a background task and immediately return a result to the user. I want to implement this using a simple approach that will not require me to install and learn a whole new messaging architecture like Celery for example. I do not want to use Celery! I just want to use a simple approach that I can set up and get running over the next half hour or so. Isn't there a simple way to do this in Django without having to add (yet another) 3rd party package?
Just use a thread.
import threading

from django.http import HttpResponse

def my_view(request):
    t = threading.Thread(target=long_process,
                         args=args,
                         kwargs=kwargs)
    t.daemon = True  # don't let this thread block interpreter shutdown
    t.start()
    return HttpResponse()
See this question for more details:
Can Django do multi-thread works?
Have a look at django-background-tasks: it does exactly what you need and doesn't require any additional services to be running, such as RabbitMQ or Redis. It manages a task queue in the database and provides a Django management command which you can run once or as a cron job.
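Roughly, usage looks like this; the schedule value and names are illustrative, so check the package docs for details.

from background_task import background

@background(schedule=5)  # run ~5 seconds after being queued
def long_process(user_id):
    ...  # the long-running work

# Calling it from a view only inserts a row into the task table and returns:
#   long_process(request.user.id)
# A separate worker executes queued tasks:
#   python manage.py process_tasks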
If you're willing to install a 3rd-party library but want something a whole lot simpler than Celery, check out Redis Queue (RQ). It does require Redis, which is pretty easy to set up by itself, and that can provide a lot of other benefits as well.
RQ itself has almost zero configuration. It's startlingly simple.
References:
http://python-rq.org/
http://nvie.com/posts/introducing-rq/
https://devcenter.heroku.com/articles/python-rq (RQ on Heroku)
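For illustration, enqueueing a job with RQ is roughly this; the function names are placeholders.

from redis import Redis
from rq import Queue

q = Queue(connection=Redis())      # default queue on a local Redis
job = q.enqueue(long_running_function, some_arg)
print(job.id)                      # keep the id if you want to poll job.result later

# Start a worker in another shell to execute queued jobs:
#   rq worker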
I'm trying to build a Twisted/Django mashup that will let me control various client connections managed by a Twisted server via Django's admin interface. Meaning, I want to be able to log in to Django's admin and see which protocols are currently in use, see any details specific to each connection (e.g. if the server is connected to Freenode via IRC, it should list all the channels currently joined), and be able to disconnect or connect new clients by modifying or creating database records.
What would be the best way to do this? There are lots of posts out there about combining Django with Twisted, but I haven't found any prior art for doing quite what I've outlined. All the Twisted examples I've seen use hardcoded connection parameters, which makes it difficult for me to imagine how I would dynamically run reactor.connectTCP(...) or loseConnection(...) when signalled by a record in the database.
My strategy is to create a custom ClientFactory that simply polls the Django-managed database every N seconds for any commands, and modifies/creates/deletes connections as appropriate, reflecting the new status in the database when complete.
Does this seem feasible? Is there a better approach? Does anyone know of any existing projects that implement similar functionality?
Polling the database is lame, but unfortunately databases rarely have good tools (and certainly no database-portable tools) for monitoring changes. So your approach might be okay.
However, if your app is in Django, you're not supporting random changes to the database from other (non-Django) clients, and your WSGI container is Twisted, then you can do this very simply with callFromThread(connectTCP, ...).
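A rough sketch of that idea: a Django view running in a WSGI thread inside the Twisted process asks the reactor (running in its own thread) to open a new client connection. MyClientFactory is an assumed factory class, not something from the question.

from twisted.internet import reactor

def connect_client(host, port):
    factory = MyClientFactory()
    # callFromThread is the safe way to poke the reactor from a non-reactor thread.
    reactor.callFromThread(reactor.connectTCP, host, port, factory)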
I've been working on yet another way of combining Django and Twisted. Feel free to give it a try: https://github.com/kowalski/featdjango.
It works slightly differently from the others. It starts a Twisted application and HTTP site, and the requests made to Django are processed inside a special thread pool. What makes it special is that these threads can wait on a Deferred, which makes it easy to combine synchronous Django application code with asynchronous Twisted code.
The reason I came up with a structure like this is that my application needs to perform a lot of HTTP requests from inside the Django views. Instead of performing them one by one, I can delegate all of them at once to "the main application thread", which runs Twisted, and wait for them. The similarity to your problem is that I also have an asynchronous component which is a singleton, and I access it from Django views.
So, for example, this is how you would initiate the Twisted component and later get a reference to it from a view:
import threading

from django.conf import settings

_initiate_lock = threading.Lock()

def get_component():
    global _initiate_lock
    if not hasattr(settings, 'YOUR_CLIENT'):
        _initiate_lock.acquire()
        try:
            # another thread might have done our job while we
            # were waiting for the lock
            if not hasattr(settings, 'YOUR_CLIENT'):
                client = YourComponent(**whatever)
                threading.current_thread().wait_for_deferred(
                    client.initiate)
                settings.YOUR_CLIENT = client
        finally:
            _initiate_lock.release()
    return settings.YOUR_CLIENT
The code above initiates my client and calls its initiate method. That method is asynchronous and returns a Deferred; I do all the necessary setup in there. The Django thread waits for it to finish before proceeding to the next line.
This is how I do it because I only access the component from the request handlers. You would probably want to initiate your component at startup instead, so that it can call listenTCP/listenSSL. Then your Django request handlers could get the data about the connections simply by calling some public methods on your client. Those methods could even return a Deferred, in which case you should use wait_for_deferred() to call them.
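For example, a hypothetical Django view using get_component() from above; get_status is an assumed method that returns a Deferred, and wait_for_deferred comes from featdjango's special thread pool.

import threading

from django.http import JsonResponse

def connection_status(request):
    client = get_component()
    # Block this (featdjango) thread until the Deferred fires.
    status = threading.current_thread().wait_for_deferred(client.get_status)
    return JsonResponse({"connections": status})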