I've been into Python/Django and things are going good but I'm very curious about how django handle any asynchronous tasks for example if there is a database query that took 10 seconds to execute and another request arrives on the server during these 10 seconds?
def my_api(id):
do_somthing()
user = User.objects.get(id=id) # took 10 sec to execute
do_somthing_with_user()
I've used the queryset method that fetch something from database and let's say it took 10 seconds to complete. what happens when another request arrives on the django server, how django respond to it? will the request be pending until the first query responds or will it be handled in a parallel way? what is the deep concept for it?
How Python Handles these type things under the hood?
Handling requests is the main job of the web server that u are using. Django users by default use WSGI/ASGI.
Any requests coming to those web servers have a unique session and it's a separate thread. So no conflicts but there are race conditions. (OS resources, Queue, Stack...)
So consider we have a function that has a 2 seconds sleep inside.
import time
def function1():
time.sleep(2)
User 1 and user 2 request to this function by API.(e.x GET test/API)
two threads start for each one and if both almost start at time 0:0:0 then both end like 0:0:2.
So How about Async? Async request work concurrently for each request. (Not Parallel)
assume another API should call that sleepy function twice. (e.x Get test/API2)
The first call would take 2 seconds(Async function and await) and the second call 2 seconds too.
if we called this at 0:0:0 this would be like 0:0:2 again.
In sync way that would be 0:0:4.
So finally how about the Database?
there are many ways to handle that and the popular one is using Database pooling. However, generally, each query request makes a new connection (multi-thread) to the database and runs those stuff.
database pool is a thing that reduces a runtime because it has some ready-to-use connections to the database and whenever it has a job to do it goes and runs the query and back to the place where in without that pool the connection lifetime is over usage and end whenever the job has done.
Related
I am making a simple website with as Django as backend.
Ideally, you should be able to use it without creating an account and then all of your saved items ('notes') in there would be visible to anyone.
For now I have created a dummy user on Django, and every time an anonymous user makes an API call to add/delete/modify a notes, on Django's side it selects dummy user as a user.
It would work okey (I think) but one of Django's API can take a really long time to run (~1-2 minutes). Meaning that if multiple people are trying to make API calls while being anonymous, at some point the server will freeze until the long API finishes run.
Is there a way such case should be handed on the server side to prevent freezing of server ?
As Sorin answered in comments, I would go into Celery way. Basically you can create model that collect the data when the last Celery task was ran - and, for example if it was not running since last 24 hours, and user visits the website, you run task again by async way.
You are not even forced to use AJAX calls for that, because sending task to Celery is fast enough to call it on get_context_data() or dispatch() methods. You could overload others, but these are the fastest and safest to override.
You mentioned that one of your API endpoints perform task that take really long time which means you already have a task or process that can be block http request-response cycle. What you can do is to run the time-consuming (or long running) task asynchronously using celery or other task queues.
Am running an app with Flask , UWSGI and Nginx. My UWSGI is set to spawn out 4 parallel processes to handle multiple requests at the same time. Now I have one request that takes lot of time and that changes important data concerning the application. So, when one UWSGI process is processing that request and say all others are also busy, the fifth request would have to wait. The problem here is I cannot change this request to run in an offline mode as it changes important data and the user cannot simply remain unknown about it. What is the best way to handle this situation ?
As an option you can do the following:
Separate the heavy logic from the function which is being called
upon #route and move it into a separate place (a file, another
function, etc)
Introduce Celery to run that pieces of heavy logic
(it will be processed in a separate thread from the #route-decorated functions).
A quick way of doing this is using Redis as a message broker.
Schedule the time-consuming functions from your #route-decorated
functions in Celery (it is possible to pass parameters as well)
This way the HTTP requests won't be blocked for the complete function execution time.
I am writing an application that would asynchronously trigger some events. The test looks like this: set everything up, sleep for sometime, check that event has triggered.
However because of that waiting the test takes quite a time to run - I'm waiting for about 10 seconds on every test. I feel that my tests are slow - there are other places where I can speed them up, but this sleeping seems the most obvious place to speed it up.
What would be the correct way to eliminate that sleep? Is there some way to cheat datetime or something like that?
The application is a tornado-based web app and async events are triggered with IOLoop, so I don't have a way to directly trigger it myself.
Edit: more details.
The test is a kind of integration test, where I am willing to mock the 3rd party code, but don't want to directly trigger my own code.
The test is to verify that a certain message is sent using websocket and is processed correctly in the browser. Message is sent after a certain timeout which is started at the moment the client connects to the websocket handler. The timeout value is taken as a difference between datetime.now() at the moment of connection and a value in database. The value is artificially set to be datetime.now() - 5 seconds before using selenium to request the page. Since loading the page requires some time and could be a bit random on different machines I don't think reducing the 5 seconds time gap would be wise. Loading the page after timeout will produce a different result (no websocket message should be sent).
So the problem is to somehow force tornado's IOLoop to send the message at any moment after the websocket is connected - if that happened in 0.5 seconds after setting the database value, 4.5 seconds left to wait and I want to try and eliminate that delay.
Two obvious places to mock are IOLoop itself and datetime.now(). the question is now which one I should monkey-patch and how.
I you want to mock sleep then you must not use it directly in your application's code. I would create a class method like System.sleep() and use this in your application. System.sleep() can be mocked then.
Use the built in tornado testing tools. Each test gets it's own IOLoop, and you use self.stop and self.wait to get results from it, e.g. (from the docs):
client = AsyncHTTPClient(self.io_loop)
# call self.stop on fetch completion
client.fetch("http://www.tornadoweb.org/", self.stop)
response = self.wait()
I am working on a django web app that has functions (say for e.g. sync_files()) that take a long time to return. When I use gevent, my app does not block when sync_file() runs and other clients can connect and interact with the webapp just fine.
My goal is to have the webapp responsive to other clients and not block. I do not expect a zillion users to connect to my webapp (perhaps max 20 connections), and I do not want to set this up to become the next twitter. My app is running on a vps, so I need something light weight.
So in my case listed above, is it redundant to use celery when I am using gevent? Is there a specific advantage to using celery? I prefer not to use celery since it is yet another service that will be running on my machine.
edit: found out that celery can run the worker pool on gevent. I think I am a litle more unsure about the relationship between gevent & celery.
In short you do need a celery.
Even if you use gevent and have concurrency, the problem becomes request timeout. Lets say your task takes 10 minutes to run however the typical request timeout is about up to a minute. So what will happen if you trigger the task directly within a view is that the server will start processing it however after a minute a client (browser) will probably disconnect the connection since it will think the server is offline. As a result, your data can become corrupt since you cannot be guaranteed what will happen when connection will close. Celery solves this because it will trigger a background process which will process the task independent of the view. So the user will get the view response right away and at the same time the server will start processing the task. That is a correct pattern to handle any scenarios which require lots of processing.
Using Django (hosted by Webfaction), I have the following code
import time
def my_function(request):
time.sleep(10)
return HttpResponse("Done")
This is executed via Django when I go to my url, www.mysite.com
I enter the url twice, immediately after each other. The way I see it, both of these should finish after 10 seconds. However, the second call waits for the first one and finishes after 20 seconds.
If, however, I enter some dummy GET parameter, www.mysite.com?dummy=1 and www.mysite.com?dummy=2 then they both finish after 10 seconds. So it is possible for both of them to run simultaneously.
It's as though the scope of sleep() is somehow global?? Maybe entering a parameter makes them run as different processes instead of the same???
It is hosted by Webfaction. httpd.conf has:
KeepAlive Off
Listen 30961
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
I do need to be able to use sleep() and trust that it isn't stopping everything. So, what's up and how to fix it?
Edit: Webfaction runs this using Apache.
As Gjordis pointed out, sleep will pause the current thread. I have looked at Webfaction and it looks like their are using WSGI for running the serving instance of Django. This means, every time a request comes in, Apache will look at how many worker processes (that are processes that each run a instance of Django) are currently running. If there are none/to view it will spawn additonally workers and hand the requests to them.
Here is what I think is happening in you situation:
first GET request for resource A comes in. Apache uses a running worker (or starts a new one)
the worker sleeps 10 seconds
during this, a new request for resource A comes in. Apache sees it is requesting the same resource and sends it to the same worker as for request A. I guess the assumption here is that a worker that recently processes a request for a specific resource it is more likely that the worker has some information cached/preprocessed/whatever so it can handle this request faster
this results in a 20 second block since there is only one worker that waits 2 times 10 seconds
This behavior makes complete sense 99% of the time so it's logical to do this by default.
However, if you change the requested resource for the second request (by adding GET parameter) Apache will assume that this is a different resource and will start another worker (since the first one is already "busy" (Apache can not know that you are not doing any hard work). Since there are now two worker, both waiting 10 seconds the total time goes down to 10 seconds.
Additionally I assume that something is **wrong** with your design. There are almost no cases which I can think of where it would be sensible to not respond to a HTTP request as fast as you can. After all, you want to serve as many requests as possible in the shortest amount of time, so sleeping 10 seconds is the most counterproductive thing you can do. I would recommend the you create a new question and state what you actual goal is that you are trying to achieve. I'm pretty sure there is a more sensible solution to this!
Assuming you run your Django-server just with run() , by default this makes a single threaded server. If you use sleep on a single threaded process, the whole application freezes for that sleep time.
It may simply be that your browser is queuing the second request to be performed only after the first one completes. If you are opening your URLs in the same browser, try using the two different ones (e.g. Firefox and Chrome), or try performing requests from the command line using wget or curl instead.