What does Django do when a request to the root comes? - python

I am programming a Django web app and I don't understand how it handles requests concurrently. Basically, I have a page that takes 10 seconds to load (due to a lot of Python computation being executed), and another page that takes about 1 second to load because there is much less Python code to execute and it immediately returns the index.html page.
These are the links that I provided in the routing.
localhost:3000/10secondpage
localhost:3000/1secondpage
I perform this action on my browser:
Open first browser to localhost:3000/10secondpage, then immediately open a second browser to localhost:3000/1secondpage
As I am only running it on localhost with 1 terminal, this was the behavior I expected.
Expected Behavior:
The Python code executes the first browser's request and takes 10 seconds to complete; after it is done, it immediately starts the second browser's request, which takes about 1 second to complete. As a result, the second browser waits about 11 seconds in total, since it has to wait for the first browser's request to be completed first.
Actual Behavior:
However, the actual behaviour was that the second browser completed its request first, despite being executed after the first browser. This suggests Django comes with some built-in process/thread spawning already.
Can someone please explain why the actual behavior occurs?

Put simply, it's threading.
Web requests do not depend on other requests finishing before yours can be executed; if they did, posting an update to Facebook would take hours/months/years before your post actually went live.
Django is no different. In order to process any number of requests that a page may receive at once, it must process them individually and asynchronously. Of course, this can become much more complex very quickly with the introduction of load sharing and similar, but it comes down to the same answer.
You can take a look at the Handlers source code to see in more detail what Django does with this.
Note: I haven't tried this, but to observe your expected output you can run runserver with the --nothreading flag:
manage.py runserver --nothreading
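For example, here is a minimal sketch that reproduces the scenario from the question (the view and URL names are made up for illustration, and it assumes a recent Django where django.urls.path is available):

# views.py -- sketch only; time.sleep stands in for 10 seconds of real computation
import time
from django.http import HttpResponse

def slow_page(request):
    time.sleep(10)
    return HttpResponse("slow page done")

def fast_page(request):
    return HttpResponse("fast page done")

# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path('10secondpage', views.slow_page),
    path('1secondpage', views.fast_page),
]

With the default threaded runserver, the second request returns after about a second even while the first is still sleeping; with --nothreading, it waits the full ~11 seconds described in the question.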

Django responding to different request from different browser with one python session

I'm actually a PHP (CodeIgniter) web developer, though I love Python. I just installed Bitnami's Django Stack, which has Apache, MySQL, PostgreSQL and Python 2.7.9 with Django installed. During installation itself it generated a simple Django project.
Though it looked familiar to me, I started adding some lines of code to it, but when I save it and refresh the page, or even restart the browser, I find that the Python instance is still running the old script. The script updates only when I restart the Apache server (I believe that's when the Python instance gets terminated).
So, to clarify this problem with Python, I created a simple view and mapped it to the URL r'^test/':
from django.http import HttpResponse

i = 0

def test_view(request):
    global i
    i += 1
    return HttpResponse(str(i))
Then I found that even when switching between different browsers, the i value keeps on increasing, i.e. the count continues from one browser to the other.
So, can anyone tell me: is this the default behavior of Django, or is there something wrong with my Apache installation?
This is the default behavior. It may reset if you were running with gunicorn and killing workers after X requests or so, I don't remember. It's like this because the app continues to run after a request has been served.
It's been a while since I've worked with PHP, but I believe a request comes in, PHP starts running a script which returns output, and then that script terminates. Special global variables like $_SESSION aside, nothing can really cross requests.
Your Django app starts up and continues to run unless something tells it to reload (when running with ./manage.py runserver it will reload whenever it detects changes to the code; this is what you want during development).
If you are interested in per-visitor data, see session data. It would look something like:
request.session['i'] = request.session.get('i', 0) + 1
You can store data in there for the visitor and it will stick around until they lose their session.
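Put together, a per-visitor version of the counter view from the question might look like this (a sketch only; it assumes the session middleware that is enabled in a default Django project):

from django.http import HttpResponse

def test_view(request):
    # Stored per visitor in the session instead of in a module-level global,
    # so each browser gets its own independent counter.
    request.session['i'] = request.session.get('i', 0) + 1
    return HttpResponse(str(request.session['i']))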

how to write endless loop crawler in python?

EDITED:
I have a crawler.py that crawls certain sites every 10 minutes and sends me some emails regarding these sites. The crawler is ready and working locally.
How can I adjust it so that the following two things will happen:
It will run in an endless loop on the hosting that I'll upload it to.
Sometimes I will be able to stop it (e.g. for debugging).
At first, I thought of doing an endless loop, e.g.
crawler.py:

import time

while True:
    doCrawling()          # crawl the sites and send the emails
    time.sleep(10 * 60)   # sleep 10 minutes
However, according to the answers I got below, this would be impossible, since hosting providers kill processes after a while (for the sake of the question, let's assume processes are killed every 30 minutes). Therefore, my endless-loop process would be killed at some point.
Therefore, I have thought of a different solution:
Let's assume that my crawler is located at "www.example.com\crawler.py" and each time it is accessed, it executes the function run():
def run():
    doCrawling()
    sleep(10 minutes)
    call URL "www.example.com\crawler.py"
Thus, there will be no endless loop. In fact, every time my crawler runs, it also accesses the URL, which executes the same crawler again. Therefore, there is no endless loop, no process with a long running time, and my crawler will continue operating forever.
Will my idea work?
Are there any hidden drawbacks I haven't thought of?
Thanks!
As you stated in the comments, you are running on a public shared server like GoDaddy and so on. Therefore cron is not available there, and long-running scripts are usually forbidden - your process would be killed even if you were using sleep.
Therefore, the only solution I see is to use an external server that you control to connect to your public server and run the script every 10 minutes. One solution could be using cron on your local machine to connect with wget or curl to a specific page on your host. **
Maybe you can find online services that allow running a script periodically and use those, but I don't know of any.
** Bonus: you can get the results directly as the response, without having to send yourself an email.
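If setting up cron locally is inconvenient, the same idea can be sketched in plain Python running on a machine you control; this is only an illustration in place of cron + wget, it assumes the third-party requests library, and the trigger URL is hypothetical:

import time
import requests

CRAWL_URL = 'http://www.example.com/crawler.py'   # hypothetical trigger URL on the shared host

while True:
    try:
        # Trigger the crawl; per the bonus note, the response body could carry the results.
        print(requests.get(CRAWL_URL, timeout=60).text)
    except requests.RequestException as exc:
        print('crawl trigger failed:', exc)
    time.sleep(10 * 60)   # wait 10 minutes before the next run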
Update
So, in your updated question you propose to use your script to call itself with an HTTP request. I thought of it before, but I didn't consider it in my previous answer because I believe it won't work (in general).
My concern is: will the server kill a script if the HTTP connection requesting it is closed before the script terminates?
In other words: if you open yoursite.com/script.py and it takes 60 seconds to run, and you close the connection with the server after 10 seconds, will the script run till its regular end?
I thought that the answer was obviously "no, the script will be killed", and therefore that method would be useless, because you would have to guarantee that a script calling itself via an HTTP request stays alive longer than the called script. I did a little experiment using Flask, and it proved me wrong:
from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello_world():
    print('Script started...')
    time.sleep(5)
    print('5 seconds passed...')
    time.sleep(5)
    print('Script finished')
    return 'Script finished'

if __name__ == '__main__':
    app.run()
If I run this script, make an HTTP request to localhost:5000, and close the connection after 2 seconds, the script continues to run until the end and the messages are still printed.
Therefore, with Flask, if you can make an asynchronous request to yourself, you should be able to have an "infinite loop" script.
I don't know the behavior on other servers, though. You should make a test.
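One crude way to make that test from the client side is to open the request and give up after a couple of seconds, then check the server console to see whether the script still ran to completion. A sketch, assuming the requests library (any HTTP client with a read timeout would do):

import requests

try:
    # The 2-second timeout makes the client abandon the connection while the
    # server-side script (which takes about 10 seconds) is still running.
    requests.get('http://localhost:5000/', timeout=2)
except requests.exceptions.Timeout:
    print('Client gave up; now check the server console for "Script finished".')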
Control
Assuming your server allows you to make a GET request and keeps the script running even if the connection is closed, you have a few things to take care of. For example, your script still has to run fast enough to complete within the maximum time the server allows, and to make your script run every 10 minutes with a maximum allowance of 1 minute, you have to count 10 calls each time.
In addition, this mechanism has to be controllable, because you cannot interrupt it for debugging as you requested. At least, not directly.
Therefore, I suggest you use files (a sketch follows this list): use a file to split your crawling into smaller steps, each capable of finishing in less than one minute, and then continue again when the script is called the next time.
Use a file to count how many times the script has been called before actually doing the crawling. This is necessary if, for example, the script is allowed to live 90 seconds but you want to crawl every 10 hours.
Use a file to control the script: store a boolean flag that you can use to stop the recursion mechanism if you need to.
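A rough sketch of the counter and stop-flag idea (the file names and the 10-call threshold are arbitrary; adapt them to your actual time allowance):

import os

COUNTER_FILE = 'call_count.txt'   # hypothetical file names
STOP_FILE = 'stop.flag'
CALLS_PER_CRAWL = 10              # e.g. called every minute, crawl on every 10th call

def should_crawl():
    # The stop flag lets you interrupt the self-calling chain for debugging.
    if os.path.exists(STOP_FILE):
        return False
    count = 0
    if os.path.exists(COUNTER_FILE):
        with open(COUNTER_FILE) as f:
            count = int(f.read().strip() or 0)
    count += 1
    with open(COUNTER_FILE, 'w') as f:
        f.write(str(count))
    # Only do the real crawling every CALLS_PER_CRAWL invocations.
    return count % CALLS_PER_CRAWL == 0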
If you're using Linux you should just do a cron job for your script. Info: http://code.tutsplus.com/tutorials/scheduling-tasks-with-cron-jobs--net-8800
If you are running Linux, I would set up an upstart script http://upstart.ubuntu.com/getting-started.html to turn it into a service.
It offers a lot of advantages like:
- Starting at system boot
- Auto restart on crashes
- Manageable: service mycrawler restart
...
Or if you would prefer to have it run every 10 minutes, forget about the endless loop and do a cron job http://en.wikipedia.org/wiki/Cron

shell command from python script

I need you guys :D
I have a web page; on this page I check some items and pass their values as variables to a Python script.
The problem is:
I need to write a Python script, and in that script I need to put these variables into my predefined shell commands and run them.
It is one gnuplot command and one other shell command.
I have never done anything in Python; can you guys send me some advice?
Thx
I can't fully address your questions due to lack of information on the web framework that you are using, but here are some advice and guidance that you will find useful. I had a similar problem that required me to run a shell program passing arguments derived from user requests (I was using the Django framework (Python)).
Now there are several factors that you have to consider
How long will each job take
What is the load that you are expecting (are there going to be loads of jobs)
Will there be any side effects from your shell command
Here is some explanation of why these are important.
How long will each job take.
Depending on your framework and browser, there is a limit on how long a connection to the server is kept alive. In other words, you have to make sure that the time the server takes to respond to a user request does not exceed the connection timeout set by the server or the browser. If it takes too long, you will get a server connection timeout, i.e. an error response, because there was no response from the server side.
What is the load that you are expecting.
You have probably figured out that if the work you are requesting is big, it will take more resources than you expect. Also, if you have multiple requests at the same time, it will take a huge toll on your server. For instance, if you do proceed with using subprocess for your jobs, it is important to note whether your job is blocking or non-blocking.
Side effects.
It is important to understand what the side effects of your shell process are. For instance, if your shell process involves writing and generating lots of temp files, you will have to consider the permissions that your script has. It is a complex task.
So how can this be resolved?
subprocess, which ships with base Python, will allow you to run shell commands using Python. If you want more sophisticated tools, check out the fabric library. For passing of arguments, check out optparse and sys.argv.
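As a concrete (if simplified) example, here is how a value coming from the web page might be passed into a gnuplot invocation with subprocess; the file names and plot script are made up, and anything user-supplied should be validated before it goes anywhere near a command:

import subprocess

def plot_column(datafile, outfile, column):
    # Validate the user-chosen value: only accept an integer, never raw text.
    column = int(column)
    script = 'set terminal png; set output "%s"; plot "%s" using 1:%d with lines' % (
        outfile, datafile, column)
    # Passing a list of arguments (no shell=True) keeps the shell out of the picture.
    subprocess.check_call(['gnuplot', '-e', script])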
If you expect a huge workload or a long processing time, do consider setting up a queue system for your jobs. A popular framework like Celery is a good example. You may look at gevent and asyncio (Python 3) as well. Generally, instead of returning a response on the fly, you can return a job id or a URL which the user can come back to later on and have a look at.
Point to note!
Permissions and security are vital! The last thing you want is for people to execute shell commands that are detrimental to your system.
You can also increase connection timeout depending on the framework that you are using.
I hope you will find this useful
Cheers,
Biobirdman

Time out issues with chrome and flask

I have a web application which acts as an interface to an offsite server which runs a very long task. The user enters information and hits submit, and then Chrome waits for the response and loads a new webpage when it receives it. However, depending on the network and the user's input, the task can take a pretty long time, and occasionally Chrome loads a "No data received" page before the data is returned (though the task is still running).
Is there a way to either show a temporary page while my task is thinking, or simply force Chrome to continue waiting? Thanks in advance.
While you could change your timeout on the server or use other tricks to try to keep the page "alive", keep in mind that there might be other parts of the connection that you have no control over that could time out the request (such as the timeout value of the browser, or any proxy between the browser and server, etc.). Also, you might need to constantly raise your timeout value if the task takes longer to complete (becomes more advanced, or just slower because more people use it).
In the end, this sort of problem is typically solved by a change in your architecture.
Use a Separate Process for Long-Running Tasks
Rather than submitting the request and running the task in the handling view, the view starts the running of the task in a separate process, then immediately returns a response. This response can bring the user to a "Please wait, we're processing" page. That page can use one of the many push technologies out there to determine when the task was completed (long-polling, web-sockets, server-sent events, an AJAX request every N seconds, or the dead-simplest: have the page reload every 5 seconds).
Have your Web Request "Kick Off" the Separate Process
Anyway, as I said, the view handling the request doesn't do the long action: it just kicks off a background process to do the task for it. You can create this background process dispatch yourself (check out this Flask snippet for possible ideas), or use a library like Celery or RQ.
Once the task is complete, you need some way of notifying the user. This will be dependent on what sort of notification method you picked above. For a simple "ajax request every N seconds", you need to create a view that handles the AJAX request that checks if the task is complete. A typical way to do this is to have the long-running task, as a last step, make some update to a database. The requests for checking the status can then check this part of the database for updates.
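A stripped-down sketch of that pattern in Flask, using a plain background thread and an in-memory status dictionary in place of a real task queue like Celery and a database (fine for illustration, not for production):

import threading
import time
import uuid
from flask import Flask, jsonify

app = Flask(__name__)
tasks = {}   # task_id -> status; a real app would keep this in a database

def long_task(task_id):
    time.sleep(30)               # stands in for the real long-running work
    tasks[task_id] = 'done'

@app.route('/start')
def start():
    task_id = str(uuid.uuid4())
    tasks[task_id] = 'running'
    threading.Thread(target=long_task, args=(task_id,)).start()
    # Return immediately; the "please wait" page can poll /status/<task_id> via AJAX.
    return jsonify(task_id=task_id)

@app.route('/status/<task_id>')
def status(task_id):
    return jsonify(status=tasks.get(task_id, 'unknown'))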
Advantages and Disadvantages
Using this method (rather than trying to fit the long-running task into a request) has a few benefits:
1.) Handling long-running web requests is a tricky business due to the fact that there are multiple points that could time out (besides the browser and server). With this method, all your web requests are very short and much less likely to timeout.
2.) Flask (and other frameworks like it) is designed to only support a certain number of threads that can respond to web queries. Assume it has 8 threads: if four of them are handling long requests, that only leaves four threads to actually handle more typical requests (like a user getting their profile page). Half of your web server could be tied up doing something that is not serving web content! At worst, you could have all eight threads running a long process, meaning your site is completely unable to respond to web requests until one of them finishes.
The main drawback: there is a little more setup work in getting a task queue up and running, and it does make your entire system slightly more complex. However, I would highly recommend this strategy for long-running tasks that run on the web.
I believe this is due to your web server (Apache in most cases), which has a timeout that is too small. Try to increase this number.
For Apache, have a look at the Timeout option.
EDIT: I don't think you can set this timeout in Chrome (see this topic on the Google forums, even though it's really old).
In Firefox, on the about:config page, type timeout and you'll have some options you can set. I have no idea about Internet Explorer.
Let's assume:
This is not a server issue, so we don't have to go fiddle with Apache, nginx, etc. timeout settings.
The delay is minutes, not hours or days, just to make the scenario manageable.
You control the web page on which the user hits submit, and from which user interaction is managed.
If those hold, I'd suggest not using a standard HTML form submission, but rather having the submit button kick off a JavaScript function to oversee processing. It would put up a "please be patient...this could take a little while" style message, then use jQuery.ajax, say, to call the slow server with a long timeout value. jQuery timeouts are measured in milliseconds, so 60000 = 60 seconds. If it's longer than that, increase your specified timeout accordingly. I have seen reports that not all clients will allow super-extra-long timeouts (e.g. Safari on iOS apparently has a 60-second limitation). But in general, this will give you a platform from which to manage the interactions (with your user, with the slow server) rather than being at the mercy of simple web form submission.
There are a few edge cases here to consider. The web server timeouts may indeed need to be adjusted upward (Apache defaults to 300 seconds aka 5 minutes, and nginx less, IIRC). Your client timeouts (on iOS, say) may have maximums too low for the delays you're seeing. Etc. Those cases would require either adjusting at the server, or adopting a different interaction strategy. But an AJAX-managed interaction is where I would start.

Django, sleep() pauses all processes, but only if no GET parameter?

Using Django (hosted by Webfaction), I have the following code
import time
from django.http import HttpResponse

def my_function(request):
    time.sleep(10)
    return HttpResponse("Done")
This is executed via Django when I go to my url, www.mysite.com
I enter the url twice, immediately after each other. The way I see it, both of these should finish after 10 seconds. However, the second call waits for the first one and finishes after 20 seconds.
If, however, I enter some dummy GET parameter, www.mysite.com?dummy=1 and www.mysite.com?dummy=2 then they both finish after 10 seconds. So it is possible for both of them to run simultaneously.
It's as though the scope of sleep() is somehow global?? Maybe entering a parameter makes them run as different processes instead of the same???
It is hosted by Webfaction. httpd.conf has:
KeepAlive Off
Listen 30961
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
I do need to be able to use sleep() and trust that it isn't stopping everything. So, what's up and how to fix it?
Edit: Webfaction runs this using Apache.
As Gjordis pointed out, sleep will pause the current thread. I have looked at Webfaction and it looks like they are using WSGI for running the serving instance of Django. This means that every time a request comes in, Apache will look at how many worker processes (processes that each run an instance of Django) are currently running. If there are none or too few, it will spawn additional workers and hand the requests to them.
Here is what I think is happening in your situation:
first GET request for resource A comes in. Apache uses a running worker (or starts a new one)
the worker sleeps 10 seconds
during this, a new request for resource A comes in. Apache sees it is requesting the same resource and sends it to the same worker as the first request. I guess the assumption here is that a worker that recently processed a request for a specific resource is more likely to have some information cached/preprocessed/whatever, so it can handle this request faster
this results in a 20 second block since there is only one worker that waits 2 times 10 seconds
This behavior makes complete sense 99% of the time so it's logical to do this by default.
However, if you change the requested resource for the second request (by adding a GET parameter), Apache will assume that this is a different resource and will start another worker (since the first one is already "busy" - Apache cannot know that you are not doing any hard work). Since there are now two workers, both waiting 10 seconds, the total time goes down to 10 seconds.
Additionally, I assume that something is wrong with your design. There are almost no cases I can think of where it would be sensible not to respond to an HTTP request as fast as you can. After all, you want to serve as many requests as possible in the shortest amount of time, so sleeping 10 seconds is the most counterproductive thing you can do. I would recommend that you create a new question and state what the actual goal is that you are trying to achieve. I'm pretty sure there is a more sensible solution to this!
Assuming you run your Django server just with run(), by default this makes a single-threaded server. If you use sleep in a single-threaded process, the whole application freezes for that sleep time.
It may simply be that your browser is queuing the second request to be performed only after the first one completes. If you are opening your URLs in the same browser, try using two different ones (e.g. Firefox and Chrome), or try performing the requests from the command line using wget or curl instead.
