I've been trying to set up my first Django app, using nginx + uWSGI + Django. I tested a simple Hello World view, but when I check the URL I get a different result on each refresh:
AttributeError at /control/
Hello world
The standard Django message:
It worked!
Congratulations on your first Django-powered page.
In the urls.py I have this simple pattern:
urlpatterns = patterns('',
    url(r'^control/?$', hello),
)
On views.py I have:
from django.http import HttpResponse

def hello(request):
    return HttpResponse("Hello world")
Why do the results per refresh vary?
How you start and stop your uWSGI server might be the issue here.
By default, uWSGI does not reload itself when code changes, so unless it is stopped or reloaded it will keep serving a cached version of the old Django code. The caching doesn't happen immediately, though, only on the first request.
But... uWSGI can spawn multiple workers at once, so it can process multiple requests concurrently (each worker thread handles one request at a time), and each worker keeps its own cached copy of the code.
So one scenario could be: you started your uWSGI server, then made a request before writing any views (so the standard Django "It worked!" page showed up), and the worker that handled it cached the code responsible for that response. You then changed something but introduced an error, and on the next request the code producing that error was cached by another worker. Then you fixed the error, and the next worker cached the fixed code, giving you your Hello World response.
Now every worker holds some cached version of your code, and depending on which one processes your request, you get different results.
The solution: restart your uWSGI instance, and consider adding py-autoreload to your uWSGI config, so uWSGI reloads automatically when the code changes (use that option only in development, never in production).
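For example, a minimal uwsgi.ini sketch; the module name and socket path here are assumptions, not taken from the question:

```ini
[uwsgi]
; assumed Django project module and socket path
module = mysite.wsgi:application
socket = /tmp/mysite.sock
workers = 4
; development only: watch Python files and reload on change
py-autoreload = 1
```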
Another scenario: you don't have multiple workers, but every time you change something in the code you start a new uWSGI instance without stopping the old one. Multiple instances then run at the same time, and since you are using a unix socket they co-exist, taking turns processing requests. If that's the case, stop all uWSGI instances, and next time stop the old instance before starting a new one, or simply reload it.
Related
I'm a newcomer to Python and just wrote some spiders using Scrapy. Now I want to trigger a spider with an HTTP request like this:
http://xxxxx.com/myspidername/args
I used nginx + uwsgi + django to call my scrapy spider.
Steps:
create & configure the Django project
create the Scrapy project in the Django project root and write my spider
start uwsgi: uwsgi -x django_socket.xml
call my spider in the Django app's views.py
from django.http import HttpResponse
from scrapy import cmdline
def index(request, mid):
    cmd = "scrapy crawl myitem -a mid=" + mid
    cmdline.execute(cmd.split())
    return HttpResponse("Hello, it works")
When I visit http://myhost/myapp/index, which points to the index view, nginx returns an error page and the error log shows "upstream prematurely closed connection while reading response header from upstream". I can see the uwsgi process disappeared, but in uwsgi's log I can see my spider ran correctly.
How can I fix this error?
Is this approach right? Is there another way to do what I want?
I don't think it's a good idea to launch a spider inside a Django view. A Django web app is meant to provide quick request/response cycles to end users so they can retrieve information quickly. Even though I'm not entirely sure what caused the error, I would imagine your view function stays stuck there until the spider finishes.
There are two options you could try to improve the user experience and minimize the errors:
crontab
It runs your script on a regular schedule. It's reliable and easy to log and debug, but it's inflexible for scheduling and gives you little control.
celery
This is a fairly Python/Django-specific tool that can schedule your tasks dynamically. You can define crontab-like tasks that run regularly, or apply a task at runtime. It won't block your view function and executes everything in a separate process, so it's most likely what you want. It needs some setup, so it might not be straightforward at first, but many people use it and it works great once everything is in place.
Nginx does asynchronous, non-blocking IO.
The call to scrapy.cmdline is synchronous. Most likely this causes trouble in the context of nginx.
Try opening a new process upon receiving the request.
There are many (well maybe not THAT many) ways to do this.
Try this question and its answers first:
Starting a separate process
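A minimal sketch of that approach; the scrapy command line shown in the comment is just the one from the question, and the helper name is illustrative:

```python
import subprocess

def launch_detached(cmd):
    """Run cmd in a separate process and return immediately.

    Unlike calling scrapy.cmdline.execute() inside the view, which runs
    the crawl within the uWSGI worker itself, Popen leaves the calling
    process untouched, so the view can respond right away.
    """
    return subprocess.Popen(cmd)

# In the view you would call something like:
#   launch_detached(["scrapy", "crawl", "myitem", "-a", "mid=" + mid])
```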
I am writing a Django web app and I don't understand how it handles concurrency. Basically, I have a page that takes 10 seconds to load (due to a lot of Python computation being executed) and another page that takes about 1 second to load, since it executes less Python code and immediately returns the index.html page.
These are the links I provided in the routing:
localhost:3000/10secondpage
localhost:3000/1secondpage
I perform this action on my browser:
Open first browser to localhost:3000/10secondpage, then immediately open a second browser to localhost:3000/1secondpage
As I am only running it on localhost with 1 terminal, this was the behavior I expected.
Expected Behavior:
The Python code executes the first browser's request, taking 10 seconds to complete; once it is done, it immediately starts the second browser's request, which takes about 1 second. As a result, the second browser has to wait about 11 seconds in total, since it must wait for the first browser's request to complete first.
Actual Behavior:
However, the actual behavior was that the second browser completed its request first, despite being executed after the first browser's. This suggests Django comes with some built-in process/thread spawning.
Can someone please explain why does the actual behavior occur?
Put simply, it's threading.
Web requests do not depend on other requests finishing before yours can execute; if they did, posting an update to Facebook would take hours/months/years before your post actually went live.
Django is no different. To process any number of requests a page may receive at once, it must process them individually and asynchronously. Of course, this can become much more complex very quickly with the introduction of load balancing and the like, but it comes down to the same answer.
You can take a look at the Handlers source code to see in more detail what Django does here.
Note: I haven't tried this, but to observe your expected output you can run runserver with the --nothreading flag:
manage.py runserver --nothreading
I'm actually a PHP (CodeIgniter) web developer, though I love Python. I just installed Bitnami's Django Stack, which includes Apache, MySQL, PostgreSQL and Python 2.7.9 with Django installed. During installation it generated a simple Django project.
Though it looked familiar to me, when I started adding some lines of code, saved, and refreshed the page (or even restarted the browser), I found that the Python instance was still running the old script. The script updates only when I restart the Apache server (I believe that's when the Python instance gets terminated).
So, to pin down this problem with Python, I created a simple view and URLed it to r'^test/':
from django.http import HttpResponse

i = 0

def test_view(request):
    global i
    i += 1
    return HttpResponse(str(i))
Then I found that even when switching between different browsers, the value of i keeps increasing, i.e. the count continues in the other browser.
So, can anyone tell me: is this the default behavior of Django, or is there something wrong with my Apache installation?
This is the default behavior. It may reset if you run with gunicorn and kill workers after X requests or so, I don't remember exactly. It's like this because the app continues to run after a request has been served.
It's been a while since I worked with PHP, but I believe a request comes in, PHP runs a script which returns output, and then that script terminates. Special global variables like $_SESSION aside, nothing can really cross requests.
Your Django app starts up and continues to run unless something tells it to reload (when running with ./manage.py runserver, it reloads whenever it detects changes to the code, which is what you want during development).
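That persistence is easy to reproduce outside Django: within one long-lived process, module-level state survives across calls, unlike PHP's run-and-exit model. A stand-in for the view from the question:

```python
counter = 0

def handle_request():
    # Stand-in for the view: mutates module-level state, as in the question
    global counter
    counter += 1
    return str(counter)

print(handle_request())  # first "request" in this process prints 1
print(handle_request())  # second prints 2: the state survived the first call
```

Each worker process keeps its own copy of this state, which is also why the counter can appear to jump around when several workers serve requests in turn.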
If you are interested in per-visitor data, see session data. It would look something like:
request.session['i'] = request.session.get('i', 0) + 1
You can store data in there for the visitor and it will stick around until they lose their session.
Using Django (hosted by WebFaction), I have the following code:
import time
from django.http import HttpResponse

def my_function(request):
    time.sleep(10)
    return HttpResponse("Done")
This is executed via Django when I go to my url, www.mysite.com
I enter the url twice, immediately after each other. The way I see it, both of these should finish after 10 seconds. However, the second call waits for the first one and finishes after 20 seconds.
If, however, I enter some dummy GET parameter, www.mysite.com?dummy=1 and www.mysite.com?dummy=2 then they both finish after 10 seconds. So it is possible for both of them to run simultaneously.
It's as though the scope of sleep() were somehow global. Maybe entering a parameter makes them run as different processes instead of the same one?
It is hosted by Webfaction. httpd.conf has:
KeepAlive Off
Listen 30961
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
I do need to be able to use sleep() and trust that it isn't stopping everything. So, what's going on, and how do I fix it?
Edit: Webfaction runs this using Apache.
As Gjordis pointed out, sleep will pause the current thread. I have looked at WebFaction and it looks like they are using WSGI to run the serving instances of Django. This means that every time a request comes in, Apache looks at how many worker processes (processes that each run an instance of Django) are currently running. If there are none or too few, it spawns additional workers and hands the requests to them.
Here is what I think is happening in your situation:
first GET request for resource A comes in. Apache uses a running worker (or starts a new one)
the worker sleeps 10 seconds
during this, a new request for resource A comes in. Apache sees it is requesting the same resource and sends it to the same worker as the first request. I guess the assumption here is that a worker that recently processed a request for a specific resource is more likely to have some information cached/preprocessed/whatever, so it can handle the request faster
this results in a 20-second block, since there is only one worker, waiting 2 × 10 seconds
This behavior makes complete sense 99% of the time so it's logical to do this by default.
However, if you change the requested resource for the second request (by adding a GET parameter), Apache assumes it is a different resource and starts another worker (since the first one is already "busy"; Apache cannot know that you are not doing any hard work). Since there are now two workers, both waiting 10 seconds, the total time goes down to 10 seconds.
Additionally, I assume something is **wrong** with your design. There are almost no cases I can think of where it would be sensible not to respond to an HTTP request as fast as you can. After all, you want to serve as many requests as possible in the shortest amount of time, so sleeping for 10 seconds is the most counterproductive thing you can do. I would recommend you create a new question stating what the actual goal is you are trying to achieve; I'm pretty sure there is a more sensible solution!
Assuming you run your Django server just with run(), by default this gives you a single-threaded server. If you call sleep in a single-threaded process, the whole application freezes for that sleep time.
It may simply be that your browser is queuing the second request to be performed only after the first one completes. If you are opening your URLs in the same browser, try using two different ones (e.g. Firefox and Chrome), or try performing the requests from the command line using wget or curl instead.
I have two sites running essentially the same codebase, with only slight differences in settings. Each site is built in Django, with a WordPress blog integrated.
Each site needs to import blog posts from WordPress and store them in the Django database. When a user publishes a post, WordPress hits a webhook URL on the Django side, which kicks off a Celery task that grabs the JSON version of the post and imports it.
My initial thought was that each site could run its own instance of manage.py celeryd; each is in its own virtualenv, and the two sites would stay out of each other's way. Each is daemonized with a separate upstart script.
But it looks like they're colliding somehow. I can run one at a time successfully, but if both are running, one instance won't receive tasks, or tasks will run with the wrong settings (in this case, each has a WORDPRESS_BLOG_URL setting).
I'm using a Redis queue, if that makes a difference. What am I doing wrong here?
Have you specified the name of the default queue celery should use? If you haven't set CELERY_DEFAULT_QUEUE, both sites will be using the same queue and getting each other's messages. You need to set this setting to a different value for each site to keep the messages separate.
Edit
You're right, CELERY_DEFAULT_QUEUE is only for brokers like RabbitMQ. I think you need to set a different database number for each site, using a different number at the end of your broker URL.
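With the Redis broker, the database number is the last path segment of the broker URL, so each site's settings can point at its own database. A sketch (host and port are assumptions):

```python
# settings.py for site A -- Redis database 0
BROKER_URL = "redis://localhost:6379/0"

# settings.py for site B would use database 1 instead,
# so the two sites' Celery messages never mix:
#   BROKER_URL = "redis://localhost:6379/1"
```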
If you are using django-celery, make sure you don't have an instance of celery running outside of your virtualenvs. Then start the celery instance within each virtualenv using manage.py celeryd, as you have done. I recommend setting up supervisord to keep track of your instances.