forking a process within a django view - python

I have a webservice that initiates a process that can take up to a minute. I want to return a 204 that effectively says, "I have successfully gotten your request," but run the slow process in the background.
I am trying to do this by forking another process like this:
from multiprocessing import Process

p = Process(target=modelObj.slowProcess)
p.start()
logger.debug('sending 204')
return HttpResponse(status=204)
This part of the code seems to execute fine, but is tripping up django components. The debug statement is printed, and the process executes, but when I look at the network traffic in chrome's debugger, it says that the upload status is "cancelled". Since I haven't cancelled the event on the browser side, I assume that means the connection died. I never get any response back from the server, so it seems that I'm somehow breaking the request process.
How can I fork that separate process and still have the 204 get delivered?

The comments showed me what I was doing wrong.
The request in question was from a hidden iframe that was uploading a file. In many instances you can get away with thinking of that as an ajax request, but if the iframe gets back a 204, problems ensue. The iframe has to get back some content, even though nothing substantive is being done with that content.
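A minimal, framework-free sketch of the fix (`slow_process` and `handle_upload` are placeholder names, not from the original code): spawn the slow work in a separate process, then return a small HTML body with a 200 status instead of a bare 204, so the hidden iframe has something to load.

```python
from multiprocessing import Process
import time

def slow_process():
    # stand-in for the minute-long model method
    time.sleep(0.1)

def handle_upload():
    # start the slow work in a separate process...
    p = Process(target=slow_process)
    p.start()
    # ...and give the hidden iframe an actual body to render,
    # rather than an empty 204 response
    return 200, "<html><body>upload received</body></html>"
```

In Django terms, the last line would become something like `return HttpResponse("<html>...</html>")` from the view.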

Related

How to show a 'processing' or 'in progress' view while pyramid is running a process?

I've got a simple pyramid app up and running, most of the views are a fairly thin wrapper around an sqlite database, with forms thrown in to edit/add some information.
A couple of times a month a new chunk of data will need to be added to this system (by csv import). The data is saved in an SQL table (the whole process right till commit takes about 4 seconds).
Every time a new chunk of data is uploaded, this triggers a recalculation of other tables in the database. The recalculation process takes a fairly long time (about 21-50 seconds for a month's worth of data).
Currently I just let the browser/client sit there waiting for the process to finish, but I do foresee the calculation process taking more and more time as the system gets more usage. From a UI perspective, this obviously looks like a hung process.
What can I do to indicate to the user:
that the long wait is normal/expected?
how much longer they should have to wait (a progress bar etc.)?
Note: I'm not asking about long-polling or websockets here, as this isn't really an interactive application and based on my basic knowledge websockets/async are overkill for my purposes.
I guess a follow-on question at this point: am I doing the wrong thing running processes in my view functions? I hardly ever see that done in examples/tutorials around the web. Am I supposed to be using celery or similar in this situation?
You're right, doing long calculations in a view function is generally frowned upon - I mean, if it's a typical website with random visitors who are able to hang a webserver thread for a minute, then it's a recipe for a DoS vulnerability. But in some situations (internal website, few users, only admin has access to the "upload csv" form) you may get away with it. In fact, I used to have maintenance scripts which ran for hours :)
The trick here is to avoid browser timeouts - at the moment your client sends the data to the server and just sits there waiting for any reply, without any idea whether their request is being processed or not. Generally, at about 60 seconds the browser (or proxy, or frontend webserver) may become impatient and close the connection. Your server process will then get an error trying to write anything to the already closed connection and crash/raise an error.
To prevent this from happening the server needs to write something to the connection periodically, so the client sees that the server is alive and won't close the connection.
"Normal" Pyramid templates are buffered - i.e. the output is not sent to the client until the whole template is generated. Because of that you need to directly use response.app_iter / response.body_file and output some data there periodically.
As an example, you can duplicate the Todo List Application in One File example from Pyramid Cookbook and replace the new_view function with the following code (which itself has been borrowed from this question):
@view_config(route_name='new', request_method='GET', renderer='new.mako')
def new_view(request):
    return {}

@view_config(route_name='new', request_method='POST')
def iter_test(request):
    import time
    if request.POST.get('name'):
        request.db.execute(
            'insert into tasks (name, closed) values (?, ?)',
            [request.POST['name'], 0])
        request.db.commit()

    def test_iter():
        i = 0
        while True:
            i += 1
            if i == 5:
                yield '<p>Done! Click here to see the results</p>'
                return
            yield '<p>working %s...</p>' % i
            print(time.time())
            time.sleep(1)

    return Response(app_iter=test_iter())
(Of course, this solution is not too fancy UI-wise, but you said you didn't want to mess with websockets and celery.)
So is the long running process triggered by browser action? I.e., the user is uploading the CSV that gets processed and then the view is doing the processing right there? For short-ish running browser processes I've used a loading indicator via jQuery or javascript, basically popping a modal animated spinner or something while a process runs, then when it completes hiding the spinner.
But if you're getting into longer and longer processes I think you should really look at some sort of background processing that will offload it from the UI. It doesn't have to be a message based worker, but even something like the end user uploads the file and a "to be processed" entry gets set in a database. Then you could have a pyramid script scheduled periodically in the background polling the status table and running anything it finds. You can move your file processing that is in the view to a separate method, and that can be called from the command line script. Then when the processing is finished it can update the status table indicating it is finished and that feedback could be presented back to the user somewhere, and not blocking their UI the whole time.
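The "to be processed" table plus periodic background script can be sketched with the standard library alone (the `uploads` table, its columns, and the function name are invented here for illustration):

```python
import sqlite3

def process_pending(conn):
    """One pass of the background script: pick up rows marked 'pending',
    process them, and mark them 'done' so the UI can report progress."""
    rows = conn.execute(
        "SELECT id FROM uploads WHERE status = 'pending'").fetchall()
    for (upload_id,) in rows:
        # the real CSV parsing / table recalculation would happen here
        conn.execute(
            "UPDATE uploads SET status = 'done' WHERE id = ?", (upload_id,))
    conn.commit()
    return len(rows)
```

The script itself could be run from cron every minute, or kept alive in a loop with a `time.sleep()` between passes.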

Kill Django process from browser

Within a Django view I call a function for uploading and importing an excel file.
def import_log(request):
    report = ""
    if request.method == "POST":
        file_object = request.FILES
        sheet_name = request.POST["sheet_name"]
        if len(file_object):
            file_object = file_object["file_object"]
            if len(file_object):
                process_import()
            context = {
                "report": report
            }
            return render(request, "import_log.html", context)
        else:
            return import_upload_view(request, error="No file uploaded")
When I try to stop the page by clicking "Stop loading this page" or by closing the browser the import process does not stop.
These import files are pretty big so I would like to be able to kill the process from the browser when needed.
How would I be able to do this?
Put simply, you can't.
The internet works by sending requests to a server and then waiting for a response; it doesn't maintain an open connection to a process. It's the server's job to handle its own processes.
The browser is essentially nothing more than your computer's monitor, displaying the information sent to it - so you could turn your monitor off or pull the plug as much as you'd like, it's not going to stop your computer from running.
The only time Django/the server would know about the aborted connection is when it tries to send the response back. To see this, write a dummy view that sleeps for maybe 10 seconds; call it from the browser and stop it as fast as you can, then watch what Django does. If you are running the Django dev server, you'll see that Django behaves normally and sends the response, but a 500 error happens because of the aborted connection, and then an attempt is made to send that 500 error to the client, which of course fails too.
So, it is not possible to stop the view process in the middle.
But, you can change the way you approach this problem: first send the request to your view, then spin off a new process to do the "pretty big" import; register the process using some unique ID and the current timestamp in a persistent data store (probably a database); then return HTTP status code 202 (Accepted) with the registered ID to end the view.
In the spawned process, use multithreading: one thread continuously polls the database to check the delta between the current time and the timestamp in the database. If the difference exceeds the threshold you decide on (say 10 seconds), the whole process should kill itself.
From browser keep hitting an API (another Django view) using AJAX to update the timestamp in database for the particular record whose ID is the ID that you got in 202 response.
The idea is to let the server know that the client is still available; if for some reason you don't see any ping from the client, you treat it as the browser having been closed (or navigated away from the page) and stop working on the spawned process.
This approach may get tricky if you are building a single-page application.
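The heartbeat scheme above can be sketched as a small watchdog object (the class name and threshold are illustrative, not from the answer): the AJAX view calls `ping()`, and the worker periodically checks `expired()` to decide whether to kill itself.

```python
import time

class Watchdog:
    """Tracks the last client ping; the worker polls expired() periodically
    and abandons the job once the client has gone quiet for too long."""
    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.last_ping = time.time()

    def ping(self):
        # called by the AJAX "I'm still here" view on every heartbeat
        self.last_ping = time.time()

    def expired(self):
        # True once the gap since the last ping exceeds the threshold
        return time.time() - self.last_ping > self.timeout
```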

What does Django do when a request to the root comes?

I am programming a django web app. I don't understand how it works concurrently. Basically, what happens is that I have a page that takes 10 seconds to load (due to a lot of python computation being executed), and another page that takes about 1 second to load due to less python code to execute and immediately returning the index.html page.
This is the link that I provided in the routing.
localhost:3000/10secondpage
localhost:3000/1secondpage
I perform this action on my browser:
Open first browser to localhost:3000/10secondpage, then immediately open a second browser to localhost:3000/1secondpage
As I am only running it on localhost with 1 terminal, this was the behavior I expected.
Expected Behavior:
The python code executes the first browser's request and takes 10 second to complete, after it is done, it immediately starts the second browser's request and takes about 1 second to complete. As a result, the second browser needs to wait about 11 seconds in total as it needs to wait for the first browser's request to be completed first.
Actual Behavior:
However, the actual behaviour was that the second browser completed its request first despite being executed after the first browser. This suggests Django comes with some built-in process/thread spawning already.
Can someone please explain why does the actual behavior occur?
Put simply, it's threading.
Web requests do not depend on other requests to be able to be finished before you are able to execute your request, if it did, then posting an update to facebook would take hours/months/years before your post actually makes it live.
Django is no different. In order to process any number of requests that a page may receive at once, it must process them individually and concurrently. Of course, this can become much more complex very quickly with the introduction of load sharing and similar, but it comes down to the same answer.
You can take a look at the Handlers source code to see more in detail about what django does with this
Note: I haven't tried this but to observe your expected output you can run runserver with the --nothreading flag
manage.py runserver --nothreading
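The observed ordering can be reproduced without Django at all. With one worker thread per request, the fast request finishes first even though it started second (durations here are scaled down from 10s/1s):

```python
import threading
import time

finish_order = []

def fake_view(name, duration):
    time.sleep(duration)  # simulates the view's computation
    finish_order.append(name)

# the threaded dev server handles each request in its own thread
slow = threading.Thread(target=fake_view, args=("10secondpage", 0.3))
fast = threading.Thread(target=fake_view, args=("1secondpage", 0.05))
slow.start()
fast.start()
slow.join()
fast.join()
# finish_order is ["1secondpage", "10secondpage"]: the fast request
# did not have to wait behind the slow one
```

With `--nothreading`, requests are handled one at a time, which would restore the "11 seconds total" behavior the asker expected.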

Time out issues with chrome and flask

I have a web application which acts as an interface to an offsite server which runs a very long task. The user enters information and hits submit, and then chrome waits for the response, loading a new webpage when it receives it. However, depending on the network and the user's input, the task can take a pretty long time, and occasionally chrome loads a "no data received" page before the data is returned (though the task is still running).
Is there a way to put either a temporary page while my task is thinking or simply force chrome to continue waiting? Thanks in advance
While you could change your timeout on the server or other tricks to try to keep the page "alive", keep in mind that there might be other parts of the connection that you have no control over that could timeout the request (such as the timeout value of the browser, or any proxy between the browser and server, etc). Also, you might need to constantly up your timeout value if the task takes longer to complete (becomes more advanced, or just slower because more people use it).
In the end, this sort of problem is typically solved by a change in your architecture.
Use a Separate Process for Long-Running Tasks
Rather than submitting the request and running the task in the handling view, the view starts the running of the task in a separate process, then immediately returns a response. This response can bring the user to a "Please wait, we're processing" page. That page can use one of the many push technologies out there to determine when the task was completed (long-polling, web-sockets, server-sent events, an AJAX request every N seconds, or the dead-simplest: have the page reload every 5 seconds).
Have your Web Request "Kick Off" the Separate Process
Anyway, as I said, the view handling the request doesn't do the long action: it just kicks off a background process to do the task for it. You can create this background process dispatch yourself (check out this Flask snippet for possible ideas), or use a library like Celery or RQ.
Once the task is complete, you need some way of notifying the user. This will be dependent on what sort of notification method you picked above. For a simple "ajax request every N seconds", you need to create a view that handles the AJAX request that checks if the task is complete. A typical way to do this is to have the long-running task, as a last step, make some update to a database. The requests for checking the status can then check this part of the database for updates.
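A stripped-down sketch of the "task writes to a database, status view reads it" pattern, using the standard library's sqlite3 (the schema and function names are invented for illustration):

```python
import sqlite3

def create_task(conn):
    # the view that kicks off the background work inserts a row first
    cur = conn.execute("INSERT INTO tasks (status) VALUES ('running')")
    conn.commit()
    return cur.lastrowid

def finish_task(conn, task_id):
    # the long-running task does this as its last step
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
    conn.commit()

def task_status(conn, task_id):
    # the view behind the "AJAX request every N seconds" just reads the row
    row = conn.execute(
        "SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
    return row[0] if row else "unknown"
```

The AJAX endpoint would call `task_status()` and return JSON; the "please wait" page keeps polling until the status flips to done, then redirects.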
Advantages and Disadvantages
Using this method (rather than trying to fit the long-running task into a request) has a few benefits:
1.) Handling long-running web requests is a tricky business due to the fact that there are multiple points that could time out (besides the browser and server). With this method, all your web requests are very short and much less likely to timeout.
2.) Flask (and other frameworks like it) is designed to only support a certain number of threads that can respond to web queries. Assume it has 8 threads: if four of them are handling the long requests, that only leaves four threads to actually handle more typical requests (like a user getting their profile page). Half of your web server could be tied up doing something that is not serving web content! At worst, you could have all eight threads running a long process, meaning your site is completely unable to respond to web requests until one of them finishes.
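Point 2 is easy to demonstrate with a small thread pool (the pool size and sleep times are arbitrary): once every worker is busy with a long task, the next request simply waits for a slot.

```python
from concurrent.futures import ThreadPoolExecutor
import time

pool = ThreadPoolExecutor(max_workers=2)  # pretend the server has 2 threads
start = time.time()
# three "long requests" but only two workers: the third must wait for a slot
futures = [pool.submit(time.sleep, 0.2) for _ in range(3)]
for f in futures:
    f.result()
elapsed = time.time() - start
# elapsed is roughly 0.4s, not 0.2s: two tasks ran concurrently, but the
# third could not start until a worker freed up
```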
The main drawback: there is a little more set up work in getting a task queue up and running, and it does make your entire system slightly more complex. However, I would highly recommend this strategy for long-running tasks that run on the web.
I believe this is due to your web server (apache in most cases), whose timeout is too small. Try increasing this number.
For apache, have a look at the timeout option
EDIT: I don't think you can set this timeout in Chrome (see this topic on google forums, even though it's really old).
In firefox, on the about:config page, type timeout and you'll have some options you can set. I have no idea about Internet Explorer.
Let's assume:
This is not a server issue, so we don't have to go fiddle with Apache, nginx, etc. timeout settings.
The delay is minutes, not hours or days, just to make the scenario manageable.
You control the web page on which the user hits submit, and from which user interaction is managed.
If those obtain, I'd suggest not using a standard HTML form submission, but rather have the submit button kick off a JavaScript function to oversee processing. It would put up a "please be patient...this could take a little while" style message, then use jQuery.ajax, say, to call the long-time-taking server with a long timeout value. jQuery timeouts are measured in milliseconds, so 60000 = 60 seconds. If it's longer than that, increase your specified timeout accordingly. I have seen reports that not all clients will allow super-extra-long timeouts (e.g. Safari on iOS apparently has a 60-second limitation). But in general, this will give you a platform from which to manage the interactions (with your user, with the slow server) rather than being at the mercy of simple web form submission.
There are a few edge cases here to consider. The web server timeouts may indeed need to be adjusted upward (Apache defaults to 300 seconds aka 5 minutes, and nginx less, IIRC). Your client timeouts (on iOS, say) may have maximums too low for the delays you're seeing. Etc. Those cases would require either adjusting at the server, or adopting a different interaction strategy. But an AJAX-managed interaction is where I would start.

django,fastcgi: how to manage a long running process?

I have inherited a django+fastcgi application which needs to be modified to perform a lengthy computation (up to half an hour or more). What I want to do is run the computation in the background and return a "your job has been started" -type response. While the process is running, further hits to the url should return "your job is still running" until the job finishes at which point the results of the job should be returned. Any subsequent hit on the url should return the cached result.
I'm an utter novice at django and haven't done any significant web work in a decade so I don't know if there's a built-in way to do what I want. I've tried starting the process via subprocess.Popen(), and that works fine except for the fact it leaves a defunct entry in the process table. I need a clean solution that can remove temporary files and any traces of the process once it has finished.
I've also experimented with fork() and threads and have yet to come up with a viable solution. Is there a canonical solution to what seems to me to be a pretty common use case? FWIW this will only be used on an internal server with very low traffic.
I have to solve a similar problem now. It is not going to be a public site, but similarly, an internal server with low traffic.
Technical constraints:
all input data to the long running process can be supplied on its start
long running process does not require user interaction (except for the initial input to start a process)
the time of the computation is long enough so that the results cannot be served to the client in an immediate HTTP response
some sort of feedback (sort of progress bar) from the long running process is required.
Hence, we need at least two web “views”: one to initiate the long running process, and the other, to monitor its status/collect the results.
We also need some sort of interprocess communication: send user data from the initiator (the web server on http request) to the long running process, and then send its results to the receiver (again the web server, driven by http requests). The former is easy, the latter is less obvious. Unlike in normal unix programming, the receiver is not known initially. The receiver may be a different process from the initiator, and it may start when the long running job is still in progress or is already finished. So pipes do not work and we need some permanence of the results of the long running process.
I see two possible solutions:
dispatch launches of the long running processes to the long running job manager (this is probably what the above-mentioned django-queue-service is);
save the results permanently, either in a file or in DB.
I preferred to use temporary files and to remember their location in the session data. I don't think it can be made more simple.
A job script (this is the long running process), myjob.py:
import sys
from time import sleep

i = 0
while i < 1000:
    print('myjob:', i)
    i += 1
    sleep(0.1)
    sys.stdout.flush()
django urls.py mapping:
urlpatterns = patterns('',
    (r'^startjob/$', 'mysite.myapp.views.startjob'),
    (r'^showjob/$', 'mysite.myapp.views.showjob'),
    (r'^rmjob/$', 'mysite.myapp.views.rmjob'),
)
django views:
from tempfile import mkstemp
from os import fdopen, unlink, kill
from subprocess import Popen
import signal

from django.http import HttpResponse, HttpResponseRedirect

def startjob(request):
    """Start a new long running process unless already started."""
    if 'job' not in request.session:
        # create a temporary file to save the results
        outfd, outname = mkstemp()
        request.session['jobfile'] = outname
        outfile = fdopen(outfd, 'a+')
        proc = Popen("python myjob.py", shell=True, stdout=outfile)
        # remember pid to terminate the job later
        request.session['job'] = proc.pid
    return HttpResponse('A new job has started.')

def showjob(request):
    """Show the last result of the running job."""
    if 'job' not in request.session:
        return HttpResponse('Not running a job. Start a new one?')
    filename = request.session['jobfile']
    with open(filename) as results:
        lines = results.readlines()
    try:
        return HttpResponse(lines[-1] + '<p>Terminate?')
    except IndexError:
        return HttpResponse('No results yet.<p>Terminate?')

def rmjob(request):
    """Terminate the running job."""
    if 'job' in request.session:
        job = request.session['job']
        filename = request.session['jobfile']
        try:
            kill(job, signal.SIGKILL)  # unix only
            unlink(filename)
        except OSError:
            pass  # probably the job has finished already
        del request.session['job']
        del request.session['jobfile']
    return HttpResponseRedirect('/startjob/')  # start a new one
Maybe you could look at the problem the other way around.
Maybe you could try DjangoQueueService, and have a "daemon" listening to the queue, seeing if there's something new and process it.
