So I have a simple Python CGI script. The web front end is used to add records to a database, and I have an update() function that does some cleanup.
I want to run the update() function every time something is added to the site, but it needs to run in the background. That is, the webpage should finish loading without waiting for the update() function to finish.
Right now I do:
- add stuff to the db
- Thread(target=update).start()
- redirect to the index page
The problem seems to be that Python does not want to finish the request (the redirect) until the update() thread is done.
Any ideas?
That is, the webpage should finish loading without waiting for the update() function to finish
CGI has to wait for the process -- as a whole -- to finish. Threads aren't helpful.
You have three choices.
subprocess. Spawn a separate "no wait" subprocess to do the update, passing all the information it needs as command-line parameters (see the sketch after this list).
multiprocessing. Have your CGI place a work request on a Queue. You'd start a separate listener process that handles the update requests from the Queue.
celery. Download Celery and use it to manage the separate worker process that does the background processing.
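Here is a minimal sketch of the subprocess option, assuming the cleanup lives in a script named cleanup.py that takes the record id on the command line (the script name and argument are illustrative):

import os
import subprocess
import sys

def start_background_update(record_id):
    # Launch a separate worker process and return immediately so the CGI
    # request can finish. Redirecting stdio to /dev/null keeps the web
    # server from waiting on file handles inherited by the child.
    devnull = open(os.devnull, "r+b")
    subprocess.Popen(
        [sys.executable, "cleanup.py", str(record_id)],  # cleanup.py is hypothetical
        stdin=devnull,
        stdout=devnull,
        stderr=devnull,
        close_fds=True,
    )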
You could add a database trigger that updates the db in response to an event, e.g., when a specific column has changed.
Or start a subprocess, e.g., subprocess.Popen([sys.executable, '-c', "from m import update; update()"]). It might not work depending on your CGI environment.
Or just touch an update file to be picked up by an inotify script that runs the necessary updates in a separate process (see the sketch below).
Or switch to a different execution environment, e.g., some multithreaded WSGI server.
As a heavyweight option you could use Celery, if it is easy to deploy in your environment.
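A minimal sketch of the marker-file idea, using a plain polling loop instead of inotify (the marker path and the update() import are assumptions):

# watcher.py -- run this as a separate, long-lived process
import os
import time

MARKER = "/tmp/needs_update"   # path is an assumption

while True:
    if os.path.exists(MARKER):
        os.remove(MARKER)
        from m import update   # same import as in the Popen example above
        update()
    time.sleep(1)

In the CGI script, just create the marker and return immediately:

open(MARKER, "w").close()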
I've written a script that uses the Paramiko library to log on to a server and execute a command. This command actually invokes the server to execute another Python script (resulting in a child process, I believe). I believe the server returns a signal indicating that the command was executed successfully, but it doesn't seem to wait for the new child process to complete, only for the original parent process. Is there any way to reference any/all child processes that were generated as a result of this command and wait until they have all completed before returning control to the initiating client?
Many thanks.
Without the code this will be difficult. I think you should create a REST service. You would POST to http://0.0.0.0/runCode, which would kick off the process in a different thread and then end that call. The thread is still running; when it is done, it POSTs to http://0.0.0.0/afterProcessIsDone with the response from the work that was kicked off. In that route you can then do whatever you want with that response. If you need help with REST, check out Flask. It's pretty easy and straight to the point for small projects. A rough sketch is below.
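A rough Flask sketch of that idea (route names mirror the answer; the long-running work is a placeholder, and the requests library is assumed to be available for the callback POST):

import threading
import requests                      # assumed available for the callback POST
from flask import Flask, request

app = Flask(__name__)

def do_work(payload):
    # Placeholder for the real long-running script.
    return {"echo": payload}

def long_running_job(payload):
    result = do_work(payload)
    # Report back when finished instead of making the caller wait.
    requests.post("http://0.0.0.0/afterProcessIsDone", json={"result": result})

@app.route("/runCode", methods=["POST"])
def run_code():
    threading.Thread(target=long_running_job, args=(request.get_json(),)).start()
    return "started", 202            # return immediately; the job keeps running

@app.route("/afterProcessIsDone", methods=["POST"])
def after_process_is_done():
    data = request.get_json()        # handle the finished job's result here
    return "ok"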
I currently have a working Python application with a wxPython GUI. I send this application a folder, which then gets processed by a command-line application via Popen. Each time I run this, it takes about 40+ minutes to finish. While a single job is processing I would like to queue up another job. I don't want to submit multiple jobs at the same time; I want to submit one job, and while it's processing submit another, so that when the first one finishes it just processes the next, and so on. I am unsure of how to go about this and would appreciate some suggestions.
Presumably you either get a notification passed back to the GUI when the task has finished, or the GUI checks the state of the task periodically. In either case you can let the user simply add to a list of directories to be processed; when your Popen task has finished, take the first one off the list and start a new Popen task (remembering to remove the started one from the list). A sketch follows.
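A sketch of that pending-list idea (the "processor" command is illustrative; how you detect completion depends on your GUI code, e.g. a wx.Timer that calls poll()):

import subprocess

pending = []          # directories waiting to be processed
current = None        # the Popen object for the running job, if any

def submit(folder):
    # Called from the GUI when the user adds a job.
    pending.append(folder)
    start_next_if_idle()

def start_next_if_idle():
    global current
    if current is None and pending:
        folder = pending.pop(0)
        current = subprocess.Popen(["processor", folder])   # command is illustrative

def poll():
    # Call this periodically, e.g. from a wx.Timer.
    global current
    if current is not None and current.poll() is not None:  # job finished
        current = None
        start_next_if_idle()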
Use subprocess.call() instead of Popen, or use Popen.wait().
I have a tool that I am working on, and I need it to run a parser and also output another analysis log. Currently it is driven through a web interface.
The user goes to the form and submits a filename for parsing (the file is already on the system).
The form submits the information to a Python CGI script.
The Python CGI script runs and spawns a subprocess to run the parsing.
The parser finds the appropriate information for analysis and also spawns a subprocess.
I am using
import subprocess
...
subprocess.Popen(["./program.py", input])
in my code, and I assumed from the documentation that we don't wait for the child process to terminate; we just keep running the script. My CGI script that starts all this does:
subprocess.Popen(["./program.py", input])
# HTML generation code
# JavaScript to refresh after 1 second to a different page
The HTML generation code just outputs a status saying we've processed the request, and then the JavaScript refreshes the page to the main homepage.
The Problem
The CGI page hangs until the subprocesses finish, which is not what I want. I thought Popen doesn't wait for the subprocesses to finish, but whenever I run this tool it stalls until they complete. I want the script to finish and let the subprocesses run in the background, so the webpages still function properly without the user thinking everything is just stalled behind loading indicators.
I can't seem to find any reason why Popen would do this because everywhere I read it says it does not wait, but it seems to.
Another odd thing is that the Apache logs show "Request body read timeout" before the script completes. Is Apache actually stalling the script, then?
Sorry I can't show complete code as it's "confidential" but hopefully the logic is there to be understood.
Apache probably waits for the child process to complete. You could try to daemonize the child (double fork, setsid), or better, just submit the job to a local service, e.g., by writing to a predefined file, using some message broker, or via a higher-level interface such as Celery. A daemonization sketch is below.
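A minimal double-fork sketch, assuming the long-running work is a plain Python callable (Unix only, error handling omitted):

import os
import sys

def run_detached(job):
    # First fork: the parent (the CGI script) returns immediately.
    if os.fork() > 0:
        return
    os.setsid()                      # new session, detach from the controlling terminal
    # Second fork: the session leader exits so the grandchild can never
    # reacquire a controlling terminal.
    if os.fork() > 0:
        os._exit(0)
    # Point inherited stdio at /dev/null so Apache is not kept waiting on the pipes.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    try:
        job()                        # the actual long-running work
    finally:
        os._exit(0)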
Not sure exactly why this works but I followed the answer in this thread:
How do I run another script in Python without waiting for it to finish?
To do:
p = subprocess.Popen([sys.executable, '/path/to/script.py'],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
Instead of:
p = subprocess.Popen([sys.executable, '/path/to/script.py'])
And for some reason now the CGI script terminates and the subprocesses keep running.
Any insight as to why there is a difference would be helpful. I don't see why having to define those two extra parameters would cause such a stall.
I need to handle a large (time- and memory-consuming) process asynchronously in a web2py application, called inside a controller method.
My specific use case is to call a process via stdlib.subprocess and wait for it to exit without blocking the web server, but I am open to alternative methods.
Hands-on examples would be a plus.
3rd-party library recommendations are welcome.
CRON scheduling is not required/wanted.
Assuming you'll need to start multiple, possibly simultaneous, instances of the background task, the solution is a task queue. I've heard good things about Celery and RabbitMQ if you're looking for 3rd-party options, and web2py includes its own task queue system that might be sufficient for your needs.
With either tool, you'll define a function that encapsulates the operation you want the background process to perform. Then bring the task queue workers online. The web2py manual and forums indicate this can be done with an @reboot statement in the web2py cron system, which is triggered whenever the web server starts. There are probably other ways to start the workers if this is unsatisfactory.
In your controller you'll insert a task into the task queue, passing any necessary parameters as inputs to the function (the background function will not run in the same environment as the controller, so it won't have access to the session, the DB, etc. unless you explicitly pass the appropriate values into the task function).
Now, to get the output of the background operation to the user: when you insert a task into the task queue, you should get back a unique ID for the task. You would then implement controller logic (either something that expects an AJAX call, or a page that keeps refreshing until the task completes) that calls the task queue's API to check the status of the specified task. If the task's status is "finished", return the data to the user. If not, keep waiting. A sketch using web2py's built-in scheduler is below.
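A rough sketch using web2py's built-in scheduler (function names are illustrative, db and request are web2py's usual globals, and the exact scheduler API may differ between web2py versions):

# in a model file, e.g. models/scheduler.py
from gluon.scheduler import Scheduler

def heavy_task(record_id):
    # The long-running work goes here; it runs in a separate worker
    # process, so pass in everything it needs as plain arguments.
    return 'done'

scheduler = Scheduler(db)

# in a controller
def start_job():
    task = scheduler.queue_task(heavy_task, pvars=dict(record_id=request.vars.id))
    return dict(task_id=task.id)

def check_job():
    status = scheduler.task_status(request.vars.task_id, output=True)
    if status and status.scheduler_task.status == 'COMPLETED':
        return dict(done=True, result=status.result)
    return dict(done=False)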
Maybe review the book section on running tasks in the background. You can use the new scheduler or create a homemade queue (email example). There's also a web2py-celery plugin, though I'm not sure what state that is in.
This is more difficult than one might expect. Note the deadlock warnings in the stdlib subprocess documentation. It's easy if you don't mind blocking: use Popen.communicate. To work around the blocking, you can manage the process using subprocess from a thread, as in the sketch below.
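A small sketch of the thread approach (the command and the completion callback are assumptions):

import subprocess
import threading

def run_in_background(cmd, on_done):
    # communicate() blocks, but only inside this worker thread,
    # so the web request is free to return immediately.
    def worker():
        proc = subprocess.Popen(cmd,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()       # avoids the PIPE deadlock warned about above
        on_done(proc.returncode, out, err)  # e.g. store results where the app can read them
    threading.Thread(target=worker).start()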
My favorite way to deal with subprocesses is to use Twisted's spawnProcess. But, it is not easy to get Twisted to play nicely with other frameworks.
I've been searching for an answer to this for a while; it's possible that I haven't been searching for the right information, though.
I'm trying to send data to a server, and once it is received the server executes a Python script based on that data. I have been trying to spawn a thread and return, but I can't figure out how to "detach" the thread; I simply have to wait until the thread returns before I can return an HttpResponse(). This is unacceptable, as the website interface has many other things that need to remain usable while the thread runs on the server.
I'm not certain that was a clear explanation but I'll be more than happy to clarify if any part is confusing.
Have a look at Celery. It's quite nice in that you can accept the request, offload it quickly to workers, and return. It's simple to use.
http://celeryproject.org/
Most simply, you can do this with subprocess.Popen. See here for some information regarding the subprocess module:
http://docs.python.org/library/subprocess.html
There are other (possibly better) methods of doing this, but this one seems to fit your requirements; a minimal sketch is below.
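A minimal sketch of that in a Django view (the worker script path is an assumption):

import os
import subprocess
import sys
from django.http import HttpResponse

def start_processing(request):
    # Fire and forget: the view returns immediately while the script keeps running.
    subprocess.Popen([sys.executable, "/path/to/worker.py"],   # path is illustrative
                     stdout=open(os.devnull, "w"),
                     stderr=subprocess.STDOUT)
    return HttpResponse("Processing started")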
Use a message queue system like Celery (django-celery may help you).
Or use an RDBMS and background process(es), either periodically invoked by cron or always running.
First, the web server inserts the data required by the background job into a database table. Then the background process (always running, or run periodically by cron) picks up the latest inserted row(s) and processes them. A sketch is below.
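A bare-bones sketch of the job-table idea using sqlite3 (the table, columns, and process() work are made up for illustration):

import sqlite3
import time

DB_PATH = "jobs.db"                  # path is illustrative

def process(payload):
    # Placeholder for the real long-running work.
    pass

# enqueue side (inside the web request handler)
def enqueue_job(payload):
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS jobs "
                 "(id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0)")
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    conn.close()

# worker side: a separate always-running process (or run one pass from cron)
def worker_loop():
    conn = sqlite3.connect(DB_PATH)
    while True:
        row = conn.execute("SELECT id, payload FROM jobs WHERE done = 0 "
                           "ORDER BY id LIMIT 1").fetchone()
        if row:
            job_id, payload = row
            process(payload)
            conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
            conn.commit()
        else:
            time.sleep(5)            # nothing to do; check again later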
Spawn a thread.
import threading

worker_thread = threading.Thread(target=do_background_job, args=args)
worker_thread.setDaemon(False)   # non-daemon thread: keeps running after the response is returned
worker_thread.start()
return HttpResponse()
Even after the HttpResponse is sent, do_background_job keeps running. However, because the web server (Apache) may kill threads, execution of do_background_job is not guaranteed to complete.