how to get a real callback from os in python3

I have written ActionScript and JavaScript, where adding a callback to invoke a piece of code is an everyday thing.
But when it comes to Python, it does not seem to be such an easy job. I hardly ever see anything written in callback style. I mean a real callback, not a fake one. Here's a fake callback example:
Given a list of files to download, you can write:
urls = []
def downloadfile(url, callback):
    # download the file
    callback()
def downloadNext():
    if urls:
        downloadfile(urls.pop(), downloadNext)
downloadNext()
This works, but it will eventually hit the maximum recursion limit, while a real callback won't.
A real callback, as far as I understand, can't come from the program itself; it must come from the physical world, such as the CPU clock or a hardware I/O state change. That raises an interrupt: the CPU suspends the current flow of execution and checks whether the runtime has registered any code for this interrupt, and if so, runs it. The OS wraps this as a signal, an event, or something else, and finally passes it to the application. (If I'm wrong, please point it out.) This avoids piling up the function call stack until it overflows; otherwise you drop into infinite recursion.
There is something like coroutines in Python for handling multiple tasks, but they must be used very carefully: if any one routine blocks, all tasks are blocked.
There are some third-party libs like twisted or gevent, but they seem very troublesome to get and install, are platform limited, and are not well supported in Python 3. That's not good for writing a simple app and distributing it.
multiprocessing: heavy, and only works on Linux.
threading: because of the GIL, never the first choice, and it seems to be a pseudo one.
Why doesn't Python provide an implementation in the standard library? And is there another easy way to get the real callback I want?

Your example code is just a complicated way of sequentially downloading all files.
If you really want to do asynchronous downloading, using a multiprocessing.Pool, especially the Pool.map_async member function, is the best way to go. Note that this uses callbacks.
According to the documentation for multiprocessing:
"It runs on both Unix and Windows."
But it is true that multiprocessing on MS Windows has some extra restrictions.
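As a rough illustration of the callback style that map_async offers, here is a minimal sketch. It uses multiprocessing.pool.ThreadPool, which exposes the same map_async API as multiprocessing.Pool but runs the work in threads, so the example does not depend on pickling or on a __main__ guard; for process-based workers, multiprocessing.Pool can be swapped in. The names download, download_all and on_done are hypothetical, and download is only a stand-in that does no real network I/O.

```python
from multiprocessing.pool import ThreadPool


def download(url):
    # Hypothetical stand-in for a real download: returns the URL and its length.
    return url, len(url)


def download_all(urls, workers=4):
    finished = []

    def on_done(results):
        # Called once by the pool, with the full list of results,
        # after every item has been processed.
        finished.extend(results)

    with ThreadPool(workers) as pool:
        async_result = pool.map_async(download, urls, callback=on_done)
        async_result.wait()  # block here only so the sketch can return the results
    return finished
```

The callback is invoked by the pool's own machinery rather than by download itself, so the call stack does not grow with the number of URLs.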

Related

subprocess.call does not wait for the process to complete

Per the Python documentation, subprocess.call should be blocking and wait for the subprocess to complete. In this code I am trying to convert a few xls files to a new format by calling LibreOffice on the command line. I assumed that the call to subprocess.call is blocking, but it seems I need to add an artificial delay after each call, otherwise a few files are missing from the out directory.
What am I doing wrong, and why do I need the delay?
from subprocess import call
for i in range(0,len(sorted_files)):
    args = ['libreoffice', '-headless', '-convert-to',
            'xls', "%s/%s.xls" %(sorted_files[i]['filename'],sorted_files[i]['filename']), '-outdir', 'out']
    call(args)
    var = raw_input("Enter something: ") # if I comment out this line, I don't get all the files in the out directory
EDIT: It might be hard to find the answer through the comments below. I used unoconv for document conversion, which is blocking and easy to work with from a script.

It's possible, even likely, that libreoffice is implemented as some sort of daemon/intermediary process. The "daemon" will (effectively [1]) parse the command line and then farm the work off to some other process, possibly detaching it so that it can exit immediately. (Based on the -invisible option in the documentation, I strongly suspect that this is indeed the case here.)
If this is the case, then your subprocess.call does do what it is advertised to do: it waits for the daemon to complete before moving on. However, it doesn't do what you want, which is to wait for all of the work to be completed. The only option you have in that scenario is to look to see if the daemon has a -wait option or similar.
[1] It is likely that we don't have an actual daemon here, only something which behaves similarly. See comments by abernert.

The problem is that the soffice command-line tool (which libreoffice is either just a link to, or a further wrapper around) is just a "controller" for the real program soffice.bin. It finds a running copy of soffice.bin and/or creates one, tells it to do some work, and then quits.
So, call is doing exactly the right thing: it waits for libreoffice to quit.
But you don't want to wait for libreoffice to quit, you want to wait for soffice.bin to finish doing the work that libreoffice asked it to do.
It looks like what you're trying to do isn't possible to do directly. But it's possible to do indirectly.
The docs say that headless mode:
… allows using the application without user interface.
This special mode can be used when the application is controlled by external clients via the API.
In other words, the app doesn't quit after running some UNO strings/doing some conversions/whatever else you specify on the command line; it sits around waiting for more UNO commands from outside, while the launcher just quits as soon as it sends the appropriate commands to the app.
You probably have to use that above-mentioned external control API (UNO) directly.
See Scripting LibreOffice for the basics (although there's more info there about internal scripting than external), and the API documentation for details and examples.
But there may be an even simpler answer: unoconv is a simple command-line tool written using the UNO API that does exactly what you want. It starts up LibreOffice if necessary, sends it some commands, waits for the results, and then quits. So if you just use unoconv instead of libreoffice, call is all you need.
Also notice that unoconv is written in Python, and is designed to be used as a module. If you just import it, you can write your own (simpler, and use-case-specific) code to replace the "Main entrance" code, and not use subprocess at all. (Or, of course, you can tear apart the module and use the relevant code yourself, or just use it as a very nice piece of sample code for using UNO from Python.)
Also, the unoconv page linked above lists a variety of other similar tools, some that work via UNO and some that don't, so if it doesn't work for you, try the others.
If nothing else works, you could consider, e.g., creating a sentinel file and using a filesystem watch, so at least you'll be able to detect exactly when it's finished its work, instead of having to guess at a timeout. But that's a real last-ditch workaround that you shouldn't even consider until eliminating all of the other options.
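For that last-ditch route, here is a minimal sketch of waiting for an expected output file. It polls with os.path.exists instead of using a true filesystem watch (a deliberate simplification that needs only the standard library), and the function name wait_for_file and its parameters are hypothetical. Note that a file existing does not prove the writer has finished with it, which is one more reason to prefer the options above.

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll_interval=0.2):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)
    # One final check in case the file appeared during the last sleep.
    return os.path.exists(path)
```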

If libreoffice is using an intermediary (daemon) as mentioned by @mgilson, then one solution is to find out what program it's invoking, and then directly invoke it yourself.

Is there a simple way to launch a background task from a Python CGI script without waiting around for it to terminate?

In Windows, that is.
I think the answer to this question is that I need to create a Windows service. This seems ludicrously heavyweight for what I am trying to do.
I'm just trying to slap together a little prototype here for my manager, I'm not going to be responsible for productizing it... in fact, it may never even BE productized; it might just be something that a few researchers play around with.
I have a CGI script that receives a file for upload, stores it to a temporary location, then launches a background process to do some serious number-crunching on the file. Then some Javascript stuff sits around calling other CGI scripts to check on the status and update the page as needed.
All of this works, except the damn web server won't close the connection as long as the subprocess is running. I've done some searching, and it appears the answer on Unix is to make it a daemon, but I'm stuck on Windows right now and I guess the answer there is to make it a Windows service?!? This seems incredibly heavyweight to just, you know, launch a damn process and then close the server connection.
That's really the only way?
Edit: Okay, found a nifty little hack over here (the choice (3) that the guy gives):
How to completely background a process in Perl CGI under IIS
I was able to modify this to make it even simpler, and although this is a klugey solution, it is perfect for the quick-and-dirty little prototype I am trying to make.
So I initially had my main script doing this:
subprocess.Popen(["python.exe", "myscript.py", "arg1", "arg2"])
Which doesn't work, as I've described. Instead, I now have my main script emit this little bit of Javascript which runs after the document is fully loaded:
$("#somecrap").load("launchBackgroundProcess.py", {arg1:"foo",arg2:"bar"});
And then launchBackgroundProcess.py does the subprocess.Popen.
This solution would never scale, since it still leaves the browser connection open during the entire time the background task is running. But since this little thinger I am whipping up might someday have two simultaneous users at most (even then I doubt it) resources are not a concern. This allows the user to see the main page and get the Javascript updates even though there is still an http connection hanging open for no good reason.
Thanks for the answers! If I'm ever asked to productize this, I'll take a look at the resources Profane recommends.

If you haven't much experience with Windows programming and don't wish to peruse the MSDN docs (I don't blame you), you may want to try to pick up a copy of Mark Hammond's canonical guide to all things Python and Windows. It somehow never goes out of date on many of these sorts of recurring questions. Instead of launching the process with the every-platform solution, you'd probably be better off using the win32process module. Chapter 17 of the Hammond book covers this extensively, but you could probably get all you need by downloading the pywin IDE (I think it comes bundled in the Windows extensions, which you can download from PyPI) and looking through its help docs on Python's Windows API. Here's an example of using the API, from a project I was working on recently. It may in fact do some of what you want with a little adaptation. You'd probably want to focus on CreationFlags. In particular, win32process.DETACHED_PROCESS is "often used to execute console programs in the background." Many other flags are available and conveniently wrapped, however.
if subprocess.mswindows:
    su=subprocess.STARTUPINFO()
    su.dwFlags |= subprocess._subprocess.STARTF_USESHOWWINDOW
    process = subprocess.Popen(['program', 'flag', 'flag2'], bufsize=-1,
                               stdout=subprocess.PIPE, startupinfo=su)

The simplest, but not the most efficient, way would be to just run another Python executable:
from subprocess import Popen
Popen("python somescript.py")

You can just use a system call using the "start" windows command. This way your python script will not wait for the completion of the started program.

CGI scripts are run with standard output redirected, either directly to the TCP socket or to a pipe. Typically, the connection won't close until the handle, and all copies of it, are closed. By default, the subprocess will inherit a copy of the handle.
There are two ways to prevent the connection from waiting on the subprocess. One is to prevent the subprocess from inheriting the handle, the other is for the subprocess to close its copy of the handle when it starts.
If the subprocess is in Perl, I think you could close the handle very simply:
close(STDOUT);
If you want to prevent the subprocess from inheriting the handle, you could use the SetHandleInformation function (if you have access to the Win32 API) or set bInheritHandles to FALSE in the call to CreateProcess. Alternatively, close the handle before launching the subprocess.
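From Python, the "don't let the child inherit the handle" approach can be sketched roughly as follows. This is an assumption-laden example, not a tested CGI recipe: the helper name launch_detached is hypothetical, subprocess.DEVNULL requires Python 3.3+, and subprocess.DETACHED_PROCESS exists only on Windows builds of Python 3.7+, so the sketch falls back to no creation flags elsewhere.

```python
import subprocess
import sys


def launch_detached(args):
    """Start `args` without handing it this process's standard handles."""
    # DETACHED_PROCESS is only defined on Windows; use 0 (no flags) elsewhere.
    flags = getattr(subprocess, "DETACHED_PROCESS", 0)
    return subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,   # the child gets its own null handles
        stdout=subprocess.DEVNULL,  # instead of copies of the CGI script's
        stderr=subprocess.DEVNULL,  # stdout pipe or socket
        close_fds=True,             # do not inherit other open handles
        creationflags=flags,
    )
```

A CGI script could call launch_detached([sys.executable, "myscript.py", "arg1", "arg2"]) and then finish its response; whether a particular web server then closes the connection still needs to be checked in practice.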

Numeric GUI bottleneck

I've made a GUI to set up and start a numerical integrator using PyQT4, Wing, QT, and Python 2.6.6, on my Mac. The thing is, when I run the integrator from the GUI, it takes many times longer than when I crudely run the integrator from the command line.
As an example, a 1000 year integration took 98 seconds on the command line and ~570 seconds from the GUI.
In the GUI, the integration runs from a thread and then returns. It uses a queue to communicate back to the GUI.
Does anyone have any ideas as to where the bottleneck is? I suspect that others may be experiencing something like this just on a smaller scale.
t = threading.Thread(target=self.threadsafe_start_thread, args=(self.queue, self.selected))
t.start()

In general it is not a good idea to use python threads within a pyqt application. Instead use a QThread.
Both python and QThreads call the same underlying mechanisms, but they don't play well together. I don't know if this will solve your problem or not, but it might be part of the issue.

Is your thread code mostly Python code? If yes, then you might be a victim of the Global Interpreter Lock.

python queue concurrency process management

The use case is as follows :
I have a script that runs a series of non-Python executables to reduce (pulsar) data. Right now I use subprocess.Popen(..., shell=True) and then the communicate function of subprocess to capture the standard out and standard error from the non-Python executables, and I log the captured output using the Python logging module.
The problem is: most of the time, just one core of the possible 8 gets used.
I want to spawn multiple processes, each doing a part of the data set in parallel, and I want to keep track of progress. It is a script / program to analyze data from a low-frequency radio telescope (LOFAR). The easier to install / manage and test, the better.
I was about to build code to manage all this, but I'm sure it must already exist in some easy library form.

The subprocess module can start multiple processes for you just fine, and keep track of them. The problem, though, is reading the output from each process without blocking any other processes. Depending on the platform there are several ways of doing this: using the select module to see which process has data to be read, setting the output pipes non-blocking using the fcntl module, or using threads to read each process's data (which subprocess.Popen.communicate itself uses on Windows, because it doesn't have the other two options). In each case the devil is in the details, though.
Something that handles all this for you is Twisted, which can spawn as many processes as you want, and can call your callbacks with the data they produce (as well as other situations.)
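Of the standard-library options listed above, the thread-per-pipe one is the most portable. A minimal sketch, assuming Python 3.7+ (for text=True); the names reader and run_all are hypothetical, and stderr is merged into stdout to keep the example short:

```python
import queue
import subprocess
import sys
import threading


def reader(name, pipe, out_q):
    # Runs in its own thread, so a slow process never blocks the others.
    for line in pipe:
        out_q.put((name, line.rstrip("\n")))
    pipe.close()


def run_all(commands):
    """Run every command in `commands` (name -> argv list) concurrently
    and return all (name, line) pairs they printed."""
    out_q = queue.Queue()
    procs, threads = [], []
    for name, cmd in commands.items():
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        thread = threading.Thread(target=reader, args=(name, proc.stdout, out_q))
        thread.start()
        procs.append(proc)
        threads.append(thread)
    for thread in threads:
        thread.join()
    for proc in procs:
        proc.wait()
    lines = []
    while not out_q.empty():
        lines.append(out_q.get())
    return lines
```

In a real script the main thread would drain the queue while the processes are still running and pass each line to the logging module, rather than collecting everything at the end.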

Maybe Celery will serve your needs.

If I understand correctly what you are doing, I might suggest a slightly different approach. Try establishing a single unit of work as a function and then layer on the parallel processing after that. For example:
1. Wrap the current functionality (calling subprocess and capturing output) into a single function. Have the function create a result object that can be returned; alternatively, the function could write out to files as you see fit.
2. Create an iterable (list, etc.) that contains an input for each chunk of data for step 1.
3. Create a multiprocessing Pool and then capitalize on its map() functionality to execute your function from step 1 for each of the items in step 2. See the python multiprocessing docs for details.
You could also use a worker/Queue model. The key, I think, is to encapsulate the current subprocess/output capture stuff into a function that does the work for a single chunk of data (whatever that is). Layering on the parallel processing piece is then quite straightforward using any of several techniques, only a couple of which were described here.
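The three steps might look roughly like this. It is a sketch under stated assumptions: reduce_chunk and reduce_all are hypothetical names, the command is a placeholder that merely upper-cases its argument instead of running a real reduction executable, and multiprocessing.pool.ThreadPool is used in place of multiprocessing.Pool because each unit of work already runs in its own child process, so threads are enough to keep several cores busy. Both classes share the same map() interface.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def reduce_chunk(chunk):
    # Step 1: one unit of work. The command is a placeholder for the
    # real non-Python executable.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", chunk]
    done = subprocess.run(cmd, capture_output=True, text=True)
    return chunk, done.returncode, done.stdout.strip(), done.stderr.strip()


def reduce_all(chunks, workers=4):
    # Step 3: map the work function over the inputs from step 2.
    with ThreadPool(workers) as pool:
        return pool.map(reduce_chunk, chunks)
```

Each returned tuple carries the captured output, so the caller can log it and track progress per chunk; pool.imap_unordered would deliver results as they finish instead of all at once.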

Render in infinity loop

Question for Python 2.6
I would like to create a simple web application which, at a specified time interval, will run a script that modifies the data (in a database). My problem is the code for the infinite loop, or some other method to achieve this goal. The script should be started only once by the user. The next iterations should run automatically, even when the user leaves the application. If someone has an idea for a method of detecting when the app breaks, it would be great to show it too. I think that threads may be the best way to achieve that. Unfortunately, I have just started my adventure with Python and don't know yet how to use them.
The application will also have views for showing the database and for controlling the loop script.
Any ideas?

You mentioned that you're using Google App Engine. You can schedule recurring tasks by placing a cron.yaml file in your application folder. The details are here.
Update: It sounds like you're not looking for GAE-specific solutions, so the more general advice I'd give is to use the native scheduling abilities of whatever platform you're using. Cron jobs on a *nix host, scheduled tasks on Windows, cron.yaml on GAE, etc.
In your other comments you've suggested wanting something in Python that doesn't leave your script executing, and I don't think there's any way to do this. Some process has to be responsible for kicking off whatever it is you need done, so either you do it in Python and keep a process executing (even if it's just sleeping), or you use the platform's scheduling tools. The OS is almost guaranteed to do a better job of this than your code.

I think you'd want to use cron. Write your script, and have cron run it every X minutes / hours.
If you really want to do this in Python, you can do something like this:
import time
while True:
    # <your app logic here>
    time.sleep(TIME_INTERVAL)

Can you use cron to schedule the job to run at certain intervals? It's usually considered better than infinite loops, and was designed to help solve this sort of problem.

There's a very primitive cron in the Python standard library: import sched. There's also threading.Timer.
But as others say, you probably should just use the real cron.
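For completeness, a minimal sketch of the sched approach. The helper name run_periodically and its parameters are hypothetical; the job re-arms itself through the scheduler rather than calling itself directly, so the call stack does not grow between runs.

```python
import sched
import time


def run_periodically(job, interval, times):
    """Call `job()` `times` times, `interval` seconds apart, then return."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)

    def tick(remaining):
        job()
        if remaining > 1:
            # Schedule the next run; the scheduler, not tick, will call it.
            scheduler.enter(interval, 1, tick, (remaining - 1,))

    scheduler.enter(0, 1, tick, (times,))
    scheduler.run()  # blocks until no events are left
```

This still keeps a Python process alive for the whole time, which is exactly the drawback that makes the real cron preferable.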
