When should I use Popen() and when should I use call()? - python

Caveat: new to Python.
Wanting to hear from professionals who actually use it:
What are the main differences between subprocess.Popen() and subprocess.call() and when is it best to use each one?
Unless you want to read why I was thinking about this question or what to center your answer around, you may stop reading now.
I was inspired to ask because I am working through an issue in a script: I started a process with subprocess.Popen() that eventually called a system pause, and then I wanted to delete the .exe that created the pause. With Popen(), however, the commands all seemed to run together (the delete of the .exe is executed before the .exe has closed), even though I tried adding communicate().
Here is fake code for what I'm describing above:
subprocess.Popen(r'type pause.exe > c:\worker.exe', shell=True).communicate()
subprocess.Popen(r'c:\worker.exe', shell=True).communicate()
subprocess.Popen(r'del c:\worker.exe', shell=True).communicate()

subprocess.call(*popenargs, **kwargs)
Run command with arguments. Wait for command to complete, then return the returncode attribute.
If you create a Popen object (say, sp), you must call sp.wait() yourself.
If you use call, that's done for you.
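To make the difference concrete, here is a minimal sketch. The child command is a stand-in (a Python one-liner that exits with status 3), not anything from the question; the names cmd, rc_call and rc_popen are made up for illustration.

```python
import subprocess
import sys

# A stand-in child command: a Python one-liner that exits with status 3.
cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

# call() starts the child, waits for it, and returns its exit status.
rc_call = subprocess.call(cmd)

# Popen() returns immediately; waiting is a separate, explicit step.
proc = subprocess.Popen(cmd)
rc_popen = proc.wait()

print(rc_call, rc_popen)
```

Both paths end with the same exit status; the only difference is who does the waiting.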

Related

Starting process in Google Colab with Prefix "!" vs. "subprocess.Popen(..)"

I've been using Google Colab for a few weeks now and I've been wondering what the difference is between the two following commands (for example):
!ffmpeg ...
subprocess.Popen(['ffmpeg', ...
I was wondering because I ran into some issues when I started either of the commands above and then tried to stop execution midway. Both of them cancel on KeyboardInterrupt, but I noticed that afterwards the runtime needs a factory reset because it somehow got stuck. Checking ps aux in the Linux console listed a process [ffmpeg] <defunct>, which apparently was still running or at least blocking some resources.
I then did some research and came across some similar posts asking questions on how to terminate a subprocess correctly (1, 2, 3). Based on those posts I generally came to the conclusion that using the subprocess.Popen(..) variant obviously provides more flexibility when it comes to handling the subprocess: Defining different stdout procedures or reacting to different returncode etc. But I'm still unsure on what the first command above using the ! as prefix exactly does under the hood.
Using the first command is much easier and requires way less code to start this process. And assuming I don't need a lot of logic handling the process flow it would be a nice way to execute something like ffmpeg - if I were able to terminate it as expected. Even following the answers from the other posts using the 2nd command never got me to a point where I could terminate the process fully once started (even when using shell=False, process.kill() or process.wait() etc.). This got me frustrated, because restarting and re-initializing the Colab instance itself can take several minutes every time.
So, finally, I'd like to understand in more general terms what the difference is and was hoping that someone could enlighten me. Thanks!
! commands are executed by the notebook (or more specifically by the ipython interpreter), and are not valid Python commands. If the code you are writing needs to work outside of the notebook environment, you cannot use ! commands.
As you correctly note, you are unable to interact with the subprocess you launch via !, so it's also less flexible than an explicit subprocess call, though similar in this regard to subprocess.call.
Like the documentation mentions, you should generally avoid the bare subprocess.Popen unless you specifically need the detailed flexibility it offers, at the price of having to duplicate the higher-level functionality which subprocess.run et al. already implement. The code to run a command and wait for it to finish is simply
subprocess.check_call(['ffmpeg', ... ])
with variations for capturing its output (check_output) and the more modern run which can easily replace all three of the legacy high-level calls, albeit with some added verbosity.
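As a rough sketch of how run can stand in for the legacy calls: the command below is a placeholder Python one-liner rather than ffmpeg, and the variable names are only for illustration.

```python
import subprocess
import sys

# Stand-in for a real command such as ffmpeg: a child that prints one line.
cmd = [sys.executable, "-c", "print('done')"]

# check=True raises CalledProcessError on a non-zero exit, like check_call.
completed = subprocess.run(cmd, check=True)

# Capturing output, like check_output; text=True decodes bytes to str.
captured = subprocess.run(cmd, check=True, stdout=subprocess.PIPE, text=True)
output = captured.stdout.strip()
```

Leaving out check=True gives behaviour closer to plain call: the exit status is available as completed.returncode and no exception is raised.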

Python: Spawning another program

I have a Python program from which I spawn a sub-program to process some files without holding up the main program. I'm currently using bash for the sub-program, started with a command and two parameters like this:
result = os.system('sub-program.sh file.txt file.txt &')
That works fine, but I (eventually!) realised that I could use Python for the sub-program, which would be far preferable, so I have converted it. The simplest way of spawning it might be:
result = os.system('python3 sub-program.py file.txt file.txt &')
Some research has shown several more sophisticated alternatives, but I have the impression that the latest and most approved method is this one:
subprocess.Popen(["python3", "-u", "sub-program.py"])
Am I correct in thinking that that is the most appropriate way of doing it? Would anyone recommend a different method and why? Simple would be good as I'm a bit of a Python novice.
If this is the recommended method, I can probably work out what the "-u" does and how to add the parameters for myself.
Optional extras:
Send a message back from the sub-program to the main program.
Make the sub-program quit when the main program does.
Yes, using subprocess is the recommended way to go according to the documentation:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.
However, subprocess.Popen may not be what you're looking for. Unlike os.system, it creates a Popen object that corresponds to the subprocess, and you have to wait on that object explicitly if you want to wait for its completion, e.g.:
proc = subprocess.Popen(["python3", "-u", "sub-program.py"])
do_something()
res = proc.wait()
If you want to just run a program and wait for completion you should probably use subprocess.run (or maybe subprocess.call, subprocess.check_call or subprocess.check_output) instead.
Thanks skyking!
With
import subprocess
at the beginning of the main program, this does what I want:
with open('output.txt', 'w') as f:
    subprocess.Popen(['python3', 'spawned.py', parameter1, parameter2], stdout=f)
The first line opens a file for the output from the sub-program started in the second line. In the second line, the square brackets contain the command for the sub-program: the interpreter and the script name, followed by two parameters. The parameters are available in the sub-program as sys.argv[1] and sys.argv[2]. After that come the subprocess parameters: stdout=f sends the output to the text file mentioned above.
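For the "send a message back" extra in the question, one option is to read the sub-program's standard output through a pipe instead of a file. This is only a sketch: the sub-program is inlined with -c so that it runs as-is, and child_code and message are made-up names.

```python
import subprocess
import sys

# Hypothetical sub-program, inlined with -c so the sketch is self-contained.
child_code = "import sys; print('processed ' + sys.argv[1])"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code, "file.txt"],
    stdout=subprocess.PIPE,
    text=True,
)

# The main program is free to do other work here while the child runs.

# communicate() reads everything the child printed and waits for it to exit.
message, _ = proc.communicate()
message = message.strip()
```

With a real script, the argument list would be something like [sys.executable, "sub-program.py", "file.txt"] instead of the -c form.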
Is there any particular reason it has to be another program entirely? Why not just spawn another process which runs one of the functions defined within your script?
I suggest that you read up on multiprocessing. Python has a module just for that: https://docs.python.org/dev/library/multiprocessing.html
There you can find info on spawning new processes, communicating between them and synchronizing them.
Be warned, though, that if you really want to speed up your file processing, you'll want to use processes instead of threads (because of the global interpreter lock in CPython, threads do not speed up CPU-bound work and can even slow it down, which is confusing).
Also check out this page: https://pymotw.com/2/multiprocessing/basics.html
It has some code samples that will help you out a lot.
Don't forget this guard in your script:
if __name__ == '__main__':
It is very important ;)
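A minimal sketch of what that could look like, assuming the file processing can be expressed as a function. count_words and worker are hypothetical stand-ins, not anything from the question.

```python
import multiprocessing


def count_words(text):
    """Stand-in for real file processing: count whitespace-separated words."""
    return len(text.split())


def worker(text, results):
    # Runs in the child process; the queue carries the result back.
    results.put(count_words(text))


if __name__ == "__main__":
    results = multiprocessing.Queue()
    child = multiprocessing.Process(target=worker, args=("one two three", results))
    child.start()
    # The parent could do other work here before collecting the result.
    print(results.get())
    child.join()
```

The guard matters because, depending on the platform's start method, the child process may import the main module again; without the guard it would try to start children of its own.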

Python Communicate/Wait with a shell subprocess

I tried searching for a solution to this problem, but because there is a shell=True argument (I don't think it is related to what I'm doing, but I could well be wrong), the search gets lots of hits that don't seem useful.
OK, so the problem is basically:
I'm running a Python script on a cluster. On the cluster the normal thing to do is to launch all codes/etc. via a shell script which is used to request the appropriate resources (maximum run time, nodes, processors per node, etc.) needed to run the job. This shell then calls the script and away it goes.
This isn't an issue, but the problem I have is that my 'parent' code needs to wait for its 'children' to run fully (and generate the data to be used by the parent) before continuing. This isn't a problem when I don't have the shell between it and the script, but as it stands .communicate() and .wait() are 'satisfied' when the shell script is done. I need to wait until the script(s) called by the shell are done.
I could botch it by putting a while loop in that needs certain files to exist before breaking, but this seems messy to me.
So my question is: is there a way to get .communicate() (ideally), .wait(), or some other clean method to pause the parent code until the shell, and everything called by the shell, finishes running? Ideally (nearly essential, to be honest) this should be done in the parent code alone.
I might not be explaining this very well, so I'm happy to provide more details if needed. If this is answered somewhere else, I'm sorry; just point me that way!

subprocess.call does not wait for the process to complete

Per the Python documentation, subprocess.call should be blocking and wait for the subprocess to complete. In this code I am trying to convert a few xls files to a new format by calling LibreOffice on the command line. I assumed that the call to subprocess.call is blocking, but it seems I need to add an artificial delay after each call; otherwise I miss a few files in the out directory.
What am I doing wrong, and why do I need the delay?
from subprocess import call
for i in range(0, len(sorted_files)):
    args = ['libreoffice', '-headless', '-convert-to', 'xls',
            "%s/%s.xls" % (sorted_files[i]['filename'], sorted_files[i]['filename']),
            '-outdir', 'out']
    call(args)
    var = raw_input("Enter something: ")  # if I comment out this line, I don't get all the files in the out directory
EDIT: It might be hard to find the answer through the comments below. I used unoconv for document conversion, which is blocking and easy to work with from a script.
It's likely that libreoffice is implemented as some sort of daemon/intermediary process. The "daemon" will (effectively [1]) parse the command line and then farm the work off to some other process, possibly detaching it so that it can exit immediately. (Based on the -invisible option in the documentation, I strongly suspect that this is indeed the case here.)
If this is the case, then your subprocess.call does do what it is advertised to do: it waits for the daemon to complete before moving on. However, it doesn't do what you want, which is to wait for all of the work to be completed. The only option you have in that scenario is to check whether the daemon has a -wait option or similar.
[1] It is likely that we don't have an actual daemon here, only something which behaves similarly. See comments by abernert.
The problem is that the soffice command-line tool (which libreoffice is either just a link to, or a further wrapper around) is just a "controller" for the real program soffice.bin. It finds a running copy of soffice.bin and/or creates one, tells it to do some work, and then quits.
So, call is doing exactly the right thing: it waits for libreoffice to quit.
But you don't want to wait for libreoffice to quit, you want to wait for soffice.bin to finish doing the work that libreoffice asked it to do.
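The effect can be reproduced without LibreOffice. The sketch below is an imitation, not a description of how soffice works internally: a hypothetical "launcher" (launcher_code) starts a "worker" (worker_code) and exits right away, and the worker writes a marker file a second later.

```python
import os
import subprocess
import sys
import tempfile
import time

marker = os.path.join(tempfile.mkdtemp(), "done.txt")

# The "real worker": sleeps, then writes a marker file.
worker_code = (
    "import sys, time; time.sleep(1); open(sys.argv[1], 'w').write('done')"
)
# The "launcher": starts the worker and exits without waiting for it.
launcher_code = (
    "import subprocess, sys; "
    "subprocess.Popen([sys.executable, '-c', sys.argv[1], sys.argv[2]])"
)

# call() waits for the launcher only, not for the worker it started.
subprocess.call([sys.executable, "-c", launcher_code, worker_code, marker])
finished_when_call_returned = os.path.exists(marker)

# Poll until the worker finishes (give up after 10 seconds).
deadline = time.monotonic() + 10
while not os.path.exists(marker) and time.monotonic() < deadline:
    time.sleep(0.1)
finished_eventually = os.path.exists(marker)
```

Here call returns as soon as the launcher exits, so the marker file is not there yet; it only shows up once the worker is done.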
It looks like what you're trying to do isn't possible to do directly. But it's possible to do indirectly.
The docs say that headless mode:
… allows using the application without user interface.
This special mode can be used when the application is controlled by external clients via the API.
In other words, the app doesn't quit after running some UNO strings/doing some conversions/whatever else you specify on the command line; it sits around waiting for more UNO commands from outside, while the launcher just exits as soon as it sends the appropriate commands to the app.
You probably have to use that above-mentioned external control API (UNO) directly.
See Scripting LibreOffice for the basics (although there's more info there about internal scripting than external), and the API documentation for details and examples.
But there may be an even simpler answer: unoconv is a simple command-line tool written using the UNO API that does exactly what you want. It starts up LibreOffice if necessary, sends it some commands, waits for the results, and then quits. So if you just use unoconv instead of libreoffice, call is all you need.
Also notice that unoconv is written in Python, and is designed to be used as a module. If you just import it, you can write your own (simpler, and use-case-specific) code to replace the "Main entrance" code, and not use subprocess at all. (Or, of course, you can tear apart the module and use the relevant code yourself, or just use it as a very nice piece of sample code for using UNO from Python.)
Also, the unoconv page linked above lists a variety of other similar tools, some that work via UNO and some that don't, so if it doesn't work for you, try the others.
If nothing else works, you could consider, e.g., creating a sentinel file and using a filesystem watch, so at least you'll be able to detect exactly when it's finished its work, instead of having to guess at a timeout. But that's a real last-ditch workaround that you shouldn't even consider until eliminating all of the other options.
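If it does come to that, a plain polling loop is the simplest stand-in for a real filesystem watch. This is a sketch only; wait_for_file and its parameters are made-up names, and the existence of a file does not guarantee that it has been completely written.

```python
import os
import time


def wait_for_file(path, timeout=30.0, interval=0.2):
    """Poll until `path` exists; return True if it appeared before `timeout`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    # One last check in case the file appeared during the final sleep.
    return os.path.exists(path)
```

In the loop from the question, this would be called after each call(args) with the path of the expected output file, in place of the raw_input delay.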
If libreoffice is using an intermediary (daemon), as mentioned by @mgilson, then one solution is to find out what program it's invoking, and then invoke that program directly yourself.

Python script to script enter and exit

I am trying to create a Python script that, on the click of a button, opens another Python script and closes itself. The second script should have some return function that brings me back to the original script. I hope you can help.
Thanks.
Since your question is very vague, here's a somewhat vague answer:
First, think about whether you really need to do this at all. Why can't the first script just import the second script as a module and call some function on it?
But let's assume you've got a good answer for that, and you really do need to "close" and run the other script, where by "close" you mean "make your GUI invisible".
def handle_button_click(button):
    button.parent_window().hide()
    subprocess.call([sys.executable, '/path/to/other/script.py'])
    button.parent_window().show()
This will hide the window, run the other script, then show the window again when the other script is finished. It's generally a very bad idea to do something slow and blocking in the middle of an event handler, but in this case, because we're hiding our whole UI anyway, you can get away with it.
A smarter solution would involve some kind of signal that either the second script sends, or that a watcher thread sends. For example:
def run_other_script_with_gui_hidden(window):
    gui_library.do_on_main_thread(window.hide)
    subprocess.call([sys.executable, '/path/to/other/script.py'])
    gui_library.do_on_main_thread(window.show)

def handle_button_click(button):
    # Pass the window to the worker thread; the target function requires it.
    t = threading.Thread(target=run_other_script_with_gui_hidden,
                         args=(button.parent_window(),))
    t.daemon = True
    t.start()
Obviously you have to replace things like button.parent_window(), window.hide(), gui_library.do_on_main_thread, etc. with the appropriate code for your chosen window library.
If you'd prefer to have the first script actually exit, and the second script re-launch it, you can do that, but it's tricky. You don't want to launch the second script as a child process, but as a sibling. Ideally, you want it to just take over your own process. Except that you need to shut down your GUI before doing that, unless your OS will do that automatically (basically, Windows will, Unix will not). Look at the os.exec family, but you'll really need to understand how these things work in Unix to do it right. Unless you want the two scripts to be tightly coupled together, you probably want to pass the second script, on the command line, the exact right arguments to re-launch the first one (basically, pass it your whole sys.argv after any other parameters).
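A rough sketch of the hand-over, under the assumption that the second script is started with the same interpreter. build_handover_argv and hand_over are hypothetical names, and the GUI shutdown is left out.

```python
import os
import sys


def build_handover_argv(second_script, first_argv):
    """Command line for the second script, with the first script's argv appended."""
    return [sys.executable, second_script] + list(first_argv)


def hand_over(second_script):
    # Shut down the GUI before this point; execv does not return on success.
    argv = build_handover_argv(second_script, sys.argv)
    os.execv(sys.executable, argv)
```

The second script can then get back with something like os.execv(sys.executable, [sys.executable] + sys.argv[1:]). Note that sys.argv[0] may be a relative path, so this only works if the working directory has not changed in between.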
As an alternative, you can use execfile to run the second script within your existing interpreter instance, and then have the second script execfile you back. This has similar, but not identical, issues to the exec solution.
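Note that execfile exists only in Python 2. On Python 3, one possible stand-in (a swapped-in technique, not what the answer above used) is runpy.run_path, which runs a file inside the current interpreter. The script contents and names below are invented for the sketch.

```python
import os
import runpy
import tempfile

# Hypothetical second script, written to a temporary file for the sketch.
script_path = os.path.join(tempfile.mkdtemp(), "second_script.py")
with open(script_path, "w") as f:
    f.write("answer = 6 * 7\n")

# Runs the file in this interpreter and returns its resulting global namespace.
namespace = runpy.run_path(script_path, run_name="__main__")
result = namespace["answer"]
```

Unlike execfile, run_path executes the script in a fresh namespace and hands the resulting globals back as a dictionary, so the two scripts do not share variables unless you pass them explicitly.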
