Python: Spawning another program

I have a Python program from which I spawn a sub-program to process some files without holding up the main program. I'm currently using bash for the sub-program, started with a command and two parameters like this:
result = os.system('sub-program.sh file.txt file.txt &')
That works fine, but I (eventually!) realised that I could use Python for the sub-program, which would be far preferable, so I have converted it. The simplest way of spawning it might be:
result = os.system('python3 sub-program.py file.txt file.txt &')
Some research has shown several more sophisticated alternatives, but I have the impression that the latest and most approved method is this one:
subprocess.Popen(["python3", "-u", "sub-program.py"])
Am I correct in thinking that that is the most appropriate way of doing it? Would anyone recommend a different method and why? Simple would be good as I'm a bit of a Python novice.
If this is the recommended method, I can probably work out what the "-u" does and how to add the parameters for myself.
Optional extras:
Send a message back from the sub-program to the main program.
Make the sub-program quit when the main program does.

Yes, using subprocess is the recommended way to go according to the documentation:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.
However, subprocess.Popen may not be what you're looking for. Unlike os.system, it returns immediately with a Popen object representing the subprocess, and you have to call that object's wait() method if you want to wait for the subprocess to complete, for example:
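import subprocess

proc = subprocess.Popen(["python3", "-u", "sub-program.py"])
do_something()  # the main program keeps running meanwhile
res = proc.wait()  # blocks until the subprocess exits and returns its exit code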
If you just want to run a program and wait for it to complete, you should probably use subprocess.run (or maybe subprocess.call, subprocess.check_call or subprocess.check_output) instead.
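A minimal sketch of both styles, passing the question's two file arguments. To keep it self-contained, the child is an inline `-c` one-liner standing in for sub-program.py (an assumption), and `sys.executable` is used instead of the literal "python3" so the child runs under the same interpreter. The `-u` option makes the child's stdout and stderr unbuffered, so its output appears as soon as it is written; the arguments arrive in the child as sys.argv[1] and sys.argv[2].

```python
import subprocess
import sys

# Stand-in for sub-program.py: prints the two arguments it receives.
CHILD = "import sys; print('processing', sys.argv[1], sys.argv[2])"


def run_blocking(arg1, arg2):
    """Run the child and wait for it to finish (like os.system without '&')."""
    result = subprocess.run(
        [sys.executable, "-u", "-c", CHILD, arg1, arg2],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the captured bytes to str
    )
    return result.returncode, result.stdout


def start_background(arg1, arg2):
    """Start the child and return at once (like os.system with '&')."""
    return subprocess.Popen([sys.executable, "-u", "-c", CHILD, arg1, arg2])


if __name__ == "__main__":
    code, out = run_blocking("file.txt", "file.txt")
    print(code, out, end="")
    proc = start_background("file.txt", "file.txt")
    # ... the main program keeps working here ...
    proc.wait()  # reap the child once you are done with it
```

run() is the simpler choice when the main program can afford to wait; Popen is the one to use when it must carry on while the child works.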

Thanks skyking!
With
import subprocess
at the beginning of the main program, this does what I want:
with open('output.txt', 'w') as f:
    subprocess.Popen(['python3', 'spawned.py', parameter1, parameter2], stdout=f)
The first line opens a file for the output from the sub-program started in the second line. In the second line, the square brackets contain the command for the sub-program: the interpreter and the script name, followed by two parameters. The parameters are available in the sub-program as sys.argv[1] and sys.argv[2]. After the list come the keyword arguments for Popen: stdout=f sends the sub-program's output to the text file opened in the first line. Closing f at the end of the with block does not affect the sub-program, which keeps its own copy of the file handle.

Is there any particular reason it has to be another program entirely? Why not just spawn another process which runs one of the functions defined within your script?
I suggest that you read up on multiprocessing. Python has a module just for that: https://docs.python.org/dev/library/multiprocessing.html
Here you can find info on spawning new processes, communicating between them and synchronizing them.
Be warned, though, that if you want to really speed up CPU-bound file processing, you'll want to use processes instead of threads: because of the Global Interpreter Lock (GIL) in CPython, only one thread executes Python bytecode at a time, so threads won't run that work in parallel and can even slow it down.
Also check out this page: https://pymotw.com/2/multiprocessing/basics.html
It has some code samples that will help you out a lot.
Don't forget this guard in your script:
if __name__ == '__main__':
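It is very important ;) With the "spawn" start method (the default on Windows and, since Python 3.8, on macOS), every child process imports your main module, and without the guard it would re-run your top-level code, including the code that starts the processes.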
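A minimal sketch of the multiprocessing approach that also covers the question's two optional extras: the worker sends a message back through a multiprocessing.Queue, and daemon=True asks multiprocessing to terminate the worker when the main process exits normally. process_files and its message are hypothetical stand-ins for the real file processing.

```python
import multiprocessing as mp


def process_files(path1, path2, results):
    # Hypothetical stand-in for the work the sub-program used to do.
    message = f"done with {path1} and {path2}"
    results.put(message)  # send a message back to the main program


def start_worker(path1, path2):
    results = mp.Queue()
    # daemon=True: multiprocessing terminates this worker when the main process exits.
    worker = mp.Process(target=process_files, args=(path1, path2, results), daemon=True)
    worker.start()
    return worker, results


if __name__ == "__main__":  # the guard recommended above
    worker, results = start_worker("file.txt", "file.txt")
    # ... the main program carries on with other work here ...
    print(results.get(timeout=30))  # wait (at most 30 s) for the worker's message
    worker.join()
```

The message is read before join() is called: joining a process that still has unread items in a queue can block.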

Related

Force a 3rd-party program to flush its output when called through subprocess

I am using a 3rd-party python module which is normally called through terminal commands. When called through terminal commands it has a verbose option which prints to terminal in real time.
I then have another python program which calls the 3rd-party program through subprocess. Unfortunately, when called through subprocess the terminal output no longer flushes, and is only returned on completion (the process takes many hours so I would like real-time progress).
I can see the source code of the 3rd-party module and it does not set printing to be flushed such as print('example', flush=True). Is there a way to force the flushing through my module without editing the 3rd-party source code? Furthermore, can I send this output to a log file (again in real time)?
Thanks for any help.
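The issue is most likely that many programs behave differently when run interactively in a terminal than when run as part of a pipeline (i.e. called using subprocess with stdout connected to a pipe): by the usual C stdio convention, stdout is line-buffered when it is a terminal but block-buffered when it is a pipe or a file, so output only appears once a buffer fills up or the program exits. It has very little to do with Python itself and more to do with the Unix/Linux architecture, although Python's own sys.stdout follows the same convention.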
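As you have noted, a program can be made to flush stdout even when run in a pipeline by adding explicit flush calls (e.g. sys.stdout.flush() or print(..., flush=True)), but that requires changing its source code. Since the 3rd-party module is written in Python, however, you can also disable the interpreter's output buffering from the outside, without touching its source, by running it with python -u or with the environment variable PYTHONUNBUFFERED set to a non-empty value.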
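A minimal sketch, assuming the 3rd-party tool is a Python program: setting PYTHONUNBUFFERED in the child's environment (equivalent to passing -u) turns off its stdout buffering, so each line can be echoed and appended to a log file as soon as it is printed. The inline CHILD script and the log file name progress.log are placeholders for the real command and log.

```python
import os
import subprocess
import sys
import tempfile

# Placeholder for the 3rd-party command: prints progress without flush=True.
CHILD = "import time\nfor i in range(3):\n    print('step', i)\n    time.sleep(0.1)"


def run_and_log(cmd, log_path):
    env = dict(os.environ, PYTHONUNBUFFERED="1")  # same effect as `python -u`
    lines = []
    with open(log_path, "w") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, env=env
    ) as proc:
        for line in proc.stdout:  # each line arrives as soon as the child writes it
            print(line, end="")   # real-time echo to our own terminal
            log.write(line)
            log.flush()           # keep the log file current as well
            lines.append(line)
    # Leaving the Popen context manager waits for the child, so returncode is set.
    return proc.returncode, lines


if __name__ == "__main__":
    log_file = os.path.join(tempfile.gettempdir(), "progress.log")
    run_and_log([sys.executable, "-c", CHILD], log_file)
```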
Another way is to "trick" the program into thinking it is attached to an interactive terminal by using a so-called pseudo-terminal. The Python standard library supports this through the pty module (Unix only). With it, you do not call subprocess.run (or Popen or ...) directly; instead, you use pty.spawn:
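import os
import pty

def prout(fd):
    # Called by pty.spawn each time the child has written to the pseudo-terminal.
    data = os.read(fd, 1024)
    # Log or parse `data` (bytes) here if needed.
    return data  # pty.spawn copies the returned bytes to our stdout

pty.spawn("./callee.py", prout)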
As can be seen, this requires a special function for handling the child's stdout. Here I just send it on to the terminal, but of course it is possible to do other things with the text as well (such as logging or parsing it).
Another way to trick the program is to use an external program called unbuffer. unbuffer runs your script and makes it think (as with the pty call) that it is attached to a terminal. This is arguably simpler if unbuffer is installed, or if you are allowed to install it on your system (it is part of the expect package). All you have to do then is change your subprocess call to
p=subprocess.Popen(["unbuffer", "./callee.py"], stdout=subprocess.PIPE)
and then of course handle the output as usual, e.g. with some code like
for line in p.stdout:
    print(line.decode(), end="")
p.wait()  # stdout is already exhausted here; just wait for the process to exit
or similar. But this last part I think you have already covered, as you seem to be doing something with the output.

Subprocess or import to invoke a script in Python

I have a script task.py that I am trying to invoke. It seems there are two ways to do that. One is to use the subprocess API while the other is to use Python's import mechanism.
task.py
def call_task():
    print("task in progress...")
    return "something"

print("calling task..")
out = call_task()
print("output of the executed task::", out)
Now, we have two approaches to invoke the above task.py python script.
Approach 1
import task
print("invoke call-task")
out = task.call_task()
print("output::", out)
Approach 2
import shlex
import subprocess
from subprocess import PIPE

proc = subprocess.Popen(shlex.split("python task.py"), stdout=PIPE)
out, _ = proc.communicate()  # communicate() returns (stdout, stderr); stdout is bytes
print("output::", out)
Although both approaches work, which approach is more pythonic?
Running a separate Python process from Python is frequently an antipattern. There are situations where you specifically want two Python instances (for example, if the module you want to use requires its own signal handling), but in the absence of factors that force that choice, import is generally vastly preferable. It is better for usability, because you can call the functions inside the module in an order different from its main flow and have more fine-grained control over its internals, and for performance, because starting a separate process is almost always a bad idea if you can avoid it.
The subprocess module "allows you to spawn new processes", so with it a new process executes the code within your task.py, whereas importing results in the original process executing your code.
Other than that, it should be identical.
You can read more about it in the Python Subprocess Docs
As far as I've seen, it's rather unusual to execute Python code in an extra subprocess.
It can help performance in some cases (for example, by running work in parallel in another process), but the more Pythonic way would be importing, I guess.
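One point neither approach above shows: as written, task.py prints "calling task.." as soon as it is imported, because its top-level statements run on import. A minimal sketch of the usual fix is to move them under an `if __name__ == "__main__":` guard, so importing only defines call_task() while running the file as a script still prints. To stay self-contained, the sketch writes that version of task.py into a temporary directory (an assumption; normally it would sit next to your own script) and then exercises both approaches.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

TASK_PY = '''\
def call_task():
    print("task in progress...")
    return "something"

if __name__ == "__main__":  # runs only when executed as a script
    print("calling task..")
    out = call_task()
    print("output of the executed task::", out)
'''


def demo():
    workdir = Path(tempfile.mkdtemp())
    (workdir / "task.py").write_text(TASK_PY)

    # Approach 1: import. The guarded block does not run, and we get the real return value.
    sys.path.insert(0, str(workdir))
    importlib.invalidate_caches()  # the module file was created at runtime
    task = importlib.import_module("task")
    value = task.call_task()

    # Approach 2: subprocess. We only get the text the script printed.
    proc = subprocess.run([sys.executable, str(workdir / "task.py")],
                          capture_output=True, text=True, check=True)
    return value, proc.stdout


if __name__ == "__main__":
    print(demo())
```

With import you get the Python object "something"; with subprocess you only get its printed text, which you would have to parse.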

Advantages of subprocess over os.system

I have recently come across a few posts on Stack Overflow saying that subprocess is much better than os.system; however, I am having difficulty finding the exact advantages.
Some examples of things I have run into:
https://docs.python.org/3/library/os.html#os.system
"The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function."
I have no idea in what ways it is more powerful, though. I know subprocess is easier to use in many ways, but is it actually more powerful in some way?
Another example is:
https://stackoverflow.com/a/89243/3339122
The advantage of subprocess vs system is that it is more flexible (you can get the stdout, stderr, the "real" status code, better error handling, etc...).
This post has 2600+ votes. Again, I could not find any elaboration on what was meant by "better error handling" or "real status code".
Top comment on that post is:
Can't see why you'd use os.system even for quick/dirty/one-time. subprocess seems so much better.
Again, I understand it makes some things slightly easier, but I can hardly understand why, for example:
subprocess.call("netsh interface set interface \"Wi-Fi\" enable", shell=True)
is any better than
os.system("netsh interface set interface \"Wi-Fi\" enable")
Can anyone explain some reasons it is so much better?
First of all, you are cutting out the middleman; subprocess.call by default avoids spawning a shell that examines your command, and directly spawns the requested process. This is important because, besides the efficiency side of the matter, you don't have much control over the default shell behavior, and it actually typically works against you regarding escaping.
In particular, do not do this:
subprocess.call('netsh interface set interface "Wi-Fi" enable')
since
If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.
Instead, you'll do:
subprocess.call(["netsh", "interface", "set", "interface", "Wi-Fi", "enable"])
Notice that here all the escaping nightmares are gone. subprocess handles escaping (if the OS wants arguments as a single string - such as Windows) or passes the separated arguments straight to the relevant syscall (execvp on UNIX).
Compare this with having to handle the escaping yourself, especially in a cross-platform way (cmd doesn't escape in the same way as POSIX sh), and with the shell in the middle messing with your stuff (trust me, you don't want to know what an unholy mess it is to provide 100% safe escaping for your command when calling cmd /k).
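A minimal sketch of the escaping point. Because netsh is Windows-only, the child here is an inline Python one-liner (a stand-in, not part of the original answer) that echoes back sys.argv[1]; an argument full of quotes and shell metacharacters arrives unchanged, because it is passed as one list element and no shell ever parses it.

```python
import subprocess
import sys

# Stand-in child: prints exactly the first argument it received.
ECHO = "import sys; print(sys.argv[1])"


def roundtrip(argument):
    """Pass `argument` as a single list element; no shell quoting is involved."""
    result = subprocess.run([sys.executable, "-c", ECHO, argument],
                            capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    tricky = 'Wi-Fi "quoted" & $HOME | more'
    print(roundtrip(tricky))  # printed verbatim, since no shell expands or splits it
```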
Also, when using subprocess without the shell in the middle, you can be sure you are getting correct return codes. If there's a failure launching the process, you get a Python exception; if you get a return code, it's actually the return code of the launched program. With os.system you have no way to know whether the return code you get comes from the launched command (which is generally the case if the shell manages to launch it) or is some error from the shell (if it didn't manage to launch it).
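A minimal sketch of those three behaviours, using the current interpreter as the child so that it is self-contained. The program name no-such-program-hopefully is hypothetical and assumed not to exist on PATH. With os.system, a missing command would instead just show up as another non-zero status reported by the shell.

```python
import subprocess
import sys


def exit_status(code):
    """returncode is the child program's own exit status."""
    return subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"]).returncode


def launch_missing():
    """Failing to launch a program raises an exception instead of returning a code."""
    try:
        subprocess.run(["no-such-program-hopefully"])
    except FileNotFoundError as exc:
        return type(exc).__name__
    return None


def checked(code):
    """check=True turns a non-zero exit status into CalledProcessError."""
    try:
        subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"], check=True)
    except subprocess.CalledProcessError as exc:
        return exc.returncode
    return 0


if __name__ == "__main__":
    print(exit_status(3), launch_missing(), checked(5))
```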
Besides argument splitting/escaping and return codes, you have much better control over the launched process. Even with subprocess.call (the most basic utility function built on subprocess's functionality) you can redirect stdin, stdout and stderr, possibly communicating with the launched process. check_call is similar, but it raises CalledProcessError on a non-zero exit code, which avoids the risk of ignoring a failure. check_output covers the common use case of check_call plus capturing all of the program's output into a variable.
Once you get past call & friends (which block just like os.system), there are much more powerful features; in particular, the Popen object allows you to work with the launched process asynchronously. You can start it, talk with it through the redirected streams, check from time to time whether it is still running while doing other stuff, wait for it to complete, send signals to it and kill it. All of this goes well beyond the mere synchronous "start a process with the default stdin/stdout/stderr through the shell and wait for it to finish" that os.system provides.
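A minimal sketch of that asynchronous control: start a long-running child, check that it is still running with poll(), and then stop it. The sleeping child is a stand-in for any long-running program.

```python
import subprocess
import sys


def start_and_stop():
    # Stand-in child that would run for a minute if left alone.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    still_running = proc.poll() is None  # poll() returns None while the child runs
    # ... do other work here instead of blocking ...
    proc.terminate()       # SIGTERM on POSIX, TerminateProcess() on Windows
    proc.wait(timeout=10)  # reap it; raises TimeoutExpired if it refuses to exit
    return still_running, proc.returncode


if __name__ == "__main__":
    print(start_and_stop())
```

On POSIX the final return code is negative (minus the signal number) because the child was killed by a signal; on Windows it is a positive value.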
So, to sum it up, with subprocess:
even at the most basic level (call & friends), you:
avoid escaping problems by passing a Python list of arguments;
avoid the shell messing with your command line;
either you have an exception or the true exit code of the process you launched; no confusion about program/shell exit code;
have the possibility to capture stdout and in general redirect the standard streams;
when you use Popen:
you aren't restricted to a synchronous interface: you can actually do other stuff while the subprocess runs;
you can control the subprocess (check if it is running, communicate with it, kill it).
Given that subprocess does way more than os.system can do - and in a safer, more flexible (if you need it) way - there's just no reason to use system instead.
There are many reasons, but the main reason is mentioned directly in the docstring:
>>> os.system.__doc__
'Execute the command in a subshell.'
For almost all cases where you need a subprocess, it is undesirable to spawn a subshell. This is unnecessary and wasteful, it adds an extra layer of complexity, and introduces several new vulnerabilities and failure modes. Using subprocess module cuts out the middleman.

How to return a value from Python script as a Bash variable?

This is a summary of my code:
# import whatever
def createFolder():
    #someCode
    var1=Gdrive.createFolder(name)
    return var1
def main():
    #someCode
    var2=createFolder()
    return var2
if __name__ == "__main__":
    print(main())
One way in which I managed to return a value to a bash variable was printing what was returned from main(). Another way is just printing the variable in any place of the script.
Is there any way to return it in a more pythonic way?
The script is called this way:
folder=$(python create_folder.py "string_as_arg")
A more pythonic way would be to avoid bash and write the whole lot in python.
You can't expect bash to have a pythonic way of getting values from another process; its way is the bash way.
bash and python are running in different processes, and inter-process communication (IPC) must go via kernel. There are many IPC mechanisms, but bash does not support them all (shared memory, for example). The lowest common denominator here is bash, so you must use what bash supports, not what python has (python has everything).
Without shared memory, it is not a simple thing to write to variables of another process - let alone another language. Debuggers do it, but they are written specifically for the host language.
The mechanism you use from bash is to capture the stdout of the child process, so python must print. Under the covers this uses an anonymous pipe. You could use a named pipe (also known as a fifo) instead, which python would open as a normal file and write to it. But it wouldn't buy you much.
If you were working in bash then you could simply do:
export var="value"
However, there is no such equivalent in Python. If you try to use os.environ those values will persist for the rest of the process and will not modify anything after the program finishes. Your best bet is to do exactly what you are already doing.
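A small demonstration of why os.environ cannot help here: every process has its own copy of the environment, so a variable a child sets is gone when the child exits. Python plays the parent so that the sketch is self-contained; a bash parent behaves the same way. DEMO_FOLDER_ID is a hypothetical variable name, assumed not to be set already.

```python
import os
import subprocess
import sys


def child_sets_variable(name, value):
    """Have a child set an environment variable, then look for it in the parent."""
    code = f"import os; os.environ[{name!r}] = {value!r}; print(os.environ[{name!r}])"
    seen_by_child = subprocess.run([sys.executable, "-c", code],
                                   capture_output=True, text=True, check=True).stdout.strip()
    return seen_by_child, os.environ.get(name)  # the parent's environment is unaffected


if __name__ == "__main__":
    print(child_sets_variable("DEMO_FOLDER_ID", "abc123"))
```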
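You can try to set an environment variable from within the Python code and read it outside, in the bash script. This way looks very elegant to me, but it is definitely not the "perfect solution" or the only solution. Be aware, though, that a process cannot change its parent's environment: a variable set via os.environ is only visible to processes that Python itself starts afterwards, not to the bash script that ran it. If you like this approach, this thread might be useful: How to set environment variables in Python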
There are other ways, very similar to what you have done. Check also this thread: store return value of a Python script in a bash script
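Just use sys.exit(), e.g.:
import sys
[...]
if __name__ == "__main__":
    sys.exit(main())
Note that this passes main()'s return value as the exit status, which bash sees in $? rather than through $(...). It only works as intended if main() returns an integer (0-255 on POSIX); any other object, such as a folder name string, is printed to stderr and the exit status becomes 1.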
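A minimal sketch combining both answers: the value goes to stdout, which is what `folder=$(python create_folder.py ...)` captures, and the exit code carries only an integer status. The script body is a hypothetical stand-in for create_folder.py (folder-123 replaces the real Gdrive.createFolder result), run here through subprocess so that both the normal case and the string-argument pitfall can be checked.

```python
import subprocess
import sys

# Hypothetical create_folder.py: value on stdout, integer status as exit code.
SCRIPT = '''
import sys

def main():
    folder_id = "folder-123"  # stand-in for Gdrive.createFolder(name)
    print(folder_id)          # this is what $(...) captures in bash
    return 0                  # integer status, visible in bash as $?

if __name__ == "__main__":
    sys.exit(main())
'''


def run_script(source):
    r = subprocess.run([sys.executable, "-c", source], capture_output=True, text=True)
    return r.returncode, r.stdout.strip(), r.stderr.strip()


if __name__ == "__main__":
    print(run_script(SCRIPT))                              # status 0, value on stdout
    print(run_script("import sys; sys.exit('folder-123')"))  # status 1, value on stderr
```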

subprocess.call does not wait for the process to complete

Per the Python documentation, subprocess.call should block and wait for the subprocess to complete. In this code I am trying to convert a few xls files to a new format by calling LibreOffice on the command line. I assumed that subprocess.call is blocking, but it seems I need to add an artificial delay after each call; otherwise a few files are missing from the out directory.
What am I doing wrong, and why do I need the delay?
from subprocess import call
for i in range(0,len(sorted_files)):
    args = ['libreoffice', '-headless', '-convert-to',
        'xls', "%s/%s.xls" %(sorted_files[i]['filename'],sorted_files[i]['filename']), '-outdir', 'out']
    call(args)
    var = raw_input("Enter something: ") # if I comment out this line, I don't get all the files in the out directory
EDIT: It might be hard to find the answer in the comments below. I used unoconv for document conversion, which is blocking and easy to work with from a script.
It's likely that libreoffice is implemented as some sort of daemon/intermediary process. The "daemon" will (effectively [1]) parse the command line and then farm the work off to some other process, possibly detaching it so that it can exit immediately. Based on the -invisible option in the documentation, I strongly suspect that this is indeed what is happening in your case.
If this is the case, then your subprocess.call does do what it is advertised to do: it waits for the daemon to complete before moving on. However, it doesn't do what you want, which is to wait for all of the work to be completed. The only option you have in that scenario is to see whether the daemon has a -wait option or similar.
[1] It is likely that we don't have an actual daemon here, only something which behaves similarly. See the comments by abernert.
The problem is that the soffice command-line tool (which libreoffice is either just a link to, or a further wrapper around) is just a "controller" for the real program soffice.bin. It finds a running copy of soffice.bin and/or creates one, tells it to do some work, and then quits.
So, call is doing exactly the right thing: it waits for libreoffice to quit.
But you don't want to wait for libreoffice to quit, you want to wait for soffice.bin to finish doing the work that libreoffice asked it to do.
It looks like what you're trying to do isn't possible to do directly. But it's possible to do indirectly.
The docs say that headless mode:
… allows using the application without user interface.
This special mode can be used when the application is controlled by external clients via the API.
In other words, the app doesn't quit after running some UNO strings/doing some conversions/whatever else you specify on the command line; it sits around waiting for more UNO commands from outside, while the launcher just quits as soon as it has sent the appropriate commands to the app.
You probably have to use that above-mentioned external control API (UNO) directly.
See Scripting LibreOffice for the basics (although there's more info there about internal scripting than external), and the API documentation for details and examples.
But there may be an even simpler answer: unoconv is a simple command-line tool written using the UNO API that does exactly what you want. It starts up LibreOffice if necessary, sends it some commands, waits for the results, and then quits. So if you just use unoconv instead of libreoffice, call is all you need.
Also notice that unoconv is written in Python, and is designed to be used as a module. If you just import it, you can write your own (simpler, and use-case-specific) code to replace the "Main entrance" code, and not use subprocess at all. (Or, of course, you can tear apart the module and use the relevant code yourself, or just use it as a very nice piece of sample code for using UNO from Python.)
Also, the unoconv page linked above lists a variety of other similar tools, some that work via UNO and some that don't, so if it doesn't work for you, try the others.
If nothing else works, you could consider, e.g., creating a sentinel file and using a filesystem watch, so at least you'll be able to detect exactly when it's finished its work, instead of having to guess at a timeout. But that's a real last-ditch workaround that you shouldn't even consider until eliminating all of the other options.
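A minimal sketch of that last-ditch idea in a simpler form: instead of a filesystem watch, it polls for the output files LibreOffice is expected to produce, up to a deadline, so you at least know which files never appeared. wait_for_files, its polling interval and its timeout are hypothetical choices, and a file that exists may still be partially written, so treat this as a heuristic rather than a completion signal.

```python
import time
from pathlib import Path


def wait_for_files(paths, timeout=60.0, interval=0.5):
    """Poll until every path exists or the timeout expires; return the missing ones."""
    deadline = time.monotonic() + timeout
    pending = [Path(p) for p in paths]
    while True:
        pending = [p for p in pending if not p.exists()]
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(interval)


if __name__ == "__main__":
    # e.g. after the conversion loop: wait_for_files(["out/a.xls", "out/b.xls"])
    print(wait_for_files(["."], timeout=1))
```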
If libreoffice is using an intermediary (daemon) as mentioned by @mgilson, then one solution is to find out what program it's invoking and then invoke it directly yourself.
