How to get PID via subprocess.Popen with custom environment variable? - python

Using Python, how can I run a subprocess with a modified environment variable and get its PID? I assume subprocess.Popen() is along the right track...
In shell (bash), I would do this:
MY_ENV_VAR=value ./program_name arg1 arg2 etc &
This runs program_name in the background, passing in "arg1" and "arg2" and "etc", with a modified environment variable, "MY_ENV_VAR" with a value of "value". The program program_name requires the environment variable MY_ENV_VAR to be set to the proper value.
How can I do the equivalent thing in Python? I absolutely need the PID of the process. (My intent is to keep the Python script running and performing checks on some of the things program_name is doing in the meantime, and I need the process ID to make sure it's still running.)
I've tried:
proc = subprocess.Popen(['MY_ENV_VAR=value', './program_name', 'arg1', 'arg2', 'etc'])
But of course, it expects the first item to be the program, not an environment variable.
Also tried:
environ = dict(os.environ)
environ['MY_ENV_VAR'] = 'value'
proc = subprocess.Popen(['./program_name', 'arg1', 'arg2', 'etc', env=environ])
Close, I suppose, but no cigar. Similarly, this:
environ = dict(os.environ)
environ['MY_ENV_VAR'] = 'value'
proc = subprocess.Popen(['echo', '$MY_ENV_VAR'], env=environ)
This echoes "$MY_ENV_VAR" literally, I suppose because there's no shell to interpret it. Okay, so I try the above but with this line instead:
proc = subprocess.Popen(['echo', '$MY_ENV_VAR'], env=environ, shell=True)
And that's fine and dandy, except that the value that's echoed is blank (doesn't apparently exist). And even if it did work, I'd get the PID of the shell, not the actual process I'm trying to launch.
I need to launch a process with a custom environment variable and get its PID (not the PID of the shell). Ideas?

Your last version is very close, but not quite there.
You don't want $MY_ENV_VAR to be an argument to echo. The echo program will have MY_ENV_VAR in its environment, but echo doesn't do any env variable expansion. You need it to be expanded by the shell, before it even gets to echo.
This may actually have nothing to do with your real-life test case. You already are getting the environment variable to the child process in all of your tests, it's just that echo doesn't do anything with that environment variable. If your real program just needs the environment variable to be set, you're done:
proc = subprocess.Popen(['./program_name', 'arg1', 'arg2', 'etc'], env=environ)
But if your program needs it to be substituted, like echo, then you have to substitute it into the arguments before they get passed to your program.
The easiest way to do that is to just give the shell a command line instead of a list of arguments:
proc = subprocess.Popen('echo "$MY_ENV_VAR"', env=environ, shell=True)
People will tell you that you should never use a command string with subprocess, but the reason for that is that you usually want to prevent the shell from expanding variables and doing other interpretation that could be insecure. On the rare occasions when you want the shell to do its shelly things, you want a command string.
Of course if you use a shell, on most platforms, you're going to end up getting the PID of the shell rather than the PID of the actual program. Short of doing some platform-specific digging to enumerate the shell's children (or wrapping the whole thing in some simple sh code that gives you the child's PID indirectly), there's no way around that. The shell is what you're running.
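One common workaround on POSIX systems, sketched below as an illustration rather than as part of the answer above, is to prefix the command string with exec: the shell then replaces itself with the program, so proc.pid refers to program_name itself rather than to sh.
import os
import subprocess

environ = dict(os.environ)
environ['MY_ENV_VAR'] = 'value'

# exec makes the shell replace itself with ./program_name, so no extra
# shell process sits between the Python script and the program.
proc = subprocess.Popen('exec ./program_name arg1 arg2 etc',
                        env=environ, shell=True)
print(proc.pid)  # PID of ./program_name itself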
Another alternative is to expand the variables in Python instead of making the shell do it. Then you don't even need a shell:
proc = subprocess.Popen(['echo', os.path.expandvars('$MY_ENV_VAR')])
… or, even more simply:
proc = subprocess.Popen(['echo', os.environ['MY_ENV_VAR']])
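One caveat: os.path.expandvars() reads from os.environ, so it only sees MY_ENV_VAR if it was set in the parent's own environment, not just in a separate environ dict like the one built in the question. A minimal sketch, assuming you want to expand against that custom mapping instead, using string.Template:
import os
import string
import subprocess

environ = dict(os.environ)
environ['MY_ENV_VAR'] = 'value'

# Substitute from our own mapping instead of the real environment.
arg = string.Template('$MY_ENV_VAR').substitute(environ)
proc = subprocess.Popen(['echo', arg], env=environ)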

Here's a program that spits out the current environment.
#!/usr/bin/env python
##program_name
import os
for k, v in os.environ.iteritems():
    print k, '=', v
Here's a program that calls the other program, but first changes the environment
#!/usr/bin/env python
import subprocess, os
newenv = os.environ.copy()
newenv['MY_ENV_VAR'] = 'value'
args = ['./program_name', 'arg1', 'arg2', 'etc']
proc = subprocess.Popen(args, env=newenv)
pid = proc.pid
proc.wait()
print 'PID =', pid

Related

Python: subprocess.Popen() returns None

I need to execute a CLI binary with args, keep the process alive and run multiple commands throughout the Python script. So I am using Python and subprocess.Popen() in the following way:
from subprocess import Popen, PIPE
cmd = ["/full/path/to/binary","--arg1"]
process = Popen(cmd,stdin=PIPE, stdout=None)
process.stdin.write(f"command-for-the-CLI-tool".encode())
process.stdin.flush()
However, no matter how I call Popen(), the returned process object is None.
If I run process = Popen(cmd), without specifying stdin and stdout, I can see the process running correctly in the output console, meaning that the binary path and args are correct, but the process object is still None, meaning that I cannot issue other commands afterwards.
EDIT: The point of this is that I want to execute the following:
command = (
    f"cat << EOF | {cmd}\n"
    f"use {dbname};\n"
    "set optimizer_switch='hypergraph_optimizer=on';\n"
    f"SET forced_plan='{forced_plan}';\n"
    f"{query_text}\n"
    "EOF"
)
runtimes = []
for _ in trange(runs):
    start = time.time()
    subprocess.run(command, shell=True, stdout=sys.stdout)
    runtimes.append(time.time() - start)
But this clearly measures the time of all the commands, whereas I am only interested in measuring the "query_text" command.
This is why I am looking for a solution where I can send the commands separately and time only the one I am interested in.
If I use multiple subprocess.run(), then the process instances will be different. I want the instance to be the same because the query depends on the previous commands.
With subprocess.run you can pass the entire input as ... input.
command = f"""\
use {dbname};
set optimizer_switch='hypergraph_optimizer=on';
SET forced_plan='{forced_plan}';
{query_text}
"""
runtimes = []
for _ in trange(runs):
    start = time.time()
    subprocess.run([cmd], text=True, input=command, stdout=sys.stdout)
    runtimes.append(time.time() - start)
I took out shell=True; perhaps see also Actual meaning of shell=True in subprocess, as well as Running Bash commands in Python, which elaborates on several of the changes here.
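If you really do need a single long-lived process so that only query_text is timed, here is a rough sketch of that idea. It assumes, beyond anything stated in the question, that the tool reads statements line by line from stdin, writes results to stdout, and that a sentinel SELECT can be used to detect when the query has finished; cmd, dbname, forced_plan and query_text below are placeholders for the asker's values.
import subprocess
import time

# Placeholders standing in for the asker's values.
cmd = "/full/path/to/binary"
dbname, forced_plan, query_text = "mydb", "some_plan", "SELECT 1;"

proc = subprocess.Popen([cmd, "--arg1"],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        text=True)

def send(line):
    proc.stdin.write(line + "\n")
    proc.stdin.flush()

# Setup commands, sent once and excluded from the measurement.
send(f"use {dbname};")
send("set optimizer_switch='hypergraph_optimizer=on';")
send(f"SET forced_plan='{forced_plan}';")

# Time only the query; a sentinel row tells us when it has finished.
start = time.time()
send(query_text)
send("SELECT 'QUERY_DONE';")
for line in proc.stdout:
    if "QUERY_DONE" in line:
        break
elapsed = time.time() - start
print(elapsed)

proc.stdin.close()
proc.wait()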
Try using subprocess.run() instead of subprocess.Popen().
If you still use subprocess.Popen(), then you can use the .poll() method.
Note that Popen() itself returns a Popen object, never None; it is .poll() that returns None while the command has not yet completed, and the exit code once it has finished.
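A minimal sketch of that distinction, using sleep as a stand-in for the real binary:
import subprocess
import time

proc = subprocess.Popen(["sleep", "2"])  # Popen() returns a Popen object, never None
print(proc.poll())                       # None -> still running
time.sleep(3)
print(proc.poll())                       # 0 -> finished with exit code 0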

How does subprocess.call() work with shell=False?

I am using Python's subprocess module to call some Linux command line functions. The documentation explains the shell=True argument as
If shell is True, the specified command will be executed through the shell
There are two examples, which seem the same to me from a descriptive viewpoint (i.e. both of them call some command-line command), but one of them uses shell=True and the other does not
>>> subprocess.call(["ls", "-l"])
0
>>> subprocess.call("exit 1", shell=True)
1
My question is:
What does running the command with shell=False do, in contrast to shell=True?
I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell. In other words, how can it possibly not execute the argument through the shell?
It would also be helpful to get some examples of:
Things that can be done with shell=True that can't be done with shell=False, and why they can't be done.
Vice versa (although it seems that there are no such examples)
Things for which it does not matter whether shell=True or False and why it doesn't matter
UNIX programs start each other with the following three calls, or derivatives/equivalents thereto:
fork() - Create a new copy of yourself.
exec() - Replace yourself with a different program (do this if you're the copy!).
wait() - Wait for another process to finish (optional, if not running in background).
Thus, with shell=False, you do just that (as Python-syntax pseudocode below -- exclude the wait() if not a blocking invocation such as subprocess.call()):
pid = fork()
if pid == 0: # we're the child process, not the parent
    execlp("ls", "ls", "-l", NUL)
else:
    retval = wait(pid) # we're the parent; wait for the child to exit & get its exit status
whereas with shell=True, you do this:
pid = fork()
if pid == 0:
    execlp("sh", "sh", "-c", "ls -l", NUL)
else:
    retval = wait(pid)
Note that with shell=False, the command we executed was ls, whereas with shell=True, the command we executed was sh.
That is to say:
subprocess.Popen(foo, shell=True)
is exactly the same as:
subprocess.Popen(
    ["sh", "-c"] + ([foo] if isinstance(foo, basestring) else foo),
    shell=False)
That is to say, you execute a copy of /bin/sh, and direct that copy of /bin/sh to parse the string into an argument list and execute ls -l itself.
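For the curious, here is a runnable, POSIX-only sketch of that same fork/exec/wait dance using the os module directly; it is roughly what subprocess does for you in the shell=False case:
import os

pid = os.fork()
if pid == 0:
    # Child: replace ourselves with ls -l (never returns on success).
    os.execvp("ls", ["ls", "-l"])
else:
    # Parent: wait for the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    print("child exited with", os.WEXITSTATUS(status))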
So, why would you use shell=True?
You're invoking a shell builtin.
For instance, the exit command is actually part of the shell itself, rather than an external command. That said, this is a fairly small set of commands, and it's rare for them to be useful in the context of a shell instance that only exists for the duration of a single subprocess.call() invocation.
You have some code with shell constructs (e.g. redirections) that would be difficult to emulate without it.
If, for instance, your command is cat one two >three, the syntax >three is a redirection: It's not an argument to cat, but an instruction to the shell to set stdout=open('three', 'w') when running the command ['cat', 'one', 'two']. If you don't want to deal with redirections and pipelines yourself, you need a shell to do it.
A slightly trickier case is cat foo bar | baz. To do that without a shell, you need to start both sides of the pipeline yourself: p1 = Popen(['cat', 'foo', 'bar'], stdout=PIPE), p2 = Popen(['baz'], stdin=p1.stdout).
You don't give a damn about security bugs.
...okay, that's a little bit too strong, but not by much. Using shell=True is dangerous. You can't do this: Popen('cat -- %s' % (filename,), shell=True) without a shell injection vulnerability: If your code were ever invoked with a filename containing $(rm -rf ~), you'd have a very bad day. On the other hand, ['cat', '--', filename] is safe with all possible filenames: The filename is purely data, not parsed as source code by a shell or anything else.
It is possible to write safe scripts in shell, but you need to be careful about it. Consider the following:
filenames = ['file1', 'file2'] # these can be user-provided
subprocess.Popen(['cat -- "$@" | baz', '_'] + filenames, shell=True)
That code is safe (well -- as safe as letting a user read any file they want ever is), because it's passing your filenames out-of-band from your script code -- but it's safe only because the string being passed to the shell is fixed and hardcoded, and the parameterized content is external variables (the filenames list). And even then, it's "safe" only to a point -- a bug like Shellshock that triggers on shell initialization would impact it as much as anything else.
I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell.
No, subprocess is perfectly capable of starting a program directly (via an operating system call). It does not need a shell.
Things that can be done with shell=True that can't be done with shell=False
You can use shell=False for any command that simply runs some executable optionally with some specified arguments.
You must use shell=True if your command uses shell features. This includes pipelines (|), redirections, or compound statements combined with ;, &&, ||, etc.
Thus, one can use shell=False for a command like grep string file. But a command like grep string file | xargs something will, because of the |, require shell=True.
Because the shell has powerful features that Python programmers do not always find intuitive, it is considered better practice to use shell=False unless you truly need a shell feature. As an example, pipelines are not strictly needed, because they can also be built using subprocess's PIPE feature, as sketched below.
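As a sketch of that last point, the grep string file | xargs something example can be wired up without a shell by connecting two Popen objects (the file name and commands here are just placeholders):
from subprocess import Popen, PIPE

p1 = Popen(["grep", "string", "file"], stdout=PIPE)
p2 = Popen(["xargs", "something"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()          # let p1 receive SIGPIPE if p2 exits early
output, _ = p2.communicate()
print(output.decode())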

Python work with parent shell environment

Is it possible to create a shell object and manipulate it without losing its data after command execution?
from subprocess import *
sh = Popen('/bin/bash', stdout=PIPE)
sh.communicate('source /path/to/file/env.sh')
print os.getenv('ENV_VAR1')
ENV_VAR1 should be available after sourcing /path/to/file/env.sh but it's not.
This part of code is not working as expected, how can I make it work?
Here is another try which is not working as well
os.system('source env.sh; echo $ENV_VAR1') #Prints out correct value
os.system('echo $ENV_VAR1') #Prints nothing
You could echo $ENV_VAR1, and use communicate to return the result from stdout:
import subprocess
proc = subprocess.Popen('source /path/to/file/env.sh; echo $ENV_VAR1',
                        stdout=subprocess.PIPE, shell=True)
env_var1, err = proc.communicate()
print(env_var1)
Another option might be to use Miki Tebeka's source function:
import subprocess
import os
def source(script, update=True):
    """
    source a file and return the environment as a dict.
    http://pythonwise.blogspot.fr/2010/04/sourcing-shell-script.html (Miki Tebeka)
    """
    proc = subprocess.Popen("source %s; env -0" % script, stdout=subprocess.PIPE,
                            shell=True)
    output, err = proc.communicate()
    env = dict((line.split("=", 1) for line in output.split('\x00') if line))
    if update:
        os.environ.update(env)
    return env
source('/path/to/env.sh')
print(os.environ['ENV_VAR1'])
If /path/to/file/env.sh contains
ENV_VAR1=FOO
export ENV_VAR1
the script above prints
FOO
Above, I made a small change to the function so that env uses a null byte (\x00) to separate output lines. This makes it possible to parse name/value pairs that span multiple lines.
This isn't a Python issue. Environment variables are only visible to the process in which they are set and any child processes. In these examples you are trying to set environment variables in a child and access them from the parent, which simply doesn't work.
If you want to communicate these values back to the parent you will need to arrange for some sort of explicit communication (e.g, having the child write the values to stdout and read them from the parent).

How to call a series of bash commands in python and store output

I am trying to run the following bash script in Python and store the readlist output. The readlist, which I want stored as a Python list, is a list of all files in the current directory ending in *concat_001.fastq.
I know it may be easier to do this in Python, i.e.
import os
readlist = [f for f in os.listdir(os.getcwd()) if f.endswith("concat_001.fastq")]
readlist = sorted(readlist)
However, this is problematic, as I need Python to sort the list in EXACTLY the same way as bash, and I was finding that bash and Python sort certain things in different orders (e.g. Python and bash deal with capitalised and uncapitalised names differently). But when I tried
readlist = np.asarray(sorted(flist, key=str.lower))
I still found that two files starting with ML_ and M_ were sorted in a different order by bash and Python. Hence I am trying to run my exact bash script through Python, and then use the bash-sorted list in my subsequent Python code.
input_suffix="concat_001.fastq"
ender=`echo $input_suffix | sed "s/concat_001.fastq/\*concat_001.fastq/g" `
readlist="$(echo $ender)"
I have tried
proc = subprocess.call(command1, shell=True, stdout=subprocess.PIPE)
proc = subprocess.call(command2, shell=True, stdout=subprocess.PIPE)
proc = subprocess.Popen(command3, shell=True, stdout=subprocess.PIPE)
But I just get: <subprocess.Popen object at 0x7f31cfcd9190>
Also - I don't understand the difference between subprocess.call and subprocess.Popen. I have tried both.
Thanks,
Ruth
So your question is a little confusing and does not exactly explain what you want. However, I'll try to give some suggestions to help you update it, or in my effort, answer it.
I will assume the following: your Python script passes 'input_suffix' on the command line, and you want your Python program to receive the contents of 'readlist' when the external script finishes.
To make our lives simpler, and to allow things to become more complicated later, I would put your commands in the following bash script:
script.sh
#!/bin/bash
input_suffix=$1
ender=`echo $input_suffix | sed "s/concat_001.fastq/\*concat_001.fastq/g"`
readlist="$(echo $ender)"
echo $readlist
You would execute this as ./script.sh "concat_001.fastq", where $1 takes in the first argument passed on the command line.
To use Python to execute external scripts, as you quite rightly found, you can use subprocess (or, as noted by another response, os.system - although subprocess is recommended).
The docs tell you that subprocess.call:
"Wait for command to complete, then return the returncode attribute."
and that
"For more advanced use cases when these do not meet your needs, use the underlying Popen interface."
Given you want to pipe the output from the bash script to your Python script, let's use Popen as suggested by the docs. As I posted in the other Stack Overflow answer, it could look like the following:
import subprocess
from subprocess import Popen, PIPE
# Execute our script and pipe the output to stdout
process = subprocess.Popen(['./script.sh', 'concat_001.fastq'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
# Obtain the standard out, and standard error
stdout, stderr = process.communicate()
and then:
>>> print stdout
*concat_001.fastq
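To get from that stdout string to the Python list the question asks for, one small follow-up step (this assumes, as the bash script does, that the filenames contain no whitespace):
# stdout holds the shell-expanded, space-separated filenames;
# decode first if it comes back as bytes on Python 3.
readlist = stdout.decode().split()
print(readlist)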

python: subprocess.Popen() behaviour

I am trying to use rsync with Python. I have read that the preferred way to pass arguments to Popen is using an array.
The code I tried:
p = Popen(["rsync",
"\"{source}\"".format(source=latestPath),
"\"{user}#{host}:{dir}\"".format(user=user, host=host, dir=dir)],
stdout=PIPE, stderr=PIPE)
The result is rsync asking for password, even though I have set up SSH keys to do the authentication.
I think this is a problem with the environment the new process gets executed in. What I tried next is:
p = Popen(["rsync",
"\"{source}\"".format(source=latestPath),
"\"{user}#{host}:{dir}\"".format(user=user, host=host, dir=dir)],
stdout=PIPE, stderr=PIPE, shell=True)
This results in rsync printing the "correct usage", so the arguments are passed incorrectly to rsync. I am not sure if this is even supposed to work (passing an array with shell=True).
If I remove the array altogether like this:
p = Popen("rsync \"{source}\" \"{user}#{host}:{dir}\"".format(
source=latestPath, user=user, host=host, dir=dir),
stdout=PIPE, stderr=PIPE, shell=True)
The program works fine. It really doesn't matter for the sake of this script, but I'd like to know: what's the difference? Why don't the other two (mainly the first one) work?
Is it just that the shell environment is required, and the second one is incorrect?
EDIT: Contents of the variables
latestPath='/home/tomcat/.jenkins/jobs/MC 4thworld/workspace/target/FourthWorld-0.1-SNAPSHOT.jar'
user='mc'
host='192.168.0.32'
dir='/mc/test/plugins/'
I'd like to know what's the difference?
When shell=True, the entire command is passed to the shell. The quotes are there so the shell can correctly pick the command apart again. In particular, passing
foo "bar baz"
to the shell causes it to parse the command as (Python syntax) ['foo', 'bar baz'] so that it can execute the foo command with the argument bar baz.
By contrast, when shell=False, Python will pass the arguments in the list to the program immediately. For example, try the following subprocess commands:
>>> import subprocess
>>> subprocess.call(["echo", '"Hello!"'])
"Hello!"
0
>>> subprocess.call('echo "Hello!"', shell=True)
Hello!
0
and note that in the first, the quotes are echoed back at you by the echo program, while in the second case, the shell has stripped them off prior to executing echo.
In your specific case, rsync gets the quotes but doesn't know how it's supposed to handle them; it's not itself a shell, after all.
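So the fix for the first variant is simply to drop the embedded quotes and keep the list form; each list element is delivered to rsync as exactly one argument, spaces and all. A sketch using the question's variables:
from subprocess import Popen, PIPE

p = Popen(["rsync",
           latestPath,
           "{user}@{host}:{dir}".format(user=user, host=host, dir=dir)],
          stdout=PIPE, stderr=PIPE)
out, err = p.communicate()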
Could it be to do with the cwd or env parameters? Maybe in the first syntax, it can't find the SSH keys...
Just a suggestion, it might be easier for you to use sh instead of subprocess:
import sh
sh.rsync(latestPath, user + "@" + host + ":" + dir)
