From Python execute shell command and incorporate environment changes (without subprocess)? - python

I'm exploring using iPython as shell replacement for a workflow that requires good logging and reproducibility of actions.
I have a few non-python binary programs and bash shell commands to run during my common workflow that manipulate the environment variables affecting subsequent work. i.e. when run from bash, the environment changes.
How can I incorporate these cases into the Python / iPython interactive shell and modify the environment going forward in the session?
Let's focus on the most critical case.
From bash, I would do:
> sysmanager initialize foo
where sysmanager is a function:
> type sysmanager
sysmanager is a function
sysmanager ()
{
eval `/usr/bin/sysmanagercmd bash $*`
}
I don't control the binary sysmanagercmd and it generally makes non-trivial manipulations of the environment variables. Use of the eval built-in means these manipulations affect the shell process going forward -- that's critical to the design.
How can I call this command from Python / iPython with the same effect? Does python have something equivalent to bash's eval built-in for non-python commands?

Having not come across any built-in capability to do this, I wrote the following function, which accomplishes the broad intent. Environment variable modifications and changes of working directory are reflected in the python shell after the function returns. Modifications of shell aliases or functions are not retained, but that could be added by enhancing this function.
#!/usr/bin/env python3
"""
Some functionality useful when working with IPython as a shell replacement.
"""
import subprocess
import tempfile
import os

def ShellEval(command_str):
    """
    Evaluate the supplied command string in the system shell.
    Operates like the shell eval command:
    - Environment variable changes are pulled into the Python environment
    - Changes in working directory remain in effect
    """
    temp_stdout = tempfile.SpooledTemporaryFile()
    temp_stderr = tempfile.SpooledTemporaryFile()
    # in broader use this string insertion into the shell command should be given more security consideration
    subprocess.call("""trap 'printf "\\0`pwd`\\0" 1>&2; env -0 1>&2' exit; %s"""%(command_str,), stdout=temp_stdout, stderr=temp_stderr, shell=True)
    temp_stdout.seek(0)
    temp_stderr.seek(0)
    all_err_output = temp_stderr.read()
    allByteStrings = all_err_output.split(b'\x00')
    command_error_output = allByteStrings[0]
    new_working_dir_str = allByteStrings[1].decode('utf-8') # some risk in assuming index 1. What if commands sent a null char to the output?
    variables_to_ignore = ['SHLVL','COLUMNS', 'LINES','OPENSSL_NO_DEFAULT_ZLIB', '_']
    newdict = dict([ tuple(bs.decode('utf-8').split('=',1)) for bs in allByteStrings[2:-1]])
    for (varname,varvalue) in newdict.items():
        if varname not in variables_to_ignore:
            if varname not in os.environ:
                #print("New Variable: %s=%s"%(varname,varvalue))
                os.environ[varname] = varvalue
            elif os.environ[varname] != varvalue:
                #print("Updated Variable: %s=%s"%(varname,varvalue))
                os.environ[varname] = varvalue
    deletedVars = []
    for oldvarname in os.environ.keys():
        if oldvarname not in newdict.keys():
            deletedVars.append(oldvarname)
    for oldvarname in deletedVars:
        #print("Deleted environment Variable: %s"%(oldvarname,))
        del os.environ[oldvarname]
    if os.getcwd() != os.path.normpath(new_working_dir_str):
        #print("Working directory changed to %s"%(os.path.normpath(new_working_dir_str),))
        os.chdir(new_working_dir_str)
    # Display output of user's command_str. Standard output and error streams are not interleaved.
    print(temp_stdout.read().decode('utf-8'))
    print(command_error_output.decode('utf-8'))
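The parsing and bookkeeping in ShellEval can also be looked at separately from the subprocess call. The following is only a minimal sketch, not the code above: parse_env0 and apply_env are hypothetical helper names, the input is assumed to be the NUL-separated bytes that `env -0` prints, and (unlike ShellEval) names in the ignore list are never deleted.

```python
def parse_env0(data):
    """Parse NUL-separated NAME=value records (as printed by `env -0`) into a dict."""
    env = {}
    for record in data.split(b"\x00"):
        if not record:
            continue  # skip the empty piece after the trailing NUL
        name, sep, value = record.decode("utf-8", "surrogateescape").partition("=")
        if sep:  # ignore anything that is not NAME=value
            env[name] = value
    return env


def apply_env(target, new_env, ignore=("SHLVL", "_")):
    """Make mapping `target` match `new_env`, leaving names in `ignore` untouched.

    Returns (set_names, deleted_names) so the caller can log what changed.
    """
    set_names = sorted(
        name for name, value in new_env.items()
        if name not in ignore and target.get(name) != value
    )
    deleted_names = sorted(
        name for name in target
        if name not in new_env and name not in ignore
    )
    for name in set_names:
        target[name] = new_env[name]
    for name in deleted_names:
        del target[name]
    return set_names, deleted_names


# Demonstration on a plain dict; pass os.environ as `target` to affect the session.
current = {"HOME": "/home/me", "OLD": "1", "SHLVL": "1"}
captured = parse_env0(b"HOME=/home/me\x00NEW=a=b\x00SHLVL=2\x00")
changed, removed = apply_env(current, captured)
```

Returning the changed and deleted names is a design choice that suits the logging goal from the question; the commented-out print calls in ShellEval serve the same purpose.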

Related

How can you create an os.environ object with a modified environment, e.g. after loading many different modules with "module load"?

I have a python script that calls an application using subprocess. I am calling this application many times; currently I am doing something along the lines of
out, err = subprocess.Popen(f"module load {' '.join(my_module_list)} && ./my_binary", stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True).communicate()
to run my program. Ideally I would like to first generate a modified os.environ object that already contains all the paths to the modules I am loading, and then pass it to subprocess.Popen under the env argument. However, since the printenv command doesn't output a python dictionary format, I'm not sure how to access all the modifications that module load makes to the environment variables. Is there a good, clean way to create the required modified os.environ object?
I'd be tempted to call python in the subprocess and dump from os.environ in it
python -c 'import os; print(os.environ)'
Once you know what you're after, you can pass a dict directly to subprocess's env argument to set custom environment variables, for example:
custom_env = os.environ.copy()
custom_env["foo"] = "bar"
subprocess.Popen(
    ...
    env=custom_env,
)
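The two ideas above (dump os.environ from a Python child, then reuse the result as env=) can be combined. This is a rough sketch under assumptions: capture_env is a hypothetical helper name, the setup command is assumed to be trusted, and JSON is used instead of print(os.environ) so the output can be parsed back reliably.

```python
import json
import shlex
import subprocess
import sys

# One-liner run by the child Python: dump its environment as JSON.
DUMP = "import os, json; print(json.dumps(dict(os.environ)))"


def capture_env(setup_command, shell_argv=("/bin/sh", "-c")):
    """Run setup_command in a shell and return the environment it leaves behind.

    setup_command is inserted into the shell script unquoted, so it must be trusted.
    """
    script = "{ %s\n} >&2 && %s -c %s" % (
        setup_command,               # its stdout is sent to stderr so it cannot corrupt the JSON
        shlex.quote(sys.executable),
        shlex.quote(DUMP),
    )
    out = subprocess.check_output(list(shell_argv) + [script])
    return json.loads(out.decode("utf-8"))


# The variable name below is only for demonstration.
custom_env = capture_env("export GR_DEMO_VAR='hello world'")
```

For the question's case the call would look something like capture_env("module load " + " ".join(my_module_list)), and custom_env could then be passed as env=custom_env to each subprocess.Popen call. Whether the module function is defined in a non-interactive /bin/sh depends on the site setup; passing a different shell_argv such as ("bash", "-lc") may be needed, which is an assumption to check locally.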

Airflow SSHExecuteOperator() with env=... not setting remote environment

I am modifying the environment of the calling process, appending to its PATH and setting some new environment variables. However, when I print os.environ in the child process, these changes are not reflected. Any idea what may be happening?
My call to the script on the instance:
ssh_hook = SSHHook(conn_id=ssh_conn_id)
temp_env = os.environ.copy()
temp_env["PATH"] = "/somepath:"+temp_env["PATH"]
run = SSHExecuteOperator(
    bash_command="python main.py",
    env=temp_env,
    ssh_hook=ssh_hook,
    task_id="run",
    dag=dag)
Explanation: Implementation Analysis
If you look at the source to Airflow's SSHHook class, you'll see that it doesn't incorporate the env argument into the command being remotely run at all. The SSHExecuteOperator implementation passes env= through to the Popen() call on the hook, but that only passes it through to the local subprocess.Popen() implementation, not to the remote operation.
Thus, in short: Airflow does not support passing environment variables over SSH. If it were to have such support, it would need to either incorporate them into the command being remotely executed, or to add the SendEnv option to the ssh command being locally executed for each command to be sent (which even then would only work if the remote sshd were configured with AcceptEnv whitelisting the specific environment variable names to be received).
Workaround: Passing Environment Variables On The Command Line
from pipes import quote # in Python 3, make this "from shlex import quote"

def with_prefix_from_env(env_dict, command=None):
    result = 'set -a; '
    for (k,v) in env_dict.items():
        result += '%s=%s; ' % (quote(k), quote(v))
    if command:
        result += command
    return result

SSHExecuteOperator(bash_command=with_prefix_from_env(temp_env, "python main.py"),
                   ssh_hook=ssh_hook, task_id="run", dag=dag)
Workaround: Remote Sourcing
If your environment variables are sensitive and you don't want them to be logged with the command, you can transfer them out-of-band and source the remote file containing them.
from pipes import quote

def with_env_from_remote_file(filename, command):
    return "set -a; . %s; %s" % (quote(filename), command)

SSHExecuteOperator(bash_command=with_env_from_remote_file(envfile, "python main.py"),
                   ssh_hook=ssh_hook, task_id="run", dag=dag)
Note that set -a directs the shell to export all defined variables, so the file being executed need only define variables with key=val declarations; they'll be automatically exported. If generating this file from your Python script, be sure to quote both keys and values with pipes.quote() to ensure that it only performs assignments and does not run other commands. The . keyword is a POSIX-compliant equivalent to the bash source command.
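Generating such a file from Python might look like the sketch below. It rests on a few assumptions: render_env_file is a hypothetical helper name, shlex.quote is used as the Python 3 counterpart of pipes.quote, and, as a deliberate deviation from quoting the keys, invalid variable names are rejected outright so that every line is guaranteed to be a plain assignment.

```python
import re
import shlex

# Conservative pattern for POSIX shell variable names.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def render_env_file(env_dict):
    """Return the text of a file of NAME=value lines that is safe to source."""
    lines = []
    for name, value in sorted(env_dict.items()):
        if not _NAME_RE.match(name):
            raise ValueError("not a valid shell variable name: %r" % (name,))
        # shlex.quote keeps the shell from expanding or executing anything in the value
        lines.append("%s=%s\n" % (name, shlex.quote(value)))
    return "".join(lines)


# Sample values chosen to contain characters the shell would otherwise interpret.
text = render_env_file({
    "PATH": "/somepath:/usr/bin",
    "GREETING": "it's $HOME; rm -rf x",
})
```

The resulting text would still have to be copied to the remote host by some out-of-band means before with_env_from_remote_file can source it; that transfer step is outside this sketch.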

Import bash variables from a python script

I have seen plenty of examples of running a python script from inside a bash script and either passing in variables as arguments or using export to give the child shell access. I am trying to do the opposite here, though.
I am running a python script and have a separate file, let's call it myGlobalVariables.bash
myGlobalVariables.bash:
foo_1="var1"
foo_2="var2"
foo_3="var3"
My python script needs to use these variables.
For a very simple example:
myPythonScript.py:
print "foo_1: {}".format(foo_1)
Is there a way I can import them directly? Also, I do not want to alter the bash script if possible since it is a common file referenced many times elsewhere.
If your .bash file is formatted as you indicated, you might be able to import it directly as a Python module via the imp module.
import imp
bash_module = imp.load_source("bash_module", "/path/to/myGlobalVariables.bash")
print bash_module.foo_1
You can also use os.environ:
Bash:
#!/bin/bash
# works without export as well
export testtest=one
Python:
#!/usr/bin/python
import os
os.environ['testtest'] # 'one'
I am very new to python, so I would welcome suggestions for more idiomatic ways to do this, but the following code uses bash itself to tell us which values get set by first calling bash with an empty environment (env -i bash) to tell us what variables are set as a baseline, then I call it again and tell bash to source your "variables" file, and then tell us what variables are now set. After removing some false-positives and an apparently-blank line, I loop through the "additional" output, looking for variables that were not in the baseline. Newly-seen variables get split (carefully) and put into the bash dictionary. I've left here (but commented-out) my previous idea for using exec to set the variables natively in python, but I ran into quoting/escaping issues, so I switched gears to using a dict.
If the exact call (path, etc.) to your "variables" file is different from mine, then you'll need to change all of the instances of that value: in the subprocess.check_output() call and in the list.remove() calls.
Here's the sample variable file I was using, just to demonstrate some of the things that could happen:
foo_1="var1"
foo_2="var2"
foo_3="var3"
if [[ -z $foo_3 ]]; then
foo_4="test"
else
foo_4="testing"
fi
foo_5="O'Neil"
foo_6='I love" quotes'
foo_7="embedded
newline"
... and here's the python script:
#!/usr/bin/env python
import subprocess

output = subprocess.check_output(['env', '-i', 'bash', '-c', 'set'])
baseline = output.split("\n")
output = subprocess.check_output(['env', '-i', 'bash', '-c', '. myGlobalVariables.bash; set'])
additional = output.split("\n")
# these get set when ". myGlobal..." runs and so are false positives
additional.remove("BASH_EXECUTION_STRING='. myGlobalVariables.bash; set'")
additional.remove('PIPESTATUS=([0]="0")')
additional.remove('_=myGlobalVariables.bash')
# I get an empty item at the end (blank line from subprocess?)
additional.remove('')
bash = {}
for assign in additional:
    if not assign in baseline:
        name, value = assign.split("=", 1)
        bash[name]=value
        #exec(name + '="' + value + '"')
print "New values:"
for key in bash:
    print "Key: ", key, " = ", bash[key]
Another way to do it:
Inspired by Marat's answer, I came up with this two-stage hack. Start with a python program, let's call it "stage 1", which uses subprocess to call bash to source the variable file, as my above answer does, but it then tells bash to export all of the variables, and then exec the rest of your python program, which is in "stage 2".
Stage 1 python program:
#!/usr/bin/env python
import subprocess

status = subprocess.call(
    ['bash', '-c',
     '. myGlobalVariables.bash; export $(compgen -v); exec ./stage2.py'
    ])
Stage 2 python program:
#!/usr/bin/env python
# anything you want! for example,
import os

for key in os.environ:
    print key, " = ", os.environ[key]
As stated in theorifice's answer, the trick here is that a file formatted this way can be interpreted both as bash and as python code. But that answer is outdated: the imp module is deprecated in favour of importlib.
As your file has extension other than ".py", you can use the following approach:
from importlib.util import spec_from_loader, module_from_spec
from importlib.machinery import SourceFileLoader
spec = spec_from_loader("foobar", SourceFileLoader("foobar", "myGlobalVariables.bash"))
foobar = module_from_spec(spec)
spec.loader.exec_module(foobar)
I do not completely understand how this code works (in particular, what the "foobar" parameters are for); however, it worked for me. Found it here.
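As far as the importlib documentation goes, the two "foobar" arguments are the name the resulting module object will carry; the loader also stores it and the spec needs to agree with it. A small sketch that wraps the same calls in a function may make that clearer. load_vars_file and the default name "shell_vars" are made up for this illustration, and the approach only works while every line of the file is also valid Python (plain name="value" assignments).

```python
import os
import tempfile
from importlib.machinery import SourceFileLoader
from importlib.util import module_from_spec, spec_from_loader


def load_vars_file(path, module_name="shell_vars"):
    """Execute a file of name="value" lines as Python and return it as a module."""
    loader = SourceFileLoader(module_name, path)   # knows how to read and compile the file
    spec = spec_from_loader(module_name, loader)   # describes the module to be created
    module = module_from_spec(spec)                # empty module object named module_name
    loader.exec_module(module)                     # run the file's code inside that module
    return module


# Demonstration with a throwaway copy of the variables file from the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myGlobalVariables.bash")
    with open(path, "w") as f:
        f.write('foo_1="var1"\nfoo_2="var2"\n')
    variables = load_vars_file(path)
```

After this, variables.foo_1 holds the string assigned in the file. A file containing shell-only constructs, such as the if/else block or the multi-line string in the other answer's sample, would raise a SyntaxError instead.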

Load environment variables from a shell script

I have a file with some environment variables that I want to use in a python script.
The following works from the command line
$ source myFile.sh
$ python ./myScript.py
and from inside the python script I can access the variables like
import os
os.getenv('myvariable')
How can I source the shell script, then access the variables, from within the python script?
If you mean propagating the environment backward, from a child process to its parent: sorry, you can't. It's a security issue. However, sourcing the environment directly from python is definitely valid, though it's more or less a manual process.
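import subprocess as sp
SOURCE = 'your_file_path'
# universal_newlines=True makes proc.stdout yield str lines instead of bytes on Python 3
proc = sp.Popen(['bash', '-c', 'source {} && env'.format(SOURCE)], stdout=sp.PIPE, universal_newlines=True)
# lines without '=' (continuations of multi-line values) are skipped
source_env = {tup[0].strip(): tup[1].strip() for tup in map(lambda s: s.strip().split('=', 1), proc.stdout) if len(tup) == 2}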
Then you have everything you need in source_env.
If you need to write it back to your local environment (which is not recommended, since source_env keeps you clean):
import os
for k, v in source_env.items():
os.environ[k] = v
One more thing to note: since this calls bash, bash's rules apply here too. If you want a variable to be seen, you need to export it.
export VAR1='see me'
VAR2='but not me'
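The difference between the two assignments can be checked with a short experiment. This is only an illustrative sketch: it assumes a POSIX sh and the env utility are available, and the simple line-based parsing ignores multi-line values.

```python
import subprocess

# VAR1 is exported, VAR2 is a plain shell variable; `env` is a child process of sh.
script = "export VAR1='see me'; VAR2='but not me'; env"
out = subprocess.check_output(["sh", "-c", script], universal_newlines=True)

# Keep only NAME=value lines.
child_env = dict(line.split("=", 1) for line in out.splitlines() if "=" in line)

seen = "VAR1" in child_env      # exported, so it reaches the env process
hidden = "VAR2" not in child_env  # never exported, so env does not see it
```

Putting set -a before the source command, as described in the Airflow answer above, is one way to have every assignment exported without editing the file.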
You cannot, in general, load environment variables directly from a bash or shell script; it is a different language. You will have to use bash to evaluate the file, print out the variables, and then read them. See "Forcing bash to expand variables in a string loaded from a file".

Are there any modules to read shell scripts?

I'm making a python script right now, and I need to use some environment variables which are set in a bash shell script.
The bash script is something like:
#! /bin/sh
#sets some names:
export DISTRO="unified"
#export DISTRO="other"
#number of parallel builds
export BB_NUM_THREADS=2
#set build dir
export BUILDDIR=$PWD
Normally, I would just source this script in bash, then go do my builds. I'm trying to wrap python around the whole process to do some management of the output so I want to remove the manual source ./this_script.sh step.
What I want to do is read this script from python and then use os.environ to set up the variables within it. (I know this will not affect the parent, but only the current running Python instance and that's fine)
So to make my work easier, I'm trying to find out are there any modules which can "parse" the bash script and make use of the environment variables found within? Currently I'm doing this by hand and it's a bit of a pain.
If no such module exists to do exactly what I want, is there a more pythonic (read: easier/shorter) way of manually parsing a file in general, right now I'm doing this:
def parse_bash_script(fn):
    with open(fn) as f:
        for line in f:
            if not line[:1] == '#': #ignore comments
                if "export" in line:
                    line = line.replace(" ","").strip()
                    var = line[6:line.find("=")]
                    val = line[line.find("=")+1:len(line)]
                    if "\"" in val:
                        val = val[1:-1]
                    os.environ[var]=val
There is no module to do exactly what you want, but shlex will do a lot of what you want. In particular, it will get the quoting, etc. right without you having to worry about it (which is the hardest part of this), as well as skipping comments, etc. The only thing it won't do is handle the export keywords.
The easy way around that is to preprocess:
with open(fn) as f:
    processed = f.read().replace('export ', '')
# comments=True makes shlex.split skip '#' comments; it does not do so by default
for line in shlex.split(processed, comments=True):
    var, _, value = line.partition('=')
    os.environ[var] = value
It's a bit hackier, but you can also do it a bit less verbosely by post-processing. In particular, shlex will treat export foo="bar spam eggs" as two values: export and foo="bar spam eggs", and you can just skip the ones that == 'export', or where the partition finds nothing, or… For example:
with open(fn) as f:
    # comments=True makes shlex.split skip '#' comments; it does not do so by default
    for line in shlex.split(f.read(), comments=True):
        var, eq, value = line.partition('=')
        if eq:
            os.environ[var] = value
If you want to get fancier, you can construct a shlex object and (a) drive the parser directly from the file, and (b) control the parsing at a finer-grained level. However, I don't think that's necessary here.
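For completeness, driving a shlex object directly from the file could look roughly like this. It is a sketch, not a full shell parser: read_exports is a hypothetical name, the sample text mirrors the script from the question, and no variable expansion is performed.

```python
import io
import shlex


def read_exports(stream):
    """Collect NAME=value tokens from a shell-style stream using a shlex lexer."""
    lexer = shlex.shlex(stream, posix=True)  # posix=True strips quotes like a shell would
    lexer.whitespace_split = True            # split on whitespace only, so NAME=value stays whole
    result = {}
    for token in lexer:                      # '#' comments are skipped by the lexer itself
        name, eq, value = token.partition("=")
        if eq and name.isidentifier():       # drops bare words such as "export"
            result[name] = value
    return result


# Stand-in for the opened script file.
sample = io.StringIO(
    '#! /bin/sh\n'
    '#sets some names:\n'
    'export DISTRO="unified"\n'
    '#export DISTRO="other"\n'
    'export BB_NUM_THREADS=2\n'
    'export BUILDDIR=$PWD\n'
)
exports = read_exports(sample)
```

Note that BUILDDIR comes back as the literal text $PWD, which is exactly the substitution problem discussed next.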
Meanwhile, if you want to handle environment substitution (as the BUILDDIR=$PWD implies), this won't magically take care of that for you. You can make configparser do that for you with its ExtendedInterpolation feature, but then you'll need to trick configparser into handling shlex syntax, at which point… why bother.
You can of course do it manually by writing your own interpolator, but that's hard to get right. You need to know the shell's rules for why $PWD-foo is the same as ${PWD}-foo, but $PWD_foo is the same as ${PWD_foo}, etc.
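The name-boundary rule mentioned above can be illustrated with string.Template from the standard library, which uses a similar rule for where a $name ends. This is only an approximation offered as a sketch: expand is a hypothetical helper, Template supports just the $name and ${name} forms, and unlike a shell it leaves unknown names in place rather than replacing them with an empty string.

```python
from string import Template


def expand(value, variables):
    """Substitute $name and ${name} from `variables`; unknown names are left as-is."""
    return Template(value).safe_substitute(variables)


variables = {"PWD": "/build"}

a = expand("$PWD-foo", variables)    # '-' ends the name, so PWD is substituted
b = expand("${PWD}-foo", variables)  # same result with explicit braces
c = expand("$PWD_foo", variables)    # '_' is part of the name: looks up PWD_foo, which is unset
d = expand("${PWD}_foo", variables)  # braces are needed to get PWD followed by _foo
```

Anything beyond these two forms, such as ${VAR:-default} or command substitution, is outside what Template handles, which is part of why handing the file to a real shell is the more robust route.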
A better solution at this point—assuming the script is actually safe to run—would be to actually use a shell to do it for you. For example:
with open('script.sh', 'rb') as f:  # read bytes, since the pipe to sh expects bytes
    script = f.read()
script += b'\nenv'
with subprocess.Popen(['sh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE) as p:
    result, _ = p.communicate(script)  # communicate() returns (stdout, stderr)
for line in result.decode().splitlines():
    var, _, value = line.partition('=')
    os.environ[var] = value
Of course this will also override things like _=/usr/bin/env, but probably not anything you care about.
def parse_bash_script(fn):
    with open(fn) as f:
        for line in f:
            if not line.startswith('#'): #ignore comments
                if "export" in line:
                    var, _, val = line.partition('=')
                    var = var.replace('export', '', 1).strip()  # drop the export keyword
                    val = val.rstrip()
                    if val.startswith('"'):
                        vals = val.rpartition('"')
                        val = vals[0][1:]+vals[2]  # strip the surrounding quotes
                    os.environ[var]=val
I had the same problem, and based on the advice from abarnert, I decided to implement the solution as a subprocess call to a restricted bash shell, combined with shlex.
import os
import shlex
import subprocess

filename = '/path/to/file.conf'
o, e = subprocess.Popen(
    ['/bin/bash', '--restricted', '--noprofile', '--init-file',
     filename, '-i', '-c', 'declare'],
    env={'PATH': ''},
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True).communicate()  # text output on both Python 2 and 3
if e:
    # StandardError exists only on Python 2; RuntimeError works on both
    raise RuntimeError('conf error in {}: {}'.format(filename, e))
for token in shlex.split(o):
    parts = token.split('=', 1)
    if len(parts) == 2:
        os.environ[parts[0]] = parts[1]
The advantage to the restricted shell is that it blocks many of the undesirable or malicious side effects that may otherwise happen when executing a shell script. From the bash documentation:
A restricted shell is used to set up an environment more controlled than the standard shell. It behaves identically to bash with the exception that the following are disallowed or not performed:
changing directories with cd
setting or unsetting the values of SHELL, PATH, ENV, or BASH_ENV
specifying command names containing /
specifying a file name containing a / as an argument to the . builtin command
Specifying a filename containing a slash as an argument to the -p option to the hash builtin command
importing function definitions from the shell environment at startup
parsing the value of SHELLOPTS from the shell environment at startup
redirecting output using the >, >|, <>, >&, &>, and >> redirection operators
using the exec builtin command to replace the shell with another command
adding or deleting builtin commands with the -f and -d options to the enable builtin command
Using the enable builtin command to enable disabled shell builtins
specifying the -p option to the command builtin command
turning off restricted mode with set +r or set +o restricted.
