Airflow SSHExecuteOperator() with env=... not setting remote environment

I am modifying the environment of the calling process, adding to its PATH and setting some new environment variables. However, when I print os.environ in the child process, these changes are not reflected. Any idea what may be happening?
My call to the script on the instance:
ssh_hook = SSHHook(conn_id=ssh_conn_id)
temp_env = os.environ.copy()
temp_env["PATH"] = "/somepath:" + temp_env["PATH"]
run = SSHExecuteOperator(
    bash_command="python main.py",
    env=temp_env,
    ssh_hook=ssh_hook,
    task_id="run",
    dag=dag)

Explanation: Implementation Analysis
If you look at the source to Airflow's SSHHook class, you'll see that it doesn't incorporate the env argument into the command being remotely run at all. The SSHExecuteOperator implementation passes env= through to the Popen() call on the hook, but that only passes it through to the local subprocess.Popen() implementation, not to the remote operation.
Thus, in short: Airflow does not support passing environment variables over SSH. If it were to have such support, it would need to either incorporate them into the command being remotely executed, or to add the SendEnv option to the ssh command being locally executed for each command to be sent (which even then would only work if the remote sshd were configured with AcceptEnv whitelisting the specific environment variable names to be received).
Workaround: Passing Environment Variables On The Command Line
from pipes import quote  # in Python 3, make this "from shlex import quote"

def with_prefix_from_env(env_dict, command=None):
    # "set -a" makes the shell export every variable assigned after it
    result = 'set -a; '
    for (k, v) in env_dict.items():
        result += '%s=%s; ' % (quote(k), quote(v))
    if command:
        result += command
    return result

SSHExecuteOperator(bash_command=with_prefix_from_env(temp_env, "python main.py"),
                   ssh_hook=ssh_hook, task_id="run", dag=dag)
Workaround: Remote Sourcing
If your environment variables are sensitive and you don't want them to be logged with the command, you can transfer them out-of-band and source the remote file containing them.
from pipes import quote

def with_env_from_remote_file(filename, command):
    return "set -a; . %s; %s" % (quote(filename), command)

SSHExecuteOperator(bash_command=with_env_from_remote_file(envfile, "python main.py"),
                   ssh_hook=ssh_hook, task_id="run", dag=dag)
Note that set -a directs the shell to export all defined variables, so the file being executed need only define variables with key=val declarations; they'll be automatically exported. If generating this file from your Python script, be sure to quote both keys and values with pipes.quote() to ensure that it only performs assignments and does not run other commands. The . keyword is a POSIX-compliant equivalent to the bash source command.
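The quoting advice above can be sketched as follows. This is only a minimal illustration: render_env_file is a hypothetical helper name, not part of Airflow, and how the resulting text reaches the remote host is left open, as in the answer itself.

```python
from shlex import quote  # Python 3; on Python 2 use "from pipes import quote"

def render_env_file(env_dict):
    """Return the text of a file holding one shell-quoted KEY=VALUE per line.

    Hypothetical helper: the output is meant to be read with "set -a; . file".
    """
    lines = ["%s=%s" % (quote(k), quote(v)) for k, v in sorted(env_dict.items())]
    return "\n".join(lines) + "\n"

# Values containing spaces or shell metacharacters end up single-quoted,
# so sourcing the file performs assignments only.
print(render_env_file({"PATH": "/somepath:/usr/bin", "GREETING": "hello world"}))
```

Because every value passes through quote(), a value such as $(some command) is stored literally rather than executed when the file is sourced.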

Related

How can you create an os.environ object with a modified environment, e.g. after loading many different modules with "module load"?

I have a Python script that calls an application using subprocess. I call this application many times, and currently I do something along the lines of
out, err = subprocess.Popen(f"module load {' '.join(my_module_list)} && ./my_binary", stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True).communicate()
to run my program. Ideally I would like to first generate a modified os.environ object that already contains all the paths to the modules I am loading, and then pass it to subprocess.Popen under the env argument. However, since the printenv command doesn't produce output in Python dictionary format, I'm not sure how to access all the modifications that module load makes to the environment variables. Is there a good, clean way to create the required modified os.environ object?

I'd be tempted to call python in the subprocess and dump from os.environ in it
python -c 'import os; print(os.environ)'
Once you know what you're after, you can pass a dict directly to subprocess's env arg to set custom environmental vars, which could be something like
custom_env = os.environ.copy()
custom_env["foo"] = "bar"
subprocess.Popen(
    ...
    env=custom_env,
)
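One way to combine the two ideas above is to have the inner Python print the environment as JSON, which the outer script can parse directly. The sketch below is an assumption-laden illustration: capture_env is a hypothetical name, and it assumes the setup command (for example module load ...) works in the shell that subprocess starts with shell=True, which depends on how module is defined on your system.

```python
import json
import subprocess
import sys
from shlex import quote

def capture_env(setup_command):
    """Run setup_command in a shell and return the resulting environment as a dict."""
    dump = "import json, os; print(json.dumps(dict(os.environ)))"
    # The brace group keeps setup_command in the same shell as the Python call,
    # and ">&2" keeps anything it prints out of the JSON on stdout.
    command = "{ %s ; } >&2 && %s -c %s" % (
        setup_command, quote(sys.executable), quote(dump))
    out = subprocess.check_output(command, shell=True)
    return json.loads(out.decode("ascii"))

# Possible usage (my_module_list and ./my_binary come from the question):
#   env = capture_env("module load " + " ".join(my_module_list))
#   subprocess.Popen(["./my_binary"], env=env)
```

The environment is captured once and can then be reused for every call to the binary, instead of running module load each time.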

How to call . /home/test.sh file in python script

I have a file that I load with . /home/test.sh (the space between the first . and / is intentional); it contains some environment variables. I need to load this file and then run the .py script. If I run the command manually on the Linux server first and then run the Python script, it generates the required output. However, I want to call . /home/test.sh from within Python to load the profile and then run the rest of the code. If the profile is not loaded, the Python script runs and gives 0 as output.
The call
subprocess.call('. /home/test.sh',shell=True)
runs without error, but the profile is not loaded into the environment in which the Python code executes, so I do not get the desired output.
Can someone help?

Environment changes made in a child process are not propagated back to the parent process, which is why your simple approach does not work.
If you are trying to pick up environment variables that have been set in your test.sh, then one thing you could do instead is to use env in a sub-shell to write them to stdout after sourcing the script, and then in Python you can parse these and set them locally.
The code below will work provided that test.sh does not write any output itself. (If it does, you could work around it by echoing some separator string after sourcing the script and before running env, and then, in the Python code, stripping off the separator string and everything before it.)
import subprocess
import os

p = subprocess.Popen(". /home/test.sh; env -0", shell=True,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
out, _ = p.communicate()
for varspec in out.decode().split("\x00")[:-1]:
    pos = varspec.index("=")
    name = varspec[:pos]
    value = varspec[pos + 1:]
    os.environ[name] = value

# just to test whether it works - output of the following should include
# the variables that were set
os.system("env")
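The separator workaround mentioned above, for a script that prints output of its own, might look roughly like this. It is a sketch under assumptions: load_env_from_script and the __ENV_FOLLOWS__ marker are hypothetical names, the marker must not appear in the script's own output, and env -0 must be available as in the code above.

```python
import os
import subprocess

def load_env_from_script(script_path, separator="__ENV_FOLLOWS__"):
    """Source script_path in sh and return the resulting environment as a dict.

    script_path should contain a slash; otherwise "." searches PATH for it.
    """
    # $1 is the script and $2 the separator; passing them as arguments avoids
    # having to quote them into the command string.
    command = '. "$1"; echo "$2"; env -0'
    out = subprocess.check_output(
        ["sh", "-c", command, "sh", script_path, separator])
    # Discard the separator line and everything the script printed before it.
    _, _, env_blob = out.partition(separator.encode() + b"\n")
    env = {}
    for varspec in os.fsdecode(env_blob).split("\x00"):
        name, eq, value = varspec.partition("=")
        if eq:  # skips the empty string after the final NUL
            env[name] = value
    return env
```

Calling os.environ.update(load_env_from_script("/home/test.sh")) would then apply the variables to the current Python process.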
It is also worth considering that if all that you want to do is set some environment variables every time before you run any python code, then one option is just to source your test.sh from a shell-script wrapper, and not try to set them inside python at all:
#!/bin/sh
. /home/test.sh
exec /path/to/your/python/script "$@"
Then when you want to run the Python code, you run the wrapper instead.

Load environment variables from a shell script

I have a file with some environment variables that I want to use in a Python script.
The following works from the command line
$ source myFile.sh
$ python ./myScript.py
and from inside the python script I can access the variables like
import os
os.getenv('myvariable')
How can I source the shell script, then access the variables, from within the Python script?

If you mean propagating the environment backward, from a child process to its parent, sorry, you can't. It's a security issue. However, sourcing the environment from within Python is definitely possible, although it is more or less a manual process.
import subprocess as sp

SOURCE = 'your_file_path'
# universal_newlines=True makes proc.stdout yield str lines rather than bytes
proc = sp.Popen(['bash', '-c', 'source {} && env'.format(SOURCE)],
                stdout=sp.PIPE, universal_newlines=True)
source_env = {tup[0].strip(): tup[1].strip()
              for tup in map(lambda s: s.strip().split('=', 1), proc.stdout)}
Then you have everything you need in source_env.
If you need to write it back to your local environment (which is not recommended, since keeping the values in source_env leaves your own environment clean):
import os
for k, v in source_env.items():
    os.environ[k] = v
One more thing to note: since I call bash here, bash's rules apply. If you want a variable to be visible to Python, you need to export it.
export VAR1='see me'
VAR2='but not me'
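The difference between the two assignments can be checked with a small sketch like the one below. It uses sh rather than a sourced file for brevity, and it assumes VAR2 is not already exported in the environment you run it from.

```python
import os
import subprocess

# The same two assignments as above, followed by env to list what is exported.
script = "export VAR1='see me'\nVAR2='but not me'\nenv"
out = os.fsdecode(subprocess.check_output(["sh", "-c", script]))

# Collect the names that env reported.
visible = {line.split("=", 1)[0] for line in out.splitlines() if "=" in line}
print("VAR1 visible:", "VAR1" in visible)
print("VAR2 visible:", "VAR2" in visible)
```

Only the exported variable should show up in the output of env, which is why an unexported assignment never reaches source_env.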

In general, you cannot load environment variables directly from a bash or other shell script, because it is written in a different language. You will have to use bash to evaluate the file, print out the variables, and then read them in Python. See "Forcing bash to expand variables in a string loaded from a file".

From Python execute shell command and incorporate environment changes (without subprocess)?

I'm exploring using iPython as shell replacement for a workflow that requires good logging and reproducibility of actions.
I have a few non-python binary programs and bash shell commands to run during my common workflow that manipulate the environment variables affecting subsequent work. i.e. when run from bash, the environment changes.
How can I incorporate these cases into the Python / iPython interactive shell and modify the environment going forward in the session?
Let's focus on the most critical case.
From bash, I would do:
> sysmanager initialize foo
where sysmanager is a function:
> type sysmanager
sysmanager is a function
sysmanager ()
{
    eval `/usr/bin/sysmanagercmd bash $*`
}
I don't control the binary sysmanagercmd and it generally makes non-trivial manipulations of the environment variables. Use of the eval built-in means these manipulations affect the shell process going forward -- that's critical to the design.
How can I call this command from Python / iPython with the same effect? Does Python have something equivalent to bash's eval built-in for non-Python commands?

Having not come across any built-in capability to do this, I wrote the following function, which accomplishes the broad intent. Environment variable modifications and changes of working directory are reflected in the Python shell after the function returns. Modifications of shell aliases or functions are not retained, but that could also be done by enhancing this function.
#!/usr/bin/env python3
"""
Some functionality useful when working with IPython as a shell replacement.
"""
import subprocess
import tempfile
import os

def ShellEval(command_str):
    """
    Evaluate the supplied command string in the system shell.
    Operates like the shell eval command:
      - Environment variable changes are pulled into the Python environment
      - Changes in working directory remain in effect
    """
    temp_stdout = tempfile.SpooledTemporaryFile()
    temp_stderr = tempfile.SpooledTemporaryFile()
    # in broader use this string insertion into the shell command should be given more security consideration
    subprocess.call("""trap 'printf "\\0`pwd`\\0" 1>&2; env -0 1>&2' exit; %s"""%(command_str,), stdout=temp_stdout, stderr=temp_stderr, shell=True)
    temp_stdout.seek(0)
    temp_stderr.seek(0)
    all_err_output = temp_stderr.read()
    allByteStrings = all_err_output.split(b'\x00')
    command_error_output = allByteStrings[0]
    new_working_dir_str = allByteStrings[1].decode('utf-8')  # some risk in assuming index 1. What if commands sent a null char to the output?
    variables_to_ignore = ['SHLVL', 'COLUMNS', 'LINES', 'OPENSSL_NO_DEFAULT_ZLIB', '_']
    newdict = dict([tuple(bs.decode('utf-8').split('=', 1)) for bs in allByteStrings[2:-1]])
    for (varname, varvalue) in newdict.items():
        if varname not in variables_to_ignore:
            if varname not in os.environ:
                #print("New Variable: %s=%s"%(varname,varvalue))
                os.environ[varname] = varvalue
            elif os.environ[varname] != varvalue:
                #print("Updated Variable: %s=%s"%(varname,varvalue))
                os.environ[varname] = varvalue
    deletedVars = []
    for oldvarname in os.environ.keys():
        if oldvarname not in newdict.keys():
            deletedVars.append(oldvarname)
    for oldvarname in deletedVars:
        #print("Deleted environment Variable: %s"%(oldvarname,))
        del os.environ[oldvarname]
    if os.getcwd() != os.path.normpath(new_working_dir_str):
        #print("Working directory changed to %s"%(os.path.normpath(new_working_dir_str),))
        os.chdir(new_working_dir_str)
    # Display output of user's command_str. Standard output and error streams are not interleaved.
    print(temp_stdout.read().decode('utf-8'))
    print(command_error_output.decode('utf-8'))

how to set environmental variables permanently in posix(unix/linux) machine using python script

I am trying to set an environment variable permanently, but it only works temporarily.
If I run the program below, it prints the variable's value. After I close the terminal and open a new one, the command printenv LD_LIBRARY_PATH prints nothing.
#!/usr/bin/python
import os
import subprocess

def setenv_var():
    env_var = "LD_LIBRARY_PATH"
    env_path = "/usr/local/lib"
    os.environ[env_var] = env_path
    process = subprocess.Popen('printenv ' + env_var, stdout=subprocess.PIPE, shell=True)
    result = process.communicate()[0]
    return result

if __name__ == '__main__':
    print(setenv_var())
Please help me.

Here is what I use to set environment variables:
import os

def setenv_var(env_file, set_this_env=True):
    env_var = "LD_LIBRARY_PATH"
    env_path = "/usr/local/lib"
    # set environments opened later by appending to `source`-d file
    with open(env_file, 'a') as f:
        f.write(os.linesep + ("%s=%s" % (env_var, env_path)))
    if set_this_env:
        # set this environment
        os.environ[env_var] = env_path
Now you only have to choose where to set it; that is the first argument of the function. I recommend the per-user file ~/.profile or, if you're using bash (which is pretty common), ~/.bashrc.
You can also set it globally by using a file like /etc/environment, but you'll need sufficient permissions when you run this script (sudo python script.py).
Remember that environments are inherited from the parent process, and a child cannot set up its parent process's environment.

When you set an environment variable, it only affects the currently running process (and, by extension, any children that are forked after the variable is set). If you are attempting to set an environment variable in your shell and you want that environment variable to always be set for your interactive shells, you need to set it in the startup scripts (eg .login, .bashrc, .profile) for your shell. Commands that you run are (initially) children of the shell from which you run them, so although they inherit the environment of the shell and can change their own environment, they cannot change the environment of your shell.
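The one-way nature of inheritance described above can be seen in a short sketch. DEMO_CHILD_ONLY is a hypothetical variable name chosen for illustration and is assumed not to be set already.

```python
import os
import subprocess
import sys

# The child sets a variable in its own environment and prints it.
child_code = (
    "import os; "
    "os.environ['DEMO_CHILD_ONLY'] = 'set-in-child'; "
    "print(os.environ['DEMO_CHILD_ONLY'])"
)
child_output = subprocess.check_output([sys.executable, "-c", child_code])

print("child saw:", child_output.decode().strip())
# The parent's environment is untouched by what the child did.
print("parent sees:", os.environ.get("DEMO_CHILD_ONLY"))
```

The child can read and change its own copy of the environment, but the change disappears when the child exits; the parent never sees it.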

Whether you do an export from bash or set os.environ from Python, the variables only last for the lifetime of the session or process. If you want to set them permanently, you will have to add them to the respective shell's profile file.
For example, if you are using bash, you could do:
import os

with open(os.path.expanduser("~/.bashrc"), "a") as outfile:  # 'a' stands for "append"; open() does not expand "~" itself
    outfile.write("export LD_LIBRARY_PATH=/usr/local/lib\n")
Check this out for some insight as to which file to add this depending on the target shell. https://unix.stackexchange.com/questions/117467/how-to-permanently-set-environmental-variables
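Appending unconditionally adds a duplicate line every time the script runs. A minimal sketch of one way to avoid that is shown below; ensure_line_in_file is a hypothetical helper name, and the choice of ~/.bashrc as the target remains an assumption about your shell.

```python
import os

def ensure_line_in_file(path, line):
    """Append line to the file at path unless an identical line is already there.

    Returns True if the line was written, False if it was already present.
    """
    path = os.path.expanduser(path)  # open() does not expand "~" on its own
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    if line in content.splitlines():
        return False
    with open(path, "a") as f:
        # Start on a fresh line even if the file lacks a trailing newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True

# Possible usage:
#   ensure_line_in_file("~/.bashrc", "export LD_LIBRARY_PATH=/usr/local/lib")
```

The change still only takes effect in shells started afterwards, since the file is read at shell startup.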
