Can't execute shell script in snakemake

I recently started using snakemake and would like to run a shell script in my snakefile. However, I'm having trouble accessing input, output and params. I would appreciate any advice!
Here are the relevant code snippets.
From my Snakefile:
rule ..:
    input:
        munged = 'results/munged.sumstats.gz'
    output:
        ldsc = 'results/ldsc.txt'
    params:
        mkdir = 'results/ldsc_results/',
        ldsc_sumstats = '/resources/ldsc_sumstats/',
    shell:
        'scripts/run_gc.sh'
and the script:
chmod 770 {input.munged}
mkdir -p {params.mkdir}
ldsc=$(ls {params.ldsc_sumstats})
for i in $ldsc; do
...
I get the following error message:
...
chmod: cannot access '{input.munged}': No such file or directory
ls: cannot access '{params.ldsc_sumstats}': No such file or directory
...

The {} placeholder syntax applies only to shell commands written inside the Snakefile itself. In your example the script is defined externally, so snakemake never sees its contents and the placeholders reach the shell as literal text.
If you want to keep the script external, you will need to pass the relevant values as command-line arguments (and parse them inside the shell script). Otherwise, you can copy the script content into the shell directive and let snakemake substitute the {} variables.
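To illustrate the first option: the shell directive could be written as 'scripts/run_gc.sh {input.munged} {params.mkdir} {params.ldsc_sumstats}', with the script reading "$1", "$2" and "$3". The sketch below is only a simplified model of that substitution using plain Python string formatting, not Snakemake's actual implementation; rule_input and rule_params are hypothetical stand-ins for the rule's objects.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the rule's input and params objects.
rule_input = SimpleNamespace(munged="results/munged.sumstats.gz")
rule_params = SimpleNamespace(
    mkdir="results/ldsc_results/",
    ldsc_sumstats="/resources/ldsc_sumstats/",
)

# Only the string in the shell directive is formatted ...
shell_directive = "scripts/run_gc.sh {input.munged} {params.mkdir} {params.ldsc_sumstats}"
command = shell_directive.format(input=rule_input, params=rule_params)
print(command)

# ... while the text inside the external script file is never touched,
# so a line like this one reaches the shell with its braces intact.
script_line = "chmod 770 {input.munged}"
```

Inside run_gc.sh the values would then be available as positional parameters, for example chmod 770 "$1".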

From v7.14.0 Snakemake now supports executing external Bash scripts, with access to snakemake objects inside. See the docs for example usage.

Related

Snakemake: creating directories and files anywhere while using singularity

Whenever I use snakemake with singularity and would like to create a directory or file, I run into read-only errors if I try to write outside of the snakemake directory.
For example, if my rule contains the following:
container:
    "docker://some_image"
shell:
    "touch ~/hello.txt"
then everything runs fine. hello.txt is created inside the snakemake directory. However if my touch command tries to create a file outside of the snakemake directory:
container:
    "docker://some_image"
shell:
    "touch /home/user/hello.txt"
Then I get the following error:
touch: cannot touch '/home/user/hello.txt': Read-only file system
Is there any way to give snakemake the ability to create files anywhere it wants when using singularity?

Singularity by default mounts certain directories, including the user's home directory. In your first command (touch ~/hello.txt), the file gets written to the home directory, where Singularity has read/write permissions. However, in your second command (touch /home/user/hello.txt), Singularity doesn't have read/write access to /home/user/; you will need to bind that path manually using Singularity's --bind option and supply it to snakemake via --singularity-args.
So the snakemake command would look something like:
snakemake --use-singularity --singularity-args "--bind /home/user/"

python3.7 subprocess failed to delete files for me

I have a Python script that uses 'subprocess' to run a Linux command and confirm that my task is doing the right thing, and it works well. But I found that the task also generates some log files when it runs, so I added a clean-up function at the beginning to remove them. My script is:
def test_clean_up_logs(path_to_my_log):
    regex = path_to_my_log + ".*" # i need this because log will append current date time when it's generated
    print(regex) # i can see it's correct
    result = subprocess.run(['rm', '-rf', regex])
def test_my_real_test():
    # This will run my real test and generate log files
But it turns out that it did not remove the log files after I added the first test; more and more log files still pile up in my build dir. I run it using:
python3.7 -m pytest /path/to/mydir
My questions are:
1. Why didn't it work? In my second test case, I am using 'subprocess' to run a Linux command and it works fine.
2. Is this the correct way to clean up log files? I cannot think of a better way to do it automatically. Thanks!

Why didn't it work?
Because subprocess.run with a list of arguments does not go through a shell, so nothing expands the wildcard: rm receives the literal string filename.* as a file name, exactly as if you had quoted it. The executed command is equivalent to this:
$ rm "-rf" "filename.*"
Try this in your terminal and you will see that it does not remove the files whose names start with filename. (including the dot).
You need to pass shell=True to run the command through a shell, and give your command as a single string:
subprocess.run(f'rm -rf {regex}', shell=True)
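As an alternative that avoids the shell entirely (and with it any quoting surprises if the path contains spaces or special characters), the wildcard can be expanded in Python with the standard glob module. This is a minimal sketch; clean_up_logs is a hypothetical helper name, and it assumes the logs are regular files rather than directories.

```python
import glob
import os


def clean_up_logs(path_to_my_log):
    """Delete every file named path_to_my_log + '.<anything>'; return the removed paths."""
    removed = []
    # glob.escape keeps special characters in the base path from being
    # treated as wildcards; only the trailing ".*" is a pattern.
    for path in glob.glob(glob.escape(path_to_my_log) + ".*"):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

Calling clean_up_logs("/path/to/build/my.log") would then remove my.log.<timestamp> files and leave everything else in the directory alone.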

Running a Python function from Ansible script

I have a Django project hosted on a remote server. This contains a file called tmp_file.py. There's a function called fetch_data() inside that file. Usually I follow the below approach to run that function.
# Inside Django Project
$ python manage.py shell
[Shell] from tmp_file import fetch_data
[Shell] fetch_data()
Also, the file doesn't contain a __name__ section, so it can't be run as a standalone script. What's the best way to perform this task using Ansible? I couldn't find anything useful in the Ansible docs.

The django-admin shell command has a --command switch.
So you can try this in Ansible:
- name: Fetch data
  command: "django-admin shell --command='from tmp_file import fetch_data; fetch_data()'"
  args:
    chdir: /path/to/tmp_file

running python script with an ECS task

I have an ECS task set up which, when run with a Command override of ls, produces the expected result in my CloudWatch log stream: test.py. My script test.py takes one parameter. I am wondering how I can execute this script with python3 (which exists in my container) using the command override. Essentially, I want to execute the command:
python3 test.py hello
How can I do this?

Here's how I did something similar:
In your Dockerfile, make the command you want to run the last instruction. In your case:
CMD python3 test.py hello
To make it more extensible, use environment variables. For instance, do something like:
CMD ["python3", "test.py"]
But make the parameter come from an environment variable you pass into the container definition in your task.
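On the script side, that could look roughly like the sketch below: test.py takes its parameter from the command line when one is given and otherwise falls back to an environment variable. The variable name TEST_PARAM and the helper resolve_param are assumptions made for illustration; use whatever name you set in the container definition.

```python
import os
import sys


def resolve_param(argv, environ):
    """Prefer the first command-line argument; otherwise fall back to TEST_PARAM."""
    if len(argv) > 1:
        return argv[1]
    # TEST_PARAM is a hypothetical variable name set in the container definition.
    return environ.get("TEST_PARAM", "")


if __name__ == "__main__":
    print(resolve_param(sys.argv, os.environ))
```

With this in place, both python3 test.py hello and a plain python3 test.py with TEST_PARAM set in the environment yield the same parameter.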

Why can't environmental variables set in python persist?

I was hoping to write a python script to create some appropriate environmental variables by running the script in whatever directory I'll be executing some simulation code, and I've read that I can't write a script to make these env vars persist in the mac os terminal. So two things:
Is this true?
and
It seems like it would be a useful thing to do; why isn't it possible in general?

You can't do it from python, but some clever bash tricks can do something similar. The basic reasoning is this: environment variables exist in a per-process memory space. When a new process is created with fork() it inherits its parent's environment variables. When you set an environment variable in your shell (e.g. bash) like this:
export VAR="foo"
What you're doing is telling bash to set the variable VAR in its process space to "foo". When you run a program, bash uses fork() and then exec() to run the program, so anything you run from bash inherits the bash environment variables.
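The same inheritance rule can be observed from Python. In this small sketch (the variable name VAR is just the one from the example above), the parent sets a variable, a child Python process sees it, and the child's own change does not travel back to the parent.

```python
import os
import subprocess
import sys

# The parent sets VAR in its own process environment.
os.environ["VAR"] = "foo"

# The child prints the inherited value, then changes its own copy.
child_code = 'import os; print(os.environ["VAR"]); os.environ["VAR"] = "changed"'
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_saw = result.stdout.strip()   # value the child inherited
parent_after = os.environ["VAR"]    # parent's value after the child exited
print(child_saw, parent_after)
```

Both values are foo: the child received a copy of the parent's environment, and its modification disappeared with the child process.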
Now, suppose you want to create a bash command that sets some environment variable DATA with content from a file in your current directory called ".data". First, you need to have a command to get the data out of the file:
cat .data
That prints the data. Now, we want to create a bash command to set that data in an environment variable:
export DATA=`cat .data`
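That command takes the contents of .data and puts it in the environment variable DATA. Now, if you put that inside an alias command, you have a bash command that sets your environment variable. Use single quotes around the alias body so that cat runs each time the alias is invoked, rather than once at the moment the alias is defined:
alias set-data='export DATA=$(cat .data)'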
You can put that alias command inside the .bashrc or .bash_profile files in your home directory to have that command available in any new bash shell you start.

One workaround is to output export commands, and have the parent shell evaluate this..
thescript.py:
import random
import shlex
r = random.randint(1,100)
print("export BLAHBLAH=%s" % (shlex.quote(str(r))))
..and the bash alias (the same can be done in most shells.. even tcsh!). Single quotes keep the script from running until the alias is actually used:
alias setblahblahenv='eval "$(python thescript.py)"'
Usage:
$ echo $BLAHBLAH
$ setblahblahenv
$ echo $BLAHBLAH
72
You can output any arbitrary shell code, including multiple commands like:
export BLAHBLAH=23 SECONDENVVAR='something else' && echo 'everything worked'
Just remember to be careful about escaping any dynamically created output (the shlex.quote function, formerly also available as pipes.quote, is good for this)
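A small sketch of that escaping step, generalised to several variables: export_lines is a hypothetical helper that quotes every value with shlex.quote before printing it for eval. It assumes the variable names themselves are trusted, since only the values are quoted.

```python
import shlex


def export_lines(variables):
    """Return one 'export NAME=value' line per entry, with each value shell-quoted."""
    # Names are emitted as-is (assumed to be valid shell identifiers);
    # values go through shlex.quote so spaces and metacharacters are safe.
    return "\n".join(
        "export %s=%s" % (name, shlex.quote(str(value)))
        for name, value in variables.items()
    )


if __name__ == "__main__":
    print(export_lines({"BLAHBLAH": 23, "SECONDENVVAR": "something else"}))
```

A value such as a; rm -rf / comes out wrapped in single quotes, so the parent shell's eval assigns it as plain text instead of executing it.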

If you set environment variables within a python script (or any other script or program), it won't affect the parent shell.
Edit clarification:
So the answer to your question is yes, it is true.
You can however export from within a shell script and source it by using the dot invocation
in fooexport.sh
export FOO="bar"
at the command prompt
$ . ./fooexport.sh
$ echo $FOO
bar

It's not generally possible. The new process created for python cannot affect its parent process' environment. Neither can the parent affect the child, but the parent gets to setup the child's environment as part of new process creation.
Perhaps you can set them in .bashrc, .profile or the equivalent "runs on login" or "runs on every new terminal session" script in MacOS.
You can also have python start the simulation program with the desired environment, using the env parameter of subprocess.Popen (http://docs.python.org/library/subprocess.html):
import subprocess, os
os.chdir('/home/you/desired/directory')
# Start from the current environment so PATH etc. survive, then add SOMEVAR.
env = dict(os.environ, SOMEVAR='a_value')
subprocess.Popen(['desired_program_cmd', 'args', ...], env=env)
Or you could have python write out a shell script like this to a file with a .sh extension:
export SOMEVAR=a_value
cd /home/you/desired/directory
./desired_program_cmd
and then chmod +x it and run it from anywhere.
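The "have python write out a shell script" option might be sketched as follows. write_launcher is a hypothetical helper; the paths and names are the placeholders from the answer above, and the values are passed through shlex.quote so that spaces do not break the generated script.

```python
import os
import shlex
import stat


def write_launcher(script_path, workdir, command, variables):
    """Write a shell script that exports variables, changes directory, and runs command."""
    lines = ["#!/bin/sh"]
    for name, value in variables.items():
        lines.append("export %s=%s" % (name, shlex.quote(value)))
    lines.append("cd %s" % shlex.quote(workdir))
    lines.append(command)
    with open(script_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    # Equivalent of `chmod +x`: add execute bits on top of the current mode.
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, write_launcher("run_sim.sh", "/home/you/desired/directory", "./desired_program_cmd", {"SOMEVAR": "a_value"}) produces the three-line script shown above, preceded by a #!/bin/sh line.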

What I like to do is use /usr/bin/env in a shell script to "wrap" my command line when I find myself in similar situations:
#!/bin/bash
/usr/bin/env NAME1="VALUE1" NAME2="VALUE2" "$@"
So let's call this script "myappenv". I put it in my $HOME/bin directory which I have in my $PATH.
Now I can invoke any command using that environment by simply prepending "myappenv" as such:
myappenv dosometask -xyz
Other posted solutions work too, but this is my personal preference. One advantage is that the environment is transient, so if I'm working in the shell only the command I invoke is affected by the altered environment.
Modified version based on new comments
#!/bin/bash
/usr/bin/env G4WORKDIR="$PWD" "$@"
You could wrap this all up in an alias too. I prefer the wrapper script approach since I tend to have other environment prep in there too, which makes it easier for me to maintain.

As Benson answered, this cannot be done directly, but a good workaround is to create a simple bash function that preserves arguments:
upsert-env-var (){ eval "$(python upsert_env_var.py "$@")"; }
You can do whatever you want in your python script with the arguments. To simply add a variable use something like:
import os
import shlex
import sys

var = sys.argv[1]
val = sys.argv[2]
if os.environ.get(var):
    # Variable already set: prepend the new value, PATH-style.
    print("export %s=%s" % (var, shlex.quote(val + ":" + os.environ[var])))
else:
    print("export %s=%s" % (var, shlex.quote(val)))
Usage:
upsert-env-var VAR VAL

As others have pointed out, the reason this doesn't work is that environment variables live in a per-process memory space and thus die when the Python process exits.
They point out that a solution to this is to define an alias in .bashrc to do what you want, such as this:
alias export_my_program='export MY_VAR=$(my_program)'
However, there's another (a tad hacky) method which does not require you to modify .bashrc, nor requires you to have my_program in $PATH (or specify the full path to it in the alias). The idea is to run the program in Python if it is invoked normally (./my_program), but in Bash if it is sourced (source my_program). (Using source on a script does not spawn a new process and thus does not kill environment variables created within.) You can do that as follows:
my_program.py:
#!/usr/bin/env python3
_UNUSED_VAR=0
_UNUSED_VAR=0 \
<< _UNUSED_VAR
#=======================
# Bash code starts here
#=======================
'''
_UNUSED_VAR
export MY_VAR=`$(dirname $0)/my_program.py`
echo $MY_VAR
return
'''
#=========================
# Python code starts here
#=========================
print('Hello environment!')
Running this in Python (./my_program.py), the first 3 lines will not do anything useful and the triple-quotes will comment out the Bash code, allowing Python to run normally without any syntax errors from Bash.
Sourcing this in bash (source my_program.py), the heredoc (<< _UNUSED_VAR) is a hack used to "comment out" the first triple-quote, which would otherwise be a syntax error. The script returns before reaching the second triple-quote, avoiding another syntax error. The export assigns the result of running my_program.py in Python, located via $(dirname $0), to the environment variable MY_VAR. echo $MY_VAR prints the result on the command line.
Example usage:
$ source my_program.py
Hello environment!
$ echo $MY_VAR
Hello environment!
If run normally, however, the script still does everything it did before except exporting the environment variable:
$ ./my_program.py
Hello environment!
$ echo $MY_VAR
<-- Empty line

As noted by other authors, the memory is thrown away when the Python process exits. But during the python process, you can edit the running environment, and child processes started afterwards will see the change. For example:
>>> import os
>>> import subprocess
>>> os.environ["foo"] = "bar"
>>> subprocess.call(["printenv", "foo"])
bar
0
>>> os.environ["foo"] = "foo"
>>> subprocess.call(["printenv", "foo"])
foo
0
