Difference between using os.system and snakemake.shell - python

I'm fairly new to snakemake, so I still struggle with combining shell commands and Python code.
My solution is to write script files and then run the shell command within that script.
Is there any mechanical difference between invoking snakemake.shell and os.system for executing command lines?
Example:
sample = ["SRR12845350"]

rule prefetch:
    input:
        "results/Metadata/{sample}.json"
    output:
        "results/SRA/{sample}.sra"
    params:
        "prefetch %s -o %s"
    script:
        "scripts/prefetch.py"
And prefetch.py is:
from json import load
from snakemake import shell
from os import system
json_file = snakemake.input[0]
prefetch = snakemake.params[0]
sra_file = snakemake.output[0]
json = load(open(json_file))
sra_run = json["RUN_accession"]
shell(prefetch % (sra_run, sra_file))   # option 1
system(prefetch % (sra_run, sra_file))  # option 2

shell is just a helper function to make it easier to call command-line programs from snakemake. Learning snakemake can be overwhelming, and learning the fine intricacies of Python's os.system and subprocess on top of it is unnecessarily complicated. The snakemake shell command does a couple of sanity checks, sets some environment variables (e.g. the number of threads the command can use) and some other "small" stuff, but under the hood it just calls subprocess.Popen on your command. Both options should work, but since you are writing a snakemake wrapper, it's probably slightly better to use shell, as it is designed to be used within snakemake.
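To make the comparison concrete, here is a rough conceptual sketch of what the paragraph above describes, not snakemake's actual source; the THREADS variable name is made up for illustration:
import os
import subprocess

def shell_sketch(cmd, threads=1):
    # conceptually: a few sanity checks and environment tweaks...
    env = dict(os.environ, THREADS=str(threads))  # hypothetical variable name
    # ...then the command string is handed to a shell via subprocess.Popen
    proc = subprocess.Popen(cmd, shell=True, env=env)
    if proc.wait() != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
The practical difference from os.system is mainly that shell integrates with snakemake (threads, environment) and fails loudly on a non-zero exit status, whereas os.system simply returns the exit status for you to check yourself.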

Related

How do I embed my shell scanning-script into a Python script?

I've been using the following shell command to read the image off a scanner named scanner_name and save it in a file named file_name:
scanimage -d <scanner_name> --resolution=300 --format=tiff --mode=Color 2>&1 > <file_name>
This has worked fine for my purposes.
I'm now trying to embed this in a Python script. What I need is to save the scanned image, as before, into a file, and also capture any standard output (say, error messages) to a string.
I've tried:
scan_result = os.system('scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name))
But when I run this in a loop (with different scanners), there is an unreasonably long lag between scans, and the images aren't saved until the next scan starts (the file is created as an empty file and is not filled until the next scanning command). All this with scan_result = 0, i.e. indicating no error.
The subprocess method run() has been suggested to me, and I have tried:
with open(file_name, 'w') as scanfile:
    input_params = '-d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name)
    scan_result = subprocess.run(["scanimage", input_params], stdout=scanfile, shell=True)
but this saved the image in some kind of unreadable file format.
Any ideas as to what may be going wrong? Or what else I can try that will allow me to both save the file and check the success status?
subprocess.run() is definitely preferred over os.system() but neither of them as such provides support for running multiple jobs in parallel. You will need to use something like Python's multiprocessing library to run several tasks in parallel (or painfully reimplement it yourself on top of the basic subprocess.Popen() API).
You also have a basic misunderstanding about how to run subprocess.run(). You can pass in either a string and shell=True or a list of tokens and shell=False (or no shell keyword at all; False is the default).
with_shell = subprocess.run(
    "scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {}".format(
        scanner, file_name), shell=True)

with open(file_name, "wb") as write_handle:
    no_shell = subprocess.run([
        "scanimage", "-d", scanner, "--resolution=300", "--format=tiff",
        "--mode=Color"], stdout=write_handle)
You'll notice that the latter does not support redirection (because that's a shell feature) but this is reasonably easy to implement in Python. (I took out the redirection of standard error -- you really want error messages to remain on stderr!)
If you have a larger working Python program this should not be awfully hard to integrate with a multiprocessing.Pool(). If this is a small isolated program, I would suggest you peel off the Python layer entirely and go with something like xargs or GNU parallel to run a capped number of parallel subprocesses.
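As a minimal sketch of the multiprocessing.Pool route (assuming Python 3.5+ and hypothetical device names; each worker writes its own output file and returns the scan's exit status):
import subprocess
from multiprocessing import Pool

def scan(args):
    scanner, file_name = args
    # each worker writes its own image file; stderr stays on the terminal
    with open(file_name, "wb") as write_handle:
        return subprocess.run(
            ["scanimage", "-d", scanner, "--resolution=300",
             "--format=tiff", "--mode=Color"],
            stdout=write_handle).returncode

if __name__ == "__main__":
    scanners = ["scanner_one", "scanner_two"]            # hypothetical device names
    jobs = [(s, "{}.tiff".format(s)) for s in scanners]
    with Pool(len(jobs)) as pool:
        return_codes = pool.map(scan, jobs)              # 0 means the scan succeeded
    print(return_codes)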
I suspect the issue is that you're opening the output file and then running subprocess.run() inside that with block. This isn't necessary. The end result is that you open the file via Python, then have the command open the file again via the OS, and then close the file via Python.
JUST run the subprocess, and let the scanimage 2>&1 > filename command create the file (just as it would if you ran scanimage at the command line directly).
I think subprocess.check_output() is now the preferred method of capturing the output.
I.e.
from subprocess import check_output

# Command must be a list, with all parameters as separate list items
command = ['scanimage',
           '-d{}'.format(scanner),
           '--resolution=300',
           '--format=tiff',
           '--mode=Color',
           '2>&1>{}'.format(file_name)]
scan_result = check_output(command)
print(scan_result)
However (with both run and check_output), that shell=True is a big security risk, especially if the input_params come into the Python script from outside. People can pass in unwanted commands and have them run in the shell with the permissions of the script.
Sometimes shell=True is necessary for the OS command to run properly, in which case the best recommendation is to use an actual Python module to interface with the scanner, rather than having Python pass an OS command to the OS.
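As a middle ground, here is a minimal sketch (assuming scanner and file_name are already set) that keeps shell=False, does the redirection in Python by writing the image straight to the file, and captures error messages for checking:
import subprocess

with open(file_name, "wb") as scanfile:
    scan_result = subprocess.run(
        ["scanimage", "-d", scanner, "--resolution=300",
         "--format=tiff", "--mode=Color"],
        stdout=scanfile, stderr=subprocess.PIPE, universal_newlines=True)

if scan_result.returncode != 0:
    # stderr was captured as text, so error messages are available as a string
    print("scan failed:", scan_result.stderr)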

Set bash variable from python script

I'm calling a Python script from my bash script, and I was wondering if there is a simple way to set my bash variables from within my Python script.
Example:
My bash script:
#!/bin/bash
someVar=""
python3 /some/folder/pythonScript.py
My python script:
anotherVar="HelloWorld"
Is there a way I can set my someVar to the value of anotherVar? I was thinking of writing properties to a file inside the Python script and then reading them from my bash script, but maybe there is another way. Also, I don't know whether it makes any difference (I don't think it does), but I could give both variables the same name (someVar/someVar instead of someVar/anotherVar).
No, when you execute python, you start a new process, and every process has access only to its own memory. Imagine what would happen if a process could influence another process's memory! Even for parent/child processes like this, that would be a huge security problem.
You can make python print() something and use that, though:
#!/usr/bin/env python3
print('Hello!')
And in your shell script:
#!/usr/bin/env bash
someVar=$(python3 myscript.py)
echo "$someVar"
There are, of course, many other IPC techniques you could use, such as sockets, pipes, shared memory, etc. But without more context, it's difficult to make a specific recommendation.
shlex.quote() in Python 3, or pipes.quote() in Python 2, can be used to generate code which can be evaled by the calling shell. Thus, if the following script:
#!/usr/bin/env python3
import sys, shlex
print('export foobar=%s' % (shlex.quote(sys.argv[1].upper())))
...is named setFoobar and invoked as:
eval "$(setFoobar argOne)"
...then the calling shell will have an environment variable set with the name foobar and the value argOne.
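The same pattern extends to several variables by printing one export statement per line; a minimal sketch with hypothetical names and values, invoked the same way via eval "$(...)":
#!/usr/bin/env python3
import shlex

# one export statement per variable; the calling shell evals every line
settings = {"foobar": "argOne", "bazqux": "two words"}  # hypothetical values
for name, value in settings.items():
    print("export {}={}".format(name, shlex.quote(value)))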

Python equivalent/emulator of Bash "parameter expansion"

I have a bash script that I use to update several computers in my house. It makes use of the deborphan program which identifies programs that are no longer required on my system (Linux obviously).
The bash script makes use of bash's parameter expansion which enables me to pass deborphan's results to my package manager (in this case aptitude):
aptitude purge $(deborphan --guess-all) -y
deborphan's results are:
python-pip
python3-all
I would like to convert my bash script into python (partly as a learning opportunity, as I am new to python), but I have run into a significant snag. My obvious start for the python script is
subprocess.call(["aptitude", "purge", <how do I put the deborphan results here?>, "-y"])
I have tried nesting a separate subprocess.call just for deborphan as a parameter inside the above subprocess.call, and that fails.
Interestingly enough, I cannot seem to capture the deborphan results with:
deb = subprocess.call(["deborphan", "--guess-all"])
to pass deborphan's results as a variable for the parameter either.
Is there any way to emulate Bash's parameter expansion in Python?
You can use + to concatenate lists:
import subprocess as sp
deborphan_results = sp.check_output(…)
deborphan_results = deborphan_results.splitlines()
subprocess.call(["aptitude", "purge"] + deborphan_results + ["-y"])
(if you're using a Python version below 2.7, you can use proc = sp.Popen(…, stdout=sp.PIPE); deborphan_results, _ = proc.communicate())
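On Python 3 the same idea looks like this; a minimal sketch, assuming deborphan and aptitude are installed (universal_newlines=True makes check_output return text instead of bytes):
import subprocess

# capture deborphan's output as text, one package name per line
deborphan_results = subprocess.check_output(
    ["deborphan", "--guess-all"], universal_newlines=True).splitlines()
subprocess.call(["aptitude", "purge"] + deborphan_results + ["-y"])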

String parameter using subprocess module

I am using Python to simplify some commands in Maven. I have this script which calls mvn test in debug mode.
from subprocess import call
commands = []
commands.append("mvn")
commands.append("test")
commands.append("-Dmaven.surefire.debug=\"-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 -Xnoagent -Djava.compiler=NONE\"")
call(commands)
The problem is with the -Dmaven.surefire.debug line, which accepts a parameter that has to be in quotes, and I don't know how to do that correctly. It looks fine when I print the list, but when I run the script I get Error translating CommandLine and the debugging line is never executed.
The quotes are only required by the shell executing the command.
If you make that call directly from the shell, you probably type
mvn test -Dmaven.surefire.debug="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 -Xnoagent -Djava.compiler=NONE"
With these " signs you (simply put) tell the shell that the spaces within belong to a single argument.
The program is called with the arguments
mvn
test
-Dmaven.surefire.debug=-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 -Xnoagent -Djava.compiler=NONE
so
from subprocess import call
commands = []
commands.append("mvn")
commands.append("test")
commands.append("-Dmaven.surefire.debug=-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 -Xnoagent -Djava.compiler=NONE")
call(commands)
should be the way to go.
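For contrast, if you did want to hand the whole command to a shell as a single string (shell=True), the inner quotes would be needed again, because the shell does the splitting; a minimal sketch:
from subprocess import call

# the inner quotes come back because the shell will re-split this string
call('mvn test -Dmaven.surefire.debug="-Xdebug '
     '-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 '
     '-Xnoagent -Djava.compiler=NONE"', shell=True)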

Run commands sequentially from Python

I'm trying to build a LaTeX document using Python but am having problems getting the commands to run in sequence. For those familiar with LaTeX, you'll know that you usually have to run four commands, each completing before running the next, e.g.
pdflatex file
bibtex file
pdflatex file
pdflatex file
In Python, I'm therefore doing this to define the commands
commands = ['pdflatex','bibtex','pdflatex','pdflatex']
commands = [(element + ' ' + src_file) for element in commands]
but the problem is then running them.
I've tried to suss things out from this thread – e.g. using os.system() in a loop, subprocess stuff like map(call, commands) or Popen, and collapsing the list to a single string separated by & – but it seems like the commands all run as separate processes, without waiting for the previous one to complete.
For the record, I'm on Windows but would like a cross-platform solution.
EDIT
The problem was a bug in specifying the src_file variable; it shouldn't include the ".tex" extension. The following code now works:
test.py
import subprocess
commands = ['pdflatex','bibtex','pdflatex','pdflatex']
for command in commands:
    subprocess.call((command, 'test'))
test.tex
\documentclass{article}
\usepackage{natbib}
\begin{document}
This is a test \citep{Body2000}.
\bibliographystyle{plainnat}
\bibliography{refs}
\end{document}
refs.bib
@book{Body2000,
  author={N.E. Body},
  title={Introductory Widgets},
  publisher={Widgets International},
  year={2000}
}
os.system shouldn't cause this, since it waits for each command to finish; subprocess.Popen would, because it returns immediately without waiting for the child process.
But I think using subprocess.call is the best choice:
commands = ['pdflatex','bibtex','pdflatex','pdflatex']
for command in commands:
    subprocess.call((command, src_file))
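If you also want the loop to stop as soon as one of the steps fails (for example, bibtex reporting an error), subprocess.check_call() behaves like call() but raises on a non-zero exit status; a minimal sketch, using the same src_file as above:
import subprocess

commands = ['pdflatex', 'bibtex', 'pdflatex', 'pdflatex']
for command in commands:
    # check_call waits for each command and raises CalledProcessError on a
    # non-zero exit status, so a failed step aborts the build immediately
    subprocess.check_call((command, src_file))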
