How do I embed my shell scanning script into a Python script?

I've been using the following shell command to read an image off a scanner named scanner_name and save it in a file named file_name:
scanimage -d <scanner_name> --resolution=300 --format=tiff --mode=Color 2>&1 > <file_name>
This has worked fine for my purposes.
I'm now trying to embed this in a Python script. What I need is to save the scanned image, as before, into a file, and also to capture any standard output (e.g., error messages) to a string.
I've tried
scan_result = os.system('scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name))
But when I run this in a loop (with different scanners), there is an unreasonably long lag between scans, and the images aren't saved until the next scan starts (the file is created empty and is not filled until the next scanning command begins). All this happens with scan_result = 0, i.e. indicating no error.
The subprocess module's run() method has been suggested to me, and I have tried:
with open(file_name, 'w') as scanfile:
    input_params = '-d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name)
    scan_result = subprocess.run(["scanimage", input_params], stdout=scanfile, shell=True)
but this saved the image in some kind of unreadable file format.
Any ideas as to what may be going wrong? Or what else I can try that will allow me to both save the file and check the success status?

subprocess.run() is definitely preferred over os.system() but neither of them as such provides support for running multiple jobs in parallel. You will need to use something like Python's multiprocessing library to run several tasks in parallel (or painfully reimplement it yourself on top of the basic subprocess.Popen() API).
You also have a basic misunderstanding about how to run subprocess.run(). You can pass in either a string and shell=True or a list of tokens and shell=False (or no shell keyword at all; False is the default).
with_shell = subprocess.run(
    "scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} ".format(
        scanner, file_name), shell=True)

# open in binary write mode; the redirected TIFF data is not text
with open(file_name, "wb") as write_handle:
    no_shell = subprocess.run([
        "scanimage", "-d", scanner, "--resolution=300", "--format=tiff",
        "--mode=Color"], stdout=write_handle)
You'll notice that the latter does not support redirection (because that's a shell feature) but this is reasonably easy to implement in Python. (I took out the redirection of standard error -- you really want error messages to remain on stderr!)
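Since the question also asks to capture error messages to a string, here is a small extension of the no_shell variant which does that without merging the errors into the image data (a sketch; universal_newlines=True makes the captured stderr a str rather than bytes):
new_volume-style setup aside, only stderr is piped; stdout still goes straight to the file:
with open(file_name, "wb") as write_handle:
    no_shell = subprocess.run(
        ["scanimage", "-d", scanner, "--resolution=300", "--format=tiff",
         "--mode=Color"],
        stdout=write_handle, stderr=subprocess.PIPE, universal_newlines=True)
error_text = no_shell.stderr  # any error messages, as a string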
If you have a larger working Python program this should not be awfully hard to integrate with a multiprocessing.Pool(). If this is a small isolated program, I would suggest you peel off the Python layer entirely and go with something like xargs or GNU parallel to run a capped number of parallel subprocesses.
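For instance, a rough sketch of the multiprocessing.Pool approach (the scan_one helper and the scanner/file pairs here are invented for illustration):
import subprocess
from multiprocessing import Pool

def scan_one(job):
    scanner, file_name = job
    # Write the image to the file and capture error messages from stderr.
    with open(file_name, "wb") as write_handle:
        result = subprocess.run(
            ["scanimage", "-d", scanner, "--resolution=300",
             "--format=tiff", "--mode=Color"],
            stdout=write_handle, stderr=subprocess.PIPE)
    return scanner, result.returncode, result.stderr.decode()

if __name__ == "__main__":
    jobs = [("scanner_one", "scan1.tiff"), ("scanner_two", "scan2.tiff")]
    with Pool(len(jobs)) as pool:
        # pool.map runs the scans in parallel and collects the results
        for scanner, status, errors in pool.map(scan_one, jobs):
            print(scanner, "exited with", status, errors)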

I suspect the issue is that you're opening the output file and then running subprocess.run() inside that with block. This isn't necessary: the end result is that you open the file via Python, then have the command open the same file again via the OS, and then close it via Python.
Just run the subprocess and let it produce the file, as scanimage would if you ran it at the command line directly. Note that shell redirections such as 2>&1 > filename only work with shell=True; in a command list there is no shell to interpret them, so the output has to be handled on the Python side.
I think subprocess.check_output() is now the preferred method of capturing the output.
I.e.
from subprocess import check_output

# Command must be a list, with all parameters as separate list items.
# Redirection tokens like '2>&1>{}' must NOT appear in the list: without
# a shell they would just be passed to scanimage as literal arguments.
command = ['scanimage',
           '-d{}'.format(scanner),
           '--resolution=300',
           '--format=tiff',
           '--mode=Color']
scan_result = check_output(command)  # the image bytes from stdout

# Write the captured image to the file ourselves.
with open(file_name, 'wb') as f:
    f.write(scan_result)
However (with both run and check_output), shell=True is a big security risk, especially if the input parameters come into the Python script from outside. People can pass in unwanted commands and have them run in the shell with the permissions of the script.
Sometimes shell=True is necessary for the OS command to run properly; in that case the best recommendation is to use an actual Python module to interface with the scanner, rather than having Python pass an OS command to the OS.
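One candidate is the python-sane module (my suggestion, not something from the question; option values such as mode are backend-specific, so check your scanner's documentation):
import sane  # pip install python-sane; talks to the SANE library directly

sane.init()
devices = sane.get_devices()   # [(device_name, vendor, model, type), ...]
dev = sane.open(devices[0][0])
dev.resolution = 300
dev.mode = 'color'             # option values vary per backend
image = dev.scan()             # returns a PIL Image
image.save(file_name)
dev.close()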

Related

Difference between using os.system and snakemake.shell

I'm fairly new to snakemake, so I still struggle with combining shell commands and Python code.
My solution is to make script files and then run the shell command within that script.
Is there any mechanical difference between invoking snakemake.shell and os.system for executing command lines?
Example:
sample = ["SRR12845350"]

rule prefetch:
    input:
        "results/Metadata/{sample}.json"
    output:
        "results/SRA/{sample}.sra"
    params:
        "prefetch %s -o %s"
    script:
        "scripts/prefetch.py"
And prefetch.py is:
from json import load
from snakemake import shell
from os import system

json_file = snakemake.input[0]
prefetch = snakemake.params[0]
sra_file = snakemake.output[0]

json = load(open(json_file))
sra_run = json["RUN_accession"]

shell(prefetch % (sra_run, sra_file))   # option 1
system(prefetch % (sra_run, sra_file))  # option 2
shell is just a helper function to make it easier to call command lines from snakemake. Learning snakemake can be overwhelming, and having to learn the fine intricacies of Python's os.system and subprocess on top of it is unnecessarily complicating. The snakemake shell command does a couple of sanity checks, sets some environment variables (e.g. the number of threads the command may use) and some other small stuff, but under the hood it just calls subprocess.Popen on your command. Both options should work, but since you are writing a snakemake wrapper, it's probably slightly better to use shell, as it is designed to be used in snakemake.
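To make "under the hood" concrete, option 1 boils down to roughly this (simplified; the real snakemake.shell adds its environment setup and error handling around the call):
import subprocess

cmd = prefetch % (sra_run, sra_file)
# Roughly what snakemake.shell does after its bookkeeping:
proc = subprocess.Popen(cmd, shell=True, executable="/bin/bash")
proc.wait()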

Execute batch file in different directory

I have a file structure like the following (Windows):
D:\
  dir_1\
    batch_1.bat
    dir_1a\
      batch_2.bat
  dir_2\
    main.py
For the sake of this question, batch_1.bat simply calls batch_2.bat, and looks like:
cd dir_1a
start batch_2.bat %*
Opening batch_1.bat from a command prompt indeed opens batch_2.bat as it's supposed to, and from there on, everything is golden.
Now I want my Python file, D:\dir_2\main.py, to spawn a new process which starts batch_1.bat, which in turn should start batch_2.bat. So I figured the following Python code should work:
import subprocess
subprocess.Popen(['cd "D:/dir_1"', "start batch_1.bat"], shell=True)
This results in "The system cannot find the path specified" being printed to my Python console. (No error is raised, of course.) This is due to the first command. I get the same result even if I cut it down to:
subprocess.Popen(['cd "D:/"'], shell=True)
I also tried starting the batch file directly, like so:
subprocess.Popen("start D:/dir_1/batch_1.bat", shell=True)
For reasons that I don't entirely get, this seems to just open a Windows command prompt in dir_2.
If I forgo the start part of this command, then my Python process ends up waiting for batch_1 to finish, which I don't want. But it does get a little further:
subprocess.Popen("D:/dir_1/batch_1.bat", shell=True)
This results in batch_1.bat successfully executing... in dir_2, the directory of the Python script, rather than the directory of batch_1.bat. As a result it cannot find dir_1a\ and hence batch_2.bat is not executed at all.
I am left highly confused. What am I doing wrong, and what should I be doing instead?
Your question is answered here: Python specify popen working directory via argument
In a nutshell, just pass an optional cwd argument to Popen:
subprocess.Popen(["batch_1.bat"], shell=True, cwd=r'd:\<your path>\dir1')

Python subprocess command execution got stuck

I have an issue where a Unix command executed with the Python subprocess module gets stuck:
(The full code is here:
https://github.com/discoproject/disco/blob/master/lib/disco/worker/classic/func.py)
The Unix command is a simple in-place sort.
The way the process is created:
env = os.environ.copy()
env['LC_ALL'] = 'C'
cmd, shell = sort_cmd(filename, sort_buffer_size)
subprocess.check_call(cmd, env=env, shell=shell)
where the sort_cmd is:
def sort_cmd(filename, sort_buffer_size):
    return (r"sort -z -t$'\xff' -k 1,1 -T . -S {0} -o {1} {1}"
            .format(sort_buffer_size, filename), True)
The input file (which is also the output file) of the sort command is empty. The file was not empty before calling this command (it is printed).
The question is, if this is a python issue, how could the file be empty. (One hypothesis is this python 2.7 bug: http://bugs.python.org/issue19809).
Issuing strace on the sort process showed that it was stuck on a futex. Unfortunately, I haven't been able to reproduce this problem and I do not have the input file. When the sort process was killed manually, it returned (with an error of course).
I am using GNU coreutils 8.10.
This cannot be a Python issue: the whole command runs in a subshell, and Python has no notion that it contains a filename at all.
In fact, sort opens its output file for writing and empties it. If that is also the input file, you are lost.
A solution could be to write the output to a temporary file and rename it over the input afterwards.
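A sketch of that approach, adapted from the sort_cmd in the question (executable="/bin/bash" is needed because the $'\xff' separator syntax is a bash feature, not plain sh):
import os
import subprocess
import tempfile

def sort_to_temp_and_rename(filename, sort_buffer_size):
    # Sort into a separate temporary file in the same directory, then
    # rename it over the original, so the input is never truncated.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(filename) or ".")
    os.close(fd)
    env = os.environ.copy()
    env['LC_ALL'] = 'C'
    cmd = r"sort -z -t$'\xff' -k 1,1 -T . -S {0} -o {1} {2}".format(
        sort_buffer_size, tmp, filename)
    subprocess.check_call(cmd, env=env, shell=True, executable="/bin/bash")
    os.rename(tmp, filename)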

Run commands sequential from Python

I'm trying to build a LaTeX document using Python but am having problems getting the commands to run in sequence. For those familiar with LaTeX, you'll know that you usually have to run four commands, each completing before running the next, e.g.
pdflatex file
bibtex file
pdflatex file
pdflatex file
In Python, I'm therefore doing this to define the commands
commands = ['pdflatex','bibtex','pdflatex','pdflatex']
commands = [(element + ' ' + src_file) for element in commands]
but the problem is then running them.
I've tried to suss things out from this thread – e.g. using os.system() in a loop, subprocess stuff like map(call, commands) or Popen, and collapsing the list to a single string separated by & – but it seems like the commands all run as separate processes, without waiting for the previous one to complete.
For the record, I'm on Windows but would like a cross-platform solution.
EDIT
The problem was a bug in specifying the src_file variable; it shouldn't include the ".tex" extension. The following code now works:
test.py
import subprocess

commands = ['pdflatex', 'bibtex', 'pdflatex', 'pdflatex']
for command in commands:
    subprocess.call((command, 'test'))
test.tex
\documentclass{article}
\usepackage{natbib}
\begin{document}
This is a test \citep{Body2000}.
\bibliographystyle{plainnat}
\bibliography{refs}
\end{document}
refs.bib
@book{Body2000,
  author={N.E. Body},
  title={Introductory Widgets},
  publisher={Widgets International},
  year={2000}
}
os.system shouldn't cause this, since it waits for each command to finish, but subprocess.Popen would, because it returns without waiting.
I think using subprocess.call is the best choice:
commands = ['pdflatex', 'bibtex', 'pdflatex', 'pdflatex']
for command in commands:
    subprocess.call((command, src_file))
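On Python 3.5 and later, subprocess.run with check=True is a close equivalent that also stops the chain if any step fails (a sketch; src_file must be the base name without ".tex", as the edit above notes):
import subprocess

commands = ['pdflatex', 'bibtex', 'pdflatex', 'pdflatex']
for command in commands:
    # run() waits for each command; check=True raises CalledProcessError
    # on a non-zero exit status, so a failed step aborts the sequence.
    subprocess.run([command, src_file], check=True)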

Write to a FIFO from a Python program

I am trying to control the volume of mplayer from a Python program. The mplayer instance is started from a bash script:
#!/bin/bash
mkfifo /home/administrator/files/mplayer-control.pipe
/usr/bin/mplayer -slave -input file=/home/administrator/files/mplayer-control.pipe /home/administrator/music/file.mp3
Then I have a GUI written in Python that is supposed to be able to control the volume of the instance of mplayer that is being played. I have tried the following:
os.system('echo "set_property volume $musicvol" > /home/administrator/files/mplayer-control.pipe')
That works if I substitute $musicvol with a numeric value, but that is unfortunately of no use; I need to be able to pass the variable.
I would also be able to solve it by invoking a bash script from the Python application, but I cannot get that to work either:
subprocess.call("/home/administrator/files/setvolume.sh", executable="bash", shell=True)
You don't need to call os.system and invoke a shell just to write that line to the FIFO from your Python script; you can simply do:
new_volume = 50
with open("/home/administrator/files/mplayer-control.pipe", "w") as fp:
    fp.write("set_property volume %d\n" % (new_volume,))
It's not clear to me what you expect to happen in your original Python, though: is musicvol set in the environment? If instead it's a Python variable that you want to insert into the string you're passing, the easiest way is to use the string interpolation operator (%), as I've done in the example above.
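For completeness, the same interpolation applied to the original os.system one-liner (assuming musicvol is a Python integer; the open/write version above is still preferable):
import os

musicvol = 50  # hypothetical Python variable holding the volume
os.system('echo "set_property volume %d" > '
          '/home/administrator/files/mplayer-control.pipe' % musicvol)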
In your example of using subprocess.call you don't need the executable or shell keyword arguments if setvolume.sh is executable and has a #! line - you could just do:
subprocess.call("/home/administrator/files/setvolume.sh")
However, it's better to just use open and write in Python as above, I think.
