I'm running Python 3.3 on Mac OS 10.6.8. I am writing a script that runs several subprocesses, and I want to capture the output of each one and record it in a file. I'm having trouble with this.
I first tried the following:
import subprocess
logFile = open("log.txt", 'w')
proc = subprocess.Popen(args, stdout=logFile, stderr=logFile)
proc.wait()
This produced an empty log.txt. After poking around on the internet for a bit, I tried this instead:
import subprocess
proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = proc.communicate()
logFile = open("log.txt", 'w')
logFile.write(output)
This, too, produced an empty log.txt. So instead of writing to the file, I tried to just print the output to the command line:
output, err = proc.communicate()
print(output)
print(err)
That produced this:
b''
b''
The process I'm trying to run is fastq_quality_trimmer. It takes an input file, filters it, and saves the result to a new file. It only writes a few lines to stdout, like so:
Minimum Quality Threshold: 20
Minimum Length: 20
Input: 750000 reads.
Output: 750000 reads.
discarded 0 (0%) too-short reads.
If I run it from the command line and redirect the output like this:
fastq_quality_trimmer -Q 33 -v -t 50 -l 20 -i in.fq -o in_trimmed.fq > log.txt
the output is successfully written to log.txt.
I thought perhaps that fastq_quality_trimmer was somehow failing to run when I called it with Popen, but my script produces a filtered file that is identical to the one produced when I run fastq_quality_trimmer from the command line. So it's working; I just can't capture the output. To make matters more confusing, I can successfully capture the output of other processes (echo, other Python scripts) using code that is essentially identical to what I've posted.
Any thoughts? Am I missing something blindingly obvious?
You forgot a comma:
["fastq_quality_trimmer", "-Q", "33" "-v", "-t", "50", "-l", "20", "-i", leftInitial, "-o", leftTrimmed]
add it between "33" and "-v".
You are essentially passing in the arguments -Q 33-v instead of -Q 33 -v.
Python will concatenate two adjacent strings if there is only whitespace between them:
>>> "33", "-v"
('33', '-v')
>>> "33" "-v"
'33-v'
Since -v is the verbose switch that is required to make fastq_quality_trimmer produce any output at all, the program remains silent while it is missing.
Whenever you encounter problems with calling a subprocess, triple-check the command line that is actually created. Prepending ['echo'] to args can help with that:
proc = subprocess.Popen(['echo'] + args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = proc.communicate()
print(output)
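On Python 3.8+, shlex.join offers a similar sanity check without spawning a process; it renders the argument list the way a shell would see it, which makes a fused argument easy to spot. A small sketch, reusing the broken list from above:

```python
import shlex

# note the missing comma between "33" and "-v": Python fuses them into "33-v"
args = ["fastq_quality_trimmer", "-Q", "33" "-v", "-t", "50", "-l", "20"]
print(shlex.join(args))
# the printed command line shows '33-v' as a single argument
```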
Related
I am trying to run the shell command df -h | grep -w "/" using python to watch the root partition usage and wanted to avoid shell=True option for security.
The code I tried as follows:
import subprocess
p1 = subprocess.Popen(['df', '-h'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep', '-w', '"/"'], stdin=p1.stdout, stdout=subprocess.PIPE)
output=p2.communicate()[0]
print(output)
The output I get is:
$ ./subprocess_df_check.py
b''
Expected output is:
$ df -h | grep -w "/"
/dev/sdd 251G 4.9G 234G 3% /
The immediate problem is the unnecessary quotes being added.
p2 = subprocess.Popen(['grep', '-w', '"/"'], stdin=p1.stdout, stdout=subprocess.PIPE)
is not equivalent to the shell command grep -w "/". It is equivalent to grep -w '"/"' (or grep -w \"/\", or any other shell spelling that passes literal double-quote characters in grep's last argument): the shell would strip those quotes, but Popen passes them through verbatim, so grep searches for a pattern that never matches.
Use '/', not '"/"'.
Don't use subprocess with df and/or grep at all. If you are already using Python, you can use os.statvfs instead:
import os
import time
path = "/"
while True:
    info = os.statvfs(path)
    print("Block size [%d] Free blocks [%d] Free inodes [%d]"
          % (info.f_bsize, info.f_bfree, info.f_ffree))
    time.sleep(15)
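If only the usage percentage is needed (as in the original df | grep pipeline), shutil.disk_usage (Python 3.3+) is an even shorter shell-free option. A sketch; the rounding may differ slightly from what df prints:

```python
import shutil

# total/used/free are in bytes, for the filesystem containing "/"
total, used, free = shutil.disk_usage("/")
percent = round(used * 100 / total)
print("Root partition is at %d%% usage" % percent)
```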
Running grep in a separate subprocess is certainly unnecessary. If you are using Python, you already have an excellent tool for looking for substrings within strings.
df = subprocess.run(['df', '-h'],
                    capture_output=True, text=True, check=True)
for line in df.stdout.split('\n')[1:]:
    if '/' in line:
        print(line)
Notice also how you basically always want to prefer subprocess.run over Popen when you can, and how you want text=True to get text rather than bytes. Usually you also want check=True to ensure that the subprocess completed successfully.
Ok figured out the whole thing.
import subprocess
p1 = subprocess.Popen(['df', '-h'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep', '-w', '/'], stdin=p1.stdout, stdout=subprocess.PIPE)
output=p2.communicate()[0].split()[4]
print("Root partition is of", output.decode(), "usage now")
Removed the unnecessary double quotes: changed ['grep', '-w', '"/"'] to ['grep', '-w', '/']. The double quotes are for the shell, not for grep. When you have no shell, you need no shell syntax.
In output=p2.communicate()[0].split()[4], the [0] picks only stdout (not stderr, which is None when no error occurred). split()[4] then selects the fifth whitespace-separated field (index 4), which is the usage-percentage column of df's output.
In output.decode(), the decode() converts the bytes object into a str, which avoids the b prefix in front of the printed result.
So the output of the script is:
$ ./subprocess_df_check.py
Root partition is of 3% usage now
I am trying to convert a file or a microphone stream to a 22050 Hz sample rate and double the tempo. I can do it from the terminal with the command below:
#ffmpeg -i test.mp3 -af asetrate=44100*0.5,aresample=44100,atempo=2 output.mp3
But I cannot run this terminal command via Python's subprocess. I have tried many things, but every attempt fails. I generally get errors like Requested output format 'asetrate' (or 'aresample', or 'atempo') is not suitable output format, followed by Invalid argument. How can I run it and consume the stream through a pipe?
song = subprocess.Popen(["ffmpeg.exe", "-i", sys.argv[1], "-f", "asetrate", "22050", "wav", "pipe:1"],
stdout=subprocess.PIPE)
Your two commands are different. Try:
song = subprocess.Popen(["ffmpeg", "-i", sys.argv[1], "-af", "asetrate=22050,aresample=44100,atempo=2", "-f", "wav", "pipe:1"],
                        stdout=subprocess.PIPE)
-af is for audio filter.
-f is to manually set muxer/output format
ffmpeg interprets whatever is supplied via -af as a single argument that it then parses internally, so splitting the filter chain into separate list elements before passing it to Popen would not achieve the same thing.
The initial example using the terminal should be created using Popen as
subprocess.Popen([
    'ffmpeg', '-i', 'test.mp3', '-af', 'asetrate=44100*0.5,aresample=44100,atempo=2',
    'output.mp3',
])
So for your actual example with the pipe, try the following instead (note that the filter must go to -af, while -f selects the wav muxer for the pipe):
song = subprocess.Popen(
    ["ffmpeg.exe", "-i", sys.argv[1], "-af", "asetrate=22050", "-f", "wav", "pipe:1"],
    stdout=subprocess.PIPE
)
You will then need to call song.communicate() to get the output produced by ffmpeg.exe.
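The communicate() pattern is sketched below with a stand-in child process (sys.executable instead of ffmpeg.exe, purely so the example is self-contained): communicate() drains the pipe and then waits for the process to exit.

```python
import subprocess
import sys

# stand-in for the ffmpeg call: any process writing binary data to stdout
song = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'RIFF....WAVE')"],
    stdout=subprocess.PIPE)
data, _ = song.communicate()  # reads all of stdout, then waits for exit
# 'data' now holds the raw bytes the child wrote, e.g. a complete wav stream
```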
I want to redirect the console output to a textfile for further inspection.
The task is to extract TIFF-TAGs from a raster file (TIFF) and filter the results.
In order to achieve this, I have several tools at hand. Some of them are not python libraries, but command-line tools, such as "identify" of ImageMagick.
My example command-string passed to subprocess.check_call() was:
cmd_str = 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
Here, in the TIFF-TAG output produced by "identify", all lines which contain information about TAG number 274 shall either be displayed in the console or written to a file.
Error-type 1: Displaying in the console
subprocess.check_call(bash_str, shell=True)
subprocess.CalledProcessError: Command 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"' returned non-zero exit status 1.
Error-type 2: Redirecting the output to textfile
subprocess.call(bash_str, stdout=filehandle_dummy, stderr=filehandle_dummy)
FileNotFoundError: [Errno 2] No such file or directory: 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"': 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
CODE
These subprocess.check_call() functions were executed by the following convenience function:
def subprocess_stdout_to_console_or_file(bash_str, filehandle=None):
    """Function documentation:\n
    Convenience tool which either prints out directly in the provided shell, i.e. console,
    or redirects the output to a given file.
    NOTE on file redirection: it must not be the filepath, but the FILEHANDLE,
    which can be achieved via the open(filepath, "w")-function, e.g. like so:
    filehandle = open('out.txt', 'w')
    print(filehandle): <_io.TextIOWrapper name='bla_dummy.txt' mode='w' encoding='UTF-8'>
    """
    # Check whether a filehandle has been passed or not
    if filehandle is None:
        # i) If not, just direct the output to the BASH (shell), i.e. the console
        subprocess.check_call(bash_str, shell=True)
    else:
        # ii) Otherwise, write to the provided file via its filehandle
        subprocess.check_call(bash_str, stdout=filehandle)
The code piece where everything takes place already redirects the output of print() to a textfile. The aforementioned function is called within print_out_all_TIFF_Tags_n_filter_for_desired_TAGs().
Since the subprocess output is not redirected automatically along with the print() output, it is necessary to pass the filehandle to subprocess.check_call(bash_str, stdout=filehandle) via its keyword argument stdout.
Nevertheless, the above-mentioned error also happens outside the stdout redirection zone created by contextlib.redirect_stdout().
dummy_filename = "/home/andylu/bla_dummy.txt"  # will be saved temporarily in the user's home folder

# NOTE on scope: redirect sys.stdout for python 3.4x according to the following website:
# https://stackoverflow.com/questions/14197009/how-can-i-redirect-print-output-of-a-function-in-python
with open(dummy_filename, 'w') as f:
    with contextlib.redirect_stdout(f):
        print_out_all_TIFF_Tags_n_filter_for_desired_TAGs(TIFF_filepath)
EDIT:
For more security, the piping process should be split up as mentioned in the following, but that didn't really work out for me.
If you have an explanation for why a split-up piping process like
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
doesn't produce the output-textfile while still exiting successfully, I'd be delighted to learn about the reasons.
OS and Python versions
OS:
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
Python:
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
As for the initial error mentioned in the question:
The comments answered it with that I needed to put in all calls of subprocess.check_call() the kwarg shell=True if I wanted to pass on a prepared shell-command string like
gdalinfo TIFF_filepath | grep 'Pixel Size =' > path_to_textfile
As a sidenote, I noticed that it makes no difference whether I quote the paths or not. I'm also not sure whether it makes a difference to use single (') or double (") quotes.
Furthermore, for the sake of the security concerns outlined in the comments to my question, I followed the docs about safely piping while avoiding the shell, and consequently changed from my previous standard approach
subprocess.check_call(shell_str, shell=True)
to the (somewhat cumbersome) piping steps delineated hereafter:
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
In order to get these sequences of command strings from the initial entire shell string, I had to write custom string-manipulation functions and play around with them to get strings (like filepaths) quoted while avoiding quoting other functional parameters, flags etc. (like -i, >, ...).
This quite complex approach was necessary since shlex.split() simply split my shell-command strings at every whitespace character, which led to problems when recombining them in the pipes.
Yet in spite of all these apparent improvements, no output textfile is generated, although the process seemingly runs without errors and finishes "correctly" after the last line of the piping process:
output = p2.communicate()[0]
As a consequence, I'm still forced to use the old and insecure, but at least well-working, approach via the shell:
subprocess.check_call(shell_str, shell=True)
At least it works now with this former approach, even though I didn't manage to implement the more secure piping procedure where several commands are glued/piped together.
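For what it's worth, the likely reason the split-up pipe never produced the textfile is that both the inner quotes and the > redirection are shell syntax: with shell=False they reach grep as literal argument text, so grep searches for a pattern containing quote characters and a > sign instead of writing to a file. A sketch of the pipe with the redirection done on the Python side (gdalinfo and the file names below are placeholders from the question):

```python
import subprocess

def pipe_to_file(producer_cmd, pattern, outfile_path):
    """Run producer_cmd | grep pattern > outfile_path, without a shell."""
    with open(outfile_path, 'w') as out_file:
        p1 = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
        # grep receives the bare pattern; the redirection is wired up via stdout=
        p2 = subprocess.Popen(['grep', pattern], stdin=p1.stdout, stdout=out_file)
        p1.stdout.close()  # allow p1 to receive a SIGPIPE if p2 exits first
        return p2.wait()

# the question's pipeline would then be (placeholder paths from the question):
# pipe_to_file(['gdalinfo', 'TIFF_filepath'], 'Pixel Size =', 'path_to_textfile')
```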
I once ran into a similar issue, and this fixed it:
cmd_str.split(' ')
My code :
# >>>>>>>>>>>>>>>>>>>>>>> UNZIP THE FILE AND RETURN THE FILE ARGUMENTS <<<<<<<<<<<<<<<<<<<<<<<<<<<<
def unzipFile(zipFile_):
    # INITIALIZE THE UNZIP COMMAND HERE
    cmd = "unzip -o " + zipFile_ + " -d " + outputDir
    Tlog("UNZIPPING FILE " + zipFile_)
    # GET THE PROCESS OUTPUT AND PIPE IT TO VARIABLE
    log = subprocess.Popen(cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # GET BOTH THE ERROR LOG AND OUTPUT LOG FOR IT
    stdout, stderr = log.communicate()
    # FORMAT THE OUTPUT
    stdout = stdout.decode('utf-8')
    stderr = stderr.decode('utf-8')
    if stderr != "":
        Tlog("ERROR WHILE UNZIPPING FILE \n\n\t" + stderr + '\n')
        sys.exit(0)
    # INITIALIZE THE TOTAL UNZIPPED ITEMS
    unzipped_items = []
    # DECODE THE STDOUT TO 'UTF-8' FORMAT AND PARSE LINE BY LINE
    for line in stdout.split('\n'):
        # CHECK IF THE LINE CONTAINS KEYWORD 'inflating'
        if Regex.search(r"inflating", line) is not None:
            # FIND ALL THE MATCHED STRING WITH REGEX
            Matched = Regex.findall(r"inflating: " + outputDir + "(.*)", line)[0]
            # SUBSTITUTE THE OUTPUT BY REMOVING BEGIN/END WHITESPACES
            Matched = Regex.sub(r'^\s+|\s+$', '', Matched)
            # APPEND THE OUTPUTS TO LIST
            unzipped_items.append(outputDir + Matched)
    # RETURN THE OUTPUT
    return unzipped_items
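As an aside, the standard-library zipfile module can do the same extraction without any subprocess, which sidesteps the output-parsing entirely. A sketch (zip_path and output_dir are placeholder arguments):

```python
import os
import zipfile

def unzip_file(zip_path, output_dir):
    # extract all members and return their full paths, like unzipped_items above
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(output_dir)
        return [os.path.join(output_dir, name) for name in zf.namelist()]
```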
I have a problem with my Python script. I want to download files from my server to my NAS. My script downloads every file, except for files containing single quotes/apostrophes and/or spaces. I already know where the problem is, but I can't fix it: the shlex.split() call simply deletes the single quotes. I also looked into paramiko, but it's kind of buggy with big files, so that won't work for me. I am also open to a completely different approach.
I was also wondering whether it is possible to get a return value for that command, so that I know whether scp successfully downloaded my file or failed.
import subprocess
import shlex
def download(download_path, remote_path):
    command = "sshpass -p %s scp -o StrictHostKeyChecking=no -r %s@%s:%s %s" % \
              (ssh_password, ssh_user, host, remote_path, download_path)
    args = shlex.split(command)
    # call scp
    p = subprocess.Popen(args, stdout=subprocess.PIPE, bufsize=1)
    for line in iter(p.stdout.readline, b''):
        print line,
    p.stdout.close()
    p.wait()
P.S.: There already is a topic with a similar problem, which I can't get working. Passing a filename with an apostrophe into scp using python
You aren't getting an arbitrary string that needs to be split; you are basically starting out with the individual words and concatenating them yourself. Don't do that; just make command a list to begin with.
command = [
    "sshpass", "-p", ssh_password,
    "scp", "-o", "StrictHostKeyChecking=no", "-r",
    "%s@%s:%s" % (ssh_user, host, remote_path),
    download_path
]
p = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=1)
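As for the second part of the question (knowing whether scp succeeded): after draining stdout, p.wait() returns the exit status, which is also stored on p.returncode; zero means success. A self-contained sketch with a stand-in child process (sys.executable exiting with status 3, since actually running scp here would require a server):

```python
import subprocess
import sys

# stand-in for the scp call: a child process that exits with status 3
p = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
p.wait()
if p.returncode == 0:
    print("download succeeded")
else:
    print("download failed with exit status", p.returncode)
```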
I'm trying to do something really easy in order to learn how to use subprocess in python
What I'm trying is this:
ll | egrep "*gz"
so after read the manual of python (which I didn't understand very well), I tried this:
lista = subprocess.Popen(['ls', '-alF'], stdout=subprocess.PIPE)
filtro = subprocess.Popen(['egrep', '"*gz"'], stdin=lista.stdout, stdout=subprocess.PIPE)
filtro.communicate()[0]
But all I get is '' and I don't really know how to do this. I've read this, but it seems I didn't get it at all... could somebody explain how this works so that I can use it with other commands afterwards?
Thanks in advance!!
The problem might be the double set of quotes around the argument to egrep. Try this instead:
import subprocess
ls = subprocess.Popen(['ls', '-alF'], stdout=subprocess.PIPE)
egrep = subprocess.Popen(['egrep', '\.gz$'], stdin=ls.stdout, stdout=subprocess.PIPE)
print egrep.communicate()[0]
I am assuming you are looking for files ending in ".gz" here, as your initial regex does not make sense. If you are simply looking for files ending in "gz", use 'gz$' instead. And if you do not care where in the line "gz" appears, simply use 'gz'.
Edit: Here is a full example. In a directory containing the three files "pipe.py", "test1.py.gz" and "test2.py.gz", where "pipe.py" is the above script, I execute:
$ python pipe.py
With the result
-rw-r--r-- 1 amaurea amaurea 146 Jan 30 20:54 test1.py.gz
-rw-r--r-- 1 amaurea amaurea 150 Jan 30 20:54 test2.py.gz
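If the goal is only to list the .gz files, no subprocess is needed at all; the standard-library glob module matches shell-style wildcards directly. A sketch:

```python
import glob

# shell-style wildcard matching, with no shell and no grep involved
for name in sorted(glob.glob("*.gz")):
    print(name)
```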
On Unix, take advantage of the shell argument:
lista = subprocess.Popen('ls -alF', stdout=subprocess.PIPE, shell=True)
filtro = subprocess.Popen('egrep "*gz"', stdin=lista.stdout, stdout=subprocess.PIPE, shell=True)
filtro.communicate()[0]
You can simply copy commands and not worry about breaking them into argument lists.
Also, you can specify the shell explicitly:
subprocess.Popen('bash -c "ls -l"', stdout=subprocess.PIPE, shell=True)