Dos2unix not working when trying to silence command - python

I was calling dos2unix from within Python this way:
call("dos2unix " + file1, shell=True, stdout=PIPE)
However, to silence the command's output, I did this:
f_null = open(os.devnull, 'w')
call("dos2unix " + file1, shell=True, stdout=f_null, stderr=subprocess.STDOUT)
This doesn't seem to work: the command apparently isn't run anymore, as a diff of file1 against file2 shows (I did a diff -y file1 file2 | cat -t and could see the line endings hadn't changed).
file2 is the file I am comparing file1 against. It has Unix line endings as it is generated on the box. However, there is a chance that file1 doesn't.

Not sure why, but I would try to get rid of the "noise" around your command and check the return code:
check_call(["dos2unix", file1], stdout=f_null, stderr=subprocess.STDOUT)
- pass a list of args, not a command line (this supports files with spaces in their names!)
- remove shell=True, as dos2unix isn't a shell built-in
- use check_call so it raises an exception instead of failing silently
At any rate, it is possible that dos2unix detects that its output isn't a tty anymore and decides to dump the converted text there instead of rewriting the file (dos2unix can work from standard input and to standard output). I'd go with that explanation. You could check it by redirecting to a real file instead of os.devnull and seeing if the result ends up there.
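A minimal sketch of that check (the output filename is hypothetical):

import subprocess

file1 = "data.txt"  # hypothetical path standing in for the question's file1

# Redirect to a real file instead of os.devnull; if dos2unix is dumping the
# converted text to stdout rather than rewriting file1 in place, it will
# show up in dos2unix_check.txt.
with open("dos2unix_check.txt", "w") as f_out:
    subprocess.check_call(["dos2unix", file1],
                          stdout=f_out, stderr=subprocess.STDOUT)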
But I would use a pure Python solution instead (with a backup for safety), which is portable and doesn't need the dos2unix command (so it works on Windows as well):
with open(file1, "rb") as f:
    contents = f.read().replace(b"\r\n", b"\n")
with open(file1 + ".bak", "wb") as f:
    f.write(contents)
os.remove(file1)
os.rename(file1 + ".bak", file1)
Reading the file fully is fast, but could choke on really big files. A line-by-line solution is also possible (still using binary mode):
with open(file1, "rb") as fr, open(file1 + ".bak", "wb") as fw:
    for l in fr:
        fw.write(l.replace(b"\r\n", b"\n"))
os.remove(file1)
os.rename(file1 + ".bak", file1)

Related

Python subprocess module: what does the code do?

What is this piece of code doing?
with open(temp_path) as f:
    command = "xdg-open"
    subprocess.Popen(
        ["im=$(cat);" + command + " $im; rm -f $im"], shell=True, stdin=f
    )
I'm confused by the subprocess part...
What the shell script does:
im=$(cat)
uses cat to read the standard input, and assigns the result to the variable im. Since you use stdin=f, that reads the contents of temp_path.
command + " $im"
executes the command xdg-open with $im as its argument. So this uses the contents of the file as the argument to xdg-open, which opens the file in its default application. Since the argument should be a filename, this implies that temp_path contains a filename.
rm -f $im
deletes the file that was opened.
This seems like a silly way to do this. A better way to write it would be:
with open(temp_path) as f:
    filename = f.read().strip()
command = "xdg-open"
subprocess.Popen([command, filename])
os.remove(filename)
Although I haven't seen the rest of the script, I suspect the temp path is also unnecessary when doing it this way -- it seems like it was just using that as a way to get the filename into the shell variable.

subprocess.call() throws error "FileNotFoundError: [Errno 2] No such file or directory" when redirecting stdout to file

I want to redirect the console output to a text file for further inspection.
The task is to extract TIFF-TAGs from a raster file (TIFF) and filter the results.
In order to achieve this, I have several tools at hand. Some of them are not python libraries, but command-line tools, such as "identify" of ImageMagick.
My example command-string passed to subprocess.check_call() was:
cmd_str = 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
Here, among the TIFF tags printed by identify, all lines containing information about tag number 274 shall either be displayed in the console or written to a file.
Error-type 1: Displaying in the console
subprocess.check_call(bash_str, shell=True)
subprocess.CalledProcessError: Command 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"' returned non-zero exit status 1.
Error-type 2: Redirecting the output to textfile
subprocess.call(bash_str, stdout=filehandle_dummy, stderr=filehandle_dummy)
FileNotFoundError: [Errno 2] No such file or directory: 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"': 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
CODE
These subprocess.check_call() functions were executed by the following convenience function:
def subprocess_stdout_to_console_or_file(bash_str, filehandle=None):
    """Convenience tool which either prints directly to the provided shell,
    i.e. the console, or redirects the output to a given file.

    NOTE on file redirection: it must not be the filepath, but the FILEHANDLE,
    which can be obtained via the open(filepath, "w") function, e.g. like so:
    filehandle = open('out.txt', 'w')
    print(filehandle): <_io.TextIOWrapper name='bla_dummy.txt' mode='w' encoding='UTF-8'>
    """
    # Check whether a filehandle has been passed or not
    if filehandle is None:
        # i) If not, just direct the output to the shell, i.e. the console
        subprocess.check_call(bash_str, shell=True)
    else:
        # ii) Otherwise, write to the provided file via its filehandle
        subprocess.check_call(bash_str, stdout=filehandle)
The code piece where everything takes place already redirects the output of print() to a text file. The aforementioned function is called within print_out_all_TIFF_Tags_n_filter_for_desired_TAGs().
As the subprocess outputs are not redirected automatically along with the print() outputs, it is necessary to pass the filehandle to subprocess.check_call(bash_str, stdout=filehandle) via its keyword argument stdout.
Nevertheless, the above-mentioned error would also happen outside this stdout-redirection zone created by contextlib.redirect_stdout().
dummy_filename = "/home/andylu/bla_dummy.txt"  # will be saved temporarily in the user's home folder

# NOTE on scope: redirect sys.stdout for python 3.4x according to the following website:
# https://stackoverflow.com/questions/14197009/how-can-i-redirect-print-output-of-a-function-in-python
with open(dummy_filename, 'w') as f:
    with contextlib.redirect_stdout(f):
        print_out_all_TIFF_Tags_n_filter_for_desired_TAGs(TIFF_filepath)
EDIT:
For more security, the piping process should be split up as shown in the following, but this didn't really work out for me.
If you have an explanation for why a split-up piping process like
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
doesn't produce the output text file while still exiting successfully, I'd be delighted to learn about the reasons.
OS and Python versions
OS:
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
Python:
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
As for the initial error mentioned in the question:
The comments answered it: I needed to pass the kwarg shell=True in every call of subprocess.check_call() where I wanted to hand over a prepared shell-command string like
gdalinfo TIFF_filepath | grep 'Pixel Size =' > path_to_textfile
As a side note, I noticed that it doesn't make a difference whether I quote the paths or not. I'm not sure whether using single (') or double (") quotes makes a difference.
Furthermore, for the sake of the security concerns outlined in the comments to my question, I followed the docs on safely piping while avoiding the shell, and consequently changed from my previous standard approach
subprocess.check_call(shell_str, shell=True)
to the (somewhat cumbersome) piping steps delineated hereafter:
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
In order to get these sequences of command strings from the initial entire shell string, I had to write custom string-manipulation functions and play around with them to get strings like filepaths quoted while avoiding quoting other functional parameters, flags, etc. (like -i, >, ...).
This rather complex approach was necessary since the shlex.split() function simply split my shell-command strings at every whitespace character, which led to problems when recombining them in the pipes.
Yet in spite of all these apparent improvements, no output text file is generated, even though the process seemingly produces no errors and finishes "correctly" after the last line of the piping process:
output = p2.communicate()[0]
As a consequence, I'm still forced to use the old and insecure, but at least working, approach via the shell:
subprocess.check_call(shell_str, shell=True)
At least it works now employing this former approach, even though I didn't manage to implement the more secure piping procedure where several commands can be glued/piped together.
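For what it's worth, a hedged guess at the reason, with a minimal sketch (TIFF_filepath and path_to_textfile are illustrative stand-ins): the > redirection is a shell feature, so when it is embedded in the grep argument it becomes part of a single literal search pattern instead of a redirection, and grep then matches nothing. Opening the target file in Python and passing it as stdout should achieve the same effect without a shell:

import subprocess

TIFF_filepath = '/path/to/image.tif'    # hypothetical paths for illustration
path_to_textfile = '/path/to/out.txt'

# Replicate "gdalinfo ... | grep 'Pixel Size =' > path_to_textfile" without a
# shell: Python performs the redirection instead of a '>' inside an argument
# string (which grep would otherwise treat as part of its pattern).
with open(path_to_textfile, 'w') as out:
    p1 = subprocess.Popen(['gdalinfo', TIFF_filepath], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(['grep', 'Pixel Size ='], stdin=p1.stdout, stdout=out)
    p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
    p2.wait()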
I once ran into a similar issue, and this fixed it:
cmd_str.split(' ')
My code :
# >>>>>>>>>>>>>>>>>>>>>>> UNZIP THE FILE AND RETURN THE FILE ARGUMENTS <<<<<<<<<<<<<<<<<<<<<<<<<<<<
def unzipFile(zipFile_):
# INITIALIZE THE UNZIP COMMAND HERE
cmd = "unzip -o " + zipFile_ + " -d " + outputDir
Tlog("UNZIPPING FILE " + zipFile_)
# GET THE PROCESS OUTPUT AND PIPE IT TO VARIABLE
log = subprocess.Popen(cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# GET BOTH THE ERROR LOG AND OUTPUT LOG FOR IT
stdout, stderr = log.communicate()
# FORMAT THE OUTPUT
stdout = stdout.decode('utf-8')
stderr = stderr.decode('utf-8')
if stderr != "" :
Tlog("ERROR WHILE UNZIPPING FILE \n\n\t"+stderr+'\n')
sys.exit(0)
# INITIALIZE THE TOTAL UNZIPPED ITEMS
unzipped_items = []
# DECODE THE STDOUT TO 'UTF-8' FORMAT AND PARSE LINE BY LINE
for line in stdout.split('\n'):
# CHECK IF THE LINE CONTAINS KEYWORD 'inflating'
if Regex.search(r"inflating",line) is not None:
# FIND ALL THE MATCHED STRING WITH REGEX
Matched = Regex.findall(r"inflating: "+outputDir+"(.*)",line)[0]
# SUBSTITUTE THE OUTPUT BY REMOVING BEGIN/END WHITESPACES
Matched = Regex.sub('^\s+|\s+$','',Matched)
# APPEND THE OUTPUTS TO LIST
unzipped_items.append(outputDir+Matched)
# RETURN THE OUTPUT
return unzipped_items
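Hypothetical usage of the function above (all names are illustrative; outputDir and Tlog are assumed to be defined at module level, as in the original script):

outputDir = '/tmp/extracted/'             # hypothetical destination directory
Tlog = print                              # stand-in for the author's logger
unzipped = unzipFile('/tmp/archive.zip')  # hypothetical archive path
print(unzipped)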

Python3 subprocess output

I want to run the Linux word-count utility wc to determine the number of lines currently in /var/log/syslog, so that I can detect that it's growing. I've tried various tests, and while I get the result back from wc, it includes both the line count and the file name (e.g., /var/log/syslog).
So it's returning:
1338 /var/log/syslog
But I only want the line count, so I want to strip off the /var/log/syslog portion, and just keep 1338.
I have tried converting it to a string from a byte string and then stripping the result, but no joy. Same story for converting to string and stripping, decoding, etc.; all fail to produce the output I'm looking for.
These are some examples of what I get, with 1338 lines in syslog:
b'1338 /var/log/syslog\n'
1338 /var/log/syslog
Here's some test code I've written to try and crack this nut, but no solution:
import subprocess
#check_output returns byte string
stdoutdata = subprocess.check_output("wc --lines /var/log/syslog", shell=True)
print("2A stdoutdata: " + str(stdoutdata))
stdoutdata = stdoutdata.decode("utf-8")
print("2B stdoutdata: " + str(stdoutdata))
stdoutdata = stdoutdata.strip()
print("2C stdoutdata: " + str(stdoutdata))
The output from this is:
2A stdoutdata: b'1338 /var/log/syslog\n'
2B stdoutdata: 1338 /var/log/syslog
2C stdoutdata: 1338 /var/log/syslog
I suggest that you use subprocess.getoutput() as it does exactly what you want—run a command in a shell and get its string output (as opposed to byte string output). Then you can split on whitespace and grab the first element from the returned list of strings.
Try this:
import subprocess
stdoutdata = subprocess.getoutput("wc --lines /var/log/syslog")
print("stdoutdata: " + stdoutdata.split()[0])
Since Python 3.6 you can make check_output() return a str instead of bytes by giving it an encoding parameter:
check_output(['wc', '--lines', '/var/log/syslog'], encoding='UTF-8')
But since you just want the count, and both split() and int() are usable with bytes, you don't need to bother with the encoding:
linecount = int(check_output(['wc', '-l', '/var/log/syslog']).split()[0])
While some things might be easier with an external program (e.g., counting log line entries printed by journalctl), in this particular case you don't need to use an external program. The simplest Python-only solution is:
with open('/var/log/syslog', 'rt') as f:
    linecount = len(f.readlines())
This does have the disadvantage that it reads the entire file into memory; if it's a huge file, instead initialize linecount = 0 before you open the file and use a for line in f: linecount += 1 loop instead of readlines(), so that only a small part of the file is in memory as you count.
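A minimal sketch of that loop:

# Count lines while keeping only a small part of the file in memory
linecount = 0
with open('/var/log/syslog', 'rt') as f:
    for line in f:
        linecount += 1
print(linecount)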
To avoid invoking a shell and decoding filenames that might be an arbitrary byte sequence (except '\0') on *nix, you could pass the file as stdin:
import subprocess
with open(b'/var/log/syslog', 'rb') as file:
    nlines = int(subprocess.check_output(['wc', '-l'], stdin=file))
print(nlines)
Or you could ignore any decoding errors:
import subprocess
stdoutdata = subprocess.check_output(['wc', '-l', '/var/log/syslog'])
nlines = int(stdoutdata.decode('ascii', 'ignore').partition(' ')[0])
print(nlines)
Equivalent to Curt J. Sampson's answer is also this one (it returns a string):
subprocess.check_output('wc -l /path/to/your/file | cut -d " " -f1', universal_newlines=True, shell=True)
from the docs:
If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.
Something similar, but a bit more complex using subprocess.run():
subprocess.run(command, shell=True, check=True, universal_newlines=True, stdout=subprocess.PIPE).stdout
since subprocess.check_output() is roughly equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout.
getoutput() (and the closer replacement getstatusoutput()) is not a direct replacement for check_output() - there are security changes in 3.x that prevent some previous commands from working that way (my script was attempting to work with iptables and was failing with the new commands). Better to adapt to the new Python 3 output and add the argument universal_newlines=True:
check_output(command, universal_newlines=True)
This call behaves as you expect from check_output(), but returns string output instead of bytes. It's a direct replacement.
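Applied to the wc example from the question, a minimal sketch:

from subprocess import check_output

# universal_newlines=True makes check_output() return str instead of bytes
out = check_output(['wc', '--lines', '/var/log/syslog'], universal_newlines=True)
print(out.split()[0])  # just the line count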

correct way to write to pipe line by line in Python

How can I write to stdout from Python and feed it simultaneously (via a Unix pipe) to another program? For example if you have
# write file line by line
with open("myfile") as f:
    for line in f:
        print(line.strip())
But you'd like that to go line by line to another program, e.g. | wc -l, so that it outputs the number of lines in myfile. How can that be done? Thanks.
If you want to pipe python to wc externally, that's easy, and will just work:
python myscript.py | wc -l
If you want to tee it so its output both gets printed and gets piped to wc, try man tee or, better, your shell's built-in fancy redirection features.
If you're looking to run wc -l from within the script, and send your output to both stdout and it, you can do that too.
First, use subprocess.Popen to start wc -l:
# universal_newlines=True so we can write str rather than bytes (Python 3)
wc = subprocess.Popen(['wc', '-l'], stdin=subprocess.PIPE, universal_newlines=True)
Now, you can just do this:
# write file line by line
with open("myfile") as f:
    for line in f:
        stripped = line.strip()
        wc.stdin.write(stripped + '\n')
wc.stdin.close()  # close the pipe so wc sees EOF and prints its count
That will have wc's output go to the same place as your script's output. If that's not what you want, you can also make its stdout a PIPE. In that case, you want to use communicate instead of trying to get all the fiddly details right manually; see the sketch below.
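A minimal sketch of that variant, capturing wc's output instead of letting it print:

import subprocess

# Capture wc's stdout instead of letting it go to the terminal
wc = subprocess.Popen(['wc', '-l'], stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE, universal_newlines=True)
with open("myfile") as f:
    data = "".join(line.strip() + "\n" for line in f)
out, _ = wc.communicate(data)  # send input, read output, wait for exit
print(out.strip())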

writing command line output to file

I am writing a script to clean up my desktop, moving files based on file type. The first step, it would seem, is to ls -1 /Users/user/Desktop (I'm on Mac OSX). So, using Python, how would I run a command, then write the output to a file in a specific directory? Since this will be undocumented, and I'll be the only user, I don't mind (prefer?) if it uses os.system().
You can redirect standard output to any file using > in the command.
$ ls /Users/user/Desktop > out.txt
Using python,
os.system('ls /Users/user/Desktop > out.txt')
However, if you are using Python, then instead of the ls command you can use os.listdir to list all the files in the directory.
path = '/Users/user/Desktop'
files = os.listdir(path)
print(files)
After skimming the Python documentation: to run a shell command and obtain its output you can use the subprocess module with the check_output method.
After that you can simply write that output to a file with the standard Python I/O functions (File IO in Python); a sketch follows.
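A minimal sketch combining the two steps (the out.txt name is just illustrative):

import subprocess

# Run the directory listing and capture its output as a string
output = subprocess.check_output(['ls', '-1', '/Users/user/Desktop'],
                                 universal_newlines=True)
# Write the captured output to a file ("out.txt" is a hypothetical name)
with open('out.txt', 'w') as f:
    f.write(output)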
To open a file, you can use the f = open('/Path/To/File', 'type') syntax, where 'type' is r for reading, w for writing, and a for appending. The commands to read and write are f.read() and f.write('content_to_write'). To get the output of a command-line command, you have to use popen or subprocess instead of os.system(), since os.system() returns only the command's exit status, not its output. You can read more on popen here.
