I want to run the Linux word-count utility wc to determine the number of lines currently in /var/log/syslog, so that I can detect that it's growing. I've tried various tests, and while I get the results back from wc, the output includes both the line count and the filename (e.g., /var/log/syslog).
So it's returning:
1338 /var/log/syslog
But I only want the line count, so I want to strip off the /var/log/syslog portion, and just keep 1338.
I have tried converting it from a byte string to a string and then stripping the result, but no joy. The same goes for converting to string and stripping, decoding, and so on: all fail to produce the output I'm looking for.
These are some examples of what I get, with 1338 lines in syslog:
b'1338 /var/log/syslog\n'
1338 /var/log/syslog
Here's some test code I've written to try to crack this nut, but with no solution:
import subprocess
#check_output returns byte string
stdoutdata = subprocess.check_output("wc --lines /var/log/syslog", shell=True)
print("2A stdoutdata: " + str(stdoutdata))
stdoutdata = stdoutdata.decode("utf-8")
print("2B stdoutdata: " + str(stdoutdata))
stdoutdata = stdoutdata.strip()
print("2C stdoutdata: " + str(stdoutdata))
The output from this is:
2A stdoutdata: b'1338 /var/log/syslog\n'
2B stdoutdata: 1338 /var/log/syslog
2C stdoutdata: 1338 /var/log/syslog
I suggest you use subprocess.getoutput(), as it does exactly what you want: it runs a command in a shell and returns its string output (as opposed to byte-string output). Then you can split on whitespace and grab the first element from the resulting list of strings.
Try this:
import subprocess
stdoutdata = subprocess.getoutput("wc --lines /var/log/syslog")
print("stdoutdata: " + stdoutdata.split()[0])
Since Python 3.6 you can make check_output() return a str instead of bytes by giving it an encoding parameter:
check_output(['wc', '--lines', '/var/log/syslog'], encoding='UTF-8')
But since you just want the count, and both split() and int() are usable with bytes, you don't need to bother with the encoding:
linecount = int(check_output(['wc', '-l', '/var/log/syslog']).split()[0])
While some things might be easier with an external program (e.g., counting log line entries printed by journalctl), in this particular case you don't need to use an external program. The simplest Python-only solution is:
with open('/var/log/syslog', 'rt') as f:
    linecount = len(f.readlines())
This does have the disadvantage that it reads the entire file into memory. If it's a huge file, instead initialize linecount = 0 before you open the file, and use a for line in f: loop that increments linecount, so that only a small part of the file is in memory as you count; a sketch follows.
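A minimal sketch of that memory-friendly variant:

# Count lines one at a time instead of loading the whole file
linecount = 0
with open('/var/log/syslog', 'rt') as f:
    for line in f:
        linecount += 1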
To avoid invoking a shell and to avoid decoding filenames that might be an arbitrary byte sequence (anything except '\0') on *nix, you could pass the file as stdin:
import subprocess

with open(b'/var/log/syslog', 'rb') as file:
    nlines = int(subprocess.check_output(['wc', '-l'], stdin=file))
print(nlines)
Or you could ignore any decoding errors:
import subprocess
stdoutdata = subprocess.check_output(['wc', '-l', '/var/log/syslog'])
nlines = int(stdoutdata.decode('ascii', 'ignore').partition(' ')[0])
print(nlines)
Equivalent to Curt J. Sampson's answer is also this one (it also returns a string):
subprocess.check_output('wc -l /path/to/your/file | cut -d " " -f1', universal_newlines=True, shell=True)
From the docs:

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.
Something similar, but a bit more complex using subprocess.run():
subprocess.run(command, shell=True, check=True, universal_newlines=True, stdout=subprocess.PIPE).stdout
since subprocess.check_output() is roughly equivalent to subprocess.run() with check=True and stdout=subprocess.PIPE.
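For instance, applied to the question's command (a sketch):

import subprocess

# The run() form from above, applied to the wc command
out = subprocess.run("wc --lines /var/log/syslog", shell=True, check=True,
                     universal_newlines=True, stdout=subprocess.PIPE).stdout
print(out.split()[0])  # just the line count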
getoutput() (and the closer replacement getstatusoutput()) are not a direct replacement for check_output(): there are security changes in 3.x that prevent some previous commands from working that way (my script was attempting to work with iptables and was failing with the new functions). Better to adapt to the new Python 3 output handling and add the argument universal_newlines=True:
check_output(command, universal_newlines=True)
This call behaves exactly as you expect from check_output(), but returns string output instead of bytes. It's a drop-in replacement.
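Applied to the thread's example (a sketch):

from subprocess import check_output

# universal_newlines=True makes check_output() return str instead of bytes
out = check_output(['wc', '-l', '/var/log/syslog'], universal_newlines=True)
print(out.split()[0])  # e.g. '1338'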
Related
I want to redirect the console output to a textfile for further inspection.
The task is to extract TIFF-TAGs from a raster file (TIFF) and filter the results.
In order to achieve this, I have several tools at hand. Some of them are not python libraries, but command-line tools, such as "identify" of ImageMagick.
My example command-string passed to subprocess.check_call() was:
cmd_str = 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
Here, all lines of the TIFF-TAG output produced by "identify" which contain information about TAG number "274" shall either be displayed in the console or written to a file.
Error-type 1: Displaying in the console
subprocess.check_call(bash_str, shell=True)
subprocess.CalledProcessError: Command 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"' returned non-zero exit status 1.
Error-type 2: Redirecting the output to textfile
subprocess.call(bash_str, stdout=filehandle_dummy, stderr=filehandle_dummy)
FileNotFoundError: [Errno 2] No such file or directory: 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"': 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
CODE
These subprocess.check_call() functions were executed by the following convenience function:
def subprocess_stdout_to_console_or_file(bash_str, filehandle=None):
    """Function documentation:\n
    Convenience tool which either prints out directly in the provided shell, i.e. console,
    or redirects the output to a given file.

    NOTE on file redirection: it must not be the filepath, but the FILEHANDLE,
    which can be achieved via the open(filepath, "w")-function, e.g. like so:
    filehandle = open('out.txt', 'w')
    print(filehandle): <_io.TextIOWrapper name='bla_dummy.txt' mode='w' encoding='UTF-8'>
    """
    # Check whether a filehandle has been passed or not
    if filehandle is None:
        # i) If not, just direct the output to the BASH (shell), i.e. the console
        subprocess.check_call(bash_str, shell=True)
    else:
        # ii) Otherwise, write to the provided file via its filehandle
        subprocess.check_call(bash_str, stdout=filehandle)
The code piece where everything takes place already redirects the output of print() to a textfile. The aforementioned function is called within the function print_out_all_TIFF_Tags_n_filter_for_desired_TAGs().
Since the subprocess outputs are not redirected automatically along with the print() outputs, it is necessary to pass the filehandle to subprocess.check_call(bash_str, stdout=filehandle) via its keyword argument stdout.
Nevertheless, the above-mentioned error would also happen outside this redirection zone of stdout created by contextlib.redirect_stdout().
import contextlib

dummy_filename = "/home/andylu/bla_dummy.txt"  # will be saved temporarily in the user's home folder

# NOTE on scope: redirect sys.stdout for python 3.4x according to the following website:
# https://stackoverflow.com/questions/14197009/how-can-i-redirect-print-output-of-a-function-in-python
with open(dummy_filename, 'w') as f:
    with contextlib.redirect_stdout(f):
        print_out_all_TIFF_Tags_n_filter_for_desired_TAGs(TIFF_filepath)
EDIT:
For more security, the piping process should be split up as shown in the following, but this didn't really work out for me.
If you have an explanation for why a split-up piping process like
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
doesn't produce the output-textfile while still exiting successfully, I'd be delighted to learn about the reasons.
OS and Python versions
OS:
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
Python:
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
As for the initial error mentioned in the question:
The comments answered it: I needed to pass the kwarg shell=True in all calls of subprocess.check_call() when handing over a prepared shell-command string like
gdalinfo TIFF_filepath | grep 'Pixel Size =' > path_to_textfile
As a side note, I noticed that it doesn't make a difference whether I quote the paths or not. I'm also not sure whether it makes a difference using single (') or double (") quotes.
Furthermore, for the sake of the security concerns outlined in the comments to my question, I followed the docs about safely piping without the shell and consequently changed from my previous standard approach
subprocess.check_call(shell_str, shell=True)
to the (somewhat cumbersome) piping steps delineated hereafter:
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
In order to get these sequences of command strings from the initial entire shell string, I had to write custom string-manipulation functions and play around with them to get the strings (like filepaths) quoted while avoiding quoting other functional parameters, flags, etc. (like -i, >, ...).
This quite complex approach was necessary since shlex.split() simply splits the shell command string into whitespace-separated tokens, which led to problems when recombining them in the pipes.
Yet in spite of all these apparent improvements, no output textfile is generated, even though the process seemingly produces no errors and finishes "correctly" after the last line of the piping process:
output = p2.communicate()[0]
As a consequence, I'm still forced to use the old and insecure, but at least well-working, approach via the shell:
subprocess.check_call(shell_str, shell=True)
At least it now works employing this former approach, even though I didn't manage to implement the more secure piping procedure where several commands are glued/piped together.
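For what it's worth, a plausible explanation for the missing textfile: the > redirection is shell syntax, so when it appears inside grep's argument list it is treated as part of the search pattern rather than as a redirection, and nothing is ever written to the file. A minimal sketch of the same pipeline with the redirection done in Python instead (the paths are the placeholders from above):

import subprocess
from subprocess import PIPE

# Open the target file in Python and hand it to p2 as stdout; no shell
# is involved, so no shell redirection syntax is needed.
with open('path_to_textfile', 'w') as out:
    p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
    p2 = subprocess.Popen(['grep', 'Pixel Size ='], stdin=p1.stdout, stdout=out)
    p1.stdout.close()  # allow p1 to receive a SIGPIPE if p2 exits
    p2.communicate()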
I once ran into a similar issue, and splitting the command string fixed it:
cmd_str.split(' ')
My code:

import re as Regex
import subprocess
import sys

# NOTE: Tlog() and outputDir are defined elsewhere in the original script.

# >>>>>>>>>>>>>>>>>>>>>>> UNZIP THE FILE AND RETURN THE FILE ARGUMENTS <<<<<<<<<<<<<<<<<<<<<<<<<<<<
def unzipFile(zipFile_):
    # INITIALIZE THE UNZIP COMMAND HERE
    cmd = "unzip -o " + zipFile_ + " -d " + outputDir
    Tlog("UNZIPPING FILE " + zipFile_)

    # GET THE PROCESS OUTPUT AND PIPE IT TO VARIABLE
    log = subprocess.Popen(cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # GET BOTH THE ERROR LOG AND OUTPUT LOG FOR IT
    stdout, stderr = log.communicate()

    # FORMAT THE OUTPUT
    stdout = stdout.decode('utf-8')
    stderr = stderr.decode('utf-8')

    if stderr != "":
        Tlog("ERROR WHILE UNZIPPING FILE \n\n\t" + stderr + '\n')
        sys.exit(0)

    # INITIALIZE THE TOTAL UNZIPPED ITEMS
    unzipped_items = []

    # PARSE THE DECODED STDOUT LINE BY LINE
    for line in stdout.split('\n'):
        # CHECK IF THE LINE CONTAINS KEYWORD 'inflating'
        if Regex.search(r"inflating", line) is not None:
            # FIND THE MATCHED STRING WITH REGEX
            Matched = Regex.findall(r"inflating: " + outputDir + "(.*)", line)[0]
            # SUBSTITUTE THE OUTPUT BY REMOVING BEGIN/END WHITESPACES
            Matched = Regex.sub(r'^\s+|\s+$', '', Matched)
            # APPEND THE OUTPUTS TO LIST
            unzipped_items.append(outputDir + Matched)

    # RETURN THE OUTPUT
    return unzipped_items
I was calling dos2unix from within Python this way:
call("dos2unix " + file1, shell=True, stdout=PIPE)
However to silence the Unix output, I did this:
f_null = open(os.devnull, 'w')
call("dos2unix " + file1, shell=True, stdout=f_null , stderr=subprocess.STDOUT)
This doesn't seem to work: the command apparently isn't being run anymore, since a diff of file1 against file2 (I did a diff -y file1 file2 | cat -t) shows the line endings haven't changed.
file2 is the file I am comparing file1 against. It has Unix line endings as it is generated on the box. However, there is a chance that file1 doesn't.
Not sure why, but I would try to get rid of the "noise" around your command and check the return code:
check_call(["dos2unix",file1], stdout=f_null , stderr=subprocess.STDOUT)
pass the command as a list of args, not a command line (this supports files with spaces in their names!)
remove shell=True, as dos2unix isn't a shell built-in
use check_call so it raises an exception instead of failing silently
At any rate, it is possible that dos2unix detects that its output isn't a tty anymore and decides to write the converted text there instead (dos2unix can work from standard input to standard output). I'd go with that explanation. You could check it by redirecting to a real file instead of os.devnull and seeing whether the result lands there; a sketch follows.
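A minimal sketch of that check (the capture filename is hypothetical):

import subprocess

# If dos2unix is writing the converted text to stdout rather than
# converting file1 in place, it will show up in this file.
with open('dos2unix_check.txt', 'w') as f:
    subprocess.check_call(['dos2unix', file1], stdout=f,
                          stderr=subprocess.STDOUT)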
But I would use a pure Python solution instead (with a backup for safety), which is portable and doesn't need the dos2unix command (so it works on Windows as well):
with open(file1,"rb") as f:
contents = f.read().replace(b"\r\n",b"\n")
with open(file1+".bak","wb") as f:
f.write(contents)
os.remove(file1)
os.rename(file1+".bak",file1)
Reading the file fully is fast, but could choke on really big files. A line-by-line solution is also possible (still using binary mode):
with open(file1,"rb") as fr, open(file1+".bak","wb") as fw:
for l in fr:
fw.write(l.replace(b"\r\n",b"\n"))
os.remove(file1)
os.rename(file1+".bak",file1)
I am logging into a remote node using SSH, getting the status of a service and want to print it.
Running the bash command on my remote node yields.
[root@redis-1 ~]# redis-cli -a '!t3bmjEJss' info replication | grep role | cut -d':' -f2
slave
The Python code that I've written is:
import subprocess

def serviceDetails(ip, svc):
    if svc == 'redis-server':
        ssh = subprocess.Popen(
            ["ssh", "%s" % ip,
             "redis-cli -a '!t3Z9LJt2_wmUDbmjEJss' info replication | grep role | cut -d':' -f2"],
            shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        result = ssh.stdout.readlines()
        print(result)
    else:
        print("Redis service is not running on this node")
The output that I am getting from result variable is:
[b'slave\r\n']
Why do all these extra characters appear? And how can I get rid of them?
The entire process of calling subprocess.Popen and then manually reading from its stdout property can be condensed into one call, which also automatically performs the bytes-to-string conversion:
subprocess.check_output([arg0, arg1, ...], encoding='utf-8')
If you also want to read stderr, then include stderr=subprocess.STDOUT.
You can find the docs for subprocess.check_output here.
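Applied to the question's ssh invocation, that looks roughly like this (a sketch; the Redis password is elided):

import subprocess

result = subprocess.check_output(
    ["ssh", ip, "redis-cli -a '...' info replication | grep role | cut -d':' -f2"],
    encoding='utf-8',
    stderr=subprocess.STDOUT)
print(result.strip())  # e.g. slave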
When you use .readlines(), it returns a list of lines. You can use .read() if you want it all in one string. The b prefix is there because it is a byte string; to get a normal string, you can use .decode('utf-8') in most cases (it may be a different encoding, but UTF-8 will probably work). Then, to get rid of the newline, use .strip(). Putting it all together, either of these would work:
result = ssh.stdout.read().decode('utf-8').strip()
print(result)
# slave
or
result = [line.decode('utf-8').strip() for line in ssh.stdout.readlines()]
print(result)
# ['slave']
Either one will work when you have only one line. If you have more than one line, the first will not work properly; it will have \r\n in the middle of the string.
I am trying to run the http://mediaarea.net/en/MediaInfo command-line utility from Python.
It accepts arguments like this.
Simple Usage:
# verbose all info
MediaInfo.exe test.mp4
Template Usage:
# verbose selected info from csv
MediaInfo.exe --inform="file://D:\path\to\csv\template.csv" test.mp4
I am trying to run it with the template argument. I can use the above command successfully from CMD; it works and I can see my selected output fine in the DOS window.
But when I try to run it from Python, it outputs all info, ignoring the CSV which I give as an argument.
Can anyone explain why? Is it because of the quotes?
NOTE: If the path to the CSV is not correct or the CSV is invalid, MediaInfo outputs all info, which is exactly what is happening here.
import subprocess

# App variable is full path to MediaInfo.exe
# filename variable is full path to media file
proc = subprocess.Popen([App, '--inform="file://D:\path\to\csv\template.csv"', filename],
                        shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
return_code = proc.wait()
for line in proc.stdout:
    print line
On Windows, you could pass the command as string i.e., as is:
from subprocess import check_output
cmd = r'MediaInfo.exe --inform="file://D:\path\to\csv\template.csv" test.mp4'
out = check_output(cmd)
Notice the r'' raw-string literal: it is used so that '\t' is not interpreted as a single tab character; in r'\t' it stays two characters (a backslash and a t).
Unrelated: if you have specified stdout=PIPE, stderr=PIPE, then you should read both streams concurrently, and before p.wait() is called; otherwise a deadlock is possible if the command generates enough output.
If passing the command as a string works, then you could try a list argument:
from subprocess import check_output
from urllib import pathname2url
cmd = [app, '--inform']
cmd += ['file:' + pathname2url(r'D:\path\to\csv\template.csv')]
cmd += [filename]
out = check_output(cmd)
Also, can you write an example of the p.wait() deadlock you mentioned?
It is easy. Just produce large output in the child process:
import sys
from subprocess import Popen, PIPE
#XXX DO NOT USE, IT DEADLOCKS
p = Popen([sys.executable, "-c", "print('.' * (1 << 23))"], stdout=PIPE)
p.wait() # <-- this never returns unless the pipe buffer is larger than (1<<23)
assert 0 # unreachable
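The fix, for reference, is to drain the pipe before waiting, e.g. via communicate(); a minimal sketch with the same child process:

import sys
from subprocess import Popen, PIPE

# communicate() reads the output as it is produced, so the pipe buffer
# never fills up and the child can exit.
p = Popen([sys.executable, "-c", "print('.' * (1 << 23))"], stdout=PIPE)
out, _ = p.communicate()
print(len(out))  # the 2**23 dots plus the line ending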
If you print your arguments, you might see what is going wrong:
>>> print '--inform="file://D:\path\to\csv\template.csv"'
--inform="file://D:\path o\csv emplate.csv"
The problem is that \ denotes special characters. If you use the r prefix in front of your string literal, these special characters are not interpreted:
>>> print r'--inform="file://D:\path\to\csv\template.csv"'
--inform="file://D:\path\to\csv\template.csv"
I am writing a Python script that reads a line/string, calls Unix grep to search a query file for lines that contain the string, and then prints the results.
from subprocess import call

for line in infilelines:
    output = call(["grep", line, "path/to/query/file"])
    print output
    print line
When I look at my results printed to the screen, I will get a list of matching strings from the query file, but I will also get "1" and "0" integers as output, and line is never printed to the screen. I expect to get the lines from the query file that match my string, followed by the string that I used in my search.
call returns the process return code; that's where the 1s and 0s come from (grep exits with 0 on a match and 1 on no match), while the command's output goes straight to the screen rather than into output.
If using Python 2.7, use check_output.
from subprocess import check_output
output = check_output(["grep", line, "path/to/query/file"])
If using anything before that, use communicate.
import subprocess
process = subprocess.Popen(["grep", line, "path/to/query/file"], stdout=subprocess.PIPE)
output = process.communicate()[0]
This opens a pipe for stdout that you can read with communicate. If you want stderr too, add stderr=subprocess.PIPE as well.
This will return the full output. If you want to parse it into separate lines, use split.
output.split('\n')
I believe Python takes care of line-ending conversions for you, but since you're using grep I'm going to assume you're on Unix where the line-ending is \n anyway.
http://docs.python.org/library/subprocess.html#subprocess.check_output
The following code works with Python >= 2.5:
from commands import getoutput
output = getoutput('grep %s path/to/query/file' % line)
output_list = output.splitlines()
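Note that the commands module was removed in Python 3; there, subprocess.getoutput() is the equivalent:

from subprocess import getoutput

output = getoutput('grep %s path/to/query/file' % line)
output_list = output.splitlines()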
Why would you want to execute a call to external grep when Python itself can do it? This is extra overhead, and your code will then be dependent on grep being installed. This is how you do a simple grep in Python with the in operator:
query = open("/path/to/query/file").readlines()
query = [i.rstrip() for i in query]

f = open("file")
for line in f:
    if line.rstrip() in query:  # exact-match lookup against the query lines
        print(line.rstrip())
f.close()
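If you want grep-like "contains" semantics rather than exact matches, a small variation works (a sketch using the same placeholder paths):

# Substring matching, closer to what grep does with a fixed string
with open("/path/to/query/file") as qf:
    queries = [q.rstrip() for q in qf]

with open("file") as f:
    for line in f:
        if any(q in line for q in queries):
            print(line.rstrip())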