How can I get my Python script to work using bash? - python

I am new to this site so hopefully this is the correct location to place this question.
I am trying to write a Python script for Linux that:
creates a file file.txt
appends the output of the 'lsof' command to file.txt
reads each line of the output and appends it to an array
then prints each line.
I'm basically just doing this to familiarize myself with using Python in place of bash. I'm new to this area, so any help would be great. I'm not sure where to go from here. Also, if there is a better way to do this, I'm open to that!
#!/usr/bin/env python
import subprocess
touch = "touch file.txt"
subprocess.call(touch, shell=True)
xfile = "file.txt"
connection_count = "lsof -i tcp | grep ESTABLISHED | wc -l"
count = subprocess.call(connection_count, shell=True)
if count > 0:
    connection_lines = "lsof -i tcp | grep ESTABLISHED >> file.txt"
    subprocess.call(connection_lines, shell=True)
with open(subprocess.call(xfile, shell=True), "r") as ins:
    array = []
    for line in ins:
        array.append(line)
for i in array:
    print i

subprocess.call returns the return code for the process that was started ($? in bash). This is almost certainly not what you want -- and explains why this line almost certainly fails:
with open(subprocess.call(xfile, shell=True), "r") as ins:
(you can't open a number).
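A quick way to see this for yourself is the sketch below. The child command is only a stand-in I made up for illustration (a Python process that exits with status 3); it is not part of the original script.

```python
import subprocess
import sys

# Hypothetical stand-in for a real command: a child process that exits with status 3.
child = [sys.executable, "-c", "import sys; sys.exit(3)"]

# subprocess.call waits for the child and returns its exit status, not its output.
status = subprocess.call(child)
print(type(status).__name__, status)
```

The value that comes back is a plain integer, which is why passing it to open() cannot work.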
Likely, you want to be using subprocess.Popen with stdout=subprocess.PIPE. Then you can read the output from the pipe. e.g. to get the count, you probably want something like:
connection_count = "lsof -i tcp | grep ESTABLISHED"
proc = subprocess.Popen(connection_count, shell=True, stdout=subprocess.PIPE)
# line counting moved to python :-)
count = sum(1 for unused_line in proc.stdout)
(you could also use Popen.communicate here)
Note, excessive use of shell=True is always a bit scary for me... It's much better to chain your pipes together as demonstrated in the documentation.
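As a rough sketch of what chaining the pipes without a shell could look like: the helper name pipeline_lines is my own, and the lsof call is only shown in a comment because it needs lsof to be installed.

```python
import subprocess


def pipeline_lines(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` without a shell and return the output lines.

    Both commands are argument lists, e.g. ["lsof", "-i", "tcp"].
    """
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second_cmd, stdin=p1.stdout,
                          stdout=subprocess.PIPE, universal_newlines=True)
    p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
    output = p2.communicate()[0]
    p1.wait()  # reap the first process as well
    return output.splitlines()


# Intended use for the question (requires lsof to be installed):
# lines = pipeline_lines(["lsof", "-i", "tcp"], ["grep", "ESTABLISHED"])
# print(len(lines))
```

The returned list already is the "array of lines" the question asks for, so the temporary file is not needed at all.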

Related

subprocess.call() throws error "FileNotFoundError: [Errno 2] No such file or directory" when redirecting stdout to file

I want to redirect the console output to a textfile for further inspection.
The task is to extract TIFF-TAGs from a raster file (TIFF) and filter the results.
In order to achieve this, I have several tools at hand. Some of them are not python libraries, but command-line tools, such as "identify" of ImageMagick.
My example command-string passed to subprocess.check_call() was:
cmd_str = 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
Here, in the output of the TIFF-TAGs produced by "identify" all lines which contain information about the TAG number "274" shall be either displayed in the console, or written to a file.
Error-type 1: Displaying in the console
subprocess.check_call(bash_str, shell=True)
subprocess.CalledProcessError: Command 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"' returned non-zero exit status 1.
Error-type 2: Redirecting the output to textfile
subprocess.call(bash_str, stdout=filehandle_dummy, stderr=filehandle_dummy)
FileNotFoundError: [Errno 2] No such file or directory: 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"': 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'
CODE
These subprocess.check_call() functions were executed by the following convenience function:
def subprocess_stdout_to_console_or_file(bash_str, filehandle=None):
    """Function documentation:\n
    Convenience tool which either prints out directly in the provided shell, i.e. console,
    or redirects the output to a given file.
    NOTE on file redirection: it must not be the filepath, but the FILEHANDLE,
    which can be achieved via the open(filepath, "w")-function, e.g. like so:
    filehandle = open('out.txt', 'w')
    print(filehandle): <_io.TextIOWrapper name='bla_dummy.txt' mode='w' encoding='UTF-8'>
    """
    # Check whether a filehandle has been passed or not
    if filehandle is None:
        # i) If not, just direct the output to the BASH (shell), i.e. the console
        subprocess.check_call(bash_str, shell=True)
    else:
        # ii) Otherwise, write to the provided file via its filehandle
        subprocess.check_call(bash_str, stdout=filehandle)
The code piece where everything takes place is already redirecting the output of print() to a textfile. The aforementioned function is called within the function print_out_all_TIFF_Tags_n_filter_for_desired_TAGs().
As the subprocess-outputs are not redirected automatically along with the print()-outputs, it is necessary to pass the filehandle to the subprocess.check_call(bash_str, stdout=filehandle) via its keyword-argument stdout.
Nevertheless, the above-mentioned error would also happen outside this redirection zone of stdout created by contextlib.redirect_stdout().
dummy_filename = "/home/andylu/bla_dummy.txt" # will be saved temporarily in the user's home folder
# NOTE on scope: redirect sys.stdout for python 3.4x according to the following website:
# https://stackoverflow.com/questions/14197009/how-can-i-redirect-print-output-of-a-function-in-python
with open(dummy_filename, 'w') as f:
    with contextlib.redirect_stdout(f):
        print_out_all_TIFF_Tags_n_filter_for_desired_TAGs(
            TIFF_filepath)
EDIT:
For more security, the piping-process should be split up as mentioned in the following, but this didn't really work out for me.
If you have an explanation for why a split-up piping process like
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
doesn't produce the output-textfile while still exiting successfully, I'd be delighted to learn about the reasons.
OS and Python versions
OS:
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
Python:
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
As for the initial error mentioned in the question:
The comments pointed out that every call to subprocess.check_call() needs the keyword argument shell=True if I want to pass a prepared shell command string such as
gdalinfo TIFF_filepath | grep 'Pixel Size =' > path_to_textfile
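The difference can be reproduced with a small sketch like the following. It assumes a POSIX system with echo and tr available, and the helper name run_command_string is made up for this illustration.

```python
import subprocess


def run_command_string(cmd_str, use_shell):
    """Run a command given as one string and report how it went."""
    try:
        return subprocess.check_output(cmd_str, shell=use_shell,
                                       universal_newlines=True)
    except FileNotFoundError:
        # Without a shell, the entire string is looked up as a program name.
        return "FileNotFoundError"


print(repr(run_command_string("echo hello | tr a-z A-Z", use_shell=True)))
print(repr(run_command_string("echo hello | tr a-z A-Z", use_shell=False)))
```

With shell=True the shell parses the pipe; without it, Python looks for a single executable whose name is the whole string, which is the FileNotFoundError from the question.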
As a side note, I noticed that it makes no difference whether I quote the paths or not. I'm not sure whether it matters if single (') or double (") quotes are used.
Furthermore, for the security reasons outlined in the comments on my question, I followed the docs on building pipelines safely without the shell, and consequently changed from my previous standard approach
subprocess.check_call(shell_str, shell=True)
to the (somewhat cumbersome) piping steps delineated hereafter:
p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
To get these command sequences from the initial shell string, I had to write custom string-manipulation functions and experiment with them so that strings such as file paths were quoted while other functional parameters, flags etc. (like -i, >, ...) were not.
This rather complex approach was necessary because the shlex.split() function just split my shell command strings at every whitespace character, which led to problems when recombining them in the pipes.
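One thing worth checking at this point: as far as I can tell, shlex.split() keeps a quoted substring together as a single argument and drops the quotes. What it does not do is interpret shell operators such as | or >, which simply come back as ordinary tokens. A small sketch, using placeholder names from this post as the command strings:

```python
import shlex

# A quoted pattern stays together as one argument; the quotes themselves are removed.
tokens = shlex.split("grep 'Pixel Size =' path_to_textfile")
print(tokens)

# Shell operators are not interpreted; they are returned as plain tokens.
operator_tokens = shlex.split("gdalinfo TIFF_filepath | grep 'Pixel Size =' > out.txt")
print(operator_tokens)
```

So the splitting itself may be usable, as long as the | and > parts are handled separately in Python.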
Yet in spite of all these apparent improvements, no output text file is generated, although the process seemingly produces no errors and finishes "correctly" after the last line of the piping process:
output = p2.communicate()[0]
As a consequence, I'm still forced to use the old and insecure, but at least working, approach via the shell:
subprocess.check_call(shell_str, shell=True)
At least this former approach works now, even though I didn't manage to implement the more secure piping procedure in which several commands are piped together.
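A likely explanation for the missing file: without a shell, nothing interprets the > redirection. The whole string "'Pixel Size =' > 'path_to_textfile'" reaches grep as one search pattern, literal quotes included, so grep matches nothing, and since its exit status is never checked the run looks successful. Note also that 'TIFF_filepath' in that snippet is a string literal, not the variable. A minimal sketch of a shell-free version follows; the helper name pipe_to_file is my own, and the gdalinfo call is only shown as a comment.

```python
import subprocess


def pipe_to_file(first_cmd, second_cmd, out_path):
    """Run `first_cmd | second_cmd > out_path` without a shell.

    Returns the exit status of the second command.
    """
    with open(out_path, "w") as out_file:
        p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
        # The open file object takes the place of the shell's `>` redirection.
        p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=out_file)
        p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
        p2.wait()
        p1.wait()
    return p2.returncode


# Intended use for the question (both paths are placeholders):
# pipe_to_file(["gdalinfo", TIFF_filepath], ["grep", "Pixel Size ="], path_to_textfile)
```

Each list element is passed to the program exactly as written, so the pattern needs no extra quotes.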
I once ran into a similar issue, and splitting the command string into a list of arguments fixed it:
cmd_str.split(' ')
My code:
# Assumes: import subprocess, sys; import re as Regex.
# Tlog() and outputDir are defined elsewhere in the script.
# >>>>>>>>>>>>>>>>>>>>>>> UNZIP THE FILE AND RETURN THE FILE ARGUMENTS <<<<<<<<<<<<<<<<<<<<<<<<<<<<
def unzipFile(zipFile_):
    # INITIALIZE THE UNZIP COMMAND HERE
    cmd = "unzip -o " + zipFile_ + " -d " + outputDir
    Tlog("UNZIPPING FILE " + zipFile_)
    # GET THE PROCESS OUTPUT AND PIPE IT TO VARIABLE
    log = subprocess.Popen(cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # GET BOTH THE ERROR LOG AND OUTPUT LOG FOR IT
    stdout, stderr = log.communicate()
    # FORMAT THE OUTPUT
    stdout = stdout.decode('utf-8')
    stderr = stderr.decode('utf-8')
    if stderr != "":
        Tlog("ERROR WHILE UNZIPPING FILE \n\n\t" + stderr + '\n')
        # EXIT WITH A NON-ZERO STATUS TO SIGNAL THE FAILURE
        sys.exit(1)
    # INITIALIZE THE TOTAL UNZIPPED ITEMS
    unzipped_items = []
    # PARSE THE DECODED STDOUT LINE BY LINE
    for line in stdout.split('\n'):
        # CHECK IF THE LINE CONTAINS KEYWORD 'inflating'
        if Regex.search(r"inflating", line) is not None:
            # FIND ALL THE MATCHED STRING WITH REGEX
            Matched = Regex.findall(r"inflating: " + outputDir + "(.*)", line)[0]
            # SUBSTITUTE THE OUTPUT BY REMOVING BEGIN/END WHITESPACES
            Matched = Regex.sub(r'^\s+|\s+$', '', Matched)
            # APPEND THE OUTPUTS TO LIST
            unzipped_items.append(outputDir + Matched)
    # RETURN THE OUTPUT
    return unzipped_items

How to correctly escape special characters in python subprocess?

I'm trying to run this bash command using Python's subprocess module:
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
Output:
helld.xl.py
parse_maillog.py
replace_pattern.py
split_text_match.py
ssh_bad_login.py
Here is what I have done in Python 2.7, but in the output the awk filter is not applied:
>>> p1=subprocess.Popen(["find","/Users/johndoe/sandbox","-iname","*.py"],stdout=subprocess.PIPE)
>>> p2=subprocess.Popen(['awk','-F"/"','" {print $NF} "'],stdin=p1.stdout,stdout=subprocess.PIPE)
>>>p2.communicate()
('/Users/johndoe/sandbox/argparse.py\n/Users/johndoe/sandbox/custom_logic_substitute.py\n/Users/johndoe/sandbox/finditer_html_parse.py\n/Users/johndoe/sandbox/finditer_simple.py\n/Users/johndoe/sandbox/group_regex.py\n/Users/johndoe/sandbox/helo.py\n/Users/johndoe/sandbox/newdir/helld.xl.py\n/Users/johndoe/sandbox/parse_maillog.py\n/Users/johndoe/sandbox/replace_pattern.py\n/Users/johndoe/sandbox/split_text_match.py\n/Users/johndoe/sandbox/ssh_bad_login.py\n', None)
I could also get the output by using p1 alone, as shown below, but I can't get awk working here:
list1=[]
result=p1.communicate()[0].split("\n")
for item in result:
    a=item.rstrip('/').split('/')
    list1.append(a[-1])
print list1
You are incorrectly passing in shell quoting (and extra shell quoting which isn't even required by the shell!) when you're not invoking a shell. Don't do that.
p2=subprocess.Popen(['awk', '-F/', '{print $NF}'], stdin=...
When you have shell=True you need extra quotes around some arguments to protect them from the shell, but there is no shell here, so putting them in is incorrect, and will cause parse errors by Awk.
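To make the point concrete, the sketch below shows what a child process actually receives in its argument list when no shell is involved. The helper argv_seen_by_child is made up for this illustration, and a small Python child stands in for awk so that it runs anywhere.

```python
import subprocess
import sys


def argv_seen_by_child(args):
    """Return, as text, the argument list a child process receives for `args`."""
    # The child is a tiny Python program that prints its own arguments.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    return subprocess.check_output(child + args, universal_newlines=True).strip()


# Over-quoted, as in the question: the double quotes arrive as literal characters.
print(argv_seen_by_child(['-F"/"', '" {print $NF} "']))

# Quoted only for Python: the child sees exactly what awk expects.
print(argv_seen_by_child(['-F/', '{print $NF}']))
```

In the first call the program text handed to awk would itself start with a double quote, which awk reads as a string literal rather than a block of code.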
However, you should almost never need to call Awk from Python, especially for trivial tasks which Python can easily do natively:
list1 = [line.split('/')[-1]
         for line in subprocess.check_output(
             ["find", "/Users/johndoe/sandbox",
              "-iname", "*.py"]).splitlines()]
In this particular case, note also that GNU find already has a facility to produce this result directly:
list1 = subprocess.check_output(
    ["find", "/Users/johndoe/sandbox",
     "-iname", "*.py", "-printf", "%f\\n"]).splitlines()
Use this: p2.communicate()[0].split("\n").
It will output a list of lines.
If you have no reservations about using shell=True, then this should be a pretty simple solution:
from subprocess import Popen
import subprocess
command='''
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
'''
process=Popen(command,shell=True,stdout=subprocess.PIPE)
result=process.communicate()
print result

Issues calling awk from within Python using subprocess.call

Having some issues calling awk from within Python. Normally, I'd do the following to call the command in awk from the command line.
Open up command line, in admin mode or not.
Change my directory to awk.exe, namely cd R\GnuWin32\bin
Call awk -F "," "{ print > (\"split-\" $10 \".csv\") }" large.csv
My command is used to split up the large.csv file based on the 10th column into a number of files named split-[COL VAL HERE].csv. I have no issues running this command. I tried to run the same code in Python using subprocess.call() but I'm having some issues. I run the following code:
def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', '\",\"',
                     '\"{ print > (\\"split-\\" $10 \\".csv\\") }\"', 'large.csv'],
                    cwd = 'C:/R/GnuWin32/bin/')
Clearly, something is running when I execute the function (CPU usage, etc.), but when I check C:/R/GnuWin32/bin/ there are no split files in the directory. Any idea what's going wrong?
As I stated in my previous answer (which was downvoted), you are over-protecting the arguments, which makes awk's argument parsing fail.
Since the downvote came without a comment, I first suspected a typo on my side, but the code worked. So I suppose the reason is that I should have recommended a full-fledged Python solution more strongly, which is the best thing to do here.
Writing the equivalent in python is not trivial as we have to emulate the way awk opens files and appends to them afterwards. But it is more integrated, pythonic and handles quoting properly if quoting occurs in the input file.
I took the time to code & test it:
import csv
import glob
import os

def split_ByInputColumn():
    # get rid of the old data from previous runs
    for f in glob.glob("split-*.csv"):
        os.remove(f)
    open_files = dict()
    # Python 3: open CSV files in text mode with newline='' (Python 2 used "rb"/"wb")
    with open('large.csv', newline='') as f:
        cr = csv.reader(f, delimiter=',')
        for r in cr:
            tenth_column = r[9]
            filename = "split-{}.csv".format(tenth_column)
            if filename not in open_files:
                handle = open(filename, "w", newline='')
                open_files[filename] = (handle, csv.writer(handle, delimiter=','))
            open_files[filename][1].writerow(r)
    for f, _ in open_files.values():
        f.close()

split_ByInputColumn()
In detail:
read the big file as CSV (advantage: quoting is handled properly)
compute the destination filename from the tenth column
if the filename is not in the dictionary yet, open the file and create a csv.writer object for it
write the row using the writer stored in the dictionary for that filename
in the end, close all file handles
Aside: My old solution, using awk properly:
import subprocess
def split_ByInputColumn():
    subprocess.call(['awk.exe', '-F', ',',
                     '{ print > ("split-" $10 ".csv") }', 'large.csv'], cwd='some_directory')
Someone else posted an answer (and then subsequently deleted it), but the issue was that I was over-protecting my arguments. The following code works:
def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', ',',
                     '{ print > (\"split-\" $10 \".csv\") }', 'large.csv'],
                    cwd = 'C:/R/GnuWin32/bin/')

correct way to write to pipe line by line in Python

How can I write to stdout from Python and feed it simultaneously (via a Unix pipe) to another program? For example if you have
# write file line by line
with open("myfile") as f:
    for line in f:
        print line.strip()
But you'd like the output to go line by line to another program, e.g. | wc -l, so that it prints the number of lines in myfile. How can that be done? Thanks.
If you want to pipe python to wc externally, that's easy, and will just work:
python myscript.py | wc -l
If you want to tee it so its output both gets printed and gets piped to wc, try man tee or, better, your shell's built-in fancy redirection features.
If you're looking to run wc -l from within the script, and send your output to both stdout and it, you can do that too.
First, use subprocess.Popen to start wc -l:
wc = subprocess.Popen(['wc', '-l'], stdin=subprocess.PIPE)
Now, you can just do this:
# write file line by line
with open("myfile") as f:
    for line in f:
        stripped = line.strip()
        wc.stdin.write(stripped + '\n')
That will have wc's output go to the same place as your script's. If that's not what you want, you can also make its stdout a PIPE. In that case, you want to use communicate instead of trying to get all the fiddly details right manually.
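For the variant where wc's stdout is also a PIPE, a minimal sketch could look like this. It assumes Python 3 and a wc binary on the PATH; the function name count_lines_with_wc is my own, and universal_newlines=True is used so the pipes accept and return text.

```python
import subprocess
import sys


def count_lines_with_wc(lines):
    """Echo each stripped line to stdout and to `wc -l`; return wc's count."""
    proc = subprocess.Popen(["wc", "-l"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, universal_newlines=True)
    for line in lines:
        stripped = line.strip()
        sys.stdout.write(stripped + "\n")   # our own output
        proc.stdin.write(stripped + "\n")   # copy for wc
    # communicate() flushes and closes wc's stdin, then reads its output.
    out, _ = proc.communicate()
    return int(out.strip())


# Intended use for the question:
# with open("myfile") as f:
#     print(count_lines_with_wc(f))
```

Writing to stdin by hand and then reading with communicate() is fine here because wc prints nothing until its input is closed; for a child that answers while it reads, the manual writes could block.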

write python script to run the shell script?

I ran svn status and got the list of modified files:
svn status
? .settings
? .buildpath
? .directory
A A.php
M B.php
D html/C.html
M html/D.fr
M api/E.api
M F.php
..
After that, I want to archive all of these files:
tar zcvf MY.tar.gz <all files that svn status displays>
(not including the ? entries, just M, A, D)
My idea is to write a Python script that runs the shell commands, because right now I copy the file names one by one.
Could anybody guide me on how to do this, or point me to a related tutorial? If you think it is more difficult than copy and paste, I will give up trying.
Thanks
I don't see why you would use Python for this if you can do it in a single line of code in the shell.
svn status | grep "^[AMD]" | sed 's/^.\{8\}//' | xargs tar zcvf My.tar.gz
I used grep to select only the added, modified and deleted entries; if you want every file that svn status lists, you can leave that part out. I've used sed here to delete the first eight characters of every line. I'm sure there is an easier way to do that, but I can't think of one right now.
Once you have worked out your command as a string, you can simply run it with the subprocess module.
This module spawns processes and lets you control them. From there it's up to you.
You could use check_output() and check_call() functions:
#!/usr/bin/env python
from subprocess import check_call, check_output as qx

# universal_newlines=True makes check_output return text rather than bytes on Python 3
filenames = [line[8:]  # extract filename
             for line in qx(['svn', 'status'], universal_newlines=True).splitlines()
             if not line.startswith('?')]  # exclude files that are not under VC
check_call(['tar', 'cvfz', 'MY.tar.gz'] + filenames)
On Python < 2.7 see check_output() recipe.
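If you want to try the filtering step without a working copy at hand, it can be pulled into a small function and fed sample text. This is only a sketch: the function name files_from_svn_status and the sample are mine, and it assumes, like the code above, that the path starts at column 8 of each status line.

```python
def files_from_svn_status(status_text):
    """Return the paths from `svn status` output, skipping unversioned ('?') entries.

    Assumes the path starts at column 8 of each line.
    """
    return [line[8:] for line in status_text.splitlines()
            if line and not line.startswith('?')]


# Hypothetical sample in the layout of `svn status` (status letter, then padding).
sample = (
    "?       .settings\n"
    "A       A.php\n"
    "M       B.php\n"
    "D       html/C.html\n"
)
print(files_from_svn_status(sample))
```

The resulting list can be appended to the tar argument list in the same way as filenames above.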
subprocess is the Pythonic way, but a small bash one-liner can be shorter.
Bash one-liner
svn status | egrep "^ M" | awk '{s=s $2 " "} END {print "tar cvfz MY.tar "s}'
Subprocess
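import subprocess as sp
# universal_newlines=True makes communicate() return text rather than bytes
p = sp.Popen('svn status', shell=True, stdout=sp.PIPE,
             stderr=sp.PIPE, universal_newlines=True).communicate()[0]
files = []
for line in p.splitlines():  # iterate over lines, not single characters
    if line.strip().startswith('M'):
        # the path is everything after the status column
        files.append(line.split(None, 1)[1].strip())
files = ' '.join(files)
sp.Popen('tar cvfz MY.tar.gz ' + files, shell=True).communicate()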
