This question already has answers here: Passing variables to a subprocess call (6 answers). Closed 1 year ago.
I have a list of strings in Python and want to run a recursive grep for each string in the list. I am using the following code:
import subprocess as sp
for python_file in python_files:
    out = sp.getoutput("grep -r python_file . | wc -l")
    print(out)
The output I am getting is the grep of the literal string "python_file". What mistake am I making, and how do I correct it?
Your code has several issues. The immediate answer to what you seem to be asking was given in a comment, but there are more things to fix here.
If you want to pass in a variable instead of a static string, you have to use some sort of string interpolation.
grep already knows how to report how many lines matched; use grep -c. Or just ask Python to count the number of output lines. Trimming off the pipe to wc -l allows you to also avoid invoking a shell, which is a good thing; see also Actual meaning of shell=True in subprocess.
grep already knows how to search for multiple expressions. Try passing in the whole list as an input file with grep -f -.
import subprocess as sp
out = sp.check_output(
    ["grep", "-r", "-f", "-", "."],
    input="\n".join(python_files), text=True)
print(len(out.splitlines()))
If you want to speed up your processing and the patterns are all static strings, try also adding the -F option to grep.
Of course, all of this is relatively easy to do natively in Python, too. You should easily be able to find examples with os.walk().
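For example, a rough native sketch with os.walk, assuming the patterns are plain substrings and that the files are readable as text (count_matches is a made-up helper name):

```python
import os

def count_matches(patterns, root="."):
    """Count lines under `root` containing any of the given substrings."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the walk alive on non-UTF-8 files
            with open(path, errors="replace") as fh:
                for line in fh:
                    if any(p in line for p in patterns):
                        total += 1
    return total
```

Like a single grep -f invocation, this counts a matching line once even when several patterns match it.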
Your intent isn't totally clear from the way you've written your question, but the first argument to grep is the pattern (python_file in your example), and the second is the file(s) to search (. in your example).
You could write this in native Python or just use grep directly, which is probably easier than using both!
grep args
--count will report just the number of matching lines
--file reads one or more newline-separated patterns from the given file (manpage)
grep --count --file patterns.txt -r .
import re
from pathlib import Path
for pattern in patterns:
    count = 0
    for path_file in Path(".").iterdir():
        if not path_file.is_file():
            continue  # skip directories
        with open(path_file) as fh:
            for line in fh:
                # search, not match: grep matches anywhere in the line
                if re.search(pattern, line):
                    count += 1
    print(count)
Note that the behavior in your question would produce a separate count for each pattern, while you may really want a single count.
I would like to retrieve output from a shell command that contains spaces and quotes. It looks like this:
import subprocess
cmd = "docker logs nc1 2>&1 |grep mortality| awk '{print $1}'|sort|uniq"
subprocess.check_output(cmd)
This fails with "No such file or directory". What is the best/easiest way to pass commands such as these to subprocess?
The absolutely best solution here is to refactor the code to replace the entire tail of the pipeline with native Python code.
import subprocess
from collections import Counter
s = subprocess.run(
    ["docker", "logs", "nc1"],
    text=True, capture_output=True, check=True)
counts = Counter()
for line in s.stdout.splitlines():
    if "mortality" in line:
        counts[line.split()[0]] += 1
# most_common() yields (word, count) pairs
for word, count in counts.most_common():
    print(count, word)
There are minor differences in how Counter objects resolve ties (if two words have the same count, the one which was seen first is returned first, rather than by sort order), but I'm guessing that's unimportant here.
I am also ignoring standard error from the subprocess (the 2>&1 in your pipeline merged it into standard output); if you genuinely want to include output from error messages, too, just include s.stderr in the loop driver as well.
However, my hunch is that you don't realize your code was doing that, which drives home the point nicely: mixing shell script and Python raises the maintainability burden, because now you have to understand both shell script and Python to understand the code.
(And in terms of shell script style, I would definitely get rid of the useless grep by refactoring it into the Awk script, and probably also fold in the sort | uniq which has a trivial and more efficient replacement in Awk. But here, we are replacing all of that with Python code anyway.)
If you really wanted to stick to a pipeline, then you need to add shell=True to use shell features like redirection, pipes, and quoting. Without shell=True, Python looks for a command whose file name is the entire string you were passing in, which of course doesn't exist.
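A minimal sketch of the shell=True route, with printf standing in for docker logs nc1 (Docker may not be available) while keeping the rest of the pipeline from the question:

```python
import subprocess

# printf emits three lines ("a x", "b y", "a z"); awk keeps the first
# word, then sort | uniq deduplicates, just as in the question.
cmd = "printf 'a x\\nb y\\na z\\n' | awk '{print $1}' | sort | uniq"
out = subprocess.check_output(cmd, shell=True, text=True)
print(out)  # one line per distinct first word: a, then b
```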
I have a list of keywords and I want to build a python script to iterate through each keyword, search (grep?) for against a given file, and write the output to a file.
I know my answer is somewhere in the world of:
for words in keywords
grep |word -o foundkeywords.txt
Maybe I should stay more in bash? Either way, pardon the noob question and any guidance is very appreciated.
Python does not have much to do with Bash; that said, your script in Python could look like this (assuming there is a word on each line of your keyword file):
# open the file
with open('foundkeywords.txt') as f:
    # read each line of the file
    for i in f.read().split('\n'):
        # if the word is found in the line, print it
        if i.find(word) != -1:
            print(i)
Using bash only:
If you want to grep all the keywords at once, you can do
grep -f keywords inputfile
If you want to grep it sequentially, you can do
while read line; do
    grep "$line" inputfile
done < keywords
Of course, this can be done in Python too. But I don't see how this would facilitate the process.
This question already has answers here: How to use `subprocess` command with pipes (7 answers). Closed 7 years ago.
I am trying to filter out the first 3 lines of /proc/meminfo using a pipe and the head command. So basically I need to run this in Python:
cat /proc/meminfo | head -3
I am using below line in my code :
subprocess.call(["cat", "/proc/meminfo", "|", "head", "-3"])
When just using subprocess.call(["cat", "/proc/meminfo"]) I get the whole output, but I am only interested in the first 3 lines.
Using the above command gives me the error below:
cat: invalid option -- '3'
Try `cat --help' for more information.
Any suggestions?
/proc/meminfo is just a file. You don't need a subprocess to read it. Simply open and read it as a file. Here is all you need:
with open('/proc/meminfo') as fh:
    lines = fh.readlines()
first_lines = lines[:3]
The first_lines list will contain the first three lines (including trailing newline characters).
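Along the same lines, itertools.islice gives you the first lines without reading the whole file into memory. A small sketch, demonstrated on a throwaway sample file (the same works on /proc/meminfo):

```python
from itertools import islice

# Throwaway sample file standing in for /proc/meminfo:
with open('sample.txt', 'w') as f:
    f.write(''.join('line %d\n' % i for i in range(10)))

# islice stops after three lines instead of reading the whole file:
with open('sample.txt') as fh:
    first_lines = list(islice(fh, 3))

print(first_lines)
```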
To use a pipe you have to enable the shell with shell=True; however, that is not advisable, mainly for security reasons. You can use this alternative:
import subprocess
ps = subprocess.Popen(('cat', '/proc/meminfo'), stdout=subprocess.PIPE)
output = subprocess.check_output(('head', '-3'), stdin=ps.stdout)
ps.stdout.close()  # let cat receive SIGPIPE if head exits first
ps.wait()
print(output)
The pipe is a shell syntax element. You need to run the code in a shell to use a pipe:
subprocess.call("cat /proc/meminfo | head -3", shell=True)
From the manual:
If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user’s home directory.
Well, head actually accepts a file argument, so the pipe is not actually necessary. The following should give the expected result:
subprocess.call(["head", "-3", "/proc/meminfo"])
Following this document:
By default, subprocess.call runs with shell=False, which disables all shell-based features, including pipes. When using shell=True, pipes.quote() (shlex.quote() in Python 3) can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands.
You can use this code:
subprocess.call("cat /proc/meminfo | head -3", shell=True)
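As a sketch of that escaping, shlex.quote (the Python 3 home of pipes.quote) makes a filename safe to splice into a shell=True command; the filename here is invented for the demonstration:

```python
import shlex
import subprocess

# Made-up filename with spaces, to show the escaping at work:
filename = "file with spaces.txt"
with open(filename, 'w') as f:
    f.write("a\nb\nc\nd\n")

# shlex.quote wraps the name so the shell sees a single argument:
cmd = "cat %s | head -3" % shlex.quote(filename)
out = subprocess.check_output(cmd, shell=True, text=True)
print(out)
```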
Closed. This question needs details or clarity. It is not currently accepting answers. Closed 8 years ago.
It's a fundamental task on Linux. I need to redirect the output of the following to a file or a list:
cmd="ls-ltr | grep *.txt | cut -1 "
My Code is something like this:
#!/usr/bin/python
import os
path='/opt/dsoren/'
os.system('ls-ltr | grep *.txt | cut -1 > /dest_path/abc.txt')
Any help or insight is highly appreciated.
Why not use the builtin functions?
import glob
import os
sorted(glob.glob('./*.txt'), key=os.path.getmtime)
Short answer
Use Python's included batteries instead of the shell.
Long answer
1. Fix command
First, your shell command doesn't do anything meaningful: ls-ltr is missing a space, and the grep and cut commands are not needed and won't do what you expect. In the following, I assume you meant to do this:
cmd='ls -1tr *.txt'
Note that I'm using -1 instead of -l to get just the file names without having to cut anything.
2. Run from Python and parse result (unsafe!)
You can run the command from Python via subprocess, using the communicate() method of Popen. Then strip the trailing newline from the result and split it on newlines. This yields a list of strings, files, which contains your filenames:
#!/usr/bin/python
import subprocess
# This is unsafe!
cmd='ls -1tr *.txt'
out, err = subprocess.Popen(
    cmd, shell=True, stdout=subprocess.PIPE, text=True).communicate()
files = out.rstrip('\n').split('\n')
print(repr(files))
3. How to do it correctly
While the script above answers precisely your question, it will fail horribly if your file names contain newlines or other strange stuff. This is not so much a practical issue, but can easily become a security issue!
The correct way to do this is without any shell. Just use Python's included batteries, which are safe and simpler:
#!/usr/bin/python
import glob
import os.path
unsorted_files = glob.glob('*.txt')
files = sorted(unsorted_files, key=os.path.getmtime)
print(files)
You should try to use
fp = os.popen('ls -lr')
which returns a file object in fp.
You can iterate through each line:
for l in fp.readlines():
    print(l)
I think the problem is with your cut argument:
In [4]: os.system('ls-ltr | grep *.txt | cut -1 > abc.txt')
sh: 1: ls-ltr: not found
cut: invalid option -- '1'
Try 'cut --help' for more information.
You can try this:
In [9]: os.system('ls -ltr | grep *.txt | head -1 > pqr.txt')
Out[9]: 0
This will work fine. First try it at the terminal, then use it in your program.
I am trying to format the following awk command
awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt
for use in Python's subprocess.Popen. However, I am having a hard time formatting it. I have tried solutions suggested in similar answers, but none of them worked. I have also tried using raw string literals. Also, I would not like to use shell=True, as this is not recommended.
Edit according to comment:
The command i tried was
awk_command = """awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt"""
command_execute = Popen(shlex.split(awk_command))
However i get the following error upon executing this
KeyError: 'printf "chr%s\t%s\t%s\n", $1, $2-1, $2'
Googling the error suggests this happens when a value is requested for an undefined key, but I do not understand its context here.
> is the shell redirection operator. To implement it in Python, use stdout parameter:
#!/usr/bin/env python
import shlex
import subprocess
cmd = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""
with open('file2.txt', 'wb', 0) as output_file:
subprocess.check_call(shlex.split(cmd) + ["file1.txt"], stdout=output_file)
To avoid starting a separate process, you could implement this particular awk command in pure Python.
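For reference, a pure-Python sketch of that awk program, assuming whitespace-separated input with an integer in the second column (the sample input below is invented for the demonstration):

```python
# Tiny invented input, standing in for the real file1.txt:
with open('file1.txt', 'w') as f:
    f.write("1 100\n2 205\n")

# Mirror: awk '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt
with open('file1.txt') as src, open('file2.txt', 'w') as dst:
    for line in src:
        fields = line.split()
        dst.write("chr%s\t%s\t%s\n" % (fields[0], int(fields[1]) - 1, fields[1]))

print(open('file2.txt').read())
```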
The simplest method, especially if you wish to keep the output redirection stuff, is to use subprocess with shell=True - then you only need to escape Python special characters. The line, as a whole, will be interpreted by the default shell.
WARNING: do not use this with untrusted input without sanitizing it first!
Alternatively, you can replace the command line with an argv-type sequence and feed that to subprocess instead. Then, you need to provide stuff as the program would see it:
remove all the shell-level escaping
remove the output redirection stuff and do the redirection yourself instead
Regarding the specific problems:
you didn't escape Python special characters in the string so \t and \n became the literal tab and newline (try to print awk_command)
using shlex.split is not much different from shell=True, with an added unreliability: it cannot guarantee that it would parse the string the same way your shell would in every case (not to mention the transformations the shell makes).
Specifically, it doesn't know or care about the special meaning of the redirection part:
>>> awk_command = """awk -v OFS="\\t" '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}' file1.txt > file2.txt"""
>>> shlex.split(awk_command)
['awk', '-v', 'OFS=\\t', '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}', 'file1.txt', '>', 'file2.txt']
So, if you wish to use shell=False, do construct the argument list yourself.