I am trying to find all log files on my C:\ drive and then search those log files for a string. If the string is found, the output should be the absolute path of the log file where it was found. Below is what I have done so far.
import os
rootdir=('C:\\')
for folder,dirs,file in os.walk(rootdir):
    for files in file:
        if files.endswith('.log'):
            fullpath=open(os.path.join(folder,files),'r')
            for line in fullpath.read():
                if "saurabh" in line:
                    print(os.path.join(folder,files))
Your code is broken at:
for line in fullpath.read():
The statement fullpath.read() will return the entire file as one string, and when you iterate over it, you will be iterating a character at a time. You will never find the string 'saurabh' in a single character.
A file is its own iterator for lines, so just replace this statement with:
for line in fullpath:
Also, for cleanliness, you might want to close the file when you're done, either explicitly or by using a with statement.
Finally, you may want to break when you find a file, rather than printing the same file out multiple times (if there are multiple occurrences of your string):
import os
rootdir = ('C:\\')
for folder, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.log'):
            fullpath = os.path.join(folder, file)
            with open(fullpath, 'r') as f:
                for line in f:
                    if "saurabh" in line:
                        print(fullpath)
                        break
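As a side note (not part of the answer above): walking all of C:\ will usually hit files that cannot be opened or decoded. A minimal, hedged variant of the same loop that simply skips those files, assuming Python 3:
import os

rootdir = 'C:\\'
for folder, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.log'):
            fullpath = os.path.join(folder, file)
            try:
                # errors='ignore' drops undecodable bytes instead of raising
                with open(fullpath, 'r', errors='ignore') as f:
                    for line in f:
                        if "saurabh" in line:
                            print(fullpath)
                            break
            except OSError:
                # some system files under C:\ cannot be read; skip them
                continue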
In Python, it's the vagueness that I struggle with.
Let's start with what I know. I know that I want to search a specific directory for a file, and I want to search that file for a specific line that contains a specific string and return only that line.
Which brings me to what I don't know. I have a vague description of the specific filename:
some_file_{variable}_{somedatestamp}_{sometimestamp}.log
So I know the file will start with some_file followed by a known variable and ending in .log. I don't know the date stamp or the time stamp. And to top it off, the file might not still exist at the time of searching, so I need to cover for that eventuality.
To better describe the problem, I have the line of the BASH script that accomplishes this:
ls -1tr /dir/some_file_${VARIABLE}_*.log | tail -2 | xargs -I % grep "SEARCH STRING" %
So basically, I want to recreate that line of code from the BASH script in Python, and throw a message in the case that the search returns no files.
Some variant of this will work. It will work for an arbitrary folder structure and will search all subfolders; results will hold the directory path, the filename, the line number (1-based), and the text of the matching line.
Some key pieces:
os.walk is well suited to searching directory trees; the top= argument can be relative or absolute.
Use a context manager (with ... as ...) to open files, as it closes them automatically.
Python iterates over text files line by line.
Code
from os import walk, path

magic_text = 'lucky charms'
results = []

for dirpath, _, filenames in walk(top='.'):
    for f in filenames:
        # check the name. can use string methods or indexing...
        if f.startswith('some_name') and f[-4:] == '.log':
            # read the lines and look for magic text
            with open(path.join(dirpath, f), 'r') as src:
                # if you iterate over a file, it returns line-by-line
                for idx, line in enumerate(src):
                    # strings support the "in" operator...
                    if magic_text in line:
                        results.append((dirpath, f, idx+1, line))

for item in results:
    print(f'on path {item[0]} in file {item[1]} on line {item[2]} found: {item[3]}')
In a trivial folder tree, I placed the magic words in one file and got this result:
on path ./subfolder/subsub in file some_name_331a.log on line 2 found: lucky charms are delicious
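If you also want to mirror the tail -2 part of the BASH one-liner (only the two most recently modified matches), here is a minimal sketch using glob and the files' modification times; the directory, variable value, and search string below are placeholders, not values from the question:
import os
from glob import glob

variable = 'foo'  # placeholder for ${VARIABLE}
pattern = '/dir/some_file_{}_*.log'.format(variable)
search_string = 'SEARCH STRING'

# sort matches oldest-to-newest by modification time, then keep the last two
matches = sorted(glob(pattern), key=os.path.getmtime)[-2:]

if not matches:
    print('No files matched {}'.format(pattern))
else:
    for filepath in matches:
        with open(filepath, 'r') as src:
            for line in src:
                if search_string in line:
                    print(line.rstrip('\n'))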
See if this works for you:
from glob import glob
import os
log_dir = 'C:\\Apps\\MyApp\\logs\\'
log_variable = input("Enter variable:")
filename = "some_file"+log_variable
# Option 1
searched_files1 = glob(log_dir+filename+'*.log')
print(f'Total files found = {len(searched_files1)}')
print(searched_files1)
# Option 2
searched_files2 = []
for object in os.listdir(log_dir):
    if (os.path.isfile(os.path.join(log_dir,object)) and object.startswith(filename) and object.endswith('.log')):
        searched_files2.append(object)
print(f'Total files found = {len(searched_files2)}')
print(searched_files2)
I am currently working on some code that looks through multiple directories listed in an .ini file, then prints the files it finds and copies them to a new directory. I ran into a problem where the for loop that prints the files only executes once when it is supposed to execute 5 times. How can I fix it so the for loop works every time it is called?
Code:
def copyFiles(path):
    rootPath = path
    print(rootPath)
    pattern = "*.wav"
    search = ""
    #searches the directories for the specified file type then prints the name
    for root, dirs, files in os.walk(rootPath):
        for filename in fnmatch.filter(files, pattern):
            print(filename)

def main():
    #opens the file containing all the directories
    in_file = open('wheretolook.ini', "r")
    #create the new directory all the files will be moved to
    createDirectory()
    #takes the path names one at a time and then passes them to copyFiles
    for pathName in in_file:
        copyFiles(pathName)
The output I get from running my code does not show all the files; it should have the 0 through 4 files under every directory.
Thank you for the help!
The pathName you get when iterating over the file has a newline character at the end for each line but the last. This is why you get the blank lines in your output after each path is printed.
You need to call strip on your paths to remove the newlines:
for pathName in in_file:
    copyFiles(pathName.strip())
You could be more restrictive and use rstrip('\n'), but I suspect getting rid of all leading and trailing whitespace is better anyway.
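Putting that together, a minimal sketch of the corrected main(), assuming the same wheretolook.ini layout and the existing createDirectory() and copyFiles() functions from the question:
def main():
    # the with statement closes the .ini file automatically
    with open('wheretolook.ini', 'r') as in_file:
        # create the new directory all the files will be moved to
        createDirectory()
        # strip the trailing newline from each path before passing it on
        for pathName in in_file:
            copyFiles(pathName.strip())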
Very new to Python and programming in general so apologies if I am missing anything straightforward.
I am trying to iterate through a directory and open the included .txt files and modify them with new content.
import os

def rootdir(x):
    for paths, dirs, files in os.walk(x):
        for filename in files:
            f=open(filename, 'r')
            lines=f.read()
            f.close()
            for line in lines:
                f=open(filename, 'w')
                newline='rewritten content here'
                f.write(newline)
                f.close()
    return x

rootdir("/Users/russellculver/documents/testfolder")
This gives me: IOError: [Errno 2] No such file or directory: 'TestText1.rtf'
EDIT: I should clarify there IS a file named 'TestText1.rtf' in the folder specified in the function argument. It is the first one of three text files.
When I try moving where the file is closed / opened as seen below:
import os

def rootdir(x):
    for paths, dirs, files in os.walk(x):
        for filename in files:
            f=open(filename, 'r+')
            lines=f.read()
            for line in lines:
                newline='rewritten content here'
                f.write(newline)
                f.close()
    return x

rootdir("/Users/russellculver/documents/testfolder")
It gives me: ValueError: I/O operation on closed file
Thanks for any thoughts in advance.
#mescalinum Okay, so I've made amendments to what I've got based on everyone's assistance (thanks!), but it is still failing to write the text "newline" to any of the .txt files in the specified folder.
import os

x = raw_input("Enter the directory here: ")

def rootdir(x):
    for dirpaths, dirnames, files in os.walk(x):
        for filename in files:
            try:
                with open(os.dirpaths.join(filename, 'w')) as f:
                    f.write("newline")
                    return x
            except:
                print "There are no files in the directory or the files cannot be opened!"
                return x
From https://docs.python.org/2/library/os.html#os.walk:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
Also, f.close() should be outside for line in lines, otherwise you call it multiple times, and the second time you call it, f is already closed, and it will give that I/O error.
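To make the os.path.join(dirpath, name) advice from the quote concrete, a minimal sketch using the directory from the question:
import os

for dirpath, dirnames, filenames in os.walk("/Users/russellculver/documents/testfolder"):
    for name in filenames:
        # the entries in filenames carry no path component, so join them with dirpath
        print(os.path.join(dirpath, name))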
You should avoid explicitly open()ing and close()ing files, like:
f=open(filename, 'w')
f.write(newline)
f.close()
and instead use context managers (i.e. the with statement):
with open(filename, 'w') as f:
    f.write(newline)
which does exactly the same thing, but implicitly closes the file when the body of with is finished.
Here is the code that does as you asked:
import os

def rootdir(x):
    for paths, dirs, files in os.walk(x):
        for filename in files:
            try:
                f = open(os.path.join(paths, filename), 'w')
                f.write('new content here')
                f.close()
            except Exception, e:
                print "Could not open " + filename

rootdir("/Users/xrisk/Desktop")
However, I have a feeling you don't quite understand what's happening here (no offence). First, have a look at the documentation of os.walk provided by #mescalinum. The third tuple element, files, will contain only the file names; you need to combine each one with paths to get a full path to the file.
Also, you don't need to read the file first in order to write to it. On the other hand, if you want to append to the file, you should use the mode 'a' when opening it.
In general, when reading/writing a file, you only close it after finishing all the read/writes. Otherwise you will get an exception.
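For the append case, a minimal sketch with a hypothetical file path and text; 'a' adds to the end of the file instead of truncating it, and the file is closed only after all writes are done:
import os

# hypothetical path and text, for illustration only
with open(os.path.join("/Users/xrisk/Desktop", "example.txt"), 'a') as f:
    f.write('new content here\n')
    f.write('another appended line\n')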
Thanks #mescalinum
I'm trying to match network logon usernames that appear in two files. All.txt is a text file of the names I'm (or will be) interested in matching. Currently, I'm doing something like this:
def find_files(directory, pattern):
    #directory= (raw_input("Enter a directory to search for Userlists: ")
    directory=("c:\\TEST")
    os.chdir(directory)
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename

for filename in find_files('a-zA-Z0-9', '*.txt'):
    with open(filename, "r") as file1:
        with open("c:/All.txt", "r") as file2:
            list1 = file1.readlines()[18:]
            list2 = file2.readlines()
            for i in list1:
                for j in list2:
                    if i == j:
I'm new to Python and am wondering if this is the best and most efficient way of doing this. Even to me as a newbie it seems a little clunky, but with my current coding knowledge it's the best I can come up with at the moment.
Any help and advice would be gratefully received.
You want to read one file into memory first, storing it in a set. Membership testing in a set is very efficient, much more so than looping over the lines of the second file for every line in the first file.
Then you only need to read the second file, and line by line process it and test if lines match.
What file you keep in memory depends on the size of All.txt. If it is < 1000 lines or so, just keep that in memory and compare it to the other files. If All.txt is really large, re-open it for every file1 you process, and read only the first 18 lines of file1 into memory and match those against every line in All.txt, line by line.
To read just 18 lines of a file, use itertools.islice(); files are iterables and islice() is the easiest way to pick a subset of lines to read.
Reading All.txt into memory first:
from itertools import islice

with open("c:/All.txt", "r") as all:
    # storing lines without whitespace to make matching a little more robust
    all_lines = set(line.strip() for line in all)

for filename in find_files('a-zA-Z0-9', '*.txt'):
    with open(filename, "r") as file1:
        for line in islice(file1, 18):
            if line.strip() in all_lines:
                # matched line
If All.txt is large, store those 18 lines of each file in a set first, then re-open All.txt and process it line by line:
for filename in find_files('a-zA-Z0-9', '*.txt'):
    with open(filename, "r") as file1:
        file1_lines = set(line.strip() for line in islice(file1, 18))
    with open("c:/All.txt", "r") as all:
        for line in all:
            if line.strip() in file1_lines:
                # matched line
Note that you do not have to change directories in find_files(); os.walk() is already passed the directory name. The fnmatch module also has a .filter() method; use that to loop over files instead of calling fnmatch.fnmatch() on each file individually:
def find_files(directory, pattern):
    directory = "c:\\TEST"
    for root, dirs, files in os.walk(directory):
        for basename in fnmatch.filter(files, pattern):
            yield os.path.join(root, basename)
I have 3 files that contain lists of other files in the directory. I'm trying to take the files that are in the lists and copy them to a new directory. I think I'm tripping up on the best way to open the files, as I get an IOError: [Errno 2] No such file or directory. I had a play around using with to open the files but couldn't get my operation to work. Here's my code and a bit of one of the files I'm trying to read.
import shutil
import os
f=open('polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt')
res_files=[line.split()[1] for line in f]
f=close()
os.mkdir(os.path.expanduser('~/Clustered/polymorph_matches'))
for file in res_files:
    shutil.copy(file, (os.path.expanduser('~/Clustered/polymorph_matches')) + "/" + file)
PENCEN.res 2.res number molecules matched: 15 rms deviation 0.906016
PENCEN.res 3.res number molecules matched: 15 rms deviation 1.44163
PENCEN.res 5.res number molecules matched: 15 rms deviation 0.867366
Edit: I used Aya's code below to fix this but now get IOError: [Errno 2] No such file or directory: 'p'. I'm guessing it's reading the first character of the file name and failing there, but I can't figure out why.
res_files = []
for filename in 'polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt':
    res_files += [line.split()[1] for line in open(filename)]
Python treats consecutive string constants as a single string, so the line...
f=open('polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt')
...is actually interpreted as...
f=open('polymorphI_hits.txtpolymorphII_hits.txtpolymorphIII_hits.txt')
...which presumably refers to a non-existent file.
I don't believe there's a way to use open() to open multiple files in one call, so you'll need to change...
f=open('polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt')
res_files=[line.split()[1] for line in f]
f=close()
...to something more like...
res_files = []
for filename in 'polymorphI_hits.txt', 'polymorphII_hits.txt', 'polymorphIII_hits.txt':
    res_files += [line.split()[1] for line in open(filename)]
The rest of the code looks okay, though.
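As a side note, a with statement closes each of the list files for you (the original f=close() line would not, since close is not a standalone function); a minimal sketch of the same loop:
res_files = []
for filename in 'polymorphI_hits.txt', 'polymorphII_hits.txt', 'polymorphIII_hits.txt':
    # each file is closed as soon as its lines have been read
    with open(filename) as f:
        res_files += [line.split()[1] for line in f]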