How to match file names inside a file using Python

How can I find out whether two files with the same pattern exist inside a file? If every prefix has both filenames (csv.new and csv), go ahead to the next step; otherwise exit with an error message.
The prefix "abc_package" will have two files: one with extension "csv.new" and a second with extension "csv". There could be many filenames inside "list_of_files.txt".
Ex: List_of_files.txt
abc_package.1406728501.csv.new
abc_package.1406728501.csv
abc_package.1406724901.csv.new
abc_package.1406724901.csv

For matching file names in Python you can use the fnmatch module. Here is a sample from the documentation:
import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print file
The syntax would be fnmatch.fnmatchcase(filename, pattern)
Please have a look here for more examples
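If you are working with the names listed inside list_of_files.txt rather than a real directory, fnmatch.filter() can be applied to the lines of the file as well. A minimal sketch, assuming the file looks like the example above:
import fnmatch

with open('list_of_files.txt') as f:
    names = [line.strip() for line in f]

# '*' also matches dots, so these two patterns separate the pairs cleanly
new_files = fnmatch.filter(names, 'abc_package.*.csv.new')
csv_files = fnmatch.filter(names, 'abc_package.*.csv')
print(new_files)
print(csv_files)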

with open("in.txt","r") as fo:
f = fo.readlines()
cs_new = set()
cs = set()
for ele in f:
ele = ele.rstrip()
if not ele.endswith(".new"):
cs.add(ele)
else:
cs_new.add(ele.split(".new")[0])
diff = cs ^ cs_new
for fi in diff:
print fi
As you may need either filename, you will need to check for existence against both lists:
with open("in.txt","r") as f:
f = [x.rstrip() for x in f]
cs, cs_new, diff = [],[],[]
for ind, ele in enumerate(f):
if ele.endswith(".csv"):
cs.append(ele)
else:
cs_new.append([ele.split(".new")[0],ind]) # keep track of original element in with the ind/index
for ele in cs:
if not any(ele in x for x in cs_new):
diff.append(ele)
for ele in cs_new:
if not any(ele[0] in x for x in cs):
diff.append(f[ele[1]]) # append original element with full extension

Assuming the file isn't so ridiculously huge that you can't fit it into memory, just create a set of all .csv.new files and a set of all .csv files and verify that they're identical. For example:
csvfiles = set()
newfiles = set()
with open('List_of_files.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.endswith('.csv.new'):
            newfiles.add(line[:-4])
        elif line.endswith('.csv'):
            csvfiles.add(line)
if csvfiles != newfiles:
    raise ValueError('Mismatched files!')
If you want to know which files were mismatched, csvfiles - newfiles gives you the .csv files without corresponding .csv.new, and newfiles - csvfiles gives you the opposite.
(There are ways to make this cleaner and more readable, from using os.path.splitext to using a general partition-an-iterable-by-filter function, but I think this should be the easiest for a novice to immediately understand.)
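For example, the reporting step could look like this; it is only a sketch, using os.path.splitext (as suggested above) to drop the trailing '.new':
import os

csvfiles = set()
newfiles = set()
with open('List_of_files.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.endswith('.csv.new'):
            newfiles.add(os.path.splitext(line)[0])  # 'x.csv.new' -> 'x.csv'
        elif line.endswith('.csv'):
            csvfiles.add(line)

for name in sorted(csvfiles - newfiles):
    print('missing .csv.new for: ' + name)
for name in sorted(newfiles - csvfiles):
    print('missing .csv for: ' + name)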

Related

import filenames iteratively from a different file

I have a large number of entries in a file. Let me call it file A.
File A:
('aaa.dat', 'aaa.dat', 'aaa.dat')
('aaa.dat', 'aaa.dat', 'bbb.dat')
('aaa.dat', 'aaa.dat', 'ccc.dat')
I want to use these entries, line by line, in a program that would iteratively pick an entry from file A, concatenate the files in this way:
filenames = ['aaa.dat', 'aaa.dat', 'ccc.dat']   ### entry number 3
with open('out.dat', 'w') as outfile:           ### the name has to be aaa-aaa-ccc.dat
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read().strip())
All I need to do is to substitute the filenames iteratively and create an output in a "aaa-aaa-aaa.dat" format. I would appreciate any help-- feeling a bit lost!
Many thanks!!!
You can retrieve and modify the file names in the following way:
import re

pattern = re.compile(r'\W')

with open('fnames.txt', 'r') as infile:
    for line in infile:
        line = re.sub(pattern, ' ', line).split()
        # Old filenames - to concatenate contents
        content = [x + '.dat' for x in line[::2]]
        # New filename
        new_name = '-'.join(line[::2]) + '.dat'
        # Write the concatenated content to the new
        # file (first read the content all at once)
        with open(new_name, 'w') as outfile:
            for con in content:
                with open(con, 'r') as old:
                    new_content = old.read()
                outfile.write(new_content)
This program reads your input file, here named fnames.txt with the exact structure from your post, line by line. For each line it splits the entries using a precompiled regex (precompiling regex is suitable here and should make things faster). This assumes that your filenames are only alphanumeric characters, since the regex substitutes all non-alphanumeric characters with a space.
It retrieves only the 'aaa' and 'dat' entries as a list of strings for each line, and forms a new name by joining every second entry (starting from index 0) and adding a .dat extension. It joins with a '-', as in the post.
It then retrieves the individual file names from which it will extract the content into a list content by selecting every second entry from line.
Finally, it reads each of the files in content and writes them to the common file new_name. It reads each of them all at once, which may be a problem if these files are big; in general there may be more efficient ways of doing all this. Also, if you are planning to do more things with the content from the old files before writing, consider moving the old-file-specific operations to a separate function for readability and any potential debugging.
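To make the transformation concrete, here is what a single line goes through (a small sketch using the third entry from file A):
import re

line = "('aaa.dat', 'aaa.dat', 'ccc.dat')"
parts = re.sub(r'\W', ' ', line).split()
# parts == ['aaa', 'dat', 'aaa', 'dat', 'ccc', 'dat']
names = parts[::2]
# names == ['aaa', 'aaa', 'ccc']
content = [x + '.dat' for x in names]   # files whose contents will be concatenated
new_name = '-'.join(names) + '.dat'     # 'aaa-aaa-ccc.dat'
print(content, new_name)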
Something like this:
with open(fname) as infile, open('out.dat', 'w') as outfile:
    for line in infile:
        line = line.strip()
        if line:  # not empty
            filenames = eval(line)                    # read the tuple
            filenames = [f[:-4] for f in filenames]   # remove extension
            filename = '-'.join(filenames) + '.dat'   # make filename
            outfile.write(filename + '\n')            # write
If your problem is just calculating the new filenames, how about using os.path.splitext?
'-'.join([
    f[0] for f in [os.path.splitext(path) for path in filenames]
]) + '.dat'
Which can probably be better understood if you see it like this:
import os

clean_fnames = []
filenames = ['aaa.dat', 'aaa.dat', 'ccc.dat']

for fname in filenames:
    name, extension = os.path.splitext(fname)
    clean_fnames.append(name)

name_without_ext = '-'.join(clean_fnames)
name_with_ext = name_without_ext + '.dat'
print(name_with_ext)
HOWEVER: If your issue is that you can not get the filenames in a list by reading the file line by line, you must keep in mind that when you read files, you get text (strings) NOT Python structures. You need to rebuild a list from a text like: "('aaa.dat', 'aaa.dat', 'aaa.dat')\n".
You could take a look at ast.literal_eval or try to rebuild it yourself. The code below outputs a lot of messages to show what's happening:
import pprint

collected_fnames = []
with open('./fileA.txt') as f:
    for line in f:
        print("Read this (literal) line: %s" % repr(line))
        line_without_whitespaces_on_the_sides = line.strip()
        if not line_without_whitespaces_on_the_sides:
            print("line is empty... skipping")
            continue
        else:
            line_without_parenthesis = (
                line_without_whitespaces_on_the_sides
                .lstrip('(')
                .rstrip(')')
            )
            print("Cleaned parenthesis: %s" % line_without_parenthesis)
            chunks = line_without_parenthesis.split(', ')
            print("Collected %s chunks in a %s: %s" % (len(chunks), type(chunks), chunks))
            chunks_without_quotations = [chunk.replace("'", "") for chunk in chunks]
            print("Now we don't have quotations: %s" % chunks_without_quotations)
            collected_fnames.append(chunks_without_quotations)

print("collected %s lines with filenames:\n%s" %
      (len(collected_fnames), pprint.pformat(collected_fnames)))
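If the lines really are valid Python tuple literals, as in the example above, the manual cleanup can be replaced by ast.literal_eval, which safely parses literals without the risks of eval(). A minimal sketch under that assumption:
import ast

collected_fnames = []
with open('./fileA.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # e.g. "('aaa.dat', 'aaa.dat', 'ccc.dat')" -> ('aaa.dat', 'aaa.dat', 'ccc.dat')
        filenames = ast.literal_eval(line)
        collected_fnames.append(list(filenames))

print(collected_fnames)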

Unable to strip and store content of some files in CSV format

I have files which all look alike. They are placed in:
~/ansible-environments/aws/random_name_1/inventory/group_vars/all
~/ansible-environments/aws/random_name_2/inventory/group_vars/all
~/ansible-environments/aws/random_name_3/inventory/group_vars/all
I wrote:
import os
import sys

rootdir = '/home/USER/ansible-environments/aws'
#print "aa"
for root, subdirs, files in os.walk(rootdir):
    for subdir in subdirs:
        all_path = os.path.join(rootdir, subdir, "inventory", "group_vars", "all")
        if not os.path.isfile(all_path):
            continue
        try:
            with open(all_path, "r") as f:
                all_content = f.readlines()
        except (OSError, IOError):
            continue  # ignore errors
        csv_line = [""] * 3
        for line in all_content:
            if line[:9] == "isv_alias:":
                csv_line[0] = line[7:].strip()
            elif line[:21] == "LMID:":
                csv_line[1] = line[6:].strip()
            elif line[:17] == "products:":
                csv_line[2] = line[10:].strip()
        if all(value != "" for value in csv_line):
            with open(os.path.join("/home/nsingh/nishlist.csv"), "a") as csv:
                csv.write(",".join(csv_line))
                csv.write("\n")
I just need the LMIT, isv_alias, and products in the following format:
alias,LMIT,product
bloodyhell,80,rms_scl
something_else,434,some_other_prod
There are three problems here:
Finding all key-value files
Extracting keys and values from each file
Turning the keys and values from each file into rows in a CSV
First use os.listdir() to find the contents of ~/ansible-environments/aws, then build the expected path of the inventory/group_vars directory inside each using os.path.join(), and see which ones actually exist. Then list the contents of those directories that do exist, and assume all files inside (such as all) are key-value files. The example code at the end of this answer assumes that all files can be found this way; if they cannot, you may have to adapt the example code to find the files using os.walk() or another method.
Each key-value file is a sequence of lines, where each line is a key and value separated by a colon (":"). Your approach of searching for a substring (operator in) will fail if, say, the secret key contains the string "LMIT". Instead, split the line at the colon. The expression line.split(":", 1) splits the line at the first colon, but not at subsequent colons, in case the value itself contains a colon. Then strip off excess whitespace from the key and value, and build a dictionary of keys and values.
Now choose which keys you want to keep. Once you've parsed each file, look up the associated values in the dictionary from that file, and build a list out of them. Then add the list of values from this file to a list of lists of values from all files, and use csv.writer to write out the list of lists as a CSV file.
It might look something like this:
#!/usr/bin/env python2
from __future__ import with_statement, print_function, division
import os
import csv

def read_kv_file(filename):
    items = {}
    with open(filename, "rU") as infp:
        for line in infp:
            # Split at a colon and strip leading and trailing space
            line = [x.strip() for x in line.split(":", 1)]
            # Add the key and value to the dictionary
            if len(line) > 1:
                items[line[0]] = line[1]
    return items

# First find all random names
outer_dir = os.path.expanduser("~/ansible-environments/aws")
random_names = os.listdir(outer_dir)
inner_dirs = [
    os.path.join(outer_dir, name, "inventory/group_vars")
    for name in random_names
]
# Now filter it to those directories that actually exist
inner_dirs = [name for name in inner_dirs if os.path.isdir(name)]

wanted_keys = ["alias", "LMIT", "products"]
out_columns = ["alias", "LMIT", "product"]

# Collect key-value pairs from all files in these folders
rows = []
for dirname in inner_dirs:
    for filename in os.listdir(dirname):
        path = os.path.join(dirname, filename)
        # Skip non-files in this directory
        if not os.path.isfile(path):
            continue
        # If the file has a non-blank value for any of the keys of
        # interest, add a row
        items = read_kv_file(path)
        this_file_values = [items.get(key) for key in wanted_keys]
        if any(this_file_values):
            rows.append(this_file_values)

# And write them out
with open("out.csv", "wb") as outfp:
    writer = csv.writer(outfp, "excel")
    writer.writerow(out_columns)
    writer.writerows(rows)
You didn't specify how you are obtaining the files (the f in the first line), but under the assumption that you've sorted out the file traversal and that the files are exactly as you present them (so no extra spaces or anything like that), you can modify your code to:
csv_line = [""] * 3
for line in f:
if line[:6] == "alias:":
csv_line[0] = line[7:].strip()
elif line[:5] == "LMIT:":
csv_line[1] = line[6:].strip()
elif line[:9] == "products:":
csv_line[2] = line[10:].strip()
with open(rootdir + '/' + 'list.csv', "a") as csv:
csv.write(",".join(csv_line))
csv.write("\n")
This will add a new line with the proper vars in your CSV for each file that was loaded as f, however keep in mind that it doesn't check for the data validity so it will be happy to write empty new lines if the opened file didn't contain the proper data.
You can prevent that by checking for all(value != "" for value in csv_line) before opening the csv file for writing. You can use any instead of all if you want to write entries that have at least one variable populated.
UPDATE: The code you just pasted has serious indentation and structural issues. It at least makes it clearer what you want to do; assuming everything else is OK, this should do it:
for root, subdirs, files in os.walk(rootdir):
    for subdir in subdirs:
        all_path = os.path.join(rootdir, subdir, "inventory", "group_vars", "all")
        if not os.path.isfile(all_path):
            continue
        try:
            with open(all_path, "r") as f:
                all_content = f.readlines()
        except (OSError, IOError):
            continue  # ignore errors
        csv_line = [""] * 3
        for line in all_content:
            if line[:6] == "alias:":
                csv_line[0] = line[7:].strip()
            elif line[:5] == "LMIT:":
                csv_line[1] = line[6:].strip()
            elif line[:9] == "products:":
                csv_line[2] = line[10:].strip()
        if all(value != "" for value in csv_line):
            with open(os.path.join(rootdir, "list.csv"), "a") as csv:
                csv.write(",".join(csv_line))
                csv.write("\n")

python, find and print specific cells in csv files that are in different directories

I have different csv files in different directories, and I want to find specific cells in different columns that correspond to a specific date in my input.txt file.
Here is what I have until now:
import glob, os, csv, numpy
import re, csv

if __name__ == '__main__':
    Input = open('Input.txt', 'r')
    output = []
    for i, line in enumerate(Input):
        if i == 0:
            header_Input = Input.readline().replace('\n', '').split(',')
        else:
            date_input = Input.readline().replace('\n', '').split(',')

    a = os.walk("path to the directory")
    [x[0] for x in os.walk("path to the directory")]
    print(a)

    b = next(os.walk('.'))[1]  # immediate child directories
    for dirname, dirnames, filenames in os.walk('.'):
        # print path to all subdirectories first
        for subdirname in dirnames:
            print(os.path.join(dirname, subdirname))
        # print path to all filenames
        for filename in filenames:
            #print(os.path.join(dirname, filename))
            csvfile = 'csv_file'
            if csvfile in filename:
                print(os.path.join(dirname, filename))
Now I have the csv files, so I need to find the date_input in every file and print the line that contains all the information. Or, if possible, get only the cells that are in the columns with header == header_input.
This is not intended to be a full answer to your question. But you may want to consider replacing
for i, line in enumerate(Input):
    if i == 0:
        header_Input = Input.readline().replace('\n','').split(',')
    else:
        date_input = Input.readline().replace('\n','').split(',')
with
header_Input = Input.readline().strip().split(',')
date_input = Input.readline().strip().split(',')
The enumerate(Input) expression reads lines from the file, and so do calls to readline() in the loop body. This will most likely result in some unfortunate results like reading alternating lines from the file.
The strip() method removes whitespace from the start and end of the line. Alternatively you may want to know that s[:-1] strips off the last character of s.
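Once header_Input and date_input have been read correctly, the per-file search could use the csv module. The sketch below makes assumptions your question doesn't confirm: each CSV file has a header row, the date sits in a column literally named 'date', and the file paths come from the os.walk() loop you already have; adjust the names to your real layout.
import csv

def find_rows(csv_path, wanted_date, wanted_columns):
    # Return the cells of wanted_columns for every row whose 'date' equals wanted_date.
    matches = []
    with open(csv_path) as f:
        reader = csv.DictReader(f)  # uses the first row as the header
        for row in reader:
            if row.get('date') == wanted_date:
                matches.append([row.get(col) for col in wanted_columns])
    return matches

# hypothetical usage with the values read from Input.txt:
# print(find_rows('some/dir/data.csv', date_input[0], header_Input))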

Iterate over 2 files in each folder and compare them

I compare two text files and print out the results to a 3rd file. I am trying to make it so the script I'm running would iterate over all of the folders that have two text files in them, in the CWD of the script.
What i have so far:
import os
import glob

path = './'
for infile in glob.glob(os.path.join(path, '*.*')):
    print('current file is: ' + infile)
    with open(f1 + '.txt', 'r') as fin1, open(f2 + '.txt', 'r') as fin2:
        ...
Would this be a good way to start the iteration process?
It's not the clearest code, but it gets the job done. However, I'm pretty sure I need to take the logic out of the read/write methods, but I'm not sure where to start.
What I'm basically trying to do is have a script iterate over all of the folders in its CWD, open each folder, compare the two text files inside, write a 3rd text file to the same folder, then move on to the next.
Another method I have tried is as follows:
import os

rootDir = 'C:\\Python27\\test'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
And this outputs the following (to give you a better idea of the file structure):
Found directory: C:\Python27\test
    test.py
Found directory: C:\Python27\test\asdd
    asd1.txt
    asd2.txt
Found directory: C:\Python27\test\chro
    ch1.txt
    ch2.txt
Found directory: C:\Python27\test\hway
    hw1.txt
    hw2.txt
Would it be wise to put the compare logic under the for fname in fileList? How do I make sure it compares the two text files inside the specific folder and not with other fnames in the fileList?
This is the full code that I am trying to add this functionality into. I apologize for the Frankenstein nature of it, but I am still working on a refined version and it does not work yet.
from collections import defaultdict
from operator import itemgetter
from itertools import groupby
from collections import deque
import os
class avs_auto:

    def load_and_compare(self, input_file1, input_file2, output_file1, output_file2, result_file):
        self.load(input_file1, input_file2, output_file1, output_file2)
        self.compare(output_file1, output_file2)
        self.final(result_file)

    def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
        with open(fileIn1+'.txt') as fin1, open(fileIn2+'.txt') as fin2:
            frame_rects = defaultdict(list)
            for row in (map(str, line.split()) for line in fin1):
                id, frame, rect = row[0], row[2], [row[3], row[4], row[5], row[6]]
                frame_rects[frame].append(id)
                frame_rects[frame].append(rect)
            frame_rects2 = defaultdict(list)
            for row in (map(str, line.split()) for line in fin2):
                id, frame, rect = row[0], row[2], [row[3], row[4], row[5], row[6]]
                frame_rects2[frame].append(id)
                frame_rects2[frame].append(rect)
        with open(fileOut1+'.txt', 'w') as fout1, open(fileOut2+'.txt', 'w') as fout2:
            for frame, rects in sorted(frame_rects.iteritems()):
                fout1.write('{{{}:{}}}\n'.format(frame, rects))
            for frame, rects in sorted(frame_rects2.iteritems()):
                fout2.write('{{{}:{}}}\n'.format(frame, rects))

    def compare(self, fileOut1, fileOut2):
        with open(fileOut1+'.txt', 'r') as fin1:
            with open(fileOut2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
                diff_lines = [l.strip() for l in lines1 if l not in lines2]
        diffs = defaultdict(list)
        with open(fileOut1+'x'+fileOut2+'.txt', 'w') as result_file:
            for line in diff_lines:
                d = eval(line)
                for k in d:
                    list_ids = d[k]
                    for i in range(0, len(d[k]), 2):
                        diffs[d[k][i]].append(k)
            for id_ in diffs:
                diffs[id_].sort()
                for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
                    group = map(itemgetter(1), g)
                    result_file.write('{0} {1} {2}\n'.format(id_, group[0], group[-1]))

    def final(self, result_file):
        with open(result_file+'.txt', 'r') as fin:
            lines = (line.split() for line in fin)
            for k, g in groupby(lines, itemgetter(0)):
                fst = next(g)
                lst = next(iter(deque(g, 1)), fst)
                with open('final/{}.avs'.format(k), 'w') as fout:
                    fout.write('video0=ImageSource("old\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video1=ImageSource("new\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video0=BilinearResize(video0,640,480)\n')
                    fout.write('video1=BilinearResize(video1,640,480)\n')
                    fout.write('StackHorizontal(video0,video1)\n')
                    fout.write('Subtitle("ID: {}", font="arial", size=30, align=8)'.format(k))
Using the load_and_compare() function, I define two input text files, two output text files, a file for the comparison results, and a final phase that writes many files for all of the differences.
What I am trying to do is have this whole class run on the current working directory and go through every sub folder, compare the two text files, and write everything into the same folder, specifically the final() results.
You can indeed use os.walk(), since that already separates the directories from the files. You only need the directories it returns, because that's where you're looking for your 2 specific files.
You could also use os.listdir() but that returns directories as well files in the same list, so you would have to check for directories yourself.
Either way, once you have the directories, you iterate over them (for subdir in dirnames) and join the various path components you have: The dirpath, the subdir name that you got from iterating over the list and your filename.
Assuming there are also some directories that don't have the specific 2 files, it's a good idea to wrap the open() calls in a try..except block and thus ignore the directories where one of the files (or both of them) doesn't exist.
Finally, if you used os.walk(), you can easily choose if you only want to go into directories one level deep or walk the whole depth of the tree. In the former case, you just clear the dirnames list by dirnames[:] = []. Note that dirnames = [] wouldn't work, since that would just create a new empty list and put that reference into the variable instead of clearing the old list.
Replace the print("do something ...") with your program logic.
#!/usr/bin/env python
import errno
import os

f1 = "test1"
f2 = "test2"
path = "."

for dirpath, dirnames, _ in os.walk(path):
    for subdir in dirnames:
        filepath1, filepath2 = [os.path.join(dirpath, subdir, f + ".txt") for f in (f1, f2)]
        try:
            with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
                print("do something with " + str(fin1) + " and " + str(fin2))
        except IOError as e:
            # ignore directories that don't contain the 2 files
            if e.errno != errno.ENOENT:
                # reraise the exception if it is different from "file or directory doesn't exist"
                raise
    # comment the next line out if you want to traverse all subsubdirectories
    dirnames[:] = []
Edit:
Based on your comments, I hope I understand your question better now.
Try the following code snippet instead. The overall structure stays the same, only now I'm using the returned filenames of os.walk(). Unfortunately, that also makes it harder to do something like "go only into the subdirectories 1 level deep", so I hope walking the tree recursively is fine with you. If not, I'll have to add a little code later.
#!/usr/bin/env python
import fnmatch
import os

filter_pattern = "*.txt"
path = "."

for dirpath, dirnames, filenames in os.walk(path):
    # comment this out if you don't want to filter
    filenames = [fn for fn in filenames if fnmatch.fnmatch(fn, filter_pattern)]
    if len(filenames) == 2:
        # comment this out if you don't want the 2 filenames to be sorted
        filenames.sort(key=str.lower)
        filepath1, filepath2 = [os.path.join(dirpath, fn) for fn in filenames]
        with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
            print("do something with " + str(fin1) + " and " + str(fin2))
I'm still not really sure what your program logic does, so you will have to interface the two yourself.
However, I noticed that you're adding the ".txt" extension to the file name explicitly all over your code, so depending on how you are going to use the snippet, you might or might not need to remove the ".txt" extension first before handing the filenames over. That would be achieved by inserting the following line after or before the sort:
filenames = [os.path.splitext(fn)[0] for fn in filenames]
Also, I still don't understand why you're using eval(). Do the text files contain Python code? In any case, eval() should be avoided and replaced by code that's more specific to the task at hand.
If it's a list of comma separated strings, use line.split(",") instead.
If there might be whitespace before or after the comma, use [word.strip() for word in line.split(",")] instead.
If it's a list of comma separated integers, use [int(num) for num in line.split(",")] instead - for floats it works analogously.
etc.
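For instance, the comma-separated cases above might look like this (a small sketch; the sample line is made up):
line = " 12, 34, 42 \n"

as_strings = line.split(",")                            # [' 12', ' 34', ' 42 \n']
stripped = [word.strip() for word in line.split(",")]   # ['12', '34', '42']
as_ints = [int(num) for num in line.split(",")]         # [12, 34, 42]; int() tolerates surrounding whitespace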

filter directory in python

I am trying to get a filtered list of all text and Python files, like below:
from walkdir import filtered_walk, dir_paths, all_paths, file_paths
vdir=raw_input ("enter director :")
files = file_paths(filtered_walk(vdir, depth=0,included_files=['*.py', '*.txt']))
I want to:
know the total number of files found in the given directory
I have tried options like Number_of_files = len(files) and for n in files: n = n + 1, but all are failing because "files" is something called a "generator" object, which I looked up in the Python docs but couldn't make use of
I also want to find a string, e.g. "import sys", in the files found above, and store the names of the files containing my search string in a new file called "found.txt"
I believe this does what you want, if I misunderstood your specification, please let me know after you give this a test. I've hardcoded the directory searchdir, so you'll have to prompt for it.
import os

searchdir = r'C:\blabla'
searchstring = 'import sys'

def found_in_file(fname, searchstring):
    with open(fname) as infp:
        for line in infp:
            if searchstring in line:
                return True
    return False

with open('found.txt', 'w') as outfp:
    count = 0
    search_count = 0
    for root, dirs, files in os.walk(searchdir):
        for name in files:
            (base, ext) = os.path.splitext(name)
            if ext in ('.txt', '.py'):
                count += 1
                full_name = os.path.join(root, name)
                if found_in_file(full_name, searchstring):
                    outfp.write(full_name + '\n')
                    search_count += 1

print 'total number of files found %d' % count
print 'number of files with search string %d' % search_count
Using with to open the file will also close the file automatically for you later.
A Python generator is a special kind of iterator. It yields one item after the other, without knowing in advance how many items there are. You only know the count at the end.
It should be ok, though, to do
n = 0
for item in files:
    n += 1
    do_something_with(item)
print "I had", n, "items."
You can think of a generator (or, generally, an iterator) as a list that gives you one item at a time. (No, it is not a list.) So, you cannot count how many items it will give you unless you go through them all, because you have to take them one by one. (This is just the basic idea; now you should be able to understand the docs, and I'm sure there are lots of questions here about them too.)
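A compact way to count the items is shown below; keep in mind that counting consumes the generator, so it has to be recreated if you want to iterate over the files again (a small sketch):
count = sum(1 for _ in files)   # exhausts the generator while counting
print count

# 'files' is now used up; recreate it before iterating again, e.g.:
# files = file_paths(filtered_walk(vdir, depth=0, included_files=['*.py', '*.txt']))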
Now, for your case, you used a not-so-wrong approach:
count = 0
for filename in files:
    count += 1
What you were doing wrong was taking f and incrementing it, but f here is the filename! Incrementing it makes no sense, and raises an exception too.
Once you have these filenames, you have to open each individual file, read it, search for your string and return the filename.
def contains(filename, match):
    with open(filename, 'r') as f:
        for line in f:
            if line.find(match) != -1:
                return True
    return False

match_files = []
for filename in files:
    if contains(filename, "import sys"):
        match_files.append(filename)

# or a one-liner:
match_files = [f for f in files if contains(f, "import sys")]
Now, as an example of a generator (don't read this before you read the docs):
def matching(filenames):
    for filename in filenames:
        if contains(filename, "import sys"):
            # feed the names one by one, you are not storing them in a list
            yield filename

# usage:
for f in matching(files):
    do_something_with_the_files_that_match_without_storing_them_all_in_a_list()
You should try os.walk
import os

dir = raw_input("Enter Dir:")
files = [file
         for path, dirname, filenames in os.walk(dir)
         for file in filenames
         if os.path.splitext(file)[1] in [".py", ".txt"]]
nfiles = len(files)
print nfiles
For searching for a string in a file look at Search for string in txt file Python
Combining both these your code would be something like
import os
import mmap

dir = raw_input("Enter Dir:")
print "Directory %s" % (dir)
search_str = "import sys"
count = 0
search_count = 0
write_file = open("found.txt", "w")
for dirpath, dirnames, filenames in os.walk(dir):
    for file in filenames:
        if file.split(".")[-1] in ["py", "txt"]:
            count += 1
            print dirpath, file
            f = open(dirpath + "/" + file)
            # print f.read()
            if search_str in f.read():
                search_count += 1
                write_file.write(dirpath + "/" + file + "\n")
            f.close()
write_file.close()
print "Number of files: %s" % (count)
print "Number of files containing string: %s" % (search_count)
