Python recursing through directories, looking for certain file - fastest method? - python

I have hundreds of common files in hundreds of directories, which I'm appending to a list. I want to know the fastest Python method for recursing through a directory and any number of sub-directories while looking for a certain file.
I've recorded the elapsed time over 50 mock tests for each method below. There doesn't seem to be a significant difference between the two (method 1 is about 0.2 seconds faster on average):
Method 1:
for root, dirs, files in os.walk(inputDir):
    for f in files:
        if f == fileName + '.xyz':
            pass  # handle the match here
Method 2:
for root, dirs, files in os.walk(inputDir):
    for f in [x for x in files if x == fileName + '.xyz']:
        pass  # handle the match here
Are there any faster methods out there for this?
Thank you!

Instead of
for root, dirs, files in os.walk(inputDir):
    for f in [x for x in files if x == fileName + '.xyz']:
Use this:
filename = 'file.xyz'
for root, dirs, files in os.walk(inputDir):
    if filename in files:
        print("File found!")
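Since the question mentions appending the matches to a list, here is a minimal sketch of the membership-test approach that collects full paths. The function name find_files and its parameters are made up for illustration, and I haven't benchmarked it against the two methods in the question.

```python
import os

def find_files(input_dir, target_name):
    """Return the full path of every file named target_name under input_dir."""
    matches = []
    for root, dirs, files in os.walk(input_dir):
        # A membership test on the list replaces the explicit Python-level loop.
        if target_name in files:
            matches.append(os.path.join(root, target_name))
    return matches
```

Called as find_files(inputDir, fileName + '.xyz'), it returns one path per directory that contains the file.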

Related

Print statement not responding in my filemanagement system

I have 2 folders: Source and Destination. Each of those folders has 3 subfolders named A, B and C. The 3 subfolders in Source all contain multiple files. The 3 subfolders in Destination are still empty.
I need the full path of every file, because my goal is to overwrite the files in Destination A, B and C with those from Source A, B and C.
Why are my two print statements not printing anything? I get no errors.
import os
src = r'c:\data\AM\Desktop\Source'
dst = r'c:\data\AM\Desktop\Destination'
os.chdir(src)
for root, subdirs, files in os.walk(src):
    for f in subdirs:
        subdir_paths = os.path.join(src, f)
        subdir_paths1 = os.path.join(dst, f)
        for a in files:
            file_paths = os.path.join(subdir_paths, a)
            file_paths1 = os.path.join(subdir_paths1, a)
            print(file_paths)
            print(file_paths1)
Problem
As jasonharper said in a comment,
You are misunderstanding how os.walk() works. The files returned in files are in the root directory; you are acting as though they existed in each of the subdirs directories, which are actually in root themselves.
The reason nothing is printed is that, on the first iteration, files is empty, so for a in files is not entered. Then on the following iterations (where root is A, B and C respectively), subdirs is empty, so for f in subdirs is not entered.
Solution
In fact you can ignore subdirs entirely. Instead walk the current dir, and join src/dst + root + a:
import os
src = r'c:\data\AM\Desktop\Source'
dst = r'c:\data\AM\Desktop\Destination'
os.chdir(src)
for root, subdirs, files in os.walk('.'):
    src_dir = os.path.join(src, root)
    dst_dir = os.path.join(dst, root)
    for a in files:
        src_file = os.path.join(src_dir, a)
        dst_file = os.path.join(dst_dir, a)
        print(src_file)
        print(dst_file)
The output will have an extra dot directory between src/dst and root. If anyone could tell me how to get rid of it, I'm all ears.
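One possible way to avoid the dot directory, sketched here under the assumption that src can be walked directly instead of '.': take each root relative to src with os.path.relpath and let os.path.normpath collapse the "." component. The helper name paired_paths is hypothetical.

```python
import os

def paired_paths(src, dst):
    """Yield (source_file, destination_file) for every file under src."""
    for root, subdirs, files in os.walk(src):
        # Path of the current directory relative to src; "." for src itself.
        rel = os.path.relpath(root, src)
        # normpath collapses the "." component, so no dot directory appears.
        dst_dir = os.path.normpath(os.path.join(dst, rel))
        for name in files:
            yield os.path.join(root, name), os.path.join(dst_dir, name)
```

With this version the os.chdir(src) call should no longer be needed, since nothing depends on the current working directory.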

Extracting folder name from file through iteration - slow

I have a program where I need to loop through files and sub-directories. For each file, I need to extract the name of the subfolder it is in.
I have a dictionary, d, that contains all the subfolder names I need to work with. Then, while iterating through the files, I need to check whether their directory is in d or not.
Here is my code:
d = {'folder_1': 'a', 'folder_2': 'b', 'folder_3': 'c'}
dir_path = "/Users/user_1/Desktop/images_testing"
for root, directories, files in os.walk(dir_path):
    for filename in files:
        filepath = os.path.join(root, filename)
        temp_path = os.path.dirname(filepath)
        temp_sub_dir = temp_path.split("/")
        if temp_sub_dir[-1] in d:
            pass  # do some work
This works fine but is SUPER slow. Is there any way to make this process faster?
My main problem is on these lines:
temp_path = os.path.dirname(filepath)
temp_sub_dir = temp_path.split("/")
I do not need the full path, I just need the folder name where this file came from.
How about doing it like this:
for root, directories, files in os.walk(dir_path):
    temp_sub_dir = os.path.basename(root)
    if temp_sub_dir in d:
        for filename in files:
            filepath = os.path.join(root, filename)
            # do some work
As you 'walk' through, check whether the current directory is one of those listed in d. If it is, and if the file that d maps it to is in the current directory, then 'do something'. Seems simpler.
import os
d = {'folder_1': 'a', 'folder_2': 'b', 'folder_3': 'c'}
dir_path = "/Users/user_1/Desktop/images_testing"
for dirpath, dirnames, filenames in os.walk(dir_path):
    # os.path.basename gives the folder's own name as a string
    folder = os.path.basename(dirpath)
    if folder in d and d[folder] in filenames:
        pass  # do some work
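To make the "check the folder once, not once per file" idea concrete, here is a small sketch written as a generator. The name files_in_named_folders and the parameter wanted are made up for illustration. Note that os.path.basename returns an empty string for the top directory if dir_path ends with a slash, so this assumes it does not.

```python
import os

def files_in_named_folders(dir_path, wanted):
    """Yield (folder_name, file_path) for files whose parent folder is in wanted."""
    for root, directories, files in os.walk(dir_path):
        folder = os.path.basename(root)
        if folder not in wanted:
            continue  # one check per directory instead of one per file
        for filename in files:
            yield folder, os.path.join(root, filename)
```

Passing the dictionary d as wanted works because the in test on a dict checks its keys.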

count number of folders with given name

I am looking to get a count of folders and subfolders with a given name. Here I am searching for the number of subfolders named "L-4". It returns zero, and I am sure that's not true. What did I miss?
import os
path = "R:\\"
i = 0
for (path, dirs, files) in os.walk(path):
    if os.path.dirname == "L-4":
        i += 1
print(i)
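os.path.dirname is a reference to the standard library function, not a string, so the comparison is never true. Perhaps you wanted to call a function on path here; note that os.path.dirname(path) returns the parent directory, while os.path.basename(path) returns the folder's own name.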
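A quick sketch of the difference between the bare function and calling os.path.dirname or os.path.basename; the path used here is a made-up example, not one from the question.

```python
import os

path = os.path.join("projects", "site", "L-4")  # hypothetical path

# Comparing the function object itself to a string is always False.
is_match = os.path.dirname == "L-4"

parent = os.path.dirname(path)   # everything before the last component
name = os.path.basename(path)    # the last component only
```

Here name is the string that would be compared against "L-4", while parent is the containing directory.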
You could instead count how many times L-4 appears in the dirs list:
i = 0
for root, dirs, files in os.walk(path):
    i += dirs.count('L-4')
print(i)
or, as a one-liner:
print(sum(dirs.count('L-4') for _, dirs, _ in os.walk(path)))

How get number of subfolders and folders using Python os walks?

I have a directory of folders and subfolders. What I'm trying to do is get the number of subfolders within the folders and plot them on a scatter plot using matplotlib. I have the code to get the number of files, but how would I get the number of subfolders within a folder? This probably has a simple answer, but I'm new to Python. Any help is appreciated.
This is the code I have so far to get the number of files:
import os
import matplotlib.pyplot as plt
def fcount(path):
    count1 = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count1 += 1
    return count1
path = "/Desktop/lay"
print(fcount(path))
import os
def fcount(path, counts=None):
    # counts maps each visited folder to its number of nested subfolders
    if counts is None:
        counts = {}
    count = 0
    for f in os.listdir(path):
        child = os.path.join(path, f)
        if os.path.isdir(child):
            child_count = fcount(child, counts)
            count += child_count + 1  # unless include self
    counts[path] = count
    return count
path = "/Desktop/lay"
counts = {}
print(fcount(path, counts))
Here is a full, tested implementation. It returns the number of subfolders, not counting the current folder. To include it, move the + 1 to the last line instead of where the comment is.
I think os.walk could be what you are looking for:
import os
def fcount(path):
    count1 = 0
    for root, dirs, files in os.walk(path):
        count1 += len(dirs)
    return count1
path = "/home/"
print(fcount(path))
This will give you the number of directories at every level below the given path.
Try the following recipe:
import glob
folders = glob.glob("path/*/")  # the trailing slash matches directories only
print(len(folders))
Answering to:
how would I get the number of subfolders within a folder
You can use the os.path.isdir function similarly to os.path.isfile to count directories.
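As a sketch of that suggestion, the file counter from the question can be mirrored with os.path.isdir. The name subfolder_count is my own, and this counts only the immediate subfolders, not nested ones.

```python
import os

def subfolder_count(path):
    """Count the immediate subfolders of path (no recursion)."""
    count = 0
    for f in os.listdir(path):
        if os.path.isdir(os.path.join(path, f)):
            count += 1
    return count
```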
I guess you are looking for os.walk. Look in the Python reference, it says that:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down
or bottom-up. For each directory in the tree rooted at directory top
(including top itself), it yields a 3-tuple (dirpath, dirnames,
filenames).
So, you can try this to get only the directories:
count = 0
for root, dirs, files in os.walk('/usr/bin'):
    for name in dirs:
        print(os.path.join(root, name))
        count += 1

Non-recursive os.walk()

I'm looking for a way to do a non-recursive os.walk(), one that stays in a single directory just like os.listdir() does. But I need the result in the same form that os.walk() returns. Any idea?
Thank you in advance.
Add a break after the filenames for loop:
for root, dirs, filenames in os.walk(workdir):
    for fileName in filenames:
        print(fileName)
    break  # prevent descending into subfolders
This works because (by default) os.walk first lists the files in the requested folder and then goes into subfolders.
next(os.walk(...))
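A minimal sketch of how that one-liner could be wrapped; the function name walk_top_only is hypothetical. As far as I know, os.walk silently yields nothing for a missing or unreadable directory by default, so next() would raise StopIteration in that case.

```python
import os

def walk_top_only(path):
    """Return (dirpath, dirnames, filenames) for path alone, without descending."""
    return next(os.walk(path))
```

The result unpacks exactly like one iteration of os.walk: root, dirs, files = walk_top_only(path).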
My slightly more parametrised solution would be this:
for root, dirs, files in os.walk(path):
    if not recursive:
        while len(dirs) > 0:
            dirs.pop()
    # some fancy code here using generated list
Edit: fixed the if/while issue. Thanks, @Dirk van Oosterbosch :}
Well, what Kamiccolo meant was more in line with this:
for str_dirname, lst_subdirs, lst_files in os.walk(str_path):
    if not bol_recursive:
        while len(lst_subdirs) > 0:
            lst_subdirs.pop()
Empty the directories list:
for r, dirs, f in os.walk('/tmp/d'):
    del dirs[:]
    print(f)
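The answers that empty the directory list rely on os.walk re-reading that same list object before it descends, which it does in the default top-down mode. A sketch of how this could be packaged with a flag follows; the name walk_dirs and the recursive parameter are illustrative assumptions.

```python
import os

def walk_dirs(path, recursive=True):
    """Yield os.walk-style 3-tuples, optionally staying in the top directory."""
    for root, dirs, files in os.walk(path):  # topdown=True is the default
        # Snapshot the names first, since dirs may be cleared below.
        found_dirs = list(dirs)
        if not recursive:
            # Clear the list in place; rebinding (dirs = []) would not affect os.walk.
            del dirs[:]
        yield root, found_dirs, files
```

Unlike the break approach, the caller still sees the subfolder names of the top directory; the walk just never enters them.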
Flexible Function for counting files:
You can set recursive searching and what types you want to look for. The default argument: file_types=("", ) looks for any file. The argument file_types=(".csv",".txt") would search for csv and txt files.
from os import walk as os_walk
def count_files(path, recurse=True, file_types=("",)):
    file_count = 0
    iterator = os_walk(path) if recurse else (next(os_walk(path)),)
    for _, _, file_names in iterator:
        for file_name in file_names:
            file_count += 1 if file_name.endswith(file_types) else 0
    return file_count
