I'm looking for a way to do a non-recursive os.walk() walk, just like os.listdir() works, but I need the results in the same form os.walk() returns them. Any idea?
Thank you in advance.
Add a break after the filenames for loop:
for root, dirs, filenames in os.walk(workdir):
    for fileName in filenames:
        print(fileName)
    break  # prevent descending into subfolders
This works because (by default) os.walk first lists the files in the requested folder and then goes into subfolders.
next(os.walk(...))
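For reference, a minimal sketch of what that single call gives you (the path here is just an example):

import os

# next() pulls only the first (root, dirs, files) tuple that os.walk yields,
# i.e. the top-level folder itself, with no descent into subfolders
root, dirs, files = next(os.walk('/tmp/example'))
print(files)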
My slightly more parametrised solution would be this:
for root, dirs, files in os.walk(path):
    if not recursive:
        while len(dirs) > 0:
            dirs.pop()
    # some fancy code here using the generated list
Edit: fixed the if/while issue. Thanks, @Dirk van Oosterbosch :}
Well, what Kamiccolo meant was more in line with this:
for str_dirname, lst_subdirs, lst_files in os.walk(str_path):
    if not bol_recursive:
        while len(lst_subdirs) > 0:
            lst_subdirs.pop()
Empty the directories list:
for r, dirs, f in os.walk('/tmp/d'):
    del dirs[:]
    print(f)
Flexible function for counting files:
You can toggle recursive searching and choose which file types to look for. The default argument file_types=("",) matches any file; the argument file_types=(".csv", ".txt") would search for csv and txt files.
from os import walk as os_walk

def count_files(path, recurse=True, file_types=("",)):
    file_count = 0
    iterator = os_walk(path) if recurse else (next(os_walk(path)),)
    for _, _, file_names in iterator:
        for file_name in file_names:
            file_count += 1 if file_name.endswith(file_types) else 0
    return file_count
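A hypothetical call (the path and extensions here are just examples) might look like this:

print(count_files('/data', recurse=False, file_types=('.csv', '.txt')))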
Related
I want to recursively walk through a directory, find the files that match any of the strings in a given list, and then copy these files to another folder. I thought the any() function would accomplish this, but I get a TypeError that it expected a string, not a list. Is there a more elegant way to do this?
string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for root, subdirs, filename in os.walk(source_dir):
    if any(s in filename for s in string_to_match):
        shutil.copy(filename, destination_dir)
        print(filename)
I know glob.glob can work well for finding files that match a specific string or pattern, but I haven't been able to find an answer that allows for multiple matches.
You can just use in to test each file name for membership.
Example:
string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for root, subdirs, filenames in os.walk(source_dir):
    for filename in filenames:
        if filename in string_to_match:
            shutil.copy(os.path.join(root, filename), destination_dir)
            print(filename)
Here is also a glob version:
import glob
import itertools
import shutil

root_dir = '/home/user'
files = ['apple.txt', 'pear.txt', 'banana.txt']
files_found = list(itertools.chain.from_iterable([glob.glob(f'{root_dir}/**/{f}', recursive=True) for f in files]))

for f in files_found:
    shutil.copy(f, destination_dir)
First, finding an element in a list takes O(n), while membership testing in a set takes O(1), so convert the list to a set.
Then you can do it like this:
string_to_match = {'apple.txt', 'pear.txt', 'banana.txt'}
for filename in os.listdir(source_dir):
    if filename in string_to_match:
        shutil.copy(filename, destination_dir)
        print(filename)
I would use sets
def find_names(names, source_dir):
    names = set(names)
    # note: os.walk will walk the subfolders too
    # if you just want source_dir itself, use `names.intersection(os.listdir(source_dir))`
    for root, subdirs, fnames in os.walk(source_dir):
        for matched_name in names.intersection(fnames):
            yield os.path.join(root, matched_name)
strings_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for match in find_names(strings_to_match, '/path/to/start'):
    print("Match:", match)
[edited] typo: intersection, not intersect
(you could alternatively just pass in a set {'a','b','c'} instead of a list ['a','b','c'] and skip the conversion to a set)
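For instance, with the same hypothetical names and path as above:

for match in find_names({'apple.txt', 'pear.txt', 'banana.txt'}, '/path/to/start'):
    print("Match:", match)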
Here is an alternative that only looks in the source dir (not its children):
def find_names_in_folder(names, source_dir):
    return [os.path.join(source_dir, n) for n in set(names).intersection(os.listdir(source_dir))]
I am searching for all .csv files located in subfolders with glob, like so:
def scan_for_files(path):
    file_list = []
    for path, dirs, files in os.walk(path):
        for d in dirs:
            for f in glob.iglob(os.path.join(path, d, '*.csv')):
                file_list.append(f)
    return file_list
If I call:
path = r'/data/realtimedata/trades/bitfinex/'
scan_for_files(path)
I get the correct recursive list of files:
['/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_13.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_13.csv']
However, when using the actual sub-directory containing the files I want, it returns an empty list. Any idea why this is happening? Thanks.
path = r'/data/realtimedata/trades/bitfinex/btcusd/'
scan_for_files(path)
returns: []
Looks like btcusd is a bottom-level directory. That means that when you call os.walk with the r'/data/realtimedata/trades/bitfinex/btcusd/' path, the dirs variable will be an empty list [], so the inner loop for d in dirs: does not execute at all.
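You can confirm this by peeking at the first tuple os.walk yields for that path (a quick check, not part of your function):

import os

root, dirs, files = next(os.walk(r'/data/realtimedata/trades/bitfinex/btcusd/'))
print(dirs)   # [] -- no subdirectories here, so the inner `for d in dirs:` loop never runs
print(files)  # the csv files live at this level instead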
My advice would be to re-write your function to iterate over the files directly, and not the directories... don't worry, you'll get there eventually, that's the nature of a directory tree.
def scan_for_files(path):
    file_list = []
    for root, _, files in os.walk(path):
        for f in files:
            if f.endswith('.csv'):
                file_list.append(os.path.join(root, f))
    return file_list
However, on more recent versions of python (3.5+), you can use recursive glob:
def scan_for_files(path):
    return glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
Source.
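As a quick sanity check (paths taken from the question), the recursive glob version should also work when pointed directly at the leaf directory:

scan_for_files(r'/data/realtimedata/trades/bitfinex/btcusd/')
# expected: the btcusd csv paths listed above, rather than an empty list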
What I have is a directory of folders and subfolders. What I'm trying to do is get the number of subfolders within the folders and plot them on a scatter plot using matplotlib. I have the code to get the number of files, but how would I get the number of subfolders within a folder? This probably has a simple answer, but I'm a newb to Python. Any help is appreciated.
This is the code I have so far to get the number of files:
import os
import matplotlib.pyplot as plt

def fcount(path):
    count1 = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count1 += 1
    return count1

path = "/Desktop/lay"
print fcount(path)
import os

def fcount(path, map={}):
    count = 0
    for f in os.listdir(path):
        child = os.path.join(path, f)
        if os.path.isdir(child):
            child_count = fcount(child, map)
            count += child_count + 1  # unless include self
    map[path] = count
    return count

path = "/Desktop/lay"
map = {}
print fcount(path, map)
Here is a full, tested implementation. It returns the number of subfolders, not counting the current folder. If you want to change that, put the + 1 in the last line instead of where the comment is.
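For example, a hypothetical call (path taken from the question) that also keeps the per-folder counts for plotting:

counts = {}
total = fcount("/Desktop/lay", counts)
print(total)   # number of subfolders under /Desktop/lay
print(counts)  # per-directory subfolder counts, usable for the scatter plot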
I think os.walk could be what you are looking for:
import os

def fcount(path):
    count1 = 0
    for root, dirs, files in os.walk(path):
        count1 += len(dirs)
    return count1

path = "/home/"
print fcount(path)
This will walk the tree and give you the total number of directories under the given path.
Try the following recipe:
import os.path
import glob
folder = glob.glob("path/*")
len(folder)
Answering to:
how would I get the number of subfolders within a folder
You can use the os.path.isdir function similarly to os.path.isfile to count directories.
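A minimal sketch of that idea, reusing the same "path/*" pattern:

import os.path
import glob

entries = glob.glob("path/*")
subfolder_count = sum(1 for e in entries if os.path.isdir(e))  # keep only directories
print(subfolder_count)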
I guess you are looking for os.walk. Look in the Python reference; it says:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
So, you can try to do this to get only the directories:
count = 0
for root, dirs, files in os.walk('/usr/bin'):
    for name in dirs:
        print os.path.join(root, name)
        count += 1
I'm using a recursive glob to find and copy files from one drive to another:
import fnmatch
import os

def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results
Works fine. But I also want to have access to the elements that don't match the filter.
Can someone offer some help? I could build a regex within the loop, but there must be a simpler solution, right?
If order doesn't matter, use a set:
goodfiles = fnmatch.filter(files, pattern)
badfiles = set(files).difference(goodfiles)
Another loop inside the os.walk loop can also be used:
goodfiles = []
badfiles = []
for f in files:
    if fnmatch.fnmatch(f, pattern):
        goodfiles.append(f)
    else:
        badfiles.append(f)
Note: With this solution you have to iterate through the list of files just once. In fact, the os.path.join part can be moved to the loop above.
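For example (a sketch, assuming base, files, and pattern come from the os.walk loop in the question), moving the join into the same pass:

goodfiles = []
badfiles = []
for f in files:
    full_path = os.path.join(base, f)
    if fnmatch.fnmatch(f, pattern):
        goodfiles.append(full_path)
    else:
        badfiles.append(full_path)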
Can I do something like this? (Actually, it doesn't work.)
flist = [dirpath + f for f in fnames for dirpath, dirnames, fnames in os.walk('/home/user')]
Thanks!
fnames doesn't exist yet. Swap the loops.
flist = [dirpath + f for dirpath, dirnames, fnames in os.walk('/home/user') for f in fnames]
Personally I'd write it as a generator:
def filetree(top):
    for dirpath, dirnames, fnames in os.walk(top):
        for fname in fnames:
            yield os.path.join(dirpath, fname)
Then you can either use it in a loop:
for name in filetree('/home/user'):
    do_something_with(name)
or slurp it into a list:
flist = list(filetree('/home/user'))
flist = [os.path.join(pdir,f) for pdir, dirs, files in os.walk('/home/user') for f in files]
(os.path.join should be used instead of string concatenation to handle OS-specific separators and idiosyncrasies)
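A tiny illustration of the difference (made-up path components):

import os

print(os.path.join('home', 'user', 'file.txt'))  # 'home/user/file.txt' on POSIX, 'home\\user\\file.txt' on Windows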
However, as several have already pointed out, multi-level list comprehensions are not very readable and are easy to get wrong.
Assuming you really do want to have the results in a list:
flist = []
for root, dirs, files in os.walk(root_dir):
    flist.extend(os.path.join(root, f) for f in files)
    # to support Python < 2.4, use flist.extend([...])
If you're simply using flist as an intermediate storage to iterate through, you might be better off using a generator as shown in John's answer.
Using map:
map(lambda data: map(lambda file: data[0] + '\\' + file, data[2]), os.walk('/home/user'))
OR:
map(lambda data: map(lambda file: os.path.join(data[0], file), data[2]), os.walk('/home/user'))
path = '/home/user/' # keep trailing '/'
flist = [path+name for name in os.listdir(path)]