Non-recursive os.walk() - python

I'm looking for a way to do a non-recursive os.walk(), one that only lists the top-level directory the way os.listdir() does, but returns its results in the same form as os.walk(). Any ideas?
Thank you in advance.

Add a break after the filenames for loop:
for root, dirs, filenames in os.walk(workdir):
    for fileName in filenames:
        print(fileName)
    break  # prevent descending into subfolders
This works because (by default) os.walk first lists the files in the requested folder and then goes into subfolders.

next(os.walk(...))
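A minimal sketch of the one-liner above, assuming a throwaway directory built with tempfile (the helper name walk_top_level and the file names are made up for illustration). next() takes only the first tuple that os.walk yields, which describes the top-level folder in the usual (root, dirs, files) shape.

```python
import os
import tempfile

def walk_top_level(path):
    """Return the first (root, dirs, files) tuple from os.walk, i.e. only `path` itself."""
    # Hypothetical helper; the default avoids StopIteration when os.walk
    # yields nothing (for example, when `path` does not exist).
    return next(os.walk(path), (path, [], []))

# Build a small throwaway tree: top/a.txt and top/sub/b.txt
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(top, rel), "w") as fh:
        fh.write("x")

root, dirs, files = walk_top_level(top)
print(root == top, dirs, files)
```

Unlike the pruning approaches, this leaves the dirs list populated, so the caller still sees which subfolders exist without visiting them.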

A slightly more parametrised solution would be this:
for root, dirs, files in os.walk(path):
    if not recursive:
        while len(dirs) > 0:
            dirs.pop()
    # some fancy code here using the generated lists
Edit: fixed the if/while issue. Thanks, @Dirk van Oosterbosch :}

Well what Kamiccolo meant was more in line with this:
for str_dirname, lst_subdirs, lst_files in os.walk(str_path):
    if not bol_recursive:
        while len(lst_subdirs) > 0:
            lst_subdirs.pop()

Empty the directories list
for r, dirs, f in os.walk('/tmp/d'):
    del dirs[:]
    print(f)
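The pruning answers above rely on os.walk re-reading the same list object it handed out (with the default topdown=True), so the list has to be modified in place; rebinding the name with dirs = [] would have no effect. A hedged sketch, with a hypothetical helper name (walk) and a temporary directory tree made up for the demonstration:

```python
import os
import tempfile

def walk(path, recursive=True):
    """Yield os.walk-style tuples; do not descend below `path` when recursive is False."""
    for root, dirs, files in os.walk(path):  # topdown=True is the default
        if not recursive:
            dirs[:] = []  # in-place clear: os.walk sees the emptied list
        yield root, dirs, files

# Throwaway tree: top/sub/deeper
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "sub", "deeper"))

visited_flat = [root for root, _, _ in walk(top, recursive=False)]
visited_all = [root for root, _, _ in walk(top)]
print(len(visited_flat), len(visited_all))
```

Note that the caller of this sketch receives an empty dirs list in the non-recursive case, which is the price of pruning before yielding.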

Flexible Function for counting files:
You can choose whether to search recursively and which file types to count. The default argument file_types=("",) matches any file, because every name ends with the empty string. The argument file_types=(".csv", ".txt") counts only csv and txt files.
from os import walk as os_walk
def count_files(path, recurse=True, file_types=("",)):
    file_count = 0
    # when not recursing, wrap the first os.walk tuple in a one-element tuple
    iterator = os_walk(path) if recurse else (next(os_walk(path)),)
    for _, _, file_names in iterator:
        for file_name in file_names:
            file_count += 1 if file_name.endswith(file_types) else 0
    return file_count

Related

Search for files that match any strings from a list?

I want to recursively walk through a directory, find the files that match any of the strings in a given list, and then copy these files to another folder. I thought the any() function would accomplish this, but I get a TypeError that it expected a string, not a list. Is there a more elegant way to do this?
string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for root, subdirs, filename in os.walk(source_dir):
    if any(s in filename for s in string_to_match):
        shutil.copy(filename, destination_dir)
        print(filename)
I know glob.glob can work well for finding files that match a specific string or pattern, but I haven't been able to find an answer that allows for multiple matches.
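You can just use the in operator. Note that os.walk yields a list of file names for each directory, so loop over that list and join each name with root before copying.
Example:
string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for root, subdirs, filenames in os.walk(source_dir):
    for filename in filenames:
        if filename in string_to_match:
            shutil.copy(os.path.join(root, filename), destination_dir)
            print(filename)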
Here is also a glob version:
import glob
import itertools
import shutil
root_dir = '/home/user'
files = ['apple.txt', 'pear.txt', 'banana.txt']
files_found = list(itertools.chain.from_iterable(glob.glob(f'{root_dir}/**/{f}', recursive=True) for f in files))
for f in files_found:
    shutil.copy(f, destination_dir)
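First, finding an element in a list takes O(n), so convert the list to a set, where a lookup takes O(1) on average.
Then you can do it like this (os.listdir only looks at source_dir itself, not its subfolders, and returns bare names that must be joined with source_dir):
string_to_match = {'apple.txt', 'pear.txt', 'banana.txt'}
for filename in os.listdir(source_dir):
    if filename in string_to_match:
        shutil.copy(os.path.join(source_dir, filename), destination_dir)
        print(filename)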
I would use sets:
def find_names(names, source_dir):
    names = set(names)
    # note os.walk will walk the subfolders too
    # if you just want that source_dir use `names.intersection(os.listdir(source_dir))`
    for root, subdirs, fnames in os.walk(source_dir):
        for matched_name in names.intersection(fnames):
            yield os.path.join(root, matched_name)
strings_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for match in find_names(strings_to_match, '/path/to/start'):
    print("Match:", match)
[edited] typo: intersection, not intersect
(You could alternatively just pass in a set {'a','b','c'} instead of a list ['a','b','c'] and skip the conversion to a set.)
Here is an alternative that only looks in the source dir (not its children):
def find_names_in_folder(names, source_dir):
    return [os.path.join(source_dir, n) for n in set(names).intersection(os.listdir(source_dir))]
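Putting the set lookup together with the copy step from the question could look like the sketch below. The function name copy_matching and the temporary directories are assumptions for illustration only. Files with the same base name in different subfolders would overwrite each other in the destination.

```python
import os
import shutil
import tempfile

def copy_matching(names, source_dir, destination_dir):
    """Copy every file under source_dir whose base name is in `names`.

    Returns the list of source paths that were copied.
    """
    wanted = set(names)
    copied = []
    for root, _, fnames in os.walk(source_dir):
        for name in wanted.intersection(fnames):
            src = os.path.join(root, name)  # os.walk yields bare names, so join with root
            shutil.copy(src, destination_dir)
            copied.append(src)
    return copied

# Throwaway source tree: apple.txt, kiwi.txt, nested/pear.txt
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(src_dir, "nested"))
for rel in ("apple.txt", "kiwi.txt", os.path.join("nested", "pear.txt")):
    with open(os.path.join(src_dir, rel), "w") as fh:
        fh.write(rel)

copied = copy_matching(["apple.txt", "pear.txt", "banana.txt"], src_dir, dst_dir)
print(sorted(os.listdir(dst_dir)))
```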

Scanning for file paths with glob

I am searching for all .csv's located in a subfolder with glob like so:
def scan_for_files(path):
    file_list = []
    for path, dirs, files in os.walk(path):
        for d in dirs:
            for f in glob.iglob(os.path.join(path, d, '*.csv')):
                file_list.append(f)
    return file_list
If I call:
path = r'/data/realtimedata/trades/bitfinex/'
scan_for_files(path)
I get the correct recursive list of files:
['/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_13.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_13.csv']
However when using the actual sub-directory containing the files I want - it returns an empty list. Any idea why this is happening? Thanks.
path = r'/data/realtimedata/trades/bitfinex/btcusd/'
scan_for_files(path)
returns: []
Looks like btcusd is a bottom-level directory. That means that when you call os.walk with the r'/data/realtimedata/trades/bitfinex/btcusd/' path, the dirs variable will be an empty list [], so the inner loop for d in dirs: does not execute at all.
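The behaviour described above can be checked on a throwaway tree. This is only a sketch; the layout (a base folder containing btcusd/trades.csv) is made up to mirror the question.

```python
import os
import tempfile

# Hypothetical layout mirroring the question: base/btcusd/trades.csv
base = tempfile.mkdtemp()
leaf = os.path.join(base, "btcusd")
os.mkdir(leaf)
with open(os.path.join(leaf, "trades.csv"), "w") as fh:
    fh.write("price\n")

leaf_walk = list(os.walk(leaf))
# Walking the bottom-level folder yields a single tuple with an empty dirs list,
# so a `for d in dirs:` loop body never runs even though files is populated.
print(leaf_walk)
```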
My advice would be to rewrite your function to iterate over the files directly, not the directories. Don't worry about the subfolders: os.walk visits every one of them eventually, that's the nature of walking a directory tree.
def scan_for_files(path):
    file_list = []
    for dirpath, _, files in os.walk(path):
        for f in files:
            # files holds bare names, so filter by extension and join with dirpath
            if f.endswith('.csv'):
                file_list.append(os.path.join(dirpath, f))
    return file_list
However, on more recent versions of Python (3.5+), you can use a recursive glob:
def scan_for_files(path):
    return glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
Source.
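With recursive=True, the '**' component matches zero or more directory levels, so the recursive glob finds the files whether it is pointed at the parent folder or at the bottom-level folder itself. A small sketch on a temporary tree (the folder and file names are assumptions, and the result is sorted only to make the output stable):

```python
import glob
import os
import tempfile

def scan_for_files(path):
    # '**' with recursive=True matches `path` itself and every folder below it
    return sorted(glob.glob(os.path.join(path, "**", "*.csv"), recursive=True))

# Hypothetical layout: base/btcusd/a.csv and base/btcusd/notes.txt
base = tempfile.mkdtemp()
leaf = os.path.join(base, "btcusd")
os.mkdir(leaf)
for name in ("a.csv", "notes.txt"):
    with open(os.path.join(leaf, name), "w") as fh:
        fh.write("x")

from_parent = scan_for_files(base)
from_leaf = scan_for_files(leaf)
print(from_parent == from_leaf, len(from_leaf))
```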

How to get the number of subfolders and folders using Python os.walk?

I have a directory of folders and subfolders. What I'm trying to do is get the number of subfolders within the folders, and plot them on a scatter plot using matplotlib. I have the code to get the number of files, but how would I get the number of subfolders within a folder? This probably has a simple answer, but I'm new to Python. Any help is appreciated.
This is the code I have so far to get the number of files:
import os
import matplotlib.pyplot as plt
def fcount(path):
    count1 = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count1 += 1
    return count1
path = "/Desktop/lay"
print(fcount(path))
import os
def fcount(path, counts=None):
    # counts maps each visited folder to its number of nested subfolders;
    # a None default avoids sharing one dict between unrelated calls
    if counts is None:
        counts = {}
    count = 0
    for f in os.listdir(path):
        child = os.path.join(path, f)
        if os.path.isdir(child):
            child_count = fcount(child, counts)
            count += child_count + 1  # unless include self
    counts[path] = count
    return count
path = "/Desktop/lay"
counts = {}
print(fcount(path, counts))
Here is a full, tested implementation. It returns the number of subfolders, not counting the current folder. If you want to include it, put the + 1 in the last line instead of where the comment is.
I think os.walk could be what you are looking for:
import os
def fcount(path):
    count1 = 0
    for root, dirs, files in os.walk(path):
        count1 += len(dirs)
    return count1
path = "/home/"
print(fcount(path))
This will give you the total number of directories under the given path, at every depth.
Try the following recipe:
import glob
# the trailing slash restricts the matches to directories; "path/*" would also count files
folders = glob.glob("path/*/")
len(folders)
Answering to:
how would I get the number of subfolders within a folder
You can use the os.path.isdir function similarly to os.path.isfile to count directories.
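A minimal sketch of that suggestion, mirroring the file-counting function from the question. The name dcount and the temporary tree are made up for illustration. It counts only the immediate subfolders, not nested ones.

```python
import os
import tempfile

def dcount(path):
    """Count the immediate subfolders of `path` (nested folders are not included)."""
    return sum(
        1 for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )

# Throwaway tree: top/a, top/b/c and top/file.txt
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "b", "c"))
os.mkdir(os.path.join(top, "a"))
with open(os.path.join(top, "file.txt"), "w") as fh:
    fh.write("x")

print(dcount(top), dcount(os.path.join(top, "b")))
```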
I guess you are looking for os.walk. Look in the Python reference, it says that:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down
or bottom-up. For each directory in the tree rooted at directory top
(including top itself), it yields a 3-tuple (dirpath, dirnames,
filenames).
So, you can try to do this to get only the directories:
count = 0
for root, dirs, files in os.walk('/usr/bin'):
    for name in dirs:
        print(os.path.join(root, name))
        count += 1

Get also elements that don't match fnmatch

I'm using a recursive glob to find and copy files from one drive to another:
def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results
Works fine. But I also want to have access to the elements that don't match the filter.
Can someone offer some help? I could build a regex within the loop, but there must be a simpler solution, right?
If order doesn't matter, use a set:
goodfiles = fnmatch.filter(files, pattern)
badfiles = set(files).difference(goodfiles)
Another loop inside the os.walk loop can also be used:
goodfiles = []
badfiles = []
for f in files:
    if fnmatch.fnmatch(f, pattern):
        goodfiles.append(f)
    else:
        badfiles.append(f)
Note: With this solution you have to iterate through the list of files just once. In fact, the os.path.join part can be moved to the loop above.
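As the note suggests, the os.path.join step can be folded into the same loop. The sketch below is one way to do that, assuming a hypothetical function name (recursive_partition) and a temporary tree created for the demonstration; it returns both the matching and the non-matching full paths in a single pass.

```python
import fnmatch
import os
import tempfile

def recursive_partition(treeroot, pattern):
    """Return (matching, non_matching) lists of full paths under treeroot."""
    good, bad = [], []
    for base, _, files in os.walk(treeroot):
        for f in files:
            # pick the destination list once, then append the joined path
            target = good if fnmatch.fnmatch(f, pattern) else bad
            target.append(os.path.join(base, f))
    return good, bad

# Throwaway tree: a.csv, b.txt, sub/c.csv
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("a.csv", "b.txt", os.path.join("sub", "c.csv")):
    with open(os.path.join(top, rel), "w") as fh:
        fh.write("x")

good, bad = recursive_partition(top, "*.csv")
print(len(good), len(bad))
```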

get full path name using list comprehension in python

Can I do something like this? (Actually, it doesn't work.)
flist = [dirpath + f for f for fnames for dirpath, dirnames, fnames in os.walk('/home/user')]
thanks!
fnames doesn't exist yet. Swap the loops.
flist = [dirpath + f for dirpath, dirnames, fnames in os.walk('/home/user') for f in fnames]
Personally I'd write it as a generator:
def filetree(top):
    for dirpath, dirnames, fnames in os.walk(top):
        for fname in fnames:
            yield os.path.join(dirpath, fname)
Then you can either use it in a loop:
for name in filetree('/home/user'):
    do_something_with(name)
or slurp it into a list:
flist = list(filetree('/home/user'))
flist = [os.path.join(pdir,f) for pdir, dirs, files in os.walk('/home/user') for f in files]
(os.path.join should be used instead of string concatenation to handle OS-specific separators and idiosyncrasies)
However, as several have already pointed out, multi-level list comprehension is not very readable and easy to get wrong.
Assuming you really do want to have the results in a list:
flist = []
for root, dirs, files in os.walk(root_dir):
    flist.extend(os.path.join(root, f) for f in files)
    # to support python <2.4, use flist.extend([...])
If you're simply using flist as an intermediate storage to iterate through, you might be better off using a generator as shown in John's answer.
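To illustrate the point about os.path.join made above: the dirpath values that os.walk yields normally carry no trailing separator, so plain concatenation glues the directory name and the file name together. A small sketch on a temporary directory (the file name f.txt is an assumption):

```python
import os
import tempfile

# Throwaway directory with a single file
top = tempfile.mkdtemp()
with open(os.path.join(top, "f.txt"), "w") as fh:
    fh.write("x")

dirpath, _, fnames = next(os.walk(top))
concatenated = dirpath + fnames[0]         # separator missing between folder and file
joined = os.path.join(dirpath, fnames[0])  # separator inserted as needed

print(concatenated)
print(joined)
print(os.path.exists(concatenated), os.path.exists(joined))
```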
Using map:
map(lambda data: map(lambda file: data[0] + '\\' + file, data[2]), os.walk('/home/user'))
OR:
map(lambda data: map(lambda file: os.path.join(data[0], file), data[2]), os.walk('/home/user'))
path = '/home/user/' # keep trailing '/'
flist = [path+name for name in os.listdir(path)]
