Can I do something like this? (As written, it doesn't work.)
flist = [dirpath + f for f for fnames for dirpath, dirnames, fnames in os.walk('/home/user')]
thanks!
fnames doesn't exist yet. Swap the loops.
flist = [dirpath + f for dirpath, dirnames, fnames in os.walk('/home/user') for f in fnames]
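To see why the loops have to be swapped: the for clauses of a comprehension read in the same order as the equivalent nested for statements, outermost first. A minimal sketch, using a hand-written list (walk_result, a hypothetical stand-in for what os.walk would yield) so that it runs anywhere:

```python
# Hypothetical stand-in for os.walk() output: (dirpath, dirnames, filenames) tuples.
walk_result = [
    ("/a", ["b"], ["x.txt", "y.txt"]),
    ("/a/b", [], ["z.txt"]),
]

# Nested for statements, outermost loop first.
expected = []
for dirpath, dirnames, fnames in walk_result:
    for f in fnames:
        expected.append(dirpath + "/" + f)

# The comprehension lists its for clauses in exactly the same order,
# so fnames is already bound when the second clause runs.
flist = [dirpath + "/" + f
         for dirpath, dirnames, fnames in walk_result
         for f in fnames]

print(flist == expected)
```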
Personally I'd write it as a generator:
def filetree(top):
    for dirpath, dirnames, fnames in os.walk(top):
        for fname in fnames:
            yield os.path.join(dirpath, fname)
Then you can either use it in a loop:
for name in filetree('/home/user'):
    do_something_with(name)
or slurp it into a list:
flist = list(filetree('/home/user'))
flist = [os.path.join(pdir,f) for pdir, dirs, files in os.walk('/home/user') for f in files]
(os.path.join should be used instead of string concatenation to handle OS-specific separators and idiosyncrasies)
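A small sketch of the difference, assuming a directory path without a trailing separator (which is how os.walk reports dirpath). The directory and file names here are made up for illustration:

```python
import os

dirpath = os.path.join("home", "user")   # no trailing separator
fname = "notes.txt"

concatenated = dirpath + fname           # the separator is silently missing
joined = os.path.join(dirpath, fname)    # separator for the current OS is inserted

print(concatenated)
print(joined)
```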
However, as several have already pointed out, a multi-level list comprehension is not very readable and is easy to get wrong.
Assuming you really do want to have the results in a list:
flist = []
for root, dirs, files in os.walk(root_dir):
    flist.extend(os.path.join(root, f) for f in files)
    # to support python <2.4, use flist.extend([...])
If you're simply using flist as an intermediate storage to iterate through, you might be better off using a generator as shown in John's answer.
Using map:
map(lambda data: map(lambda file: data[0] + '\\' + file, data[2]), os.walk('/home/user'))
OR:
map(lambda data: map(lambda file: os.path.join(data[0], file), data[2]), os.walk('/home/user'))
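Two things to keep in mind with the map versions: on Python 3, map returns a lazy iterator rather than a list, and the result is nested (one group of paths per directory). A hedged sketch of one way to get a flat list, using itertools.chain.from_iterable; the temporary directory and file names are invented so the example is self-contained:

```python
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Build a tiny throwaway tree: top/a.txt and top/sub/b.txt
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(top, rel), "w").close()

    # One list of full paths per directory visited by os.walk (still lazy here).
    per_dir = map(
        lambda data: [os.path.join(data[0], name) for name in data[2]],
        os.walk(top),
    )
    # Flatten the per-directory lists into a single sorted list.
    flist = sorted(itertools.chain.from_iterable(per_dir))
    relative = [os.path.relpath(p, top) for p in flist]

print(relative)
```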
path = '/home/user/' # keep trailing '/'
flist = [path+name for name in os.listdir(path)]
I want to recursively walk through a directory, find the files that match any of the strings in a given list, and then copy these files to another folder. I thought the any() function would accomplish this, but I get a TypeError that it expected a string, not a list. Is there a more elegant way to do this?
string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for root, subdirs, filename in os.walk(source_dir):
    if any(s in filename for s in string_to_match):
        shutil.copy(filename, destination_dir)
        print(filename)
I know glob.glob can work well for finding files that match a specific string or pattern, but I haven't been able to find an answer that allows for multiple matches.
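You can just use in, once you loop over the individual file names (os.walk yields a list of names for each directory, not a single name):
Example:
string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for root, subdirs, filenames in os.walk(source_dir):
    for filename in filenames:
        if filename in string_to_match:
            # copy needs the full path, not just the bare file name
            shutil.copy(os.path.join(root, filename), destination_dir)
            print(filename)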
Here is also a glob version:
import glob
import itertools
import shutil
root_dir = '/home/user'
files = ['apple.txt', 'pear.txt', 'banana.txt']
files_found = list(itertools.chain.from_iterable(glob.glob(f'{root_dir}/**/{f}', recursive=True) for f in files))
for f in files_found:
    shutil.copy(f, destination_dir)
First, a membership test on a list takes O(n), so convert the list to a set, where the test takes O(1) on average.
Then you can do it like this:
string_to_match = {'apple.txt', 'pear.txt', 'banana.txt'}
for filename in os.listdir(source_dir):
    if filename in string_to_match:
        # os.listdir returns bare names, so join them with source_dir before copying
        shutil.copy(os.path.join(source_dir, filename), destination_dir)
        print(filename)
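As a quick illustration of the list-versus-set point, here is a sketch with made-up candidate names. Both containers give the same answers; only the cost of each lookup differs as the collection grows:

```python
wanted_list = ['apple.txt', 'pear.txt', 'banana.txt']
wanted_set = set(wanted_list)            # one-time conversion

# Hypothetical file names to check.
candidates = ['apple.txt', 'kiwi.txt', 'pear.txt']

# Same result either way; the set lookup does not scan every element.
matches_from_list = [name for name in candidates if name in wanted_list]
matches_from_set = [name for name in candidates if name in wanted_set]

print(matches_from_set)
```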
I would use sets
def find_names(names, source_dir):
    names = set(names)
    # note os.walk will walk the subfolders too
    # if you just want that source_dir use `names.intersection(os.listdir(source_dir))`
    for root, subdirs, fnames in os.walk(source_dir):
        for matched_name in names.intersection(fnames):
            yield os.path.join(root, matched_name)
strings_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for match in find_names(strings_to_match, '/path/to/start'):
    print("Match:", match)
(You could alternatively just pass in a set {'a','b','c'} instead of a list ['a','b','c'] and skip the conversion to a set.)
Here is an alternative that only looks in the source dir (not children):
def find_names_in_folder(names, source_dir):
    return [os.path.join(source_dir, n) for n in set(names).intersection(os.listdir(source_dir))]
I am searching for all .csv's located in a subfolder with glob like so:
def scan_for_files(path):
    file_list = []
    for path, dirs, files in os.walk(path):
        for d in dirs:
            for f in glob.iglob(os.path.join(path, d, '*.csv')):
                file_list.append(f)
    return file_list
If I call:
path = r'/data/realtimedata/trades/bitfinex/'
scan_for_files(path)
I get the correct recursive list of files:
['/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_13.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_13.csv']
However when using the actual sub-directory containing the files I want - it returns an empty list. Any idea why this is happening? Thanks.
path = r'/data/realtimedata/trades/bitfinex/btcusd/'
scan_for_files(path)
returns: []
Looks like btcusd is a bottom-level directory. That means that when you call os.walk with the r'/data/realtimedata/trades/bitfinex/btcusd/' path, the dirs variable will be an empty list [], so the inner loop for d in dirs: does not execute at all.
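This is easy to confirm with a throwaway tree. The sketch below builds a temporary directory with a single leaf folder (the names btcusd and trades.csv are placeholders) and compares what os.walk yields from the parent and from the leaf:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/btcusd/trades.csv
    leaf = os.path.join(top, "btcusd")
    os.mkdir(leaf)
    open(os.path.join(leaf, "trades.csv"), "w").close()

    # Walking from the parent: dirs contains the leaf on the first step.
    from_parent = [(os.path.relpath(p, top), d, f) for p, d, f in os.walk(top)]
    # Walking from the leaf itself: dirs is empty, the files are in `files`.
    from_leaf = [(os.path.relpath(p, leaf), d, f) for p, d, f in os.walk(leaf)]

print(from_parent)
print(from_leaf)
```

When started at the leaf, the only tuple has an empty dirs list, so any loop over dirs never runs, while the CSV file is sitting in files.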
My advice would be to re-write your function to iterate over the files directly, and not the directories... don't worry, you'll get there eventually, that's the nature of a directory tree.
def scan_for_files(path):
    file_list = []
    for dirpath, _, files in os.walk(path):
        # files holds the names found directly in dirpath; keep only the CSVs
        file_list.extend(os.path.join(dirpath, f) for f in files if f.endswith('.csv'))
    return file_list
However, on more recent versions of Python (3.5+), you can use recursive glob:
def scan_for_files(path):
    return glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
I'm trying to write a function which will find files with similar names (song.mp3, song1.mp3, (1)song.mp3) in a specified folder. What I have so far:
def print_duplicates(source):
    files_list = []
    new_list = []
    for dirpath, dirnames, filenames in os.walk(source):
        for fname in filenames:
            if ('\w*' + fname + '\w*') in files_list:
                new_list.append(os.path.join(dirpath, fname))
            else:
                files_list.append(fname)
    for a in new_list:
        print(a)
If the filename wasn't in files_list before, it is added; if it was, then it is added to new_list with its path. This way I have a list of 'duplicate' files. However, it's not working: new_list remains empty.
Could you correct my mistakes? Which part of my code is wrong?
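If you want to use a regex in your code, you need to use the re module. Note that re.search takes a single string, so it has to be applied to each stored name rather than to the list itself.
So change this line,
if ('\w*' + fname + '\w*') in files_list:
to,
if any(re.search(r'\w*' + re.escape(fname) + r'\w*', seen) for seen in files_list):
which is exactly the same as a plain substring test,
if any(fname in seen for seen in files_list):
because \w* means zero or more word characters. And I think you want to use word boundaries.
if any(re.search(r'\b' + re.escape(fname) + r'\b', seen) for seen in files_list):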
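A short sketch of how these tests behave, using invented names (files_list and the fragment "song" are hypothetical, chosen to mirror the question). It also uses re.escape, an addition here, so that characters such as "." or "(" in a name are matched literally:

```python
import re

files_list = ["song.mp3"]   # hypothetical names seen so far
fname = "song"              # hypothetical fragment to look for

# The original test: a pattern string is just text, so this asks whether
# the literal string "\w*song\w*" is an element of the list.
literal_in_list = (r"\w*" + fname + r"\w*") in files_list

# \w* on both sides can match nothing, so this is a plain substring search.
regex_hit = any(re.search(r"\w*" + re.escape(fname) + r"\w*", seen) for seen in files_list)
substring_hit = any(fname in seen for seen in files_list)

# Word boundaries are stricter: "song1" does not count as the word "song".
boundary_hits = [
    bool(re.search(r"\b" + re.escape(fname) + r"\b", name))
    for name in ("song.mp3", "song1.mp3", "(1)song.mp3")
]

print(literal_in_list, regex_hit, substring_hit, boundary_hits)
```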
Brand new to Python and stumped already; I would appreciate a hand.
testn1 = {'names':('tn1_name1','tn1_name2','tn1_name3'),'exts':('.log','.txt')}
testn2 = {'names':('tn2_name1'),'exts':('.nfo')}
testnames = {1:testn1,2:testn1}
directory = 'C:\\temp\\root\\'
for subdir in os.listdir(directory):
    # check if name of sub directory matches the name in any of the dicts in testnames[testn*]['names']
    if os.path.isdir(os.path.join(directory, subdir)) and [subdir in subdir.lower() in testnames[testn1]['names']]: # this works but need to iterate through all dicts
        print(subdir)
        # if the a dir name matches do a recursive search for all filenames that exist in the same dict with the corresponding extensions
        for dirname, dirnames, filenames in os.walk(os.path.join(directory, subdir)):
            for file in filenames:
                if file.endswith(testnames[testn1]['exts']): # this works but need to match with corresponding folder
                    print(file)
I thought I'd be able to do something like this, but I'm sure my understanding of Python isn't where it needs to be.
if os.path.isdir(os.path.join(directory, subdir)) and [subdir in subdir.lower() in [for testnames[key]['names'] in key, value in testnames.items()]]:
I'm hoping to keep it structured this way but would be open to anything.
EDIT: I ended up going with...
if os.path.isdir(os.path.join(directory, subdir)) and [i for i in testnames.values() if subdir.lower() in i['names']]:
Thanks to @pzp1997 for the heads-up on .values().
Not exactly sure what you want, but I think this is it:
if os.path.isdir(os.path.join(directory, subdir)) and subdir.lower() in [i['names'] for i in testnames.values()]
This did it!
if os.path.isdir(os.path.join(directory, subdir)) and [i for i in testnames.values() if subdir.lower() in i['names']]:
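The condition works because a non-empty list is truthy. A minimal sketch of the same check that also returns which dict matched, so the corresponding 'exts' can be used afterwards. It assumes the second entry was meant to be testn2 and that the single-element values were meant to be tuples (hence the trailing commas); matching_entry is a hypothetical helper name:

```python
testn1 = {'names': ('tn1_name1', 'tn1_name2', 'tn1_name3'), 'exts': ('.log', '.txt')}
# Assumption: one-element tuples were intended, so trailing commas are added.
testn2 = {'names': ('tn2_name1',), 'exts': ('.nfo',)}
# Assumption: key 2 was meant to point at testn2.
testnames = {1: testn1, 2: testn2}

def matching_entry(subdir):
    """Return the first dict whose 'names' contains subdir (lower-cased), else None."""
    return next((t for t in testnames.values() if subdir.lower() in t['names']), None)

entry = matching_entry('TN2_NAME1')
print(entry['exts'] if entry else None)
```

Without the trailing comma, ('tn2_name1') is just a string, and in would then perform a substring test instead of a membership test.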
What about this instead?
import os
testn1 = {'names': ('tn1_name1', 'tn1_name2', 'tn1_name3'), 'exts': ('.log', '.txt')}
testn2 = {'names': ('tn2_name1',), 'exts': ('.nfo',)}  # trailing commas make one-element tuples
testnames = {1: testn1, 2: testn2}
directory = 'C:\\temp\\root\\'
for dirname, _, filenames in os.walk(directory):
    the_dir = os.path.split(dirname)[-1]
    for testn in testnames.values():
        if the_dir in testn['names']:
            for file in filenames:
                _, ext = os.path.splitext(file)
                if ext in testn['exts']:
                    print(the_dir, file)
I'm looking for a way to do a non-recursive os.walk(), just like os.listdir() works, but I need the result in the same form that os.walk() returns. Any idea?
Thank you in advance.
Add a break after the filenames for loop:
for root, dirs, filenames in os.walk(workdir):
    for fileName in filenames:
        print(fileName)
    break  # prevent descending into subfolders
This works because (by default) os.walk first lists the files in the requested folder and then goes into subfolders.
next(os.walk(...))
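Since os.walk is a generator and, with the default top-down order, yields the starting directory first, next() returns exactly one (root, dirs, files) tuple for that directory. A small sketch on a temporary tree with made-up names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/a.txt and top/sub/b.txt
    os.mkdir(os.path.join(top, "sub"))
    open(os.path.join(top, "a.txt"), "w").close()
    open(os.path.join(top, "sub", "b.txt"), "w").close()

    # Only the first tuple is produced; the walk never descends into sub/.
    root, dirs, files = next(os.walk(top))
    same_root = (root == top)

print(dirs, files)
```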
A slightly more parametrised solution would be this:
for root, dirs, files in os.walk(path):
    if not recursive:
        while len(dirs) > 0:
            dirs.pop()
    # some fancy code here using generated list
Edit: fixed the if/while issue. Thanks, @Dirk van Oosterbosch :}
Well what Kamiccolo meant was more in line with this:
for str_dirname, lst_subdirs, lst_files in os.walk(str_path):
    if not bol_recursive:
        while len(lst_subdirs) > 0:
            lst_subdirs.pop()
Empty the directories list:
for r, dirs, f in os.walk('/tmp/d'):
    del dirs[:]
    print(f)
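The list has to be emptied in place, because os.walk (in its default top-down mode) consults that same list object to decide where to descend next. Rebinding the name to a new empty list has no effect. A sketch on a throwaway tree with invented names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/a.txt and top/sub/b.txt
    os.mkdir(os.path.join(top, "sub"))
    open(os.path.join(top, "a.txt"), "w").close()
    open(os.path.join(top, "sub", "b.txt"), "w").close()

    visited_rebind = []
    for root, dirs, files in os.walk(top):
        visited_rebind.append(os.path.relpath(root, top))
        dirs = []        # rebinds the local name only; os.walk still sees "sub"

    visited_inplace = []
    for root, dirs, files in os.walk(top):
        visited_inplace.append(os.path.relpath(root, top))
        del dirs[:]      # mutates the list os.walk holds, so nothing is left to visit

print(visited_rebind, visited_inplace)
```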
Flexible Function for counting files:
You can set recursive searching and what types you want to look for. The default argument: file_types=("", ) looks for any file. The argument file_types=(".csv",".txt") would search for csv and txt files.
from os import walk as os_walk
def count_files(path, recurse=True, file_types=("",)):
    file_count = 0
    iterator = os_walk(path) if recurse else ((next(os_walk(path))), )
    for _, _, file_names in iterator:
        for file_name in file_names:
            file_count += 1 if file_name.endswith(file_types) else 0
    return file_count
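The default works because str.endswith accepts a tuple of suffixes, and every string ends with the empty string. A quick sketch with made-up file names:

```python
names = ["a.csv", "b.txt", "c.log", "README"]   # hypothetical file names

# A tuple of suffixes matches if any one of them matches.
csv_or_txt = [n for n in names if n.endswith((".csv", ".txt"))]

# Every string ends with "", so the default ("",) accepts all files.
anything = [n for n in names if n.endswith(("",))]

print(csv_or_txt, anything)
```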