Dynamic if statement in dictionary comprehension - python

I am using a dictionary comprehension to get a dictionary of key value pairs where the key is the name of an mp3 file and the value is the path to the file.
I do:
for root, dirs, files in os.walk(rootDir, topdown='true'):
    source_files = {filename:root for filename in files if os.path.splitext(filename)[1].lower() == '.mp3'}
    # more code
    # ...
    # ...
I do more logic with the source files in the more code part. Now, I want to repeat this logic for any pictures (i.e. .gif, .jpeg etc)
So I could do:
for root, dirs, files in os.walk(rootDir, topdown='true'):
    source_files = {filename:root for filename in files if os.path.splitext(filename)[1].lower() == '.jpeg' or os.path.splitext(filename)[1].lower() == '.gif'}
and then wrap the more code part into a function and call it for the picture files. However, I am wondering could I just make the if expression dynamic in the dictionary comprehension and then just pass in one if expression for music files and another if expression for the picture files?

I think you are looking for the fnmatch.fnmatch function instead, or even fnmatch.filter()
from fnmatch import filter
for root, dirs, files in os.walk(rootDir):
    source_files = {filename: root for filename in filter(files, '*.jpg')}
But if you need to match multiple extensions, it's much easier to use str.endswith():
for root, dirs, files in os.walk(rootDir):
    source_files = {filename: root for filename in files if filename.endswith(('.jpg', '.png', '.gif'))}
Using .endswith() you can then use any string or tuple of extensions:
mp3s = '.mp3'
images = ('.jpg', '.png', '.gif')
then use:
extensions = images
for root, dirs, files in os.walk(rootDir):
    source_files = {filename: root for filename in files if filename.endswith(extensions)}
I'm not sure why you are using a dict comprehension here; each iteration of the loop, root will be constant. You may as well do:
for root, dirs, files in os.walk(rootDir):
    source_files = dict.fromkeys(filter(files, '*.jpg'), root)
or use
for root, dirs, files in os.walk(rootDir):
    source_files = dict.fromkeys([f for f in files if f.endswith(extensions)], root)
If you wanted to create a dictionary of all files in a nested directory structure, you'll need to move the dict comprehension out and integrate the os.walk() call in the dict comprehension instead:
source_files = {filename: root
                for root, dirs, files in os.walk(rootDir)
                for filename in files if filename.endswith(extensions)}
I removed all the topdown='true' lines; the default is topdown=True anyway (note: python booleans are True and False, not strings. It happened to work because 'true' as a string is 'truthy', it's considered True in a boolean context because it is non-empty).
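To address the "dynamic if" part of the question directly, one option is to pass the condition in as a function. The following is only a sketch of that idea, not part of the answer above; collect_files, is_music and is_image are hypothetical names chosen for illustration.

```python
import os


def collect_files(root_dir, predicate):
    """Map each filename under root_dir to its directory, keeping only
    names for which predicate(filename) is true."""
    return {
        filename: root
        for root, dirs, files in os.walk(root_dir)
        for filename in files
        if predicate(filename)
    }


def is_music(filename):
    # Lower-case first so that '.MP3' matches as well.
    return filename.lower().endswith('.mp3')


def is_image(filename):
    # str.endswith() accepts a tuple of suffixes.
    return filename.lower().endswith(('.jpg', '.jpeg', '.png', '.gif'))
```

Calling collect_files(rootDir, is_music) and collect_files(rootDir, is_image) then reuses the same comprehension with two different conditions. Because filenames are the keys, two files with the same name in different directories would overwrite each other.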

Does this:
def a_func(extension):
    # some code
    for root, dirs, files in os.walk(rootDir, topdown=True):
        source_files = {filename:root for filename in files if os.path.splitext(filename)[1].lower() == extension}
        # more code
        # ...
        # ...
fit your needs?

Related

How to append subdirectories to a list in python

I would like to know how I can only append sub-directories of a given directory to a list in Python.
Something like this is not what I'm searching for as it outputs the files:
filelist = []
for root, dirs, files in os.walk(job['config']['output_dir']):
    for file in files:
        # append the file name to the list
        filelist.append(os.path.join(root, file))
# print all the file names
for name in filelist:
    print(name)
Just use the dirs variable instead of the files variable, like this:
dirs_list = []
for root, dirs, files in os.walk('test'):
    for dir in dirs:
        dirs_list.append(os.path.join(root, dir))
print(dirs_list)

Python list comprehension returns nested list

I built this little program to simulate 2 libraries I want to compare files with.
The code is this:
import os
path = "C:\Users\\nelson\Desktop\Lib Check"
pc_path = os.path.join(path, "pc")
phone_path = os.path.join(path, "phone")
pc_lib = [filename for path, dirname, filename in os.walk(pc_path)]
print pc_lib
it returns
[['1.txt', '2.txt', '3.txt', '4.txt', '5.txt', '6.txt', '8.txt', '9.txt']]
everything is fine except for the fact that the results are in a nested list. Why?
The only way I can stop this is by using
pc_lib = []
for path, dirname, filename in os.walk(pc_path):
    pc_lib.extend(filename)
filename is a list of files (the name you've used is not intuitive), so the results are expected
for root, dirs, files in os.walk('my/home/directory'):
    print(files)
# ['close_button.gif', 'close_button_red.gif']
# ['toolbar.js']
extend unwraps the argument list and appends the resulting elements to the list that made the call
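A minimal illustration of the difference between append and extend, using made-up file lists rather than a real directory walk:

```python
nested = []
flat = []
for files in (['1.txt', '2.txt'], ['3.txt']):
    nested.append(files)  # adds the whole list as a single element
    flat.extend(files)    # adds each element of the list separately

print(nested)  # [['1.txt', '2.txt'], ['3.txt']]
print(flat)    # ['1.txt', '2.txt', '3.txt']
```

The list comprehension in the question behaves like the append version, which is why its result is nested.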
If you want a list of filenames, I would suggest:
[os.path.join(path, filename)
 for path, dirnames, filenames in os.walk(pc_path)
 for filename in filenames]
os.walk(path) returns an iterator over tuples of (root, dirs, files).
Where
root is the path to the current directory; it always starts with the path passed to os.walk
dirs is a list of all the subdirectories in the current directory
files is a list of all normal files in the current directory
If you want a flat list of all files in a filesystem tree, use:
pc_lib = [
    filename for _, _, files in os.walk(pc_path)
    for filename in files
]
You probably want to retain the full path; to get this, use:
pc_lib = [
    os.path.join(root, filename)
    for root, _, files in os.walk(pc_path)
    for filename in files
]

searching for a filename with extension and printing its relative path

I have the code below, which prints the names of files that match the extension *.org. How can I print the relative path of each file found? Thanks in advance.
def get_filelist() :
    directory = "\\\\networkpath\\123\\abc\\"
    filelist = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('Org'):
                print(str(dirs) +"\\" + str(file)) #prints empty list [] followed by filename
                filelist.append(os.path.splitext(file)[0])
    return (filelist)
Please bear in mind that I am a novice in Python.
files and dirs list the children of root. dirs thus lists siblings of file. You want to print this instead:
print(os.path.relpath(os.path.join(root, file)))
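Called with a single argument, os.path.relpath gives a path relative to the current working directory. If the path should instead be relative to the directory being walked, that directory can be passed as the second (start) argument. A small sketch under that assumption; find_relative is a hypothetical helper name:

```python
import os


def find_relative(directory, suffix):
    """Return paths, relative to directory, of files whose names end with suffix."""
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffix):
                full_path = os.path.join(root, name)
                # The second argument is the start directory for the relative path.
                matches.append(os.path.relpath(full_path, directory))
    return matches
```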
You need to use os.path.join:
def get_filelist() :
    directory = "\\\\networkpath\\123\\abc\\"
    filelist = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('org'): # note the lowercase 'org'
                print(os.path.join(root,file))
                filelist.append(os.path.join(root,file))
    return filelist

Looping through os.walk() confusion

I'm working on a script that will crawl a hard drive and collect information on each file
it encounters by way of fnmatch and magic.
I have a feeling that the first nested for-loop in yield_files(root) is unnecessary:
def yield_files(root):
    for root, dirs, files in os.walk(root):
        """ Is this necessary
        for directory in dirs:
            for filename in directory:
                filename = os.path.join(root, filename)
                if os.path.isfile(filename) or os.path.isdir(filename):
                    yield FileInfo(filename)
        """
        for filename in files:
            filename = os.path.join(root, filename)
            if os.path.isfile(filename) or os.path.isdir(filename):
                yield FileInfo(filename)
Would os.walk() end-up recursing into these directories anyway?
def yield_files(root):
    for root, dirs, files in os.walk(root):
        for filename in files:
            filename = os.path.join(root, filename)
            if os.path.isfile(filename) or os.path.isdir(filename):
                yield FileInfo(filename)
That's all you need. The rest is indeed unnecessary. os.walk descends into subdirectories by itself, so you don't need to loop over dirs; you just need root as the base for the path join, as you've done.

os.walk without digging into directories below

How do I limit os.walk to only return files in the directory I provide it?
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
    return outputList
Don't use os.walk.
Example:
import os
root = "C:\\"
for item in os.listdir(root):
    if os.path.isfile(os.path.join(root, item)):
        print(item)
Use the walklevel function.
import os
def walklevel(some_dir, level=1):
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        if num_sep + level <= num_sep_this:
            del dirs[:]
It works just like os.walk, but you can pass it a level parameter that indicates how deep the recursion will go.
I think the solution is actually very simple: use break so that only the first iteration of the for loop runs. There may be a more elegant way.
for root, dirs, files in os.walk(dir_name):
    for f in files:
        ...
        ...
    break
...
On the first iteration, os.walk yields the tuple for the top directory; each later iteration yields the contents of the next directory.
Take the original script and just add a break:
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
        break
    return outputList
The suggestion to use listdir is a good one. The direct answer to your question in Python 2 is root, dirs, files = os.walk(dir_name).next().
The equivalent Python 3 syntax is root, dirs, files = next(os.walk(dir_name))
You could use os.listdir() which returns a list of names (for both files and directories) in a given directory. If you need to distinguish between files and directories, call os.stat() on each name.
If you have more complex requirements than just the top directory (eg ignore VCS dirs etc), you can also modify the list of directories to prevent os.walk recursing through them.
ie:
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        dirs[:] = [d for d in dirs if is_good(d)]
        for f in files:
            do_stuff()
Note - be careful to mutate the list, rather than just rebind it. Obviously os.walk doesn't know about the external rebinding.
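The difference between mutating and rebinding can be sketched as follows. This is an illustrative example only; walk_pruned and skip_names are hypothetical names, and it relies on the default topdown=True behaviour of os.walk.

```python
import os


def walk_pruned(top, skip_names):
    """Yield (root, dirs, files) like os.walk, without descending into
    directories whose names are in skip_names."""
    for root, dirs, files in os.walk(top):
        # Slice assignment changes the list object that os.walk holds.
        # Writing `dirs = [...]` would only rebind the local name, and
        # os.walk would still descend into every subdirectory.
        dirs[:] = [d for d in dirs if d not in skip_names]
        yield root, dirs, files
```

For instance, walk_pruned('.', {'.git', '.svn'}) would visit the tree without entering version-control directories.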
for path, dirs, files in os.walk('.'):
    print(path, dirs, files)
    del dirs[:] # go only one level deep
Felt like throwing my 2 pence in.
baselevel = len(rootdir.split(os.path.sep))
for subdirs, dirs, files in os.walk(rootdir):
    curlevel = len(subdirs.split(os.path.sep))
    if curlevel <= baselevel + 1:
        [do stuff]
The same idea with listdir, but shorter:
[f for f in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, f))]
Since Python 3.5 you can use os.scandir instead of os.listdir. Instead of strings you get an iterator of DirEntry objects in return. From the docs:
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because DirEntry objects expose this information if the operating system provides it when scanning a directory. All DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.
You can access the name of the object via DirEntry.name which is then equivalent to the output of os.listdir
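A minimal sketch of the os.scandir approach, assuming Python 3.6 or later (where the scandir iterator can be used as a context manager); files_in is a hypothetical helper name:

```python
import os


def files_in(directory):
    """Return the sorted names of regular files directly inside directory."""
    with os.scandir(directory) as entries:
        # entry.is_file() usually avoids an extra system call per entry.
        return sorted(entry.name for entry in entries if entry.is_file())
```

Subdirectories are skipped and never entered, so this lists only the top level.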
You could also do the following:
for path, subdirs, files in os.walk(dir_name):
    for name in files:
        if path == ".": # matches only the top directory, and only when dir_name is "."
            #code here
In Python 3, I was able to do this:
import os
dir = "/path/to/files/"
#List all files immediately under this folder:
print ( next( os.walk(dir) )[2] )
#List all folders immediately under this folder:
print ( next( os.walk(dir) )[1] )
The root folder changes for every directory os.walk finds. I solved that by checking whether root == dir_name:
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        if root == dir_name: # true only for the top-level folder
            for f in files:
                if os.path.splitext(f)[1] in whitelist:
                    outputList.append(os.path.join(root, f))
                else:
                    self._email_to_("ignore")
    return outputList
import os
def listFiles(self, dir_name):
    names = []
    for root, directory, files in os.walk(dir_name):
        if root == dir_name:
            for name in files:
                names.append(name)
    return names
This is how I solved it
if recursive:
    items = os.walk(target_directory)
else:
    items = [next(os.walk(target_directory))]
...
There is a catch when using listdir: it returns bare names, so the path passed to os.path.isdir() must include the parent directory (or be absolute). To pick subdirectories you do:
for dirname in os.listdir(rootdir):
    if os.path.isdir(os.path.join(rootdir, dirname)):
        print("I got a subdirectory: %s" % dirname)
The alternative is to change into the directory first (os.chdir), so the test works without the os.path.join().
You can use this snippet
for root, dirs, files in os.walk(directory):
    if level > 0:
        # do some stuff
    else:
        break
    level-=1
Create a list of excludes, then use fnmatch to skip the matching directory structure and do the processing:
excludes = [r'a\*\b', r'c\d\e']  # raw strings, so '\b' is not read as a backspace
for root, directories, files in os.walk('Start_Folder'):
    # nf_root is not defined in this snippet; presumably it is derived from root
    if not any(fnmatch.fnmatch(nf_root, pattern) for pattern in excludes):
        for root, directories, files in os.walk(nf_root):
            ....
            do the process
            ....
The same works for 'includes':
if any(fnmatch.fnmatch(nf_root, pattern) for pattern in includes):
Why not simply combine range and os.walk with zip? It is not the best solution, but it works too.
For example:
# your part before
for count, (root, dirs, files) in zip(range(0, 1), os.walk(dir_name)):
    # logic stuff
# your later part
This works for me on Python 3.
Also, a break is simpler (see the answer from @Pieter).
A slight change to Alex's answer, but using __next__():
print(next(os.walk('d:/'))[2])
or
print(os.walk('d:/').__next__()[2])
with the [2] being the files element of the (root, dirs, files) tuple mentioned in other answers
This is a nice python example
def walk_with_depth(root_path, depth):
    if depth < 0:
        for root, dirs, files in os.walk(root_path):
            yield [root, dirs[:], files]
        return
    elif depth == 0:
        return
    base_depth = root_path.rstrip(os.path.sep).count(os.path.sep)
    for root, dirs, files in os.walk(root_path):
        yield [root, dirs[:], files]
        cur_depth = root.count(os.path.sep)
        if base_depth + depth <= cur_depth:
            del dirs[:]
