Looping through os.walk() confusion - python

I'm working on a script that will crawl a hard drive and collect information on each file
it encounters by way of fnmatch and magic.
I have a feeling that the first nested for loop in yield_files(root) is unnecessary:
def yield_files(root):
    for root, dirs, files in os.walk(root):
        """ Is this necessary
        for directory in dirs:
            for filename in directory:
                filename = os.path.join(root, filename)
                if os.path.isfile(filename) or os.path.isdir(filename):
                    yield FileInfo(filename)
        """
        for filename in files:
            filename = os.path.join(root, filename)
            if os.path.isfile(filename) or os.path.isdir(filename):
                yield FileInfo(filename)
Would os.walk() end up recursing into these directories anyway?

def yield_files(root):
    for root, dirs, files in os.walk(root):
        for filename in files:
            filename = os.path.join(root, filename)
            if os.path.isfile(filename) or os.path.isdir(filename):
                yield FileInfo(filename)
That's all you need; the rest is indeed unnecessary. os.walk descends into subdirectories by itself, so each subdirectory is visited in turn as root and its files show up in files. You never need to loop over dirs for this; you only need root as the base for the path join, as you've done.
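To see that os.walk reaches nested directories without any loop over dirs, here is a minimal sketch run against a throwaway tree. The helper name list_files and the file names are made up for the illustration; FileInfo from the question is left out because its definition is not shown.

```python
import os
import tempfile


def list_files(top):
    """Return every file path under top, using only the files list."""
    found = []
    for root, dirs, files in os.walk(top):
        for filename in files:
            found.append(os.path.join(root, filename))
    return sorted(found)


# Build a throwaway tree: top/a.txt and top/sub/deeper/b.txt
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub", "deeper"))
    for relative in ("a.txt", os.path.join("sub", "deeper", "b.txt")):
        with open(os.path.join(top, relative), "w") as handle:
            handle.write("x")

    # Paths relative to top, so the result does not depend on the temp name.
    relative_paths = [os.path.relpath(p, top) for p in list_files(top)]
    print(relative_paths)
```

The file two levels down is found even though dirs is never touched.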

Related

Why does this Python (using 3.7) renamer not allow directories to start with a number?

I have created the following renamer (below) to replace periods in file and directory names with underscores. It seems to work fine for filenames but doesn't work for directory names if they start with an integer. No errors are raised. If there are no integers in any directory names, then it works fine for directories. Otherwise, it simply renames the files but not the directories. Can anybody tell me why and how to get around this?
Any help is much appreciated.
import os

def Replace_Filename(Root_Folder):
    for Root, Dirs, Files in os.walk(Root_Folder):
        for File in Files:
            print(File)
            Fname, Fext = os.path.splitext(File)
            print(Fname)
            print(Fext)
            Replaced = Fname.replace(".","_")
            print(Replaced)
            New_Fname = Replaced + Fext
            print(New_Fname)
            F_path = os.path.join(Root, File)
            print(F_path)
            New_Fpath = os.path.join(Root, New_Fname)
            print(New_Fpath)
            os.rename(F_path, New_Fpath)

def Replace_Dirname(Root_Folder):
    for Root, Dirs, Files in os.walk(Root_Folder):
        for Dir in Dirs:
            print(Dir)
            New_Dname = Dir.replace(".","_")
            print(New_Dname)
            D_Path = os.path.join(Root, Dir)
            print(D_Path)
            New_Dpath = os.path.join(Root, New_Dname)
            print(New_Dpath)
            os.rename(D_Path, New_Dpath)

Root_Folder = "D:\\Practicerename-Copy"
Replace_Filename(Root_Folder)
Replace_Dirname(Root_Folder)
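No answer to this question is recorded here, and nothing in the code explains why a leading digit would matter. One pitfall the code does have, offered as an assumption about the cause rather than a confirmed diagnosis: Replace_Dirname renames directories during a top-down walk, so os.walk later tries to descend into the old name, which no longer exists, and silently skips that subtree. A minimal sketch that walks bottom-up instead (the function name rename_dirs_bottom_up and the sample directory names are hypothetical):

```python
import os
import tempfile


def rename_dirs_bottom_up(root_folder):
    """Replace '.' with '_' in directory names, deepest directories first."""
    renamed = []
    # topdown=False yields a directory only after all of its subdirectories,
    # so renaming an entry of dirs cannot invalidate a path still to be visited.
    for root, dirs, files in os.walk(root_folder, topdown=False):
        for name in dirs:
            new_name = name.replace(".", "_")
            if new_name != name:
                old_path = os.path.join(root, name)
                new_path = os.path.join(root, new_name)
                os.rename(old_path, new_path)
                renamed.append((old_path, new_path))
    return renamed


# Demo on a throwaway tree with digit-prefixed, dotted directory names.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "1.first", "2.second.part"))
    renamed = rename_dirs_bottom_up(top)
    nested_ok = os.path.isdir(os.path.join(top, "1_first", "2_second_part"))
    print(len(renamed), nested_ok)
```

The sketch does not handle the case where the new name already exists; os.rename would then fail or overwrite depending on the platform.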

Recursive listing of files in a directory matching a pattern

I am trying to recursively list all file names that are in sub directories called Oracle (but not list files in other sub directories).
I have the following code:
import fnmatch
import os

for root, dirs, files in os.walk(r"Y:\Data\MXD_DC\DataSourceChange", topdown=True):
    for name in dirs:
        if fnmatch.fnmatch(name, 'Oracle'):
            for filename in files:
                fullpath = os.path.join(root, filename)
                print("FullPath is: " + fullpath)
I can only get it to list all file names of all sub directories. It does not even go to the sub directory called Oracle.
Currently, when you find a directory named Oracle, you list the files that sit at the same level in the hierarchy instead of the files contained in the Oracle folder, because each tuple yielded by os.walk holds the directories and files of one and the same level.
You have two ways to list the expected files.
First, use only the directory names from os.walk, and call os.listdir once you have found an Oracle folder:
for root, dirs, files in os.walk(r"Y:\Data\MXD_DC\DataSourceChange", topdown=True):
    for name in dirs:
        if name == 'Oracle':
            path = os.path.join(root, name)
            for filename in os.listdir(path):
                fullpath = os.path.join(path, filename)
                print("FullPath is: " + fullpath)
Second, ignore the directory names, take the last component of root, and test whether it is Oracle:
for root, dirs, files in os.walk(r"Y:\Data\MXD_DC\DataSourceChange", topdown=True):
    if os.path.basename(root) == 'Oracle':
        for filename in files:
            fullpath = os.path.join(root, filename)
            print("FullPath is: " + fullpath)
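The basename test can be tried without the Y: drive. The following is a rough sketch on a temporary tree; the helper name files_in_named_dirs and the folder layout are invented for the example.

```python
import os
import tempfile


def files_in_named_dirs(top, dirname):
    """Return paths of files whose containing directory is named dirname."""
    matches = []
    for root, dirs, files in os.walk(top):
        # root is the directory currently being listed, so compare its last
        # path component rather than the entries of dirs.
        if os.path.basename(root) == dirname:
            for filename in files:
                matches.append(os.path.join(root, filename))
    return sorted(matches)


# Two sibling folders, each with one file; only a/Oracle should match.
with tempfile.TemporaryDirectory() as top:
    for folder in (os.path.join("a", "Oracle"), os.path.join("b", "Other")):
        os.makedirs(os.path.join(top, folder))
        with open(os.path.join(top, folder, "data.txt"), "w") as handle:
            handle.write("x")

    found = [os.path.relpath(p, top) for p in files_in_named_dirs(top, "Oracle")]
    print(found)
```

This version returns only files directly inside a folder named Oracle; files in subfolders below Oracle would need an extra check on the other components of root.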
If you want to list the files in a particular directory, you can use:
import os
os.listdir("Oracle")
To print the directory listing from a script, use:
import os
print(os.listdir("Oracle"))

Python list comprehension returns nested list

I built this little program to simulate 2 libraries I want to compare files with.
The code is this:
import os
path = r"C:\Users\nelson\Desktop\Lib Check"
pc_path = os.path.join(path, "pc")
phone_path = os.path.join(path, "phone")
pc_lib = [filename for path, dirname, filename in os.walk(pc_path)]
print(pc_lib)
It returns:
[['1.txt', '2.txt', '3.txt', '4.txt', '5.txt', '6.txt', '8.txt', '9.txt']]
Everything is fine except that the results are in a nested list. Why?
The only way I can avoid this is by using:
pc_lib = []
for path, dirname, filename in os.walk(pc_path):
    pc_lib.extend(filename)
filename is a list of files (the name you've used is not intuitive), so the results are expected:
for root, dirs, files in os.walk('my/home/directory'):
    print(files)
# ['close_button.gif', 'close_button_red.gif']
# ['toolbar.js']
extend unwraps the argument list and appends the resulting elements to the list that made the call.
If you want a list of filenames, I would suggest:
[os.path.join(path, filename)
 for path, dirnames, filenames in os.walk(pc_path)
 for filename in filenames]
os.walk(path) returns an iterator over tuples of (root, dirs, files), where:
root is the path of the directory currently being visited; it starts with the argument passed to os.walk, so it is absolute only if that argument was absolute
dirs is a list of the names of the subdirectories in the current directory
files is a list of the names of the non-directory entries in the current directory
If you want a flat list of all files in a filesystem tree, use:
pc_lib = [
    filename for _, _, files in os.walk(pc_path)
    for filename in files
]
You probably want to retain the full path; to get this, use:
pc_lib = [
    os.path.join(root, filename)
    for root, _, files in os.walk(pc_path)
    for filename in files
]
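The difference between the nested and the flat result is easy to reproduce on a small temporary tree. This is only an illustrative sketch; the file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for relative in ("1.txt", os.path.join("sub", "2.txt")):
        with open(os.path.join(top, relative), "w") as handle:
            handle.write("x")

    # One list of names per visited directory: a list of lists.
    nested = [files for root, dirs, files in os.walk(top)]

    # A second for clause iterates over each inner list, flattening the result.
    flat = [name for root, dirs, files in os.walk(top) for name in files]

    print(nested)  # [['1.txt'], ['2.txt']]
    print(flat)    # ['1.txt', '2.txt']
```

In the question, pc_path had no subdirectories, so the outer list held a single inner list.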

Iterating in python with os.walk not returning the expected values

I am new to Python and I've been trying to iterate through a big directory with plenty of subdirectories of varying depth.
So far I have this code:
for dirpath, subdirs, files in os.walk("//media//rayeus//Datos//Mis Documentos//Nueva Carpeta//", topdown=True):
    for name in files:
        f = os.path.join(dirpath, name)
        print(f)
    for name in subdirs:
        j = os.path.join(dirpath, name)
        print(j)
The idea is to use the iterations to make an inventory, in an Excel file, of the structure inside the directory.
The problem is that if I leave "Nueva carpeta" out of the path, it works perfectly, but when I add "Nueva Carpeta" the script runs without errors and doesn't return anything.
import os

def crawl(*args, **kw):
    '''
    This will yield all files in all subdirs
    '''
    for root, _, files in os.walk(*args, **kw):
        for fname in files:
            yield os.path.join(root, fname)

for fpath in crawl('.', topdown=True):
    print(fpath)
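The source does not say why the longer path produced no output. One thing worth checking, as a guess rather than a diagnosis: by default os.walk ignores errors raised while listing a directory, so a path that does not exist (for example because of a case mismatch such as "Nueva carpeta" versus "Nueva Carpeta" on a case-sensitive filesystem) simply yields nothing. Passing an onerror callback makes the failure visible. The helper name walk_reporting_errors and the missing folder name below are made up for this sketch.

```python
import os
import tempfile


def walk_reporting_errors(top):
    """Walk top; return (file paths, errors os.walk would otherwise hide)."""
    errors = []
    paths = []
    # onerror is called with the OSError raised while listing a directory;
    # without it, a missing or unreadable directory is skipped silently.
    for dirpath, subdirs, files in os.walk(top, onerror=errors.append):
        for name in files:
            paths.append(os.path.join(dirpath, name))
    return paths, errors


with tempfile.TemporaryDirectory() as tmp:
    # A path inside a fresh temp directory that is guaranteed not to exist.
    missing = os.path.join(tmp, "does-not-exist")
    paths, errors = walk_reporting_errors(missing)
    print(paths)
    for error in errors:
        print("could not list:", error)
```

If the error list stays empty and there is still no output, the directory exists but contains no entries.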

Dynamic if statement in dictionary comprehension

I am using a dictionary comprehension to get a dictionary of key value pairs where the key is the name of an mp3 file and the value is the path to the file.
I do:
for root, dirs, files in os.walk(rootDir, topdown='true'):
    source_files = {filename:root for filename in files if os.path.splitext(filename)[1].lower() == '.mp3'}
    # more code
    # ...
    # ...
I do more logic with the source files in the "more code" part. Now, I want to repeat this logic for any pictures (i.e. .gif, .jpeg etc.).
So I could do:
for root, dirs, files in os.walk(rootDir, topdown='true'):
    source_files = {filename:root for filename in files if os.path.splitext(filename)[1].lower() == '.jpeg' or os.path.splitext(filename)[1].lower() == '.gif'}
and then wrap the "more code" part into a function and call it for the picture files. However, I am wondering whether I could make the if expression in the dictionary comprehension dynamic, and then pass in one if expression for music files and another for picture files?
I think you are looking for the fnmatch.fnmatch function instead, or even fnmatch.filter():
from fnmatch import filter
for root, dirs, files in os.walk(rootDir):
    source_files = {filename: root for filename in filter(files, '*.jpg')}
But if you need to match multiple extensions, it's much easier to use str.endswith():
for root, dirs, files in os.walk(rootDir):
    source_files = {filename: root for filename in files if filename.endswith(('.jpg', '.png', '.gif'))}
Using .endswith() you can then use any string or tuple of extensions:
mp3s = '.mp3'
images = ('.jpg', '.png', '.gif')
then use:
extensions = images
for root, dirs, files in os.walk(rootDir):
    source_files = {filename: root for filename in files if filename.endswith(extensions)}
I'm not sure why you are using a dict comprehension here; within each iteration of the loop, root is constant. You may as well do:
for root, dirs, files in os.walk(rootDir):
    source_files = dict.fromkeys(filter(files, '*.jpg'), root)
or use:
for root, dirs, files in os.walk(rootDir):
    source_files = dict.fromkeys([f for f in files if f.endswith(extensions)], root)
If you wanted to create a dictionary of all files in a nested directory structure, you'll need to move the dict comprehension out and integrate the os.walk() call in the dict comprehension instead:
source_files = {filename: root
                for root, dirs, files in os.walk(rootDir)
                for filename in files if filename.endswith(extensions)}
I removed all the topdown='true' arguments; the default is topdown=True anyway. (Note: Python booleans are True and False, not strings. It happened to work because 'true' is a non-empty string, and any non-empty string is considered true in a boolean context.)
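Putting these pieces together, the "dynamic if" can be reduced to a parameter holding the extensions. The sketch below is one possible arrangement, not the only one; the names collect_files, MP3S and IMAGES and the sample files are assumptions made for the example, and the .lower() call is added to keep the case-insensitive matching of the question's code.

```python
import os
import tempfile


def collect_files(root_dir, extensions):
    """Map each matching filename to the directory that contains it."""
    # str.endswith accepts a single string or a tuple of strings; lowering the
    # name first makes the match case-insensitive, as in the question's code.
    return {
        filename: root
        for root, dirs, files in os.walk(root_dir)
        for filename in files
        if filename.lower().endswith(extensions)
    }


MP3S = ".mp3"
IMAGES = (".jpg", ".jpeg", ".png", ".gif")

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "album"))
    for relative in ("cover.GIF", os.path.join("album", "song.mp3"), "notes.txt"):
        with open(os.path.join(top, relative), "w") as handle:
            handle.write("x")

    music = collect_files(top, MP3S)
    pictures = collect_files(top, IMAGES)
    print(sorted(music), sorted(pictures))
```

Because the dictionary is keyed by bare filename, two files with the same name in different directories overwrite each other; keying by the joined path avoids that if it matters.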
Does this:
def a_func(extension):
    # some code
    for root, dirs, files in os.walk(rootDir, topdown='true'):
        source_files = {filename:root for filename in files if os.path.splitext(filename)[1].lower() == extension}
        # more code
        # ...
        # ...
fit your needs?
