Python - recursive directory hit with os.walk()

I'm writing a program that renames files and directories by removing a certain pattern from their names.
My renaming function works well for files, since os.walk() reaches all files, but not so well for directories:
for root, dirs, files in os.walk(path): # Listing the files
    for i, foldername in enumerate(dirs):
        output = foldername.replace(pattern, "") # Taking out pattern
        if output != foldername:
            os.rename( # Renaming
                os.path.join(path, foldername),
                os.path.join(path, output))
        else:
            pass
Could someone suggest a solution to target ALL directories and not only first level ones?

Setting topdown=False in os.walk() (and joining names with root instead of path) does the trick:
for root, dirs, files in os.walk(path, topdown=False): # Listing the files
    for i, name in enumerate(dirs):
        output = name.replace(pattern, "") # Taking out pattern
        if output != name:
            os.rename( # Renaming
                os.path.join(root, name),
                os.path.join(root, output))
        else:
            pass
Thanks to J.F Sebastian!
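
To see why the bottom-up order matters, here is a minimal, self-contained sketch of the same idea. The helper name strip_pattern_from_dirs and the extra guard against empty names are my own assumptions, not part of the answer above.

```python
import os


def strip_pattern_from_dirs(path, pattern):
    """Remove `pattern` from every directory name below `path` (hypothetical helper).

    Returns the new names in the order the renames happened.
    """
    renamed = []
    # topdown=False yields the deepest directories first, so renaming a child
    # never invalidates the path of a parent that has not been handled yet.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in dirs:
            output = name.replace(pattern, "")
            # Assumption: skip names that would become empty after stripping.
            if output and output != name:
                os.rename(os.path.join(root, name), os.path.join(root, output))
                renamed.append(output)
    return renamed
```

With the default topdown=True, the parent would be renamed first and os.walk would then try to descend into a path that no longer exists.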

This will do the trick (pymillsutils.getFiles()):
import os
import re
from os.path import isfile

def getFiles(root, pattern=".*", tests=(isfile,), **kwargs):
    """getFiles(root, pattern=".*", tests=(isfile,), **kwargs) -> list of files
    Return a list of files in the specified path (root)
    applying the predicates listed in tests returning
    only the files that match the pattern. Some optional
    kwargs can be specified:
    * full=True (Return full paths)
    * recursive=True (Recursive mode)
    """
    def passes(path, tests):
        for test in tests:
            if not test(path):
                return False
        return True
    full = kwargs.get("full", False)
    recursive = kwargs.get("recursive", False)
    files = []
    for name in os.listdir(root):
        path = os.path.abspath(os.path.join(root, name))
        # Test every entry, so that tests=[isdir] can match directories too
        if passes(path, tests) and re.match(pattern, path):
            files.append(path if full else name)
        if recursive and os.path.isdir(path):
            # Pass tests along so the recursion uses the same predicates
            files.extend(getFiles(path, pattern, tests, **kwargs))
    return files
Usage (pattern is a regular expression, not a glob):
getFiles(".", r".*\.txt$", recursive=True)
To list just directories:
from os.path import isdir
getFiles(".", tests=[isdir], recursive=True)
There is also a nice OOP-style library for path manipulation and traversal called py, which has a really nice API that I like and use in all my projects.

Related

How to extract sub-folder path?

I have about 100 folders and in each folder files that should be read and analyzed.
I can read the files from their subfolders, but I want to start processing at e.g. the 10th folder until the end. And I need the exact folder path.
How can I do this?
To clarify my question, I extracted a sample from my code:
rootDir = 'D:/PhD/result/Pyradiomic_input/'
for (path, subdirs, files) in os.walk(rootDir):
    sizefile=len(path)
    if "TCGA-" in path :
        print(path)
The output is:
D:/PhD/result/Pyradiomic_input/TCGA-02-0006
D:/PhD/result/Pyradiomic_input/TCGA-02-0009
D:/PhD/result/Pyradiomic_input/TCGA-02-0011
D:/PhD/result/Pyradiomic_input/TCGA-02-0027
D:/PhD/result/Pyradiomic_input/TCGA-02-0046
D:/PhD/result/Pyradiomic_input/TCGA-02-0069
Now my question is how can I start working from e.g. D:/PhD/result/Pyradiomic_input/TCGA-02-0046 until the end, instead of starting from the top? I tried some ideas but they did not work.

You could set a flag to capture when you hit a specific directory:
rootDir = 'D:/PhD/result/Pyradiomic_input/'
first_folder = 'TCGA-02-0046'
process = False
for (path, subdirs, files) in os.walk(rootDir):
    sizefile=len(path)
    if "TCGA-" in path :
        print(path)
    if first_folder in path:
        process = True
    if process:
        pass  # process folder
If you want a specific folder to indicate that the script should stop processing:
rootDir = 'D:/PhD/result/Pyradiomic_input/'
first_folder = 'TCGA-02-0046'
last_folder = 'TCGA-02-0099'
process = False
for (path, subdirs, files) in os.walk(rootDir):
    sizefile=len(path)
    if "TCGA-" in path :
        print(path)
    if first_folder in path:
        process = True
    if last_folder in path:
        break
    if process:
        pass  # process folder
You can also set a list of directories that you want to process:
rootDir = 'D:/PhD/result/Pyradiomic_input/'
process_dirs = ['TCGA-02-0046', ...]
process = False
for (path, subdirs, files) in os.walk(rootDir):
    sizefile=len(path)
    if "TCGA-" in path :
        print(path)
    if any(d in path for d in process_dirs):
        pass  # process folder

You can simply skip the values you aren't interested in. Here is a simplified version:
counter = 0
# mocking the file operations
for path in ["/dir-1", "/dir-2", "/dir-3", "/dir-4", "/dir-5"]:
    # skip the first two paths
    if counter < 2:
        counter += 1
        continue
    # do something
    print(path)
Alternatively, you could collect the paths first, like this:
paths = []
# mocking the file operations
for path in ["/dir-1", "/dir-2", "/dir-3", "/dir-4", "/dir-5"]:
    # collect paths in array
    paths.append(path)
# skip the first two elements
paths = paths[2:]
for path in paths:
    # do something
    print(path)
The second version can become a bit shorter if you use generator expressions, but I favor readability.
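
For reference, a shorter generator-based variant could look roughly like this. It is only a sketch: the function name subfolders_from is made up, and sorting by name is my assumption, because the order in which the operating system lists directory entries is not guaranteed.

```python
import os
from itertools import islice


def subfolders_from(root_dir, start_index):
    """Yield full paths of the immediate subfolders of root_dir,
    skipping the first start_index folders (hypothetical helper)."""
    with os.scandir(root_dir) as entries:
        # Sort by name so that "the 10th folder" means the same thing every run.
        names = sorted(entry.name for entry in entries if entry.is_dir())
    # islice skips the first start_index names without a manual counter.
    for name in islice(names, start_index, None):
        yield os.path.join(root_dir, name)
```

Called as subfolders_from(rootDir, 9), this would start at the 10th folder and give the exact path of each remaining one.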

Iterating in python with os.walk not returning the expected values

I am new to Python and I've been trying to iterate through a big directory with many nested subdirectories.
So far I have this code:
for dirpath, subdirs, files in os.walk("//media//rayeus//Datos//Mis Documentos//Nueva Carpeta//", topdown=True):
    for name in files:
        f = os.path.join(dirpath, name)
        print(f)
    for name in subdirs:
        j = os.path.join(dirpath, name)
        print(j)
The idea is to use the iteration to build an inventory, in an Excel file, of the structure inside the directory.
The problem is that if I leave "Nueva carpeta" out of the path, it works perfectly, but when I add "Nueva Carpeta" the script runs without errors and doesn't return anything.

import os
def crawl(*args, **kw):
    '''
    This will yield all files in all subdirs
    '''
    for root, _, files in os.walk(*args, **kw):
        for fname in files:
            yield os.path.join(root, fname)
for fpath in crawl('.', topdown=True):
    print(fpath)
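
One possible reason for a walk that "runs without errors" but prints nothing: by default os.walk() silently ignores errors raised while listing a directory, so a path that does not exist (for example because of different capitalisation on a case-sensitive filesystem) simply yields nothing. A small sketch that surfaces those errors through the onerror parameter; the function name list_files_with_errors is made up for illustration.

```python
import os


def list_files_with_errors(top):
    """Return (paths, errors): every file under top, plus any listing errors
    that os.walk would otherwise swallow (hypothetical helper)."""
    errors = []
    paths = []
    # os.walk calls onerror with the OSError instance whenever a directory
    # cannot be listed; collecting them makes a wrong path visible.
    for root, _, files in os.walk(top, onerror=errors.append):
        for fname in files:
            paths.append(os.path.join(root, fname))
    return paths, errors
```

If errors is non-empty, err.filename on each entry tells you which directory could not be read.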

How to remove all empty files within folder and its sub folders?

I am trying to remove all empty files in a folder, and there are folders within the folder so it needs to check inside those folders too:
e.g. remove all empty files within C:\folder1\folder1 and C:\folder1\folder2, etc.

import sys
import os
def main():
    getemptyfiles(sys.argv[1])
def getemptyfiles(rootdir):
    for root, dirs, files in os.walk(rootdir):
        # skip recycle-bin folders
        for d in ['RECYCLER', 'RECYCLED']:
            if d in dirs:
                dirs.remove(d)
        for f in files:
            fullname = os.path.join(root, f)
            try:
                if os.path.getsize(fullname) == 0:
                    print(fullname)
                    os.remove(fullname)
            except WindowsError:
                continue
if __name__ == "__main__":
    main()
This will work with a bit of adjusting:
The os.remove() statement could fail, so you might want to wrap it with try...except as well. WindowsError is platform-specific; catch OSError instead if the script must run on other systems. Filtering the traversed directories is not strictly necessary but helpful.

The for loop uses dir to find all files, but not directories, in the current directory and all subfolders recursively. Then the second line checks to see if the length of each file is less than 1 byte before deleting it.
cd /d C:\folder1
for /F "usebackq" %%A in (`dir/b/s/a-d`) do (
    if %%~zA LSS 1 del %%A
)

import os
while True:
    path = input("Enter the path")
    if os.path.isdir(path):
        break
    else:
        print("Entered path is wrong!")
for root, dirs, files in os.walk(path):
    for name in files:
        filename = os.path.join(root, name)
        if os.stat(filename).st_size == 0:
            print(" Removing ", filename)
            os.remove(filename)

I first remove the empty files; afterwards, following this answer (https://stackoverflow.com/a/6215421/2402577), I remove the empty folders.
In addition, I pass topdown=False to os.walk() to walk from leaf to root, since the default behavior of os.walk() is to walk from root to leaf.
That way, folders that contain only empty folders or empty files are removed as well.
import os
def remove_empty_files_and_folders(dir_path) -> None:
    for root, dirnames, files in os.walk(dir_path, topdown=False):
        for f in files:
            full_name = os.path.join(root, f)
            if os.path.getsize(full_name) == 0:
                os.remove(full_name)
        for dirname in dirnames:
            full_path = os.path.realpath(os.path.join(root, dirname))
            if not os.listdir(full_path):
                os.rmdir(full_path)

I hope this can help you:
#encoding = utf-8
import os
docName = []
def listDoc(path):
    docList = os.listdir(path)
    for doc in docList:
        docPath = os.path.join(path,doc)
        if os.path.isfile(docPath):
            if os.path.getsize(docPath) == 0:  # empty file
                os.remove(docPath)
        if os.path.isdir(docPath):
            listDoc(docPath)  # recurse into subfolder
listDoc(r'C:\folder1')

(python) recursively remove capitalisation from directory structure?

uppercase letters - what's the point of them? all they give you is rsi.
i'd like to remove as much capitalisation as possible from my directory structure. how would i write a script to do this in python?
it should recursively parse a specified directory, identify the file/folder names with capital letters and rename them in lowercase.

os.walk is great for doing recursive stuff with the filesystem.
import os
def lowercase_rename( dir ):
    # renames all subfolders of dir, not including dir itself
    def rename_all( root, items):
        for name in items:
            try:
                os.rename( os.path.join(root, name),
                           os.path.join(root, name.lower()))
            except OSError:
                pass # can't rename it, so what
    # starts from the bottom so paths further up remain valid after renaming
    for root, dirs, files in os.walk( dir, topdown=False ):
        rename_all( root, dirs )
        rename_all( root, files)
The point of walking the tree upwards is that when you have a directory structure like '/A/B' you will have path '/A' during the recursion too. Now, if you start from the top, you'd rename /A to /a first, thus invalidating the /A/B path. On the other hand, when you start from the bottom and rename /A/B to /A/b first, it doesn't affect any other paths.
Actually you could use os.walk for top-down too, but that's (slightly) more complicated.
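
For the curious, the top-down variant could be sketched as follows. The extra complication is that each renamed directory has to be written back into the dirs list, so that os.walk descends into the new name. The function names here are hypothetical, and name collisions (e.g. both "A" and "a" in one folder) are not handled.

```python
import os


def _rename_lower(root, name):
    """Rename root/name to lowercase; return the name that now exists on disk."""
    lower = name.lower()
    if lower == name:
        return name
    try:
        os.rename(os.path.join(root, name), os.path.join(root, lower))
    except OSError:
        return name  # rename failed, keep using the original name
    return lower


def lowercase_rename_topdown(top):
    """Lowercase every name below top while walking top-down (sketch)."""
    for root, dirs, files in os.walk(top, topdown=True):
        for name in files:
            _rename_lower(root, name)
        for i, name in enumerate(dirs):
            # Update dirs in place: os.walk reads this list after the loop
            # body to decide which subdirectories to visit next.
            dirs[i] = _rename_lower(root, name)
```

As with the bottom-up version, the top directory itself is left untouched.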

Try the following script:
#!/usr/bin/python
'''
renames files or folders, changing all uppercase characters to lowercase.
directories will be parsed recursively.
usage: ./changecase.py file|directory
'''
import sys, os
def rename_recursive(srcpath):
    srcpath = os.path.normpath(srcpath)
    if os.path.isdir(srcpath):
        # lower the case of this directory
        newpath = name_to_lowercase(srcpath)
        # recurse to the contents
        for entry in os.listdir(newpath): #FIXME newpath
            nextpath = os.path.join(newpath, entry)
            rename_recursive(nextpath)
    elif os.path.isfile(srcpath): # base case
        name_to_lowercase(srcpath)
    else: # error
        print("bad arg: " + srcpath)
        sys.exit()
def name_to_lowercase(srcpath):
    srcdir, srcname = os.path.split(srcpath)
    newname = srcname.lower()
    if newname == srcname:
        return srcpath
    newpath = os.path.join(srcdir, newname)
    print("rename " + srcpath + " to " + newpath)
    os.rename(srcpath, newpath)
    return newpath
arg = sys.argv[1]
arg = os.path.expanduser(arg)
rename_recursive(arg)

os.walk without digging into directories below

How do I limit os.walk to only return files in the directory I provide it?
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
    return outputList

Don't use os.walk; use os.listdir instead.
Example:
import os
root = "C:\\"
for item in os.listdir(root):
    if os.path.isfile(os.path.join(root, item)):
        print(item)

Use the walklevel function.
import os
def walklevel(some_dir, level=1):
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        if num_sep + level <= num_sep_this:
            del dirs[:]
It works just like os.walk, but you can pass it a level parameter that indicates how deep the recursion will go.
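
To make the meaning of level concrete, here is a small sketch that repeats the generator and adds a helper, visited_dirs (a made-up name), which reports the directories reached for a given level.

```python
import os


def walklevel(some_dir, level=1):
    # Same generator as above, repeated so this sketch runs on its own.
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        # Once root is `level` separators below the start, stop descending.
        if num_sep + level <= num_sep_this:
            del dirs[:]


def visited_dirs(top, level):
    """Return the directories walklevel visits, relative to top (hypothetical helper)."""
    return sorted(os.path.relpath(root, top) for root, _, _ in walklevel(top, level))
```

For a tree top/a/b/c, level=0 visits only the top directory, level=1 also visits a, level=2 also visits a/b, and so on.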

I think the solution is actually very simple: use break to run only the first iteration of the for loop (though there may be a more elegant way).
for root, dirs, files in os.walk(dir_name):
    for f in files:
        ...
        ...
    break
...
On the first iteration, os.walk yields a tuple for the top directory; each later iteration yields the contents of the next directory.
Take the original script and just add a break:
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
        break
    return outputList

The suggestion to use listdir is a good one. The direct answer to your question in Python 2 is root, dirs, files = os.walk(dir_name).next().
The equivalent Python 3 syntax is root, dirs, files = next(os.walk(dir_name))
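
One caveat with the next() form: if the directory cannot be listed (for example, it does not exist), os.walk() yields nothing and a bare next() raises StopIteration. A hedged sketch that supplies a fallback instead; the function name top_level and the choice of empty lists as the fallback are my own.

```python
import os


def top_level(dir_name):
    """Return (root, dirs, files) for dir_name only, or empty lists
    if the directory cannot be listed (hypothetical helper)."""
    # The second argument to next() is returned when the iterator is empty.
    return next(os.walk(dir_name), (dir_name, [], []))
```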

You could use os.listdir() which returns a list of names (for both files and directories) in a given directory. If you need to distinguish between files and directories, call os.stat() on each name.

If you have more complex requirements than just the top directory (eg ignore VCS dirs etc), you can also modify the list of directories to prevent os.walk recursing through them.
i.e.:
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        dirs[:] = [d for d in dirs if is_good(d)]
        for f in files:
            do_stuff()
Note - be careful to mutate the list, rather than just rebind it. Obviously os.walk doesn't know about the external rebinding.
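
A minimal sketch of the difference, assuming a made-up helper walk_skipping and a set of directory names to prune:

```python
import os


def walk_skipping(top, skip_names):
    """Return the directories visited under top, pruning any directory
    whose name is in skip_names (hypothetical helper)."""
    visited = []
    for root, dirs, files in os.walk(top):
        visited.append(root)
        # Slice assignment edits the very list object os.walk holds, so the
        # pruning takes effect. Writing `dirs = [...]` instead would only
        # rebind the local name, and os.walk would still visit everything.
        dirs[:] = [d for d in dirs if d not in skip_names]
    return visited
```

This only works with the default topdown=True; in bottom-up mode the subdirectories have already been scanned by the time dirs is yielded.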

for path, dirs, files in os.walk('.'):
    print(path, dirs, files)
    del dirs[:] # go only one level deep

Felt like throwing my 2 pence in.
baselevel = len(rootdir.split(os.path.sep))
for subdirs, dirs, files in os.walk(rootdir):
    curlevel = len(subdirs.split(os.path.sep))
    if curlevel <= baselevel + 1:
        pass  # do stuff

The same idea with listdir, but shorter:
[f for f in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, f))]

Since Python 3.5 you can use os.scandir instead of os.listdir. Instead of strings you get an iterator of DirEntry objects in return. From the docs:
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because DirEntry objects expose this information if the operating system provides it when scanning a directory. All DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.
You can access the name of the object via DirEntry.name which is then equivalent to the output of os.listdir
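
As a rough illustration, listing only the files directly inside a directory with os.scandir might look like this. The function name top_level_files is made up, and the with form assumes Python 3.6 or later, where the scandir iterator became usable as a context manager.

```python
import os


def top_level_files(directory):
    """Return the sorted names of regular files directly inside directory."""
    with os.scandir(directory) as entries:
        # entry.is_file() usually needs no extra system call, unlike
        # calling os.path.isfile() on every name from os.listdir().
        return sorted(entry.name for entry in entries if entry.is_file())
```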

You could also do the following:
for path, subdirs, files in os.walk(dir_name):
    for name in files:
        if path == ".": # matches only the top directory, and only when dir_name is "."
            pass  # code here

In Python 3, I was able to do this:
import os
dir = "/path/to/files/"
#List all files immediately under this folder:
print ( next( os.walk(dir) )[2] )
#List all folders immediately under this folder:
print ( next( os.walk(dir) )[1] )

The root value changes for every directory os.walk finds. I solved that by checking whether root == dir_name:
def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        if root == dir_name: # True only for the top-level folder
            for f in files:
                if os.path.splitext(f)[1] in whitelist:
                    outputList.append(os.path.join(root, f))
                else:
                    self._email_to_("ignore")
    return outputList

import os
def listFiles(self, dir_name):
    names = []
    for root, directory, files in os.walk(dir_name):
        if root == dir_name:
            for name in files:
                names.append(name)
    return names

This is how I solved it
if recursive:
    items = os.walk(target_directory)
else:
    items = [next(os.walk(target_directory))]
...

There is a catch when using listdir. It returns bare names, so the argument to os.path.isdir() must be joined with the parent directory (or be an absolute path); otherwise it is resolved against the current working directory. To pick subdirectories you do:
for dirname in os.listdir(rootdir):
    if os.path.isdir(os.path.join(rootdir, dirname)):
        print("I got a subdirectory: %s" % dirname)
The alternative is to change into the directory first, so the test works without os.path.join().

You can use this snippet (level is the number of os.walk iterations to allow, set beforehand):
for root, dirs, files in os.walk(directory):
    if level > 0:
        pass  # do some stuff
    else:
        break
    level -= 1

Create a list of excludes, use fnmatch to skip the matching directory structure, and do the processing:
excludes = [r'a\*\b', r'c\d\e']
for root, directories, files in os.walk('Start_Folder'):
    if not any(fnmatch.fnmatch(nf_root, pattern) for pattern in excludes):
        for root, directories, files in os.walk(nf_root):
            ....
            do the process
            ....
The same works for 'includes':
if any(fnmatch.fnmatch(nf_root, pattern) for pattern in includes):

Why not simply combine a range and os.walk with zip? It is not the best solution, but it would work too.
For example:
# your part before
for count, (root, dirs, files) in zip(range(0, 1), os.walk(dir_name)):
    pass  # logic stuff
# your later part
Works for me on Python 3.
Also: a break is simpler, by the way. (Look at the answer from @Pieter.)

A slight change to Alex's answer, but using __next__():
print(next(os.walk('d:/'))[2])
or
print(os.walk('d:/').__next__()[2])
with the [2] being the files element of the (root, dirs, files) tuple mentioned in other answers.

This is a nice python example
def walk_with_depth(root_path, depth):
    if depth < 0:
        for root, dirs, files in os.walk(root_path):
            yield [root, dirs[:], files]
        return
    elif depth == 0:
        return
    base_depth = root_path.rstrip(os.path.sep).count(os.path.sep)
    for root, dirs, files in os.walk(root_path):
        yield [root, dirs[:], files]
        cur_depth = root.count(os.path.sep)
        if base_depth + depth <= cur_depth:
            del dirs[:]
