Python os.walk: skip directories with a specific name instead of path

So I have a file system that I want to be able to check and update using Python. My solution was os.walk, but it becomes problematic with my needs and my file system. This is how the directories are laid out:
Root
    dir1
        subdir
            1
            2
            3...
        file1
        file2
    dir2
        subdir
            1
            2
            3...
        file1
        file2
    ...
The main directories have different names, hence "dir1" and "dir2", but the directories inside them share the same name and contain a lot of different files and directories. Those subdirectories are the ones I want to exclude from os.walk, as they add unnecessary computation.
Is there a way to exclude directories from os.walk based on the directory's name instead of path or will I need to do something else?

os.walk allows you to modify the list of directories it gives you. If you take some out, it won't descend into those directories.
import os
for dirpath, dirnames, filenames in os.walk("/root/path"):
    if "subdir" in dirnames:
        dirnames.remove("subdir")  # pruned: os.walk won't descend into it
    # process the files here
(Note that this doesn't work with bottom-up scanning, i.e. topdown=False, because in that mode the subdirectories have already been visited by the time you see dirnames. Top-down is the default.)
See the os.walk documentation.
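The same pruning generalizes to several directory names at any depth. A minimal sketch, assuming the names to skip are known up front and compared exactly (case-sensitive); walk_skipping and SKIP_NAMES are hypothetical names, not part of the os module. Slice assignment is used so the list os.walk holds is modified in place.

```python
import os

# Hypothetical set of directory names to prune, wherever they appear.
SKIP_NAMES = frozenset({"subdir"})


def walk_skipping(top, skip_names=SKIP_NAMES):
    """Yield the path of every file under top, never descending into a
    directory whose *name* is in skip_names (top-down walk only)."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Modify dirnames in place; rebinding the name would not prune.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Usage: for path in walk_skipping("/root/path"): ... visits file1 and file2 in every dirN but never enters any directory called subdir.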

Related

Select files from specific directories

I am trying to loop through a list of subdirectories, and perform two related operations:
1. Only select subdirectories that match a certain pattern, and save part of that name
2. Read a file in that subdirectory
I have tried adapting the answers in this question but am having trouble opening only certain subdirectories. I know I can do this recursively, where I loop through every file, and pull its parent directory using Path.parent, but this would also go into the directories I am not interested in.
My file structure looks like:
002normal
|- names.txt
|- test.txt
002custom
|- names.txt
|- test.txt
I would like only the directories ending in "normal". I'll then read the file named "names.txt" in that directory. I have tried something like the below, without luck.
import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    for f in files:
        print(subdir)
You can modify the dirs list in-place to filter out any subdirectories with names not ending with 'normal' so that os.walk won't traverse into them:
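import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    # Slice assignment mutates the list os.walk uses, so pruned dirs are skipped
    dirs[:] = [name for name in dirs if name.endswith('normal')]
    if 'names.txt' in files:
        with open(os.path.join(subdir, 'names.txt')) as file:
            print(os.path.basename(subdir), file.read())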
Excerpt from the documentation of os.walk:
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
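The "in-place" wording is the crux: assigning a new list to the loop variable does not prune anything. A small sketch contrasting the two forms on the 002normal/002custom layout from the question; visited_dirs is a hypothetical helper written only for this comparison.

```python
import os


def visited_dirs(top, keep, in_place=True):
    """Return the paths (relative to top) of the directories os.walk
    visits when pruning to subdirectory names accepted by keep(name)."""
    seen = []
    for dirpath, dirnames, _ in os.walk(top):
        seen.append(os.path.relpath(dirpath, top))
        if in_place:
            # Mutates the list os.walk holds: pruning takes effect.
            dirnames[:] = [d for d in dirnames if keep(d)]
        else:
            # Only rebinds the local name: os.walk still descends everywhere.
            dirnames = [d for d in dirnames if keep(d)]
    return sorted(seen)
```

With keep = lambda name: name.endswith("normal"), the in-place version never enters 002custom, while the rebinding version visits every directory.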
import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    if subdir.endswith("normal"):
        for file in files:
            if file.startswith("names"):
                print(os.path.basename(subdir), file)
                # subdir already includes root_dir, so join it with the file name only
                with open(os.path.join(subdir, file), "r") as f:
                    print(f.read())
That's how you can do it with your file structure. First you check whether subdir ends with "normal", and if it does, you open the matching file and read its content. Note that subdir, as yielded by os.walk, already starts with root_dir, so the path to the file is built with os.path.join(subdir, file) alone.
Because os.walk already recurses to any depth, no extra loop is needed for nested subdirectories of unknown depth: this works as long as the directory that directly contains names.txt ends with "normal".
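If the "*normal" directories sit directly under root_dir, pathlib's glob can do the directory filtering and the file lookup in one pattern. This swaps in pathlib instead of os.walk; read_normal_names is a hypothetical helper, and treating "part of that name" as the prefix before "normal" (e.g. "002") is an assumption about what the question wants to save.

```python
from pathlib import Path


def read_normal_names(root_dir):
    """Map the prefix of each '*normal' directory directly under root_dir
    (e.g. '002' for '002normal') to the text of its names.txt."""
    result = {}
    # glob matches one directory level here; Path.rglob with the same
    # pattern would search at any depth instead.
    for names_file in sorted(Path(root_dir).glob("*normal/names.txt")):
        prefix = names_file.parent.name[: -len("normal")]
        result[prefix] = names_file.read_text()
    return result
```

Directories such as 002custom are never matched, so their names.txt is not read.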

Getting the absolute paths of all files in a folder, without traversing the subfolders

Let my_dir = "/raid/user/my_dir" be a folder on my filesystem, which is not the current folder (i.e., it's not the result of os.getcwd()). I want to retrieve the absolute paths of all files at the first level of hierarchy in my_dir (i.e., the absolute paths of all files which are in my_dir, but not in a subfolder of my_dir) as a list of strings absolute_paths. I need this list so that I can later delete those files with os.remove().
This is nearly the same use case as "Get absolute paths of all files in a directory", but the difference is that I don't want to traverse the folder hierarchy: I only need the files at the first level of hierarchy (at depth 0? not sure about terminology here).
It's easy to adapt that solution: call os.walk() just once, and don't let it continue:
import os
root, dirs, files = next(os.walk(my_dir, topdown=True))
files = [os.path.join(root, f) for f in files]
print(files)
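One caveat with next(os.walk(...)): os.walk ignores errors unless you pass onerror, so if my_dir does not exist or cannot be read, the generator yields nothing and next() raises StopIteration. A sketch that supplies a default instead; top_level_files is a hypothetical helper, and os.path.abspath is added so the result is absolute even if my_dir is given as a relative path.

```python
import os


def top_level_files(my_dir):
    """Absolute paths of the files directly inside my_dir (no recursion)."""
    # For a missing directory os.walk yields nothing; the default tuple
    # makes next() return an empty listing instead of raising StopIteration.
    root, _dirs, files = next(os.walk(my_dir), (my_dir, [], []))
    return [os.path.abspath(os.path.join(root, f)) for f in files]
```

Whether an empty list or an exception is the better signal for a missing folder depends on the caller; pass onerror to os.walk if you want the error surfaced.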
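You can use the os.path module and a list comprehension. Note that os.listdir() returns bare names, so each one has to be joined to my_dir before it is tested or made absolute; otherwise it is resolved against the current folder.
import os
absolute_paths = [os.path.abspath(os.path.join(my_dir, f))
                  for f in os.listdir(my_dir)
                  if os.path.isfile(os.path.join(my_dir, f))]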
You can use os.scandir, which returns an iterator of os.DirEntry objects. These offer several methods, including is_file() and is_dir() to distinguish files from directories.
import os
with os.scandir(my_dir) as it:
    # entry.path is my_dir joined with the entry name (absolute if my_dir is)
    paths = [entry.path for entry in it if entry.is_file()]
print(paths)
If you want directories in the list as well, simply remove the condition from the list comprehension.
The documentation also has this note under os.listdir():
See also: The scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.
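Since the stated goal is to delete those files with os.remove(), here is a sketch of the whole task built on os.scandir. remove_top_level_files is a hypothetical helper; skipping symlinks (follow_symlinks=False) is a design assumption, so drop that argument if links to files should be deleted too.

```python
import os


def remove_top_level_files(my_dir):
    """Delete the regular files directly inside my_dir and return their
    absolute paths; subdirectories and their contents are left alone."""
    # Collect first, delete afterwards, so the directory is not modified
    # while the scandir iterator is still open.
    with os.scandir(my_dir) as it:
        targets = [
            os.path.abspath(entry.path)
            for entry in it
            # follow_symlinks=False: symlinks (even to files) are skipped
            if entry.is_file(follow_symlinks=False)
        ]
    for path in targets:
        os.remove(path)
    return sorted(targets)
```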

os.walk() returning '/' instead of actual folder name

I know there are a lot of questions related to this, but I can't seem to find an answer that helps me solve the problem.
I'm using os.walk() to loop through subfolders in my main folder, which contains both folders and files.
Main Folder
    Pass Folder
        files.txt
    Fail Folder
        files.txt
    file.txt
    file2.txt
So I'm using this code to create a new text file based on the subfolder names. However, this returns folder/.txt, which means that dirs is returning '/' and files is returning ['file.txt', 'file2.txt'].
for root, dirs, files in os.walk(path):
    for dirs in root:
        new_txt = 'folder%s.txt' % (dirs)
How do I fix it so that dirs returns ['Main Folder/Pass Folder', 'Main Folder/Fail Folder'] and files returns the files in each folder?
I used something similar to this in my code recently (which, if I recall correctly, I also found on SO). Mine went something like this:
import os
for (dirpath, subdirs, filelist) in os.walk(folder):
    # join directories in here: full path of each subfolder
    full_paths = [os.path.join(dirpath, d) for d in subdirs]
From the documentation:
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
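I'm not sure os.walk() does quite what you expect. In your code, for dirs in root iterates over the characters of the root string (and overwrites the dirs list), which is where the '/' comes from. Loop over dirs instead and join each name to root with os.path.join() to get the full subfolder path.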
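To make the fix concrete, here is a minimal sketch against the Main Folder layout from the question. subfolder_txt_names is a hypothetical helper, and the 'folder%s.txt' naming is copied from the question's code.

```python
import os


def subfolder_txt_names(path):
    """For every subfolder under path, return (full_path, txt_name), where
    txt_name follows the question's 'folder%s.txt' pattern."""
    result = []
    for root, dirs, files in os.walk(path):
        # Iterate over the list of subfolder *names*, not over root:
        # looping over the root string yields its characters one at a time.
        for name in dirs:
            full_path = os.path.join(root, name)
            result.append((full_path, "folder%s.txt" % name))
    return result
```

Called on the Main Folder path, this yields one entry each for Pass Folder and Fail Folder, with full paths such as .../Main Folder/Pass Folder.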

Get symlink using os.walk

I'm using os.walk to traverse my directories. The problem is I want to recognize if a file is a symbolic link, not following through with the link. This code:
for root, dirs, files in os.walk(PROJECT_PATH):
    for f in files:
        # I want os.path.islink(f) to return true for symlink here
        # instead of ignoring them by default
        pass
will not give me symlinks, while this code
for root, dirs, files in os.walk(PROJECT_PATH, followlinks=True):
    for f in files:
        pass
will walk the directories that the symlinks point to but doesn't give me the symlinks themselves. Thanks.
os.walk() does give you symlinks. There are three things to take into account:
1. os.path.islink(f) is incorrect, because f is only a bare name; you have to call os.path.islink on os.path.join(root, f).
2. Symlinks that point to directories will be included in dirs (but not followed, unless you also specify followlinks=True, which you don't need to do, since you don't need to actually follow them).
3. Symlinks that point to non-directories will be included in files.
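Putting those three points together, a sketch that collects every symlink under a tree without following any of them. find_symlinks is a hypothetical helper; it relies only on os.walk's default followlinks=False and on os.path.islink.

```python
import os


def find_symlinks(top):
    """Return the paths of all symlinks under top, whether they point to
    files or directories, without following any of them."""
    links = []
    for root, dirs, files in os.walk(top):  # followlinks defaults to False
        # Symlinks to directories show up in dirs, the rest in files.
        for name in dirs + files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                links.append(path)
    return sorted(links)
```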

Getting sub directories and files using python

What would be the best method of getting the subdirectories of a drive, including the files located within them? Would it be best to use os.listdir() and filter out directories from files by checking whether their names contain a '.'?
Any ideas would be helpful, and I would much prefer to use only the standard library for this task.
Take a look at os.walk(): it lets you visit each directory and get a list of files and a list of subdirectories for each directory you visit. (Checking for a '.' is unreliable anyway, since directory names can contain dots and file names need not.)
Here is how you could only go down a single level:
import os
for root, dirs, files in os.walk(path):
    # do whatever you want to with dirs and files
    if root != path:
        # one level down, modify dirs in place so we don't go any deeper
        del dirs[:]
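The same in-place pruning generalizes to any depth limit. A sketch, assuming depth is counted in path components below top; walk_max_depth is a hypothetical helper, not part of the os module.

```python
import os


def walk_max_depth(top, max_depth):
    """Like os.walk(top) (top-down), but never descend more than max_depth
    levels below top: 0 = top only, 1 = top plus its direct subdirectories."""
    for root, dirs, files in os.walk(top):
        # os.walk yields top itself first, exactly as it was passed in.
        depth = 0 if root == top else os.path.relpath(root, top).count(os.sep) + 1
        if depth >= max_depth:
            # Give the caller a copy of the names, then empty the real list
            # in place so os.walk does not descend any further from here.
            names = list(dirs)
            dirs.clear()
            yield root, names, files
        else:
            yield root, dirs, files
```

walk_max_depth(path, 1) reproduces the single-level walk above: the caller still sees each first-level directory's subdirectory names, but os.walk never enters them.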
