Getting subdirectories and files using Python

What would be the best way to get the subdirectories of a drive, including the files located within them? Would it be best to use os.listdir() and separate directories from files by checking whether the name contains a '.'?
Any ideas would be helpful. I would much prefer to use only the standard library for this task.

Take a look at os.walk(). It lets you visit each directory and gives you a list of files and a list of subdirectories for each directory visited.
Here is how you could go down only a single level:
import os
for root, dirs, files in os.walk(path):
    # do whatever you want to with dirs and files
    if root != path:
        # one level down, modify dirs in place so we don't go any deeper
        del dirs[:]
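As a rough sketch of the full traversal the question asks about, the following collects every subdirectory and file under a starting path using only os.walk(). The function name list_tree and the returned pair of lists are assumptions made for illustration; they are not part of the answer above.

```python
import os


def list_tree(top):
    """Return (directories, files) found under top, as full paths."""
    directories = []
    files = []
    for root, dirnames, filenames in os.walk(top):
        # Names from os.walk() carry no path component, so join them with root.
        directories.extend(os.path.join(root, d) for d in dirnames)
        files.extend(os.path.join(root, f) for f in filenames)
    return directories, files
```

This avoids the '.' check from the question, which is unreliable because directory names may contain dots and file names may have no extension.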

Related

Getting the absolute paths of all files in a folder, without traversing the subfolders

Let
my_dir = "/raid/user/my_dir"
be a folder on my filesystem, which is not the current folder (i.e., it's not the result of os.getcwd()). I want to retrieve the absolute paths of all files at the first level of hierarchy in my_dir (i.e., the absolute paths of all files which are in my_dir, but not in a subfolder of my_dir) as a list of strings absolute_paths. I need it, in order to later delete those files with os.remove().
This is nearly the same use case as
Get absolute paths of all files in a directory
but the difference is that I don't want to traverse the folder hierarchy: I only need the files at the first level of hierarchy (at depth 0? not sure about terminology here).
It's easy to adapt that solution: Call os.walk() just once, and don't let it continue:
root, dirs, files = next(os.walk(my_dir, topdown=True))
files = [ os.path.join(root, f) for f in files ]
print(files)
You can use the os.path module and a list comprehension.
import os
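# join each name with my_dir first; a bare name would be resolved against the current directory
absolute_paths = [os.path.abspath(os.path.join(my_dir, f)) for f in os.listdir(my_dir) if os.path.isfile(os.path.join(my_dir, f))]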
You can use os.scandir(), which returns an iterator of os.DirEntry objects. Each entry offers a variety of methods, including ones that distinguish files from directories.
with os.scandir(somePath) as it:
    paths = [entry.path for entry in it if entry.is_file()]
print(paths)
If you want to list directories as well, simply remove the condition from the list comprehension.
The documentation also has this note under os.listdir():
See also The scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.
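To tie the scandir() approach back to the original goal (absolute paths of first-level files only), here is a minimal sketch. The function name first_level_files is an assumption made for illustration.

```python
import os


def first_level_files(directory):
    """Return absolute paths of the regular files directly inside directory."""
    with os.scandir(directory) as entries:
        # entry.path already includes directory, so abspath() only has to
        # resolve it against the current directory when directory is relative.
        return [os.path.abspath(entry.path) for entry in entries if entry.is_file()]
```

The resulting list could then be passed, one path at a time, to os.remove(). Note that entry.is_file() follows symlinks by default, so a symlink pointing to a regular file is included.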

os.walk() returning '/' instead of actual folder name

I know there are a lot of questions related to this, but I can't seem to find an answer that helps me solve the problem.
I'm using os.walk() to loop through subfolders in my main folder, which contains both folders and files.
Main Folder
    Pass Folder
        files.txt
    Fail Folder
        files.txt
    file.txt
    file2.txt
So I'm using this code to create a new text file based on the subfolder names. However this returns folder/.txt, which means that dirs is returning '/' and files is returning ['file.txt', 'file2.txt'].
for root, dirs, files in os.walk(path):
    for dirs in root:
        new_txt = 'folder%s.txt' % (dirs)
How do I fix it so that dirs returns ['Main Folder/Pass Folder', 'Main Folder/Fail Folder'] and files returns the files in each folder?
I used something similar to this in my code recently (which, if I recall correctly, I also found on SO). Mine went something like this:
for (dirpath, subdirs, filelist) in os.walk(folder):
    # join each subdirectory name with its parent path
    full_paths = [os.path.join(dirpath, d) for d in subdirs]
From the documentation:
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
I'm not sure os.walk() does quite what you expect. I would suggest joining the directories together using os.path.join() to get what you want.
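A minimal sketch of that suggestion, applied to the question's goal of building one text-file name per subfolder, might look like the following. The function name subfolder_report_names and the dictionary return value are assumptions for illustration; the 'folder%s.txt' pattern is taken from the question.

```python
import os


def subfolder_report_names(top):
    """Map each subfolder's full path to a text-file name based on its name."""
    names = {}
    for dirpath, dirnames, filenames in os.walk(top):
        # Iterate over dirnames (a list of names), not over dirpath (a string).
        for dirname in dirnames:
            full_path = os.path.join(dirpath, dirname)
            names[full_path] = 'folder%s.txt' % dirname
    return names
```

The question's loop, for dirs in root, walks over the characters of the path string, which is why a lone '/' showed up in the output.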

recursive script to rename folders ending with a space or period

We just switched our storage server over to a new file system. The old file system allowed users to name folders with a period or space at the end. The new system considers this an illegal character. How can I write a Python script to recursively loop through all directories and rename any folder that has a period or space at the end?
Use os.walk. Give it a root directory path and it will recursively iterate over it. Do something like
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        if dir.endswith(' ') or dir.endswith('.'):
            os.rename(...)
EDIT:
We should actually rename the leaf directories first - here is the workaround:
alldirs = []
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        alldirs.append(os.path.join(root, dir))
# the following two lines make sure that leaf directories are renamed first
alldirs.sort()
alldirs.reverse()
for dir in alldirs:
    if ...:
        os.rename(...)
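An alternative to sorting and reversing is to let os.walk() visit leaf directories first by passing topdown=False. The sketch below is one possible way to do that, not the answerer's code. The function name, the bad_chars parameter, the choice to strip every trailing space or period with rstrip(), and the decision to skip name clashes are all assumptions.

```python
import os


def rename_trailing_bottom_up(top, bad_chars=" ."):
    """Rename directories under top whose names end with a character in bad_chars."""
    renamed = []
    # topdown=False yields a directory only after all of its subdirectories,
    # so a rename never invalidates a path that is still waiting to be visited.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in dirs:
            new_name = name.rstrip(bad_chars)
            if new_name == name or not new_name:
                continue  # nothing to strip, or the name would become empty
            old_path = os.path.join(root, name)
            new_path = os.path.join(root, new_name)
            if os.path.exists(new_path):
                continue  # assumption: skip rather than overwrite on a name clash
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Returning the list of (old, new) pairs makes it easy to log what was changed, or to do a dry run first by commenting out the os.rename() call.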
You can use os.listdir to list the folders and files on some path. This returns a list that you can iterate through. For each list entry, use os.path.join to combine the file/folder name with the parent path and then use os.path.isdir to check if it is a folder. If it is a folder then check the last character's validity and, if it is invalid, change the folder name using os.rename. Once the folder name has been corrected, you can repeat the whole process with that folder's full path as the base path. I would put the whole process into a recursive function.
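The recursive approach described above could be sketched roughly as follows. The function name fix_folder_names, the bad_chars parameter, the use of rstrip() to compute the corrected name, and the handling of symlinks and name clashes are assumptions; the description leaves those details open.

```python
import os


def fix_folder_names(base_path, bad_chars=" ."):
    """Recursively rename folders under base_path that end with a bad character."""
    for name in os.listdir(base_path):
        full_path = os.path.join(base_path, name)
        # Skip files, and skip symlinks so the recursion stays inside base_path.
        if not os.path.isdir(full_path) or os.path.islink(full_path):
            continue
        new_name = name.rstrip(bad_chars)
        new_path = os.path.join(base_path, new_name)
        # Assumption: leave the folder alone if the corrected name is empty or taken.
        if new_name and new_name != name and not os.path.exists(new_path):
            os.rename(full_path, new_path)
            full_path = new_path
        # Recurse with the (possibly corrected) path as the new base path.
        fix_folder_names(full_path, bad_chars)
```

Because each folder is renamed before the function descends into it, the recursion always works with a path that exists on disk.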

Get symlink using os.walk

I'm using os.walk to traverse my directories. The problem is that I want to recognize whether a file is a symbolic link without following the link. This code:
for root, dirs, files in os.walk(PROJECT_PATH):
    for f in files:
        # I want os.path.islink(f) to return true for symlink here
        # instead of ignoring them by default
        pass
will not give me symlinks, while this code
for root, dirs, files in os.walk(PROJECT_PATH, followlinks=True):
    for f in files:
        pass
will walk the directories that the symlinks point to but doesn't give me the symlinks themselves. Thanks.
os.walk() does give you symlinks. There are three things to take into account:
1. os.path.islink(f) is incorrect: you have to call os.path.islink on os.path.join(root, f).
2. Symlinks that point to directories will be included in dirs (but not followed, unless you also specify followlinks=True, which you don't need to do, since you don't need to actually follow them).
3. Symlinks that point to non-directories will be included in files.
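Putting those three points together, a minimal sketch for collecting symlinks during a walk might look like this. The function name find_symlinks is an assumption made for illustration.

```python
import os


def find_symlinks(top):
    """Return the full paths of all symlinks found while walking top."""
    links = []
    for root, dirs, files in os.walk(top):  # followlinks defaults to False
        # Symlinks to directories show up in dirs, all other symlinks in files.
        for name in dirs + files:
            full_path = os.path.join(root, name)
            if os.path.islink(full_path):
                links.append(full_path)
    return links
```

Calling find_symlinks(PROJECT_PATH) would then report the links themselves, without descending into the directories they point to.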

How to list only regular files (excluding directories) under a directory in Python

One can use os.listdir('somedir') to get all the files under somedir. However, what I want is just the regular files (excluding directories), like the result of find . -type f in a shell.
I know one can use [path for path in os.listdir('somedir') if not os.path.isdir('somedir/'+path)] to achieve a similar result, as in this related question: How to list only top level directories in Python? I am just wondering whether there is a more succinct way to do so.
You could use os.walk, which yields a tuple of path, folders and files for each directory it visits. Taking only the first tuple gives you the top-level files:
files = next(os.walk('somedir'))[2]
I have a couple of ways that I do such tasks. I cannot comment on how succinct they are. FWIW, here they are:
1. The code below picks up all files that end with .txt. You may want to remove the .endswith() check.
import os
for root, dirs, files in os.walk('./'):  # current directory in terminal
    for file in files:
        if file.endswith('.txt'):
            # here you can do whatever you want to with the file
            print(os.path.join(root, file))
2. This code assumes that the path is passed to the function. It appends all .txt files to a list and, if there are subdirectories in the path, also appends the files found in those subdirectories to subfiles.
def readFilesNameList(path):
    basePath = path
    allfiles = []
    subfiles = []
    for root, dirs, files in os.walk(basePath):
        for f in files:
            if f.endswith('.txt'):
                allfiles.append(os.path.join(root, f))
                if root != basePath:
                    # this .txt file lives in a subdirectory
                    subfiles.append(os.path.join(root, f))
    return allfiles, subfiles
I know the code is just skeletal in nature, but I think you get the general picture.
Post if you find a more succinct way! :)
The earlier os.walk answer is perfect if you only want the files in the top-level directory. If you want subdirectories' files too, though (a la find), you need to process each directory, e.g.:
def find_files(path):
    for prefix, _, files in os.walk(path):
        for name in files:
            yield os.path.join(prefix, name)
Now list(find_files('.')) is a list of the same thing find . -type f -print would have given you (the list is because find_files is a generator, in case that's not obvious).
