I know the question of how to list all sub-directories in a given directory was answered in this question from 2011. It includes this accepted solution:
subdirs = [x[0] for x in os.walk(dirToSearch)]
That works fine when there are only a few files in the directory. However, I am trying to use this on folders that contain thousands of files, and os.walk apparently iterates over all of them, meaning it takes a really long time to run. Is there a way to do this (identify all subdirectories) without getting bogged down by the files? An alternative to os.walk that ignores files?
I'm trying to do this on a Windows network directory.
You can use pathlib for this.
This will get all immediate subdirectories:
from pathlib import Path
p = Path('.')
subdirs = [x for x in p.iterdir() if x.is_dir()]
This will get all nested subdirectories:
for subdir in p.glob('**/'):
    print(subdir.name)
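If pathlib is still slow on a large network directory, os.scandir (Python 3.5+) is worth trying: on Windows it can usually tell directories from files using information the directory scan already returned, so files are skipped cheaply. A minimal recursive sketch (the function name is mine):

import os

def list_subdirs(path):
    # recursively collect subdirectory paths without stat-ing every file
    subdirs = []
    with os.scandir(path) as it:
        for entry in it:
            # is_dir() typically reuses data fetched during the scan
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
                subdirs.extend(list_subdirs(entry.path))
    return subdirs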
Let
my_dir = "/raid/user/my_dir"
be a folder on my filesystem, which is not the current folder (i.e., it's not the result of os.getcwd()). I want to retrieve the absolute paths of all files at the first level of hierarchy in my_dir (i.e., the absolute paths of all files which are in my_dir, but not in a subfolder of my_dir) as a list of strings absolute_paths. I need it, in order to later delete those files with os.remove().
This is nearly the same use case as
Get absolute paths of all files in a directory
but the difference is that I don't want to traverse the folder hierarchy: I only need the files at the first level of hierarchy (at depth 0? not sure about terminology here).
It's easy to adapt that solution: Call os.walk() just once, and don't let it continue:
import os

root, dirs, files = next(os.walk(my_dir, topdown=True))
files = [os.path.join(root, f) for f in files]
print(files)
You can use the os.path module and a list comprehension.
import os
# join with my_dir so the checks don't resolve names against the current directory
absolute_paths = [os.path.join(my_dir, f) for f in os.listdir(my_dir)
                  if os.path.isfile(os.path.join(my_dir, f))]
You can use os.scandir, which yields os.DirEntry objects with a variety of options, including the ability to distinguish files from directories.
import os

with os.scandir(somePath) as it:
    paths = [entry.path for entry in it if entry.is_file()]
    print(paths)
If you want to list directories as well, simply remove the condition from the list comprehension.
The documentation also has this note under os.listdir:
See also The scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.
I have a repository folder in which I have 100 folders of images. I want to iterate over each folder and then do the same over the images inside these folders.
For example: repository --> folder1 --> folder1_images, folder2 --> folder2_images, folder3 --> folder3_images
Does anyone know an elegant way of doing this?
P.S. My OS is macOS (so there are .DS_Store metadata files inside).
You can use os.walk to visit every subdirectory, recursively. Here's a general starting point:

import os

parent_dir = '/home/example/folder/'

for subdir, dirs, files in os.walk(parent_dir):
    for file in files:
        print(os.path.join(subdir, file))
Instead of print, you can do whatever you want, such as checking whether each file is an image, as required here.
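For instance, here is a minimal sketch that skips the macOS .DS_Store metadata files mentioned in the question and keeps only files with common image extensions; the extension set and parent_dir path are assumptions for illustration:

import os

parent_dir = '/home/example/folder/'  # placeholder path
image_exts = {'.jpg', '.jpeg', '.png', '.gif'}  # assumed set of extensions

for subdir, dirs, files in os.walk(parent_dir):
    for file in files:
        if file == '.DS_Store':
            continue  # skip macOS metadata files
        if os.path.splitext(file)[1].lower() in image_exts:
            print(os.path.join(subdir, file))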
Have a look at os.walk, which is meant for exactly this: looping through sub-directories and the files in them.
More info at : https://www.tutorialspoint.com/python/os_walk.htm
Everyone has covered how to iterate through directories recursively, but if you don't need to recurse and just want to iterate over the subdirectories of the current folder, you could do something like this:
import os

dirs = list(filter(os.path.isdir, os.listdir(".")))
Annoyingly, os.listdir doesn't list only directories; it lists files as well, so you have to apply os.path.isdir() to keep an element only if it is a directory and not a file.
(Tested with Python 3.10.6)
I'd like to find the full path of any given file, but when I tried to use
os.path.abspath("file")
it only gave me the file's location as if it were in the directory where the program is running. Does anyone know why this is, or how I can get the true path of the file?
What you are looking to accomplish here is ultimately a search of your filesystem. This does not work out too well, because you may well have multiple files with the same name, so you cannot know with certainty whether the first match you get is in fact the file that you want.
I will give you an example of how you can start yourself off with something simple that will allow you to traverse through directories and search.
You will have to give some kind of base path to initiate the search for the path where this file resides. Keep in mind that the broader you are, the more expensive your searching is going to be.
You can do this with the os.walk function.
Here is a simple example of using os.walk. What it does is collect all your file paths with matching filenames.
Using os.walk
from os import walk
from os.path import join

d = 'some_file.txt'

paths = []
for i in walk('/some/base_path'):
    if d in i[2]:
        paths.append(join(i[0], d))
So, for each iteration over os.walk you are going to get a tuple that holds:
(path, directories, files)
That is why I check i[2] to look at the files. Then I join with i[0], which is the path, to put together the full filepath.
Finally, you can put the above code all into one line:
paths = [join(i[0], d) for i in walk('/some/base_path') if d in i[2]]
I am new to Python. I have the following piece of code, which works well at retrieving selected directories into a list for me. But because there are quite a lot of sub-directories and files, the code is rather slow compared to the Perl code I ported it from.
import re
import os

foundarr = []
allpaths = ["X:\\Storage", "Y:\\Storage"]

for path in allpaths:
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            if re.match(r"[DILMPY]\d{8}", dir):
                foundarr.append(os.path.join(root, dir))
                break
My question: is there a way to recurse through ONLY a selected level of directories using os.walk? Or somehow prune the ones I do not want to recurse through? I have added the break in the for loop assuming it will stop after it finds my selected dir and move on, but I don't think this helps, as it still has to go through thousands of sub-directories and files.
In the Perl code a simple $File::Find::prune = 1 if /[DILMPY]\d{8}$/; prevents the compiler from recursing through the rest of the sub-directories and files.
If the depth is fixed, using glob is a good idea. As per this SO post, you can set the depth of traversal using glob.
import glob
import os.path

# '*/*' matches entries exactly two levels deep
depth2 = [f for f in glob.glob('*/*') if os.path.isdir(f)]
This will list all subdirectories with a depth of 2.
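Alternatively, if you want the os.walk equivalent of Perl's $File::Find::prune, you can edit the dirs list in place (with the default topdown=True) and os.walk will not descend into the entries you remove. A sketch adapted to the pattern from the question above:

import os
import re

pattern = re.compile(r"[DILMPY]\d{8}")
foundarr = []

for root, dirs, files in os.walk("X:\\Storage"):
    matched = [d for d in dirs if pattern.match(d)]
    foundarr.extend(os.path.join(root, d) for d in matched)
    # pruning requires modifying dirs in place, hence the slice assignment
    dirs[:] = [d for d in dirs if d not in matched]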
I need to know if pathA is a subset of, or is contained within, pathB.
I'm making a little script that will walk some old volumes and find duplicate files. My general approach (and even if it's a bad one for its inefficiency, it's just for me and it works, so I'm OK with the brute-forceness of it) has been:
Map all the files to a log
Create a hash for all the files in the log
Sort the hash list for duplicates
Move the duplicates somewhere for inspection prior to deletion
I want to be able to exclude certain directories, though (ie. System files). This is what I've written:
# self.search_dir = top level directory to be searched for duplicates
# self.mfl = master_file_list, being built by this func, a list of all files in search_dir
# self.no_crawl_list = list of files and directories to be excluded from the search
def build_master_file_list(self):
    for root, directories, files in os.walk(self.search_dir):
        files = [f for f in files if not f[0] == '.']
        directories[:] = [d for d in directories if not d[0] == '.']
        for filename in files:
            filepath = os.path.join(root, filename)
            if [root, filepath] in self.no_crawl_list:
                pass
            else:
                self.mfl.write(filepath + "\n")
    self.mfl.close()
But I'm pretty sure this isn't going to do what I'd intended. My goal is to have all subdirectories of anything in self.no_crawl_list excluded as well, such that if /path/to/excluded_dir is added to self.no_crawl_list, then paths like /path/to/excluded_dir/sub_dir/implicitly_excluded_file.txt will be skipped as well. I think my code is currently being entirely literal about what to skip. Short of exploding the path parts and comparing them to every possible combination in self.no_crawl_list, however, I don't know how to do this. 'Lil help? :)
Thanks to the assistance of Lukas Graf in the comments above, I was able to build this, and it works like a charm:
def is_subpath(self, path, of_paths):
    # accept a single path or a list of paths
    if isinstance(of_paths, str):  # basestring in Python 2
        of_paths = [of_paths]
    abs_of_paths = [os.path.abspath(of_path) for of_path in of_paths]
    return any(os.path.abspath(path).startswith(subpath) for subpath in abs_of_paths)
Also, this currently doesn't account for symlinks and assumes a UNIX filesystem; see the comments on the original question for advice on extending this.
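For a quick sanity check, here is a standalone version of the same function with made-up paths:

import os

def is_subpath(path, of_paths):
    # standalone copy of the method above
    if isinstance(of_paths, str):
        of_paths = [of_paths]
    abs_of_paths = [os.path.abspath(p) for p in of_paths]
    return any(os.path.abspath(path).startswith(sp) for sp in abs_of_paths)

no_crawl_list = ['/path/to/excluded_dir']
print(is_subpath('/path/to/excluded_dir/sub_dir/implicitly_excluded_file.txt', no_crawl_list))  # True
print(is_subpath('/path/to/other_dir/file.txt', no_crawl_list))  # False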