Exclude specific folders and subfolders in os.walk - python

This lists all the files with a .txt extension under the current directory:
L = [txt for f in os.walk('.')
     for txt in glob(os.path.join(f[0], '*.txt'))]
I want to skip one specific directory and its subdirectories. Let's say I do not want to descend into folder3 or any of its subdirectories when collecting the .txt files. I tried the following:
d = list(filter(lambda x: x != 'folder3', next(os.walk('.'))[1]))
but I could not figure out the next steps. How can I make the two work together?
EDIT:
I tried following the link given as an already-answered question, but I cannot get the desired output with the code below. Surprisingly, I get an empty list for a:
a = []
for root, dirs, files in os.walk('.'):
    dirs[:] = list(filter(lambda x: x != 'folder3', dirs))
    for txt in glob(os.path.join(file[0], '*.txt')):
        a.append(txt)

The following solution seems to work: any directory named in the exclude set is skipped, and only files whose extension is in the extensions set are kept.
import os
exclude = set(['folder3'])
extensions = set(['.txt', '.dat'])
for root, dirs, files in os.walk('c:/temp/folder', topdown=True):
    # slice assignment prunes the list in place, so os.walk skips these folders
    dirs[:] = [d for d in dirs if d not in exclude]
    files = [file for file in files if os.path.splitext(file)[1] in extensions]
    for fname in files:
        print(fname)
This code relies on topdown=True (which is also the default) so that the list of directory names can be modified in place, as specified in the docs:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search
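To tie this back to the question (full paths of the .txt files, skipping folder3 and everything below it), the same pruning idiom can be wrapped in a small helper. This is only a sketch: the function name find_files and its parameters are made up for illustration and are not part of the answer above.

```python
import os


def find_files(top, exclude_dirs=("folder3",), extensions=(".txt",)):
    """Return full paths under `top` whose extension is in `extensions`,
    skipping every directory whose name is in `exclude_dirs`."""
    exclude = set(exclude_dirs)
    wanted = {ext.lower() for ext in extensions}
    matches = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list os.walk is holding, so the
        # excluded directories are never descended into.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            if os.path.splitext(name)[1].lower() in wanted:
                matches.append(os.path.join(root, name))
    return matches
```

Calling find_files('.') would then play the role of the list L from the question. Note that the match is on the directory name, so a folder3 at any depth is skipped, not only the one directly under the starting folder.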

Related

Python: Searching a directory and subdirectories for a certain file type

I have a folder that contains a few files of different types (.cpp, .hpp, .ipp ...) and in that folder are multiple sub-folders which also contain these different file types. My question is, is there a single loop that I can make that will search the first main folder and return a list full of all the .cpp files from either folder? So far, I know that:
folder_list = [f for f in os.listdir(os.getcwd()) if os.path.isdir(f)]
will return a list of the sub-folders, and then I can change the working directory and get the files list to append.
I also know that:
file_list = [f for f in listdir(os.getcwd()) if isfile(join(os.getcwd(), f))]
will return a list of the files.
However, I won't know the names of these sub-folders (and therefore the directory) beforehand. Thank you for any help
Just use the pathlib.Path.rglob function:
from pathlib import Path
list(Path(".").rglob("*.cpp"))
You can do it using os.listdir and str.endswith, which checks the characters at the end of a string:
import os
filetypes = ['cpp', 'hpp', 'ipp']
dir = "target directory"
files = [[f for f in os.listdir(dir) if f.endswith(type_)] for type_ in filetypes]
This will result in a list of lists, where each inner list holds the files of one specific type. Note that os.listdir only looks at the target directory itself and does not descend into sub-folders.
I think what you're looking for is os.walk():
import os
filetypes = ('.cpp', '.hpp', '.ipp')
# dir is the target directory, as in the answer above
for current_folder, subfolders, files in os.walk(dir):
    # str.endswith accepts a tuple of suffixes
    files = [f for f in files if f.endswith(filetypes)]
More info on os.walk() here: https://www.tutorialspoint.com/python/os_walk.htm
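Since the question asks for a single list covering the main folder and all of its sub-folders, the loop above can accumulate full paths as it goes. A minimal sketch, assuming a helper name (collect_by_extension) and default extensions chosen here purely for illustration:

```python
import os


def collect_by_extension(top, filetypes=(".cpp", ".hpp", ".ipp")):
    """Return the paths of all files under `top` ending with one of `filetypes`."""
    suffixes = tuple(filetypes)  # str.endswith needs a tuple, not a list
    found = []
    for current_folder, _subfolders, files in os.walk(top):
        for name in files:
            if name.endswith(suffixes):
                # join with the folder being visited to get a usable path
                found.append(os.path.join(current_folder, name))
    return found
```

For only the .cpp files, something like collect_by_extension(os.getcwd(), ['.cpp']) should do; the sub-folder names never need to be known in advance because os.walk discovers them.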

How to separate specific files from sub-folders?

I have 5 folders, each containing different files, and I have to exclude files that start with a specific string. The code I have written to walk the directory and its sub-folders, then read and sort the files, is below, but it fails to exclude the files.
yourpath = r'C:\Users\Hasan\Desktop\output\new our scenario\beta 15\test'
import os
import numpy as np
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        #print(os.path.join(root, name))
        CASES = [f for f in sorted(os.path.join(root,name)) if f.startswith('config')] #To find the files
        maxnum = np.max([int(os.path.splitext(f)[0].split('_')[1]) for f in CASES]) #Sorting based on numbers
        CASES= ['configuration_%d.out' % i for i in range(maxnum)] #Reading sorted files
        ## Doing My computations
I am kinda confused by this line:
CASES = [f for f in sorted(os.path.join(root,name)) if f.startswith('config')] #To find the files
Are you trying to find files in the directory which are starting with 'config' and adding them to the list 'CASES'?
Because then the logic is a little bit off. You are building a full path with os.path.join and then checking whether that full path ('C:...') starts with 'config'. On top of that, sorted() applied to a string returns a list of its individual characters, so f iterates over single characters rather than file names.
You could simply say (with CASES = [] set before the loop):
if name.startswith('config'):
    CASES.append(name)
or
CASES.append(name) if name.startswith('config') else None
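Putting the pieces together, one possible shape for the whole loop is sketched below. It assumes the file names follow the configuration_<number>.out pattern that the question's code implies; the function name config_files_by_folder and the per-folder dictionary are illustrative choices, not something from the original post.

```python
import os


def config_files_by_folder(top, prefix="config"):
    """Map each folder under `top` to its files starting with `prefix`,
    ordered by the integer that follows the underscore."""

    def number(name):
        # Assumed naming scheme: 'configuration_12.out' -> 12
        return int(os.path.splitext(name)[0].split("_")[1])

    result = {}
    for root, _dirs, files in os.walk(top):
        # Test the bare file name, not the joined path
        cases = [name for name in files if name.startswith(prefix)]
        if cases:
            result[root] = sorted(cases, key=number)
    return result
```

Sorting with a numeric key puts configuration_2.out before configuration_10.out, which a plain string sort would not. A matching file without an underscore and number would raise an error in number(), so the pattern assumption matters.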

Sort all files in folder and subfolder according to file basenames

I have one root folder containing two sub-folders, and I want to sort all the files under the root folder by file name, irrespective of the sub-folder names.
Below is the code I tried, but the problem is that it sorts according to the name of the sub-folder the files are in:
.../verify/AU1/APPLaunch_ftrace_au1.txt,
.../verify/AU1/Mp3BT_ftrace_au1.txt,
.../verify/AU2/APPLaunch_ftrace_au2.txt,
.../verify/AU2/Mp3BT_ftrace_au2.txt
files_list = []
for root, dirs, files in os.walk(trace_folder, topdown = False):
    files_list.extend(join(root,f) for f in files)
files_list.sort()
What I would like to have is:
.../verify/AU1/APPLaunch_ftrace_au1.txt,
.../verify/AU2/APPLaunch_ftrace_au2.txt,
.../verify/AU1/Mp3BT_ftrace_au1.txt,
.../verify/AU2/Mp3BT_ftrace_au2.txt
Just pass a sort key that considers only the basename of each file:
files_list.sort(key=os.path.basename)
If you want to ignore case, that's also doable:
files_list.sort(key=lambda x : os.path.basename(x).lower())
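As a quick check of the idea, here is a small sketch using shortened stand-ins for the paths in the question (the leading part of each path is elided there, so these relative paths are an assumption). The full path is added as a tie-breaker, which is an extra not present in the answer, so files sharing a basename come out in a predictable order.

```python
import os

# Shortened sample paths standing in for the ones listed in the question
files_list = [
    "verify/AU1/APPLaunch_ftrace_au1.txt",
    "verify/AU1/Mp3BT_ftrace_au1.txt",
    "verify/AU2/APPLaunch_ftrace_au2.txt",
    "verify/AU2/Mp3BT_ftrace_au2.txt",
]

# Sort on the lower-cased basename first, then on the full path to break ties
sorted_by_name = sorted(
    files_list,
    key=lambda p: (os.path.basename(p).lower(), p),
)

for path in sorted_by_name:
    print(path)
```

The two APPLaunch files end up first and the two Mp3BT files last, regardless of which AU folder they live in.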

In python, how to get the path to all the files in a directory, including files in subdirectories, but excluding path to subdirectories

I have a directory containing folders and subfolders. At the end of each path there are files. I want to make a txt file containing the path to all the files, but excluding the path to folders.
I tried this suggestion from Getting a list of all subdirectories in the current directory, and my code looks like this:
import os
myDir = '/path/somewhere'
print([x[0] for x in os.walk(myDir)])
And it gives the path of all elements (files AND folders), but I want only the paths to the files. Any ideas for it?
os.walk(path) yields a 3-tuple for each directory it visits: the directory path, the names of its subdirectories, and the names of its files.
So you can do this:
for dir, subdir, files in os.walk(path):
    for file in files:
        print(os.path.join(dir, file))
The os.walk method gives you dirs, subdirs and files in each iteration, so when you are looping through os.walk, you will have to then iterate over the files and combine each file with "dir".
In order to perform this combination, what you want to do is do an os.path.join between the directory and the files.
Here is a simple example to help illustrate how traversing with os.walk works:
from os import walk
from os.path import join
# specify in your loop in order dir, subdirectory, file for each level
for dir, subdir, files in walk('path'):
    # iterate over each file
    for file in files:
        # join will put together the directory and the file
        print(join(dir, file))
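The question also mentions saving the result to a txt file. A minimal sketch of that step, assuming a helper name (write_file_paths) and a one-path-per-line layout that are illustrative only:

```python
import os


def write_file_paths(top, output_file):
    """Write the path of every file under `top` to `output_file`,
    one per line, and return how many paths were written."""
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(top):
            # filenames holds only files, so folder paths are never written
            for filename in filenames:
                out.write(os.path.join(dirpath, filename) + "\n")
                count += 1
    return count
```

It is probably best to keep the output file outside the folder being walked; otherwise the output file itself may show up in the listing.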
If you just want the paths, then add a filter to your list comprehension as follows:
import os
myDir = '/path/somewhere'
print([dirpath for dirpath, dirnames, filenames in os.walk(myDir) if filenames])
This would then only add the path for folders which contain files.
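A generator-based alternative that yields only file paths and accepts an optional depth limit:
import os
def get_paths(path, depth=None):
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            yield full_path
        else:
            # depth=None means no limit; depth=0 stays in the top folder
            d = depth - 1 if depth is not None else None
            if d is None or d >= 0:
                # pass the remaining depth down so the limit applies at every level
                for sub_path in get_paths(full_path, d):
                    yield sub_path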

How to get list of subdirectories names [duplicate]

This question already has answers here:
How to get all of the immediate subdirectories in Python
(15 answers)
Closed 7 years ago.
There is a directory that contains folders as well as files of different formats.
import os
my_list = os.listdir('My_directory')
will return the names of all the files and folders in it. I can use, for example, the endswith('.txt') method to select just the text file names, but how do I get a list of just the folder names?
I usually check for directories, while assembling a list in one go. Assuming that there is a directory called foo, that I would like to check for sub-directories:
import os
output = [dI for dI in os.listdir('foo') if os.path.isdir(os.path.join('foo',dI))]
You can use os.walk() in various ways:
(1) To get the relative paths of subdirectories. Note that '.' refers to the current working directory, the same directory that os.getcwd() returns as an absolute path.
for i, j, y in os.walk('.'):
    print(i)
(2) To get the paths of subdirectories under a given path (these are full paths if 'path' is absolute):
for root, dirs, files in os.walk('path'):
    print(root)
(3) To get a list of subdirectory names:
dir_list = []
for root, dirs, files in os.walk(path):
    dir_list.extend(dirs)
print(dir_list)
(4) Another way is glob module (see this answer)
Just use os.path.isdir on the results returned by os.listdir, as in:
def listdirs(path):
    return [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]
This should work:
my_dirs = [d for d in os.listdir('My_directory') if os.path.isdir(os.path.join('My_directory', d))]
os.walk already splits files and folders up into different lists, and works recursively:
for root, dirs, _ in os.walk('.'):
    for d in dirs:
        print(os.path.join(root, d))
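For just the immediate folder names, os.scandir is another option that none of the answers above use. It is swapped in here as an alternative to os.listdir plus os.path.isdir, since each entry already knows whether it is a directory. A small sketch, with the helper name immediate_subdirs chosen for illustration:

```python
import os


def immediate_subdirs(path):
    """Return the sorted names of the directories directly inside `path`."""
    with os.scandir(path) as entries:
        # entry.is_dir() is True for directories (and symlinks pointing to them)
        return sorted(entry.name for entry in entries if entry.is_dir())
```

Something like immediate_subdirs('My_directory') would then give the folder names only, without descending any further.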
