How to separate specific files from sub-folders? - python

I have 5 folders, each containing different files, and I need to exclude the files whose names start with a specific string. The code I have written to walk the directory and its sub-folders, then read and sort the files, is below, but it does not exclude those files.
yourpath = r'C:\Users\Hasan\Desktop\output\new our scenario\beta 15\test'
import os
import numpy as np
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        #print(os.path.join(root, name))
        CASES = [f for f in sorted(os.path.join(root,name)) if f.startswith('config')] #To find the files
        maxnum = np.max([int(os.path.splitext(f)[0].split('_')[1]) for f in CASES]) #Sorting based on numbers
        CASES= ['configuration_%d.out' % i for i in range(maxnum)] #Reading sorted files
        ## Doing My computations

I am kinda confused by this line:
CASES = [f for f in sorted(os.path.join(root,name)) if f.startswith('config')] #To find the files
Are you trying to find the files in the directory whose names start with 'config' and add them to the list CASES?
If so, the logic is a little off. os.path.join builds the full path, so you are checking the full path 'C:...' rather than the file name. On top of that, sorted() applied to a string returns a sorted list of its individual characters, so no element can ever start with 'config' and CASES is always empty.
Create CASES as an empty list before the loop (CASES = []), and then you could simply say:
if name.startswith('config'):
    CASES.append(name)
or
CASES.append(name) if name.startswith('config') else None
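As a rough sketch of the whole task (not the asker's actual code), the snippet below tests the bare file name in each folder and then orders the matches by their numeric suffix. The helper name find_config_files, the regular expression, and the throwaway directory layout are assumptions made for illustration; it assumes the files are named configuration_<number>.out as in the question.

```python
import os
import re
import tempfile

def find_config_files(top):
    """Map each folder under `top` to its configuration_<n>.out files,
    ordered by the number <n> rather than alphabetically."""
    pattern = re.compile(r'^configuration_(\d+)\.out$')  # assumed naming scheme
    result = {}
    for root, dirs, files in os.walk(top):
        numbered = []
        for name in files:
            # Match against the file name itself, not the joined path.
            match = pattern.match(name)
            if match:
                numbered.append((int(match.group(1)), name))
        if numbered:
            result[root] = [name for _, name in sorted(numbered)]
    return result

# Demonstration in a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, 'run1')
    os.mkdir(sub)
    for name in ('configuration_10.out', 'configuration_2.out', 'notes.txt'):
        open(os.path.join(sub, name), 'w').close()
    demo_names = find_config_files(top)[sub]
    print(demo_names)
```

Sorting on the integer keeps configuration_2.out ahead of configuration_10.out, which a plain alphabetical sort would not do.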

Related

Python loop through directories

I am trying to use python library os to loop through all my subdirectories in the root directory, and target specific file name and rename them.
Just to make it clear this is my tree structure
My python file is located at the root level.
What I am trying to do, is to target the directory 942ba loop through all the sub directories and locate the file 000000 and rename it to 000000.csv
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirectories.
Can anyone help me understand how to solve this issue?
A more concise way to iterate through the folders and select only the files you are looking for is below:
import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root, f)) for root, dirs, files in os.walk(source_folder) for f in files if '000000' in f and not f.endswith('.gz')]
for file in files:
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path to the files you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of certain type, or remove unwanted files from the selected files.
In the for loop, files are renamed adding the .csv extension.
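The symptom described in the question (part of the date disappearing) is consistent with how str.strip works: its argument is treated as a set of characters to remove from both ends, not as a prefix. The short sketch below illustrates this with made-up paths (the values of root and path are hypothetical) and shows os.path.relpath as one way to remove a leading directory as a whole. Inside an os.walk loop the directory path can usually be passed to os.path.join directly, so no stripping is needed at all.

```python
import os

root = 'data/942ba/'                  # hypothetical root, for illustration only
path = 'data/942ba/2020-01-01/batch'  # hypothetical directory under that root

# strip() removes any leading/trailing characters found in `root`,
# so the leading '2' of the date is eaten as well.
stripped = path.strip(root)
print(stripped)   # -> 020-01-01/batch

# relpath() removes the leading directory as one unit.
relative = os.path.relpath(path, root)
print(relative)
```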
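I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))
Note that this pattern only matches a file named 000000 directly inside that folder. To reach files in nested subdirectories, use a '**' component in the pattern and pass recursive=True to glob.glob.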
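For files that sit one or more levels below the target folder, a recursive glob is one possible approach. The following is a minimal sketch, assuming every file named 000000 under the folder should gain a .csv extension; the helper name add_csv_extension and the dated sub-folder names are invented for the demonstration.

```python
import glob
import os
import tempfile

def add_csv_extension(top, filename='000000'):
    """Rename every file called `filename` under `top` to `filename`.csv."""
    # '**' with recursive=True matches zero or more directory levels;
    # glob.escape protects special characters in the folder name itself.
    pattern = os.path.join(glob.escape(top), '**', filename)
    renamed = []
    for path in glob.glob(pattern, recursive=True):
        if os.path.isfile(path):
            os.rename(path, path + '.csv')
            renamed.append(path + '.csv')
    return renamed

# Demonstration in a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as top:
    for day in ('2020-01-01', '2020-01-02'):
        os.makedirs(os.path.join(top, day))
        open(os.path.join(top, day, '000000'), 'w').close()
    renamed = add_csv_extension(top)
    remaining = sorted(os.listdir(os.path.join(top, '2020-01-01')))
    print(remaining)
```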

Python: Searching a directory and subdirectories for a certain file type

I have a folder that contains a few files of different types (.cpp, .hpp, .ipp ...) and in that folder are multiple sub-folders which also contain these different file types. My question is, is there a single loop that I can make that will search the first main folder and return a list full of all the .cpp files from either folder? So far, I know that:
folder_list = [f for f in os.listdir(os.getcwd()) if os.path.isdir(f)]
will return a list of the sub-folders, and then I can change the working directory and get the files list to append.
I also know that:
file_list = [f for f in listdir(os.getcwd()) if isfile(join(os.getcwd(), f))]
will return a list of the files.
However, I won't know the names of these sub-folders (and therefore the directory) beforehand. Thank you for any help
Just use the pathlib.Path.rglob function
from pathlib import Path
list(Path(".").rglob("*.cpp" ))
You can do it using os.listdir and str.endswith, which tests the characters at the end of a string:
import os
filetypes = ['.cpp', '.hpp', '.ipp']
target_dir = "target directory"
files = [[f for f in os.listdir(target_dir) if f.endswith(type_)] for type_ in filetypes]
This results in a list of lists, where each inner list holds the files of one type. Note that os.listdir only looks at target_dir itself, not at its sub-folders.
I think what you're looking for is os.walk():
import os
filetypes = ('.cpp', '.hpp', '.ipp')
matches = []
for current_folder, subfolders, files in os.walk('.'):
    # str.endswith accepts a tuple of suffixes
    matches += [os.path.join(current_folder, f) for f in files if f.endswith(filetypes)]
More info on os.walk() here: https://www.tutorialspoint.com/python/os_walk.htm
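If the results are wanted per file type (as in the list-of-lists answer) but across all sub-folders, the two ideas can be combined. This is only a sketch: the helper name group_by_extension and the sample file names are assumptions, and it compares the extension returned by os.path.splitext against the wanted set.

```python
import os
import tempfile
from collections import defaultdict

def group_by_extension(top, extensions=('.cpp', '.hpp', '.ipp')):
    """Walk `top` recursively and group matching file paths by extension."""
    grouped = defaultdict(list)
    for current_folder, subfolders, files in os.walk(top):
        for name in files:
            ext = os.path.splitext(name)[1]
            if ext in extensions:
                grouped[ext].append(os.path.join(current_folder, name))
    return dict(grouped)

# Demonstration in a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'sub'))
    for parts in (('main.cpp',), ('README.md',), ('sub', 'util.cpp'), ('sub', 'util.hpp')):
        open(os.path.join(top, *parts), 'w').close()
    grouped = group_by_extension(top)
    counts = {ext: len(paths) for ext, paths in grouped.items()}
    print(counts)
```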

How to add specific files from a series of folders to an array?

So far I've managed to compile all of the files from a series of folders using the following:
path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
for sub in subfolders:
    for f in os.listdir(sub):
        print(f)
        files = [i for i in f if os.path.isfile(os.path.join(f,'*.txt')) and 'data' in f]
Where f prints out the names of all of the files. What I want to do is take only certain files from this (starts with 'data' and is a .txt file) and put these in an array called files. The last line in the above code is where I tried to do this but whenever I print files it's still an empty array. Any ideas where I'm going wrong and how to fix it?
Update
I've made some progress, I changed the last line to:
if os.path.isfile(os.path.join(sub,f)) and 'data' in f:
    files.append(f)
So I now have an array with the correct file names. The problem now is that there's a mix of .meta, .index and .txt files and I only want the .txt files. What's the best way to filter out the other types of files?
I would probably do it like this. Since f is the file name and is a string, you can use the string methods startswith() and endswith() to check that it starts with data and ends with .txt. If we find such a file, we append it to file_list. If you want the full path in file_list, append os.path.join(sub, f) instead of f.
import os
path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
file_list = []
for sub in subfolders:
    for f in os.listdir(sub):
        if (f.startswith("data") and f.endswith(".txt")):
            file_list.append(f)
print(file_list)
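A variant that collects full paths could look like the sketch below. It relies on os.scandir for both levels, so each entry already carries its path; the function name find_data_txt and the sample folder and file names are hypothetical.

```python
import os
import tempfile

def find_data_txt(path):
    """Return full paths of data*.txt files in the immediate sub-folders of `path`."""
    file_list = []
    for sub in os.scandir(path):
        if not sub.is_dir():
            continue
        for entry in os.scandir(sub.path):
            # Keep regular files that start with 'data' and end with '.txt'.
            if entry.is_file() and entry.name.startswith('data') and entry.name.endswith('.txt'):
                file_list.append(entry.path)
    return sorted(file_list)

# Demonstration in a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as top:
    layout = {
        'pulse1': ('data_01.txt', 'data_01.meta', 'data_01.index', 'other.txt'),
        'pulse2': ('data_02.txt',),
    }
    for folder, names in layout.items():
        os.mkdir(os.path.join(top, folder))
        for name in names:
            open(os.path.join(top, folder, name), 'w').close()
    found_names = [os.path.basename(p) for p in find_data_txt(top)]
    print(found_names)
```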

Exclude specific folders and subfolders in os.walk

List all the files with the .txt extension under the current directory:
L = [txt for f in os.walk('.')
     for txt in glob(os.path.join(file[0], '*.txt'))]
I want to skip the files from one specific directory and its subdirectories. Let's say I do not want to descend into folder3 and its subdirectories to get the .txt files. I tried the following:
d = list(filter(lambda x : x != 'folder3', next(os.walk('.'))[1]))
but I cannot figure out the next steps. How can I make the two work together?
EDIT:
I tried following the link to the already answered question, but I cannot get the desired output with the code below; surprisingly, I get an empty list for a.
a=[]
for root, dirs, files in os.walk('.'):
    dirs[:] = list(filter(lambda x : x != 'folder3', dirs))
    for txt in glob(os.path.join(file[0], '*.txt')):
        a.append(txt)
The following solution seems to work: any directory named in the exclude set is skipped, and only files whose extension is in the extensions set are kept.
import os
exclude = set(['folder3'])
extensions = set(['.txt', '.dat'])
for root, dirs, files in os.walk('c:/temp/folder', topdown=True):
    dirs[:] = [d for d in dirs if d not in exclude]
    files = [file for file in files if os.path.splitext(file)[1] in extensions]
    for fname in files:
        print(fname)
This code uses the option topdown=True to modify the list of dir names in place as specified in the docs:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search
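To see the pruning take effect, here is a small self-contained sketch that builds a throwaway tree and returns full paths instead of printing bare names. The function name list_text_files and the folder layout are assumptions for the demonstration; the slice assignment to dirs is the in-place modification described in the quoted documentation.

```python
import os
import tempfile

def list_text_files(top, exclude_dirs=('folder3',), extensions=('.txt',)):
    """Collect matching files under `top`, never descending into excluded folders."""
    matches = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment edits the list os.walk will recurse into.
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for name in files:
            if os.path.splitext(name)[1] in extensions:
                matches.append(os.path.join(root, name))
    return sorted(matches)

# Demonstration in a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as top:
    for rel in ('a.txt', 'folder1/b.txt', 'folder3/c.txt', 'folder3/deep/d.txt'):
        full = os.path.join(top, *rel.split('/'))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()
    kept = [os.path.relpath(p, top) for p in list_text_files(top)]
    print(kept)
```

Rebinding the name instead (dirs = [...]) would create a new list and leave the walk unaffected, which is why the slice assignment matters.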

In python, how to get the path to all the files in a directory, including files in subdirectories, but excluding path to subdirectories

I have a directory containing folders and subfolders. At the end of each path there are files. I want to make a txt file containing the path to all the files, but excluding the path to folders.
I tried this suggestion from Getting a list of all subdirectories in the current directory, and my code looks like this:
import os
myDir = '/path/somewhere'
print [x[0] for x in os.walk(myDir)]
And it gives the path of all elements (files AND folders), but I want only the paths to the files. Any ideas for it?
os.walk(path) yields a 3-tuple for every directory it visits: the directory path, the names of its sub-directories, and the names of its files.
So you can do it like this:
import os
for dir, subdir, files in os.walk(path):
    for file in files:
        print(os.path.join(dir, file))
The os.walk method gives you dirs, subdirs and files in each iteration, so when you are looping through os.walk, you will have to then iterate over the files and combine each file with "dir".
In order to perform this combination, what you want to do is do an os.path.join between the directory and the files.
Here is a simple example to help illustrate how traversing with os.walk works
from os import walk
from os.path import join
# specify in your loop in order dir, subdirectory, file for each level
for dir, subdir, files in walk('path'):
    # iterate over each file
    for file in files:
        # join will put together the directory and the file
        print(join(dir, file))
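Since the question asks for a text file containing the paths, the same loop can write each joined path to a file instead of printing it. This is a minimal sketch under that reading; the function name write_file_paths, the output file name, and the sample tree are hypothetical. The listing is written outside the walked tree so that it does not appear in its own output.

```python
import os
import tempfile

def write_file_paths(top, output_file):
    """Write the path of every file under `top` to `output_file`, one per line."""
    count = 0
    with open(output_file, 'w', encoding='utf-8') as out:
        for dirpath, dirnames, filenames in os.walk(top):
            for name in filenames:
                out.write(os.path.join(dirpath, name) + '\n')
                count += 1
    return count

# Demonstration with temporary directories (hypothetical layout).
with tempfile.TemporaryDirectory() as top, tempfile.TemporaryDirectory() as out_dir:
    os.makedirs(os.path.join(top, 'a', 'b'))
    open(os.path.join(top, 'a', 'one.dat'), 'w').close()
    open(os.path.join(top, 'a', 'b', 'two.dat'), 'w').close()
    listing = os.path.join(out_dir, 'files.txt')
    written = write_file_paths(top, listing)
    with open(listing, encoding='utf-8') as fh:
        listed = sorted(os.path.relpath(line.rstrip('\n'), top) for line in fh)
    print(listed)
```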
If you just want the paths of the folders that contain files, add a filter to your list comprehension as follows:
import os
myDir = '/path/somewhere'
print([dirpath for dirpath, dirnames, filenames in os.walk(myDir) if filenames])
This only includes the paths of folders which contain files.
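Alternatively, a recursive generator built on os.listdir can yield only the file paths, optionally limited to a given depth:
import os

def get_paths(path, depth=None):
    # depth limits how many levels of sub-folders are entered; None means no limit
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            yield full_path
        elif os.path.isdir(full_path):
            d = depth - 1 if depth is not None else None
            if d is None or d >= 0:
                # pass the remaining depth on to the recursive call
                for sub_path in get_paths(full_path, d):
                    yield sub_path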
