How can I make an array of subdirectories in Python? - python

I want to make an array of subdirectories in Python. Here is an example layout and model list I would like to obtain.
Root
|
directories
/ \
subdir_1... subdir_n
Hence from the root, I would like to run a program that will make a list of all the subdirectories. That way if I were to write:
print(List_of_Subdirectories)
Where List_of_Subdirectories is the list of appended directories. I would obtain the output:
[subdir_1, subdir_2, ... , subdir_n]
In essence, I would like to achieve the same results as if I were to hard code every directory into a list. For example:
List_of_Subdirectories = ["subdir_1", "subdir_2", ... , "subdir_n"]
Where subdir_n denotes an arbitrary nth directory.
Unlike other posts here on Stack Overflow, I would like the list to contain just the directory names, without tuples or paths.

If you just want the directory names, you can use os.walk to do this:
os.walk(directory)
will yield a 3-tuple (dirpath, dirnames, filenames) for each directory in the tree. The second entry in the 3-tuple is the list of subdirectory names inside that directory. You can wrap this in a function that returns a flat list of directory names like so:
import os
def list_paths(path):
    directories = [x[1] for x in os.walk(path)]
    non_empty_dirs = [x for x in directories if x]  # filter out empty lists
    return [item for subitem in non_empty_dirs for item in subitem]  # flatten the list
This should give you the names of all of the directories at every level below path.
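If only the immediate children of the root are wanted, rather than the directories at every depth that list_paths collects, the first tuple yielded by os.walk already holds them. A minimal sketch, assuming that is the goal; the helper name immediate_subdirs is made up for illustration:

```python
import os


def immediate_subdirs(path):
    """Return the names of the directories directly inside `path`."""
    try:
        # The first tuple os.walk yields describes `path` itself:
        # (dirpath, dirnames, filenames).
        _dirpath, dirnames, _filenames = next(os.walk(path))
    except StopIteration:
        # os.walk yields nothing when `path` cannot be listed.
        return []
    return sorted(dirnames)
```

Calling immediate_subdirs("Root/directories") on the layout from the question would then return only the subdir_* names, with no paths attached.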

If all you want is a list of the subdirectories of a specified directory, all you need is os.listdir and a filter that keeps only directories.
It's as simple as:
import os
List_of_Subdirectories = list(filter(os.path.isdir, os.listdir()))
print(List_of_Subdirectories)
The return from os.listdir is a list containing the names of all the entries in the specified directory (or . by default), both directories and files. We keep only the directories using os.path.isdir. Then, as you want a list, we explicitly convert the filtered result.
Printing the filter object directly would not show its contents, but you can iterate over it. The snippet below achieves the same result as the one above.
directory_elements = filter(os.path.isdir, os.listdir())
List_of_Subdirectories = []
for element in directory_elements:
    List_of_Subdirectories.append(element)
print(List_of_Subdirectories)
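One caveat with the snippets above: os.listdir returns bare names, and os.path.isdir resolves a bare name relative to the current working directory, so the filter is only reliable when the directory being listed is the current one. A hedged sketch for an arbitrary directory, using os.scandir instead; the function name list_subdirectories is an assumption, not something from the original answer:

```python
import os


def list_subdirectories(root="."):
    """Return the names (not paths) of the directories directly inside `root`."""
    # os.scandir yields DirEntry objects that already know their own type,
    # so no path joining is needed to test whether an entry is a directory.
    with os.scandir(root) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())
```

The result is a plain list of strings, equivalent to hard-coding ["subdir_1", "subdir_2", ...] by hand.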

Related

Find files with regex and their respective directory

I'm working on the 'C:\Documents' directory.
It has many subdirectories and I need to find all the files whose names start with the 'A0' prefix and end with the '.xls' extension. For example, 'A0SSS.xls' or 'A0ASDF.xls'
Is it possible to fetch all those files and get their directory?
For instance, if the file 'A0SSS.xls' is located in 'C:\Documents\Folder1', I need to know the file name (A0SSS.xls) along with its respective directory (C:\Documents\Folder1).
To find the paths of the matching files, run a recursive search with a filter. I recommend pathlib, because it makes it easy to get the parent folder of each match. The list of parent folders can contain duplicates if several matching files live in the same folder. There are many ways to make a list unique in Python. One of them is to convert the list to a set, whose elements are unique by definition, and convert it back to a list.
from pathlib import Path
search_path = Path(r"C:\Documents")  # raw string so the backslash is not treated as an escape
results = list(search_path.rglob("A0*.xls"))  # the question asks for the .xls extension
string_results = [str(matching_path) for matching_path in results]
containing_folders = [r.parent for r in results]
unique_folders = list(set(containing_folders))
print("matching files:")
for r in string_results:
    print(r)
print()
print("containing folders:")
for f in unique_folders:
    print(f)
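Since the question asks for each file name together with its directory, the same rglob search can also return pairs directly. This is only a sketch; the function name find_matches and its default pattern are illustrative assumptions:

```python
from pathlib import Path


def find_matches(search_root, pattern="A0*.xls"):
    """Return sorted (file name, containing folder) pairs found under `search_root`."""
    return sorted(
        (path.name, str(path.parent))
        for path in Path(search_root).rglob(pattern)
        if path.is_file()  # skip directories that happen to match the pattern
    )
```

Each pair keeps the file and its folder together, so no separate de-duplication step is needed unless only the folders are of interest.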

How to get ALL subdirectories, all levels deep except files in AWS S3 with python boto3

There are a lot of similar questions, but I can't find an answer to exactly this one: how do I get ALL sub-directories starting from an initial one? The depth of the sub-directories is unknown.
Let's say I have:
data/subdir1/subdir2/file.csv
data/subdir1/subdir3/subdir4/subdir5/file2.csv
data/subdir6/subdir7/subdir8/file3.csv
So I would like to either get a list of all sub-directories at every depth OR, even better, all the paths one level above the files. In my example I would ideally want to get:
data/subdir1/subdir2/
data/subdir1/subdir3/subdir4/subdir5/
data/subdir6/subdir7/subdir8/
but I could work with this as well:
data/subdir1/
data/subdir1/subdir2/
data/subdir1/subdir3/
data/subdir1/subdir3/subdir4/
etc...
data/subdir6/subdir7/subdir8/
My code so far only gets me one level deep of directories:
result = await self.s3_client.list_objects(
    Bucket=bucket, Prefix=prefix, Delimiter="/"
)
subfolders = set()
for content in result.get("CommonPrefixes"):
    print(f"sub folder : {content.get('Prefix')}")
    subfolders.add(content.get("Prefix"))
return subfolders
import os
# list_objects returns a dictionary. The 'Contents' key contains a
# list of dicts, one per object; each 'Key' is the full path including
# the file name, for example: data/subdir1/subdir3/subdir4/subdir5/file2.csv
objects = s3_client.list_objects(Bucket='bucket_name')['Contents']
# here we iterate over the objects and, using
# os.path.dirname, get the full path excluding the filename
for obj in objects:
    print(os.path.dirname(obj['Key']))
To make this a unique, sorted list of directory "paths", we can sort a set comprehension inline. A set holds each value only once, and sorted returns the result as a list.
See https://docs.python.org/3/tutorial/datastructures.html#sets
import os
paths = sorted({os.path.dirname(obj['Key']) for obj in objects})
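The question also mentions the alternative output that lists every intermediate level. That can be derived from the keys alone, without further S3 calls. The sketch below is an assumption-laden illustration: directory_prefixes and leaf_only are made-up names, and it works on a plain list of key strings (for example [obj['Key'] for obj in objects]). Note also that a single list_objects call returns at most 1000 keys, so a large bucket would need pagination before this step.

```python
import posixpath


def directory_prefixes(keys, leaf_only=False):
    """Derive "directory" prefixes from S3-style object keys.

    leaf_only=True  -> only the prefix that directly contains each object.
    leaf_only=False -> that prefix plus every ancestor prefix.
    """
    prefixes = set()
    for key in keys:
        # S3 keys always use "/", so posixpath is used instead of os.path.
        parent = posixpath.dirname(key)
        while parent.strip("/"):  # stop at the top level
            prefixes.add(parent + "/")
            if leaf_only:
                break
            parent = posixpath.dirname(parent)
    return sorted(prefixes)
```

With leaf_only=True this reproduces the "ideal" output from the question; with the default it produces the longer list that includes data/subdir1/, data/subdir1/subdir3/ and so on.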

Comparing two differently formatted lists in Python?

I need to compare two lists of records. One list has records that are stored in a network drive:
C:\root\to\file.pdf
O:\another\root\to\record.pdf
...
The other list has records stored in ProjectWise, collaboration software. It contains only filenames:
drawing.pdf
file.pdf
...
I want to create a list of the network drive file paths that do not have a filename that is in the ProjectWise list. It must include the paths. Currently, I am searching a list of each line in the drive list with a regular expression consisting of a line ending with any of the names in the ProjectWise list. The script is taking an unbearably long time and I feel I am overcomplicating the process.
I have thought about using sets to compare the lists (set(list1)-set(list2)) but this would only work with and return filenames on their own without the paths.
If you use os.path.basename on the list that contains full paths to the file you can get the filename and can then compare that to the other list.
import os
orig_list = [os.path.basename(path) for path in file_path_list]
missing_filepaths = set(orig_list) - set(file_name_list)
This gives you the set of filenames on the network drive that do not appear in the ProjectWise list, and you should be able to go from there.
Edit:
So, you want a list of paths that don't have an associated filename, correct? Then pretty simple. Extending from the code before you can do this:
paths_without_filenames = [path for path in file_path_list if os.path.split(path)[1] in missing_filepaths]
this will generate a list of filepaths from your list of filepaths that don't have an associated filename in the list of filenames.
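The two steps can also be folded into a single pass with a set lookup, which avoids the slow regular-expression search described in the question. A minimal sketch, assuming both lists are already loaded as Python lists of strings; the function name paths_missing_from is hypothetical. It uses ntpath rather than os.path so that Windows-style backslash paths are split correctly even when the script runs on a non-Windows machine:

```python
import ntpath


def paths_missing_from(file_path_list, file_name_list):
    """Return the full paths whose file name is not in `file_name_list`."""
    known_names = set(file_name_list)  # set membership tests are fast
    return [
        path
        for path in file_path_list
        if ntpath.basename(path) not in known_names
    ]
```

The comparison here is case-sensitive; whether names should be lowercased first depends on how the two systems store them.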

Extract subset from several file names using python

I have a lot of files in a directory with name like:
'data_2000151_avg.txt', 'data_2000251_avg.txt', 'data_2003051_avg.txt'...
Assume that one of them is called fname. I would like to extract a subset from each like so:
fname.split('_')[1][:4]
This will give as a result, 2000. I want to collect these from all the files in the directory and create a unique list. How do I do that?
You can use the os module.
import os
dirname = 'PathToFile'
myuniquelist = []
for d in os.listdir(dirname):
    if d.startswith('data_'):  # match the file names shown in the question
        myuniquelist.append(d.split('_')[1][:4])
EDIT: Just saw your comment on wanting a set. After the for loop add this line.
myuniquelist = list(set(myuniquelist))
If unique list means a list of unique values, then a combination of glob (in case the folder contains files that do not match the desired name format) and set should do the trick:
from glob import glob
uniques = {fname.split('_')[1][:4] for fname in glob('data_*_avg.txt')}
# In case you really do want a list
unique_list = list(uniques)
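This assumes the files reside in the current working directory. Prepend the path as necessary, e.g. glob('path/to/data_*_avg.txt'). Note that glob then returns each name with that prefix attached, so apply os.path.basename before splitting if the path itself contains underscores.
To list the files in a directory you can use os.listdir(). To generate the collection of unique values, a set comprehension is the best fit.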
import os
data = {f.split('_')[1][:4] for f in os.listdir(dir_path)}
list(data) #if you really need a list
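The answers above can be combined into one helper that ignores names not following the expected format and tolerates a directory prefix. This is a sketch under those assumptions; unique_codes and its default pattern are illustrative names, not part of any answer:

```python
import os
from fnmatch import fnmatchcase


def unique_codes(file_names, pattern="data_*_avg.txt"):
    """Return the sorted unique 4-character codes taken from matching file names."""
    codes = set()
    for name in file_names:
        base = os.path.basename(name)  # drop any directory prefix first
        if fnmatchcase(base, pattern):  # skip files in a different format
            codes.add(base.split("_")[1][:4])
    return sorted(codes)
```

It accepts any iterable of names, so the result of os.listdir(dirname) or of glob can be passed in directly.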

python-read file names and build a namelist

I want to make a name list and store all the names in four folders. I build
namelist = {1:[], 2:[], 3:[], 4:[]}
In the method, I write
for file_name in sorted(os.listdir(full_subdir_name)):
    full_file_name = os.path.join(full_subdir_name,file_name)
    #namelist[level] += blabla...
I want to add the names from the first folder into namelist[1], from the second folder to namelist[2]. I don't know how I can add all the names in different levels to it. Thx!
I am not totally sure this is what you are getting at with your question above. But it seems that first, you want to use enumerate() to allow you to keep the index and name of the four folders. But then, I think you will actually need an additional for-loop, based on your comments above, to actually append all of the contents for each of the files within each sub-folder. The following should do the trick.
Also note that your dictionary keys start with 1, so you need to account for the fact that the enumerate index starts with 0.
# Here, enumerate gives back the index and element.
for i,file_name in enumerate(sorted(os.listdir(full_subdir_name))):
    full_file_name = os.path.join(full_subdir_name,file_name)
    # Here, 'elem' will be the strings naming the actual
    # files inside of the folders.
    for elem in sorted(os.listdir(full_file_name)):
        # Here I am assuming you don't want to append the full path,
        # but you can easily change what to append by adding the
        # whole current file path: os.path.join(full_file_name, elem)
        namelist[i+1].append(elem)
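The same idea can be written as a self-contained function. This sketch assumes that every folder directly under the root counts as one "level" and that plain files at the top level should be skipped; build_namelist is a made-up name. Passing start=1 to enumerate removes the need for the i+1 adjustment:

```python
import os


def build_namelist(root):
    """Map 1, 2, 3, ... to the sorted file names in each sub-folder of `root`."""
    subdirs = sorted(
        name
        for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))  # ignore top-level files
    )
    namelist = {}
    for level, subdir in enumerate(subdirs, start=1):
        namelist[level] = sorted(os.listdir(os.path.join(root, subdir)))
    return namelist
```

Because the dictionary is built from whatever folders exist, it is not limited to four entries and does not need to be pre-populated with empty lists.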
