Visiting multiple folders with extensions - python

I'm working on a script and I'm completely confused. The script sits in my directory and has to run on multiple folders that share a particular extension. Right now I have it running on a single folder. Here's the structure: I have a main folder, say Python; inside it are multiple folders, all with the same .ext; inside each of those are a few sub-folders; and inside those is the working file.
I want the script to visit the whole path. Say we are inside the main folder 'Python', which contains folder1.ext->sub-folder1->working-file: the script should process that file, come back out to the main folder 'Python', and then start on the second directory.
Now there are so many things in my head, the glob module, os.walk, or the for loop. I'm getting the logic wrong. I desperately need some help.
Say, Path=r'\path1'
How do I get started? I would greatly appreciate any help.

I'm not sure if this is what you want, but this main function with a recursive helper function gets a dictionary of all of the files in a main directory:
    import os

    def getFiles(path):
        '''Gets all of the files in a directory'''
        paths = {}
        for p in os.listdir(path):
            print(p)
            pDir = os.path.join(path, p)
            if os.path.isdir(pDir):
                # recurse into the sub-directory, filling the same dictionary
                getAllFiles(pDir, paths)
            else:
                paths[p] = pDir
        return paths

    def getAllFiles(mainPath, paths=None):
        '''Helper function for getFiles(path)'''
        if paths is None:
            paths = {}
        for name in os.listdir(mainPath):
            # join with the directory being listed, not with the entry itself
            pathDir = os.path.join(mainPath, name)
            if os.path.isdir(pathDir):
                getAllFiles(pathDir, paths)
            else:
                paths[name] = pathDir
        return paths
This returns a dictionary of the form {'my_file.txt': 'C:\User\Example\my_file.txt', ...}.

Since you distinguish first level directories from its sub-directories, you could do something like this:
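    import os

    # this is a generator to get all first level directories
    # (listdir returns bare names, so join them with my_path before testing)
    dirs = (os.path.join(my_path, d) for d in os.listdir(my_path)
            if os.path.isdir(os.path.join(my_path, d))
            and os.path.splitext(d)[-1] == my_ext)
    for d in dirs:
        for root, sub_dirs, files in os.walk(d):
            for f in files:
                # call your script on each file f
                file_path = os.path.join(root, f)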
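The same idea can be wrapped in a generator that yields full file paths. This is only a minimal sketch, not the answerer's exact code: the name iter_files_under_ext_dirs and its parameters are made up for illustration, and it assumes the first-level folder names really end in the given extension.

```python
import os


def iter_files_under_ext_dirs(main_path, ext):
    """Yield the full path of every file below the first-level
    sub-directories of main_path whose names end in ext."""
    for name in sorted(os.listdir(main_path)):
        first_level = os.path.join(main_path, name)
        # Only first-level directories with the wanted extension qualify.
        if not os.path.isdir(first_level):
            continue
        if os.path.splitext(name)[1] != ext:
            continue
        # os.walk descends into every sub-folder of this directory.
        for root, _sub_dirs, files in os.walk(first_level):
            for f in files:
                yield os.path.join(root, f)
```

You would then loop with something like `for path in iter_files_under_ext_dirs(r'\path1', '.ext'):` and run your processing on each path.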

You could use Formic (disclosure: I am the author). Formic allows you to specify one multi-directory glob to match your files so eliminating directory walking:
    import formic
    fileset = formic.FileSet(include="*.ext/*/working-file", directory=r"path1")
    for file_name in fileset:
        # Do something with file_name
        print(file_name)
A couple of points to note:
- /*/ matches every subdirectory, while /**/ recursively descends into every subdirectory, their subdirectories and so on. Some options:
  - If the working file is precisely one directory below your *.ext, then use /*/
  - If the working file is at any depth under *.ext, then use /**/ instead.
  - If the working file is at least one directory below *.ext, then you might use /*/**/
- Formic starts searching in the current working directory. If this is the correct directory, you can omit the directory=r"path1"
- I am assuming the working file is literally called working-file. If not, substitute a glob that matches it, like *.sh or script-*.
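For comparison, the standard library's glob module (which the question mentions) can express similar patterns without a third-party package. This is a rough sketch using glob rather than Formic; the function names and default arguments are hypothetical, and the "**" behaviour shown relies on glob's recursive=True option in Python 3.5 or later.

```python
import glob
import os


def find_working_files(main_path, ext=".ext", name="working-file"):
    """Return files called `name` exactly one directory below each
    first-level `*ext` directory of main_path."""
    pattern = os.path.join(main_path, "*" + ext, "*", name)
    return sorted(glob.glob(pattern))


def find_working_files_any_depth(main_path, ext=".ext", name="working-file"):
    """Same, but the file may sit at any depth below the `*ext` directory."""
    pattern = os.path.join(main_path, "*" + ext, "**", name)
    # With recursive=True, "**" matches zero or more directory levels.
    return sorted(glob.glob(pattern, recursive=True))
```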

Related

How to use data files of sub-directories and perform iterative operation in python

I have my jupyter notebook (python script) in current directory. In current directory, I have two subfolders, namely a and b. In both directories a and b I have equal number of .dat files with same names. For example, directory a contains files, namely x1-x1-val_1, x1-x1-val_5, x1-x1-val_11...x1-x1-val_86 and x1-x2-val_1, x1-x2-val_5, x1-x2-val_11...x1-x2-val_86, i.e. values are in range(1,90,5). Likewise I have files in directory b.
I want to use my python script to access files in a and b to perform iterative operations on .dat files. My present code works only if I keep files of directory a or b in current directory. For example, my script uses following function.
    import numpy as np

    def get_info(test):
        my_dict = {'test': test}
        c = []
        for i in range(1, 90, 5):
            x_val = 'x_val_' + test + '-val_' + str(i)
            y_val = 'y_val_' + test + '-val_' + str(i)
            my_dict[x_val], my_dict[y_val] = np.loadtxt(test + '-val_' + str(i) + '.dat',
                                                        usecols=(1, 2), unpack=True)
            # compute_yy is defined elsewhere in the script
            dw = compute_yy(my_dict[x_val], my_dict[y_val], test)
            c.append(dw)
        my_dict.update({test + '_c': np.array(c)})
        return my_dict
I call get_info() by using following:
    tests = ['x1-x1', 'x1-x2']
    new_dict = {}
    for i in tests:
        new_dict.update({i: get_info(i)})
How can I use my code to access files in either directory a and/or b? I know its about providing correct path, but I am unsure how can I do so. One way I thought is following;
    ext = '.dat'
    for files in os.listdir(path_to_dir):
        if files.endswith(ext):
            print(files)  # do operations
Alternative could be to make use of os.path.join(). However, I am unable to solve it such that I can use same python script (with minimum changes perhaps) that can use files and iterate on them which are in subfolders a and b. Thanks for your feedback in advance!
If you want to run get_info() on every folder separately then you have two methods:
First: described by @medium-dimensional in a comment
You can use os.chdir(folder) to change the current working directory. The code will then run on the files in this folder.
You can see the current working directory with print(os.getcwd())
os.chdir("a")
get_info(i)
os.chdir("..") # move back to parent folder
os.chdir("b")
get_info(i)
os.chdir("..") # move back to parent folder
chdir() (similar to command cd in console) can use relative path (r"a") full path (r"C:\full\path\to\a") and .. to move to parent folder (r"a\..\b")
If the files can be in nested folders, then a single .. may not take you back to where you started. Save the starting directory with getcwd() and return to it instead:
cwd = os.getcwd()
os.chdir("folder1/folder2/a")
get_info(i)
os.chdir(cwd) # move back to previous folder
os.chdir("folder1/folder2/b")
get_info(i)
os.chdir(cwd) # move back to previous folder
(BTW: in console on Linux you can use cd - to move back to previous folder)
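If you go the chdir() route, a small context manager makes it harder to forget the "move back" step. This is only a sketch with a made-up name (working_directory); Python 3.11 and later ship a similar helper as contextlib.chdir.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always return to the starting folder, even if the body raised.
        os.chdir(previous)
```

You could then write `with working_directory("a"): get_info(i)` and the script is back in the original folder afterwards.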
Second: use folder when you open file
Every command which gets filename can also get path with folder\filename (it can be relative path, full path, and path with ..) like
r"a\filename.dat"
r"C:\full\path\to\b\filename.dat"
r"a\..\b\filename.dat"
So you could define function with extra option folder
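    def get_info(test, folder):
and use this folder when you read the file (note that r'\' is not a valid string literal, so the backslash has to be written as '\\')
    loadtxt(folder + '\\' + test+'-val_'+str(i)+'.dat', ...)
or, more readable, with an f-string
    loadtxt(rf'{folder}\{test}-val_{i}.dat', ...)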
And later you run it as
get_info(i, "a")
get_info(i, "b")
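A portable way to build these paths is os.path.join, which picks the right separator for the operating system. The helper names below (data_file_path, data_file_paths) are hypothetical and only sketch the idea; the file naming follows the pattern used in the question.

```python
import os


def data_file_path(folder, test, i, ext=".dat"):
    """Build the path of one data file, e.g. a/x1-x1-val_1.dat."""
    return os.path.join(folder, "{}-val_{}{}".format(test, i, ext))


def data_file_paths(folder, test):
    """Paths for every index the question's loop visits: range(1, 90, 5)."""
    return [data_file_path(folder, test, i) for i in range(1, 90, 5)]
```

Inside get_info(test, folder) the read would then become np.loadtxt(data_file_path(folder, test, i), usecols=(1, 2), unpack=True).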

Python os.path subfolder discover

I'm looking for an effective way to visit every folder (including subfolders) in my directory list. I then need to run some processes on each folder (size, number of folders and files, etc.).
I know that I have 2 options for that:
- Recursion (my current implementation, code below)
- Generating a list of all folders at the start of the program and invoking my function in a loop
I know that my current implementation is not perfect. Can somebody take a look at it and possibly suggest improvements? In addition, can somebody help me generate a list of all folders including subfolders (I'm assuming using the os.path library)?
My current code that analyses a folder (using recursion):
    def analyse_folder(path, resultlist=[]):
        # This is a trick to check whether we are in the last directory
        subfolders = fsprocess.get_subdirs(path)
        for subfolder in subfolders:
            analyse_folder(subfolder, resultlist)
            files, dirs = fsprocess.get_numbers(subfolder)
            size = fsprocess.get_folder_size(subfolder)
            resultlist = add_result([subfolder, size, files, dirs], resultlist)
        return resultlist
This is the code that gets the list of subfolders inside a folder:
    import os

    def get_subdirs(rootpath, ignorelist=[]):
        # We are starting with an empty list
        subdirs = []
        # Generate main list
        for path in os.listdir(rootpath):
            # We are only interested in dirs and things not in the ignore list
            if not os.path.isfile(os.path.join(rootpath, path)) and path not in ignorelist:
                subdirs.append(os.path.join(rootpath, path))
        # We are giving back the list of subdirectories
        return subdirs
And this is a simple function to add it to resultlist:
    def add_result(result, main_list):
        main_list.append(result)
        return main_list
So if anyone can:
1) Tell me whether my approach is good
2) Provide me code to generate a list of all directories in a given folder (for example everything under C:\users)
Thank you
Try os.walk:
    import os
    for (root, dirs, files) in os.walk(somefolder):
        # root is the place you're listing
        # dirs is a list of directories directly under root
        # files is a list of files directly under root
        print(root, dirs, files)
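To get the list of all directories that the question asks for, you can collect the dirs entries as os.walk goes along. A minimal sketch (the function name list_all_dirs is just an illustration):

```python
import os


def list_all_dirs(top):
    """Return every directory below top, including nested ones."""
    all_dirs = []
    for root, dirs, _files in os.walk(top):
        for d in dirs:
            # dirs holds bare names, so join them with the current root.
            all_dirs.append(os.path.join(root, d))
    return all_dirs
```

Each entry can then be passed to the size and counting functions in a plain loop, with no recursion in your own code.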

How to move in and out of folders in python

so I'm a rookie at programming and I'm trying to make a program in python that basically opens a text file with a bunch of columns and writes the data to 3 different text files based on a string in the row. As my program stands right now, I have it change the directory to a specific output folder using os.chdir so it can open my text file but what I want is it to do something like this:
Imagine a folder set up like this :
Source Folder contains N number of folders. Each of those folders contains N number of output folders. Each output folder contains 1 Results.txt.
The idea is to have the program start at the source folder, look into Folder 1, look for output 1, open the .txt file, then do its thing. Once it's done, it should go back to Folder 1, open output 2 and do its thing again. Then it should go back to Folder 1 and, if it can't find any more output folders, it needs to go to Folder A and then enter Folder 2 and repeat the process until there are no more folders. Honestly, I'm not sure where to really start with this. The best I could do is make a small program that prints all my .txt files, but I'm not sure how to open them at all. I hope my question makes sense, and thanks for the help.
If all you need is to process each file in a directory recursively:
    import os

    def process_dir(top):
        for subdir, dirs, files in os.walk(top):
            for file in files:
                file_path = os.path.join(subdir, file)
                print(file_path)
                # process file here
This will process each file in the root dir recursively. If you're looking for conditional iteration you might need to make the loop a little smarter.
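As one example of a "smarter" loop, you could pick out only the files with a given name. This is a sketch under the assumption that every output folder holds a file literally named Results.txt, as described in the question; the function name is made up.

```python
import os


def find_named_files(top, wanted="Results.txt"):
    """Yield the path of every file called `wanted` anywhere below top."""
    for subdir, _dirs, files in os.walk(top):
        if wanted in files:
            yield os.path.join(subdir, wanted)
```

Each yielded path can be opened directly with open(path), so no chdir is needed.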
Read the base folder path and store it in a variable, move into a sub-folder with chdir to process its text file, then use the base path to change back and move on to the next sub-folder. To list the candidate files first:
    import os
    base = os.getcwd()
    dirlist = os.listdir(base)
    dirlist = [d for d in dirlist if os.path.isdir(os.path.join(base, d))]
    for dirname in dirlist:
        print(os.path.join(base, dirname, 'Results.txt'))
First, I think you could format your question for better reading.
Concerning your question, here's a naïve implementation example:
    import os

    where = "J:/tmp/"
    what = "Results.txt"

    def processpath(where, name):
        for elem in os.listdir(where):
            elempath = os.path.join(where, elem)
            if elem == name:
                # Do something with your file
                # (this example overwrites its contents)
                with open(elempath, "w") as f:
                    f.write("modified2")
            elif os.path.isdir(elempath):
                processpath(elempath, name)

    processpath(where, what)
I would do this without chdir. The most straightforward solution to me is to use os.listdir and filter the results, then use os.path.join to construct complete relative paths instead of chdir. I suspect this would be less prone to bugs such as winding up in an unexpected current working directory where all your relative paths are then wrong.
    import os
    import re

    nfolders = [d for d in os.listdir(".") if re.match("^Folder [0-9]+$", d)]
    for f1 in nfolders:
        noutputs = [d for d in os.listdir(f1) if re.match("^Output [0-9]+$", d)]
        for f2 in noutputs:
            resultsFilename = os.path.join(f1, f2, "Results.txt")
            # do whatever with resultsFilename

Need 'if os.havefiles' like function for subfolder search in python

I need to os.walk from my parent path (tutu) through all subfolders. The deepest subfolders hold the files that I need to process with my code. For all the deepest folders that have files, the file 'layout' is the same: one file *.adf.txt, one file *.idf.txt, one file *.sdrf.txt and one or more files *.dat, as the pictures show.
My problem is that I don't know how to use the os module to iterate, from my parent folder, through all subfolders sequentially. I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists. If it exists, then verify whether that file layout is present (this is no problem...), and if it is, apply the code (no problem either). If not, and if that folder doesn't have any more sub-folders, return to the parent folder and os.walk to the next subfolder, and so on for all subfolders in my parent folder (tutu). To summarize, I need some function like the one below (written in a Python/imaginary code hybrid):
    for all folders in tutu:
        if os.havefiles in os.walk(current_path): # 'havefiles' doesn't exist, I think...
            for filename in os.walk(current_path):
                if 'adf' in filename:
                    etc...
                    #my code
        elif:
            while true:
                go deep
        else:
            os.chdir(parent_folder)
Do you think it would be best to define a function to call in my code to do the job?
This is the code that I've tried to use, without success, of course:
    import csv
    import os
    import fnmatch
    abs_path=os.path.abspath('.')
    for dirname, subdirs, filenames in os.walk('.'):
        # print path to all subdirectories first.
        for subdirname in subdirs:
            print os.path.join(dirname, subdirname), 'os.path.join(dirname, subdirname)'
            current_path= os.path.join(dirname, subdirname)
            os.chdir(current_path)
            for filename in os.walk(current_path):
                print filename, 'f in os.walk'
                if os.path.isdir(filename)==True:
                    break
                elif os.path.isfile(filename)==True:
                    print filename, 'file'
                    #code here
Thanks in advance...
I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists.
This doesn't make any sense. If a folder is empty, it doesn't have any subfolders.
Maybe you mean that if it has no regular files, then recurse into its subfolders, but if it has any, don't recurse, and instead check the layout?
To do that, all you need is something like this:
    import collections
    import os

    for dirname, subdirs, filenames in os.walk('.'):
        if filenames:
            # can't use os.path.splitext, because that will give us .txt instead of .adf.txt
            # partition('.')[-1] is everything after the first dot, e.g. 'adf.txt'
            extensions = collections.Counter(filename.partition('.')[-1]
                                             for filename in filenames)
            if (extensions['adf.txt'] == 1 and extensions['idf.txt'] == 1 and
                    extensions['sdrf.txt'] == 1 and extensions['dat'] >= 1 and
                    len(extensions) == 4):
                # got a match, do what you want
                print(dirname)
            # Whether this is a match or not, prune the walk.
            del subdirs[:]
I'm assuming here that you only want to find directories that have exactly the specified files, and no others. To remove that last restriction, just remove the len(extensions) == 4 part.
There's no need to explicitly iterate over subdirs or anything, or recursively call os.walk from inside os.walk. The whole point of walk is that it's already recursively visiting every subdirectory it finds, except when you explicitly tell it not to (by pruning the list it gives you).
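To make the pruning idea concrete, here is a self-contained sketch that returns the matching directories. The names has_layout, find_layout_dirs and REQUIRED_ONCE are made up for illustration, and this version uses the relaxed check (extra files are allowed), so it is one possible variant rather than the exact code above.

```python
import collections
import os

# Suffixes (text after the first dot) that must appear exactly once.
REQUIRED_ONCE = ("adf.txt", "idf.txt", "sdrf.txt")


def has_layout(filenames):
    """True if there is exactly one of each required file and at least one .dat."""
    counts = collections.Counter(name.partition(".")[-1] for name in filenames)
    return all(counts[s] == 1 for s in REQUIRED_ONCE) and counts["dat"] >= 1


def find_layout_dirs(top):
    """Return directories whose files match the layout. The walk does not
    descend below any directory that already contains files."""
    matches = []
    for dirname, subdirs, filenames in os.walk(top):
        if filenames:
            if has_layout(filenames):
                matches.append(dirname)
            # Emptying the list in place stops os.walk from descending further.
            del subdirs[:]
    return matches
```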
os.walk will automatically "dig down" recursively, so you don't need to recurse the tree yourself.
I think this should be the basic form of your code:
    import csv
    import os
    import fnmatch
    directoriesToMatch = [list here...]
    filenamesToMatch = [list here...]
    abs_path=os.path.abspath('.')
    for dirname, subdirs, filenames in os.walk('.'):
        if len(set(directoriesToMatch).difference(subdirs))==0: # all dirs are there
            if len(set(filenamesToMatch).difference(filenames))==0: # all files are there
                if <any other filename/directory checking code>:
                    # processing code here ...
And according to the python documentation, if you for whatever reason don't want to continue recursing, just delete entries from subdirs:
http://docs.python.org/2/library/os.html
If you instead want to check that there are NO sub-directories where you find your files to process, you could also change the dirs check to:
    if len(subdirs)==0: # check that this directory has no sub-directories
I'm not sure I quite understand the question, so I hope this helps!
Edit:
Ok, so if you need to check there are no files instead, just use:
if len(filenames)==0:
But as I stated above, it would probably be better to just look FOR specific files instead of checking for empty directories.

Remove certain filetypes in Python

I am running a script that walks a directory structure and generates new files in each folder in the directory. I want to delete some of the files right after creation. This is my idea, but it is quite wrong I imagine:
directory = os.path.dirname(obj)
m = MeshExporterApplication(directory)
os.remove(os.path.join(directory,"*.mesh.xml"))
How do you put wildcards in a path? I guess not like /home/me/*.txt, but that is what I am trying.
Thanks,
Gareth
You can use the glob module:
import glob
glob.glob("*.mesh.xml")
to get a list of matching files. Then you delete them, one by one.
    import glob
    import os

    directory = os.path.dirname(obj)
    m = MeshExporterApplication(directory)
    # use a full path in the glob to ensure that you're purging
    # the files in the right directory, e.g. "/tmp/*.mesh.xml"
    for f in glob.glob(os.path.join(directory, "*.mesh.xml")):
        os.remove(f)
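If you need this in several places, the glob-and-remove steps can be wrapped in a small function. This is only a sketch; the name remove_matching and its return value are my own choices, and it assumes the directory name itself contains no glob wildcard characters.

```python
import glob
import os


def remove_matching(directory, pattern="*.mesh.xml"):
    """Delete the files in directory whose names match pattern.
    Returns the list of paths that were removed."""
    removed = []
    for path in glob.glob(os.path.join(directory, pattern)):
        # Skip anything that matches the pattern but is not a regular file.
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```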
Do a for loop with the list of files as the thing you are looping over.
    import os
    import re

    directory = os.path.dirname(obj)
    m = MeshExporterApplication(directory)
    for filename in os.listdir(directory):
        if re.match(r".*\.mesh\.xml$", filename) is not None:
            os.remove(os.path.join(directory, filename))
