Need 'if os.havefiles' like function for subfolder search in python - python

I need to os.walk from my parent path (tutu), by all subfolders. For each one, each of the deepest subfolders have the files that i need to process with my code. For all the deepest folders that have files, the file 'layout' is the same: one file *.adf.txt, one file *.idf.txt, one file *.sdrf.txt and one or more files *.dat., as pictures shown.
My problem is that i don't know how to use the os module to iterate, from my parent folder, to all subfolders sequentially. I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists. If exists, then verify if that file layout is present (this is no problem...), and if it is, then apply the code (no problem too). If not, and if that folder don't have more sub-folders, return to the parent folder and os.walk to the next subfolder, and this for all subfolders into my parent folder (tutu). To resume, i need some function like that below (written in python/imaginary code hybrid):
for all folders in tutu:
if os.havefiles in os.walk(current_path):#the 'havefiles' don´t exist, i think...
for filename in os.walk(current_path):
if 'adf' in filename:
etc...
#my code
elif:
while true:
go deep
else:
os.chdir(parent_folder)
Do you think that is best a definition to call in my code to do the job?
this is the code that i've tried to use, without sucess, of course:
import csv
import os
import fnmatch
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
# print path to all subdirectories first.
for subdirname in subdirs:
print os.path.join(dirname, subdirname), 'os.path.join(dirname, subdirname)'
current_path= os.path.join(dirname, subdirname)
os.chdir(current_path)
for filename in os.walk(current_path):
print filename, 'f in os.walk'
if os.path.isdir(filename)==True:
break
elif os.path.isfile(filename)==True:
print filename, 'file'
#code here
Thanks in advance...

I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists.
This doesn't make any sense. If a folder is empty, it doesn't have any subfolders.
Maybe you mean that if it has no regular files, then recurse into its subfolders, but if it has any, don't recurse, and instead check the layout?
To do that, all you need is something like this:
for dirname, subdirs, filenames in os.walk('.'):
if filenames:
# can't use os.path.splitext, because that will give us .txt instead of .adf.txt
extensions = collections.Counter(filename.partition('.')[-1]
for filename in filenames)
if (extensions['.adf.txt'] == 1 and extensions['.idf.txt'] == 1 and
extensions['.sdrf.txt'] == 1 and extensions['.dat'] >= 1 and
len(extensions) == 4):
# got a match, do what you want
# Whether this is a match or not, prune the walk.
del subdirs[:]
I'm assuming here that you only want to find directories that have exactly the specified files, and no others. To remove that last restriction, just remove the len(extensions) == 4 part.
There's no need to explicitly iterate over subdirs or anything, or recursively call os.walk from inside os.walk. The whole point of walk is that it's already recursively visiting every subdirectory it finds, except when you explicitly tell it not to (by pruning the list it gives you).

os.walk will automatically "dig down" recursively, so you don't need to recurse the tree yourself.
I think this should be the basic form of your code:
import csv
import os
import fnmatch
directoriesToMatch = [list here...]
filenamesToMatch = [list here...]
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
if len(set(directoriesToMatch).difference(subdirs))==0: # all dirs are there
if len(set(filenamesToMatch).difference(filenames))==0: # all files are there
if <any other filename/directory checking code>:
# processing code here ...
And according to the python documentation, if you for whatever reason don't want to continue recursing, just delete entries from subdirs:
http://docs.python.org/2/library/os.html
If you instead want to check that there are NO sub-directories where you find your files to process, you could also change the dirs check to:
if len(subdirs)==0: # check that this is an empty directory
I'm not sure I quite understand the question, so I hope this helps!
Edit:
Ok, so if you need to check there are no files instead, just use:
if len(filenames)==0:
But as I stated above, it would probably be better to just look FOR specific files instead of checking for empty directories.

Related

Getting the absolute paths of all files in a folder, without traversing the subfolders

Let
my_dir = "/raid/user/my_dir"
be a folder on my filesystem, which is not the current folder (i.e., it's not the result of os.getcwd()). I want to retrieve the absolute paths of all files at the first level of hierarchy in my_dir (i.e., the absolute paths of all files which are in my_dir, but not in a subfolder of my_dir) as a list of strings absolute_paths. I need it, in order to later delete those files with os.remove().
This is nearly the same use case as
Get absolute paths of all files in a directory
but the difference is that I don't want to traverse the folder hierarchy: I only need the files at the first level of hierarchy (at depth 0? not sure about terminology here).
It's easy to adapt that solution: Call os.walk() just once, and don't let it continue:
root, dirs, files = next(os.walk(my_dir, topdown=True))
files = [ os.path.join(root, f) for f in files ]
print(files)
You can use the os.path module and a list comprehension.
import os
absolute_paths= [os.path.abspath(f) for f in os.listdir(my_dir) if os.path.isfile(f)]
You can use os.scandir which returns an os.DirEntry object that has a variety of options including the ability to distinguish files from directories.
with os.scandir(somePath) as it:
paths = [entry.path for entry in it if entry.is_file()]
print(paths)
If you want to list directories as well, you can, of course, remove the condition from the list comprehension if you want to see them in the list.
The documentation also has this note under listDir:
See also The scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.

Finding duplicate folders and renaming them by prefixing parent folder name in python

I have a folder structure as shown below
There are several subfolders with duplicate name,all I wanted is when any duplicate subfolder name is encountered, it should be prefixed with parent folder name.
e.g.
DIR2>SUBDIR1 should be renamed as DIR2>DIR2_SUDIR1 , When the folder is renamed to DIR2_SUDIR1 , the file inside this folder should also have the same prefix as its parent folder.
eg. DIR2>SUBDIR1>subdirtst2.txt should now become DIR2>DIR2_SUDIR1>DIR2_subdirtst2.txt
What I have done till now ?
I simply have added all the folder name in a list , after this I am not able to figure out any elegant way to do this task.
import os
list_dir=[]
for root, dirs, files in os.walk(os.getcwd()):
for file in files:
if file.endswith(".txt"):
path_file = os.path.join(root)
print(path_file)
list_dir.append(path_file)
The following snippet should be able to achieve what you desire. I've written it in a way that clearly shows what is being done, so I'm sure there might be tweaks to make it more efficient or elegant.
import os
cwd = os.getcwd()
to_be_renamed = set()
for rootdir in next(os.walk(cwd))[1]:
if to_be_renamed == set():
to_be_renamed = set(next(os.walk(os.path.join(cwd, rootdir)))[1])
else:
to_be_renamed &= set(next(os.walk(os.path.join(cwd, rootdir)))[1])
for rootdir in next(os.walk(cwd))[1]:
subdirs = next(os.walk(os.path.join(cwd, rootdir)))[1]
for s in subdirs:
if s in to_be_renamed:
srcpath = os.path.join(cwd, rootdir, s)
dstpath = os.path.join(cwd, rootdir, rootdir+'_'+s)
# First rename files
for f in next(os.walk(srcpath))[2]:
os.rename(os.path.join(srcpath, f), os.path.join(srcpath, rootdir+'_'+f))
# Now rename dir
os.rename(srcpath, dstpath)
print('Renamed', s, 'and files')
Here, cwd stores the path to the dir that contains DIR1, DIR2 and DIR3. The first loop checks all immediate subdirectories of these 'root directories' and creates a set of duplicated subdirectory names by repeatedly taking their intersection (&).
Then it runs another loop, checks if the subdirectory is to be renamed and finally uses the os.rename function to rename it and all the files it contains.
os.walk() returns a 3-tuple with path to the directory, the directories in it, and the files in it, at each step. It 'walks' the tree in either a top-down or bottom-up manner, and doesn't stop at one iteration.
So, the built-in next() method is used to generate the first result (that of the current dir), after which either [1] or [2] is used to get directories and files respectively.
If you want to rename not just files, but all items in the subdirectories being renamed, then replace next(os.walk(srcpath))[2] with os.listdir(srcpath). This list contains both files and directories.
NOTE: The reason I'm computing the list of duplicated names first in a separate loop is so that the first occurrence is not left unchanged. Renaming in the same loop will miss that first one.

python recurcive os.path append with some brake

Hey, this is my first post (!)
Just looking after headache recursive solution to my littel project :)
Trying to collect all folders path (recursively),
thats contaion some specefic file
to array of path's.
ex:
my (root) path is:
c:/test
folder test is contain the file 'test.txt'
and some folders: '1','2','3'.
any of them contain 'test.txt' too!
(if 'text.txt' is not found:
just brake the loop and dont search in subfolders!)
now my function will look for 'test.txt'
and then, collect all folders to my folderslist:
if os.path.exists(os.path.join(path, 'test.txt')):
full_list = os.listdir(path)
folderslist = []
for folder in full_list:
if os.path.isfile(os.path.join(path, folder)) == 0:
folderslist.append(os.path.join(path, folder))
its working not bad, just not recurcive...
really dont know how to call the function again
with the same list, and force him to change the 'current path'...
not sure if 'list' is the best data struct for me to call with it again.
my goal is to make some opration's in every forlder on this list:
c:/test c:/test/1 c:/test/2 c:/test/3
but if there is more folders (that not contain 'test.txt' so, just dont add it to my folder list, and do not looking inside)
hope my fisrt post was clear enough :X
You can use os.walk to traverse the subfolders, and if test.txt is not found, clear the directory list so os.walk won't traverse its subfolders any further:
import os
folderslist = []
for root, dirs, files in os.walk('c:/test'):
if 'test.txt' in files:
folderslist.append(root)
else:
dirs.clear()

Making os.walk work in a non-standard way

I'm trying to do the following, in this order:
Use os.walk() to go down each directory.
Each directory has subfolders, but I'm only interested in the first subfolder. So the directory looks like:
/home/RawData/SubFolder1/SubFolder2
For example. I want, in RawData2, to have folders that stop at the SubFolder1 level.
The thing is, it seems like os.walk() goes down through ALL of the RawData folder, and I'm not certain how to make it stop.
The below is what I have so far - I've tried a number of other combinations of substituting variable dirs for root, or files, but that doesn't seem to get me what I want.
import os
for root, dirs, files in os.walk("/home/RawData"):
os.chdir("/home/RawData2/")
make_path("/home/RawData2/"+str(dirs))
I suggest you use glob instead.
As the help on glob describes:
glob(pathname)
Return a list of paths matching a pathname pattern.
The pattern may contain simple shell-style wildcards a la
fnmatch. However, unlike fnmatch, filenames starting with a
dot are special cases that are not matched by '*' and '?'
patterns.
So, your pattern is every first level directory, which I think would be something like this:
/root_path/*/sub_folder1/sub_folder2
So, you start at your root, get everything in that first level, and then look for sub_folder1/sub_folder2. I think that works.
To put it all together:
from glob import glob
dirs = glob('/root_path/*/sub_folder1/sub_folder2')
# Then iterate for each path
for i in dirs:
print(i)
Beware: Documentation for os.walk says:
don’t change the current working directory between resumptions of walk(). walk() never changes the current directory, and assumes that its caller doesn’t either
so you should avoid os.chdir("/home/RawData2/") in the walk loop.
You can easily ask walk not to recurse by using topdown=True and clearing dirs:
for root, dirs, files in os.walk("/home/RawData", True):
for rep in dirs:
make_path(os.join("/home/RawData2/", rep )
# add processing here
del dirs[] # tell walk not to recurse in any sub directory

Visiting multiple folders with extensions

I'm working on something here, and I'm completely confused. Basically, I have the script in my directory, and that script has to run on multiple folders with a particular extension. Right now, I have it up and running on a single folder. Here's the structure, I have a main folder say, Python, inside that I have multiple folders all with the same .ext, and inside each sub-folder I again have few folders, inside which I have the working file.
Now, I want the script to visit the whole path say, we are inside the main folder 'python', inside which we have folder1.ext->sub-folder1->working-file, come out of this again go back to the main folder 'Python' and start visiting the second directory.
Now there are so many things in my head, the glob module, os.walk, or the for loop. I'm getting the logic wrong. I desperately need some help.
Say, Path=r'\path1'
How do I start about? Would greatly appreciate any help.
I'm not sure if this is what you want, but this main function with a recursive helper function gets a dictionary of all of the files in a main directory:
import os, os.path
def getFiles(path):
'''Gets all of the files in a directory'''
sub = os.listdir(path)
paths = {}
for p in sub:
print p
pDir = os.path.join(path, p)
if os.path.isdir(pDir):
paths.update(getAllFiles(pDir, paths))
else:
paths[p] = pDir
return paths
def getAllFiles(mainPath, paths = {}):
'''Helper function for getFiles(path)'''
subPaths = os.listdir(mainPath)
for path in subPaths:
pathDir = os.path.join(path, p)
if os.path.isdir(pathDir):
paths.update(getAllFiles(pathDir, paths))
else:
paths[path] = pathDir
return paths
This returns a dictionary of the form {'my_file.txt': 'C:\User\Example\my_file.txt', ...}.
Since you distinguish first level directories from its sub-directories, you could do something like this:
# this is a generator to get all first level directories
dirs = (d for d in os.listdir(my_path) if os.path.isdir(d)
and os.path.splitext(d)[-1] == my_ext)
for d in dirs:
for root, sub_dirs, files in os.walk(d):
for f in files:
# call your script on each file f
You could use Formic (disclosure: I am the author). Formic allows you to specify one multi-directory glob to match your files so eliminating directory walking:
import formic
fileset = formic.FileSet(include="*.ext/*/working-file", directory=r"path1")
for file_name in fileset:
# Do something with file_name
A couple of points to note:
/*/ matches every subdirectory, while /**/ recursively descends into every subdirectory, their subdirectories and so on. Some options:
If the working file is precisely one directory below your *.ext, then use /*/
If the working file is at any depth under *.ext, then use /**/ instead.
If the working file is at least one directory, then you might use /*/**/
Formic starts searching in the current working directory. If this is the correct directory, you can omit the directory=r"path1"
I am assuming the working file is literally called working-file. If not, substitute a glob that matches it, like *.sh or script-*.

Categories

Resources