We just switched our storage server over to a new file system. The old file system allowed users to name folders with a period or space at the end. The new system considers this an illegal character. How can I write a Python script to recursively loop through all directories and rename any folder that has a period or space at the end?
Use os.walk. Give it a root directory path and it will recursively iterate over it. Do something like
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        if dir.endswith(' ') or dir.endswith('.'):
            os.rename(...)
EDIT:
We should actually rename the leaf directories first - here is the workaround:
alldirs = []
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        alldirs.append(os.path.join(root, dir))
# the following two lines make sure that leaf directories are renamed first
alldirs.sort()
alldirs.reverse()
for dir in alldirs:
    if ...:
        os.rename(...)
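A fuller sketch of the same idea, in case it helps. It assumes the new name is simply the old one with trailing periods and spaces stripped, and that a clash with an existing entry should be skipped rather than overwritten; neither choice comes from the answer above, and strip_trailing_chars is a made-up helper name. It uses os.walk(topdown=False), which yields the contents of a directory before the directory itself, so the sort-and-reverse step is not needed.

```python
import os


def strip_trailing_chars(top):
    """Rename every directory under `top` whose name ends in '.' or ' '.

    Hypothetical helper: the new name is the old one with trailing
    periods and spaces stripped. Returns a list of (old, new) paths.
    """
    renamed = []
    # topdown=False visits the deepest directories first, so a parent is
    # only renamed after everything beneath it has been handled.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in dirs:
            new_name = name.rstrip('. ')
            if new_name == name or not new_name:
                continue  # nothing to fix, or the name would become empty
            old_path = os.path.join(root, name)
            new_path = os.path.join(root, new_name)
            if os.path.exists(new_path):
                continue  # assumption: skip rather than overwrite on a clash
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Returning the (old, new) pairs makes it easy to log what was changed, or to print the list first and only rename once it looks right.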
You can use os.listdir to list the folders and files on some path. This returns a list that you can iterate through. For each list entry, use os.path.join to combine the file/folder name with the parent path and then use os.path.isdir to check if it is a folder. If it is a folder then check the last character's validity and, if it is invalid, change the folder name using os.rename. Once the folder name has been corrected, you can repeat the whole process with that folder's full path as the base path. I would put the whole process into a recursive function.
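The steps described above could look roughly like this. It is only a sketch: fix_dir_names is a hypothetical name, stripping the trailing characters is one possible way to "correct" a name, and skipping symlinks and name clashes are my own assumptions.

```python
import os


def fix_dir_names(base):
    """Recursively strip trailing '.' and ' ' from folder names under `base`."""
    for entry in os.listdir(base):
        full_path = os.path.join(base, entry)
        # Only real folders are handled; symlinks are skipped (assumption)
        # so that a link loop cannot cause endless recursion.
        if os.path.islink(full_path) or not os.path.isdir(full_path):
            continue
        if entry[-1] in '. ':
            cleaned = entry.rstrip('. ')
            new_path = os.path.join(base, cleaned)
            # Assumption: leave the folder alone if the cleaned name is
            # empty or already taken.
            if cleaned and not os.path.exists(new_path):
                os.rename(full_path, new_path)
                full_path = new_path
        # Repeat the whole process with the (possibly corrected) path.
        fix_dir_names(full_path)
```

Because each folder is renamed before the function descends into it, the recursion always works with a path that exists at that moment.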
Related
I have a folder structure as shown below
There are several subfolders with duplicate names. All I want is that when a duplicate subfolder name is encountered, it is prefixed with the parent folder's name.
e.g.
DIR2>SUBDIR1 should be renamed to DIR2>DIR2_SUBDIR1. When the folder is renamed to DIR2_SUBDIR1, the files inside it should get the same prefix as the folder.
e.g. DIR2>SUBDIR1>subdirtst2.txt should become DIR2>DIR2_SUBDIR1>DIR2_subdirtst2.txt
What have I done so far?
I have simply added all the folder names to a list; beyond that, I can't figure out an elegant way to do this task.
import os
list_dir=[]
for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith(".txt"):
            path_file = os.path.join(root)
            print(path_file)
            list_dir.append(path_file)
The following snippet should be able to achieve what you desire. I've written it in a way that clearly shows what is being done, so I'm sure there might be tweaks to make it more efficient or elegant.
import os
cwd = os.getcwd()
to_be_renamed = set()
for rootdir in next(os.walk(cwd))[1]:
    if to_be_renamed == set():
        to_be_renamed = set(next(os.walk(os.path.join(cwd, rootdir)))[1])
    else:
        to_be_renamed &= set(next(os.walk(os.path.join(cwd, rootdir)))[1])
for rootdir in next(os.walk(cwd))[1]:
    subdirs = next(os.walk(os.path.join(cwd, rootdir)))[1]
    for s in subdirs:
        if s in to_be_renamed:
            srcpath = os.path.join(cwd, rootdir, s)
            dstpath = os.path.join(cwd, rootdir, rootdir+'_'+s)
            # First rename files
            for f in next(os.walk(srcpath))[2]:
                os.rename(os.path.join(srcpath, f), os.path.join(srcpath, rootdir+'_'+f))
            # Now rename dir
            os.rename(srcpath, dstpath)
            print('Renamed', s, 'and files')
Here, cwd stores the path to the dir that contains DIR1, DIR2 and DIR3. The first loop checks all immediate subdirectories of these 'root directories' and creates a set of duplicated subdirectory names by repeatedly taking their intersection (&).
Then it runs another loop, checks if the subdirectory is to be renamed and finally uses the os.rename function to rename it and all the files it contains.
os.walk() returns a 3-tuple with path to the directory, the directories in it, and the files in it, at each step. It 'walks' the tree in either a top-down or bottom-up manner, and doesn't stop at one iteration.
So, the built-in next() function is used to fetch only the first result (that of the current dir), after which [1] or [2] is used to get the directories or the files respectively.
If you want to rename not just files, but all items in the subdirectories being renamed, then replace next(os.walk(srcpath))[2] with os.listdir(srcpath). This list contains both files and directories.
NOTE: The reason I'm computing the list of duplicated names first in a separate loop is so that the first occurrence is not left unchanged. Renaming in the same loop will miss that first one.
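One thing to keep in mind: the intersection keeps only names that appear under every top-level directory. If a name shared by just two of them should also count as a duplicate, counting occurrences is one alternative. A small sketch, where duplicated_subdir_names is a hypothetical helper and "appears under more than one parent" is my reading of "duplicate":

```python
import os
from collections import Counter


def duplicated_subdir_names(base):
    """Return subdirectory names found under more than one child of `base`."""
    counts = Counter()
    # next(os.walk(...))[1] is the list of immediate subdirectories.
    for parent in next(os.walk(base))[1]:
        counts.update(next(os.walk(os.path.join(base, parent)))[1])
    return {name for name, n in counts.items() if n > 1}
```

The returned set could be used in place of to_be_renamed in the renaming loop.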
I need to make a script that will iterate through all the directories inside a directory. It should go into each directory, get its name, save it to a variable, come back out, and then loop.
for dir in os.walk(exDir):
    path = dir
    os.chdir(path)
    source = #dir trimmed to anything after the last /
    os.chdir("..")
    loops
It needs to go into the directory to do other things not mentioned above. I've only just started Python and have been stuck on this problem for the last day or so.
For each iteration of your for loop, dir is a tuple of format (filepath, subdirectories, files). As such dir[0] will give you the filepath.
It sounds like you just want to os.chdir for each folder recursively in exDir in which case the following will work:
for dir in os.walk(exDir):
    os.chdir(dir[0])
    ...
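If only the folder name is needed, os.path.basename gives the part after the last separator without any chdir. A minimal sketch (dir_names is a made-up helper name). It makes the path absolute first, because the os.walk documentation warns against changing the working directory during a walk that was started from a relative path:

```python
import os


def dir_names(ex_dir):
    """Return (full path, folder name) for `ex_dir` and every directory below it."""
    found = []
    # An absolute top path keeps every yielded path valid even if the
    # caller later decides to os.chdir() into it.
    for dirpath, subdirs, files in os.walk(os.path.abspath(ex_dir)):
        found.append((dirpath, os.path.basename(dirpath)))
    return found
```

Each pair holds the path you could os.chdir into and the name you wanted to save in source.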
I know there are a lot of questions related to this, but I can't seem to find an answer that helps me solve the problem.
I'm using os.walk() to loop through subfolders in my main folder, which contains both folders and files.
Main Folder
    Pass Folder
        files.txt
    Fail Folder
        files.txt
    file.txt
    file2.txt
So I'm using this code to create a new text file based on the subfolder names. However this returns folder/.txt, which means that dirs is returning '/' and files is returning ['file.txt', 'file2.txt'].
for root, dirs, files in os.walk(path):
    for dirs in root:
        new_txt = 'folder%s.txt' % (dirs)
How do I fix it so that dirs returns ['Main Folder/Pass Folder', 'Main Folder/Fail Folder'] and files returns the files in each folder?
I used something similar to this in my code recently (which, if I recall correctly, I also found on SO). Mine went something like this:
for (dirpath, subdirs, filelist) in os.walk(folder):
    # join directories in here
From the documentation:
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
I'm not sure os.walk() does quite what you expect. I would suggest joining the directories together using os.path.join() to get what you want.
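For what it's worth, the loop in the question iterates over root, which is a string, so dirs ends up being single characters of the path; that is where the '/' comes from. A sketch of the os.path.join approach follows. files_by_folder is a hypothetical helper, and collecting everything into a dict is just one way to organise the result:

```python
import os


def files_by_folder(top):
    """Map each directory path under `top` to the full paths of its files."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(top):
        # The names in filenames carry no path components, so join them
        # with dirpath to get usable paths.
        result[dirpath] = [os.path.join(dirpath, name) for name in filenames]
    return result
```

The keys are the folder paths (for example 'Main Folder/Pass Folder' when top is 'Main Folder' on a system that uses '/'), and each value lists the files in that folder.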
I need to os.walk from my parent path (tutu) through all subfolders. In each branch, the deepest subfolders hold the files that I need to process with my code. Every deepest folder that has files uses the same file 'layout': one *.adf.txt file, one *.idf.txt file, one *.sdrf.txt file and one or more *.dat files, as the pictures show.
My problem is that I don't know how to use the os module to iterate, from my parent folder, through all subfolders sequentially. I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists. If it exists, then verify whether that file layout is present (this is no problem...), and if it is, apply the code (no problem either). If not, and if that folder doesn't have any more subfolders, return to the parent folder and os.walk to the next subfolder, and so on for all subfolders of my parent folder (tutu). To summarize, I need a function like the one below (written in a Python/imaginary-code hybrid):
for all folders in tutu:
    if os.havefiles in os.walk(current_path):  # 'havefiles' doesn't exist, I think...
        for filename in os.walk(current_path):
            if 'adf' in filename:
                etc...
                #my code
    elif:
        while true:
            go deep
    else:
        os.chdir(parent_folder)
Do you think it would be best to define a function that I call in my code to do the job?
This is the code that I've tried to use, without success, of course:
import csv
import os
import fnmatch
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
    # print path to all subdirectories first.
    for subdirname in subdirs:
        print os.path.join(dirname, subdirname), 'os.path.join(dirname, subdirname)'
        current_path= os.path.join(dirname, subdirname)
        os.chdir(current_path)
        for filename in os.walk(current_path):
            print filename, 'f in os.walk'
            if os.path.isdir(filename)==True:
                break
            elif os.path.isfile(filename)==True:
                print filename, 'file'
                #code here
Thanks in advance...
I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists.
This doesn't make any sense. If a folder is empty, it doesn't have any subfolders.
Maybe you mean that if it has no regular files, then recurse into its subfolders, but if it has any, don't recurse, and instead check the layout?
To do that, all you need is something like this:
import collections
import os

for dirname, subdirs, filenames in os.walk('.'):
    if filenames:
        # can't use os.path.splitext, because that will give us .txt instead of .adf.txt
        # partition('.')[-1] is everything after the first dot, without the dot itself
        extensions = collections.Counter(filename.partition('.')[-1]
                                         for filename in filenames)
        if (extensions['adf.txt'] == 1 and extensions['idf.txt'] == 1 and
                extensions['sdrf.txt'] == 1 and extensions['dat'] >= 1 and
                len(extensions) == 4):
            # got a match, do what you want
            pass
        # Whether this is a match or not, prune the walk.
        del subdirs[:]
I'm assuming here that you only want to find directories that have exactly the specified files, and no others. To remove that last restriction, just remove the len(extensions) == 4 part.
There's no need to explicitly iterate over subdirs or anything, or recursively call os.walk from inside os.walk. The whole point of walk is that it's already recursively visiting every subdirectory it finds, except when you explicitly tell it not to (by pruning the list it gives you).
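Splitting with partition on the first dot misreads files whose base name itself contains a dot. If that can happen in your data, matching on the end of the name is one alternative. A sketch under that assumption, with has_layout and find_layout_dirs as hypothetical names; unlike the check above, it does not require that no other files are present:

```python
import os

# Suffixes that must each appear exactly once in a matching directory.
REQUIRED_ONCE = ('.adf.txt', '.idf.txt', '.sdrf.txt')


def has_layout(filenames):
    """True if the names hold one of each metadata file and at least one .dat."""
    for suffix in REQUIRED_ONCE:
        if sum(name.endswith(suffix) for name in filenames) != 1:
            return False
    return any(name.endswith('.dat') for name in filenames)


def find_layout_dirs(top):
    """Return the directories under `top` whose files match the layout."""
    matches = []
    for dirname, subdirs, filenames in os.walk(top):
        if filenames:
            if has_layout(filenames):
                matches.append(dirname)
            # Prune: do not descend below a directory that holds files.
            del subdirs[:]
    return matches
```

Calling find_layout_dirs('tutu') would give the list of folders to process, and the processing code can then loop over that list.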
os.walk will automatically "dig down" recursively, so you don't need to recurse the tree yourself.
I think this should be the basic form of your code:
import csv
import os
import fnmatch
directoriesToMatch = [list here...]
filenamesToMatch = [list here...]
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
    if len(set(directoriesToMatch).difference(subdirs))==0: # all dirs are there
        if len(set(filenamesToMatch).difference(filenames))==0: # all files are there
            if <any other filename/directory checking code>:
                # processing code here ...
And according to the python documentation, if you for whatever reason don't want to continue recursing, just delete entries from subdirs:
http://docs.python.org/2/library/os.html
If you instead want to check that there are NO sub-directories where you find your files to process, you could also change the dirs check to:
if len(subdirs)==0: # check that this directory has no sub-directories
I'm not sure I quite understand the question, so I hope this helps!
Edit:
Ok, so if you need to check there are no files instead, just use:
if len(filenames)==0:
But as I stated above, it would probably be better to just look FOR specific files instead of checking for empty directories.
What would be the best method of getting sub directories of a drive including files located within them? Would it be best to use os.listdir() and filter out directories from files by checking if they have a '.' in them?
Any ideas would be helpful, and I would much prefer to use only the standard library for this task.
Take a look at os.walk(). It lets you visit each directory and get a list of files and a list of sub directories for each directory that you visit.
Here is how you could only go down a single level:
for root, dirs, files in os.walk(path):
    # do whatever you want to with dirs and files
    if root != path:
        # one level down, modify dirs in place so we don't go any deeper
        del dirs[:]
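The same pruning trick can be generalised to any depth. A rough sketch, where walk_to_depth and max_depth are names I made up. Since os.walk already separates directories from files, there is no need to guess from a '.' in the name, which is unreliable because folder names can contain dots and file names can lack them:

```python
import os


def walk_to_depth(path, max_depth=1):
    """Yield (root, dirs, files) like os.walk, at most `max_depth` levels down."""
    for root, dirs, files in os.walk(path):
        rel = os.path.relpath(root, path)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        # Yield copies so that pruning below does not alter what the
        # caller received.
        yield root, list(dirs), list(files)
        if depth >= max_depth:
            # Deep enough: empty dirs in place so os.walk stops descending.
            del dirs[:]
```

With max_depth=1 this visits path and its immediate sub directories, matching the snippet above; larger values go correspondingly deeper.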