python storing path names with forward vs backward slash - python

I have a procedure that uses os.walk on a directory and its subdirectories to filter PDF files, separating out their names and corresponding path names. The issue is that it scans the topmost directory and prints the appropriate file name, e.g. G:/Books/Title.Pdf, but as soon as it scans a subfolder, e.g. G:/Books/Sub Folder/Title.pdf, it prints the following:
G:/Books/Sub Folder\\Title.Pdf
(which is obviously an invalid path name). It will also add \\ to any subfolders within subfolders.
Below is the procedure:
def dicitonary_list():
    indexlist=[] #holds all files in the given directory including subfolders
    pdf_filenames=[] #holds list of all pdf filenames in indexlist
    pdf_dir_list = [] #holds path names to indvidual pdf files
    for root, dirs,files in os.walk('G:/Books/'):
        for name in files:
            indexlist.append(root + name)
            if ".pdf" in name[-5:]:
                pdf_filenames.append(name)
    for files in indexlist:
        if ".pdf" in files[-5:]:
            pdf_dir_list.append(files)
    dictionary=dict(zip(pdf_filenames, pdf_dir_list)) #maps the pdf names to their directory address
I know it's something simple that I am missing, but for love nor money can I see what it is. A fresh pair of eyes would help greatly!

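Forward slashes and backward slashes are both perfectly valid path separators in Python on Windows, and they can be mixed within one path. The doubled backslash is only how Python displays a single backslash in a string's repr.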
>>> import os
>>> os.getcwd()
'j:\\RpmV'
>>> os.path.exists('j:\\Rpmv\\make.py')
True
>>> os.path.exists('j:/rpmv/make.py')
True
>>> os.path.isfile('j:\\Rpmv/make.py')
True
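If uniform separators are still wanted, one option is to build each path with os.path.join and pass it through os.path.normpath. This is a minimal sketch, assuming the goal is a dict that maps each PDF file name to its full path; the helper name collect_pdfs is made up for illustration.

```python
import os

def collect_pdfs(top):
    """Map each PDF file name under `top` to its full path (hypothetical helper)."""
    pdf_paths = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            # Case-insensitive check so 'Title.Pdf' and 'title.pdf' both match.
            if name.lower().endswith('.pdf'):
                # os.path.join inserts the separator; normpath makes the
                # separators uniform for the current platform.
                pdf_paths[name] = os.path.normpath(os.path.join(root, name))
    return pdf_paths
```

As with dict(zip(...)) in the question, two files with the same name in different folders would overwrite each other in the result.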

Related

Finding duplicate folders and renaming them by prefixing parent folder name in python

I have a folder structure as shown below
There are several subfolders with duplicate names. Whenever a duplicate subfolder name is encountered, it should be prefixed with the parent folder name.
e.g.
DIR2>SUBDIR1 should be renamed to DIR2>DIR2_SUBDIR1. When the folder is renamed to DIR2_SUBDIR1, the files inside it should also get the same prefix as the parent folder.
e.g. DIR2>SUBDIR1>subdirtst2.txt should now become DIR2>DIR2_SUBDIR1>DIR2_subdirtst2.txt
What have I done so far?
I have simply added all the folder names to a list; after this I am not able to figure out an elegant way to do this task.
import os
list_dir=[]
for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith(".txt"):
            path_file = os.path.join(root)
            print(path_file)
            list_dir.append(path_file)
The following snippet should be able to achieve what you desire. I've written it in a way that clearly shows what is being done, so I'm sure there might be tweaks to make it more efficient or elegant.
import os
cwd = os.getcwd()
# None means "no directory seen yet". An empty set is a valid intersection
# result and must not be mistaken for the starting state.
to_be_renamed = None
for rootdir in next(os.walk(cwd))[1]:
    subdirs = set(next(os.walk(os.path.join(cwd, rootdir)))[1])
    if to_be_renamed is None:
        to_be_renamed = subdirs
    else:
        to_be_renamed &= subdirs
for rootdir in next(os.walk(cwd))[1]:
    subdirs = next(os.walk(os.path.join(cwd, rootdir)))[1]
    for s in subdirs:
        if s in to_be_renamed:
            srcpath = os.path.join(cwd, rootdir, s)
            dstpath = os.path.join(cwd, rootdir, rootdir+'_'+s)
            # First rename files
            for f in next(os.walk(srcpath))[2]:
                os.rename(os.path.join(srcpath, f), os.path.join(srcpath, rootdir+'_'+f))
            # Now rename dir
            os.rename(srcpath, dstpath)
            print('Renamed', s, 'and files')
Here, cwd stores the path to the dir that contains DIR1, DIR2 and DIR3. The first loop checks all immediate subdirectories of these 'root directories' and creates a set of duplicated subdirectory names by repeatedly taking their intersection (&).
Then it runs another loop, checks if the subdirectory is to be renamed and finally uses the os.rename function to rename it and all the files it contains.
os.walk() is a generator: at each step it yields a 3-tuple with the path to a directory, the directories in it, and the files in it. It 'walks' the tree in either a top-down or bottom-up manner, and doesn't stop at one iteration.
So the built-in next() function is used to fetch only the first result (that of the top directory itself), after which [1] or [2] selects the directories or the files respectively.
If you want to rename not just files, but all items in the subdirectories being renamed, then replace next(os.walk(srcpath))[2] with os.listdir(srcpath). This list contains both files and directories.
NOTE: The reason I'm computing the list of duplicated names first in a separate loop is so that the first occurrence is not left unchanged. Renaming in the same loop will miss that first one.
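Note that the intersection only catches names that occur under every top-level directory. If "duplicate" should instead mean "occurs under at least two parents", counting is one alternative. The sketch below swaps in that different technique and is not the answer's method; find_duplicate_subdirs is a hypothetical name, and the function only reports the names without renaming anything.

```python
import os
from collections import Counter

def find_duplicate_subdirs(base):
    """Return the set of subdirectory names found under two or more
    immediate children of `base` (hypothetical helper)."""
    counts = Counter()
    for parent in next(os.walk(base))[1]:
        # Names are unique within one parent, so each parent
        # contributes at most 1 to a name's count.
        counts.update(next(os.walk(os.path.join(base, parent)))[1])
    return {name for name, n in counts.items() if n > 1}
```

The returned set could stand in for to_be_renamed in the renaming loop.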

Renaming files and folders with the os.walk in Python 3

I am trying to rename all files and folders in a given directory. I'd like to replace each space with a hyphen and then convert all names to lowercase. I am stuck with the code below. When os.rename is commented out, the print function returns all files as expected; however, when I uncomment os.rename I get an error stating that file XYZ -> x-y-z can't be found.
import os
folder = r"C:\Users\Tim\Documents\storage"
space = " "
hyphen = "-"
for root, dirs, files in os.walk(folder):
    for file in files:
        if space in file:
            newFilename = file.replace(space, hyphen).lower()
            os.rename(file, newFilename)
            print(newFilename)
Obviously this is just for the files, but I'd like to apply the same logic to folders too. Any help would be greatly appreciated. I'm pretty new to Python, so this is a little beyond me! Thanks so much.
os.rename() resolves relative file paths (paths that don't start with a / on Linux/Mac or a drive letter on Windows) relative to the current working directory.
You'll want to os.path.join() the names with root before passing it to os.rename(), as otherwise rename would look for the file with that name in the current working directory instead of in the original folder.
So it should be:
os.rename(os.path.join(root, file), os.path.join(root, newFilename))
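To cover folders as well, one possible approach is to walk the tree bottom-up so that a directory is renamed only after its contents have been processed. This is a rough sketch, not a tested solution for the asker's machine; the function name hyphenate_tree is made up for illustration.

```python
import os

def hyphenate_tree(folder):
    """Rename every file and directory under `folder`: spaces become
    hyphens and names are lowercased (hypothetical helper)."""
    # topdown=False visits the deepest directories first, so a directory
    # is renamed only after everything inside it has been handled.
    for root, dirs, files in os.walk(folder, topdown=False):
        for name in files + dirs:
            new_name = name.replace(" ", "-").lower()
            if new_name != name:
                os.rename(os.path.join(root, name),
                          os.path.join(root, new_name))
```

The sketch does not check for name collisions (for example "A B.txt" next to "a-b.txt"), so a real script would probably want to test os.path.exists on the new path first.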

Finding correct path to files in subfolders with os.walk with python?

I am trying to create a program that copies files with certain file extension to the given folder. When files are located in subfolders instead of the root folder the program fails to get correct path. In its current state the program works perfectly for the files in the root folder, but it crashes when it finds matching items in subfolders. The program tries to use rootfolder as directory instead of the correct subfolder.
My code is as follows
# Selective_copy.py walks through file tree and copies files with
# certain extension to given folder
import shutil
import os
import re
# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"
def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)") # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)
Selective_copy(source_folder)
dirpath is one of the things provided by walk for a reason: it gives the path to the directory that the items in filenames are located in. You can use that to determine the subfolder you should be using.
file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames holds a list of base file names. At this point it's just a list of strings and (technically) they are not associated at all with files in the filesystem.
os.path.abspath does string-only operations and attempts to merge the file name with the current working directory. As a result, the merged file name points to a file that does not exist.
What should be done is to join dirpath and the base file name (both values are yielded by os.walk):
file_path = os.path.join(dirpath, i)
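Putting this together, the copy routine could look roughly like the sketch below. It joins dirpath with each file name and, as an assumption on my part, swaps the regex for os.path.splitext to read the extension. The lowercase function name, the extra parameters and the returned list are illustrative choices, not part of the original program.

```python
import os
import shutil

def selective_copy(source_folder, destination_folder, extension):
    """Copy every file under `source_folder` whose extension matches
    `extension` (given without the dot) into `destination_folder`."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for name in filenames:
            # splitext returns e.g. ('archive', '.zip'); strip the leading dot.
            if os.path.splitext(name)[1].lstrip(".").lower() == extension.lower():
                file_path = os.path.join(dirpath, name)
                shutil.copy(file_path, destination_folder)
                copied.append(file_path)
    return copied
```

The destination is assumed to be an existing directory outside the source tree; files with the same name in different subfolders would overwrite each other there.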

recursive script to rename folders ending with a space or period

We just switched over our storage server to a new file system. The old file system allowed users to name folders with a period or space at the end. The new system considers this an illegal character. How can I write a Python script to recursively loop through all directories and rename any folder that has a period or space at the end?
Use os.walk. Give it a root directory path and it will recursively iterate over it. Do something like
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        if dir.endswith(' ') or dir.endswith('.'):
            os.rename(...)
EDIT:
We should actually rename the leaf directories first - here is the workaround:
alldirs = []
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        alldirs.append(os.path.join(root, dir))
# the following two lines make sure that leaf directories are renamed first
alldirs.sort()
alldirs.reverse()
for dir in alldirs:
    if ...:
        os.rename(...)
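An alternative to sorting and reversing is os.walk's topdown=False argument, which already yields leaf directories before their parents. The following is a sketch under assumptions: the helper name strip_trailing_chars, the choice to simply strip the offending characters, and the decision to skip collisions are all mine rather than the answer's.

```python
import os

def strip_trailing_chars(top, bad=" ."):
    """Rename directories under `top` whose names end with a character
    in `bad`, deepest first (hypothetical helper)."""
    renamed = []
    for root, dirs, files in os.walk(top, topdown=False):
        for name in dirs:
            new_name = name.rstrip(bad)
            # Skip names that are already fine or would become empty.
            if new_name == name or not new_name:
                continue
            src = os.path.join(root, name)
            dst = os.path.join(root, new_name)
            # Skip rather than overwrite if the cleaned name is taken.
            if os.path.exists(dst):
                continue
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

The returned list of (old, new) pairs can be logged so that skipped or renamed folders are easy to review afterwards.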
You can use os.listdir to list the folders and files on some path. This returns a list that you can iterate through. For each list entry, use os.path.join to combine the file/folder name with the parent path and then use os.path.isdir to check if it is a folder. If it is a folder then check the last character's validity and, if it is invalid, change the folder name using os.rename. Once the folder name has been corrected, you can repeat the whole process with that folder's full path as the base path. I would put the whole process into a recursive function.
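The recursive procedure described above might be sketched as follows. The function name fix_dir_names and the handling of symlinks and name collisions are assumptions added for illustration.

```python
import os

def fix_dir_names(base, bad=" ."):
    """Recursively rename directories under `base` whose names end with a
    character in `bad` (hypothetical helper following the steps above)."""
    for entry in os.listdir(base):
        path = os.path.join(base, entry)
        # Only real directories are of interest; skipping symlinks
        # avoids following a link into a loop.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        new_name = entry.rstrip(bad)
        if new_name and new_name != entry:
            new_path = os.path.join(base, new_name)
            if not os.path.exists(new_path):
                os.rename(path, new_path)
                path = new_path
        # Recurse using the (possibly corrected) full path.
        fix_dir_names(path, bad)
```

Because each folder is renamed before the function descends into it, the recursion always works with a path that exists.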

Copy multiple files with control

I want to copy multiple files in one directory, renaming them and grouping them in increments of 500. For example, the first 500 files in C:\Pics (with random original names) will be renamed 500-1000, and the new directory they are placed in is called 500. Files 1000-1500 would go into directory 1000, and so on.
The current code does not rename the files but instead puts each one in a new directory with the correct number. This was just a start. I believe the code below is a good start; can anyone help me modify it to get the desired results?
import os, glob
target = 'C:\Pics'
prefix = 'p0'
os.chdir(target)
allfiles = os.listdir(target)
count = 500
for filename in allfiles:
    if not glob.glob('*.jpg'): continue
    dirname = prefix + str(count)
    target = os.path.join(dirname, filename)
    os.renames(filename, target)
    count +=1
os.listdir and glob.glob are similar functions. They both return lists of files/dirs, so they don't belong in the same loop (at least not the way you're trying to use them). The main difference is that os.listdir just takes a directory and returns basically everything in it (minus . and ..), whereas glob.glob expects a "globbing pattern", which can contain the shell-style wildcards * ? and [] (a restricted syntax, not full regular expressions). The function you might be thinking of here (instead of glob.glob) is fnmatch.fnmatch, which applies a globbing pattern to a single file name.
os.listdir(path)
Return a list containing the names of the entries in the directory
given by path. The list is in arbitrary order. It does not include the
special entries '.' and '..' even if they are present in the
directory.
Availability: Unix, Windows.
Changed in version 2.3: On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string
objects.
glob.glob(pathname)
Return a possibly-empty list of path names that
match pathname, which must be a string containing a path
specification. pathname can be either absolute (like
/usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif),
and can contain shell-style wildcards. Broken symlinks are included in
the results (as in the shell).
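To make the distinction concrete, here is a small sketch of matching individual names with the fnmatch module. The list of names is made up, and fnmatchcase is used instead of fnmatch because it behaves the same on every platform (fnmatch.fnmatch normalizes case on Windows).

```python
import fnmatch

# Hypothetical directory listing, as os.listdir might return it.
names = ["a.jpg", "b.JPG", "notes.txt", "c.jpeg"]

# fnmatchcase tests one name against a pattern, always case-sensitively.
jpgs = [n for n in names if fnmatch.fnmatchcase(n, "*.jpg")]

# Lowercasing the name first gives a case-insensitive match everywhere.
jpgs_any_case = [n for n in names if fnmatch.fnmatchcase(n.lower(), "*.jpg")]
```

Here jpgs holds only "a.jpg", while jpgs_any_case also picks up "b.JPG".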
Sorry, too lazy to actually mock up files and test this, but then I'd be doing all the work for you. This should work, though (or be darn close to what I think you're aiming at). ;)
import os
import fnmatch
import os.path
# Raw string so the backslash is not treated as an escape character
target = r'C:\Pics'
os.chdir(target)
allfiles = os.listdir(target)
count = 500
for filename in allfiles:
    if not fnmatch.fnmatch(filename, '*.jpg'):
        continue
    # Start a new directory every 500 files (p0500, p1000, ...)
    if count % 500 == 0:
        dirname = 'p%04d' % count
        if not os.path.exists(dirname):
            os.mkdir(dirname)
    target = os.path.join(dirname, '%d.jpg' % count)
    os.rename(filename, target)
    count += 1
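The same idea can be wrapped in a function that avoids os.chdir and can be tried on a scratch directory first. This is only a sketch: the name batch_rename, its parameters, and the use of sorted() for a repeatable order are assumptions layered on top of the answer.

```python
import fnmatch
import os

def batch_rename(target, start=500, batch=500, pattern="*.jpg"):
    """Move matching files in `target` into numbered subdirectories and
    rename them to consecutive numbers (hypothetical helper)."""
    count = start
    dirname = None
    # sorted() makes the order repeatable; os.listdir order is arbitrary.
    for filename in sorted(os.listdir(target)):
        src = os.path.join(target, filename)
        if not os.path.isfile(src) or not fnmatch.fnmatch(filename, pattern):
            continue
        # Start a new directory at the beginning of every batch.
        if (count - start) % batch == 0:
            dirname = os.path.join(target, "p%04d" % count)
            os.makedirs(dirname, exist_ok=True)
        ext = os.path.splitext(filename)[1]
        os.rename(src, os.path.join(dirname, "%d%s" % (count, ext)))
        count += 1
    return count - start
```

Calling batch_rename with a small batch value on a copy of the pictures is a cheap way to check the numbering before running it on the real folder.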
