I need to apply a new naming convention to files across a lot of subdirectories. For example, the files in one subdirectory might be:
ABC (E) String of Text.txt
ABC (E) String of Text.ocr.txt
ABC (E) String of Text.pdf
They need to all be renamed to follow this convention:
ABC String of Text (E).txt
ABC String of Text (E).ocr.txt
ABC String of Text (E).pdf
Here's what I've got so far...
import os, re
regex = re.compile('\s\([a-zA-Z]+\)')
path = os.path.expanduser('~/Google Drive/Directory/Subdirectory/')
for files in os.walk(path):
    for name in files:
        strname = str(name)
        oldName = os.path.join(path,strname)
        if(regex.search(strname)):
            # identifying the token that needs shuffling
            token = regex.findall(oldName)
            # remove the token
            removed = (regex.split(oldName)[0] + ' ' +
                       regex.split(oldName)[1].strip())
            print removed # this is where everything goes wrong
            # remove the file extension
            split = removed.split('.')
            # insert the token at the end of the filename
            reformatted = split[0] + token[0]
            # reinsert the file extension
            for i in range(1,len(split)):
                reformatted += '.' + split[i]
            os.rename(oldName,reformatted)
It ends up trying to rename the files by pulling a substring from a list of files in the directory, but includes list-related characters like "[" and "'", resulting in WindowsError: [Error 3] The system cannot find the path specified.
Example:
C:\Users\Me/Google Drive/Directory/Subdirectory/['ABC String of Text.txt', 'ABC
My hope is that someone can see what I'm trying to accomplish and point me in the right direction.
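Your problem is with os.walk, which doesn't behave the way your loop assumes. Each item it yields is a tuple, so "for name in files" iterates over the directory path, the list of subdirectory names and the list of file names, and str(name) turns a whole list into text (hence the "[" and "'" characters). See https://docs.python.org/2/library/os.html#os.walk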
Generate the file names in a directory tree by walking the tree either
top-down or bottom-up. For each directory in the tree rooted at
directory top (including top itself), it yields a 3-tuple (dirpath,
dirnames, filenames).
Perhaps you mean to do something like:
for (dirpath, dirnames, filenames) in os.walk(path):
    for filename in filenames:
        oldName = os.path.join(dirpath, filename)
        ...
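Building on the loop above, here is a minimal sketch of the whole rename, not a definitive implementation. The names TOKEN, move_token and rename_tree are hypothetical, and the sketch assumes the only dots in a file name are the ones that start its extension(s), as in the examples from the question.

```python
import os
import re

# Assumed token shape: a space, then letters in parentheses, e.g. " (E)".
TOKEN = re.compile(r'\s\([a-zA-Z]+\)')

def move_token(filename):
    """Move the first ' (X)' token to just before the first dot."""
    match = TOKEN.search(filename)
    if match is None:
        return filename
    # Cut the token out, then re-insert it between the stem and the extension(s).
    without = filename[:match.start()] + filename[match.end():]
    stem, dot, ext = without.partition('.')
    return stem + match.group(0) + dot + ext

def rename_tree(top):
    """Rename every matching file under top; return the (old, new) path pairs."""
    renamed = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            new_name = move_token(filename)
            if new_name != filename:
                old_path = os.path.join(dirpath, filename)
                new_path = os.path.join(dirpath, new_name)
                os.rename(old_path, new_path)
                renamed.append((old_path, new_path))
    return renamed
```

Keeping the string manipulation in its own function means you can check it against a few sample names before letting rename_tree touch any real files.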
Related
I have multiple text files with names containing 6 groups of period-separated digits matching the pattern year.month.day.hour.minute.second.
I want to add a .txt suffix to these files to make them easier to open as text files.
I tried the following code, and I also tried os.rename, without success:
Question
How can I add .txt to the end of these file names?
path = os.chdir('realpath')
for f in os.listdir():
    file_name = os.path.splitext(f)
    name = file_name +tuple(['.txt'])
    print(name)
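Your script has several problems, and reading the documentation for each function you call would catch most of them. Here are some of the mistakes:
os.chdir('realpath') - Do you really want to change into a directory literally named realpath? Note also that os.chdir returns None, so path is always None.
os.listdir() - In Python 2 the path argument is required; since Python 3.2 it is optional and defaults to the current directory.
os.path.splitext(f) - This returns a (root, ext) tuple, so adding another tuple to it gives a tuple, not a file name.
print(name) - This only prints the new name; it does not rename the file.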
Here is a script that uses a regex to find files whose names are made of 6 groups of digits (corresponding to your pattern year.month.day.hour.minute.second) in the current directory, then adds the .txt suffix to those files with os.rename:
import os
import re
# The trailing $ stops names that already carry an extension from matching.
regex = re.compile(r"[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+$")
for filename in os.listdir("."):
    if regex.match(filename):
        os.rename(filename, filename + ".txt")
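One detail worth knowing: re.match only anchors at the start of the string, so a pattern with no end anchor also accepts a name such as 2016.08.24.09.47.16.txt and would rename it again. A minimal sketch using re.fullmatch (available since Python 3.4) follows; TIMESTAMP, needs_txt_suffix and add_txt_suffix are hypothetical names chosen for illustration.

```python
import os
import re

# Six dot-separated digit groups and nothing else.
TIMESTAMP = re.compile(r'[0-9]+(?:\.[0-9]+){5}')

def needs_txt_suffix(filename):
    """True only when the whole name is six dot-separated digit groups."""
    return TIMESTAMP.fullmatch(filename) is not None

def add_txt_suffix(directory='.'):
    """Append .txt to every matching file in directory; return the new names."""
    renamed = []
    for filename in os.listdir(directory):
        if needs_txt_suffix(filename):
            new_name = filename + '.txt'
            os.rename(os.path.join(directory, filename),
                      os.path.join(directory, new_name))
            renamed.append(new_name)
    return renamed
```

Because the check rejects names that already end in .txt, running add_txt_suffix twice on the same directory leaves the files unchanged the second time.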
I'm trying to write a simple program to batch rename files in a folder.
file format:
11170_tcd001-20160824-094716.txt
11170_tcd001-20160824-094716.rst
11170_tcd001-20160824-094716.raw
I have 48 of the above with a different 14 digit character configuration after the first "-".
My final goal is to convert the above to:
11170_tcd001.txt
11170_tcd001.rst
11170_tcd001.raw
I know it's possible to os.rename files in python. However, I can't figure out how to batch rename multiple files with a different character configuration.
Is this possible?
Some pseudocode below shows what I would like to achieve.
import os
pathiter = (os.path.join(root, filename)
    for root, _, filenames in os.walk(folder)
    for filename in filenames
)
for path in pathiter:
    newname = path.replace('14 digits.txt', ' 0 digits.txt')
    if newname != path:
        os.rename(path,newname)
If you are looking for a non-regex approach and considering your files all match that particular pattern you are expecting, what you can do first is get the extension of the file using splitext:
from os.path import splitext
file_name = '11170_tcd001-20160824-094716.txt'
extension = splitext(file_name)[1]
print(extension) # outputs: .txt
Then, with the extension in hand, split the file_name on the - and get the first item since you know that is the part that you want to keep:
new_filename = file_name.split('-')[0]
print(new_filename) # 11170_tcd001
Now, append the extension:
new_filename = new_filename + extension
print(new_filename) # 11170_tcd001.txt
Now you can proceed with the rename:
import os
os.rename(file_name, new_filename)
You should probably try using regular expressions, like
import re
<...>
newfilename = re.sub(r'-\d{8}-\d{6}\b', '', oldfilename)
<...>
This will replace any 'hyphen, 8 digits, hyphen, 6 digits' not followed by letter, digit or underscore with empty string in your filename. Hope I got you right.
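To show how the substitution could be wired into a directory walk, here is a minimal sketch under stated assumptions. STAMP, strip_timestamp and rename_folder are hypothetical names. Since several files with different timestamps may map to the same shortened name, the sketch skips a rename when the target already exists instead of overwriting it.

```python
import os
import re

# Hyphen, 8 digits, hyphen, 6 digits, then a word boundary.
STAMP = re.compile(r'-\d{8}-\d{6}\b')

def strip_timestamp(filename):
    """Remove the '-YYYYMMDD-HHMMSS' part from a file name."""
    return STAMP.sub('', filename)

def rename_folder(folder):
    """Rename files under folder; return the paths skipped due to a name clash."""
    skipped = []
    for root, _, filenames in os.walk(folder):
        for filename in filenames:
            new_name = strip_timestamp(filename)
            if new_name == filename:
                continue  # nothing to strip
            old_path = os.path.join(root, filename)
            new_path = os.path.join(root, new_name)
            if os.path.exists(new_path):
                skipped.append(old_path)  # do not overwrite an existing file
                continue
            os.rename(old_path, new_path)
    return skipped
```

The returned list lets you review any clashes by hand rather than losing data silently.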
I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far. Basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7 digit number, and all of the related files may have other characters ((REV-2), for example).
Don't overthink it
I think you're getting confused about what os.walk() gives you; recheck the docs. dirs and files are just lists of the names of the directories and files, not their full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
    1234567abc.txt
    1234567bcd.txt
    2234567abc.txt
    not-interesting.txt
And want to end with something like:
directory1
    1234567
        abc.txt
        bcd.txt
    2234567
        abc.txt
    not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
dirname = '.'  # placeholder: the top-level directory to process
for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match(r'([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use this line instead:
        # new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition: we continue the loop immediately if the file is not interesting. This is a useful pattern for keeping your code from becoming too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to move the file without changing its name.
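If you would like to inspect the result before anything is moved, one option is to split the work into a planning step and an apply step. This is only a sketch: PART, plan_moves and apply_moves are hypothetical names, the file names are kept unchanged, and files that already sit in a folder named after their part number are skipped so the script can be run twice safely. os.makedirs with exist_ok requires Python 3.2 or later.

```python
import os
import re
import shutil

# Seven leading digits identify the part; the rest of the name is kept as-is.
PART = re.compile(r'([0-9]{7})(.*)$')

def plan_moves(top):
    """Return (old_path, new_path) pairs without touching the disk."""
    moves = []
    for root, dirs, files in os.walk(top):
        for fname in files:
            match = PART.match(fname)
            if match is None:
                continue  # not a part file
            if os.path.basename(root) == match.group(1):
                continue  # already inside its part folder
            moves.append((os.path.join(root, fname),
                          os.path.join(root, match.group(1), fname)))
    return moves

def apply_moves(moves):
    """Create each part folder if needed, then move the file into it."""
    for old_path, new_path in moves:
        os.makedirs(os.path.dirname(new_path), exist_ok=True)
        shutil.move(old_path, new_path)
```

A typical run would print the pairs from plan_moves, check them, and only then pass the same list to apply_moves.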
I have several (n=1,030) CAD drawing files (.dwg) spread across 51 subdirectories that have the following file naming convention:
(a) 0000-0n-0n.dwg
which needs to be changed to:
(b) _0000_0n_0n.dwg
The original file names (a) comprise three strings, each separated by dashes (-), namely:
a fixed four-numeral prefix, followed by n > 1 alphanumeric characters, then another n > 1 alphanumeric characters, ending with the .dwg file extension.
The renamed files (b) should preserve these three strings described above,
but prefix the file name with an underscore and replace the current dashes with underscores as well.
My assumption is that the script works recursively from the parent directory on all .dwg files.
I've tried using an os.rename() function but I think I need to put the (a) files into a list and
split them before possibly writing new files with the renaming convention of (b).
If anyone is wondering where this is going - I want these CAD files renamed so they can undergo
a conversion to ESRI feature class format (not shape files), and their geo-database doesn't like
feature class names beginning with numerals (thus the _ prefix), nor does it like dashes.
The following code should do it, but please test it before running it on real files; I only tested the regular expression, not the whole program.
import re
import os
targetfolder = '<your CAD file root folder>'  # placeholder: set this to your own path
# Compile once; the dot before dwg is escaped so it matches a literal dot only.
p = re.compile(r'(?P<prefix>\d+)-(?P<mid>\w+)-(?P<last>\w+)\.dwg$')
for root, dirs, files in os.walk(targetfolder):
    for f in files:
        m = p.match(f)
        if m:
            newf = '_' + m.group('prefix') + '_' + m.group('mid') + '_' + m.group('last') + '.dwg'
            newfile = os.path.join(root, newf)
            os.rename(os.path.join(root, f), newfile)
You don't need to use regular expressions; here is a working example:
import os
# Raw string, so the backslashes are not treated as escape sequences.
top = r"C:\Users\Philip\AppData\Local\Temp"  # use your own top-level directory
for root, dirs, files in os.walk(top):
    for f in files:
        if f.lower().endswith(".dwg"):
            old = os.path.join(root, f)
            new = os.path.join(root, "_" + f.replace("-", "_"))
            os.rename(old, new)
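The two answers can be combined: validate the name with a strict pattern, then build the new name with a plain replace. The sketch below is one possible arrangement, not the only one. DWG, new_dwg_name and rename_dwgs are hypothetical names, and the pattern assumes a four-digit prefix, two alphanumeric groups and a lower-case .dwg extension, as described in the question.

```python
import os
import re

# 0000-0n-0n.dwg: four digits, then two alphanumeric groups, dash-separated.
DWG = re.compile(r'\d{4}-[A-Za-z0-9]+-[A-Za-z0-9]+\.dwg')

def new_dwg_name(filename):
    """Return the underscore form of a matching name, or None if it does not match."""
    if DWG.fullmatch(filename) is None:
        return None
    return '_' + filename.replace('-', '_')

def rename_dwgs(top):
    """Rename every matching .dwg file under top; return how many were renamed."""
    count = 0
    for root, dirs, files in os.walk(top):
        for f in files:
            new_name = new_dwg_name(f)
            if new_name is not None:
                os.rename(os.path.join(root, f), os.path.join(root, new_name))
                count += 1
    return count
```

Because already-renamed files no longer match the pattern, a second run over the same tree renames nothing.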
I have a procedure that os.walks a directory and its subdirectories to filter pdf files, separating out their names and corresponding pathnames. The issue I am having is that it will scan the topmost directory and print the appropriate filename e.g. G:/Books/Title.Pdf but the second it scans a subfolder e.g G:/Books/Sub Folder/Title.pdf it will print the following
G:/Books/Sub Folder\\Title.Pdf
(which is obviously an invalid path name). It will also add \\ to any subfolders within subfolders.
Below is the procedure:
def dicitonary_list():
    indexlist=[] #holds all files in the given directory including subfolders
    pdf_filenames=[] #holds list of all pdf filenames in indexlist
    pdf_dir_list = [] #holds path names to individual pdf files
    for root, dirs,files in os.walk('G:/Books/'):
        for name in files:
            indexlist.append(root + name)
            if ".pdf" in name[-5:]:
                pdf_filenames.append(name)
    for files in indexlist:
        if ".pdf" in files[-5:]:
            pdf_dir_list.append(files)
    dictionary=dict(zip(pdf_filenames, pdf_dir_list)) #maps the pdf names to their directory address
I know it's something simple that I am missing, but for love nor money can I see what it is. A fresh pair of eyes would help greatly!
Forward slashes and backward slashes are both perfectly valid path separators in Python on Windows.
>>> import os
>>> os.getcwd()
'j:\\RpmV'
>>> os.path.exists('j:\\Rpmv\\make.py')
True
>>> os.path.exists('j:/rpmv/make.py')
True
>>> os.path.isfile('j:\\Rpmv/make.py')
True
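It may also help to know that a doubled backslash is how Python writes a single backslash when it shows a string's repr (for example, when printing a list of paths), so the path itself most likely contains just one. If you prefer consistent separators anyway, the sketch below builds paths with os.path.join and tidies them with os.path.normpath. find_pdfs is a hypothetical name, and note that two PDFs with the same file name in different folders would overwrite each other in the returned dictionary.

```python
import os

def find_pdfs(top):
    """Map each PDF file name under top to its normalised full path."""
    found = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            # Case-insensitive check, so Title.Pdf and Title.pdf both count.
            if name.lower().endswith('.pdf'):
                found[name] = os.path.normpath(os.path.join(root, name))
    return found

# 'a\\b' is three characters long: the doubled backslash is only the
# escaped spelling of one backslash in source code and in repr output.
```

os.path.join inserts the separator for you, which avoids gluing a folder name and a file name together with plain string concatenation.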