fnmatch working inconsistently in different scripts - python

I am working on a python script that will write input files for an analysis program I use. One of the steps is to take a list of filenames and search the input directory for them, open them, and get some information out of them. I wrote the following using os.walk and fnmatch in a test-script that has the directory of interest hard-coded in, and it worked just fine:
for locus in loci_select:  # for each locus we'll include
    print("Finding file " + locus)
    for root, dirnames, filenames in os.walk('../phylip_wigeon_mid1_names'):
        for filename in fnmatch.filter(filenames, locus):  # look in the input directory
            print("Found file for locus " + locus + " in set")
            loci_file = open(os.path.join('../phylip_wigeon_mid1_names/', filename))
            with loci_file as f:
                for i, l in enumerate(f):
                    pass
            count = (i) * 0.5  # how many individuals present
            print(filename + "has sequences for " + str(count) + " individuals")
...and so on (the other bits all work, so I'll spare you).
As soon as I put this into the larger script and switch out the directory names for input arguments, though, it seems to stop working between the third and fourth lines, despite being nearly identical:
for locus in use_loci:  # for each locus we'll include
    log.info("Finding file " + locus)
    for root, dirnames, filenames in os.walk(args.input_dir):
        for filename in fnmatch.filter(filenames, locus):  # look in the input directory
            log.info("Found file for locus " + locus + " in set")
            loci_file = open(os.path.join(args.input_dir, filename))
            with loci_file as f:
                for i, l in enumerate(f):
                    pass
            count = (i) * 0.5  # how many individuals present
            log.info(filename + "has sequences for " + str(count) + " individuals")
I've tested it with temporary print statements between the suspected lines, and it seems like they are the culprits, since my screen output looks like:
2015-11-17 15:53:20,505 - write_ima2p_input_file - INFO - Getting selected loci for analysis
2015-11-17 15:53:20,505 - write_ima2p_input_file - INFO - Finding file uce-7999_wigeon_mid1_contigs.phy
2015-11-17 15:53:20,629 - write_ima2p_input_file - INFO - Finding file uce-4686_wigeon_mid1_contigs.phy
2015-11-17 15:53:20,647 - write_ima2p_input_file - INFO - Finding file uce-5012_wigeon_mid1_contigs.phy
...and so on.
I've tried switching out to glob, as well as simple things like rearranging where this section falls in my larger code, but nothing is working. Any insight would be much appreciated!
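One thing worth checking, offered as a sketch rather than a diagnosis: each locus here is a literal file name, not a wildcard pattern, so the lookup can be done with a plain membership test. Joining on root instead of the top-level directory also keeps the path correct when a file sits in a subdirectory. The helper name find_named_files is hypothetical and not part of the original script.

```python
import os


def find_named_files(input_dir, names):
    """Map each requested file name to the paths where it was found.

    Hypothetical helper: compares names literally instead of treating
    them as fnmatch patterns, and joins on `root` so that matches in
    subdirectories get the right path.
    """
    wanted = set(names)
    found = {name: [] for name in names}
    for root, dirnames, filenames in os.walk(input_dir):
        for filename in filenames:
            if filename in wanted:
                found[filename].append(os.path.join(root, filename))
    return found
```

Names that map to an empty list were not seen anywhere under input_dir, which helps separate "the walk found nothing" from "the pattern did not match".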

Related

python regex: Parsing file name

I have a text file (filename.txt) that contains file names with their extensions.
filename.txt
[AW] One Piece - 629 [1080P][Dub].mkv
EP.585.1080p.mp4
EP609.m4v
EP 610.m4v
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One_Piece_0745_Sons'_Cups!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One Piece - 621 1080P.mkv
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
These are example file names with their extensions. I need to rename each file to its episode number (without changing its extension).
Example:
Input:
EP609.m4v
EP 610.m4v
EP.585.1080p.mp4
One Piece - 621 1080P.mkv
[AW] One Piece - 629 [1080P][Dub].mkv
One_Piece_0745_Sons'_Cups!.mp4
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
Expected Output:
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4 (or) 0745.mp4
696.mp4 (or) 0696.mp4
591.m4v
577.mp4
Hope someone will help me parse and rename these filenames. Thanks in advance!!!
As you tagged python, I guess you are willing to use python.
(Edit: I've realized a loop in my original code is unnecessary.)
import re

with open('filename.txt', 'r') as f:
    files = f.read().splitlines()  # read filenames

# assume: an episode number consists of 3 digits, possibly preceded by a 0
p = re.compile(r'0?(\d{3})')
for file in files:
    if m := p.search(file):
        print(m.group(1) + '.' + file.split('.')[-1])
    else:
        print(file)
This will output
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4
696.mp4
591.m4v
577.mp4
Basically, it searches for the first 3-digit number, possibly preceded by 0.
I strongly advise you to check the output; in particular, you would want to run sort OUTPUTFILENAME | uniq -d to see whether there are duplicate target names.
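The same duplicate check can also be done in Python before anything is renamed. The sketch below reuses the pattern from the answer; target_name and find_duplicates are hypothetical helper names, and the sample list is invented for illustration.

```python
import re
from collections import Counter

EPISODE = re.compile(r'0?(\d{3})')  # same pattern as in the answer


def target_name(filename):
    """Return '<episode>.<ext>', or the name unchanged if no number is found."""
    m = EPISODE.search(filename)
    if m is None:
        return filename
    return m.group(1) + '.' + filename.split('.')[-1]


def find_duplicates(filenames):
    """Return the target names that more than one input file would map to."""
    counts = Counter(target_name(name) for name in filenames)
    return sorted(name for name, n in counts.items() if n > 1)


# made-up sample: two different files that would both become 609.m4v
sample = ['EP609.m4v', 'EP 609.m4v', 'One Piece - 621 1080P.mkv']
print(find_duplicates(sample))  # ['609.m4v']
```

An empty list means every file gets a distinct target name under this pattern.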
(Original answer:)
p = re.compile(r'\d{3,4}')
for file in files:
    for m in p.finditer(file):
        ep = m.group(0)
        if int(ep) < 1000:
            print(ep.lstrip('0') + '.' + file.split('.')[-1])
            break  # go to next file if ep found (avoid the else clause)
    else:  # if ep not found, just print the filename as is
        print(file)
A program that parses the episode number and renames each file.
Modules used:
re - to parse the file name
os - to rename the file
full/path/to/folder is the path to the folder where your files live.
import re
import os

for file in os.listdir(path="full/path/to/folder/"):
    # search each file name for the first 3- or 4-digit number less than 1000
    for match_obj in re.finditer(r'\d{3,4}', file):
        episode = match_obj.group(0)
        if int(episode) < 1000:
            new_filename = episode.lstrip('0') + '.' + file.split('.')[-1]
            old_name = "full/path/to/folder/" + file
            new_name = "full/path/to/folder/" + new_filename
            os.rename(old_name, new_name)
            # go to next file if ep found (avoid the else clause)
            break
    else:
        # if episode not found, just leave the filename as it is
        pass

Bulk rename txt files with different parts using Python

I have a list of files that I wish to rename as follows:
Receipt ABC-001 623572349-1.txt --> Receipt ABC-001A.txt
Receipt ABC-001 623572349-2.txt --> Receipt ABC-001B.txt
However, even at the first step, every time I run it I get the error "Cannot create a file when that file already exists:". What would be the best way to achieve the above outcome, where files ending with 1 become A, files ending with 5.txt become E.txt, and so forth?
Below is the code I have used:
import os, fnmatch

# Set directory location; include a double backslash for each subfolder.
file_path = "C:\\Users\\Mr.Slowbro\\Desktop\\TBU\\"
# Set file extension accordingly
files_to_rename = fnmatch.filter(os.listdir(file_path), '*.txt')
for file_name in files_to_rename:
    file_name_new = file_name[-5:5]
    os.rename(file_path + file_name, file_path + file_name_new)
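This should help you out. The ord() function returns the Unicode code point of a character, so 'A' is 65, 'B' is 66, and so on (lowercase 'a' is 97, 'b' is 98). Likewise, chr() returns the character for a given code point, so adding the file's trailing digit minus 1 to ord('A') gives the matching letter. Note also that the slice file_name[-5:5] is empty for names this long, so each file was being renamed to the folder path itself, which already exists. I think the code below will help you with your issue.
import os, fnmatch

# Set directory location; include a double backslash for each subfolder.
file_path = "C:\\Users\\Mr.Slowbro\\Desktop\\TBU\\"
# Set file extension accordingly
files_to_rename = fnmatch.filter(os.listdir(file_path), '*.txt')
for file_name in files_to_rename:
    # the digit just before ".txt": 1 -> 'A', 2 -> 'B', ...
    number = chr(int(file_name[-5]) - 1 + ord('A'))
    # the prefix is hard-coded, so this assumes every file belongs to ABC-001
    file_name_new = 'Receipt ABC-001' + number + '.txt'
    # include the folder in both paths so this works from any working directory
    os.rename(file_path + file_name, file_path + file_name_new)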
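As a quick check of the ord()/chr() arithmetic, the digit-to-letter step can be isolated in a small function. This is only a sketch: digit_to_letter and new_receipt_name are hypothetical helpers, and new_receipt_name assumes every name looks like "Receipt ABC-001 <id>-<digit>.txt", with a single digit from 1 to 9 just before the extension.

```python
import os


def digit_to_letter(digit):
    """Map 1 -> 'A', 2 -> 'B', ..., using code point arithmetic."""
    return chr(ord('A') + int(digit) - 1)


def new_receipt_name(file_name):
    """Build the new name from the old one instead of hard-coding the prefix."""
    stem, ext = os.path.splitext(file_name)  # 'Receipt ABC-001 623572349-2', '.txt'
    prefix = stem.rsplit(' ', 1)[0]          # 'Receipt ABC-001'
    return prefix + digit_to_letter(stem[-1]) + ext


print(new_receipt_name('Receipt ABC-001 623572349-2.txt'))  # Receipt ABC-001B.txt
```

Deriving the prefix from the old name means files from other receipts would not all collapse onto the same target name.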

How to remove first n character of multiple file names in mac

I want to rename multiple files so that the first 9 characters are deleted.
example:
Before:
19.49.29 1
19.50.17 2
19.50.24 3
19.50.28 4
.
.
After that:
1
2
3
4
.
.
I tried using Python, but it messed up my files and their order:
import os
folderPath = r'/Users/**myusername**/Desktop/FOLDER'
fileNumber = 1
for filename in os.listdir(folderPath):
    os.rename(folderPath + '//' + filename, folderPath + '/' + str(fileNumber) + '.jpeg')
    fileNumber += 1
Maybe there's a way using the terminal or something else?
With zsh (which the OP included as a tag)
% autoload zmv
% zmv '* (*)' '$1'
This will treat each filename as a space-separated pair of words, and use the second word as the new name for each file.
If you really need the condition to be "drop the first nine characters", then
% zmv '?????????(*)' '$1'
If you're set on using Python 3, you can use string slicing (strings are sequences, so they can be sliced like lists) to drop the first 9 characters and keep the rest, like this:
filename = "12.23.34 1.jpeg"
print(filename[9:])
This starts at index 9 (the tenth character, here the "1") and returns the rest, so you get "1.jpeg". So in your code, assuming ALL your file names begin with a 9-character prefix (e.g. "12.23.34 1.jpeg"), the line you had:
os.rename(folderPath + '//' + filename, folderPath + '/' + str(fileNumber) + '.jpeg')
can be changed to:
os.rename(folderPath + '//' + filename, folderPath + '/' + filename[9:])
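Putting the slicing idea into a loop, a more defensive version might look like the sketch below. strip_prefix is a hypothetical helper; it assumes every file to rename has a name longer than n characters, and it skips a file rather than overwrite an existing one.

```python
import os


def strip_prefix(folder, n=9):
    """Rename every file in `folder` by dropping its first `n` characters.

    Returns the list of (old, new) names that were actually renamed.
    """
    renamed = []
    # take a sorted snapshot of the listing so new names are not revisited
    for filename in sorted(os.listdir(folder)):
        new_name = filename[n:]
        old_path = os.path.join(folder, filename)
        new_path = os.path.join(folder, new_name)
        # skip names that are too short, non-files, and would-be overwrites
        if not new_name or not os.path.isfile(old_path) or os.path.exists(new_path):
            continue
        os.rename(old_path, new_path)
        renamed.append((filename, new_name))
    return renamed
```

Trying it on a copy of the folder first is a cheap way to confirm the result before touching the originals.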

Output in stdout differs in cmd and Python console

I'm new to Python and working on a little program that copies all files with a given extension from a folder and its subfolders to another directory. Recently I added a simple progress bar and a counter of remaining files.
The problem is that when I run it from cmd and the counter goes from, say, 1000 to 999, cmd shows a zero after the last digit instead of a space. Moreover, when the program finishes, the remaining-files counter should be replaced by the word "Done.", and that doesn't work properly either.
I tried replacing sys.stdout.write with print and tried not using f-strings; the result is the same.
def show_progress_bar(total, counter=0, length=80):
    percent = round(100 * (counter / total))
    filled_length = int(length * counter // total)
    bar = '=' * filled_length + '-' * (length - filled_length)
    if counter < total:
        suffix = f'Files left: {total - counter}'
    else:
        suffix = 'Done.'
    sys.stdout.write(f'\rProgress: |{bar}| {percent}% {suffix}')
    sys.stdout.flush()

def selective_copy(source, destination, extension):
    global counter
    show_progress_bar(total)
    for foldername, subfolders, filenames in os.walk(source):
        for filename in filenames:
            if filename.endswith(extension):
                if not os.path.exists(os.path.join(destination, filename)):
                    shutil.copy(os.path.join(foldername, filename), os.path.join(destination, filename))
                else:
                    new_filename = f'{os.path.basename(foldername)}_{filename}'
                    shutil.copy(os.path.join(foldername, filename), os.path.join(destination, new_filename))
                counter += 1
                show_progress_bar(total, counter)
I expected the output in cmd to be the same as in the console, which is this:
Program running:
Progress: |=========-----------------------------------------------------------------------| 12% Files left: 976
Program finished:
Progress: |================================================================================| 100% Done.
But in the cmd I got this:
Program running:
Progress: |=========-----------------------------------------------------------------------| 12% Files left: 9760
Program finished:
Progress: |================================================================================| 100% Done. left: 100
Typically, printing "\r" will return the cursor to the beginning of the line, but it won't erase anything already written. So if you write "1000" followed by "\r" followed by "999", the last 0 of "1000" will still be visible.
(I'm not sure why this isn't happening in your Python console. Maybe it interprets "\r" in a different way. Hard to say without knowing exactly what software you're running.)
One solution is to print a few spaces after your output so that slightly longer old messages get overwritten. You can probably get away with just one space for your "Files left:" suffix, since that shrinks by at most one character at a time, but the "Done." suffix will need more.
if counter < total:
    suffix = f'Files left: {total - counter} '
else:
    suffix = 'Done.' + ' ' * 15  # enough spaces to cover the old "Files left: ..." text
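A more general variant, sketched below, pads each line to the width of the previous one so the amount of padding never has to be guessed. ProgressLine is a hypothetical helper, not part of the original program; it takes the output stream as a parameter so its behaviour can be checked without a terminal.

```python
import io
import sys


class ProgressLine:
    """Rewrite a single status line, blanking leftovers from longer lines."""

    def __init__(self, stream=sys.stdout):
        self.stream = stream
        self.last_len = 0

    def update(self, text):
        # pad with spaces up to the previous length so old characters are erased
        padded = text.ljust(self.last_len)
        self.stream.write('\r' + padded)
        self.stream.flush()
        self.last_len = len(text)


# demonstration against an in-memory stream instead of a real console
buf = io.StringIO()
line = ProgressLine(buf)
line.update('Files left: 1000')
line.update('Files left: 999')
line.update('Done.')
```

In the demonstration, the second update is written with one trailing space and the final "Done." with ten, which is exactly what is needed to cover the previous line.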

Automator/Applescript rename files if

I have a large list of images that have been misnamed by my artist. I was hoping to avoid giving him more work by using Automator, but I'm new to it. Right now they're named in order what001a and what002a, but that should be what001a and what001b. So basically the odd-numbered ones are A and the even-numbered ones are B. I need a script that changes the even-numbered images to B and renumbers them all sequentially. How would I go about writing that script?
A small Ruby script embedded in an AppleScript provides a very comfortable solution, allowing you to select the files to rename right in Finder and displaying an informative success or error message.
The algorithm renames files as follows:
number = first 3 digits in filename # e.g. "006"
letter = the letter following those digits # e.g. "a"
if number is even, change letter to its successor # e.g. "b"
number = (number + 1)/2 # 5 or 6 => 3
replace number and letter in filename
And here it is:
-- ask for files
set filesToRename to choose file with prompt "Select the files to rename" with multiple selections allowed
-- prepare ruby command
set ruby_script to "ruby -e \"s=ARGV[0]; m=s.match(/(\\d{3})(\\w)/); n=m[1].to_i; a=m[2]; a.succ! if n.even?; r=sprintf('%03d',(n+1)/2)+a; puts s.sub(/\\d{3}\\w/,r);\" "
tell application "Finder"
    -- process files, record errors
    set counter to 0
    set errors to {}
    repeat with f in filesToRename
        try
            do shell script ruby_script & (f's name as text)
            set f's name to result
            set counter to counter + 1
        on error
            copy (f's name as text) to the end of errors
        end try
    end repeat
    -- display report
    set msg to (counter as text) & " files renamed successfully!\n"
    if errors is not {} then
        set AppleScript's text item delimiters to "\n"
        set msg to msg & "The following files could NOT be renamed:\n" & (errors as text)
        set AppleScript's text item delimiters to ""
    end if
    display dialog msg
end tell
Note that it will fail when the filename contains spaces.
A friend of mine wrote a Python script to do what I needed. I figured I'd post it here as an answer for anyone who stumbles upon a similar problem. It is in Python, though, so if anyone wants to convert it to AppleScript for those who may need it, go for it.
import os
import re
import shutil

def toInt(text):
    # return 0 for anything that is not a valid integer
    try:
        return int(text)
    except ValueError:
        return 0

filePath = "./"
extension = "png"
# sort so lower numbers are moved first and no unprocessed file gets overwritten
dirList = sorted(os.listdir(filePath))
regx = re.compile("[0-9]+a")  # the number followed by the letter "a"
for filename in dirList:
    ext = filename[-len(extension):]
    if ext != extension: continue
    rslts = regx.search(filename)
    if rslts is None: continue
    pieces = regx.split(filename)
    if len(pieces) < 2: pieces.append("")
    filenumber = toInt(rslts.group(0).rstrip("a"))
    newFileNum = (filenumber + 1) // 2  # integer division: 5 or 6 -> 3
    fileChar = "b"
    if filenumber % 2: fileChar = "a"  # odd numbers keep "a", even ones become "b"
    newFileName = "%s%03d%s%s" % (pieces[0], newFileNum, fileChar, pieces[1])
    shutil.move("%s%s" % (filePath, filename), "%s%s" % (filePath, newFileName))
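The renaming rule itself can be separated from the file operations and checked on its own before any files are touched. The sketch below is a hypothetical helper (new_name is not taken from either answer) and assumes each name contains a run of digits followed by a lowercase letter, as in what001a.

```python
import re

PATTERN = re.compile(r'(\d+)([a-z])')


def new_name(filename):
    """Apply the pairing rule: 1 -> 001a, 2 -> 001b, 3 -> 002a, 4 -> 002b, ..."""
    m = PATTERN.search(filename)
    if m is None:
        return filename  # nothing to renumber
    number = int(m.group(1))
    letter = 'a' if number % 2 else 'b'  # odd -> a, even -> b
    width = len(m.group(1))              # keep the original zero padding
    replacement = f'{(number + 1) // 2:0{width}d}{letter}'
    return filename[:m.start()] + replacement + filename[m.end():]


for old in ['what001a.png', 'what002a.png', 'what005a.png', 'what006a.png']:
    print(old, '->', new_name(old))
```

Printing the old and new names side by side like this gives a dry run that can be reviewed before calling any rename function.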
