Python - Check for exact string in file name


I have a folder where each file is named after a number (e.g. img 1, img 2, img-3, 4-img, etc.). I want to get files by exact string, so if I enter '4' as input, it should only return files with '4' and not files containing '14' or '40', for example. My problem is that the program returns every file that contains the string. Note that the numbers aren't always in the same spot (for some files it's at the end, for others it's in the middle).
For instance, if my folder has the files ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4', 'ep.4.', 'ep.4 ', 'ep. 4. ', 'ep4xxx', 'ep 4 ', '404ep'], and I want only files with the exact number 4 in them, then I would only want to return ['ep 4', 'img4', '4xxx', 'file 4.mp4', 'ep.4.', 'ep.4 ', 'ep. 4. ', 'ep4xxx', 'ep 4 ', '404ep'].
Here is what I have (in this case I only want to return files of type mp4):
for (root, dirs, file) in os.walk(source_folder):
    for f in file:
        if '.mp4' and ('4') in f:
            print(f)
I tried == instead of in.

Judging by your inputs, your desired regular expression needs to meet the following criteria:
Match the number provided, exactly
Ignore number matches in the file extension, if present
Handle file names that include spaces
I think this will meet all these requirements:
import re

def generate(n):
    return re.compile(r'^[^.\d]*' + str(n) + r'[^.\d]*(\..*)?$')

def check_files(n, files):
    regex = generate(n)
    return [f for f in files if regex.fullmatch(f)]
Usage:
>>> check_files(4, ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4'])
['ep 4', 'img4', '4xxx', 'file 4.mp4']
Note that this solution involves creating a Pattern object and using that object to check each file. This strategy offers a performance benefit over calling re.fullmatch with the pattern and filename directly, as the pattern does not have to be compiled for each call.
This solution does have one drawback: it assumes that filenames are formatted as name.extension and that the value you're searching for is in the name part. The pattern cannot tell a dot inside the name from the dot that starts the extension, so if you allow file names containing . you can no longer exclude extensions from the search. Ergo, modifying this to match ep.4 would also cause it to match file.mp4. That being said, there is a workaround for this, which is to strip extensions from the file name before doing the match:
import re

def generate(n):
    return re.compile(r'^[^\d]*' + str(n) + r'[^\d]*$')

def strip_extension(f):
    # str.removesuffix requires Python 3.9 or later
    return f.removesuffix('.mp4')

def check_files(n, files):
    regex = generate(n)
    return [f for f in files if regex.fullmatch(strip_extension(f))]
Note that this solution now includes the . in the match condition and does not exclude an extension. Instead, it relies on preprocessing (the strip_extension function) to remove any file extensions from the filename before matching.
As an addendum, occasionally you'll get files that have the number prefixed with zeroes (e.g. 004, 0001, etc.). You can modify the regular expression to handle this case as well:
def generate(n):
    return re.compile(r'^[^\d]*0*' + str(n) + r'[^\d]*$')
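The two refinements above (stripping the extension and allowing leading zeroes) can be combined. The following is only a sketch: it swaps the hard-coded '.mp4' suffix for os.path.splitext, which treats everything after the last dot as the extension, and it escapes the number with re.escape. Both choices are assumptions, not part of the answer above.

```python
import os
import re

def check_files(n, files):
    """Return names whose stem contains n as its only number (leading zeroes allowed)."""
    # \D is shorthand for [^\d]; fullmatch makes the ^ and $ anchors unnecessary.
    regex = re.compile(r'\D*0*' + re.escape(str(n)) + r'\D*')
    # Assumption: whatever follows the last dot is the extension.
    return [f for f in files if regex.fullmatch(os.path.splitext(f)[0])]

files = ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40',
         'file.mp4', 'file 4.mp4', 'ep 004.mkv']
print(check_files(4, files))
# ['ep 4', 'img4', '4xxx', 'file 4.mp4', 'ep 004.mkv']
```

Because of the splitext assumption, a name such as 'ep.4 ' (number after the last dot, no real extension) is not matched by this sketch.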

We can use re.search along with a list comprehension for a regex option:
import re

files = ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4']
num = 4
regex = r'(?<!\d)' + str(num) + r'(?!\d)'
output = [f for f in files if re.search(regex, f)]
print(output) # ['ep 4', 'img4', '4xxx', 'file.mp4', 'file 4.mp4']
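As the output shows, the lookaround pattern still matches the 4 in the .mp4 extension. A possible variation, sketched here under the assumption that only .mp4 files are wanted (as in the question), applies the same pattern to the stem only and tests the extension separately. The function name is made up for illustration. The separate test also sidesteps the condition from the question, '.mp4' and ('4') in f, which Python evaluates as just '4' in f because a non-empty string is always truthy.

```python
import os
import re

def mp4_files_with_number(files, num):
    """Keep .mp4 names whose stem contains num as a standalone number."""
    pattern = re.compile(r'(?<!\d)' + re.escape(str(num)) + r'(?!\d)')
    result = []
    for f in files:
        stem, ext = os.path.splitext(f)
        # Two independent tests: one for the extension, one for the number.
        if ext.lower() == '.mp4' and pattern.search(stem):
            result.append(f)
    return result

files = ['ep 4', 'file.mp4', 'file 4.mp4', 'ep-40.mp4', '404ep.mp4', 'ep4xxx.mp4']
print(mp4_files_with_number(files, 4))  # ['file 4.mp4', 'ep4xxx.mp4']
```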

This can be accomplished with the following function:
import os

files = ["ep 4", "xxx 3 ", "img4", "4xxx", "ep-40", "file.mp4", "file 4.mp4"]
desired_output = ["ep 4", "img4", "4xxx", "file 4.mp4"]

def number_filter(files, number):
    filtered_files = []
    for file_name in files:
        # if the number is not present, we can skip this file
        if file_name.count(str(number)) == 0:
            continue
        # if the number is present in the extension, but not in the file name, we can skip this file
        name, ext = os.path.splitext(file_name)
        if (
            isinstance(ext, str)
            and ext.count(str(number)) > 0
            and isinstance(name, str)
            and name.count(str(number)) == 0
        ):
            continue
        # if the number is present in the file name, we must determine if it's part of a different number
        num_index = file_name.index(str(number))
        # if the number is at the beginning of the file name
        if num_index == 0:
            # check if the next character is a digit
            if file_name[num_index + len(str(number))].isdigit():
                continue
        # if the number is at the end of the file name
        elif num_index == len(file_name) - len(str(number)):
            # check if the previous character is a digit
            if file_name[num_index - 1].isdigit():
                continue
        # if it's somewhere in the middle
        else:
            # check if the previous and next characters are digits
            if (
                file_name[num_index - 1].isdigit()
                or file_name[num_index + len(str(number))].isdigit()
            ):
                continue
        print(file_name)
        filtered_files.append(file_name)
    return filtered_files

output = number_filter(files, 4)
for file in output:
    assert file in desired_output
for file in desired_output:
    assert file in output

Related

python regex: Parsing file name

I have a text file (filenames.txt) that contains the file name with its file extension.
filename.txt
[AW] One Piece - 629 [1080P][Dub].mkv
EP.585.1080p.mp4
EP609.m4v
EP 610.m4v
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One_Piece_0745_Sons'_Cups!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One Piece - 621 1080P.mkv
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
These are example file names with their extensions. I need to rename each file to its episode number (without changing its extension).
Example:
Input:
EP609.m4v
EP 610.m4v
EP.585.1080p.mp4
One Piece - 621 1080P.mkv
[AW] One Piece - 629 [1080P][Dub].mkv
One_Piece_0745_Sons'_Cups!.mp4
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
Expected Output:
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4 (or) 0745.mp4
696.mp4 (or) 0696.mp4
591.m4v
577.mp4
Hope someone will help me parse and rename these filenames. Thanks in advance!!!

As you tagged python, I guess you are willing to use python.
(Edit: I've realized a loop in my original code is unnecessary.)
import re

with open('filename.txt', 'r') as f:
    files = f.read().splitlines()  # read filenames

# assume: an episode number consists of 3 digits, possibly preceded by a 0
p = re.compile(r'0?(\d{3})')
for file in files:
    if m := p.search(file):  # the := operator requires Python 3.8 or later
        print(m.group(1) + '.' + file.split('.')[-1])
    else:
        print(file)
This will output
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4
696.mp4
591.m4v
577.mp4
Basically, it searches for the first 3-digit number, possibly preceded by 0.
I strongly advise you to check the output; in particular, you would want to run sort OUTPUTFILENAME | uniq -d to see whether there are duplicate target names.
(Original answer:)
p = re.compile(r'\d{3,4}')
for file in files:
    for m in p.finditer(file):
        ep = m.group(0)
        if int(ep) < 1000:
            print(ep.lstrip('0') + '.' + file.split('.')[-1])
            break  # go to next file if ep found (avoid the else clause)
    else:  # if ep not found, just print the filename as is
        print(file)
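The duplicate check recommended above can also be done in Python before anything is renamed. This is a minimal sketch that reuses the 0?(\d{3}) pattern from the edited answer; the helper names target_name and duplicate_targets and the sample file names are invented for illustration.

```python
import re
from collections import Counter

EPISODE = re.compile(r'0?(\d{3})')

def target_name(file):
    """Return '<episode>.<ext>', or None when no episode number is found."""
    m = EPISODE.search(file)
    if m is None:
        return None
    return m.group(1) + '.' + file.split('.')[-1]

def duplicate_targets(files):
    """Return the target names that more than one input file would map to."""
    counts = Counter(t for t in map(target_name, files) if t is not None)
    return sorted(name for name, c in counts.items() if c > 1)

files = ['EP609.m4v', 'EP 610.m4v', 'Episode_609_v2.m4v']
print(duplicate_targets(files))  # ['609.m4v']
```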

A program that parses the episode number and renames the file accordingly.
Modules used:
re - To parse File Name
os - To rename File Name
full/path/to/folder - is the path to the folder where your file lives
import re
import os

for file in os.listdir(path="full/path/to/folder/"):
    # searches for the first 3 or 4 digit number less than 1000 in each file name.
    for match_obj in re.finditer(r'\d{3,4}', file):
        episode = match_obj.group(0)
        if int(episode) < 1000:
            new_filename = episode.lstrip('0') + '.' + file.split('.')[-1]
            old_name = "full/path/to/folder/" + file
            new_name = "full/path/to/folder/" + new_filename
            os.rename(old_name, new_name)
            # go to next file if ep found (avoid the else clause)
            break
    else:
        # if episode not found, just leave the filename as it is
        pass
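Since os.rename can silently replace an existing file on some platforms, a more cautious variant may be worth considering. The sketch below is an assumption-laden rewrite, not the answer's method: it uses pathlib, defaults to a dry run that only reports the planned renames, and skips any target name that already exists or was already planned. The function name and the dry_run parameter are hypothetical.

```python
import re
from pathlib import Path

def rename_episodes(folder, dry_run=True):
    """Rename files to '<episode><suffix>'; return the (old, new) name pairs."""
    planned = []
    taken = set()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # first 3- or 4-digit run whose value is below 1000, as in the answer
        episode = next((m.group(0) for m in re.finditer(r'\d{3,4}', path.name)
                        if int(m.group(0)) < 1000), None)
        if episode is None:
            continue
        target = path.with_name(episode.lstrip('0') + path.suffix)
        if target.exists() or target.name in taken:
            continue  # never overwrite a file or reuse a planned name
        taken.add(target.name)
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling rename_episodes("full/path/to/folder") first and reading the returned pairs, then calling it again with dry_run=False, keeps the destructive step explicit.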

input() and \n characters in python

I am trying to find and replace several lines of plain text in multiple files with input() but when I enter '\n' characters to represent where the new line chars would be in the text, it doesn't find it and doesn't replace it.
I tried to use raw_strings but couldn't get them to work.
Is this a job for regular expressions?
python 3.7
import os
import re
import time

start = time.time()
# enter path and check input for standard format
scan_folder = input('Enter the absolute path to scan:\n')
validate_path_regex = re.compile(r'[a-z,A-Z]:\\?(\\?\w*\\?)*')
mo = validate_path_regex.search(scan_folder)
if mo is None:
    print('Path is not valid. Please re-enter path.\n')
    import sys
    sys.exit()
os.chdir(scan_folder)
# get find/replaceStrings, and then confirm that inputs are correct.
find_string = input('Enter the text you wish to find:\n')
replace_string = input('Enter the text to replace:\n')
permission = input('\nPlease confirm you want to replace '
                   + find_string + ' with '
                   + replace_string + ' in ' + scan_folder
                   + ' directory.\n\nType "yes" to continue.\n')
if permission == 'yes':
    change_count = 0
    # Context manager for results file
    with open('find_and_replace.txt', 'w') as results:
        for root, subdirs, files in os.walk(scan_folder):
            for file in files:
                # ignore files that don't endwith '.mpr'
                if os.path.join(root, file).endswith('.mpr'):
                    fullpath = os.path.join(root, file)
                    # context manager for each file opened
                    with open(fullpath, 'r+') as f:
                        text = f.read()
                        # only add to changeCount if find_string is in text
                        if find_string in text:
                            change_count += 1
                            # move cursor back to beginning of the file
                            f.seek(0)
                            f.write(text.replace(find_string, replace_string))
        results.write(str(change_count)
                      + ' files have been modified to replace '
                      + find_string + ' with ' + replace_string + '.\n')
    print('Done with replacement')
else:
    print('Find and replace has not been executed')
end = time.time()
print('Program took ' + str(round((end - start), 4)) + ' secs to complete.\n')
find_string = BM="LS"\nTI="12"\nDU="7"
replace_string = BM="LSL"\nDU="7"
The original file looks like
BM="LS"
TI="12"
DU="7"
and I would like it to change to
BM="LSL"
DU="7"
but the file doesn't change.

The misconception here is about the distinction between source code, which understands escape sequences such as "this is a string \n with two lines" and things like raw strings (a concept that doesn't make sense in this context), and the data you are providing as user input. The input function processes data coming in from the standard input device. When you provide data to standard input, it is interpreted as raw bytes, and the input function then assumes it's meant to be text (decoded using whatever your system settings imply). There are two approaches to letting a user enter newlines. The first is to use sys.stdin; however, this requires you to provide an EOF, probably using ctrl + D:
>>> import sys
>>> x = sys.stdin.read()
here is some text and i'm pressing return
to make a new line. now to stop input, press control d>>> x
"here is some text and i'm pressing return\nto make a new line. now to stop input, press control d"
>>> print(x)
here is some text and i'm pressing return
to make a new line. now to stop input, press control d
This is not very user-friendly. You have to either pass a newline and an EOF, i.e. return + ctrl + D or do ctrl + D twice, and this depends on the system, I believe.
A better approach would be to allow the user to input escape sequences, and then decode them yourself:
>>> x = input()
I want this to\nbe on two lines
>>> x
'I want this to\\nbe on two lines'
>>> print(x)
I want this to\nbe on two lines
>>> x.encode('utf8').decode('unicode_escape')
'I want this to\nbe on two lines'
>>> print(x.encode('utf8').decode('unicode_escape'))
I want this to
be on two lines
>>>
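Applied to the find-and-replace script from the question, the decoding step could be wrapped in a small helper and run on both input strings before searching. The sketch below rests on two assumptions that are not in the answer. First, it encodes with latin-1 and backslashreplace instead of utf8, because unicode_escape decodes its input as Latin-1 and would otherwise garble non-ASCII characters. Second, it adds f.truncate() after the write, since the replacement text in the question is shorter than the original and the tail of the old content would otherwise remain in the file. The function names are made up for illustration.

```python
def decode_escapes(text):
    """Turn typed escape sequences (backslash-n, backslash-t, ...) into real characters."""
    # Characters above U+00FF become \uXXXX escapes, which decode back unchanged.
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')

def replace_in_file(path, find_string, replace_string):
    """Replace find_string in the file at path; return True if the file changed."""
    with open(path, 'r+') as f:
        text = f.read()
        if find_string not in text:
            return False
        f.seek(0)
        f.write(text.replace(find_string, replace_string))
        f.truncate()  # drop leftover content when the new text is shorter
        return True

find_string = decode_escapes(r'BM="LS"\nTI="12"\nDU="7"')
print(find_string)
```

Note that every backslash the user types is interpreted, so an input such as a Windows path containing \n or \t would be altered as well.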

parsing array contents and adding the values

I have several files that end in ".log". The three lines before the final blank line contain the data of interest.
Example file contents (the last four lines; the fourth line is blank):
Total: 150
Success: 120
Error: 30
I am reading these contents into an array and trying to find an elegant way to:
1) Extract the numeric data for each category (Total, Success, Error), and error out if the numeric data is missing from the second part
2) Add them all up
I came up with the following code (getLastXLines function excluded for brevity) that returns the aggregate:
def getSummaryData(testLogFolder):
    (path, dirs, files) = os.walk(testLogFolder).next()
    #aggregate = [grandTotal, successTotal, errorTotal]
    aggregate = [0, 0, 0]
    for currentFile in files:
        fullNameFile = path + "\\" + currentFile
        if currentFile.endswith(".log"):
            with open(fullNameFile,"r") as fH:
                linesOfInterest=getLastXLines(fH, 4)
            #If the file doesn't contain expected number of lines
            if len(linesOfInterest) != 4:
                print fullNameFile + " doesn't contain the expected summary data"
            else:
                for count, line in enumerate(linesOfInterest[0:-1]):
                    results = line.split(': ')
                    if len(results)==2:
                        aggregate[count] += int(results[1])
                    else:
                        print "error with " + fullNameFile + " data. Not adding the total"
    return aggregate
Being relatively new to Python, and seeing the power of it, I feel there may be a more powerful and efficient way to do this. Maybe there is a short list comprehension for this kind of thing? Please help.

def getSummaryData(testLogFolder):
    summary = {'Total':0, 'Success':0, 'Error':0}
    (path, dirs, files) = os.walk(testLogFolder).next()
    for currentFile in files:
        fullNameFile = path + "\\" + currentFile
        if currentFile.endswith(".log"):
            with open(fullNameFile,"r") as fH:
                for pair in [line.split(':') for line in fH.read().split('\n')[-5:-2]]:
                    try:
                        summary[pair[0].strip()] += int(pair[1].strip())
                    except ValueError:
                        print pair[1] + ' is not a number'
                    except KeyError:
                        print pair[0] + ' is not "Total", "Success", or "Error"'
    return summary
Piece by piece:
    fH.read().split('\n')[-5:-2]
Here we take the three summary lines. Assuming the file ends with a newline after the blank line, the slice skips the blank line and the empty string that split('\n') produces after the final newline.
    line.split(':') for line in
From those lines, we break by the colon.
    try:
        summary[pair[0].strip()] += int(pair[1].strip())
Now we try to get a key from the first part and a number from the second, and add the number to our total.
    except ValueError:
        print pair[1] + ' is not a number'
    except KeyError:
        print pair[0] + ' is not "Total", "Success", or "Error"'
And if we find something that isn't a number, or a key that isn't what we are looking for, we print an error.
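Both versions above are written for Python 2 (print statements, .next() on the generator). For readers on Python 3, the same idea might look like the sketch below. It is an adaptation under stated assumptions rather than a drop-in replacement: it uses pathlib, looks only at the top level of the folder, and takes the last three non-blank lines instead of a fixed slice, so it does not depend on exactly how many blank lines end the file. The name get_summary_data is invented.

```python
from pathlib import Path

KEYS = ('Total', 'Success', 'Error')

def get_summary_data(test_log_folder):
    """Sum the Total/Success/Error lines found at the end of each .log file."""
    summary = dict.fromkeys(KEYS, 0)
    for log in Path(test_log_folder).glob('*.log'):
        # last three non-blank lines, whatever blank lines follow them
        lines = [ln for ln in log.read_text().splitlines() if ln.strip()][-3:]
        for line in lines:
            key, _, value = line.partition(':')
            try:
                summary[key.strip()] += int(value)
            except KeyError:
                print(f'{log.name}: unexpected key {key.strip()!r}')
            except ValueError:
                print(f'{log.name}: {value.strip()!r} is not a number')
    return summary
```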

Python Regex or Filename Function

I have a question about renaming files in a folder. My file names look like this:
EPG CRO 24 Kitchen 09.2013.xsl
The names contain spaces, and I used code like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Remove whitespace from files where EPG named with space " " replace with "_"
for filename in os.listdir("."):
    if filename.find("2013|09 ") > 0:
        newfilename = filename.replace(" ","_")
        os.rename(filename, newfilename)
With this code I removed the white space, but how can I remove the date from the file name so that it looks like this: EPG_CRO_24_Kitchen.xsl? Can you suggest a solution?

Regex
As utdemir was alluding to, regular expressions can really help in situations like these. If you have never been exposed to them, they can be confusing at first. Check out https://www.debuggex.com/r/4RR6ZVrLC_nKYs8g for a useful tool that helps you construct regular expressions.
Solution
An updated solution would be:
import os
import re

def rename_file(filename):
    if filename.startswith('EPG') and ' ' in filename:
        # \s+ means 1 or more whitespace characters
        # [0-9]{2} means exactly 2 characters of 0 through 9
        # \. means find a '.' character
        # [0-9]{4} means exactly 4 characters of 0 through 9
        newfilename = re.sub(r"\s+[0-9]{2}\.[0-9]{4}", '', filename)
        newfilename = newfilename.replace(" ","_")
        os.rename(filename, newfilename)
Side Note
# Remove whitespace from files where EPG named with space " " replace with "_"
for filename in os.listdir("."):
    if filename.find("2013|09 ") > 0:
        newfilename = filename.replace(" ","_")
        os.rename(filename, newfilename)
Unless I'm mistaken, and judging from the comment you made above, filename.find("2013|09 ") > 0 won't work: str.find looks for the literal text and does not treat | as "or".
Given the following:
In [76]: filename = "EPG CRO 24 Kitchen 09.2013.xsl"
In [77]: filename.find("2013|09 ")
Out[77]: -1
Given that and the comment you described, you might want something more like:
In [80]: if filename.startswith('EPG') and ' ' in filename:
....: print('process this')
....:
process this
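To try the substitution without touching any files, the name computation can be separated from os.rename. The sketch below is one way to do that; the clean_name helper is hypothetical, and the extra lookahead, which only strips a date sitting directly before the extension, is an added assumption about the naming scheme.

```python
import re

# ' MM.YYYY' immediately before the final '.ext'
DATE_SUFFIX = re.compile(r'\s+[0-9]{2}\.[0-9]{4}(?=\.[^.]+$)')

def clean_name(filename):
    """Drop a trailing ' MM.YYYY' before the extension and replace spaces with '_'."""
    return DATE_SUFFIX.sub('', filename).replace(' ', '_')

print(clean_name('EPG CRO 24 Kitchen 09.2013.xsl'))  # EPG_CRO_24_Kitchen.xsl
```

The result can then be passed to os.rename(filename, clean_name(filename)) once the printed names look right.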

If all file names have the same format, NAME XX.20XX.xsl, then you can use Python's string slicing instead of a regex (the last 12 characters are the date and the extension):
name.replace(' ','_')[:-12] + '.xsl'

If the dates are always formatted the same way:
>>> s = "EPG CRO 24 Kitchen 09.2013.xsl"
>>> re.sub(r"\s+\d{2}\.\d{4}\..{3}$", "", s)
'EPG CRO 24 Kitchen'

How about a little slicing:
newfilename = input1[:input1.rfind(" ")].replace(" ","_")+input1[input1.rfind("."):]

Automator/Applescript rename files if

I have a large list of images that have been misnamed by my artist. I was hoping to avoid giving him more work by using Automator, but I'm new to it. Right now they're named in order what001a and what002a, but that should be what001a and what001b. So basically the odd-numbered images are A and the even-numbered ones are B. I need a script that changes the even-numbered images to B and renumbers them all to the proper sequential numbering. How would I go about writing that script?

A small Ruby script embedded in an AppleScript provides a very comfortable solution, allowing you to select the files to rename right in Finder and displaying an informative success or error message.
The algorithm renames files as follows:
number = first 3 digits in filename # e.g. "006"
letter = the letter following those digits # e.g. "a"
if number is even, change letter to its successor # e.g. "b"
number = (number + 1)/2 # 5 or 6 => 3
replace number and letter in filename
And here it is:
-- ask for files
set filesToRename to choose file with prompt "Select the files to rename" with multiple selections allowed
-- prepare ruby command
set ruby_script to "ruby -e \"s=ARGV[0]; m=s.match(/(\\d{3})(\\w)/); n=m[1].to_i; a=m[2]; a.succ! if n.even?; r=sprintf('%03d',(n+1)/2)+a; puts s.sub(/\\d{3}\\w/,r);\" "
tell application "Finder"
    -- process files, record errors
    set counter to 0
    set errors to {}
    repeat with f in filesToRename
        try
            do shell script ruby_script & (f's name as text)
            set f's name to result
            set counter to counter + 1
        on error
            copy (f's name as text) to the end of errors
        end try
    end repeat
    -- display report
    set msg to (counter as text) & " files renamed successfully!\n"
    if errors is not {} then
        set AppleScript's text item delimiters to "\n"
        set msg to msg & "The following files could NOT be renamed:\n" & (errors as text)
        set AppleScript's text item delimiters to ""
    end if
    display dialog msg
end tell
Note that it will fail when the filename contains spaces.

A friend of mine wrote a Python script to do what I needed. Figured I'd post it here as an answer for anyone stumbling upon a similar problem looking for help. It is in Python though so if anyone wants to convert it to AppleScript for those that may need it go for it.
import os
import re
import shutil

def toInt(text):
    try:
        return int(text)
    except ValueError:
        return 0

filePath = "./"
extension = "png"
dirList = os.listdir(filePath)
regx = re.compile("[0-9]+a")
for filename in dirList:
    ext = filename[-len(extension):]
    if ext != extension: continue
    rslts = regx.search(filename)
    if rslts is None: continue
    pieces = regx.split(filename)
    if len(pieces) < 2: pieces.append("")
    filenumber = toInt(rslts.group(0).rstrip("a"))
    # integer division, so the result stays an int on Python 3 as well
    newFileNum = (filenumber + 1) // 2
    fileChar = "b"
    if filenumber % 2: fileChar = "a"
    newFileName = "%s%03d%s%s" % (pieces[0], newFileNum, fileChar, pieces[1])
    shutil.move("%s%s" % (filePath, filename), "%s%s" % (filePath, newFileName))
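The renaming rule that both answers implement can be isolated as a pure function, which makes it easy to check before any file is moved. This is only a sketch: the name renumber is invented, and it assumes the pattern from the question, a three-digit number followed by the letter a.

```python
import re

def renumber(name):
    """Map an odd number NNN to (NNN+1)//2 + 'a' and an even one to NNN//2 + 'b'."""
    def fix(match):
        number = int(match.group(1))
        letter = 'a' if number % 2 else 'b'
        return '%03d%s' % ((number + 1) // 2, letter)
    # only the first "three digits + a" group in the name is rewritten
    return re.sub(r'(\d{3})a', fix, name, count=1)

for old in ['what001a.png', 'what002a.png', 'what005a.png', 'what006a.png']:
    print(old, '->', renumber(old))
```

When applying it to real files, processing the names in ascending order (for example with sorted(os.listdir(...))) should avoid renaming a file onto a name that is still in use, because each new number is never larger than the old one.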
