Python Regex or Filename Function

I have a question about renaming the files in a folder. My file names look like this:
EPG CRO 24 Kitchen 09.2013.xsl
The names contain spaces, and I used code like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os

# Replace the spaces in EPG file names with "_"
for filename in os.listdir("."):
    if filename.find("2013|09 ") > 0:
        newfilename = filename.replace(" ", "_")
        os.rename(filename, newfilename)
With this code I removed the whitespace, but how can I also remove the date from the file name, so that it looks like this: EPG_CRO_24_Kitchen.xsl? Can you suggest a solution?

Regex
As utdemir was alluding to, regular expressions can really help in situations like these. If you have never been exposed to them, they can be confusing at first. Check out https://www.debuggex.com/r/4RR6ZVrLC_nKYs8g for a useful tool that helps you construct regular expressions.
Solution
An updated solution would be:
import os
import re

def rename_file(filename):
    if filename.startswith('EPG') and ' ' in filename:
        # \s+ means 1 or more whitespace characters
        # [0-9]{2} means exactly 2 characters of 0 through 9
        # \. means find a '.' character
        # [0-9]{4} means exactly 4 characters of 0 through 9
        newfilename = re.sub(r"\s+[0-9]{2}\.[0-9]{4}", '', filename)
        newfilename = newfilename.replace(" ", "_")
        os.rename(filename, newfilename)
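To check the pattern without touching any files, the name computation can be separated from the rename. The following is a minimal sketch rather than the answerer's code: the helper name strip_date and the constant DATE_PATTERN are made up for illustration, and it assumes the date always appears as " MM.YYYY" directly before the extension.

```python
import re

# Hypothetical helper: it only computes the new name, so it can be
# checked before any os.rename call is made.
# The lookahead (?=\.[^.]+$) assumes the date sits right before the extension.
DATE_PATTERN = re.compile(r"\s+[0-9]{2}\.[0-9]{4}(?=\.[^.]+$)")

def strip_date(filename):
    """Drop a trailing ' MM.YYYY' before the extension, then replace spaces."""
    without_date = DATE_PATTERN.sub("", filename)
    return without_date.replace(" ", "_")

print(strip_date("EPG CRO 24 Kitchen 09.2013.xsl"))  # EPG_CRO_24_Kitchen.xsl
```

Once the printed names look right, the result of strip_date can be passed to os.rename as the new name.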
Side Note
# Remove whitespace from files where EPG named with space " " replace with "_"
for filename in os.listdir("."):
    if filename.find("2013|09 ") > 0:
        newfilename = filename.replace(" ", "_")
        os.rename(filename, newfilename)
Unless I'm mistaken, judging from the comment in your code above, filename.find("2013|09 ") > 0 won't work: str.find looks for the literal substring "2013|09 " and does not treat | as "or".
Given the following:
In [76]: filename = "EPG CRO 24 Kitchen 09.2013.xsl"
In [77]: filename.find("2013|09 ")
Out[77]: -1
And given the comment you wrote, you might want something more like:
In [80]: if filename.startswith('EPG') and ' ' in filename:
   ....:     print('process this')
   ....:
process this

If all file names have the same format, NAME MM.YYYY.xsl, then you can use Python's string slicing instead of regex (the trailing " MM.YYYY.xsl" is always 12 characters long):
name.replace(' ','_')[:-12] + '.xsl'

If the dates are always formatted the same way (note that this pattern also removes the extension):
>>> import re
>>> s = "EPG CRO 24 Kitchen 09.2013.xsl"
>>> re.sub(r"\s+\d{2}\.\d{4}\..{3}$", "", s)
'EPG CRO 24 Kitchen'

How about a little slicing:
newfilename = input1[:input1.rfind(" ")].replace(" ","_")+input1[input1.rfind("."):]
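The slicing above assumes the name contains both a space and a dot. str.rfind returns -1 when the character is absent, in which case the slices would cut at the last character instead. A small sketch with a guard for that case follows; the function name drop_last_word is hypothetical.

```python
def drop_last_word(name):
    """Remove the last space-separated word before the extension."""
    space = name.rfind(" ")
    dot = name.rfind(".")
    # rfind returns -1 when the character is missing; in that case (or when
    # the last dot comes before the last space) only replace the spaces.
    if space == -1 or dot == -1 or dot < space:
        return name.replace(" ", "_")
    return name[:space].replace(" ", "_") + name[dot:]

print(drop_last_word("EPG CRO 24 Kitchen 09.2013.xsl"))  # EPG_CRO_24_Kitchen.xsl
```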

Related

Add part of line found after #solution to file

I have a script that writes the line starting with #Solution 1 to a new file, together with the name of the input file. But I want to add the value from the Major column of the input file. Can someone please help me figure out how to get that piece of text?
The script now:
#!/usr/bin/env python3
import os

dr = "/home/nwalraven/Result_pgx/Runfolder/Runres_Aldy"
outdr = "/home/nwalraven/Result_pgx/Runfolder/Aldy_res_txt"
tag = ".aldy"

for f in os.listdir(dr):
    if f.endswith(tag):
        print(f)
        new_file_name = f.split('_')[0]+'.txt'  # get the name of the file before the '_' and add '.txt' to it
        with open(dr+"/"+f) as file:
            for line in file.readlines():
                if line.startswith("#Solution 1"):
                    with open(outdr+"/"+new_file_name,"a",newline='\n') as new_file:
                        new_file.write(f.split('.')[0] + "\n")
                        new_file.write(line + "\n")
                if line.startswith("#Solution 2"):
                    with open(outdr+"/"+new_file_name,"a",newline='\n') as new_file:
                        new_file.write(line + "\n")
                        print("Multiple solutions found! Check the Aldy file")
The input:
file = EMQN3-S3_COMT.aldy
#Sample Gene SolutionID Major Minor Copy Allele Location Type Coverage Effect dbSNP Code Status
#Solution 1: *Met, *ValB
EMQN3-S3 COMT 1 *Met/*ValB Met;ValB 0 Met 19950234 C>T 530 H62= rs4633
EMQN3-S3 COMT 1 *Met/*ValB Met;ValB 0 Met 19951270 G>A 651 V158M rs4680
EMQN3-S3 COMT 1 *Met/*ValB Met;ValB 1 ValB
file = EMQN3-S3_CYP2B6.aldy
#Sample Gene SolutionID Major Minor Copy Allele Location Type Coverage Effect dbSNP Code Status
#Solution 1: *1.001, *1.001
EMQN3-S3 CYP2B6 1 *1/*1 1.001;1.001 0 1.001
EMQN3-S3 CYP2B6 1 *1/*1 1.001;1.001 1 1.001
The result it gives right now:
EMQN3-S3_COMT.aldy
#Solution 1: *Met, *ValB
EMQN3-S3_CYP2B6.aldy
#Solution 1: *1.001, *1.001
The result I need:
EMQN3-S3_COMT.aldy
#Solution 1: *Met/*ValB
EMQN3-S3_CYP2B6.aldy
#Solution 1: *1/*1

If you write out the line, you can use a regular expression to replace text before writing it.
On the other hand, if you know the line always starts with a fixed number of characters, then it's easier and faster to edit the line with plain string operations.
With regex:
# Importing regular expressions
import re
# Setting up regex replacement to replace ", " with "/"
regex = r", "
replacement = "/"
...
# Format the line before writing it
line_formatted = re.sub(regex, replacement, line)
new_file.write(line_formatted + "\n")  # edited
...

Try replacing this part of your script:
...
if line.startswith("#Solution 1"):
    with open(outdr+"/"+new_file_name,"a",newline='\n') as new_file:
        new_file.write(f.split('.')[0] + "\n")
        solution = "/".join([x.strip().split(".")[0] for x in line.split(",")])
        new_file.write(solution + "\n")
...
It will do the following:
split the string into two tokens, based on the comma
strip them
remove the decimal part (if any) from the token
rejoin the string using the slash.
Hope it helps.
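The steps above can be tried on the two header lines from the question. This is a small sketch; the function name format_solution is made up for illustration. It assumes the only dots in the line are the decimal parts of the allele names, since everything after the first dot in each token is dropped.

```python
def format_solution(line):
    """Turn '#Solution 1: *1.001, *1.001' into '#Solution 1: *1/*1'."""
    tokens = line.split(",")                     # split on the comma
    tokens = [t.strip() for t in tokens]         # strip spaces and the newline
    tokens = [t.split(".")[0] for t in tokens]   # drop the decimal part, if any
    return "/".join(tokens)                      # rejoin with a slash

print(format_solution("#Solution 1: *Met, *ValB\n"))     # #Solution 1: *Met/*ValB
print(format_solution("#Solution 1: *1.001, *1.001\n"))  # #Solution 1: *1/*1
```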

python regex: Parsing file name

I have a text file (filename.txt) that contains file names with their extensions.
filename.txt
[AW] One Piece - 629 [1080P][Dub].mkv
EP.585.1080p.mp4
EP609.m4v
EP 610.m4v
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One_Piece_0745_Sons'_Cups!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One Piece - 621 1080P.mkv
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
These are example file names with their extensions. I need to rename each file to its episode number (without changing its extension).
Example:
Input:
EP609.m4v
EP 610.m4v
EP.585.1080p.mp4
One Piece - 621 1080P.mkv
[AW] One Piece - 629 [1080P][Dub].mkv
One_Piece_0745_Sons'_Cups!.mp4
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
Expected Output:
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4 (or) 0745.mp4
696.mp4 (or) 0696.mp4
591.m4v
577.mp4
Hope someone will help me parse and rename these filenames. Thanks in advance!!!

As you tagged python, I guess you are willing to use python.
(Edit: I've realized a loop in my original code is unnecessary.)
import re

with open('filename.txt', 'r') as f:
    files = f.read().splitlines()  # read filenames

# assume: an episode number consists of 3 digits, possibly preceded by 0
p = re.compile(r'0?(\d{3})')
for file in files:
    if m := p.search(file):
        print(m.group(1) + '.' + file.split('.')[-1])
    else:
        print(file)
This will output
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4
696.mp4
591.m4v
577.mp4
Basically, it searches for the first 3-digit number, possibly preceded by 0.
I strongly advise you to check the output; in particular, you would want to run sort OUTPUTFILENAME | uniq -d to see whether there are duplicate target names.
(Original answer:)
p = re.compile(r'\d{3,4}')
for file in files:
    for m in p.finditer(file):
        ep = m.group(0)
        if int(ep) < 1000:
            print(ep.lstrip('0') + '.' + file.split('.')[-1])
            break  # go to next file if ep found (avoid the else clause)
    else:  # if ep not found, just print the filename as is
        print(file)
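The duplicate check recommended above can also be done in Python before renaming anything. The following is a sketch under the same assumption as the answer (the episode number is the first run of 3 digits, possibly preceded by a 0); the names episode_name and duplicate_targets are hypothetical.

```python
import re
from collections import Counter

EPISODE = re.compile(r"0?(\d{3})")

def episode_name(filename):
    """Return '<episode>.<ext>', or the name unchanged if no episode is found."""
    match = EPISODE.search(filename)
    if match is None:
        return filename
    extension = filename.rsplit(".", 1)[-1]
    return match.group(1) + "." + extension

def duplicate_targets(filenames):
    """List target names that more than one source file would map to."""
    counts = Counter(episode_name(name) for name in filenames)
    return sorted(name for name, n in counts.items() if n > 1)

print(episode_name("EP.585.1080p.mp4"))                 # 585.mp4
print(duplicate_targets(["EP609.m4v", "EP 609.m4v"]))   # ['609.m4v']
```

An empty list from duplicate_targets means no two files would end up with the same name.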

A program to parse the episode number and rename the files.
Modules used:
re - to parse the file name
os - to rename the file
full/path/to/folder is the path to the folder where your files live.
import re
import os

folder = "full/path/to/folder/"
for file in os.listdir(folder):
    # search for the first 3- or 4-digit number less than 1000 in each file name
    for match_obj in re.finditer(r'\d{3,4}', file):
        episode = match_obj.group(0)
        if int(episode) < 1000:
            new_filename = episode.lstrip('0') + '.' + file.split('.')[-1]
            old_name = os.path.join(folder, file)
            new_name = os.path.join(folder, new_filename)
            os.rename(old_name, new_name)
            # go to next file if ep found (avoid the else clause)
            break
    else:
        # if episode not found, just leave the filename as it is
        pass

How to remove first n character of multiple file names in mac

I want to rename multiple files so that the first 9 characters are deleted.
example:
Before:
19.49.29 1
19.50.17 2
19.50.24 3
19.50.28 4
.
.
After that:
1
2
3
4
.
.
I tried using Python, but it messed up my files and their order:
import os
folderPath = r'/Users/**myusername**/Desktop/FOLDER'
fileNumber = 1
for filename in os.listdir(folderPath):
    os.rename(folderPath + '//' + filename, folderPath + '/' + str(fileNumber) + '.jpeg')
    fileNumber += 1
Maybe there's a way to do it using the terminal or something else?

With zsh (which the OP included as a tag)
% autoload zmv
% zmv '* (*)' '$1'
This will treat each filename as a space-separated pair of words, and use the second word as the new name for each file.
If you really need the condition to be "drop the first nine characters", then
% zmv '?????????(*)' '$1'

If you're set on using python3, you can simply use string slicing (strings are sequences) and drop the first 9 characters like this:
filename = "12.23.34 1.jpeg"
print(filename[9:])
This starts at index 9 (the "1") and returns the rest, so you get "1.jpeg". So in your code, if we assume that ALL your file names start with a 9-character prefix (e.g. "12.23.34 1.jpeg"), the line you had:
os.rename(folderPath + '//' + filename, folderPath + '/' + str(fileNumber) + '.jpeg')
can be changed to:
os.rename(folderPath + '/' + filename, folderPath + '/' + filename[9:])
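Since a bad rename is hard to undo, it may be worth trying the slice on a throwaway folder first. Below is a minimal sketch, assuming every name that should be renamed starts with a 9-character prefix; the function name strip_prefix and the two safety checks are additions for illustration, not part of the answer above.

```python
import os
import tempfile

def strip_prefix(folder, n=9):
    """Rename every file in folder by dropping the first n characters."""
    for filename in sorted(os.listdir(folder)):
        new_name = filename[n:]
        if not new_name:
            continue  # name is too short; skip instead of renaming to ''
        source = os.path.join(folder, filename)
        target = os.path.join(folder, new_name)
        if os.path.exists(target):
            continue  # do not overwrite an existing file
        os.rename(source, target)

# Try it on a temporary directory before pointing it at real files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("19.49.29 1.jpeg", "19.50.17 2.jpeg"):
        open(os.path.join(folder, name), "w").close()
    strip_prefix(folder)
    print(sorted(os.listdir(folder)))  # ['1.jpeg', '2.jpeg']
```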

Getting NoneType Error When Using Regex to Change Filenames in Python

I'm trying to change a bunch of filenames using regex groups but can't seem to get it to work (despite writing what regexr.com tells me should be a valid regex statement). The 93,000 files I currently have all look something like this:
Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt
And I want them to look like this:
20120731McCONNELL2014sep19_at_182325.txt
And ignore any file that starts with anything other than Mr., Mrs., and Ms.
But every time I run the script below, I get the following error:
Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'
Thanks so much for your help. My apologies if this is a silly question. I'm just starting with RegEx and Python and can't seem to figure this one out.
import io
import os
import re
from dateutil.parser import parse

for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mr"
    if filename.startswith("Mrs."):
        m = re.search("Mrs.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mrs"
    if filename.startswith("Ms."):
        m = re.search("Ms.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mrs"
I've made the adjustments suggested in Using Regex to Change Filenames with Python but still no luck.
EDIT: Made the following changes based on the answer below:
for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        print filename
        m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        if m:
            date = m.group(2)
            name = m.group(1)
            timestamp = m.group(3)
            dt = parse(date)
            new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
            os.rename(filename, new_filename)
            print new_filename
            print "All done with the Mr"
And it spit out this:
Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt
Mr. ADAMS.2009-12-16.2014sep18_at_223650.txt
Traceback (most recent call last):
  File "changefilenames.py", line 19, in <module>
    os.rename(filename, new_filename)
OSError: [Errno 2] No such file or directory

You are passing bare file names to os.rename, probably with missing paths.
Consider the following layout:
yourscript.py
subdir/
- one
- two
This is similar to your code:
import os
for fn in os.listdir('subdir'):
    print(fn)
    os.rename(fn, fn + '_moved')
and it throws an exception (somewhat nicer in Python 3):
FileNotFoundError: [Errno 2] No such file or directory: 'two' -> 'two_moved'
because in the current working directory, there is no file named two. But consider this:
import os
for fn in os.listdir('subdir'):
    print(fn)
    os.rename(os.path.join('subdir', fn), os.path.join('subdir', fn + '_moved'))
This works, because the full path is used. Instead of using 'subdir' again and again (or in a variable), you should perhaps change the working directory as a first step:
import os
os.chdir('subdir')
for fn in os.listdir():
    print(fn)
    os.rename(fn, fn + '_moved')

After you do a search, you'll always want to make sure you have a match before doing any processing. It looks like you may have a file that starts with 'Mr.' but doesn't match your expression in general.
if filename.startswith("Mr."):
    m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
    if m:  # Only look at groups if we have a match.
        date = m.group(2)
        name = m.group(1)
        ....
I would also suggest not using startswith('Mr.') and regex at the same time, since your regex should already only work on strings that start with 'Mr.', though you may want to add a '^' to the beginning of the regex to enforce this:
m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
if m:  # ^ added caret to signify start of string.
    date = m.group(2)
    name = m.group(1)
    ...
Additionally, you may want to verify what files you are not matching, since with that much data, you will often run into problems like extra whitespace or improper case, so you may want to look into making your regex more robust.
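One way to make the pattern stricter is to handle all three titles in a single anchored expression and escape the literal dots. The sketch below is an illustration, not the code from the question: the names PATTERN and new_name are hypothetical, it builds the date by dropping the hyphens instead of using dateutil, and it uses f-strings, so it needs Python 3.6 or later.

```python
import re

# One pattern for all three titles; the dots are escaped so they only match '.'.
PATTERN = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s(\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\w+)\.txt$"
)

def new_name(filename):
    """Return the reordered name, or None when the file should be skipped."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    name, year, month, day, timestamp = m.groups()
    return f"{year}{month}{day}{name}{timestamp}.txt"

print(new_name("Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt"))
# 20120731McCONNELL2014sep19_at_182325.txt
print(new_name("Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt"))
# None
```

Files for which new_name returns None can be collected in a list and inspected afterwards to see which names the pattern does not cover.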

Remove punctuation from file name while keeping file extension intact

I would like to remove all punctuation from a filename but keep its file extension intact.
e.g. I want:
Flowers.Rose-Murree-[25.10.11].jpg
Time.Square.New-York-[20.7.09].png
to look like:
Flowers Rose Murree 25 10 11.jpg
Time Square New York 20 7 09.png
I'm trying python:
re.sub(r'[^A-Za-z0-9]', ' ', filename)
But that produces:
Flowers Rose Murree 25 10 11 jpg
Time Square New York 20 7 09 png
How do I remove the punctuation but keep the file extension?

There's only one right way to do this:
Use os.path.splitext to split the filename from its extension.
Do whatever processing you want to the filename.
Concatenate the new filename with the extension.
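The three steps above can be sketched as follows. The function name clean_name is made up for illustration, and collapsing runs of punctuation into a single space (with the trailing space stripped) is an assumption chosen to match the output the question asks for.

```python
import os
import re

def clean_name(filename):
    """Replace punctuation in the base name with spaces; keep the extension."""
    base, ext = os.path.splitext(filename)               # step 1: split
    base = re.sub(r"[^A-Za-z0-9]+", " ", base).strip()   # step 2: process the name
    return base + ext                                    # step 3: concatenate

print(clean_name("Flowers.Rose-Murree-[25.10.11].jpg"))
# Flowers Rose Murree 25 10 11.jpg
print(clean_name("Time.Square.New-York-[20.7.09].png"))
# Time Square New York 20 7 09.png
```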

You could use a negative lookahead, that asserts that you are not dealing with a dot that is only followed by digits and letters:
re.sub(r'(?!\.[A-Za-z0-9]*$)[^A-Za-z0-9]', ' ', filename)

I suggest replacing each occurrence of [\W_](?=.*\.) with a space.
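A quick way to see what this pattern does is to run it on one of the names from the question. This is only a sketch with a hypothetical function name. Each matched character becomes one space, so adjacent punctuation such as "-[" leaves two spaces, and the "]" before the extension leaves a space in front of the final dot.

```python
import re

def replace_before_last_dot(filename):
    # [\W_] matches any non-word character, plus the underscore;
    # (?=.*\.) requires a dot somewhere later, so the last dot is left alone.
    return re.sub(r"[\W_](?=.*\.)", " ", filename)

print(replace_before_last_dot("Time.Square.New-York-[20.7.09].png"))
# Time Square New York  20 7 09 .png
```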

See if this works for you. You can actually do it without regex (in Python 3, str.translate takes a table built with str.maketrans):
>>> import os, string
>>> fname = "Flowers.Rose-Murree-[25.10.11].jpg"
>>> name, ext = os.path.splitext(fname)
>>> name = name.translate(str.maketrans('', '', string.punctuation))
>>> name += ext
>>> name
'FlowersRoseMurree251011.jpg'
>>>

@katrielalex beat me to the type of answer, but anyway, a regex-free solution:
In [23]: f = "/etc/path/fred.apple.png"
In [24]: path, filename = os.path.split(f)
In [25]: main, suffix = os.path.splitext(filename)
In [26]: newname = os.path.join(path,''.join(c if c.isalnum() else ' ' for c in main) + suffix)
In [27]: newname
Out[27]: '/etc/path/fred apple.png'
