How to remove the first n characters of multiple file names on Mac - python

I want to rename multiple files so that the first 9 characters are deleted.
example:
Before:
19.49.29 1
19.50.17 2
19.50.24 3
19.50.28 4
.
.
After:
1
2
3
4
.
.
I tried using Python, but it screwed up my files and their order:
import os

folderPath = r'/Users/**myusername**/Desktop/FOLDER'
fileNumber = 1
for filename in os.listdir(folderPath):
    os.rename(folderPath + '//' + filename, folderPath + '/' + str(fileNumber) + '.jpeg')
    fileNumber += 1
Maybe there's a way using the terminal or something else?

With zsh (which the OP included as a tag)
% autoload zmv
% zmv '* (*)' '$1'
This will treat each filename as a space-separated pair of words, and use the second word as the new name for each file.
If you really need the condition to be "drop the first nine characters", then
% zmv '?????????(*)' '$1'
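Either way, it is worth doing a dry run first: zmv accepts a -n option that only prints the mv commands it would run, without renaming anything.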

If you're set on using Python 3, you can simply use the slicing feature of strings to drop the first 9 characters and keep the rest, like this:
filename = "12.23.34 1.jpeg"
print(filename[9:])
This starts at index 9 (the character '1' here, since indexing is zero-based) and returns the rest, so you would have "1.jpeg". So in your code, if we assume that ALL your filenames start with the same 9-character prefix (eg: "12.23.34 1.jpeg"), the line you had:
os.rename(folderPath + '//' + filename, folderPath + '/' + str(fileNumber) + '.jpeg')
can be changed to:
os.rename(folderPath + '//' + filename, folderPath + '/' + filename[9:])
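Putting it together, a minimal sketch of the whole loop could look like this (the folder path is a placeholder, it assumes every file in the folder really does start with the same 9-character prefix, and the hidden-file check is an extra precaution that is not part of the answer above):

import os

folderPath = '/Users/yourname/Desktop/FOLDER'  # placeholder path

for filename in os.listdir(folderPath):
    if filename.startswith('.'):  # skip hidden files such as .DS_Store
        continue
    # drop the 9-character "HH.MM.SS " prefix and keep the rest
    os.rename(os.path.join(folderPath, filename),
              os.path.join(folderPath, filename[9:]))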

Related

python regex: Parsing file name

I have a text file (filename.txt) that contains file names along with their extensions.
filename.txt
[AW] One Piece - 629 [1080P][Dub].mkv
EP.585.1080p.mp4
EP609.m4v
EP 610.m4v
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One_Piece_0745_Sons'_Cups!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One Piece - 621 1080P.mkv
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
These are example file names with their extensions. I need to rename each file to its episode number (without changing its extension).
Example:
Input:
EP609.m4v
EP 610.m4v
EP.585.1080p.mp4
One Piece - 621 1080P.mkv
[AW] One Piece - 629 [1080P][Dub].mkv
One_Piece_0745_Sons'_Cups!.mp4
One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4
Expected Output:
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4 (or) 0745.mp4
696.mp4 (or) 0696.mp4
591.m4v
577.mp4
Hope someone will help me parse and rename these filenames. Thanks in advance!!!
As you tagged python, I guess you are willing to use python.
(Edit: I've realized a loop in my original code is unnecessary.)
import re

with open('filename.txt', 'r') as f:
    files = f.read().splitlines()  # read filenames

# assume: an episode number consists of 3 digits, possibly preceded by a 0
p = re.compile(r'0?(\d{3})')

for file in files:
    if m := p.search(file):
        print(m.group(1) + '.' + file.split('.')[-1])
    else:
        print(file)
This will output
609.m4v
610.m4v
585.mp4
621.mkv
629.mkv
745.mp4
696.mp4
591.m4v
577.mp4
Basically, it searches for the first 3-digit number, possibly preceded by 0.
I strongly advise you to check the output; in particular, you would want to run sort OUTPUTFILENAME | uniq -d to see whether there are duplicate target names.
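If you would rather do that check in Python, here is a quick sketch (my addition, not part of the original answer) that reuses p and files from above and flags any target names that would collide:

from collections import Counter

# build the proposed new names the same way the loop above prints them
new_names = [m.group(1) + '.' + f.split('.')[-1]
             for f in files if (m := p.search(f))]

# any name produced more than once would overwrite another file
duplicates = [name for name, n in Counter(new_names).items() if n > 1]
print(duplicates)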
(Original answer:)
p = re.compile(r'\d{3,4}')
for file in files:
    for m in p.finditer(file):
        ep = m.group(0)
        if int(ep) < 1000:
            print(ep.lstrip('0') + '.' + file.split('.')[-1])
            break  # go to next file if ep found (avoid the else clause)
    else:  # if ep not found, just print the filename as is
        print(file)
Program to parse the episode number and rename the file.
Modules used:
re - to parse the file name
os - to rename the file
full/path/to/folder - the path to the folder where your files live
import re
import os

for file in os.listdir(path="full/path/to/folder/"):
    # search for the first 3- or 4-digit number less than 1000 in each file name
    for match_obj in re.finditer(r'\d{3,4}', file):
        episode = match_obj.group(0)
        if int(episode) < 1000:
            new_filename = episode.lstrip('0') + '.' + file.split('.')[-1]
            old_name = "full/path/to/folder/" + file
            new_name = "full/path/to/folder/" + new_filename
            os.rename(old_name, new_name)
            # go to next file if episode found (avoid the else clause)
            break
    else:
        # if episode not found, just leave the filename as it is
        pass
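If you want to sanity-check the result before touching anything (my suggestion, not part of the answer above), it is worth temporarily replacing the os.rename(old_name, new_name) call with print(old_name, new_name), reviewing the output, and only then letting it rename for real.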

Python: CSV delimiter failing randomly

I have created a script in which a number of random passwords are generated (see below):
import string
import secrets
import datetime

now = datetime.datetime.now()
T = now.strftime('%Y_%m_%d')
entities = ['AA','BB','CC','DD','EE','FF','GG','HH']
masterpass = ('MasterPass' + '_' + T + '.csv')
f = open(masterpass, "w+")

def random_secure_string(stringLength):
    secureStrMain = ''.join((secrets.choice(string.ascii_lowercase + string.ascii_uppercase + string.digits + ('!'+'?'+'"'+'('+')'+'$'+'%'+'#'+'#'+'/'+':'+';'+'['+']'+'#')) for i in range(stringLength)))
    return secureStrMain

def random_secure_string_lower(stringLength):
    secureStrLower = ''.join((secrets.choice(string.ascii_lowercase)) for i in range(stringLength))
    return secureStrLower

def random_secure_string_upper(stringLength):
    secureStrUpper = ''.join((secrets.choice(string.ascii_uppercase)) for i in range(stringLength))
    return secureStrUpper

def random_secure_string_digit(stringLength):
    secureStrDigit = ''.join((secrets.choice(string.digits)) for i in range(stringLength))
    return secureStrDigit

def random_secure_string_char(stringLength):
    secureStrChar = ''.join((secrets.choice('!'+'?'+'"'+'('+')'+'$'+'%'+'#'+'#'+'/'+':'+';'+'['+']'+'#')) for i in range(stringLength))
    return secureStrChar

for x in entities:
    f.write(x + ',' + random_secure_string(6) + random_secure_string_lower(1) + random_secure_string_upper(1) + random_secure_string_digit(1) + random_secure_string_char(1) + ',' + T + "\n")

f.close()
In the real script I use pandas to import the list of entities, so normally there are 200-250 entities, not just the 8 in the example.
The issue is that, every so often, the comma delimiter appears not to be read correctly (see row 6 of the attached photo).
In all the cases I have had of this (multiple run throughs), it looks like the 10th character is a comma, the 4 before (characters 6-9) are as stated in the script, but then instead of generating 6 initial characters (from random_secure_string(6)), it is generating 5. Could this be causing the issue? If so, how do I fix this?
Thank you in advance
This is a wild guess, because the content of the CSV file as plain text would be needed to be sure.
A csv is a Comma Separated Values text file. That means that it is a plain text file where fields are delimited with a separator, normally the comma (,). In order to allow text fields to contain commas or even new lines, they can be enclosed in quotes (normally ") or special characters can be escaped, normally with \.
That means that if a line contains abcdefg\,2020_05 the comma will not be interpreted as a separator.
How to fix:
CSV is a simple format, but with many corner cases. The rule is: avoid reading or writing it by hand. Just use the standard library csv module here:
...
import csv
...

with open(masterpass, "w+", newline='') as f:
    wr = csv.writer(f)
    for x in entities:
        wr.writerow([x, random_secure_string(6) + random_secure_string_lower(1) + random_secure_string_upper(1) + random_secure_string_digit(1) + random_secure_string_char(1), T])
The writer will take care of special characters and ensure that appropriate quoting or escaping is used.
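As a quick illustration of why this matters (my example, not part of the original answer): csv.writer quotes any field that contains the delimiter or a quote character, and csv.reader undoes that on the way back in, so the password field survives intact:

import csv
import io

buf = io.StringIO()
wr = csv.writer(buf)
wr.writerow(['AA', 'p4s"w,ord', '2020_05_01'])        # password contains '"' and ','
print(buf.getvalue())                                 # AA,"p4s""w,ord",2020_05_01
print(next(csv.reader(io.StringIO(buf.getvalue()))))  # ['AA', 'p4s"w,ord', '2020_05_01']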

fnmatch working inconsistently in different scripts

I am working on a python script that will write input files for an analysis program I use. One of the steps is to take a list of filenames and search the input directory for them, open them, and get some information out of them. I wrote the following using os.walk and fnmatch in a test-script that has the directory of interest hard-coded in, and it worked just fine:
for locus in loci_select: # for each locus we'll include
    print("Finding file " + locus)
    for root, dirnames, filenames in os.walk('../phylip_wigeon_mid1_names'):
        for filename in fnmatch.filter(filenames, locus): # look in the input directory
            print("Found file for locus " + locus + " in set")
            loci_file = open(os.path.join('../phylip_wigeon_mid1_names/', filename))
            with loci_file as f:
                for i, l in enumerate(f):
                    pass
            count = (i) * 0.5 # how many individuals present
            print(filename + "has sequences for " + str(count) + " individuals")
...and so on (the other bits all work, so I'll spare you).
As soon as I put this into the larger script and switch out the directory names for input arguments, though, it seems to stop working between the third and fourth lines, despite being nearly identical:
for locus in use_loci: # for each locus we'll include
    log.info("Finding file " + locus)
    for root, dirnames, filenames in os.walk(args.input_dir):
        for filename in fnmatch.filter(filenames, locus): # look in the input directory
            log.info("Found file for locus " + locus + " in set")
            loci_file = open(os.path.join(args.input_dir, filename))
            with loci_file as f:
                for i, l in enumerate(f):
                    pass
            count = (i) * 0.5 # how many individuals present
            log.info(filename + "has sequences for " + str(count) + " individuals")
I've tested it with temporary print statements between the suspected lines, and it seems like they are the culprits, since my screen output looks like:
2015-11-17 15:53:20,505 - write_ima2p_input_file - INFO - Getting selected loci for analysis
2015-11-17 15:53:20,505 - write_ima2p_input_file - INFO - Finding file uce-7999_wigeon_mid1_contigs.phy
2015-11-17 15:53:20,629 - write_ima2p_input_file - INFO - Finding file uce-4686_wigeon_mid1_contigs.phy
2015-11-17 15:53:20,647 - write_ima2p_input_file - INFO - Finding file uce-5012_wigeon_mid1_contigs.phy
...and so on.
I've tried switching out to glob, as well as simple things like rearranging where this section falls in my larger code, but nothing is working. Any insight would be much appreciated!
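One way to narrow this down (a suggestion of mine, not an answer from the thread): since the "Found file" message never appears, either os.walk(args.input_dir) is yielding no filenames (for example, the path is wrong or relative to a different working directory) or fnmatch.filter is rejecting the locus patterns. A few throwaway diagnostic lines, assuming args and use_loci are defined as in the script above, should show which it is:

import os
import fnmatch

print(repr(args.input_dir), os.path.isdir(args.input_dir))   # is the directory what you think it is?
for root, dirnames, filenames in os.walk(args.input_dir):
    print(root, len(filenames), filenames[:3])               # are any files being walked at all?
    for locus in use_loci:
        print(locus, fnmatch.filter(filenames, locus))       # does each pattern match anything?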

Python - splitting lines in txt file by semicolon in order to extract a text title...except sometimes the title has semicolons in it

So, I have an extremely inefficient way to do this that works, which I'll show, as it will help illustrate the problem more clearly. I'm an absolute beginner in python and this is definitely not "the python way" nor "remotely sane."
I have a .txt file where each line contains information about a large number of .csv files, following format:
File; Title; Units; Frequency; Seasonal Adjustment; Last Updated
(first entry:)
0\00XALCATM086NEST.csv;Harmonized Index of Consumer Prices: Overall Index Excluding Alcohol and Tobacco for Austria©; Index 2005=100; M; NSA; 2015-08-24
and so on, repeats like this for a while. For anyone interested, this is the St.Louis Fed (FRED) data.
I want to rename each file (currently named with the alphanumeric code at the start, 00XA etc.) to the text name. So, just split by semicolon, right? Except, sometimes, the text title has semicolons within it (and I want all of the text).
So I did:
data_file_data_directory = 'C:\*****\Downloads\FRED2_csv_3\FRED2_csv_2'
rename_data_file_name = 'README_SERIES_ID_SORT.txt'
rename_data_file = open(data_file_data_directory + '\\' + rename_data_file_name)

for line in rename_data_file.readlines():
    data = line.split(';')
    if len(data) > 2 and data[0].rstrip().lstrip() != 'File':
        original_file_name = data[0]
These last 2 lines deal with the fact that there is some introductory text that we want to skip, and we don't want to rename based on the legend at the top (!= 'File'). It saves the 00XAL__.csv as the old name. It may be possible to make this more elegant (I would appreciate tips), but it's the next part (the new, text name) that gets really ugly.
if len(data) == 6:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1][:-2].replace(':',' -').replace('"','').replace('/',' or ')
else:
    if len(data) == 7:
        new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2][:-2].replace(':',' -').replace('"','').replace('/',' or ')
    else:
        if len(data) == 8:
            new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3][:-2].replace(':',' -').replace('"','').replace('/',' or ')
        else:
            if len(data) == 9:
                new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4][:-2].replace(':',' -').replace('"','').replace('/',' or ')
            else:
                if len(data) == 10:
                    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[5][:-2].replace(':',' -').replace('"','').replace('/',' or ')
                else:
                    # (etc)
What I'm doing here is dealing with the fact that there is no way to know, for each line in the .txt, how many items are in the list created by splitting it by semicolons. Ideally, the list would be length 6, as per the key at the top of my example of the data. However, for every semicolon in the text name, the length increases by 1... and we want everything before the last four items in the list (counting backwards from the right: date, seasonal adjustment, frequency, units/index) but after the .csv code (this is just another way of saying: I want the text "title", everything in each line after the .csv but before units/index).
Really what I want is just a way to save the entirety of the text name as "new_name" for each line, even after I split each line by semicolon, when I have no idea how many semicolons are in each text name or the line as a whole. The above code achieves this, but OMG, this can't be the right way to do this.
Please let me know if it's unclear or if I can provide more info.
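A sketch of the simplification being asked for (my suggestion, not an answer from the thread): since the first field is always the file code and the last four fields are always units, frequency, seasonal adjustment and date, everything in between must be the title, however many semicolons it contains. Joining a negative slice expresses that directly; this reuses rename_data_file and the replace() cleanups from the code above, but does not reproduce the [:-2] trim of the title's final characters:

for line in rename_data_file.readlines():
    data = line.split(';')
    if len(data) > 2 and data[0].strip() != 'File':
        original_file_name = data[0]
        # everything between the file code and the last four fields is the title,
        # no matter how many extra semicolons the title itself contains
        title = ';'.join(data[1:-4]).strip()
        new_file_name = original_file_name[:-4].split("\\")[-1] + '-' + title.replace(':', ' -').replace('"', '').replace('/', ' or ')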

Python Regex or Filename Function

Question about renaming file names in a folder. My file names look like this:
EPG CRO 24 Kitchen 09.2013.xsl
with spaces between the name parts, and I used code like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Remove whitespace from files where EPG named with space " " replace with "_"
import os

for filename in os.listdir("."):
    if filename.find("2013|09 ") > 0:
        newfilename = filename.replace(" ","_")
        os.rename(filename, newfilename)
With this code I removed the whitespace, but how can I also remove the date from the file name, so it looks like this: EPG_CRO_24_Kitchen.xsl? Can you give me a solution for this?
Regex
As utdemir was alluding to, regular expressions can really help in situations like these. If you have never been exposed to them, they can be confusing at first. Check out https://www.debuggex.com/r/4RR6ZVrLC_nKYs8g for a useful tool that helps you construct regular expressions.
Solution
An updated solution would be:
import os
import re

def rename_file(filename):
    if filename.startswith('EPG') and ' ' in filename:
        # \s+ means 1 or more whitespace characters
        # [0-9]{2} means exactly 2 characters of 0 through 9
        # \. means find a '.' character
        # [0-9]{4} means exactly 4 characters of 0 through 9
        newfilename = re.sub(r"\s+[0-9]{2}\.[0-9]{4}", '', filename)
        newfilename = newfilename.replace(" ", "_")
        os.rename(filename, newfilename)
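For example, to apply it to every file in the current directory (assuming the script is run from the folder that holds the .xsl files):

for filename in os.listdir("."):
    rename_file(filename)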
Side Note
# Remove whitespace from files where EPG named with space " " replace with "_"
for filename in os.listdir("."):
    if filename.find("2013|09 ") > 0:
        newfilename = filename.replace(" ","_")
        os.rename(filename, newfilename)
Unless I'm mistaken, from the comment you made above, filename.find("2013|09 ") > 0 won't work.
Given the following:
In [76]: filename = "EPG CRO 24 Kitchen 09.2013.xsl"
In [77]: filename.find("2013|09 ")
Out[77]: -1
And given your described intent, you might want something more like:
In [80]: if filename.startswith('EPG') and ' ' in filename:
....: print('process this')
....:
process this
If all file names have the same format, NAME MM.YYYY.xsl (so the date suffix has a fixed length), then you can use Python's string slicing instead of a regex:
name.replace(' ','_')[:-12] + '.xsl'
If the dates are always formatted the same:
>>> s = "EPG CRO 24 Kitchen 09.2013.xsl"
>>> re.sub(r"\s+\d{2}\.\d{4}\..{3}$", "", s)
'EPG CRO 24 Kitchen'
How about a little slicing:
newfilename = input1[:input1.rfind(" ")].replace(" ","_")+input1[input1.rfind("."):]
