I'm trying to replace strings in several Excel files using Python.
I need to do it in bulk, and the string I want to replace does not follow a fixed pattern.
First, I list the file names in C:\Users\username\Desktop\file\pro and strip the ".xlsx" extension (I remove it intentionally, for other purposes):
import openpyxl
import os
from os import walk

os.chdir(r'C:\Users\username\Desktop\file')
pro = 'pro//'
extension = '.xlsx'
# The third element of the first walk() result is the list of file names in pro
filenames = next(walk(pro), (None, None, []))[2]
filelist = []
for i in filenames:
    new = i.replace(extension, "")
    filelist.append(new)
Then I iterate over each file to find the string I want to replace:
replacer = "=[1]!BError"
for i in filelist:
    filename = i + extension
    wb = openpyxl.load_workbook(pro + filename)
    ws = wb["Val"]
    for r in range(1, ws.max_row + 1):
        for c in range(1, ws.max_column + 1):
            s = ws.cell(r, c).value
            # Only string cells (text or formulas) can contain the marker
            if isinstance(s, str) and replacer in s:
                ws.cell(r, c).value = s.replace(replacer, '=')
    wb.save(i + extension)
The above only works if the string is exactly "=[1]!BError", but the number between the brackets varies from 1 to 50. The number is the same for every cell in a given file, but differs from file to file.
Sometimes the string is even stranger, for example:
="_xlfn.SINGLE([11]!BError"
Is there any way to replace strings of the form "=~~~~~!BError" with "=", regardless of what the ~~~~~ part is?
Thank you!
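If "!BError" is the part common to all the strings you want to replace, put just that in the replacer variable. That way the in keyword in your if statement will catch it. Note that s.replace(replacer, '=') will then replace only the "!BError" part; the varying prefix in front of it stays in the cell unless you match it as well, for example with a regular expression.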
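To also remove the varying part in front of "!BError", a regular expression is one option. The following is only a sketch: the helper name strip_berror_prefix and the pattern are assumptions based on the two examples in the question, not something tested against the real workbooks.

```python
import re

# Assumed pattern: "=", then the shortest run of characters other than "=",
# then the literal "!BError". Adjust it if your formulas look different.
BERROR_PREFIX = re.compile(r'=[^=]*?!BError')

def strip_berror_prefix(value):
    """Return value with every '=...!BError' run replaced by '='.

    Non-string cell values (numbers, dates, None) are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    return BERROR_PREFIX.sub('=', value)

# The bracketed number no longer matters
print(strip_berror_prefix('=[1]!BError'))
print(strip_berror_prefix('=[37]!BError'))
print(strip_berror_prefix('=_xlfn.SINGLE([11]!BError'))
```

Inside the cell loop you could then assign strip_berror_prefix(ws.cell(r, c).value) back to the cell instead of calling s.replace.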
Related
I'm trying to rename a few hundred files, padding their names to the same length.
So far I've managed to build the padded names correctly within my IDE, but I'm not sure how to link that to actually renaming the files themselves.
Atomic Samurai__________.png
BabyYodatheBased________.png
Baradum_________________.png
bcav____________________.png
This is the code that does the rename and the padding within my IDE:
import glob, os

pad_images = glob.glob(r"C:\Users\test\*.png")
split_images = []
for i in pad_images:
    split = i.split("\\")[-1]
    split_images.append(split)
longest_file_name = max(split_images, key=len)
longest_int = len(longest_file_name)
new_images = []
for i in split_images:
    parts = i.split('.')
    new_name = parts[0].ljust(longest_int, '_') + "." + parts[1]
    new_images.append(new_name)
I've been trying to get os.rename(old_name, new_name) to work but I'm not sure where I actually get the old name from as I've split things up into different for loops.
Try saving the old file names to a list and do all the modifications (split and rename) in a single loop thereafter:
import os

path = "C:/Users/test"
images = [f for f in os.listdir(path) if f.endswith(".png")]
length = len(max(images, key=len))
for file in images:
    # os.listdir returns bare file names, so only the extension needs splitting off
    parts = file.split(".")
    new_name = f'{parts[0].ljust(length, "_")}.{parts[1]}'
    os.rename(os.path.join(path, file), os.path.join(path, new_name))
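If some names contain more than one dot, splitting on "." picks the wrong pieces. A possible variant with pathlib is sketched below. The function name pad_png_names is made up for illustration, and unlike the code above it pads each stem to the length of the longest stem rather than the longest full file name. The demo runs in a temporary directory so nothing real gets renamed.

```python
import tempfile
from pathlib import Path

def pad_png_names(directory, fill="_"):
    """Rename every .png in directory so all stems share the longest stem's length."""
    images = sorted(Path(directory).glob("*.png"))
    if not images:
        return []
    width = max(len(p.stem) for p in images)
    renamed = []
    for old in images:
        # stem is the name without its last suffix; suffix is ".png"
        new = old.with_name(old.stem.ljust(width, fill) + old.suffix)
        if new != old:
            old.rename(new)
        renamed.append(new.name)
    return renamed

# Demo on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("bcav.png", "Baradum.png"):
        (Path(tmp) / name).touch()
    demo_result = sorted(pad_png_names(tmp))
print(demo_result)
```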
I am reading files with .txt and .log extensions that contain entries like these:
$AV:3666,0000,0*
$AV:3664,0000,0*
I want to remove the extra characters and symbols (AV ... ,0000,0*) so that each entry looks like this:
$:2226
$:2308
How can I do this in Python? Below is the code I am using:
import os

source_path = 'C:\\Users\\User\\Downloads\\file1'
file_formats = ['.txt', '.log']
filenames = []
# List the source folder itself, not the current working directory
for filename in os.listdir(source_path):
    for file_format in file_formats:
        if filename.endswith(file_format):
            filenames.append(filename)
I would appreciate your help.
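Take a look at my beautiful regex:
import re
x = "$AV:3664,0000,0*"
# Keep the digits after "$AV:" and drop everything from the first comma on.
# A character class such as [AV,0*] would also strip zeros from the value itself.
line = re.sub(r'^\$AV:(\d+),.*$', r'$:\1', x)
print(line)
It's just amazing.
Look at this spotless output:
$:3664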
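To run this over whole files, the substitution can be wrapped in a small generator. This is a sketch only: clean_lines is a hypothetical helper, and it assumes every entry starts with "$AV:" followed by digits and a comma, as in the two sample lines. Lines that do not match are passed through unchanged.

```python
import re

ENTRY = re.compile(r'^\$AV:(\d+),.*$')

def clean_lines(lines):
    """Yield each line with '$AV:<digits>,...' reduced to '$:<digits>'."""
    for line in lines:
        yield ENTRY.sub(r'$:\1', line.rstrip('\n'))

# Sample lines standing in for the contents of a .txt or .log file
sample = ["$AV:3666,0000,0*\n", "$AV:3664,0000,0*\n", "unrelated line\n"]
cleaned = list(clean_lines(sample))
print(cleaned)
```

With a real file you would pass the open file object instead of sample, for example list(clean_lines(fh)) inside a with open(os.path.join(source_path, filename)) as fh: block.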
Using Python, I need to add 100 to the integer part of some filenames to rename the files. The files look like this: 0000000_6dee7e249cf3.log where 6dee7e249cf3 is a random number. At the end I should have:
0000000_6dee7e249cf3.log should change to 0000100_6dee7e249cf3.log
0000001_12b2bb88d493.log should change to 0000101_12b2bb88d493.log
etc, etc…
I can print the initial files using:
initial: glob('{0:07d}_*[a-z]*'.format(NUM))
but the glob for the final files returns an empty list:
final: glob('{0:07d}_*[a-z]*'.format(NUM+100))
Moreover, I cannot rename initial to final using os.rename, because it cannot take the list created by the glob function.
I've included your regex search. It looks like glob doesn't handle regex, but re does:
import os
import re

# for all files in the current directory
for f in os.listdir('./'):
    # if the name starts with 7 digits followed by an underscore
    if re.match(r'[0-9]{7}_', f):
        lead_int = int(f.split('_')[0])
        # if the leading integer is less than 100
        if lead_int < 100:
            # rename this file with leading integer + 100
            os.rename(f, '%07d_%s' % (lead_int + 100, f.split('_')[-1]))
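Split the file name on the '_' separator and use the two parts to reconstruct the new file name:
name = '0000000_6dee7e249cf3.log'
s = name.split('_')
n2 = str(int(s[0]) + 100)
# keep only as many leading zeros as needed to preserve the original width
new_name = s[0][:len(s[0]) - len(n2)] + n2 + '_' + s[1]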
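The same idea can be written with a format specification that zero-pads to the original width. A minimal sketch, assuming every name has the form <digits>_<rest>; the function name add_to_prefix is hypothetical.

```python
def add_to_prefix(name, offset=100):
    """Add offset to the numeric prefix before the first '_', keeping its width."""
    prefix, rest = name.split('_', 1)
    width = len(prefix)
    return f"{int(prefix) + offset:0{width}d}_{rest}"

for old in ("0000000_6dee7e249cf3.log", "0000001_12b2bb88d493.log"):
    print(old, "->", add_to_prefix(old))
```

To actually rename, you could call os.rename(old, add_to_prefix(old)) for each file. Collecting the names into a list first avoids picking up a file again after it has been renamed.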
I'm learning Python, and I'm working on a project that transfers some numbers from PDF files into an xlsx file, placing them in their corresponding columns, with the row determined by the row heading.
My idea is to convert the PDF files to txt and build a dictionary from the txt files, where each key is a part of the file name (because it contains part of the row header) and the values are the numbers I need.
I have already managed to convert the files to txt; now I'm working on the script that builds the dictionary. At the moment it looks like this:
import os
import re
p = re.compile(r'\w+\f+')
'''
I'm not entirely sure at the moment how the .compile of regular expressions works, but I know I'm missing something to indicate that what I want is immediately to the right, I'm also not sure if the keywords will be ignored, I just want take out the numbers
'''
m = p.match('Theese are the keywords' or 'That are immediately to the left' or 'The numbers I want')
def IsinDict(txtDir):
    ToData = ()
    if txtDir == "": txtDir = os.getcwd() + "\\"
    for txt in os.listdir(txtDir):
        ToKey = txt[9:21]
        if ToKey == (r"\w+"):
            Data = open(txt, "r")
            for string in Data:
                ToData += m.group()
    Diccionary = dict.fromkeys(ToKey, ToData)
    return Diccionary
txtDir = "Absolute/Path/OfTheText/Files"
IsinDict(txtDir)
Any contribution is welcome, thanks for your attention.
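One possible direction, offered only as a sketch and not as a tested solution for the real files: use re.findall to pull out every number in a file and store the resulting list under the file-name slice. The [9:21] slice comes from the code above. The number pattern, the function name numbers_by_key, and the demo file name and contents are all assumptions made for illustration.

```python
import os
import re
import tempfile

NUMBER = re.compile(r'\d+(?:\.\d+)?')  # assumed number format: 12 or 12.5

def numbers_by_key(txt_dir, key_slice=slice(9, 21)):
    """Map a slice of each .txt file name to the numbers found in that file."""
    result = {}
    for name in sorted(os.listdir(txt_dir)):
        if not name.endswith('.txt'):
            continue
        # Join with txt_dir so the file is found regardless of the working directory
        with open(os.path.join(txt_dir, name), encoding='utf-8') as fh:
            result[name[key_slice]] = NUMBER.findall(fh.read())
    return result

# Demo with a hypothetical file name and content
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'report01_total_income.txt')
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write('Total income: 1500.75\nItems: 12\n')
    demo = numbers_by_key(tmp)
print(demo)
```

The numbers come back as strings; convert them with float() or int() before writing them to the spreadsheet if needed.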
I'm trying to rename some files in a directory using Python. I've looked around the forums here, and because I'm a newbie, I can't adapt what I need from what is out there.
Say in a directory I have a group of files called
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125225754_7_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125236347_8_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125236894_5_S110472_I238621.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125248691_6_S110472_I238621.jpg
and I want to remove "125225754", "125236347", "125236894" and "125248691" here so my resulting filename will be
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_7_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_8_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_5_S110472_I238621.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_6_S110472_I238621.jpg
I'm trying to use the os.path.split but it's not working properly.
I have also considered using string manipulations, but have not been successful with that either.
Any help would be greatly appreciated. Thanks.
os.path.split splits a path (/home/mattdmo/work/projects/python/2014/website/index.html) into two parts: the directory portion and the final component (the file name). It does not split anything inside the file name itself.
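A quick check of what os.path.split returns for that example path:

```python
import os

head, tail = os.path.split('/home/mattdmo/work/projects/python/2014/website/index.html')
print(head)  # directory portion
print(tail)  # final path component
```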
As @wim suggested, if the file names are all exactly the same length, you can use string slicing to cut out whatever occurs between two indexes, then join the remaining pieces back together. So, in your example,
filename = "FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125248691_6_S110472_I238621.jpg"
newname = filename[:57] + filename[66:]
print(newname)
# FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_6_S110472_I238621.jpg
This takes the first 57 characters of the string (indexes 0 through 56; remember that Python string indexes are 0-based) and joins them to everything from index 66 onward, which removes the nine characters at indexes 57 through 65.
Now that you can do this, just put all the filenames into a list and iterate over it to get your new filenames:
import os

filelist = os.listdir('.')  # get files in current directory
for filename in filelist:
    if filename.endswith(".jpg"):  # only process pictures
        newname = filename[:57] + filename[66:]
        print(filename + " will be renamed as " + newname)
        os.rename(filename, newname)
Can we assume that the files are all the same name up to the date _20110602[difference here]?
If that's the case then it's actually fairly easy to do.
First you need the index of that difference. Starting from character 0, which is 'F' in this case, count right until you hit the first difference. You can do this programmatically like this:
s1 = 'String1'
s2 = 'String2'
i = 0
while i < len(s1) and i < len(s2):
    if s1[i] == s2[i]:
        i += 1
    else:
        break
And i is now set to the index of the first difference between s1 and s2 (or, if there is none, to the length of the shorter string).
From here you know that you want to strip everything from this index to the following _.
j = i
while j < len(s1):
    if s1[j] != '_':
        j += 1
    else:
        break
# j is the index of the _ character after i
p1 = s1[:i]  # Everything up to i
p2 = s1[j:]  # Everything from j onward
s1 = p1 + p2
# Do the same for s2, or even better, do this in a loop.
The only caveat here is that they have to be the same name up to this point for this to work. If they are the same length then this is still fairly easy, but you have to figure out yourself what the indices are rather than using the string difference method.
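The string difference method can be sketched with os.path.commonprefix, which returns the longest prefix shared by a list of strings. The function name and the shortened file names below are made up for illustration.

```python
import os

def strip_after_common_prefix(names, sep='_'):
    """Drop the text between the shared prefix and the next separator in each name."""
    i = len(os.path.commonprefix(names))  # index of the first difference
    stripped = []
    for name in names:
        j = name.find(sep, i)  # first separator at or after i
        stripped.append(name if j == -1 else name[:i] + name[j:])
    return stripped

# Hypothetical, shortened names for illustration
demo = strip_after_common_prefix([
    'raw_20110602125225754_7.jpg',
    'raw_20110602999236347_8.jpg',
])
print(demo)
```

Note that in the four sample names from the question, every timestamp starts with 1252, so the shared prefix runs past the date and this method would keep those four digits. For those files, fixed slicing or a regex looks like the safer choice.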
If you always have the exact string '20110602' in the file names stored in the 'my_directory' folder:
import re  # for regular expressions
from os import rename
from glob import glob

for filename in glob('my_directory/*.jpg'):
    match = re.search('20110602', filename)
    if match:
        newname = re.sub(r'20110602[0-9]+_', '20110602_', filename)
        rename(filename, newname)
A more general version that matches any YYYYMMDD (or YYYYDDMM) date:
import re  # for regular expressions
from os import rename
from glob import glob

for filename in glob('my_directory/*.jpg'):
    match = re.search(r'\d{4}\d{2}\d{2}\d+_', filename)
    if match:
        newname = re.sub(r'(\d{4}\d{2}\d{2})(\d+)(_)', r'\1\3', filename)
        rename(filename, newname)
r'\1': match.group(1), the text matched by the first set of parentheses
r'\3': match.group(3), the text matched by the third set of parentheses
\d or [0-9]: effectively the same here; both match any ASCII digit (in Python 3 str patterns, \d also matches other Unicode digits)
{number}: the number of times the previous token (in this case a digit) is repeated
+: one or more of the previous expression (in this case a digit)
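Before renaming anything, the substitution can be tried on a plain string. This sketch uses one of the file names from the question; drop_time_suffix is a hypothetical helper name, and \d{8} is just a shorter spelling of \d{4}\d{2}\d{2}.

```python
import re

def drop_time_suffix(name):
    """Keep the first 8 digits of a long digit run and drop the rest up to '_'."""
    return re.sub(r'(\d{8})\d+(_)', r'\1\2', name)

old = ("FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_"
       "20110602125225754_7_S110472_I238620.jpg")
new = drop_time_suffix(old)
print(new)
```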