Stripping a pattern from the end of a string - Python

I want to see if a file like test_100.webp exists and then look at the file test.yaml. Therefore, I need to strip the pattern "_100.webp" from the end. I tried to use the code below and it is giving me issues.
for i, image in enumerate(images_in_item):
    if image.endswith("_100.webp"):
        image_strip = image.rstrip(_100.webp)
        snapshot_markup = os.path.join(image_strip + 'yaml')

Slice the suffix off instead of using rstrip():
suffix = '_100.webp'
if image.endswith(suffix):
    image_strip = image[:-len(suffix)]
    # image_strip is e.g. 'test', so the extension needs its dot
    snapshot_markup = image_strip + '.yaml'
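The reason rstrip() misbehaves is that it treats its argument as a set of characters to remove, not as a suffix. The sketch below shows the difference; the helper name strip_suffix and the sample name "web_100.webp" are made up for illustration. On Python 3.9 and later, the built-in str.removesuffix() does the same job as the helper.

```python
def strip_suffix(name, suffix):
    """Return name without suffix if it ends with it, otherwise unchanged."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

# rstrip removes any trailing characters found in the set "_100.webp",
# so a name made only of those characters is stripped away entirely.
demo_rstrip = "web_100.webp".rstrip("_100.webp")

# Slicing removes exactly the suffix and nothing else.
demo_slice = strip_suffix("web_100.webp", "_100.webp")

yaml_name = strip_suffix("test_100.webp", "_100.webp") + ".yaml"
print(repr(demo_rstrip), demo_slice, yaml_name)
```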

Related

Python - Possibly Regex - How to replace part of a filepath with another filepath based on a match?

I'm new to Python and relatively new to programming. I'm trying to replace part of a file path with a different file path. If possible, I'd like to avoid regex as I don't know it. If not, I understand.
I want an item in the Python list [] before the word PROGRAM to be replaced with the 'replaceWith' variable.
How would you go about doing this?
Current Python List []
item1ToReplace1 = \\server\drive\BusinessFolder\PROGRAM\New\new.vb
item1ToReplace2 = \\server\drive\BusinessFolder\PROGRAM\old\old.vb
Variable to replace part of the Python list path
replaceWith = 'C:\ProgramFiles\Microsoft\PROGRAM'
Desired results for Python List []:
item1ToReplace1 = C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
item1ToReplace2 = C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
Thank you for your help.
The following code does what you ask. Note that I changed your '\' to '\\'; you need to account for the backslash in your code, since it is used as an escape character in Python.
import os
item1ToReplace1 = '\\server\\drive\\BusinessFolder\\PROGRAM\\New\\new.vb'
item1ToReplace2 = '\\server\\drive\\BusinessFolder\\PROGRAM\\old\\old.vb'
replaceWith = 'C:\\ProgramFiles\\Microsoft\\PROGRAM'
keyword = "PROGRAM\\"
def replacer(rp, s, kw):
    # Split once on the keyword; everything after it is the tail to keep
    ss = s.split(kw, 1)
    if len(ss) > 1:
        tail = ss[1]
        # os.path.join uses a backslash separator on Windows
        return os.path.join(rp, tail)
    else:
        return ""
print(replacer(replaceWith, item1ToReplace1, keyword))
print(replacer(replaceWith, item1ToReplace2, keyword))
The code splits the path on your keyword and appends everything after the keyword to the replacement path.
If your keyword is not in the string, the result is an empty string.
Result:
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
One way would be:
item_ls = item1ToReplace1.split("\\")
idx = item_ls.index("PROGRAM")
result = ["C:", "ProgramFiles", "Microsoft"] + item_ls[idx:]
result = "\\".join(result)
Resulting in:
>>> item1ToReplace1 = r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb"
... # the above
>>> print(result)
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
Note the use of r"..." (a raw string) so that the backslashes in the input do not have to be escaped. The split and join calls use ordinary string literals, so there each backslash has to be written as a double backslash.
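Another regex-free option is to let pathlib split the path into components. This is only a sketch: the function name rebase_after is made up for illustration, and PureWindowsPath is used so that the backslash handling is the same on any operating system.

```python
from pathlib import PureWindowsPath

def rebase_after(path, keyword, new_base):
    """Replace everything up to and including keyword with new_base."""
    parts = PureWindowsPath(path).parts
    if keyword not in parts:
        return None  # keyword is not a path component
    # Keep only the components that follow the keyword
    tail = parts[parts.index(keyword) + 1:]
    return str(PureWindowsPath(new_base).joinpath(*tail))

new_path = rebase_after(
    r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb",
    "PROGRAM",
    r"C:\ProgramFiles\Microsoft\PROGRAM",
)
print(new_path)
```

Because the keyword is compared against whole path components, a folder such as PROGRAM_OLD would not be matched by accident.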

Remove everything but #number in brackets

I have a file where the lines have the form #nr = name(#nr, (#nr), different vars, and names).
I would like to only have the #nr in the brackets to get the form #nr = name(#nr, #nr)
I have tried to solve this in different ways like using regex, startswith() and lists but nothing has worked so far.
Any help is much appreciated.
Edit: Code
for line in f.split():
    start = line.find( '(' )
    end = line.find( ')' )
    if start != -1 and end != -1:
        line = ''.join(i for i in x if not i.startswith('#'))
    print(line)
Edit 2:
As example I have:
#304= IFCRELDEFINESBYPROPERTIES('0FZ0hKNanFNAQpJ_Iqh4zM',#42,$,$,(#142),#301);
Afterwards I want to have:
#304= IFCRELDEFINESBYPROPERTIES(#42,#142,#301);
This can be solved using regex, though trying to do it with a single find/replace would be more complicated. Instead, you can do it in two steps:
import re
def sub_func(match):
    # Step 2: collect every #number inside the outer brackets
    nums = re.findall(r'#\d+', match.group(2))
    return match.group(1) + '(' + ','.join(nums) + ');'
text = "#304= IFCRELDEFINESBYPROPERTIES('0FZ0hKNanFNAQpJ_Iqh4zM',#42,$,$,(#142),#301);"
# Step 1: capture the name (group 1) and the bracket contents (group 2)
result = re.sub(r'(^[^(]+)\((.*)\);', sub_func, text)
print(result)
# '#304= IFCRELDEFINESBYPROPERTIES(#42,#142,#301);'
Instead of passing a string as the second argument to re.sub, we pass a function. It receives each match object, so it can process the match with a second regex and reformat the result before returning the replacement text.
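To run the same substitution over every line of a file, the callback can be wrapped in a small helper. This is a sketch under assumptions: the names keep_refs and clean_lines are made up, and the second sample line (IFCEXAMPLE) is hypothetical data, not taken from a real file.

```python
import re

# Group 1: everything before the first "(", group 2: the bracket contents
LINE_RE = re.compile(r'^([^(]+)\((.*)\);')

def keep_refs(match):
    """Rebuild the line keeping only the #number references."""
    refs = re.findall(r'#\d+', match.group(2))
    return match.group(1) + '(' + ','.join(refs) + ');'

def clean_lines(lines):
    # Lines that do not match the pattern are returned unchanged
    return [LINE_RE.sub(keep_refs, line) for line in lines]

sample = [
    "#304= IFCRELDEFINESBYPROPERTIES('0FZ0hKNanFNAQpJ_Iqh4zM',#42,$,$,(#142),#301);",
    "#305= IFCEXAMPLE(#7,'text',(#8,#9));",
]
cleaned = clean_lines(sample)
print("\n".join(cleaned))
```

One caveat: a quoted string that happens to contain something like #12 would also be picked up, since the inner regex does not know about quoting.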

Find specific substring while iterating through multiple file names

I need to find the identification number of a large number of files while iterating through them.
The file names are loaded onto a list and look like:
ID322198.nii
ID9828731.nii
ID23890.nii
FILEID988312.nii
So the best way to approach this would be to find the number that sits between ID and .nii.
Because the number of digits varies, I can't simply select [-10:-4] of the file name. Any ideas?
You can use a regex:
import re
files = ['ID322198.nii','ID9828731.nii','ID23890.nii','FILEID988312.nii']
[re.findall(r'ID(\d+)\.nii', file)[0] for file in files]
Returns:
['322198', '9828731', '23890', '988312']
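Note that indexing with [0] raises an IndexError for any file name that does not match. A sketch of a more defensive variant follows; the function name extract_id and the non-matching name notes.txt are made up for illustration.

```python
import re

# Anchor on the extension so the digits must sit directly before ".nii"
ID_RE = re.compile(r'ID(\d+)\.nii$')

def extract_id(filename):
    """Return the digits between ID and .nii, or None if there is no match."""
    match = ID_RE.search(filename)
    return match.group(1) if match else None

files = ['ID322198.nii', 'ID9828731.nii', 'notes.txt', 'FILEID988312.nii']
ids = [extract_id(f) for f in files]
print(ids)
```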
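To find the positions of ID and .nii, you can use Python's str.index() method:
for line in file:
    idpos = line.index("ID")
    nilpos = line.index(".nii")
    # "ID" is two characters long, so the number starts at idpos + 2
    data = line[idpos + 2:nilpos]
or as a list of ints:
[int(line[line.index("ID") + 2:line.index(".nii")]) for line in file]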
Using rindex:
s = 'ID322198.nii'
s = s[s.rindex('D')+1 : s.rindex('.')]
print(s)
Returns:
322198
Then apply this syntax to a list of strings.
It seems like you could filter the digits out, like this:
digits = ''.join(d for d in filename if d.isdigit())
That will work nicely as long as there are no other digits in the filename (e.g backups with a .1 suffix or something).
for name in files:
    name = name.replace('.nii', '')
    id_num = name.replace(name.rstrip('0123456789'), '')
How this works:
# example
name = 'ID322198.nii'
# remove '.nii'. -> name1 = 'ID322198'
name1 = name.replace('.nii', '')
# strip all digits from the end. -> name2 = 'ID'
name2 = name1.rstrip('0123456789')
# remove 'ID' from 'ID322198'. -> id_num = '322198'
id_num = name1.replace(name2, '')
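The same idea can be written with os.path.splitext and a slice, which works for any extension and avoids the second replace() call. This is only a sketch; the function name trailing_digits is made up for illustration.

```python
import os

def trailing_digits(filename):
    """Return the run of digits at the end of the name, before the extension."""
    stem, _ext = os.path.splitext(filename)
    # Everything that remains after stripping the digits is the prefix
    prefix = stem.rstrip('0123456789')
    return stem[len(prefix):]

names = ['ID322198.nii', 'FILEID988312.nii', 'ID23890.nii']
ids = [trailing_digits(n) for n in names]
print(ids)
```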

Splitting lines in a file into string and hex and do operations on the hex values

I have a large file with several lines like the ones below. I want to read only the lines that contain the _INIT pattern, strip the _INIT from the name, and save only the OSD_MODE_15_H part in a variable. Then I need to read the corresponding hex value (8'h00 in this case), strip off the 8'h, replace it with 0x, and save the result in a variable.
I have been trying to strip off the _INIT, the spaces, and the =, and the code is becoming really messy.
localparam OSD_MODE_15_H_ADDR = 16'h038d;
localparam OSD_MODE_15_H_INIT = 8'h00
Can you suggest a lean and clean method to do this?
Thanks!
The following solution uses a regular expression (compiled to speed searching up) to match the relevant lines and extract the needed information. The expression uses named groups "id" and "hexValue" to identify the data we want to extract from the matching line.
import re
# Raw string so that \w and \s reach the regex engine unchanged
expression = r"(?P<id>\w+?)_INIT\s*?=.*?'h(?P<hexValue>[0-9a-fA-F]*)"
regex = re.compile(expression)
def getIdAndValueFromInitLine(line):
    mm = regex.search(line)
    if mm is None:
        return None  # Not an ..._INIT parameter, an empty line, or some other mismatch
    else:
        return (mm.groupdict()["id"], "0x" + mm.groupdict()["hexValue"])
EDIT: If I understood the next task correctly, you need to find the hexvalues of those INIT and ADDR lines whose IDs match and make a dictionary of the INIT hexvalue to the ADDR hexvalue.
regex = r"(?P<init_id>\w+?)_INIT\s*?=.*?'h(?P<initValue>[0-9a-fA-F]*)"
init_dict = {}
# finditer yields match objects, which groupdict() needs (findall returns plain tuples)
for x in re.finditer(regex, lines):
    init_dict[x.groupdict()["init_id"]] = "0x" + x.groupdict()["initValue"]
regex = r"(?P<addr_id>\w+?)_ADDR\s*?=.*?'h(?P<addrValue>[0-9a-fA-F]*)"
addr_dict = {}
for y in re.finditer(regex, lines):
    addr_dict[y.groupdict()["addr_id"]] = "0x" + y.groupdict()["addrValue"]
init_to_addr_hexvalue_dict = {init_dict[x]: addr_dict[x] for x in init_dict if x in addr_dict}
Even if this is not what you actually need, having the init and addr dictionaries might make it easier to reach your goal. If there are several _INIT (or _ADDR) lines with the same ID and different hex values, the dict approach above will not work in a straightforward way, because later values overwrite earlier ones.
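If duplicate IDs are possible, one option is to collect the values in lists instead of overwriting them. The sketch below assumes the line format shown in the question; the name collect_params and the third sample line (a second _INIT with value 8'hff) are made up for illustration.

```python
import re
from collections import defaultdict

# One pattern for both kinds of line; "kind" records which one matched
PARAM_RE = re.compile(
    r"(?P<id>\w+?)_(?P<kind>INIT|ADDR)\s*=\s*\d+'h(?P<value>[0-9a-fA-F]+)"
)

def collect_params(text):
    """Map kind -> id -> list of hex strings, keeping duplicates."""
    found = {"INIT": defaultdict(list), "ADDR": defaultdict(list)}
    for m in PARAM_RE.finditer(text):
        found[m.group("kind")][m.group("id")].append("0x" + m.group("value"))
    return found

sample = (
    "localparam OSD_MODE_15_H_ADDR = 16'h038d;\n"
    "localparam OSD_MODE_15_H_INIT = 8'h00\n"
    "localparam OSD_MODE_15_H_INIT = 8'hff\n"  # hypothetical duplicate
)
params = collect_params(sample)
print(dict(params["INIT"]), dict(params["ADDR"]))
```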
Try something like this. I'm not sure what all your requirements are, but this should get you close:
with open(someFile, 'r') as infile:
    for line in infile:
        if '_INIT' in line:
            apostropheIndex = line.find("'h")
            # strip() drops the trailing newline that each line read from the file carries
            clean_hex = '0x' + line[apostropheIndex + 2:].strip()
In the case of "16'h038d;", clean_hex would be "0x038d;" (the ";" still needs to be removed somehow), and in the case of "8'h00", clean_hex would be "0x00".
Edit: if you want to guard against characters like ";", you could keep only the alphanumeric characters:
            clean_hex = '0x' + ''.join([s for s in line[apostropheIndex + 2:] if s.isalnum()])
You can use a regular expression and the re.findall() function. For example, to generate a list of tuples with the data you want, try:
import re
with open("your_file") as infile:
    lines = infile.read()
regex = r"([\w]+?)_INIT\s*=\s*\d+'h([\da-fA-F]*)"
res = [(x[0], "0x" + x[1]) for x in re.findall(regex, lines)]
print(res)
The regular expression is very specific to your input example. If the other lines in the file are slightly different, you may need to adjust it.
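Since the goal is to do operations on the hex values, it may be more convenient to convert them to integers with int(digits, 16) and format them back only for output. A sketch, assuming the two sample lines from the question; the name parse_init_values and the bitwise OR are just illustrations.

```python
import re

INIT_RE = re.compile(r"(\w+?)_INIT\s*=\s*\d+'h([0-9a-fA-F]+)")

def parse_init_values(text):
    """Map each ID to the integer value of its _INIT hex literal."""
    return {name: int(digits, 16) for name, digits in INIT_RE.findall(text)}

sample = (
    "localparam OSD_MODE_15_H_ADDR = 16'h038d;\n"
    "localparam OSD_MODE_15_H_INIT = 8'h00\n"
)
values = parse_init_values(sample)

# Example operation: set the low four bits, then format back as 0x..
updated = values["OSD_MODE_15_H"] | 0x0F
as_text = format(updated, "#04x")  # width 4 includes the "0x" prefix
print(values, as_text)
```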

Matching a character class multiple times in a string

I am writing a short script to sanitise folder and file names for upload to SharePoint. Since SharePoint is fussy and has some filename rules beyond simple disallowed characters (multiple consecutive periods are disallowed for instance) it seemed like regular expressions were the way to go rather than simple replacement of single characters. One expression that doesn't seem to be working however is:
[/<>*?|:"~#%&{}\\]+
As a simple character class match I would have expected this to work fine, and it appears to do so in notepad++. My expectation was that a string like
St\r/|ng
with the above regex would match \, / and |. However no matter what I do I can only get the string to match the first backslash, or the first of whatever character is in that class that it comes across. This is being done with the Python re library. Does anyone know what the issue is here?
import os, sys, shutil, re
def cleanPath(path):
    #Compiling regex...
    multi_dot = re.compile(r"[\.]{2,}")
    start_dot = re.compile(r"^[\.]")
    end_dot = re.compile(r"[\.]$")
    disallowed_chars = re.compile(r'[/<>*?|:"~#%&{}\\]+')
    dis1 = re.compile(r'\.files$')
    dis2 = re.compile(r'_files$')
    dis3 = re.compile(r'-Dateien$')
    dis4 = re.compile(r'_fichiers$')
    dis5 = re.compile(r'_bestanden$')
    dis5 = re.compile(r'_file$')
    dis6 = re.compile(r'_archivos$')
    dis7 = re.compile(r'-filer$')
    dis8 = re.compile(r'_tiedostot$')
    dis9 = re.compile(r'_pliki$')
    dis10 = re.compile(r'_soubory$')
    dis11 = re.compile(r'_elemei$')
    dis12 = re.compile(r'_ficheiros$')
    dis13 = re.compile(r'_arquivos$')
    dis14 = re.compile(r'_dosyalar$')
    dis15 = re.compile(r'_datoteke$')
    dis16 = re.compile(r'_fitxers$')
    dis17 = re.compile(r'_failid$')
    dis18 = re.compile(r'_fails$')
    dis19 = re.compile(r'_bylos$')
    dis20 = re.compile(r'_fajlovi$')
    dis21 = re.compile(r'_fitxategiak$')
    regxlist = [multi_dot,start_dot,end_dot,disallowed_chars,dis1,dis2,dis3,dis4,dis5,dis5,dis6,dis7,dis8,dis9,dis10,dis11,dis12,dis13,dis14,dis15,dis16,dis17,dis18,dis19,dis20,dis21]
    print("************************************\n\n"+path+"\n\n************************************\n")
    for x in regxlist:
        match = x.search(path)
        if match:
            print("\n")
            print("MATCHED")
            print(match.group())
            print("___________________________________________________________________________")
    return path
#testlist of conditions that should be found, some OK, some bad
testlist = ["string","str....ing","str..ing","str.ing",".string","string.",".string.","$tring",r"st\r\ing","st/r/ing",r"st\r/|ng","/str<i>ng","str.filesing","string.files"]
testlist_ans = ["OK","Match ....","Match ..","OK","Match .","Match .","Match . .","OK",r"Match \ ","Match /",r"Match \/|","Match / < >","OK","Match .files"]
count = 0
for i in testlist:
    print(testlist_ans[count])
    count = count + 1
    cleanPath(i)
Which Python re function are you using? search() returns only the first match. To get every match, use re.findall:
re.sub(pattern, new_txt, subject)  # replace all instances of pattern with new_txt
re.findall(pattern, subject)  # find all instances
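A short sketch of the difference, using the character class and the test string from the question. The underscore used as the replacement character is an assumption; pick whatever SharePoint accepts.

```python
import re

disallowed_chars = re.compile(r'[/<>*?|:"~#%&{}\\]+')
path = r"St\r/|ng"

# search() stops at the first run of disallowed characters
first = disallowed_chars.search(path).group()

# findall() returns every run; "/|" is one run because of the "+"
all_runs = disallowed_chars.findall(path)

# sub() replaces every run in one call
cleaned = disallowed_chars.sub("_", path)
print(repr(first), all_runs, cleaned)
```

The backslash and the "/|" pair are reported as separate runs because the letter r sits between them.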
