Replace characters with particular format with a variable value in python - python

I have filenames with the particular format as given
II.NIL.10.BHZ.M.2058.190.160877
II.NIL.10.BHA.M.2008.190.168857
II.NIL.10.BHB.M.2078.198.160857
.
.
.
I want to replace the ?.M part that follows BH with the corresponding string from the list name.
name=['T','D','FG'.....]
expected output
II.NIL.10.BHT.2058.190.160877
II.NIL.10.BHD.2008.190.168857
II.NIL.10.BHFG.2078.198.160857
.
.
.
Is it possible with str.replace()?

You could use the built-in regex module (re) alongside the following pattern to effectively replace the content in your strings.
Pattern
'(?<=BH)[A-Z]+\.M'
This pattern uses a lookbehind, (?<=BH), to require that the match is immediately preceded by the substring 'BH' without including 'BH' in the match. It then matches one or more uppercase characters, [A-Z]+, followed by the substring '.M'.
Solution
The below solution uses re.sub() alongside the pattern outlined above to return a string with the substring matched by the pattern replaced with that defined here as replacement.
import re
original = 'II.NIL.10.BHB.M.2078.198.160857'
replacement = 'FG'
output = re.sub(r'(?<=BH)[A-Z]+\.M', replacement, original)
print(output)
Output
II.NIL.10.BHFG.2078.198.160857
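To see what the lookbehind buys you, here is a small sketch (using one of the filenames from the question) that compares what is actually matched with and without it. The variable names are only illustrative.

```python
import re

filename = 'II.NIL.10.BHB.M.2078.198.160857'

# With the lookbehind, 'BH' must precede the match but is not part of it,
# so only the text after 'BH' gets replaced by re.sub().
with_lookbehind = re.search(r'(?<=BH)[A-Z]+\.M', filename)
print(with_lookbehind.group())     # B.M

# Without it, 'BH' is consumed as well and would have to be
# added back in the replacement string.
without_lookbehind = re.search(r'BH[A-Z]+\.M', filename)
print(without_lookbehind.group())  # BHB.M
```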
Processing multiple files
To repeat this process for multiple files you could apply the above logic within a loop/comprehension, running the re.sub() function on each original/replacement pairing and storing/processing appropriately.
The below example uses the data from your original question alongside the above logic to create a list containing the results of each re.sub() operation by way of a dictionary mapping between the original filenames and substrings to be inserted using re.sub().
import re
originals = [
'II.NIL.10.BHZ.M.2058.190.160877',
'II.NIL.10.BHA.M.2008.190.168857',
'II.NIL.10.BHB.M.2078.198.160857'
]
replacements = ['T','D','FG']
mapping = dict(zip(originals, replacements))
results = [re.sub(r'(?<=BH)[A-Z]+\.M', v, k) for k, v in mapping.items()]
for r in results:
    print(r)
Output
II.NIL.10.BHT.2058.190.160877
II.NIL.10.BHD.2008.190.168857
II.NIL.10.BHFG.2078.198.160857

Nope, you cannot use str.replace with a wildcard. You will have to use regex with something such as the following
import re
filenames = ['II.NIL.10.BHA.M.2008.190.168857', 'II.NIL.10.BHB.M.2078.198.160857',
             'II.NIL.10.BHC.M.2078.198.160857']
name = ['T', 'D', 'FG']
newfilenames = []
for filename, suffix in zip(filenames, name):
    # 'BH.?\.M' matches 'BH', at most one further character, then '.M'
    newfilenames.append(re.sub(r'BH.?\.M', 'BH' + suffix, filename))
print(' '.join(newfilenames)) # outputs II.NIL.10.BHT.2008.190.168857 II.NIL.10.BHD.2078.198.160857 II.NIL.10.BHFG.2078.198.160857
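If you would rather avoid regex altogether, a rough alternative is to split each filename on '.' and edit the fields directly. This is only a sketch and it assumes every filename has the channel code (e.g. 'BHZ') in the fourth field and the 'M' in the fifth; swap_channel is a made-up helper name.

```python
def swap_channel(filename, suffix):
    """Replace the last part of the channel field and drop the field after it."""
    fields = filename.split('.')
    # Assumption: fields[3] is the channel code ('BHZ') and fields[4] is 'M'
    fields[3] = fields[3][:2] + suffix
    del fields[4]
    return '.'.join(fields)

filenames = ['II.NIL.10.BHZ.M.2058.190.160877',
             'II.NIL.10.BHA.M.2008.190.168857',
             'II.NIL.10.BHB.M.2078.198.160857']
name = ['T', 'D', 'FG']

renamed = [swap_channel(f, n) for f, n in zip(filenames, name)]
for r in renamed:
    print(r)
```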

You can use iter with next in the replacement lambda of re.sub:
import re
name = iter(['T','D','FG'])
s = """
II.NIL.10.BHZ.M.2058.190.160877
II.NIL.10.BHA.M.2008.190.168857
II.NIL.10.BHB.M.2078.198.160857
"""
result = re.sub(r'(?<=BH)\w\.\w', lambda m: next(name), s)
print(result)
Output:
II.NIL.10.BHT.2058.190.160877
II.NIL.10.BHD.2008.190.168857
II.NIL.10.BHFG.2078.198.160857
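One thing to watch with the iter/next approach: if the text contains more matches than there are names, next() raises StopIteration. A possible way around that, sketched below with a deliberately short list, is to give next() a default so the matched text is left as it was once the names run out.

```python
import re

names = iter(['T', 'D'])  # deliberately one name short
s = "II.NIL.10.BHZ.M.2058\nII.NIL.10.BHA.M.2008\nII.NIL.10.BHB.M.2078"

# next(names, m.group()) falls back to the matched text itself,
# so the third line is left unchanged instead of raising StopIteration
result = re.sub(r'(?<=BH)\w\.\w', lambda m: next(names, m.group()), s)
print(result)
```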

Related

Replace placeholders in string with replacements sequence

I have a location string with placeholders, used as '#'. Another string which are replacements for the placeholders. I want to replace them sequentially, (like format specifiers). What is the way to do it in Python?
location = '/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3'
replacements = 'xyz'
result = '/tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3'
You should use the replace method of a string as follows:
for replacement in replacements:
    location = location.replace('#', replacement, 1)
It is important to pass the third argument, count, so that each call replaces only the first remaining placeholder. Otherwise, the first call would replace every placeholder with the same value.
If your location string does not contain format specifiers ({}) you could do:
location = '/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3'
replacements='xyz'
print(location.replace("#", "{}").format(*replacements))
Output
/tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3
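If the location string might already contain braces, one option is to escape them before turning '#' into '{}'. The sketch below uses a made-up path with a literal '{1}' in it to show the idea.

```python
# Hypothetical path that already contains literal braces
location = '/tmp/#/dir{1}/#/some_dirx/#'
replacements = 'xyz'

# Double the braces so str.format treats them as literal text
escaped = location.replace('{', '{{').replace('}', '}}')
result = escaped.replace('#', '{}').format(*replacements)
print(result)  # /tmp/x/dir{1}/y/some_dirx/z
```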
As an alternative you could use the fact that repl in re.sub can be a function:
import re
from itertools import count
location = '/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3'
def repl(match, replacements='xyz', index=count()):
    return replacements[next(index)]
print(re.sub('#', repl, location))
Output
/tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3
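Note that a default argument such as index=count() is created once, when the function is defined, so the counter keeps running across separate re.sub calls. If you need to reuse the idea, one possible variation is a small factory that builds a fresh replacement function each time (make_repl is just an illustrative name):

```python
import re

def make_repl(replacements):
    """Return a replacement function that hands out items one by one."""
    it = iter(replacements)

    def repl(match):
        return next(it)

    return repl

location = '/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3'

# Each call gets its own iterator, so both substitutions start from the first item
first = re.sub('#', make_repl('xyz'), location)
second = re.sub('#', make_repl('abc'), location)
print(first)
print(second)
```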

Splitting based on particular pattern and editing string

I am trying to split a string based on a particular pattern in an effort to rejoin it later after adding a few characters.
Here's a sample of my string: "123\babc\b:123" which I need to convert to "123\babc\\"b\":123". I need to do it several times in a long string. I have tried variations of the following:
regex = r"(\\b[a-zA-Z]+)\\b:"
test_str = "123\\babc\\b:123"
x = re.split(regex, test_str)
but it doesn't split at the right positions for me to join. Is there another way of doing this/another way of splitting and joining?
You're right, you can do it with re.split as suggested. You can split on \b and then rebuild your output with a specific separator (keeping the \b wherever you want to).
Here is an example:
# Import module
import re
string = "123\\babc\\b:123"
# Split on the literal two-character sequence "\b"
list_sliced = re.split(r'\\b', string)
print(list_sliced)
# ['123', 'abc', ':123']
# Define your custom separator
custom_sep = '\\\\"b\\"'
# Build your new output
output = list_sliced[0]
# Iterate over each word
for i, word in enumerate(list_sliced[1:]):
    # Choose the separator according to the parity (since we don't want to change the first "\b")
    sep = "\\\\b"
    if i % 2 == 1:
        sep = custom_sep
    # Update output
    output += sep + word
print(output)
# 123\\babc\\"b\":123
Maybe, the following expression,
^([\\]*)([^\\]+)([\\]*)([^\\]+)([\\]*)([^:]+):(.*)$
and a replacement of,
\1\2\3\4\5\\"\6\\":\7
with a re.sub might return our desired output.
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
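For reference, this is roughly how that expression and replacement would be passed to re.sub in Python. It is a sketch for the single sample string from the question; raw strings are used so the backslashes reach the regex engine unchanged.

```python
import re

pattern = r'^([\\]*)([^\\]+)([\\]*)([^\\]+)([\\]*)([^:]+):(.*)$'
# In the replacement template, \\ produces one literal backslash
replacement = r'\1\2\3\4\5\\"\6\\":\7'

test_str = '123\\babc\\b:123'   # the characters 123\babc\b:123
result = re.sub(pattern, replacement, test_str)
print(result)  # 123\babc\\"b\":123
```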

Replace named captured groups with arbitrary values in Python

I need to replace the value inside a capture group of a regular expression with some arbitrary value; I've had a look at the re.sub, but it seems to be working in a different way.
I have a string like this one :
s = 'monthday=1, month=5, year=2018'
and I have a regex matching it with captured groups like the following :
regex = re.compile('monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')
now I want to replace the group named d with aaa, the group named m with bbb and group named Y with ccc, like in the following example :
'monthday=aaa, month=bbb, year=ccc'
basically I want to keep all the non matching string and substitute the matching group with some arbitrary value.
Is there a way to achieve the desired result ?
Note
This is just an example, I could have other input regexs with different structure, but same name capturing groups ...
Update
Since it seems like most of the people are focusing on the sample data, I add another sample, let's say that I have this other input data and regex :
input = '2018-12-12'
regex = '((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'
as you can see I still have the same number of capturing groups(3) and they are named the same way, but the structure is totally different... What I need though is as before replacing the capturing group with some arbitrary text :
'ccc-bbb-aaa'
replace capture group named Y with ccc, the capture group named m with bbb and the capture group named d with aaa.
In the case, regexes are not the best tool for the job, I'm open to some other proposal that achieve my goal.
This is a completely backwards use of regex. The point of capture groups is to hold text you want to keep, not text you want to replace.
Since you've written your regex the wrong way, you have to do most of the substitution operation manually:
import re

def replace_groups(pattern, string, replacements):
    """
    Replaces the text captured by named groups.
    """
    pattern = re.compile(pattern)
    # create a dict of {group_index: group_name} for use later
    groupnames = {index: name for name, index in pattern.groupindex.items()}

    def repl(match):
        # we have to split the matched text into chunks we want to keep and
        # chunks we want to replace
        # captured text will be replaced. uncaptured text will be kept.
        text = match.group()
        # match.start(i) and match.end(i) are offsets into the whole string,
        # so shift them to be relative to the matched text
        offset = match.start()
        chunks = []
        lastindex = 0
        for i in range(1, pattern.groups + 1):
            groupname = groupnames.get(i)
            if groupname not in replacements:
                continue
            # keep the text between this match and the last
            chunks.append(text[lastindex:match.start(i) - offset])
            # then instead of the captured text, insert the replacement text for this group
            chunks.append(replacements[groupname])
            lastindex = match.end(i) - offset
        chunks.append(text[lastindex:])
        # join all the chunks to obtain the final string with replacements
        return ''.join(chunks)

    # for each occurrence call our custom replacement function
    return re.sub(pattern, repl, string)

>>> replace_groups(regex, s, {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
'monthday=aaa, month=bbb, year=ccc'
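A more compact variant of the same idea, as a sketch only: look up each named group's span on the match object and splice the replacements in from right to left, so earlier offsets stay valid. It assumes the named groups do not nest or overlap; replace_named_groups is an illustrative name, and both sample inputs come from the question.

```python
import re

def replace_named_groups(pattern, string, replacements):
    """Replace the text captured by each named group listed in `replacements`."""
    def repl(match):
        text = match.group()
        base = match.start()  # group offsets are relative to the whole string
        # (start, end, name) for named groups that took part in the match
        spans = [(match.start(n), match.end(n), n)
                 for n in match.re.groupindex
                 if n in replacements and match.start(n) != -1]
        # work from right to left so earlier offsets are not shifted
        for start, end, name in sorted(spans, reverse=True):
            text = text[:start - base] + replacements[name] + text[end - base:]
        return text
    return re.sub(pattern, repl, string)

values = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}

first = replace_named_groups(
    r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})',
    'Data: monthday=1, month=5, year=2018', values)
second = replace_named_groups(
    r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))', '2018-12-12', values)
print(first)   # Data: monthday=aaa, month=bbb, year=ccc
print(second)  # ccc-bbb-aaa
```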
You can use string formatting with a regex substitution:
import re
s = 'monthday=1, month=5, year=2018'
s = re.sub(r'(?<==)\d+', '{}', s).format(*['aaa', 'bbb', 'ccc'])
Output:
'monthday=aaa, month=bbb, year=ccc'
Edit: for the second sample, substituting the whole regex match would leave only a single placeholder, so match each numeric field instead and pass the replacements in the order the fields appear:
input = '2018-12-12'
new_s = re.sub(r'\d+', '{}', input).format(*["ccc", "bbb", "aaa"])
Extended Python 3.x solution on extended example (re.sub() with replacement function):
import re
d = {'d':'aaa', 'm':'bbb', 'Y':'ccc'} # predefined dict of replace words
pat = re.compile(r'(monthday=)(?P<d>\d{1,2})|(month=)(?P<m>\d{1,2})|(year=)(?P<Y>20\d{2})')
def repl(m):
    pair = next(t for t in m.groupdict().items() if t[1])
    k = next(filter(None, m.groups())) # preceding `key` for currently replaced sequence (i.e. 'monthday=' or 'month=' or 'year=')
    return k + d.get(pair[0], '')
s = 'Data: year=2018, monthday=1, month=5, some other text'
result = pat.sub(repl, s)
print(result)
The output:
Data: year=ccc, monthday=aaa, month=bbb, some other text
For Python 2.7 :
change the line k = next(filter(None, m.groups())) to:
k = filter(None, m.groups())[0]
I suggest you use a loop
import re
regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')
s = 'monthday=1, month=1, year=2017 \n'
s += 'monthday=2, month=2, year=2019'
regex_as_str = 'monthday={d}, month={m}, year={Y}'
matches = [match.groupdict() for match in regex.finditer(s)]
for match in matches:
    s = s.replace(
        regex_as_str.format(**match),
        regex_as_str.format(**{'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
    )
You can do this multiple times with your different regex patterns,
or you can join both patterns together with an "or" (|).

How to replace only elements found using a python re.findall rather than entire string?

How would I replace groups found using the python regex findall method without having to change the rest of the string too.
For example:
import re
repl1='k1'
repl2='k2'
pattern=re.compile('CN=Root,Model=.*,Vector=Reactions\[(.*)\],ParameterGroup=Parameters,Parameter=(.*),Reference=Value')
I want to use re.sub to replace ONLY the elements within the (.*) groups with repl1 and repl2, rather than having to change the rest of the string too.
-------edit -----
The output I want should look like this:
output = 'CN=Root,Model=.*,Vector=Reactions[k1],ParameterGroup=Parameters,Parameter=k2,Reference=Value'
But note I have left the '.*' in after model because this will change every time. I.e. this can be anything.
----------edit 2----------
The input is a simple one-line string which is almost exactly the same as the pattern. For example:
input = 'CN=Root,Model=Model1,Vector=Reactions[k10],ParameterGroup=Parameters,Parameter=k12,Reference=Value'
re.sub's argument repl can be a one-argument function, and in that case it is called with the match object as an argument. So, if you ensure that all parts of the pattern are in a group you should have all the information you need to replace the old string with the new one.
import re
repl1='k1'
repl2='k2'
pattern=re.compile(r'(CN=Root,Model=.*,Vector=Reactions\[)(.*)(\],ParameterGroup=Parameters,Parameter=)(.*)(,Reference=Value)')
target = 'CN=Root,Model=something,Vector=Reactions[somethingelse],ParameterGroup=Parameters,Parameter=1234,Reference=Value'
Now define a function that produces the matched string with the second and fourth groups (indices 1 and 3 of the list of groups) replaced with your desired values:
def repl(m):
    g = list(m.groups())
    g[1] = repl1
    g[3] = repl2
    return "".join(g)
Passing this function as the first argument to pattern.sub then achieves the desired transformation:
pattern.sub(repl, target)
gives the result
'CN=Root,Model=something,Vector=Reactions[k1],ParameterGroup=Parameters,Parameter=k2,Reference=Value'
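When the replacement values are plain strings, the same five-group pattern can also be used with a replacement template instead of a function. The sketch below uses \g<n> group references; it assumes repl1 and repl2 contain no backslashes, since those would be interpreted by the template.

```python
import re

repl1 = 'k1'
repl2 = 'k2'
pattern = re.compile(r'(CN=Root,Model=.*,Vector=Reactions\[)(.*)(\],ParameterGroup=Parameters,Parameter=)(.*)(,Reference=Value)')
target = 'CN=Root,Model=something,Vector=Reactions[somethingelse],ParameterGroup=Parameters,Parameter=1234,Reference=Value'

# Keep groups 1, 3 and 5; put the new values where groups 2 and 4 were
template = r'\g<1>' + repl1 + r'\g<3>' + repl2 + r'\g<5>'
result = pattern.sub(template, target)
print(result)
```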

replace wildcard numbers in pattern with additional text + same numbers

I need to find all parts of a large text string in this particular pattern:
"\t\t" + number (between 1-999) + "\t\t"
and then replace each occurrence with:
TEXT+"\t\t"+same number+"\t\t"
So, the end result is:
'TEXT\t\t24\t\tblah blah blahTEXT\t\t56\t\t'... and so on...
The various numbers are between 1-999 so it needs some kind of wildcard.
Please can somebody show me how to do it? Thanks!
You'll want to use Python's re library, and in particular the re.sub function:
import re # re is Python's regex library
SAMPLE_TEXT = "\t\t45\t\tbsadfd\t\t839\t\tds532\t\t0\t\t" # Test text to run the regex on
# Run the regex using re.sub (for substitute)
# re.sub takes three arguments: the regex expression,
# a function to return the substituted text,
# and the text you're running the regex on.
# The regex looks for substrings of the form:
# Two tabs ("\t\t"), followed by one to three digits 0-9 ("[0-9]{1,3}"),
# followed by two more tabs.
# The lambda function takes in a match object x,
# and returns the full text of that object (x.group(0))
# with "TEXT" prepended.
output = re.sub("\t\t[0-9]{1,3}\t\t",
                lambda x: "TEXT" + x.group(0),
                SAMPLE_TEXT)
print(output) # Print the resulting string.
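The same substitution can probably be written without a lambda, because a replacement string may refer to the whole match as \g<0>. A short sketch using the same sample text:

```python
import re

sample_text = "\t\t45\t\tbsadfd\t\t839\t\tds532\t\t0\t\t"

# \g<0> in the replacement stands for the entire matched text
output = re.sub(r"\t\t[0-9]{1,3}\t\t", r"TEXT\g<0>", sample_text)
print(repr(output))
```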
