Replace text between parentheses in python

My string will contain () in it. What I need to do is to change the text between the brackets.
Example string: "B.TECH(CS,IT)".
In my string I need to change the content present inside the brackets to something like this.. B.TECH(ECE,EEE)
What I tried to resolve this problem is as follows..
reg = r'(()([\s\S]*?)())'
a = 'B.TECH(CS,IT)'
re.sub(reg,"(ECE,EEE)",a)
But I got output like this..
'(ECE,EEE)B(ECE,EEE).(ECE,EEE)T(ECE,EEE)E(ECE,EEE)C(ECE,EEE)H(ECE,EEE)((ECE,EEE)C(ECE,EEE)S(ECE,EEE),(ECE,EEE)I(ECE,EEE)T(ECE,EEE))(ECE,EEE)'
The valid output should look like this:
B.TECH(ECE,EEE)
What am I missing, and how do I replace the text correctly?

The problem is that you're using parentheses, which have a special meaning in regex: they delimit groups and capture whatever the group matches.
You need to escape the ( and ) wherever you want them to match literal characters. You escape a character with a backslash: \( and \).
Here is an example:
reg = r'\([\s\S]*\)'
a = 'B.TECH(CS,IT)'
re.sub(reg, '(ECE,EEE)', a)
# == 'B.TECH(ECE,EEE)'

Your regex does not work because parentheses are metacharacters in regex. An unescaped () is a group that matches the empty string, and the empty string occurs at every position, so the replacement is inserted everywhere. That's why you get the output that you see.
To fix this, you'll need to escape those parens – something along the lines of
\(...\)
For your particular use case, might I suggest a simpler pattern?
In [268]: re.sub(r'\(.*?\)', '(ECE,EEE)', 'B.TECH(CS,IT)')
Out[268]: 'B.TECH(ECE,EEE)'
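To make the difference concrete, here is a small sketch of the three behaviours discussed above: an empty group matching everywhere, escaped parentheses matching literally, and greedy versus lazy matching. The second input string with two bracketed parts is made up for illustration and is not from the question.

```python
import re

a = 'B.TECH(CS,IT)'

# An unescaped "()" is an empty group. It matches the empty string at
# every position, so the replacement is inserted between all characters.
everywhere = re.sub(r'()', '-', 'abc')

# Escaped parentheses match a literal "(" and ")".
single = re.sub(r'\(.*?\)', '(ECE,EEE)', a)

# Hypothetical input with two bracketed parts: greedy ".*" runs from the
# first "(" to the last ")", lazy ".*?" stops at the nearest ")".
b = 'B.TECH(CS,IT) M.TECH(ME)'
greedy = re.sub(r'\(.*\)', '(X)', b)
lazy = re.sub(r'\(.*?\)', '(X)', b)

print(everywhere)
print(single)
print(greedy)
print(lazy)
```

For a string with a single pair of parentheses both forms give the same result; the lazy form is the safer default if more than one pair can appear.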

Related

re.sub for string starting with special character

Sorry if this question seems too similar to others I have found. This is a variation of using re.sub to replace exact characters in a string.
I have a string that looks like:
C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5
I would like to only replace, for example, the '*:1' with 'Ar'. My current attempt looks like this:
smiles_all='C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5'
print(smiles_all)
new_smiles=re.sub('[*:]1','Ar',smiles_all)
print(new_smiles)
C1([*:5])C([*:6])C2=NC1=C([*Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*Ar0])C(=N4)C([*:3])=C5C([*Ar1])=C([*Ar2])C(=C2([*:4]))N5
As you can see, this is still changing the values that were previously 10,11, etc. I've tried different variations where I select [*:1], but that is also incorrect. Any help here would be greatly appreciated. In my current output, the * also remains. That needs to be swapped so that *:1 becomes Ar
Here is an example of what the output should be
C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5
Edit:
At one point this question was flagged as answered by this question:
Escaping regex string
When I use re.escape as suggested, I still get the wrong output (the original string first, then the result):
new_smiles=re.sub(re.escape('*:1'),'Ar',smiles_all)
C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5
C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([Ar0])C(=N4)C([*:3])=C5C([Ar1])=C([Ar2])C(=C2([*:4]))N5

Given:
smiles_all='C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5'
desired='C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5'
You are trying to replace the literal string [*:1] with [Ar]. In a regex, the expression [*:1] is a character class: it matches exactly one character, which may be *, : or 1. If you add a repetition such as + to a character class, it matches a run of those characters in any order, up to the repetition limit.
The easiest way to replace the literal [*:1] with [Ar] is to use Python's string methods:
>>> smiles_all.replace('[*:1]','[Ar]')==desired
True
If you want to use a regex, you need to escape those metacharacters to get a literal string:
>>> re.sub(r'\[\*:1\]', "[Ar]", smiles_all)==desired
True
Or let Python do the escaping for you:
>>> re.sub(re.escape(r'[*:1]'), "[Ar]", smiles_all)==desired
True
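If several labels need replacing, the same idea can be wrapped in a small helper. This is only a sketch: the function name replace_label and the shortened sample string are made up for illustration. The key point is that the pattern includes the closing ], which is what keeps [*:1] from matching inside [*:10].

```python
import re

def replace_label(smiles, label, replacement):
    """Replace the placeholder [*:label] with [replacement] (hypothetical helper)."""
    # re.escape turns "[", "*" and "]" into literals.
    pattern = re.escape(f'[*:{label}]')
    # A function as replacement avoids any backslash handling in the
    # replacement text.
    return re.sub(pattern, lambda m: f'[{replacement}]', smiles)

# Shortened sample, not the full string from the question.
sample = 'C([*:1])C([*:10])C([*:11])'
result = replace_label(sample, 1, 'Ar')
print(result)
```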

You can try:
re.sub(r"[*:]+1(?=])", "Ar", smiles_all)
The difference from your attempt is that [*:]+ allows one or more repetitions of the literal * and :, followed by a 1 that must itself be followed by a ]. That last condition is enforced by (?=]), a positive lookahead, so 10, 11 and 12 are left alone.
This gives:
"C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5"

RegEx for a delimited string

I have a string like this: '432342:username:full_name:1'. I need to write a regular expression that checks whether a string matches this format.
I tried .split(':') and then checked each part against a regular expression, but I need to match the whole string.
The four fields are: digits only : English letters and digits : English and Russian letters : one of 1, 2, 3
I also tried the pattern below, but I don't understand how to add the ':' separator between the fields, as in the example above.
pattern = r'[/b:]|[\d]|[a-zA-Z]|[а-яА-Я]|[1,2,3]'

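Based on the format you describe, try this. The third field allows English as well as Russian letters, plus the underscore that appears in full_name:
s = '432342:username:full_name:1'
re.findall(r'[0-9]+:[a-zA-Z0-9]+:[a-zA-Zа-яА-Я_]+:[123]', s)
# ['432342:username:full_name:1']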
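Since the question asks to check the whole string, re.fullmatch may be a better fit than findall, which also succeeds when the pattern only occurs somewhere inside a longer string. A minimal sketch, assuming the field rules stated in the question and allowing an underscore and ё/Ё in the third field (both are assumptions); the name is_valid is made up:

```python
import re

# digits : English letters and digits : English/Russian letters : 1, 2 or 3
RECORD = re.compile(r'[0-9]+:[a-zA-Z0-9]+:[a-zA-Zа-яА-ЯёЁ_]+:[123]')

def is_valid(record):
    """Return True only if the entire string has the expected format."""
    return RECORD.fullmatch(record) is not None

print(is_valid('432342:username:full_name:1'))
print(is_valid('432342:username:full_name:4'))
print(is_valid('junk 432342:username:full_name:1'))
```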

Regex in python, repeated fragment finding

I am trying to use a regex to find elements in a text where the same token appears on both sides of an equals sign, such as abs=abs, 1=1, etc.
I wrote it this way:
opis="Some text abs=abs sfsdvc"
wyn=re.search('([\w]*)=\1',opis)
print(wyn.group(0))
This finds nothing, although the same pattern worked correctly when I tried it on websites like www.regexr.com.
Am I doing something wrong with Python's re?

You must write the regex as a raw string, r'...'
>>> opis="Some text abs=abs sfsdvc"
>>> wyn=re.search(r'([\w]*)=\1',opis)
>>> print(wyn.group(0))
abs=abs
From the re documentation:
Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it.
This means that if you are not planning to use a raw string, every \ in the pattern must be escaped:
>>> opis="Some text abs=abs sfsdvc"
>>> wyn=re.search('([\\w]*)=\\1',opis)
>>> print(wyn.group(0))
abs=abs
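The reason the original pattern silently finds nothing can be seen by looking at the string itself. In a normal string literal, '\1' is an octal escape for the character with code 1, so the regex engine never receives a backreference. A short sketch:

```python
import re

opis = "Some text abs=abs sfsdvc"

# '\1' in a normal literal is one character (code 1), not a backslash
# followed by the digit 1.
is_control_char = '\1' == '\x01'
plain_len = len('\1')
raw_len = len(r'\1')

# Raw string and doubled backslashes produce the same pattern text.
same_pattern = r'(\w+)=\1' == '(\\w+)=\\1'

match = re.search(r'(\w+)=\1', opis)
print(is_control_char, plain_len, raw_len, same_pattern)
print(match.group(0))
```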

Change your regex to:
re.search(r'(\w+)=\1', opis).group()
Note the + after \w. You don't really need a character class here, so the [ and ] are redundant. It's also better to use \w+ than \w*, so that a lone "=" (an equals sign with nothing on either side) does not match.

Using Python regex to sanitize input string

I have the string:
text = 'href = "www.google.com" onmouseover = blahblah >'
I want 'href = "www.google.com">'
Currently, my function looks like this:
text = re.sub(r'href = \".*\".*>', 'href = \".*\">', text)
which ends up removing the website link and replacing it with the string '.*'. I think I'm supposed to use (?P<name>...) somehow, but I do not know how to write it properly so that I get the correct output.

You don't want to substitute in .*, you want to substitute in whatever the first .* matched.
To do that, you need a backreference, like \1.
And this means you need something for the backreference to refer back to—a capture group, like (.*) instead of .*.
More generally, the replacement string is not a regular expression, it's a different kind of thing—basically, it's a template that's all literal characters except for backreferences.* So, you don't want to try to escape the quotes, unless you want literal backslashes in the results.
So:
>>> re.sub(r'href = \"(.*)\".*>', r'href = "\1">', text)
'href = "www.google.com">'
This is explained in more detail in Search and Replace in the Regular Expression HOWTO.
* Or it can be a function which takes each match object and returns a string.
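Since the question mentions named groups, here is a sketch of the same substitution with (?P<url>...) and \g<url>, followed by the function form from the footnote. The group name url and the function name keep_href are made up for illustration, and [^"]* is swapped in for .* so the group cannot run past the closing quote.

```python
import re

text = 'href = "www.google.com" onmouseover = blahblah >'

# (?P<url>...) names the capture group; [^>]* skips the remaining
# attributes up to the closing ">".
pattern = r'href = "(?P<url>[^"]*)"[^>]*>'

# In the replacement template, \g<url> refers to the named group.
cleaned = re.sub(pattern, r'href = "\g<url>">', text)

# The replacement can also be a function that receives the match object.
def keep_href(match):
    return 'href = "{}">'.format(match.group('url'))

cleaned_fn = re.sub(pattern, keep_href, text)

print(cleaned)
print(cleaned_fn)
```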

An alternative way to accomplish your goal is to take a substring. No regular expression is needed. The idea is to find the second double-quote character using the string method index().
For a string called input, this expression gives you the position of the second double-quote character:
input.index('"', input.index('"')+1)
If that value is k, write input[:k+1] to extract everything up to and including the second double-quote character.
Try out the following in your Python interpreter.
input = 'href = "www.google.com" onmouseover=hax0rFunction()>'
k = input.index('"', input.index('"')+1)
input[0:k+1]

Use string as input to re.compile

I want to use a variable in a regex, like this:
variables = ['variableA','variableB']
for i in range(len(variables)):
    regex = r"'('+variables[i]+')[:|=|\(](-?\d+(?:\.\d+)?)(?:\))?'"
    pattern_variable = re.compile(regex)
    match = re.search(pattern_variable, line)
The problem is that python adds an extra backslash character for each backslash character in my regex string (ipython), and makes my regex invalid:
In [76]: regex
Out[76]: "'('+variables[i]+')[:|=|\\(](-?\\d+(?:\\.\\d+)?)(?:\\))?'"
Any tips on how I can avoid this?

No, it only displays the extra backslashes, so that the string could be read in again and have the correct number of backslashes. Try
print(regex)
and you will see the difference.

There is no problem there. What you're seeing is the output of the repr() of the string. Since the repr is supposed to be more-or-less reversible back into the original object, it doubles up all backslashes, as well as escaping the type of quote used at the ends of the repr.
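Separately from the display question, the pattern in the question keeps '+variables[i]+' inside a single string literal, so the variable name is never actually inserted. A possible way to build the pattern is sketched below. The sample line is invented, and the character class is simplified to [:=(] on the assumption that the | characters were meant as alternation (inside a class they are literal).

```python
import re

variables = ['variableA', 'variableB']
line = 'variableA=3.5 variableB(-2)'  # hypothetical input line

values = {}
for name in variables:
    # Concatenate outside the quotes; re.escape protects against any
    # regex metacharacters in the variable name.
    regex = r'(' + re.escape(name) + r')[:=(](-?\d+(?:\.\d+)?)\)?'
    match = re.search(regex, line)
    if match:
        values[match.group(1)] = float(match.group(2))

# repr() doubles the backslash for display; the string itself has one.
shown = repr(r'\d+')
actual_len = len(r'\d+')

print(values)
print(shown, actual_len)
```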
