Numeric value directly after backreference [duplicate] - python

This question already has an answer here:
python re.sub group: number after \number
(1 answer)
Closed 6 years ago.
I'm trying to use re.sub on a string with a numeric value directly after a numeric backreference. That is, if my replacement value is 15.00 and my backreference is \1, my replacement string would look like:
\115.00, which as expected will throw an error: invalid group reference because it thinks my backreference group is 115.
Example:
import re
r = re.compile(r"(Value:)([-+]?[0-9]*\.?[0-9]+)")
to_replace = "Value:0.99" # Want Value:1.00
# Won't work: \11 is read as a reference to group 11
print(re.sub(r, r'\11.00', to_replace))
# Will work, but don't want the space
print(re.sub(r, r'\1 1.00', to_replace))
Is there a solution that doesn't involve more than re.sub?

Use an unambiguous backreference syntax \g<1>. See re.sub reference:
\g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'.
Python demo:
import re
r = re.compile(r"(Value:)([-+]?[0-9]*\.?[0-9]+)")
to_replace = "Value:0.99" # Want Value:1.00
print(re.sub(r, r'\g<1>1.00', to_replace))
# => Value:1.00
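As a minimal sketch of the same idea, the snippet below compares the `\g<1>` form with a callable replacement, which sidesteps backreference syntax entirely. The name `new_value` is a hypothetical placeholder, not something from the question.

```python
import re

pattern = re.compile(r"(Value:)([-+]?[0-9]*\.?[0-9]+)")
text = "Value:0.99"

# \g<1> ends the group reference explicitly, so the "1.00" that follows is literal.
with_g = pattern.sub(r"\g<1>1.00", text)
print(with_g)

# A callable replacement builds the string in Python, so no escaping is needed.
new_value = "1.00"  # hypothetical replacement value
with_func = pattern.sub(lambda m: m.group(1) + new_value, text)
print(with_func)
```

The callable form can be handy when the replacement value comes from a variable and might itself contain backslashes or digits.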


How to find values in specific format including parenthesis using regular expression in python [duplicate]

This question already has answers here:
Regular expression to return text between parenthesis
(11 answers)
Closed 2 years ago.
I have a long string S, and I want to find numeric values that appear in the format "Value(**)", where ** is the value I want to extract.
For example, if S is "abcdef Value(34) Value(56) Value(13)", I want to extract the values 34, 56, and 13 from S.
I tried to use regex as follows.
import re
regex = re.compile('\Value(.*'))
re.findall(regex, S)
But the code yields the result I did not expect.
Edit. I edited some mistakes.
You should escape the parentheses, correct the typo of Value (as opposed to Values), use a lazy repeater *? instead of *, add the missing right parenthesis, and capture what's enclosed in the escaped parentheses with a pair of parentheses:
regex = re.compile(r'Value\((.*?)\)')
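A short sketch of that pattern applied to the sample string from the question. The second variant, which uses `\d+` and converts to `int`, assumes only digits are wanted; that is an assumption, not something the question states.

```python
import re

S = "abcdef Value(34) Value(56) Value(13)"

# Escaped parentheses match literally; the inner group captures the content.
values = re.findall(r"Value\((.*?)\)", S)
print(values)  # ['34', '56', '13']

# Assuming only digits are wanted, \d+ is stricter and converts cleanly to int.
numbers = [int(n) for n in re.findall(r"Value\((\d+)\)", S)]
print(numbers)  # [34, 56, 13]
```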
Only one of your numbers follows the word 'Value', so you can extract anything inside parentheses. You also need to escape the parentheses which are special characters.
regex = re.compile(r'\(.*?\)')
re.findall(regex, S)
Output:
['(34)', '(56)', '(13)']
I think what you're looking for is a capturing group that can return multiple matches. The pattern is: (\(\d{2}\))?. \d matches a digit and {2} matches exactly 2 digits. {1,2} will match 1 or 2 digits, etc. ? makes the preceding group optional (it matches 0 or 1 times). Each group can be indexed and combined into a list. This is a simple implementation and will return the parenthesized two-digit numbers.
E.g. 'asdasd Value(102222), fgdf(20), he(77)' will match 20 and 77 but not 102222.

Replacing string between two characters in the file

I have a string in my properties file as below:
line = "variables=ORACLE_BASE_HOME=/u02/test/oracle/landscape/1/db_50,DB_UNIQUE_NAME=cdms,ORACLE_BASE=//u02/test,PDB_NAME=,DB_NAME=cdms,ORACLE_HOME=/u02/test/product/19/db_21,SID=ss"
I would like to replace the following string with a different value:
DB_NAME=cdms -> DB_NAME=abc
I have the code below; however, it does not seem to do what I expect:
f = fileinput.FileInput(rsp_file_path)
for line in f:
    re.sub(",DB_NAME=(.*?),", "abc", line, flags=re.DOTALL)
f.close()
It should be:
re.sub("(,DB_NAME=)(.*?),", "\\g<1>abc,", line, flags=re.DOTALL)
or, using raw strings:
re.sub(r"(,DB_NAME=)(.*?),", r"\1abc,", line, flags=re.DOTALL)
That's because the documentation for re.sub() states:
In string-type repl arguments, in addition to the character escapes
and backreferences described above, \g<name> will use the substring
matched by the group named name, as defined by the (?P<name>...)
syntax. \g<number> uses the corresponding group number; \g<2> is
therefore equivalent to \2, but isn’t ambiguous in a replacement such
as \g<2>0. \20 would be interpreted as a reference to group 20, not a
reference to group 2 followed by the literal character '0'. The
backreference \g<0> substitutes in the entire substring matched by the
RE.
In your case (,DB_NAME=) is the first captured group which you refer to with \g<1>.
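A minimal sketch of the substitution on the sample line. One detail worth noting: `re.sub` returns a new string and leaves its input unchanged, so the result has to be kept (the loop in the question discards it). Writing the result back to the file is left out here, and `new_line` is just an illustrative name.

```python
import re

line = ("variables=ORACLE_BASE_HOME=/u02/test/oracle/landscape/1/db_50,"
        "DB_UNIQUE_NAME=cdms,ORACLE_BASE=//u02/test,PDB_NAME=,DB_NAME=cdms,"
        "ORACLE_HOME=/u02/test/product/19/db_21,SID=ss")

# Group 1 keeps the ",DB_NAME=" prefix; only the value after it is replaced.
# The result must be assigned, because strings are immutable.
new_line = re.sub(r"(,DB_NAME=)(.*?),", r"\g<1>abc,", line)
print(new_line)
```

Because the pattern starts with a comma, `DB_UNIQUE_NAME=cdms` and `PDB_NAME=` are not affected.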
You can also use str.replace():
line.replace('DB_NAME=cdms', 'DB_NAME=abc', 1)

python - regex why does `findall` find nothing, but `search` works? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 5 years ago.
>>> reg = re.compile(r'^\d{1,3}(,\d{3})*$')
>>> str = '42'
>>> reg.search(str).group()
'42'
>>> reg.findall(str)
['']
>>>
Why does reg.findall find nothing, but reg.search works in this piece of code above?
When the regex contains capture groups (parenthesized subpatterns), findall returns what the groups captured rather than the whole match. In your case the group did not take part in the match, so findall reports an empty string. Make the group non-capturing with ?: if you want the whole match returned. re.search, on the other hand, returns a match object, and .group() with no arguments gives the whole match regardless of capture groups. This is reflected in the documentation:
re.findall:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.
re.search:
Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.
import re
reg = re.compile(r'^\d{1,3}(?:,\d{3})*$')
s = '42'
reg.search(s).group()
# '42'
reg.findall(s)
# ['42']
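To make the difference easier to see, here is a small sketch with an input where the group actually takes part in the match. The sample string "1,234,567" is an assumption chosen for illustration; it also shows finditer as a way to get whole matches without changing the pattern.

```python
import re

s = "1,234,567"  # illustrative input, not from the question

capturing = re.compile(r'^\d{1,3}(,\d{3})*$')
non_capturing = re.compile(r'^\d{1,3}(?:,\d{3})*$')

# With a capturing group, findall returns the group's (last) capture.
print(capturing.findall(s))      # [',567']

# With a non-capturing group, findall returns the whole match.
print(non_capturing.findall(s))  # ['1,234,567']

# finditer yields match objects, so group(0) is always the whole match.
whole = [m.group(0) for m in capturing.finditer(s)]
print(whole)                     # ['1,234,567']
```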

My regular expression is not getting matched exactly in python [duplicate]

This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 6 years ago.
Here's my code...
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj=re.match(r'chap[0-9]',i,re.M|re.I)
    if matchobj:
        print(i)
Since the pattern is chap[0-9], I expected it to match only those strings that have exactly one digit after chap,
so I should get the following output:
chap3
chap2
chap4
but I am getting the following output...
chap11
chap3
chap2
chap4
chap55
chap33
chap54
re.match only anchors the pattern at the beginning of the string; it does not require the match to reach the end. Append an end-of-string anchor '$' or a word boundary '\b' to your pattern:
matchobj=re.match(r'chap\d$',i,re.M|re.I)
# \d (digit) is shortcut for [0-9]
From the docs on re.match:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
You should add a dollar sign to the end of your regex. The dollar ($) matches the end of the string, and for future reference, the caret (^) matches the beginning.
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj=re.match(r'chap[0-9]$',i,re.M|re.I)
    if matchobj:
        print(i)
Output
chap3
chap2
chap4
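On Python 3.4 and later, re.fullmatch is another way to require that the whole string matches, without adding anchors to the pattern. The sketch below uses a shortened version of the list from the question; the names `names` and `matched` are illustrative.

```python
import re

names = ["chap", "chap11", "chapa", "chap3", "chap2", "chap4", "chap55"]

# fullmatch succeeds only if the entire string matches the pattern.
matched = [name for name in names if re.fullmatch(r"chap\d", name, re.I)]
print(matched)  # ['chap3', 'chap2', 'chap4']

# One difference from '$': the dollar sign also matches just before a
# trailing newline, while fullmatch does not tolerate one.
print(bool(re.match(r"chap\d$", "chap3\n")))     # True
print(bool(re.fullmatch(r"chap\d", "chap3\n")))  # False
```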

Search preceding and following characters of re [duplicate]

This question already has an answer here:
Python regex alternation
(1 answer)
Closed 8 years ago.
I am trying to find the characters immediately before and after a regex match in a given string. This is the code.
>>> import re
>>> s='dafddadffdbdasbffsbbfdbabbfsdfadsfdfddf' #completely garbage test string
>>> re.findall('.{0,5}(abb).{0,5}',s)
['abb']
The test string has an occurrence of 'abb' here: ...fdbabbfsd... I am under the impression that the special character . matches any character other than \n, and that {m,n} "causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible", as stated in the re documentation.
So I expect my re to return ['bbfdbabbfsdfa'] and not just ['abb']. What am I missing?
It's because of the capturing group. Just move the parentheses:
re.findall('(.{0,5}abb.{0,5})',s)
When a pattern contains groups, findall returns only what the groups captured, so everything you want returned needs to be inside the parentheses.
According to re.findall documentation:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.
So either wrapping the whole pattern in a group or removing the group will give you what you want.
>>> re.findall('(.{0,5}abb.{0,5})',s) # Entire pattern as a group
['bbfdbabbfsdfa']
>>> re.findall('.{0,5}abb.{0,5}',s) # No capturing group
['bbfdbabbfsdfa']
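If the surrounding characters are wanted separately from the match itself, one option is to give each part its own group. This is a sketch that goes slightly beyond what the question asks; the variable names are illustrative.

```python
import re

s = 'dafddadffdbdasbffsbbfdbabbfsdfadsfdfddf'

# With several groups, findall returns one tuple per match.
parts = re.findall(r'(.{0,5})(abb)(.{0,5})', s)
print(parts)  # [('bbfdb', 'abb', 'fsdfa')]

# finditer gives match objects, so the whole match and each group are available.
for m in re.finditer(r'(.{0,5})(abb)(.{0,5})', s):
    before, hit, after = m.groups()
    print(m.group(0), before, after)
```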
