This question already has an answer here:
python re.sub group: number after \number
(1 answer)
Closed 4 years ago.
Here is my string
string = '03/25/93 Total time of visit (in minutes)'
I want to match '03/25/93' and replace it with '03/25/1993'. Currently I'm trying this
re.sub(r'(\d?\d/\d?\d/)(\d\d)', r'\119\2', string)
But apparently the '19' between '\1' and '\2' causes some errors. Is there a way to modify this method?
In that case you need the \g<group> syntax, which marks explicitly where the group number ends.
Code
import re
string = '03/25/93 Total time of visit (in minutes)'
res = re.sub(r'(\d?\d/\d?\d/)(\d\d)', r'\g<1>19\2', string)
print(res)
Output
03/25/1993 Total time of visit (in minutes)
Taken from the docs
In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.
Take a look at the official documentation of re.sub for better understanding
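To make the ambiguity concrete, here is a small sketch that reuses the question's string and pattern. It assumes a Python 3 interpreter, where a reference to a group that does not exist raises re.error; the variable names are only illustrative.

```python
import re

string = '03/25/93 Total time of visit (in minutes)'
pattern = r'(\d?\d/\d?\d/)(\d\d)'

# In r'\119\2' the digits after the backslash are read as part of the
# group number, so the template refers to a group the pattern lacks.
try:
    re.sub(pattern, r'\119\2', string)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# \g<1> ends the group reference explicitly, so '19' is literal text.
fixed = re.sub(pattern, r'\g<1>19\g<2>', string)

print(ambiguous_failed)
print(fixed)
```

Writing \g<2> instead of \2 is optional here, since nothing numeric follows it, but it keeps the template consistent.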
This question already has answers here:
Regular Expressions- Match Anything
(17 answers)
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 2 years ago.
Following is a simple piece of code about regex match:
import re
pattern = ".*"
s = "ab"
print(re.search(pattern, s))
output:
<_sre.SRE_Match object; span=(0, 2), match='ab'>
My confusion is this: "." matches any single character, so here it can match "a" or "b". With a "*" after it, I would expect the combination to match "", "a", "aa", "aaa...", "b", "bb", "bbb...", or any other single character repeated several times.
So how come ".*" matches "ab" as a whole?
The comments more or less covered it, but to provide an answer: the pattern .* means to match any character (.) zero or more times (*). Quantifiers are greedy by default, so when presented with 'ab', even though '' or 'a' would also satisfy that rule, the pattern matches the entire string, since matching all of it still meets the requirement.
It does not mean to match the same character zero or more times. Every character it matches can be a different character or the same as a previously matched one.
If instead you want to match any character, but match as many of that same character as possible, zero or more times, you can use:
(.)?\1*
See here https://regex101.com/r/FgvuX2/1 and here https://regex101.com/r/FgvuX2/2
What this effectively does is optionally match a single character, creating a back reference that can be used in the second part of the expression. It captures any single character (if there is one) in group 1 and then greedily matches the contents of group 1 zero or more times.
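A quick way to check this behaviour is to compare the greedy pattern, its lazy counterpart, and the back-reference pattern side by side. This is a minimal sketch; the input "aab" is an illustrative string that is not part of the question.

```python
import re

s = "ab"

# Greedy: '.' may match a different character on every repetition.
greedy = re.search(r".*", s).group()

# Lazy: '*?' takes as few characters as possible, here none at all.
lazy = re.search(r".*?", s).group()

# Back reference: one character, then repetitions of that same character.
same_char = re.match(r"(.)?\1*", "aab").group()

print(repr(greedy), repr(lazy), repr(same_char))
```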
This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 3 years ago.
I have a string which contains the number of processors:
SQLDB_GP_Gen5_2
The number is after _Gen and before _ (the number 5). How can I extract this using python and regular expressions?
I am trying to do it like this but don't get a match:
re.match('_Gen(.*?)_', 'SQLDB_GP_Gen5_2')
I was also trying this using pandas:
x['SLO'].extract(pat = '(?<=_Gen).*?(?:(?!_).)')
But this also wasn't working. (x is a Series)
Can someone please also point me to a book/tutorial site where I can learn regex and how to use with Pandas.
Thanks,
Mick
re.match only matches at the beginning of the string. Use re.search instead, and retrieve the first capturing group:
>>> re.search(r'_Gen(\d+)_', 'SQLDB_GP_Gen5_2').group(1)
'5'
You need to use Series.str.extract with a pattern containing a capturing group:
x['SLO'].str.extract(r'_Gen(.*?)_', expand=False)
^^^^ ^^^^^^^^^^^
To only match a number, use r'_Gen(\d+)_'.
NOTES:
With Series.str.extract, you need to use a capturing group: the method only returns the text that a group captures.
r'_Gen(.*?)_' will match _Gen, then capture any 0+ chars other than line break chars, as few as possible, and then match _. If you use \d+ instead, the group will only match 1+ digits.
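The difference between the two groups can be seen with plain re, without pandas. This is only a sketch: 'SQLDB_GP_GenX5_2' is a made-up input, added to show a case where the two patterns disagree.

```python
import re

# Hypothetical input with a non-digit between '_Gen' and the next '_'.
odd = 'SQLDB_GP_GenX5_2'
normal = 'SQLDB_GP_Gen5_2'

# '.*?' accepts any characters between the markers.
lazy_any = re.search(r'_Gen(.*?)_', odd)

# '\d+' insists on digits directly after '_Gen', so it rejects the odd input.
digits_on_odd = re.search(r'_Gen(\d+)_', odd)
digits_on_normal = re.search(r'_Gen(\d+)_', normal)

print(lazy_any.group(1))
print(digits_on_odd)
print(digits_on_normal.group(1))
```

Choosing \d+ is the stricter option: unexpected values produce no match instead of silently yielding non-numeric text.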
Using re :
re.findall(r'Gen(.*)_',text)[0]
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
What does this regex mean? I know the functionality of re.sub but unable to figure out the 2nd part:
s = re.sub(r'\.([a-zA-Z])', r'. \1', s)
                            ^^^^^^^
Can someone explain me the underlined part?
Next time you should mention which programming language you are using, because regular expression syntax differs considerably from one language to another. Also, when a regular expression is used to replace something, the second argument usually isn't a regular expression but a plain string with its own special syntax, so knowing the programming language helps with that, too.
\1 is a back reference to what the first capturing group (expression in parentheses) matched.
So \.([a-zA-Z]) matches a period followed by a letter, and that letter is captured (stored/saved/remembered) because it is surrounded by parentheses, then reused at the place of \1. The period and the letter are then replaced with a period, a space and that letter.
Examples:
.H becomes . H
This.is.a.Test becomes This. is. a. Test
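The examples above can be reproduced with a short sketch. The helper name space_after_period is made up for illustration, and '3.14' is an extra input showing that a period followed by a digit is left alone.

```python
import re

def space_after_period(s):
    # A period followed by a letter becomes: period, space, that letter.
    return re.sub(r'\.([a-zA-Z])', r'. \1', s)

print(space_after_period('.H'))
print(space_after_period('This.is.a.Test'))
print(space_after_period('3.14'))
```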
This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
I saw the regular expression (?= (?:\d{5}|[A-Z]{2})) in a Python re example and was very confused about the meaning of the ?: part.
I also looked at the Python docs, which explain it like this:
(?:...)
A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
Can someone give me an example and explain why it works? Thanks!
Ordinarily, parentheses create a "capturing" group inside your regex:
import re
regex = re.compile(r"(set|let) var = (\w+|\d+)")
print(regex.match("set var = 12").groups())
results
('set', '12')
Later you can retrieve those groups by calling the .groups() method on the result of a match. As you can see, whatever is inside parentheses is captured in "groups". But you might not care about all those groups. Say you only want what's in the second group and not the first. You still need the first set of parentheses in order to group "set" and "let", but you can turn off capturing by putting "?:" at the beginning:
regex = re.compile(r"(?:set|let) var = (\w+|\d+)")
print(regex.match("set var = 12").groups())
results:
('12',)
If you do not need the group to capture its match, you can optimize
this regular expression into Set(?:Value)?. The question mark and the
colon after the opening parenthesis are the syntax that creates a
non-capturing group. The question mark after the opening bracket is
unrelated to the question mark at the end of the regex. The final
question mark is the quantifier that makes the previous token
optional. This quantifier cannot appear after an opening parenthesis,
because there is nothing to be made optional at the start of a group.
Therefore, there is no ambiguity between the question mark as an
operator to make a token optional and the question mark as part of the
syntax for non-capturing groups, even though this may be confusing at
first. There are other kinds of groups that use the (? syntax in
combination with other characters than the colon that are explained
later in this tutorial.
color=(?:red|green|blue) is another regex with a non-capturing group.
This regex has no quantifiers.
From : http://www.regular-expressions.info/brackets.html
Also read: What is a non-capturing group? What does a question mark followed by a colon (?:) mean?
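The two regexes from the quoted tutorial can be tried directly in Python. This is a minimal sketch; the input strings 'SetValue' and 'color=green' are assumptions chosen for illustration.

```python
import re

# With plain parentheses, the optional part is captured as group 1.
capturing = re.fullmatch(r'Set(Value)?', 'SetValue')

# With (?:...), the same text matches but no group is recorded.
non_capturing = re.fullmatch(r'Set(?:Value)?', 'SetValue')

# Alternation grouped without capturing.
color = re.search(r'color=(?:red|green|blue)', 'color=green')

print(capturing.groups())
print(non_capturing.groups())
print(color.group(0), color.groups())
```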
This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 1 year ago.
I have a line that ends with something like store(s), for example "1 store(s)".
I want to match it using Python regular expression.
I tried something like re.match('store\(s\)$', text)
but it's not working.
This is the code I tried:
import re
s = '1 store(s)'
if re.match('store\(s\)$', s):
    print('match')
In more or less direct reply to your comment, try this:
import re
s = '1 store(s)'
if re.search(r'store\(s\)$', s):
    print('match')
The solution is to use re.search instead of re.match: the latter only matches at the beginning of the string, while the former looks for a substring anywhere in the string that matches the expression.
Python offers two different primitive
operations based on regular
expressions: match checks for a match
only at the beginning of the string,
while search checks for a match
anywhere in the string (this is what
Perl does by default)
Straight from the docs, but it does come up a lot.
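As a sketch of the distinction, the three entry points can be compared on the question's string. re.fullmatch (available since Python 3.4) is included only for contrast; it is not mentioned in the question.

```python
import re

s = '1 store(s)'
pattern = r'store\(s\)$'

# match: the pattern must start at index 0, where the string has '1'.
at_start = re.match(pattern, s)

# search: scans forward until the pattern matches somewhere.
anywhere = re.search(pattern, s)

# fullmatch: the pattern must cover the entire string.
whole = re.fullmatch(pattern, s)

print(at_start)
print(anywhere.group())
print(whole)
```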
Have you considered re.match(r'(.*)store\(s\)$', text)?