Regex to find specific number using Python regex - python

I need a regex to find the maxtimeout value (40 in the following examples) of the RequestReadTimeout directive in an Apache config file, for example:
RequestReadTimeout header=XXX-40,MinRate=XXX body=XXX
RequestReadTimeout header=40 body=XXX
PS: XXX stands for a decimal number.
I used this :
str="RequestReadTimeout header=10-40,MinRate=10 body=10"
re.search(r'header=\d+[-\d+]*', str).group()
'header=10-40'
But I need a regex that gets only the maxtimeout value (40 in this example) in one step (without using another function such as split("-")[1], etc.).
Thanks.

You'd group the part you wanted to extract:
re.search(r'header=(?:\d*-)?(\d+)', inputstr).group(1)
The (...) marks a group, and positional groups like that are numbered starting at 1.
I altered your expression a little to only capture the number after an optional non-capturing group containing digits and a dash, to match both patterns you are looking for. The (?:...) is a non-capturing group; it doesn't store the matched text in a group, but does let you use the ? quantifier on the group to mark it optional.
Pythex demo.
Python session:
>>> import re
>>> for inputstr in ('RequestReadTimeout header=1234-40,MinRate=XXX body=XXX', 'RequestReadTimeout header=40 body=XXX'):
...     print(re.search(r'header=(?:\d*-)?(\d+)', inputstr).group(1))
...
40
40
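
As a minimal sketch, the pattern above could be wrapped in a small helper that also copes with lines that do not contain the directive. The names HEADER_RE and max_timeout, and the choice to return None when nothing matches, are assumptions made for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer above: an optional "digits-" prefix,
# followed by the captured maximum timeout.
HEADER_RE = re.compile(r'header=(?:\d*-)?(\d+)')

def max_timeout(line):
    """Hypothetical helper: return the header max timeout as an int, or None."""
    m = HEADER_RE.search(line)
    return int(m.group(1)) if m else None

print(max_timeout('RequestReadTimeout header=10-40,MinRate=10 body=10'))  # 40
print(max_timeout('RequestReadTimeout header=40 body=10'))                # 40
print(max_timeout('Timeout 300'))                                         # None
```

Checking for a match before calling group(1) avoids an AttributeError on lines without header=.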

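You could do it with the following regex:
'RequestReadTimeout\sheader=(?:\d+-)?(\d+).*'
The first captured group (group 1) is what you want. The dash sits inside the optional group so that a lone value such as header=40 is captured whole; with (?:\d+)?-? the optional part would take the 4 and leave only the 0 for the capture group.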
Demo: http://regex101.com/r/cD6hY0

Related

python regex return non-capturing group

I want to generate a username from an email with :
firstname's first letter
lastname's first 7 letters
eg :
getUsername("my-firstname.my-lastname#email.com")
mmylastn
Here is getUsername's code :
def getUsername(email):
    return re.match(r"(.){1}[a-z]+.([a-z]{7})", email.replace('-', '')).group()
email.replace('-','') gets rid of the - symbol
the regex captures the 2 groups I described above
If I do .group(1,2) I can see the captured groups are m and mylastn, so it's all good.
But .group() doesn't return just the capturing groups; it also returns everything between them: myfirstname.mylastn
Can someone explain this behavior to me?
First of all, a . in a pattern is a metacharacter that matches any char excluding line break chars. You need to escape the . in the regex pattern to match a literal dot.
Also, the {1} limiting quantifier is always redundant; you may safely remove it from any regex you have.
Next, if you need to get the mmylastn string as a result, you cannot use match.group(), because .group() fetches the overall match value, not the concatenated capturing group values.
So, in your case:
Check if there is a match first, since calling .groups() on None raises an AttributeError
Then join the values in match.groups()
You can use
import re
def getUsername(email):
    m = re.match(r"(.)[a-z]+\.([a-z]{7})", email.replace('-', ''))
    if m:
        return "".join(m.groups())
    return email
print(getUsername("my-firstname.my-lastname#email.com"))
See the Python demo.
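
To make the difference between the match object's accessors concrete, here is a small sketch using the corrected pattern on the already de-hyphenated address. The variable name m is just for illustration.

```python
import re

m = re.match(r"(.)[a-z]+\.([a-z]{7})", "myfirstname.mylastname#email.com")

print(m.group())            # whole match: myfirstname.mylastn
print(m.group(0))           # same as m.group()
print(m.group(1, 2))        # tuple of groups 1 and 2: ('m', 'mylastn')
print(m.groups())           # all capturing groups: ('m', 'mylastn')
print("".join(m.groups()))  # mmylastn
```

group() with no argument is equivalent to group(0), which is always the full span matched by the pattern, regardless of how many capturing groups it contains.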

Python Regex selecting first option

I have the following regex that looks for the string 191(x)(y) and (z) and combinations of this (for example, 191(x) or 191(x) and (z)).
My regular expression is:
(191?(?:\w|\(.{0,3}\)(?:( (and)?|-)*)){0,5})
See the regex demo.
This expression works for the most part, but I need help with the following (which I can't figure out):
While I do get 5 matches, there are 3 groups; I need to limit the result to only the first group.
If I have the text '191Transit', the regex should only match 191 and ignore the word 'Transit'. Here it's 'Transit', but in other examples it could be any word, e.g. 191Bob, 191Smith.
I am using Python 3.6.
You can use
191?(?:\([^()]{0,3}\)(?: (?:and)?|-)*){0,5}
See the regex demo
Details
Replace .{0,3} with [^()]{0,3} to stay within parentheses
Remove one group around ( (?:and)?|-)* as it's redundant
Change the groups to non-capturing, i.e. (...) to (?:...)
Remove the \w alternative: it matches any word char and thus matches up to 5 letters/digits/underscores right after 191
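
A quick way to check the revised pattern is to run it against a few of the inputs mentioned in the question. This is only a sketch; the sample list and variable names are chosen for illustration.

```python
import re

# Pattern from the answer above; it contains no capturing groups,
# so each match is reported as a single string.
pattern = re.compile(r'191?(?:\([^()]{0,3}\)(?: (?:and)?|-)*){0,5}')

samples = ['191(x)(y) and (z)', '191(x)', '191Transit', '191Bob']
results = [pattern.search(s).group() for s in samples]
print(results)
# ['191(x)(y) and (z)', '191(x)', '191', '191']
```

Because every group is non-capturing, re.findall with this pattern would likewise return plain strings rather than tuples of groups.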

Python Regex Find match group of range of non digits after hyphen and if range is not present ignore rest of pattern

I'm new to more advanced regex concepts and am starting to look into lookbehinds and lookaheads, but I'm getting confused and need some guidance. I have a scenario in which I may have several different kinds of release zips named something like:
v1.1.2-beta.2.zip
v1.1.2.zip
I want to write a one-line regex that can find match groups in both types. For example, if the file is the first type of zip, I would want three match groups that look like:
v1.1.2-beta.2.zip
Group 1: v1.1.2
Group 2: beta
Group 3: 2
or, for the second zip, one match group:
v1.1.2.zip
Group 1: v1.1.2
This is where things start getting confusing to me, as I assume the regex would need to check whether the hyphen exists: if it does, find all three match groups; if it does not, look for only the first one.
(v[0-9.]{0,}).([A-Za-z]{0,}).([0-9]).zip
This was the initial regex I wrote, which successfully matches the first type but does not have the conditional. I was thinking about matching a range of non-digits after the hyphen, but I can't quite get it to work and don't know how to make it ignore the rest of the pattern and accept just the first group if it doesn't find the hyphen:
([\D]{0,}(?=[-]) # Does not work
Can someone point me in the right direction?
You can use re.findall:
import re
s = ['v1.1.2-beta.2.zip', 'v1.1.2.zip']
final_results = [re.findall(r'[a-zA-Z]{1}[\d\.]+|(?<=\-)[a-zA-Z]+|\d+(?=\.zip)', i) for i in s]
groupings = ["{}\n{}".format(a, '\n'.join(f'Group {i}: {c}' for i, c in enumerate(b, 1))) for a, b in zip(s, final_results)]
for i in groupings:
    print(i)
    print('-'*10)
Output:
v1.1.2-beta.2.zip
Group 1: v1.1.2
Group 2: beta
Group 3: 2
----------
v1.1.2.zip
Group 1: v1.1.2.
----------
Note that the result returned by re.findall is shown below; the second entry keeps a trailing dot because [\d\.]+ also consumes the dot before zip:
[['v1.1.2', 'beta', '2'], ['v1.1.2.']]
Here is how I would approach this using re.search. Note that we don't need lookarounds here; just a fairly complex pattern will do the job.
import re
regex = r"(v\d+(?:\.\d+)*)(?:-(\w+)\.(\d+))?\.zip"
str1 = "v1.1.2-beta.2.zip"
str2 = "v1.1.2.zip"
match = re.search(regex, str1)
print(match.group(1))
print(match.group(2))
print(match.group(3))
print("\n")
match = re.search(regex, str2)
print(match.group(1))
Output:
v1.1.2
beta
2
v1.1.2
Demo
If you don't have a ton of experience with regex, providing an explanation of each step probably isn't going to bring you up to speed. I will comment, though, on the use of ?: which appears in some of the parentheses. In that context, ?: tells the regex engine not to capture what is inside. We do this because you only want to capture (up to) three specific things.
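
As a sketch of how the optional groups behave, the same pattern can be wrapped in a helper that returns all three groups at once. The names VERSION_RE and parse_release are hypothetical, and using fullmatch (rather than search) is an assumption that the whole file name should conform to the pattern.

```python
import re

# Pattern from the answer above: version, then an optional "-label.number" part.
VERSION_RE = re.compile(r"(v\d+(?:\.\d+)*)(?:-(\w+)\.(\d+))?\.zip")

def parse_release(filename):
    """Hypothetical helper: return (version, label, number), or None if no match."""
    m = VERSION_RE.fullmatch(filename)
    return m.groups() if m else None

print(parse_release("v1.1.2-beta.2.zip"))  # ('v1.1.2', 'beta', '2')
print(parse_release("v1.1.2.zip"))         # ('v1.1.2', None, None)
print(parse_release("notes.txt"))          # None
```

Groups that sit inside an optional part that did not participate in the match come back as None, so the caller can tell the two file types apart without a separate check for the hyphen.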
We can use the following regex:
(v\d+(?:\.\d+)*)(?:[-]([A-Za-z]+))?((?:\.\d+)*)\.zip
This produces three groups: the first is the version; the second is optional and holds the alphabetic characters that follow a dash; the third is a possibly empty sequence of dots followed by numbers. The pattern ends with .zip.
If we ignore the \.zip suffix (which I assume is rather trivial), three parts remain:
(v\d+(?:\.\d+)*): a capture group that starts with a v followed by \d+ (one or more digits). Then we have a non-capture group (a group starting with (?:) that matches \.\d+, a dot followed by one or more digits. This subgroup is repeated zero or more times.
(?:[-]([A-Za-z]+))?: a non-capture group that starts with a hyphen [-], followed by a capture group of one or more [A-Za-z] characters. The whole non-capture group is optional (the ? at the end).
((?:\.\d+)*): a capture group that again contains the \.\d+ non-capture subgroup, so we capture a dot followed by a sequence of digits, repeated zero or more times.
For example, with a variant in which the hyphen is captured as part of the second group:
import re
rgx = re.compile(r'(v\d+(?:\.\d+)*)([-][A-Za-z]+)?((?:\.\d+)*)\.zip')
We then obtain:
>>> rgx.findall('v1.1.2-beta.2.zip')
[('v1.1.2', '-beta', '.2')]
>>> rgx.findall('v1.1.2.zip')
[('v1.1.2', '', '')]

Regular expression with two non-repeating symbols in any order

I need to create the regex that will match such string:
AA+1.01*2.01,BB*2.01+1.01,CC
The * and + may appear in either order.
I've created the following regex:
^(([A-Z][A-Z](([*+][0-9]+(\.[0-9])?[0-9]?){0,2}),)*[A-Z]{2}([*+][0-9]+(\.[0-9])?[0-9]?){0,2})$
But the problem is that this regex allows + or * to be used twice, whereas each of them may appear only once, so the results should be:
AA+1*2,CC - true
AA+1+2,CC - false (now is true with my regex)
AA*1+2,CC - true
AA*1*2,CC - false (now is true with my regex)
Capture either of [+*] first, and then use a negative lookahead so that the second operator must be the other one.
Regex: [A-Z]{2}([+*])(?:\d+(?:\.\d+)?)(?!\1)[+*](?:\d+(?:\.\d+)?),[A-Z]{2}
Explanation:
[A-Z]{2} Matches two upper case letters.
([+*]) captures either of + or *.
(?:\d+(?:\.\d+)?) matches number with optional decimal part.
(?!\1)[+*] uses a negative lookahead to make sure the next symbol is not the one already captured, then matches it. So if + was captured previously, then * will be matched.
(?:\d+(?:\.\d+)?) matches number with optional decimal part.
,[A-Z]{2} matches , followed by two upper case letters.
Regex101 Demo
To match the first case, AA+1.01*2.01,BB*2.01+1.01,CC, which is a small extension of the previous pattern, use the following regex.
Regex: (?:[A-Z]{2}([+*])(?:\d+(?:\.\d+)?)(?!\1)[+*](?:\d+(?:\.\d+)?),)+[A-Z]{2}
Explanation: Wrapped the whole pattern except the final CC in a non-capturing group and repeated it with + to match one or more such items.
Regex101 Demo
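
The two patterns above can be checked against the strings from the question with a short script. This is only a sketch: the variable names are illustrative, and re.fullmatch is used on the assumption that the whole string must conform.

```python
import re

# Single-item pattern: the second operator must differ from the captured first one.
pair = re.compile(r'[A-Z]{2}([+*])(?:\d+(?:\.\d+)?)(?!\1)[+*](?:\d+(?:\.\d+)?),[A-Z]{2}')

# Repeated form: one or more "XX<op>n<other op>n," items followed by a final "XX".
many = re.compile(r'(?:[A-Z]{2}([+*])(?:\d+(?:\.\d+)?)(?!\1)[+*](?:\d+(?:\.\d+)?),)+[A-Z]{2}')

checks = {s: bool(pair.fullmatch(s))
          for s in ['AA+1*2,CC', 'AA+1+2,CC', 'AA*1+2,CC', 'AA*1*2,CC']}
print(checks)
# {'AA+1*2,CC': True, 'AA+1+2,CC': False, 'AA*1+2,CC': True, 'AA*1*2,CC': False}

print(bool(many.fullmatch('AA+1.01*2.01,BB*2.01+1.01,CC')))  # True
```

Inside the repeated group, \1 refers to the operator captured in the current repetition, so each item is checked independently of the others.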
To get a regex to match your given example, extended to an arbitrary number of commas, you could use:
^(?:[A-Z]{2}([+*])?\d*\.?\d*(?!\1)[+*]?\d*\.?\d*,?)*$
Note that this example will also allow a trailing comma. I'm not sure if there is much you can do about that.
Regex 101 Example
If the trailing comma is an issue:
^(?:[A-Z]{2}([+*])?\d*\.?\d*(?!\1)[+*]?\d*\.?\d*,?)*?(?:[A-Z]{2}([+*])?\d*\.?\d*(?!\2)[+*]?\d*\.?\d*?)$
Regex 101 Example

Matching sequentially repeated brackets with Python Regex

Basically I'm trying to find a series of consecutive repeating patterns in Python with the regex:
(X[0-9]+)+
For example, given the input string:
YYYX4X5Z3X2
Get a list of results:
["X4X5", "X2"]
However I am instead getting:
["X5", "X2"]
I have tested the regex on regexpal and verified that it is correct; however, due to the way Python treats "()", I am unable to get the desired result. Can someone advise?
Turn your capturing group into a non-capturing (?:...) group instead ...
>>> import re
>>> re.findall(r'(?:X[0-9]+)+', 'YYYX4X5Z3X2')
['X4X5', 'X2']
Another example:
>>> re.findall(r'(?:X[0-9]+)+', 'YYYX4X5Z3X2Z4X6X7X8Z5X9')
['X4X5', 'X2', 'X6X7X8', 'X9']
Modify your pattern like so:
((?:X[0-9]+)+)
Demo
( # Capturing Group (1)
(?: # Non Capturing Group
X # "X"
[0-9] # Character Class [0-9]
+ # (one or more)(greedy)
) # End of Non Capturing Group
+ # (one or more)(greedy)
) # End of Capturing Group (1)
You need to use a non-capturing group (?:<pattern>) for the inner pattern and wrap the whole repetition in a capturing group:
((?:X[0-9]+)+)
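
The behaviour discussed in these answers can be reproduced side by side. This is a short sketch with illustrative variable names; it also shows re.finditer as one possible alternative that leaves the original pattern unchanged.

```python
import re

text = 'YYYX4X5Z3X2'

# One capturing group: findall returns that group's text, and a repeated
# group only keeps its last repetition.
with_group = re.findall(r'(X[0-9]+)+', text)
print(with_group)      # ['X5', 'X2']

# No capturing groups: findall returns the whole match.
without_group = re.findall(r'(?:X[0-9]+)+', text)
print(without_group)   # ['X4X5', 'X2']

# finditer yields match objects, and group() is always the whole match.
via_finditer = [m.group() for m in re.finditer(r'(X[0-9]+)+', text)]
print(via_finditer)    # ['X4X5', 'X2']
```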
