Compilation of complex regex expressions [duplicate] - python

This question already has answers here:
String formatting in Python [duplicate]
(14 answers)
Closed 2 years ago.
I'm trying to understand the following code, which builds a complex regex.
I do not understand how the full_regex line works. What is the purpose of each '%s', and of the other % before (regex1, regex2, ...)?
Can someone please help with this?
regex1 = r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})'
regex2 = r'((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[\S]*[+\s]\d{1,2}[,]{0,1}[+\s]\d{4})'
regex3 = r'(\d{1,2}[+\s](?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[\S]*[+\s]\d{4})'
regex4 = r'((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[\S]*[+\s]\d{4})'
regex5 = r'(\d{1,2}[/-][1|2]\d{3})'
regex6 = r'([1|2]\d{3})'
full_regex = '(%s|%s|%s|%s|%s|%s)' % (regex1, regex2, regex3, regex4, regex5, regex6)

The expression
full_regex = '(%s|%s|%s|%s|%s|%s)' % (regex1, regex2, regex3, regex4, regex5, regex6)
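merges the six smaller patterns into one big pattern that alternates between them. The %s placeholders and the % operator are not regex syntax; they are Python string interpolation. Each %s is replaced, in order, by the corresponding string from the tuple on the right, and the | characters between the placeholders become the regex alternation operator in the resulting pattern.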
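As a rough illustration of the interpolation, the sketch below builds a smaller combined pattern from two of the sub-patterns, once with % and once with str.join. The shortened two-pattern version and the sample text 'due 04/2019' are assumptions made for brevity; they are not part of the original code.

```python
import re

# Two of the sub-patterns from the question, written as raw strings.
regex5 = r'(\d{1,2}[/-][1|2]\d{3})'
regex6 = r'([1|2]\d{3})'

# Each %s is replaced, in order, by the matching item of the tuple.
full_regex = '(%s|%s)' % (regex5, regex6)
print(full_regex)

# Joining the parts with '|' builds the same string.
joined = '(' + '|'.join([regex5, regex6]) + ')'
print(joined == full_regex)  # True

# The combined pattern is then used like any other regex.
match = re.search(full_regex, 'due 04/2019')
print(match.group(0))  # 04/2019
```

The join form is often easier to maintain when the number of sub-patterns changes, since there are no placeholders to keep in sync with the tuple.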

Related

How to replace several overlapped characters with one in Python [duplicate]

This question already has answers here:
How do I coalesce a sequence of identical characters into just one?
(10 answers)
Closed 2 years ago.
I have a string like the one below (I don't know in advance how many identical characters appear in a row):
s = '&&&&&word&&&word2&&&'
and would like to obtain as a result this string:
'&word&word2&'
My current workaround is something like this (probably not efficient for large texts):
while True:
    if '&&' not in s:
        break
    s = s.replace('&&', '&')
You can use a regex to replace every run of one or more '&' (&+) with a single '&':
import re
s = '&&&&&word&&&word2&&&'
res = re.sub(r'&+', '&', s)
print(res)
# &word&word2&
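If the repeated character is not known in advance, the same idea can be wrapped in a small helper. This is only a sketch: the function name collapse_repeats and the second sample string are hypothetical, and re.escape is used so that characters with a special regex meaning are treated literally.

```python
import re

def collapse_repeats(text, char):
    """Replace every run of `char` in `text` with a single `char`."""
    # re.escape keeps characters such as '.' or '*' from being read as regex syntax.
    pattern = re.escape(char) + '+'
    # A function replacement avoids backslash processing in the replacement string.
    return re.sub(pattern, lambda m: char, text)

print(collapse_repeats('&&&&&word&&&word2&&&', '&'))  # &word&word2&
print(collapse_repeats('a..b...c', '.'))              # a.b.c
```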

How to find 6 digits in a string with a particular pattern on the first 3 digits in Python? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I am trying to write a regular expression that returns all 6 digits, where the first 3 digits are a fixed pattern.
Ex:
import re
string_ex = 'docs/data/622999/2013904065003_file.bin'
re.findall(r'622(\d{3})',string_ex)
results in just ['999']
but I want the result to be ['622999']
Thanks!
When the pattern contains a capturing group, re.findall returns only what the group matched, so you should include 622 inside the parentheses too:
>>> import re
>>> string_ex = 'docs/data/622999/2013904065003_file.bin'
>>> re.findall(r'(622\d{3})',string_ex)
['622999']
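To make the rule concrete, here is a small sketch of how the return value of re.findall changes with the number of capturing groups. The variable names are hypothetical; the input string is the one from the question.

```python
import re

string_ex = 'docs/data/622999/2013904065003_file.bin'

# No capturing group: findall returns the whole match.
no_group = re.findall(r'622\d{3}', string_ex)
print(no_group)    # ['622999']

# One capturing group: findall returns only the group's text.
one_group = re.findall(r'622(\d{3})', string_ex)
print(one_group)   # ['999']

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'(622)(\d{3})', string_ex)
print(two_groups)  # [('622', '999')]
```

Dropping the parentheses entirely is therefore another way to get the full 6 digits.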
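You can also use "index" on the string directly and slice the 6 characters starting at the first "622" (note that this does not check that the last three characters are digits):
i = string_ex.index("622")
found = string_ex[i:i+6]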
https://www.tutorialspoint.com/python/string_index.htm

What is wrong with this Python regular expression? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 4 years ago.
Given a string, I want to find all the substrings consisting of two or three '4,'.
For example, given '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'
I want to get ['4,4,', '4,4,', '4,4,4,'].
import re
str_ = '1,4,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,3,2,'
m = re.findall(r"(4,){2,3}", str_)
What I get is:
['4,', '4,', '4,', '4,']
What's wrong?
It seems to me that the parentheses around '4,' are being interpreted as a capturing group, rather than just telling Python that '4' and ',' should occur together. However, I don't know how to fix this.
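Just use a non-capturing group, (?:...). With a capturing group, re.findall returns the group's contents (here, only its last repetition) instead of the whole match: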
import re
s = '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'
print(re.findall(r'(?:4,?){2,3}', s))
Prints:
['4,4,', '4,4,', '4,4,4,']
EDIT:
Edited regex to capture 2 or 3 elements "4,"
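For comparison, the sketch below runs the capturing and non-capturing forms side by side on the same input. It uses (?:4,){2,3} with a mandatory comma, which is a slightly stricter variant than the pattern in the answer and is an assumption about what the question intends; the variable names are hypothetical.

```python
import re

s = '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'

# Capturing group: findall returns the last repetition of the group per match.
capturing = re.findall(r'(4,){2,3}', s)
print(capturing)      # ['4,', '4,', '4,']

# Non-capturing group: findall returns each whole match.
non_capturing = re.findall(r'(?:4,){2,3}', s)
print(non_capturing)  # ['4,4,', '4,4,', '4,4,4,']

# An outer capturing group around the repetition gives the same result.
wrapped = re.findall(r'((?:4,){2,3})', s)
print(wrapped)        # ['4,4,', '4,4,', '4,4,4,']
```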

re.findall only finding half the patterns [duplicate]

This question already has answers here:
Why doesn't [01-12] range work as expected?
(7 answers)
Closed 4 years ago.
I'm using re.findall to parse the year and month from a string; however, it only returns matches from the first half of the string. Why is this?
import re
date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
find_year_and_month = re.findall('[1-2][0-9][0-9][0-9]-[1-12]', date_string)
print(find_year_and_month)
and my output is this:
['2011-1', '2012-1']
This is the correct output for those dates, but why am I only getting matches for half of the string?
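[1-12] doesn't do what you think it does. A character class matches exactly one character, so [1-12] matches anything in the range 1 to 1, or a 2. It never matches the 3 in 2015-3, which is why the second half of the string is skipped.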
See this question for some replacement regex options, like ([1-9]|1[0-2]): How to represent regex number ranges (e.g. 1 to 12)?
If you want an interactive tool for experimenting with regexes, I personally recommend Regexr.
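A quick way to check what a character class really accepts is to run it against all ten digits. The sketch below does that and then tries an alternation-based month pattern; the names digits_matched, month, and pattern, and the extra sample '2015-12-25', are assumptions for illustration.

```python
import re

# [1-12] is the range 1-1 plus the single character 2.
digits_matched = re.findall(r'[1-12]', '0123456789')
print(digits_matched)  # ['1', '2']

# Months 1 to 12: try the two-digit form first, then a single digit.
month = r'(?:1[0-2]|[1-9])'
pattern = r'[12][0-9]{3}-' + month

date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
print(re.findall(pattern, date_string))   # ['2011-1', '2012-1', '2015-3', '2015-3']
print(re.findall(pattern, '2015-12-25'))  # ['2015-12']
```

Putting 1[0-2] before [1-9] matters: with the order reversed, '2015-12' would match only as '2015-1'.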
Adjust your regex pattern as shown below:
import re
date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
find_year_and_month = re.findall('([1-2][0-9]{3}-(?:1[0-2]|[1-9]))', date_string)
print(find_year_and_month)
The output:
['2011-1', '2012-1', '2015-3', '2015-3']

python re.search matches too much [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 7 years ago.
import re
text = '"dimensionsDisplay" : ["Size","Color"], '
r = '"dimensionsDisplay" :(.*)?,'
s = re.search(r,text)
print(s.group(1))
The output is:
' ["Size","Color"]'
Although this is actually the answer I want, I think it should be:
' ["Size",'
I am puzzled by this. Can anybody tell me why?
r = '"dimensionsDisplay" :(.*?),'
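You need to make your quantifier non-greedy by writing (.*?) as above. A ? placed after the group, as in (.*)?, only makes the whole group optional; the .* inside it is still greedy, so it consumes everything up to the last comma.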
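The difference can be seen by running both forms on the text from the question. This is a minimal sketch; the variable names greedy and lazy are hypothetical. Note that the comma itself sits outside the group, so the lazy result ends before it.

```python
import re

text = '"dimensionsDisplay" : ["Size","Color"], '

# Greedy: .* grabs as much as possible, then backtracks to the last comma.
greedy = re.search(r'"dimensionsDisplay" :(.*),', text)
print(repr(greedy.group(1)))  # ' ["Size","Color"]'

# Lazy: .*? grabs as little as possible, stopping at the first comma.
lazy = re.search(r'"dimensionsDisplay" :(.*?),', text)
print(repr(lazy.group(1)))    # ' ["Size"'
```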
