Python3: complicated regex [duplicate]


This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I'm trying to build a complicated regex. I want to match a string with the following structure:
.+ (any character, at least once)
either "del" or "ins" or "dup" or [ATGC]
.* (string ends or is followed by whatever)
I have tried different things and at the moment I am here, which doesn't work:
import re
hgvs = "c.*1017delT"
a = re.match('(.*)(del|ins|dup|[ATGC]).*', hgvs)
a.groups()
# returns ('c.*1017del', 'T')
I expect "(.*)" to capture everything before the "del", but the regex seems to prefer the [ATGC] match over the "del" match.

Try non-greedy match:
re.match('(.*?)(del|ins|dup|[ATGC]).*', hgvs)
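             ^
With the non-greedy qualifier, the first .*? will match as few characters as possible, so it stops at the first place where "del", "ins", "dup" or [ATGC] can match.
P.S. Once you learn more regex, you won't think of this one as "complicated", because there is far more complex regex syntax out there.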
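A minimal sketch of the difference, using the hgvs string from the question. The variable names greedy and lazy are only for illustration:

```python
import re

hgvs = "c.*1017delT"

# Greedy: (.*) grabs as much as it can, so only the final "T" is left
# for the second group.
greedy = re.match(r'(.*)(del|ins|dup|[ATGC]).*', hgvs)
print(greedy.groups())  # ('c.*1017del', 'T')

# Non-greedy: (.*?) grows one character at a time and stops at the first
# position where the alternation can match, which is "del".
lazy = re.match(r'(.*?)(del|ins|dup|[ATGC]).*', hgvs)
print(lazy.groups())  # ('c.*1017', 'del')
```

Note that (.*?) is also allowed to match nothing. If at least one leading character is required, as the structure in the question suggests, (.+?) would probably be the closer fit.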

Related

What difference does round brackets in regular expression make? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I am currently going through pythonchallenge.com, and am trying to write code that searches for a lowercase letter with exactly three uppercase letters on both sides of it. I got stuck trying to write a regular expression for it. This is what I have tried:
import re
#text is in https://pastebin.com/pAFrenWN since it is too long
p = re.compile("[^A-Z]+[A-Z]{3}[a-z][A-Z]{3}[^A-Z]+")
print("".join(p.findall(text)))
This is what I got with it:
dqIQNlQSLidbzeOEKiVEYjxwaZADnMCZqewaebZUTkLYNgouCNDeHSBjgsgnkOIXdKBFhdXJVlGZVme
gZAGiLQZxjvCJAsACFlgfe
qKWGtIDCjn
I later searched for the solution, which had this regular expression:
p = re.compile("[^A-Z]+[A-Z]{3}([a-z])[A-Z]{3}[^A-Z]+")
So there are parentheses around [a-z], and I couldn't figure out what difference they make. I would like an explanation of this.

Use Parentheses for Grouping and Capturing
By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex.
https://www.regular-expressions.info/brackets.html
Basically, the regex engine finds every string matching the whole search pattern, and when the pattern contains parentheses, findall returns only the parts inside the ().
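To see the effect on findall, here is a small sketch. The sample string is made up for illustration and is not the pastebin text from the question:

```python
import re

# Hypothetical sample text, much shorter than the real puzzle input.
text = "xxABCdEFGyyQzzHIJkLMNqq"

no_group = re.compile("[^A-Z]+[A-Z]{3}[a-z][A-Z]{3}[^A-Z]+")
with_group = re.compile("[^A-Z]+[A-Z]{3}([a-z])[A-Z]{3}[^A-Z]+")

# Without a group, findall returns each whole match.
print(no_group.findall(text))    # ['xxABCdEFGyy', 'zzHIJkLMNqq']

# With one capturing group, findall returns only what the group captured.
print(with_group.findall(text))  # ['d', 'k']
print("".join(with_group.findall(text)))  # dk
```

Both patterns match the same stretches of text; the parentheses only change what findall hands back.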

Remove everything after regex pattern match but keep pattern [duplicate]

This question already has answers here:
Using regex to remove all text after the last number in a string
(2 answers)
Closed 4 years ago.
I was searching for a way to remove all characters after a certain pattern match. I know that there are many similar questions here on SO, but I was unable to find one that works for me. Basically I have a fixed pattern (\w\w\d\d\d\d), and I want to remove everything after it, but keep the pattern.
I've tried using:
test = 'PP1909dfgdfgd'
done = re.sub ('(\w\w\d\d\d\d/w*)', '\w\w\d\d\d\d/', test)
but I still get the same string.
example:
dirty = 'AA1001dirtydata'
dirty2 = 'AA1001222%^&*'
Desired output:
clean = 'AA1001'

You can use re.match() instead of re.sub():
re.match(r'\w\w\d\d\d\d', dirty).group(0) # returns 'AA1001'
Note: match will look for the regular expression at the beginning of the string you provide and only "match" the characters corresponding to the pattern. If you want to find the pattern partway through the string you can use re.search().
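As a rough sketch of both approaches, assuming the pattern always sits at the start of the string. The helper name clean is hypothetical, and falling back to the unchanged input when nothing matches is a design choice, not something the question specifies:

```python
import re

pattern = re.compile(r'\w\w\d\d\d\d')

def clean(s):
    """Return the leading pattern match, or s unchanged if there is none."""
    m = pattern.match(s)
    return m.group(0) if m else s

print(clean('AA1001dirtydata'))  # AA1001
print(clean('AA1001222%^&*'))    # AA1001

# The re.sub route also works: capture the pattern in a group, match the
# rest with .*, and put back only the captured part with \1.
print(re.sub(r'(\w\w\d\d\d\d).*', r'\1', 'PP1909dfgdfgd'))  # PP1909
```

The original re.sub attempt failed because a replacement string is not a pattern: \w and \d have no meaning there, whereas \1 refers back to the first captured group.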

Difference between r'^specific expression$' and r'specific expression' [duplicate]

This question already has answers here:
What is the need for caret (^) and dollar symbol ($) in regular expression?
(5 answers)
Closed 4 years ago.
I came across a regex which checks whether a password is strong or not. What is the impact of ^ and $ in this expression?
a = compile(r'^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}$')
It has ^ and $ signs in it. But the below code works the same as above.
a = compile(r'(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}')
If so, why are they used in the first version? Is there a reason for using them? Thanks in advance!

The ^ means “beginning of the string” (or of each line, with re.MULTILINE) and the $ means “end of the string” (or of each line).
In your case, every string you test is a single line, so the anchors make no difference.

^ is followed by the pattern the string must start with, and $ follows the pattern the string must end with. In your case, the unanchored regex matches the pattern anywhere in the string, without regard to where the string starts or ends.
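The two patterns stop agreeing once the input spans more than one line. A small sketch, where the test strings are made up for illustration:

```python
import re

# The two patterns from the question, with and without anchors.
anchored = re.compile(r'^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}$')
unanchored = re.compile(r'(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}')

single_line = "Ab1#abcd"
two_lines = "xxxxxxxxxx\nAb1#abcd"  # hypothetical input with an embedded newline

# On a single-line string both patterns agree.
print(bool(anchored.search(single_line)), bool(unanchored.search(single_line)))  # True True

# With a newline, the unanchored pattern finds a match on the second line,
# while ^ (without re.MULTILINE) only matches at the very start of the string.
print(bool(anchored.search(two_lines)), bool(unanchored.search(two_lines)))  # False True
```

So the anchors are what make the check apply to the whole input rather than to some part of it.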

understanding this python regular expression re.compile(r'[ :]') [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
Hi, I am trying to understand Python code which has this regular expression: re.compile(r'[ :]'). I tried quite a few strings and couldn't find one that matches. Can someone please give an example of a text that matches this pattern?

The expression simply matches a single space or a single : (or rather, a string containing either). That’s it. […] is a character class.

The [] matches any of the characters in the brackets. So [ :] will match one character that is either a space or a colon.
So these strings would have a match:
"Hello World"
"Field 1:"
etc...
These would not
"This_string_has_no_spaces_or_colons"
"100100101"
Edit:
For more info on regular expressions: https://docs.python.org/2/library/re.html
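A quick way to check this yourself, using the example strings above. The name sep is only for illustration:

```python
import re

sep = re.compile(r'[ :]')

samples = [
    "Hello World",
    "Field 1:",
    "This_string_has_no_spaces_or_colons",
    "100100101",
]

for s in samples:
    m = sep.search(s)
    # m.group() is the single character that matched: a space or a colon.
    print(repr(s), '->', repr(m.group()) if m else None)
```

The first two strings produce a match object (the first space found in each), while the last two give None.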

matching parentheses in python regular expression [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 1 year ago.
I have something like
store(s)
ending line like "1 store(s)".
I want to match it using Python regular expression.
I tried something like re.match('store\(s\)$', text)
but it's not working.
This is the code I tried:
import re
s = '1 store(s)'
if re.match('store\(s\)$', s):
    print('match')

In more or less direct reply to your comment:
Try this:
import re
s = '1 store(s)'
if re.search(r'store\(s\)$', s):
    print('match')
The solution is to use re.search instead of re.match, as the latter only looks for a match at the beginning of the string, while the former finds a substring anywhere in the string that matches the expression.

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
Straight from the docs, but it does come up a lot.

Have you considered re.match(r'(.*)store\(s\)$', text)?
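The three options discussed in these answers can be compared side by side. This is a minimal sketch using the string from the question:

```python
import re

s = '1 store(s)'
pattern = r'store\(s\)$'

# re.match only tries at position 0, where the string has "1 ", so it fails.
print(re.match(pattern, s))  # None

# re.search scans forward and finds the pattern at the end of the string.
print(re.search(pattern, s).group())  # store(s)

# Prefixing (.*) lets re.match succeed, since the group soaks up the "1 ".
m = re.match(r'(.*)' + pattern, s)
print(repr(m.group(1)))  # '1 '
```

re.search is usually the simpler fix; the (.*) prefix is mainly useful when the leading text is needed as well.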
