understanding this python regular expression re.compile(r'[ :]') [duplicate] - python

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
Hi, I am trying to understand some Python code that uses the regular expression re.compile(r'[ :]'). I tried quite a few strings and couldn't find one that matches. Can someone please give an example of text that matches this pattern?

The expression simply matches a single space or a single : (or rather, a string containing either). That’s it. […] is a character class.

A [] character class matches any one of the characters inside the brackets. So [ :] matches a single character that is either a space or a colon.
So these strings would have a match:
"Hello World"
"Field 1:"
etc...
These would not
"This_string_has_no_spaces_or_colons"
"100100101"
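A quick way to check this is to run the pattern against the example strings above. The sketch below uses re.search(), which scans the whole string (re.match() would only test position 0). The split() call at the end is only an assumption about why code might compile such a pattern, namely to split on either separator; the input "12:30 45" is made up for illustration.

```python
import re

# The pattern from the question: a character class matching ONE character
# that is either a space or a colon.
pattern = re.compile(r'[ :]')

samples = [
    "Hello World",                          # contains a space
    "Field 1:",                             # contains a space and a colon
    "This_string_has_no_spaces_or_colons",  # neither
    "100100101",                            # neither
]

for s in samples:
    m = pattern.search(s)  # first space or colon anywhere in s, or None
    print(repr(s), "->", repr(m.group()) if m else None)

# findall() lists every single-character match
print(pattern.findall("Field 1: value"))  # [' ', ':', ' ']

# Assumed use case: split on either a space or a colon
print(pattern.split("12:30 45"))  # ['12', '30', '45']
```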
Edit:
For more info on regular expressions: https://docs.python.org/2/library/re.html

Related

Regular Expression Python, space and plus [duplicate]

This question already has answers here:
How to parse for tags with '+' in python
(2 answers)
What special characters must be escaped in regular expressions?
(13 answers)
Closed 3 years ago.
I would like to write a regular expression that matches expressions like these:
title="+34 952387749
title="+34 123456789
But I'm having trouble with the space and the +. So far I have this pattern, but I can't figure out how to express the space and the +:
^title="+34' space [0-9]{9}
Many thanks for your help !
Escape special characters such as + with a backslash (\).
In regex, \s matches any whitespace character, including a space.
RegEx for your expression would be:
\+34\s\d{9}
\d matches a digit.
You need a backslash to specify that you want to match a literal +. Your regex would be:
\+34 [0-9]{9}
Try this regex:
^title="\+\d{2} \d{9}$
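To compare the answers, the sketch below runs the question's unescaped attempt and the suggested patterns against the two sample lines. The constant names are made up for illustration. Without the backslash, "+ means "one or more double quotes", so the literal plus sign is never matched.

```python
import re

# The question's attempt (space written literally, + NOT escaped)
UNESCAPED = re.compile(r'title="+34 [0-9]{9}')
# \+ is a literal plus, \s is any whitespace, \d is a digit
ESCAPED = re.compile(r'\+34\s\d{9}')
# Anchored: the whole line must be title="+NN followed by a space and 9 digits
ANCHORED = re.compile(r'^title="\+\d{2} \d{9}$')

for line in ['title="+34 952387749', 'title="+34 123456789']:
    print(UNESCAPED.search(line))          # None
    print(ESCAPED.search(line).group())    # e.g. +34 952387749
    print(bool(ANCHORED.match(line)))      # True
```

The anchored version is the strictest: a line with only eight digits after the space, such as 'title="+34 95238774', does not match it.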

Remove everything after regex pattern match but keep pattern [duplicate]

This question already has answers here:
Using regex to remove all text after the last number in a string
(2 answers)
Closed 4 years ago.
I was looking for a way to remove all characters after a certain pattern match. I know there are many similar questions here on SO, but I couldn't find one that works for me. Basically I have a fixed pattern (\w\w\d\d\d\d), and I want to remove everything after it while keeping the pattern itself.
I've tried using:
test = 'PP1909dfgdfgd'
done = re.sub ('(\w\w\d\d\d\d/w*)', '\w\w\d\d\d\d/', test)
but I still get the same string back.
Example input:
dirty = 'AA1001dirtydata'
dirty2 = 'AA1001222%^&*'
Desired output:
clean = 'AA1001'
You can use re.match() instead of re.sub():
re.match(r'\w\w\d\d\d\d', dirty).group(0)  # returns 'AA1001'
Note: re.match() only looks for the pattern at the beginning of the string you provide, and group(0) returns just the characters the pattern matched. If the pattern can occur partway through the string, use re.search() instead.
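The sketch below wraps the answer's re.match() call in a hypothetical helper, keep_prefix, that returns the string unchanged when the prefix is missing (a bare .group(0) on a failed match would raise AttributeError because match() returns None). It also shows how the question's re.sub() attempt can be repaired: capture the prefix in a group, consume the rest with .*, and put the group back with \1. \d{4} is simply a shorter spelling of \d\d\d\d.

```python
import re

# Fixed prefix from the question: two word characters followed by four digits
PREFIX = re.compile(r'\w\w\d{4}')

def keep_prefix(s):
    """Hypothetical helper: return the matched prefix, or s unchanged if absent."""
    m = PREFIX.match(s)  # only tries at the start of s
    return m.group(0) if m else s

print(keep_prefix('AA1001dirtydata'))  # AA1001
print(keep_prefix('AA1001222%^&*'))    # AA1001

# Repaired re.sub(): group 1 keeps the prefix, .* swallows the rest of the line
print(re.sub(r'(\w\w\d{4}).*', r'\1', 'PP1909dfgdfgd'))  # PP1909

# search() finds the pattern later in the string; match() does not
print(PREFIX.search('id: AA1001xyz').group(0))  # AA1001
print(PREFIX.match('id: AA1001xyz'))            # None
```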

Python3: complicated regex [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I'm trying to build a complicated regex with the following structure:
.+ (any characters, at least one)
either "del" or "ins" or "dup" or [ATGC]
.* (then the string either ends or continues with anything)
I have tried different things and at the moment I am here, which doesn't work:
hgvs = "c.*1017delT"
a = re.match('(.*)(del|ins|dup|[ATGC]).*', hgvs)
a.groups()
('c.*1017del', 'T')
I expected "(.*)" to capture everything before "del", but the regex seems to prefer the [ATGC] match over the del match.
Try a non-greedy match:
re.match('(.*?)(del|ins|dup|[ATGC]).*', hgvs)
Note the ? after the first .*: with this non-greedy qualifier, .*? matches as few characters as possible.
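The difference is easiest to see by running both patterns on the question's input. The greedy (.*) first grabs the whole string and then gives characters back one at a time, so the first alternative that succeeds is the final T (matched by [ATGC]). The non-greedy (.*?) starts empty and grows one character at a time, so it stops at the first place where del, ins, dup or [ATGC] can match.

```python
import re

hgvs = "c.*1017delT"

# Greedy: group 1 backtracks from the end, so group 2 lands on the last 'T'
greedy = re.match(r'(.*)(del|ins|dup|[ATGC]).*', hgvs)
print(greedy.groups())  # ('c.*1017del', 'T')

# Non-greedy: group 1 grows from the left, so group 2 lands on 'del'
# (the leading lowercase 'c' is not in [ATGC], which is case-sensitive)
lazy = re.match(r'(.*?)(del|ins|dup|[ATGC]).*', hgvs)
print(lazy.groups())  # ('c.*1017', 'del')
```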
P.S. Once you learn more regex, you won't think of this one as "complex"; there is far more complex regex syntax out there.

Understanding Regex Expressions in Python [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I'm a beginner with regular expressions in Python, and I'd like to understand the following line of code:
HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)
I know that re.compile creates a regular expression object, and that the 'r' tells Python we're dealing with a regular expression; however, I was hoping someone could explain what's going on with the rest of the code, specifically the less-than/greater-than signs. Thank you!
Your expression:
matches a "<" character,
then matches zero or more characters that are not ">" (the negated character class [^>]*),
then matches a ">", which ends the pattern.
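As pointed out above, the r before the string marks a raw string literal (backslashes are kept as-is instead of being treated as escape sequences); it does not mean "regular expression".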
You can use an online regex explainer to get a breakdown like this for any pattern.
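A short run makes the explanation concrete. The HTML snippet is made up, and stripping tags is only an assumed purpose for this regex. Note that re.IGNORECASE has no effect here because the pattern contains no letters. The last lines show the raw-string point: r'\n' is two characters (a backslash and n), while '\n' is a single newline.

```python
import re

HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)

html = '<p>Hello <b>world</b></p>'

# Each match is one tag: '<', then a run of non-'>' characters, then '>'
print(HTML_TAG_REGEX.findall(html))  # ['<p>', '<b>', '</b>', '</p>']

# Assumed use: remove the tags and keep the text
print(HTML_TAG_REGEX.sub('', html))  # Hello world

# For contrast, a greedy .* runs past the first '>' to the last one
print(re.findall(r'<.*>', html))  # ['<p>Hello <b>world</b></p>']

# Raw string vs normal string
print(len(r'\n'), len('\n'))  # 2 1
```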

Explain the regular expression [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
What does this regex mean? I know what re.sub does, but I can't figure out the second part:
s = re.sub(r'\.([a-zA-Z])', r'. \1', s)
Can someone explain the second argument, r'. \1'?
Next time you should mention which programming language you are using, because regular expression syntax varies considerably from one language to another. Also, when using regular expressions to replace something, the second argument is usually not a regular expression but a string with its own special syntax, so knowing the programming language helps with that too.
\1 is a back reference to what the first capturing group (expression in parentheses) matched.
So \.([a-zA-Z]) matches a period followed by a letter. Because the letter is surrounded by parentheses, it is captured (stored) and can be reused in the replacement as \1. The period and the letter are therefore replaced with a period, a space, and that same letter.
Examples:
".H" becomes ". H"
"This.is.a.Test" becomes "This. is. a. Test"
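The examples can be checked directly. The sketch wraps the call in a hypothetical helper, space_after_period, purely for readability. The last line shows that Python's replacement syntax also accepts \g<1> as an equivalent spelling of \1.

```python
import re

def space_after_period(s):
    """Hypothetical helper around the re.sub() call from the question."""
    # \.          a literal period
    # ([a-zA-Z])  one ASCII letter, captured as group 1
    # r'. \1'     replacement: a period, a space, then group 1's text
    return re.sub(r'\.([a-zA-Z])', r'. \1', s)

print(space_after_period('.H'))              # . H
print(space_after_period('This.is.a.Test'))  # This. is. a. Test
print(space_after_period('Pi is 3.14.'))     # Pi is 3.14.  (no letter after either period)

# \g<1> refers to the same group as \1
print(re.sub(r'\.([a-zA-Z])', r'. \g<1>', 'a.b'))  # a. b
```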
