This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I am extracting a string and need to check whether it follows a particular pattern:
<![any word]>
If so, I need to replace it with "". I am trying the following code:
import re
string1 = "<![if support]> hello"
string = re.sub(re.compile("<![.*?]>"),"",string1)
print(string)
But I am getting the output as
<![if support]> hello
I wish to get the output as hello. What am I doing wrong here?
[ and ] are treated as meta characters in regex. You'll need to escape them:
In [1]: re.sub(re.compile(r"<!\[.*?\]>"), "", "<![if support]> hello")
Out[1]: ' hello'
As a simplification (courtesy Wiktor Stribiżew), you can escape just the opening bracket, shortening your regex to "<!\[.*?]>". A closing ] outside a character class is treated as a literal.
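To see why the unescaped pattern fails, here is a small sketch comparing the two. The names unescaped, escaped and strip_conditional_tag are made up for illustration, and the trailing \s* is an addition (not part of the answer above) so that the result is "hello" without the leading space:

```python
import re

# Unescaped, "[.*?]" is a character class: exactly one of ".", "*" or "?".
unescaped = re.compile(r"<![.*?]>")

# Escaped brackets are literal. The trailing \s* (an assumption about what
# the asker wants) also removes whitespace right after the tag.
escaped = re.compile(r"<!\[.*?\]>\s*")


def strip_conditional_tag(text):
    """Remove every <![...]> tag and the whitespace directly after it."""
    return escaped.sub("", text)


print(unescaped.search("<![if support]> hello"))       # None
print(bool(unescaped.search("<!*>")))                  # True
print(strip_conditional_tag("<![if support]> hello"))  # hello
```

The raw-string prefix (r"...") keeps Python from interpreting the backslashes before the regex engine sees them.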
This question already has answers here:
How to input a regex in string.replace?
(7 answers)
Closed 1 year ago.
I'm using an API that returns § characters followed by a color code (1-9 or a-h), and I want to remove both the § and the character after it. They exist for color formatting, which I'm not using. My current method iterates through the string to remove them, but it feels too hacky and buggy, and I could not find a better way. Is there a parameter for the str.replace function that also removes the character after the one found?
You can remove that exact pattern with a regular expression, using re.sub (shown here with $ as the marker character; for the API's output, use § in its place):
import re
def clean_string(s):
    return re.sub(r"\$[1-9a-h]", "", s)
clean_string("foo $4 - bar $f")
# > 'foo  - bar '
If you want more flexibility, you can match any non whitespace character following $ with \S:
import re
def clean_string(s):
    return re.sub(r"\$\S", "", s)
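The question is about § rather than $, so here is a sketch of the same approach for that character. Unlike $, the § character has no special meaning in a regex and needs no backslash. The names COLOR_CODE and strip_color_codes, and the sample input, are invented for illustration:

```python
import re

# "§" followed by exactly one color code character (1-9 or a-h).
COLOR_CODE = re.compile("§[1-9a-h]")


def strip_color_codes(s):
    """Remove each § marker together with the color code that follows it."""
    return COLOR_CODE.sub("", s)


print(strip_color_codes("§4Hello §fworld"))  # Hello world
```

A § followed by anything outside 1-9 or a-h is left untouched, which may or may not be what you want.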
This question already has answers here:
Match text between two strings with regular expression
(3 answers)
Closed 3 years ago.
Given the following string:
dpkg.log.looker.test.2019-09-25
I'd want to be able to extract:
looker.test
or
looker.
I have been trying multiple combinations, but none of them extracts only the hostname. If I try to filter out the whole beginning of the filename (dpkg.log.), it also drops the subsequent characters:
/[^dpkg.log].+(?=.[0-9]{4}-[0-9]{2}-[0-9]{2})/
returns:
er.test
Is there a way to ignore the whole string "dpkg.log" without ignoring the subsequent repeated characters?
The following expression should work with re.findall:
[^.]+\.[^.]+\.(.+)\.\d{2,4}-\d{2}-\d{2}
Test
import re
regex = r'[^.]+\.[^.]+\.(.+)\.\d{2,4}-\d{2}-\d{2}'
string = '''
dpkg.log.looker.test.2019-09-25
dpkg.log.looker.test1.test2.2019-09-25
'''
print(re.findall(regex, string))
Output
['looker.test', 'looker.test1.test2']
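The attempt in the question fails because [^dpkg.log] is a character class ("any one character other than d, p, k, g, ., l, o"), not a negated string. A sketch of an alternative that spells the prefix out literally is below; HOSTNAME and extract_hostname are hypothetical names, and the pattern assumes the date is always YYYY-MM-DD at the end of the name:

```python
import re

# Literal "dpkg.log." prefix, captured hostname, then a trailing date.
HOSTNAME = re.compile(r"^dpkg\.log\.(.+)\.\d{4}-\d{2}-\d{2}$")


def extract_hostname(filename):
    """Return the part between 'dpkg.log.' and the date, or None."""
    match = HOSTNAME.match(filename)
    return match.group(1) if match else None


name = "dpkg.log.looker.test.2019-09-25"
print(extract_hostname(name))  # looker.test

# The original attempt: the class skips d, p, k, g, ., l, o one by one,
# so the match only starts at the first "e".
original = re.search(r"[^dpkg.log].+(?=.[0-9]{4}-[0-9]{2}-[0-9]{2})", name)
print(original.group())  # er.test
```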
This question already has answers here:
Python- how do I use re to match a whole string [duplicate]
(4 answers)
Closed 5 years ago.
This is my first attempt at using regex, with Python or at all, and it is not working as expected. I want a regex to match any alphabetic character or underscore as the first character, then any number of alphanumeric characters or underscores after. The regex I am using is '^[a-z_A-Z][a-z_A-Z0-9]*', which seems to produce what I want at pythex.org, but in my code it is matching strings that I do not want.
My code is as follows:
isMatch = re.match('^[a-z_A-Z][a-z_A-Z0-9]*', someString)
return True if isMatch else False
Two examples of strings that are matching that I don't want are: "qq-q" and "va[r". What am I doing wrong?
I think that you just forgot the $ at the end of your regex to specify the end of the string.
isMatch = re.match('^[a-z_A-Z][a-z_A-Z0-9]*$', someString)
Without that, it will match the beginning of the string and not the entire string, which explains why it worked on "qq-q" ("qq" is a match) and "va[r" ("va" is a match).
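One caveat with $: it also matches just before a trailing newline, so "var\n" would still pass. On Python 3.4 or later, re.fullmatch avoids that. A minimal sketch, where IDENTIFIER and is_identifier are names chosen here for illustration:

```python
import re

# No ^ or $ needed: fullmatch requires the whole string to match.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(some_string):
    """True if the entire string is a letter/underscore followed by word characters."""
    return IDENTIFIER.fullmatch(some_string) is not None


print(is_identifier("var_1"))  # True
print(is_identifier("qq-q"))   # False
print(is_identifier("va[r"))   # False
print(is_identifier("var\n"))  # False

# For comparison, the $-anchored version accepts a trailing newline.
print(bool(re.match(r"^[a-z_A-Z][a-z_A-Z0-9]*$", "var\n")))  # True
```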
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
Hi, I am trying to understand Python code that contains the regular expression re.compile(r'[ :]'). I tried quite a few strings and couldn't find one that matches. Can someone please give an example of text that matches this pattern?
The expression simply matches a single space or a single : (or rather, a string containing either). That’s it. […] is a character class.
The [] matches any of the characters in the brackets. So [ :] will match one character that is either a space or a colon.
So these strings would have a match:
"Hello World"
"Field 1:"
etc...
These would not
"This_string_has_no_spaces_or_colons"
"100100101"
Edit:
For more info on regular expressions: https://docs.python.org/2/library/re.html
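A quick way to check the examples above is to run them. This sketch uses search (the pattern can match anywhere in the string) and also shows split, which is one plausible reason code would compile such a pattern; has_space_or_colon is a made-up helper name:

```python
import re

pattern = re.compile(r"[ :]")


def has_space_or_colon(text):
    """True if text contains at least one space or colon."""
    return pattern.search(text) is not None


print(has_space_or_colon("Hello World"))                          # True
print(has_space_or_colon("Field 1:"))                             # True
print(has_space_or_colon("This_string_has_no_spaces_or_colons"))  # False
print(has_space_or_colon("100100101"))                            # False

# Splitting on either character; the trailing colon yields an empty last item.
print(pattern.split("Field 1:"))  # ['Field', '1', '']
```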
This question already has answers here:
Python regular expression again - match URL
(7 answers)
Closed 8 years ago.
I am trying to find a URL in a Dokuwiki using python regex. Dokuwikis format URLs like this:
[['insert URL'|Name of External Link]]
I need to design a python regex that captures the URL but stops at '|'
I could try and type out every non-alphanumeric character besides '|'
(something like this: (https?://[\w|\.|\-|\?|\/|\=|\+|\!|\#|\#|\$|\%|^|&]*) )
However that sounds really tedious and I might miss one.
Thoughts?
You can use negative character sets, or [^things to not match].
In this case, you want to not match |, so you would have [^|].
import re
bool(re.match("[^|]", "a"))
#>>> True
bool(re.match("[^|]", "|"))
#>>> False
You expect one or more characters that are not |, followed by a |, then one or more characters that are not ], all enclosed in double square brackets. This translates to:
import re
pattern = re.compile(r'\[\[([^|]+)\|([^\]]+)\]\]')
print(pattern.match("[[http://bla.org/path/to/page|Name of External Link]]").groups())
This would print:
('http://bla.org/path/to/page', 'Name of External Link')
If you don't need the name of the link, you can just remove the parentheses around the second group.
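To pull every link out of a longer page rather than matching a single string, the same negated character classes work with finditer. This is a sketch under the assumption that links always have the [[URL|Name]] shape from the question; LINK, extract_links and the sample page text are invented for illustration:

```python
import re

# URL: anything up to the first "|" (and not "]"); name: anything up to "]".
LINK = re.compile(r"\[\[([^|\]]+)\|([^\]]+)\]\]")


def extract_links(text):
    """Return a list of (url, name) tuples for each [[url|name]] in text."""
    return [(m.group(1), m.group(2)) for m in LINK.finditer(text)]


page = "See [[http://a.org/x|A]] and [[https://b.org/?q=1|B site]]."
print(extract_links(page))
# [('http://a.org/x', 'A'), ('https://b.org/?q=1', 'B site')]
```

Because the classes exclude the delimiters themselves, there is no need to list every character a URL may contain.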