Using regex with special characters to find a match in python [duplicate] - python

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I am extracting a string and need to check whether it follows a particular pattern
<![any word]>
if so I need to replace it with "". I am trying the following code
string1 = "<![if support]> hello"
string = re.sub(re.compile("<![.*?]>"),"",string1)
print(string)
But I am getting the output as
<![if support]> hello
I wish to get the output as hello. What am I doing wrong here?

[ and ] are treated as meta characters in regex. You'll need to escape them:
In [1]: re.sub(re.compile(r"<!\[.*?\]>"), "", "<![if support]> hello")
Out[1]: ' hello'
As a simplification (courtesy Wiktor Stribiżew), you can escape just the opening bracket, shortening your regex to "<!\[.*?]>". An unpaired ] outside a character class is treated as a literal character.
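A minimal sketch of the fix, using raw strings so the backslashes reach the regex engine unchanged. The helper name strip_conditional_tags is hypothetical and not part of the original answer; re.escape is shown as one way to build the literal delimiters without escaping them by hand.

```python
import re

# Raw string: "\[" reaches the regex engine as an escaped, literal bracket.
TAG = re.compile(r"<!\[.*?\]>")

def strip_conditional_tags(text):
    """Remove every <![...]> marker from text (hypothetical helper)."""
    return TAG.sub("", text)

print(repr(strip_conditional_tags("<![if support]> hello")))  # ' hello'

# re.escape produces the escaped form of the literal delimiters.
escaped = re.compile(re.escape("<![") + r".*?" + re.escape("]>"))
print(repr(escaped.sub("", "<![if support]> hello")))  # ' hello'
```

Note that the leading space survives in both cases; calling .strip() on the result would remove it.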

Related

How to find a character in string and replace it and the following one in python [duplicate]

This question already has answers here:
How to input a regex in string.replace?
(7 answers)
Closed 1 year ago.
I'm using an API that returns § characters followed by a color code (1-9 or a-h), and I want to remove both the § and the character after it. They are meant for color formatting, which I'm not using. My current method iterates through the string to remove them, but it feels hacky and buggy and I could not find a better way. Is there a parameter for the str.replace function that also removes the character after the one found?
You can "eliminate" the precise pattern with regular expressions using the sub method:
import re
def clean_string(s):
    return re.sub(r"\$[1-9a-h]", "", s)
clean_string("foo $4 - bar $f")
# > 'foo  - bar '
If you want more flexibility, you can match any non whitespace character following $ with \S:
import re
def clean_string(s):
    return re.sub(r"\$\S", "", s)
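The question's marker is §, while the snippets above use $. Here is a sketch of the same idea adapted to §, assuming the color code is always a single character in 1-9 or a-h as the question states. The name strip_color_codes and the sample string are made up for illustration.

```python
import re

# § is not a regex metacharacter, so it needs no escaping.
COLOR_CODE = re.compile(r"§[1-9a-h]")

def strip_color_codes(s):
    """Drop each § together with the single code character after it."""
    return COLOR_CODE.sub("", s)

print(strip_color_codes("§4Hello §fworld"))  # Hello world
```

A § followed by any other character is left untouched, which makes unexpected input easier to spot.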

Python REGEX to exclude beginning of string [duplicate]

This question already has answers here:
Match text between two strings with regular expression
(3 answers)
Closed 3 years ago.
Given the following string:
dpkg.log.looker.test.2019-09-25
I'd want to be able to extract:
looker.test
or
looker.
I have been trying multiple combinations, but none of them extracts only the hostname. If I try to filter out the whole beginning of the file name (dpkg.log.), it also drops some of the characters that follow:
/[^dpkg.log].+(?=.[0-9]{4}-[0-9]{2}-[0-9]{2})/
returns:
er.test
Is there a way to ignore the whole string "dpkg.log" without ignoring the subsequent repeated characters?
The following expression should work with re.findall:
[^.]+\.[^.]+\.(.+)\.\d{2,4}-\d{2}-\d{2}
Test
import re
regex = r'[^.]+\.[^.]+\.(.+)\.\d{2,4}-\d{2}-\d{2}'
string = '''
dpkg.log.looker.test.2019-09-25
dpkg.log.looker.test1.test2.2019-09-25
'''
print(re.findall(regex, string))
Output
['looker.test', 'looker.test1.test2']
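The attempt in the question returns er.test because [^dpkg.log] is a character class. It matches one character that is none of d, p, k, g, ., l, o, rather than skipping the literal text dpkg.log, so the match can only start at the "e" in "looker". Below is a sketch that spells the prefix out literally instead, assuming every name starts with dpkg.log. and ends with a YYYY-MM-DD date. The helper extract_host is hypothetical.

```python
import re

# Literal prefix, lazily captured host, then a dot and a date up to the end.
HOST = re.compile(r"^dpkg\.log\.(.+?)\.\d{4}-\d{2}-\d{2}$")

def extract_host(filename):
    """Return the part between 'dpkg.log.' and the date, or None."""
    m = HOST.match(filename)
    return m.group(1) if m else None

print(extract_host("dpkg.log.looker.test.2019-09-25"))  # looker.test
```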

Python regex matching on strings I don't want [duplicate]

This question already has answers here:
Python- how do I use re to match a whole string [duplicate]
(4 answers)
Closed 5 years ago.
This is my first attempt at trying to use regex with Python or at all, and it is not working as expected. I want a regex to match any alphabetic character or underscore as the first character, then any number of alphanumeric characters or underscores after. The regex I am using is '^[a-z_,A-Z][a-z_A-Z0-9]*', which seems to produce what I want at pythex.org, but in my code it is matching strings that I do not want.
My code is as follows:
isMatch = re.match('^[a-z_A-Z][a-z_A-Z0-9]*', someString)
return True if isMatch else False
Two examples of strings that are matching that I don't want are: "qq-q" and "va[r". What am I doing wrong?
I think that you just forgot the $ at the end of your regex to specify the end of the string.
isMatch = re.match('^[a-z_A-Z][a-z_A-Z0-9]*$', someString)
Without that, it will match the beginning of the string and not the entire string, which explains why it worked on "qq-q" ("qq" is a match) and "va[r" ("va" is a match).
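One caveat: $ also matches just before a trailing newline, so "abc\n" would still pass the check above. Below is a sketch using re.fullmatch (available since Python 3.4), which requires the entire string to match. The name is_identifier is hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(s):
    """True only if the entire string matches the identifier pattern."""
    return IDENT.fullmatch(s) is not None

for candidate in ["var_1", "qq-q", "va[r", "abc\n"]:
    print(repr(candidate), is_identifier(candidate))
```

Only "var_1" is accepted here. With fullmatch, neither ^ nor $ is needed in the pattern.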

understanding this python regular expression re.compile(r'[ :]') [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
Hi, I am trying to understand Python code that contains the regular expression re.compile(r'[ :]'). I tried quite a few strings and couldn't find one that matches. Can someone please give an example of text that matches this pattern?
The expression simply matches a single space or a single : (or rather, a string containing either). That’s it. […] is a character class.
The [] matches any of the characters in the brackets. So [ :] will match one character that is either a space or a colon.
So these strings would have a match:
"Hello World"
"Field 1:"
etc...
These would not
"This_string_has_no_spaces_or_colons"
"100100101"
Edit:
For more info on regular expressions: https://docs.python.org/2/library/re.html
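A short sketch of how such a compiled character class is commonly used: search reports whether a space or colon occurs anywhere in the string, and split cuts the string at each one. The sample strings are illustrative only.

```python
import re

sep = re.compile(r"[ :]")

print(bool(sep.search("Hello World")))  # True
print(bool(sep.search("100100101")))    # False
print(sep.split("Field 1:value"))       # ['Field', '1', 'value']
```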

Python Regex stop at '|' character [duplicate]

This question already has answers here:
Python regular expression again - match URL
(7 answers)
Closed 8 years ago.
I am trying to find a URL in a Dokuwiki using python regex. Dokuwikis format URLs like this:
[['insert URL'|Name of External Link]]
I need to design a python regex that captures the URL but stops at '|'
I could try and type out every non-alphanumeric character besides '|'
(something like this: (https?://[\w|\.|\-|\?|\/|\=|\+|\!|\#|\#|\$|\%|^|&]*) )
However that sounds really tedious and I might miss one.
Thoughts?
You can use negative character sets, or [^things to not match].
In this case, you want to not match |, so you would have [^|].
import re
bool(re.match("[^|]", "a"))
#>>> True
bool(re.match("[^|]", "|"))
#>>> False
You expect one or more characters that are not |, followed by a | and one or more characters that are not ], all enclosed in double square brackets. This translates to:
pattern = re.compile(r'\[\[([^|]+)\|([^\]]+)\]\]')
print(pattern.match("[[http://bla.org/path/to/page|Name of External Link]]").groups())
This would print:
('http://bla.org/path/to/page', 'Name of External Link')
If you don't need the name of the link, you can just remove the parentheses around the second group.
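To pull links out of a longer page rather than match one string, the same negated classes can be combined with re.findall. This is a sketch assuming links follow the [[URL|Name]] form shown in the question. The sample text and the helper name find_links are made up for illustration.

```python
import re

# URL: everything up to the first '|'; name: everything up to the closing ']]'.
LINK = re.compile(r"\[\[([^|\]]+)\|([^\]]+)\]\]")

def find_links(text):
    """Return (url, name) pairs for every [[url|name]] link in text."""
    return LINK.findall(text)

sample = ("See [[http://bla.org/path/to/page|Name of External Link]] "
          "and [[https://example.com/?q=1|Example]].")
for url, name in find_links(sample):
    print(url, "->", name)
```

Excluding ] from the URL class as well keeps a link without a | from swallowing text up to the next link.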
