Python Regular Expression newline is matched [duplicate]

This question already has answers here:
REGEX - Differences between `^`, `$` and `\A`, `\Z`
(1 answer)
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I want to match a string that contains alphanumerics and some special characters, but not a newline. However, whenever my string ends with a newline, the pattern still matches. I checked the documentation for flags, but none of them looked relevant.
The following is sample code from a Python 3.6.2 REPL:
>>> import re
>>> s = "do_not_match\n"
>>> p = re.compile(r"^[a-zA-Z\+\-\/\*\%\_\>\<=]*$")
>>> p.match(s)
<_sre.SRE_Match object; span=(0, 12), match='do_not_match'>
I expected it not to match, because the string has a newline at the end.
https://regex101.com/r/qyRw5s/1
I am confused about what I am missing here.

The problem is that $ matches not only at the very end of the string but also just before a newline at the end of the string (if there is one).
If you don't want to match the newline at the end, use \Z instead of $ in your regex.
See the re module's documentation:
'$'
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
\Z
Matches only at the end of the string.
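A minimal sketch of the difference, reusing the question's character class (written with fewer escapes, which inside [...] does not change its meaning). re.fullmatch, available since Python 3.4, is shown as an extra option not mentioned in the answer: it requires the pattern to consume the entire string, so no anchors are needed.

```python
import re

s = "do_not_match\n"
chars = r"[a-zA-Z+\-/*%_><=]*"

# '$' also matches just before a trailing newline, so this succeeds.
dollar = re.compile("^" + chars + "$")
# '\Z' matches only at the very end of the string, so this fails.
strict = re.compile(r"^" + chars + r"\Z")

print(dollar.match(s).group())  # do_not_match
print(strict.match(s))          # None
# fullmatch: the whole string must match the pattern.
print(re.fullmatch(chars, s))   # None
print(re.fullmatch(chars, "do_match").group())  # do_match
```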

Related

Regex - Python - exclude new line from match [duplicate]

This question already has answers here:
Regex: don't match string ending with newline (\n) with end-of-line anchor ($)
(3 answers)
Closed 8 months ago.
Why does the following regex match the string even though its last character is a newline? And how can I exclude such strings from matching?
import re
match = re.match(r"^\d{4}$", "1234\n")
print(match is not None)  # True

You can use this regex:
\A\d+\Z
RegEx Details:
\A: asserts position at the start of the string
\Z: asserts position at the very end of the string. (This is how Python's re module defines \Z; unlike PCRE's \Z, it does not also match before a trailing line terminator.)
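A short check, using the question's \d{4} rather than \d+, showing that \Z rejects the trailing newline while $ accepts it:

```python
import re

strict = re.compile(r"\A\d{4}\Z")

print(strict.match("1234") is not None)    # True
print(strict.match("1234\n") is not None)  # False: \Z does not match before "\n"
# The original pattern still accepts the trailing newline because of '$'.
print(re.match(r"^\d{4}$", "1234\n") is not None)  # True
```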

I am trying to extract a date from text using a regular expression [duplicate]

This question already has answers here:
How do I parse an ISO 8601-formatted date?
(29 answers)
Closed 4 years ago.
I am trying to extract the date from the text '2025-03-21T12:54:41Z' using a Python regular expression.
date = re.match(r'(\d{4})[/.-](\d{2})[/.-](\d{2})$', date[0])
print(date)
This gives None as output.
I also tried this code:
date_reg_exp = re.compile(r'\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}')
matches_list = date_reg_exp.findall(date[0])
for match in matches_list:
    print(match)
This outputs only -.
Please help

Your regular expression fails because it ends with $. $ asserts that the match position is at the end of the string (or just before a trailing newline).
After matching the last two digits of the date, the regex engine expects the end of the string, but T12:54:41Z still follows, so the regex does not match.
To fix this, remove the $:
>>> re.match(r'(\d{4})[/.-](\d{2})[/.-](\d{2})', '2025-03-21T12:54:41Z')
<_sre.SRE_Match object; span=(0, 10), match='2025-03-21'>
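The question's second attempt prints only - because, when a pattern contains capturing groups, re.findall returns the text of the groups rather than the whole match. The only group there is the named sep group, which captures the separator. A sketch of two ways to get the full date instead: finditer with group(0), and unpacking the three groups of the corrected re.match.

```python
import re

text = "2025-03-21T12:54:41Z"

# With one capturing group, findall returns that group's text: the separator.
date_reg_exp = re.compile(r"\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}")
print(date_reg_exp.findall(text))  # ['-']

# finditer yields match objects; group(0) is the whole matched text.
for m in date_reg_exp.finditer(text):
    print(m.group(0))  # 2025-03-21

# The corrected re.match (no trailing '$') exposes year, month and day as groups.
m = re.match(r"(\d{4})[/.-](\d{2})[/.-](\d{2})", text)
year, month, day = m.groups()
print(year, month, day)  # 2025 03 21
```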

Instead of ending the pattern with $, which requires the string to end right after the day, leave the end open. Anchoring the start with ^ is harmless, although re.match already matches only at the beginning of the string:
import re
date='2025-03-21T12:54:41Z'
date=re.match(r'^(\d{4})[/.-](\d{2})[/.-](\d{2})', date)
print(date)
Output in Python 3:
<_sre.SRE_Match object; span=(0, 10), match='2025-03-21'>
Python 2:
<_sre.SRE_Match object at 0x7fd191ac1ae0>
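Because the input is an ISO 8601 timestamp (the subject of the linked duplicate), another option, not taken from the answers above, is to skip the regex and parse it with datetime.strptime. This sketch assumes the string always has exactly this shape, with a literal trailing Z:

```python
from datetime import datetime

stamp = "2025-03-21T12:54:41Z"
# 'T' and 'Z' in the format are matched literally; the result is a naive
# datetime (the 'Z' is not converted into a UTC tzinfo here).
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
print(parsed.date())  # 2025-03-21
```

On Python 3.11 and later, datetime.fromisoformat also accepts the trailing Z and returns a timezone-aware value.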

Searching for an exact match that contains brackets [duplicate]

This question already has an answer here:
Python re - escape coincidental parentheses in regex pattern
(1 answer)
Closed 5 years ago.
I am reading lines from a file, each of which is formatted like this:
array_name[0]
array_name[1]
How can I do an exact match on this string in Python? I've tried this:
if re.match(line, "array_name[0]"):
but it seems to match every time, ignoring the bracketed part ([0], [1], etc.).

re.escape from the re module is a useful tool for automatically escaping characters that the regex engine considers special. From the docs:
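re.escape(pattern)
Escape all the characters in pattern except ASCII letters and numbers. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
(Since Python 3.7, re.escape escapes only characters that have a special meaning in regular expressions; the result is used the same way.)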
In [1]: re.escape("array_name[0]")
Out[1]: 'array_name\\[0\\]'
Also, you've reversed the order of your arguments. You'll need your pattern to come first, followed by the text you want to match:
re.match(re.escape("array_name[0]"), line)
Example:
In [2]: re.match(re.escape("array_name[0]"), 'array_name[0] in a line')
Out[2]: <_sre.SRE_Match object; span=(0, 13), match='array_name[0]'>
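re.match only anchors at the start, so, as the last example shows, it also matches a line that merely begins with array_name[0]. For a true exact match, one sketch (assuming the lines come from a file and still carry their trailing newline) is re.fullmatch with the escaped pattern, or simply a string comparison, since no regex features are needed:

```python
import re

target = "array_name[0]"
pattern = re.compile(re.escape(target))

# Sample lines as they might be read from a file, newline included.
lines = ["array_name[0]\n", "array_name[1]\n", "array_name[0] in a line\n"]

results = []
for raw in lines:
    line = raw.rstrip("\n")
    # fullmatch succeeds only if the escaped pattern covers the whole line.
    is_exact = pattern.fullmatch(line) is not None
    # Plain equality gives the same answer without any regex.
    results.append((line, is_exact, line == target))

print(results)
```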

Unable to search for '\\n' using regular expressions in python3 [duplicate]

This question already has answers here:
Confused about backslashes in regular expressions [duplicate]
(3 answers)
Closed 5 years ago.
>>> import re
>>> a='''\\n5
... 8'''
>>> b=re.findall('\\n[0-9]',a)
>>> print(b)
['\n8']
Why does it show \n8 and not \n5?
I put an extra \ in front of \n the first time.
I find the use of raw strings in Python regexes confusing; they don't seem to change the result.

This is because, in a string, the newline character is a single character.
When you write \\n5 in a normal (non-raw) string literal, you escape the backslash, so the string contains the three characters \, n and 5, not a newline.
When you search with the pattern '\\n[0-9]', however, Python turns \\ into a single backslash, so the regex engine receives \n[0-9], and the re module interprets \n as a newline. That matches the real newline before 8, but not the two separate characters \ and n before 5.
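A small sketch of why the raw string made no difference here: '\\n[0-9]' and r'\n[0-9]' hand re the same pattern. To match a literal backslash followed by n, the regex itself needs \\n, i.e. r'\\n[0-9]' (or '\\\\n[0-9]' without a raw string):

```python
import re

a = '\\n5\n8'  # backslash, 'n', '5', newline, '8' (same as the question's string)
print(len(a))  # 5

# Both literals hand re the pattern \n[0-9]; re reads \n as a newline.
print(re.findall('\\n[0-9]', a))  # ['\n8']
print(re.findall(r'\n[0-9]', a))  # ['\n8']

# A regex of \\n matches a literal backslash followed by 'n'.
print(re.findall(r'\\n[0-9]', a))   # ['\\n5']
print(re.findall('\\\\n[0-9]', a))  # ['\\n5']
```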

\\n is not a newline; it is an escaped backslash followed by n. With a real newline at the start of the string, both digits match:
>>> import re
>>> a = '''\n5
... 8'''
>>> a=re.findall('\\n[0-9]',a)
>>> print(a)
['\n5', '\n8']

Because \\n5 is not a real newline, it prints as \n5.

Regex - replace word having plus or brackets [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 6 years ago.
In Python, I am trying to do
text = re.sub(r'\b%s\b' % word, "replace_text", text)
to replace a word with some text. I am using re rather than text.replace so that only whole words, delimited by \b, are replaced. The problem comes when word contains characters like +, ( or [, for example +91xxxxxxxx.
The regex engine treats + as a quantifier (one or more) and fails with sre_constants.error: nothing to repeat. ( causes a similar error.
I couldn't find a fix after searching around a bit. Is there a way?

Just use re.escape(string):
word = re.escape(word)
text = re.sub(r'\b{}\b'.format(word), "replace_text", text)
It escapes every character that has a special meaning in a regex pattern (e.g. \+ instead of +).
Just a side note: formatting with the percent (%) operator is not formally deprecated, but the .format() method (and, since Python 3.6, f-strings) is generally preferred.
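One caveat the answer does not cover: \b only matches between a word character and a non-word character, so for a word that starts with + (such as +91xxxxxxxx), \b\+91... never matches after a space, and the escaped pattern silently replaces nothing. A sketch of one workaround uses lookarounds in place of \b; the helper name replace_whole_word and the sample text are hypothetical:

```python
import re

def replace_whole_word(text, word, replacement):
    # Hypothetical helper: escape the word so +, (, [ are literal, and use
    # lookarounds instead of \b, which needs a word character on one side
    # and therefore fails next to a leading '+' or a trailing ')'.
    pattern = r"(?<!\w){}(?!\w)".format(re.escape(word))
    # A function replacement keeps any backslashes in `replacement` literal.
    return re.sub(pattern, lambda m: replacement, text)

text = "call +91123 or +911234 now"
# With \b, nothing is replaced: there is no word boundary before '+'.
print(re.sub(r"\b{}\b".format(re.escape("+91123")), "X", text))
print(replace_whole_word(text, "+91123", "X"))  # call X or +911234 now
```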
