Regex - Python - exclude new line from match [duplicate] - python

This question already has answers here:
Regex: don't match string ending with newline (\n) with end-of-line anchor ($)
(3 answers)
Closed 8 months ago.
Why does the following regex match the given string even though its last character is a newline? And how can I exclude such a string from being matched?
import re
match = re.match(r"^\d{4}$", "1234\n")
print(match is not None)  # prints True

You can use this regex (\d{4} keeps the exactly-four-digits requirement from the question):
\A\d{4}\Z
RegEx Details:
\A asserts position at the start of the string
\Z asserts position at the very end of the string. In Python's re module, \Z does not match before a trailing newline (unlike \Z in PCRE), so "1234\n" is rejected.
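A minimal sketch contrasting the two anchors on the question's input. It also shows re.fullmatch (available since Python 3.4), a swapped-in alternative that requires the whole string to match, so no anchors are needed. The helper name is_four_digits is hypothetical.

```python
import re

def is_four_digits(s: str) -> bool:
    """Hypothetical helper: True only if s is exactly four digits, nothing else."""
    # fullmatch must consume the entire string, so a trailing newline makes it fail
    return re.fullmatch(r"\d{4}", s) is not None

# $ also matches just before a trailing newline, so this succeeds
print(re.match(r"^\d{4}$", "1234\n") is not None)    # True
# In Python, \Z matches only at the very end of the string, so this fails
print(re.match(r"\A\d{4}\Z", "1234\n") is not None)  # False
print(is_four_digits("1234"), is_four_digits("1234\n"))  # True False
```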


Python Regular Expression newline is matched [duplicate]

This question already has answers here:
REGEX - Differences between `^`, `$` and `\A`, `\Z`
(1 answer)
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I want to match a string that contains alphanumerics and some special characters, but not a newline. However, whenever my string ends with a newline, the pattern still matches. I checked the documentation for flags, but none of them looked relevant.
The following is sample code from a Python 3.6.2 REPL:
>>> import re
>>> s = "do_not_match\n"
>>> p = re.compile(r"^[a-zA-Z\+\-\/\*\%\_\>\<=]*$")
>>> p.match(s)
<_sre.SRE_Match object; span=(0, 12), match='do_not_match'>
I expected it not to match, since the string ends with a newline.
https://regex101.com/r/qyRw5s/1
I'm confused about what I am missing here.
The problem is that $ matches not only at the very end of the string but also just before a newline at the end of the string.
If you don't want a string with a trailing newline to match, use \Z instead of $ in your regex.
See the re module's documentation:
'$'
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
\Z
Matches only at the end of the string.
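A short sketch applying the fix to the question's own pattern; the only change is \Z in place of $. The variable names are illustrative.

```python
import re

s = "do_not_match\n"

# Original pattern: $ can match just before the trailing newline
p_dollar = re.compile(r"^[a-zA-Z\+\-\/\*\%\_\>\<=]*$")
# Same character class, but \Z only matches at the absolute end of the string
p_end = re.compile(r"^[a-zA-Z\+\-\/\*\%\_\>\<=]*\Z")

print(p_dollar.match(s).group())                # do_not_match
print(p_end.match(s))                           # None
print(p_end.match("do_not_match") is not None)  # True
```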

Regular Expression to match "\\r" [duplicate]

This question already has answers here:
Can't escape the backslash with regex?
(7 answers)
Closed 7 months ago.
I'm having trouble writing a regex that matches these inputs:
1.\\r
2.\\rSomeString
I need a regex that matches \\r
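Escape the backslashes twice: in a regex, \ is itself an escape character, so \\ matches one literal backslash.
Use \\\\r, written in a raw string literal (r'\\\\r'), to match two literal backslashes followed by r. In a normal, non-raw string literal Python halves the backslashes first, so "\\\\r" reaches the regex engine as \\r and matches only one backslash.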
EDIT: Per the comments, you want to match any string that starts with \\r, followed by anything. The regex pattern is:
(\\\\r\S*)
\\\\r matches the literal text \\r at the start, and \S* matches any non-whitespace character (\S) zero or more times (*).
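A sketch of how the backslashes are counted, which is why the raw string literal matters here. The sample input is made up for illustration; the point is that the non-raw version reaches the regex engine with half as many backslashes and therefore matches only one of them.

```python
import re

# Input containing two literal backslashes, then r (the raw literal keeps them)
text = r"\\rSomeString"

# Raw string: the regex engine receives \\\\r, i.e. two escaped backslashes + r
raw_pattern = r"(\\\\r\S*)"
# Normal string: Python halves the backslashes first, so the engine sees \\r\S*,
# which matches only ONE literal backslash followed by r
plain_pattern = "(\\\\r\\S*)"

print(re.findall(raw_pattern, text))    # matches the whole input
print(re.findall(plain_pattern, text))  # skips the first backslash
```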
A literal backslash in Python can be matched with r'\\' (note the use of the raw string literal!). You have two literal backslashes, so you need four backslashes (in a raw string literal) before r.
Since you may have any characters after \\r, you may use
import re
p = re.compile(r'\\\\r\S*')
test_str = r"\\r \\rtest"
print(p.findall(test_str))
Pattern description:
\\\\ - 2 backslashes
r - a literal r
\S* - zero or more non-whitespace characters.
Variations:
If the characters after r can only be alphanumerics or underscore, use \w* instead of \S*
If you only want to match \\r when it is not preceded by a word character, add a \B non-word boundary before the backslashes in the pattern.
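A sketch exercising the base pattern and both variations on a made-up input; the tokens \\r_x1-y and a\\rb are assumptions chosen to show where the variants differ.

```python
import re

test_str = r"\\r \\rtest \\r_x1-y a\\rb"

# Base pattern: two backslashes, r, then any non-whitespace characters
base = re.findall(r"\\\\r\S*", test_str)
# Variation: only word characters (letters, digits, underscore) may follow r
word_only = re.findall(r"\\\\r\w*", test_str)
# Variation: \B rejects \\r when a word character (here "a") comes right before it
not_after_word = re.findall(r"\B\\\\r\w*", test_str)

for result in (base, word_only, not_after_word):
    print(result)
```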
You can fine-tune your regular expressions with an online regex tester.

Regex - replace word having plus or brackets [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 6 years ago.
In Python, I am trying to do
text = re.sub(r'\b%s\b' % word, "replace_text", text)
to replace a word with some text. I use re rather than text.replace so that the replacement only happens when the whole word matches, using \b. The problem comes when the word contains characters like +, (, [ etc. For example: +91xxxxxxxx.
The regex engine treats this + as the one-or-more quantifier and fails with sre_constants.error: nothing to repeat. A similar error occurs with (.
I couldn't find a fix after searching around a bit. Is there a way?
Just use re.escape(string):
word = re.escape(word)
text = re.sub(r'\b{}\b'.format(word), "replace_text", text)
It escapes every character that has a special meaning in regex patterns (e.g. \+ instead of +).
Just a side note: formatting with the percent (%) operator is not deprecated, but the .format() method of strings (or f-strings in Python 3.6+) is generally preferred in new code.
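A sketch of a whole-word replacement built on re.escape. One caveat: \b only marks a transition between a word character and a non-word character, so for a word that starts with +, the \b in front of it only matches when a word character precedes the +. The hypothetical helper replace_word below swaps \b for the lookarounds (?<!\w) and (?!\w), meaning "not preceded / not followed by a word character", which behave like \b for ordinary words.

```python
import re

def replace_word(text: str, word: str, repl: str) -> str:
    """Hypothetical helper: replace whole-word occurrences of word with repl."""
    escaped = re.escape(word)  # e.g. '+911234' -> '\+911234', so + is literal
    # Lookarounds instead of \b, so words starting or ending with
    # punctuation such as + or ) are still treated as whole words
    pattern = r"(?<!\w){}(?!\w)".format(escaped)
    # A function replacement keeps backslashes in repl from being read as escapes
    return re.sub(pattern, lambda m: repl, text)

text = "call +911234 or +9112345 now"
print(replace_word(text, "+911234", "replace_text"))
# \b-based version: \b before '+' needs a word character before it,
# and here a space precedes the +, so nothing is replaced
print(re.sub(r"\b{}\b".format(re.escape("+911234")), "replace_text", text))
```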

Python regular expression to match a pattern when preceded by either start of line or whitespace [duplicate]

This question already has answers here:
Python Regex Engine - "look-behind requires fixed-width pattern" Error
(3 answers)
Closed 4 years ago.
I would like to write a regex that matches the word hello, but only when it either starts a line or is preceded by whitespace. I don't want the match to include the whitespace if it's there; I just need to know that it (or the start of the line) is there.
So I've tried:
r = re.compile('hello(?<=\s|^)')
but this throws:
error: look-behind requires fixed-width pattern
For the sake of an example, if my string to be searched is:
s = 'hello world hello thello'
then I would like my regex to match twice, at the locations in uppercase below:
'HELLO world HELLO thello'
The first would match because it is preceded by the start of the line, and the second because it is preceded by a space. The last 5 characters would not match because they are preceded by a t.
(?:(?<=\s)|^)hello does what you want. The lookbehind must come before hello, because it checks the text preceding the current position, and it must be fixed-width: \s is 1 character wide whereas ^ is 0 characters wide, so you cannot combine them with | inside one lookbehind. You don't need to; just alternate (?<=\s) and ^ in a non-capturing group.
Note that this pattern would still match the hello in hellooo; if that is not acceptable, add \b at the end.
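A sketch checking the answer's pattern on the question's string. It also shows (?<!\S)hello, a swapped-in single negative lookbehind that is fixed-width (one character) yet still succeeds at the start of the string, because nothing precedes it there. Since \s also matches \n, both patterns find a hello at the start of a later line even without re.MULTILINE.

```python
import re

s = "hello world hello thello"

# Answer's pattern: a whitespace lookbehind OR the start-of-string anchor
starts = [m.start() for m in re.finditer(r"(?:(?<=\s)|^)hello", s)]
print(starts)  # [0, 12]

# Alternative idiom: "not preceded by a non-whitespace character"
alt_starts = [m.start() for m in re.finditer(r"(?<!\S)hello", s)]
print(alt_starts)  # [0, 12]

# Adding \b rejects longer words such as "hellooo"
print(re.findall(r"(?<!\S)hello\b", "hellooo hello"))  # ['hello']
```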

Why do I need to add DOTALL to python regular expression to match new line in raw string [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 1 year ago.
Why do I need to add the DOTALL flag for a Python regular expression to match characters including the newline character in a raw string? I ask because a raw string is supposed to ignore the escaping of special characters such as the newline character. From the docs:
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline.
This is my situation:
string = '\nSubject sentence is: Appropriate support for families of children diagnosed with hearing impairment\nCausal Verb is : may have\npredicate sentence is: a direct impact on the success of early hearing detection and intervention programs in reducing the negative effects of permanent hearing loss'
re.search(r"Subject sentence is:(.*)Causal Verb is :(.*)predicate sentence is:(.*)", string ,re.DOTALL)
results in a match. However, when I remove the DOTALL flag, I get no match.
In a regex, . matches any character except a newline (\n).
So if your string contains newlines, .* cannot match across them.
In Python, the re.DOTALL flag (also available as re.S) makes . match \n as well.
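A minimal sketch using a shortened, made-up version of the question's text, where A and B stand in for the long sentences.

```python
import re

# "\n" here is a real newline character, as in the question's string
text = "\nSubject sentence is: A\nCausal Verb is : may have\npredicate sentence is: B"
pattern = r"Subject sentence is:(.*)Causal Verb is :(.*)predicate sentence is:(.*)"

print(re.search(pattern, text))        # None: . cannot cross the newlines
m = re.search(pattern, text, re.DOTALL)  # re.S is the same flag
print(m.groups())                      # each group keeps its trailing newline
```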
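Your source string is not a raw string; only your pattern string is. In the source string, \n is a real newline character, which . does not match without DOTALL. If the text instead contained a literal backslash followed by n, no flag would be needed.
Maybe try: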
string = r'\n...\n'
re.search("Subject sentence is:(.*)Causal Verb is :(.*)predicate sentence is:(.*)", string)
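A sketch of what the r prefix actually changes: it only affects how a literal is written in source code. The strings below are illustrative; text read from a file that contains a backslash followed by n behaves like raw_str.

```python
import re

newline_str = "a\nb"   # 3 characters: a, newline, b
raw_str = r"a\nb"      # 4 characters: a, backslash, n, b
print(len(newline_str), len(raw_str))  # 3 4

# Without DOTALL, . matches the backslash and the n, but not a real newline
print(re.search("a.*b", raw_str) is not None)      # True
print(re.search("a.*b", newline_str) is not None)  # False
```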
