Why is 'r' used in regular expression in Python? [duplicate] - python

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What exactly do “u” and “r” string flags do in Python, and what are raw string literals?
p = re.compile(r'(\b\w+)\s+\1')
p.search('Paris in the the spring').group()
What is the meaning of r in the 1st line?

From the re documentation:
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.

r designates a raw string literal in Python. A raw string follows different rules than a standard string: a backslash is not treated as the start of an escape sequence, so you don't have to double it.
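To make this concrete for the pattern in the question, here is a minimal sketch. It assumes Python 3, and the variable names are only illustrative. Without the r prefix, Python itself would turn \b into a backspace character and \1 into the character with code 1 before the re module ever saw them.

```python
import re

# Raw vs. regular literal: the raw one keeps the backslash.
assert len(r"\n") == 2   # backslash + 'n'
assert len("\n") == 1    # a single newline character

# The pattern from the question, written as a raw string.
raw_pattern = re.compile(r'(\b\w+)\s+\1')
found = raw_pattern.search('Paris in the the spring').group()
print(found)  # the the

# The same pattern in a regular literal: \b becomes a backspace (\x08)
# and \1 becomes \x01. (\w and \s are doubled here only so that they
# reach re unchanged.)
cooked = '(\b\\w+)\\s+\1'
cooked_match = re.compile(cooked).search('Paris in the the spring')
print(cooked_match)  # None
```

The second pattern finds nothing because it now looks for a literal backspace character instead of a word boundary.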

Related

Why is the 'r' before strings in python so important? [duplicate]

This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 6 years ago.
I first saw it used in building regular expressions across multiple lines as a method argument to re.compile(), so I assumed that r stands for RegEx.
For example:
regex = re.compile(
r'^[A-Z]'
r'[A-Z0-9-]'
r'[A-Z]$', re.IGNORECASE
)
So what does r mean in this case? Why do we need it?
The r means that the string is to be treated as a raw string, which means escape sequences are not interpreted and every backslash stays in the string as written.
For example:
'\n' will be treated as a newline character, while r'\n' will be treated as the character \ followed by n.
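The multi-line re.compile() call in the question relies on Python joining adjacent string literals. The following is a small sketch of that, assuming Python 3; pattern_text is a name introduced here for illustration. This particular pattern contains no backslashes, so the r prefix changes nothing in it and is there by habit.

```python
import re

# Adjacent string literals are joined by Python at compile time, so these
# three raw strings form one pattern. The r prefix applies to each piece
# separately.
pattern_text = (
    r'^[A-Z]'
    r'[A-Z0-9-]'
    r'[A-Z]$'
)
regex = re.compile(pattern_text, re.IGNORECASE)

print(pattern_text)              # ^[A-Z][A-Z0-9-][A-Z]$
print(bool(regex.match('a-b')))  # True
print(bool(regex.match('ab')))   # False (the pattern needs three characters)
```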
When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
Source: Python string literals
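The rules quoted above can be checked directly. This is only a sketch, assuming Python 3; compile() is used so that the invalid literal can be tested at run time instead of breaking the script, and the C:\temp path is a made-up example.

```python
# r"\"" is valid and keeps both characters: a backslash and a double quote.
two_chars = r"\""
print(len(two_chars))  # 2

# The source text r"\" is not a complete literal, so compiling it fails.
try:
    compile('r"\\"', "<example>", "eval")
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
print(ends_in_backslash_ok)  # False

# One workaround: supply the trailing backslash from a regular literal.
path_prefix = r"C:\temp" "\\"
print(path_prefix)  # C:\temp\
```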
It means that escapes won’t be translated. For example:
r'\n'
is a string with a backslash followed by the letter n. (Without the r it would be a newline.)
b does stand for byte-string and is used in Python 3, where strings are Unicode by default. In Python 2.x strings were byte-strings by default and you’d use u to indicate Unicode.
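As a rough illustration of the b prefix, assuming Python 3 (the sample word is arbitrary):

```python
text = 'caf\u00e9'           # str: a sequence of Unicode code points
data = text.encode('utf-8')  # bytes: a sequence of integers in 0..255

print(type(text).__name__, len(text))  # str 4
print(type(data).__name__, len(data))  # bytes 5

# The prefixes combine: rb'...' is a raw bytes literal.
raw_bytes = rb'\n'
print(len(raw_bytes))  # 2
```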

Do raw strings in python disable meta characters such as \w or \d just as they do with \n? [duplicate]

This question already has answers here:
Confused about backslashes in regular expressions [duplicate]
(3 answers)
Closed 4 years ago.
I am new to Python. Can someone tell me what the difference is between these two regex statements: (re.findall(r"\d+","i am aged 35")) and (re.findall("\d+","i am aged 35"))?
I had the understanding that the raw string in the first statement will make "\d+" inactive because that is the primary role of a raw string: to make escape characters inactive. In other words, "\d+" will not be a meta character for finding/searching/matching digits if a raw string is used. However, I now see that both statements return the same result.
Both the Python parser and the regular expression parser handle escape sequences. This means that any escape sequence that both parsers support must either be written with doubled backslashes, or you use a raw string literal so the Python parser doesn't try to interpret escape sequences.
In this case, \d has no meaning to Python, so the backslash is left in place for the re module to handle. So here specifically, both snippets produce the same pattern. Note, however, that recent Python versions warn about unrecognized escapes such as "\d" in a non-raw string (a DeprecationWarning since Python 3.6, a SyntaxWarning since 3.12), so the raw form is still preferable.
However, if you needed to match a literal backslash before other text like section in your regular expression, without raw strings, you'd have to use '\\\\section' to define the pattern! That's because the Python interpreter would see '\\section' as an escape sequence producing a single backslash, and then the regular expression parser sees the start of the escape sequence \s.
See the section on backslashes and raw string literals in the Python regular expression HOWTO.
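The two layers of escaping described in this answer can be seen in a short sketch, assuming Python 3. The sample text and variable names are made up for illustration.

```python
import re

# \d is not a Python escape, so both spellings give the same characters.
# ("\\d+" is used instead of "\d+" to avoid the invalid-escape warning.)
assert "\\d+" == r"\d+"
ages = re.findall(r"\d+", "i am aged 35")
print(ages)  # ['35']

# Matching a literal backslash followed by "section":
cooked = '\\\\section'  # four backslashes in source, two in the string
raw = r'\\section'      # two backslashes in source, two in the string
assert cooked == raw

sample = r'see \section for details'
hit = re.search(raw, sample).group()
print(hit)  # \section
```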

Python: Why do raw strings require backslash to be escaped? [duplicate]

This question already has answers here:
Can't escape the backslash with regex?
(7 answers)
Closed 7 years ago.
This explanation is from the Python documentation:
Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. As a result, in string literals, '\U' and '\u' escapes in raw strings are not treated specially. Given that Python 2.x’s raw unicode literals behave differently than Python 3.x’s the 'ur' syntax is not supported.
If raw strings treat backslashes as char literals, why does the backslash need to be escaped in the expression:
re.compile(r"'\\'")
Instead of just being able to write:
re.compile(r"'\'")
To capture a single backslash when using the re module?
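Because '\' also has a special meaning in the regular expression language itself: it escapes the character that follows it. A raw string only stops Python from interpreting the backslash; the re module still does. So if you want to match '+' as a literal character your pattern is '\+', and if you want to match a literal backslash your pattern must contain '\\'.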
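A small sketch of the difference between the two patterns from the question, assuming Python 3 (the test strings are invented for illustration):

```python
import re

backslash_in_quotes = "'\\'"  # the three characters '\'
two_quotes = "''"

# r"'\\'" reaches re as an escaped, literal backslash between two quotes.
escaped = re.compile(r"'\\'")
print(bool(escaped.search(backslash_in_quotes)))    # True

# r"'\'" is a valid Python literal, but re reads \' as an escaped quote,
# so the pattern matches two quotes in a row, not a backslash.
unescaped = re.compile(r"'\'")
print(bool(unescaped.search(backslash_in_quotes)))  # False
print(bool(unescaped.search(two_quotes)))           # True

# re.escape builds the escaped form of a literal string.
print(re.escape('+'))   # \+
print(re.escape('\\'))  # \\
```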

Including \n in a string in python [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 7 years ago.
How do I escape \n in a string in Python?
How do I write the string "abc\ndef" to stdout in Python as one single line?
sys.stdout.write("abc\ndef")
Current output:
>>> import sys
>>> sys.stdout.write("abc\ndef")
abc
def
I would like it to be abc\ndef
You should escape the backslash so that it's not treated as an escape character itself:
sys.stdout.write("abc\\ndef")
Background
The backslash \ tells the parser that the next character is something special and must be treated differently. That's why \n will not print as \n but as a newline. But how do we write a backslash then? We need to escape it, too, resulting in \\ for a single backslash and \\n for the output \n.
Docs here, also see this SO question
Alternatively, you can use "raw" strings, i.e. prefix your strings with an r, to disable the interpretation of escape sequences in your strings:
sys.stdout.write(r"abc\ndef")
As an alternative to escaping the backslash, you can disable backslash-escaping entirely by using a raw string literal:
>>> print(r"abc\ndef")
abc\ndef
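Both approaches produce the same eight-character string, which the sketch below checks. It assumes Python 3; an in-memory io.StringIO stands in for sys.stdout so the written text can be inspected.

```python
import io

escaped = "abc\\ndef"  # doubled backslash in a regular literal
raw = r"abc\ndef"      # raw literal, backslash kept as written
cooked = "abc\ndef"    # regular literal, \n becomes a newline

print(escaped == raw)         # True
print(len(raw), len(cooked))  # 8 7

# Writing to an in-memory stream stands in for sys.stdout here.
buffer = io.StringIO()
buffer.write(raw)
print(buffer.getvalue())      # abc\ndef
```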

Can '\' be in a Python string? [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 8 years ago.
I program in Python in PyCharm and whenever I write '\' as a string it says that the following statements do nothing. For example:
Is there a way to fix this and make it work?
Thanks.
You need to double the backslash:
'/-\\'
as a single backslash has special meaning in a Python string as the start of an escape sequence. A double \\ results in the string containing a single backslash:
>>> print('/-\\')
/-\
If the backslash is not the last character in the string, you could use an r'' raw string as well:
>>> print(r'\-/')
\-/
You need to escape them to be in the string, for example:
>>> s = '\\'
>>> print(s)
\
You can also use the r (raw string) prefix in front of the string to include them easily, but a raw string can't end with an odd number of backslashes. You can read more about string literals in the docs.
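The points from both answers can be put together in a short sketch, assuming Python 3. The variable names are illustrative only.

```python
single = '\\'
print(len(single))  # 1

art = '/-\\'
print(art)          # /-\
print(len(art))     # 3

# A raw string works when the backslash is not the last character.
print(r'\-/')       # \-/

# For a trailing backslash, join a raw part with a regular '\\'.
joined = r'/-' + '\\'
print(joined == art)  # True
```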
