Why does python allow escaping quotes in rawdata string? [duplicate] - python

This question already has answers here:
Why can't Python's raw string literals end with a single backslash?
(14 answers)
Closed 8 years ago.
I'm trying to understand why Python has this unheard-of behavior.
If I'm writing a raw string, it is much more likely that I won't want to escape quotes.
This behavior forces us to write this weird code:
s = r'something' + '\\' instead of just s = r'something\'
Any ideas why the Python developers found this more sensible?
EDIT:
I'm not asking why it is so. I'm asking what motivated this design decision, or whether anyone finds anything good in it.

The r prefix for a string literal doesn't disable escaping, it changes escaping so that the sequence \x (where x is any character) is "converted" to itself. So then, \' emits \' and your string is unterminated because there's no ' at the end of it.
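A quick way to see this is to inspect what a raw literal with an escaped quote actually contains. This is only a minimal sketch; the variable name `s` is illustrative.

```python
# In a raw string, a backslash before the quote keeps the quote from
# ending the literal, but the backslash itself stays in the result.
s = r'\''

print(len(s))       # 2
print(list(s))      # ['\\', "'"]
print(s == "\\'")   # True: same as backslash + quote in a normal literal
```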

The decision to disallow an unpaired ending backslash in a raw string is explained in this faq:
Raw strings were designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing. Such processors consider an unmatched trailing backslash to be an error anyway, so raw strings disallow that. In return, they allow you to pass on the string quote character by escaping it with a backslash. These rules work well when r-strings are used for their intended purpose.
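The trade-off described in the FAQ can be sketched with a small regular expression. The pattern and sample text below are made up for illustration; the point is only that `\'` survives the raw string unchanged and the regex engine then reads it as a literal quote.

```python
import re

# Raw-string pattern for a quoted word. The \' keeps the single quote from
# ending the Python literal, and the regex engine treats \' as a plain quote.
pattern = r'[\'"]\w+[\'"]'

text = 'say "hello" or \'bye\''
matches = re.findall(pattern, text)
print(matches)  # ['"hello"', "'bye'"]
```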

Related

what is the purpose of the \ because they do not affect the function of the code? [duplicate]

This question already has answers here:
What does a backslash by itself ('\') mean in Python? [duplicate]
(5 answers)
Closed 2 months ago.
I have a university assignment to create a noughts&crosses/tictactoe game from a given template and this below was part of it:
board = [['1','2','3'],\
['4','5','6'],\
['7','8','9']]
Whether those backslashes are present or not doesn't affect how the code works: the list gives the same printed output and behaves in the same way. So I'm just curious, what is their purpose?
The backslash escapes the next character. Since the next character is a newline, the Python parser works as if the newline weren't there. But a newline is completely harmless in this context, so it's distracting and pointless to do it here.
A different case is string concatenation:
foo = "bar"\
"baz"
Without the backslash this would be two lines of code, in this case resulting in an indentation error if the second line is indented. But since the newline has been escaped, it's the same as
foo = "bar" "baz"
and Python will concatenate those into a single string at compile time.
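The equivalence can be checked directly. A minimal sketch (the name `same` is introduced here just for the comparison):

```python
# Explicit line joining: the backslash removes the newline, leaving two
# adjacent string literals, which are merged into a single string.
foo = "bar"\
      "baz"

# The same adjacent-literal concatenation written on one line.
same = "bar" "baz"

print(foo)          # barbaz
print(foo == same)  # True
```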
NOTE
Although there are many escape sequences in string literals, this is the only escape allowed outside of them. It's called a "line continuation character" (or explicit line joining) because it suppresses normal line termination rules in the Python lexer.
Alternatively, there is also implicit line joining:
Expressions in parentheses, square brackets or curly braces can be
split over more than one physical line without using backslashes.
Since your list definition already follows implicit line joining rules, there is no need to add the explicit rule.
Their purpose is consistency: they let you join physical lines when balanced delimiters aren’t available, and it’s simpler to let them be used anywhere than to allow them only when no other option exists.
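To compare the two joining rules side by side, here is a small sketch. The names `board_explicit`, `board_implicit`, and `total` are illustrative and not part of the assignment template.

```python
# Explicit joining inside brackets: legal, but the backslashes add nothing.
board_explicit = [['1', '2', '3'],\
                  ['4', '5', '6'],\
                  ['7', '8', '9']]

# Implicit joining: the open bracket already lets the expression span lines.
board_implicit = [['1', '2', '3'],
                  ['4', '5', '6'],
                  ['7', '8', '9']]

print(board_explicit == board_implicit)  # True

# Outside any brackets, the backslash is what keeps this one statement.
total = 1 + \
        2
print(total)  # 3
```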

python regex escaping meta characters among delimiters [duplicate]

This question already has answers here:
Why can't Python parse this JSON data? [closed]
(3 answers)
Closed 4 years ago.
Python 2.4.4 (yeah, long story)
I want to parse this fragment (with re)
"comment":"#2 Surely, (this) can't be any [more] complicated a reg-ex?",
I.e., the comment can contain letters (upper or lower case), digits, hash, parentheses, square brackets, single quotes, and commas, and the fragment specifically ends with a double quote and a comma.
I've gotten this far with the expression:
r'\"comment\":\"(?P<COMMENT>[a-zA-Z0-9\s]+)\",'
But, of course, it only matches when none of the metacharacters are in the comment. The final \", works as the termination criterion. I've tried all kinds of escaping, double escaping ...
Could a kind 're geek' please enlighten me?
I want to access the "entire" comment as match.group("COMMENT").
Corrected the pattern to what I was actually using when I asked; my bad cut-n-paste.
Until this was marked with all the "DUPLICATES", I couldn't spell JSON. But I DID specify I had to do this with re.
Even with all the JSON responses and code fragments, the json module wasn't introduced until 2.6, and I did specify I'm still using 2.4.4.
Thanks to those responding with the regex-based solutions. Now working for me :)
Use a non-greedy .*? to match anything before ",, assuming that sequence marks the end of the comment:
import re
s = '''"comment":"#2 Surely, (this) can't be any [more] complicated a reg-ex?",'''
match = re.search(r'"comment":"(?P<comment>.*?)",', s)
print(match.group('comment'))
# #2 Surely, (this) can't be any [more] complicated a reg-ex?
You can name your matched string using (?P<group_name>…).
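Since `group` is a method, the captured text is read with a call. The sketch below shows a few equivalent ways to get at the named group and guards against a failed search; the names `by_name`, `by_index`, and `as_dict` are illustrative.

```python
import re

s = '''"comment":"#2 Surely, (this) can't be any [more] complicated a reg-ex?",'''
match = re.search(r'"comment":"(?P<comment>.*?)",', s)

# re.search returns None when nothing matches, so check before using it.
if match is not None:
    by_name = match.group('comment')   # access by group name
    by_index = match.group(1)          # same group, by position
    as_dict = match.groupdict()        # mapping of group names to text
    print(by_name == by_index == as_dict['comment'])  # True
```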

Do raw strings in python disable meta characters such as \w or \d just as they do with \n? [duplicate]

This question already has answers here:
Confused about backslashes in regular expressions [duplicate]
(3 answers)
Closed 4 years ago.
I am new to Python. Can someone tell me what is the difference between these two regex statements (re.findall(r"\d+","i am aged 35")) and (re.findall("\d+","i am aged 35")).
I had the understanding that the raw string in the first statement would make "\d+" inactive, because that is the primary role of a raw string: to make escape characters inactive. In other words, "\d+" would not be a metacharacter for finding/searching/matching digits if a raw string is used. However, I now see that both statements return the same result.
Both the Python parser and the regular expression parser handle escape sequences. This means that for any escape sequence that both engines support, you must either double the backslashes or use a raw string literal so the Python parser doesn't try to interpret the escape sequence.
In this case, \d has no meaning to Python, so the backslash is left in place for the re module to handle. So here specifically, there is no difference between the two snippets.
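The difference between an escape Python recognises and one it does not can be sketched as follows. The non-raw "\d" form is deliberately left out of the code, because newer Python 3 releases emit a warning for unrecognised escapes in ordinary literals, which is one more reason to prefer the raw form.

```python
import re

# "\n" is an escape Python recognises: the ordinary literal is one
# character (a newline), the raw literal is two (backslash and n).
print(len("\n"), len(r"\n"))  # 1 2

# The regex engine also understands the two-character sequence \n,
# so both spellings match a newline.
print(re.findall("\n", "a\nb") == re.findall(r"\n", "a\nb"))  # True

# r"\d" is a backslash followed by d, exactly what the regex engine needs.
pattern = r"\d+"
print(len(r"\d"))                           # 2
print(re.findall(pattern, "i am aged 35"))  # ['35']
```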
However, if you needed to match a literal backslash before other text like section in your regular expression, without raw strings, you'd have to use '\\\\section' to define the pattern! That's because the Python interpreter would see '\\section' as an escape sequence producing a single backslash, and then the regular expression parser sees the start of the escape sequence \s.
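The two spellings of that pattern can be compared directly. A minimal sketch, with a made-up sample string in `text`:

```python
import re

text = r'see \section for details'  # contains one literal backslash

raw_pattern = r'\\section'     # regex engine sees \\ (a literal backslash)
plain_pattern = '\\\\section'  # same pattern without a raw string

print(raw_pattern == plain_pattern)  # True

found = re.search(raw_pattern, text)
print(found.group(0))  # \section
```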
See the section on backslashes and raw string literals in the Python regular expression HOWTO.

Python: Trailing backslash in raw strings [duplicate]

This question already has answers here:
Why can't Python's raw string literals end with a single backslash?
(13 answers)
Closed 7 months ago.
The current Python grammar doesn't allow one to output a trailing \ in a raw string:
>>> print(r'a\b\c\')
SyntaxError: EOL while scanning string literal
On the contrary, you can write Bash like this:
echo 'a\b\c\'
I understand what the doc is saying. I wouldn't feel strange if an expression '\' fails because the backslash is escaping the quote. What I'm questioning is r'\': Aren't raw strings meant to be raw (which means backslashes in the string are taken literally)?
Do we have to write r'a\b\c' + '\\' or 'a\\b\\c\\' to make a string literal a\b\c\ in Python? I couldn't see how this is Pythonic.
From the documentation,
Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
The limitation is due to the fact that you need some way to include a ' inside a raw string. Otherwise there is no way to put bob said "I'm not hungry" in a string.
So you end up in the weird situation where you need an escape character for this case. So in raw strings you escape a ' with a \, and yes, the \ stays in the string.
So r'bob said "I\'m not hungry"' it is!!
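Both points can be checked in a few lines: the two workarounds from the question produce the same string, and the backslash used to escape the quote really does remain in the raw string. The variable names are illustrative.

```python
# Two ways to spell the text a\b\c\ (ending in a backslash).
wanted = 'a\\b\\c\\'      # ordinary literal, every backslash doubled
joined = r'a\b\c' + '\\'  # raw body plus an escaped trailing backslash

print(joined == wanted)   # True
print(len(wanted))        # 6

# The escaped quote keeps its backslash inside a raw string.
quote = r'bob said "I\'m not hungry"'
print('\\' in quote)      # True
```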
When you write print(r'\'), Python treats the \' in that statement as part of the string rather than as its end. Python therefore raises a syntax error, because the string literal inside the print call is never terminated.
For example, if you need to print i am "free" man, you should write
print("i am \"free\" man")

Python raw literal string [duplicate]

This question already has answers here:
Why can't Python's raw string literals end with a single backslash?
(14 answers)
Closed 7 months ago.
str = r'c:\path\to\folder\' # my comment
IDE: Eclipse
Python2.6
When the last character in the string is a backslash, it seems like it will escape the last single quote and treat my comment as part of the string. But the raw string is supposed to ignore all escape characters, right? What could be wrong? Thanks.
Raw string literals don't treat backslashes as initiating escape sequences except when the immediately-following character is the quote-character that is delimiting the literal, in which case the backslash does escape it.
The design motivation is that raw string literals really exist only for the convenience of entering regular expression patterns – that is all, no other design objective exists for such literals. And RE patterns never need to end with a backslash, but they might need to include all kinds of quote characters, whence the rule.
Many people do try to use raw string literals to enable them to enter Windows paths the way they're used to (with backslashes), but as you've noticed this use breaks down when you do need a path to end with a backslash. Usually, the simplest solution is to use forward slashes, which Microsoft's C runtime and all versions of Python support as totally equivalent in paths:
s = 'c:/path/to/folder/'
(side note: don't shadow builtin names, like str, with your own identifiers. It's a horrible practice, without any upside, and unless you get into the habit of avoiding it, one day you'll find yourself with a nasty-to-debug problem, when some part of your code tramples over a builtin name and another part needs to use the builtin name in its real meaning).
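If a backslash-terminated Windows path is genuinely needed, it can also be built without any literal ending in a backslash. The sketch below is one possible approach, not something the answer above prescribes; it uses the standard `ntpath` module so that Windows path rules apply on any platform, and the names `folder` and `with_sep` are illustrative.

```python
import ntpath  # Windows path rules, importable on any platform

folder = 'c:/path/to/folder/'

# normpath converts forward slashes to backslashes under Windows rules
# and drops the trailing separator.
print(ntpath.normpath(folder))  # c:\path\to\folder

# Joining with an empty final component appends a trailing separator,
# so no literal has to end in a backslash.
with_sep = ntpath.join(r'c:\path\to\folder', '')
print(with_sep)  # c:\path\to\folder\
```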
It's IMHO an inconsistency in Python, but it's described in the documentation. Go to the second last paragraph:
http://docs.python.org/reference/lexical_analysis.html#string-literals
r"\" is not a valid string literal
(even a raw string cannot end in an
odd number of backslashes)
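The odd/even rule can be tested without breaking the file that runs the test, by handing the literal to `compile` as source text. This is a sketch only; `is_valid_literal` is a hypothetical helper written for this example.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True

# The source text is spelled with ordinary escapes so this file stays valid.
one_backslash = 'r"\\"'      # a raw literal holding a single backslash
two_backslashes = 'r"\\\\"'  # a raw literal holding two backslashes

print(is_valid_literal(one_backslash))    # False
print(is_valid_literal(two_backslashes))  # True
print(len(eval(two_backslashes)))         # 2: both backslashes are kept
```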
