Python escape sequence complex output - python

When I run the following code in Python IDLE, the output contains extra quotes. I want to know why it produces this output.
x='''''abc\'abcddd'''''
print x
This is the output of the code above:
''abc'abcddd

It is due to Python's triple-quoted strings:
''' '''
Python treats everything between the opening and closing triple quotes as ordinary characters. So in your string:
'''''abc\'abcddd'''''
The first three quotes open the string. Then Python encounters 2 quotes, which it interprets as characters. Next it encounters an escaped quote; an unescaped quote would be printed the same way here, but the escape is still processed. It then encounters the first 3 of the last 5 quotes, which end the triple-quoted string. The remaining 2 quotes form an empty string '', which is concatenated onto the first string.
Here is the same line with a space inserted between the parts that Python treats as one unit:
''' ''abc\'abcddd ''' ''
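The split described above can be checked with the standard tokenize module. This is a minimal sketch, assuming Python 3; the names source, string_tokens and two_literals are made up for the illustration.

```python
import io
import tokenize

# The line from the question, held as source text. The doubled backslash
# here produces the single backslash that appears in the original code.
source = "x = '''''abc\\'abcddd'''''\n"

# Collect only the string-literal tokens found on that line.
string_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.STRING
]
print(string_tokens)

# Adjacent string literals are concatenated, so both spellings are equal.
x = '''''abc\'abcddd'''''
two_literals = '''''abc\'abcddd''' ''
print(x == two_literals)  # True
print(x)                  # ''abc'abcddd
```

The tokenizer reports two string literals: the triple-quoted one and the empty string that follows it.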

Related

How to use "\newline" in python [duplicate]

I found the sequence \newline in a list of escape sequences in the python documentation. I wonder how it is used and for what. At least in my interpreter it seems this is just interpreted as '\n' + 'ewline':
>>> print('\newline')
ewline
It refers to the actual newline character - the one with character code "10" (0x0a) - not the text sequence "newline".
So, an example is like:
print("a\
b")
Here, the backslash is followed by a newline inside a string, and what is printed is just "ab" with nothing in between.
This differs from \n: there, the character following the backslash is n (0x6e), and the sequence is translated to \x0a when the string is parsed. With \<newline>, the source string contains the \x0a character, and the backslash and newline together are replaced by an empty string.
Maybe the documentation on that page would be clearer if it read \<newline> instead of just \newline.
The documentation you are alluding to is explaining how a backslash followed by a literal newline is ignored, as if the next line were physically joined with the line on which the starting backslash was found.
The string '\newline' has no special meaning; it is exactly what you say you think it is.
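The three cases discussed in these answers can be compared side by side. This is a small sketch, assuming Python 3; the variable names are chosen only for the illustration.

```python
# Backslash immediately followed by a real line break: both are dropped.
joined = "a\
b"

# Backslash followed by the letter n: replaced by one newline character.
separated = "a\nb"

# "\newline" typed out as text is just "\n" followed by "ewline".
spelled_out = "\newline"

print(repr(joined))       # 'ab'
print(repr(separated))    # 'a\nb'
print(repr(spelled_out))  # '\newline'
```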

python 3: quoting result of random string generation

I'm new to python and things do not always work as I expect... but I am learning, slowly. Here is a case in point. If I randomly create a string via:
thing = ''.join([
    random.SystemRandom().choice(
        "{}{}{}".format(
            string.ascii_letters, string.digits, string.punctuation
        )
    ) for i in range(63)
])
then I could end up with a string with single quotes as well as backslashes. I assume that I should then go through the string and quote the possibly problematic characters. So, for example: if I generate the (short) string:
cs]b77e\IM>&4/,u.s_jr"xmMdHD7a'wrEw(
my instinct tells me that I should quote that into:
cs]b77e\\IM>&4/,u.s_jr"xmMdHD7a\'wrEw(
It looks like the string.replace() method is my friend...
thing = ''.join([
    random.SystemRandom().choice(
        "{}{}{}".format(
            string.ascii_letters, string.digits, string.punctuation
        )
    ) for i in range(63)
]).replace('\\', '\\').replace('\'', '\'')
but is there a better way?
Also, in the replace() methods the meaning of the single quoted strings seems to change depending on context. Coming from Perl this seems strange to me. My initial attempts had me doing things like replace('\\', '\\\\') thinking that I had to quote the characters going into the replacement string. Is this normal or am I missing something else?
Edit
My goal here is to end up with 63 characters in a string. I don't really think that I have to quote any generated single quotes but my thought is that if I later use the string and it has generated backslashes then the next character after the backslash would act like it was quoted, right? I mean:
len('1234')
yields 4 but
len('12\4')
yields 3 so I need to post-process the generated string to at least quote the backslashes, right? Is there a better way to quote problematic characters than a chain of replaces() methods?
A string can contain any valid characters; the quotes and backslashes are only useful or special when representing a string in Python code. So you don't normally need to do anything like this when you already have a string which contains the characters you want.
If you want a representation which can be parsed by Python (e.g. by writing it to a .py file), repr() does that.
You don't have to escape characters unless they are part of code you are writing or come from user input. If the backslash character or a quote character is generated by a Python program, then it is already stored as that character in memory. There is no need to do any additional escaping.
Why? Because Python is not interpreting a string literal; it is simply generating characters, which are stored as numbers in memory. When you ask Python to display the repr of a string containing a single quote or a backslash (for example by evaluating it at the interactive prompt), it escapes those characters in the displayed form automatically.
Here is an example. A double quote is character 34, a single quote is character 39, and a backslash is character 92.
'a'+chr(34)+'b'+chr(39)+'c'+chr(92)+'d'
# returns:
'a"b\'c\\d'
Because I included both a double quote and a single quote, Python uses single quotes to surround the string and shows an unescaped double quote, an escaped single quote, and an escaped backslash.
So there is no need to escape characters that are generated within a Python program; Python does it for you when it displays them.
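To make the point about generated strings concrete, here is a minimal sketch, assuming Python 3. The names alphabet, rng, literal and round_tripped are illustrative only, and ast.literal_eval is used simply as a way to show that the repr() output parses back to the same value.

```python
import ast
import random
import string

alphabet = string.ascii_letters + string.digits + string.punctuation
rng = random.SystemRandom()
thing = ''.join(rng.choice(alphabet) for _ in range(63))

# Every generated character counts as exactly one character,
# backslashes and quotes included.
print(len(thing))  # 63

# Escapes only matter in source code: '12\4' is the literal for '12' + chr(4).
print(len('12\4'))                # 3
# A real backslash built at run time is one ordinary character.
print(len('12' + chr(92) + '4'))  # 4

# repr() gives a literal that Python can parse back into the same string.
literal = repr(thing)
round_tripped = ast.literal_eval(literal)
print(round_tripped == thing)  # True
```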

Understanding file locations in python - unexpected errors

I am learning Python 3.3 on Windows 7. I have two text files, lines.txt and raven.txt, in a folder. Both contain the same text for the first example.
When I try to access raven.txt using the code below, I get this error:
OSError: [Errno 22] Invalid argument: 'C:\\Python\raven.txt'
I know that the above error can be fixed by using an escape character like this -
C:\\Python\\raven.txt
C:\Python\\raven.txt
Why do both methods work? Strangely, when I access lines.txt in the same folder, I get no error! Why?
import re
def main():
    print('')
    fh = open('C:\Python\lines.txt')
    for line in fh:
        if re.search('(Len|Neverm)ore', line):
            print(line, end = '')
if __name__ == '__main__':main()
Also, when I use the line below, I get a completely different error: TypeError: embedded NUL character. Why?
fh = open('C:\Python\Exercise Files\09 Regexes\raven.txt')
I can rectify this by using \ before every \ in the file path.
\r is an escape sequence (carriage return), but \l is not. So \lines keeps its backslash and is read as \lines, while \raven is read as a single carriage-return character followed by aven.
In [1]: len('\l')
Out[1]: 2
In [2]: len('\r')
Out[2]: 1
You should always escape backslashes with \\. If your string doesn't contain quotes, you can also use raw strings:
In [9]: len(r'\r')
Out[9]: 2
In [10]: r'\r'
Out[10]: '\\r'
See: https://docs.python.org/3/reference/lexical_analysis.html
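The paths from the question can be examined without touching the file system. This is a sketch assuming Python 3. The \P in the original path is written as \\P here only to avoid an invalid-escape warning in newer Python versions; that does not change the resulting string. The '\09' case is presumably where the "embedded NUL character" error comes from, since \0 is an octal escape for the NUL character.

```python
# '\r' is a recognised escape (carriage return), so this path is damaged.
broken = 'C:\\Python\raven.txt'
print(repr(broken))  # 'C:\\Python\raven.txt', as in the OSError message

# '\0' is an octal escape for the NUL character; the 9 after it is kept.
nul_then_nine = '\09'
print(len(nul_then_nine), ord(nul_then_nine[0]))  # 2 0

# Doubling every backslash, or using a raw string, gives the intended path.
escaped = 'C:\\Python\\raven.txt'
raw = r'C:\Python\raven.txt'
print(escaped == raw)  # True
```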
Maybe you can use a raw string, like this: open(r'C:\Python\Exercise Files\09 Regexes\raven.txt').
When an 'r' or 'R' prefix is present, backslashes are still used to
quote the following character, but all backslashes are left in the
string. For example, the string literal r"\n" consists of two
characters: a backslash and a lowercase 'n'. String quotes can be
escaped with a backslash, but the backslash remains in the string; for
example, r"\"" is a valid string literal consisting of two characters:
a backslash and a double quote; r"\" is not a valid string literal
(even a raw string cannot end in an odd number of backslashes).
Specifically, a raw string cannot end in a single backslash (since the
backslash would escape the following quote character). Note also that
a single backslash followed by a newline is interpreted as those two
characters as part of the string, not as a line continuation.
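The rules in the quoted passage can be tried out directly. This is a minimal sketch, assuming Python 3; compile() is used only so that the invalid literal can be tested without breaking the script itself, and the variable names are illustrative.

```python
# r"\n" keeps both characters: a backslash and the letter n.
print(len(r"\n"))  # 2

# The backslash stops the quote from ending the literal,
# but it stays in the string.
print(len(r"\""))  # 2

# A raw string literal ending in a single backslash does not compile.
try:
    compile('r"\\"', "<demo>", "eval")
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
print(ends_in_backslash_ok)  # False

# Backslash-newline inside a raw string is kept as two characters.
kept = r"a\
b"
print(len(kept))  # 4
```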
You can actually use forward slashes instead of backslashes. That way you don't have to escape them at all, which saves a lot of headaches. For example: 'C:/Python/raven.txt'. I can guarantee that it works on Windows.
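One way to see that the two spellings name the same Windows path is the standard pathlib module. The answer above does not mention pathlib; it is used here only as an illustration, assuming Python 3.4 or later. PureWindowsPath applies Windows path rules on any operating system and never touches the disk.

```python
from pathlib import PureWindowsPath

forward = PureWindowsPath('C:/Python/raven.txt')
backward = PureWindowsPath(r'C:\Python\raven.txt')

print(forward == backward)  # True
print(forward.name)         # raven.txt
print(str(forward))         # C:\Python\raven.txt
```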

So what is the story with 4 quotes?

I was experimenting in the python shell with the type() operator. I noted that:
type('''' string '''')
raises an error about trouble scanning the string,
yet:
type(''''' string ''''')
works fine and responds that a string was found.
What is going on? Does it have to do with the fact that type('''' string '''') is interpreted as type("" "" string "" "") and is therefore a meaningless concatenation of empty strings and an undefined variable?
You are ending a string with 3 quotes, plus one extra. This works:
>>> ''''string'''
"'string"
In other words, Python sees 3 quotes, then the string ends at the next 3 quotes. Anything that follows after that is not part of the string anymore.
Python also concatenates strings that are placed one after the other:
>>> 'foo' 'bar'
'foobar'
so '''''string''''' means '''''string''' + '' really; the first string starts right after the opening 3 quotes until it finds 3 closing quotes. Those three closing quotes are then followed by two more quotes forming a separate but empty string:
>>> '''''string'''
"''string"
>>> '''''string'''''
"''string"
>>> '''''string'''' - extra extra! -'
"''string - extra extra! -"
Moral of the story: Python only supports triple or single quoting. Anything deviating from that can only lead to pain.
Your supposition seems to be correct, given the following:
a = '''' string ''''
File "<stdin>", line 1
a = '''' string ''''
^
SyntaxError: EOL while scanning string literal
As Martijn says in his answer, Python is trying to concatenate adjacent strings, and fails when it doesn't find the ending '.
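The three quote counts from this thread can be compared in one go. This is a sketch assuming Python 3; parse_literal is a hypothetical helper written for this illustration, and ast.literal_eval is used so that the failing literal raises inside a try block instead of stopping the script.

```python
import ast

def parse_literal(source):
    """Return the value of a string literal, or 'SyntaxError' if it does not parse."""
    try:
        return ast.literal_eval(source)
    except SyntaxError:
        return 'SyntaxError'

# Four quotes each side: the text ends with one unmatched quote.
four = parse_literal("'''' string ''''")
# Five quotes each side: a triple-quoted string followed by an empty string.
five = parse_literal("''''' string '''''")
# Four quotes, then three: the fourth quote is part of the string.
four_then_three = parse_literal("''''string'''")

print(four)                   # SyntaxError
print(repr(five))             # "'' string "
print(repr(four_then_three))  # "'string"
```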

Slash replacement inside a raw string

Just a simple question concerning raw string, regex pattern and replacement:
I have a string variable defined as follow:
> print repr(foo)
'\n\t\t\n\t\tIf (GUTIAttach>=1) //In case of GUTI attach Enodeb should not ask RRCUecapa again\n\t\tUECapInfo;//Mps("( \\"rat_Type\\":0 \\"ueCapabilitiesRAT_Container\\":hex:011c0000000080 )");
My problem is the characters "(" and ")". I want to replace them with "\(" and "\)" inside the raw string, because it will later be used as a regular expression pattern.
I tried to use this method:
foo_tmp= [inc.replace(')', '\)') for inc in foo]
foo_tmp= [inc.replace('(', '\(') for inc in foo_tmp]
foo = "".join(foo_tmp)
the result gives:
> print repr(foo)
'\n\t\t\n\t\tIf \\(GUTIAttach>=1\\) //In case of GUTI attach Enodeb should not ask RRCUecapa again\n\t\t{\n\t\t\tUECapInfo;//Mps\\("\\( \\"rat_Type\\":0 \\"ueCapabilitiesRAT_Container\\":hex:011c0000000080 \\)"\\);
Characters "(" and ")" have been replaced by "\\(" and "\\)" instead of "\(" and "\)".
That's a bit unexpected for me, so do you know how I can proceed to get just a single slash without changing the other part of the string?
Note: The method .decode('string_escape') is also not working due to the rest of string. Double slashes already present in the original raw string must not change.
Thanks a lot for your help
Use the re.escape() function to escape regular expression meta characters for you.
What you are seeing is otherwise perfectly normal Python behaviour: you are looking at a Python literal representation, so the output can be pasted back into a Python interpreter to recreate the value. As such, anything that could be interpreted as an escape code is escaped for you; a single \ is doubled to prevent it being interpreted as the start of an escape sequence:
>>> '\('
'\\('
>>> print '\\('
\(
You can see this at work in other places in your foo string; the \n character combination represents a newline character, not two separate characters \ and n. If you wanted to include a literal \ and n in the text, you'd have to double the backslash to \\n. Further on into the value of foo you'll find \\", which is a single backslash followed by a " quote.
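Both points in this answer can be checked in a short sketch, assuming Python 3 (so print is a function, unlike the Python 2 session above). The text value is a made-up, shortened stand-in for the foo string in the question.

```python
import re

# Hypothetical shortened sample containing parentheses and quotes.
text = 'If (GUTIAttach>=1) //Mps("( x )");'

# re.escape() backslash-escapes the regex metacharacters,
# so the resulting pattern matches the text literally.
pattern = re.escape(text)
print(re.fullmatch(pattern, text) is not None)  # True

# A backslash followed by "(" is two characters; repr() shows the
# backslash doubled, while print() shows the characters themselves.
single = '\\('
print(len(single))   # 2
print(repr(single))  # '\\('
print(single)        # \(
```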
