Update for clarification
I have to replicate the functionality from a server. One of the responses of this old server is the one seen here http://test.muchticket.com/api/?token=carlos&method=ventas&ESP=11, except that the double slashes should be single ones.
End of update
Update No.2 for clarification
This variable then goes into a dictionary, which is dumped to an HttpResponse like this:
return HttpResponse(json.dumps(response_data,sort_keys=True), content_type="application/json")
I hate my job.
End of update
I need to store 'http:\/\/shop.muchticket.com\/' in a variable and then save it in a dictionary. I have tried several different methods, but none of them seems to work. Here are some examples of what I've tried:
url = 'http:\/\/shop.muchticket.com\/'
print url
>> http:\\/\\/shop.muchticket.com\\/
With raw
url = r'http:\/\/shop.muchticket.com\/'
print url
>> http:\\/\\/shop.muchticket.com\\/
With the escape character
url = 'http:\\/\\/shop.muchticket.com\\/'
print url
>> http:\\/\\/shop.muchticket.com\\/
Raw and escape character
url = r'http:\\/\\/shop.muchticket.com\\/'
print url
>> http:\\\\/\\\\/shop.muchticket.com\\\\/
Escape character and decode
url = 'http:\\/\\/shop.muchticket.com\\/'
print url.decode('string_escape')
>> http:\\/\\/shop.muchticket.com\\/
Decode only
url = 'http:\/\/shop.muchticket.com\/'
print url.decode('string_escape')
>> http:\\/\\/shop.muchticket.com\\/
The best way is not to use any escape sequences
>>> s = 'http://shop.muchticket.com/'
>>> s
'http://shop.muchticket.com/'
>>> print(s)
http://shop.muchticket.com/
Unlike "other" languages, you do not need to escape the forward slash (/) in Python!
If you need the backslash before each forward slash, then
>>> s = 'http:\/\/shop.muchticket.com\/'
>>> print(s)
http:\/\/shop.muchticket.com\/
Note: When you just type s in the interpreter, it shows the repr output, which is why the backslash appears escaped (doubled)
>>> s
'http:\\/\\/shop.muchticket.com\\/' # repr output; the string itself holds single backslashes
>>> print(repr(s))
'http:\\/\\/shop.muchticket.com\\/'
Therefore, a single \ is enough to store it in a variable.
As J F S says,
To avoid ambiguity, either use raw string literals or escape the
backslashes if you want a literal backslash in the string.
Thus your string would be
s = 'http:\\/\\/shop.muchticket.com\\/' # Escape the \ literal
s = r'http:\/\/shop.muchticket.com\/' # Make it a raw string
If you need two characters in the string, the backslash (REVERSE SOLIDUS) and the forward slash (SOLIDUS), then all of the following Python string literals produce the same string object:
>>> '\/' == r'\/' == '\\/' == '\x5c\x2f'
True
>>> len(r'\/') == 2
True
The preferable way to write it is: r'\/' or '\\/'.
The reason is that the backslash is a special character in a string literal (something that you write in Python source code (usually by hand)) if it is followed by certain characters e.g., '\n' is a single character (newline) and '\\' is also a single character (the backslash). But '\/' is not an escape sequence and therefore it is two characters. To avoid ambiguity, use raw string literals r'\/' where the backslash has no special meaning.
The REPL calls repr on a string to print it:
>>> r'\/'
'\\/'
>>> print r'\/'
\/
>>> print repr(r'\/')
'\\/'
repr() shows you the Python string literal (how you would write it in Python source code). '\\/' is a two-character string, not three. Don't confuse a string literal, which is used to create a string, with the string object itself.
And to test the understanding:
>>> repr(r'\/')
"'\\\\/'"
It shows the representation of the representation of the string.
For Python 2.7.9, ran:
url = "http:\/\/shop.muchticket.com\/"
print url
With the result of:
>> http:\/\/shop.muchticket.com\/
What is the version of Python you are using? From Bhargav Rao's answer, it seems that it should work in Python 3.X as well, so maybe it's a case of some weird imports?
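Since the update says the value ends up in json.dumps, the doubled backslashes may come from the JSON encoding step rather than from the Python string itself. The following is a minimal sketch under that assumption (Python 3 syntax; the dictionary key "url" is made up for illustration and is not taken from the original server):

```python
import json

# Hypothetical payload; the key name "url" is only for illustration.
data = {"url": "http://shop.muchticket.com/"}

# json.dumps leaves "/" alone, so the body contains plain slashes.
plain = json.dumps(data, sort_keys=True)

# A backslash that is already in the Python string gets escaped by json.dumps,
# which is one way a doubled backslash can end up in the response.
doubled = json.dumps({"url": r"http:\/\/shop.muchticket.com\/"}, sort_keys=True)

# "\/" is a valid JSON escape for "/", so rewriting the encoded text gives
# single backslashes and still decodes to the same data.
escaped = plain.replace("/", "\\/")

print(plain)
print(doubled)
print(escaped)
```

If the old server's output has to be matched byte for byte, the escaped text is the candidate to pass to HttpResponse; a JSON client decodes it to the same URL either way.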
Related
Is there a way to declare a string variable in Python such that everything inside of it is automatically escaped, or has its literal character value?
I'm not asking how to escape the quotes with slashes, that's obvious. What I'm asking for is a general purpose way for making everything in a string literal so that I don't have to manually go through and escape everything for very large strings.
Raw string literals:
>>> r'abc\dev\t'
'abc\\dev\\t'
If you're dealing with very large strings, specifically multiline strings, be aware of the triple-quote syntax:
a = r"""This is a multiline string
with more than one line
in the source code."""
There is no such thing. It looks like you want something like "here documents" in Perl and the shells, but Python doesn't have that.
Using raw strings or multiline strings only means that there are fewer things to worry about. If you use a raw string then you still have to work around a terminal "\" and with any string solution you'll have to worry about the closing ", ', ''' or """ if it is included in your data.
That is, there's no way to have the string
' ''' """ " \
properly stored in any Python string literal without internal escaping of some sort.
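For the terminal backslash in particular, one common workaround is to rely on adjacent string literals being joined into a single string. A small sketch (the path is invented for illustration):

```python
# A raw literal cannot end in an odd number of backslashes, so the final
# backslash is written as a separate, normally escaped literal.
# Adjacent string literals are concatenated into one string.
folder = r'C:\new\table' '\\'

print(folder)
print(len(folder))
```

The raw part keeps \n and \t as two characters each, and only the last backslash needs manual escaping.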
You will find Python's string literal documentation here:
http://docs.python.org/tutorial/introduction.html#strings
and here:
http://docs.python.org/reference/lexical_analysis.html#literals
The simplest example would be using the 'r' prefix:
ss = r'Hello\nWorld'
print(ss)
Hello\nWorld
(Assuming you are not required to input the string from directly within Python code)
To get around the issue Andrew Dalke pointed out, simply type the literal string into a text file and then read it back:
input_ = '/directory_of_text_file/your_text_file.txt'
# the with block closes the file even if reading fails
with open(input_, 'r') as input_open:
    input_string = input_open.read()
print(input_string)
This will print the literal text of whatever is in the text file, even if it is:
' ''' """ " \
Not fun or optimal, but can be useful, especially if you have 3 pages of code that would’ve needed character escaping.
Use print and repr:
>>> s = '\tgherkin\n'
>>> s
'\tgherkin\n'
>>> print(s)
gherkin
>>> repr(s)
"'\\tgherkin\\n'"
# print(repr(..)) gets literal
>>> print(repr(s))
'\tgherkin\n'
>>> repr('\tgherkin\n')
"'\\tgherkin\\n'"
>>> print('\tgherkin\n')
gherkin
>>> print(repr('\tgherkin\n'))
'\tgherkin\n'
Sometimes when I get input from a file or the user, I get a string with escape sequences in it. I would like to process the escape sequences in the same way that Python processes escape sequences in string literals.
For example, let's say myString is defined as:
>>> myString = "spam\\neggs"
>>> print(myString)
spam\neggs
I want a function (I'll call it process) that does this:
>>> print(process(myString))
spam
eggs
It's important that the function can process all of the escape sequences in Python (listed in a table in the link above).
Does Python have a function to do this?
The correct thing to do is use the 'string-escape' codec to decode the string.
>>> myString = "spam\\neggs"
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape") # python3
>>> decoded_string = myString.decode('string_escape') # python2
>>> print(decoded_string)
spam
eggs
Don't use the AST or eval. Using the string codecs is much safer.
unicode_escape doesn't work in general
It turns out that the string_escape or unicode_escape solution does not work in general -- particularly, it doesn't work in the presence of actual Unicode.
If you can be sure that every non-ASCII character will be escaped (and remember, anything beyond the first 128 characters is non-ASCII), unicode_escape will do the right thing for you. But if there are any literal non-ASCII characters already in your string, things will go wrong.
unicode_escape is fundamentally designed to convert bytes into Unicode text. But in many places -- for example, Python source code -- the source data is already Unicode text.
The only way this can work correctly is if you encode the text into bytes first. UTF-8 is the sensible encoding for all text, so that should work, right?
The following examples are in Python 3, so that the string literals are cleaner, but the same problem exists with slightly different manifestations on both Python 2 and 3.
>>> s = 'naïve \\t test'
>>> print(s.encode('utf-8').decode('unicode_escape'))
naÃ¯ve   test
Well, that's wrong.
The new recommended way to use codecs that decode text into text is to call codecs.decode directly. Does that help?
>>> import codecs
>>> print(codecs.decode(s, 'unicode_escape'))
naÃ¯ve   test
Not at all. (Also, the above is a UnicodeError on Python 2.)
The unicode_escape codec, despite its name, turns out to assume that all non-ASCII bytes are in the Latin-1 (ISO-8859-1) encoding. So you would have to do it like this:
>>> print(s.encode('latin-1').decode('unicode_escape'))
naïve test
But that's terrible. This limits you to the 256 Latin-1 characters, as if Unicode had never been invented at all!
>>> print('Ernő \\t Rubik'.encode('latin-1').decode('unicode_escape'))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151'
in position 3: ordinal not in range(256)
Adding a regular expression to solve the problem
(Surprisingly, we do not now have two problems.)
What we need to do is only apply the unicode_escape decoder to things that we are certain to be ASCII text. In particular, we can make sure only to apply it to valid Python escape sequences, which are guaranteed to be ASCII text.
The plan is, we'll find escape sequences using a regular expression, and use a function as the argument to re.sub to replace them with their unescaped value.
import re
import codecs
ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)
def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)
And with that:
>>> print(decode_escapes('Ernő \\t Rubik'))
Ernő Rubik
The actually correct and convenient answer for python 3:
>>> import codecs
>>> myString = "spam\\neggs"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
spam
eggs
>>> myString = "naïve \\t test"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
naïve test
Details regarding codecs.escape_decode:
codecs.escape_decode is a bytes-to-bytes decoder
codecs.escape_decode decodes ascii escape sequences, such as: b"\\n" -> b"\n", b"\\xce" -> b"\xce".
codecs.escape_decode does not care or need to know about the byte object's encoding, but the encoding of the escaped bytes should match the encoding of the rest of the object.
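The points above can be illustrated with a short sketch. It assumes Python 3 and relies on codecs.escape_decode, which is undocumented, so treat it as an illustration rather than a guaranteed API:

```python
import codecs

# escape_decode returns a tuple; the decoded bytes are the first element.
newline = codecs.escape_decode(b"\\n")[0]
print(newline)

# The input mixes a literal UTF-8 encoded character with escaped UTF-8 bytes.
# Both use the same encoding, so one final decode("utf-8") handles them together.
raw = "naïve \\xc3\\xa9".encode("utf-8")
decoded_bytes = codecs.escape_decode(raw)[0]
text = decoded_bytes.decode("utf-8")
print(text)
```

Had the escaped bytes been written in a different encoding than the surrounding text, the final decode step would fail or produce garbled characters.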
Background:
@rspeer is correct: unicode_escape is the incorrect solution for python3. This is because unicode_escape decodes escaped bytes, then decodes bytes to unicode string, but receives no information regarding which codec to use for the second operation.
@Jerub is correct: avoid the AST or eval.
I first discovered codecs.escape_decode from this answer to "how do I .decode('string-escape') in Python3?". As that answer states, that function is currently not documented for python 3.
The ast.literal_eval function comes close, but it will expect the string to be properly quoted first.
Of course Python's interpretation of backslash escapes depends on how the string is quoted ("" vs r"" vs u"", triple quotes, etc) so you may want to wrap the user input in suitable quotes and pass to literal_eval. Wrapping it in quotes will also prevent literal_eval from returning a number, tuple, dictionary, etc.
Things still might get tricky if the user types unquoted quotes of the type you intend to wrap around the string.
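A minimal sketch of the quote-wrapping idea, including the failure mode just mentioned. The helper name is hypothetical, and the sketch makes no attempt to repair embedded quotes or a trailing backslash:

```python
import ast

def unescape_with_literal_eval(text):
    """Hypothetical helper: wrap text in double quotes and parse it as a literal."""
    return ast.literal_eval('"' + text + '"')

print(unescape_with_literal_eval('spam\\neggs'))

# Because of the wrapping quotes, digits come back as a string, not a number.
print(repr(unescape_with_literal_eval('12')))

# An unescaped double quote in the input ends the literal early and is rejected.
try:
    unescape_with_literal_eval('say "hi"')
    rejected = False
except (SyntaxError, ValueError):
    rejected = True
print(rejected)
```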
The (currently) accepted answer by Jerub is correct for python2, but incorrect and may produce garbled results (as Apalala points out in a comment to that solution), for python3. That's because the unicode_escape codec requires its source to be coded in latin-1, not utf-8, as per the official python docs. Hence, in python3 use:
>>> myString="špåm\\nëðþ\\x73"
>>> print(myString)
špåm\nëðþ\x73
>>> decoded_string = myString.encode('latin-1','backslashreplace').decode('unicode_escape')
>>> print(decoded_string)
špåm
ëðþs
This method also avoids the extra unnecessary roundtrip between strings and bytes in metatoaster's comments to Jerub's solution (but hats off to metatoaster for recognizing the bug in that solution).
This is a bad way of doing it, but it worked for me when trying to interpret escaped octals passed in a string argument.
input_string = eval('b"' + sys.argv[1] + '"')
It's worth mentioning that there is a difference between eval and ast.literal_eval (eval being way more unsafe). See Using python's eval() vs. ast.literal_eval()?
Quote the string properly so that it looks like the equivalent Python string literal, and then use ast.literal_eval. This is safe, but much trickier to get right than you might expect.
It's easy enough to add a " to the beginning and end of the string, but we also need to make sure that any " inside the string are properly escaped. If we want fully Python-compliant translation, we need to account for the deprecated behaviour of invalid escape sequences.
It works out that we need to add one backslash to:
any sequence of an even number of backslashes followed by a double-quote (so that we escape a quote if needed, but don't escape a backslash and un-escape the quote if it was already escaped); as well as
a sequence of an odd number of backslashes at the end of the input (because otherwise a backslash would escape our enclosing double-quote).
Here is an acid-test input showing a bunch of difficult cases:
>>> text = r'''\\ \ \" \\" \\\" \'你好'\n\u062a\xff\N{LATIN SMALL LETTER A}"''' + '\\'
>>> text
'\\\\ \\ \\" \\\\" \\\\\\" \\\'你好\'\\n\\u062a\\xff\\N{LATIN SMALL LETTER A}"\\'
>>> print(text)
\\ \ \" \\" \\\" \'你好'\n\u062a\xff\N{LATIN SMALL LETTER A}"\
I was eventually able to work out a regex that handles all these cases properly, allowing literal_eval to be used:
>>> import ast, re
>>> def parse_escapes(text):
...     fixed_escapes = re.sub(r'(?<!\\)(\\\\)*("|\\$)', r'\\\1\2', text)
...     return ast.literal_eval(f'"{fixed_escapes}"')
...
Testing the results:
>>> parse_escapes(text)
'\\ \\ " \\" \\" \'你好\'\nتÿa"\\'
>>> print(parse_escapes(text))
\ \ " \" \" '你好'
تÿa"\
This should correctly handle everything - strings containing both single and double quotes, every weird situation with backslashes, and non-ASCII characters in the input. (I admit it's a bit difficult to verify the results by eye!)
If only \n needs to be handled, the code below turns each escaped \n in the string into an actual newline.
our_str = 'The String is \\n, \\n and \\n!'
# replace every two-character sequence (backslash, n) with a real newline
new_str = our_str.replace('\\n', '\n')
print(new_str)
I knew that I could get unicode characters using the escape sequence, like this:
>>> print "\3"
♥
and I just wanted to look through the available ASCII characters, so I wrote this:
for i in xrange(1, 99):
    print "\%o" % i
and it prints "\1", "\2", "\3", etc., so not unicode characters. I then tried it using %s, %r, and %d and none of those seem to work either.
It was much more interesting than seeing the available ASCII characters, so I started reading about string formatting and ended up with this working piece:
for i in xrange(1, 99):
    print "{:c}".format(i)
The question is: why wasn't the initial code working?
Escape sequences in string literals are processed at "parse time", not at "run time".
If you write
"\%o"
the Python parser sees a backslash followed by a percent sign. Because this is not a valid escape sequence, it keeps both characters and then adds o as a normal character. (In this respect Python differs from, for example, C++, where compilers such as GCC typically treat an unknown escape like \% as just the percent sign, so the string would be read as "%o".)
At run time, the formatting operator sees on its left side a string of three characters: a backslash followed by the %o sequence. Only the %o part is replaced by the right-hand side, giving, for example, the two-character string written "\\1" as a literal for the input value 1, and that string is displayed as \1.
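The difference between parse time and run time can be checked directly. This sketch uses Python 3 syntax and spells the backslash explicitly as "\\" so that the literal is unambiguous; it builds the same three-character string as the "\%o" in the question:

```python
# Three characters: backslash, percent sign, o.
template = "\\%o"
print(len(template))

# At run time only the %o placeholder is substituted; the backslash is ordinary data.
result = template % 1
print(result)
print(len(result))

# An escape written directly in a literal is resolved when the source is parsed,
# so "\1" is already the single character with code point 1.
print("\1" == chr(1))
```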
Python is interpreting \%o as 'literal backslash followed by a string formatting code'; \% doesn't mean anything in a python literal so the backslash is included literally.
You are looking for the chr() function:
for i in xrange(1, 99):
    print chr(i)
The \ character escapes only work in Python literals. You can instruct Python to interpret an arbitrary string containing a literal backslash plus a code as if it were a Python string literal by using the string_escape codec:
>>> print repr('\\n'.decode('string_escape'))
'\n'
Note that the proper way to specify a unicode literal is to use the \uxxxx format, and to use a unicode string literal:
>>> print u'\u2665'
♥
Raw bytes can also be generated using the \x00 escape sequence:
>>> print repr('\x12')
'\x12'
String literals in Python source code are interpreted during lexical analysis – the first step of source code processing the Python compiler performs. The escape sequences are parsed, and only the resulting string is stored in memory. This is why e.g.
>>> "A"
'A'
>>> "\x41"
'A'
result in exactly the same string. Escape sequences are not processed while actually printing the string, or while performing string formatting. Printing basically means to copy the contents of the string to the terminal. Formatting means to interpolate the % or {} placeholders with the desired contents. The rest of the string is left unchanged.
The result of the formatting operation
>>> "\%03o" % 65
'\\101'
is a string of four characters \101. (In the interactive interpreter, a representation of this string is shown; that's why you see the quotes and the doubled backslash.) The string literal "\101", on the other hand, is a string with only a single character, namely a capital A.
As pointed out by Martijn Pieters, you can explicitly request interpretation of escape sequences with the string_escape codec:
>>> ("\%03o" % 65).decode("string_escape")
'A'
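The string_escape codec exists only in Python 2. For this ASCII-only case, a rough Python 3 counterpart goes through bytes and the unicode_escape codec; it is a sketch for plain ASCII input, not a general solution for text containing non-ASCII characters:

```python
# Four characters: a backslash followed by the octal digits 101.
formatted = "\\%03o" % 65
print(formatted)

# Encode to ASCII bytes, then let unicode_escape interpret the octal escape.
decoded = formatted.encode("ascii").decode("unicode_escape")
print(decoded)
```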
When I try to get the content of a tag using "unicode(head.contents[3])", I get output similar to this: "Christensen Sk\xf6ld". I want the escape sequence to be returned as a string. How can I do this in Python?
Assuming Python sees the name as a normal string, you'll first have to decode it to unicode:
>>> name
'Christensen Sk\xf6ld'
>>> unicode(name, 'latin-1')
u'Christensen Sk\xf6ld'
Another way of achieving this:
>>> name.decode('latin-1')
u'Christensen Sk\xf6ld'
Note the "u" in front of the string, signalling that it is unicode. If you print this, the accented letter is shown properly:
>>> print name.decode('latin-1')
Christensen Sköld
BTW: when necessary, you can use the "encode" method to turn the unicode into e.g. a UTF-8 string:
>>> name.decode('latin-1').encode('utf-8')
'Christensen Sk\xc3\xb6ld'
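The session above is Python 2. In Python 3 the same steps start from a bytes object, since str is already Unicode there. A sketch, assuming the data really is Latin-1 encoded as in the answer above:

```python
# Latin-1 encoded bytes, as they might arrive from a file or socket.
raw = b'Christensen Sk\xf6ld'

# Decoding turns the bytes into text; 0xf6 is "ö" in Latin-1.
name = raw.decode('latin-1')
print(name)

# Encoding the text again, this time as UTF-8, gives a two-byte sequence for "ö".
utf8 = name.encode('utf-8')
print(utf8)
```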
I suspect that it's actually working correctly. The interactive interpreter displays the repr of a string, which escapes non-ASCII characters, since not all terminals support unicode. If you actually print the string, though, it should work. See the following example:
>>> u'\xcfa'
u'\xcfa'
>>> print u'\xcfa'
Ïa
Given a byte string with Unicode escapes b"\N{SNOWMAN}", b"\N{SNOWMAN}".decode('unicode-escape') will produce the expected Unicode string u'\u2603'.
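The same idea in Python 3 syntax, as a small sketch. The backslash is doubled in the bytes literal so that the escape stays as plain ASCII text until the codec interprets it:

```python
import unicodedata

# Eleven ASCII bytes: a backslash, N, and the braced character name.
escaped = b"\\N{SNOWMAN}"

# unicode_escape resolves the named escape into a single character.
text = escaped.decode("unicode_escape")
print(len(text))
print(unicodedata.name(text))
```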