See this code:
my_src_str = '"""hello"""'
my_real_str = get_real_string_from_python_src_string(my_src_str)
Here my_src_str holds a string literal written in Python source code format. I want to interpret it as a real Python string, so that my_real_str ends up as hello. How can I do this?
>>> import ast
>>> my_src_str = '"""hello"""'
>>> ast.literal_eval(my_src_str)
'hello'
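A minimal sketch of wrapping that call, assuming you also want to reject input that is not a string literal. The helper name `parse_string_literal` is hypothetical and not part of `ast`:

```python
import ast


def parse_string_literal(src):
    """Return the str value of a Python string literal given as source text.

    Hypothetical helper for illustration; raises ValueError for anything
    that is not a plain string literal.
    """
    try:
        value = ast.literal_eval(src)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a valid Python literal: {src!r}") from exc
    if not isinstance(value, str):
        raise ValueError(f"literal is not a string: {src!r}")
    return value


print(parse_string_literal('"""hello"""'))  # hello
```

The type check matters because `ast.literal_eval` also accepts numbers, tuples, lists, dicts and other literals.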
When I receive JSON from an OCR server, the encoding seems to be broken. The image includes some characters that are not encoded(?) properly; displayed in the console, they appear as \uXXXX.
For example processing an image like this:
ends up with output:
"some text \u0141\u00f3\u017a"
It's confusing because if I add some code like this:
mystr = mystr.replace(r'\u0141', '\u0141')
mystr = mystr.replace(r'\u00d3', '\u00d3')
mystr = mystr.replace(r'\u0142', '\u0142')
mystr = mystr.replace(r'\u017c', '\u017c')
mystr = mystr.replace(r'\u017a', '\u017a')
The output is ok:
"some text Ółźż"
What is more, if I try to replace them with a regex:
mystr = re.sub(r'(\\u[0-9|abcdef|ABCDEF]{4})', r'\g<1>', mystr)
The output remains "broken":
"some text \u0141\u00f3\u017a"
This OCR is processing image to MathML / Latex prepared for use in Python. Full documentation can be found here. So for example:
Will produce the following RAW output:
"\\(\\Delta=b^{2}-4 a c\\)"
Note that the quotes are included in the string; maybe that has some bearing on the problem.
Why are the characters not displayed properly in the first place, while after this silly mystr.replace(x, x) everything is fine?
Why does the first method work while re.sub fails? The code seems to be okay and it works fine in another script. What am I missing?
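Python strings are Unicode by default, but the string you have holds the literal six-character sequences \uXXXX (a backslash, a u, and four hex digits) rather than the characters they stand for. You can reproduce it with a raw string: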
>>> txt = r"some text \u0141\u00f3\u017a"
>>> txt
'some text \\u0141\\u00f3\\u017a'
>>> print(txt)
some text \u0141\u00f3\u017a
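The regex doesn't work because it matches each \uXXXX sequence and substitutes the very same text back (\g<1>), so nothing changes. The replace() calls work because their second argument is a normal, non-raw literal: Python converts '\u0141' into the actual symbol when it parses your source, and that symbol is what gets inserted. To reproduce: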
>>> txt[-5:]
'u017a'
>>> txt[-6:]
'\\u017a'
>>> txt[-6:-5]
'\\'
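To make the difference concrete, here is a small sketch, assuming the input contains only four-digit `\uXXXX` escapes. Replacing a match with `\g<1>` writes the same six characters back, whereas a replacement function can turn the hex digits into the character they name. The function name `unescape_u` is made up for this example.

```python
import re

txt = r"some text \u0141\u00f3\u017a"

# Substituting each match with itself leaves the string unchanged.
same = re.sub(r"(\\u[0-9a-fA-F]{4})", r"\g<1>", txt)


def unescape_u(match):
    # Convert the four hex digits after "\u" into the character they name.
    return chr(int(match.group(1), 16))


fixed = re.sub(r"\\u([0-9a-fA-F]{4})", unescape_u, txt)

print(same == txt)  # True
print(fixed)        # some text Łóź
```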
What you should do to resolve it:
Make sure your response is received in the correct encoding and not as a raw string (e.g. use response.text instead of response.body).
Otherwise, decode the escapes yourself:
>>> txt.encode("raw-unicode-escape").decode('unicode-escape')
'some text Łóź'
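Since the raw output shown in the question includes the surrounding double quotes, it looks like a JSON string literal. If that is the case (an assumption, because the question does not show how the response is read), `json.loads` decodes the `\uXXXX` escapes and also collapses doubled backslashes such as `\\(`. The variable names below are illustrative only.

```python
import json

# Raw response bodies as they might arrive, quotes included (assumed format).
raw_text = r'"some text \u0141\u00f3\u017a"'
raw_formula = r'"\\(\\Delta=b^{2}-4 a c\\)"'

text = json.loads(raw_text)
formula = json.loads(raw_formula)

print(text)     # some text Łóź
print(formula)  # \(\Delta=b^{2}-4 a c\)
```

Unlike the `unicode-escape` round trip, this only applies when the whole value is valid JSON.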
I store a UUID4 without dashes as binary in MySQL using the native UNHEX() function, and use the native HEX() function to retrieve it.
Example:
UUID4: UNHEX("7D96F13AC8394EF5A60E8252B70FC179")
BINARY IN MySQL: }éŽ:·9NŠÙ ýRÈ ¾y
UUID4: HEX(UUID4Column)
Storing and retrieving works fine with the HEX() and UNHEX() functions.
However, when I use the SQLAlchemy ORM and declare UUID4Column as LargeBinary(16), the returned value becomes b'}\x96\xf1:\xc89N\xf5\xa6\x0e\x82R\xb7\x0f\xc1y'
How can I convert these bytes to 7D96F13AC8394EF5A60E8252B70FC179 in Python code?
Without any imports:
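You can use int.from_bytes and string formatting with 'X' to get the hexadecimal representation of the integer. Zero-padding to 32 digits (032X) keeps leading zeros that a bare X would drop when the first byte is small:
s = b'}\x96\xf1:\xc89N\xf5\xa6\x0e\x82R\xb7\x0f\xc1y'
nicer = f"{int.from_bytes(s, byteorder='big'):032X}"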
print(nicer)
prints:
7D96F13AC8394EF5A60E8252B70FC179
You can use binascii.hexlify:
>>> import binascii
>>> bs = b'}\x96\xf1:\xc89N\xf5\xa6\x0e\x82R\xb7\x0f\xc1y'
>>> binascii.hexlify(bs)
b'7d96f13ac8394ef5a60e8252b70fc179'
>>> # for str result
>>> binascii.hexlify(bs).decode('ascii')
'7d96f13ac8394ef5a60e8252b70fc179'
This worked for me:
>>> print(b'}\x96\xf1:\xc89N\xf5\xa6\x0e\x82R\xb7\x0f\xc1y'.hex())
7d96f13ac8394ef5a60e8252b70fc179
Just store your byte data in a variable and call .hex() on it; append .upper() if you need uppercase digits.
Like this:
byte_data = b'}\x96\xf1:\xc89N\xf5\xa6\x0e\x82R\xb7\x0f\xc1y'
print(byte_data.hex())
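If the 16 bytes really are a UUID, the standard `uuid` module can also do the conversion. The sketch below assumes the column value arrives as a plain `bytes` object, as shown in the question, and prints both the undashed uppercase form and the usual dashed form.

```python
import uuid

raw = b'}\x96\xf1:\xc89N\xf5\xa6\x0e\x82R\xb7\x0f\xc1y'

# bytes.hex() always yields two digits per byte, so leading zeros survive.
hex_upper = raw.hex().upper()

# uuid.UUID(bytes=...) requires exactly 16 bytes.
value = uuid.UUID(bytes=raw)

print(hex_upper)          # 7D96F13AC8394EF5A60E8252B70FC179
print(value.hex.upper())  # 7D96F13AC8394EF5A60E8252B70FC179
print(value)              # 7d96f13a-c839-4ef5-a60e-8252b70fc179
```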
eval() seems to be dangerous to use when processing unknown strings, which is what a part of my project is doing.
For my project I have a string, called:
stringAsByte = "b'a'"
I've tried to do the following to convert that string directly (without using eval):
byteRepresentation = str.encode(stringAsByte)
print(byteRepresentation) # prints b"b'a'"
Clearly, that didn't work, so instead of doing:
byteRepresentation = eval(stringAsByte) # Uses eval!
print(byteRepresentation) # prints b'a'
Is there another way where I can get the output b'a'?
Yes, with ast.literal_eval, which is safe because it only evaluates literals.
>>> import ast
>>> stringAsByte = "b'a'"
>>> ast.literal_eval(stringAsByte)
b'a'
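To illustrate what "only evaluates literals" means in practice, here is a short sketch. The helper name `parse_bytes_literal` is hypothetical; it rejects anything that is not a bytes literal, including expressions that `eval()` would happily execute.

```python
import ast


def parse_bytes_literal(src):
    """Hypothetical helper: evaluate src and insist the result is bytes."""
    value = ast.literal_eval(src)  # raises ValueError/SyntaxError on bad input
    if not isinstance(value, bytes):
        raise TypeError(f"expected a bytes literal, got {type(value).__name__}")
    return value


print(parse_bytes_literal("b'a'"))  # b'a'

try:
    # A function call is not a literal, so nothing is executed here.
    parse_bytes_literal("__import__('os').getcwd()")
except ValueError:
    print("rejected")  # rejected
```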
This code:
a = "Hello,\\nWorld!"
print(a)
Prints:
Hello,\nWorld!
Sure enough.
But how do I restore the special meaning of the newline escape and print:
Hello,
World!
without changing the original string literal to something like:
a = "Hello,\nWorld!"
I am asking this because a certain function (over which I have no control) is returning a string like:
Hello,\\nWorld!
Which I would like to print to the screen (with formatting) as
Hello,
World!
In Python 2, you can decode your string using string_escape:
a = "Hello,\\nWorld!"
print(a)
# Hello,\nWorld!
print(a.decode("string_escape"))
# Hello,
# World!
UPDATE - Since the Python version wasn't specified at first: the above works with Python 2.x, because its str object is already what bytes is in Python 3.x. Python 3.x strings have to be encoded to bytes first and then decoded using unicode_escape, or you can let the codecs module do that for you:
import codecs
a = "Hello,\\nWorld!"
print(a)
# Hello,\nWorld!
print(codecs.getdecoder('unicode_escape')(a)[0])
# Hello,
# World!
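One caveat worth knowing: `unicode_escape` decodes its input as Latin-1, so text that also contains non-ASCII characters can come out mangled. A commonly used workaround, shown here as a sketch rather than as the answer's own method, is to encode with `latin-1` and the `backslashreplace` error handler first, so that characters outside Latin-1 become `\uXXXX` escapes that the decoder restores.

```python
a = "Hello,\\nWörld! Ł"

# Latin-1 characters pass through as single bytes; anything else is
# written as a \uXXXX escape, which unicode_escape turns back into text.
decoded = a.encode("latin-1", "backslashreplace").decode("unicode_escape")

print(decoded)
# Hello,
# Wörld! Ł
```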
This seems much simpler.
s = "Hello,\\nWorld!"
def spl(s):
    return s.replace('\\n', '\n')
print(spl(s))
print(s)
Just create a function which returns the replaced string. And then print it.
output:
Hello,
World!
Hello,\nWorld!
Also you don't have to change your original string.
I want to convert a Python string to URL (percent-encoded) syntax.
For example, I want to turn
>>> u'한글'.encode('utf-8')
'\xed\x95\x9c\xea\xb8\x80'
into '%ed%95%9c%ea%b8%80'.
>>> import urllib2
>>> urllib2.quote('한글')
'%ED%95%9C%EA%B8%80'
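The transcript above is Python 2. In Python 3 the same function lives in `urllib.parse`; a minimal sketch, assuming the input is a `str` that should be percent-encoded as UTF-8 (the default):

```python
from urllib.parse import quote, unquote

text = "한글"

encoded = quote(text)    # str input is encoded as UTF-8 by default
lower = encoded.lower()  # lowercase hex digits, as in the question

print(encoded)         # %ED%95%9C%EA%B8%80
print(lower)           # %ed%95%9c%ea%b8%80
print(unquote(lower))  # 한글
```

Note that `.lower()` would also lowercase any ASCII letters in the text itself, so it is only a safe shortcut when the input has none, as here. Percent-escapes are case-insensitive, so the uppercase form is normally fine as it is.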