Print string in a form of Unicode codes - python

How can I print a string as a sequence of unicode codes in Python?
Input: "если" (in Russian).
Output: "\u0435\u0441\u043b\u0438"

This should work:
>>> s = u'если'
>>> print repr(s)
u'\u0435\u0441\u043b\u0438'

Code:
txt = u"если"
print repr(txt)
Output:
u'\u0435\u0441\u043b\u0438'

a = u"\u0435\u0441\u043b\u0438"
print("".join("\\u{0:04x}".format(ord(c)) for c in a))
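In Python 3, repr() no longer escapes printable Cyrillic, so repr('если') just shows 'если'. The built-in ascii() does produce the escaped form. The sketch below (Python 3; the helper name to_unicode_escapes is only illustrative) generalizes the format-based answer and switches to the 8-digit \U form for code points above U+FFFF, where four hex digits are not enough.

```python
def to_unicode_escapes(text: str) -> str:
    """Return text as a run of \\uXXXX / \\UXXXXXXXX escapes, one per code point."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            parts.append("\\u{:04x}".format(cp))  # Basic Multilingual Plane
        else:
            parts.append("\\U{:08x}".format(cp))  # needs the 8-digit form


    return "".join(parts)


print(to_unicode_escapes("если"))  # \u0435\u0441\u043b\u0438
print(ascii("если"))               # '\u0435\u0441\u043b\u0438'
```

Unlike ascii(), the helper also escapes plain ASCII letters, which matches the output format asked for in the question.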

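If you need the bytes of a specific encoding instead, encode the string first (repr shows the escaped bytes rather than the raw output):
txt = u'если'
print repr(txt.encode('utf8'))
print repr(txt.encode('utf16'))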
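In Python 3, encode() returns a bytes object, and printing it shows its repr, in which printable ASCII bytes still appear as characters. A minimal sketch that lists every byte in hex instead (the helper name byte_codes is hypothetical; utf-16-le is used rather than utf-16 so the output carries no byte order mark):

```python
def byte_codes(text: str, encoding: str) -> str:
    """Encode text and list each resulting byte as two hex digits."""
    data = text.encode(encoding)
    # Iterating over a bytes object yields ints, so no ord() is needed.
    return " ".join("{:02x}".format(b) for b in data)


for enc in ("utf-8", "utf-16-le"):
    print(enc, byte_codes("если", enc))
# utf-8 d0 b5 d1 81 d0 bb d0 b8
# utf-16-le 35 04 41 04 3b 04 38 04
```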

Related

Why isn't this decode function working in Python 2?

I am debugging some Python 2 code:
tag_list = [convert(tag) for tag in tag_list]
print('tag_list: ', str(tag_list).decode("utf-8"))
However, the printout looks like this:
u"['\\xe4\\xba\\xa4\\xe9\\x80\\x9a\\xe6\\x9c\\x8d\\xe5\\x8a\\xa1', '\\xe7\\xa4\\xbe\\xe4\\xbc\\x9a', '\\xe7\\x94\\xb5\\xe8\\xa7\\x86\\xe5\\x89\\xa7', '\\xe9\\x9f\\xb3\\xe4\\xb9\\x90']"
How do I print the actual strings instead of those \x escape codes?
print("[" + ", ".join(tag_list) + "]")
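This should give you the output you want. str(tag_list) formats each element with repr(), which is where the \x escapes come from, whereas joining the elements prints their contents directly.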
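The same effect can be reproduced in Python 3 if the tags are still UTF-8 encoded bytes (an assumption; the question's convert() is not shown). str() on the list shows each element's repr, so decoding each element and then joining is what yields readable text. A minimal sketch with a hypothetical format_tags helper:

```python
def format_tags(tags):
    """Decode UTF-8 byte strings (str items pass through) and join them."""
    decoded = [t.decode("utf-8") if isinstance(t, bytes) else t for t in tags]
    return "[" + ", ".join(decoded) + "]"


tag_list = [b"\xe4\xba\xa4\xe9\x80\x9a\xe6\x9c\x8d\xe5\x8a\xa1", b"\xe7\xa4\xbe\xe4\xbc\x9a"]
print(str(tag_list))          # element reprs, full of \x escapes
print(format_tags(tag_list))  # [交通服务, 社会]
```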

python from hex to shellcode format

I am trying to convert a hex string to shellcode format.
For example, I have a file containing a hex string like aabbccddeeff11223344
and I want to convert it with Python to show this exact format:
"\xaa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44", including the quotes.
My code is:
with open("file","r") as f:
    a = f.read()
b = "\\x".join(a[i:i+2] for i in range(0, len(a), 2))
print b
So my output is aa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44\x.
I know I could do this with a sed command, but I wonder how to accomplish it in Python.
The binascii standard module will help here:
import binascii
print repr(binascii.unhexlify("aabbccddeeff11223344"))
Output:
>>> print repr(binascii.unhexlify("aabbccddeeff11223344"))
'\xaa\xbb\xcc\xdd\xee\xff\x11"3D'
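repr() prints bytes that happen to be printable ASCII as characters, which is why 0x22, 0x33 and 0x44 show up as ", 3 and D above. To get the exact format from the question, every byte has to be formatted explicitly. A Python 3 sketch (the helper name to_shellcode is hypothetical) that also strips the trailing newline the file most likely ends with, which would explain the dangling \x in the question's output:

```python
def to_shellcode(hex_string: str) -> str:
    """Format a hex string as a quoted run of \\xNN escapes."""
    # strip() drops a trailing newline; fromhex() raises ValueError on non-hex input.
    data = bytes.fromhex(hex_string.strip())
    return '"' + "".join("\\x{:02x}".format(b) for b in data) + '"'


print(to_shellcode("aabbccddeeff11223344\n"))
# "\xaa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44"
```

With the question's file, the call would be to_shellcode(f.read()) inside the with block.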

Python3 print in hex representation

I can find lots of threads that tell me how to convert values to and from hex. I do not want to convert anything. Rather, I want to print the bytes I already have in hex representation, e.g.
byteval = '\x60'.encode('ASCII')
print(byteval) # b'\x60'
Instead, when I do this, I get:
byteval = '\x60'.encode('ASCII')
print(byteval) # b'`'
This is because ` is the ASCII character my byte corresponds to.
To clarify: type(byteval) is bytes, not str.
>>> print("b'" + ''.join('\\x{:02x}'.format(x) for x in byteval) + "'")
b'\x60'
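Wrapped into a reusable function, the same idea might look like the sketch below (hex_escape is a hypothetical name). bytes.hex(), available since Python 3.5, is a shorter option when the b'...' framing and \x prefixes are not needed.

```python
def hex_escape(data: bytes) -> str:
    """Render every byte as a \\xNN escape, even printable ASCII ones."""
    return "b'" + "".join("\\x{:02x}".format(b) for b in data) + "'"


byteval = '\x60'.encode('ASCII')
print(repr(byteval))        # b'`'  (repr shows printable bytes as characters)
print(hex_escape(byteval))  # b'\x60'
print(byteval.hex())        # 60
```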
See this:
hexify = lambda s: [hex(ord(i)) for i in list(str(s))]
And
print(hexify("abcde"))
# ['0x61', '0x62', '0x63', '0x64', '0x65']
Another example:
byteval='\x60'.encode('ASCII')
hexify = lambda s: [hex(ord(i)) for i in list(str(s))]
print(hexify(byteval))
# ['0x62', '0x27', '0x60', '0x27']  (str() of a bytes object is "b'`'", so the b and the quotes get hexified too)
Taken from https://helloacm.com/one-line-python-lambda-function-to-hexify-a-string-data-converting-ascii-code-to-hexadecimal/
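Because that lambda runs str() first, a bytes argument is hexified via its repr, b and quotes included. A sketch of a variant that accepts both types (the isinstance dispatch is an assumption of this sketch, not part of the linked article):

```python
def hexify(s):
    """Return hex() of each character (str) or each byte (bytes/bytearray)."""
    if isinstance(s, (bytes, bytearray)):
        return [hex(b) for b in s]     # iterating bytes already yields ints
    return [hex(ord(ch)) for ch in s]  # str needs ord() per character


print(hexify("abcde"))                 # ['0x61', '0x62', '0x63', '0x64', '0x65']
print(hexify('\x60'.encode('ASCII')))  # ['0x60']
```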

How do I print the string inside parentheses in a line in Python?

I have lots of lines in a text file. They look like this, for example:
562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4
711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0
I want to extract the string inside the parentheses; for example, the output in this case should be
"Auto_Gain_ROI_Size" and "Auto_Contrast".
Note that the string is always enclosed by "Parameter()". Thanks.
You can use regex:
>>> import re
>>> s = "562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4"
>>> t = "711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0 "
>>> myreg = re.compile(r'Parameter\((.*?)\)')
>>> print myreg.search(s).group(1)
Auto_Gain_ROI_Size
>>> print myreg.search(t).group(1)
Auto_Contrast
Or, without regex (albeit a bit messier):
>>> print s.split('Parameter(')[1].split(')')[0]
Auto_Gain_ROI_Size
>>> print t.split('Parameter(')[1].split(')')[0]
Auto_Contrast
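Since the data lives in a text file, the regex can be applied line by line. A Python 3 sketch (the function name and the third sample line are illustrative); findall() also copes with lines that contain no Parameter(...) or more than one:

```python
import re

PARAM_RE = re.compile(r'Parameter\((.*?)\)')


def extract_parameters(lines):
    """Yield the text inside every Parameter(...) found in the given lines."""
    for line in lines:
        yield from PARAM_RE.findall(line)


sample = [
    "562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4",
    "711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0",
    "800: INFO, no parameter on this line",
]
print(list(extract_parameters(sample)))  # ['Auto_Gain_ROI_Size', 'Auto_Contrast']
```

An open file object can be passed in place of the sample list, since iterating over it yields one line at a time.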

Python Convert Unicode-Hex utf-8 strings to Unicode strings

I have s = u'Gaga\xe2\x80\x99s' but need to convert it to t = u'Gaga\u2019s'.
How can this be best achieved?
s = u'Gaga\xe2\x80\x99s'
t = u'Gaga\u2019s'
x = s.encode('raw-unicode-escape').decode('utf-8')
assert x==t
print(x)
yields
Gaga’s
Wherever you decoded the original string, it was likely decoded with latin-1 or a close relative. Since latin-1 maps bytes 0 to 255 onto the first 256 code points of Unicode, this works:
>>> s = u'Gaga\xe2\x80\x99s'
>>> s.encode('latin-1').decode('utf8')
u'Gaga\u2019s'
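The latin-1 round trip can be wrapped in a Python 3 helper. Falling back to the unchanged text when either step fails is a design choice of this sketch, and fix_mojibake is a hypothetical name:

```python
def fix_mojibake(text: str) -> str:
    """Re-decode text whose UTF-8 bytes were wrongly decoded as Latin-1."""
    try:
        # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
        # so encoding recovers the original UTF-8 byte sequence.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1-decoded UTF-8 after all: leave the text unchanged.
        return text


print(ascii(fix_mojibake("Gaga\xe2\x80\x99s")))  # 'Gaga\u2019s'
```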
import codecs
s = u"Gaga\xe2\x80\x99s"
s_as_str = codecs.charmap_encode(s)[0]
t = unicode(s_as_str, "utf-8")
print repr(t)
prints
u'Gaga\u2019s'
