I have a hex code like this:
\xf0\x9f\x94\xb4
And I want to encode this like this:
1F534
How can I transform it with a method in Python 2.7?
Thanks
What you are really asking is: how can I find the Unicode code point of the character represented in UTF-8 by the (byte) string '\xf0\x9f\x94\xb4'?
In Python 3 it would be as simple as:
>>> hex(ord(b'\xf0\x9f\x94\xb4'.decode()))
'0x1f534'
In a Python 2 version compiled with --enable-unicode=ucs4 (a wide build), it would be more or less the same:
>>> hex(ord('\xf0\x9f\x94\xb4'.decode('utf-8')))
'0x1f534'
But per your comments, you have a Python 2.7 version compiled with --enable-unicode=ucs2 (a narrow build). In that case, Unicode strings actually contain a UTF-16 representation of the string, so a non-BMP character comes back as a surrogate pair:
>>> print [hex(ord(i)) for i in '\xf0\x9f\x94\xb4'.decode('utf-8')]
['0xd83d', '0xdd34']
with no direct way to get the true Unicode code point of the U+1F534 LARGE RED CIRCLE character.
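(There is no single builtin for this on a narrow build, but the surrogate pair can be recombined arithmetically with the standard UTF-16 formula; a minimal sketch:)
>>> hi, lo = [ord(i) for i in '\xf0\x9f\x94\xb4'.decode('utf-8')]
>>> hex(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
'0x1f534'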
Another option is to decode the UTF-8 sequence by hand. You can find the description of the UTF-8 encoding on Wikipedia. The following function takes the UTF-8 representation of a Unicode character and returns its code point:
def from_utf8(bstr):
    b = [ord(i) for i in bstr]
    if b[0] & 0x80 == 0:       # 1-byte (ASCII) sequence
        return b[0]
    if b[0] & 0xe0 == 0xc0:    # 2-byte sequence
        return ((b[0] & 0x1F) << 6) | (b[1] & 0x3F)
    if b[0] & 0xf0 == 0xe0:    # 3-byte sequence
        return ((b[0] & 0xF) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)
    else:                      # 4-byte sequence
        return ((b[0] & 7) << 18) | ((b[1] & 0x3F) << 12) | \
               ((b[2] & 0x3F) << 6) | (b[3] & 0x3F)
Beware: no validation is done here to make sure that the string is a correct UTF-8 representation of a single character... but at least it gives the expected result:
>>> print hex(from_utf8("\xf0\x9f\x94\xb4"))
0x1f534
I'm quite new to Python (I have C, C++, and JavaScript experience) and I ran into strange behavior in PyCharm with Python 3.7.
I have a piece of code which calculates the XMODEM CRC over TxBuffer and appends it to the buffer, but somehow an extra character gets added.
TxBuffer = command + str(inverter)
CRC = CRCCCITT().calculate(TxBuffer)
print(hex(CRC)) # >>> prints 0x29b6
CRCstr = chr((CRC >> 8) & 0xff)
CRCstr += chr((CRC >> 0) & 0xff)
print(CRCstr) # >>> prints )¶
TxBuffer += CRCstr
# TxBuffer += chr((CRC >> 8) & 0xff)
# TxBuffer += chr((CRC >> 0) & 0xff) #line inserts \xc2 character
TxBuffer += "\r"
print(binascii.hexlify(TxBuffer.encode())) # >>> prints b'5e503030375047533029c2b60d'
So, what I can't explain is why the 'c2' byte is added to my data?
Best regards,
In Python 3, chr creates a Unicode character. The call to encode converts to a byte string, but it must use an encoding to do so. There's only one encoding that has a 1-to-1 correspondence between Unicode code points and byte values, and that's 'latin1'. The default is 'utf-8', which converts some code points into multi-byte sequences.
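A quick REPL check illustrates the difference; 0xB6 is the code point of the ¶ seen in the output above:
>>> chr(0xb6).encode()          # default UTF-8: one code point, two bytes
b'\xc2\xb6'
>>> chr(0xb6).encode('latin1')  # latin-1: one byte per code point
b'\xb6'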
As suggested in one of the comments, this is one of those cases where you're better off working with a byte string from the start and avoiding Unicode altogether.
TxBuffer = TxBuffer.encode()
CRC = CRCCCITT().calculate(TxBuffer)
print(hex(CRC)) # prints 0x29b6
CRCstr = bytes([(CRC >> 8) & 0xff, (CRC >> 0) & 0xff])
print(CRCstr) # prints b')\xb6'
TxBuffer += CRCstr
TxBuffer += b"\r"
print(binascii.hexlify(TxBuffer)) # prints b'5e503030375047533029b60d'
I am working with a function which generates cyclic redundancy check (CRC) values on data packets prior to sending them out over serial, and I seem to be having some problems with Python not being able to tell the difference between the hex value of a byte and its ASCII character representation. I send the following data:
('+', ' ', 'N', '\x00', '\x08')
To the following function
# Computes CRC checksum using CRC-32 polynomial
def crc_stm32(self, data):
    crc = 0xFFFFFFFF
    for d in data:
        crc ^= d
        for i in range(32):
            if crc & 0x80000000:
                crc = (crc << 1) ^ 0x04C11DB7  # Polynomial used in STM32
            else:
                crc = (crc << 1)
        crc = (crc & 0xFFFFFFFF)
    return crc
Now the actual value of the '+' char that is going through this function is (as one might expect) 0x2B, however when Python gets to the line
crc ^= d
I am faced with the following error
unsupported operand type(s) for ^=: 'long' and 'str'
I have tried casting the value with chr(), hex(), int(), long(), etc., all to no avail. It seems as though Python is interpreting the '+' value as a char or string.
As per juanpa's comment, the following modification (mapping each character to its integer value with ord) allows the data to be handled properly.
# Computes CRC checksum using CRC-32 polynomial
def crc_stm32(self, data):
    crc = 0xFFFFFFFF
    for d in map(ord, data):
        crc ^= d
        for i in range(32):
            if crc & 0x80000000:
                crc = (crc << 1) ^ 0x04C11DB7  # Polynomial used in STM32
            else:
                crc = (crc << 1)
        crc = (crc & 0xFFFFFFFF)
    print crc
    return crc
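For example, on the packet from the question (the def is shown standalone and self is unused, so None can be passed for a quick test):
>>> packet = ('+', ' ', 'N', '\x00', '\x08')
>>> crc = crc_stm32(None, packet)  # also echoes the decimal CRC via the print inside
>>> print '0x%08X' % crc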
I have an algorithm that I want to write in Python and analyze. I think I wrote it correctly, but my output doesn't match the expected output.
The given algorithm:
Input  {inStr: a binary string of bytes}
Output {outHash: 32-bit hashcode for the inStr in a series of hex values}

Mask: 0x3FFFFFFF
outHash: 0
for byte in input
    intermediate_value = ((byte XOR 0xCC) Left Shift 24) OR
                         ((byte XOR 0x33) Left Shift 16) OR
                         ((byte XOR 0xAA) Left Shift 8) OR
                         (byte XOR 0x55)
    outHash = (outHash AND Mask) + (intermediate_value AND Mask)
return outHash
My Python version of the algorithm is:
Input = "Hello world!"
Mask = 0x3FFFFFFF
outHash = 0
for byte in Input:
    intermediate_value = ((ord(byte) ^ 0xCC) << 24) or ((ord(byte) ^ 0x33) << 16) or ((ord(byte) ^ 0xAA) << 8) or (ord(byte) ^ 0x55)
    outHash = (outHash & Mask) + (intermediate_value & Mask)
print outHash
# use %x to print result in hex
print '%x' % outHash
For input "Hello world!" I should get 0x50b027cf, but my output is completely different:
1291845632
4d000000
OR must be the bitwise OR operator (|), not Python's logical or. Logical or returns its first truthy operand, so intermediate_value collapses to just the first shifted term instead of combining all four.
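With | in place of or, the question's code produces the expected hash:
Input = "Hello world!"
Mask = 0x3FFFFFFF
outHash = 0
for byte in Input:
    intermediate_value = (((ord(byte) ^ 0xCC) << 24) |
                          ((ord(byte) ^ 0x33) << 16) |
                          ((ord(byte) ^ 0xAA) << 8) |
                          (ord(byte) ^ 0x55))
    outHash = (outHash & Mask) + (intermediate_value & Mask)
print '%x' % outHash  # prints 50b027cf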
I am trying to combine several variables into one element of a bytearray.
I have the variables:
version, padding, extension, cc
of sizes: 2 bits, 1 bit, 1 bit, 4 bits
How do I combine them, in that order, into one byte?
If the variables are integers, you can just use bit shifts and bitwise-or operations to pack them into a single 8-bit value, then store that where you want in the bytearray.
ba[i] = version << 6 | padding << 5 | extension << 4 | cc
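For instance, a minimal sketch (the field values and buffer size here are made up for illustration):
version, padding, extension, cc = 2, 0, 1, 3
ba = bytearray(2)   # destination buffer
ba[0] = version << 6 | padding << 5 | extension << 4 | cc
print(ba[0])        # 147, i.e. 0b10010011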
You can pack them into a byte using shift and bit-masking.
version, padding, extension, cc = 2, 0, 1, 3
byte = ((version & 3) << 6) | ((padding & 1) << 5) | ((extension & 1) << 4) | (cc & 0xF)
byte
# OUT: 147
Note that you have to mask each field first, with a mask as wide as the field (e.g. 0xF for the 4-bit cc); otherwise a value that exceeds its range will clobber the other fields.
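For example, an oversized padding value would leak into the version bits without the mask:
>>> padding = 2                # too big for a 1-bit field
>>> bin(padding << 5)          # unmasked: bit 6 collides with the version field
'0b1000000'
>>> bin((padding & 1) << 5)    # masked: the stray bit is dropped
'0b0'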
In C I could, for example, zero out bit #10 in a 32 bit unsigned value like so:
unsigned long value = 0xdeadbeef;
value &= ~(1<<10);
How do I do that in Python?
Bitwise operations on Python ints work much like in C: the &, | and ^ operators behave just as they do in C. The ~ operator works as for a signed integer in C; that is, ~x computes -x-1.
You have to be somewhat careful with left shifts, since Python integers aren't fixed-width. Use bit masks to obtain the low-order bits. For example, to do the equivalent of a left shift on a 32-bit integer, do (x << 5) & 0xffffffff.
value = 0xdeadbeef
value &= ~(1<<10)
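And the width-masking caveat from above in a quick check:
>>> hex((0xdeadbeef << 4) & 0xffffffff)  # the mask truncates back to 32 bits
'0xeadbeef0'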
Some common bit operations that might serve as examples:
def get_bit(value, n):
    return ((value >> n & 1) != 0)

def set_bit(value, n):
    return value | (1 << n)

def clear_bit(value, n):
    return value & ~(1 << n)
Usage, e.g.:
>>> get_bit(5, 2)
True
>>> get_bit(5, 1)
False
>>> set_bit(5, 1)
7
>>> clear_bit(5, 2)
1
>>> clear_bit(7, 2)
3
Python has C style bit manipulation operators, so your example is literally the same in Python except without type keywords.
value = 0xdeadbeef
value &= ~(1 << 10)
You should also check out BitArray (from the bitstring module), which gives a nice interface for dealing with sequences of bits.
Omit the unsigned long type declaration, and the semicolons are not needed either:
value = 0xDEADBEEF
value &= ~(1<<10)
print value
"0x%08X" % value
Have you tried copying and pasting your code into the Python REPL to see what will happen?
>>> value = 0xdeadbeef
>>> value &= ~(1<<10)
>>> hex (value)
'0xdeadbaef'
If you're going to do a lot of bit manipulation (and you care more about readability than performance for your application), then you may want to create an integer wrapper to enable bit slicing like in Verilog or VHDL:
import math

class BitVector:
    def __init__(self, val):
        self._val = val

    def __setslice__(self, highIndx, lowIndx, newVal):
        assert math.ceil(math.log(newVal) / math.log(2)) <= (highIndx - lowIndx + 1)
        # clear out bit slice
        clean_mask = (2**(highIndx + 1) - 1) ^ (2**lowIndx - 1)
        self._val = self._val ^ (self._val & clean_mask)
        # set new value
        self._val = self._val | (newVal << lowIndx)

    def __getslice__(self, highIndx, lowIndx):
        return (self._val >> lowIndx) & (2L**(highIndx - lowIndx + 1) - 1)

b = BitVector(0)
b[3:0] = 0xD
b[7:4] = 0xE
b[11:8] = 0xA
b[15:12] = 0xD

for i in xrange(0, 16, 4):
    print '%X' % b[i+3:i]
Outputs:
D
E
A
D
a = int('00001111', 2)   # 15
b = int('11110000', 2)   # 240

bin(a & b)[2:].zfill(8)   # '00000000'  (AND)
bin(a | b)[2:].zfill(8)   # '11111111'  (OR)
bin(a << 2)[2:].zfill(8)  # '00111100'  (left shift)
bin(a >> 2)[2:].zfill(8)  # '00000011'  (right shift)
bin(a ^ b)[2:].zfill(8)   # '11111111'  (XOR)
int(bin(a | b)[2:].zfill(8), 2)  # 255  (back to an integer)