Python 3.7 character refailure

Python 3.7 character refailure - python

I'm quite new to python (have C,C++, Java script experience) and I run into strange behavior on pyCharm Python 3.7
I'm having a piece of code which calcs the xmodem CRC over TxBuffer and adds it to the buffer but somehow there's an extra character added.
TxBuffer = command + str(inverter)
CRC = CRCCCITT().calculate(TxBuffer)
print(hex(CRC)) # >>> prints 0x29b6
CRCstr = chr((CRC >> 8) & 0xff)
CRCstr += chr((CRC >> 0) & 0xff)
print(CRCstr) # >>> prints )¶
TxBuffer += CRCstr
# TxBuffer += chr((CRC >> 8) & 0xff)
# TxBuffer += chr((CRC >> 0) & 0xff) #line inserts \xc2 character
TxBuffer += "\r"
print(binascii.hexlify(TxBuffer.encode())) # >>>prints b'5e503030375047533029c2b60d'
So, what I can't explain is why the 'c2' character is added to my data?
Best regards,

In Python 3, chr creates a Unicode character. The call to encode converts to a byte string, but it must use an encoding to do so. There's only one encoding that has a 1-to-1 correspondence between Unicode code points and byte values, and that's 'latin1'. The default is probably 'utf-8' which will convert some code points into multi-byte sequences.
As suggested in one of the comments, this is one of those cases where you're better off working with a byte string from the start and avoiding Unicode altogether.
TxBuffer = TxBuffer.encode()
CRC = CRCCCITT().calculate(TxBuffer)
print(hex(CRC)) # prints 0x29b6
CRCstr = bytes([(CRC >> 8) & 0xff, (CRC >> 0) & 0xff])
print(CRCstr) # prints b')\xb6'
TxBuffer += CRCstr
TxBuffer += b"\r"
print(binascii.hexlify(TxBuffer)) # prints b'5e503030375047533029b60d'

Related

trasform hex to hexcodepoint

I have a hex code like this:
\xf0\x9f\x94\xb4
And I want to encode this like this:
1F534
How can I transform it with a method in python 2.7?
Thanks

Here you are just asking: how can I find the unicode code of the character represented in utf8 with the (byte) string '\xf0\x9f\x94\xb4'?
In Python3 it would be as simple as:
>>> hex(ord(b'\xf0\x9f\x94\xb4'.decode()))
'0x1f534'
In a Python2 version compiled with --enable-unicode=ucs4, it would be more or less the same:
>>> hex(ord('\xf0\x9f\x94\xb4'.decode('utf-8')))
'0x1f534'
But after your comments, you have a Python 2.7 version compiled with --enable-unicode=ucs2. In that case, Unicode strings actually contain a UTF16 representation of the string:
>>> print [hex(ord(i)) for i in '\xf0\x9f\x94\xb4'.decode('utf-8')]
['0xd83d', '0xdd34']
with no direct way to find the true unicode code point of the U+1F534 LARGE RED CIRCLE character.
The last option is then to decode the utf8 sequence by hand. You can find the description of the UTF8 encoding on wikipedia. The following function take an utf-8 representation of an unicode character and return its code point:
def from_utf8(bstr):
b = [ord(i) for i in bstr]
if b[0] & 0x80 == 0: return b
if b[0] & 0xe0 == 0xc0:
return ((b[0] & 0x1F) << 6) | (b[1] & 0x3F)
if b[0] & 0xf0 == 0xe0:
return ((b[0] & 0xF) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)
else:
return ((b[0] & 7) << 18) | ((b[1] & 0x3F) << 12) | \
((b[2] & 0x3F) << 6) | (b[3] & 0x3F)
Beware, no control is done here to make sure that the string is a correct UTF-8 representation of a single character... But at least it gives the expected result:
>>> print hex(from_utf8("\xf0\x9f\x94\xb4"))
0x1f534

Unsupported operand types for bitwise exclusive OR in Python

I am working with a function which generates cyclical redundancy check values. on data packets prior to sending them out over serial and I seem to be having some problems with the Python not being able to determine the difference between a hex representation and an ascii representation of a value. I send the following data:
('+', ' ', 'N', '\x00', '\x08')
To the following function
# Computes CRC checksum using CRC-32 polynomial
def crc_stm32(self,data):
crc = 0xFFFFFFFF
for d in data:
crc ^= d
for i in range(32):
if crc & 0x80000000:
crc = (crc << 1) ^ 0x04C11DB7 #Polynomial used in STM32
else:
crc = (crc << 1)
crc = (crc & 0xFFFFFFFF)
return crc
Now the actual value of the '+' char that is going through this function is (as one might expect) 0x2B, however when Python gets to the line
crc ^= d
I am faced with the following error
unsupported operand type(s) for ^=: 'long' and 'str'
I have tried casting the value to chr(), hex(), int(), long() etc. all to no avail. It seems as though Python is interpreting the '+' value as a char or string.

As per juanpa's comment, the following modification to the code allowed for the proper handling of the data.
# Computes CRC checksum using CRC-32 polynomial
def crc_stm32(self,data):
crc = 0xFFFFFFFF
for d in map(ord,data):
crc ^= d
for i in range(32):
if crc & 0x80000000:
crc = (crc << 1) ^ 0x04C11DB7 #Polynomial used in STM32
else:
crc = (crc << 1)
crc = (crc & 0xFFFFFFFF)
print crc
return crc

Byte representation of unicode string

This is python3 code:
>>> bytes(json.dumps({'Ä':0}), "utf-8")
b'{"\\u00c4": 0}'
json.dumps() returns unicode string and bytes() returns its' bytes representation - string encoded into utf-8.
How do I achieve the same result in Lua? I need a bytes representation of a json object which contains non-ascii chars.

You have to do it manually.
local function utf8_to_unicode(utf8str, pos)
local code, size = utf8str:byte(pos), 1
if code >= 0xC0 and code < 0xFE then
local mask = 64
code = code - 128
repeat
local next_byte = utf8str:byte(pos + size) or 0
if next_byte >= 0x80 and next_byte < 0xC0 then
code, size = (code - mask - 2) * 64 + next_byte, size + 1
else
code, size = utf8str:byte(pos), 1
end
mask = mask * 32
until code < mask
end
-- returns code, number of bytes in this utf8 char
return code, size
end
function utf8_to_python(utf8str)
local pos = 1
local z = ''
while pos <= #utf8str do
local unicode, size = utf8_to_unicode(utf8str, pos)
pos = pos + size
if unicode < 0x80 then
z = z..string.char(unicode)
elseif unicode < 0x10000 then
z = z..string.format('\\\\u%04x', unicode)
else
z = z..string.format('\\\\U%08x', unicode)
end
end
return z
end
Usage:
local json = require('json')
local x = {['Ä'] = 0}
local y = json.encode(x)
print(y) --> {"Ä":0}
local z = utf8_to_python(y)
print(z) --> {"\\u00c4":0}

A simpler version using string.gsub:
local function python_escape(str)
return (string.gsub(
str,
-- leading byte followed by one or more continuation bytes;
-- decimal version for Lua 5.1: "[\194-\244][\128-\191]+",
"[\xC2-\xF4][\x80-\xBF]+"
function (non_ASCII)
local codepoint = utf8.codepoint(non_ASCII)
if codepoint <= 0xFFFF then
return ("\\u%04x"):format(codepoint)
else
return ("\\U%08x"):format(codepoint)
end
end))
end
I put parentheses around the return value (string.gsub(--[[...]])) to strip away the second return value of string.gsub (the number of replacements).

Python working with bits

I want to do a bit operation, and need some help:
I have a word of 16 bit and want to split it into two, reverse each and then join them again.
Example if i have 0b11000011
First I divide it into 0b1100 and 0b0011
Then i reverse both getting 0b0011 and 0b1100
And finally rejoin them getting 0b00111100
Thanks!

Here's one way to do it:
def rev(n):
res = 0
mask = 0x01
while mask <= 0x80:
res <<= 1
res |= bool(n & mask)
mask <<= 1
return res
x = 0b1100000110000011
x = (rev(x >> 8) << 8) | rev(x & 0xFF)
print bin(x) # 0b1000001111000001
Note that the method above operates on words, not bytes as example in the question.

here are some basic operations you can try, and you can concatenate results after splitting your string in two and reversing it
a = "0b11000011" #make a string
b = a[:6] #get first 5 chars
c = a[::-1] # invert the string

Why should I be failing this simple python algorithm?

I have an algorithm that I want to write in python and analyze it. I think I wrote it well, but my output doesn't match what the given output should be.
given algorithm;
Input{inStr: a binary string of bytes}
Output{outHash: 32-bit hashcode for the inStr in a series of hex values}
Mask: 0x3FFFFFFF
outHash: 0
for byte in input
intermediate_value = ((byte XOR 0xCC) Left Shift 24) OR
((byte XOR 0x33) Left Shift 16) OR
((byte XOR 0xAA) Left Shift 8) OR
(byte XOR 0x55)
outHash =(outHash AND Mask) + (intermediate_value AND Mask)
return outHash
My algorithm version in python is;
Input = "Hello world!"
Mask = 0x3FFFFFFF
outHash = 0
for byte in Input:
intermediate_value = ((ord(byte) ^ 0xCC) << 24) or ((ord(byte) ^ 0x33) << 16) or ((ord(byte) ^ 0xAA) << 8) or (ord(byte) ^ 0x55)
outHash =(outHash & Mask) + (intermediate_value & Mask)
print outHash
# use %x to print result in hex
print '%x'%outHash
For input "Hello world!", I should see output as 0x50b027cf, but my output is too different, it looks like;
1291845632
4d000000

OR must be bitwise OR operator (|).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python 3.7 character refailure - python

Related

trasform hex to hexcodepoint

Unsupported operand types for bitwise exclusive OR in Python

Byte representation of unicode string

Python working with bits

Why should I be failing this simple python algorithm?

Categories

Resources