Native array.frombytes() (not numpy!) mysterious behavior

[I cannot use numpy so please refrain from talking about it]
I (apparently naively) thought Python array.frombytes() would read from a series of bytes representing various machine format integers, depending on how you create the array object. On creation you are required to provide a letter type code telling it (or so I thought) the machine type of integer making up the byte stream.
import array
b = b"\x01\x00\x02\x00\x03\x00\x04\x00"
a = array.array('i') #signed int (2 bytes)
a.frombytes(b)
print(a)
array('i', [131073, 262147])
and in the debugger:
array('i', [131073, 262147])
itemsize: 4
typecode: 'i'
The bytes in b are a series of little endian int16s (type code = 'i'). Despite being told this, it interpreted the bytes as 4-byte integers. This is Python 3.7.8.
I really need to convert the varying ints into an array (or list) of Python ints to deal with image data coming in byte-streams but which is actually either 16-bit or 32-bit integer, or 64 bit double floating format. What did I miss or do wrong? Or what is the right way to accomplish this?

Note that the documentation doesn't specify the exact size of each type; it specifies only the minimum size. The actual size may be larger, and it is determined by the C implementation used to build Python: each type code maps to a C type, and 'i' is a C int, which is 4 bytes on most current platforms.
Here are all the sizes on my system:
from array import array
for c in 'bBuhHiIlLqQfd':
    print(c, array(c).itemsize)
b 1
B 1
u 2
h 2
H 2
i 4
I 4
l 4
L 4
q 8
Q 8
f 4
d 8
For 16-bit data I would suggest using the 'h' (signed) or 'H' (unsigned) type code, which is 2 bytes here.
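A minimal sketch of that suggestion, applied to the bytes from the question. It assumes the stream is little-endian and checks itemsize instead of trusting the type code. The struct variant, the FORMATS table, and the decode helper are hypothetical names introduced here as an alternative; they rely on struct's standard sizes under the '<' prefix.

```python
import array
import struct
import sys

raw = b"\x01\x00\x02\x00\x03\x00\x04\x00"  # four little-endian int16 values

# Option 1: array with 'h' (C short). Verify the size on this build first.
shorts = array.array('h')
if shorts.itemsize != 2:
    raise RuntimeError("'h' is not 2 bytes on this platform")
shorts.frombytes(raw)
if sys.byteorder == 'big':
    shorts.byteswap()  # frombytes() reads in native byte order
from_array = shorts.tolist()

# Option 2: struct with an explicit '<' prefix, which fixes both the byte
# order and the item sizes (h = 2, i = 4, d = 8 bytes) on every platform.
FORMATS = {'int16': 'h', 'int32': 'i', 'float64': 'd'}  # hypothetical labels

def decode(data, kind):
    code = FORMATS[kind]
    size = struct.calcsize('<' + code)
    count = len(data) // size  # ignore any trailing partial item
    return list(struct.unpack('<%d%s' % (count, code), data[:count * size]))

from_struct = decode(raw, 'int16')
print(from_array, from_struct)
```

Decoding the same bytes with decode(raw, 'int32') reproduces the two values the question saw, which is consistent with 'i' being 4 bytes there.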

Related

R: What's the R equivalent to numpy's .dtype.itemsize and .dtype.alignment array properties?

I'm trying to convert something like this from python into R:
dt = my_array.dtype
fw = int(dt.itemsize/dt.alignment)
b = numpy.array([list(w.ljust(fw)) for w in my_array.T])
I've looked around but haven't found anything on this particular topic.

The first line extracts the data type. R might use class(my_array). Using typeof or mode might also be possible, but unless you have studied R for a while you may not get the information you want. It appears that Python encodes several kinds of information in the datatype string. There isn't really an exact parallel in R, but you might want to look at the output of str(). Unlike Python's dt, the value from str is not going to be accessible for further breakdown by additional functions. From its help page:
Value
str does not return anything, for efficiency reasons. The obvious side effect is output to the terminal.
The attributes function will sometimes yield additional information about an object, but in the case of an array there's nothing additional to the information from dim.
> my_array <- array(1:24, c(2,3,4)) # a 2 x 3 x 4 array of integers
> class(my_array)
[1] "array"
> str(my_array)
int [1:2, 1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
> dim(my_array) # Not sure, but this might be the equivalent of "alignment"
[1] 2 3 4
> attributes(my_array)
$dim
[1] 2 3 4
> length(my_array)
[1] 24
> mode(my_array)
[1] "numeric"

Reading 12bit little endian integers from can frame

I am reading in a series of CAN BUS frames from python-can represented as hex strings, e.g. '9819961F9FFF7FC1' and I know the values in each frame are laid out as follows:
Signal  Startbit  Length
A       0         8
B       8         4
C       12        4
D       16        12
E       28        12
F       40        16
G       56        4
Each value is an unsigned integer with little-endian byte order. Where I am struggling is how to handle the 12-bit signals, and how to do it fast, as this will be running in real time. As far as I understand, struct.unpack only supports 1-, 2-, 4-, and 8-byte integers. The bitstring package also only supports whole-byte bitstrings when you specify the endianness.
I clearly don't understand binary well enough to do it by manipulating the bits directly because I have been tearing my hair out trying to get sensible values...

I was able to decode the frame successfully and reasonably quickly with the bitstruct library, which can handle values with any number of bits, as in the code below.
However I found I also had to swap the location of the hex characters if two signals are present on the same byte, as in the CAN frame layout. I'm still not sure why, but it does work.
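import bitstruct

# frame is the hex string, e.g. '9819961F9FFF7FC1'
swapped_frame = (frame[0:2] + frame[3] + frame[2] + frame[4:6] +
                 frame[7] + frame[6] + frame[8:])
ba = bytearray.fromhex(swapped_frame)  # replaces the Python 2-only str.decode('hex')
A, B, C, D, E, F, G = bitstruct.unpack('<u8u4u4u12u12u16u4', ba)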
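For comparison, here is a sketch that uses only the standard library. It assumes the start bits in the table count from the least significant bit of the first byte (the common little-endian, or "Intel", CAN convention); the question does not confirm this, so treat it as an assumption. LAYOUT and decode_frame are hypothetical names. Under that assumption, two 4-bit signals sharing a byte come out low nibble first, which may be why the hex characters had to be swapped.

```python
# Signal layout from the question: name -> (start bit, length in bits)
LAYOUT = {'A': (0, 8), 'B': (8, 4), 'C': (12, 4), 'D': (16, 12),
          'E': (28, 12), 'F': (40, 16), 'G': (56, 4)}

def decode_frame(hex_frame):
    # Treat the 8 payload bytes as one little-endian 64-bit integer,
    # then shift each signal down and mask off its length.
    word = int.from_bytes(bytes.fromhex(hex_frame), 'little')
    return {name: (word >> start) & ((1 << length) - 1)
            for name, (start, length) in LAYOUT.items()}

signals = decode_frame('9819961F9FFF7FC1')
print(signals)
```

Each frame costs one integer conversion plus a shift and a mask per signal, so the per-frame work stays small.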

Is it possible to convert a digit integer in python into a 1 byte character

I have a variable x in my code that takes only three values, x = {1, 2, 3}. When I use sys.getsizeof() I get 24, which is the size of the object in bytes.
Question
I was wondering if it's possible in Python to convert x to a char with a size of 1 byte. I tried str(x), but sys.getsizeof(str(x)) printed 38 bytes.

It is not possible to store it in a single byte, since every Python object carries the overhead of the Python implementation.
Your use case only matters in practice if you have large numbers of such values (thousands or millions, e.g. an image). In that case you would use, for example, array or bytearray objects as containers. Another approach would be numpy arrays.
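A rough sketch of the container approach, assuming a million values in the range 1 to 3 (the data here is made up for illustration). Exact getsizeof numbers vary by Python version and platform, so none are shown.

```python
import sys
from array import array

n = 1000000
values = [1, 2, 3] * (n // 3) + [1]   # a million small integers

as_list = list(values)
as_bytes = bytearray(values)          # one byte per value (0..255 only)
as_array = array('B', values)         # unsigned char, itemsize 1

print(sys.getsizeof(as_list))   # a pointer per element, plus the int objects
print(sys.getsizeof(as_bytes))  # roughly n bytes plus a small header
print(as_array.itemsize)
```

Indexing as_bytes[i] or as_array[i] still returns an ordinary Python int; only the storage is compact.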

How to create a fixed size (unsigned) integer in python?

I want to create a fixed size integer in python, for example 4 bytes. Coming from a C background, I expected that all the primitive types will occupy a constant space in memory, however when I try the following in python:
import sys
print sys.getsizeof(1000)
print sys.getsizeof(100000000000000000000000000000000000000000000000000000000)
I get
>>>24
>>>52
respectively.
How can I create a fixed size (unsigned) integer of 4 bytes in python? I need it to be 4 bytes regardless if the binary representation uses 3 or 23 bits, since later on I will have to do byte level memory manipulation with Assembly.

You can use struct.pack with the I format character (unsigned int). In the Python 2 session below it warns when the integer does not fit in four bytes; Python 3 raises struct.error instead:
>>> from struct import *
>>> pack('I', 1000)
'\xe8\x03\x00\x00'
>>> pack('I', 10000000)
'\x80\x96\x98\x00'
>>> pack('I', 1000000000000000)
sys:1: DeprecationWarning: 'I' format requires 0 <= number <= 4294967295
'\x00\x80\xc6\xa4'
You can also specify endianness with a prefix such as '<' (little-endian) or '>' (big-endian).
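A short Python 3 sketch of the endianness prefixes. With '<' or '>' the I format uses struct's standard size of 4 bytes regardless of platform.

```python
import struct

little = struct.pack('<I', 1000)  # little-endian, always 4 bytes
big = struct.pack('>I', 1000)     # big-endian, always 4 bytes
print(little, big)

try:
    struct.pack('<I', 1000000000000000)  # does not fit in 32 bits
except struct.error as exc:
    print('rejected:', exc)

# Round trip back to a Python int
(value,) = struct.unpack('<I', little)
print(value)
```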

The way I do this (and it's usually to ensure a fixed-width integer before sending to some hardware) is via ctypes:
from ctypes import c_ushort
def hex16(self, data):
    '''16bit int->hex converter'''
    return '0x%04x' % (c_ushort(data).value)
#------------------------------------------------------------------------------
def int16(self, data):
    '''16bit hex->int converter'''
    return c_ushort(int(data,16)).value
Otherwise struct can do it:
from struct import pack, unpack
pack_type = {'signed':'>h','unsigned':'>H',}
pack(self.pack_type[sign_type], data)
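A standalone sketch of the ctypes behavior this relies on, using the explicitly sized c_uint32 and c_uint16 types instead of c_ushort (a substitution made here so the width does not depend on the platform). ctypes integer types do no overflow checking, so out-of-range values wrap.

```python
import ctypes

x = ctypes.c_uint32(4294967295)   # largest 32-bit unsigned value
x.value += 1                      # wraps around instead of growing
wrapped = x.value

size = ctypes.sizeof(x)           # size of the underlying C value
raw = bytes(x)                    # its bytes, in native byte order

truncated = ctypes.c_uint16(70000).value  # keeps only the low 16 bits
print(wrapped, size, raw, truncated)
```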

I think you are missing something here.
When you send a character you are sending 1 byte, so even though
sys.getsizeof('\x05')
reports larger than 8, you are still only sending a single byte when you send it. The extra overhead is the Python object header (reference count, type pointer, and so on) that every Python object carries; it does not get transmitted.
You complained about getsizeof for the struct.pack answer but accepted the c_ushort answer, so I figured I would show you this:
>>> sys.getsizeof(struct.pack("I",15))
28
>>> sys.getsizeof(c_ushort(15))
80
That said, both of the answers should do exactly what you want.
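To make the distinction concrete, a small sketch comparing the payload length with the in-memory object size. The variable names are illustrative only.

```python
import struct
import sys

payload = struct.pack('<I', 15)
wire_size = len(payload)               # bytes that would actually be sent
object_size = sys.getsizeof(payload)   # includes the bytes object's header
print(wire_size, object_size)
```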

I have no idea if there's a better way to do this, but here's my naive approach:
def intn(n, num_bits=4):
    return min(2 ** num_bits - 1, n)
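Note that the function above clamps to a maximum and that num_bits counts bits, not bytes, so the default of 4 caps values at 15. A hedged sketch of the two behaviors someone coming from C might want, with hypothetical names and a 32-bit default:

```python
def clamp_unsigned(n, num_bits=32):
    # Saturate: anything outside the range becomes the nearest limit
    return max(0, min(2 ** num_bits - 1, n))

def wrap_unsigned(n, num_bits=32):
    # Wrap: keep only the low num_bits bits, like unsigned overflow in C
    return n & ((1 << num_bits) - 1)

print(clamp_unsigned(2 ** 32 + 5), wrap_unsigned(2 ** 32 + 5))
```

Neither function changes how much memory the Python int occupies; they only constrain its value.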

Python Struct, size changed by alignment.

Here's the hex code I am trying to unpack.
b'ABCDFGHa\x00a\x00a\x00a\x00a\x00\x00\x00\x00\x00\x00\x01' (it's not supposed to make any sense)
labels = unpack('BBBBBBBHHHHH5sB', msg)
struct.error: unpack requires a bytes argument of length 24
From what I counted, both of those are length = 23, both the format in my unpack function and the length of the hex values. I don't understand.
Thanks in advance

Most processors access data faster when the data is on natural boundaries, meaning data of size 2 should be on even addresses, data of size 4 should be accessed on addresses divisible by four, etc.
struct by default maintains this alignment. Since your structure starts out with 7 'B', a padding byte is added to align the next 'H' on an even address. To prevent this in Python, precede your format string with '='.
Example:
>>> import struct
>>> struct.calcsize('BBB')
3
>>> struct.calcsize('BBBH')
6
>>> struct.calcsize('=BBBH')
5
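Applied to the data from the question, a sketch might look like this. It uses '<' instead of '=', which also disables padding and additionally assumes the 'H' fields are little-endian; that byte order is an assumption about the data, not something the question states.

```python
import struct

msg = b'ABCDFGHa\x00a\x00a\x00a\x00a\x00\x00\x00\x00\x00\x00\x01'

native_size = struct.calcsize('BBBBBBBHHHHH5sB')    # may include padding
packed_size = struct.calcsize('=BBBBBBBHHHHH5sB')   # no padding

# Repeat counts keep the format short: 7 B, 5 H, one 5-byte string, 1 B
labels = struct.unpack('<7B5H5sB', msg)
print(native_size, packed_size, len(msg))
print(labels)
```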

I think H is enforcing 2-byte alignment after your 7 B
Aha, the alignment info is at the top of http://docs.python.org/library/struct.html, not down by the definition of the format characters.
