Decoding NumPy int64 binary representation

So I did a stupid thing and forgot to explicitly type-convert some values I was putting into an SQLite database (using Python's SQLAlchemy). The column was set up to store an INT, whereas the input was actually of dtype numpy.int64.
The values I am getting back out of the database look like:
b'\x15\x00\x00\x00\x00\x00\x00\x00'
It seems that SQLite has gone and stored the binary representation for these values, rather than the integer itself.
Is there a way to decode these values in Python, or am I stuck with loading all my data again (not a trivial exercise at this point)?

You can use struct.unpack():
>>> import struct
>>> value = struct.unpack('<q', b'\x15\x00\x00\x00\x00\x00\x00\x00')
>>> value
(21,)
>>> value[0]
21
That assumes that the data was stored little endian as specified by the < in the unpack() format string, and that it is a signed "long long" (8 bytes) as specified by the q. If the data is big endian:
>>> struct.unpack('>q', b'\x15\x00\x00\x00\x00\x00\x00\x00')
(1513209474796486656,)
I imagine that little endian is more likely to be correct in this case.
P.S. I have just confirmed that when a numpy.int64 is inserted into a SQLite int field it can be retrieved using struct.unpack() as shown above.
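If some rows were stored correctly and others ended up as 8-byte blobs, a small helper can normalise both on the way out. This is a minimal sketch, assuming the blobs are little-endian signed 64-bit values as discussed above; decode_int64 and the sample rows are made-up names and data for illustration, not part of SQLite or SQLAlchemy.

```python
def decode_int64(value):
    """Return a plain int for a value read back from the database.

    Assumes blobs are 8-byte little-endian signed integers (numpy.int64
    written on a little-endian machine); anything else is passed to int().
    """
    if isinstance(value, (bytes, bytearray, memoryview)):
        raw = bytes(value)
        if len(raw) != 8:
            raise ValueError("expected 8 bytes, got %d" % len(raw))
        return int.from_bytes(raw, byteorder="little", signed=True)
    return int(value)


# Hypothetical column contents: two blobs and one correctly stored integer.
rows = [b"\x15\x00\x00\x00\x00\x00\x00\x00", 7, b"\xff" * 8]
decoded = [decode_int64(v) for v in rows]
print(decoded)  # [21, 7, -1]
```

The decoded values could then be written back with an ordinary UPDATE so the column holds real integers again.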

Related

Convert two raw values to 32-bit IEEE floating point number

I am attempting to decode some data from a Shark 100 Power Meter via TCP modbus. I have successfully pulled down the registers that I need, and am left with two raw values from the registers like so:
[17138, 59381]
From the manual, I know that I need to convert these two numbers into a 32bit IEEE floating-point number. I also know from the manual that "The lower-addressed register is the
high order half (i.e., contains the exponent)." The first number in the list shown above is the lower-addressed register.
Using Python (any library will do if needed), how would I take these two values and make them into a 32 bit IEEE floating point value.
I have tried to use various online converters and calculators to figure out a non-programmatic way to do this, however, anything I have tried gets me a result that is way out of bounds (I am reading volts in this case so the end result should be around 120-122 from the supplied values above).

Update for Python 3.6+ (f-strings).
I am not sure why the fill width in @B.Go's answer was only 2; each 16-bit register needs four hex digits. Also, since the byte order is big-endian, I hardcoded it as such.
import struct
a = 17138
b = 59381
struct.unpack('>f', bytes.fromhex(f"{a:0>4x}" + f"{b:0>4x}"))[0]
Output: 121.45304107666016

The following code works:
import struct
a=17138
b=59381
struct.unpack('!f', bytes.fromhex('{0:02x}'.format(a) + '{0:02x}'.format(b)))
It gives
(121.45304107666016,)
Adapted from Convert hex to float and Integer to Hexadecimal Conversion in Python
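The hex-string round trip can also be skipped entirely by packing the two registers as unsigned 16-bit integers and reinterpreting the four bytes as a float. This is a minimal sketch, assuming big-endian byte order and that the first register is the high-order half, as the manual quoted in the question states; registers_to_float is a hypothetical helper name.

```python
import struct


def registers_to_float(high, low):
    """Combine two 16-bit register values into one 32-bit IEEE 754 float.

    Assumes big-endian order with `high` being the lower-addressed register.
    """
    raw = struct.pack(">HH", high, low)  # 2 x unsigned 16-bit -> 4 bytes
    return struct.unpack(">f", raw)[0]   # same 4 bytes read as a float


volts = registers_to_float(17138, 59381)
print(volts)
```

Because struct.pack always emits exactly two bytes per H field, small register values keep their leading zeros without any manual padding.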

In the comments, @Sanju posted this link: https://github.com/riptideio/pymodbus/blob/master/examples/common/modbus_payload.py
For anyone using pymodbus, the BinaryPayloadDecoder is useful as it's built in. It's very easy to pass it a result.registers, as shown in the example. It also has logging integrated, which helps debug why a conversion isn't working (e.g. wrong endianness).
As such, I made a working example for this question (using pymodbus==2.3.0):
from pymodbus.constants import Endian
from pymodbus.payload import BinaryPayloadDecoder
a = 17138
b = 59381
registers = [a, b]
decoder = BinaryPayloadDecoder.fromRegisters(registers, byteorder=Endian.Big)
decoder.decode_32bit_float() # type: float
Output: 121.45304107666016

Why do we need endianness here?

I am reading source code that downloads a zip file and reads the data into a NumPy array. The code is supposed to work on macOS and Linux, and here is the snippet that I see:
def _read32(bytestream):
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)
This function is used in the following context:
with gzip.open(filename) as bytestream:
    magic = _read32(bytestream)
It is not hard to see what happens here, but I am puzzled by the purpose of newbyteorder('>'). I read the documentation and know what endianness means, but I cannot understand why exactly the developer added newbyteorder (in my opinion it is not really needed).

That's because the downloaded data is in big-endian format, as described on the source page: http://yann.lecun.com/exdb/mnist/
All the integers in the files are stored in the MSB first (high
endian) format used by most non-Intel processors. Users of Intel
processors and other low-endian machines must flip the bytes of the
header.

It is just a way of ensuring that the bytes are interpreted from the resulting array in the correct order, regardless of a system's native byteorder.
By default, the built in NumPy integer dtypes will use the byteorder that is native to your system. For example, my system is little-endian, so simply using the dtype numpy.dtype(numpy.uint32) will mean that values read into an array from a buffer with the bytes in big-endian order will not be interpreted correctly.
If np.frombuffer is meant to receive bytes that are known to be in a particular byte order, the best practice is to modify the dtype using newbyteorder. This is mentioned in the documentation for np.frombuffer:
Notes
If the buffer has data that is not in machine byte-order, this should be specified as part of the data-type, e.g.:
>>> dt = np.dtype(int)
>>> dt = dt.newbyteorder('>')
>>> np.frombuffer(buf, dtype=dt)
The data of the resulting array will not be byteswapped, but will be
interpreted correctly.
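To see the effect concretely, here is a minimal sketch using only the standard library, with struct standing in for the NumPy dtype. The four header bytes are the MNIST image-file magic number (2051) stored MSB first; read32 is a hypothetical helper name, and the in-memory stream replaces the gzip file from the question.

```python
import io
import struct


def read32(bytestream):
    """Read one unsigned 32-bit integer stored in big-endian order."""
    return struct.unpack(">I", bytestream.read(4))[0]


header = bytes([0x00, 0x00, 0x08, 0x03])  # 2051 written MSB first

magic = read32(io.BytesIO(header))
# The same bytes read with little-endian order, which is what a native
# uint32 dtype would do on a little-endian machine.
wrong = struct.unpack("<I", header)[0]

print(magic)  # 2051
print(wrong)  # 50855936
```

Stating the byte order explicitly gives the same answer on every machine, which is the point of newbyteorder('>') in the original snippet.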

Python Converts integer into a bit number of specific length, fast

I am trying to delta compress a list of pixels and store them in a binary file. I have managed to do this; however, the method I found takes ~4 minutes a frame.
def getByte_List(self):
    values = BitArray("")
    for I in range(len(self.delta_values)):
        temp = Bits(int=self.delta_values[I], length=self.num_bits_pixel)
        values.append(temp)
    ##start_time = time.time()
    bit_stream = pack("uint:16, uint:5, bits", self.intial_value, self.num_bits_pixel, values)
    ##end_time = time.time()
    ##print(end_time - start_time)
    # Make sure that the list of bits contains a multiple of 8 values
    if (len(bit_stream) % 8):
        bit_stream.append(Bits(uint=0, length=(8 - (len(bit_stream) % 8))))  #####Append? On a pack? (Only work on bitarray? bit_stream = BitArray("")
    # Create a list of unsigned integer values to represent each byte in the stream
    # (integer division, so the repeat count stays an int on Python 3 as well)
    fmt = (len(bit_stream) // 8) * ["uint:8"]
    return bit_stream.unpack(fmt)
This is my code. I take the initial value, the number of bits per pixel and the delta values and convert them into bits. I then byte align and take the integer representation of the bytes and use it elsewhere. The problem areas are where I convert each delta value to bits (3 min) and where I pack (1 min). Is it possible to do what I am doing faster, or is there another way to pack them straight into integers representing bytes?

From a quick Google of the classes you're instantiating, it looks like you're using the bitstring module. This is written in pure Python, so it's not a big surprise that it's pretty slow. You might look at one or more of the following:
struct - a module that comes with Python that will let you pack and unpack C structures into constituent values
bytearray - a built-in type that lets you accumulate, well, an array of bytes, and has both list-like and string-like operations
bin(x), int(x, 2) - conversion of numbers to a binary representation as a string, and back - string manipulations can sometimes be a reasonably efficient way to do this
bitarray - native (C) module for bit manipulation, looks like it has similar functionality to bitstring but should be much faster. Available here in form suitable for compiling on Linux or here pre-compiled for Windows.
numpy - fast manipulation of arrays of various types including single bytes. Kind of the go-to module for this sort of thing, frankly. http://www.numpy.org/
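As an illustration of the bin(x) / int(x, 2) route from the list above, here is a minimal sketch that builds the whole bit stream as a string and converts it to bytes in one step. It assumes the layout from the question (16-bit initial value, 5-bit width, then each delta as a two's-complement field of that width, zero-padded to a byte boundary); pack_deltas is a hypothetical function name and no timing is claimed for it.

```python
def pack_deltas(initial_value, num_bits, deltas):
    """Pack a header and signed deltas into a list of byte values.

    Layout (assumed from the question): uint:16 initial value, uint:5
    field width, then each delta in `num_bits` bits, two's complement.
    """
    mask = (1 << num_bits) - 1            # keeps the low num_bits bits
    width = "0%db" % num_bits             # e.g. "03b" for 3-bit fields
    parts = [format(initial_value, "016b"), format(num_bits, "05b")]
    # d & mask turns a negative delta into its two's-complement bit pattern
    parts.extend(format(d & mask, width) for d in deltas)
    bits = "".join(parts)
    bits += "0" * (-len(bits) % 8)        # pad up to a whole number of bytes
    return list(int(bits, 2).to_bytes(len(bits) // 8, "big"))


packed = pack_deltas(1, 3, [1, -1])
print(packed)  # [0, 1, 25, 224]
```

The per-delta work is a single format() call, and the join and the final conversion each run once over the whole stream rather than once per pixel.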

Use integer keys in Berkeley DB with python (using bsddb3)

I want to use BDB as a time-series data store, and planning to use the microseconds since epoch as the key values. I am using BTREE as the data store type.
However, when I try to store integer keys, bsddb3 gives an error saying TypeError: Integer keys only allowed for Recno and Queue DB's.
What is the best workaround? I can store them as strings, but that probably will make it unnecessarily slower.
Given BDB itself can handle any kind of data, why is there a restriction? can I sorta hack the bsddb3 implementation? has anyone used anyother methods?

You can't store integers since bsddb doesn't know how to represent integers and which kind of integer it is.
If you convert your integer to a string you will break the lexicographic ordering of keys of bsddb: 10 > 2 but as strings "10" < "2".
You have to use Python's struct module to convert your integers into a string (or, in Python 3, into bytes) and then store them in bsddb. You have to use big-endian packing or the ordering will not be correct.
Then you can use bsddb's Cursor.set_range(key) to query for information in a given slice of time.
For instance, Cursor.set_range(struct.pack('>Q', 123456789)) will set the cursor at the key of the event happening at 123456789, or at the first one that happens after it.
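The ordering argument can be checked without a database. This is a minimal sketch, assuming keys are compared byte by byte (lexicographically); the timestamps are made-up sample values.

```python
import struct

timestamps = [2, 10, 123456789]

# Keys stored as decimal text: byte-wise sorting puts b"10" before b"2".
as_text = sorted(str(t).encode("ascii") for t in timestamps)

# Keys stored as fixed-width big-endian unsigned 64-bit integers:
# byte-wise order and numeric order coincide.
as_packed = sorted(struct.pack(">Q", t) for t in timestamps)
decoded = [struct.unpack(">Q", k)[0] for k in as_packed]

print(as_text)   # [b'10', b'123456789', b'2']
print(decoded)   # [2, 10, 123456789]
```

The fixed width matters as much as the byte order: every key is 8 bytes, so the most significant byte is always compared first.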

Well, there's no workaround. But you can use one of two approaches:
Store the integers as strings using str or repr. If the ints are big, you can even use string formatting.
Use the cPickle/pickle module to store and retrieve data. This is a good way if you have data types other than basic types. For basic ints and floats this is actually slower and takes more space than just storing strings.

Alternatives to using pack_into() when manipulating a list of bytes?

I'm reading in a binary file into a list and parsing the binary data. I'm using unpack() to extract certain parts of the data as primitive data types, and I want to edit that data and insert it back into the original list of bytes. Using pack_into() would make it easy, except that I'm using Python 2.4, and pack_into() wasn't introduced until 2.5
Does anyone know of a good way to go about serializing the data this way so that I can accomplish essentially the same functionality as pack_into()?

Have you looked at the bitstring module? It's designed to make the construction, parsing and modification of binary data easier than using the struct and array modules directly.
It's especially made for working at the bit level, but will work with bytes just as well. It will also work with Python 2.4.
from bitstring import BitString
s = BitString(filename='somefile')
# replace byte range with new values
# The step of '8' signifies byte rather than bit indices.
s[10:15:8] = '0x001122'
# Search and replace byte value with two bytes
s.replace('0xcc', '0xddee', bytealigned=True)
# Different interpretations of the data are available through properties
if s[5:7:8].int > 1000:
    s[5:7:8] = 1000
# Use the bytes property to get back to a Python string
open('newfile', 'wb').write(s.bytes)
The underlying data stored in the BitString is just an array object, but with a comprehensive set of functions and special methods to make it simple to modify and interpret.

Do you mean editing data in a buffer object? Documentation on manipulating those at all from Python directly is fairly scarce.
If you just want to edit bytes in a string, it's simple enough, though; struct.pack_into is new to 2.5, but struct.pack isn't:
import struct
s = open("file").read()
ofs = 1024
fmt = "Ih"
size = struct.calcsize(fmt)
before, data, after = s[0:ofs], s[ofs:ofs+size], s[ofs+size:]
values = list(struct.unpack(fmt, data))
values[0] += 5
values[1] /= 2
data = struct.pack(fmt, *values)
s = "".join([before, data, after])
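The same split-and-rejoin idea can be wrapped in a helper that mimics pack_into on a mutable buffer. This is a minimal sketch written for current Python rather than 2.4 (bytearray and struct.unpack_from are not available there); pack_into_compat is a hypothetical name, and the explicit "<Ih" format and sample buffer are assumptions chosen so the layout is the same on every machine.

```python
import struct


def pack_into_compat(fmt, buf, offset, *values):
    """Overwrite buf[offset:offset+size] with the packed values in place."""
    packed = struct.pack(fmt, *values)
    buf[offset:offset + len(packed)] = packed  # same-length slice assignment


fmt = "<Ih"  # little-endian unsigned int + short, 6 bytes, no padding
buf = bytearray(b"\x00" * 4 + struct.pack(fmt, 10, 8) + b"\xff" * 2)

first, second = struct.unpack_from(fmt, buf, 4)
pack_into_compat(fmt, buf, 4, first + 5, second // 2)

result = struct.unpack_from(fmt, buf, 4)
print(result)  # (15, 4)
```

Because the replacement slice has exactly the packed length, the bytes before and after the edited region are left untouched and the buffer keeps its size.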
