How to pack & unpack 64 bits of data? - python

I have a 64-bit data structure as follows:
HHHHHHHHHHHHHHHHGGGGGGGGGGGGFFFEEEEDDDDCCCCCCCCCCCCBAAAAAAAAAAAA
A: 12 bits (unsigned)
B: 1 bit
C: 12 bits (unsigned)
D: 4 bits (unsigned)
E: 4 bits (unsigned)
F: 3 bits (unsigned)
G: 12 bits (unsigned)
H: 16 bits (unsigned)
Using Python, I am trying to determine which module (preferably native Python 3.x) I should be using. I am looking at BitVector but having trouble figuring some things out.
For ease of use, I want to be able to do something like the following:
# To set the `A` bits, use a mapped mask 'objectId'
bv = BitVector(size=64)
bv['objectId'] = 1
I'm not sure that BitVector actually works the way I want it to. Whatever module I end up implementing, the data structure will be encapsulated in a class that reads and writes to the structure via property getters/setters.
I will also be using constants (or enums) for some of the bit values and it would be convenient to be able to set a mapped mask using something like:
# To set the 'B' bit, use the constant flag to set the mapped mask 'visibility'
bv['visibility'] = PUBLIC
print(bv['visibility']) # output: 1
# To set the 'G' bits, us a mapped mask 'imageId'
bv['imageId'] = 238
Is there a Python module in 3.x that will help me achieve this goal? If BitVector will (or should) work, some helpful hints (e.g. examples) would be appreciated. It seems that BitVector wants to force everything to an 8-bit format which is not ideal for my application (IMHO).

Based on the recommendation to use bitarray I have come up with the following implementation with two utility methods:
def test_bitvector_set_block_id_slice(self):
bv = bitvector(VECTOR_SIZE)
bv.setall(False)
print("BitVector[{len}]: {bv}".format(len=bv.length(),
bv=bv.to01()))
print("set block id: current {bid} --> {nbid}".format(bid=bv[52:VECTOR_SIZE].to01(),
nbid=inttobitvector(12, 1).to01()))
# set blockVector.blockId (last 12 bits)
bv[52:VECTOR_SIZE] = inttobitvector(12, 1)
block_id = bv[52:VECTOR_SIZE]
self.assertTrue(bitvectortoint(block_id) == 1)
print("BitVector[{len}] set block id: {bin} [{val}]".format(len=bv.length(),
bin=block_id.to01(),
val=bitvectortoint(block_id)))
print("BitVector[{len}]: {bv}".format(len=bv.length(),
bv=bv.to01()))
print()
# utility methods
def bitvectortoint(bitvec):
out = 0
for bit in bitvec:
out = (out << 1) | bit
return out
def inttobitvector(size, n):
bits = "{bits}".format(bits="{0:0{size}b}".format(n,
size=size))
print("int[{n}] --> binary: {bits}".format(n=n,
bits=bits))
return bitvector(bits)
The output is as follows:
BitVector[64]: 0000000000000000000000000000000000000000000000000000000000000000
int[1] --> binary: 000000000001
set block id: current 000000000000 --> 000000000001
int[1] --> binary: 000000000001
BitVector[64] set block id: 000000000001 [1]
BitVector[64]: 0000000000000000000000000000000000000000000000000000000000000001
If there are improvements to the utility methods, I am more than willing to take some advice.

Related

Unpack IEEE 754 Floating Point Number

I am reading two 16 bit registers from a tcp client using the pymodbus module. The two registers make up a 32 bit IEEE 754 encoded floating point number. Currently I have the 32 bit binary value of the registers shown in the code below.
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
reg_1 = response.getRegister(0)<<(16 - (response.getRegister(0).bit_length())) #Get in 16 bit format
reg_2 = response.getRegister(1)<<(16 - (response.getRegister(1).bit_length())) #Get in 16 bit format
volts = (reg_1 << 16) | reg_2 #Get the 32 bit format
The above works fine to get the encoded value the problem is decoding it. I was going to code something like in this video but I came across the 'f' format in the struct module for IEEE 754 encoding. I tried decode the 32 bit float stored in volts in the code above using the unpack method in the struct module but ran into the following errors.
val = struct.unpack('f',volts)
>>> TypeError: a bytes-like object is required, not 'int'
Ok tried convert it to a 32 bit binary string.
temp = bin(volts)
val = struct.unpack('f',temp)
>>> TypeError: a bytes-like object is required, not 'str'
Tried to covert it to a bytes like object as in this post and format in different ways.
val = struct.unpack('f',bytes(volts))
>>> TypeError: string argument without an encoding
temp = "{0:b}".format(volts)
val = struct.unpack('f',temp)
>>> ValueError: Unknown format code 'b' for object of type 'str'
val = struct.unpack('f',volts.encode())
>>> struct.error: unpack requires a buffer of 4 bytes
Where do I add this buffer and where in the documentation does it say I need this buffer with the unpack method? It does say in the documentation
The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
The calcsize(fmt) function returns a value in bytes but the len(string) returns a value of the length of the string, no?
Any suggestions are welcome.
EDIT
There is a solution to decoding below however a better solution to obtaining the 32 bit register value from the two 16 bit register values is shown below compared to the original in the question.
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
reg_1 = response.getRegister(0)
reg_2 = response.getRegister(1)
# Shift reg 1 by 16 bits
reg_1s = reg_1 << 16
# OR with the reg_2
total = reg_1s | reg_2
I found a solution to the problem using the BinaryPayloadDecoder.fromRegisters() from the pymodbus moudule instead of the struct module. Note that this solution is specific to the modbus smart meter device I am using as the byte order and word order of the registers could change in other devices. It may still work in other devices to decode registers but I would advise to read the documentation of the device first to be sure. I left in the comments in the code below but when I refer to page 24 this is just for my device.
from pymodbus.client.sync import ModbusTcpClient
from pymodbus.constants import Endian
from pymodbus.payload import BinaryPayloadDecoder
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
# The response will contain two registers making a 32 bit floating point number
# Use the BinaryPayloadDecoder.fromRegisters() function to decode
# The coding scheme for a 32 bit float is IEEE 754 https://en.wikipedia.org/wiki/IEEE_754
# The MS Bytes are stored in the first address and the LS bytes are stored in the second address,
# this corresponds to a big endian byte order (Second parameter in function)
# The documentation for the Modbus registers for the smart meter on page 24 says that
# the low word is the first priority, this correspond to a little endian word order (Third parameter in function)
decoder = BinaryPayloadDecoder.fromRegisters(response.registers, Endian.Big, wordorder=Endian.Little)
final_val = (decoder.decode_32bit_float())
client.close()
EDIT
Credit to juanpa-arrivillaga and chepner the problem can be solved using the struct module also with the byteorder='little'. The two functions in the code below can be used if the byteorder is little or if the byte order is big depending upon the implementation.
import struct
from pymodbus.client.sync import ModbusTcpClient
def big_endian(response):
reg_1 = response.getRegister(0)
reg_2 = response.getRegister(1)
# Shift reg 1 by 16 bits
reg_1s = reg_1 << 16
# OR with the reg_2
total = reg_1s | reg_2
return total
def little_endian(response):
reg_1 = response.getRegister(0)
reg_2 = response.getRegister(1)
# Shift reg 2 by 16 bits
reg_2s = reg_2 << 16
# OR with the reg_1
total = reg_2s | reg_1
return(total)
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
# Little
little = little_endian(response)
lit_byte = little.to_bytes(4,byteorder='little')
print(struct.unpack('f',lit_byte))
# Big
big = big_endian(response)
big_byte = big.to_bytes(4,byteorder='big')
print(struct.unpack('f',big_byte))

Read a binary file using unpack in Python compared with a IDL method

I have a IDL procedure reading a binary file and I try to translate it into a Python routine.
The IDL code look like :
a = uint(0)
b = float(0)
c = float(0)
d = float(0)
e = float(0)
x=dblarr(nptx)
y=dblarr(npty)
z=dblarr(nptz)
openr,11,name_file_data,/f77_unformatted
readu,11,a
readu,11,b,c,d,e
readu,11,x
readu,11,y
readu,11,z
it works perfectly. So I'm writing the same thing in python but I can't find the same results (even the value of 'a' is different). Here is my code :
x=np.zeros(nptx,float)
y=np.zeros(npty,float)
z=np.zeros(nptz,float)
with open(name_file_data, "rb") as fb:
a, = struct.unpack("I", fb.read(4))
b,c,d,e = struct.unpack("ffff", fb.read(16))
x[:] = struct.unpack(str(nptx)+"d", fb.read(nptx*8))[:]
y[:] = struct.unpack(str(npty)+"d", fb.read(npty*8))[:]
z[:] = struct.unpack(str(nptz)+"d", fb.read(nptz*8))[:]
Hope it will help anyone to answer me.
Update : As suggested in the answers, I'm now trying the module "FortranFile", but I'm not sure I understood everything about its use.
from scipy.io import FortranFile
f=FortranFile(name_file_data, 'r')
a=f.read_record('H')
b=f.read_record('f','f','f','f')
However, instead of having an integer for 'a', I got : array([0, 0], dtype=uint16).
And I had this following error for 'b': Size obtained (1107201884) is not a multiple of the dtypes given (16)
According to a table of IDL data types, UINT(0) creates a 16 bit integer (i.e. two bytes). In the Python struct module, the I format character denotes a 4 byte integer, and H denotes an unsigned 16 bit integer.
Try changing the line that unpacks a to
a, = struct.unpack("H", fb.read(2))
Unfortunately, this probably won't fix the problem. You use the option /f77_unformatted with openr, which means the file contains more than just the raw bytes of the variables. (See the documentation of the OPENR command for more information about /f77_unformatted.)
You could try to use scipy.io.FortranFile to read the file, but there are no gaurantees that it will work. The binary layout of an unformatted Fortran file is compiler dependent.

Python u-Law (MULAW) wave decompression to raw wave signal

I googled this issue for last 2 weeks and wasn't able to find an algorithm or solution. I have some short .wav file but it has MULAW compression and python doesn't seem to have function inside wave.py that can successfully decompresses it. So I've taken upon myself to build a decoder in python.
I've found some info about MULAW in basic elements:
Wikipedia
A-law u-Law comparison
Some c-esc codec library
So I need some guidance, since I don't know how to approach getting from signed short integer to a full wave signal. This is my initial thought from what I've gathered so far:
So from wiki I've got a equation for u-law compression and decompression :
compression :
decompression :
So judging by compression equation, it looks like the output is limited to a float range of -1 to +1 , and with signed short integer from –32,768 to 32,767 so it looks like I would need to convert it from short int to float in specific range.
Now, to be honest, I've heard of quantisation before, but I am not sure if I should first try and dequantize and then decompress or in the other way, or even if in this case it is the same thing... the tutorials/documentation can be a bit of tricky with terminology.
The wave file I am working with is supposed to contain 'A' sound like for speech synthesis, I could probably verify success by comparing 2 waveforms in some audio software and custom wave analyzer but I would really like to diminish trial and error section of this process.
So what I've had in mind:
u = 0xff
data_chunk = b'\xe7\xe7' # -6169
data_to_r1 = unpack('h',data_chunk)[0]/0xffff # I suspect this is wrong,
# # but I don't know what else
u_law = ( -1 if data_chunk<0 else 1 )*( pow( 1+u, abs(data_to_r1)) -1 )/u
So is there some sort of algorithm or crucial steps I would need to take in form of first: decompression, second: quantisation : third ?
Since everything I find on google is how to read a .wav PCM-modulated file type, not how to manage it if wild compression arises.
So, after scouring the google the solution was found in github ( go figure ). I've searched for many many algorithms and found 1 that is within bounds of error for lossy compression. Which is for u law for positive values from 30 -> 1 and for negative values from -32 -> -1
To be honest i think this solution is adequate but not quite per equation per say, but it is best solution for now. This code is transcribed to python directly from gcc9108 audio codec
def uLaw_d(i8bit):
bias = 33
sign = pos = 0
decoded = 0
i8bit = ~i8bit
if i8bit&0x80:
i8bit &= ~(1<<7)
sign = -1
pos = ( (i8bit&0xf0) >> 4 ) + 5
decoded = ((1 << pos) | ((i8bit & 0x0F) << (pos - 4)) | (1 << (pos - 5))) - bias
return decoded if sign else ~decoded
def uLaw_e(i16bit):
MAX = 0x1fff
BIAS = 33
mask = 0x1000
sign = lsb = 0
pos = 12
if i16bit < 0:
i16bit = -i16bit
sign = 0x80
i16bit += BIAS
if ( i16bit>MAX ): i16bit = MAX
for x in reversed(range(pos)):
if i16bit&mask != mask and pos>=5:
pos = x
break
lsb = ( i16bit>>(pos-4) )&0xf
return ( ~( sign | ( pos<<4 ) | lsb ) )
With test:
print( 'normal :\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(0xff) )
print( 'encoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_e(0xff)) )
print( 'decoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_d(uLaw_e(0xff))) )
and output:
normal : 255 | FF : 0000000011111111
encoded: -179 | -B3 : -000000010110011
decoded: 263 | 107 : 0000000100000111
And as you can see 263-255 = 8 which is within bounds. When i tried to implement seeemmmm method described in G.711 ,that kind user Oliver Charlesworth suggested that i look in to , the decoded value for maximum in data was -8036 which is close to the maximum of uLaw spec, but i couldn't reverse engineer decoding function to get binary equivalent of function from wikipedia.
Lastly, i must say that i am currently disappointed that python library doesn't support all kind of compression algorithms since it is not just a tool that people use, it is also a resource python consumers learn from since most of data for further dive into code isn't readily available or understandable.
EDIT
After decoding the data and writing wav file via wave.py i've successfully succeeded to write a new raw linear PCM file. This works... even though i was sceptical at first.
EDIT 2: ::> you can find real solution oncompressions.py
I find this helpful for converting to/from ulaw with numpy arrays.
import audioop
def numpy_audioop_helper(x, xdtype, func, width, ydtype):
'''helper function for using audioop buffer conversion in numpy'''
xi = np.asanyarray(x).astype(xdtype)
if np.any(x != xi):
xinfo = np.iinfo(xdtype)
raise ValueError("input must be %s [%d..%d]" % (xdtype, xinfo.min, xinfo.max))
y = np.frombuffer(func(xi.tobytes(), width), dtype=ydtype)
return y.reshape(xi.shape)
def audioop_ulaw_compress(x):
return numpy_audioop_helper(x, np.int16, audioop.lin2ulaw, 2, np.uint8)
def audioop_ulaw_expand(x):
return numpy_audioop_helper(x, np.uint8, audioop.ulaw2lin, 2, np.int16)
Python actually supports decoding u-Law out of the box:
audioop.ulaw2lin(fragment, width)
Convert sound fragments in u-LAW encoding to linearly encoded sound fragments. u-LAW encoding always uses 8 bits samples, so width
refers only to the sample width of the output fragment here.
https://docs.python.org/3/library/audioop.html#audioop.ulaw2lin

sharing gmpy2 multi-precision integer between processes without copying

Is it possible to share gmpy2 multiprecision integers (https://pypi.python.org/pypi/gmpy2) between processes (created by multiprocessing) without creating copies in memory?
Each integer has about 750,000 bits. The integers are not modified by the processes.
Thank you.
Update: Tested code is below.
I would try the following untested approach:
Create a memory mapped file using Python's mmap library.
Use gmpy2.to_binary() to convert a gmpy2.mpz instance into binary string.
Write both the length of the binary string and binary string itself into the memory mapped file. To allow for random access, you should begin every write at a multiple of a fixed value, say 94000 in your case.
Populate the memory mapped file with all your values.
Then in each process, use gmpy2.from_binary() to read the data from the memory mapped file.
You need to read both the length of the binary string and binary string itself. You should be able to pass a slice from the memory mapped file directly to gmpy2.from_binary().
I may be simpler to create a list of (start, end) values for the position of each byte string in the memory mapped file and then pass that list to each process.
Update: Here is some sample code that has been tested on Linux with Python 3.4.
import mmap
import struct
import multiprocessing as mp
import gmpy2
# Number of mpz integers to place in the memory buffer.
z_count = 40000
# Maximum number of bits in each integer.
z_bits = 750000
# Total number of bytes used to store each integer.
# Size is rounded up to a multiple of 4.
z_size = 4 + (((z_bits + 31) // 32) * 4)
def f(instance):
global mm
s = 0
for i in range(z_count):
mm.seek(i * z_size)
t = struct.unpack('i', mm.read(4))[0]
z = gmpy2.from_binary(mm.read(t))
s += z
print(instance, z % 123456789)
def main():
global mm
mm = mmap.mmap(-1, z_count * z_size)
rs = gmpy2.random_state(42)
for i in range(z_count):
z = gmpy2.mpz_urandomb(rs, z_bits)
b = gmpy2.to_binary(z)
mm.seek(i * z_size)
mm.write(struct.pack('i', len(b)))
mm.write(b)
ctx = mp.get_context('fork')
pool = ctx.Pool(4)
pool.map_async(f, range(4))
pool.close()
pool.join()
if __name__ == '__main__':
main()

Construct parsing for unaligned int field?

I am using this nice little package "construct" for binary data parsing. However, I ran into a case where the format is defined as:
31 24 23 0
+-------------------------+
| status | an int number |
+-------------------------+
Basically, the higher 8 bits are used for status, and 3 bytes left for integer: an int type with higher bits masked off. I am a bit lost on what is the proper way of defining the format:
The brute force way is to define it as ULInt32 and do bit masking myself
Is there anyway I can use BitStruct to save the trouble?
Edit
Assuming Little Endian and based on jterrace's example and swapped=True suggestion, I think this is what will work in my case:
sample = "\xff\x01\x01\x01"
c = BitStruct("foo", BitField("i", 24, swapped=True), BitField("status", 8))
c.parse(sample)
Container({'i': 66047, 'status': 1})
Thanks
Oliver
This would be easy if construct contained Int24 types, but it doesn't. Instead, you can specify the bit lengths yourself like this:
>>> from construct import BitStruct, BitField
>>> sample = "\xff\x01\x01\x01"
>>> c = BitStruct("foo", BitField("status", 8), BitField("i", 24))
>>> c.parse(sample)
Container({'status': 255, 'i': 65793})
Note: The value \x01\x01\x01 is 65536 + 256 + 1 = 65793
BitStruct("foo",
BitField("status", 8),
BitField("number", 24))

Categories

Resources