I am trying to extract an integer which occupies up to 12 bits in a 2 byte (16 bit) message, which is in big-endian format. I have done some research already and expect that I will have to use bit_manipulation (bit shifting) to achieve this, but I am unsure how this can be applied to big-endian format.
A couple of answers on here used the python 'Numpy' package, but I don't have access to that on Micropython. I do have access to the 'ustruct' module, which I use to unpack certain other parts of the message, but it only seems to apply to 8 bit, 16bit and 32bit messages.
So far the only thing I have come up with is:
int12 = (byte1 << 4) + (byte2)
expected_value = int.from_bytes(int12)
but this isn't giving me the numbers I am expecting. For example, 0x02, 0x15 should give decimal 533.
Where am I going wrong?
I'm new to bit manipulation and extracting data from bytes so any help is greatly appreciated, Thanks!
This should work:
import struct
val, = struct.unpack('!h', b'23')  # unpack returns a 1-tuple for a single field
val = (val >> 4) & 0xFFF
gives:
>>> hex(val)
'0x323'
However, you should check which 12 of the 16 bits are occupied. The code above assumes they are the upper three nibbles. If the number occupies the lower three nibbles, you don't need any shift, just the mask with 0xFFF.
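If struct (or ustruct) is not convenient, the same result can be had with plain shifts and masks, which MicroPython also supports. The following is a minimal sketch: the function name extract_12bit and the upper_nibbles flag are made up for illustration, and it assumes the message is exactly two bytes with the most significant byte first.

```python
def extract_12bit(byte1, byte2, upper_nibbles=False):
    """Combine two big-endian bytes and return a 12-bit field.

    upper_nibbles=False: the value sits in the low 12 bits (mask only).
    upper_nibbles=True:  the value sits in the high 12 bits (shift, then mask).
    """
    word = (byte1 << 8) | byte2      # big-endian: first byte is most significant
    if upper_nibbles:
        word >>= 4                   # drop the unused low nibble
    return word & 0xFFF              # keep 12 bits


print(extract_12bit(0x02, 0x15))          # 533
print(extract_12bit(0x32, 0x33, True))    # 803, i.e. 0x323
```

Note that the question's attempt shifts byte1 by only 4 bits; it has to move by a whole byte (8 bits) to make room for byte2.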
I am trying to write a python driver for a lidar sensor that only has a package for robot OS.
I was able to get the communication working on a Raspberry Pi and I am getting the data that I need.
I never really worked with bytearrays before and even python is pretty new to me.
The received data looks like this (png), but you can take a look at the documentation (pdf) as well.
So if I'm not mistaken, I have to combine three bytes into two 12-bit values like this:
[0x5D, 0xC7, 0xD0] => [0x5DC, 0x7D0]
I think the aforementioned robot OS library does this here, but my c++ is even worse than my python :)
After I have the correct data I want to sort it into a 2D array but that's not a problem.
Can you point me in the right direction, or just suggest how to search for a solution?
Thank you for your help
So here's one solution (maybe not the cleanest but it's bit-manipulation so...):
arr = [0x5D, 0xC7, 0xD0]
byte_0 = arr[0] << 4 | (arr[1] >> 4)
byte_1 = (arr[1] & 0xF) << 8 | arr[2]
I'll try to go over this step by step. The three bytes are, in binary representation:
0b0101_1101
0b1100_0111
0b1101_0000
The << operator is the shift-operator. It moves the bits to the left the specified amount. Applying this to the first byte yields:
0b0101_1101 << 4 = 0b0101_1101_0000, effectively appending four zeros at the end.
The >> operator is the counterpart of <<: it shifts the bits to the right instead, discarding any bits that would move below position 0:
0b1100_0111 >> 4 = 0b1100
Finally, the | operator is the bitwise OR operator. Each result bit is '1' if one or both of the input bits are '1', and '0' only when both are '0'. We can use this to fill in the lower four bits of our result so far. Note that I have omitted leading zeros for simplicity; here are the numbers padded with zeros:
0b0101_1101_0000 | 0b0000_0000_1100 = 0b0101_1101_1100. And there you have your first number. Now note that this is not a byte, rather you now need 12 bits to represent the number.
The same is done for the second number. The only new thing here is the bitwise AND operator (&), which yields '1' only if both bits are '1'. We can use it to mask out the part of the byte we are interested in:
0b1100_0111 & 0b1111 = 0b0111
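To finish the second value, the masked nibble is shifted left by 8 and OR-ed with the third byte. Generalised to a whole buffer, the steps above can be sketched as follows. unpack_12bit is a hypothetical helper name; the sketch assumes the input length is a multiple of three and that values are packed most significant bits first, as in the example.

```python
def unpack_12bit(data):
    """Split a byte sequence into 12-bit values, two values per three bytes."""
    if len(data) % 3 != 0:
        raise ValueError("expected a multiple of 3 bytes")
    values = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        values.append((b0 << 4) | (b1 >> 4))        # all of b0 + high nibble of b1
        values.append(((b1 & 0x0F) << 8) | b2)      # low nibble of b1 + all of b2
    return values


print([hex(v) for v in unpack_12bit(bytes([0x5D, 0xC7, 0xD0]))])
# ['0x5dc', '0x7d0']
```

Whether the sensor really packs its samples in this order has to be checked against its documentation.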
[Edit: In summary, this question was the result of me making (clearly incorrect) assumptions about what endian means (I assumed it was 00000001 vs 10000000, i.e. reversing the bits, rather than the bytes). Many thanks #tripleee for clearing up my confusion.]
As far as I can tell, the byte order of frames returned by the Python 3 wave module [1] (which I'll now refer to as pywave) isn't documented. I've had a look at the source code [2] [3], but haven't quite figured it out.
Firstly, it looks like pywave only supports 'RIFF' wave files [2]. 'RIFF' files are little-endian; samples are unsigned for bit depths of 8 or lower, otherwise signed (two's complement).
However, it looks like pywave converts the bytes it reads from the file to sys.byteorder [2]:
data = self._data_chunk.read(nframes * self._framesize)
if self._sampwidth != 1 and sys.byteorder == 'big':
    data = audioop.byteswap(data, self._sampwidth)
Except in the case of sampwidth==1, which corresponds to an 8 bit file. So 8 bit files aren't converted to sys.byteorder? Why would this be? (Maybe because they are unsigned?)
Currently my logic looks like:
if sampwidth == 1:
    signed = False
    byteorder = 'little'
else:
    signed = True
    byteorder = sys.byteorder
Is this correct?
8 bit wav files are incredibly rare nowadays, so this isn't really a problem. But I would still like to find answers...
[1] https://docs.python.org/3/library/wave.html
[2] https://github.com/python/cpython/blob/3.9/Lib/wave.py
[3] https://github.com/python/cpython/blob/3.9/Lib/chunk.py
A byte is a byte, little or big endian only makes sense for data which is more than one byte.
0xf0 is a single, 8-bit byte. The bits are 0b11110000 on any modern architecture. Without a sign bit, the range is 0 through 255 (8 bits of storage gives 2^8 possible values).
0xf0eb is a 16-bit number which takes two 8-bit bytes to represent. This can be represented as
0xf0 0xeb big-endian (0b11110000 0b11101011), or
0xeb 0xf0 little-endian (0b11101011 0b11110000)
The range of possible values without a sign bit is 0 through 65,535 (2^16 values).
You can also have different byte orders for 32-bit numbers etc, but I'll defer to Wikipedia etc for the full exposition.
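As a quick illustration of the point above, Python's built-in int.to_bytes and int.from_bytes make the two byte orders visible. This is only a small sketch; the variable names are arbitrary.

```python
value = 0xF0EB

big = value.to_bytes(2, "big")        # most significant byte first
little = value.to_bytes(2, "little")  # least significant byte first

print(big.hex(), little.hex())        # f0eb ebf0

# Reading the same two bytes with the wrong order gives a different number.
print(int.from_bytes(big, "big"))     # 61675
print(int.from_bytes(big, "little"))  # 60400

# A single byte is unaffected by byte order.
assert (0xF0).to_bytes(1, "big") == (0xF0).to_bytes(1, "little")
```

The bits inside each byte are never reversed; only the order of the bytes changes.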
I've been studying compression algorithms recently, and I'm trying to understand how I can store integers as bits in Python to save space.
So first I save '1' and '0' as strings in Python.
import os
import numpy as np
array= np.random.randint(0, 2, size = 200)
string = [str(i) for i in array]
with open('testing_int.txt', 'w') as f:
    for i in string:
        f.write(i)
print(os.path.getsize('testing_int.txt'))
I get back 200 bytes, which makes sense, since each char is represented by one byte in ASCII (and in UTF-8 as well, if the characters are Latin?).
Now if trying to save these ones and zeroes as bits, I should only take up around 25 bytes right?
200 bits/8 = 25 bytes.
However, when I try the following code below, I get 105 bytes.
Am I doing something wrong?
Using the same 'array variable' as above I tried this:
bytes_string = [bytes(i) for i in array]
with open('testing_bytes.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)
Then I tried this:
bin_string = [bin(i) for i in array]
with open('testing_bin.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)
This also takes up around 105 bytes.
So I tried looking at the text files, and I noticed that
both the 'bytes.txt' and 'bin.txt' are blank.
So I tried to read the 'bytes.txt' file via this code:
with open(r"C:\Users\Moondra\Desktop\testing_bytes\testing_bytes.txt", 'rb') as f:
    x = f.read()
Now I get back this:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
So I tried these commands:
>>> int.from_bytes(x, byteorder='big')
0
>>> int.from_bytes(x, byteorder='little')
0
>>>
So apparently I'm doing multiple things incorrectly.
I can't figure out:
1) Why I am not getting a text file that is 25 bytes
2) Why I can't read back the bytes file correctly.
Thank you.
bytes_string = [bytes(i) for i in array]
It looks like you expect bytes(x) to give you a one-byte bytes object with the value of x. Follow the documentation, and you'll see that bytes() is initialized like bytearray(), and bytearray() says this about its argument:
If it is an integer, the array will have that size and will be initialized with null bytes.
So bytes(0) gives you an empty bytes object, and bytes(1) gives you a single byte with the ordinal zero. That's why bytes_string is about half the size of array and is made up completely of zero bytes.
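The behaviour is easy to confirm in an interpreter. The sketch below uses plain Python ints rather than NumPy scalars, which may be handled differently depending on the NumPy version.

```python
print(bytes(0))        # b''
print(bytes(1))        # b'\x00'
print(bytes(3))        # b'\x00\x00\x00'

# To get one byte holding a given value, pass an iterable of ints instead.
print(bytes([1]))      # b'\x01'
print(bytes([0, 1]))   # b'\x00\x01'
```

Even with bytes([x]), each 0 or 1 still occupies a full byte, so the file would be 200 bytes rather than 25.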
As for why the bin() example didn't work, it looks like a simple case of copy-pasting and forgetting to change bytes_string to bin_string in the for loop.
This all still doesn't accomplish your goal of treating 0 or 1 value integers as bits. Python doesn't really have that sort of functionality built in. There are third-party modules that allow you to work at the bit level, but I can't speak to any of them specifically. Personally I would probably just roll my own specific to the application.
It looks like you're trying to bit shift all the values into a single byte. For example, you expect the integer values [0,1,0,1,0,1,0,1] to be packed into a byte that looks like the following binary number: 0b01010101. To do this, you need the bitwise shift and bitwise OR operators, along with the struct module to pack each result into an unsigned char that represents eight of your int values.
The code below takes the array of random integers in range [0,1] and shifts them together to make a binary number that can be packed into a single byte. I used 256 ints for convenience. The expected number of bytes for the file to be is then 32 (256/8). You will see that when it is run this is indeed what you get.
import struct
import numpy as np
import os
a = np.random.randint(0, 2, size = 256)
bool_data = []
bin_vals = []
for i in range(0, len(a), 8):
    # First value goes into the highest bit, so [0,1,0,1,0,1,0,1] -> 0b01010101
    bin_val = (a[i] << 7) | (a[i+1] << 6) | \
              (a[i+2] << 5) | (a[i+3] << 4) | \
              (a[i+4] << 3) | (a[i+5] << 2) | \
              (a[i+6] << 1) | (a[i+7] << 0)
    bin_vals.append(struct.pack('B', int(bin_val)))
with open("output.txt", 'wb') as f:
    for val in bin_vals:
        f.write(val)
print(os.path.getsize('output.txt'))
Please note, however, that this will only work for values of integers in the range [0,1] since if they are bigger it will shift more non-zeros and wreck the structure of the generated byte. The binary number may also exceed 1 byte in size in this case.
It seems like you're just using Python in an attempt to generate an array of bits for demonstration purposes, and to that end I would say that Python probably isn't best suited for this. I would recommend a lower-level language such as C/C++, which gives more direct access to data types than Python does.
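That said, the packing can also be done in pure Python without NumPy or struct. The following is a minimal sketch, not a tuned implementation: pack_bits and unpack_bits are hypothetical helper names, the first bit is placed in the highest position of each byte, and a short final group is zero-padded on the right.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 ints into bytes, first bit in the highest position."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            if bit not in (0, 1):
                raise ValueError("bits must be 0 or 1")
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)        # zero-pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


def unpack_bits(data, count):
    """Inverse of pack_bits; count is the number of bits originally packed."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return bits[:count]


packed = pack_bits([0, 1, 0, 1, 0, 1, 0, 1])
print(packed)                      # b'U' (0b01010101 == 0x55)
print(len(pack_bits([1] * 200)))   # 25
```

Because padding bits are indistinguishable from real zeros, the original bit count has to be stored or known separately in order to read the data back.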
I have this long hex string 20D788028A4B59FB3C07050E2F30 In python 2.7 I want to extract the first 4 bytes, change their order, convert it to a signed number, divide it by 2^20 and then print it out. In C this would be very easy for me :) but here I'm a little stuck.
For example the correct answer would extract the 4 byte number from the string above as 0x288D720. Then divided by 2^20 would be 40.5525. Mainly I'm having trouble figuring out the right way to do byte manipulation in python. In C I would just grab pointers to each byte and shift them where I wanted them to go and cast into an int or a long.
Python is great with strings, so let's use what we have:
s = "20D788028A4B59FB3C07050E2F30"
t = "".join([s[i-2:i] for i in range(8,0,-2)])
print int(t, 16) * 1.0 / pow(2,20)
But dividing by 2**20 looks a bit odd when working with bits, so shifting is at least worth a mention too (note that it drops the fractional part):
print int(t, 16) >> 20
In the end, I would write:
print int(t, 16) * 1.0 / (1 << 20)
For an extraction you can just do
foo[:8]
Hex to bytes: hexadecimal string to byte array in python
Rearrange bytes: byte reverse AB CD to CD AB with python
You can use struct for conversion to long
And just do a normal division by (2**20)
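Put together, those steps might look like the sketch below. It is written for Python 3 (bytes.fromhex and true division are assumptions that do not hold on the 2.7 interpreter mentioned in the question) and treats the first four bytes as a little-endian signed 32-bit integer.

```python
import struct

s = "20D788028A4B59FB3C07050E2F30"

raw = bytes.fromhex(s)[:4]              # first 4 bytes: 20 D7 88 02
(value,) = struct.unpack("<i", raw)     # little-endian signed 32-bit
print(hex(value))                       # 0x288d720
print(value / 2**20)                    # roughly 40.5525

# Equivalent without struct:
assert value == int.from_bytes(raw, "little", signed=True)
```

The "<" prefix does the byte reversal, so no manual rearranging of the string is needed.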
I read a binary file and get an array with characters. When converting two bytes to an integer I do 256*ord(p1) + ord(p0). It works fine for positive integers but when I get a negative number it doesn't work. I know there is something with the first bit in the most significant byte but with no success.
I also understand there is something called struct and after reading I ended up with the following code
import struct
p1 = chr(231)
p0 = chr(174)
a = struct.unpack('h',p0+p1)
print str(a)
a becomes -6226 and if I swap p0 and p1 I get -20761.
a should have been -2
-2 is not correct for the values you have specified, and byte order matters. struct uses > for big-endian (most-significant byte first) and < for little-endian (least-significant byte first):
>>> import struct
>>> struct.pack('>h',-2)
'\xff\xfe'
>>> struct.pack('<h',-2)
'\xfe\xff'
>>> p1=chr(254) # 0xFE
>>> p0=chr(255) # 0xFF
>>> struct.unpack('<h',p1+p0)[0]
-2
>>> struct.unpack('>h',p0+p1)[0]
-2
Generally, when using struct, your format string should start with one of the byte-order/alignment specifiers. The default, native one differs from machine to machine.
Therefore, the correct result is
>>> struct.unpack('!h',p0+p1)[0]
-20761
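For comparison, here is a small Python 3 sketch of the same idea, where the data is a bytes object rather than a str. It uses the two byte values from the question and shows how the prefix and the signedness of the format code change the result.

```python
import struct

data = bytes([174, 231])                  # 0xAE, 0xE7 as in the question

print(struct.unpack("<h", data)[0])       # -6226  (little-endian)
print(struct.unpack(">h", data)[0])       # -20761 (big-endian)
print(struct.unpack("!h", data)[0])       # -20761 (network order == big-endian)
print(struct.unpack(">H", data)[0])       # 44775  (same bytes, unsigned)
```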
The representation of -2 in big endian is:
1111 1111 1111 1110 # binary
255 254 # decimal bytes
f f f e # hexadecimal bytes
You can easily verify that by adding two, which results in 0.
With the first method (256*ord(p1) + ord(p0)), you could check whether the sign bit is set with if ord(p1) & 0x80. If it is, subtract 65536 from the end result. (Clearing the bit with & 0x7f and negating would decode sign-magnitude, not two's complement.)
For the record, you can do it without struct. Your original equation can be used, but if the result is greater than 32767, subtract 65536. (Or if the high-order byte is greater than 127, which is the same thing.) Look up two's complement, which is how all modern computers represent negative integers.
p1 = chr(231)
p0 = chr(174)
a = 256 * ord(p1) + ord(p0) - (65536 if ord(p1) > 127 else 0)
This gets you the correct answer of -6226. (The correct answer is not -2.)
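The same arithmetic can be wrapped in a small helper. This is a sketch for Python 3, where byte values are already ints and ord() is not needed; to_signed16 is a made-up name, and the high byte is passed first.

```python
def to_signed16(high, low):
    """Combine two byte values (0..255) into a signed 16-bit integer."""
    value = (high << 8) | low
    if value & 0x8000:          # sign bit set
        value -= 0x10000        # wrap into the negative range
    return value


print(to_signed16(231, 174))    # -6226
print(to_signed16(0xFF, 0xFE))  # -2
print(to_signed16(0x12, 0x34))  # 4660
assert to_signed16(231, 174) == int.from_bytes(bytes([231, 174]), "big", signed=True)
```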
If you are converting values from a file that is large, use the array module.
For a file, it is the endianness of the file format that matters, not the endianness of the machine that wrote it or is reading it.
Alex Martelli, of course, has the definitive answer.
Your original equation will work fine if you use masking to take off the extra 1 bits in a negative number:
256*(ord(p0) & 0xff) + (ord(p1) & 0xff)
Edit: I think I might have misunderstood your question. You're trying to convert two positive byte values into a negative integer? This should work:
a = 256*ord(p0) + ord(p1)
if a > 32767: # 0x7fff
    a -= 65536 # 0x10000