Convert binary data to signed integer - python

I read a binary file and get an array of characters. To convert two bytes to an integer I compute 256*ord(p1) + ord(p0). That works for positive integers but not for negative ones. I know it has something to do with the first bit of the most significant byte, but I have had no success handling it.
I also understand there is something called struct and after reading I ended up with the following code
import struct
p1 = chr(231)
p0 = chr(174)
a = struct.unpack('h',p0+p1)
print str(a)
a becomes -6226 and if I swap p0 and p1 I get -20761.
a should have been -2

-2 is not correct for the values you have specified, and byte order matters. struct uses > for big-endian (most-significant byte first) and < for little-endian (least-significant byte first):
>>> import struct
>>> struct.pack('>h',-2)
'\xff\xfe'
>>> struct.pack('<h',-2)
'\xfe\xff'
>>> p1=chr(254) # 0xFE
>>> p0=chr(255) # 0xFF
>>> struct.unpack('<h',p1+p0)[0]
-2
>>> struct.unpack('>h',p0+p1)[0]
-2
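For readers on Python 3, the session above might look roughly like the sketch below. It assumes the data is already a bytes object (as it would be when read from a file opened in binary mode), so chr() and ord() are not needed; the variable names are only illustrative.

```python
import struct

# Python 3 sketch: file data arrives as bytes, so no chr()/ord() is needed.
data = bytes([0xFF, 0xFE])  # the two bytes of -2, most-significant byte first

big = struct.unpack('>h', data)[0]           # read as big-endian
little = struct.unpack('<h', data[::-1])[0]  # same bytes reversed, read as little-endian
via_int = int.from_bytes(data, byteorder='big', signed=True)  # no struct needed

print(big, little, via_int)
```

All three expressions decode the same signed 16-bit value; int.from_bytes is a built-in alternative when importing struct is not wanted.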

Generally, when using struct, your format string should start with one of the byte order and alignment specifiers (@, =, <, > or !). The default, native one differs from machine to machine.
Therefore, the correct result is
>>> struct.unpack('!h',p0+p1)[0]
-20761
The representation of -2 in big endian is:
1111 1111 1111 1110 # binary
255 254 # decimal bytes
f f f e # hexadecimal bytes
You can easily verify that by adding two, which results in 0.

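With the first method (256*ord(p1) + ord(p0)), you could check whether the first bit is 1 with if ord(p1) & 0x80 > 0. If so, you'd use ord(p1) & 0x7f instead of ord(p1) and then negate the end result. Note that this decodes a sign-magnitude value; for two's complement data, subtract 65536 from the unmasked result instead.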

For the record, you can do it without struct. Your original equation can be used, but if the result is greater than 32767, subtract 65536. (Or if the high-order byte is greater than 127, which is the same thing.) Look up two's complement, which is how all modern computers represent negative integers.
p1 = chr(231)
p0 = chr(174)
a = 256 * ord(p1) + ord(p0) - (65536 if ord(p1) > 127 else 0)
This gets you the correct answer of -6226. (The correct answer is not -2.)
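The subtract-65536 rule can be wrapped in a small helper. This is a minimal sketch for Python 3; the name to_signed16 is made up for illustration, and it takes plain integer byte values rather than characters.

```python
def to_signed16(high, low):
    """Combine two byte values (0-255) into a signed 16-bit integer."""
    value = (high << 8) | low   # same as 256*high + low
    if value >= 0x8000:         # sign bit set: the value is negative
        value -= 0x10000        # subtract 65536 (two's complement)
    return value

result = to_signed16(231, 174)
print(result)
```

With the bytes from the question (231 and 174) this gives the same value as the one-line expression above.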

If you are converting values from a large file, use the array module.
For a file, it is the endianness of the file format that matters, not the endianness of the machine that wrote it or is reading it.
Alex Martelli, of course, has the definitive answer.
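A rough sketch of the array approach, for Python 3. It assumes the data is little-endian and that the 'h' type code is 2 bytes wide on the platform (true on mainstream platforms, but not guaranteed by the language); read_int16_le is a hypothetical helper name.

```python
import sys
from array import array

def read_int16_le(raw):
    """Decode a bytes object holding little-endian signed 16-bit values."""
    values = array('h')            # 'h' = signed short, assumed 2 bytes wide
    values.frombytes(raw)          # interprets the bytes in native byte order
    if sys.byteorder != 'little':  # data is little-endian, machine is not
        values.byteswap()
    return values.tolist()

print(read_int16_le(b'\xfe\xff\xae\xe7'))
```

For data coming straight from an open file, array.fromfile(f, n) reads n items without an intermediate bytes object.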

Your original equation will work fine if you use masking to take off the extra 1 bits in a negative number:
256*(ord(p0) & 0xff) + (ord(p1) & 0xff)
Edit: I think I might have misunderstood your question. You're trying to convert two positive byte values into a negative integer? This should work:
a = 256*ord(p0) + ord(p1)
if a > 32767: # 0x7fff
    a -= 65536 # 0x10000

Related

In Python - how to retrieve the binary code of negative integer from the memory? [duplicate]

Integers in Python are stored in two's complement, correct?
Although:
>>> x = 5
>>> bin(x)
0b101
And:
>>> x = -5
>>> bin(x)
-0b101
That's pretty lame. How do I get Python to give me the numbers in REAL binary bits, and without the 0b in front of it? So:
>>> x = 5
>>> bin(x)
0101
>>> y = -5
>>> bin(y)
1011
It works best if you provide a mask. That way you specify how far to sign extend.
>>> bin(-27 & 0b1111111111111111)
'0b1111111111100101'
Or perhaps more generally:
def bindigits(n, bits):
    s = bin(n & int("1"*bits, 2))[2:]
    return ("{0:0>%s}" % (bits)).format(s)
>>> print bindigits(-31337, 24)
111111111000010110010111
In basic theory, the actual width of the number is a function of the size of the storage. If it's a 32-bit number, then a negative number has a 1 in the MSB of a set of 32. If it's a 64-bit value, then there are 64 bits to display.
But in Python, integer precision is limited only to the constraints of your hardware. On my computer, this actually works, but it consumes 9GB of RAM just to store the value of x. Anything higher and I get a MemoryError. If I had more RAM, I could store larger numbers.
>>> x = 1 << (1 << 36)
So with that in mind, what binary number represents -1? Python is well-capable of interpreting literally millions (and even billions) of bits of precision, as the previous example shows. In 2's complement, the sign bit extends all the way to the left, but in Python there is no pre-defined number of bits; there are as many as you need.
But then you run into ambiguity: does binary 1 represent 1, or -1? Well, it could be either. Does 111 represent 7 or -1? Again, it could be either. So does 111111111 represent 511, or -1... well, both, depending on your precision.
Python needs a way to represent these numbers in binary so that there's no ambiguity of their meaning. The 0b prefix just says "this number is in binary". Just like 0x means "this number is in hex". So if I say 0b1111, how do I know if the user wants -1 or 15? There are two options:
Option A: The sign bit
You could declare that all numbers are signed, and the left-most bit is the sign bit. That means 0b1 is -1, while 0b01 is 1. That also means that 0b111 is also -1, while 0b0111 is 7. In the end, this is probably more confusing than helpful particularly because most binary arithmetic is going to be unsigned anyway, and people are more likely to run into mistakes by accidentally marking a number as negative because they didn't include an explicit sign bit.
Option B: The sign indication
With this option, binary numbers are represented unsigned, and negative numbers have a "-" prefix, just like they do in decimal. This is (a) more consistent with decimal, (b) more compatible with the way binary values are most likely going to be used. You lose the ability to specify a negative number using its two's complement representation, but remember that two's complement is a storage implementation detail, not a proper indication of the underlying value itself. It shouldn't have to be something that the user has to understand.
In the end, Option B makes the most sense. There's less confusion and the user isn't required to understand the storage details.
To properly interpret a binary sequence as two's complement, there needs to be a length associated with the sequence. When you are working with low-level types that correspond directly to CPU registers, there is an implicit length. Since Python integers can have an arbitrary length, there really isn't an internal two's complement format. Since there isn't a length associated with a number, there is no way to distinguish between positive and negative numbers. To remove the ambiguity, bin() includes a minus sign when formatting a negative number.
Python's arbitrary length integer type actually uses a sign-magnitude internal format. The logical operations (bit shifting, and, or, etc.) are designed to mimic two's complement format. This is typical of multiple precision libraries.
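The point about needing a length can be made concrete with a round trip. The sketch below (Python 3) uses two hypothetical helpers, to_bits and from_bits; the width is an explicit argument in one direction and is taken from the length of the string in the other.

```python
def to_bits(value, width):
    """Two's-complement bit string of `value`, exactly `width` digits long."""
    return format(value & ((1 << width) - 1), '0{}b'.format(width))

def from_bits(bits):
    """Interpret a bit string as a two's-complement number of len(bits) bits."""
    value = int(bits, 2)
    if bits[0] == '1':           # leading bit set: negative in this width
        value -= 1 << len(bits)
    return value

print(to_bits(-5, 4), from_bits('111'), from_bits('0111'))
```

The same digits decode differently depending on how many of them there are, which is the ambiguity described above.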
Here is a slightly more readable version of tylerl's answer. For example, say you want -2 in its 8-bit two's complement representation:
bin(-2 & (2**8-1))
2**8 is the value of the ninth bit (256); subtract 1 from it and all the preceding bits are set to one (255).
For 8- and 16-bit masks, you can replace (2**8-1) with 0xff or 0xffff. The hexadecimal version becomes less readable after that point.
If this is unclear, here is a regular function of it:
def twosComplement(value, bitLength):
    return bin(value & (2**bitLength - 1))
The two's complement of a negative number is the modulus (2 to the power of the bit length) minus the number's absolute value.
So I think the shortest way to get the complement of -27 is
bin((1<<32) - 27) # 32-bit length: '0b11111111111111111111111111100101'
bin((1<<16) - 27) # 16-bit length: '0b1111111111100101'
bin((1<<8) - 27) # 8-bit length: '0b11100101'
Not sure how to get what you want using the standard lib. There are a handful of scripts and packages out there that will do the conversion for you.
I just wanted to note the "why" , and why it's not lame.
bin() doesn't return binary bits. it converts the number to a binary string. the leading '0b' tells the interpreter that you're dealing with a binary number , as per the python language definition. this way you can directly work with binary numbers, like this
>>> 0b01
1
>>> 0b10
2
>>> 0b11
3
>>> 0b01 + 0b10
3
that's not lame. that's great.
http://docs.python.org/library/functions.html#bin
bin(x)
Convert an integer number to a binary string.
http://docs.python.org/reference/lexical_analysis.html#integers
Integer and long integer literals are described by the following lexical definitions:
bininteger ::= "0" ("b" | "B") bindigit+
bindigit ::= "0" | "1"
Use slices to get rid of unwanted '0b'.
bin(5)[2:]
'101'
or if you want digits,
tuple ( bin(5)[2:] )
('1', '0', '1')
or even
map( int, tuple( bin(5)[2:] ) )
[1, 0, 1]
tobin = lambda x, count=8: "".join(map(lambda y:str((x>>y)&1), range(count-1, -1, -1)))
e.g.
tobin(5) # => '00000101'
tobin(5, 4) # => '0101'
tobin(-5, 4) # => '1011'
Or as clear functions:
# Returns bit y of x (10 base). i.e.
# bit 2 of 5 is 1
# bit 1 of 5 is 0
# bit 0 of 5 is 1
def getBit(y, x):
    return str((x>>y)&1)

# Returns the first `count` bits of base 10 integer `x`
def tobin(x, count=8):
    shift = range(count-1, -1, -1)
    bits = map(lambda y: getBit(y, x), shift)
    return "".join(bits)
(Adapted from W.J. Van de Laan's comment)
I'm not entirely certain what you ultimately want to do, but you might want to look at the bitarray package.
def tobin(data, width):
    data_str = bin(data & (2**width-1))[2:].zfill(width)
    return data_str
You can use the Binary fractions package. This package implements TwosComplement with binary integers and binary fractions. You can convert binary-fraction strings into their twos complement and vice-versa
Example:
>>> from binary_fractions import TwosComplement
>>> TwosComplement.to_float("11111111111") # TwosComplement --> float
-1.0
>>> TwosComplement.to_float("11111111100") # TwosComplement --> float
-4.0
>>> TwosComplement(-1.5) # float --> TwosComplement
'10.1'
>>> TwosComplement(1.5) # float --> TwosComplement
'01.1'
>>> TwosComplement(5) # int --> TwosComplement
'0101'
To use this with Binary's instead of float's you can use the Binary class inside the same package.
PS: Shameless plug, I'm the author of this package.
For positive numbers, just use:
bin(x)[2:].zfill(4)
For negative numbers, it's a little different:
bin((eval("0b"+str(int(bin(x)[3:].zfill(4).replace("0","2").replace("1","0").replace("2","1"))))+eval("0b1")))[2:].zfill(4)
As a whole script, this is how it should look:
def binary(number):
    if number < 0:
        return bin((eval("0b"+str(int(bin(number)[3:].zfill(4).replace("0","2").replace("1","0").replace("2","1"))))+eval("0b1")))[2:].zfill(4)
    return bin(number)[2:].zfill(4)

x=input()
print binary(x)
A modification on tylerl's very helpful answer that provides sign extension for positive numbers as well as negative (no error checking).
def to2sCompStr(num, bitWidth):
    num &= (2 << bitWidth-1)-1 # mask
    formatStr = '{:0'+str(bitWidth)+'b}'
    ret = formatStr.format(int(num))
    return ret
Example:
In [11]: to2sCompStr(-24, 18)
Out[11]: '111111111111101000'
In [12]: to2sCompStr(24, 18)
Out[12]: '000000000000011000'
No need, it already is. It is just python choosing to represent it differently. If you start printing each nibble separately, it will show its true colours.
checkNIB = '{0:04b}'.format
checkBYT = lambda x: '-'.join( map( checkNIB, [ (x>>4)&0xf, x&0xf] ) )
checkBTS = lambda x: '-'.join( [ checkBYT( ( x>>(shift*8) )&0xff ) for shift in reversed( range(4) ) if ( x>>(shift*8) )&0xff ] )
print( checkBTS(-0x0002) )
Output is simple:
>>>1111-1111-1111-1111-1111-1111-1111-1110
Now it reverts to original representation when you want to display a twos complement of an nibble but it is still possible if you divide it into halves of nibble and so. Just have in mind that the best result is with negative hex and binary integer interpretations simple numbers not so much, also with hex you can set up the byte size.
We can leverage the property of bit-wise XOR. Use bit-wise XOR to flip the bits and then add 1. Then you can use the python inbuilt bin() function to get the binary representation of the 2's complement. Here's an example function:
def twos_complement(input_number):
    print(bin(input_number)) # prints binary value of input
    mask = 2**(1 + len(bin(input_number)[2:])) - 1 # Calculate mask to do bitwise XOR operation
    twos_comp = (input_number ^ mask) + 1 # calculate 2's complement, for negative of input_number (-1 * input_number)
    print(bin(twos_comp)) # print 2's complement representation of negative of input_number.
I hope this solves your problem:
num = input("Enter number : ")
bin_num = bin(num)
binary = '0' + bin_num[2:]
print binary

Extract 12-bit integer from 2 byte big endian (motorola) bytearray

I am trying to extract an integer which occupies up to 12 bits in a 2 byte (16 bit) message, which is in big-endian format. I have done some research already and expect that I will have to use bit_manipulation (bit shifting) to achieve this, but I am unsure how this can be applied to big-endian format.
A couple of answers on here used the python 'Numpy' package, but I don't have access to that on Micropython. I do have access to the 'ustruct' module, which I use to unpack certain other parts of the message, but it only seems to apply to 8 bit, 16bit and 32bit messages.
So far the only thing I have come up with is:
int12 = (byte1 << 4) + (byte2)
expected_value = int.from_bytes(int12)
but this isn't giving me the numbers I am expecting. For example, 0x02, 0x15 should give decimal 533.
Where am I going wrong?
I'm new to bit manipulation and extracting data from bytes so any help is greatly appreciated, Thanks!
This should work:
import struct
val, = struct.unpack( '!h', b'23' )
val = (val >> 4) & 0xFFF
gives:
>>> hex(val)
'0x323'
However, you should check which 12 of the 16 bits are occupied. The code above assumes they are the upper 3 nibbles. If the number occupies the lower 3 nibbles, you don't need any shift, just the mask with 0xFFF.
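Both layouts can also be handled with plain shifts and masks, with no struct or ustruct at all. This is a sketch only: extract_12bit and its left_aligned flag are made-up names, and which layout applies depends on the message format. The example in the question (0x02, 0x15 giving 533) corresponds to the lower 12 bits.

```python
def extract_12bit(msg, left_aligned=False):
    """Return the 12-bit field held in a 2-byte big-endian message."""
    word = (msg[0] << 8) | msg[1]  # big-endian: the first byte is the high byte
    if left_aligned:
        word >>= 4                 # field sits in the upper three nibbles
    return word & 0xFFF            # keep 12 bits

print(extract_12bit(bytes([0x02, 0x15])))
```

Note that the high byte is shifted by 8, not by 4; shifting by 4 makes the two bytes overlap, which is one reason the attempt in the question gives unexpected numbers.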

STL binary file reader with Python

I'm trying to write my "personal" python version of STL binary file reader, according to WIKIPEDIA : A binary STL file contains :
an 80-character (byte) header, which is generally ignored.
a 4-byte unsigned integer indicating the number of triangular facets in the file.
Each triangle is described by twelve 32-bit floating-point numbers: three for the normal and then three for the X/Y/Z coordinate of each vertex – just as with the ASCII version of STL. After these follows a 2-byte ("short") unsigned integer that is the "attribute byte count" – in the standard format, this should be zero because most software does not understand anything else. --Floating-point numbers are represented as IEEE floating-point numbers and are assumed to be little-endian--
Here is my code :
#! /usr/bin/env python3
with open("stlbinaryfile.stl","rb") as fichier :
    head=fichier.read(80)
    nbtriangles=fichier.read(4)
    print(nbtriangles)
The output is :
b'\x90\x08\x00\x00'
It represents an unsigned integer, and I need to convert it without using any package (struct, stl...). Are there any (basic) rules for doing it? I don't know what \x means. How does \x90 represent one byte?
Most of the answers on Google mention "C structs", but I don't know anything about C.
Thank you for your time.
Since you're using Python 3, you can use int.from_bytes. I'm guessing the value is stored little-endian, so you'd just do:
nbtriangles = int.from_bytes(fichier.read(4), 'little')
Change the second argument to 'big' if it's supposed to be big-endian.
Mind you, the normal way to parse a fixed width type is the struct module, but apparently you've ruled that out.
For the confusion over the repr, bytes objects will display ASCII printable characters (e.g. a) or standard ASCII escapes (e.g. \t) if the byte value corresponds to one of them. If it doesn't, it uses \x##, where ## is the hexadecimal representation of the byte value, so \x90 represents the byte with value 0x90, or 144. You need to combine the byte values at offsets to reconstruct the int, but int.from_bytes does this for you faster than any hand-rolled solution could.
Update: Since apparently int.from_bytes isn't "basic" enough, here are a couple of more complex solutions that use only top-level built-ins (no alternate constructors). For little-endian, you can do this:
def int_from_bytes(inbytes):
    res = 0
    for i, b in enumerate(inbytes):
        res |= b << (i * 8) # Adjust each byte individually by 8 times position
    return res
You can use the same solution for big-endian by adding reversed to the loop, making it enumerate(reversed(inbytes)), or you can use this alternative solution that handles the offset adjustment a different way:
def int_from_bytes(inbytes):
    res = 0
    for b in inbytes:
        res <<= 8 # Adjust bytes seen so far to make room for new byte
        res |= b # Mask in new byte
    return res
Again, this big-endian solution can trivially work for little-endian by looping over reversed(inbytes) instead of inbytes. In both cases inbytes[::-1] is an alternative to reversed(inbytes) (the former makes a new bytes in reversed order and iterates that, the latter iterates the existing bytes object in reverse, but unless it's a huge bytes object, enough to strain RAM if you copy it, the difference is pretty minimal).
The typical way to interpret an integer is to use struct.unpack, like so:
import struct
with open("stlbinaryfile.stl","rb") as fichier :
    head=fichier.read(80)
    nbtriangles=fichier.read(4)
    print(nbtriangles)
    nbtriangles=struct.unpack("<I", nbtriangles)
    print(nbtriangles)
If you are allergic to import struct, then you can also compute it by hand:
def unsigned_int(s):
    result = 0
    for ch in s[::-1]:
        result *= 256
        result += ch
    return result
...
nbtriangles = unsigned_int(nbtriangles)
As to what you are seeing when you print b'\x90\x08\x00\x00': you are printing a bytes object, which is an array of integers in the range [0-255]. The first integer has the value 144 (decimal) or 90 (hexadecimal). When printing a bytes object, that value is represented by the string \x90. The second has the value eight, represented by \x08. The third and fourth integers are both zero; they are represented by \x00.
If you would like to see a more familiar representation of the integers, try:
print(list(nbtriangles))
[144, 8, 0, 0]
To compute the 32-bit integers represented by these four 8-bit integers, you can use this formula:
total = byte0 + (byte1*256) + (byte2*256*256) + (byte3*256*256*256)
Or, in hex:
total = byte0 + (byte1*0x100) + (byte2*0x10000) + (byte3*0x1000000)
Which results in:
0x00000890
Perhaps you can see the similarities to decimal, where the string "1234" represents the number:
4 + 3*10 + 2*100 + 1*1000
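Putting the pieces together, a reader for the layout quoted in the question might look like the sketch below. It is an illustration under assumptions, not a full STL parser: read_stl_triangles is a hypothetical name, struct is used for the floating-point fields (which the asker wanted to avoid for the integer), and the sample data is synthetic, built in memory so the example is self-contained.

```python
import io
import struct

def read_stl_triangles(stream):
    """Yield (normal, v1, v2, v3, attribute) tuples from a binary STL stream."""
    stream.read(80)                                   # header, ignored
    count = int.from_bytes(stream.read(4), 'little')  # number of triangles
    for _ in range(count):
        # 12 little-endian 32-bit floats plus one 16-bit unsigned int = 50 bytes
        record = struct.unpack('<12fH', stream.read(50))
        floats = record[:12]
        yield floats[0:3], floats[3:6], floats[6:9], floats[9:12], record[12]

# Synthetic one-triangle file, built in memory for the demonstration.
sample = (b'\x00' * 80
          + (1).to_bytes(4, 'little')
          + struct.pack('<12fH',
                        0.0, 0.0, 1.0,   # normal
                        0.0, 0.0, 0.0,   # vertex 1
                        1.0, 0.0, 0.0,   # vertex 2
                        0.0, 1.0, 0.0,   # vertex 3
                        0))              # attribute byte count
triangles = list(read_stl_triangles(io.BytesIO(sample)))
print(triangles)
```

The same function should accept a real file opened with open(path, "rb") in place of the BytesIO object.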

how to re-order bytes in a python hex string and convert to long

I have this long hex string 20D788028A4B59FB3C07050E2F30 In python 2.7 I want to extract the first 4 bytes, change their order, convert it to a signed number, divide it by 2^20 and then print it out. In C this would be very easy for me :) but here I'm a little stuck.
For example the correct answer would extract the 4 byte number from the string above as 0x288D720. Then divided by 2^20 would be 40.5525. Mainly I'm having trouble figuring out the right way to do byte manipulation in python. In C I would just grab pointers to each byte and shift them where I wanted them to go and cast into an int or a long.
Python is great with strings, so let's use what we have:
s = "20D788028A4B59FB3C07050E2F30"
t = "".join([s[i-2:i] for i in range(8,0,-2)])
print int(t, 16) * 1.0 / pow(2,20)
But dividing by 2**20 invites thinking in bits, so shifting is at least worth a mention too (note that it drops the fractional part)...
print int(t, 16) >> 20
In the end, I would write
print int(t, 16) * 1.0 / (1 << 20)
For an extraction you can just do
foo[:8]
Hex to bytes: hexadecimal string to byte array in python
Rearrange bytes: byte reverse AB CD to CD AB with python
You can use struct for conversion to long
And just do a normal division by (2**20)
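The steps listed above can be sketched as follows. Note the assumption: this is written for Python 3, where bytes.fromhex is available, while the question targets Python 2.7 (there, s[:8].decode('hex') plays the same role).

```python
import struct

s = "20D788028A4B59FB3C07050E2F30"

raw = bytes.fromhex(s[:8])           # first 4 bytes: 20 D7 88 02
value = struct.unpack('<i', raw)[0]  # '<i' = little-endian signed 32-bit,
                                     # which reverses the byte order for us
scaled = value / 2**20               # true division keeps the fraction
print(hex(value), round(scaled, 4))
```

Reading the bytes as little-endian performs the reordering, so no separate string shuffling is needed, and the 'i' format code takes care of the sign.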

The best way to base64 encode the last 6k bits of a python integer

I illustrate a case for where k = 2 (so, the bottom 12 digits)
import base64
# Hi
# 7 - 34
# 000111 - 100010
# 0001 - 1110 - 0010 = 0x1E2 = 482
# 1
integer = int(bin(482)[-12:] + '0' * 20, 2)
encoded = base64.b64encode(base64.b16decode('{0:08X}'.format(integer)))
print encoded
# 2
encoded = base64.b64encode(base64.b16decode('{0:08X}'.format(482 << 20)))
print encoded
Both output HiAAAA== as desired
An ideone link for your convenience: http://ideone.com/O73kQs
Intuitively these are very clear, and I'm favoring #2 by quite a bit.
One thing that "irks" me about #1, is that if the integers in python are not 32 bits, then I'm in trouble.
How can I get the proper size of an int? (total python newbie question?) (edit: yes, apparently a newbie-ish question How can i determine the exact size of a type used by python)
It would be nice, however, if there was a way to simply do something like
encoded = base64.b64encode('{0:08X}'.format(482 << 20))
Moreover, how can I go from
bin(1)
which equals
'0b1'
to the actual binary literal
0b1
you can go from bin back with int , which takes an optional 2nd parameter that is base
int(bin(18)[2:],2)
since you use this earlier you must know about it ... so I only assume you mean something else by binary literal than its integer representation ... although for the life of me im not sure what that is...
you can do
print 0b1
and see that the actual repr is the decimal value ...
to get the last 12 bits of an int
my_int = 482
k=2
mask = int("1"*(6*k),2)
last_bits = my_int & mask
then you can just shift it 20 or whatever ...
first get the last 12 bits as demonstrated above
import struct
print struct.pack('H',last_bits)
print struct.pack('H',0b100001)
alternatively you could
def get_chars(int_val):
    while int_val > 0:
        yield chr(int_val & 0xFF)
        int_val >>= 8
print repr("".join(get_chars(last_bits)))
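In Python 3 the masking and encoding steps can be combined using int.to_bytes. The sketch below is one possible approach, not the only one; encode_low_sextets is a hypothetical name. It left-aligns the last 6*k bits in as few bytes as will hold them, so the first k base64 characters are exactly the k sextets.

```python
import base64

def encode_low_sextets(n, k):
    """Base64-encode the lowest 6*k bits of n and return the k characters."""
    nbits = 6 * k
    value = n & ((1 << nbits) - 1)   # keep only the last 6*k bits
    nbytes = (nbits + 7) // 8        # bytes needed to hold them
    value <<= nbytes * 8 - nbits     # left-align within those bytes
    raw = value.to_bytes(nbytes, 'big')
    return base64.b64encode(raw)[:k].decode('ascii')

print(encode_low_sextets(482, 2))
```

This does not depend on any assumed integer width; to_bytes takes the byte count explicitly. The 32-bit form from the question can be reproduced with base64.b64encode((482 << 20).to_bytes(4, 'big')).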
