How to convert signed string to its Binary equivalent in Python? - python

I am using an itertools function to fill a list with values. The itertools function is taking each value as a str, not as an int. After that, I need to convert the values in the list to their binary equivalents. The problem arises when I need to convert a negative value, e.g. -5. My code treats the "-" as a separate character, but I need it to be treated as a negative sign applied to the numerical value that follows. Does the concept of unsigned integers come into play?
My code is-
L3 = list(itertools.repeat("-1", 5))
file = open(filename, 'w')
L3_1 = []
for item in L3:
    x3 = bytes(item, "ascii")
    L3_1.append(' '.join(["{0:b}".format(x).zfill(8) for x in x3]))
for item in L3_1:
    file.write("%s\n" % item)
file.close()

It's not entirely clear what your problem is and what you want to achieve, so please correct me if I make wrong assumptions.
Anyways, converting integers to binary representation is easily done using bin. For example, bin(5) gives you '0b101'. Now, bin(-5) gives you '-0b101' - which is not what you expect when being used to binary from other languages, e.g. C.
The "problem" is that integers in python are not fixed size. There's no int16, int32, uint8 and such. Python will just add bits as it needs to. That means a negative number cannot be represented by its complement - 0b11111011 is not -5 as for int8, but 251. Since binaries are potentially infinite, there's no fixed position to place a sign bit. Thus, python has to add the explicit unary -. This is different from interpreting -5 as the strings "-" and "5".
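To make the point above concrete, here is a minimal sketch (the variable names are only illustrative). It shows that Python reads the bit pattern 0b11111011 as 251, and that a fixed width has to be imposed explicitly, for instance with a mask, before -5 maps to that pattern:

```python
# bin() on a negative int keeps an explicit minus sign
plain = bin(-5)            # '-0b101'

# The literal 0b11111011 is simply 251; there is no implicit 8-bit width
as_literal = 0b11111011

# Masking with 0xFF (eight 1 bits) imposes an 8-bit width by hand
masked = -5 & 0xFF

print(plain, as_literal, masked, bin(masked))
```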
If you want to get the binary representation for negative, fixed size integers, I think you have to do it by yourself. A function that does this could look like this:
def bin_int(number, size=8):
    max_val = int('0b' + ('1' * (size - 1)), 2)  # e.g. 0b1111111 (127) for size=8
    assert -max_val <= number <= max_val, 'Number out of range'
    if number >= 0:
        return bin(number)
    sign = int('0b1' + ('0' * size), 2)  # e.g. 0b100000000 (256) for size=8
    return bin(number + sign)
Now, to do what you initially wanted: write the binary representation of numbers to a file.
output_list = [1, -1, -5, -64, 0]  # iterable *containing* integers
with open(filename, 'w') as output_file:  # with statement is safer for writing
    for number in output_list:
        output_file.write(bin_int(number) + '\n')
Or if you just want to check the result:
print([bin_int(number) for number in [1, -1, -5, -64, -127]])
# ['0b1', '0b11111111', '0b11111011', '0b11000000', '0b10000001']
Note that if you want to strip the 0b, you can do that via bin_int(number)[2:], e.g. output_file.write(bin_int(number)[2:] + '\n'). This removes the first two characters from the string holding the binary representation.
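Since the values in the question start out as strings such as "-1", they can be passed through int() first, which already understands a leading minus sign. The helper below is a hypothetical sketch: the name str_to_fixed_bits and the default width of 8 are assumptions, not part of the answer above. It uses a mask instead of bin_int and always pads to the full width:

```python
import itertools

def str_to_fixed_bits(text, size=8):
    """Parse a decimal string such as "-5" and return its two's complement bits."""
    number = int(text)  # int() handles the leading "-" itself
    low, high = -(1 << (size - 1)), (1 << (size - 1)) - 1
    if not low <= number <= high:
        raise ValueError("%d does not fit in %d bits" % (number, size))
    # The mask keeps the lowest `size` bits; format() pads with zeros
    return format(number & ((1 << size) - 1), "0%db" % size)

values = list(itertools.repeat("-1", 5))
lines = [str_to_fixed_bits(v) for v in values]
print(lines)
```

Each entry of lines can then be written to the file exactly as in the loop above.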

Related

In Python - how to retrieve the binary code of negative integer from the memory? [duplicate]

Integers in Python are stored in two's complement, correct?
Although:
>>> x = 5
>>> bin(x)
'0b101'
And:
>>> x = -5
>>> bin(x)
'-0b101'
That's pretty lame. How do I get Python to give me the numbers in REAL binary bits, and without the 0b in front of it? So:
>>> x = 5
>>> bin(x)
0101
>>> y = -5
>>> bin(y)
1011
It works best if you provide a mask. That way you specify how far to sign extend.
>>> bin(-27 & 0b1111111111111111)
'0b1111111111100101'
Or perhaps more generally:
def bindigits(n, bits):
    s = bin(n & int("1"*bits, 2))[2:]
    return ("{0:0>%s}" % (bits)).format(s)
>>> print(bindigits(-31337, 24))
111111111000010110010111
In basic theory, the actual width of the number is a function of the size of the storage. If it's a 32-bit number, then a negative number has a 1 in the MSB of a set of 32. If it's a 64-bit value, then there are 64 bits to display.
But in Python, integer precision is limited only to the constraints of your hardware. On my computer, this actually works, but it consumes 9GB of RAM just to store the value of x. Anything higher and I get a MemoryError. If I had more RAM, I could store larger numbers.
>>> x = 1 << (1 << 36)
So with that in mind, what binary number represents -1? Python is well-capable of interpreting literally millions (and even billions) of bits of precision, as the previous example shows. In 2's complement, the sign bit extends all the way to the left, but in Python there is no pre-defined number of bits; there are as many as you need.
But then you run into ambiguity: does binary 1 represent 1, or -1? Well, it could be either. Does 111 represent 7 or -1? Again, it could be either. So does 111111111 represent 511, or -1... well, both, depending on your precision.
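The ambiguity can be demonstrated with a small sketch. The function name from_twos_complement is hypothetical; it assumes the length of the string is the intended width, which is exactly the piece of information a bare Python integer does not carry:

```python
def from_twos_complement(bit_string):
    """Interpret bit_string as a signed value whose width is len(bit_string)."""
    width = len(bit_string)
    value = int(bit_string, 2)      # unsigned reading
    if bit_string[0] == "1":        # leading 1 means negative at this width
        value -= 1 << width
    return value

for bits in ("1", "111", "0111", "111111111"):
    print(bits, int(bits, 2), from_twos_complement(bits))
```

The same digits give 7 when read as unsigned and -1 when read as a 3-bit signed value.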
Python needs a way to represent these numbers in binary so that there's no ambiguity of their meaning. The 0b prefix just says "this number is in binary". Just like 0x means "this number is in hex". So if I say 0b1111, how do I know if the user wants -1 or 15? There are two options:
Option A: The sign bit
You could declare that all numbers are signed, and the left-most bit is the sign bit. That means 0b1 is -1, while 0b01 is 1. That also means that 0b111 is also -1, while 0b0111 is 7. In the end, this is probably more confusing than helpful particularly because most binary arithmetic is going to be unsigned anyway, and people are more likely to run into mistakes by accidentally marking a number as negative because they didn't include an explicit sign bit.
Option B: The sign indication
With this option, binary numbers are represented unsigned, and negative numbers have a "-" prefix, just like they do in decimal. This is (a) more consistent with decimal, (b) more compatible with the way binary values are most likely going to be used. You lose the ability to specify a negative number using its two's complement representation, but remember that two's complement is a storage implementation detail, not a proper indication of the underlying value itself. It shouldn't have to be something that the user has to understand.
In the end, Option B makes the most sense. There's less confusion and the user isn't required to understand the storage details.
To properly interpret a binary sequence as two's complement, there needs to be a length associated with the sequence. When you are working with low-level types that correspond directly to CPU registers, there is an implicit length. Since Python integers can have an arbitrary length, there really isn't an internal two's complement format. Since there isn't a length associated with a number, there is no way to distinguish between positive and negative numbers. To remove the ambiguity, bin() includes a minus sign when formatting a negative number.
Python's arbitrary length integer type actually uses a sign-magnitude internal format. The logical operations (bit shifting, and, or, etc.) are designed to mimic two's complement format. This is typical of multiple precision libraries.
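A few one-liners illustrate how the operators behave as if a negative number had an endless run of leading 1 bits. This is only a sketch of the observable behaviour; the variable names are illustrative:

```python
# Right-shifting a negative number never reaches 0: the sign "extends forever"
shifted = -1 >> 100                    # still -1

# ~x behaves as if every one of infinitely many bits were flipped: ~x == -x - 1
inverted = ~5                          # -6

# AND with a mask cuts a finite window out of that infinite pattern
low_nibble = -5 & 0b1111               # 11, i.e. 0b1011

# bit_length() reports the magnitude only, ignoring the sign
magnitude_bits = (-5).bit_length()     # 3, same as (5).bit_length()

print(shifted, inverted, low_nibble, magnitude_bits)
```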
Here is a slightly more readable version of tylerl's answer. For example, say you want -2 in its 8-bit two's complement representation:
bin(-2 & (2**8-1))
2**8 is the value of the ninth bit (256); subtract 1 from it and all the preceding bits are set to one (255).
For 8- and 16-bit masks, you can replace (2**8-1) with 0xff or 0xffff. The hexadecimal version becomes less readable after that point.
If this is unclear, here is a regular function of it:
def twosComplement(value, bitLength):
    return bin(value & (2**bitLength - 1))
The two's complement of a negative number is the modulus (2**bits) minus the number's absolute value.
So I think the shortest way to get the two's complement of -27 is:
bin((1<<32) - 27)  # 32-bit length: '0b11111111111111111111111111100101'
bin((1<<16) - 27)  # 16-bit length: '0b1111111111100101'
bin((1<<8) - 27)  # 8-bit length: '0b11100101'
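The modulus view and the mask view give the same result, which a short sketch can check. The helper name twos_complement_value is an assumption made for illustration:

```python
def twos_complement_value(n, bits):
    """Return the unsigned integer whose `bits`-wide pattern encodes n."""
    mask = (1 << bits) - 1
    by_mask = n & mask              # keep the lowest `bits` bits
    by_modulo = n % (1 << bits)     # reduce modulo 2**bits
    assert by_mask == by_modulo     # both views agree for any int n
    return by_mask

print(bin(twos_complement_value(-27, 8)))
print(twos_complement_value(-27, 8) == (1 << 8) - 27)
```

Unlike the plain subtraction, the masked form also leaves non-negative inputs unchanged.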
Not sure how to get what you want using the standard lib. There are a handful of scripts and packages out there that will do the conversion for you.
I just wanted to note the "why" , and why it's not lame.
bin() doesn't return binary bits; it converts the number to a binary string. The leading '0b' tells the interpreter that you're dealing with a binary number, as per the Python language definition. This way you can work directly with binary literals, like this:
>>> 0b01
1
>>> 0b10
2
>>> 0b11
3
>>> 0b01 + 0b10
3
That's not lame. That's great.
http://docs.python.org/library/functions.html#bin
bin(x)
Convert an integer number to a binary string.
http://docs.python.org/reference/lexical_analysis.html#integers
Integer and long integer literals are described by the following lexical definitions:
bininteger ::= "0" ("b" | "B") bindigit+
bindigit ::= "0" | "1"
Use slices to get rid of unwanted '0b'.
bin(5)[2:]
'101'
or if you want digits,
tuple ( bin(5)[2:] )
('1', '0', '1')
or even
list( map( int, tuple( bin(5)[2:] ) ) )
[1, 0, 1]
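As a hedged alternative to slicing, format() with the 'b' presentation type produces the digits without the 0b prefix, and a width such as '08b' adds zero padding. A list comprehension then gives a real list in both Python 2 and Python 3, where map() returns a lazy iterator:

```python
# 'b' formats as binary without the '0b' prefix
digits = [int(d) for d in format(5, "b")]

# '08b' pads with zeros on the left to a total width of 8
padded = [int(d) for d in format(5, "08b")]

print(digits)
print(padded)
```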
tobin = lambda x, count=8: "".join(map(lambda y:str((x>>y)&1), range(count-1, -1, -1)))
e.g.
tobin(5) # => '00000101'
tobin(5, 4) # => '0101'
tobin(-5, 4) # => '1011'
Or as clear functions:
# Returns bit y of x (10 base). i.e.
# bit 2 of 5 is 1
# bit 1 of 5 is 0
# bit 0 of 5 is 1
def getBit(y, x):
    return str((x>>y)&1)
# Returns the lowest `count` bits of base 10 integer `x`
def tobin(x, count=8):
    shift = range(count-1, -1, -1)
    bits = map(lambda y: getBit(y, x), shift)
    return "".join(bits)
(Adapted from W.J. Van de Laan's comment)
I'm not entirely certain what you ultimately want to do, but you might want to look at the bitarray package.
def tobin(data, width):
    data_str = bin(data & (2**width-1))[2:].zfill(width)
    return data_str
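In Python 3 the same fixed-width result can also be reached through int.to_bytes with signed=True, which has the side benefit of raising OverflowError when the value does not fit. The sketch below assumes widths that are whole bytes; the name tobin_bytes is hypothetical:

```python
def tobin_bytes(value, n_bytes=1):
    """Two's complement bit string of value, n_bytes * 8 bits wide."""
    # signed=True makes to_bytes produce the two's complement encoding
    raw = value.to_bytes(n_bytes, byteorder="big", signed=True)
    # Each byte is an int in 0..255; format it as 8 zero-padded bits
    return "".join(format(byte, "08b") for byte in raw)

print(tobin_bytes(-5))
print(tobin_bytes(5, 2))
```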
You can use the Binary fractions package. This package implements TwosComplement with binary integers and binary fractions. You can convert binary-fraction strings into their twos complement and vice-versa
Example:
>>> from binary_fractions import TwosComplement
>>> TwosComplement.to_float("11111111111") # TwosComplement --> float
-1.0
>>> TwosComplement.to_float("11111111100") # TwosComplement --> float
-4.0
>>> TwosComplement(-1.5) # float --> TwosComplement
'10.1'
>>> TwosComplement(1.5) # float --> TwosComplement
'01.1'
>>> TwosComplement(5) # int --> TwosComplement
'0101'
To use this with Binary objects instead of floats, you can use the Binary class from the same package.
PS: Shameless plug, I'm the author of this package.
For positive numbers, just use:
bin(x)[2:].zfill(4)
For negative numbers, it's a little different:
bin((eval("0b"+str(int(bin(x)[3:].zfill(4).replace("0","2").replace("1","0").replace("2","1"))))+eval("0b1")))[2:].zfill(4)
As a whole script, this is how it should look:
def binary(number):
    if number < 0:
        return bin((eval("0b"+str(int(bin(number)[3:].zfill(4).replace("0","2").replace("1","0").replace("2","1"))))+eval("0b1")))[2:].zfill(4)
    return bin(number)[2:].zfill(4)
x = int(input())
print(binary(x))
A modification on tylerl's very helpful answer that provides sign extension for positive numbers as well as negative (no error checking).
def to2sCompStr(num, bitWidth):
    num &= (2 << bitWidth-1)-1 # mask
    formatStr = '{:0'+str(bitWidth)+'b}'
    ret = formatStr.format(int(num))
    return ret
Example:
In [11]: to2sCompStr(-24, 18)
Out[11]: '111111111111101000'
In [12]: to2sCompStr(24, 18)
Out[12]: '000000000000011000'
No need, it already is. It is just python choosing to represent it differently. If you start printing each nibble separately, it will show its true colours.
checkNIB = '{0:04b}'.format
checkBYT = lambda x: '-'.join( map( checkNIB, [ (x>>4)&0xf, x&0xf] ) )
checkBTS = lambda x: '-'.join( [ checkBYT( ( x>>(shift*8) )&0xff ) for shift in reversed( range(4) ) if ( x>>(shift*8) )&0xff ] )
print( checkBTS(-0x0002) )
Output is simple:
>>>1111-1111-1111-1111-1111-1111-1111-1110
Python goes back to its usual sign-and-magnitude display when you format the whole number, but the two's complement bits are still visible once you split the value into bytes and nibbles as above. Keep in mind that this reads best with negative hex and binary literals, less so with plain decimal numbers; with hex you can also control the byte size.
We can leverage the property of bit-wise XOR. Use bit-wise XOR to flip the bits and then add 1. Then you can use the python inbuilt bin() function to get the binary representation of the 2's complement. Here's an example function:
def twos_complement(input_number):
    print(bin(input_number)) # prints binary value of input
    mask = 2**(1 + len(bin(input_number)[2:])) - 1 # Calculate mask to do bitwise XOR operation
    twos_comp = (input_number ^ mask) + 1 # calculate 2's complement, for negative of input_number (-1 * input_number)
    print(bin(twos_comp)) # print 2's complement representation of negative of input_number.
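The flip-then-add-one idea can also be written so that it returns a value and works at a caller-chosen width instead of a width derived from the input. This is a sketch under that assumption; negate_in_width is a hypothetical name:

```python
def negate_in_width(value, bits=8):
    """Return the `bits`-wide two's complement pattern of -value as an int."""
    mask = (1 << bits) - 1           # e.g. 0b11111111 for bits=8
    flipped = value ^ mask           # XOR with all ones flips every bit
    return (flipped + 1) & mask      # add 1, then drop any carry out of the width

pattern = negate_in_width(5)
print(format(pattern, "08b"))
```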
I hope this solves your problem:
num = int(input("Enter number : "))
bin_num = bin(num)
binary = '0' + bin_num[2:]
print(binary)

Casting to a string for numeric sorting

I have a set of numbers and I need to generate a string-hash that is sortable for these numbers. The numbers can be integers or floats, for example:
-5.75E+100
-4
-1.74E-101
1.74E-101
5
9
11
52.3
5.75E+100
I think to do non-exponents for integers and floats it would be simple:
# whatever the padding needs to be
>>> sorted(map(lambda x: str(x).zfill(10), [-4, 5, 52.3]))
['-000000004', '0000000005', '00000052.3']
However, what would be a more comprehensive way to generate a string-hash here that would sort properly for the above list of numbers? I am fine prepending exponents, if necessary (or converting everything to an exponent, if required), and encoding negative numbers in complement code, if that's required too.
Every float object has a built-in method hex() that will convert it to a hex string. That's almost enough to make a sortable string, but there are a few problems.
First, negative numbers have a leading - but positive numbers don't have anything. You need to add a leading character to positive numbers.
Second, - comes after + in the sorting order. You need to replace one or the other to make the order correct.
Third, the exponent comes at the end of the string. It needs to get moved to the front of the string to make it more significant, but the sign needs to stay at the absolute front.
Fourth, the exponent is a variable number of digits. It needs to be zero filled so that it has a consistent size.
Putting it all together produces something like this:
def sortable_string(number):
    hex_num = float(number).hex()
    if not hex_num.startswith('-'):
        hex_num = '+' + hex_num
    hex_num = hex_num.replace('-', '!')
    hex_parts = hex_num.split('p')
    # zero fill on the left so every exponent has the same number of digits
    exponent = hex_parts[1][0] + hex_parts[1][1:].rjust(4, '0')
    return hex_parts[0][0] + exponent + hex_parts[0][1:]
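A different technique, swapped in here as an alternative and not the method of the answer above, works on the raw IEEE 754 bits: reinterpret the double as a 64-bit unsigned integer, invert all bits for negative numbers, set the top bit for non-negative ones, and print the result as fixed-width hex. The name sortable_key is hypothetical, and the sketch assumes every input fits in a float (large integers would lose precision):

```python
import struct

def sortable_key(number):
    """Return a 16-character hex string that sorts in numeric order."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer
    bits = struct.unpack(">Q", struct.pack(">d", float(number)))[0]
    if bits & (1 << 63):
        bits ^= (1 << 64) - 1   # negative: invert everything, reversing the order
    else:
        bits |= 1 << 63         # non-negative: set the top bit to sort after negatives
    return format(bits, "016x")

numbers = [5, -4, 5.75e100, -1.74e-101, 52.3, 9, -5.75e100, 1.74e-101, 11]
ordered = sorted(numbers, key=sortable_key)
print(ordered)
```

Because every key has the same length, plain string comparison is enough and no padding decisions are needed.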
You can try this,
nums = sorted(map(eval, data))
nums = list(map(str, nums))
nums
Output -
['-5.75E+100', '-4', '-1.74E-101', '1.74E-101', '5', '52.3', '5.75E+100']

STL binary file reader with Python

I'm trying to write my "personal" python version of STL binary file reader, according to WIKIPEDIA : A binary STL file contains :
an 80-character (byte) header, which is generally ignored.
a 4-byte unsigned integer indicating the number of triangular facets in the file.
Each triangle is described by twelve 32-bit floating-point numbers: three for the normal and then three for the X/Y/Z coordinate of each vertex – just as with the ASCII version of STL. After these follows a 2-byte ("short") unsigned integer that is the "attribute byte count" – in the standard format, this should be zero because most software does not understand anything else. --Floating-point numbers are represented as IEEE floating-point numbers and are assumed to be little-endian--
Here is my code :
#! /usr/bin/env python3
with open("stlbinaryfile.stl","rb") as fichier :
    head=fichier.read(80)
    nbtriangles=fichier.read(4)
    print(nbtriangles)
The output is :
b'\x90\x08\x00\x00'
It represents an unsigned integer, and I need to convert it without using any package (struct, stl, ...). Are there any (basic) rules for doing this? I don't know what \x means. How does \x90 represent one byte?
Most of the answers on Google mention "C structs", but I don't know anything about C.
Thank you for your time.
Since you're using Python 3, you can use int.from_bytes. I'm guessing the value is stored little-endian, so you'd just do:
nbtriangles = int.from_bytes(fichier.read(4), 'little')
Change the second argument to 'big' if it's supposed to be big-endian.
Mind you, the normal way to parse a fixed width type is the struct module, but apparently you've ruled that out.
For the confusion over the repr, bytes objects will display ASCII printable characters (e.g. a) or standard ASCII escapes (e.g. \t) if the byte value corresponds to one of them. If it doesn't, it uses \x##, where ## is the hexadecimal representation of the byte value, so \x90 represents the byte with value 0x90, or 144. You need to combine the byte values at offsets to reconstruct the int, but int.from_bytes does this for you faster than any hand-rolled solution could.
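The display rules described above can be seen in a short sketch (variable names are illustrative). It builds a bytes object containing a printable character, a standard escape and a byte with neither, then decodes the four bytes from the question in both byte orders:

```python
# 97 is 'a' (printable), 9 is a tab (standard escape), 144 has neither form
sample = bytes([97, 9, 144])
shown = repr(sample)

# The four bytes from the question, viewed as plain integers
raw = b'\x90\x08\x00\x00'
as_ints = list(raw)

# Least significant byte first (little-endian) versus most significant first
count_little = int.from_bytes(raw, 'little')
count_big = int.from_bytes(raw, 'big')

print(shown, as_ints, count_little, count_big)
```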
Update: Since apparently int.from_bytes isn't "basic" enough, here are a couple of more complex solutions that use only top-level built-ins (not alternate constructors). For little-endian, you can do this:
def int_from_bytes(inbytes):
    res = 0
    for i, b in enumerate(inbytes):
        res |= b << (i * 8)  # Adjust each byte individually by 8 times position
    return res
You can use the same solution for big-endian by adding reversed to the loop, making it enumerate(reversed(inbytes)), or you can use this alternative solution that handles the offset adjustment a different way:
def int_from_bytes(inbytes):
    res = 0
    for b in inbytes:
        res <<= 8  # Adjust bytes seen so far to make room for new byte
        res |= b  # Mask in new byte
    return res
Again, this big-endian solution can trivially work for little-endian by looping over reversed(inbytes) instead of inbytes. In both cases inbytes[::-1] is an alternative to reversed(inbytes) (the former makes a new bytes in reversed order and iterates that, the latter iterates the existing bytes object in reverse, but unless it's a huge bytes object, enough to strain RAM if you copy it, the difference is pretty minimal).
The typical way to interpret an integer is to use struct.unpack, like so:
import struct
with open("stlbinaryfile.stl","rb") as fichier :
    head=fichier.read(80)
    nbtriangles=fichier.read(4)
    print(nbtriangles)
    nbtriangles=struct.unpack("<I", nbtriangles)[0]  # unpack always returns a tuple
    print(nbtriangles)
If you are allergic to import struct, then you can also compute it by hand:
def unsigned_int(s):
    result = 0
    for ch in s[::-1]:
        result *= 256
        result += ch
    return result
...
nbtriangles = unsigned_int(nbtriangles)
As to what you are seeing when you print b'\x90\x08\x00\x00': you are printing a bytes object, which is a sequence of integers in the range 0-255. The first integer has the value 144 (decimal) or 90 (hexadecimal). When printing a bytes object, that value is represented by the string \x90. The second has the value eight, represented by \x08. The third and fourth integers are both zero; they are represented by \x00.
If you would like to see a more familiar representation of the integers, try:
print(list(nbtriangles))
[144, 8, 0, 0]
To compute the 32-bit integers represented by these four 8-bit integers, you can use this formula:
total = byte0 + (byte1*256) + (byte2*256*256) + (byte3*256*256*256)
Or, in hex:
total = byte0 + (byte1*0x100) + (byte2*0x10000) + (byte3*0x1000000)
Which results in:
0x00000890
Perhaps you can see the similarities to decimal, where the string "1234" represents the number:
4 + 3*10 + 2*100 + 1*1000
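For completeness, here is a sketch of how the whole layout quoted in the question (80-byte header, 4-byte count, then twelve 32-bit floats and a 2-byte integer per triangle) could be read with the struct module that the answers above call the normal approach. The function name read_binary_stl and the in-memory test file are assumptions made for illustration, not part of any answer:

```python
import io
import struct

HEADER_SIZE = 80
TRIANGLE = struct.Struct("<12fH")  # 12 little-endian float32 + 1 uint16 = 50 bytes

def read_binary_stl(stream):
    """Return (header, triangles) from a binary STL stream.

    Each triangle is a (normal, vertices, attribute_byte_count) tuple.
    """
    header = stream.read(HEADER_SIZE)
    (count,) = struct.unpack("<I", stream.read(4))
    triangles = []
    for _ in range(count):
        record = stream.read(TRIANGLE.size)
        if len(record) < TRIANGLE.size:
            raise ValueError("file ends before all triangles were read")
        values = TRIANGLE.unpack(record)
        normal = values[0:3]
        vertices = (values[3:6], values[6:9], values[9:12])
        triangles.append((normal, vertices, values[12]))
    return header, triangles

# Build a one-triangle file in memory so the sketch runs without a real STL file
fake_file = io.BytesIO(
    b"\x00" * HEADER_SIZE
    + struct.pack("<I", 1)
    + TRIANGLE.pack(0.0, 0.0, 1.0,  0.0, 0.0, 0.0,  1.0, 0.0, 0.0,  0.0, 1.0, 0.0,  0)
)
header, triangles = read_binary_stl(fake_file)
print(len(triangles), triangles[0])
```

With a real file, the stream would come from open("stlbinaryfile.stl", "rb") instead of io.BytesIO.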

Python - Incrementing a binary sequence while maintaining the bit length

I am trying to increment a binary sequence in python while maintaining the bit length.
So far I am using this piece of code...
'{0:b}'.format(long('0100', 2) + 1)
This will take the binary number, convert it to a long, adds one, then converts it back to a binary number. Eg, 01 -> 10.
However, if I input a number such as '0100', instead of incrementing it to '0101', my code
increments it to '101', so it is disregarding the first '0', and just incrementing '100'
to '101'.
Any help on how to make my code maintain the bit length will be greatly appreciated.
Thanks
str.format lets you specify the width as a parameter, like this:
>>> n = '0100'
>>> '{:0{}b}'.format(int(n, 2) + 1, len(n))
'0101'
That's because 5 is represented as '101' after conversion from int (or long) to binary, so to prefix it with some 0's you have to use 0 as the fill character and pass the width of the initial binary number while formatting.
In [35]: b='0100'
In [36]: '{0:0{1:}b}'.format(int(b, 2) + 1,len(b))
Out[36]: '0101'
In [37]: b='0010000'
In [38]: '{0:0{1:}b}'.format(int(b, 2) + 1,len(b))
Out[38]: '0010001'
This is probably best solved using format strings. Get the length of your input, construct a format string from it, and then use it to print the incremented number.
from __future__ import print_function
# Input here, as a string
s = "0101"
# Convert to a number
n = int(s, 2)
# Construct a format string
f = "0{}b".format(len(s))
# Format the incremented number; this is your output
t = format(n + 1, f)
print(t)
To hardcode to four binary places (left-padded by 0) you would use 04b, for five you would use 05b, etc. In the code above we just get the length of the input string.
Oh, and if you input a number like 1111 and add 1 you'll get 10000 since you need an extra bit to represent that. If you want to wrap around to 0000 do t = format(n + 1, f)[-len(s):].
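The padding and the wrap-around can be folded into one small helper. This is a sketch; the name increment_bits and the wrap flag are assumptions, and the wrap is done with a modulo instead of slicing the string:

```python
def increment_bits(bit_string, wrap=True):
    """Add 1 to a binary string while keeping its original width."""
    width = len(bit_string)
    value = int(bit_string, 2) + 1
    if wrap:
        value %= 1 << width          # 1111 + 1 wraps around to 0000
    # Zero-pad to the input width; without wrap an overflow gains one digit
    return format(value, "0{}b".format(width))

print(increment_bits("0100"))
print(increment_bits("1111"))
print(increment_bits("1111", wrap=False))
```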

How to convert a string representing a binary fraction to a number in Python

Let us suppose that we have a string representing a binary fraction such as:
".1"
As a decimal number this is 0.5. Is there a standard way in Python to go from such strings to a number type (whether it is binary or decimal is not strictly important).
For an integer, the solution is straightforward:
>>> int("101", 2)
5
int() takes an optional second argument to provide the base, but float() does not.
I am looking for something functionally equivalent (I think) to this:
def frac_bin_str_to_float(num):
    """Assuming num to be a string representing
    the fractional part of a binary number with
    no integer part, return num as a float."""
    result = 0
    ex = 2.0
    for c in num:
        if c == '1':
            result += 1/ex
        ex *= 2
    return result
I think that does what I want, although I may well have missed some edge cases.
Is there a built-in or standard method of doing this in Python?
The following is a shorter way to express the same algorithm:
def parse_bin(s):
    return int(s[1:], 2) / 2.**(len(s) - 1)
It assumes that the string starts with the dot. If you want something more general, the following will handle both the integer and the fractional parts:
def parse_bin(s):
    t = s.split('.')
    return int(t[0], 2) + int(t[1], 2) / 2.**len(t[1])
For example:
In [56]: parse_bin('10.11')
Out[56]: 2.75
It is reasonable to suppress the point instead of splitting on it, as follows. This bin2float function (unlike parse_bin in previous answer) correctly deals with inputs without points (except for returning an integer instead of a float in that case).
For example, the invocations bin2float('101101'), bin2float('.11101'), and bin2float('101101.11101') return 45, 0.90625, and 45.90625 respectively.
def bin2float (b):
    s, f = b.find('.')+1, int(b.replace('.',''), 2)
    return f/2.**(len(b)-s) if s else f
You could actually generalize James's code to convert it from any number system if you replace the hard coded '2' to that base.
def str2float(s, base=10):
    dot, f = s.find('.') + 1, int(s.replace('.', ''), base)
    return f / float(base)**(len(s) - dot) if dot else f
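If an exact result matters more than getting a float, the same "drop the point and divide" idea can be sketched with fractions.Fraction from the standard library. The name bin_str_to_fraction is hypothetical, and the sketch assumes unsigned input containing only 0, 1 and at most one point:

```python
from fractions import Fraction

def bin_str_to_fraction(text):
    """Convert a binary string such as '10.11' or '.1' to an exact Fraction."""
    whole, _, frac = text.partition(".")
    digits = (whole + frac) or "0"       # treat an empty string as zero
    # All digits form the numerator; each fractional digit divides by 2 once more
    return Fraction(int(digits, 2), 2 ** len(frac))

print(bin_str_to_fraction("10.11"))
print(float(bin_str_to_fraction(".1")))
```

Calling float() on the result gives the same values as the functions above.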
You can use the Binary fractions package. With this package you can convert binary-fraction strings into floats and vice-versa.
Example:
>>> from binary_fractions import Binary
>>> float(Binary("0.1"))
0.5
>>> str(Binary(0.5))
'0b0.1'
It has many more helper functions to manipulate binary strings such as: shift, add, fill, to_exponential, invert...
PS: Shameless plug, I'm the author of this package.
