I'm trying to convert some binary output from a file to different types, and I keep seeing odd things.
For instance, I have:
value = '\x11'
If you do
bin(ord(value))
you get the output
'0b10001'
whereas I was hoping to get
'0b00010001'
I'm basically trying to read in a 32 byte header, turn it into 1's and 0's, so I can grab various bits that have different meanings.
Why not just use bitwise operators?
def is_bit_set(i, x):
"""Check if the i-th bit in x is set"""
return x & (1 << i) > 0
To get the desired output, try:
"0b{:08b}".format(ord(value))
If efficiency is your concern, it's recommended to use native binary representation instead of literal(string) binary representation, for bitwise operation is much more compact and efficient.
format(ord('\x11'), '08b') will get you 00010001, which should be close enough to what you want.
Related
Unfortunately the printing instruction of a code was written without an end-of-the-line character and one every 26 numbers consists of two numbers joined together. The following is a code that shows an example of such behaviour; at the end there is a fragment of the original database.
import numpy as np
for _ in range(2):
A=np.random.rand()+np.random.randint(0,100)
B=np.random.rand()+np.random.randint(0,100)
C=np.random.rand()+np.random.randint(0,100)
D=np.random.rand()+np.random.randint(0,100)
with open('file.txt','a') as f:
f.write(f'{A},{B},{C},{D}')
And thus the output example file looks very similar to what follows:
40.63358599010553,53.86722741700399,21.800795158561158,13.95828176311762557.217562728494684,2.626308403991772,4.840593988487278,32.401778122213486
With the issue being that there are two numbers 'printed together', in the example they were as follows:
13.95828176311762557.217562728494684
So you cannot know if they should be
13.958281763117625, 57.217562728494684
or
13.9582817631176255, 7.217562728494684
Please understand that in this case they are only two options, but the problem that I want to address considers 'unbounded numbers' which are type Python's "float" (where 'unbounded' means in a range we don't know e.g. in the range +- 1E4)
Can the original numbers be reconstructed based on "some" python internal behavior I'm missing?
Actual data with periodicity 27 (i.e. the 26th number consists of 2 joined together):
0.9221878978925224, 0.9331311610066017,0.8600582424784715,0.8754578588852764,0.8738648974725404, 0.8897837559800233,0.6773502027673041,0.736325377603136,0.7956454122424133, 0.8083168444596229,0.7089031184165164, 0.7475306242508357,0.9702361286847581, 0.9900689384633811,0.7453878225174624, 0.7749000030576826,0.7743879170108678, 0.8032590543649807,0.002434,0.003673,0.004194,0.327903,11.357262,13.782266,20.14374,31.828905,33.9260060.9215201173775437, 0.9349343132442707,0.8605282244327555,0.8741626682026793,0.8742163597524663, 0.8874673376386358,0.7109322043854609,0.7376362393985332,0.796158275345
To expand my comment into an actual answer:
We do have some information - An IEEE-754 standard float only has 32 bits of precision, some of which is taken up by the mantissa (not all numbers can be represented by a float). For datasets like yours, they're brushing up against the edge of that precision.
We can make that work for us - we just need to test whether the number can, in fact, be represented by a float, at each possible split point. We can abuse strings for this, by testing num_str == str(float(num_str)) (i.e. a string remains the same after being converted to a float and back to a string)
If your number is able to be represented exactly by the IEEE float standard, then the before and after will be equal
If the number cannot be represented exactly by the IEEE float standard, it will be coerced into the nearest number that the float can represent. Obviously, if we then convert this back to a string, will not be identical to the original.
Here's a snippet, for example, that you can play around with
def parse_number(s: str) -> List[float]:
if s.count('.') == 2:
first_decimal = s.index('.')
second_decimal = s[first_decimal + 1:].index('.') + first_decimal + 1
split_idx = second_decimal - 1
for i in range(second_decimal - 1, first_decimal + 1, -1):
a, b = s[:split_idx], s[split_idx:]
if str(float(a)) == a and str(float(b)) == b:
return [float(a), float(b)]
# default to returning as large an a as possible
return [float(s[:second_decimal - 1]), float(s[second_decimal - 1:])]
else:
return [float(s)]
parse_number('33.9260060.9215201173775437')
# [33.926006, 0.9215201173775437]
# this is the only possible combination that actually works for this particular input
Obviously this isn't foolproof, and for some numbers there may not be enough information to differentiate the first number from the second. Additionally, for this to work, the tool that generated your data needs to have worked with IEEE standards-compliant floats (which does appear to be the case in this example, but may not be if the results were generated using a class like Decimal (python) or BigDecimal (java) or something else).
Some inputs might also have multiple possibilities. In the above snippet I've biased it to take the longest possible [first number], but you could modify it to go in the opposite order and instead take the shortest possible [first number].
Yes, you have one available weapon: you're using the default precision to display the numbers. In the example you cite, there are 15 digits after the decimal point, making it easy to reconstruct the original numbers.
Let's take a simple case, where you have only 3 digits after the decimal point. It's trivial to separate
13.95857.217
The formatting requires a maximum of 2 digits before the decimal point, and three after.
Any case that has five digits between the points, is trivial to split.
13.958 57.217
However, you run into the "trailing zero" problem in some cases. If you see, instead
13.9557.217
This could be either
13.950 57.217
or
13.955 07.217
Your data do not contain enough information to differentiate the two cases.
if i have a number that is too big to be represented with 64 bits so i receive a string that contains it.
what happens if i use:
num = int(num_str)
i am asking because it looks like it works accurately and i dont understand how, does is allocate more memory for that?
i was required to check if a huge number is a power of 2. someone suggested:
def power(self, A):
A = int(A)
if A == 1:
return 0
x =bin(A)
if x.count('1')>1:
return 0
else:
return 1
while i understand why under regular circumstances it would work, the fact that the numbers are much larger than 2^64 and it still works baffles me.
According to the Python manual's description on the representation of integers:
These represent numbers in an unlimited range, subject to available (virtual) memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of 2’s complement which gives the illusion of an infinite string of sign bits extending to the left.
I have a hex string f6befc34e3de2d30. I want to convert it to signed long long, but
x['id'], = struct.unpack('>q', 'f6befc34e3de2d30'.decode('hex'))
gives:
-0b100101000001000000111100101100011100001000011101001011010000
0b1111011010111110111111000011010011100011110111100010110100110000
expected.
Thanks!
You could do long('f6befc34e3de2d30', 16)
bin(long('f6befc34e3de2d30', 16))
>>> '0b1111011010111110111111000011010011100011110111100010110100110000'
Edit: Follow up on #Paul Panzer's comment. That would be true with C type long implementation based on ALU hardware. You could not have signed integer larger that 2^63. However, Python's implementation is different, and relies on array representation of big numbers, and Karatsuba algorithm for arithmetic operations. That is why this method works.
Edit 2: Following OPs questions. There is no question of "first bit as sign". In your question you explicitly want to use the long construct of Python, for which the implementation is not the one you expect in the sense that, the representation it uses isn't the same as what you may be familiar with in C. Instead it represents large integers as an array. So if you want to implement some kind of first bit logic, you have to do it yourself. I have no culture or experience in that whatsoever so the following may come completely wrong as someone knowking his stuff, but still let me give you my take on this.
I see two ways of proceeding. In the first one you agree on a convention for the max long you want to work with, and then implement the same kind of logic the ALU does. Let us say for the sake of argument we want to work with sign long in the range [-2^127, 2^127-1]. We can do the following
MAX_LONG = long('1' + "".join([str(0)]*127), 2)
def parse_nb(s):
# returns the first bit and the significand in the case of a usual
# integer representation
b = bin(long(s, 16))
if len(b) < 130: # deal with the case where the leading zeros are absent
return "0", b[2:]
else:
return b[2], b[3:]
def read_long(s):
# takes an hexadecimal representation of a string, and return
# the corresponding long with the convention stated above
sign, mant = parse_nb(s)
b = "0b" + mant
if sign == "0":
return long(b, 2)
else:
return -MAX_LONG + long(b, 2)
read_long('5')
>>> 5L
# fffffffffffffffffffffffffffffffb is the representation of -5 using the
# usual integer representation, extended to 128 bits integers
read_long("fffffffffffffffffffffffffffffffb")
>>> -5L
For the second approach, you don't consider that there a MAX_LONG, but that the first bit is always the sign bit. Then you would have to modify the parse_nb method above. I leave that as an exercise :).
I have 32 bit numbers A=0x0000000A and B=0X00000005.
I get A xor B by A^B and it gives 0b1111.
I rotated this and got D=0b111100000 but I want this to be 32 bit number not just for printing but I need MSB bits even though there are 0 in this case for further manipulation.
Most high-level languages don't have ROR/ROL operators. There are two ways to deal with this: one is to add an external library like ctypes or https://github.com/scott-griffiths/bitstring, that have native support for rotate or bitslice support for integers (which is pretty easy to add).
One thing to keep in mind is that Python is 'infinite' precision - those MSBs are always 0 for positive numbers, 1 for negative numbers; python stores as many digits as it needs to hold up to the highest magnitude difference from the default. This is one reason you see weird notation in python like ~(0x3) is shown as -0x4, which is equivalent in two's complement notation, rather than the equivalent positive value, but -0x4 is always true, even if you AND it against a 5000 bit number, it will just mask off the bottom two bits.
Or, you can just do yourself, the way we all used to, and how the hardware actually does it:
def rotate_left(number, rotatebits, numbits=32):
newnumber = (number << rotatebits) & ~((1<<numbits)-1)
newnumber |= (number & ~((1<<rotatebits)-1)) << rotatebits
return newnumber
To get the binary of an integer you could use bin().
Just an short example:
>>> i = 333333
>>> print (i)
333333
>>> print (bin(i))
0b1010001011000010101
>>>
bin(i)[2:].zfill(32)
I guess does what you want.
I think your bigger problem here is that you are misunderstanding the difference between a number and its representation
12 ^ 18 #would xor the values
56 & 11 # and the values
if you need actual 32bit signed integers you can use numpy
a =numpy.array(range(100),dtype=np.int32)
Is there some way to get the actual bit representation, instead of the garbage '-0bx'? I need to actually be able to see the bits. Whether or not it comes out big/little endian doesn't matter. This is for an assignment.
Does anyone know how to view the actual 2's complement bit representation of an integer in python?
Because the number isn't constrained to a bit range, there is no canonical "the bits" representation. The output would be 0b1, 0b11111111, 0b1111111111111111, etc. depending on which bit range you happened to intend.
Would the following give what you want?
> x = -1
> print(bin(x & 0xffffffff)) # 32-bit output
0b11111111111111111111111111111111
Note: This doesn't pad with 0es to give a fixed length, as Ned's suggestion does.
>>> x = -1
>>> "{:032b}".format(x & 0xffffffff)
'11111111111111111111111111111111'