I have a hex string f6befc34e3de2d30. I want to convert it to signed long long, but
x['id'], = struct.unpack('>q', 'f6befc34e3de2d30'.decode('hex'))
gives:
-0b100101000001000000111100101100011100001000011101001011010000
whereas I expected:
0b1111011010111110111111000011010011100011110111100010110100110000
Thanks!
You could do long('f6befc34e3de2d30', 16)
bin(long('f6befc34e3de2d30', 16))
>>> '0b1111011010111110111111000011010011100011110111100010110100110000'
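For readers on Python 3, where long and str.decode('hex') no longer exist, the two readings of the same 8 bytes can be sketched as follows. This is only an illustration; the variable names are mine, not from the question.

```python
import struct

hex_str = 'f6befc34e3de2d30'
raw = bytes.fromhex(hex_str)          # Python 3 replacement for .decode('hex')

# Unsigned reading: what long(hex_str, 16) returned in Python 2.
unsigned = int(hex_str, 16)

# Signed 64-bit big-endian reading: what struct's '>q' format returns.
signed, = struct.unpack('>q', raw)
same_signed = int.from_bytes(raw, 'big', signed=True)

# bin() of a negative int prints a minus sign, not the stored bit pattern;
# masking with 2**64 - 1 recovers the 64 bits that were actually in the bytes.
bits = format(signed & (2**64 - 1), '064b')
print(unsigned, signed, bits)
```

Both values describe the same bytes; they differ by exactly 2**64 because the top bit is set.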
Edit: Following up on @Paul Panzer's comment. That would be true with the C long type, which is implemented on top of the ALU hardware: you could not have a signed integer larger than 2^63 - 1. However, Python's implementation is different: it stores big numbers as arrays of digits and uses algorithms such as Karatsuba multiplication for arithmetic on them. That is why this method works.
Edit 2: Following the OP's questions. There is no notion of a "first bit as sign" here. In your question you explicitly want to use Python's long, whose implementation is not the one you expect: its representation isn't the one you may be familiar with from C. Instead it represents large integers as an array. So if you want to implement some kind of first-bit logic, you have to do it yourself. I have no background or experience in this, so the following may look completely wrong to someone who knows their stuff, but let me give you my take on it.
I see two ways of proceeding. In the first one you agree on a convention for the max long you want to work with, and then implement the same kind of logic the ALU does. Let us say for the sake of argument that we want to work with signed longs in the range [-2^127, 2^127-1]. We can do the following:
MAX_LONG = long('1' + "".join([str(0)]*127), 2)
def parse_nb(s):
    # returns the first bit and the significand in the case of a usual
    # integer representation
    b = bin(long(s, 16))
    if len(b) < 130: # deal with the case where the leading zeros are absent
        return "0", b[2:]
    else:
        return b[2], b[3:]
def read_long(s):
    # takes the hexadecimal string representation of a number and returns
    # the corresponding long with the convention stated above
    sign, mant = parse_nb(s)
    b = "0b" + mant
    if sign == "0":
        return long(b, 2)
    else:
        return -MAX_LONG + long(b, 2)
read_long('5')
>>> 5L
# fffffffffffffffffffffffffffffffb is the representation of -5 using the
# usual integer representation, extended to 128 bits integers
read_long("fffffffffffffffffffffffffffffffb")
>>> -5L
For the second approach, you don't assume that there is a MAX_LONG, but rather that the first bit is always the sign bit. Then you would have to modify the parse_nb method above. I leave that as an exercise :).
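As a rough Python 3 sketch of the first approach (a fixed, agreed-upon width), the same convention can be written with integer arithmetic instead of string slicing. The function name read_signed and its bits parameter are hypothetical, introduced here only for illustration.

```python
def read_signed(hex_str: str, bits: int = 128) -> int:
    """Interpret hex_str as a two's complement integer that is `bits` wide."""
    value = int(hex_str, 16)
    if value >= 1 << bits:
        raise ValueError("input does not fit in %d bits" % bits)
    # If the top bit of the chosen width is set, the number is negative:
    # subtract 2**bits to get its signed value.
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

print(read_signed('5'))                                  # 5
print(read_signed('fffffffffffffffffffffffffffffffb'))   # -5
```

With bits=64 this gives the same result as struct's '>q' format for an 8-byte input.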
Unfortunately, the print statement in a piece of code was written without an end-of-line character, so every 26th number in the output actually consists of two numbers joined together. The following code shows an example of such behaviour; at the end there is a fragment of the original database.
import numpy as np
for _ in range(2):
    A=np.random.rand()+np.random.randint(0,100)
    B=np.random.rand()+np.random.randint(0,100)
    C=np.random.rand()+np.random.randint(0,100)
    D=np.random.rand()+np.random.randint(0,100)
    with open('file.txt','a') as f:
        f.write(f'{A},{B},{C},{D}')
And thus the output example file looks very similar to what follows:
40.63358599010553,53.86722741700399,21.800795158561158,13.95828176311762557.217562728494684,2.626308403991772,4.840593988487278,32.401778122213486
With the issue being that there are two numbers 'printed together', in the example they were as follows:
13.95828176311762557.217562728494684
So you cannot know if they should be
13.958281763117625, 57.217562728494684
or
13.9582817631176255, 7.217562728494684
Please understand that in this case there are only two options, but the problem I want to address involves 'unbounded' numbers of Python's float type (where 'unbounded' means in a range we don't know, e.g. +-1E4).
Can the original numbers be reconstructed based on "some" python internal behavior I'm missing?
Actual data with periodicity 27 (i.e. the 26th number consists of 2 joined together):
0.9221878978925224, 0.9331311610066017,0.8600582424784715,0.8754578588852764,0.8738648974725404, 0.8897837559800233,0.6773502027673041,0.736325377603136,0.7956454122424133, 0.8083168444596229,0.7089031184165164, 0.7475306242508357,0.9702361286847581, 0.9900689384633811,0.7453878225174624, 0.7749000030576826,0.7743879170108678, 0.8032590543649807,0.002434,0.003673,0.004194,0.327903,11.357262,13.782266,20.14374,31.828905,33.9260060.9215201173775437, 0.9349343132442707,0.8605282244327555,0.8741626682026793,0.8742163597524663, 0.8874673376386358,0.7109322043854609,0.7376362393985332,0.796158275345
To expand my comment into an actual answer:
We do have some information: a Python float is an IEEE-754 double, 64 bits wide with 53 bits of significand precision (roughly 15 to 17 significant decimal digits), so not every digit string is something a float would print as. The numbers in datasets like yours are brushing up against the edge of that precision.
We can make that work for us: at each possible split point, we just need to test whether each piece is a string that Python could actually have printed for a float. We can abuse strings for this, by testing num_str == str(float(num_str)) (i.e. a string remains the same after being converted to a float and back to a string).
If the string is exactly what Python prints for some float (the shortest decimal that converts back to that same float), then the before and after will be equal.
If it is not (too many digits, trailing zeros, leading zeros), it will be coerced into the nearest number that the float can represent, and converting that back to a string will not give the original. That split can then be rejected.
Here's a snippet, for example, that you can play around with
from typing import List

def parse_number(s: str) -> List[float]:
    if s.count('.') == 2:
        first_decimal = s.index('.')
        second_decimal = s[first_decimal + 1:].index('.') + first_decimal + 1
        # try every split point, starting with the longest possible first number
        for split_idx in range(second_decimal - 1, first_decimal + 1, -1):
            a, b = s[:split_idx], s[split_idx:]
            # accept the split only if both halves survive a float round trip
            if str(float(a)) == a and str(float(b)) == b:
                return [float(a), float(b)]
        # default to returning as large an a as possible
        return [float(s[:second_decimal - 1]), float(s[second_decimal - 1:])]
    else:
        return [float(s)]
parse_number('33.9260060.9215201173775437')
# [33.926006, 0.9215201173775437]
# this is the only possible combination that actually works for this particular input
Obviously this isn't foolproof, and for some numbers there may not be enough information to differentiate the first number from the second. Additionally, for this to work, the tool that generated your data needs to have worked with IEEE standards-compliant floats (which does appear to be the case in this example, but may not be if the results were generated using a class like Decimal (python) or BigDecimal (java) or something else).
Some inputs might also have multiple possibilities. In the above snippet I've biased it to take the longest possible [first number], but you could modify it to go in the opposite order and instead take the shortest possible [first number].
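To see how ambiguous a given fused token is, one option is to collect every split that passes the round-trip test instead of returning the first. The sketch below assumes Python 3 (where repr of a float is the shortest round-tripping string) and non-negative numbers without exponents; candidate_splits is a hypothetical helper name.

```python
from typing import List, Tuple

def candidate_splits(s: str) -> List[Tuple[float, float]]:
    """Return every (a, b) such that s == repr(a) + repr(b)."""
    first = s.index('.')
    second = s.index('.', first + 1)
    results = []
    # a must keep at least one digit after its decimal point,
    # b must keep at least one digit before its decimal point.
    for i in range(second - 1, first + 1, -1):
        a, b = s[:i], s[i:]
        if repr(float(a)) == a and repr(float(b)) == b:
            results.append((float(a), float(b)))
    return results

# Short numbers leave several valid readings:
print(candidate_splits('13.9557.217'))
```

When the list has exactly one entry the split is unambiguous under these assumptions; when it has several, extra knowledge (such as the expected range of each column) is needed to pick one.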
Yes, you have one available weapon: you're using the default precision to display the numbers. In the example you cite, there are 15 digits after the decimal point, making it easy to reconstruct the original numbers.
Let's take a simple case, where you have only 3 digits after the decimal point. It's trivial to separate
13.95857.217
The formatting requires a maximum of 2 digits before the decimal point, and three after.
Any case that has five digits between the points, is trivial to split.
13.958 57.217
However, you run into the "trailing zero" problem in some cases. If you see, instead
13.9557.217
This could be either
13.950 57.217
or
13.955 07.217
Your data do not contain enough information to differentiate the two cases.
I'm working with some in-place code dealing with formatting user-stored floating point numbers for human display.
The current implementation does this:
"{0:.24f}".format(some_floating_point).rstrip('0')
which makes sense and works just fine for the most part. But when faced with a value such as 0.0003, things don't go as well.
>>> "{0:.24f}".format(0.0003).rstrip('0')
'0.000299999999999999973719'
Some further investigation indicates that Python seems to change the underlying representation based on the number of digits requested?
>>> "{0:.15f}".format(0.0003)
'0.000300000000000'
>>> "{0:.20f}".format(0.0003)
'0.00029999999999999997'
My assumption is single precision vs double.
The user enters these values where they are stored in the database as a double, and when the form is rendered again later the same value is prepopulated in the field. Therefore I need a 1:1 mapping of these representations.
My question is therefore: What is an elegant, and more importantly safe way to deal with this behavior? My best efforts so far have involved log10 and are less than ideal to put it nicely.
EDIT: As Prune points out the value is not actually changing, but rather the rounding done by format will carry over causing a set of 9s to become 0s (d'oh). The behavior makes sense then, but the solution is still escaping me.
You are receiving the number as stored. 0.0003 cannot be stored exactly as a binary fraction. To illustrate:
>>> 0.00029999999999999997 == 0.0003
True
Print formatting rounds the number at the least significant digit. Double precision merely pushes the problem farther to the right. To fully "solve" the problem to base-10 eyes, you need to switch to decimal arithmetic, or perhaps build your own string handler for numbers that are sufficiently close to a simpler value (a suspicious string of 9's or 0's in the fractional part).
Here's the start of a function for you. I tested it with 0.0004, which stores as a hair more than 0.0004; the 9's case is left as an exercise :-) .
def str_round(x):
    size = 6
    nines = '9'*size
    zeros = '0'*size
    # named s rather than str, to avoid shadowing the builtin
    s = "{0:.24f}".format(x).rstrip('0')
    s_len = len(s)
    print s, s_len
    if nines in s:
        # replace leading digit with one more
        pos = s.index(nines)
        # ADD CODE HERE
        # Turn the leading portion into an integer;
        # increment and convert back to zero-leading string.
        # Fill out the rest with zeros.
    elif zeros in s:
        # Change all trailing digits to 0
        pos = s.index(zeros)
        s = s[:pos] + '0'*(s_len - pos)
    return s
print str_round(0.0004)
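An alternative to patching runs of 9s and 0s by hand, sketched here for Python 3 and finite values only: repr() of a float already yields the shortest decimal string that converts back to exactly the same double, and routing it through Decimal avoids scientific notation for very small or very large values. The name display_float is hypothetical.

```python
from decimal import Decimal

def display_float(x: float) -> str:
    """Shortest fixed-point string that converts back to exactly x."""
    # repr(x) is the shortest round-tripping decimal, but may use an
    # exponent (e.g. '1e-05'); Decimal's 'f' format expands it.
    return format(Decimal(repr(x)), 'f')

print(display_float(0.0003))   # 0.0003
print(display_float(1e-05))    # 0.00001
```

Because the string converts back to the identical double, storing the parsed value and re-rendering it gives the user back what they typed, as long as they typed no more digits than a double can distinguish.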
I am looking for a slick function that reverses the digits of the binary representation of a number.
If f were such a function I would have
int(reversed(s),2) == f(int(s,2)) whenever s is a string of zeros and ones starting with 1.
Right now I am using lambda x: int(''.join(reversed(bin(x)[2:])),2)
which is ok as far as conciseness is concerned, but it seems like a pretty roundabout way of doing this.
I was wondering if there was a nicer (perhaps faster) way with bitwise operators and what not.
How about
int('{0:b}'.format(n)[::-1], 2)
or
int(bin(n)[:1:-1], 2)
The second method seems to be the faster of the two, however both are much faster than your current method:
import timeit
print timeit.timeit("int('{0:b}'.format(n)[::-1], 2)", 'n = 123456')
print timeit.timeit("int(bin(n)[:1:-1], 2)", 'n = 123456')
print timeit.timeit("int(''.join(reversed(bin(n)[2:])),2)", 'n = 123456')
1.13251614571
0.710681915283
2.23476600647
You could do it with shift operators like this:
def revbits(x):
    rev = 0
    while x:
        rev <<= 1
        rev += x & 1
        x >>= 1
    return rev
It doesn't seem any faster than your method, though (in fact, slightly slower for me).
Here is my suggestion:
In [83]: int(''.join(bin(x)[:1:-1]), 2)
Out[83]: 9987
Same method, slightly simplified.
I would argue your current method is perfectly fine, but you can lose the list() call, as str.join() will accept any iterable:
def binary_reverse(num):
    return int(''.join(reversed(bin(num)[2:])), 2)
I would also advise against using lambda for anything but the simplest of functions, where it will only be used once and makes the surrounding code clearer by being inlined.
The reason I feel this is fine is that it describes what you want to do: take the binary representation of a number, reverse it, then get a number again. That makes this code very readable, and that should be a priority.
There is an entire half chapter of Hacker's Delight devoted to this issue (Section 7-1: Reversing Bits and Bytes) using binary operations, bit shifts, and other goodies. Seems like these are all possible in Python and it should be much quicker than the binary-to-string-and-reverse methods.
The book isn't available publicly but I found this blog post that discusses some of it. The method shown in the blog post follows the following quote from the book:
Bit reversal can be done quite efficiently by interchanging adjacent
single bits, then interchanging adjacent 2-bit fields, and so on, as
shown below. These five assignment statements can be executed in any
order.
http://blog.sacaluta.com/2011/02/hackers-delight-reversing-bits.html
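The swap-adjacent-fields idea in the quote can be sketched in Python as below. This is my own transcription of the general technique for a fixed 32-bit width, not code copied from the book or the blog post, and reverse_bits32 is a hypothetical name. Note that it reverses within 32 bits, so the result differs from the string methods above, which drop leading zeros.

```python
def reverse_bits32(x: int) -> int:
    """Reverse the bit order of x within a fixed 32-bit word."""
    x &= 0xFFFFFFFF
    # Swap adjacent single bits, then 2-bit fields, nibbles, bytes, halves.
    x = ((x & 0x55555555) << 1) | ((x >> 1) & 0x55555555)
    x = ((x & 0x33333333) << 2) | ((x >> 2) & 0x33333333)
    x = ((x & 0x0F0F0F0F) << 4) | ((x >> 4) & 0x0F0F0F0F)
    x = ((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)
    x = ((x & 0x0000FFFF) << 16) | ((x >> 16) & 0x0000FFFF)
    return x

print(hex(reverse_bits32(1)))   # 0x80000000
```

Whether this beats the string-based one-liners in CPython is something to measure with timeit rather than assume, since each Python-level operation carries interpreter overhead.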
>>> def bit_rev(n):
...     return int(bin(n)[:1:-1], 2)
...
>>> bit_rev(2)
1
>>> bit_rev(10)
5
What if you wanted to reverse the binary value within a specific number of bits, e.g. treating 1 as the 8-bit value 0b00000001? In this case the reversed value would be 0b10000000, i.e. 128 (decimal) or 0x80 (hex).
def binary_reverse(num, bit_length):
    # Convert to binary and pad with 0s on the left
    bin_val = bin(num)[2:].zfill(bit_length)
    return int(''.join(reversed(bin_val)), 2)
    # Or, alternatively:
    # return int(bin_val[::-1], 2)
I am wondering about the way Python (3.3.0) prints complex numbers. I am looking for an explanation, not a way to change the print.
Example:
>>> complex(1,1)-complex(1,1)
0j
Why doesn't it just print "0"? My guess is: to keep the output of type complex.
Next example:
>>> complex(0,1)*-1
(-0-1j)
Well, a simple "-1j" or "(-1j)" would have done. And why "-0"?? Isn't that the same as +0? It doesn't seem to be a rounding problem:
>>> (complex(0,1)*-1).real == 0.0
True
And when the imaginary part gets positive, the -0 vanishes:
>>> complex(0,1)
1j
>>> complex(0,1)*-1
(-0-1j)
>>> complex(0,1)*-1*-1
1j
Yet another example:
>>> complex(0,1)*complex(0,1)*-1
(1-0j)
>>> complex(0,1)*complex(0,1)*-1*-1
(-1+0j)
>>> (complex(0,1)*complex(0,1)*-1).imag
-0.0
Am I missing something here?
It prints 0j to indicate that it's still a complex value. You can also type it back in that way:
>>> 0j
0j
The rest is probably the result of the magic of IEEE 754 floating point representation, which makes a distinction between 0 and -0, the so-called signed zero. Basically, there's a single bit that says whether the number is positive or negative, regardless of whether the number happens to be zero. This explains why 1j * -1 gives something with a negative zero real part: the positive zero got multiplied by -1.
-0 is required by the standard to compare equal to +0, which explains why (1j * -1).real == 0.0 still holds.
The reason that Python still decides to print the -0, is that in the complex world these make a difference for branch cuts, for instance in the phase function:
>>> phase(complex(-1.0, 0.0))
3.141592653589793
>>> phase(complex(-1.0, -0.0))
-3.141592653589793
This is about the imaginary part, not the real part, but it's easy to imagine situations where the sign of the real part would make a similar difference.
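The phase example can be reproduced as a self-contained script (the snippet above assumes phase was imported from cmath). The names above and below are illustrative only.

```python
import cmath
import math

above = complex(-1.0, 0.0)    # imaginary part is +0.0
below = complex(-1.0, -0.0)   # imaginary part is -0.0

# The two values compare equal...
print(above == below)

# ...but sit on opposite sides of the branch cut along the negative real axis.
phase_above = cmath.phase(above)
phase_below = cmath.phase(below)
print(phase_above, phase_below)

# copysign exposes the sign bit that == cannot see.
sign_above = math.copysign(1.0, above.imag)
sign_below = math.copysign(1.0, below.imag)
print(sign_above, sign_below)
```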
The answer lies in the Python source code itself.
I'll work with one of your examples. Let
a = complex(0,1)
b = complex(-1, 0)
When you do a*b, Python effectively computes:
real_part = a.real*b.real - a.imag*b.imag
imag_part = a.real*b.imag + a.imag*b.real
And if you do that in the python interpreter, you'll get
>>> real_part
-0.0
>>> imag_part
-1.0
From IEEE754, you're getting a negative zero, and since that's not +0, you get the parens and the real part when printing it.
if (v->cval.real == 0. && copysign(1.0, v->cval.real)==1.0) {
/* Real part is +0: just output the imaginary part and do not
include parens. */
...
else {
/* Format imaginary part with sign, real part without. Include
parens in the result. */
...
I guess (but I don't know for sure) that the rationale comes from the importance of that sign when calculating with elementary complex functions (there's a reference for this in the wikipedia article on signed zero).
0j is an imaginary literal which indeed indicates a complex number rather than an integer or floating-point one.
The +-0 ("signed zero") is a result of Python's conformance to IEEE 754 floating point representation since in Python, complex is by definition a pair of floating point numbers. Due to the latter, there's no need to print or specify zero fraction parts for a complex too.
The -0 part is printed in order to accurately represent the contents as repr()'s documentation demands (repr() is implicitly called whenever an operation's result is output to the console).
Regarding the question of why (-0+1j) = 1j but (1j*-1) = (-0-1j):
Note that (-0+1j) or (-0.0+1j) aren't single complex literals but expressions: an int/float added to a complex. To compute the result, the first number is first converted to a complex (-0 -> (0.0,0.0) since integers don't have signed zeros, -0.0 -> (-0.0,0.0)). Then its .real and .imag are added to the corresponding ones of 1j, which are (+0.0,1.0). The result is (+0.0,1.0) :^) . To construct a complex with a negative zero real part directly, use complex(-0.0,1).
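A short check of the three cases discussed here, assuming CPython 3; the variable names are illustrative.

```python
import math

direct = complex(-0.0, 1)   # real part is -0.0, set explicitly
summed = -0.0 + 1j          # -0.0 + (+0.0) gives +0.0 for the real part
product = 1j * -1           # +0.0 * -1 gives -0.0 for the real part

for z in (direct, summed, product):
    # copysign reveals the sign of the zero that == 0.0 hides
    print(repr(z), math.copysign(1.0, z.real))
```

Only the sum ends up with a positive zero real part, which is why it prints without parentheses.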
As far as the first question is concerned: if it just printed 0 it would be mathematically correct, but you wouldn't know you were dealing with a complex object vs an int. As long as you don't specify .real you will always get a j component.
I'm not sure why you would ever get -0; it's not technically incorrect (-1 * 0 = 0) but it's syntactically odd.
As far as the rest goes, it's strange that it isn't consistent; however, none of the outputs are technically incorrect, they are just an artifact of the implementation.
Is there some way to get the actual bit representation, instead of the garbage '-0bx'? I need to actually be able to see the bits. Whether or not it comes out big/little endian doesn't matter. This is for an assignment.
Does anyone know how to view the actual 2's complement bit representation of an integer in python?
Because the number isn't constrained to a bit range, there is no canonical "the bits" representation. The output would be 0b1, 0b11111111, 0b1111111111111111, etc. depending on which bit range you happened to intend.
Would the following give what you want?
> x = -1
> print(bin(x & 0xffffffff)) # 32-bit output
0b11111111111111111111111111111111
Note: This doesn't pad with zeros to give a fixed length, as Ned's suggestion does.
>>> x = -1
>>> "{:032b}".format(x & 0xffffffff)
'11111111111111111111111111111111'
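The masking and padding steps can be combined into one helper for an arbitrary width. This is a minimal sketch; twos_complement_bits and its bits parameter are hypothetical names, and the range check is my addition.

```python
def twos_complement_bits(x: int, bits: int = 32) -> str:
    """Return the two's complement bit pattern of x as a string of `bits` digits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= x <= hi:
        raise ValueError("%d does not fit in %d signed bits" % (x, bits))
    # Masking maps negative values onto their unsigned equivalent;
    # the format spec pads with zeros on the left to a fixed width.
    return format(x & ((1 << bits) - 1), '0{}b'.format(bits))

print(twos_complement_bits(-1, 8))   # 11111111
print(twos_complement_bits(5, 8))    # 00000101
```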