Positive integer from Python hash() function - python

I want to use the Python hash() function to get integer hashes from objects. But built-in hash() can give negative values, and I want only positive. And I want it to work sensibly on both 32-bit and 64-bit platforms.
I.e. on 32-bit Python, hash() can return an integer in the range -2**31 to 2**31 - 1.
On 64-bit systems, hash() can return an integer in the range -2**63 to 2**63 - 1.
But I want a hash in the range 0 to 2**32-1 on 32-bit systems, and 0 to 2**64-1 on 64-bit systems.
What is the best way to convert the hash value to its equivalent positive value within the range of the 32- or 64-bit target platform?
(Context: I'm trying to make a new random.Random style class. According to the random.Random.seed() docs, the seed "optional argument x can be any hashable object." So I'd like to duplicate that functionality, except that my seed algorithm can't handle negative integer values, only positive.)

Using sys.maxsize:
>>> import sys
>>> sys.maxsize
9223372036854775807L
>>> hash('asdf')
-618826466
>>> hash('asdf') % ((sys.maxsize + 1) * 2)
18446744073090725150L
Alternative using ctypes.c_size_t:
>>> import ctypes
>>> ctypes.c_size_t(hash('asdf')).value
18446744073090725150L
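Both forms above reduce the hash modulo 2**32 or 2**64, depending on the platform. A minimal sketch that wraps the modulo form in a helper; the names HASH_MODULUS and positive_hash are made up for illustration, and the comments assume CPython, where small integers other than -1 hash to themselves:

```python
import sys

# Number of distinct hash values on this platform: 2**32 or 2**64.
HASH_MODULUS = (sys.maxsize + 1) * 2

def positive_hash(obj):
    """Map hash(obj) onto the range 0 .. HASH_MODULUS - 1 (hypothetical helper)."""
    # Python's % always returns a non-negative result for a positive modulus.
    return hash(obj) % HASH_MODULUS

print(positive_hash(5))                       # non-negative hashes are unchanged
print(positive_hash(-5) == HASH_MODULUS - 5)  # negative hashes wrap around
```

Because % with a positive modulus never returns a negative number in Python, no explicit sign test is needed.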

Just using sys.maxsize is wrong for obvious reasons (it being 2**n - 1 and not 2**n), but the fix is easy enough:
h = hash(obj)
h += sys.maxsize + 1
For performance reasons you may want to split the sys.maxsize + 1 into two separate additions, to avoid temporarily creating a long integer for most negative numbers, although I doubt this is going to matter much.

(Edit: at first I thought you always wanted a 32-bit value)
Simply AND it with a mask of the desired size. Generally sys.maxsize will already be such a mask, since it's a power of 2 minus 1.
import sys
assert (sys.maxsize & (sys.maxsize+1)) == 0 # checks that maxsize+1 is a power of 2
new_hash = hash(obj) & sys.maxsize
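Note that sys.maxsize as a mask keeps only the low 31 or 63 bits, so the result stays within 0 to sys.maxsize. If the full 0 to 2**32-1 or 0 to 2**64-1 range from the question is wanted, the mask needs one more bit. A small sketch contrasting the two; both helper names are hypothetical, and the checks assume CPython's hashing of small integers:

```python
import sys

def masked_hash(obj):
    """Low 31/63 bits of the hash: result is in 0 .. sys.maxsize."""
    return hash(obj) & sys.maxsize

def full_width_hash(obj):
    """All 32/64 bits of the hash: result is in 0 .. 2 * sys.maxsize + 1."""
    return hash(obj) & (2 * sys.maxsize + 1)

print(masked_hash(-5) == sys.maxsize - 4)
print(full_width_hash(-5) == 2 * (sys.maxsize + 1) - 5)
```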

How about:
h = hash(o)
if h < 0:
    h += sys.maxsize + 1
This uses sys.maxsize to be portable between 32- and 64-bit systems.

Related

Using 32-bit ints and operands

Is it possible to somehow override or overload the standard implementation of ints/numbers in Python so that it acts like a 32-bit int?
a: int
a = 4076863488
>>> -218103808
Or is it possible to somehow define a variable that can't change type? Doing something like: x: int?
I want to do this because it's annoying to write ctypes.c_int32(n) on every bit operation and assignment, especially since Python does not use 32-bit operands for bitwise operations.
I know I'm basically trying to change the nature of the language. So maybe I'm asking what you would do if you had to do 32-bit stuff in python.
Some options:
Use Cython. You can declare a native 32-bit int type there, and you even get the advantage that pure numerical code gets compiled to (very) fast C code.
Use a numpy array of a single element: np.zeros((1,), dtype=np.int32). Provided you only ever use in-place operations (+=, *=, etc.), this will work like a 32-bit int type. Do be aware that if you ever use a regular binary operator (e.g. myint + 3), you might be subjected to type promotion or conversion, and the result will no longer be int32.
Use ctypes.c_int32. This comes built-in to Python, but supports no mathematical operations so you have to wrap and unwrap yourself (e.g. newval = c_int32(v1.value + v2.value)).
Use a library like fixedint (shameless plug), which provides fixed-integer classes that remain fixed size through operations rather than decaying to int. fixedint was specifically designed with fixed-width bitwise math in mind. In this case you would use fixedint.Int32.
Some less desirable options:
struct: Throws errors if your input is out of range. You can work around this with unpack('i', pack('I', val & 0xffffffff))[0], but that's really unwieldy.
array: Throws errors if you try to store a value out of range. Harder to work around than struct.
Manual bitmashing. With an unsigned 32-bit int, this is just a matter of adding & 0xffffffff a lot, which is not too bad. But, Python doesn't have any built-in way to wrap a value to a signed 32-bit int, so you'll have to write your own int32 conversion function and wrap all your operations with it:
def to_int32(val):
    val &= ((1<<32)-1)
    if val & (1<<31): val -= (1<<32)
    return val
Demonstrations of your options:
Cython
cpdef int munge(int val):
    cdef int x
    x = val * 32
    x += 0x7fffffff
    return x
Save as int_test.pyx and compile with cythonize -a -i int_test.pyx.
>>> import int_test
>>> int_test.munge(3)
-2147483553
NumPy
import numpy as np
def munge(val):
    x = val.copy()
    x *= 32
    x += 0x7fffffff
    return x
def to_int32(val):
    return np.array((val,), dtype=np.int32)
print(munge(to_int32(3)))
# prints [-2147483553]
ctypes
from ctypes import c_int32
def munge(val):
    x = c_int32(val.value * 32)
    x = c_int32(x.value + 0x7fffffff)
    return x
print(munge(c_int32(3)))
# prints c_int(-2147483553)
fixedint
import fixedint
def munge(val):
    x = val * 32
    x += 0x7fffffff
    return x
print(munge(fixedint.Int32(3)))
# prints -2147483553
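The manual masking and struct options listed earlier have no demonstration, so here is a rough sketch of both using only the standard library. The function to_int32_struct is a name invented for this example, and it assumes the native C int is 4 bytes, as it is on common platforms:

```python
import struct

def to_int32(val):
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    val &= (1 << 32) - 1
    if val & (1 << 31):
        val -= 1 << 32
    return val

def to_int32_struct(val):
    """Same wrap-around via struct: pack as unsigned, unpack as signed."""
    return struct.unpack('i', struct.pack('I', val & 0xffffffff))[0]

def munge(val):
    # Every intermediate result has to be wrapped by hand.
    x = to_int32(val * 32)
    x = to_int32(x + 0x7fffffff)
    return x

print(munge(3))
# prints -2147483553
print(to_int32_struct(4076863488))
# prints -218103808
```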

Why Lua and Python factorial output was different

-- t.lua
function fact(n)
  if n == 0 then
    return 1
  else
    return n * fact(n-1)
  end
end
for i=1,100,1 do
  print(i,fact(i))
end
# t.py
fact = lambda n:1 if n == 0 else n * fact(n-1)
for i in range(1, 100):
    print(i, fact(i))
When I wrote the same factorial code in Lua and in Python, I found that the output was different.
Lua as usually configured uses your platform's usual double-precision floating point format to store all numbers (this means all number types). For most desktop platforms today, that will be the 64-bit IEEE-754 format. The conventional wisdom is that integers in the range -1E15 to +1E15 can be safely assumed to be represented exactly. To deal with huge numbers in Lua, the keywords are "bignum" and "arbitrary precision numbers". You can use pure-Lua modules (for example, bignum and lua-nums) or the C-based module lmapm. Also read this thread.
Python supports a so-called "bignum" integer type which can work with arbitrarily large numbers. In Python 2.5+, this type is called long and is separate from the int type, but the interpreter will automatically use whichever is more appropriate. In Python 3.0+, the separate fixed-size int type has been dropped completely, and the arbitrary-precision type is simply called int. In Python you usually don't need special tools to deal with huge numbers.
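The difference can be reproduced inside Python alone, since a Python float is the same IEEE-754 double that Lua (as usually configured) uses for its numbers. A small sketch, with function names chosen just for this example:

```python
import math

def fact_int(n):
    """Exact factorial using Python's arbitrary-precision int."""
    return math.factorial(n)

def fact_float(n):
    """Factorial accumulated in a double, roughly what stock Lua does."""
    result = 1.0
    for k in range(2, n + 1):
        result *= k
    return result

for n in (20, 30):
    # int/float comparison in Python is exact, so this shows where digits are lost
    print(n, fact_int(n), fact_float(n), fact_float(n) == fact_int(n))
```

For n = 20 the two agree; by n = 30 the double can no longer hold every digit and the comparison fails.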
This is a basic example with the lbn library:
local bn = require "bn"
function bn_fact(n)
  if n:tonumber() == 0 then return 1 end
  return n * bn_fact(n-1)
end
function fact(n)
  return bn_fact(bn.number(n))
end
for i=1,100,1 do
  print(i,fact(i))
end
Output for some values
30 265252859812191058636308480000000
31 8222838654177922817725562880000000
32 263130836933693530167218012160000000
33 8683317618811886495518194401280000000
You have an overflow on your first image because the values are too big to be stored in that variable.

Converting number into 32 bits in python

I have 32 bit numbers A=0x0000000A and B=0X00000005.
I get A xor B by A^B and it gives 0b1111.
I rotated this and got D=0b111100000, but I want this to be a 32-bit number, not just for printing: I need the MSBs for further manipulation, even though they are 0 in this case.
Most high-level languages don't have ROR/ROL operators. There are two ways to deal with this: one is to add an external library like ctypes or https://github.com/scott-griffiths/bitstring, that have native support for rotate or bitslice support for integers (which is pretty easy to add).
One thing to keep in mind is that Python is 'infinite' precision - those MSBs are always 0 for positive numbers, 1 for negative numbers; python stores as many digits as it needs to hold up to the highest magnitude difference from the default. This is one reason you see weird notation in python like ~(0x3) is shown as -0x4, which is equivalent in two's complement notation, rather than the equivalent positive value, but -0x4 is always true, even if you AND it against a 5000 bit number, it will just mask off the bottom two bits.
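A quick illustration of that point, as a sketch using nothing beyond built-in integers: the complement prints as a negative number, behaves as an infinitely sign-extended mask, and only looks like a 32-bit pattern once you AND it with 0xFFFFFFFF yourself.

```python
x = ~0x3
print(x)                      # -4
print(hex(x & 0xFFFFFFFF))    # 0xfffffffc, the 32-bit two's complement view

# Against a 5000-bit number of all ones, -4 still just clears the bottom two bits.
big = (1 << 5000) - 1
print((big & x) == big - 3)   # True
```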
Or, you can just do yourself, the way we all used to, and how the hardware actually does it:
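def rotate_left(number, rotatebits, numbits=32):
    mask = (1 << numbits) - 1
    rotatebits %= numbits
    # bits shifted past the top are masked off...
    newnumber = (number << rotatebits) & mask
    # ...and re-enter at the bottom
    newnumber |= (number & mask) >> (numbits - rotatebits)
    return newnumber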
To get the binary of an integer you could use bin().
Just a short example:
>>> i = 333333
>>> print (i)
333333
>>> print (bin(i))
0b1010001011000010101
>>>
bin(i)[2:].zfill(32)
I guess does what you want.
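An alternative to slicing the bin() output is format() with a zero-padded width. A small sketch; to_bits32 is a name invented here, and the extra mask is there so negative values also show up as a 32-bit two's complement pattern:

```python
def to_bits32(value):
    """Return value as a 32-character binary string (hypothetical helper)."""
    return format(value & 0xFFFFFFFF, '032b')

print(to_bits32(0x0000000A ^ 0x00000005))
print(to_bits32(0b111100000))
print(to_bits32(-1))
```

The string is only a representation for display; the integer itself already carries the leading zero bits for any further bitwise manipulation.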
I think your bigger problem here is that you are misunderstanding the difference between a number and its representation
12 ^ 18 #would xor the values
56 & 11 # and the values
if you need actual 32bit signed integers you can use numpy
a = numpy.array(range(100), dtype=numpy.int32)

How to add to/subtract from python float value the smallest possible value [explanation that is not duplicate]? [duplicate]

How can I increment a floating point value in python by the smallest possible amount?
Background: I'm using floating point values as dictionary keys.
Occasionally, very occasionally (and perhaps never, but not certainly never), there will be collisions. I would like to resolve these by incrementing the floating point value by as small an amount as possible. How can I do this?
In C, I would twiddle the bits of the mantissa to achieve this, but I assume that isn't possible in Python.
Since Python 3.9 there is math.nextafter in the stdlib. Read on for alternatives in older Python versions.
Increment a python floating point value by the smallest possible amount
The nextafter(x,y) functions return the next discretely different representable floating-point value following x in the direction of y. The nextafter() functions are guaranteed to work on the platform or to return a sensible value to indicate that the next value is not possible.
The nextafter() functions are part of POSIX and ISO C99 standards and is _nextafter() in Visual C. C99 compliant standard math libraries, Visual C, C++, Boost and Java all implement the IEEE recommended nextafter() functions or methods. (I do not honestly know if .NET has nextafter(). Microsoft does not care much about C99 or POSIX.)
None of the bit twiddling functions here fully or correctly deal with the edge cases, such as values going though 0.0, negative 0.0, subnormals, infinities, negative values, over or underflows, etc. Here is a reference implementation of nextafter() in C to give an idea of how to do the correct bit twiddling if that is your direction.
There are two solid workarounds to get nextafter() or other excluded POSIX math functions in Python < 3.9:
Use Numpy:
>>> import numpy
>>> numpy.nextafter(0,1)
4.9406564584124654e-324
>>> numpy.nextafter(.1, 1)
0.10000000000000002
>>> numpy.nextafter(1e6, -1)
999999.99999999988
>>> numpy.nextafter(-.1, 1)
-0.099999999999999992
Link directly to the system math DLL:
import ctypes
import sys
from sys import platform as _platform
if _platform == "linux" or _platform == "linux2":
    _libm = ctypes.cdll.LoadLibrary('libm.so.6')
    _funcname = 'nextafter'
elif _platform == "darwin":
    _libm = ctypes.cdll.LoadLibrary('libSystem.dylib')
    _funcname = 'nextafter'
elif _platform == "win32":
    _libm = ctypes.cdll.LoadLibrary('msvcrt.dll')
    _funcname = '_nextafter'
else:
    # these are the ones I have access to...
    # fill in library and function name for your system math dll
    print("Platform", repr(_platform), "is not supported")
    sys.exit(0)
_nextafter = getattr(_libm, _funcname)
_nextafter.restype = ctypes.c_double
_nextafter.argtypes = [ctypes.c_double, ctypes.c_double]
def nextafter(x, y):
    "Returns the next floating-point number after x in the direction of y."
    return _nextafter(x, y)
assert nextafter(0, 1) - nextafter(0, 1) == 0
assert 0.0 + nextafter(0, 1) > 0.0
And if you really really want a pure Python solution:
# handles edge cases correctly on MY computer
# not extensively QA'd...
import math
# 'double' means IEEE 754 double precision -- c 'double'
epsilon = math.ldexp(1.0, -53) # smallest double that 0.5+epsilon != 0.5
maxDouble = float(2**1024 - 2**971) # From the IEEE 754 standard
minDouble = math.ldexp(1.0, -1022) # min positive normalized double
smallEpsilon = math.ldexp(1.0, -1074) # smallest increment for doubles < minDouble
infinity = math.ldexp(1.0, 1023) * 2
def nextafter(x,y):
    """returns the next IEEE double after x in the direction of y if possible"""
    if y==x:
        return y #if x==y, no increment
    # handle NaN
    if x!=x or y!=y:
        return x + y
    if x >= infinity:
        return infinity
    if x <= -infinity:
        return -infinity
    if -minDouble < x < minDouble:
        if y > x:
            return x + smallEpsilon
        else:
            return x - smallEpsilon
    m, e = math.frexp(x)
    if y > x:
        m += epsilon
    else:
        m -= epsilon
    return math.ldexp(m,e)
Or, use Mark Dickinson's excellent solution
Obviously the Numpy solution is the easiest.
Python 3.9 and above
Starting with Python 3.9, released 2020-10-05, you can use the math.nextafter function:
math.nextafter(x, y)
Return the next floating-point value after x towards y.
If x is equal to y, return y.
Examples:
math.nextafter(x, math.inf) goes up: towards positive infinity.
math.nextafter(x, -math.inf) goes down: towards minus infinity.
math.nextafter(x, 0.0) goes towards zero.
math.nextafter(x, math.copysign(math.inf, x)) goes away from zero.
See also math.ulp().
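A short usage sketch of the two functions together (Python 3.9 or later assumed), showing that one step up from 1.0 is exactly one ulp, while one step down is half that because the spacing halves below a power of two:

```python
import math

x = 1.0
up = math.nextafter(x, math.inf)     # next double above 1.0
down = math.nextafter(x, -math.inf)  # next double below 1.0

print(up > x > down)            # True
print(up - x == math.ulp(x))    # True
print(x - down == math.ulp(x))  # False: the gap below 1.0 is half as wide
```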
First, this "respond to a collision" is a pretty bad idea.
If they collide, the values in the dictionary should have been lists of items with a common key, not individual items.
Your "hash probing" algorithm will have to loop through more than one "tiny increments" to resolve collisions.
And sequential hash probes are known to be inefficient.
Read this: http://en.wikipedia.org/wiki/Quadratic_probing
Second, use math.frexp and sys.float_info.epsilon to fiddle with mantissa and exponent separately.
>>> m, e = math.frexp(4.0)
>>> (m+sys.float_info.epsilon)*2**e
4.0000000000000018
Forgetting about why we would want to increment a floating point value for a moment, I would have to say I think Autopulated's own answer is probably correct.
But for the problem domain, I share the misgivings of most of the responders to the idea of using floats as dictionary keys. If the objection to using Decimal (as proposed in the main comments) is that it is a "heavyweight" solution, I suggest a do-it-yourself compromise: Figure out what the practical resolution is on the timestamps, pick a number of digits to adequately cover it, then multiply all the timestamps by the necessary amount so that you can use integers as the keys. If you can afford an extra digit or two beyond the timer precision, then you can be even more confident that there will be no or fewer collisions, and that if there are collisions, you can just add 1 (instead of some rigamarole to find the next floating point value).
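A rough sketch of that compromise, assuming microsecond resolution is enough for the timestamps in question; timestamp_key, insert and the digits parameter are names made up for this example:

```python
def timestamp_key(ts, digits=6):
    """Scale a float timestamp to an integer key (hypothetical helper)."""
    return int(round(ts * 10**digits))

def insert(table, ts, value, digits=6):
    """Store value under an integer key, adding 1 until the key is free."""
    key = timestamp_key(ts, digits)
    while key in table:
        key += 1
    table[key] = value
    return key

table = {}
print(insert(table, 1.5, 'first'))   # 1500000
print(insert(table, 1.5, 'second'))  # 1500001
```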
I recommend against assuming that floats (or timestamps) will be unique if at all possible. Use a counting iterator, database sequence or other service to issue unique identifiers.
Instead of incrementing the value, just use a tuple for the colliding key. If you need to keep them in order, every key should be a tuple, not just the duplicates.
A better answer (now I'm just doing this for fun...), motivated by twiddling the bits. Handling the carry and overflows between parts of the number for negative values is somewhat tricky.
import struct
def floatToieee754Bits(f):
    return struct.unpack('<Q', struct.pack('<d', f))[0]
def ieee754BitsToFloat(i):
    return struct.unpack('<d', struct.pack('<Q', i))[0]
def incrementFloat(f):
    i = floatToieee754Bits(f)
    if f >= 0:
        return ieee754BitsToFloat(i+1)
    else:
        raise Exception('f not >= 0: unsolved problem!')
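One possible way to cover the negative case, offered as a sketch rather than a vetted implementation: IEEE 754 doubles are sign-magnitude, so for a negative value the next float upwards is the one whose bit pattern is smaller by one. The name next_up and the choice to return NaN and +inf unchanged are assumptions of this example:

```python
import math
import struct

def float_to_bits(f):
    return struct.unpack('<Q', struct.pack('<d', f))[0]

def bits_to_float(i):
    return struct.unpack('<d', struct.pack('<Q', i))[0]

def next_up(f):
    """Next double above f (hypothetical helper; NaN and +inf come back unchanged)."""
    if math.isnan(f) or f == math.inf:
        return f
    if f == 0.0:
        # covers both 0.0 and -0.0: step to the smallest positive subnormal
        return bits_to_float(1)
    bits = float_to_bits(f)
    # positive: larger bit pattern is a larger value
    # negative: smaller bit pattern is a smaller magnitude, i.e. a larger value
    return bits_to_float(bits + 1 if f > 0 else bits - 1)

print(next_up(1.0) > 1.0, next_up(-1.0) > -1.0)
```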
Instead of resolving the collisions by changing the key, how about collecting the collisions? IE:
bag = {}
bag[1234.] = 'something'
becomes
bag = collections.defaultdict(list)
bag[1234.].append('something')
would that work?
For colliding key k, add: k / 2^50
Interesting problem. The amount you need to add obviously depends on the magnitude of the colliding value, so that a normalized add will affect only the least significant bits.
It's not necessary to determine the smallest value that can be added. All you need to do is approximate it. The FPU format provides 52 mantissa bits plus a hidden bit for 53 bits of precision. No physical constant is known to anywhere near this level of precision. No sensor is able to measure anything near it. So you don't have a hard problem.
In most cases, for key k, you would be able to add k/2^53, because of that 52-bit fraction plus the hidden bit.
But it's not necessary to risk triggering library bugs or exploring rounding issues by shooting for the very last bit or anything near it.
So I would say, for colliding key k, just add k / 2^50 and call it a day.[1]
[1] Possibly more than once until it doesn't collide any more, at least to foil any diabolical unit test authors.
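In code, the suggestion might look like the sketch below. The names bump and insert are invented for this example, and it assumes finite, nonzero keys: a key of 0 or infinity cannot be nudged by a relative amount, so the sketch raises instead of looping forever.

```python
def bump(k):
    """Nudge k away from zero by a relative 2**-50 (a few units in the last place)."""
    return k + k / 2**50

def insert(table, k, value):
    """Insert value under k, nudging k until it no longer collides."""
    while k in table:
        nudged = bump(k)
        if nudged == k:
            raise ValueError("key cannot be nudged: %r" % (k,))
        k = nudged
    table[k] = value
    return k

table = {}
insert(table, 1234.0, 'something')
print(insert(table, 1234.0, 'something else') > 1234.0)  # True
```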
>>> import sys
>>> sys.float_info.epsilon
2.220446049250313e-16
Instead of modifying your float timestamp, use a tuple for every key as Mark Ransom suggests where the tuple (x,y) is composed of x=your_unmodified_time_stamp and y=(extremely unlikely to be a same value twice).
So:
1. x is just the unmodified timestamp and can be the same value many times;
2. y can be any of:
   2.1. a random integer from a large range,
   2.2. a serial integer (0, 1, 2, etc.),
   2.3. a UUID.
While 2.1 (random int from a large range) there works great for ethernet, I would use 2.2 (serializer) or 2.3 (UUID). Easy, fast, bulletproof. For 2.2 and 2.3 you don't even need collision detection (you might want to still have it for 2.1 as ethernet does.)
The advantage of 2.2 is that you can also tell, and sort, data elements that have the same float time stamp.
Then just extract x from the tuple for any sorting type operations and the tuple itself is a collision free key for the hash / dictionary.
Edit
I guess example code will help:
#!/usr/bin/env python3
import itertools
import random
import time

# generator for serial numbers 0, 1, 2, ...
serializer = itertools.count()

# a list with guaranteed collisions:
times = []
for c in range(35):
    t = time.perf_counter()
    for i in range(random.choice(range(4))):
        times.append(t)
print(len(set(times)), "unique items in a list of", len(times))

# dictionary of tuples; no possibility of collisions:
di = {}
for t in times:
    sn = next(serializer)
    di[(t, sn)] = 'Element {}'.format(sn)

# for tuples of multiple numbers, Python sorts
# as you expect: first by t[0] then t[1], until t[n]
for key in sorted(di):
    print("{!s:>15}:{}".format(key, di[key]))
Output:
26 unique items in a list of 55
(0.042289, 0):Element 0
(0.042289, 1):Element 1
(0.042289, 2):Element 2
(0.042305, 3):Element 3
(0.042305, 4):Element 4
(0.042317, 5):Element 5
# and so on until Element n...
Here is part of it. This is dirty and slow, but maybe that is how you like it. It is missing several corner cases, but maybe this gets someone else close.
The idea is to get the hex string of a floating point number. That gives you a string with the mantissa and exponent bits to twiddle. The twiddling is a pain since you have to do it all manually and keep converting to/from strings. Anyway, you add (subtract) 1 to (from) the last digit for positive (negative) numbers. Make sure you carry through to the exponent if you overflow. Negative numbers are a little more tricky if you want to make sure you don't waste any bits.
def increment(f):
    h = f.hex()
    # decide if we need to increment up or down
    if f > 0:
        sign = '+'
        inc = 1
    else:
        sign = '-'
        inc = -1
    # pull the string apart
    h = h.split('0x')[-1]
    h,e = h.split('p')
    h = ''.join(h.split('.'))
    h2 = shift(h, inc)
    # increase the exponent if we added a digit
    h2 = '%s0x%s.%sp%s' % (sign, h2[0], h2[1:], e)
    return float.fromhex(h2)
def shift(s, num):
    if not s:
        return ''
    right = s[-1]
    right = int(right, 16) + num
    if right > 15:
        num = right // 16
        right = right%16
    elif right < 0:
        right = 0
        num = -1
    else:
        num = 0
    # drop the leading 0x
    right = hex(right)[2:]
    return shift(s[:-1], num) + right
a = 1.4e4
print(increment(a) - a)
a = -1.4e4
print(increment(a) - a)
a = 1.4
print(increment(a) - a)
I think you mean "by as small an amount possible to avoid a hash collision", since for example the next-highest-float may already be a key! =)
while toInsert.key in myDict: # assumed to be positive
    toInsert.key *= 1.000000000001
myDict[toInsert.key] = toInsert
That said you probably don't want to be using timestamps as keys.
After looking at Autopopulated's answer I came up with a slightly different answer:
import math, sys
def incrementFloatValue(value):
    if value == 0:
        return sys.float_info.min
    mant, exponent = math.frexp(value)
    epsilonAtValue = math.ldexp(1, exponent - sys.float_info.mant_dig)
    return math.fsum([value, epsilonAtValue])
Disclaimer: I'm really not as great at maths as I think I am ;) Please verify this is correct before using it. Also I'm not sure about performance
some notes:
epsilonAtValue calculates how many bits are used for the mantissa (the maximum minus what is used for the exponent).
I'm not sure if the math.fsum() is needed but hey it doesn't seem to hurt.
It turns out that this is actually quite complicated (maybe why seven people have answered without actually providing an answer yet...).
I think this is the right solution, it certainly seems to handle 0 and positive values correctly:
import math
import sys
def incrementFloat(f):
    if f == 0.0:
        return sys.float_info.min
    m, e = math.frexp(f)
    return math.ldexp(m + sys.float_info.epsilon / 2, e)
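For anyone wanting to check this approach, here is a sketch that compares it against math.nextafter (so Python 3.9 or later is assumed). The function is re-declared under the name increment_float purely to keep the example self-contained:

```python
import math
import sys

def increment_float(f):
    if f == 0.0:
        return sys.float_info.min
    m, e = math.frexp(f)
    return math.ldexp(m + sys.float_info.epsilon / 2, e)

for value in (1.0, 4.0, 1.4e4, -1.0, 0.0):
    matches = increment_float(value) == math.nextafter(value, math.inf)
    print(value, matches)
```

The two agree for the positive values tried here. They differ at -1.0, where the spacing of doubles changes at the power of two, and at 0.0, where sys.float_info.min is the smallest normal double rather than the smallest subnormal.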

Maximum value for long integer

How can I assign the maximum value for a long integer to a variable, similar, for example, to C++'s LONG_MAX?
Long integers:
There is no explicitly defined limit. The amount of available address space forms a practical limit.
(Taken from this site). See the docs on Numeric Types where you'll see that Long integers have unlimited precision. In Python 2, Integers will automatically switch to longs when they grow beyond their limit:
>>> import sys
>>> type(sys.maxsize)
<type 'int'>
>>> type(sys.maxsize+1)
<type 'long'>
for integers we have
maxint and maxsize:
The maximum value of an int can be found in Python 2.x with sys.maxint. It was removed in Python 3, but sys.maxsize can often be used instead. From the changelog:
The sys.maxint constant was removed, since there is no longer a limit
to the value of integers. However, sys.maxsize can be used as an
integer larger than any practical list or string index. It conforms to
the implementation’s “natural” integer size and is typically the same
as sys.maxint in previous releases on the same platform (assuming the
same build options).
and, for anyone interested in the difference (Python 2.x):
sys.maxint The largest positive integer supported by Python’s regular
integer type. This is at least 2**31-1. The largest negative integer
is -maxint-1 — the asymmetry results from the use of 2’s complement
binary arithmetic.
sys.maxsize The largest positive integer supported by the platform’s
Py_ssize_t type, and thus the maximum size lists, strings, dicts, and
many other containers can have.
and for completeness, here's the Python 3 version:
sys.maxsize
An integer giving the maximum value a variable of type Py_ssize_t can take. It’s usually 2^31 - 1 on a 32-bit platform and
2^63 - 1 on a 64-bit platform.
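Since sys.maxsize is the largest value of a signed Py_ssize_t, the platform's word width can be read off it. A tiny sketch (the variable name bits is just for illustration):

```python
import sys

# maxsize is 2**(bits - 1) - 1, so it needs bits - 1 binary digits plus a sign bit.
bits = sys.maxsize.bit_length() + 1
print(bits)
print(sys.maxsize == 2 ** (bits - 1) - 1)  # True
```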
floats:
There's float("inf") and float("-inf"). These can be compared to other numeric types:
>>> import sys
>>> float("inf") > sys.maxsize
True
Python long can be arbitrarily large. If you need a value that's greater than any other value, you can use float('inf'), since Python has no trouble comparing numeric values of different types. Similarly, for a value lesser than any other value, you can use float('-inf').
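For example, float('inf') can serve as the starting value when searching for a minimum by hand. This is only a sketch with a made-up function name; for real code the built-in min() is the simpler choice:

```python
def smallest(values):
    """Return the smallest item, or float('inf') if values is empty."""
    best = float('inf')  # compares greater than any int or finite float
    for v in values:
        if v < best:
            best = v
    return best

print(smallest([3, 10**30, -7]))  # -7
```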
Direct answer to title question:
Integers are unlimited in size and have no maximum value in Python.
Answer which addresses stated underlying use case:
According to your comment of what you're trying to do, you are currently thinking something along the lines of
minval = MAXINT;
for (i = 1; i < num_elems; i++)
    if a[i] < a[i-1]
        minval = a[i];
That's not how to think in Python. A better translation to Python (but still not the best) would be
minval = a[0] # Just use the first value
for i in range(1, len(a)):
    minval = min(minval, a[i])
Note that the above doesn't use MAXINT at all. That part of the solution applies to any programming language: You don't need to know the highest possible value just to find the smallest value in a collection.
But anyway, what you really do in Python is just
minval = min(a)
That is, you don't write a loop at all. The built-in min() function gets the minimum of the whole collection.
long type in Python 2.x uses arbitrary precision arithmetic and has no such thing as maximum possible value. It is limited by the available memory. Python 3.x has no special type for values that cannot be represented by the native machine integer — everything is int and conversion is handled behind the scenes.
Unlike C/C++, long integers in Python have unlimited precision. Refer to the section Numeric Types in the Python documentation for more information. To determine the maximum value of a plain int in Python 2, you can refer to sys.maxint (removed in Python 3, where sys.maxsize is the closest equivalent). You can get more details from the documentation of sys.
You can use float('inf') as a maximum value:
float('inf')
and, for a minimum value:
float('-inf')
A) For a cheap comparison / arithmetics dummy use math.inf. Or math.nan, which compares FALSE in any direction (including nan == nan) except identity check (is) and renders any arithmetics (like nan - nan) nan. Or a reasonably high real integer number according to your use case (e.g. sys.maxsize). For a bitmask dummy (e.g. in mybits & bitmask) use -1.
B) To get the platform primitive maximum signed long int (or long long):
>>> 256 ** sys.int_info.sizeof_digit // 2 - 1 # Python’s internal primitive
2147483647
>>> 256 ** ctypes.sizeof(ctypes.c_long) // 2 - 1 # CPython
2147483647
>>> 256 ** ctypes.sizeof(ctypes.c_longlong) // 2 - 1 # CPython
9223372036854775807
>>> 2**63 - 1 # Java / JPython primitive long
9223372036854775807
C) The maximum Python integer could be estimated by a long-running loop that provokes a memory overflow (try 256**int(8e9); it can be stopped by KeyboardInterrupt). But it cannot be used reasonably, because its representation already consumes all the memory, and it is much greater than sys.float_info.max.
