How to work around the limitation of the `len` function in Python?

The len builtin function in Python is limited to the system's integer length. So, in my case, it is limited to sys.maxsize, which is 2147483647. However, in light of Python 3's unlimited integers, I think this limitation causes frustration. Are there any workarounds to overcome it? For example, I would like to get the length of this:
range(3, 100000000000000000000, 3)
But this:
len(range(3, 100000000000000000000, 3))
returns this error:
OverflowError: Python int too large to convert to C ssize_t

Unless you plan to have a plethora of lazily-iterable types with massive capacities, you could special-case range and do the math yourself:
def robustish_len(c):
    try:
        return len(c)
    except OverflowError:
        # ceil((stop - start) / step) for a range with a positive step
        return (c.stop - c.start + c.step - 1) // c.step
Or, alternatively:
def robust_len(c):
    try:
        return len(c)
    except OverflowError:
        return float('inf')  # close enough :)
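For example, with the range from the question, the math-based fallback returns the exact count and the second version just reports infinity:
>>> robustish_len(range(3, 100000000000000000000, 3))
33333333333333333333
>>> robust_len(range(3, 100000000000000000000, 3))
inf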

This seems like a bug in Python. At least for user-defined classes, you could replace
len(c)
with
c.__len__()
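A minimal sketch of that idea, using a made-up class whose __len__ reports a huge value (the exact OverflowError message can vary across Python versions):
>>> class Huge:
...     def __len__(self):
...         return 10**30
...
>>> Huge().__len__()
1000000000000000000000000000000
>>> len(Huge())
Traceback (most recent call last):
  ...
OverflowError: cannot fit 'int' into an index-sized integer
Note that this trick does not help with range itself, since its __len__ goes through the same C-level size conversion.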

Related

How to use infinity in python

I am working with graphs in python. I am trying to get the distance between each vertex. I need to use the value INFINITY (represented using the string "-") for a situation in which there is no direct path between one vertex and another. I tried a couple of solutions. One was using a crazy large int to represent infinity. However, I researched and found that this is poor practice. I also saw several solutions on stack overflow that made use of the math module's infinity function. However, this is not appropriate for my problem, as my INFINITY value is being used in a UI and must look graphically pleasing. This is why my INFINITY must remain as the string "-". This is the error I am getting with my current line of code:
TypeError: '<' not supported between instances of 'str' and 'int'
I am not completely sure, but I think the < is coming from my use of the min() function.
This error is coming from the following code:
for i in range(length):
    for j in range(length):
        for k in range(length):
            # print('40:', int(temp[j][i]), int(temp[i][k]))
            temp[j][k] = min(temp[j][k], addWithInfinity(temp[j][i], temp[i][k]))
Temp just refers to a matrix which I received as an argument in the method I am currently working with. Here is my addWithInfinity method:
def addWithInfinity(a, b):
    """If a == INFINITY or b == INFINITY, returns INFINITY.
    Otherwise, returns a + b."""
    if a == LinkedDirectedGraph.INFINITY or b == LinkedDirectedGraph.INFINITY:
        return LinkedDirectedGraph.INFINITY
    else:
        return a + b
My issue is that I am trying to compare infinity with an int. I tried to convert INFINITY to an int like this: int(INFINITY) ( or int('-') ) but I got the following error:
ValueError: invalid literal for int() with base 10: '-'
Any ideas how I can get away with the comparison between an int and INFINITY (which is a string)?
Use float("inf") or math.inf
See also How can I represent an infinite number in Python?
>>> float("inf") > 5
True
>>> float("inf") < 10**100
False
>>> import math
>>> float("inf") == math.inf
True
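The original TypeError comes from min() trying to compare the string "-" with an int, which float("inf") avoids because it compares cleanly with numbers:
>>> min(5, "-")
Traceback (most recent call last):
  ...
TypeError: '<' not supported between instances of 'str' and 'int'
>>> min(5, float("inf"))
5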
If you need to use some other value than "inf" for infinity, such as '-' in your example, you could try/except it, either
checking if the initial value is your target string (if a == '-':), or
parsing the error raised by calling float on it (if "'-'" in str(err_string):):
try:
    a = float(a)
except ValueError as ex:
    # special-case for ValueError: could not convert string to float: '-'
    if "'-'" in str(ex).split(":")[-1]:
        a = float("inf")
    else:  # re-raise other ValueErrors
        raise ex
String "-" is not infinity even you mean it. In Python, you can obtain the type in two ways:
float("inf")
or
import math
math.inf
Your problem is not related to the usage of the infinity type; it is a representation issue. I believe you should use the normal infinity value in your math and create a property in your class that returns "-", and then use it only when you need to represent the value in the UI, but not in your math.
The issue you have is really about decoupling representation from logic.
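A minimal sketch of that separation, using a hypothetical Distance holder (the class and attribute names here are illustrative, not from the original code):
import math

class Distance:
    """Keeps math.inf for the math and '-' only for display."""
    INFINITY_LABEL = "-"  # UI representation only

    def __init__(self, value=math.inf):
        self.value = value  # always numeric, so min() and + keep working

    def display(self):
        return self.INFINITY_LABEL if math.isinf(self.value) else str(self.value)

# computations use the numeric values...
print(min(Distance(7).value, Distance().value + 3))  # 7
# ...and the UI sees the string only at render time
print(Distance().display())                          # -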

Using 32-bit ints and operands

Is it possible to somehow override or overload the standard implementation of ints/numbers in Python so that it acts like a 32-bit int?
a: int
a = 4076863488
# desired: a == -218103808 (32-bit wraparound)
Or is it possible to somehow define a variable that can't change type, doing something like x: int?
I want to do this because it's annoying to write ctypes.c_int32(n) on every bit operation and assignment, especially since Python does not use 32-bit operands for bitwise operations.
I know I'm basically trying to change the nature of the language. So maybe I'm asking what you would do if you had to do 32-bit stuff in python.
Some options:
Use Cython. You can declare a native 32-bit int type there, and you even get the advantage that pure numerical code gets compiled to (very) fast C code.
Use a numpy array of a single element: np.zeros((1,), dtype=np.int32). Provided you only ever use in-place operations (+=, *=, etc.), this will work like a 32-bit int type. Do be aware that if you ever use a regular binary operator (e.g. myint + 3), you might be subjected to type promotion or conversion, and the result will no longer be int32.
Use ctypes.c_int32. This comes built-in to Python, but supports no mathematical operations so you have to wrap and unwrap yourself (e.g. newval = c_int32(v1.value + v2.value)).
Use a library like fixedint (shameless plug), which provides fixed-integer classes that remain fixed size through operations rather than decaying to int. fixedint was specifically designed with fixed-width bitwise math in mind. In this case you would use fixedint.Int32.
Some less desirable options:
struct: Throws errors if your input is out of range. You can work around this with unpack('i', pack('I', val & 0xffffffff))[0], but that's really unwieldy.
array: Throws errors if you try to store a value out of range. Harder to work around than struct.
Manual bitmashing. With an unsigned 32-bit int, this is just a matter of adding & 0xffffffff a lot, which is not too bad. But, Python doesn't have any built-in way to wrap a value to a signed 32-bit int, so you'll have to write your own int32 conversion function and wrap all your operations with it:
def to_int32(val):
    val &= (1 << 32) - 1       # keep only the low 32 bits
    if val & (1 << 31):        # sign bit set: interpret as negative
        val -= 1 << 32
    return val
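For example, this reproduces the 32-bit wraparound from the question:
>>> to_int32(4076863488)
-218103808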
Demonstrations of your options:
Cython
cpdef int munge(int val):
    cdef int x
    x = val * 32
    x += 0x7fffffff
    return x
Save as int_test.pyx and compile with cythonize -a -i int_test.pyx.
>>> import int_test
>>> int_test.munge(3)
-2147483553
NumPy
import numpy as np

def munge(val):
    x = val.copy()
    x *= 32
    x += 0x7fffffff
    return x

def to_int32(val):
    return np.array((val,), dtype=np.int32)

print(munge(to_int32(3)))
# prints [-2147483553]
ctypes
from ctypes import c_int32

def munge(val):
    x = c_int32(val.value * 32)
    x = c_int32(x.value + 0x7fffffff)
    return x

print(munge(c_int32(3)))
# prints c_int(-2147483553)
fixedint
import fixedint

def munge(val):
    x = val * 32
    x += 0x7fffffff
    return x

print(munge(fixedint.Int32(3)))
# prints -2147483553

Using large index in python (numpy or lists)

I frequently need to enter large integers for indexing and creating numpy arrays, such as 3500000 or 250000. Normally I'd enter these using scientific notation, 3.5e6 or .25e6 or such. This is quicker, and much less likely to have errors.
Unfortunately, python expects integer datatypes for indexing. The obvious solution is to convert datatypes. So [5e5:1e6] becomes [int(5e5):int(1e6)], but this decreases readability and is somewhat longer to type. Not to mention, it's easy to forget what datatype an index is until an indexing operation fails on a list or numpy.ndarray.
Is there a way to have numpy or python interpret large floats as integers, or is there an easy way to create large integers in python?
In a comment you considered having e5 = 10**5 for use as in 35*e5, lamenting it doesn't support 3.5*e6. Here's a hack that does:
class E:
    def __init__(self, e):
        self.val = 10**e
    def __rmul__(self, x):
        return int(x * self.val)
Demo:
>>> e6 = E(6)
>>> 3.5*e6
3500000
Though due to floats being lossy, this can lead to slight inaccuracies, for example:
>>> 0.1251*e6
125099
Here's a better hack, building the literal '0.1251e6' and evaluating that:
class E:
    def __init__(self, e):
        self.e = e
    def __rmul__(self, x):
        return int(float('%se%d' % (x, self.e)))
Demo:
>>> e6 = E(6)
>>> 0.1251*e6
125100
If you're worried about mistakes in the number of zeros, try underscores.
>>> 3_500_000
3500000
My cheap solution is to create a helper function in the proper scope.
def e(coeff, exponent):
    return int(coeff * 10 ** exponent)

np_array[e(3.5, 6)]  # use like this
But this cheap answer may introduce round-off error.
Creating an alias for int in the proper scope is a simpler and cleaner solution:
e = int  # in proper scope
I can propose the notation [5*10**5:1*10**6], but it's not as clear as 5e5 and 1e6, and it's even worse for 3.5e6, which becomes 35*10**5.
You can add a shorter name to int() such as I
I = int
x = I(3.5e6)
print(x)
# 3500000
This still allows use of int() normally
This should fix the problems with indexing lists and arrays with floats:
slice_orig = slice

def slice(*args):
    return slice_orig(*[int(i) for i in args])

slice.__doc__ = slice_orig.__doc__ + """
WARNING: overridden to convert (start, stop, step) to integers"""
It doesn't allow using large numbers with other numpy functions requiring an int type.
EDIT: This has to be used explicitly, such as list[slice(1e5)], so it's not as useful as I expected.
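For example, with the override above in place, explicitly constructed slices accept floats:
>>> lst = list(range(10))
>>> lst[slice(2e0, 8e0, 2e0)]
[2, 4, 6]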

Maximum value for long integer

How can I assign the maximum value for a long integer to a variable, similar, for example, to C++'s LONG_MAX?
Long integers:
There is no explicitly defined limit. The amount of available address space forms a practical limit.
(Taken from this site.) See the docs on Numeric Types, where you'll see that long integers have unlimited precision. In Python 2, integers will automatically switch to longs when they grow beyond their limit:
>>> import sys
>>> type(sys.maxsize)
<type 'int'>
>>> type(sys.maxsize+1)
<type 'long'>
For integers, we have maxint and maxsize:
The maximum value of an int can be found in Python 2.x with sys.maxint. It was removed in Python 3, but sys.maxsize can often be used instead. From the changelog:
The sys.maxint constant was removed, since there is no longer a limit
to the value of integers. However, sys.maxsize can be used as an
integer larger than any practical list or string index. It conforms to
the implementation’s “natural” integer size and is typically the same
as sys.maxint in previous releases on the same platform (assuming the
same build options).
and, for anyone interested in the difference (Python 2.x):
sys.maxint The largest positive integer supported by Python’s regular
integer type. This is at least 2**31-1. The largest negative integer
is -maxint-1 — the asymmetry results from the use of 2’s complement
binary arithmetic.
sys.maxsize The largest positive integer supported by the platform’s
Py_ssize_t type, and thus the maximum size lists, strings, dicts, and
many other containers can have.
and for completeness, here's the Python 3 version:
sys.maxsize
An integer giving the maximum value a variable of type Py_ssize_t can take. It's usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform.
floats:
There's float("inf") and float("-inf"). These can be compared to other numeric types:
>>> import sys
>>> float("inf") > sys.maxsize
True
Python long can be arbitrarily large. If you need a value that's greater than any other value, you can use float('inf'), since Python has no trouble comparing numeric values of different types. Similarly, for a value lesser than any other value, you can use float('-inf').
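For example:
>>> float('inf') > 10**1000
True
>>> float('-inf') < -10**1000
True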
Direct answer to title question:
Integers are unlimited in size and have no maximum value in Python.
Answer which addresses stated underlying use case:
According to your comment about what you're trying to do, you are currently thinking something along the lines of
minval = MAXINT;
for (i = 0; i < num_elems; i++)
    if (a[i] < minval)
        minval = a[i];
That's not how to think in Python. A better translation to Python (but still not the best) would be
minval = a[0]  # Just use the first value
for i in range(1, len(a)):
    minval = min(minval, a[i])
Note that the above doesn't use MAXINT at all. That part of the solution applies to any programming language: You don't need to know the highest possible value just to find the smallest value in a collection.
But anyway, what you really do in Python is just
minval = min(a)
That is, you don't write a loop at all. The built-in min() function gets the minimum of the whole collection.
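For example:
>>> a = [9, 4, 7]
>>> min(a)
4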
long type in Python 2.x uses arbitrary precision arithmetic and has no such thing as maximum possible value. It is limited by the available memory. Python 3.x has no special type for values that cannot be represented by the native machine integer — everything is int and conversion is handled behind the scenes.
Unlike C/C++, long in Python has unlimited precision. Refer to the Numeric Types section of the Python docs for more information. To determine the max value of a native integer, you can just refer to sys.maxint. You can get more details from the documentation of sys.
You can use float('inf') as a maximum value, and float('-inf') for the negative direction.
A) For a cheap comparison / arithmetic dummy, use math.inf. Or math.nan, which compares false in any direction (including nan == nan) except the identity check (is), and renders any arithmetic (like nan - nan) as nan. Or use a reasonably high real integer according to your use case (e.g. sys.maxsize). For a bitmask dummy (e.g. in mybits & bitmask), use -1.
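For example, the nan behaviour described above:
>>> import math
>>> math.nan == math.nan
False
>>> math.nan < 0, math.nan > 0
(False, False)
>>> math.nan - math.nan
nan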
B) To get the platform's primitive maximum signed long int (or long long), assuming import sys and import ctypes:
>>> 256 ** sys.int_info.sizeof_digit // 2 - 1 # Python’s internal primitive
2147483647
>>> 256 ** ctypes.sizeof(ctypes.c_long) // 2 - 1 # CPython
2147483647
>>> 256 ** ctypes.sizeof(ctypes.c_longlong) // 2 - 1 # CPython
9223372036854775807
>>> 2**63 - 1 # Java / JPython primitive long
9223372036854775807
C) The maximum Python integer could be estimated by a long-running loop teasing out a memory overflow (try 256**int(8e9) - it can be stopped with KeyboardInterrupt). But it cannot be used reasonably, because its representation already consumes all the memory, and it is much greater than sys.float_info.max.

Slow Big Int Output in python

Is there any way to improve the performance of str(bigint) and print bigint in Python? Printing big integer values takes a lot of time. I tried to use the following recursive technique:
def p(x, n):
    if n < 10:
        sys.stdout.write(str(x))
        return
    n >>= 1
    l = 10**n
    k = x/l
    p(k, n)
    p(x-k*l, n)
n = number of digits,
x = bigint
But the method fails for certain cases where x in a sub-call has leading zeros. Is there any alternative to it, or any faster method? (Please do not suggest using any external module or library.)
Conversion from a Python integer to a string has a running time of O(n^2), where n is the length of the number. For sufficiently large numbers, it will be slow. For a 1,000,001-digit number, str() takes approximately 24 seconds on my computer.
If you are really needing to convert very large numbers to a string, your recursive algorithm is a good approach.
The following version of your recursive code should work:
def p(x, n=0):
    if n == 0:
        n = int(x.bit_length() * 0.3)
    if n < 100:
        return str(x)
    n >>= 1
    l = 10**n
    a, b = divmod(x, l)
    upper = p(a, n)
    lower = p(b, n).rjust(n, "0")
    return upper + lower
It automatically estimates the number of digits in the output. It is about 4x faster for a 1,000,001 digit number.
If you need to go faster, you'll probably need to use an external library.
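As a quick sanity check of the recursive version above (any integer large enough to take the recursive path will do):
>>> x = 7**300
>>> p(x) == str(x)
True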
For interactive applications, the built-in print and str functions run in the blink of an eye.
>>> print(2435**356)
392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625
>>> str(2435**356)
'392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625'
If, however, you are printing big integers (to standard output, say) so that they can be read (from standard input) by another process, and you are finding the binary-to-decimal conversion impacting the overall performance, you can look at Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes? (although the accepted answer suggests numpy, which is an external library, there are other suggestions).
