How to raise an integer to fractional power efficiently? - python

I have a binary search implemented in Python.
Now I want to check whether the element math.floor(n ^ (1/p)) is in my list, using the binary search.
But p is a very, very large number. I wrote this using the fractions module:
binary_search.search(list,int (n**fractions.Fraction('1'+'/'+str(p))))
But I get an error: OverflowError: integer division result too large for a float
How can I raise n to a fractional power, and do it fast?

Unless your values of n are also incredibly large, floor(n^(1/p)) is going to tend toward 1 for "very, very large" values of p. Since you're only interested in the integer portion, you could get away with a simple loop that tests whether 1^p, 2^p, 3^p and so on are greater than n.
Don't waste time finding exact values if you don't need them.
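A minimal sketch of that loop, assuming n ≥ 1 and p ≥ 1 are Python ints; the name `floor_root` is hypothetical. It uses only exact integer arithmetic, so there is no float to overflow, and it adds one shortcut: if p is at least the bit length of n, then 2^p already exceeds n and the answer is 1 without computing any large power.

```python
def floor_root(n: int, p: int) -> int:
    """Return floor(n ** (1/p)) using exact integer arithmetic (hypothetical helper)."""
    if n < 1 or p < 1:
        raise ValueError("expected n >= 1 and p >= 1")
    # n < 2**n.bit_length(), so p >= n.bit_length() implies 2**p > n
    if p >= n.bit_length():
        return 1
    k = 1
    # Test 2**p, 3**p, ... and stop once (k + 1)**p exceeds n,
    # so that k**p <= n < (k + 1)**p
    while (k + 1) ** p <= n:
        k += 1
    return k


print(floor_root(1000, 3))         # 10
print(floor_root(10**20, 10**9))   # 1
```

The loop is only fast when the answer is small, which is exactly the "p is very, very large" case from the question.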

n^(1/p) = exp(ln(n)/p) ≈ 1 + ln(n)/p for large p
So you can compare p with the natural logarithm of n. If the ratio p/ln(n) >> 1 (much larger than 1), then you can use the approximation above (which tends to 1).
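A sketch of this idea in Python, assuming n ≥ 1 is an int (the helper name `approx_root` is hypothetical). It never builds n**(1/p) directly: `math.log` accepts Python ints far beyond float range, so only the small quantity ln(n)/p has to fit in a float.

```python
import math

def approx_root(n: int, p: int) -> float:
    # n ** (1/p) = exp(ln(n) / p); math.log works on arbitrarily large ints
    return math.exp(math.log(n) / p)

n = 10**500        # too large to convert to a float
p = 10**12
r = approx_root(n, p)
print(r, 1 + math.log(n) / p)   # nearly identical, since p / ln(n) >> 1
print(math.floor(r))            # 1
```

Near exact integer roots, float rounding can put the floor off by one, so if exactness matters, confirm the candidate k with integer powers (k**p <= n < (k + 1)**p).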

Related

Numpy float mean calculation precision

I happen to have a numpy array of floats:
a.dtype, a.shape
#(dtype('float64'), (32769,))
The values are:
a[0]
#3.699822718929953
all(a == a[0])
True
However:
a.mean()
3.6998227189299517
The mean is off in the 15th and 16th significant figures.
Can anybody show how this difference accumulates over a 30K-element mean, and whether there is a way to avoid it?
In case it matters, my OS is 64-bit.
Here is a rough approximation of a bound on the maximum error. This will not be representative of average error, and it could be improved with more analysis.
Consider calculating a sum using floating-point arithmetic with round-to-nearest ties-to-even:
sum = 0;
for (i = 0; i < n; ++i)
    sum += a[i];
where each a[i] is in [0, m).
Let ULP(x) denote the unit of least precision in the floating-point number x. (For example, in the IEEE-754 binary64 format with 53-bit significands, if the largest power of 2 not greater than |x| is 2^p, then ULP(x) = 2^(p−52).) With round-to-nearest, the maximum error in any operation with result x is ½ULP(x).
If we neglect rounding errors, the maximum value of sum after i iterations is i•m. Therefore, a bound on the error in the addition in iteration i is ½ULP(i•m). (Actually zero for i=1, since that case adds to zero, which has no error, but we neglect that for this approximation.) Then the total of the bounds on all the additions is the sum of ½ULP(i•m) for i from 1 to n. This is approximately ½•n•(n+1)/2•ULP(m) = ¼•n•(n+1)•ULP(m). (This is an approximation because it moves i outside the ULP function, but ULP is a discontinuous function. It is “approximately linear,” but there are jumps. Since the jumps are by factors of two, the approximation can be off by at most a factor of two.)
So, with 32,769 elements, we can say the total rounding error will be at most about ¼•32,769•32,770•ULP(m), about 2.7•10^8 times the ULP of the maximum element value. The ULP is 2^−52 times the greatest power of two not less than m, so that is about 2.7•10^8•2^−52 = 6•10^−8 times m.
Of course, the likelihood that 32,768 sums (not 32,769 because the first necessarily has no error) all round in the same direction by chance is vanishingly small, but I conjecture one might engineer a sequence of values that gets close to that.
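The arithmetic behind that bound, as a quick check (binary64, so one ULP at magnitude 1 is 2^−52):

```python
n = 32_769
# ¼·n·(n+1)·ULP(m), expressed in units of ULP(m)
bound_in_ulps = 0.25 * n * (n + 1)
# Convert to a multiple of m using ULP ≈ 2**-52 times the power of two bounding m
relative_to_m = bound_in_ulps * 2.0**-52
print(f"{bound_in_ulps:.2e} ULPs, about {relative_to_m:.1e} times m")
```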
An Experiment
Here is a chart of (in blue) the mean error over 10,000 samples of summing arrays with sizes 100 to 32,800 by 100s and elements drawn randomly from a uniform distribution over [0, 1). The error was calculated by comparing the sum calculated with float (IEEE-754 binary32) to that calculated with double (IEEE-754 binary64). (The samples were all multiples of 2^−24, and double has enough precision so that the sum for up to 2^29 such values is exact.)
The green line is c n √n with c set to match the last point of the blue line. We see it tracks the blue line over the long term. At points where the average sum crosses a power of two, the mean error increases faster for a time. At these points, the sum has entered a new binade, and further additions have higher average errors due to the increased ULP. Over the course of the binade, this fixed ULP decreases relative to n, bringing the blue line back to the green line.
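A reduced-scale sketch of the experiment, assuming NumPy is available; this is not the original script, and the trial count is cut down for speed. It draws float32 samples (multiples of 2^−24) with `Generator.random(..., dtype=np.float32)`, forms a left-to-right float32 running sum with `np.cumsum(..., dtype=np.float32)` (plain `np.sum` uses pairwise summation, which has much smaller error and would not mimic the simple loop), and compares with the exact float64 sum.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_error(size: int, trials: int = 200) -> float:
    """Mean |sequential float32 sum - exact sum| over random [0, 1) arrays."""
    errors = []
    for _ in range(trials):
        # float32 samples are multiples of 2**-24, so the float64 sum is exact
        x = rng.random(size, dtype=np.float32)
        seq32 = np.cumsum(x, dtype=np.float32)[-1]  # left-to-right float32 sum
        exact = x.astype(np.float64).sum()
        errors.append(abs(float(seq32) - exact))
    return float(np.mean(errors))

for size in (100, 1_000, 10_000, 32_800):
    err = mean_abs_error(size)
    # If the error grows like c·n·√n, the last column stays roughly constant
    print(size, err, err / (size * np.sqrt(size)))
```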
This is due to the inability of the float64 type to store the sum of your float numbers with sufficient precision. To get around this problem you need to use a larger data type, of course*. NumPy has a longdouble dtype that you can use in such cases:
In [23]: np.mean(a, dtype=np.longdouble)
Out[23]: 3.6998227189299530693
Also, note:
In [25]: print(np.longdouble.__doc__)
Extended-precision floating-point number type, compatible with C
``long double`` but not necessarily with IEEE 754 quadruple-precision.
Character code: ``'g'``.
Canonical name: ``np.longdouble``.
Alias: ``np.longfloat``.
Alias *on this platform*: ``np.float128``: 128-bit extended-precision floating-point number type.
* read the comments for more details.
The mean is (by definition):
a.sum()/a.size
Unfortunately, adding all those values up and dividing accumulates floating point errors. They are usually around the magnitude of:
np.finfo(np.float64).eps
Out[]: 2.220446049250313e-16
Yeah, e-16, about where you get them. You can make the error smaller by using higher-precision floats like float128 (if your system supports it), but errors will always accumulate whenever you're summing a large number of floats together. If you truly want the identity, you'll have to hardcode it:
import numpy as np

def mean_(arr):
    # Return the common value exactly when every element is identical
    if np.all(arr == arr[0]):
        return arr[0]
    else:
        return arr.mean()
In practice, you never really want to use == between floats. Generally in numpy we use np.isclose or np.allclose to compare floats for exactly this reason. There are ways around it using other packages and leveraging arcane machine-level methods of calculating numbers to get (closer to) exact equality, but it's rarely worth the performance and clarity hit.
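A small sketch of the tolerance-based comparison, assuming NumPy; the array mirrors the question's setup. Whether the exact `==` test fails depends on the summation order NumPy uses, so no result is claimed for it, but `np.isclose` and `np.allclose` (default `rtol=1e-05`, `atol=1e-08`) accept the tiny rounding difference.

```python
import numpy as np

a = np.full(32_769, 3.699822718929953)   # 32,769 identical values, as in the question
m = a.mean()

print(m == a[0])              # exact comparison: may be False due to rounding
print(np.isclose(m, a[0]))    # True: difference is within the default tolerances
print(np.allclose(a, m))      # True: every element is close to the mean
```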

technical problem on python with infinite float

I am using Python, and I have a problem: I want to write a program that can count from 1 to infinity, to find out how big infinity is.
Here is my code :
a=0
for i in range(1, 10e+99):
    a += 1
    print (a)
but it says " 'float' object cannot be interpreted as an integer "
whereas 10e+99 is not a float
help me please
Per the Python 2 documentation and Python 3 documentation, range requires integer arguments.
In IEEE-754 32-bit binary floating-point, the largest representable finite number is about 3.4028e38. When converting numerals, such as 1e99 in source code, to this format, any number greater than or equal to 2^128−2^103 (340,282,356,779,733,661,637,539,395,458,142,568,448) will be converted to infinity, assuming the common round-to-nearest-ties-to-even method is used. Because of this, 10e+99 (which stands for 10•10^99 and hence 10^100) would act like infinity. However, Python implementations more typically use IEEE-754 64-bit binary floating-point, in which the largest representable finite number is 2^1024−2^971, and 10e99 acts as a finite number.¹ Thus, to get infinity, you would need around 1e309.
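These facts can be checked from Python with the standard library. A sketch, assuming CPython on a platform with IEEE-754 floats: `struct.pack("<f", ...)` converts a double to binary32 and raises `OverflowError` when the rounded result would be infinite, which exposes the binary32 overflow threshold.

```python
import math
import struct
import sys

x = 10e+99
print(type(x), x == 1e100)          # <class 'float'> True: 10e+99 is the same value as 1e100
print(sys.float_info.max)           # 1.7976931348623157e+308, i.e. 2**1024 - 2**971
print(math.isinf(float("1e309")))   # True: beyond the binary64 range
print(int(x))                       # exact value of the double nearest 10**100

# binary32: the largest finite value (2**128 - 2**104) packs fine ...
struct.pack("<f", 2.0**128 - 2.0**104)
# ... but the halfway point to 2**128 rounds (ties-to-even) to infinity
try:
    struct.pack("<f", 2.0**128 - 2.0**103)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)                   # True
```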
It is not humanly possible to test whether a loop incrementing by 1 from 1 to 10e99 will produce infinity because the total computing power available to humans is only around 10^30 additions per year (for a loose sense of “around”, some orders of magnitude). This is insufficient to count to the limit of 32-bit floating-point finite numbers, let alone that of the 64-bit floating-point numbers.
If the arithmetic were done in a floating-point format, it would never reach infinity even with unlimited computing power because, once the sum reached 2^53 in IEEE-754 64-bit binary, adding 1 would not change the number; 2^53 would be produced in each iteration. This is because IEEE-754 64-bit binary has only 53 bits available for the significand, so 2^53+1 is not representable. The nearest representable values are 2^53 and 2^53+2. When arithmetic is performed, the exact real-number result is by default rounded to the nearest representable value, with ties rounded to the number with the even low bit in its significand. When 1 is added to 2^53, the real-number result 2^53+1 is rounded to 2^53, and the sum thus stays at 2^53 for all future iterations.
Footnote
¹ The representable value nearest 10^100 is 10,000,000,000,000,000,159,028,911,097,599,180,468,360,808,563,945,281,389,781,327,557,747,838,772,170,381,060,813,469,985,856,815,104.
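The 2^53 argument is easy to observe directly. Note that it applies only to float arithmetic: Python ints are arbitrary-precision, so an int counter like `a += 1` in the question would not get stuck.

```python
s = float(2**53)
print(s + 1 == s)    # True: 2**53 + 1 is not representable and rounds back to 2**53
print(s + 2 == s)    # False: 2**53 + 2 is representable

total = s
for _ in range(1_000):
    total += 1.0     # each addition rounds back to 2**53
print(total == s)    # True: the float counter never advances

print(2**53 + 1)     # 9007199254740993: exact with Python ints
```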
The problem arises because the range() function takes an int, whereas 10e+99 is indeed a float. While 10e+99 is of course not infinity, and therefore you shouldn't expect infinity to pop up anywhere during the execution of your program, if you really wanted to get your for loop to work as it is you could simply do
a=0
for i in range(1, int(10e+99)):
    a += 1
    print (a)
As other users have pointed out, I would however rethink your strategy entirely: using a range-based for loop to "find out" the value of infinity just doesn't work. Infinity is not a number.
Perhaps you meant your program to go on forever:
a = 0
while True:
    a += 1
    print(a)
In my head when I see while True: I replace it with 'forever'.
With this code you can check whether your variable is infinity or not.
import math
infinity = float('inf')
a = 99999999999999999999999999999999
if a > infinity:
    print('Your number is an infinity number')
else:
    print('Your number is not an infinity number')
#or you can check with math.isinf
print('Your number is Infinity: ', math.isinf(infinity))
# Also infinity can be both positive and negative
Note: infinity has no end, so whatever number you enter, a > infinity will always return False.
Here is what is going to happen if you correct and execute your program:
a=0
for i in range(1, 10**100):
    a += 1
    print (a)
Suppose you have a super efficient python virtual machine (everyone knows how efficient they are...).
Suppose you have a very efficient implementation of (unbounded) large integers.
Suppose each loop iteration takes only a few machine cycles to print those numbers in decimal form (say only 1000, which is well below reality).
Suppose each cycle takes approximately 1.0e-10 s (10 GHz), which means having an implementation of print that takes advantage of parallelism.
With those unrealistic hypotheses, that's already 10^93 s for the program to complete.
The age of the universe is estimated to be less than 10^18 s. Wow! It's going to take a while.
Now let's compute the energy it's going to take, based on a 400 W computer.
Assuming that all of the Sun's matter (2e30 kg) can be converted into electrical power for your computer (through E = mc^2), you are going to consume the equivalent of about 2•10^48 Suns to perform this computation.
Before you hit return, I kindly ask you: think twice! Save the universe!
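The estimate above, reproduced step by step with the same assumptions (1000 cycles per iteration, 10 GHz, a 400 W machine, a solar mass of 2e30 kg, and c taken as 3e8 m/s):

```python
iterations = 10**100
cycles_per_iteration = 1000
seconds_per_cycle = 1.0e-10                  # 10 GHz
runtime_s = iterations * cycles_per_iteration * seconds_per_cycle
print(f"{runtime_s:.1e} s")                  # 1.0e+93 s

power_w = 400
energy_j = power_w * runtime_s               # energy = power × time
sun_energy_j = 2e30 * (3e8) ** 2             # E = m c^2 for the Sun's mass
print(f"{energy_j / sun_energy_j:.1e} Suns") # 2.2e+48 Suns
```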

What are the odds of a repeat in numpy.random.rand(n) (assuming perfect randomness)?

For the moment, put aside any issues relating to pseudorandom number generators and assume that numpy.random.rand perfectly samples from the discrete distribution of floating point numbers over [0, 1). What are the odds getting at least two exactly identical floating point numbers in the result of:
numpy.random.rand(n)
for any given value of n?
Mathematically, I think this is equivalent to first asking how many IEEE 754 singles or doubles there are in the interval [0, 1). Then I guess the next step would be to solve the equivalent birthday problem? I'm not really sure. Anyone have some insight?
The computation performed by numpy.random.rand for each element generates a number 0.<53 random bits>, for a total of 2^53 equally likely outputs. (Of course, the memory representation isn't a fixed-point 0.stuff; it's still floating point.) This computation is incapable of producing most binary64 floating-point numbers between 0 and 1; for example, it cannot produce 1/2^60. You can see the code in numpy/random/mtrand/randomkit.c:
double
rk_double(rk_state *state)
{
/* shifts : 67108864 = 0x4000000, 9007199254740992 = 0x20000000000000 */
long a = rk_random(state) >> 5, b = rk_random(state) >> 6;
return (a * 67108864.0 + b) / 9007199254740992.0;
}
(Note that rk_random produces 32-bit outputs, regardless of the size of long.)
Assuming a perfect source of randomness, the probability of repeats in numpy.random.rand(n) is 1-(1-0/k)(1-1/k)(1-2/k)...(1-(n-1)/k), where k=2^53. It's probably best to use an approximation instead of calculating this directly for large values of n. (The approximation may even be more accurate, depending on how the approximation error compares to the rounding error accumulated in a direct computation.)
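A sketch of both computations, taking k = 2^53 as stated above. The approximation used here is the standard birthday-problem one, 1 − ∏(1 − i/k) ≈ 1 − exp(−n(n−1)/(2k)), evaluated with `math.expm1` so that tiny probabilities keep their precision; the exact product is done with `fractions.Fraction` and is only practical for small n. Both function names are hypothetical.

```python
import math
from fractions import Fraction

K = 2**53  # number of equally likely outputs of numpy.random.rand, per the answer

def p_repeat_approx(n: int, k: int = K) -> float:
    # 1 - exp(-x) computed as -expm1(-x) to avoid cancellation when x is tiny
    return -math.expm1(-n * (n - 1) / (2 * k))

def p_repeat_exact(n: int, k: int = K) -> Fraction:
    # Direct product 1 - (1 - 0/k)(1 - 1/k)...(1 - (n-1)/k) with exact rationals
    no_repeat = Fraction(1)
    for i in range(n):
        no_repeat *= Fraction(k - i, k)
    return 1 - no_repeat

print(float(p_repeat_exact(100)), p_repeat_approx(100))
print(p_repeat_approx(10**8))
```

With this approximation, the chance of a repeat passes 50% at roughly 1.1•10^8 samples.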
I think you are correct, this is like the birthday problem.
But you need to decide on the number of possible options. You do this by deciding the precision of your floating-point numbers.
For example, if you decide to have a precision of 2 digits after the decimal point, then there are 100 options (including zero and excluding 1).
And if you have n numbers, then the probability of not having a collision is:
P = (100/100)•(99/100)•…•((100−n+1)/100) = 100! / ((100−n)!•100^n)
or, more generally, given R possible numbers and N data points, the probability of no collision is:
P = (R/R)•((R−1)/R)•…•((R−N+1)/R) = R! / ((R−N)!•R^N)
And the probability of a collision is 1 − P.
This is because the probability of getting any given number is 1/R. And at any point, the probability of a data point not colliding with prior data points is (R-i)/R for i being the index of the data point. But to get the probability of no data points colliding with each other, we need to multiply all the probabilities of data points not colliding with those prior to them. Applying some algebraic operations, we get the equation above.
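The product described above, written out directly (the function name `p_no_collision` is illustrative), together with a cross-check against the closed form R!/((R−N)!•R^N) via `math.perm` (Python 3.8+):

```python
import math

def p_no_collision(R: int, N: int) -> float:
    """Probability that N draws from R equally likely values are all distinct."""
    p = 1.0
    for i in range(N):
        p *= (R - i) / R   # draw i must avoid the i values already drawn
    return p

p = p_no_collision(100, 10)             # 2 decimal digits, 10 numbers
print(p, 1 - p)                         # no collision vs. at least one collision
print(math.perm(100, 10) / 100**10)     # closed form, same value up to rounding
```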

Does this truly generate a random floating-point number? (Python)

For an introduction to Python course, I'm looking at generating a random floating point number in Python, and I have seen a standard recommended code of
import random
lower = 5
upper = 10
range_width = upper - lower
x = random.random() * range_width + lower
for a random floating point from 5 up to but not including 10.
It seems to me that the same effect could be achieved by:
import random
x = random.randrange(5, 10) + random.random()
Since that would give an integer of 5, 6, 7, 8, or 9, and then tack on a decimal to it.
The question I have is would this second code still give a fully even probability distribution, or would it not keep the full randomness of the first version?
According to the documentation, yes, random() does generate a uniform distribution:
random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator.
So both code examples should be fine. To shorten your code, you can equally do:
random.uniform(5, 10)
Note that uniform(a, b) is simply a + (b - a) * random() so the same as your first example.
The second example depends on the version of Python you're using.
Prior to 3.2, randrange() could produce slightly uneven distributions.
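A quick way to see that `uniform` and the first example are the same computation: reseed, draw once each way, and compare. This assumes CPython's implementation, where `uniform(a, b)` returns `a + (b - a) * random()`.

```python
import random

lower, upper = 5, 10

random.seed(42)
x1 = random.random() * (upper - lower) + lower   # first example from the question

random.seed(42)
x2 = random.uniform(lower, upper)                # same draw via uniform()

print(x1 == x2)   # True: identical arithmetic on the identical random() value
```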
There is a difference. Your second method is theoretically superior, although in practice it only matters for large ranges. Indeed, both methods will give you a uniform distribution. But only the second method can return all values in the range that are representable as a floating point number.
Since your range is so small, there is no appreciable difference. But still there is a difference, which you can see by considering a larger range. If you take a random real number between 0 and 1, you get a floating-point representation with a given number of bits. Now suppose your range is, say, on the order of 2**32. By multiplying the original random number by this range, you lose 32 bits of precision in the result. Put differently, there will be gaps between the values that this method can return. The gaps are still there when you multiply by 4: You have lost the two least significant bits of the original random number.
The two methods can give different results, but you'll only notice the difference in fairly extreme situations (with very wide ranges). For instance, if you generate random numbers between 0 and 2/sys.float_info.epsilon (9007199254740992.0, or a little more than 9 quadrillion), you'll notice that the version using multiplication will never give you any floats with fractional values. If you increase the maximum bound to 4/sys.float_info.epsilon, you won't get any odd integers, only even ones. That's because the 64-bit floating point type Python uses doesn't have enough precision to represent all integers at the upper end of that range, and it's trying to maintain a uniform distribution (so it omits small odd integers and fractional values even though those can be represented in parts of the range).
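A sketch that reproduces this, assuming CPython, whose `random()` returns multiples of 2^−53 (53 random bits). Multiplying by 2/sys.float_info.epsilon = 2^53 therefore always yields an integer, and multiplying by 2^54 always yields an even integer.

```python
import random
import sys

random.seed(0)
span = 2 / sys.float_info.epsilon                  # 9007199254740992.0 == 2**53
samples = [random.random() * span for _ in range(10_000)]
print(all(s.is_integer() for s in samples))        # True: no fractional values

span4 = 4 / sys.float_info.epsilon                 # 2**54
samples4 = [random.random() * span4 for _ in range(10_000)]
print(all(s % 2 == 0 for s in samples4))           # True: only even integers
```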
The second version of the calculation will give extra precision to the smaller random numbers generated. For instance, if you're generating numbers between 0 and 2/sys.float_info.epsilon and the randrange call returned 0, you can use the full precision of the random call to add a fractional part to the number. On the other hand if the randrange returned the largest number in the range (2/sys.float_info.epsilon - 1), very little of the precision of the fraction would be used (the number will round to the nearest integer without any fractional part remaining).
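The precision loss at the top of the range can be shown with a fixed fraction standing in for a `random()` result (0.3 here is an arbitrary illustrative value):

```python
import sys

top = int(2 / sys.float_info.epsilon) - 1   # 2**53 - 1, the largest randrange result here
frac = 0.3                                  # stand-in for a random() result

print(0 + frac)            # 0.3: the full fraction survives near zero
print(top + frac == top)   # True: spacing between floats is 1 here, so the fraction rounds away
```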
Adding a fractional value also can't help you deal with ranges that are too large for every integer to be represented. If randrange returns only even numbers, adding a fraction usually won't make odd numbers appear (it can in some parts of the range, but not for others, and the distribution may be very uneven). Even for ranges where all integers can be represented, the odds of a specific floating point number appearing will not be entirely uniform, since the smaller numbers can be more precisely represented. Large but imprecise numbers will be more common than smaller but more precisely represented ones.

1st digit before taking modulo(10**9 + 7)

I am multiplying many large numbers and finally taking modulo of it. To optimise this I am using MOD at each step. But I also want the 1st digit of the final answer. Is there any way to know that even after using MOD?
Or is there any other efficient way to do huge multiplication many times, get the final answer and extract the 1st digit from it?
The elements are on the order of 10^9, and the number of multiplications is about 10^5.
Take base-10 logarithms, sum them up, and take the fractional part of the sum; the first digit is the integer part of 10 raised to that fractional part.
Think about the scientific notation of large numbers.
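A sketch of the logarithm approach run alongside the usual modular product (the factor values and the name `first_digit` are illustrative). `math.fsum` is used to keep the accumulated rounding error of the log sum small. If the product's leading digits sit extremely close to a digit boundary (for example, an exact power of 10), float rounding can still give the wrong digit.

```python
import math

def first_digit(factors) -> int:
    # log10(product) = sum of log10(factor)
    total = math.fsum(math.log10(f) for f in factors)
    frac = total - math.floor(total)   # product = 10**frac * 10**floor(total)
    return int(10 ** frac)             # 10**frac lies in [1, 10)

MOD = 10**9 + 7
factors = [123_456_789, 987_654_321, 555_555_555, 999_999_937]
prod_mod = 1
for f in factors:
    prod_mod = prod_mod * f % MOD      # the modular product, reduced at each step

print(first_digit(factors), prod_mod)
```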
