Why i**2 is slower than i*i? [duplicate]

Why i**2 is slower than i*i? [duplicate] - python

I'm curious as to why it's so much faster to multiply than to take powers in python (though from what I've read this may well be true in many other languages too). For example it's much faster to do
x*x
than
x**2
I suppose the ** operator is more general and can also deal with fractional powers. But if that's why it's so much slower, why doesn't it perform a check for an int exponent and then just do the multiplication?
Edit: Here's some example code I tried...
def pow1(r, n):
for i in range(r):
p = i**n
def pow2(r, n):
for i in range(r):
p = 1
for j in range(n):
p *= i
Now, pow2 is just a quick example and is clearly not optimised!
But even so I find that using n = 2 and r = 1,000,000, then pow1 takes ~ 2500ms and pow2 takes ~ 1700ms.
I admit that for large values of n, then pow1 does get much quicker than pow2. But that's not too surprising.

Basically naive multiplication is O(n) with a very low constant factor. Taking the power is O(log n) with a higher constant factor (There are special cases that need to be tested... fractional exponents, negative exponents, etc) . Edit: just to be clear, that's O(n) where n is the exponent.
Of course the naive approach will be faster for small n, you're only really implementing a small subset of exponential math so your constant factor is negligible.

Adding a check is an expense, too. Do you always want that check there? A compiled language could make the check for a constant exponent to see if it's a relatively small integer because there's no run-time cost, just a compile-time cost. An interpreted language might not make that check.
It's up to the particular implementation unless that kind of detail is specified by the language.
Python doesn't know what distribution of exponents you're going to feed it. If it's going to be 99% non-integer values, do you want the code to check for an integer every time, making runtime even slower?

Doing this in the exponent check will slow down the cases where it isn't a simple power of two very slightly, so isn't necessarily a win. However, in cases where the exponent is known in advance( eg. literal 2 is used), the bytecode generated could be optimised with a simple peephole optimisation. Presumably this simply hasn't been considered worth doing (it's a fairly specific case).
Here's a quick proof of concept that does such an optimisation (usable as a decorator). Note: you'll need the byteplay module to run it.
import byteplay, timeit
def optimise(func):
c = byteplay.Code.from_code(func.func_code)
prev=None
for i, (op, arg) in enumerate(c.code):
if op == byteplay.BINARY_POWER:
if c.code[i-1] == (byteplay.LOAD_CONST, 2):
c.code[i-1] = (byteplay.DUP_TOP, None)
c.code[i] = (byteplay.BINARY_MULTIPLY, None)
func.func_code = c.to_code()
return func
def square(x):
return x**2
print "Unoptimised :", timeit.Timer('square(10)','from __main__ import square').timeit(10000000)
square = optimise(square)
print "Optimised :", timeit.Timer('square(10)','from __main__ import square').timeit(10000000)
Which gives the timings:
Unoptimised : 6.42024898529
Optimised : 4.52667593956
[Edit]
Actually, thinking about it a bit more, there's a very good reason why this optimisaton isn't done. There's no guarantee that someone won't create a user defined class that overrides the __mul__ and __pow__ methods and do something different for each. The only way to do it safely is if you can guarantee that the object on the top of the stack is one that has the same result "x**2" and "x*x", but working that out is much harder. Eg. in my example it's impossible, as any object could be passed to the square function.

An implementation of b^p with binary exponentiation
def power(b, p):
"""
Calculates b^p
Complexity O(log p)
b -> double
p -> integer
res -> double
"""
res = 1
while p:
if p & 0x1: res *= b
b *= b
p >>= 1
return res

I'd suspect that nobody was expecting this to be all that important. Typically, if you want to do serious calculations, you do them in Fortran or C or C++ or something like that (and perhaps call them from Python).
Treating everything as exp(n * log(x)) works well in cases where n isn't integral or is pretty large, but is relatively inefficient for small integers. Checking to see if n is a small enough integer does take time, and adds complication.
Whether the check is worth it depends on the expected exponents, how important it is to get best performance here, and the cost of the extra complexity. Apparently, Guido and the rest of the Python gang decided the check wasn't worth doing.
If you like, you could write your own repeated-multiplication function.

how about xxxxx?
is it still faster than x**5?
as int exponents gets larger, taking powers might be faster than multiplication.
but the number where actual crossover occurs depends on various conditions, so in my opinion, that's why the optimization was not done(or couldn't be done) in language/library level. But users can still optimize for some special cases :)

Related

Time complexity of recursion of multiplication

What is the worst case time complexity (Big O notation) of the following function for positive integers?
def rec_mul(a:int, b:int) -> int:
if b == 1:
return a
if a == 1:
return b
else:
return a + rec_mul(a, b-1)
I think it's O(n) but my friend claims it's O(2^n)
My argument:
The function recurs at any case b times, therefor the complexity is O(b) = O(n)
His argument:
since there are n bits, a\b value can be no more than (2^n)-1,
therefor the max number of calls will be O(2^n)

You are both right.
If we disregard the time complexity of addition (and you might discuss whether you have reason to do so or not) and count only the number of iterations, then you are both right because you define:
n = b
and your friend defines
n = log_2(b)
so the complexity is O(b) = O(2^log_2(b)).
Both definitions are valid and both can be practical. You look at the input values, your friend at the lengths of the input, in bits.
This is a good demonstration why big-O expressions mean nothing if you don't define the variables used in those expressions.

Your friend and you can both be right, depending on what is n. Another way to say this is that your friend and you are both wrong, since you both forgot to specify what was n.
Your function takes an input that consists in two variables, a and b. These variables are numbers. If we express the complexity as a function of these numbers, it is really O(b log(ab)), because it consists in b iterations, and each iteration requires an addition of numbers of size up to ab, which takes log(ab) operations.
Now, you both chose to express the complexity in function of n rather than a or b. This is okay; we often do this; but an important question is: what is n?
Sometimes we think it's "obvious" what is n, so we forget to say it.
If you choose n = max(a, b) or n = a + b, then you are right, the complexity is O(n).
If you choose n to be the length of the input, then n is the number of bits needed to represent the two numbers a and b. In other words, n = log(a) + log(b). In that case, your friend is right, the complexity is O(2^n).
Since there is an ambiguity in the meaning of n, I would argue that it's meaningless to express the complexity as a function of n without specifying what n is. So, your friend and you are both wrong.

Background
A unary encoding of the input uses an alphabet of size 1: think tally marks. If the input is the number a, you need O(a) bits.
A binary encoding uses an alphabet of size 2: you get 0s and 1s. If the number is a, you need O(log_2 a) bits.
A trinary encoding uses an alphabet of size 3: you get 0s, 1s, and 2s. If the number is a, you need O(log_3 a) bits.
In general, a k-ary encoding uses an alphabet of size k: you get 0s, 1s, 2s, ..., and k-1s. If the number is a, you need O(log_k a) bits.
What does this have to do with complexity?
As you are aware, we ignore multiplicative constants inside big-oh notation. n, 2n, 3n, etc, are all O(n).
The same holds for logarithms. log_2 n, 2 log_2 n, 3 log_2 n, etc, are all O(log_2 n).
The key observation here is that the ratio log_k1 n / log_k2 n is a constant, no matter what k1 and k2 are... as long as they are greater than 1. That means f(log_k1 n) = O(log_k2 n) for all k1, k2 > 1.
This is important when comparing algorithms. As long as you use an "efficient" encoding (i.e., not a unary encoding), it doesn't matter what base you use: you can simply say f(n) = O(lg n) without specifying the base. This allows us to compare runtime of algorithms without worrying about the exact encoding you use.
So n = b (which implies a unary encoding) is typically never used. Binary encoding is simplest, and doesn't provide a non-constant speed-up over any other encoding, so we usually just assume binary encoding.
That means we almost always assume that n = lg a + lg b as the input size, not n = a + b. A unary encoding is the only one that suggests linear growth, rather than exponential growth, as the values of a and b increase.
One area, though, where unary encodings are used is in distinguishing between strong NP-completeness and weak NP-completeness. Without getting into the theory, if a problem is NP-complete, we don't expect any algorithm to have a polynomial running time, that is, one bounded by O(n**k) for some constant k when using an efficient encoring.
But some algorithms do become polynomial if we allow a unary encoding. If a problem that is otherwise NP-complete becomes polynomial when using an unary encoding, we call that a weakly NP-complete problem. It's still slow, but it is in some sense "faster" than an algorithm where the size of the numbers doesn't matter.

Pseudorandom Algorithm for VERY Large (10^1.2mil) Numbers?

I'm looking for a pseudo-random number generator (an algorithm where you input a seed number and it outputs a different 'random-looking' number, and the same seed will always generate the same output) for numbers between 1 and 951,312,000.
I would use the Linear Feedback Shift Register (LFSR) PRNG, but if I did, I would have to convert the seed number (which could be up to 1.2 million digits long in base-10) into a binary number, which would be so massive that I think it would take too long to compute.
In response to a similar question, the Feistel cipher was recommended, but I didn't understand the vocabulary of the wiki page for that method (I'm going into 10th grade so I don't have a degree in encryption), so if you could use layman's terms, I would strongly appreciate it.
Is there an efficient way of doing this which won't take until the end of time, or is this problem impossible?
Edit: I forgot to mention that the prng sequence needs to have a full period. My mistake.

A simple way to do this is to use a linear congruential generator with modulus m = 95^1312000.
The formula for the generator is x_(n+1) = a*x_n + c (mod m). By the Hull-Dobell Theorem, it will have full period if and only if gcd(m,c) = 1 and 95 divides a-1. Furthermore, if you want good second values (right after the seed) even for very small seeds, a and c should be fairly large. Also, your code can't store these values as literals (they would be much too big). Instead, you need to be able to reliably produce them on the fly. After a bit of trial and error to make sure gcd(m,c) = 1, I hit upon:
import random
def get_book(n):
random.seed(1941) #Borges' Library of Babel was published in 1941
m = 95**1312000
a = 1 + 95 * random.randint(1, m//100)
c = random.randint(1, m - 1) #math.gcd(c,m) = 1
return (a*n + c) % m
For example:
>>> book = get_book(42)
>>> book % 10**100
4779746919502753142323572698478137996323206967194197332998517828771427155582287891935067701239737874
shows the last 100 digits of "book" number 42. Given Python's built-in support for large integers, the code runs surprisingly fast (it takes less than 1 second to grab a book on my machine)

If you have a method that can produce a pseudo-random digit, then you can concatenate as many together as you want. It will be just as repeatable as the underlying prng.
However, you'll probably run out of memory scaling that up to millions of digits and attempting to do arithmetic. Normally stuff on that scale isn't done on "numbers". It's done on byte vectors, or something similar.

Why are the "normal" methods for generating random integers so slow?

Not a maths major or a cs major, I just fool around with python (usually making scripts for simulations/theorycrafting on video games) and I discovered just how bad random.randint is performance wise. It's got me wondering why random.randint or random.randrange are used/made the way they are. I made a function that produces (for all intents and actual purposes) identical results to random.randint:
big_bleeping_float= (2**64 - 2)/(2**64 - 2)
def fastrandint(start, stop):
return start + int(random.random() * (stop - start + big_bleeping_float))
There is a massive 180% speed boost using that to generate an integer in the range (inclusive) 0-65 compared to random.randrange(0, 66), the next fastest method.
>>> timeit.timeit('random.randint(0, 66)', setup='from numpy import random', number=10000)
0.03165552873121058
>>> timeit.timeit('random.randint(0, 65)', setup='import random', number=10000)
0.022374771118336412
>>> timeit.timeit('random.randrange(0, 66)', setup='import random', number=10000)
0.01937231027605435
>>> timeit.timeit('fastrandint(0, 65)', setup='import random; from fasterthanrandomrandom import fastrandint', number=10000)
0.0067909916844523755
Furthermore, the adaptation of this function as an alternative to random.choice is 75% faster, and I'm sure adding larger-than-one stepped ranges would be faster (although I didn't test that). For almost double the speed boost as using the fastrandint function you can simply write it inline:
>>> timeit.timeit('int(random.random() * (65 + big_bleeping_float))', setup='import random; big_bleeping_float= (2**64 - 2)/(2**64 - 2)', number=10000)
0.0037642723021917845
So in summary, why am I wrong that my function is a better, why is it faster if it is better, and is there a yet even faster way to do what I'm doing?

randint calls randrange which does a bunch of range/type checks and conversions and then uses _randbelow to generate a random int. _randbelow again does some range checks and finally uses random.
So if you remove all the checks for edge cases and some function call overhead, it's no surprise your fastrandint is quicker.

random.randint() and others are calling into random.getrandbits() which may be less efficient that direct calls to random(), but for good reason.
It is actually more correct to use a randint that calls into random.getrandbits(), as it can be done in an unbiased manner.
You can see that using random.random to generate values in a range ends up being biased since there are only M floating point values between 0 and 1 (for M pretty large). Take an N that doesn't divide into M, then if we write M = k N + r for 0<r<N. At best, using random.random() * (N+1)
we'll get r numbers coming out with probability (k+1)/M and N-r numbers coming out with probability k/M. (This is at best, using the pigeon hole principle - in practice I'd expect the bias to be even worse).
Note that this bias is only noticeable for
A large number of sampling
where N is a large fraction of M the number of floats in (0,1]
So it probably won't matter to you, unless you know you need unbiased values - such as for scientific computing etc.
In contrast, a value from randint(0,N) can be unbiased by using rejection sampling from repeated calls to random.getrandbits(). Of course managing this can introduce additional overhead.
Aside
If you end up using a custom random implementation then
From the python 3 docs
Almost all module functions depend on the basic function random(), which
generates a random float uniformly in the semi-open range [0.0, 1.0).
This suggests that randint and others may be implemented using random.random. If this is the case I would expect them to be slower,
incurring at least one addition function call overhead per call.
Looking at the code referenced in https://stackoverflow.com/a/37540577/221955 you can see that this will happen if the random implementation doesn't provide a getrandbits() function.

This is probably rarely a problem but randint(0,10**1000) works while fastrandint(0,10**1000) crashes. The slower time is probably the price you need to pay to have a function that works for all possible cases...

Reversing pow function - finding the power [duplicate]

Given positive integers b, c, m where (b < m) is True it is to find a positive integer e such that
(b**e % m == c) is True
where ** is exponentiation (e.g. in Ruby, Python or ^ in some other languages) and % is modulo operation. What is the most effective algorithm (with the lowest big-O complexity) to solve it?
Example:
Given b=5; c=8; m=13 this algorithm must find e=7 because 5**7%13 = 8

From the % operator I'm assuming that you are working with integers.
You are trying to solve the Discrete Logarithm problem. A reasonable algorithm is Baby step, giant step, although there are many others, none of which are particularly fast.
The difficulty of finding a fast solution to the discrete logarithm problem is a fundamental part of some popular cryptographic algorithms, so if you find a better solution than any of those on Wikipedia please let me know!

This isn't a simple problem at all. It is called calculating the discrete logarithm and it is the inverse operation to a modular exponentation.
There is no efficient algorithm known. That is, if N denotes the number of bits in m, all known algorithms run in O(2^(N^C)) where C>0.

Python 3 Solution:
Thankfully, SymPy has implemented this for you!
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
This is the documentation on the discrete_log function. Use this to import it:
from sympy.ntheory import discrete_log
Their example computes \log_7(15) (mod 41):
>>> discrete_log(41, 15, 7)
3
Because of the (state-of-the-art, mind you) algorithms it employs to solve it, you'll get O(\sqrt{n}) on most inputs you try. It's considerably faster when your prime modulus has the property where p - 1 factors into a lot of small primes.
Consider a prime on the order of 100 bits: (~ 2^{100}). With \sqrt{n} complexity, that's still 2^{50} iterations. That being said, don't reinvent the wheel. This does a pretty good job. I might also add that it was almost 4x times more memory efficient than Mathematica's MultiplicativeOrder function when I ran with large-ish inputs (44 MiB vs. 173 MiB).

Since a duplicate of this question was asked under the Python tag, here is a Python implementation of baby step, giant step, which, as #MarkBeyers points out, is a reasonable approach (as long as the modulus isn't too large):
def baby_steps_giant_steps(a,b,p,N = None):
if not N: N = 1 + int(math.sqrt(p))
#initialize baby_steps table
baby_steps = {}
baby_step = 1
for r in range(N+1):
baby_steps[baby_step] = r
baby_step = baby_step * a % p
#now take the giant steps
giant_stride = pow(a,(p-2)*N,p)
giant_step = b
for q in range(N+1):
if giant_step in baby_steps:
return q*N + baby_steps[giant_step]
else:
giant_step = giant_step * giant_stride % p
return "No Match"
In the above implementation, an explicit N can be passed to fish for a small exponent even if p is cryptographically large. It will find the exponent as long as the exponent is smaller than N**2. When N is omitted, the exponent will always be found, but not necessarily in your lifetime or with your machine's memory if p is too large.
For example, if
p = 70606432933607
a = 100001
b = 54696545758787
then 'pow(a,b,p)' evaluates to 67385023448517
and
>>> baby_steps_giant_steps(a,67385023448517,p)
54696545758787
This took about 5 seconds on my machine. For the exponent and the modulus of those sizes, I estimate (based on timing experiments) that brute force would have taken several months.

Discrete logarithm is a hard problem
Computing discrete logarithms is believed to be difficult. No
efficient general method for computing discrete logarithms on
conventional computers is known.
I will add here a simple bruteforce algorithm which tries every possible value from 1 to m and outputs a solution if it was found. Note that there may be more than one solution to the problem or zero solutions at all. This algorithm will return you the smallest possible value or -1 if it does not exist.
def bruteLog(b, c, m):
s = 1
for i in xrange(m):
s = (s * b) % m
if s == c:
return i + 1
return -1
print bruteLog(5, 8, 13)
and here you can see that 3 is in fact the solution:
print 5**3 % 13
There is a better algorithm, but because it is often asked to be implemented in programming competitions, I will just give you a link to explanation.

as said the general problem is hard. however a prcatical way to find e if and only if you know e is going to be small (like in your example) would be just to try each e from 1.
btw e==3 is the first solution to your example, and you can obviously find that in 3 steps, compare to solving the non discrete version, and naively looking for integer solutions i.e.
e = log(c + n*m)/log(b) where n is a non-negative integer
which finds e==3 in 9 steps

Explain a code to check primality based on Fermat's little theorem

I found some Python code that claims checking primality based on Fermat's little theorem:
def CheckIfProbablyPrime(x):
return (2 << x - 2) % x == 1
My questions:
How does it work?
What's its relation to Fermat's little theorem?
How accurate is this method?
If it's not accurate, what's the advantage of using it?
I found it here.

1. How does it work?
Fermat's little theorem says that if a number x is prime, then for any integer a:
If we divide both sides by a, then we can re-write the equation as follows:
I'm going to punt on proving how this works (your first question) because there are many good proofs (better than I can provide) on this wiki page and under some Google searches.
2. Relation between code and theorem
So, the function you posted checks if (2 << x - 2) % x == 1.
First off, (2 << x-2) is the same thing as writing 2**(x-1), or in math-form:
That's because << is the logical left-shift operator, which is explained better here. The relation between bit-shifting and multiplying by powers of 2 is specific to the way that numbers are represented on computers (in binary), but it all boils down to
I can subtract 1 from the exponent on both sides, which gives
Now, we know from above that for any number a,
Let's say then that a = 2. That gives us
Well heck, that's the same as 2 << (x-2)! So then we can write:
Which leads to the final relation:
Now, the math version of mod looks kind of odd, but we can write the equivalent code as follows:
(2 << x - 2) % x == 1
And that's the relation.
3. Accuracy of method
So, I think "accuracy" is a bad term here, because Fermat's little theorem is definitely true for all prime numbers. However, that does not mean that it's true or false for all numbers -- which is to say, if I have some number i, and I'm not sure if i is prime, using Fermat's Little Relation will only tell me if it is definitely NOT prime. If Fermat's Little Relation is true, then i could not be prime. These kinds of numbers are called pseudoprime numbers, or more specifically in this case Fermat Pseudoprime numbers.
If this sort of thing sounds interesting, take a look at the Carmichael numbers AKA the Absolute Fermat Pseudoprimes, which pass the Fermat test in any base but are not prime. In our case we run into numbers which pass in base 2, but Fermat's little theorem might not hold for these numbers in other bases -- the Carmichael numbers pass the test for all bases coprime to x.
On the wiki page of the Carmichael there is a discussion of their distribution over the range of natural numbers -- they appear exponentially with the size of the range over which you're looking, though the exponent is less than 1 (about 1/3). So, if you're searching for primes over a big range, you're going to run into exponentially more Carmichael numbers, which are effectively false positives for this method CheckIfProbablyPrime. That might be okay, depending on your input and how much you care about running into false positives.
4. Why is this useful?
In short, it's an optimization.
The main reason to use something like this is to speed up a search for prime numbers. That's because actually checking if a number is prime is expensive -- i.e. more than O(1) running time. Doable, but still more expensive than O(1) time. So, if we can avoid doing that actual check for some numbers, we'll be able to devote more time to checking actual candidates. Since Fermat's little relation will only say yes if a number is possibly prime (it will never say no if the number is prime), and it can be checked in O(1) time, we can toss it into an is_prime loop to ignore a fair amount of numbers. So, we can speed things up.
There are many primality checks like this one, you can find some coded prime checkers here
Final Note
One of the confusing things about this optimization is that it uses the bit shift operator << instead of the exponentiation operator **. This is because bit shifting is one of the fastest operations that your computer can do, while exponentiation is slower by some amount. It is not always the best optimization in many cases, because most modern languages know how to replace things we write with more optimized operations. But, that's my venture as to why the authors of this code used the bit shift instead of 2**(x-1).
Edit: As MarkDickinson notes, taking the exponent of a number and then modding it explicitly is not the best way to do it. This is a thing called modular exponentiation, and there exist algorithms which can do it faster than the way we've written it. Python's builtin pow actually implements one of these algorithms, and takes an optional third argument to mod by. So we can write a final version of this function:
def CheckIfProbablyPrime(x):
return pow(2, x-1, x) == 1
Which is not only more readable but also faster than the confusing bit-shift crap. You know what they say.

I believe, the code in your example is incorrect because binary left shift operator is not equivalent to power of a number, which is used in Fermat's little theorem. With base of two, binary left shift would be equal to power of x + 1, which is NOT used in a version of Fermat's little format.
Instead, use ** for power of integer in Python.
def CheckIfProbablyPrime(x):
return (2 ** x - 2) % x == 0
" p − a is an integer multiple of p " therefore for primes, following theorem, result of 2 in power of x - 2 divided by x will leave a leftover of 0 (modulo '%' checks for number left over after division.
For x - 1 version,
def CheckIfProbablyPrime(a, x):
return (a ** (x-1) - 1) % x == 0
both variations should result as true for prime numbers, because they're representing the Fermat's little theorem in Python

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.