Python -- Fast Factorial function

I am trying to write code for a super-fast factorial function. I have experimented a little and have come up with the following three candidates (apart from math.factorial):
def f1():
    return reduce(lambda x, y: x * y, xrange(1, 31))
def f2():
    result = 1
    result *= 2
    result *= 3
    result *= 4
    result *= 5
    result *= 6
    result *= 7
    result *= 8
    # and so on...
    result *= 28
    result *= 29
    result *= 30
    return result
def f3():
    return 1*2*3*4*5*6*7*8*9*10*11*12*13*14*15*16*17*18*19*20*21*22*23*24*25*26*27*28*29*30
I have timed these functions. These are the results:
In [109]: timeit f1()
100000 loops, best of 3: 11.9 µs per loop
In [110]: timeit f2()
100000 loops, best of 3: 5.05 µs per loop
In [111]: timeit f3()
10000000 loops, best of 3: 143 ns per loop
In [112]: timeit math.factorial(30)
1000000 loops, best of 3: 2.11 µs per loop
Clearly, f3() takes the cake. I have tried implementing this. To elaborate, I have tried writing code that generates a string like this:
"1*2*3*4*5*6*7*8*9*10*11*12*13*14........" and then using eval to evaluate this string. (Acknowledging that eval is evil.) However, this method gave me no gains in time at all. In fact, it took nearly 150 microseconds to finish.
Please advise on how to generalize f3().

f3 is only fast because it isn't actually computing anything when you call it. The whole computation gets optimized out at compile time and replaced with the final value, so all you're timing is function call overhead.
This is particularly obvious if we disassemble the function with the dis module:
>>> import dis
>>> dis.dis(f3)
2 0 LOAD_CONST 59 (265252859812191058636308480000000L)
3 RETURN_VALUE
It is impossible to generalize this speedup to a function that takes an argument and returns its factorial.
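To see the contrast, here is a minimal sketch (the function name is mine): once the operands depend on an argument, the multiplication survives into the bytecode and has to run on every call.
import dis

def f3_general(n):
    # only the constant prefix 1*2*3 can be folded; the multiply by n cannot
    return 1 * 2 * 3 * n

dis.dis(f3_general)
# the disassembly still contains a multiply instruction that uses n,
# so unlike f3() the work is done at call time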

f3() takes the cake because when the function is defined, Python simply optimizes the chain of multiplications down to the final result, so the effective definition of f3() becomes:
def f3():
    return 265252859812191058636308480000000
which, because no computation needs to occur when the function is called, runs really fast!
One way to produce the effect of placing a * operator between a list of numbers is to use reduce from the functools module. Is this something like what you're looking for?
from functools import reduce

def fact(x):
    return reduce((lambda x, y: x * y), range(1, x + 1))
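A quick sanity check (my addition): with x = 30 it matches math.factorial. Note that without an initializer, reduce() raises a TypeError for x < 1, so you may want reduce(..., range(1, x + 1), 1) instead.
>>> import math
>>> fact(30) == math.factorial(30)
True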

I would argue that none of these are good factorial functions, since none of them takes a parameter. The reason the last one works well is that it minimizes the number of interpreter steps, but that's still not a good answer: all of them have the same complexity (linear in the size of the value). We can do better: O(1).
import math

def factorial(x):
    return math.gamma(x + 1)
This runs in roughly constant time regardless of the input value, at the sacrifice of some accuracy. Still, it is way better when performance matters.
We can do a quick benchmark:
import math

def factorial_gamma(x):
    return math.gamma(x + 1)

def factorial_linear(x):
    if x == 0 or x == 1:
        return 1
    return x * factorial_linear(x - 1)
In [10]: factorial_linear(50)
Out[10]: 30414093201713378043612608166064768844377641568960512000000000000
In [11]: factorial_gamma(50)
Out[11]: 3.0414093201713376e+64
In [12]: %timeit factorial_gamma(50)
537 ns ± 6.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [13]: %timeit factorial_linear(50)
17.2 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
A 30-fold speedup for a factorial of 50. Not bad.
https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.misc.factorial.html
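One caveat worth adding (my note, not part of the answer above): math.gamma works in double precision, so results are approximate and it overflows once the true factorial no longer fits in a float (around 170!).
import math

print(math.factorial(20))   # 2432902008176640000, exact integer
print(math.gamma(21))       # roughly 2.43290200817664e+18, floating-point approximation

try:
    math.gamma(172)         # 171! exceeds the double range
except OverflowError as err:
    print("overflow:", err)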

As others have stated, f3() isn't actually computing anything; that's why you get such fast results. You can't achieve the same by moving the computation into a function that takes an argument.
Also, you may be wondering why math.factorial() is so fast. It's because
the math module's functions are implemented in C:
This module is always available. It provides access to the mathematical functions defined by the C standard
By using an efficient algorithm in C, you get such fast results.
Here your best bet would be the function below, but math.factorial is what I prefer if you're purely in need of performance:
def f3(x):
    ans = 1
    for i in range(1, x + 1):
        ans = ans * i
    return ans

print(f3(30))

Related

Building a numpy array as a function of previous element

I would like to create a numpy array where the first element is a defined constant, and every next element is defined as the function of the previous element in the following way:
import numpy as np

def build_array_recursively(length, V_0, function):
    returnList = np.empty(length)
    returnList[0] = V_0
    for i in range(1, length):
        returnList[i] = function(returnList[i - 1])
    return returnList

d_t = 0.05
print(build_array_recursively(20, 0.3, lambda x: x - x*d_t + x*x/2*d_t*d_t - x*x*x/6*d_t*d_t*d_t))
The print method above outputs
[0.3 0.28511194 0.27095747 0.25750095 0.24470843 0.23254756 0.22098752
0.20999896 0.19955394 0.18962586 0.18018937 0.17122037 0.16269589
0.15459409 0.14689418 0.13957638 0.13262186 0.1260127 0.11973187 0.11376316]
Is there a fast way of doing this in numpy without a for loop?
If so is there a way to handle two elements before the current one, e.g. can a Fibonacci array be constructed similarly?
I found a similar question here:
Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?
but it was not answered in general. In my example, the difference equation is difficult to solve manually.
This is faster for what you want to do; you don't have to use recursion for the function. Calculate each element from the previous one, append it to a list, and then convert the list to a numpy array.
def method2(length, V_0, d_t):
    k = [V_0]
    x = V_0
    for i in range(1, length):
        x = x - x * d_t + x * x / 2 * d_t * d_t - x * x * x / 6 * d_t * d_t * d_t
        k.append(x)
    return np.asarray(k)

print(method2(20, 0.3, 0.05))
Running your existing method 10000 times takes 0.438 seconds, while method2 takes 0.097 seconds.
Using a function to make the code clearer (instead of the inline lambda):
def fn(x):
    return x - x*d_t + x*x/2*d_t*d_t - x*x*x/6*d_t*d_t*d_t
And a function that combines elements of build_array_recursively and method2:
def foo1(length, V_0, function):
    returnList = np.empty(length)
    returnList[0] = x = V_0
    for i in range(1, length):
        returnList[i] = x = function(x)
    return returnList
In [887]: timeit build_array_recursively(20,0.3, fn);
61.4 µs ± 63 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [888]: timeit method2(20,0.3, fn);
16.9 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [889]: timeit foo1(20,0.3, fn);
13 µs ± 29.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The main time saver in method2 and foo1 is carrying over x, the last value, from one iteration to the next, rather than indexing with returnList[i-1].
The accumulation method, assigning to a preallocated array, or list append, is less important. Performance is usually similar.
Here the calculation is simple enough that the details of what you do in the loop make a big difference in the overall time.
All of these are loops. Some ufuncs have a reduce (and accumulate) method that can apply the function repeatedly to the elements of the input array. np.sum, np.cumsum, etc. make use of this. But you can't do that with a general Python function.
You have to use some sort of compilation tool like numba to perform this sort of loop much faster.
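A minimal numba sketch of that idea (assuming numba is installed; the function name is mine, and the update formula is inlined rather than passed in as a Python callable):
import numpy as np
from numba import njit

@njit
def build_array_numba(length, V_0, d_t):
    out = np.empty(length)
    x = V_0
    out[0] = x
    for i in range(1, length):
        # same update rule as the lambda in the question
        x = x - x * d_t + x * x / 2 * d_t * d_t - x * x * x / 6 * d_t * d_t * d_t
        out[i] = x
    return out

# the first call compiles the function; later calls run the compiled loop
print(build_array_numba(20, 0.3, 0.05))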

Why numpy.where is much faster than alternatives

I'm trying to speed up the following code:
import time
import numpy as np

np.random.seed(10)
b = np.random.rand(10000, 1000)

def f(a=1):
    tott = 0
    for _ in range(a):
        q = np.array(b)
        t1 = time.time()
        for i in range(len(q)):
            for j in range(len(q[0])):
                if q[i][j] > 0.5:
                    q[i][j] = 1
                else:
                    q[i][j] = -1
        t2 = time.time()
        tott += t2 - t1
    print(tott / a)
As you can see, the function mainly consists of iterating in a double loop. I've tried using np.nditer, np.vectorize and map instead. These give some speedup (about 4-5x, except np.nditer), but with np.where(q>0.5,1,-1) the speedup is almost 100x.
How can I iterate over numpy arrays as fast as np.where does? And why is it so much faster?
It's because the core of numpy is implemented in C. You're basically comparing the speed of C with Python.
If you want to use the speed advantage of numpy, you should make as few calls as possible in your Python code. If you use a Python loop, you have already lost, even if you only use numpy functions inside that loop. Use the higher-level functions provided by numpy (that's why it ships so many special functions). Internally, numpy will use a much more efficient (C) loop.
You can implement a function in C (with loops) yourself and call that from Python. That should give comparable speeds.
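For completeness, here is a sketch of what the vectorized forms look like (using the b array from the question); both push the whole double loop into C:
import numpy as np

q = np.array(b)                # copy, as in the original code

# one C-level pass that builds a new array
q1 = np.where(q > 0.5, 1, -1)

# an equivalent in-place version using a boolean mask
mask = q > 0.5
q[mask] = 1
q[~mask] = -1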
To answer this question, you can gain the same speed (100x acceleration) by using the numba library:
import numpy as np
from numba import njit

def f(b):
    q = np.zeros_like(b)
    for i in range(b.shape[0]):
        for j in range(b.shape[1]):
            if b[i, j] > 0.5:  # test the input values, not the zero-filled output
                q[i, j] = 1
            else:
                q[i, j] = -1
    return q

@njit
def f_jit(b):
    q = np.zeros_like(b)
    for i in range(b.shape[0]):
        for j in range(b.shape[1]):
            if b[i, j] > 0.5:
                q[i, j] = 1
            else:
                q[i, j] = -1
    return q
Compare the speed:
Plain Python
%timeit f(b)
592 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Numba (just-in-time compiled using LLVM ~ C speed)
%timeit f_jit(b)
5.97 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Slow recursion in python

I know this subject is well discussed, but I've come across a case where I don't really understand how the recursive method is "slower" than a method using reduce, lambda and xrange.
def factorial2(x, rest=1):
    if x <= 1:
        return rest
    else:
        return factorial2(x - 1, rest * x)

def factorial3(x):
    if x <= 1:
        return 1
    return reduce(lambda a, b: a * b, xrange(1, x + 1))
I know Python doesn't optimize tail recursion, so the question isn't about that. To my understanding, a generator should still generate n numbers using the +1 operator, so technically fact(n) should add a number n times, just like the recursive one. The lambda in the reduce will be called n times, just as the recursive method will... So since we don't have tail call optimization in either case, stacks will be created/destroyed and returned n times. And an if in the generator has to check when to raise a StopIteration exception.
This makes me wonder why the recursive method is still slower than the other one, since the recursive one uses simple arithmetic and doesn't use generators.
In a test, I replaced rest*x by x in the recursive method and the time spent dropped to be on par with the method using reduce.
Here are my timings for fact(400), 1000 times
factorial3 : 1.22370505333
factorial2 : 1.79896998405
Edit:
Making the method count from 1 up to n instead of from n down to 1 doesn't help either, so it's not overhead from the -1.
Also, can we make the recursive method faster? I tried multiple things, like global variables that I can change... using a mutable context by placing variables in cells that I can modify (like an array) and keeping the recursive method without parameters... sending the method used for recursion as a parameter so we don't have to "dereference" it in our scope... But nothing makes it faster.
I'll point out that I have a version of fact that uses a for loop and is much faster than both of these methods, so there is clearly room for improvement, but I wouldn't expect anything faster than the for loop.
The slowness of the recursive version comes from the need to resolve the named (argument) variables on each call. I have provided a different recursive implementation that has only one argument, and it works slightly faster.
$ cat fact.py
def factorial_recursive1(x):
    if x <= 1:
        return 1
    else:
        return factorial_recursive1(x - 1) * x

def factorial_recursive2(x, rest=1):
    if x <= 1:
        return rest
    else:
        return factorial_recursive2(x - 1, rest * x)

def factorial_reduce(x):
    if x <= 1:
        return 1
    return reduce(lambda a, b: a * b, xrange(1, x + 1))

# Ignore the rest of the code for now, we'll get back to it later in the answer
def range_prod(a, b):
    if a + 1 < b:
        c = (a + b) // 2
        return range_prod(a, c) * range_prod(c, b)
    else:
        return a

def factorial_divide_and_conquer(n):
    return 1 if n <= 1 else range_prod(1, n + 1)
$ ipython -i fact.py
In [1]: %timeit factorial_recursive1(400)
10000 loops, best of 3: 79.3 µs per loop
In [2]: %timeit factorial_recursive2(400)
10000 loops, best of 3: 90.9 µs per loop
In [3]: %timeit factorial_reduce(400)
10000 loops, best of 3: 61 µs per loop
Since in your example very large numbers are involved, initially I suspected that the performance difference might be due to the order of multiplication. Multiplying the large partial product by the next number on every iteration is proportional to the number of digits/bits in the product, so the time complexity of such a method is O(n²), where n is the number of bits in the final product. Instead it is better to use a divide and conquer technique, where the final result is obtained as a product of two approximately equally long values, each of which is computed recursively in the same manner. So I implemented that version too (see factorial_divide_and_conquer(n) in the above code). As you can see below, it still loses to the reduce()-based version for small arguments (due to the same problem with named parameters) but outperforms it for large arguments.
In [4]: %timeit factorial_divide_and_conquer(400)
10000 loops, best of 3: 90.5 µs per loop
In [5]: %timeit factorial_divide_and_conquer(4000)
1000 loops, best of 3: 1.46 ms per loop
In [6]: %timeit factorial_reduce(4000)
100 loops, best of 3: 3.09 ms per loop
UPDATE
Trying to run the factorial_recursive?() versions with x=4000 hits the default recursion limit, so the limit must be increased:
In [7]: sys.setrecursionlimit(4100)
In [8]: %timeit factorial_recursive1(4000)
100 loops, best of 3: 3.36 ms per loop
In [9]: %timeit factorial_recursive2(4000)
100 loops, best of 3: 7.02 ms per loop
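For reference, the for-loop version the question alludes to would look something like this (my sketch, in the same Python 2 style as the rest of the code); it avoids both the per-call frame setup of recursion and the per-element lambda call inside reduce:
def factorial_loop(x):
    result = 1
    for i in xrange(2, x + 1):  # use range() on Python 3
        result *= i
    return result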

Python function call speed

I'm really confused by function call speed in Python. In the first and second cases, nothing unexpected:
%timeit reduce(lambda res, x: res+x, range(1000))
10000 loops, best of 3: 150 µs per loop
def my_add(res, x):
    return res + x
%timeit reduce(my_add, range(1000))
10000 loops, best of 3: 148 µs per loop
But the third case looks strange to me:
from operator import add
%timeit reduce(add, range(1000))
10000 loops, best of 3: 80.1 µs per loop
At the same time:
%timeit add(10, 100)
%timeit 10 + 100
10000000 loops, best of 3: 94.3 ns per loop
100000000 loops, best of 3: 14.7 ns per loop
So why does the third case give a speedup of about 50%?
add is implemented in C.
>>> from operator import add
>>> add
<built-in function add>
>>> def my_add(res, x):
...     return res + x
...
>>> my_add
<function my_add at 0x18358c0>
The reason that a straight + is faster is that add still has to call the Python VM's BINARY_ADD instruction as well as perform some other work due to it being a function, while + is only a BINARY_ADD instruction.
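A small illustration (not from the original answer): disassembling my_add shows the bytecode that has to run on every reduce() step, on top of the frame setup for the call, whereas operator.add does the addition directly in C with no Python frame at all.
import dis

def my_add(res, x):
    return res + x

dis.dis(my_add)
# two LOAD_FAST instructions, a binary-add instruction and RETURN_VALUE --
# all of this runs per element, plus the cost of creating the call frame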
The operator module exports a set of efficient functions corresponding to the intrinsic operators of Python. For example, operator.add(x, y) is equivalent to the expression x+y. The function names are those used for special class methods; variants without leading and trailing __ are also provided for convenience.
From Python docs (emphasis mine)
The operator module is an efficient (native I'd assume) implementation. IMHO, calling a native implementation should be quicker than calling a python function.
You could try calling the interpreter with -O or -OO to compile the python core and check the timing again.

Returning the product of a list

Is there a more concise, efficient or simply pythonic way to do the following?
def product(lst):
    p = 1
    for i in lst:
        p *= i
    return p
EDIT:
I actually find that this is marginally faster than using operator.mul:
from operator import mul
# from functools import reduce # python3 compatibility
def with_lambda(lst):
    return reduce(lambda x, y: x * y, lst)

def without_lambda(lst):
    return reduce(mul, lst)

def forloop(lst):
    r = 1
    for x in lst:
        r *= x
    return r
import timeit
a = range(50)
b = range(1,50)#no zero
t = timeit.Timer("with_lambda(a)", "from __main__ import with_lambda,a")
print("with lambda:", t.timeit())
t = timeit.Timer("without_lambda(a)", "from __main__ import without_lambda,a")
print("without lambda:", t.timeit())
t = timeit.Timer("forloop(a)", "from __main__ import forloop,a")
print("for loop:", t.timeit())
t = timeit.Timer("with_lambda(b)", "from __main__ import with_lambda,b")
print("with lambda (no 0):", t.timeit())
t = timeit.Timer("without_lambda(b)", "from __main__ import without_lambda,b")
print("without lambda (no 0):", t.timeit())
t = timeit.Timer("forloop(b)", "from __main__ import forloop,b")
print("for loop (no 0):", t.timeit())
gives me
('with lambda:', 17.755449056625366)
('without lambda:', 8.2084708213806152)
('for loop:', 7.4836349487304688)
('with lambda (no 0):', 22.570688009262085)
('without lambda (no 0):', 12.472226858139038)
('for loop (no 0):', 11.04065990447998)
Without using lambda:
from operator import mul
# from functools import reduce # python3 compatibility
reduce(mul, list, 1)
It is better and faster.
With Python 2.7.5:
from operator import mul
import numpy as np
import numexpr as ne
# from functools import reduce # python3 compatibility
a = range(1, 101)
%timeit reduce(lambda x, y: x * y, a) # (1)
%timeit reduce(mul, a) # (2)
%timeit np.prod(a) # (3)
%timeit ne.evaluate("prod(a)") # (4)
In the following configuration:
a = range(1, 101) # A
a = np.array(a) # B
a = np.arange(1, 1e4, dtype=int) #C
a = np.arange(1, 1e5, dtype=float) #D
Results with python 2.7.5
       |     1     |     2     |     3     |     4     |
-------+-----------+-----------+-----------+-----------+
   A   |  20.8 µs  |  13.3 µs  |  22.6 µs  |  39.6 µs  |
   B   |   106 µs  |  95.3 µs  |  5.92 µs  |  26.1 µs  |
   C   |  4.34 ms  |  3.51 ms  |  16.7 µs  |  38.9 µs  |
   D   |  46.6 ms  |  38.5 ms  |   180 µs  |   216 µs  |
Result: np.prod is the fastest one, if you use np.array as data structure (18x for small array, 250x for large array)
with python 3.3.2:
       |     1     |     2     |     3     |     4     |
-------+-----------+-----------+-----------+-----------+
   A   |  23.6 µs  |  12.3 µs  |  68.6 µs  |  84.9 µs  |
   B   |   133 µs  |   107 µs  |  7.42 µs  |  27.5 µs  |
   C   |  4.79 ms  |  3.74 ms  |  18.6 µs  |  40.9 µs  |
   D   |  48.4 ms  |  36.8 ms  |   187 µs  |   214 µs  |
Is python 3 slower?
Starting with Python 3.8, a prod function has been included in the math module of the standard library:
math.prod(iterable, *, start=1)
which returns the product of a start value (default: 1) times an iterable of numbers:
import math
math.prod([2, 3, 4]) # 24
Note that if the iterable is empty, this will produce 1 (or the start value if provided).
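For instance (a quick illustration; requires Python 3.8+):
import math

print(math.prod([]))                    # 1, the default start value
print(math.prod([2, 3, 4], start=10))   # 240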
from functools import reduce
a = [1, 2, 3]
reduce(lambda x, y: x * y, a, 1)
if you just have numbers in your list:
from numpy import prod
prod(list)
EDIT: as pointed out by @off99555, this does not work for large integer results, in which case it returns a result of type numpy.int64, while Ian Clelland's solution based on operator.mul and reduce works for large integer results because it returns long.
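A small illustration of that caveat (the dtype numpy picks, and therefore the exact wrap-around value, is platform dependent):
import numpy as np
from operator import mul
from functools import reduce  # needed on Python 3

vals = list(range(1, 26))     # 25! is larger than 2**63 - 1
print(reduce(mul, vals))      # 15511210043330985984000000, exact Python int
print(np.prod(vals))          # fixed-width integers silently wrap, giving a wrong value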
Well if you really wanted to make it one line without importing anything you could do:
eval('*'.join(str(item) for item in list))
But don't.
I remember some long discussions on comp.lang.python (sorry, too lazy to produce pointers now) which concluded that your original product() definition is the most Pythonic.
Note that the proposal is not to write a for loop every time you want to do it, but to write a function once (per type of reduction) and call it as needed! Calling reduction functions is very Pythonic - it works sweetly with generator expressions, and since the successful introduction of sum(), Python keeps growing more and more builtin reduction functions - any() and all() are the latest additions...
This conclusion is kinda official - reduce() was removed from builtins in Python 3.0, saying:
"Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."
See also The fate of reduce() in Python 3000 for a supporting quote from Guido (and some less supporting comments by Lispers that read that blog).
P.S. if by chance you need product() for combinatorics, see math.factorial() (new in 2.6).
import operator
reduce(operator.mul, list, 1)
I've tested various solutions with perfplot (a small project of mine) and found that
numpy.prod(lst)
is by far the fastest solution (if the list isn't very short).
Code to reproduce the plot:
import perfplot
import numpy
import math
from operator import mul
from functools import reduce
from itertools import accumulate
def reduce_lambda(lst):
    return reduce(lambda x, y: x * y, lst)

def reduce_mul(lst):
    return reduce(mul, lst)

def forloop(lst):
    r = 1
    for x in lst:
        r *= x
    return r

def numpy_prod(lst):
    return numpy.prod(lst)

def math_prod(lst):
    return math.prod(lst)

def itertools_accumulate(lst):
    for value in accumulate(lst, mul):
        pass
    return value

b = perfplot.bench(
    setup=numpy.random.rand,
    kernels=[
        reduce_lambda,
        reduce_mul,
        forloop,
        numpy_prod,
        itertools_accumulate,
        math_prod,
    ],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(a)",
)
b.save("out.png")
b.show()
The intent of this answer is to provide a calculation that is useful in certain circumstances -- namely when a) there are a large number of values being multiplied such that the final product may be extremely large or extremely small, and b) you don't really care about the exact answer, but instead have a number of sequences, and want to be able to order them based on each one's product.
If you want to multiply the elements of a list, where l is the list, you can do:
import math
math.exp(sum(map(math.log, l)))
Now, that approach is not as readable as
from operator import mul
reduce(mul, list)
If you're a mathematician who isn't familiar with reduce() the opposite might be true, but I wouldn't advise using it under normal circumstances. It's also less readable than the product() function mentioned in the question (at least to non-mathematicians).
However, if you're ever in a situation where you risk underflow or overflow, such as in
>>> reduce(mul, [10.]*309)
inf
and your purpose is to compare the products of different sequences rather than to know what the products are, then
>>> sum(map(math.log, [10.]*309))
711.49879373515785
is the way to go because it's virtually impossible to have a real-world problem in which you would overflow or underflow with this approach. (The larger the result of that calculation is, the larger the product would be if you could calculate it.)
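For instance, to rank two sequences whose products both overflow (a sketch; reduce needs the functools import on Python 3):
import math
from operator import mul
from functools import reduce

seq_a = [10.0] * 309
seq_b = [9.99] * 309

print(reduce(mul, seq_a), reduce(mul, seq_b))  # inf inf -- not comparable

# the log-sums stay finite and preserve the ordering
print(sum(map(math.log, seq_a)) > sum(map(math.log, seq_b)))  # True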
I am surprised no-one has suggested using itertools.accumulate with operator.mul. This avoids using reduce, which is different for Python 2 and 3 (due to the functools import required for Python 3), and moreover is considered un-pythonic by Guido van Rossum himself:
from itertools import accumulate
from operator import mul

def prod(lst):
    for value in accumulate(lst, mul):
        pass
    return value
Example:
prod([1,5,4,3,5,6])
# 1800
One option is to use numba and the @jit or @njit decorator. I also made one or two little tweaks to your code (at least in Python 3, list is a built-in name that shouldn't be shadowed by a variable):
from numba import njit

@njit
def njit_product(lst):
    p = lst[0]  # first element
    for i in lst[1:]:  # loop over the remaining elements
        p *= i
    return p
For timing purposes, you need to run once to compile the function first using numba. In general, the function will be compiled the first time it is called, and then called from memory after that (faster).
njit_product([1, 2]) # execute once to compile
Now when you execute your code, it will run with the compiled version of the function. I timed them using a Jupyter notebook and the %timeit magic function:
product(b) # yours
# 32.7 µs ± 510 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
njit_product(b)
# 92.9 µs ± 392 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note that on my machine, running Python 3.5, the native Python for loop was actually the fastest. There may be a trick here when it comes to measuring numba-decorated performance with Jupyter notebooks and the %timeit magic function. I am not sure that the timings above are correct, so I recommend trying it out on your system and seeing if numba gives you a performance boost.
The fastest way I found was using while:
import timeit

mysetup = '''
import numpy as np
'''
# code snippet whose execution time is to be measured
mycode = '''
x = [4, 5, 6, 7, 8, 9, 10]
prod = 1
i = 0
while True:
    prod = prod * x[i]
    i = i + 1
    if i == len(x):
        break
'''
# timeit statement for while:
print("using while : ",
      timeit.timeit(setup=mysetup, stmt=mycode))
# timeit statement for mul:
print("using mul : ",
      timeit.timeit('from functools import reduce; from operator import mul; c = reduce(mul, [4,5,6,7,8,9,10])'))
# timeit statement for lambda:
print("using lambda : ",
      timeit.timeit('from functools import reduce; c = reduce(lambda x, y: x * y, [4,5,6,7,8,9,10])'))
and the timings are:
>>> using while : 0.8887967770060641
>>> using mul : 2.0838719510065857
>>> using lambda : 2.4227715369997895
Python 3 result for the OP's tests: (best of 3 for each)
with lambda: 18.978000981995137
without lambda: 8.110567473006085
for loop: 10.795806062000338
with lambda (no 0): 26.612515013999655
without lambda (no 0): 14.704098362999503
for loop (no 0): 14.93075215499266
I'm not sure about the fastest way, but here is short code to get the product of any collection without importing any library or module:
eval('*'.join(map(str,l)))
Here is the code:
product = 1  # start at 1, so the running product isn't stuck at 0
my_list = input("Type in a list: ").split(", ")  # input() returns a string; split it into a list
for i in range(0, len(my_list)):
    product *= int(my_list[i])
print("The product of all elements in your list is: ", product)
This also works, though it's cheating:
def factorial(n):
    x = []
    p = 1
    if n <= 1:
        return 1
    else:
        for i in range(1, n + 1):
            p *= i
            x.append(p)
        print(x[n - 1])
