I am researching the speed of computing factorials, but I am only comparing two approaches:
import timeit
def fact(N):
    B = N
    while N > 1:
        B = B * (N-1)
        N = N-1
    return B

def fact1(N):
    B = 1
    for i in range(1, N+1):
        B = B * i
    return B
print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
Here is the output,
0.540276050568 120
0.654400110245 120
From the above code I have observed that the while version takes less time than the for version.
My question is: is this the best way to compute a factorial in Python?
If you're looking for the best, why not use the one provided in the math module?
>>> import math
>>> math.factorial
<built-in function factorial>
>>> math.factorial(10)
3628800
And a comparison of timings on my machine:
>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
0.840167045593 120
>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
1.04350399971 120
>>> print timeit.timeit('factorial(5)', setup="from math import factorial")
0.149857997894
We see that the builtin is significantly better than either of the pure Python variants you proposed.
TL;DR: microbenchmarks aren't very useful.
For CPython, try this:
>>> from math import factorial
>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
1.38128209114 120
>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
1.46199703217 120
>>> print timeit.timeit('factorial(5)', setup="from math import factorial"), factorial(5)
0.397044181824 120
But under PyPy, the while version is faster than the one from math:
>>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
0.170556783676 120
>>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
0.319650173187 120
>>>> print timeit.timeit('factorial(5)', setup="from math import factorial"), factorial(5)
0.210616111755 120
So it depends on the implementation. Now try bigger numbers
>>>> print timeit.timeit('fact(50)', setup="from __main__ import fact"), fact(50)
7.71517109871 30414093201713378043612608166064768844377641568960512000000000000
>>>> print timeit.timeit('fact1(50)', setup="from __main__ import fact1"), fact1(50)
6.58060312271 30414093201713378043612608166064768844377641568960512000000000000
>>>> print timeit.timeit('factorial(50)', setup="from math import factorial"), factorial(50)
6.53072690964 30414093201713378043612608166064768844377641568960512000000000000
Now the while version is in last place, and the for version is about the same as the one from the math module.
Otherwise, if you're looking for a Python implementation (this is my favourite):
from operator import mul

def factorial(n):
    return reduce(mul, range(1, (n + 1)), 1)
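Note that reduce is a builtin only on Python 2; on Python 3 the same idea needs functools.reduce. A minimal sketch of the Python 3 equivalent:
# Python 3 sketch of the same reduce-based factorial; reduce is not a builtin
# there, so it has to come from functools
from functools import reduce
from operator import mul

def factorial(n):
    # multiply 1 * 1 * 2 * ... * n; the initial 1 also covers n == 0
    return reduce(mul, range(1, n + 1), 1)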
Usage:
>>> factorial(0)
1
>>> factorial(1)
1
>>> factorial(2)
2
>>> factorial(3)
6
>>> factorial(4)
24
>>> factorial(5)
120
>>> factorial(10)
3628800
Performance (on my desktop):
$ python -m timeit -c -s "fact = lambda n: reduce(lambda a, x: a * x, range(1, (n + 1)), 1)" "fact(10)"
1000000 loops, best of 3: 1.98 usec per loop
I have tried it with reduce(lambda x, y: x*y, range(1, 5)):
>>>timeit("import math; math.factorial(4)")
1.0205099133840179
>>>timeit("reduce(lambda x, y: x*y, range(1, 5))")
1.4047879075160665
>>>timeit("from operator import mul;reduce(mul, range(1, 5))")
2.530837320051319
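Note that these statements run the imports inside the timed code, so the import cost is counted on every iteration. A variant that moves the imports into setup (a sketch, timing only the computation itself) would look roughly like:
import timeit

# same comparison, but with imports moved into setup so only the computation
# itself is timed (assumes the Python 2 builtin reduce used in the snippets above)
print(timeit.timeit("math.factorial(4)", setup="import math"))
print(timeit.timeit("reduce(lambda x, y: x*y, range(1, 5))"))
print(timeit.timeit("reduce(mul, range(1, 5))", setup="from operator import mul"))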
Below is the well-known example of the Fibonacci sequence:
# test.py
import sys
sys.setrecursionlimit(20000)

def fib_loop(n):
    if n <= 1:
        return n
    fn, fnm1 = 1, 0
    for _ in range(2, n+1):
        fn, fnm1 = fn + fnm1, fn
    return fn

def fib_recursion(n, memo={}):
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib_recursion(n-1, memo) + fib_recursion(n-2, memo)
    return memo[n]
Like everybody else, I assumed the loop variant would be much faster than the recursive one. However, the actual result is quite surprising.
$ python3 -m timeit "import test; test.fib_loop(10000)"
100 loops, best of 5: 1.93 msec per loop
$ python3 -m timeit "import test; test.fib_recursion(10000)"
500000 loops, best of 5: 471 nsec per loop
I have no idea why. Could anybody help me?
Because you are memoizing your result, and you are re-using that memo dict on every invocation. So the first time it runs it is slow; on every other invocation, it is a simple dict lookup.
If you use number=1, so it only runs once, you'll see that the first call is actually slower:
>>> import sys
>>> sys.setrecursionlimit(20000)
>>>
>>> def fib_loop(n):
...     if n <= 1:
...         return n
...     fn, fnm1 = 1, 0
...     for _ in range(2, n+1):
...         fn, fnm1 = fn + fnm1, fn
...     return fn
...
>>> def fib_recursion(n, memo={}):
...     if n <= 1:
...         return n
...     if n not in memo:
...         memo[n] = fib_recursion(n-1, memo) + fib_recursion(n-2, memo)
...     return memo[n]
...
>>> import timeit
>>> timeit.timeit("fib_loop(1000)", setup="from __main__ import fib_loop", number=1)
9.027599999456015e-05
>>> timeit.timeit("fib_recursion(1000)", setup="from __main__ import fib_recursion", number=1)
0.0016194200000114733
Alternatively, if you pass a new memo dict for each outer call, you get the same behavior:
>>> timeit.timeit("fib_recursion(1000, {})", setup="from __main__ import fib_recursion", number=1000)
0.38679519899999093
>>> timeit.timeit("fib_loop(1000)", setup="from __main__ import fib_loop", number=1000)
0.07079556799999409
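For reference, the same cross-call caching effect can be had with functools.lru_cache (my own sketch, not part of the original question or answer; requires Python 3.2+):
import sys
from functools import lru_cache

sys.setrecursionlimit(20000)

@lru_cache(maxsize=None)
def fib_cached(n):
    # the cache persists across calls, so after the first call every repeat in
    # a timeit loop is essentially one cache lookup, like the memo-dict version
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)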
Assume that you have a list with an arbitrary number of items, and you wish to count how many items match a specific condition. I thought of two ways to do this in a sensible manner, but I am not sure which one is best (more pythonic) - or if there is perhaps a better option (without sacrificing too much readability).
import numpy.random as nprnd
import timeit

my = nprnd.randint(1000, size=1000000)

def with_len(my_list):
    much = len([t for t in my_list if t >= 500])

def with_sum(my_list):
    many = sum(1 for t in my_list if t >= 500)

t1 = timeit.Timer('with_len(my)', 'from __main__ import with_len, my')
t2 = timeit.Timer('with_sum(my)', 'from __main__ import with_sum, my')
print("with len:", t1.timeit(1000)/1000)
print("with sum:", t2.timeit(1000)/1000)
Performance is almost identical between these two cases. However, which of these is more pythonic? Or is there a better alternative?
For those who are curious, I tested the proposed solutions (from comments and answers) and these are the results:
import numpy as np
import timeit
import functools

my = np.random.randint(1000, size=100000)

def with_len(my_list):
    return len([t for t in my_list if t >= 500])

def with_sum(my_list):
    return sum(1 for t in my_list if t >= 500)

def with_sum_alt(my_list):
    return sum(t >= 500 for t in my_list)

def with_lambda(my_list):
    return functools.reduce(lambda a, b: a + (1 if b >= 500 else 0), my_list, 0)

def with_np(my_list):
    return len(np.where(my_list >= 500)[0])

t1 = timeit.Timer('with_len(my)', 'from __main__ import with_len, my')
t2 = timeit.Timer('with_sum(my)', 'from __main__ import with_sum, my')
t3 = timeit.Timer('with_sum_alt(my)', 'from __main__ import with_sum_alt, my')
t4 = timeit.Timer('with_lambda(my)', 'from __main__ import with_lambda, my')
t5 = timeit.Timer('with_np(my)', 'from __main__ import with_np, my')

print("with len:", t1.timeit(1000)/1000)
print("with sum:", t2.timeit(1000)/1000)
print("with sum_alt:", t3.timeit(1000)/1000)
print("with lambda:", t4.timeit(1000)/1000)
print("with np:", t5.timeit(1000)/1000)
Python 2.7
('with len:', 0.02201753337348283)
('with sum:', 0.022727363518455238)
('with sum_alt:', 0.2370256687439941) # <-- very slow!
('with lambda:', 0.026367264818657078)
('with np:', 0.0005811764306089913) # <-- very fast!
Python 3.6
with len: 0.017649643657480736
with sum: 0.0182978007766851
with sum_alt: 0.19659815740239048
with lambda: 0.02691670741400111
with np: 0.000534095418615152
The 2nd one, with_sum, is more pythonic in the sense that it uses much less memory: it doesn't build the whole list, because the generator expression is fed directly to sum().
I'm with @Chris_Rands. But as far as performance is concerned, there is a faster way using numpy:
import numpy as np

def with_np(my_list):
    return len(np.where(my_list >= 500)[0])
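A roughly equivalent formulation (my own sketch, assuming my_list is already a NumPy array as in the question) counts the True entries of the boolean mask directly:
import numpy as np

def with_np_count(my_list):
    # the comparison yields a boolean array; count_nonzero counts the Trues
    return np.count_nonzero(my_list >= 500)

# e.g. with_np_count(np.array([100, 600, 999])) == 2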
I am working on the problem below, using Python 2.7. I am posting my code and wondering if there are any further smart ideas to make it run faster. I thought there might be an approach that sorts the list first and leverages the sorted order, but I cannot figure one out so far. My code has O(n^2) time complexity.
Problem:
Given an array of integers, find the indexes of values that satisfy A + B = C + D, where A, B, C and D are integer values in the array. Find all combinations of such quadruples.
Code:
from collections import defaultdict

sumIndex = defaultdict(list)

def buildIndex(numbers):
    for i in range(len(numbers)):
        for j in range(i+1, len(numbers)):
            sumIndex[numbers[i]+numbers[j]].append((i,j))

def checkResult():
    for k, v in sumIndex.items():
        if len(v) > 1:
            for i in v:
                print k, i

if __name__ == "__main__":
    buildIndex([1,2,3,4])
    checkResult()
Output: each line shows a sum value, followed by a pair of indexes whose elements produce that sum:
5 (0,3)
5 (1,2)
Consider the case where all the elements of the array are equal. Then we know the answer beforehand, but merely printing the result will take O(n^2) time, since there are n*(n-1)/2 such pairs. So I think it is safe to say that there is no approach with a better complexity than O(n^2) for this problem.
Yes, it can be done with complexity less than O(n^2). The algorithm is:
Create a duplicate array, say indexArr[], storing the indexes of the elements of the original array, say origArr[].
Sort origArr[] in ascending order using some algorithm with O(n log n) complexity, and shuffle indexArr[] alongside it while sorting origArr[].
Now, to find the pairs in the sorted array, run two loops over all the possible combinations. Suppose you select origArr[i] + origArr[i + 1] = sum.
You only keep searching while sum <= origArr[n], where n is the index of the last (maximum) element of the array. If sum > origArr[n], break the inner loop as well as the outer loop, as no other combinations are possible.
Likewise, break the inner loop if sum > origArr[j], as no other combinations are possible for that sum.
PS - The worst-case scenario is still O(n^2).
A faster, more Pythonic approach using itertools.combinations:
from collections import defaultdict
from itertools import combinations

def get_combos(l):
    d = defaultdict(list)
    for indices in combinations(range(len(l)), 2):
        d[(l[indices[0]] + l[indices[1]])].append(indices)
    return {k: v for k, v in d.items() if len(v) > 1}
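For example, on the list from the question (my own illustration of the output format):
>>> get_combos([1, 2, 3, 4])
{5: [(0, 3), (1, 2)]}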
timing results
                                             OP         this
len(l)=4,    min(repeat=100, number=10000) |  0.09334 |  0.08050
len(l)=50,   min(repeat=10, number=100)    |  0.08689 |  0.08996
len(l)=500,  min(repeat=10, number=10)     |  0.64974 |  0.59553
len(l)=1000, min(repeat=3, number=3)       |  1.01559 |  0.83494
len(l)=5000, min(repeat=3, number=1)       | 10.26168 |  8.92959
timing code
from collections import defaultdict
from itertools import combinations
from random import randint
from timeit import repeat

def lin_get_combos(l):
    sumIndex = defaultdict(list)
    for i in range(len(l)):
        for j in range(i+1, len(l)):
            sumIndex[l[i]+l[j]].append((i,j))
    return {k: v for k, v in sumIndex.items() if len(v) > 1}

def craig_get_combos(l):
    d = defaultdict(list)
    for indices in combinations(range(len(l)), 2):
        d[(l[indices[0]] + l[indices[1]])].append(indices)
    return {k: v for k, v in d.items() if len(v) > 1}
l = []
for _ in range(4):
l.append(randint(0,1000))
t1 = min(repeat(stmt='lin_get_combos(l)', setup='from __main__ import lin_get_combos, l', repeat=100, number=10000))
t2 = min(repeat(stmt='craig_get_combos(l)', setup='from __main__ import craig_get_combos, l', repeat= 100, number=10000))
print '%0.5f, %0.5f' % (t1, t2)
l = []
for _ in range(50):
l.append(randint(0,1000))
t1 = min(repeat(stmt='lin_get_combos(l)', setup='from __main__ import lin_get_combos, l', repeat=10, number=100))
t2 = min(repeat(stmt='craig_get_combos(l)', setup='from __main__ import craig_get_combos, l', repeat= 10, number=100))
print '%0.5f, %0.5f' % (t1, t2)
l = []
for _ in range(500):
l.append(randint(0,1000))
t1 = min(repeat(stmt='lin_get_combos(l)', setup='from __main__ import lin_get_combos, l', repeat=10, number=10))
t2 = min(repeat(stmt='craig_get_combos(l)', setup='from __main__ import craig_get_combos, l', repeat= 10, number=10))
print '%0.5f, %0.5f' % (t1, t2)
l = []
for _ in range(1000):
l.append(randint(0,1000))
t1 = min(repeat(stmt='lin_get_combos(l)', setup='from __main__ import lin_get_combos, l', repeat=3, number=3))
t2 = min(repeat(stmt='craig_get_combos(l)', setup='from __main__ import craig_get_combos, l', repeat= 3, number=3))
print '%0.5f, %0.5f' % (t1, t2)
l = []
for _ in range(5000):
l.append(randint(0,1000))
t1 = min(repeat(stmt='lin_get_combos(l)', setup='from __main__ import lin_get_combos, l', repeat=3, number=1))
t2 = min(repeat(stmt='craig_get_combos(l)', setup='from __main__ import craig_get_combos, l', repeat= 3, number=1))
print '%0.5f, %0.5f' % (t1, t2)
Using the Python timeit module, I want to measure how much time it takes to print the following. How do I do so?
import timeit
x = [x for x in range(10000)]
timeit.timeit("print x[9999]")
d=[{i:i} for i in x]
timeit.timeit("print d[9999]")
NameError: global name 'x' is not defined
NameError: global name 'd' is not defined
Per the docs:
To give the timeit module access to functions you define, you can pass a setup parameter which contains an import statement
In your case, that would be e.g.:
timeit.timeit('print d[9999]',
              setup='from __main__ import d')
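Putting it together for both lists (my own sketch of the fix, not from the original answer; here I time just the indexing, since printing inside the timed statement would mostly measure terminal output):
import timeit

x = [x for x in range(10000)]
d = [{i: i} for i in x]

# pass the names in via setup so the timed statements can see them
print(timeit.timeit('x[9999]', setup='from __main__ import x', number=1000))
print(timeit.timeit('d[9999]', setup='from __main__ import d', number=1000))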
Here is an example of how you can do it:
import timeit

x = [x for x in range(10000)]
d = [{i: i} for i in x]

for i in [x, d]:
    t = timeit.timeit(stmt="print(i[9999])", number=100, globals=globals())
    print(f"took: {t:.4f}")
Output:
took: 0.0776
took: 0.0788
Please notice I added number=100, so each test runs 100 times. By default it runs 1,000,000 times.
I want to get the number of indexes at which two strings differ.
Things that are fixed:
The string data will only have 0 or 1 at any index, i.e. the strings are binary representations of a number.
Both strings will be of the same length.
For the above problem I wrote the function below in Python:
def foo(a, b):
    result = 0
    for x, y in zip(a, b):
        if x != y:
            result += 1
    return result
But these strings are huge. Very large. So the above function is taking too much time. Is there anything I should do to make it super fast?
This is how I did the same in C++. It's quite fast now, but I still can't understand how to do the packing into short integers and the rest of what @Yves Daoust suggested:
size_t diff(long long int n1, long long int n2)
{
    long long int c = n1 ^ n2;                  // differing bits are set in the xor
    bitset<sizeof(int) * CHAR_BIT> bits(c);     // note: only sizeof(int)*CHAR_BIT bits wide
    string s = bits.to_string();
    return std::count(s.begin(), s.end(), '1'); // count the set bits
}
I'll walk through the options here, but basically you are calculating the hamming distance between two numbers. There are dedicated libraries that can make this really, really fast, but lets focus on the pure Python options first.
Your approach, zipping
zip() produces one big list first, then lets you loop. You could use itertools.izip() instead, and make it a generator expression:
from itertools import izip
def foo(a, b):
return sum(x != y for x, y in izip(a, b))
This produces only one pair at a time, avoiding having to create a large list of tuples first.
The Python boolean type is a subclass of int, where True == 1 and False == 0, letting you sum them:
>>> True + True
2
Using integers instead
However, you probably want to rethink your input data. It's much more efficient to use integers to represent your binary data; integers can be operated on directly. Doing the conversion inline, then counting the number of 1s on the XOR result is:
def foo(a, b):
    return format(int(a, 2) ^ int(b, 2), 'b').count('1')
but not having to convert a and b to integers in the first place would be much more efficient.
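For example, on the two sample strings used in the timings below, the XOR makes the differing positions directly visible (my own illustration):
>>> a, b = "0100010010", "0011100010"
>>> format(int(a, 2) ^ int(b, 2), 'b')
'111110000'
>>> format(int(a, 2) ^ int(b, 2), 'b').count('1')
5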
Time comparisons:
>>> from itertools import izip
>>> import timeit
>>> s1 = "0100010010"
>>> s2 = "0011100010"
>>> def foo_zipped(a, b): return sum(x != y for x, y in izip(a, b))
...
>>> def foo_xor(a, b): return format(int(a, 2) ^ int(b, 2), 'b').count('1')
...
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_zipped as f')
1.7872788906097412
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_xor as f')
1.3399651050567627
>>> s1 = s1 * 1000
>>> s2 = s2 * 1000
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_zipped as f', number=1000)
1.0649528503417969
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_xor as f', number=1000)
0.0779869556427002
The XOR approach is faster by orders of magnitude if the inputs get larger, and this is with converting the inputs to int first.
Dedicated libraries for bitcounting
The bit counting (format(integer, 'b').count('1')) is pretty fast, but can be made faster still if you install the gmpy extension library (a Python wrapper around the GMP library) and use the gmpy.popcount() function:
def foo(a, b):
    return gmpy.popcount(int(a, 2) ^ int(b, 2))
gmpy.popcount() is about 20 times faster on my machine than the str.count() method. Again, not having to convert a and b to integers to begin with would remove another bottleneck, but even then the per-call performance is almost doubled:
>>> import gmpy
>>> def foo_xor_gmpy(a, b): return gmpy.popcount(int(a, 2) ^ int(b, 2))
...
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_xor as f', number=10000)
0.7225301265716553
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_xor_gmpy as f', number=10000)
0.47731995582580566
To illustrate the difference when a and b are integers to begin with:
>>> si1, si2 = int(s1, 2), int(s2, 2)
>>> def foo_xor_int(a, b): return format(a ^ b, 'b').count('1')
...
>>> def foo_xor_gmpy_int(a, b): return gmpy.popcount(a ^ b)
...
>>> timeit.timeit('f(si1, si2)', 'from __main__ import si1, si2, foo_xor_int as f', number=100000)
3.0529568195343018
>>> timeit.timeit('f(si1, si2)', 'from __main__ import si1, si2, foo_xor_gmpy_int as f', number=100000)
0.15820622444152832
Dedicated libraries for hamming distances
The gmpy library actually includes a gmpy.hamdist() function, which calculates this exact number (the number of 1 bits in the XOR result of the integers) directly:
def foo_gmpy_hamdist(a, b):
    return gmpy.hamdist(int(a, 2), int(b, 2))
which'll blow your socks off entirely if you used integers to begin with:
def foo_gmpy_hamdist_int(a, b):
    return gmpy.hamdist(a, b)
Comparisons:
>>> def foo_gmpy_hamdist(a, b):
... return gmpy.hamdist(int(a, 2), int(b, 2))
...
>>> def foo_gmpy_hamdist_int(a, b):
... return gmpy.hamdist(a, b)
...
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_xor as f', number=100000)
7.479684114456177
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_gmpy_hamdist as f', number=100000)
4.340585947036743
>>> timeit.timeit('f(si1, si2)', 'from __main__ import si1, si2, foo_gmpy_hamdist_int as f', number=100000)
0.22896099090576172
That's 100,000 computations of the hamming distance between two 3k+ digit numbers.
Another package that can calculate the distance is Distance, which supports calculating the hamming distance between strings directly.
Make sure you use the --with-c switch to have it compile the C optimisations; when installing with pip use bin/pip install Distance --install-option --with-c for example.
Benchmarking this against the XOR-with-bitcount approach again:
>>> import distance
>>> def foo_distance_hamming(a, b):
... return distance.hamming(a, b)
...
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_xor as f', number=100000)
7.229060173034668
>>> timeit.timeit('f(s1, s2)', 'from __main__ import s1, s2, foo_distance_hamming as f', number=100000)
0.7701470851898193
It uses the naive approach; zip over both input strings and count the number of differences, but since it does this in C it is still plenty faster, about 10 times as fast. The gmpy.hamdist() function still beats it when you use integers, however.
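For reference, direct usage on the short sample strings looks like this (my own illustration, assuming the package is installed as described above):
>>> import distance
>>> distance.hamming("0100010010", "0011100010")
5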
Not tested, but how would this perform:
sum(x!=y for x,y in zip(a,b))
If the strings represent binary numbers, you can convert to integers and use bitwise operators:
def foo(s1, s2):
    # return sum(map(int, format(int(s1, 2) ^ int(s2, 2), 'b')))  # one-liner
    a = int(s1, 2)           # convert string to integer
    b = int(s2, 2)
    c = a ^ b                # use xor to get differences
    s = format(c, 'b')       # convert back to string of zeroes and ones
    return sum(map(int, s))  # sum all ones (count of differences)
s1 = "0100010010"
s2 = "0011100010"
#      12345
assert foo(s1, s2) == 5
Pack your strings as short integers (16 bits). After XORing, pass the result through a precomputed lookup table of 65536 entries that gives the number of 1s per short.
If pre-packing is not an option, switch to C++ with inline AVX2 intrinsics. They will allow you to load 32 characters in a single instruction, perform the comparisons, then pack the 32 results into 32 bits (if I am right).
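A rough Python sketch of the 16-bit lookup-table idea above (my own illustration, assuming the inputs are the usual equal-length '0'/'1' strings; the packing here is still pure Python, so this shows the shape of the approach rather than C-level speed):
POPCOUNT16 = [bin(i).count('1') for i in range(1 << 16)]  # 65536-entry table

def hamming_lookup(a, b):
    total = 0
    # process the strings 16 characters (one "short") at a time
    for i in range(0, len(a), 16):
        wa = int(a[i:i + 16], 2)
        wb = int(b[i:i + 16], 2)
        total += POPCOUNT16[wa ^ wb]  # differing bits in this chunk
    return total

# e.g. hamming_lookup("0100010010", "0011100010") == 5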