I have this code to find Pythagorean triplets, and it works fine.
I just want to make it faster. On my Intel i5-1135G7 it takes about 0.127 seconds. Maybe it could be done with the multiprocessing module, since my CPU is not 100% utilized.
import math
import time

results = []
start_time = time.time()

def triplets4(n):
    for a in range(n):
        for b in range(a, n):
            c = math.sqrt(a * a + b * b)
            if c.is_integer() and c <= n:
                results.append([a, b, int(c)])

triplets4(1000)
end_time = time.time()

for x in results:
    print(x)
print(end_time - start_time)  # print time elapsed
I'm not a Python programmer, but after spotting a few opportunities I made some modifications for performance. Python is not a great language to use for speed except when you lean on optimised library functions; for real performance you want something you can compile. Multithreading or multiprocessing can make your program faster, but it often requires a major refactoring of the code (a rough sketch of the multiprocessing route follows the code below). I excluded a = 0 from the output, as I don't think those count as triples. Anyway, this is what I did; it runs all of 20% (!) faster on my computer, even after removing a = 0 from your script as well. The performance gain isn't great, but I think some lessons can be learnt from the changes I made.
import math
import time

results = []
start_time = time.time()

def triplets4(n):
    n2 = n * n
    for a in range(1, n):
        a2 = a * a
        blim = 1 + int(math.sqrt(n2 - a2))
        for b in range(a + 1, blim):
            arg = a2 + b * b
            c = math.sqrt(arg)
            if c.is_integer():
                results.append([a, b, int(c)])

triplets4(1000)
end_time = time.time()

for x in results:
    print(x)
print(end_time - start_time)  # print time elapsed
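Since the question mentions multiprocessing, here is a rough sketch (not something I timed) of how the outer loop over a could be farmed out to worker processes, mirroring the improved loop above. The helper names and the per-a chunking are just one possible arrangement, and for n = 1000 the process start-up overhead may well eat any gain.

import math
from multiprocessing import Pool

def triplets_for_a(args):
    # Hypothetical helper: all (a, b, c) triples for one fixed value of a.
    a, n = args
    out = []
    a2 = a * a
    blim = 1 + int(math.sqrt(n * n - a2))
    for b in range(a + 1, blim):
        c = math.sqrt(a2 + b * b)
        if c.is_integer():
            out.append([a, b, int(c)])
    return out

def triplets_parallel(n, workers=4):
    # Farm each value of a out to a pool of worker processes, then flatten the results.
    with Pool(workers) as pool:
        chunks = pool.map(triplets_for_a, [(a, n) for a in range(1, n)])
    return [t for chunk in chunks for t in chunk]

if __name__ == '__main__':
    print(len(triplets_parallel(1000)))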
Short Answer
One simple way is to add a check for (a * b) % 12 != 0 before running sqrt or is_integer.
According to the Wolfram page on Pythagorean Triplets, the product of the legs (a and b) of a Pythagorean triplet/triangle is always divisible by 12.
Long Answer
We should profile your code first. Just to keep things simple, I've removed the print statements and timers, so we're working with this version of your code:
import math

results = []

def triplets4(n):
    for a in range(n):
        for b in range(a, n):
            c = math.sqrt(a * a + b * b)
            if c.is_integer() and c <= n:
                results.append([a, b, int(c)])

triplets4(1000)
Using cProfile:
python -m cProfile .\pythagorean.py
1002950 function calls in 0.270 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.270 0.270 {built-in method builtins.exec}
1 0.000 0.000 0.270 0.270 pythagorean.py:1(<module>)
1 0.175 0.175 0.269 0.269 pythagorean.py:5(triplets4)
500500 0.051 0.000 0.051 0.000 {built-in method math.sqrt}
500500 0.043 0.000 0.043 0.000 {method 'is_integer' of 'float' objects}
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1167(_find_and_load)
1881 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
Most of the time is spent on math.sqrt and is_integer.
So, to speed up your code we should just make fewer calls to those functions.
The simplest way I can think of is a divisibility test.
import math

results = []

def triplets4(n):
    for a in range(n):
        for b in range(a, n):
            # the product of the legs must be divisible by 12
            if (a * b) % 12 != 0:
                continue
            c = math.sqrt(a * a + b * b)
            if c.is_integer() and c <= n:
                results.append([a, b, int(c)])

triplets4(1000)
python -m cProfile .\pythagorean_v2.py
280672 function calls in 0.174 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.174 0.174 {built-in method builtins.exec}
1 0.000 0.000 0.174 0.174 pythagorean_v2.py:1(<module>)
1 0.135 0.135 0.174 0.174 pythagorean_v2.py:5(triplets4)
139361 0.021 0.000 0.021 0.000 {built-in method math.sqrt}
139361 0.018 0.000 0.018 0.000 {method 'is_integer' of 'float' objects}
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1167(_find_and_load)
1881 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
In both versions we're still appending the same number of triplets (1881), so that's a good sign.
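If you want more reassurance than the matching count, a quick sanity check is to run both versions into separate lists and compare them directly. This is only a sketch: triplets4_v2 is a hypothetical rename of the divisibility-test version, since both scripts use the same function name and the same global results list.

results = []
triplets4(1000)       # original version appends into the global `results`
original = results

results = []          # rebind the global so the second run starts fresh
triplets4_v2(1000)    # hypothetical name for the divisibility-test version
assert original == results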
Related
Example:
import cProfile, random, copy
def foo(lIn): return [i*i for i in lIn]
lIn = [random.random() for i in range(1000000)]
lIn1 = copy.copy(lIn)
lIn2 = sorted(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn2)')
Result:
3 function calls in 0.075 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.005 0.005 0.075 0.075 :1()
1 0.070 0.070 0.070 0.070 test.py:716(foo)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
3 function calls in 0.143 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.006 0.006 0.143 0.143 :1()
1 0.137 0.137 0.137 0.137 test.py:716(foo)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Not really an answer yet, but the comment margin is a bit too small for this.
As random.shuffle() would always produce a fully shuffled array, I decided to implement my own shuffle function so I could vary how many swaps it performs (in the example below, that's the argument to xrange, 300000).
def my_shuffle(array):
    for _ in xrange(300000):
        rand1 = random.randint(0, 999999)
        rand2 = random.randint(0, 999999)
        array[rand1], array[rand2] = array[rand2], array[rand1]
The other code is pretty much unmodified:
import cProfile, random, copy
def foo(lIn): return [i*i for i in lIn]
lIn = [random.random()*100000 for i in range(1000000)]
lIn1 = copy.copy(lIn)
my_shuffle(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn1)')
The results I got for the second cProfile depended on the number of times I shuffled:
10000 0.062
100000 0.082
200000 0.099
400000 0.122
800000 0.137
8000000 0.141
10000000 0.141
100000000 0.248
It looks like the more you mess an array up, the longer operations take, up to a certain point. (I'm not sure about the last result; it took so long that I was doing other light work in the background and don't really want to retry it.)
I tried to reproduce the behaviour of IPython's %timeit, but for some strange reason the results for one of my functions are horrific.
IPython:
In [11]: from random import shuffle
....: import numpy as np
....: def numpy_seq_el_rank(seq, el):
....: return sum(seq < el)
....:
....: seq = np.array(xrange(10000))
....: shuffle(seq)
....:
In [12]: %timeit numpy_seq_el_rank(seq, 10000//2)
10000 loops, best of 3: 46.1 µs per loop
Python:
from timeit import timeit, repeat

def my_timeit(code, setup, rep, loops):
    result = repeat(code, setup=setup, repeat=rep, number=loops)
    return '%d loops, best of %d: %0.9f sec per loop' % (loops, rep, min(result))

np_setup = '''
from random import shuffle
import numpy as np
def numpy_seq_el_rank(seq, el):
    return sum(seq < el)
seq = np.array(xrange(10000))
shuffle(seq)
'''

np_code = 'numpy_seq_el_rank(seq, 10000//2)'

print 'Numpy seq_el_rank:\n\t%s' % my_timeit(code=np_code, setup=np_setup, rep=3, loops=100)
And its output:
Numpy seq_el_rank:
100 loops, best of 3: 1.655324947 sec per loop
As you can see, in plain Python I ran only 100 loops instead of the 10,000 used in IPython (and still got a per-loop result about 35,000 times slower), because it takes such a long time. Can anybody explain why the Python result is so slow?
UPD:
Here is cProfile.run('my_timeit(code=np_code, setup=np_setup, rep=3, loops=10000)') output:
30650 function calls in 4.987 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 4.987 4.987 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 <timeit-src>:2(<module>)
3 0.001 0.000 4.985 1.662 <timeit-src>:2(inner)
300 0.006 0.000 4.961 0.017 <timeit-src>:7(numpy_seq_el_rank)
1 0.000 0.000 4.987 4.987 Lab10.py:47(my_timeit)
3 0.019 0.006 0.021 0.007 random.py:277(shuffle)
1 0.000 0.000 0.002 0.002 timeit.py:121(__init__)
3 0.000 0.000 4.985 1.662 timeit.py:185(timeit)
1 0.000 0.000 4.985 4.985 timeit.py:208(repeat)
1 0.000 0.000 4.987 4.987 timeit.py:239(repeat)
2 0.000 0.000 0.000 0.000 timeit.py:90(reindent)
3 0.002 0.001 0.002 0.001 {compile}
3 0.000 0.000 0.000 0.000 {gc.disable}
3 0.000 0.000 0.000 0.000 {gc.enable}
3 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
3 0.000 0.000 0.000 0.000 {isinstance}
3 0.000 0.000 0.000 0.000 {len}
3 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
29997 0.001 0.000 0.001 0.000 {method 'random' of '_random.Random' objects}
2 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects}
1 0.000 0.000 0.000 0.000 {min}
3 0.003 0.001 0.003 0.001 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {range}
300 4.955 0.017 4.955 0.017 {sum}
6 0.000 0.000 0.000 0.000 {time.clock}
Well, one issue is that you're misreading the results. IPython is telling you how long each of the 10,000 iterations took, for the set of 10,000 iterations with the lowest total time. The timeit.repeat call is reporting how long the whole round of 100 iterations took (again, for the shortest of three). So the real discrepancy is 46.1 µs per loop (IPython) vs. 16.5 ms per loop (Python), still a factor of ~350x, but not 35,000x.
You didn't show profiling results for ipython. Is it possible that in your ipython session, you did either from numpy import sum or from numpy import *? If so, you'd have been timing the numpy.sum (which is optimized for numpy arrays and would run several orders of magnitude faster), while your python code (which isolated the globals in a way that ipython does not) ran the normal sum (that has to convert all the values to Python ints and sum them).
If you check your profiling output, virtually all of your work is being done in sum; if that part of your code were sped up by several orders of magnitude, the total time would shrink similarly. That would explain the "real" discrepancy; in a comparable test case it was a 40x difference, and that was for a smaller array (the smaller the array, the less numpy can "show off") with more complex values (vs. summing 0s and 1s here, I believe).
The remainder (if any) is probably an issue of how the code is being evaled slightly differently, or possibly weirdness with the random shuffle (for consistent tests, you'd want to seed random with a consistent seed to make the "randomness" repeatable) but I doubt that's a difference of more than a few percent.
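If you want to see that sum difference directly, here is a small, hedged comparison (Python 3 syntax; the exact numbers will vary by machine):

import timeit
import numpy as np

seq = np.arange(10000)
el = 10000 // 2

# Built-in sum iterates element by element, boxing every value as a Python object.
t_builtin = timeit.timeit(lambda: sum(seq < el), number=1000)

# numpy.sum reduces the boolean array in C, without boxing.
t_numpy = timeit.timeit(lambda: np.sum(seq < el), number=1000)

print(t_builtin, t_numpy)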
There could be any number of reasons this code is running slower in one implementation of python than another. One may be optimized differently than another, one may pre-compile certain parts while the other is fully interpreted. The only way to figure out why is to profile your code.
https://docs.python.org/2/library/profile.html
import cProfile
cProfile.run('repeat(code, setup=setup, repeat=rep, number=loops)')
This will give a result similar to:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <stdin>:1(testing)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}
This shows which functions were called, how many times they were called, and how long each took.
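If the raw dump is too noisy, the pstats module can sort and trim it. A minimal sketch, reusing the names from the call above ('profile.out' is just an example filename):

import cProfile
import pstats

# Write the profile to a file instead of printing it immediately.
cProfile.run('repeat(code, setup=setup, repeat=rep, number=loops)', 'profile.out')

stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(10)  # the 10 most expensive entries by cumulative time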
I have a set of one-dimensional numpy arrays with monthly data. I need to aggregate them by quarter, creating a new array where the first item is the sum of the first 3 items of the old array, etc.
I am using this function, with x = 3:
def sumeveryxrows(myarray, x):
    return [sum(myarray[x*n : x*n+x]) for n in range(int(len(myarray) / x))]
It works, but can you think of a faster way? I profiled it, and 97% of the time is spent doing __getitem__
You could use reshape (assuming your array's size is a multiple of x):
sumeveryxrows = lambda myarray, x: myarray.reshape((myarray.shape[0] / x, x)).sum(1)
The above takes less than .3s on an array with 30000000 values:
>>> a = numpy.random.rand(30000000)
>>> cProfile.run('sumeveryxrows(a, 3)')
8 function calls in 0.263 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.258 0.258 <stdin>:1(<lambda>)
1 0.005 0.005 0.263 0.263 <string>:1(<module>)
1 0.000 0.000 0.258 0.258 _methods.py:31(_sum)
1 0.000 0.000 0.263 0.263 {built-in method exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.258 0.258 0.258 0.258 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.258 0.258 {method 'sum' of 'numpy.ndarray' objects}
Another solution may be:
def sumeveryxrows(myarray, x):
    return [sum(myarray[n : n+x]) for n in xrange(0, len(myarray), x)]
This is for Python 2.x; if you're using Python 3, replace xrange with range.
xrange uses an iterator rather than generating an entire list.
You can also specify a step, which removes the need for the multiplication.
Then of course there is always the non-Pythonic way to do it (written specifically for x = 3).
def sumevery3rows(a):
    i = 0
    ret = []
    stop = len(a) - 2
    while i < stop:
        ret.append(a[i] + a[i+1] + a[i+2])
        i += 3
    if i != len(a):
        ret.append(sum(a[i:len(a)]))
    return ret
I don't know how well this performs, and an implementation for variable x would probably make any benefits of this solution non-existent.
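Since myarray is a NumPy array anyway, another option worth a look (a sketch, assuming one-dimensional input) is np.add.reduceat, which sums between the given start indices and also copes with a trailing partial group:

import numpy as np

def sumeveryxrows_reduceat(myarray, x):
    # Sum each block of x consecutive elements; a final shorter block is summed too.
    return np.add.reduceat(myarray, np.arange(0, len(myarray), x))

a = np.arange(10)
print(sumeveryxrows_reduceat(a, 3))  # [ 3 12 21  9]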
I am comparing the performance of filter and map using cProfile:
cProfile.run("""
s = [range(10000) for i in range(10000)]
filter(None, map(lambda x:x[0], s))""")
20005 function calls in 42.272 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.467 2.467 42.272 42.272 <string>:2(<module>)
10000 0.004 0.000 0.004 0.000 <string>:3(<lambda>)
1 0.000 0.000 0.000 0.000 {filter}
1 0.201 0.201 0.205 0.205 {map}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10001 39.599 0.004 39.599 0.004 {range}
From the above analysis I can see that filter is called once but takes essentially no time. Why doesn't it take any time?
If there is a function call, there will be some time spent making that call.
That time might be tiny, well under a millisecond.
In your cProfile output, the smallest unit of time displayed is a millisecond.
If a function takes less than a millisecond, cProfile shows it as 0.000.
That means filter here takes less than a millisecond; I confirmed this by testing your code in the following way:
import datetime
l= [range(4) for i in range(4)]
a = datetime.datetime.now()
filter(None, l)
b = datetime.datetime.now()
c = b - a
print c.seconds
print c.microseconds
Output on my system
0
35
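A steadier way to measure a call this short is timeit, which runs the statement many times so the per-call cost stands out from the noise. A sketch (for Python 2, where filter builds the list eagerly; the loop counts are just examples):

import timeit

setup = "l = [range(4) for i in range(4)]"
# Best of 5 rounds of 100000 calls, converted to seconds per single call.
per_call = min(timeit.repeat("filter(None, l)", setup=setup, repeat=5, number=100000)) / 100000
print(per_call)  # typically well under a millisecond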
I need to store an array of size n with values of cos(x) and sin(x), let's say

array([[cos(0.9),  sin(0.9)],
       [cos(0.35), sin(0.35)],
       ...])

The argument of each cos/sin pair is chosen at random. My code, as far as I have improved it, looks like this:
def randvector():
    """ Generates random directions for n junctions on the unit circle """
    x = np.empty([n, 2])
    theta = 2 * np.pi * np.random.random_sample((n))
    x[:, 0] = np.cos(theta)
    x[:, 1] = np.sin(theta)
    return x
Is there a shorter way or more effective way to achieve this?
Your code is already effective enough, and justhalf's answer is not bad, I think.
For something both effective and short, how about this code?
def randvector(n):
    theta = 2 * np.pi * np.random.random_sample((n))
    return np.vstack((np.cos(theta), np.sin(theta))).T
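A quick sanity check of the output (illustrative interpreter session; each row should be a unit vector):

>>> v = randvector(4)
>>> v.shape
(4, 2)
>>> np.allclose((v ** 2).sum(axis=1), 1.0)
True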
UPDATE
Here are the cProfile results.
justhalf's
5 function calls in 4.707 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 4.707 4.707 <string>:1(<module>)
1 2.452 2.452 4.706 4.706 test.py:6(randvector1)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.010 0.010 0.010 0.010 {method 'random_sample' of 'mtrand.RandomState' objects}
1 2.244 2.244 2.244 2.244 {numpy.core.multiarray.array}
OP's
5 function calls in 0.088 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.088 0.088 <string>:1(<module>)
1 0.079 0.079 0.088 0.088 test.py:9(randvector2)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.009 0.009 0.009 0.009 {method 'random_sample' of 'mtrand.RandomState' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty}
mine
21 function calls in 0.087 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.087 0.087 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 numeric.py:322(asanyarray)
1 0.000 0.000 0.002 0.002 shape_base.py:177(vstack)
2 0.000 0.000 0.000 0.000 shape_base.py:58(atleast_2d)
1 0.076 0.076 0.087 0.087 test.py:17(randvector3)
6 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {map}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.009 0.009 0.009 0.009 {method 'random_sample' of 'mtrand.RandomState' objects}
2 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.002 0.002 0.002 0.002 {numpy.core.multiarray.concatenate}
Your code already looks fine to me, but here are a few more thoughts.
Here's a one-liner.
It is marginally slower than your version.
def randvector2(n):
    return np.exp((2.0j * np.pi) * np.random.rand(n, 1)).view(dtype=np.float64)
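The trick here is that np.exp(1j * theta) equals cos(theta) + 1j * sin(theta), and a complex128 value is laid out in memory as two adjacent float64 values (real part, then imaginary part), so viewing the (n, 1) complex array as float64 gives an (n, 2) array whose columns are exactly the cosines and sines.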
I get these timings for n=10000
Yours:
1000 loops, best of 3: 716 µs per loop
my shortened version:
1000 loops, best of 3: 834 µs per loop
Now if speed is a concern, your approach is really very good.
Another answer shows how to use hstack.
That works well.
Here is another version that is just a little different from yours and is marginally faster.
def randvector3(n):
    x = np.empty([n, 2])
    theta = (2 * np.pi) * np.random.rand(n)
    np.cos(theta, out=x[:, 0])
    np.sin(theta, out=x[:, 1])
    return x
This gives me the timing:
1000 loops, best of 3: 698 µs per loop
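The small gain presumably comes from the out= arguments: the cosines and sines are written straight into the columns of x, so no temporary arrays are allocated and no extra copy is needed afterwards.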
If you have access to numexpr, the following is faster (at least on my machine).
import numexpr as ne

def randvector3(n):
    sample = np.random.rand(n, 1)
    c = 2.0j * np.pi
    return ne.evaluate('exp(c * sample)').view(dtype=np.float64)
This gives me the timing:
1000 loops, best of 3: 366 µs per loop
Honestly though, if I were writing this for anything that wasn't extremely performance intensive, I'd do pretty much the same thing you did.
It makes your intent pretty clear to the reader.
The version with hstack works well too.
Another quick note:
When I run timings for n=10, my one-line version is fastest.
When I do n=10000000, the fast pure-numpy version is fastest.
You can use a list comprehension to make the code a little bit shorter:
def randvector(n):
    return np.array([(np.cos(theta), np.sin(theta)) for theta in 2 * np.pi * np.random.random_sample(n)])
But, as IanH mentioned in comments, this is slower. In fact, through my experiment, this is 5x slower, because this doesn't take advantage of NumPy vectorization.
So to answer your question:
Is there a shorter way?
Yes, which is what I give in this answer, although it's only shorter by a few characters (but it saves many lines!)
Is there a more effective (I believe you meant "efficient") way?
I believe the answer to this question, without overly complicating the code, is no, since numpy already optimizes the vectorization (assigning of the cos and sin values to the array)
Timing
Comparing various methods:
OP's randvector: 0.002131 s
My randvector: 0.013218 s
mskimm's randvector: 0.003175 s
So it seems that mskimm's randvector looks good in terms of both code length and efficiency. =D