I have a set of one-dimensional numpy arrays with monthly data. I need to aggregate them by quarter, creating a new array where the first item is the sum of the first 3 items of the old array, etc.
I am using this function, with x = 3:
def sumeveryxrows(myarray, x):
    return [sum(myarray[x*n : x*n+x]) for n in range(int(len(myarray) / x))]
It works, but can you think of a faster way? I profiled it, and 97% of the time is spent doing __getitem__
You could use reshape (assuming your array has a size multiple of x):
sumeveryxrows = lambda myarray, x: myarray.reshape((myarray.shape[0] // x, x)).sum(1)
The above takes less than 0.3s on an array with 30000000 values:
>>> a = numpy.random.rand(30000000)
>>> cProfile.run('sumeveryxrows(a, 3)')
8 function calls in 0.263 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.258 0.258 <stdin>:1(<lambda>)
1 0.005 0.005 0.263 0.263 <string>:1(<module>)
1 0.000 0.000 0.258 0.258 _methods.py:31(_sum)
1 0.000 0.000 0.263 0.263 {built-in method exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.258 0.258 0.258 0.258 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
1 0.000 0.000 0.258 0.258 {method 'sum' of 'numpy.ndarray' objects}
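One caveat with the reshape trick is that it fails when the array length isn't an exact multiple of x. A small sketch (my own variation, not from the answer above) that trims the ragged tail before reshaping:

```python
import numpy as np

def sum_every_x(myarray, x):
    """Sum consecutive groups of x elements; any leftover tail is dropped."""
    n = (len(myarray) // x) * x           # largest multiple of x that fits
    return myarray[:n].reshape(-1, x).sum(axis=1)

monthly = np.arange(12, dtype=float)      # twelve "months": 0.0 .. 11.0
quarterly = sum_every_x(monthly, 3)
print(quarterly)                          # [ 3. 12. 21. 30.]
```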
another solution may be
def sumeveryxrows(myarray, x):
    return [sum(myarray[n : n+x]) for n in xrange(0, len(myarray), x)]
This is for Python 2.x. If you're using Python 3, replace xrange with range.
xrange uses an iterator rather than generating an entire list.
You can also specify a step. This removes the need to use multiplication.
Then of course there is always the non-python way to do it (specifically for 3).
def sumevery3rows(a):
    i = 0
    ret = []
    stop = len(a) - 2
    while i < stop:
        ret.append(a[i] + a[i+1] + a[i+2])
        i += 3
    if i != len(a):
        ret.append(sum(a[i:len(a)]))
    return ret
I don't know how well this performs, and an implementation for variable x would probably make any benefits of this solution non-existent.
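Since the inputs are NumPy arrays anyway, `np.add.reduceat` is another option worth mentioning (not in the answers above): it handles any group size and lets the last group absorb a ragged tail automatically:

```python
import numpy as np

def sum_groups(a, x):
    # reduceat sums between successive start indices; the final group
    # runs to the end of the array, so leftovers are kept, not dropped
    return np.add.reduceat(a, np.arange(0, len(a), x))

a = np.arange(10)              # 0 .. 9, length not a multiple of 3
print(sum_groups(a, 3))        # [ 3 12 21  9]  (last group is just the 9)
```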
I have this code to find the Pythagorean triplets, which works fine.
I just want to make it faster. On my Intel i5-1135G7 it takes about 0.127 seconds. Maybe it could be done using the multiprocessing module, as my CPU is not 100% utilized.
import math
import time

results = []
start_time = time.time()

def triplets4(n):
    for a in range(n):
        for b in range(a, n):
            c = math.sqrt(a * a + b * b)
            if c.is_integer() and c <= n:
                results.append([a, b, int(c)])

triplets4(1000)
end_time = time.time()
for x in results:
    print(x)
print(end_time - start_time)  # print time elapsed
I'm not a Python programmer, but after spotting a few opportunities I made some modifications for performance. Python is not a great language for speed except when using optimised library functions; for real speed you need something you can compile. Multi-threading can make your program faster but often requires a major refactoring of the code. I excluded a = 0 from the outputs, as I don't think those count as triples. Anywho, this is what I did, which runs all of 20%(!) faster on my computer, even after removing a = 0 from your script too. While the performance gain isn't great, I think some lessons can be learnt from the changes I made.
import math
import time

results = []
start_time = time.time()

def triplets4(n):
    n2 = n * n
    for a in range(1, n):
        a2 = a * a
        blim = 1 + int(math.sqrt(n2 - a2))
        for b in range(a + 1, blim):
            arg = a2 + b * b
            c = math.sqrt(arg)
            if c.is_integer():
                results.append([a, b, int(c)])

triplets4(1000)
end_time = time.time()
for x in results:
    print(x)
print(end_time - start_time)  # print time elapsed
Short Answer
One simple way is to add a check for (a * b) % 12 != 0 before running sqrt or is_integer.
According to the Wolfram page on Pythagorean Triplets, the product of the legs (a and b) of a Pythagorean triplet/triangle is always divisible by 12.
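A quick brute-force check of that property (a sanity test over small legs, not a proof), using `math.isqrt` from Python 3.8+ for an exact perfect-square test:

```python
import math

# every Pythagorean triple (a, b, c) with a*a + b*b == c*c
# should have its product of legs a*b divisible by 12
for a in range(1, 100):
    for b in range(a, 100):
        s = a * a + b * b
        c = math.isqrt(s)
        if c * c == s:                       # (a, b, c) is a triple
            assert (a * b) % 12 == 0, (a, b, c)
print("holds for all legs below 100")
```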
Long Answer
We should profile your code first. Just to keep things simple, I've removed the print statements and timers, so we're working with this version of your code:
import math

results = []

def triplets4(n):
    for a in range(n):
        for b in range(a, n):
            c = math.sqrt(a * a + b * b)
            if c.is_integer() and c <= n:
                results.append([a, b, int(c)])

triplets4(1000)
Using cProfile:
python -m cProfile .\pythagorean.py
1002950 function calls in 0.270 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.270 0.270 {built-in method builtins.exec}
1 0.000 0.000 0.270 0.270 pythagorean.py:1(<module>)
1 0.175 0.175 0.269 0.269 pythagorean.py:5(triplets4)
500500 0.051 0.000 0.051 0.000 {built-in method math.sqrt}
500500 0.043 0.000 0.043 0.000 {method 'is_integer' of 'float' objects}
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1167(_find_and_load)
1881 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
Most of the time is spent on math.sqrt and is_integer.
So, to speed up your code we should just make fewer calls to those functions.
The simplest way I can think of is a divisibility test.
import math

results = []

def triplets4(n):
    for a in range(n):
        for b in range(a, n):
            # product of legs must be divisible by 12
            if (a * b) % 12 != 0:
                continue
            c = math.sqrt(a * a + b * b)
            if c.is_integer() and c <= n:
                results.append([a, b, int(c)])

triplets4(1000)
python -m cProfile .\pythagorean_v2.py
280672 function calls in 0.174 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.174 0.174 {built-in method builtins.exec}
1 0.000 0.000 0.174 0.174 pythagorean_v2.py:1(<module>)
1 0.135 0.135 0.174 0.174 pythagorean_v2.py:5(triplets4)
139361 0.021 0.000 0.021 0.000 {built-in method math.sqrt}
139361 0.018 0.000 0.018 0.000 {method 'is_integer' of 'float' objects}
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1167(_find_and_load)
1881 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
In both versions we're still appending the same number of triplets (1881), so that's a good sign.
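A further option, beyond what's in this answer: on Python 3.8+ you can avoid floating point entirely with `math.isqrt`, which returns an exact integer square root, so the `sqrt` and `is_integer` calls disappear from the profile altogether:

```python
import math

def triplets_isqrt(n):
    results = []
    for a in range(1, n):
        for b in range(a, n):
            s = a * a + b * b
            c = math.isqrt(s)              # exact integer square root
            if c * c == s and c <= n:      # perfect square => valid triple
                results.append([a, b, c])
    return results

small = triplets_isqrt(20)
print([3, 4, 5] in small, [5, 12, 13] in small)   # True True
```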
I tried to reproduce the functionality of IPython's %timeit, but for some strange reason the results for one function are horrific.
IPython:
In [11]: from random import shuffle
....: import numpy as np
....: def numpy_seq_el_rank(seq, el):
....: return sum(seq < el)
....:
....: seq = np.array(xrange(10000))
....: shuffle(seq)
....:
In [12]: %timeit numpy_seq_el_rank(seq, 10000//2)
10000 loops, best of 3: 46.1 µs per loop
Python:
from timeit import timeit, repeat

def my_timeit(code, setup, rep, loops):
    result = repeat(code, setup=setup, repeat=rep, number=loops)
    return '%d loops, best of %d: %0.9f sec per loop' % (loops, rep, min(result))
np_setup = '''
from random import shuffle
import numpy as np
def numpy_seq_el_rank(seq, el):
    return sum(seq < el)
seq = np.array(xrange(10000))
shuffle(seq)
'''
np_code = 'numpy_seq_el_rank(seq, 10000//2)'
print 'Numpy seq_el_rank:\n\t%s'%my_timeit(code=np_code, setup=np_setup, rep=3, loops=100)
And its output:
Numpy seq_el_rank:
100 loops, best of 3: 1.655324947 sec per loop
As you can see, in plain Python I ran 100 loops instead of the 10000 in IPython (and got a result about 35000 times slower), because it takes a really long time. Can anybody explain why the Python result is so slow?
UPD:
Here is cProfile.run('my_timeit(code=np_code, setup=np_setup, rep=3, loops=10000)') output:
30650 function calls in 4.987 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 4.987 4.987 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 <timeit-src>:2(<module>)
3 0.001 0.000 4.985 1.662 <timeit-src>:2(inner)
300 0.006 0.000 4.961 0.017 <timeit-src>:7(numpy_seq_el_rank)
1 0.000 0.000 4.987 4.987 Lab10.py:47(my_timeit)
3 0.019 0.006 0.021 0.007 random.py:277(shuffle)
1 0.000 0.000 0.002 0.002 timeit.py:121(__init__)
3 0.000 0.000 4.985 1.662 timeit.py:185(timeit)
1 0.000 0.000 4.985 4.985 timeit.py:208(repeat)
1 0.000 0.000 4.987 4.987 timeit.py:239(repeat)
2 0.000 0.000 0.000 0.000 timeit.py:90(reindent)
3 0.002 0.001 0.002 0.001 {compile}
3 0.000 0.000 0.000 0.000 {gc.disable}
3 0.000 0.000 0.000 0.000 {gc.enable}
3 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
3 0.000 0.000 0.000 0.000 {isinstance}
3 0.000 0.000 0.000 0.000 {len}
3 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
29997 0.001 0.000 0.001 0.000 {method 'random' of '_random.Random' objects}
2 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects}
1 0.000 0.000 0.000 0.000 {min}
3 0.003 0.001 0.003 0.001 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {range}
300 4.955 0.017 4.955 0.017 {sum}
6 0.000 0.000 0.000 0.000 {time.clock}
Well, one issue is that you're misreading the results. ipython is telling you how long each of the 10,000 iterations took, for the set of 10,000 iterations with the lowest total time. timeit.repeat is reporting how long the whole round of 100 iterations took (again, for the shortest of three). So the real discrepancy is 46.1 µs per loop (ipython) vs. 16.5 ms per loop (python): still a factor of ~350x difference, but not 35,000x.
You didn't show profiling results for ipython. Is it possible that in your ipython session, you did either from numpy import sum or from numpy import *? If so, you'd have been timing the numpy.sum (which is optimized for numpy arrays and would run several orders of magnitude faster), while your python code (which isolated the globals in a way that ipython does not) ran the normal sum (that has to convert all the values to Python ints and sum them).
If you check your profiling output, virtually all of your work is being done in sum; if that part of your code was sped up by several orders of magnitude, the total time would reduce similarly. That would explain the "real" discrepancy; in the test case linked above, it was a 40x difference, and that was for a smaller array (the smaller the array, the less numpy can "show off") with more complex values (vs. summing 0s and 1s here I believe).
The remainder (if any) is probably an issue of how the code is being evaled slightly differently, or possibly weirdness with the random shuffle (for consistent tests, you'd want to seed random with a consistent seed to make the "randomness" repeatable) but I doubt that's a difference of more than a few percent.
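The builtin-vs-numpy `sum` gap is easy to reproduce (a sketch of the effect, not the OP's exact session):

```python
import timeit
import numpy as np

seq = np.arange(10000)

# Both count the same 5000 elements below the threshold...
assert sum(seq < 5000) == (seq < 5000).sum() == 5000

# ...but the builtin sum iterates the boolean array element by element
# in Python, while the ndarray method reduces entirely in C.
builtin = timeit.timeit(lambda: sum(seq < 5000), number=20)
vector = timeit.timeit(lambda: (seq < 5000).sum(), number=20)
print(builtin > vector)   # the numpy reduction is typically far faster
```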
There could be any number of reasons this code is running slower in one implementation of python than another. One may be optimized differently than another, one may pre-compile certain parts while the other is fully interpreted. The only way to figure out why is to profile your code.
https://docs.python.org/2/library/profile.html
import cProfile
cProfile.run('repeat(code, setup=setup, repeat=rep, number=loops)')
Will give a result similar to
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <stdin>:1(testing)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}
Which shows you when function calls were made, how many times they were made and how long they took.
Using cProfile:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 17.834 17.834 <string>:1(<module>)
1 0.007 0.007 17.834 17.834 basher.py:5551(_refresh)
1 0.000 0.000 10.522 10.522 basher.py:1826(RefreshUI)
4 0.024 0.006 10.517 2.629 basher.py:961(PopulateItems)
211 1.494 0.007 7.488 0.035 basher.py:1849(PopulateItem)
231 0.074 0.000 6.734 0.029 {method 'sort' of 'list' objects}
215 0.002 0.000 6.688 0.031 bosh.py:4764(getOrdered)
1910 3.039 0.002 6.648 0.003 bosh.py:4770(<lambda>)
253 0.178 0.001 5.600 0.022 bosh.py:3325(getStatus)
1 0.000 0.000 5.508 5.508 bosh.py:4327(refresh)
1911 3.051 0.002 3.330 0.002 {method 'index' of 'list' objects}
The 1910 3.039 0.002 6.648 0.003 bosh.py:4770(<lambda>) line puzzles me. At bosh.py:4770 I have modNames.sort(key=lambda a: (a in data) and data.index(a)), data and modNames being lists. Notice 1911 3.051 0.002 3.330 0.002 {method 'index' of 'list' objects} which seems to come from this line.
So why is this so slow ? Any way I can rewrite this sort() so it performs faster ?
EDIT: a final ingredient I was missing to grok this lambda:
>>> True and 3
3
As YardGlassOfCode stated, it's not the lambda per se which is slow, it is the O(n) operation inside the lambda which is slow. Both a in data and data.index(a) are O(n) operations, where n is the length of data. And as an additional affront to efficiency, the call to index repeats much of the work done in a in data too. If the items in data are hashable, then you can speed this up considerably by first preparing a dict:
weight = dict(zip(data, range(len(data))))
modNames.sort(key=weight.get) # Python2, or
modNames.sort(key=lambda a: weight.get(a, -1)) # works in Python3
This is much quicker because each dict lookup is a O(1) operation.
Note that modNames.sort(key=weight.get) relies on None comparing as less than integers:
In [39]: None < 0
Out[39]: True
In Python 3, None < 0 raises a TypeError. So lambda a: weight.get(a, -1) is used to return -1 when a is not in weight.
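A worked example of the dict-based key (the values for `data` and `modNames` here are stand-ins, not from the question):

```python
data = ["base", "patch", "addon"]               # desired ordering
modNames = ["addon", "missing", "base"]

weight = dict(zip(data, range(len(data))))      # name -> position, O(1) lookups
modNames.sort(key=lambda a: weight.get(a, -1))  # absent names sort first
print(modNames)                                 # ['missing', 'base', 'addon']
```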
I need to store an array of size n with values of cos(x) and sin(x); let's say

array([[cos(0.9),  sin(0.9)],
       [cos(0.35), sin(0.35)],
       ...])
The argument of each cos/sin pair is chosen at random. My code, as far as I have been improving it, is like this:
def randvector():
    """ Generates random direction for n junctions in the unitary circle """
    x = np.empty([n, 2])
    theta = 2 * np.pi * np.random.random_sample((n))
    x[:, 0] = np.cos(theta)
    x[:, 1] = np.sin(theta)
    return x
Is there a shorter way or more effective way to achieve this?
Your code is effective enough, and justhalf's answer is not bad, I think.
For something effective and short, how about this code?
def randvector(n):
    theta = 2 * np.pi * np.random.random_sample((n))
    return np.vstack((np.cos(theta), np.sin(theta))).T
UPDATE
Append cProfile result.
justhalf's
5 function calls in 4.707 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 4.707 4.707 <string>:1(<module>)
1 2.452 2.452 4.706 4.706 test.py:6(randvector1)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.010 0.010 0.010 0.010 {method 'random_sample' of 'mtrand.RandomState' objects}
1 2.244 2.244 2.244 2.244 {numpy.core.multiarray.array}
OP's
5 function calls in 0.088 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.088 0.088 <string>:1(<module>)
1 0.079 0.079 0.088 0.088 test.py:9(randvector2)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.009 0.009 0.009 0.009 {method 'random_sample' of 'mtrand.RandomState' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty}
mine
21 function calls in 0.087 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.087 0.087 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 numeric.py:322(asanyarray)
1 0.000 0.000 0.002 0.002 shape_base.py:177(vstack)
2 0.000 0.000 0.000 0.000 shape_base.py:58(atleast_2d)
1 0.076 0.076 0.087 0.087 test.py:17(randvector3)
6 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {map}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.009 0.009 0.009 0.009 {method 'random_sample' of 'mtrand.RandomState' objects}
2 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.002 0.002 0.002 0.002 {numpy.core.multiarray.concatenate}
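For what it's worth, `np.column_stack` expresses the same stacking as the vstack version without the transpose; a minor variation of my own, not from the answers above:

```python
import numpy as np

def randvector_cs(n):
    # same random angles as before, stacked column-wise directly
    theta = 2 * np.pi * np.random.random_sample(n)
    return np.column_stack((np.cos(theta), np.sin(theta)))

v = randvector_cs(5)
print(v.shape)                                   # (5, 2)
print(np.allclose((v ** 2).sum(axis=1), 1.0))    # rows are unit vectors: True
```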
Your code already looks fine to me, but here are a few more thoughts.
Here's a one-liner.
It is marginally slower than your version.
def randvector2(n):
    return np.exp((2.0j * np.pi) * np.random.rand(n, 1)).view(dtype=np.float64)
I get these timings for n=10000
Yours:
1000 loops, best of 3: 716 µs per loop
my shortened version:
1000 loops, best of 3: 834 µs per loop
Now if speed is a concern, your approach is really very good.
Another answer shows how to use hstack.
That works well.
Here is another version that is just a little different from yours and is marginally faster.
def randvector3(n):
    x = np.empty([n, 2])
    theta = (2 * np.pi) * np.random.rand(n)
    np.cos(theta, out=x[:, 0])
    np.sin(theta, out=x[:, 1])
    return x
This gives me the timing:
1000 loops, best of 3: 698 µs per loop
If you have access to numexpr, the following is faster (at least on my machine).
import numexpr as ne

def randvector3(n):
    sample = np.random.rand(n, 1)
    c = 2.0j * np.pi
    return ne.evaluate('exp(c * sample)').view(dtype=np.float64)
This gives me the timing:
1000 loops, best of 3: 366 µs per loop
Honestly though, if I were writing this for anything that wasn't extremely performance intensive, I'd do pretty much the same thing you did.
It makes your intent pretty clear to the reader.
The version with hstack works well too.
Another quick note:
When I run timings for n=10, my one-line version is fastest.
When I do n=10000000, the fast pure-numpy version is fastest.
You can use list comprehension to make the code a little bit shorter:
def randvector(n):
    return np.array([(np.cos(theta), np.sin(theta)) for theta in 2 * np.pi * np.random.random_sample(n)])
But, as IanH mentioned in comments, this is slower. In fact, through my experiment, this is 5x slower, because this doesn't take advantage of NumPy vectorization.
So to answer your question:
Is there a shorter way?
Yes, which is what I give in this answer, although it's only shorter by a few characters (but it saves many lines!)
Is there a more effective (I believe you meant "efficient") way?
I believe the answer to this question, without overly complicating the code, is no, since numpy already optimizes the vectorization (assigning of the cos and sin values to the array)
Timing
Comparing various methods:
OP's randvector: 0.002131 s
My randvector: 0.013218 s
mskimm's randvector: 0.003175 s
So it seems that mskimm's randvector looks good in terms of code length and efficiency =D
Here is my problem: I have a dict in python such as:
a = {1:[2, 3], 2:[1]}
I would like to output:
1, 2
1, 3
2, 1
what I am doing is
for i in a:
    for j in a[i]:
        print i, j
So is there any easier way to do this that avoids the two loops, or is this already the easiest way?
The code you have is about as good as it gets. One minor improvement might be iterating over the dictionary's items in the outer loop, rather than doing indexing:
for i, lst in a.items():  # use a.iteritems() in Python 2
    for j in lst:
        print("{}, {}".format(i, j))
A couple of alternatives using list comprehensions, if you want to avoid explicit for loops:
# Method 1
# Python 2.7
for key, value in a.iteritems():  # use a.items() for Python 3
    print "\n".join(["%d, %d" % (key, val) for val in value])

# Method 2 - a fancier way with nested list comprehensions
print "\n".join(["\n".join(["%d, %d" % (key, val) for val in value]) for key, value in a.iteritems()])
Both will output
1, 2
1, 3
2, 1
Remember that in Python, "readability counts", so ideally Blckknght's solution is what you should prefer. But purely as a proof of concept that you can rewrite your expression without an explicit double loop, here is a solution.
One caveat: if you want your code to be readable, remember that "explicit is better than implicit".
>>> from itertools import chain, cycle, izip
>>> def foo():
...     return '\n'.join('{},{}'.format(*e) for e in chain(*(izip(cycle([k]), v) for k, v in a.items())))
>>> def bar():
...     return '\n'.join("{},{}".format(i, j) for i in a for j in a[i])
>>> cProfile.run("foo()")
20 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <pyshell#240>:1(foo)
5 0.000 0.000 0.000 0.000 <pyshell#240>:2(<genexpr>)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
>>> cProfile.run("bar()")
25 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <pyshell#242>:1(bar)
11 0.000 0.000 0.000 0.000 <pyshell#242>:2(<genexpr>)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}