numpy array partial sums with weights - python

I have a numpy array, say [a,b,c,d,e,...], and would like to compute an array that looks like [x*a+y*b, x*b+y*c, x*c+y*d, ...]. My idea is to first split the original array into something like [[a,b],[b,c],[c,d],[d,e],...] and then attack this creature with np.average, specifying the axis and weights (x+y=1 in my case), or even use np.dot. Unfortunately, I don't know how to create such an array of [a,b],[b,c],... pairs. Any help, or even a completely different idea for accomplishing the main task, is much appreciated :-)

The quickest, simplest approach would be to manually extract two slices of your array and add them together:
>>> arr = np.arange(5)
>>> x, y = 10, 1
>>> x*arr[:-1] + y*arr[1:]
array([ 1, 12, 23, 34])
This will turn into a pain if you want to generalize it to triples, quadruples... But you can create your array of pairs from the original array with as_strided in a much more general form:
>>> from numpy.lib.stride_tricks import as_strided
>>> arr_pairs = as_strided(arr, shape=(len(arr)-2+1,2), strides=arr.strides*2)
>>> arr_pairs
array([[0, 1],
       [1, 2],
       [2, 3],
       [3, 4]])
Of course the nice thing about using as_strided is that, just like with the array slices, there is no data copying involved, just messing with the way memory is viewed, so creating this array is virtually costless.
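If you want windows longer than 2, the same trick generalizes. Here is a minimal sketch (the helper name sliding_window is my own; on NumPy >= 1.20, np.lib.stride_tricks.sliding_window_view does this for you):
import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_window(arr, window):
    # Zero-copy view of all length-`window` windows of a 1-D array.
    n = len(arr) - window + 1
    return as_strided(arr, shape=(n, window), strides=arr.strides * 2)

arr = np.arange(5)
print(sliding_window(arr, 3))
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]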
And now probably the fastest is to use np.dot:
>>> xy = [x, y]
>>> np.dot(arr_pairs, xy)
array([ 1, 12, 23, 34])

This looks like a correlate problem.
a
Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7])
b
Out[62]: array([1, 2])
np.correlate(a,b,mode='valid')
Out[63]: array([ 2, 5, 8, 11, 14, 17, 20])
Depending on array size and BLAS, dot can be faster; your mileage will vary greatly:
arr = np.random.rand(int(1E6))
b = np.random.rand(2)
np.allclose(jamie_dot(arr,b),np.convolve(arr,b[::-1],mode='valid'))
True
%timeit jamie_dot(arr,b)
100 loops, best of 3: 16.1 ms per loop
%timeit np.correlate(arr,b,mode='valid')
10 loops, best of 3: 28.8 ms per loop
This is with an Intel MKL BLAS and 8 cores; np.correlate will likely be faster for most setups.
Also an interesting observation from @Jaime's post:
%timeit b[0]*arr[:-1] + b[1]*arr[1:]
100 loops, best of 3: 8.43 ms per loop
His comment also replaced the np.convolve(a, b[::-1], mode='valid') call with the simpler np.correlate syntax.
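For reference, jamie_dot above is not defined in this post; it refers to the as_strided-plus-np.dot approach from the previous answer. A minimal reconstruction consistent with the timings above might be:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def jamie_dot(arr, b):
    # View the array as overlapping length-len(b) windows (no copy),
    # then weight each window with a dot product.
    pairs = as_strided(arr, shape=(len(arr) - len(b) + 1, len(b)),
                       strides=arr.strides * 2)
    return pairs.dot(b)

arr = np.random.rand(1000)
b = np.random.rand(2)
assert np.allclose(jamie_dot(arr, b), np.correlate(arr, b, mode='valid'))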

If you have a small array, I would create a shifted copy:
shifted_array=numpy.append(original_array[1:],0)
result_array=x*original_array+y*shifted_array
Here you have to store your array twice in memory, so this solution is very memory inefficient, but you can get rid of the for loops.
If you have large arrays, you really need a loop (or rather a list comprehension):
result_array=[x*original_array[i]+y*original_array[i+1] for i in range(len(original_array)-1)]
This gives you the same result as a Python list, without the bogus last item, which would have to be treated separately anyway.
Based on some random trials, for arrays smaller than 2000 items the first solution seems to be faster than the second, but it runs into MemoryError even for relatively small arrays (a few tens of thousands of items on my PC).
So in general, use a list comprehension; but if you know for sure that you will only run this on small (max. 1-2 thousand item) arrays, the first approach is a better shot.
Creating a new list like [[a,b],[b,c],[c,d],[d,e],...] would be both memory- and time-inefficient: you would also need a for loop (or similar) to create it, and every old value would be stored twice in the new structure, so you would end up storing your original array three times.

Another way is to create the right pairs in the array a = np.array([a,b,c,d,e,...]), reshape according to the size of array b = np.array([x, y, ...]) and then take advantage of numpy broadcasting rules:
a = np.arange(8)
b = np.array([1, 2])
a = a.repeat(2)[1:-1]
ans = a.reshape(-1, b.shape[0]).dot(b)
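To see what each step produces, here is the same computation with the intermediate arrays spelled out (a sketch using the arrays above):
import numpy as np

a = np.arange(8)                 # [0 1 2 3 4 5 6 7]
b = np.array([1, 2])

doubled = a.repeat(2)[1:-1]      # [0 1 1 2 2 3 3 4 4 5 5 6 6 7]
pairs = doubled.reshape(-1, b.shape[0])
# [[0 1]
#  [1 2]
#  ...
#  [6 7]]
ans = pairs.dot(b)               # [ 2  5  8 11 14 17 20]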
Timings (on my computer):
@Ophion's solution:
# 100000 loops, best of 3: 4.67 µs per loop
This solution:
# 100000 loops, best of 3: 9.78 µs per loop
So, it is slower. @Jaime's solution is also better in that, unlike this one, it does not copy the data.

Related

Vectorize a simple python loop with different numpy.roll for each row

I've searched a lot of similar questions, but none seems to answer my particular problem, so I finally decided to ask my own.
I have an array with several independent time series, and I need to perform a roll which is different for each time series. The array has dimensions a[N,L], where N is the number of time series and L is the length of each. I want to store a rolled version of each time series, with a different roll for each one, in an array b with the same dimensions; the rolls to be applied are stored in shf, an integer array of dimension N.
The loop is the following:
for i in np.arange(N):
    b[i] = np.roll(a[i], shf[i])
but given the dimensions of the arrays and the huge number of time series in my program, this loop takes a lot of time. Since every time series is independent of the others, I would like to parallelize this loop to speed up my program. Surely it is very straightforward, but I feel clumsy. Any ideas will be well received.
Your way is likely the right way. You may be able to cut out some of the overhead by using numba or similar, but it probably won't be much. If you are willing to waste a whole bunch of memory, you can pre-compute the index array that gives you the desired roll, which may speed things up if you do this computation multiple times with a static shf:
index = (np.arange(L) - shf[:, None]) % L
b = a[np.arange(N)[:, None], index]
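A quick sanity check that the precomputed index reproduces the row-wise np.roll loop (a sketch with made-up sizes):
import numpy as np

N, L = 3, 5
a = np.arange(N * L).reshape(N, L)
shf = np.array([1, 2, 0])

# b[i, j] = a[i, (j - shf[i]) % L], which is exactly np.roll(a[i], shf[i])
index = (np.arange(L) - shf[:, None]) % L
b = a[np.arange(N)[:, None], index]

expected = np.stack([np.roll(a[i], shf[i]) for i in range(N)])
assert (b == expected).all()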
In [21]: a = np.arange(12).reshape(3,4)
In [22]: a
Out[22]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [23]: np.roll(a,2,1)
Out[23]:
array([[ 2,  3,  0,  1],
       [ 6,  7,  4,  5],
       [10, 11,  8,  9]])
The code for roll is more complex because you can specify several axes, but essentially it is doing two sliced copies (per dimension). It's a whole new array with every element moved. It's not as simple as the word might sound.
In [24]: res = np.empty_like(a)
In [25]: res[:,:2] = a[:,-2:]
In [26]: res[:,-2:] = a[:,:2]
In [27]: res
Out[27]:
array([[ 2,  3,  0,  1],
       [ 6,  7,  4,  5],
       [10, 11,  8,  9]])
Doing a different roll for each row requires a different set of slices for each row. The only alternative to row-by-row iteration is to build advanced-indexing arrays for the whole array. That still requires some sort of iteration to construct, and the fancy indexing may be slower.
IIUC, this code can be written as an np.roll-equivalent parallel no-python numba function:
import numba as nb
import numpy as np

@nb.njit(parallel=True)
def numba_(a, shf):
    b = np.empty_like(a)
    rows_num = a.shape[0]
    cols_num = a.shape[1]
    for i in nb.prange(rows_num):
        n = shf[i]
        # roll row i right by n: two sliced copies, as np.roll does
        b[i, n:] = a[i, :cols_num - n]
        b[i, :n] = a[i, cols_num - n:]
    return b
which gets the same results as the OP's code but in a much faster scheme; it took 730 ms per loop on a Google Colaboratory CPU for the expected data volumes.
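A usage sketch under assumed sizes (the first call also pays the JIT compilation cost, and shf is assumed to hold non-negative shifts smaller than the row length):
import numpy as np

N, L = 1000, 500
a = np.random.rand(N, L)
shf = np.random.randint(0, L, size=N)

b = numba_(a, shf)
expected = np.stack([np.roll(a[i], shf[i]) for i in range(N)])
assert np.allclose(b, expected)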

Cannot understand numpy argpartition output

I am trying to use argpartition from numpy, but it seems something is going wrong and I cannot figure it out. Here is what's happening:
These are the first 5 elements of the sorted array norms:
np.sort(norms)[:5]
array([ 53.64759445, 54.91434479, 60.11617279, 64.09630585, 64.75318909], dtype=float32)
But when I use indices_sorted = np.argpartition(norms, 5)[:5]
norms[indices_sorted]
array([ 60.11617279, 64.09630585, 53.64759445, 54.91434479, 64.75318909], dtype=float32)
whereas I think I should get the same result as the sorted array.
It works just fine when I use 3 as the parameter indices_sorted = np.argpartition(norms, 3)[:3]
norms[indices_sorted]
array([ 53.64759445, 54.91434479, 60.11617279], dtype=float32)
This isn't making much sense to me, hoping someone can offer some insight?
EDIT: Rephrasing this question as whether argpartition preserves the order of the k partitioned elements makes more sense.
We need to pass a list of the indices that are to be kept in sorted order instead of feeding the kth param as a scalar. Thus, to maintain sorted order across the first 5 elements, instead of np.argpartition(a,5)[:5], simply do -
np.argpartition(a,range(5))[:5]
Here's a sample run to make things clear -
In [84]: a = np.random.rand(10)
In [85]: a
Out[85]:
array([ 0.85017222,  0.19406266,  0.7879974 ,  0.40444978,  0.46057793,
        0.51428578,  0.03419694,  0.47708   ,  0.73924536,  0.14437159])
In [86]: a[np.argpartition(a,5)[:5]]
Out[86]: array([ 0.19406266, 0.14437159, 0.03419694, 0.40444978, 0.46057793])
In [87]: a[np.argpartition(a,range(5))[:5]]
Out[87]: array([ 0.03419694, 0.14437159, 0.19406266, 0.40444978, 0.46057793])
Please note that argpartition makes sense performance-wise if we are looking to get sorted indices for only a small subset of elements, say k elements, where k is a small fraction of the total number of elements.
Let's use a bigger dataset and try to get sorted indices for all elements to make the above point clear -
In [51]: a = np.random.rand(10000)*100
In [52]: %timeit np.argpartition(a,range(a.size-1))[:5]
10 loops, best of 3: 105 ms per loop
In [53]: %timeit a.argsort()
1000 loops, best of 3: 893 µs per loop
Thus, to sort all elems, np.argpartition isn't the way to go.
Now, let's say I want to get sorted indices for only the first 5 elems with that big dataset and also keep the order for those -
In [68]: a = np.random.rand(10000)*100
In [69]: np.argpartition(a,range(5))[:5]
Out[69]: array([1647, 942, 2167, 1371, 2571])
In [70]: a.argsort()[:5]
Out[70]: array([1647, 942, 2167, 1371, 2571])
In [71]: %timeit np.argpartition(a,range(5))[:5]
10000 loops, best of 3: 112 µs per loop
In [72]: %timeit a.argsort()[:5]
1000 loops, best of 3: 888 µs per loop
Very useful here!
Given the task of indirectly sorting a subset (the top k, top meaning first in sort order), there are two builtin solutions: argsort and argpartition, cf. @Divakar's answer.
If, however, performance is a consideration then it may (depending on the sizes of the data and the subset of interest) be well worth resisting the "lure of the one-liner", investing one more line and applying argsort on the output of argpartition:
>>> def top_k_sort(a, k):
...     return np.argsort(a)[:k]
...
>>> def top_k_argp(a, k):
...     return np.argpartition(a, range(k))[:k]
...
>>> def top_k_hybrid(a, k):
...     b = np.argpartition(a, k)[:k]
...     return b[np.argsort(a[b])]
>>> k = 100
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_sort, 'rng': np.random.random, 'k': k})
8.348663672804832
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_argp, 'rng': np.random.random, 'k': k})
9.869098862167448
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_hybrid, 'rng': np.random.random, 'k': k})
1.2305558240041137
argsort is O(n log n), argpartition with a range argument appears to be O(nk) (?), and argpartition + argsort is O(n + k log k).
Therefore, in the interesting regime n >> k >> 1, the hybrid method is expected to be fastest.
UPDATE: ND version:
import numpy as np
from timeit import timeit

def top_k_sort(A, k, axis=-1):
    return A.argsort(axis=axis)[(*axis%A.ndim*(slice(None),), slice(k))]

def top_k_partition(A, k, axis=-1):
    return A.argpartition(range(k), axis=axis)[(*axis%A.ndim*(slice(None),), slice(k))]

def top_k_hybrid(A, k, axis=-1):
    B = A.argpartition(k, axis=axis)[(*axis%A.ndim*(slice(None),), slice(k))]
    return np.take_along_axis(B, np.take_along_axis(A, B, axis).argsort(axis), axis)

A = np.random.random((100, 10000))
k = 100

for f in globals().copy():
    if f.startswith("top_"):
        print(f, timeit(f"{f}(A,k)", globals=globals(), number=10)*100)
Sample run:
top_k_sort 63.72379460372031
top_k_partition 99.30561298970133
top_k_hybrid 10.714635509066284
Let's describe the partition method in a simplified way, which helps a lot in understanding argpartition.
Let B = numpy.partition(A, 3), i.e. A rearranged so that the element X that belongs at index 3 in sorted order is in place, with smaller elements to its left and larger ones to its right. If we execute C = numpy.argpartition(A, 3), then C is the array of positions in A of the corresponding elements of B. That is, with
Idx(z) = index of element z in array A
C would be
C = [ Idx(B[0]), Idx(B[1]), Idx(B[2]), Idx(X), Idx(B[4]), ..., Idx(B[N]) ]
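A small concrete run of that relationship (a sketch; A is made up, and the exact order within the two outer parts is implementation-defined):
import numpy as np

A = np.array([9, 1, 8, 2, 7, 3])
B = np.partition(A, 3)       # element of rank 3 (here 7) is in its sorted place
C = np.argpartition(A, 3)

print(B)                     # e.g. [2 1 3 7 8 9]
print(A[C])                  # always equals B: C[i] is Idx(B[i])
assert (A[C] == B).all()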
As previously mentioned, this method is very helpful and comes in very handy when you have a huge array and you are only interested in a selected group of ordered elements, not the whole array.

Preallocating ndarrays

How can I preallocate arrays of arrays so I can do appending a bit more efficiently? In MATLAB there is a function called cell(required_length) which preallocates 'cells' that can store arrays.
I have an array that currently looks like:
a=np.array([[1,2],[1],[2,3,4]])
b=np.array([[20,2]])
However, I wish to append thousands more arrays like the 'b' shown, but varying in size.
This isn't just a question of preallocating an array, such as np.empty((100,), dtype=int). It's as much a question about how to collect a large number of lists into one structure, whether that be a list or a numpy array. The comparison with MATLAB cells is enough, in my opinion, to warrant further discussion.
I think you should be using Python lists. They can contain lists or other objects (including arrays) of varying sizes. You can easily append more items (or use extend to add multiple objects). Python has had them forever; MATLAB added cells to approximate that flexibility.
np.arrays with dtype=object are similar - arrays of pointers to objects such as lists. For the most part they are just lists with an array wrapper. You can initialize such an array to some large size and insert/set items.
A = np.empty((10,),dtype=object)
produces an array with 10 elements, each None.
A[0] = [1,2,3]
A[1] = [2,3]
...
You can also concatenate elements to an existing array, but the result is a new one. There is an np.append function, but it is just a cover for concatenate; it should not be confused with the list append.
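A short sketch of the difference; note that np.append ravels its second argument, so it does not nest the new list the way list.append does:
import numpy as np

lst = [[1, 2], [1]]
lst.append([2, 3, 4])            # in place; the nested list is preserved
# lst -> [[1, 2], [1], [2, 3, 4]]

A = np.array([list([1, 2]), list([1])], dtype=object)
B = np.append(A, [2, 3, 4])      # a NEW array; the list is flattened in
# B -> [list([1, 2]) list([1]) 2 3 4]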
If it must be an array, you can easily construct it from the list at the end. That's what your np.array([[1,2],[1],[2,3,4]]) does.
How to add to numpy array entries of different size in a for loop (similar to Matlab's cell arrays)?
On the issue of speed, let's try simple time tests
def witharray(n):
    result = np.empty((n,), dtype=object)
    for i in range(n):
        result[i] = list(range(i))
    return result

def withlist(n):
    result = []
    for i in range(n):
        result.append(list(range(i)))
    return result
which produce
In [111]: withlist(4)
Out[111]: [[], [0], [0, 1], [0, 1, 2]]
In [112]: witharray(4)
Out[112]: array([[], [0], [0, 1], [0, 1, 2]], dtype=object)
In [113]: np.array(withlist(4))
Out[113]: array([[], [0], [0, 1], [0, 1, 2]], dtype=object)
Time tests:
In [108]: timeit withlist(400)
1000 loops, best of 3: 1.87 ms per loop
In [109]: timeit witharray(400)
100 loops, best of 3: 2.13 ms per loop
In [110]: timeit np.array(withlist(400))
100 loops, best of 3: 8.95 ms per loop
Simply constructing a list of lists is fastest. But if the result must be an object type array, then assigning values to an empty array is faster.

How to find which numpy array contains the maximum value on an element by element basis?

Given a list of numpy arrays, each with the same dimensions, how can I find which array contains the maximum value on an element-by-element basis?
e.g.
import numpy as np
def find_index_where_max_occurs(my_list):
    # d = ... something goes here ...
    return d
a=np.array([1,1,3,1])
b=np.array([3,1,1,1])
c=np.array([1,3,1,1])
my_list=[a,b,c]
array_of_indices_where_max_occurs = find_index_where_max_occurs(my_list)
# This is what I want:
# >>> print array_of_indices_where_max_occurs
# array([1,2,0,0])
# i.e. for the first element, the maximum value occurs in array b which is at index 1 in my_list.
Any help would be much appreciated... thanks!
Another option if you want an array:
>>> np.array((a, b, c)).argmax(axis=0)
array([1, 2, 0, 0])
So:
def f(my_list):
    return np.array(my_list).argmax(axis=0)
This works with multidimensional arrays, too.
For the fun of it: I realised that @Lev's original answer was faster than his generalized edit, so here is the generalized stacking version, which is much faster than the np.asarray version, but not very elegant.
np.concatenate((a[None,...], b[None,...], c[None,...]), axis=0).argmax(0)
That is:
def bystack(arrs):
    return np.concatenate([arr[None, ...] for arr in arrs], axis=0).argmax(0)
Some explanation:
I've added a new axis to each array: arr[None,...] is equivalent to arr[np.newaxis,...], which is the same as arr[np.newaxis,:,:,:], where the ... expands to the appropriate number of dimensions. The reason for this is that np.concatenate will then stack along the new dimension, which is 0 since the None is at the front.
So, for example:
In [286]: a
Out[286]:
array([[0, 1],
       [2, 3]])
In [287]: b
Out[287]:
array([[10, 11],
       [12, 13]])
In [288]: np.concatenate((a[None,...],b[None,...]),axis=0)
Out[288]:
array([[[ 0,  1],
        [ 2,  3]],

       [[10, 11],
        [12, 13]]])
In case it helps to understand, this would work too:
np.concatenate((a[...,None], b[...,None], c[...,None]), axis=a.ndim).argmax(a.ndim)
where the new axis is now added at the end, so we must stack and maximize along that last axis, which will be a.ndim. For a, b, and c being 2d, we could do this:
np.concatenate((a[:,:,None], b[:,:,None], c[:,:,None]), axis=2).argmax(2)
Which is equivalent to the dstack I mentioned in my comment above (dstack adds a third axis to stack along if it doesn't exist in the arrays).
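For completeness, a sketch of that dstack spelling with the question's 1-D arrays (dstack promotes them to shape (1, 4), so the result needs a [0]; on newer NumPy, np.stack(my_list).argmax(0) is the direct spelling):
import numpy as np

a = np.array([1, 1, 3, 1])
b = np.array([3, 1, 1, 1])
c = np.array([1, 3, 1, 1])

print(np.dstack((a, b, c)).argmax(2)[0])   # [1 2 0 0]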
To test:
N = 10
M = 2
a = np.random.random((N,)*M)
b = np.random.random((N,)*M)
c = np.random.random((N,)*M)
def bystack(arrs):
    return np.concatenate([arr[None, ...] for arr in arrs], axis=0).argmax(0)

def byarray(arrs):
    return np.array(arrs).argmax(axis=0)

def byasarray(arrs):
    return np.asarray(arrs).argmax(axis=0)

def bylist(arrs):
    assert arrs[0].ndim == 1, "ndim must be 1"
    return [np.argmax(x) for x in zip(*arrs)]
In [240]: timeit bystack((a,b,c))
100000 loops, best of 3: 18.3 us per loop
In [241]: timeit byarray((a,b,c))
10000 loops, best of 3: 89.7 us per loop
In [242]: timeit byasarray((a,b,c))
10000 loops, best of 3: 90.0 us per loop
In [259]: timeit bylist((a,b,c))
1000 loops, best of 3: 267 us per loop
[np.argmax(x) for x in zip(*my_list)]
Well, this is just a list, but you know how to make it an array if you want. :)
To explain what this does: zip(*my_list) is equivalent to zip(a,b,c), which gives you an iterator to loop over. Each step in the loop gives you a tuple like (a[i], b[i], c[i]), where i is the step in the loop. Then, np.argmax gives you the index within that tuple of the element with the largest value.

numpy: column-wise dot product

Given a 2D numpy array, I need to compute the dot product of every column with itself, and store the result in a 1D array. The following works:
In [45]: A = np.array([[1,2,3,4],[5,6,7,8]])
In [46]: np.array([np.dot(A[:,i], A[:,i]) for i in xrange(A.shape[1])])
Out[46]: array([26, 40, 58, 80])
Is there a simple way to avoid the Python loop? The above is hardly the end of the world, but if there's a numpy primitive for this, I'd like to use it.
edit In practice the matrix has many rows and relatively few columns. I am therefore not overly keen on creating temporary arrays larger than O(A.shape[1]). I also can't modify A in place.
How about:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> (A*A).sum(axis=0)
array([26, 40, 58, 80])
EDIT: Hmm, okay, you don't want intermediate large objects. Maybe:
>>> from numpy.core.umath_tests import inner1d
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> inner1d(A.T, A.T)
array([26, 40, 58, 80])
which seems a little faster anyway. This should do what you want behind the scenes, as A.T is a view (which doesn't make its own copy, IIUC), and inner1d seems to loop the way it needs to.
VERY BELATED UPDATE: Another alternative would be to use np.einsum:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> np.einsum('ij,ij->j', A, A)
array([26, 40, 58, 80])
>>> timeit np.einsum('ij,ij->j', A, A)
100000 loops, best of 3: 3.65 us per loop
>>> timeit inner1d(A.T, A.T)
100000 loops, best of 3: 5.02 us per loop
>>> A = np.random.randint(0, 100, (2, 100000))
>>> timeit np.einsum('ij,ij->j', A, A)
1000 loops, best of 3: 363 us per loop
>>> timeit inner1d(A.T, A.T)
1000 loops, best of 3: 848 us per loop
>>> (np.einsum('ij,ij->j', A, A) == inner1d(A.T, A.T)).all()
True
You can compute the square of all elements and sum them up column-wise using
np.sum(np.square(A), 0)
(I'm not entirely sure about the second parameter of the sum function, which identifies the axis along which to take the sum, and I don't have numpy installed at the moment. Maybe you'll have to experiment :) ...)
EDIT
Looking at DSM's post, it seems that you should use axis=0. Using the square function might be a little more performant than using A*A.
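A quick check with the question's array (a sketch confirming that axis=0 sums down each column):
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(np.sum(np.square(A), axis=0))   # [26 40 58 80]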
From linear algebra, the dot product of row i with row j is the (i, j)-th entry of AA^T. Similarly, the dot product of column i with column j is the (i, j)-th entry of (A^T)A.
So if you want the dot product of each column vector of A with itself, you could use ColDot = np.dot(np.transpose(A), A).diagonal(). On the other hand, if you want the dot product of each row with itself, you could use RowDot = np.dot(A, np.transpose(A)).diagonal().
Both lines return an array.
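A sketch of the column version with the question's array; note that this forms the full A^T A matrix first, so it is memory-hungry when there are many columns:
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(np.dot(A.T, A).diagonal())   # [26 40 58 80]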
