How can I sum across rows that have equal values in the first column of a numpy array? For example:
In: np.array([[1,2,3],
              [1,4,6],
              [2,3,5],
              [2,6,2],
              [3,4,8]])
Out: [[1,6,9], [2,9,7], [3,4,8]]
Any help would be greatly appreciated.
Pandas has a very powerful groupby function which makes this very simple.
import numpy as np
import pandas as pd

n = np.array([[1,2,3],
              [1,4,6],
              [2,3,5],
              [2,6,2],
              [3,4,8]])
df = pd.DataFrame(n, columns=["First Col", "Second Col", "Third Col"])
df.groupby("First Col").sum()
Approach #1
Here's a numpythonic vectorized approach based on np.bincount -
# Initial setup
N = A.shape[1]-1
unqA1, idx = np.unique(A[:, 0], return_inverse=True)

# Create subscripts and accumulate with bincount for tagged summations
subs = np.arange(N)*(idx.max()+1) + idx[:,None]
sums = np.bincount(subs.ravel(), weights=A[:,1:].ravel())

# Prepend the unique elements from the first column to get the final output
out = np.append(unqA1[:,None], sums.reshape(N,-1).T, 1)
Sample input, output -
In [66]: A
Out[66]:
array([[1, 2, 3],
       [1, 4, 6],
       [2, 3, 5],
       [2, 6, 2],
       [7, 2, 1],
       [2, 0, 3]])

In [67]: out
Out[67]:
array([[  1.,   6.,   9.],
       [  2.,   9.,  10.],
       [  7.,   2.,   1.]])
Approach #2
Here's another based on np.cumsum and np.diff -
# Sort A based on first column
sA = A[np.argsort(A[:,0]),:]

# Row mask of where each group ends
row_mask = np.append(np.diff(sA[:,0])!=0, [True])

# Get cumulative summations and then DIFF to get summations for each group
cumsum_grps = sA.cumsum(0)[row_mask,1:]
sum_grps = np.diff(cumsum_grps, axis=0)

# Prepend the first group's sums to the diffed ones
sums = np.concatenate((cumsum_grps[0,:][None], sum_grps), axis=0)

# Concatenate the first column of unique values for the final output
out = np.concatenate((sA[row_mask,0][:,None], sums), axis=1)
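For reference, the benchmarks below call function-wrapped versions of these approaches. A minimal sketch of those wrappers (the names cumsum_diff, bincount and add_at are assumed to match the timings; add_at wraps the np.unique/np.add.at answer further down):

def bincount(A):
    # Approach #1: tagged summations via np.bincount
    N = A.shape[1]-1
    unqA1, idx = np.unique(A[:, 0], return_inverse=True)
    subs = np.arange(N)*(idx.max()+1) + idx[:,None]
    sums = np.bincount(subs.ravel(), weights=A[:,1:].ravel())
    return np.append(unqA1[:,None], sums.reshape(N,-1).T, 1)

def cumsum_diff(A):
    # Approach #2: sort, cumsum, then diff at the group boundaries
    sA = A[np.argsort(A[:,0]),:]
    row_mask = np.append(np.diff(sA[:,0])!=0, [True])
    cumsum_grps = sA.cumsum(0)[row_mask,1:]
    sums = np.concatenate((cumsum_grps[0,:][None], np.diff(cumsum_grps,axis=0)), axis=0)
    return np.concatenate((sA[row_mask,0][:,None], sums), axis=1)

def add_at(A):
    # np.unique + np.add.at, from the answer further down
    unq, unq_inv = np.unique(A[:, 0], return_inverse=True)
    out = np.zeros((len(unq), A.shape[1]), dtype=A.dtype)
    out[:, 0] = unq
    np.add.at(out[:, 1:], unq_inv, A[:, 1:])
    return out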
Benchmarking
Here are some runtime tests for the NumPy-based approaches presented so far for this question -
In [319]: A = np.random.randint(0,1000,(100000,10))
In [320]: %timeit cumsum_diff(A)
100 loops, best of 3: 12.1 ms per loop
In [321]: %timeit bincount(A)
10 loops, best of 3: 21.4 ms per loop
In [322]: %timeit add_at(A)
10 loops, best of 3: 60.4 ms per loop
In [323]: A = np.random.randint(0,1000,(100000,20))
In [324]: %timeit cumsum_diff(A)
10 loops, best of 3: 32.1 ms per loop
In [325]: %timeit bincount(A)
10 loops, best of 3: 32.3 ms per loop
In [326]: %timeit add_at(A)
10 loops, best of 3: 113 ms per loop
Seems like Approach #2: cumsum + diff is performing quite well.
Try using pandas. Group by the first column and then sum row-wise. Something like
df.groupby(df.iloc[:, 0]).sum()
With a little help from your friends np.unique and np.add.at:
>>> unq, unq_inv = np.unique(A[:, 0], return_inverse=True)
>>> out = np.zeros((len(unq), A.shape[1]), dtype=A.dtype)
>>> out[:, 0] = unq
>>> np.add.at(out[:, 1:], unq_inv, A[:, 1:])
>>> out # A was the OP's array
array([[1, 6, 9],
       [2, 9, 7],
       [3, 4, 8]])
Related
Is there a more numpythonic way to do this?
#example arrays
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5])
#get the indices where each value falls between values in arr
between = [np.nonzero(i > arr)[0][-1] for i in values]
For sorted arr, we can use np.searchsorted for performance -
In [67]: np.searchsorted(arr,values)-1
Out[67]: array([0, 2, 1])
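As a quick sanity check (a sketch): with the default side='left', searchsorted returns the first index at which each value could be inserted while keeping arr sorted, so subtracting 1 lands on the last element strictly smaller than the value, matching the original list comprehension:
idx = np.searchsorted(arr, values) - 1
assert (idx == [np.nonzero(v > arr)[0][-1] for v in values]).all()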
Timings on large dataset -
In [81]: np.random.seed(0)
...: arr = np.unique(np.random.randint(0,10000, 10000))
...: values = np.random.randint(0,10000, 1000)
# @Andy L.'s soln
In [84]: %timeit np.argmin(values > arr[:,None], axis=0) - 1
10 loops, best of 3: 28.2 ms per loop
# Original soln
In [82]: %timeit [np.nonzero(i > arr)[0][-1] for i in values]
100 loops, best of 3: 8.68 ms per loop
# From this post
In [83]: %timeit np.searchsorted(arr,values)-1
10000 loops, best of 3: 57.8 µs per loop
Use broadcasting and argmin
np.argmin(values > arr[:,None], axis=0) - 1
Out[32]: array([0, 2, 1], dtype=int32)
Note: I assume arr is monotonically increasing, as in the sample, and that every value lies within arr's range.
Say I have two arrays, A and B, of the same shape.
I want to do an element-wise multiplication in a convolutional-like manner: cyclically shift the columns one step at a time (so, for example, column 1 moves to column 2 and the last column wraps around to column 1), multiplying element-wise by B after each shift.
For 2x3 inputs this should yield a (3 x 2 x 3) array: one 2x3 matrix for each of the 3 possible shifts.
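The question's sample arrays are not shown above; a pair consistent with the outputs in the session below (an inference, not taken from the original post) would be:
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0, 1],
              [0, 0, 1]])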
We can concatenate A with one of its own slices and then get those sliding windows. To get those windows, we can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows. Then, multiply those windows with B for the final output. More info on the use of as_strided based view_as_windows.
Hence, we will have one vectorized solution like so -
In [70]: from skimage.util.shape import view_as_windows

In [71]: A1 = np.concatenate((A,A[:,:-1]),axis=1)

In [74]: view_as_windows(A1,A.shape)[0]*B
Out[74]:
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])
We can also leverage multiple cores with the numexpr module for the final step of broadcasted multiplication, which should be better on larger arrays. For the sample case, it would be -
In [53]: import numexpr as ne
In [54]: w = view_as_windows(A1,A.shape)[0]
In [55]: ne.evaluate('w*B')
Out[55]:
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])
Timings on large arrays comparing the proposed two methods -
In [56]: A = np.random.rand(500,500)
...: B = np.random.rand(500,500)
In [57]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
...: w = view_as_windows(A1,A.shape)[0]
In [58]: %timeit w*B
...: %timeit ne.evaluate('w*B')
1 loop, best of 3: 422 ms per loop
1 loop, best of 3: 228 ms per loop
Squeezing the best out of the strided-based method
If you really want to squeeze the best out of the strided-view-based approach, go with the original np.lib.stride_tricks.as_strided based one to avoid the function-call overhead of view_as_windows -
def vaw_with_as_strided(A,B):
    # wrap-around copy of A's columns, then build all cyclic windows as views
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    s0,s1 = A1.strides
    S = (A.shape[1],)+A.shape
    w = np.lib.stride_tricks.as_strided(A1,shape=S,strides=(s1,s0,s1))
    return w*B
Comparing against @Paul Panzer's array-assignment based one, the crossover seems to be at 19x19 shaped arrays -
In [33]: n = 18
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [34]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 22.4 µs per loop
10000 loops, best of 3: 21.4 µs per loop
In [35]: n = 19
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [36]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 24.5 µs per loop
10000 loops, best of 3: 24.5 µs per loop
So, for anything smaller than 19x19, array-assignment seems to be better, and for anything larger, the strided-based one should be the way to go.
Just a note on view_as_windows/as_strided. Neat as these functions are, it is useful to know that they have a rather pronounced constant overhead. Here is a comparison between @Divakar's view_as_windows based solution (vaw) and a copy-reshape based approach by me.
As you can see, vaw is not very fast on small to medium-sized operands and only begins to shine above array sizes of about 30x30.
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from skimage.util.shape import view_as_windows
B = BenchmarkBuilder()
@B.add_function()
def vaw(A,B):
    # concatenate for wrap-around, then multiply the windowed view by B
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    w = view_as_windows(A1,A.shape)[0]
    return w*B

@B.add_function()
def pp(A,B):
    # copy-reshape approach: build the shifted copies by strided assignment
    m,n = A.shape
    aux = np.empty((n,m,2*n),A.dtype)
    AA = np.concatenate([A,A],1)
    aux.reshape(-1)[:-n].reshape(n,-1)[...] = AA.reshape(-1)[:-1]
    return aux[...,:n]*B

@B.add_arguments('array size')
def argument_provider():
    for exp in range(4, 16):
        dim_size = int(1.4**exp)
        a = np.random.rand(dim_size,dim_size)
        b = np.random.rand(dim_size,dim_size)
        yield dim_size, MultiArgument([a,b])

r = B.run()
r.plot()

import pylab
pylab.savefig('vaw.png')
Run a for loop over the number of columns and use np.roll() along axis=1 to shift your columns, then do the element-wise multiplication.
Refer to the accepted answer in this reference.
Hope this helps.
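A minimal sketch of that loop-plus-np.roll idea, producing the same (3, 2, 3) stack as the strided answer above (assuming 2x3 inputs A and B as in that answer):
out = np.stack([np.roll(A, -k, axis=1) * B for k in range(A.shape[1])])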
I can actually pad the array on both sides with 2 columns (to get a 2x5 array)
and run a conv2 with 'b' as a kernel; I think that would be more efficient.
I need to compute the diagonal of XMX^T without a for-loop, or in other words, replacing the following for loop:
X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)
out = np.zeros(10000)
for n in range(10000):
    out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
I know I should somehow be using numpy.einsum, but I have not been able to figure out how.
Many thanks!
Sure there is an np.einsum way, like so -
np.einsum('ij,ij->i',X.dot(M),X)
This abuses the fast matrix multiplication at the first level with X.dot(M) and then uses np.einsum to keep the first axis and sum-reduce the second axis.
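Spelled out without einsum, the same reduction is just an elementwise multiply followed by a row-wise sum (a sketch for clarity, not a faster alternative):
out = (X.dot(M) * X).sum(axis=1)   # same as np.einsum('ij,ij->i', X.dot(M), X)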
Runtime test -
This section compares all the approaches posted thus far to solve the problem.
In [132]: # Setup input arrays
     ...: X = np.random.randn(10000, 100)
     ...: M = np.random.rand(100, 100)
     ...:
     ...: def original_app(X,M):
     ...:     out = np.zeros(10000)
     ...:     for n in range(10000):
     ...:         out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
     ...:     return out
     ...:
In [133]: np.allclose(original_app(X,M),np.einsum('ij,ij->i',X.dot(M),X))
Out[133]: True
In [134]: %timeit original_app(X,M) # Original solution
10 loops, best of 3: 97.8 ms per loop
In [135]: %timeit np.dot(X, np.dot(M,X.T)).trace() # @Colonel Beauvel's solution
1 loops, best of 3: 2.24 s per loop
In [136]: %timeit np.einsum('ij,jk,ik->i', X, M, X) # @hpaulj's solution
1 loops, best of 3: 442 ms per loop
In [137]: %timeit np.einsum('ij,ij->i',X.dot(M),X) # Proposed in this post
10 loops, best of 3: 28.1 ms per loop
Here is a simpler example:
M = np.array([[ 0,  4,  8],
              [ 1,  5,  9],
              [ 2,  6, 10],
              [ 3,  7, 11]])

X = np.array([[ 0,  4,  8],
              [ 1,  5,  9],
              [ 2,  6, 10],
              [ 3,  7, 11]])
What you are looking for - the sum of the diagonal elements - is more commonly known as the trace in maths. You can obtain the trace of your matrix product, without a loop, by:
In [102]: np.dot(X, np.dot(M,X.T)).trace()
Out[102]: 692
In [210]: X=np.arange(12).reshape(4,3)
In [211]: M=np.ones((3,3))
In [212]: out=np.zeros(4)
In [213]: for n in range(4):
     ...:     out[n] = np.dot(np.dot(X[n,:], M), X[n,:])
     ...:
In [214]: out
Out[214]: array([ 9., 144., 441., 900.])
One einsum approach:
In [215]: np.einsum('ij,jk,ik->i', X, M, X)
Out[215]: array([ 9., 144., 441., 900.])
Comparing the other einsum:
In [218]: timeit np.einsum('ij,jk,ik->i', X, M, X)
100000 loops, best of 3: 8.98 µs per loop
In [219]: timeit np.einsum('ij,ij->i',X.dot(M),X)
100000 loops, best of 3: 11.9 µs per loop
This is a bit faster here, but results may differ at your larger sizes.
einsum saves calculating a lot of unnecessary values (compared to the diagonal or trace approaches).
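A note beyond the original answers: on newer NumPy (1.12+), the three-operand form can also delegate the contraction order to an optimizer, which typically rewrites it in terms of BLAS matrix multiplications:
out = np.einsum('ij,jk,ik->i', X, M, X, optimize=True)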
Similar use of einsum - Combine Einsum Expressions
I want to randomly select rows from a numpy array. Say I have this array-
A = [[1, 3, 0],
     [3, 2, 0],
     [0, 2, 1],
     [1, 1, 4],
     [3, 2, 2],
     [0, 1, 0],
     [1, 3, 1],
     [0, 4, 1],
     [2, 4, 2],
     [3, 3, 1]]
To randomly select say 6 rows, I am doing this:
B = A[np.random.choice(A.shape[0], size=6, replace=False), :]
I want another array C which has the rows which were not selected in B.
Is there some in-built method to do this or do I need to do a brute-force, checking rows of B with rows of A?
You can make any number of row-wise random partitions of A by slicing a shuffled sequence of row indices:
ind = numpy.arange( A.shape[ 0 ] )
numpy.random.shuffle( ind )
B = A[ ind[ :6 ], : ]
C = A[ ind[ 6: ], : ]
If you don't want to change the order of the rows in each subset, you can sort each slice of the indices:
B = A[ sorted( ind[ :6 ] ), : ]
C = A[ sorted( ind[ 6: ] ), : ]
(Note that the solution provided by @MaxNoe also preserves row order.)
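The same partition can be written a little more compactly with np.random.permutation, which returns a shuffled copy of the index range (a sketch):
ind = np.random.permutation(A.shape[0])
B, C = A[ind[:6]], A[ind[6:]]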
Solution
This gives you the indices for the selection:
sel = np.random.choice(A.shape[0], size=6, replace=False)
and this gives B:
B = A[sel]
Get all not selected indices:
unsel = list(set(range(A.shape[0])) - set(sel))
and use them for C:
C = A[unsel]
Variation with NumPy functions
Instead of using set and list, you can use this:
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
For the example array the pure Python version:
%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))
100000 loops, best of 3: 8.42 µs per loop
is faster than the NumPy version:
%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
10000 loops, best of 3: 77.5 µs per loop
For larger A the NumPy version is faster:
A = np.random.random((int(1e4), 3))
sel = np.random.choice(A.shape[0], size=6, replace=False)
%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))
1000 loops, best of 3: 1.4 ms per loop
%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
1000 loops, best of 3: 315 µs per loop
You can use boolean masks and draw random indices from an integer array that is as long as yours. The ~ is an elementwise not:
idx = np.arange(A.shape[0])
mask = np.zeros_like(idx, dtype=bool)
selected = np.random.choice(idx, 6, replace=False)
mask[selected] = True
B = A[mask]
C = A[~mask]
I want to broadcast an array b to the shape it would take if it were in an arithmetic operation with another array a.
For example, if a.shape = (3,3) and b was a scalar, I want to get an array whose shape is (3,3) and is filled with the scalar.
One way to do this is like this:
>>> import numpy as np
>>> a = np.arange(9).reshape((3,3))
>>> b = 1 + a*0
>>> b
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
Although this works practically, I can't help but feel it looks a bit weird, and wouldn't be obvious to someone else looking at the code what I was trying to do.
Is there any more elegant way to do this? I've looked at the documentation for np.broadcast, but it's orders of magnitude slower.
In [1]: a = np.arange(10000).reshape((100,100))
In [2]: %timeit 1 + a*0
10000 loops, best of 3: 31.9 us per loop
In [3]: %timeit bc = np.broadcast(a,1);np.fromiter((v for u, v in bc),float).reshape(bc.shape)
100 loops, best of 3: 5.2 ms per loop
In [4]: 5.2e-3/32e-6
Out[4]: 162.5
If you just want to fill an array with a scalar, fill is probably the best choice. But it sounds like you want something more general. Rather than using broadcast, you can use broadcast_arrays to get the result that (I think) you want.
>>> a = numpy.arange(9).reshape(3, 3)
>>> numpy.broadcast_arrays(a, 1)[1]
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
This generalizes to any two broadcastable shapes:
>>> numpy.broadcast_arrays(a, [1, 2, 3])[1]
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
It's not quite as fast as your ufunc-based method, but it's still on the same order of magnitude:
>>> %timeit 1 + a * 0
10000 loops, best of 3: 23.2 us per loop
>>> %timeit numpy.broadcast_arrays(a, 1)[1]
10000 loops, best of 3: 52.3 us per loop
But for scalars, fill is still the clear front-runner:
>>> %timeit b = numpy.empty_like(a, dtype='i8'); b.fill(1)
100000 loops, best of 3: 6.59 us per loop
Finally, further testing shows that the fastest approach -- in at least some cases -- is to multiply by ones:
>>> %timeit numpy.broadcast_arrays(a, numpy.arange(100))[1]
10000 loops, best of 3: 53.4 us per loop
>>> %timeit (1 + a * 0) * numpy.arange(100)
10000 loops, best of 3: 45.9 us per loop
>>> %timeit b = numpy.ones_like(a, dtype='i8'); b * numpy.arange(100)
10000 loops, best of 3: 28.9 us per loop
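A note beyond the original answer: on NumPy 1.10+, numpy.broadcast_to does this directly, returning a read-only broadcast view without copying anything:
>>> numpy.broadcast_to(1, a.shape)
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])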
The fastest and cleanest solution I know is:
b_arr = numpy.empty(a.shape) # Empty array
b_arr.fill(b) # Filling with one value
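On NumPy 1.8+ the same thing is a one-liner with numpy.full (a note beyond the original answer):
b_arr = numpy.full(a.shape, b)   # allocates and fills in one call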
fill sounds like the simplest way:
>>> a = np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a.fill(10)
>>> a
array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])
EDIT: As @EOL points out, you don't need arange if you want to create a new array; np.empty((100,100)) (or whatever shape) is better for this.
Timings:
In [3]: a = np.arange(10000).reshape((100,100))
In [4]: %timeit 1 + a*0
100000 loops, best of 3: 19.9 us per loop
In [5]: a = np.arange(10000).reshape((100,100))
In [6]: %timeit a.fill(1)
100000 loops, best of 3: 3.73 us per loop
If you just need to broadcast a scalar to some arbitrary shape, you can do something like this:
a = b*np.ones(shape=(3,3))
Edit: np.tile is more general. You can use it to duplicate any scalar/vector in any number of dimensions:
b = 1
N = 100
a = np.tile(b, reps=(N, N))
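For example, to duplicate a vector rather than a scalar (a small usage sketch):
v = np.array([1, 2, 3])
a = np.tile(v, reps=(4, 1))   # stacks v into a (4, 3) array of repeated rows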