Fast way of computing diagonals of XMX^T in Python

I need to compute the diagonal of XMX^T without a for loop, or in other words, replace the following loop:
import numpy as np

X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)
out = np.zeros(10000)
for n in range(10000):
    out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
I know somehow I should be using numpy.einsum, but I have not been able to figure out how?
Many thanks!

Sure there is an np.einsum way, like so -
np.einsum('ij,ij->i',X.dot(M),X)
This leverages fast matrix multiplication for the first step with X.dot(M), and then uses np.einsum to keep the first axis while sum-reducing the second axis.
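For a quick sanity check, here is a minimal, self-contained sketch (with smaller illustrative shapes than the question's) confirming that the einsum expression matches the loop:
import numpy as np

X = np.random.randn(50, 4)
M = np.random.rand(4, 4)

# Proposed approach: one matrix product, then a row-wise sum of products
fast = np.einsum('ij,ij->i', X.dot(M), X)

# Reference loop from the question
slow = np.array([X[n].dot(M).dot(X[n]) for n in range(X.shape[0])])

assert np.allclose(fast, slow)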
Runtime test -
This section compares all the approaches posted thus far to solve the problem.
In [132]: # Setup input arrays
     ...: X = np.random.randn(10000, 100)
     ...: M = np.random.rand(100, 100)
     ...: 
     ...: def original_app(X, M):
     ...:     out = np.zeros(10000)
     ...:     for n in range(10000):
     ...:         out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
     ...:     return out
     ...: 
In [133]: np.allclose(original_app(X,M),np.einsum('ij,ij->i',X.dot(M),X))
Out[133]: True
In [134]: %timeit original_app(X,M) # Original solution
10 loops, best of 3: 97.8 ms per loop
In [135]: %timeit np.dot(X, np.dot(M,X.T)).trace() # @Colonel Beauvel's solution
1 loops, best of 3: 2.24 s per loop
In [136]: %timeit np.einsum('ij,jk,ik->i', X, M, X) # @hpaulj's solution
1 loops, best of 3: 442 ms per loop
In [137]: %timeit np.einsum('ij,ij->i',X.dot(M),X) # Proposed in this post
10 loops, best of 3: 28.1 ms per loop
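If you prefer to stay away from einsum notation, the same reduction can be written as an elementwise multiply followed by a row-wise sum; it computes the identical result (not benchmarked above, but typically in the same ballpark):
# Equivalent formulation: multiply elementwise, then sum each row
out = (X.dot(M) * X).sum(axis=1)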

Here is a simpler example:
M = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])
X = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])
What you are looking for - the sum of the diagonal elements - is more commonly known as the trace in mathematics. You can obtain the trace of your matrix product, without a loop, with:
In [102]: np.dot(X, np.dot(M,X.T)).trace()
Out[102]: 692
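For reference, the trace is a single scalar: it equals the sum of the per-row values produced by the question's loop, not the vector of those values itself. A quick check, assuming the X, M and out from the question:
assert np.isclose(np.dot(X, np.dot(M, X.T)).trace(), out.sum())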

In [210]: X=np.arange(12).reshape(4,3)
In [211]: M=np.ones((3,3))
In [212]: out=np.zeros(4)
In [213]: for n in range(4):
   .....:     out[n] = np.dot(np.dot(X[n,:], M), X[n,:])
   .....: 
In [214]: out
Out[214]: array([ 9., 144., 441., 900.])
One einsum approach:
In [215]: np.einsum('ij,jk,ik->i', X, M, X)
Out[215]: array([ 9., 144., 441., 900.])
Comparing the other einsum:
In [218]: timeit np.einsum('ij,jk,ik->i', X, M, X)
100000 loops, best of 3: 8.98 µs per loop
In [219]: timeit np.einsum('ij,ij->i',X.dot(M),X)
100000 loops, best of 3: 11.9 µs per loop
This is a bit faster, but results may differ at your larger sizes.
einsum does save calculating a lot of unnecessary values (compared to the diagonal or trace approaches).
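To make that concrete, the wasteful alternative builds the full (4, 4) product and then throws away everything off the diagonal (a small illustrative sketch using the X and M defined above):
full = np.dot(np.dot(X, M), X.T)   # computes all 16 entries
np.diag(full)                      # array([   9.,  144.,  441.,  900.])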
Similar use of einsum - Combine Einsum Expressions


find where values in one numpy array fall between values in another numpy array

Is there a more numpythonic way to do this?
#example arrays
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5])
#get the indices where each value falls between values in arr
between = [np.nonzero(i > arr)[0][-1] for i in values]
For sorted arr, we can use np.searchsorted for performance -
In [67]: np.searchsorted(arr,values)-1
Out[67]: array([0, 2, 1])
Timings on large dataset -
In [81]: np.random.seed(0)
...: arr = np.unique(np.random.randint(0,10000, 10000))
...: values = np.random.randint(0,10000, 1000)
# @Andy L.'s soln
In [84]: %timeit np.argmin(values > arr[:,None], axis=0) - 1
10 loops, best of 3: 28.2 ms per loop
# Original soln
In [82]: %timeit [np.nonzero(i > arr)[0][-1] for i in values]
100 loops, best of 3: 8.68 ms per loop
# From this post
In [83]: %timeit np.searchsorted(arr,values)-1
10000 loops, best of 3: 57.8 µs per loop
Use broadcasting and argmin:
np.argmin(values > arr[:,None], axis=0) - 1
Out[32]: array([0, 2, 1], dtype=int32)
Note: I assume arr is monotonically increasing, as in the sample.
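For reference, a short sketch of what the broadcasted comparison is doing, using the example arrays from the question (the intermediate name mask is just for illustration):
mask = values > arr[:, None]        # shape (len(arr), len(values))
# Each column is True while arr < value and False afterwards, so argmin
# picks the first False, i.e. the first element of arr that is >= the value.
np.argmin(mask, axis=0) - 1         # array([0, 2, 1])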

Is there a quicker way to do this in Numpy?

I want to generate a 3D matrix in numpy. The code is:
mean_value = np.array([1, 2, 3], dtype=np.float32)
h, w = 5, 5
b = np.ones((h, w, 1), dtype=np.float32) * np.reshape(mean_value, [1, 1, 3])
print(b.shape) # (5, 5, 3)
Is there any quicker way to generate b? Thanks.
For efficiency (memory, performance), simply broadcast with np.broadcast_to for a view output -
np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
Being a view, it has no memory overhead and is hence virtually free at runtime.
Let's verify the performance part -
In [45]: mean_value = np.array([1, 2, 3], dtype=np.float32)
...: h, w = 5, 5
In [46]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.21 µs per loop
In [47]: mean_value = np.random.rand(10000)
...: h, w = 5000, 5000
In [48]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.22 µs per loop
And memory part (being a view) -
In [49]: np.shares_memory(mean_value,np.broadcast_to(mean_value,(h,w,)+mean_value.shape))
Out[49]: True
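One caveat: np.broadcast_to returns a read-only view, so if b needs to be written to afterwards, materialize a copy (which then does pay the memory cost):
b = np.broadcast_to(mean_value, (h, w) + mean_value.shape)
# b[0, 0, 0] = 5.0              # would raise: assignment destination is read-only
b_writable = b.copy()           # owns writable memory of the full broadcast shape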

Processing upper triangular elements only with NumPy einsum

I'm using numpy einsum to calculate the dot products of an array of column vectors pts, of shape (3,N), with itself, resulting in a matrix dotps, of shape (N,N), containing all the dot products. This is the code I use:
dotps = np.einsum('ij,ik->jk', pts, pts)
This works, but I only need the values above the main diagonal, i.e. the upper triangular part of the result without the diagonal. Is it possible to compute only these values with einsum, or in any other way that is faster than using einsum to compute the whole matrix?
My pts array can be quite large, so if I could calculate only the values I need, that would double my computation speed.
You can slice relevant columns and then use np.einsum -
R,C = np.triu_indices(N,1)
out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
Sample run -
In [109]: N = 5
...: pts = np.random.rand(3,N)
...: dotps = np.einsum('ij,ik->jk', pts, pts)
...:
In [110]: dotps
Out[110]:
array([[ 0.26529103,  0.30626052,  0.18373867,  0.13602931,  0.51162729],
       [ 0.30626052,  0.56132272,  0.5938057 ,  0.28750708,  0.9876753 ],
       [ 0.18373867,  0.5938057 ,  0.84699103,  0.35788749,  1.04483158],
       [ 0.13602931,  0.28750708,  0.35788749,  0.18274288,  0.4612556 ],
       [ 0.51162729,  0.9876753 ,  1.04483158,  0.4612556 ,  1.82723949]])
In [111]: R,C = np.triu_indices(N,1)
...: out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
...:
In [112]: out
Out[112]:
array([ 0.30626052,  0.18373867,  0.13602931,  0.51162729,  0.5938057 ,
        0.28750708,  0.9876753 ,  0.35788749,  1.04483158,  0.4612556 ])
Optimizing further -
Let's time our approach and see if there's any scope for improvement performance-wise.
In [126]: N = 5000
In [127]: pts = np.random.rand(3,N)
In [128]: %timeit np.triu_indices(N,1)
1 loops, best of 3: 413 ms per loop
In [129]: R,C = np.triu_indices(N,1)
In [130]: %timeit np.einsum('ij,ij->j',pts[:,R],pts[:,C])
1 loops, best of 3: 1.47 s per loop
Staying within the memory constraints, it doesn't look like we can do much about optimizing np.einsum. So, let's shift the focus to np.triu_indices.
For N = 4, we have :
In [131]: N = 4
In [132]: np.triu_indices(N,1)
Out[132]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
It seems to create a regular, shifting pattern. That pattern could be produced with a cumulative sum that has shifts at positions 3 and 5. Thinking generically, we would end up coding it something like this -
def triu_indices_cumsum(N):
    # Length of R and C index arrays
    L = (N*(N-1))//2

    # Positions along the R and C arrays that indicate
    # shifting to the next row of the full array
    shifts_idx = np.arange(2,N)[::-1].cumsum()

    # Initialize "shift" arrays, finally leading to R and C
    shifts1_arr = np.zeros(L,dtype=int)
    shifts2_arr = np.ones(L,dtype=int)

    # At the shift positions set appropriate values, such that
    # cumulative summing leads to the desired R and C arrays.
    shifts1_arr[shifts_idx] = 1
    shifts2_arr[shifts_idx] = -np.arange(N-2)[::-1]

    # Final cumsum to give R, C
    R_arr = shifts1_arr.cumsum()
    C_arr = shifts2_arr.cumsum()
    return R_arr, C_arr
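A quick sanity check that the cumsum construction reproduces np.triu_indices (assuming the function above):
for N in (3, 10, 57):
    R_ref, C_ref = np.triu_indices(N, 1)
    R_arr, C_arr = triu_indices_cumsum(N)
    assert (R_arr == R_ref).all() and (C_arr == C_ref).all()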
Let's time it for various N's!
In [133]: N = 100
In [134]: %timeit np.triu_indices(N,1)
10000 loops, best of 3: 122 µs per loop
In [135]: %timeit triu_indices_cumsum(N)
10000 loops, best of 3: 61.7 µs per loop
In [136]: N = 1000
In [137]: %timeit np.triu_indices(N,1)
100 loops, best of 3: 17 ms per loop
In [138]: %timeit triu_indices_cumsum(N)
100 loops, best of 3: 16.3 ms per loop
Thus, it looks like for decent N's, the customized cumsum based triu_indices might be worth a look!

Number of elements of array less than each element of cutoff array in python

I've got a numpy array of strictly increasing "cutoff" values of length m, and a pandas series of values of length n (though the index isn't important and this could be cast to a numpy array).
I need to come up with an efficient way of spitting out a length-m vector of counts of the number of elements in the pandas series less than the jth element of the "cutoff" array.
I could do this via a list comprehension:
output = array([(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar])
but I was wondering if there were any way to do this that leveraged more of numpy's magic speed, as I have to do this quite a few times inside multiple loops and it keeps crashing my computer.
Thanks!
Is this what you are looking for?
In [36]: a = np.random.random(20)
In [37]: a
Out[37]:
array([ 0.68574307,  0.15743428,  0.68006876,  0.63572484,  0.26279663,
        0.14346269,  0.56267286,  0.47250091,  0.91168387,  0.98915746,
        0.22174062,  0.11930722,  0.30848231,  0.1550406 ,  0.60717858,
        0.23805205,  0.57718675,  0.78075297,  0.17083826,  0.87301963])
In [38]: b = np.array((0.3,0.7))
In [39]: np.sum(a[:,None]<b[None,:], axis=0)
Out[39]: array([ 8, 16])
In [40]: np.sum(a[:,None]<b, axis=0) # b's new axis above is unnecessary...
Out[40]: array([ 8, 16])
In [41]: (a[:,None]<b).sum(axis=0) # even simpler
Out[41]: array([ 8, 16])
Timings are always well received (here for a longish, 2E6-element array):
In [47]: a = np.random.random(2000000)
In [48]: %timeit (a[:,None]<b).sum(axis=0)
10 loops, best of 3: 78.2 ms per loop
In [49]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
1 loop, best of 3: 448 ms per loop
For a smaller array
In [50]: a = np.random.random(2000)
In [51]: %timeit (a[:,None]<b).sum(axis=0)
10000 loops, best of 3: 89 µs per loop
In [52]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 141 µs per loop
Edit
Divakar says that things may be different for lengthy b's, so let's see:
In [71]: a = np.random.random(2000)
In [72]: b =np.random.random(200)
In [73]: %timeit (a[:,None]<b).sum(axis=0)
1000 loops, best of 3: 1.44 ms per loop
In [74]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
10000 loops, best of 3: 172 µs per loop
quite different indeed! Thank you for prompting my curiosity.
The OP should probably test for their use case: is the sample very long relative to the cutoff sequence or not, and where is the balance?
Edit #2
I made a blooper in my timings, I forgot the axis=0 argument to .sum()...
I've edited the timings with the corrected statement and, of course, the corrected timing. My apologies.
You can use np.searchsorted for some NumPy magic -
# Convert to numpy array for some "magic"
pan_series_arr = np.array(pan_series)
# Let the magic begin!
sortidx = pan_series_arr.argsort()
out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
Explanation
You are performing [(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar], i.e. for each element in cutoff_ar, counting the number of pan_series elements that are less than it. With np.searchsorted, we instead look up the positions at which the elements of cutoff_ar would have to be inserted into pan_series_arr (viewed in sorted order through sorter=sortidx) so that it stays sorted, inserting to the 'right' of any equal elements. These indices are exactly the number of pan_series elements below each cutoff_ar element, which gives the desired output. (Note that with 'right', elements exactly equal to a cutoff are counted as below it; use the default side='left' if ties should be excluded, matching the strict < of the original.)
Sample run
In [302]: cutoff_ar
Out[302]: array([ 1, 3, 9, 44, 63, 90])
In [303]: pan_series_arr
Out[303]: array([ 2, 8, 69, 55, 97])
In [304]: [(pan_series_arr < cutoff_val).sum() for cutoff_val in cutoff_ar]
Out[304]: [0, 1, 2, 2, 3, 4]
In [305]: sortidx = pan_series_arr.argsort()
...: out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
...:
In [306]: out
Out[306]: array([0, 1, 2, 2, 3, 4])

Sum rows where value equal in column

How can I sum across rows that have equal values in the first column of a numpy array? For example:
In: np.array([[1,2,3],
              [1,4,6],
              [2,3,5],
              [2,6,2],
              [3,4,8]])
Out: [[1,6,9], [2,9,7], [3,4,8]]
Any help would be greatly appreciated.
Pandas has a very powerful groupby function which makes this very simple.
import pandas as pd
n = np.array([[1,2,3],
              [1,4,6],
              [2,3,5],
              [2,6,2],
              [3,4,8]])
df = pd.DataFrame(n, columns = ["First Col", "Second Col", "Third Col"])
df.groupby("First Col").sum()
Approach #1
Here's something in a numpythonic vectorized way based on np.bincount -
# Initial setup
N = A.shape[1]-1
unqA1, id = np.unique(A[:, 0], return_inverse=True)
# Create subscripts and accumulate with bincount for tagged summations
subs = np.arange(N)*(id.max()+1) + id[:,None]
sums = np.bincount( subs.ravel(), weights=A[:,1:].ravel() )
# Append the unique elements from first column to get final output
out = np.append(unqA1[:,None],sums.reshape(N,-1).T,1)
Sample input, output -
In [66]: A
Out[66]:
array([[1, 2, 3],
       [1, 4, 6],
       [2, 3, 5],
       [2, 6, 2],
       [7, 2, 1],
       [2, 0, 3]])
In [67]: out
Out[67]:
array([[  1.,   6.,   9.],
       [  2.,   9.,  10.],
       [  7.,   2.,   1.]])
Approach #2
Here's another based on np.cumsum and np.diff -
# Sort A based on first column
sA = A[np.argsort(A[:,0]),:]
# Row mask of where each group ends
row_mask = np.append(np.diff(sA[:,0],axis=0)!=0,[True])
# Get cumulative summations and then diff to get summations for each group
cumsum_grps = sA.cumsum(0)[row_mask,1:]
sum_grps = np.diff(cumsum_grps,axis=0)
# Prepend the first group's sums (its cumsum row) to the diffs
counts = np.concatenate((cumsum_grps[0,:][None],sum_grps),axis=0)
# Concatenate the first column of the input array for final output
out = np.concatenate((sA[row_mask,0][:,None],counts),axis=1)
Benchmarking
Here are some runtime tests for the numpy-based approaches presented for this question -
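The timings assume the approaches are wrapped into functions roughly like the following (bincount and cumsum_diff correspond to Approach #1 and Approach #2 above; add_at is the np.unique/np.add.at answer further down):
def bincount(A):
    N = A.shape[1] - 1
    unqA1, idx = np.unique(A[:, 0], return_inverse=True)
    subs = np.arange(N)*(idx.max()+1) + idx[:, None]
    sums = np.bincount(subs.ravel(), weights=A[:, 1:].ravel())
    return np.append(unqA1[:, None], sums.reshape(N, -1).T, 1)

def cumsum_diff(A):
    sA = A[np.argsort(A[:, 0]), :]
    row_mask = np.append(np.diff(sA[:, 0]) != 0, [True])
    cumsum_grps = sA.cumsum(0)[row_mask, 1:]
    sum_grps = np.diff(cumsum_grps, axis=0)
    counts = np.concatenate((cumsum_grps[0, :][None], sum_grps), axis=0)
    return np.concatenate((sA[row_mask, 0][:, None], counts), axis=1)

def add_at(A):
    unq, unq_inv = np.unique(A[:, 0], return_inverse=True)
    out = np.zeros((len(unq), A.shape[1]), dtype=A.dtype)
    out[:, 0] = unq
    np.add.at(out[:, 1:], unq_inv, A[:, 1:])
    return out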
In [319]: A = np.random.randint(0,1000,(100000,10))
In [320]: %timeit cumsum_diff(A)
100 loops, best of 3: 12.1 ms per loop
In [321]: %timeit bincount(A)
10 loops, best of 3: 21.4 ms per loop
In [322]: %timeit add_at(A)
10 loops, best of 3: 60.4 ms per loop
In [323]: A = np.random.randint(0,1000,(100000,20))
In [324]: %timeit cumsum_diff(A)
10 loops, best of 3: 32.1 ms per loop
In [325]: %timeit bincount(A)
10 loops, best of 3: 32.3 ms per loop
In [326]: %timeit add_at(A)
10 loops, best of 3: 113 ms per loop
Seems like Approach #2: cumsum + diff is performing quite well.
Try using pandas. Group by the first column and then sum within each group. Something like:
df.groupby(df.iloc[:, 0]).sum()
With a little help from your friends np.unique and np.add.at:
>>> unq, unq_inv = np.unique(A[:, 0], return_inverse=True)
>>> out = np.zeros((len(unq), A.shape[1]), dtype=A.dtype)
>>> out[:, 0] = unq
>>> np.add.at(out[:, 1:], unq_inv, A[:, 1:])
>>> out # A was the OP's array
array([[1, 6, 9],
       [2, 9, 7],
       [3, 4, 8]])
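The unbuffered np.add.at is what makes this correct when a group key appears more than once; ordinary fancy-index addition keeps only one contribution per repeated index (a small illustration, not part of the original answer):
wrong = np.zeros((len(unq), A.shape[1] - 1), dtype=A.dtype)
wrong[unq_inv] += A[:, 1:]    # repeated indices are not accumulated, rows get lost
# np.add.at(out[:, 1:], unq_inv, A[:, 1:]) accumulates every row instead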
