Given a numpy array items of shape (D, N, Q) and another array of indices ids of shape (N, P), how can I make a new array my_items of shape (D, N, P) by using the indices ids, like the following:
# How can these loops be avoided?
my_items = np.zeros((D, N, P))
for n in range(N):
    for p in range(P):
        my_items[:, n, p] = items[:, n, ids[n, p]]
with numpy magic instead of using any explicit loops? Here is a minimal example:
import numpy as np
D, N, Q, P = 2, 5, 4, 3 # Reduced problem dimensions.
items = 1.0 * np.arange(D * N * Q).reshape((D, N, Q)) # Example data
ids = np.arange(0, N * P).reshape(N, P) % Q # Example ids
# How can these loops be avoided?
my_items = np.zeros((D, N, P))
for n in range(N):
    for p in range(P):
        my_items[:, n, p] = items[:, n, ids[n, p]]
# print('items', items)
# print('ids', ids)
# print('my_items', my_items)
I would also like to preserve the element order if possible.
This should work now, returning the exact same ndarray as your loop:
np.stack([np.take(items[:, i, :], ids[i, :], axis=1)
          for i in range(ids.shape[0])], axis=2).transpose((0, 2, 1))
However, @hpaulj's method is faster: 5 µs vs 23.5 µs. So use that.
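For reference, a fully vectorized version is possible with np.take_along_axis (a sketch; treating this as @hpaulj's exact method is an assumption, since that answer isn't quoted here):

import numpy as np

D, N, Q, P = 2, 5, 4, 3
items = 1.0 * np.arange(D * N * Q).reshape((D, N, Q))
ids = np.arange(0, N * P).reshape(N, P) % Q

# ids[None, :, :] has shape (1, N, P); take_along_axis broadcasts it against
# items on every axis except axis=2, so each (n, p) picks items[:, n, ids[n, p]]
my_items = np.take_along_axis(items, ids[None, :, :], axis=2)  # shape (D, N, P)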
The answer for two matrices was given in this question, but I'm not sure how to apply this logic to three pairwise connected matrices since there are no 'free' indices. I want to maximize the following function:
f(i, j, k) = min(A(i, j), B(j, k), C(i, k))
Where A, B and C are matrices and i, j and k are indices that range up to the respective dimensions of the matrices. I would like to find (i, j, k) such that f(i, j, k) is maximized. I am currently doing that as follows:
import numpy as np
import itertools
I = 100
J = 150
K = 200
A = np.random.rand(I, J)
B = np.random.rand(J, K)
C = np.random.rand(I, K)
# All the different i,j,k
combinations = itertools.product(np.arange(I), np.arange(J), np.arange(K))
combinations = np.asarray(list(combinations))
A_vals = A[combinations[:,0], combinations[:,1]]
B_vals = B[combinations[:,1], combinations[:,2]]
C_vals = C[combinations[:,0], combinations[:,2]]
f = np.min([A_vals,B_vals,C_vals],axis=0)
best_indices = combinations[np.argmax(f)]
print(best_indices)
[ 49 14 136]
This is faster than iterating over all (i, j, k), but most of the time is spent constructing the _vals arrays. This is unfortunate, because they contain many duplicate values, as the same i, j and k appear multiple times. Is there a way to do this where (1) the speed of numpy's matrix computation can be preserved and (2) I don't have to construct the memory-intensive _vals arrays?
In other languages you could maybe construct the matrices so that they contain pointers to A, B and C, but I do not see how to achieve this in Python.
Edit: see a follow-up question for more indices here
We can either brute force it using numpy broadcasting or try a bit of smart branch cutting:
import numpy as np

def bf(A, B, C):
    I, J = A.shape
    J, K = B.shape
    # Full (I, J, K) tensor of pairwise minima, then a single argmax
    return np.unravel_index(
        np.minimum(np.minimum(A[:, :, None], C[:, None, :]), B[None, :, :]).argmax(),
        (I, J, K))

def cut(A, B, C):
    gmx = min(A.min(), B.min(), C.min())  # running best minimum (a lower bound on f)
    gamx = None  # indices of the running best; guards against an unbound name on early exit
    I, J = A.shape
    J, K = B.shape
    # Visit the entries of A from largest to smallest
    Y, X = np.unravel_index(A.argsort(axis=None)[::-1], A.shape)
    for y, x in zip(Y, X):
        if A[y, x] <= gmx:
            # No remaining entry of A can beat the best minimum found so far
            return gamx
        curr = np.minimum(B[x, :], C[y, :])
        camx = curr.argmax()
        cmx = curr[camx]
        if cmx >= A[y, x]:
            # f is capped by A[y, x] itself, so this is optimal
            return y, x, camx
        if gmx < cmx:
            gmx = cmx
            gamx = y, x, camx
    return gamx
from timeit import timeit

I = 100
J = 150
K = 200
for rep in range(4):
    print("trial", rep + 1)
    A = np.random.rand(I, J)
    B = np.random.rand(J, K)
    C = np.random.rand(I, K)
    print("results identical", cut(A, B, C) == bf(A, B, C))
    # number=2 with *500 and number=10 with *100 both report milliseconds per run
    print("brute force", timeit(lambda: bf(A, B, C), number=2) * 500, "ms")
    print("branch cut", timeit(lambda: cut(A, B, C), number=10) * 100, "ms")
It turns out that at the given sizes branch cutting is well worth it:
trial 1
results identical True
brute force 169.74265850149095 ms
branch cut 1.951422297861427 ms
trial 2
results identical True
brute force 180.37619898677804 ms
branch cut 2.1000938024371862 ms
trial 3
results identical True
brute force 181.6371419990901 ms
branch cut 1.999850495485589 ms
trial 4
results identical True
brute force 217.75578951928765 ms
branch cut 1.5871295996475965 ms
How does the branch cutting work?
We pick one array (say A) and sort its entries from largest to smallest. We then go through them one by one, comparing each value to the appropriate values from the other arrays and keeping track of the running maximum of minima. As soon as that maximum is no smaller than the largest remaining value of A, we are done. As this typically happens rather soon, we get a huge saving.
Instead of using itertools, you can "build" the combinations with repeats and tiles:
# Lay the values out so that flat position t = (i*J + j)*K + k
# holds A[i, j], B[j, k] and C[i, k] respectively
A_ = np.repeat(A.reshape((-1, 1)), K, axis=0).T
B_ = np.tile(B.reshape((-1, 1)), (I, 1)).T
C_ = np.tile(C, J).reshape((-1, 1)).T
And passing them to np.min:
print((t := np.argmax(np.min([A_, B_, C_], axis=0)), t // (K*J), (t // K) % J, t % K))
With timeit, 10 repetitions of your code take around 18 seconds; with numpy, only about 1 second.
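As a quick sanity check (a sketch, assuming the same variable names as above), the flattened repeat/tile layout matches a direct 3D broadcast, so the decoding t // (K*J), (t // K) % J, t % K really does recover (i, j, k):

import numpy as np

I, J, K = 4, 5, 6  # small sizes for the check
A = np.random.rand(I, J)
B = np.random.rand(J, K)
C = np.random.rand(I, K)

A_ = np.repeat(A.reshape((-1, 1)), K, axis=0).T
B_ = np.tile(B.reshape((-1, 1)), (I, 1)).T
C_ = np.tile(C, J).reshape((-1, 1)).T

# The flat minima agree with the full 3D broadcast, element for element
f_flat = np.min([A_, B_, C_], axis=0).ravel()
f_3d = np.minimum(np.minimum(A[:, :, None], B[None, :, :]), C[:, None, :])
assert np.allclose(f_flat, f_3d.ravel())

t = f_flat.argmax()
assert (t // (K*J), (t // K) % J, t % K) == np.unravel_index(f_3d.argmax(), (I, J, K))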
Building upon loopy walt's great answer, you can get a slight speed-up (~20%) by using numba:
import numpy as np
import numba

@numba.jit(nopython=True)
def find_gamx(A, B, C, X, Y, gmx):
    gamx = (0, 0, 0)
    for y, x in zip(Y, X):
        if A[y, x] <= gmx:
            return gamx
        curr = np.minimum(B[x, :], C[y, :])
        camx = curr.argmax()
        cmx = curr[camx]
        if cmx >= A[y, x]:
            return y, x, camx
        if gmx < cmx:
            gmx = cmx
            gamx = y, x, camx
    return gamx

def cut_numba(A, B, C):
    gmx = min(A.min(), B.min(), C.min())
    I, J = A.shape
    J, K = B.shape
    Y, X = np.unravel_index(A.argsort(axis=None)[::-1], A.shape)
    gamx = find_gamx(A, B, C, X, Y, gmx)
    return gamx
from timeit import timeit

I = 100
J = 150
K = 200
for rep in range(40):
    print("trial", rep + 1)
    A = np.random.rand(I, J)
    B = np.random.rand(J, K)
    C = np.random.rand(I, K)
    print("results identical", cut(A, B, C) == bf(A, B, C))
    print("results identical", cut_numba(A, B, C) == bf(A, B, C))
    print("brute force", timeit(lambda: bf(A, B, C), number=2) * 500, "ms")
    print("branch cut", timeit(lambda: cut(A, B, C), number=10) * 100, "ms")
    print("branch cut_numba", timeit(lambda: cut_numba(A, B, C), number=10) * 100, "ms")
trial 1
results identical True
results identical True
brute force 38.774325 ms
branch cut 1.7196750999999955 ms
branch cut_numba 1.3950291999999864 ms
trial 2
results identical True
results identical True
brute force 38.77167049999996 ms
branch cut 1.8655760999999993 ms
branch cut_numba 1.4977325999999902 ms
trial 3
results identical True
results identical True
brute force 39.69611449999999 ms
branch cut 1.8876490000000024 ms
branch cut_numba 1.421615300000001 ms
trial 4
results identical True
results identical True
brute force 44.338816499999936 ms
branch cut 1.614051399999994 ms
branch cut_numba 1.3842962000000014 ms
I have two dense matrices A and B, each of size 3e5 x 100, and a sparse binary matrix C of size 3e5 x 3e5. I want to find the following quantity: C ∘ (AB'), where ∘ is the Hadamard (i.e., element-wise) product and B' is the transpose of B. Explicitly calculating AB' would ask for a crazy amount of memory (~500 GB). Since the end result doesn't need the whole AB', it is sufficient to calculate only the products A_i·B_j' where C_ij != 0, where A_i is row i of matrix A and C_ij is the element at location (i, j) of matrix C. A suggested approach would be like the algorithm below:
# Pseudocode: initialize_sparse_matrix and pop_nonzero_index are prototype functions
result = numpy.initialize_sparse_matrix(shape=C.shape)
while True:
    (i, j) = C_ij.pop_nonzero_index()  # returns the next nonzero index, or empty when exhausted
    if (i, j) is empty:
        break
    result(i, j) = A_i B_j'
This algorithm, however, takes too much time. Is there any way to improve it using LAPACK/BLAS routines? I am coding in Python, so I think numpy can be a more human-friendly wrapper for LAPACK/BLAS.
You can do this computation using the following, assuming C is stored as a scipy.sparse matrix:
C = C.tocoo()
# For each stored (i, j), compute dot(A[i], B[j]) as a row-wise product-and-sum
result_data = C.data * (A[C.row] * B[C.col]).sum(1)
result = sparse.coo_matrix((result_data, (C.row, C.col)), shape=C.shape)
Here we show that the result matches the naive algorithm for some smaller inputs:
import numpy as np
from scipy import sparse

N = 300
M = 10

def make_C(N, nnz=1000):
    data = np.random.rand(nnz)
    row = np.random.randint(0, N, nnz)
    col = np.random.randint(0, N, nnz)
    return sparse.coo_matrix((data, (row, col)), shape=(N, N))

A = np.random.rand(N, M)
B = np.random.rand(N, M)
C = make_C(N)

def f_naive(C, A, B):
    return C.multiply(np.dot(A, B.T))

def f_efficient(C, A, B):
    C = C.tocoo()
    result_data = C.data * (A[C.row] * B[C.col]).sum(1)
    return sparse.coo_matrix((result_data, (C.row, C.col)), shape=C.shape)

np.allclose(
    f_naive(C, A, B).toarray(),
    f_efficient(C, A, B).toarray()
)
# True
And here we see that it works for the full input size:
N = 300000
M = 100
A = np.random.rand(N, M)
B = np.random.rand(N, M)
C = make_C(N)
out = f_efficient(C, A, B)
print(out.shape)
# (300000, 300000)
print(out.nnz)
# 1000
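As a side note (an equivalent variant, not part of the original answer), the row-wise multiply-and-sum can be fused into a single np.einsum call, which avoids the temporary elementwise product array; a sketch using the imports above:

def f_einsum(C, A, B):
    C = C.tocoo()
    # Same values as C.data * (A[C.row] * B[C.col]).sum(1)
    result_data = C.data * np.einsum('ij,ij->i', A[C.row], B[C.col])
    return sparse.coo_matrix((result_data, (C.row, C.col)), shape=C.shape)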
I'm computing a square matrix V, each element of which is an integral that I compute with sympy. In fact I compute only a single definite integral V_nm, whose result is an expression in the symbolic indices m and n. Say V_nm looks like this:
>>> V_nm
sin(3*n)*cos(m)
Now I wish to make a 2-D numerical (not symbolic!) matrix out of V_nm using m and n as indices of the array. Say for a 2 x 2 matrix, the result for the given V_nm would be:
[[sin(3)cos(1) sin(3)cos(2)]
[sin(6)cos(1) sin(6)cos(2)]]
i.e., n specifies the row and m specifies the column. (Note: I start m and n at 1, not 0, but that's of no concern.)
How do I achieve this?
I know I can use V_nm.subs([(n, ...), (m, ...)]) in a list comprehension followed by evalf() but that's the long route. I wish to achieve this using lambdify. I know how to use lambdify for 1-D arrays. Can you please tell me how to implement it for 2-D arrays?
There is sympy's FunctionMatrix which is intended for this kind of case. Note that it uses zero-based indexing:
In [1]: m, n, i, j = symbols('m, n, i, j')
In [2]: V_nm = FunctionMatrix(m, n, Lambda((i, j), 100*(i+1) + (j+1)))
In [3]: V_nm
Out[3]: [100⋅i + j + 101]
In [4]: V_nm.subs({m:2, n:3}).as_explicit()
Out[4]:
⎡101 102 103⎤
⎢ ⎥
⎣201 202 203⎦
In [5]: lambdify((m, n), V_nm)(2, 3)
Out[5]:
array([[101., 102., 103.],
[201., 202., 203.]])
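Following the same pattern, the question's V_nm with 1-based indices becomes the following (a sketch; that lambdify handles a trigonometric FunctionMatrix the same way as the numeric example above is an assumption):

from sympy import symbols, Lambda, FunctionMatrix, sin, cos, lambdify

m, n, i, j = symbols('m, n, i, j')
# Zero-based entry (i, j) holds sin(3*(i+1))*cos(j+1), i.e. 1-based indices
V = FunctionMatrix(m, n, Lambda((i, j), sin(3*(i + 1))*cos(j + 1)))
print(V.subs({m: 2, n: 2}).as_explicit())  # symbolic 2x2 matrix
print(lambdify((m, n), V)(2, 2))           # numerical 2x2 numpy array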
What you're asking for doesn't look like standard functionality, but it's possible in two steps: first lambdify the expression, then create a function that generates the intended 2D array via numpy's broadcasting:
from sympy import sin, cos, lambdify
from sympy.abc import m, n
import numpy as np
V_mn = sin(3 * n) * cos(m)
V_mn_np = lambdify((m, n), V_mn)
# using list comprehension:
# V_mn_np2D = lambda m, n: np.array([[V_mn_np(i, j) for j in range(n)] for i in range(m)])
# using numpy's broadcasting (faster for large arrays):
V_mn_np2D = lambda m, n: V_mn_np(np.arange(m)[:, None], np.arange(n))
V_mn_np2D(2, 2)
To have the numbering start at 1 instead of 0, use np.arange(1, m+1) and np.arange(1, n+1).
As a test, a function such as 100 * m + n makes it easy to verify that the approach works as intended.
W_mn = 100 * m + n
W_mn_np = lambdify((m, n), W_mn)
W_mn_np2D = lambda m, n: W_mn_np(np.arange(1, m+1)[:, None], np.arange(1, n+1))
W_mn_np2D(2, 3)
Output:
array([[101, 102, 103],
[201, 202, 203]])
I have a 3D matrix containing N x N covariance matrices for M channels [M x N x N]. I also have a 2D matrix of scaling factors for each channel at a series of time points [M x T]. I want to produce a 4D matrix containing a scaled version of the relevant channel's covariance at each time point. So to be clear, [M x T] * [M x N x N] -> [M x T x N x N]
Current version using for loops:
import numpy as np

m, t, n = 4, 10, 7
channel_timeseries = np.zeros((m, t))
covariances = np.random.rand(m, n, n)
result_array = np.zeros((m, t, n, n))

# Each channel
for i, (channel_cov, channel_timeseries) in enumerate(zip(covariances, channel_timeseries)):
    # Each time point
    for j, time_point in enumerate(channel_timeseries):
        result_array[i, j] = time_point * channel_cov
This should lead to the result array being all zeros. Replacing the initialisation of the channel_timeseries with np.ones, we should see the covariance for each channel replicated unchanged at every step of the time series.
The case which actually matters to me is one in which every channel has a scalar value at every time point and we scale the covariance matrix for the relevant channel by the value matching the correct channel and time point.
As you can see above, I can do this with a for loop and it works completely fine, but I'm working with some huge datasets and it would be better to have a vectorised solution.
Many thanks for your time.
You can use np.einsum, as b-fg said:
np.einsum('mt,mno->mtno', channel_timeseries, covariances)
or broadcasting:
channel_timeseries[:, :, None, None] * covariances[:, None, :, :]
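Both expressions produce the same (m, t, n, n) array; a minimal check with the question's dimensions (a sketch, using random data in place of the question's zeros):

import numpy as np

m, t, n = 4, 10, 7
channel_timeseries = np.random.rand(m, t)
covariances = np.random.rand(m, n, n)

a = np.einsum('mt,mno->mtno', channel_timeseries, covariances)
b = channel_timeseries[:, :, None, None] * covariances[:, None, :, :]
assert np.allclose(a, b)  # identical results, shape (m, t, n, n)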
numpy.einsum will come in handy here. I have modified your code with a random channel_timeseries array, increased the array sizes, and renamed the loop variables (otherwise you overwrite the original ones!):
import numpy as np
import time

m, t, n = 40, 100, 70
channel_timeseries = np.random.rand(m, t)
covariances = np.random.rand(m, n, n)

t0 = time.time()
result_array_1 = np.zeros((m, t, n, n))
# Each channel
for i, (c_cov, c_ts) in enumerate(zip(covariances, channel_timeseries)):
    # Each time point
    for j, time_point in enumerate(c_ts):
        result_array_1[i, j] = time_point * c_cov
t1 = time.time()

result_array_2 = np.einsum('ij,ikl->ijkl', channel_timeseries, covariances)
t2 = time.time()

print(np.array_equal(result_array_1, result_array_2))  # True
print('Time for result_array_1: ', t1 - t0)  # 0.07601261138916016
print('Time for result_array_2: ', t2 - t1)  # 0.02957916259765625
On my machine, numpy.einsum cuts the runtime by more than half.
I have two 3D arrays A and B with shapes (k, n, n) and (k, m, m) respectively. I would like to create a matrix C of shape (k, n+m, n+m) such that for each 0 <= i < k, the 2D matrix C[i,:,:] is the block diagonal matrix obtained by putting A[i, :, :] at the upper left n x n part and B[i, :, :] at the lower right m x m part.
Currently I am using the following to achieve this in NumPy:
C = np.empty((k, n+m, n+m))
for i in range(k):
    C[i, ...] = np.block([[A[i, ...], np.zeros((n, m))],
                          [np.zeros((m, n)), B[i, ...]]])
I was wondering if there is a way to do this without the for loop. I think if k is large my solution is not very efficient.
IIUC, you can simply slice and assign -
C = np.zeros((k, n+m, n+m),dtype=np.result_type(A,B))
C[:,:n,:n] = A
C[:,n:,n:] = B
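As a quick sanity check (a sketch reusing the question's variable names), the slice assignment reproduces the np.block loop exactly:

import numpy as np

k, n, m = 3, 4, 2
A = np.random.rand(k, n, n)
B = np.random.rand(k, m, m)

# Vectorized block-diagonal assembly via slice assignment
C = np.zeros((k, n+m, n+m), dtype=np.result_type(A, B))
C[:, :n, :n] = A
C[:, n:, n:] = B

# Matches the original per-matrix np.block construction
C_loop = np.stack([np.block([[A[i], np.zeros((n, m))],
                             [np.zeros((m, n)), B[i]]]) for i in range(k)])
assert np.array_equal(C, C_loop)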