broadcasted lstsq (least squares) - python

I have a bunch of 3x2 matrices, let's say 777 of them, and just as many right-hand sides of size 3. For each of them, I would like to know the least-squares solution, so I'm doing
import numpy
A = numpy.random.rand(3, 2, 777)
b = numpy.random.rand(3, 777)
for k in range(777):
    numpy.linalg.lstsq(A[..., k], b[..., k])
That works, but is slow. I'd much rather compute all the solutions in one go, but upon
numpy.linalg.lstsq(A, b)
I'm getting
numpy.linalg.linalg.LinAlgError: 3-dimensional array given. Array must be two-dimensional
Any hints on how to broadcast numpy.linalg.lstsq?

One can make use of the fact that if A = U \Sigma V^T is the singular value decomposition of A, then x = V \Sigma^+ U^T b is the least-squares solution to Ax = b. Since numpy.linalg.svd broadcasts over leading dimensions, it only takes a bit of fiddling with einsum to get everything right:
A = numpy.random.rand(7, 3, 2)   # note: the stack dimension comes first here
b = numpy.random.rand(7, 3)

# reference: solve one system at a time
for k in range(7):
    x, res, rank, sigma = numpy.linalg.lstsq(A[k], b[k], rcond=None)
    print(x)
print()

# broadcast solution via the SVD; numpy's svd returns v as V^T
u, s, v = numpy.linalg.svd(A, full_matrices=False)
uTb = numpy.einsum('ijk,ij->ik', u, b)        # U^T b for each system
xx = numpy.einsum('ijk,ij->ik', v, uTb / s)   # V (Sigma^+ U^T b)
print(xx)
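
To double-check, the broadcast result can be compared against the looped lstsq. A minimal sketch (the helper name batched_lstsq is just for illustration):

import numpy

def batched_lstsq(A, b):
    # least-squares solution of A[i] x = b[i] for a stack of systems,
    # using the broadcast SVD trick above
    u, s, vh = numpy.linalg.svd(A, full_matrices=False)
    utb = numpy.einsum('ijk,ij->ik', u, b)
    return numpy.einsum('ijk,ij->ik', vh, utb / s)

A = numpy.random.rand(7, 3, 2)
b = numpy.random.rand(7, 3)
x_loop = numpy.array([numpy.linalg.lstsq(A[k], b[k], rcond=None)[0]
                      for k in range(7)])
assert numpy.allclose(batched_lstsq(A, b), x_loop)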

Related

Dimension of tensordot between 2 3D tensors

I have a rather quick question about the tensordot operation. I'm trying to figure out whether there is a way to perform a tensordot product between two tensors to get an output of the shape I want. One of the tensors has dimensions B X L X D and the other has dimensions B X 1 X D, and I'm trying to figure out whether it's possible to end up with a B X D matrix at the end.
Currently I'm looping over the B dimension, performing a matrix multiplication between the 1 X D and D X L (transposing L X D) matrices, and stacking the results to end up with a B X L matrix. This is obviously not the fastest way possible, as a loop can be expensive. Would it be possible to get the desired output of B X D shape with a single tensordot? I cannot seem to figure out a way to get rid of one of the B's.
Any insight or direction would be very much appreciated.
One option is to use torch.bmm(), which does exactly that (docs).
It takes tensors of shape (b, n, m) and (b, m, p) and returns the batch matrix multiplication of shape (b, n, p).
(I assume you meant a result of B X L, since the matrix multiplication of 1 X D and D X L is of shape 1 X L and not 1 X D.)
In your case:
import torch
B, L, D = 32, 10, 512
a = torch.randn(B, 1, D) #shape (B X 1 X D)
b = torch.randn(B, L, D) #shape (B X L X D)
b = b.transpose(1,2) #shape (B X D X L)
result = torch.bmm(a, b)
result = result.squeeze()
print(result.shape)
>>> torch.Size([32, 10])
Alternatively, you can use torch.einsum(), which is more compact but less readable in my opinion:
import torch
B, L, D = 32, 10, 512
a = torch.randn(B, 1, D)
b = torch.randn(B, L, D)
result = torch.einsum('abc, adc->ad', a, b)
print(result.shape)
>>> torch.Size([32, 10])
The squeeze at the end of the bmm version is there to make your result of shape (32, 10) instead of shape (32, 1, 10).
I believe torch.einsum to be the most intuitive way to perform tensor summations:
>>> torch.einsum('bld,bed->bd', x, y)
Which will have a shape of (B, D).
Formulated explicitly, the operation performed here is equivalent to:
res = torch.zeros(B, D)
for b in range(B):
    for l in range(L):
        for d in range(D):
            res[b, d] += x[b, l, d] * y[b, 0, d]
Actually the second axis on y is also looped over, but the range is just [0], since y's 2nd dimension is a singleton.
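As a sanity check, the einsum and the explicit loop agree. A minimal sketch (the names x and y follow the snippet above, with small sizes chosen arbitrarily):

import torch

B, L, D = 4, 3, 5
x = torch.randn(B, L, D)   # shape (B, L, D)
y = torch.randn(B, 1, D)   # shape (B, 1, D)

# einsum version: sums over the L axis of x and the singleton axis of y
res_einsum = torch.einsum('bld,bed->bd', x, y)

# explicit loop version
res_loop = torch.zeros(B, D)
for b in range(B):
    for l in range(L):
        for d in range(D):
            res_loop[b, d] += x[b, l, d] * y[b, 0, d]

assert torch.allclose(res_einsum, res_loop, atol=1e-5)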

python - matrix multiplication of 4 dimensional array

I'm plotting a color map, using a mesh grid for the map calculation. I have an X, Y grid of say 1000 by 1000 points, and some function H = function(a, b, c, X, Y). The size of H is [2, 3, 1000, 1000], i.e. for each grid point the size of H is [2, 3]. With mesh grid this is easy and efficient.
Now I need to find D = np.matmul(np.transpose(H), H). Unfortunately, I currently do that with two for loops scanning the entire grid, see the code below. Can someone suggest a more elegant and efficient way to find D?
for j in range(x_mesh_length):
    for k in range(y_mesh_length):
        D[j, k] = np.matmul(H[:, :, j, k].T, H[:, :, j, k])
Use numpy.einsum:
D = np.einsum('ikml,kjml->ijml', np.transpose(H, (1, 0, 2, 3)), H)
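A quick check on a small grid (a minimal sketch; note that the einsum result keeps the two grid axes last, whereas the loop above indexes the grid axes first):

import numpy as np

X, Y = 4, 5                      # small grid just for checking
H = np.random.rand(2, 3, X, Y)

# broadcast version: shape (3, 3, X, Y)
D_einsum = np.einsum('ikml,kjml->ijml', np.transpose(H, (1, 0, 2, 3)), H)

# loop version: shape (X, Y, 3, 3)
D_loop = np.empty((X, Y, 3, 3))
for j in range(X):
    for k in range(Y):
        D_loop[j, k] = H[:, :, j, k].T @ H[:, :, j, k]

assert np.allclose(np.moveaxis(D_einsum, (2, 3), (0, 1)), D_loop)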

Vectorized syntax for creating a sequence of block matrices in NumPy

I have two 3D arrays A and B with shapes (k, n, n) and (k, m, m) respectively. I would like to create a matrix C of shape (k, n+m, n+m) such that for each 0 <= i < k, the 2D matrix C[i,:,:] is the block diagonal matrix obtained by putting A[i, :, :] at the upper left n x n part and B[i, :, :] at the lower right m x m part.
Currently I am using the following to achieve this in NumPy:
C = np.empty((k, n+m, n+m))
for i in range(k):
    C[i, ...] = np.block([[A[i, ...], np.zeros((n, m))],
                          [np.zeros((m, n)), B[i, ...]]])
I was wondering if there is a way to do this without the for loop. I think if k is large my solution is not very efficient.
IIUC, you can simply slice and assign:
C = np.zeros((k, n+m, n+m),dtype=np.result_type(A,B))
C[:,:n,:n] = A
C[:,n:,n:] = B
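As a quick check, the sliced assignment reproduces the np.block loop. A minimal sketch with small, arbitrary sizes:

import numpy as np

k, n, m = 6, 3, 2
A = np.random.rand(k, n, n)
B = np.random.rand(k, m, m)

# vectorized block-diagonal assembly
C = np.zeros((k, n + m, n + m), dtype=np.result_type(A, B))
C[:, :n, :n] = A
C[:, n:, n:] = B

# reference: per-slice np.block
C_ref = np.array([np.block([[A[i], np.zeros((n, m))],
                            [np.zeros((m, n)), B[i]]]) for i in range(k)])
assert np.allclose(C, C_ref)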

Avoiding double for-loops in NumPy array operations

Suppose I have two 2D NumPy arrays A and B, I would like to compute the matrix C whose entries are C[i, j] = f(A[i, :], B[:, j]), where f is some function that takes two 1D arrays and returns a number.
For instance, if def f(x, y): return np.sum(x * y) then I would simply have C = np.dot(A, B). However, for a general function f, are there NumPy/SciPy utilities I could exploit that are more efficient than doing a double for-loop?
For example, take def f(x, y): return np.sum(x != y) / len(x), where x and y are not simply 0/1-bit vectors.
Here is a reasonably general approach using broadcasting.
First, reshape your two matrices so that the shared axis (the one both 1D arguments run over) lines up and the remaining axes can broadcast against each other:
A = A.reshape(A.shape + (1,))    # shape (A.shape[0], A.shape[1], 1)
B = B.reshape((1,) + B.shape)    # shape (1, B.shape[0], B.shape[1])
Second, apply your function element by element without performing any reduction:
C = f(A, B)  # e.g. A != B
Having reshaped your matrices allows numpy to broadcast; the result has one axis for the rows of A, one for the shared dimension, and one for the columns of B.
Third, apply the desired reduction by summing over the shared axis:
C = C.sum(axis=1) / A.shape[1]
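For the specific f from the question, a small end-to-end check might look like this (a minimal sketch with arbitrary sizes):

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(6, 4))
B = rng.integers(0, 5, size=(4, 7))

def f(x, y):
    return np.sum(x != y) / len(x)

# double-loop reference
C_loop = np.array([[f(A[i, :], B[:, j]) for j in range(B.shape[1])]
                   for i in range(A.shape[0])])

# broadcast version: elementwise comparison, then reduce over the shared axis
C_bcast = (A[:, :, None] != B[None, :, :]).sum(axis=1) / A.shape[1]

assert np.allclose(C_loop, C_bcast)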

Function application and reduction on large arrays

I have two numpy arrays, X and Y whose shapes are X.shape == (m,d) and Y.shape == (n,d), where m, n, and d are non-trivial sizes. I need to make a third array Z whose shape is Z.shape == (m,n).
An element Z[i,j] is the result of taking f(X[i,k],Y[j,k]) for k in range(d) and then summing over all k, for some non-linear f.
The obvious way to do this is:
Z = numpy.zeros((m, n), dtype=numpy.float64)
for i in range(m):
    for j in range(n):
        Z[i, j] += (f(X[i, :], Y[j, :])).sum()  # I can compose f from ufuncs
but what I'm really asking is whether there's some kind of clever broadcasting trick that I can use to compute Z that will:
- take advantage of numpy's optimizations if possible
- do this without putting an array of shape (n, m, d) in memory (n*m doubles will fit in memory, but n*m*d doubles won't)
Does anyone know of a way to do this? Thanks in advance.
Here is the solution you don't want; I've included it because I believe it is the "canonical" solution to your problem.
# A simple function of x, y
def f(x, y):
    return 2*x + 3*y**2

x = x.reshape((m, 1, d))
y = y.reshape((1, n, d))
temp = f(x, y)
Z = temp.sum(2)
If you want to avoid creating the temporary array temp, which is quite large, you could try looping over the d dimension. In some cases the overhead of the following loop will be quite small and you'll get almost the same performance, with much less memory usage.
# uses the reshaped x of shape (m, 1, d) and y of shape (1, n, d) from above
Z = np.zeros((m, n))
for i in range(d):
    Z += f(x[:, :, i], y[:, :, i])
Let me know if that helps.
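
Putting it together, both versions agree with the double loop. A minimal sketch with small sizes, using the question's X and Y and the toy f above:

import numpy as np

m, n, d = 5, 6, 7
X = np.random.rand(m, d)
Y = np.random.rand(n, d)

def f(x, y):
    return 2*x + 3*y**2   # any elementwise f composed from ufuncs

# double loop (reference)
Z_loop = np.array([[f(X[i, :], Y[j, :]).sum() for j in range(n)]
                   for i in range(m)])

# full broadcast (builds an (m, n, d) temporary)
Z_bcast = f(X[:, None, :], Y[None, :, :]).sum(2)

# loop over d (no (m, n, d) temporary)
Z_chunk = np.zeros((m, n))
for k in range(d):
    Z_chunk += f(X[:, k, None], Y[None, :, k])

assert np.allclose(Z_loop, Z_bcast) and np.allclose(Z_loop, Z_chunk)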
