Computing derivatives using numpy - python

I'm trying to implement a differential in python via numpy that can accept a scalar, a vector, or a matrix.
import numpy as np
def foo_scalar(x):
    f = x * x
    df = 2 * x
    return f, df
def foo_vector(x):
    f = x * x
    n = x.size
    df = np.zeros((n, n))
    for mu in range(n):
        for i in range(n):
            if mu == i:
                df[mu, i] = 2 * x[i]
    return f, df
def foo_matrix(x):
    f = x * x
    m, n = x.shape
    df = np.zeros((m, n, m, n))
    for mu in range(m):
        for nu in range(n):
            for i in range(m):
                for j in range(n):
                    if (mu == i) and (nu == j):
                        df[mu, nu, i, j] = 2 * x[i, j]
    return f, df
This works fine, but it seems like there should be a way to do this in a single function, and let numpy "figure out" the correct dimensions. I could force everything into a 2-D array form with something like
x = np.array(x)
if len(x.shape) == 0:
    x = x.reshape(1, 1)
elif len(x.shape) == 1:
    x = x.reshape(-1, 1)
if len(f.shape) == 0:
    f = f.reshape(1, 1)
elif len(f.shape) == 1:
    f = f.reshape(-1, 1)
and always have 4 nested for loops, but this doesn't scale if I need to generalize to higher-order tensors.
Is what I'm trying to do possible, and if so, how?

I doubt NumPy has a built-in function that produces the second value returned by your functions (the derivative array). That said, you can use features of NumPy and Python to vectorize the computation and make it faster: first generate the indices of the nonzero entries, then allocate the target array and fill those entries. Note that operating on generic N-dimensional arrays tends to be slow and tricky in non-trivial cases. The * unpacking operator is used to pass N arguments.
def foo_generic(x):
    f = x ** 2
    idx = np.stack(np.meshgrid(*[np.arange(e) for e in x.shape], indexing='ij'))
    idx = tuple(np.concatenate((idx, idx)).reshape(2*x.ndim, -1))
    df = np.zeros([*x.shape, *x.shape])
    df[idx] = 2 * x.ravel()
    return f, df
Note that foo_generic does not support scalars, and it would be very inefficient to use it for them anyway, but you can add a condition to handle that special case separately.
The df array quickly becomes huge for higher orders, so I strongly advise against using dense arrays for it: the number of zeros is already huge compared to the number of nonzero values in the matrix case. Sparse matrices fix this. In fact, for a 5x5 matrix, more than 95% of the entries of df are zeros (600 of 625), and filling a huge array with zeros is not efficient.
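As a rough sketch of the "single function" idea, the dense derivative can also be built from a diagonal matrix and reshaped, which happens to cover the scalar case as well. The name foo_any is hypothetical, and the sketch assumes the elementwise function f = x * x from the question; it still allocates the full dense array, so the memory concern above applies.

```python
import numpy as np

def foo_any(x):
    """Return f = x*x and its dense derivative for a scalar or an array of any rank."""
    x = np.asarray(x, dtype=float)
    f = x * x
    # d f[a] / d x[b] is 2*x[a] when the multi-indices a and b coincide, else 0,
    # so the derivative of the flattened array is a diagonal matrix.
    df = np.diag(2 * x.ravel()).reshape(x.shape + x.shape)
    return f, df

x = np.arange(1.0, 7.0).reshape(2, 3)
f, df = foo_any(x)
print(df.shape)        # (2, 3, 2, 3)
print(df[1, 2, 1, 2])  # 12.0
```

For a scalar input, x.shape is the empty tuple, so the 1x1 diagonal matrix is reshaped to a 0-dimensional array.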


How can I improve performance in my forward substitution method for lower triangle matrices?

I tried implementing the forward substitution method, a procedure for solving Lx = b where L is a lower triangular matrix and x, b are vectors.
This was an easy task:
def tri_solve(L,b):
    n = len(b)
    x = np.zeros(n)
    x[0] = b[0]/L[0,0]
    for i in range(1,n):
        comp = 0
        for k in range(0,i):
            index = L[i,k]
            preSolution = x[k]
            comp = comp + index * preSolution
        x[i] = 1/L[i,i] * (b[i] - comp)
    return x
I compared the run time of my function several times, for matrices of different sizes, with linalg.solve from the scipy module, and linalg.solve turns out to be much faster. This makes some sense, since SciPy is written in C and C++, but I still expected similar or better run times for matrices up to 10x10. Starting with 6x6 matrices, linalg.solve becomes slightly faster on average.
Is there a way to improve my rather simple solution?
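You could try scipy.linalg.solve_triangular, which solves a triangular system directly (pass lower=True for a lower triangular matrix).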
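A minimal usage sketch of that suggestion, assuming SciPy is installed. The test matrix, its size and the diagonal shift are arbitrary choices made here to get a well-conditioned system; they are not taken from the question.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n = 20
# Lower triangular test matrix; the diagonal is shifted away from zero
# so that the system is well conditioned (an assumption of this sketch).
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)
b = rng.standard_normal(n)

# lower=True tells SciPy to use only the lower triangle of L
x = solve_triangular(L, b, lower=True)
print(np.allclose(L @ x, b))
```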
If you want to accelerate your code, what you could do is to vectorize the inner loop.
def tri_solve(L,b):
    n = len(b)
    x = np.zeros(n)
    x[0] = b[0]/L[0,0]
    for i in range(1,n):
        comp = np.sum(L[i,:i] * x[:i])
        x[i] = 1/L[i,i] * (b[i] - comp)
    return x
Edit: How to use it
You have to pass as first argument a square lower triangular matrix and as second argument you can pass a 1D array
N = 20
A = np.tril(np.random.randn(N, N))
b = np.random.randn(N)
assert np.allclose(np.linalg.solve(A, b), tri_solve(A, b))
Of course, this is a naive implementation and is not stable: you shouldn't use it to solve very large or ill-conditioned systems.

Vectorizing three nested loops with Numpy

I have a complex matrix C with dimensions (r, r) as well as a complex vector v of size r. I need to compute a new matrix K from C and v following this equation:
K_{m,n} = Im( \sum_{i=1}^{r} C_{i,m} * C_{i,n}^* * sgn(Im(v_i)) )
where K is also a square matrix of dimensions (r, r). Here is the code to compute K with three loops:
import numpy as np
import matplotlib.pyplot as plt
r = 9
# Create random matrix
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
# Original loops
K = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        for i in range(r):
            K[m,n] += np.imag( C[i,m] * np.conj(C[i,n]) * np.sign(np.imag(v[i])) )
Removing the loop with i is relatively easy:
# First optimization
K = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        K[m,n] = np.imag(np.sum(C[:,m] * np.conj(C[:,n]) * np.sign(np.imag(v)) ))
but I am not sure how to proceed to vectorize the two remaining loops. Is it actually possible in this case?
I have had a lot of problems like this, and here is how I usually proceed to find a vectorized formulation.
Here is what I noticed about your summation. The nice conclusion is that you do not need any special vectorization trick at all, as you can express the whole calculation as a single product of 2D matrices. Here it comes...
Let's first define the following matrices (sorry for the lack of LaTeX rendering, Stack Overflow does not support MathJax):
A_{i,j} = c_{i,j}
B_{i,j} = c_{i,j} * sgn(Im(v_i))
Then you can write your summation as:
k_{m,n} = Im( \sum_{i=1}^{r} c_{i,m} * sgn(Im(v_i)) * c_{i,n}^* ) = Im( \sum_{i=1}^{r} B_{i,m} * A_{i,n}^* ) = Im( \sum_{i=1}^{r} (B^T)_{m,i} * A_{i,n}^* )
By the definition of matrix multiplication, the expression inside Im(.) is equivalent to the following:
k_{m,n} = Im( (B^T * A^*)_{m,n} )
This means that your matrix k can be expressed as the imaginary part of the product of the transpose of B and the complex conjugate of A. In your code, the matrix A is already assigned to the variable C. So the vectorization could be done as follows:
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
# B scales row i of C by sgn(Im(v_i)), hence the [:, None] before transposing
k = np.imag((C * np.sign(np.imag(v))[:, None]).T @ np.conj(C))
And you have avoided both nasty loops and convoluted expressions.
This looks like matrix multiplication:
out = np.imag((C*np.sign(np.imag(v))[:,None]).T @ np.conj(C))
Or you can use np.einsum:
out = np.imag(np.einsum('im,in,i', C, np.conj(C), np.sign(np.imag(v))))
Verification with your approach:
np.all(np.abs(out-K) < 1e-6)
# True
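The pieces above can be put together in one self-contained check. This is only a sketch: the seed, the names K_loop, K_matmul, K_einsum and s, and the shift that gives Im(v) mixed signs are choices made here, not part of the original posts.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 9
C = rng.random((r, r)) + 1j * rng.random((r, r))
# Shift v so that the signs of Im(v) are mixed (assumption made for a stronger test)
v = (rng.random(r) - 0.5) + 1j * (rng.random(r) - 0.5)
s = np.sign(np.imag(v))

# Reference: the original triple loop
K_loop = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        for i in range(r):
            K_loop[m, n] += np.imag(C[i, m] * np.conj(C[i, n]) * s[i])

# Scale row i of C by s[i], then take a single matrix product
K_matmul = np.imag((C * s[:, None]).T @ np.conj(C))
# The same contraction with the output indices written explicitly
K_einsum = np.imag(np.einsum('im,in,i->mn', C, np.conj(C), s))

print(np.allclose(K_loop, K_matmul), np.allclose(K_loop, K_einsum))
```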
I found something that works for now. However, one loop remains, and since the resulting matrix is antisymmetric (K[n, m] == -K[m, n]), there is still some optimization to be made.
Instead of removing the i loop, I removed the two other ones:
def adjointMatrix(X):
    return np.conjugate( np.transpose(X) )
K = np.zeros((r, r), dtype=np.complex128)
for i in range(r):
    # rank-one update from row i of C (the adjoint of a 1D array is just its conjugate)
    K += np.sign(np.imag(v[i])) * np.outer(C[i, :], adjointMatrix(C[i, :]))
K = np.imag(K)

Array Assignment with Autograd ( not again :( )

I have a fairly non-trivial function I want to differentiate with autograd, but I'm not quite enough of a numpy wizard to figure out how to do it without array assignment.
I also apologize that I had to make this example incredibly contrived and meaningless to be able to run standalone. The actual code I'm working with is for non linear finite elements and is trying to compute the jacobian for a complex non linear system.
import autograd.numpy as anp
from autograd import jacobian
def alpha(x):
    return anp.exp(-(x - 10) ** 2) / (x + 1)
def f(x):
    # Matrix getting constructed
    k = anp.zeros((x.shape[0], x.shape[0]))
    # loop over some random 3 dimensional vectors
    for element in anp.random.randint(0, x.shape[0], (x.shape[0], 3)):
        # select 3 values from x
        x_ijk = anp.array([[x[i] for i in element]])
        norm = anp.linalg.norm(
            x_ijk @ anp.vstack((element, element)).transpose()
        )
        # make some matrix from the element
        m = element.reshape(3, 1) @ element.reshape(1, 3)
        # alpha is an arbitrary differentiable function R -> R
        alpha_value = alpha(norm)
        # combine m matrices into k scaling by alpha_value
        n = m.shape[0]
        for i in range(n):
            for j in range(n):
                k[element[i], element[j]] += m[i, j] * alpha_value
    return k @ x
# And of course we get an error
# k[element[i], element[j]] += m[i, j] * alpha_value
# ValueError: setting an array element with a sequence.
I don't really understand this message since no type error is happening. I assume it must be from assignment.
After writing the above, I made a trivial switch to PyTorch and the code runs just fine. But I would still prefer to use autograd.
#pytorch version
import torch
from torch.autograd.gradcheck import zero_gradients
def alpha(x):
    return torch.exp(x)
def f(x):
    # Matrix getting constructed
    k = torch.zeros((x.shape[0], x.shape[0]))
    # loop over some random 3 dimensional vectors
    for element in torch.randint(0, x.shape[0], (x.shape[0], 3)):
        # select 3 values from x
        x_ijk = torch.tensor([[1. if n == e else 0 for n in range(len(x))] for e in element]) @ x
        norm = torch.norm(
            x_ijk @ torch.stack((torch.tanh(element.float() + 4), element.float() - 4)).t()
        )
        m = torch.rand(3, 3)
        # alpha is an arbitrary differentiable function R -> R
        alpha_value = alpha(norm)
        n = m.shape[0]
        for i in range(n):
            for j in range(n):
                k[element[i], element[j]] += m[i, j] * alpha_value
    return k @ x
x = torch.rand(4, requires_grad=True)
print(x, '\n')
y = f(x)
print(y, '\n')
grads = []
for val in y:
if __name__ == '__main__':
In Autograd, and in JAX, you are not allowed to perform array indexing assignments. See the JAX gotchas for a partial explanation of this.
PyTorch allows this functionality. If you want to run your code in autograd, you'll have to find a way to remove the offending line k[element[i], element[j]] += m[i, j] * alpha_value. If you are okay with running your code in JAX (which has essentially the same syntax as autograd, but more features), then it looks like jax.ops could be helpful for performing this sort of indexing assignment (in current JAX releases the equivalent is the x.at[idx].add(y) syntax on arrays).
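One possible way to remove the offending line is to express each element's contribution as a full matrix and add it to k out of place. The sketch below uses plain NumPy so that it runs standalone; the assumption is that replacing the import with `import autograd.numpy as np` keeps it differentiable, since it only uses matrix products and additions. The helper name scatter_add and the sample values are hypothetical.

```python
import numpy as np  # assumption: with autograd this would be `import autograd.numpy as np`

def scatter_add(n, element, m, alpha_value):
    """Return the n x n matrix holding alpha_value * m[i, j] at (element[i], element[j])."""
    # Row a of P is the unit vector for global index element[a]
    P = np.eye(n)[element]               # shape (len(element), n)
    # P.T @ m @ P sums m[i, j] into position (element[i], element[j])
    return alpha_value * (P.T @ m @ P)

n = 4
element = np.array([2, 0, 2])            # repeated indices accumulate, as with +=
m = np.arange(9.0).reshape(3, 3)

k = np.zeros((n, n))
k = k + scatter_add(n, element, m, 0.5)  # builds a new array, no in-place write
print(k)
```

Inside the loop over elements, `k = k + scatter_add(...)` would take the place of the two innermost loops. The trade-off is an n x n temporary per element, which may be too costly for large finite element systems.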

Numpy - create an almost zero matrix with row from other matrix

I have a square matrix A and I want to create a matrix Z whose elements are zero everywhere except for the i'th row, and the i'th row of Z is the j'th row of matrix A.
I am aware of two ways to accomplish this. The first one is fairly straightforward and seems to be the most effective performance-wise:
def do_this(mx: np.ndarray, i: int, j: int):
    Z = np.zeros_like(mx)
    Z[i, :] = mx[j, :]
    return Z
The other way, less straightforward and seemingly much less efficient, is to prepare a matrix ref_mx beforehand, which is a zero matrix of the same shape as A but with a 1 in its (i, j) position, and then to calculate Z as ref_mx @ A.
def do_this_other_way(mx: np.ndarray, ref_mx: np.ndarray):
    return ref_mx @ mx
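To see why the two approaches agree, here is a small standalone check. The sizes, indices and the names Z_direct and Z_matmul are arbitrary choices for illustration.

```python
import numpy as np

n, i, j = 6, 1, 4
A = np.arange(float(n * n)).reshape(n, n)

# Direct approach: copy row j of A into row i of a zero matrix
Z_direct = np.zeros_like(A)
Z_direct[i, :] = A[j, :]

# Selector approach: the single 1 at (i, j) picks row j of A and places it in row i
ref_mx = np.zeros_like(A)
ref_mx[i, j] = 1
Z_matmul = ref_mx @ A

print(np.array_equal(Z_direct, Z_matmul))
```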
I decided to benchmark both approaches:
from time import time
import numpy as np
n = 20
num_iters = 5000
A = np.random.rand(n, n)
i, j = 5, 10
t = time()
for _ in range(num_iters):
    Z = do_this(A, i, j)
print((time() - t) / num_iters)
ref_mx = np.zeros_like(A)
ref_mx[i, j] = 1
t = time()
for _ in range(num_iters):
    Z = do_this_other_way(A, ref_mx)
print((time() - t) / num_iters)
However, when A is relatively small (on my laptop it means that A's size is less than 40), do_this_other_way wins, and when A has size like 20, it wins by an order of magnitude.
That's it: I have doubts that I am doing it the most effective way possible in numpy. Is it possible to do it better without resorting to writing your own low-level implementation of do_this?

Fast interpolation over 3D array for 3D origin x

This problem is similar to an earlier question, Fast interpolation over 3D array, but the answers there do not solve my problem.
I have a 4D array with dimensions (time, altitude, latitude, longitude), denoted y, with y.shape = (nt, nalt, nlat, nlon). The x coordinate is altitude, and it changes with (time, latitude, longitude), which means x.shape = (nt, nalt, nlat, nlon). I want to interpolate in altitude for every (nt, nlat, nlon). The target coordinate new_x should be 1D and should not change with (time, latitude, longitude).
I use numpy.interp, which behaves like scipy.interpolate.interp1d here, and I have studied the answers in the earlier post, but I could not get rid of the loops with them.
I can only do it like this:
# y is a 4D ndarray
# x is a 4D ndarray
# y_new is a 4D ndarray
for i in range(nlon):
    for j in range(nlat):
        for k in range(nt):
            y_new[k,:,j,i] = np.interp(new_x, x[k,:,j,i], y[k,:,j,i])
These loops make the interpolation too slow. Does anyone have a good idea? Help will be highly appreciated.
Here is my solution using numba; it's about 3x faster.
Create the test data first; each row of x needs to be in ascending order:
import numpy as np
rows = 200000
cols = 66
new_cols = 69
x = np.random.rand(rows, cols)
x.sort(axis=-1)  # each row of x must be ascending
y = np.random.rand(rows, cols)
nx = np.random.rand(new_cols)
nx.sort()  # the merge method below relies on nx being sorted
Do the 200000 interpolations with np.interp in NumPy:
ny = np.empty((x.shape[0], len(nx)))
for i in range(len(x)):
    ny[i] = np.interp(nx, x[i], y[i])
I use a merge method instead of a binary search, because nx is sorted and the length of nx is about the same as the length of x.
interp() uses binary search; the time complexity is O(len(nx)*log2(len(x)))
merge method: the time complexity is O(len(nx) + len(x))
Here is the numba code:
import numba
@numba.jit("void(f8[::1], f8[::1], f8[::1], f8[::1])")
def interp2(x, xp, fp, f):
    # fills f in place with the values of (xp, fp) interpolated at x
    n = len(x)
    n2 = len(xp)
    j = 0
    i = 0
    # points at or left of the first node take the first value
    while x[i] <= xp[0]:
        f[i] = fp[0]
        i += 1
    slope = (fp[j+1] - fp[j])/(xp[j+1] - xp[j])
    while i < n:
        if x[i] >= xp[j] and x[i] < xp[j+1]:
            f[i] = slope*(x[i] - xp[j]) + fp[j]
            i += 1
            continue
        # x[i] lies beyond the current interval: advance to the next one
        j += 1
        if j + 1 == n2:
            break
        slope = (fp[j+1] - fp[j])/(xp[j+1] - xp[j])
    # points right of the last node take the last value
    while i < n:
        f[i] = fp[n2-1]
        i += 1
@numba.jit("f8[:, ::1](f8[::1], f8[:, ::1], f8[:, ::1])")
def multi_interp(x, xp, fp):
    nrows = xp.shape[0]
    f = np.empty((nrows, x.shape[0]))
    for i in range(nrows):
        interp2(x, xp[i, :], fp[i, :], f[i, :])
    return f
Then call the numba function:
ny2 = multi_interp(nx, x, y)
To check the result:
np.allclose(ny, ny2)
On my pc, the time is:
python version: 3.41 s
numba version: 1.04 s
This method needs arrays whose last axis is the axis to interpolate along.
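For the 4D layout in the question, one way to meet that requirement is to move the altitude axis to the end and flatten the remaining axes into rows. The following is a sketch under the assumption that altitude is axis 1 and that x is ascending along it; the function name interp_altitude is hypothetical, and the plain np.interp loop stands in for multi_interp so that the example runs without numba.

```python
import numpy as np

def interp_altitude(new_x, x, y):
    """Interpolate y along axis 1 (altitude) onto the 1D grid new_x."""
    nt, nalt, nlat, nlon = y.shape
    # Move altitude to the last axis and flatten the other axes into rows;
    # reshape returns C-contiguous copies of shape (nt*nlat*nlon, nalt)
    x_rows = np.moveaxis(x, 1, -1).reshape(-1, nalt)
    y_rows = np.moveaxis(y, 1, -1).reshape(-1, nalt)
    out = np.empty((x_rows.shape[0], len(new_x)))
    for row in range(x_rows.shape[0]):
        out[row] = np.interp(new_x, x_rows[row], y_rows[row])
    # Restore the (time, altitude, latitude, longitude) layout
    return np.moveaxis(out.reshape(nt, nlat, nlon, len(new_x)), -1, 1)

rng = np.random.default_rng(0)
nt, nalt, nlat, nlon = 2, 5, 3, 4
x = np.sort(rng.random((nt, nalt, nlat, nlon)), axis=1)  # ascending in altitude
y = rng.random((nt, nalt, nlat, nlon))
new_x = np.linspace(0.0, 1.0, 7)
y_new = interp_altitude(new_x, x, y)
print(y_new.shape)  # (2, 7, 3, 4)
```

With numba available, the row loop could presumably be replaced by a single call such as multi_interp(new_x, x_rows, y_rows), since the flattened rows are contiguous float64 arrays.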

