Avoiding double for-loops in NumPy array operations

Suppose I have two 2D NumPy arrays A and B, and I would like to compute the matrix C whose entries are C[i, j] = f(A[i, :], B[:, j]), where f is some function that takes two 1D arrays and returns a number.
For instance, if def f(x, y): return np.sum(x * y), then I would simply have C = np.dot(A, B). However, for a general function f, are there NumPy/SciPy utilities I could exploit that are more efficient than a double for-loop?
For example, take def f(x, y): return np.sum(x != y) / len(x), where x and y are not simply 0/1-bit vectors.

Here is a reasonably general approach using broadcasting.
First, reshape your two matrices to be rank-four tensors.
A = A.reshape(A.shape + (1, 1))
B = B.reshape((1, 1) + B.shape)
Second, apply your function element by element without performing any reduction.
C = f(A, B) # e.g. A != B
Reshaping the matrices lets NumPy broadcast them against each other. The resulting tensor C has shape A.shape + B.shape, with C[i, k, l, j] holding the elementwise result for A[i, k] and B[l, j].
Third, apply any desired reduction by, for example, summing over the indices you want to discard:
C = C.sum(axis=(1, 3)) / C.shape[0]
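The rank-four tensor above pairs every entry of A with every entry of B. For the f in the question, only entries that share the inner index need to be compared, so a rank-three broadcast may be enough. The following is a minimal sketch under that assumption; the array sizes, the random test data, and the names mismatch and C_loop are illustrative and not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 3, size=(4, 5))   # shape (n, m)
B = rng.integers(0, 3, size=(5, 6))   # shape (m, p)

# A[:, :, None] has shape (4, 5, 1); B[None, :, :] has shape (1, 5, 6).
# Broadcasting pairs A[i, k] with B[k, j], giving shape (4, 5, 6).
mismatch = A[:, :, None] != B[None, :, :]

# Reduce over the shared axis k: the fraction of positions that differ.
C = mismatch.mean(axis=1)

# Reference double loop with f(x, y) = np.sum(x != y) / len(x).
C_loop = np.array([[np.sum(A[i, :] != B[:, j]) / A.shape[1]
                    for j in range(B.shape[1])]
                   for i in range(A.shape[0])])
```

The intermediate array has n * m * p entries, so this trades memory for speed; for large inputs it may be worth processing A in row blocks.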

Related

Computing derivatives using numpy

I'm trying to implement a differential in python via numpy that can accept a scalar, a vector, or a matrix.
import numpy as np
def foo_scalar(x):
    f = x * x
    df = 2 * x
    return f, df
def foo_vector(x):
    f = x * x
    n = x.size
    df = np.zeros((n, n))
    for mu in range(n):
        for i in range(n):
            if mu == i:
                df[mu, i] = 2 * x[i]
    return f, df
def foo_matrix(x):
    f = x * x
    m, n = x.shape
    df = np.zeros((m, n, m, n))
    for mu in range(m):
        for nu in range(n):
            for i in range(m):
                for j in range(n):
                    if (mu == i) and (nu == j):
                        df[mu, nu, i, j] = 2 * x[i, j]
    return f, df
This works fine, but it seems like there should be a way to do this in a single function, and let numpy "figure out" the correct dimensions. I could force everything into a 2-D array form with something like
x = np.array(x)
if len(x.shape) == 0:
    x = x.reshape(1, 1)
elif len(x.shape) == 1:
    x = x.reshape(-1, 1)
if len(f.shape) == 0:
    f = f.reshape(1, 1)
elif len(f.shape) == 1:
    f = f.reshape(-1, 1)
and always have 4 nested for loops, but this doesn't scale if I need to generalize to higher-order tensors.
Is what I'm trying to do possible, and if so, how?

I highly doubt NumPy has a function that generates the second value returned by your functions. That said, you can combine NumPy and Python features to vectorize the computation and make it faster. First generate the indices, then create the target array and fill it. Note that operating on generic N-dimensional arrays tends to be slow and tricky in non-trivial cases. The * unpacking operator is used to pass N arguments.
def foo_generic(x):
    f = x ** 2
    idx = np.stack(np.meshgrid(*[np.arange(e) for e in x.shape], indexing='ij'))
    idx = tuple(np.concatenate((idx, idx)).reshape(2*x.ndim, -1))
    df = np.zeros([*x.shape, *x.shape])
    df[idx] = 2 * x.ravel()
    return f, df
Note that foo_generic does not support scalars, and it would be very inefficient for that case anyway, but you can add a condition to handle that special case separately.
The df array quickly becomes huge for higher orders, so I strongly advise against dense arrays here: the number of zeros is already huge compared to the number of nonzero values in the matrix case. Sparse matrices fix this. In fact, for a 5x5 matrix, more than 95% of the entries are zero. On top of that, filling a huge array with zeros is inefficient.
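As a possible alternative to explicit index generation, the dense derivative can also be built from a diagonal matrix over the flattened input and then reshaped. This is only a sketch: the name foo_any is hypothetical, and it keeps the dense layout, so the memory concern above still applies.

```python
import numpy as np

def foo_any(x):
    """Return f = x*x and the dense derivative df for a scalar, vector, or matrix.

    Hypothetical helper: df has shape x.shape + x.shape and is nonzero only
    where the output index equals the input index.
    """
    x = np.asarray(x, dtype=float)
    f = x * x
    # Diagonal over the flattened input, then split both axes back into x.shape.
    df = np.diag(2 * x.ravel()).reshape(x.shape + x.shape)
    return f, df

f_s, df_s = foo_any(3.0)                          # scalar input
f_v, df_v = foo_any(np.arange(4.0))               # vector input
f_m, df_m = foo_any(np.arange(6.0).reshape(2, 3)) # matrix input
```

Because the row-major flat index maps one-to-one to the multi-index, the diagonal of the flattened matrix lands exactly on the entries where the two index groups coincide.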

Scipy curve fit (optimization) - vectorizing a conditional to identify threshold using a custom function

I'm trying to use scipy curve_fit to capture the value of a0 parameter. As of now, it is not changing (always comes out as 1):
X = [[1,2,3],[4,5,6]]
def func(X, a0, c):
    x1 = X[0]; x2 = X[1]
    a = x1*x2
    result = np.where(a(a<a0), -c*(a + np.sqrt(x2)), -c*x1)
    return result
Popt, Cov = scipy.curve_fit(func, X, y)
a0, c = Popt
Predicted = func(X, a0, c) # a0 and c are constants
I get the values for c, which is a scalar, without any problem. I can't explain why a0 (also a scalar) is always 1, and I am not sure how to fix it. I did see elsewhere on SO that np.where can be used the way I have used it here, but apparently not for curve_fit function. Maybe I need to use a different method of optimization, and I'd like some pointers to do this using scipy methods.
Edit: I tried the construct suggested by Brad, but that's not it.

Updated!
This should work. Note that the variable a is a vector of length 3 in this example, because it is computed by the elementwise multiplication of the first and second rows of X, which is a 2x3 matrix. Therefore a0 can be either a scalar or a vector of length 3, and c can also be a scalar or a vector of length 3.
import numpy as np
X = np.array([[1, 2, 3], [4, 5, 6]])
a0 = np.array([8,25,400])
# a0 = 2
# Code works whether C is scalar or a matrix since it can be broadcast to matrix a below.
# c = 3 # Uncomment this for scalar
c = np.array([8, 12, 2000]) # Element wise
def func(X, a0, c):
    x = X[0]
    y = X[1]
    a = x * y
    print(a.shape)
    result = np.where(a < a0, c * (a + np.sqrt(y)), c * x)
    return result
func(X, a0, c)
This is a minimal amount of code that works. Notice that I removed the y>0 condition and defined a to be the same size as c. Now you get the correct insertions because the first argument of np.where has the same size as the second and third. Before, (x<a) & (y>0) always evaluated to True or False, which is a scalar in this context. If a had been an N-dimensional array, you would have received a ValueError because the operands could not be broadcast together.
import numpy as np
c = np.array([[22,34],[33,480]])
def func(X, a):
    x = X[0]; y = X[1]
    # Pass the boolean mask itself as the condition of np.where
    return np.where(x < a, -c*(a + np.sqrt(y)), -c*x)
X = [25, 600]
a = np.array([[2,14],[33,22]])
func(X,a)
This also works if c is a constant and a is the array you want to manipulate:
import numpy as np
c = 2
def func(X, a):
    x = X[0]; y = X[1]
    # Pass the boolean mask itself as the condition of np.where
    return np.where(x < a, -c*(a + np.sqrt(y)), -c*x)
X = [25, 600]
a = np.array([[2,14],[33,22]])
func(X,a)
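One plausible explanation for a0 staying at 1, not stated in the thread: curve_fit starts every parameter at 1 when no p0 is given, and a threshold inside np.where makes the model piecewise constant in a0. A tiny change in a0 usually flips no comparison, so the numerically estimated derivative is zero and the optimizer never moves it. The sketch below swaps in a different technique, a plain grid search over candidate thresholds with a closed-form fit for c. The synthetic data, the true values 30.0 and 2.5, and the names model and fit_threshold are all assumptions made for illustration.

```python
import numpy as np

def model(X, a0, c):
    # Same functional form as in the question.
    x1, x2 = X
    a = x1 * x2
    return np.where(a < a0, -c * (a + np.sqrt(x2)), -c * x1)

# Synthetic, noise-free data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(2, 200))
y = model(X, 30.0, 2.5)

def fit_threshold(X, y, candidates):
    """Try each candidate a0; for fixed a0 the model is linear in c."""
    best = None
    for a0 in candidates:
        basis = model(X, a0, 1.0)
        c = (basis @ y) / (basis @ basis)    # closed-form least squares for c
        sse = np.sum((y - c * basis) ** 2)
        if best is None or sse < best[0]:
            best = (sse, a0, c)
    return best[1], best[2]

# Only the observed values of a can change the split, so they suffice as candidates.
candidates = np.unique(X[0] * X[1])
a0_hat, c_hat = fit_threshold(X, y, candidates)
```

The recovered a0_hat is the smallest observed product that reproduces the same split as the true threshold; any value between two neighbouring observations fits equally well, which is also why a gradient-based fit sees no slope in a0.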

Columns 2D tensor times rows 2D tensor equals a 3d pytorch tensor

Given two 2-D tensors in PyTorch, A (a x m) and B (m x b), is there an efficient way to obtain a tensor C (m x a x b), where C[i,:,:] = A[:,i] # B[i,:] (with # denoting the outer product)?
Here I will give an example of the problem:
A = torch.FloatTensor([[1,2],[3,4]])
B = torch.FloatTensor([[1,2,3],[4,5,6]])
Result:
C = torch.FloatTensor([[[1,2,3],[3,6,9]],[[8,10,12],[16,20,24]]])
I have done it using a for-loop. However, it is very inefficient.

Look at torch.einsum:
C = torch.einsum('im,mj->mij', A, B)
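To see what the subscripts 'im,mj->mij' do, here is a small check written with numpy.einsum, which accepts the same subscript string; NumPy is used only so that the sketch has a single dependency. The name C_broadcast and the comparison against np.outer are illustrative additions.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])           # shape (a, m) = (2, 2)
B = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (m, b) = (2, 3)

# C[m, i, j] = A[i, m] * B[m, j]: one outer product per shared index m.
C = np.einsum('im,mj->mij', A, B)

# Same result through plain broadcasting: (m, a, 1) * (m, 1, b).
C_broadcast = A.T[:, :, None] * B[:, None, :]

# Each slice is the outer product of a column of A with a row of B.
slices_match = all(np.array_equal(C[i], np.outer(A[:, i], B[i, :]))
                   for i in range(A.shape[1]))
```

No summation happens here because the shared index m appears in the output subscripts; dropping it from the output would give the ordinary matrix product instead.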

Appending Numpy Array with Another Numpy Array

I have a NumPy array with equations solved symbolically, with constants a and b. Here's an example of the cell at index (2,0) in my array "bounds_symbolic":
-a*sqrt(1/(a**6*b**2+1))
I also have an array, called "a_values", that I would like to substitute into my "bounds_symbolic" array. I also have the b-value set to 1, which I would also like to substitute in. Keeping the top row of the arrays intact would also be nice.
In other words, for the cell indexed at (2,0) in "bounds_symbolic", I want to substitute all of my a and b-values into the equation, while extending the column to contain the substituted equations. I then want to do this operation for the entirety of the "bounds_symbolic" array.
Here is the code that I have so far:
import sympy
import numpy as np
a, b, x, y = sympy.symbols("a b x y")
# Equation of the ellipse solved for y
ellipse = sympy.sqrt((b ** 2) * (1 - ((x ** 2) / (a ** 2))))
# Functions to be tested
test_functions = np.array(
    [(a * b * x), (((a * b) ** 2) * x), (((a * b) ** 3) * x), (((a * b) ** 4) * x), (((a * b) ** 5) * x)])
# Equating ellipse and test_functions so their intersection can be symbolically solved for
equate = np.array(
    [sympy.Eq(ellipse, test_functions[0]), sympy.Eq(ellipse, test_functions[1]), sympy.Eq(ellipse, test_functions[2]),
     sympy.Eq(ellipse, test_functions[3]), sympy.Eq(ellipse, test_functions[4])])
# Calculating the intersection points of the ellipse and the testing functions
# Array that holds the bounds of the integral solved symbolically
bounds_symbolic = np.array([])
for i in range(0, 5):
    bounds_symbolic = np.append(bounds_symbolic, sympy.solve(equate[i], x))
# Array of a-values to plug into the bounds of the integral
a_values = np.array(np.linspace(-10, 10, 201))
# Setting b equal to a constant of 1
b = 1
integrand = np.array([])
for j in range(0, 5):
    integrand = np.append(integrand, (ellipse - test_functions[j]))
# New array with a-values substituted into the bounds
bounds_a = bounds_symbolic
# for j in range(0, 5):
# bounds_a = np.append[:, ]
Thank you!

NumPy arrays are the best choice when working with purely numerical data, for which they can speed up many types of calculations. Once you start mixing in sympy expressions, things can get very messy. You also lose all the speed advantages of NumPy arrays.
Apart from that, np.append is a very slow operation, as it needs to recreate the complete array every time it is executed. When creating a new NumPy array, the recommended way is to first create an array that already has its final size (e.g. with np.zeros()) and then fill it.
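The difference between the growing and the preallocating pattern can be sketched as follows. The size n and the squared values are placeholders chosen for illustration.

```python
import numpy as np

n = 5

# Pattern to avoid: np.append copies the whole array on every call.
grown = np.array([])
for k in range(n):
    grown = np.append(grown, k ** 2)

# Preallocate once with the final size, then fill in place.
filled = np.zeros(n)
for k in range(n):
    filled[k] = k ** 2

# Or build a Python list and convert it once at the end.
from_list = np.array([k ** 2 for k in range(n)], dtype=float)
```

All three produce the same values; only the first one reallocates on each iteration.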
You should also check out Python's list comprehensions, as they ease the creation of lists. In "pythonic" code, indices are used as little as possible. List comprehensions may look a bit weird when you are used to other programming languages, but you quickly get used to them, and from then on you'll certainly prefer them.
In your example code, NumPy is useful for the np.linspace command, which creates an array of numbers (converting it again with np.array isn't necessary). At the end, you might want to convert the substituted values to a NumPy array. Note that this won't work if solve returns a different number of solutions for some of the equations, as a NumPy array needs all of its rows to have the same size. Also note that an explicit conversion from sympy's numerical type to a dtype understood by NumPy might be needed. (Sympy often works with higher precision, not caring about the loss of speed.)
Also note that if you assign b = 1, you create a new variable and lose the variable pointing to the sympy symbol. It's recommended to use another name. Just writing b = 1 will not change the value of the symbol. You need subs to substitute symbols with values.
Summarizing, your code could look like this:
import sympy
import numpy as np
a, b, x, y = sympy.symbols("a b x y")
# Equation of the ellipse solved for y
ellipse = sympy.sqrt((b ** 2) * (1 - ((x ** 2) / (a ** 2))))
# Functions to be tested
test_functions = [a * b * x, ((a * b) ** 2) * x, ((a * b) ** 3) * x, ((a * b) ** 4) * x, ((a * b) ** 5) * x]
# Equating ellipse and test_functions so their intersection can be symbolically solved for
# Array that holds the bounds of the integral solved symbolically
bounds_symbolic = [sympy.solve(sympy.Eq(ellipse, fun), x) for fun in test_functions]
# Array of a-values to plug into the bounds of the integral
a_values = np.linspace(-10, 10, 201)
# Setting b equal to a constant of 1
b_val = 1
# New array with a-values substituted into the bounds
bounds_a = [[[bound.subs({a: a_val, b: b_val}) for bound in bounds]
             for bounds in bounds_symbolic]
            for a_val in a_values]
bounds_a = np.array(bounds_a, dtype='float') # shape: (201, 5, 2)
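As a possible alternative to calling subs once per value (this is not part of the answer above), sympy.lambdify can turn a symbolic expression into a function that evaluates on a whole NumPy array at once. The sketch uses the single bound quoted in the question; the names bound_fn, bound_values, and exact are hypothetical.

```python
import numpy as np
import sympy

a, b = sympy.symbols("a b")

# One of the symbolic bounds quoted in the question.
bound = -a * sympy.sqrt(1 / (a**6 * b**2 + 1))

# Turn the expression into a NumPy-aware function of (a, b).
bound_fn = sympy.lambdify((a, b), bound, modules="numpy")

a_values = np.linspace(-10, 10, 201)
bound_values = bound_fn(a_values, 1.0)   # b fixed to 1, all a-values at once

# Cross-check one entry against exact substitution with subs.
exact = float(bound.subs({a: float(a_values[3]), b: 1}))
```

The result is already a float array, so no dtype conversion is needed afterwards. Precision is limited to ordinary double precision, unlike subs.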
The values of the resulting array can for example be used for plotting:
import matplotlib.pyplot as plt
for i, (test_func, color) in enumerate(zip(test_functions, plt.cm.Set1.colors)):
    plt.plot(a_values, bounds_a[:, i, 0], color=color, label=test_func)
    plt.plot(a_values, bounds_a[:, i, 1], color=color, alpha=0.5)
plt.legend()
plt.margins(x=0)
plt.xlabel('a')
plt.ylabel('bounds')
plt.show()
Or filled:
for i, (test_func, color) in enumerate(zip(test_functions, plt.cm.Set1.colors)):
    plt.plot(a_values, bounds_a[:, i, :], color=color)
    plt.fill_between(a_values, bounds_a[:, i, 0], bounds_a[:, i, 1], color=color, alpha=0.1)

broadcasted lstsq (least squares)

I have a bunch of 3x2 matrices, let's say 777 of them, and just as many right-hand sides of size 3. For each of them, I would like to know the least-squares solution, so I'm doing
import numpy
A = numpy.random.rand(3, 2, 777)
b = numpy.random.rand(3, 777)
for k in range(777):
    numpy.linalg.lstsq(A[..., k], b[..., k])
That works, but is slow. I'd much rather compute all the solutions in one go, but upon
numpy.linalg.lstsq(A, b)
I'm getting
numpy.linalg.linalg.LinAlgError: 3-dimensional array given. Array must be two-dimensional
Any hints on how to broadcast numpy.linalg.lstsq?

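One can make use of the fact that if A = U \Sigma V^T is the singular value decomposition of A, then
x = V \Sigma^+ U^T b
is the least-squares solution to Ax = b. numpy.linalg.svd broadcasts over leading dimensions, so the batch index has to come first (shape (7, 3, 2) below, rather than (3, 2, 777) as in the question). It then only takes a bit of fiddling with einsum to get it all right: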
import numpy
A = numpy.random.rand(7, 3, 2)
b = numpy.random.rand(7, 3)
# Reference: solve each system separately
for k in range(7):
    x, res, rank, sigma = numpy.linalg.lstsq(A[k], b[k], rcond=None)
    print(x)
print()
# Batched SVD; the third return value is V^T, not V
u, s, v = numpy.linalg.svd(A, full_matrices=False)
uTb = numpy.einsum('ijk,ij->ik', u, b)
xx = numpy.einsum('ijk,ij->ik', v, uTb / s)
print(xx)
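A shorter variant of the same idea, offered as a sketch rather than as the answer's method, relies on numpy.linalg.pinv, which also operates on stacks of matrices. It starts from the (3, 2, 777) layout of the question and moves the batch axis to the front; the names A_batched, b_batched, and x_loop are illustrative, and full column rank is assumed for the random test data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 2, 777))   # layout from the question: batch axis last
b = rng.random((3, 777))

# Move the batch axis to the front: shapes (777, 3, 2) and (777, 3).
A_batched = np.moveaxis(A, -1, 0)
b_batched = np.moveaxis(b, -1, 0)

# pinv works on stacks of matrices; the result has shape (777, 2, 3).
A_pinv = np.linalg.pinv(A_batched)

# x[k] = pinv(A[k]) @ b[k] for every k, shape (777, 2).
x = np.einsum('kij,kj->ki', A_pinv, b_batched)

# Reference: one lstsq call per system.
x_loop = np.array([np.linalg.lstsq(A[..., k], b[..., k], rcond=None)[0]
                   for k in range(A.shape[-1])])
```

The pseudo-inverse is itself computed from an SVD, so this is the same mathematics with less index bookkeeping.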
