How to make this Python code more efficient?

I have the following for loop:
for x in range(int(r.shape[3] / 2)):
    for d in range(int(r.shape[1] / 2)):
        r[:, d, :, x, :] = r[:, 0, :, x + d, :]
and I want to get rid of the nested for loops and solely use the numpy library functions to make this code more efficient. How can I do that?
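One possibility (not from the original thread, just a sketch on a made-up array) is to build all the x + d offsets at once with fancy indexing; since advanced indexing returns a copy and the d = 0 slice is never actually changed by the loop, there is no aliasing to worry about:

import numpy as np

rng = np.random.default_rng(0)
r = rng.random((2, 6, 3, 8, 4))    # hypothetical shape; axis 3 is long enough for every x + d
r_loop = r.copy()

X = r.shape[3] // 2
D = r.shape[1] // 2

# original nested loop, for reference
for x in range(X):
    for d in range(D):
        r_loop[:, d, :, x, :] = r_loop[:, 0, :, x + d, :]

# vectorized: gather all x + d offsets in one advanced-indexing step
idx = np.arange(D)[:, None] + np.arange(X)[None, :]   # shape (D, X)
block = r[:, 0][:, :, idx, :]                         # shape (A, C, D, X, F), a copy
r[:, :D, :, :X, :] = np.moveaxis(block, 2, 1)         # reorder to (A, D, C, X, F)

print(np.allclose(r, r_loop))  # True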

Related

Why does 2D assignment not work as expected in Numpy?

I have the following code:
for (old_point, new_point) in zip(masked_indices, masked_new_indices):
    row, col = old_point
    new_row, new_col = new_point
    new_img[row, col] = img[new_row, new_col]
Where new_img and img are both 1024x1024x3 ndarrays, and masked_indices and masked_new_indices are both 80000x2 ndarrays.
Why does this statement not have the same behaviour?
new_img[masked_indices] = img[masked_new_indices]
And is there a way to optimize this for loop into a more NumPy-ish style?
I figured it out. Since new_img is three-dimensional, I have to give indices for the first two axes separately; passing a 2D index array as a single index was what was causing the unexpected behaviour.
This had the effect that I was going for, although it's not very pretty:
new_img[masked_indices[:, 0], masked_indices[:, 1], :] = img[masked_new_indices[:, 0], masked_new_indices[:, 1], :]
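For context (a minimal sketch of my own, not part of the original post): indexing a 3D array with a single (N, 2) integer array only indexes axis 0, selecting whole rows, whereas splitting the columns into two per-axis index arrays addresses individual (row, col) pixels. The assignment can also be written a little more compactly with a tuple:

import numpy as np

img = np.arange(4 * 4 * 3).reshape(4, 4, 3)
new_img = np.zeros_like(img)
idx = np.array([[0, 1], [2, 3]])           # two (row, col) pairs

print(img[idx].shape)                      # (2, 2, 4, 3) -- whole rows, not pixels
print(img[idx[:, 0], idx[:, 1]].shape)     # (2, 3)       -- individual pixels

# equivalent, slightly tidier form of the assignment above
new_img[tuple(idx.T)] = img[tuple(idx.T)]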

Faster definition of "matrix multiplication" in Python

I need to define matrix multiplication from scratch: instead of multiplying each pair of constants together, each constant is actually another array, and any two of these arrays need to be "convolved" together (I don't think it's necessary to define what a convolution is here).
I have made a picture that hopefully explains what I'm trying to say better:
The code I have to do this with is this:
for row in range(arr1.shape[2]):
    for column in range(arr2.shape[3]):
        for index in range(arr2.shape[2]):  # Could also be "arr1.shape[3]"
            out[:, :, row, column] += convolve(
                arr2[:, :, :, column][:, :, index],
                arr1[:, :, row, :][:, :, index]
            )
However, this method had proved to be very slow for me, so I was wondering if there was a faster way to do this.
If the intermediate fits in memory, the following should be reasonably efficient:
import numpy as np
from scipy.signal import fftconvolve, convolve

# example
rng = np.random.default_rng()
A = rng.random((5, 6, 2, 3))
B = rng.random((4, 3, 3, 4))

# custom matmul
Ae, Be = A[..., None], B[:, :, None]
shsh = np.maximum(Ae.shape[2:], Be.shape[2:])
Ae = np.broadcast_to(Ae, (*Ae.shape[:2], *shsh))
Be = np.broadcast_to(Be, (*Be.shape[:2], *shsh))
C = fftconvolve(Ae, Be, axes=(0, 1), mode='valid').sum(3)

# original loop for reference
out = np.zeros_like(C)
for row in range(A.shape[2]):
    for column in range(B.shape[3]):
        for index in range(B.shape[2]):  # Could also be "A.shape[3]"
            out[:, :, row, column] += convolve(
                B[:, :, :, column][:, :, index],
                A[:, :, row, :][:, :, index],
                mode='valid'
            )

print(np.allclose(C, out))
# True
By doing the convolution in bulk, we reduce the total number of FFTs we have to do.
If need be, this could be further optimized for both speed and memory by doing the sum reduction in Fourier space using einsum. That would require doing the FFT convolution by hand, though.
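As a rough sketch of that last idea (my own continuation of the snippet above, reusing A, B, Ae, Be and C; it assumes A is at least as large as B along the two convolved axes, as in the example):

# do the forward FFTs by hand, padded to the full convolution size
full_shape = (A.shape[0] + B.shape[0] - 1,
              A.shape[1] + B.shape[1] - 1)
FA = np.fft.rfftn(Ae, s=full_shape, axes=(0, 1))
FB = np.fft.rfftn(Be, s=full_shape, axes=(0, 1))

# multiply and sum over the contracted axis (label j) while still in
# Fourier space, so only one smaller inverse FFT is needed
FC = np.einsum('xyijk,xyijk->xyik', FA, FB)
full = np.fft.irfftn(FC, s=full_shape, axes=(0, 1))

# slice out the 'valid' region, same convention as fftconvolve
C2 = full[B.shape[0] - 1:A.shape[0], B.shape[1] - 1:A.shape[1]]
print(np.allclose(C, C2))
# True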

Equivalence between python for-loop and 3D numpy matrix additions

I cannot figure out a bug in a very simple transition from a for-loop to a vectorized numpy operation. The code is the following
for null_pos in null_positions:
    np.add(singletree[null_pos, parent.x, :, :],
           posteriors[parent.u, null_pos, :, :],
           out=singletree[null_pos, parent.x, :, :])
Since it is a simple addition between 2D matrices, I generalised it into a 3D addition:
np.add(singletree[null_positions, parent.x, :, :],
       posteriors[parent.u, null_positions, :, :],
       out=singletree[null_positions, parent.x, :, :])
The thing is, it appears the result is different! Can you see why?
Thanks!
Update:
It seems that
singletree[null_positions, parent.x, :, :] = \
    posteriors[parent.u, null_positions, :, :] + \
    singletree[null_positions, parent.x, :, :]
solves the problem. How does this differ from the add operation? (Apart from allocating a new matrix; I'm interested in the semantic aspects.)
The problem is that passing out=singletree[null_positions, parent.x, :, :] is making a copy of the portion of singletree, since you are using advanced indexing (as opposed to basic indexing, which returns views). Hence, the result will be written to an entirely different array and the original one will remain unmodified.
However, you can use advanced indexing to assign values. In your case, the recommended syntax would be:
singletree[null_positions, parent.x, :, :] += \
    posteriors[parent.u, null_positions, :, :]
This minimizes the use of intermediate arrays.
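A tiny illustration of the difference (my own example, not from the thread): the expression passed to out= is itself produced by advanced indexing, so np.add writes into a throwaway copy, whereas the += form goes through __setitem__ on the original array:

import numpy as np

a = np.zeros((3, 4))
rows = np.array([0, 2])          # advanced (fancy) index

# advanced indexing returns a copy, so this modifies a temporary, not a
np.add(a[rows, :], 1.0, out=a[rows, :])
print(a.sum())                   # 0.0 -- a is unchanged

# augmented assignment triggers a.__setitem__, so the update sticks
a[rows, :] += 1.0
print(a.sum())                   # 8.0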

Numpy - Speed up iteration comparison?

The following use case:
I have a Numpy matrix/array with a few thousand 2d points. Call it A.
Eg:
[1 2]
[300 400]
..
[123 242]
I also have another Numpy matrix with a few 2d points as above. Call it B.
Basically, I want to iterate through A, then iterate through B and compute the distance between A[i] and B[j]. Then assign that back to another array. I could do it like this:
for i, (x0, x1) in enumerate(zip(A[:, 0], A[:, 1])):
    weight_distance = 0
    for j, (p0, p1) in enumerate(zip(B[:, 0], B[:, 1])):
        weight_distance = weight_distance + distance((p0, p1), (x0, x1))
    weight_array[i] = weight_distance
But this is too slow. What might be a Numpy way to approach this?
What you're probably looking for is the code in scipy.spatial.distance, particularly the cdist function. This can efficiently compute the pairwise distances between arrays of points for a wide variety of metrics.
import numpy as np
from scipy.spatial.distance import cdist
A = np.random.random((1000, 2))
B = np.random.random((100, 2))
D = cdist(A, B, metric='euclidean')
print(D.shape) # (1000, 100)
weights = D.sum(1)
print(weights.shape) # (1000,)
Here 'euclidean' is the standard root-sum-square distance you're probably used to. D[i, j] holds the distance between A[i] and B[j], so summing along axis 1 gives the desired weights.
There are ways to do this via broadcasting directly in numpy, but that approach would use several large temporary arrays, and will in general be slower than the scipy cdist approach.
Edit:
I thought I may as well add a note on the NumPy-only approach. It looks like this:
D2 = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
weights2 = D2.sum(1)
np.allclose(weights, weights2) # True
Let's break it down:
A[:, None, :] adds a new dimension to A, so its shape is now [1000, 1, 2]; similarly, B[None, :, :] becomes [1, 100, 2].
A[:, None, :] - B[None, :, :] is a broadcasting operation which results in an array of differences, with shape [1000, 100, 2].
We square every element of this result.
The sum(-1) call sums across the last dimension, resulting in an array of shape [1000, 100].
We take the square root of the result, which gives the distance matrix.
We sum along axis 1 to get the weights.
Notice that this broadcasting approach creates not one, but two temporary arrays of size 1000 * 100 * 2 along the way, which is why it is less efficient than a purpose-built compiled function like cdist.

Avoid for loop using numpy matrices

I'm wondering whether there is a way of doing the following without the for loop:
import numpy as np
from itertools import product as itprod

a = np.arange(120.).reshape(3, 2, 5, 2, 2)
fact = np.linspace(1, 1.4, 15).reshape((3, 5))
for i, j in itprod(range(3), range(5)):
    a[i, :, j] *= fact[i, j]
Any suggestions?
To take advantage of broadcasting, you have to insert new axes for fact at the right places:
a *= fact[:, np.newaxis, :, np.newaxis, np.newaxis]
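To see that this reproduces the loop (a small check of my own, combining the question's code with the answer): the reshaped fact has shape (3, 1, 5, 1, 1), so its axis 0 lines up with a's axis 0 and its axis 2 with a's axis 2, exactly the pairing the loop indexes by hand.

import numpy as np
from itertools import product as itprod

a = np.arange(120.).reshape(3, 2, 5, 2, 2)
fact = np.linspace(1, 1.4, 15).reshape((3, 5))

# loop version, for reference
a_loop = a.copy()
for i, j in itprod(range(3), range(5)):
    a_loop[i, :, j] *= fact[i, j]

# broadcast version from the answer
a *= fact[:, np.newaxis, :, np.newaxis, np.newaxis]

print(np.allclose(a, a_loop))  # True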
