I need to define matrix multiplication from scratch: instead of multiplying each pair of constants together, each "constant" is actually another array, and any two of those arrays need to be "convolved" together (I don't think it's necessary to define what a convolution is here).
I have made a picture that hopefully explains what I'm trying to say better:
The code I currently have to do this is:
for row in range(arr1.shape[2]):
    for column in range(arr2.shape[3]):
        for index in range(arr2.shape[2]):  # Could also be "arr1.shape[3]"
            out[:, :, row, column] += convolve(
                arr2[:, :, :, column][:, :, index],
                arr1[:, :, row, :][:, :, index]
            )
However, this method has proved very slow for me, so I was wondering if there is a faster way to do this.
If the intermediate array fits in memory, the following should be reasonably efficient:
import numpy as np
from scipy.signal import fftconvolve,convolve
# example
rng = np.random.default_rng()
A = rng.random((5,6,2,3))
B = rng.random((4,3,3,4))
# custom matmul
Ae,Be = A[...,None],B[:,:,None]
shsh = np.maximum(Ae.shape[2:],Be.shape[2:])
Ae = np.broadcast_to(Ae,(*Ae.shape[:2],*shsh))
Be = np.broadcast_to(Be,(*Be.shape[:2],*shsh))
C = fftconvolve(Ae,Be,axes=(0,1),mode='valid').sum(3)
# original loop for reference
out = np.zeros_like(C)
for row in range(A.shape[2]):
    for column in range(B.shape[3]):
        for index in range(B.shape[2]):  # Could also be "A.shape[3]"
            out[:, :, row, column] += convolve(
                B[:, :, :, column][:, :, index],
                A[:, :, row, :][:, :, index],
                mode='valid'
            )
print(np.allclose(C,out))
# True
By doing the convolution in bulk we reduce the total number of FFTs we have to do.
If need be, this could be further optimized for both speed and memory by doing the sum reduction in Fourier space using einsum. This would require doing the FFT convolution by hand, though.
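For illustration, here is a rough sketch of what that hand-rolled version might look like (my own sketch, not part of the answer above), reusing A, B and C from the snippet above. The padding size and the 'valid' cropping are assumptions meant to mirror fftconvolve's behaviour, minus its internal next_fast_len padding:
# Pad to the full linear-convolution size along the convolved axes.
fshape = (A.shape[0] + B.shape[0] - 1, A.shape[1] + B.shape[1] - 1)
FA = np.fft.rfftn(A, fshape, axes=(0, 1))    # here: shape (8, 5, 2, 3)
FB = np.fft.rfftn(B, fshape, axes=(0, 1))    # here: shape (8, 5, 3, 4)
# Matrix product per frequency bin: the sum reduction happens in Fourier space,
# so the broadcast copies of Ae and Be are never materialized.
FC = np.einsum('xyij,xyjk->xyik', FA, FB)
full = np.fft.irfftn(FC, fshape, axes=(0, 1))
# Crop to the 'valid' region, mirroring fftconvolve(mode='valid').
v0 = min(A.shape[0], B.shape[0]) - 1
v1 = min(A.shape[1], B.shape[1]) - 1
C2 = full[v0:fshape[0] - v0, v1:fshape[1] - v1]
print(np.allclose(C, C2))
# True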
Consider a square matrix containing positive numbers, given as a 2d numpy array A of shape (m, m). I would like to build a new array B that has the same shape with entries
B[i,j] = A[i,j] / (np.sqrt(A[i,i]) * np.sqrt(A[j,j]))
An obvious solution is to loop over all (i,j) but I'm wondering if there is a faster way.
Here are two approaches that leverage broadcasting.
Approach #1:
d = np.sqrt(np.diag(A))
B = A/d[:,None]
B /= d
Approach #2:
B = A/(d[:,None]*d) # d same as used in Approach #1
Approach #1 has less memory overhead, so I think it would be faster.
You can normalize each row of your array by the square root of the main diagonal, leveraging broadcasting, using
b = np.sqrt(np.diag(a))
a / b[:, None]
Also, you can normalize each column using
a / b[None, :]
To do both, as your question seems to ask, use
a / (b[:, None] * b[None, :])
If you want to prevent the creation of intermediate arrays and do the operation in place, you can use
a /= b[:, None]
a /= b[None, :]
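As a quick sanity check (a small self-contained example of my own, not from the answers above), the broadcast form matches the elementwise definition from the question:
import numpy as np

rng = np.random.default_rng(0)
m = 4
A = rng.random((m, m)) + 1.0                # positive entries, so the sqrt is safe

d = np.sqrt(np.diag(A))
B_fast = A / (d[:, None] * d[None, :])      # broadcast version

B_loop = np.empty_like(A)                   # the obvious (i, j) loop from the question
for i in range(m):
    for j in range(m):
        B_loop[i, j] = A[i, j] / (np.sqrt(A[i, i]) * np.sqrt(A[j, j]))

print(np.allclose(B_fast, B_loop))
# True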
I cannot figure out a bug in a very simple transition from a for-loop to a vectorized numpy operation. The code is the following:
for null_pos in null_positions:
    np.add(singletree[null_pos, parent.x, :, :],
           posteriors[parent.u, null_pos, :, :],
           out=singletree[null_pos, parent.x, :, :])
Since it is a simple addition between 2D matrices, I generalised it into a 3D addition:
np.add(singletree[null_positions, parent.x, :, :],
       posteriors[parent.u, null_positions, :, :],
       out=singletree[null_positions, parent.x, :, :])
The thing is, it appears the result is different! Can you see why?
Thanks!
Update:
It seems that
singletree[null_positions, parent.x, :, :] = \
    posteriors[parent.u, null_positions, :, :] + \
    singletree[null_positions, parent.x, :, :]
solves the problem. How does this differ from the add operation? (Apart from allocating a new matrix; I'm interested in the semantic aspects.)
The problem is that passing out=singletree[null_positions, parent.x, :, :] is making a copy of the portion of singletree, since you are using advanced indexing (as opposed to basic indexing, which returns views). Hence, the result will be written to an entirely different array and the original one will remain unmodified.
However, you can use advanced indexing to assign values. In your case, the recommended syntax would be:
singletree[null_positions, parent.x, :, :] += \
    posteriors[parent.u, null_positions, :, :]
This would minimize the use of intermediate arrays.
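To make the copy-vs-scatter behaviour concrete, here is a minimal illustration with made-up arrays (singletree and posteriors are not shown in the question, so this is only an analogy):
import numpy as np

x = np.zeros((4, 3))
idx = [0, 2]                       # advanced (fancy) index

# x[idx] is a copy, so writing into it via out= never touches x
np.add(x[idx], 1, out=x[idx])
print(x.sum())                     # 0.0 -- x is unchanged

# augmented assignment scatters the result back into x
x[idx] += 1
print(x.sum())                     # 6.0 -- rows 0 and 2 were updated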
I have a series of 2d arrays where the rows are points in some space. Many similar points occur across all arrays but in different row order. I want to sort the rows so they have the most similar order. Also the points are too different for clustering with K-means or DBSCAN. The problem can also be cast like this. If I stack the arrays into a 3d array, how do I permute the rows to minimize the average standard deviation (SD) along the 2nd axis? What's a good sorting algorithm for this problem?
I've tried the following approaches:
1. Create a reference 2d array and sort the rows of each array to minimize the mean Euclidean distances to it. I'm afraid this gives biased results.
2. Sort rows in the arrays pairwise, then in pairs of pair-medians, then pairs of those, and so on. This doesn't really work and I'm not sure why.
3. Brute-force optimization, but I try to avoid that since I have multiple sets of arrays to run the procedure on.
This is my code for the 2nd approach (Python):
import numpy as np

def reorder_to(A, B):
    """Reorder rows in A to best match rows in B.

    Input
    -----
    A : N x M numpy.array
    B : N x M numpy.array

    Output
    ------
    perm_order : permutation order
    """
    if A.shape != B.shape:
        print("A and B must have the same shape")
        return None
    N = A.shape[0]

    # Create a matrix of distances between rows in A and rows in B
    distance_matrix = np.ones((N, N)) * np.inf
    for i, a in enumerate(A):
        for ii, b in enumerate(B):
            ba = b - a
            distance_matrix[i, ii] = np.sqrt(np.dot(ba, ba))

    # Choose permutation order by smallest distances first
    perm_order = [[] for _ in range(N)]
    for _ in range(N):
        ind = np.argmin(distance_matrix)
        i, ii = ind // N, ind % N
        perm_order[ii] = i
        distance_matrix[i, :] = np.inf
        distance_matrix[:, ii] = np.inf
    return perm_order


def permute_tensor_rows(A):
    """Permute 1d rows in 3d array along the 0th axis to minimize average SD along 2nd axis.

    Input
    -----
    A : numpy.3darray
        Each "slice" in the 2nd direction is an independent array whose rows can be permuted
        to decrease the average SD in the 2nd direction.

    Output
    ------
    A : numpy.3darray
        A with sorted rows in each "slice".
    """
    step = 2
    while step <= A.shape[2]:
        for k in range(0, A.shape[2], step):
            # If this is the last (incomplete) chunk, reorder it to the previous one
            if k + step > A.shape[2]:
                A_kk = A[:, :, k:(k + step)]
                kk_order = reorder_to(np.median(A_kk, axis=2), np.median(A_k, axis=2))
                A[:, :, k:(k + step)] = A[kk_order, :, k:(k + step)]
                continue
            k_0, k_1 = k, k + step // 2
            kk_0, kk_1 = k + step // 2, k + step
            A_k = A[:, :, k_0:k_1]
            A_kk = A[:, :, kk_0:kk_1]
            order = reorder_to(np.median(A_k, axis=2), np.median(A_kk, axis=2))
            A[:, :, k_0:k_1] = A[order, :, k_0:k_1]
        print("Step:", step, "\t ... Average SD:", np.mean(np.std(A, axis=2)))
        step *= 2
    return A
Sorry, I should have looked at your code sample; that was very informative.
Seems like this here gives an out-of-the-box solution to your problem:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy.optimize.linear_sum_assignment
It is only really feasible for a few hundred points at most, though, in my experience.
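For instance, the reorder_to function above could be replaced by something along these lines (a sketch under my own assumptions, using scipy's cdist to build the cost matrix and returning the same kind of permutation order):
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def reorder_to_optimal(A, B):
    """Return perm_order such that A[perm_order] best matches B row-for-row,
    using an optimal assignment instead of a greedy smallest-distance-first pick."""
    cost = cdist(A, B)                           # pairwise Euclidean row distances
    row_ind, col_ind = linear_sum_assignment(cost)
    perm_order = np.empty(len(B), dtype=int)
    perm_order[col_ind] = row_ind                # A row perm_order[j] is matched to B row j
    return perm_order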
I have the following use case:
I have a Numpy matrix/array with a few thousand 2d points. Call it A.
E.g.:
[1 2]
[300 400]
..
[123 242]
I also have another Numpy matrix with a few 2d points as above. Call it B.
Basically, I want to iterate through A, then iterate through B and compute the distance between A[i] and B[j]. Then assign that back to another array. I could do it like this:
for i, (x0, x1) in enumerate(zip(A[:, 0], A[:, 1])):
    weight_distance = 0
    for j, (p0, p1) in enumerate(zip(B[:, 0], B[:, 1])):
        # distance() is some per-pair distance function defined elsewhere
        weight_distance = weight_distance + distance((p0, p1), (x0, x1))
    weight_array[i] = weight_distance
But this is too slow. What might be a Numpy way to approach this?
What you're probably looking for is the code in scipy.spatial.distance, particularly the cdist function. This can efficiently compute the pairwise distances between arrays of points for a wide variety of metrics.
import numpy as np
from scipy.spatial.distance import cdist
A = np.random.random((1000, 2))
B = np.random.random((100, 2))
D = cdist(A, B, metric='euclidean')
print(D.shape) # (1000, 100)
weights = D.sum(1)
print(weights.shape) # (1000,)
Here 'euclidean' is the standard root-sum-square distance that you're probably used to. D[i, j] holds the distance between A[i] and B[j], so summing along axis 1 gives the desired weights.
There are ways to do this via broadcasting directly in numpy, but that approach would use several large temporary arrays, and will in general be slower than the scipy cdist approach.
Edit:
I thought I may as well add a note on the NumPy-only approach. It looks like this:
D2 = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
weights2 = D2.sum(1)
np.allclose(weights, weights2) # True
Let's break it down:
- A[:, None, :] adds a new dimension to A, so its shape is now [1000, 1, 2]. Similarly, B[None, :, :] becomes [1, 100, 2].
- A[:, None, :] - B[None, :, :] is a broadcasting operation which results in an array of differences, with shape [1000, 100, 2].
- We square every element of this result.
- The sum(-1) call sums across the last dimension, resulting in an array of shape [1000, 100].
- We take the square root of the result, which gives the distance matrix.
- We sum along axis 1 to get the weights.
Notice that this broadcasting approach creates not one, but two temporary arrays of size 1000 * 100 * 2 along the way, which is why it is less efficient than a purpose-built compiled function like cdist.
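If you do want to stay in pure NumPy but avoid those [1000, 100, 2] temporaries, one option (my own suggestion, equivalent up to floating-point round-off) is to expand the squared distance as ||a||^2 + ||b||^2 - 2 a.b:
# No [1000, 100, 2] temporaries; the largest intermediates are (1000, 100).
sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
D3 = np.sqrt(np.maximum(sq, 0))    # clip tiny negative values caused by round-off
weights3 = D3.sum(1)
np.allclose(weights, weights3)     # True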
I have Python code as follows:
import numpy as np
sizes = 2000
array1 = np.empty((sizes, sizes, sizes, 3), dtype=np.float32)
for i in range(sizes):
    array1[i, :, :, 0] = 1.5*i
    array1[:, i, :, 1] = 2.5*i
    array1[:, :, i, 2] = 3.5*i
array2 = array1.reshape(sizes*sizes*sizes, 3)
#do something with array2
array3 = array2.reshape(sizes*sizes*sizes, 3)
I would like to optimize this code for memory efficiency, but I have no idea how. Could I use numpy.reshape in a more memory-efficient way?
I think your code is already memory efficient.
When possible, np.reshape returns a view of the original array. That is so in this case and therefore np.reshape is already as memory efficient as can be.
Here is how you can tell np.reshape is returning a view:
import numpy as np
# Let's make array1 smaller; it won't change our conclusions
sizes = 5
array1 = np.arange(sizes*sizes*sizes*3).reshape((sizes, sizes, sizes, 3))
for i in range(sizes):
    array1[i, :, :, 0] = 1.5*i
    array1[:, i, :, 1] = 2.5*i
    array1[:, :, i, 2] = 3.5*i
array2 = array1.reshape(sizes*sizes*sizes, 3)
Note the value of array2 at a certain location:
assert array2[0,0] == 0
Change the corresponding value in array1:
array1[0,0,0,0] = 100
Note that the value of array2 changes.
assert array2[0,0] == 100
Since array2 changes due to a modification of array1, you can conclude that array2 is a view of array1. Views share the underlying data. Since there is no copy being made, the reshape is memory efficient.
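As a shortcut (not used in the snippet above), NumPy can also tell you directly whether two arrays share their underlying buffer:
assert np.shares_memory(array1, array2)    # True for a view; this would fail for a copy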
array2 is already of shape (sizes*sizes*sizes, 3), so this reshape does nothing.
array3 = array2.reshape(sizes*sizes*sizes, 3)
Finally, the assert below shows array3 was also affected by the modification made to array1. So that proves conclusively that array3 is also a view of array1.
assert array3[0,0] == 100
So really your problem depends on what you are doing with the array. You are currently storing a large amount of redundant information. You could keep just a tiny fraction of it (three 1d arrays of length sizes) and not lose anything.
For instance, if we define the following three one-dimensional arrays
a = np.linspace(0, (sizes-1)*1.5, sizes).astype(np.float32)
b = np.linspace(0, (sizes-1)*2.5, sizes).astype(np.float32)
c = np.linspace(0, (sizes-1)*3.5, sizes).astype(np.float32)
we can recreate any entry along the fastest-varying (last) axis of your array1:
In [235]: array1[4][3][19] == np.array([a[4],b[3],c[19]])
Out[235]: array([ True, True, True], dtype=bool)
The usefulness of this all depends on what you are doing with the array, as it will be less performant to remake array1 from a, b and c. However, if you are nearing the limits of what your machine can handle, sacrificing performance for memory efficiency may be a necessary step. Also, moving a, b and c around will have a much lower overhead than moving array1 around.
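If you occasionally need actual rows of array2, they can be rebuilt on demand from a, b and c without ever materializing array1. A sketch under the same assumptions (the helper name and access pattern are mine):
def rows_of_array2(flat_indices):
    """Recover array2[flat_indices] from a, b and c.
    flat_indices is an array of indices into the (sizes*sizes*sizes, 3) layout."""
    i, j, k = np.unravel_index(flat_indices, (sizes, sizes, sizes))
    return np.column_stack((a[i], b[j], c[k]))

# e.g. the first three rows, computed from just the three small 1d arrays
print(rows_of_array2(np.arange(3)))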