I have two two-dimensional arrays a and b (the number of columns of a is <= the number of columns of b). I would like to find an efficient way of matching each row of a to a contiguous part of a row in b.
a = np.array([[ 25, 28],
[ 84, 97],
[105, 24],
[ 28, 900]])
b = np.array([[ 25, 28, 84, 97],
[ 22, 25, 28, 900],
[ 11, 12, 105, 24]])
The output should be np.array([[0,0], [0,1], [1,0], [2,2], [3,1]]), where each pair is (row in a, row in b). Row 0 of a matches row 0 of b (first two positions) and row 1 of b (second and third positions). Row 1 of a matches row 0 of b (third and fourth positions).
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, for efficient patch extraction, and then compare those patches against each row of a, all in a vectorized manner. Then we get the matching indices with np.argwhere -
# a and b from posted question
In [325]: from skimage.util.shape import view_as_windows
In [428]: w = view_as_windows(b,(1,a.shape[1]))
In [429]: np.argwhere((w == a).all(-1).any(-2))[:,::-1]
Out[429]:
array([[0, 0],
[1, 0],
[0, 1],
[3, 1],
[2, 2]])
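If scikit-image is not available, the same idea can be sketched with NumPy alone, assuming NumPy 1.20 or later for np.lib.stride_tricks.sliding_window_view. The function name match_rows is hypothetical; it returns (row in a, row in b) pairs ordered by the rows of a, which is the order the question asks for.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_rows(a, b):
    """Return (row_in_a, row_in_b) pairs where a row of a occurs contiguously in a row of b."""
    k = a.shape[1]
    # All length-k windows along each row of b: shape (rows_b, cols_b - k + 1, k)
    w = sliding_window_view(b, k, axis=1)
    # Compare every row of a with every window: shape (rows_a, rows_b, n_windows)
    eq = (w[None, :, :, :] == a[:, None, None, :]).all(-1)
    # A pair matches if any window in that row of b equals the row of a
    return np.argwhere(eq.any(-1))

a = np.array([[25, 28], [84, 97], [105, 24], [28, 900]])
b = np.array([[25, 28, 84, 97],
              [22, 25, 28, 900],
              [11, 12, 105, 24]])
print(match_rows(a, b))
```

As with the broadcasted comparison in the answer, the intermediate boolean array has rows_a × rows_b × n_windows × k elements, so memory grows quickly for large inputs.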
Alternatively, we could get the indices in the order of the rows of a by pushing the first axis of a to the front while performing broadcasted comparisons -
In [444]: np.argwhere((w[:,:,0] == a[:,None,None,:]).all(-1).any(-1))
Out[444]:
array([[0, 0],
[0, 1],
[1, 0],
[2, 2],
[3, 1]])
Another way I can think of is to loop over each row of a and perform a 2D correlation between b, which you can consider a 2D signal, and that row of a.
For a window that matches the row, the correlation equals the sum of squares of the row's values, so if we subtract this sum of squares from the correlation result, matches show up as zeros. Any row of b with a 0 in the result is a candidate for containing the subarray. A 0 is necessary but not sufficient, though: a non-matching window can produce the same dot product (for example, window [2, 0] against row [1, 1] gives 2 = 1² + 1²), so candidates may need to be verified. If you are using floating-point numbers, you may want to compare against a small threshold just above 0.
If you can use SciPy, the scipy.signal.correlate2d method is what I had in mind.
import numpy as np
from scipy.signal import correlate2d
a = np.array([[ 25, 28],
[ 84, 97],
[105, 24]])
b = np.array([[ 25, 28, 84, 97],
[ 22, 25, 28, 900],
[ 11, 12, 105, 24]])
EPS = 1e-8
result = []
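for (i, row) in enumerate(a):
    # Valid correlation of each row of b with this row, minus the row's sum of squares
    out = correlate2d(b, row[None,:], mode='valid') - np.square(row).sum()
    # Rows of b that contain a (near-)zero entry
    locs = np.where(np.abs(out) <= EPS)[0]
    unique_rows = np.unique(locs)
    for res in unique_rows:
        result.append((i, int(res)))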
We get:
In [32]: result
Out[32]: [(0, 0), (0, 1), (1, 0), (2, 2)]
The time complexity of this could be better, especially since we're looping over each row of a to find any subarrays in b.
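Because a correlation equal to the row's sum of squares can also occur for a non-matching window (e.g. window [2, 0] against row [1, 1]), a stricter test is the sum of squared differences, which expands to sum(window²) - 2·correlation + sum(row²) and is zero only for an exact match. The sketch below is such a variant, not the answer's own code. To avoid a SciPy dependency it computes the valid correlation as a dot product over sliding_window_view windows (assuming NumPy 1.20 or later), which gives the same numbers as correlate2d(b, row[None, :], mode='valid') for a one-row kernel. The name match_rows_ssd is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_rows_ssd(a, b, eps=1e-8):
    k = a.shape[1]
    # Length-k windows along each row of b: shape (rows_b, n_windows, k)
    w = sliding_window_view(b.astype(float), k, axis=1)
    win_sq = (w ** 2).sum(-1)  # sum of squares of every window
    result = []
    for i, row in enumerate(a.astype(float)):
        corr = w @ row  # valid one-row correlation of b with this row
        ssd = win_sq - 2 * corr + (row ** 2).sum()  # equals sum((window - row) ** 2)
        # Rows of b that contain at least one (near-)zero SSD
        for r in np.unique(np.nonzero(np.abs(ssd) <= eps)[0]):
            result.append((i, int(r)))
    return result

a = np.array([[25, 28], [84, 97], [105, 24]])
b = np.array([[25, 28, 84, 97],
              [22, 25, 28, 900],
              [11, 12, 105, 24]])
print(match_rows_ssd(a, b))
```

The three terms can be large for big values, so floating-point cancellation may call for a larger eps; computing (w - row) directly avoids that at the cost of a temporary array per row.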
So I have an image I of size (H x W x C), where C is some number of channels. The challenge is to obtain a new image J, again of size (H x W x C), in which J[i, j] contains only the maximum n entries in I[i, j].
Equivalently, think about iterating through each pixel in I and zeroing out all but the highest n entries.
What I've tried:
# NOTE: bone_weight_matrix is a matrix of size (256 x 256 x 43)
argsort_four = np.argsort(bone_weight_matrix, axis=2)[:, :, -4:]
# For each pixel, retain only the top four influencing bone weights
proc_matrix = np.zeros(bone_weight_matrix.shape)
for i in range(bone_weight_matrix.shape[0]):
    for j in range(bone_weight_matrix.shape[1]):
        proc_matrix[i, j, argsort_four[i, j]] = bone_weight_matrix[i, j, argsort_four[i, j]]
return proc_matrix
The problem is that this method seems to be super slow and doesn't feel very Pythonic. Any advice would be great.
Cheers.
Generic case: keeping the largest or smallest n elements along an axis
Basically, two steps are involved:
Get the n indices to be kept along the specified axis with np.argpartition.
Initialize a zeros array and use those indices with advanced indexing to select from the input array as well as to assign into the zeros array.
Let's solve the generic problem: select n elements along a specified axis, keeping either the largest n or the smallest n elements.
The implementation would look like this -
def keep(ar, n, axis=-1, order='largest'):
    # Normalize a negative axis (assumes -ar.ndim <= axis < ar.ndim)
    axis = axis % ar.ndim
    slice_l = [slice(None, None, None)]*ar.ndim
    if order=='largest':
        slice_l[axis] = slice(-n,None,None)
        idx = np.argpartition(ar, kth=-n, axis=axis)[tuple(slice_l)]
    elif order=='smallest':
        slice_l[axis] = slice(None,n,None)
        # kth=n-1 also works when n equals the length of the axis
        idx = np.argpartition(ar, kth=n-1, axis=axis)[tuple(slice_l)]
    else:
        raise ValueError('Invalid order value')
    # Open index grid for every axis; the target axis is replaced by idx
    grid = list(np.ogrid[tuple(map(slice, ar.shape))])
    grid[axis] = idx
    out = np.zeros_like(ar)
    out[tuple(grid)] = ar[tuple(grid)]
    return out
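On NumPy 1.15 or later, np.take_along_axis and np.put_along_axis can replace the manual ogrid construction. The sketch below covers only the 'largest' case, and keep_largest is a hypothetical name, not part of the answer above.

```python
import numpy as np

def keep_largest(ar, n, axis=-1):
    size = ar.shape[axis]
    # After partitioning around kth = size - n, the last n positions along `axis`
    # hold the indices of the n largest entries (in no particular order)
    part = np.argpartition(ar, size - n, axis=axis)
    idx = np.take(part, np.arange(size - n, size), axis=axis)
    out = np.zeros_like(ar)
    # Copy those entries from ar into the same positions of the zero array
    np.put_along_axis(out, idx, np.take_along_axis(ar, idx, axis=axis), axis=axis)
    return out
```

For the 'smallest' case, the first n positions of np.argpartition(ar, n - 1, axis=axis) can be taken instead.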
Sample runs
Input array :
In [208]: np.random.seed(0)
...: I = np.random.randint(11,99,(2,2,6))
In [209]: I
Out[209]:
array([[[55, 58, 75, 78, 78, 20],
[94, 32, 47, 98, 81, 23]],
[[69, 76, 50, 98, 57, 92],
[48, 36, 88, 83, 20, 31]]])
Keep largest 2 elements along last axis :
In [210]: keep(I, n=2, axis=-1, order='largest')
Out[210]:
array([[[ 0, 0, 0, 78, 78, 0],
[94, 0, 0, 98, 0, 0]],
[[ 0, 0, 0, 98, 0, 92],
[ 0, 0, 88, 83, 0, 0]]])
Keep largest 1 element along axis 1 (the second axis):
In [211]: keep(I, n=1, axis=1, order='largest')
Out[211]:
array([[[ 0, 58, 75, 0, 0, 0],
[94, 0, 0, 98, 81, 23]],
[[69, 76, 0, 98, 57, 92],
[ 0, 0, 88, 0, 0, 0]]])
Keep smallest 2 elements along last axis :
In [212]: keep(I, n=2, axis=-1, order='smallest')
Out[212]:
array([[[55, 0, 0, 0, 0, 20],
[ 0, 32, 0, 0, 0, 23]],
[[ 0, 0, 50, 0, 57, 0],
[ 0, 0, 0, 0, 20, 31]]])
I have a 3D NumPy array that looks like this:
shape(3,1000,100)
[[[2,3,0,2,6,...,0,-1,-1,-1,-1,-1],
[1,4,6,1,4,5,3,...,1,2,6,-1,-1],
[7,4,6,3,1,0,1,...,2,0,8,-1,-1],
...
[8,7,6,4,...,2,4,5,2,1,-1]],
...,
[1,5,6,7,...,0,0,0,0,1]]]
Each lane of the array ends with zero or more -1 values (fewer than 70, I'm sure).
For now, I want to select only the 30 values before the -1s in each lane, to make a subset of the original array with shape (3,1000,30).
It should look something like this:
[[[...,0],
[...,1,2,6],
[...,2,0,8],
...
[...,2,4,5,2,1]],
...,
[...,0,0,0,0,1]]]
Is it possible to do this with NumPy functions, ideally without a for loop? :)
Here's one making use of broadcasting and advanced-indexing -
def preceedingN(a, N):
    # mask of value (minus 1 here) to be found
    mask = a==-1
    # Get the first index with the value along the last axis.
    # If it's not found, use the lane length (one past the last index)
    idx = np.where(mask.any(-1), mask.argmax(-1), mask.shape[-1])
    # Get N ranged indices along the last axis
    ind = idx[...,None] + np.arange(-N,0)
    # Finally advanced-index and get the ranged indexed elements as the o/p
    m,n,r = a.shape
    return a[np.arange(m)[:,None,None], np.arange(n)[:,None], ind]
Sample run -
Setup for reproducible input :
import numpy as np
# Setup sample input array
np.random.seed(0)
m,n,r = 2,4,10
a = np.random.randint(11,99,(m,n,r))
# Select N elements off each row
N = 3
idx = np.random.randint(N,a.shape[-1]-1,(m,n))
a[idx[...,None] < np.arange(a.shape[-1])] = -1
a[0,0] = range(r) # set first row of first 2D slice to range (no -1s there)
Input, output :
>>> a
array([[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[81, 23, 69, 76, 50, 98, 57, -1, -1, -1],
[88, 83, 20, 31, 91, 80, 90, 58, -1, -1],
[60, 40, 30, 30, 25, 50, 43, 76, -1, -1]],
[[43, 42, 85, 34, 46, 86, 66, 39, -1, -1],
[11, 47, 64, 16, -1, -1, -1, -1, -1, -1],
[42, 12, 76, 52, 68, 46, 22, 57, -1, -1],
[25, 64, 23, 53, 95, 86, 79, -1, -1, -1]]])
>>> preceedingN(a, N=3)
array([[[ 7, 8, 9],
[50, 98, 57],
[80, 90, 58],
[50, 43, 76]],
[[86, 66, 39],
[47, 64, 16],
[46, 22, 57],
[95, 86, 79]]])
This solution is based on calculating the indices that should be kept. We use numpy.argmin and numpy.nonzero to find the start of the -1 run or the end of the row (argmin finds the first -1 only because every other value is non-negative, as in the data here), then use broadcasted addition/subtraction to build the indices of the 30 elements that need to be kept.
First off, we create reproducible example data
import numpy as np
np.random.seed(0)
a = np.random.randint(low=0, high=10, size=(3, 1000, 100))
for i in range(3):
    for j in range(1000):
        a[i, j, np.random.randint(low=30, high=a.shape[2]+1):] = -1
You can of course skip this step, I just added it to allow others to reproduce this solution. :)
Now let's go through the code step-by-step:
Change shape of a to simplify code
a.shape = 3 * 1000, 100
Find index of -1 in each row of a
i = np.argmin(a, axis=1)
Replace the index with 100 (the row length) for any row of a that contains no -1
i[np.nonzero(a[np.arange(a.shape[0]), i] != -1)] = a.shape[1]
Translate indices to 1D
i = i + a.shape[1] * np.arange(a.shape[0])
Calculate indices for all elements that should be kept. We make the indices two-dimensional so that we get the 30 indices before each -1 index.
i = i.reshape(a.shape[0], 1) - 30 + np.arange(30).reshape(1, 30)
a.shape = 3 * 1000 * 100
Perform filtering
a = a[i]
Return a to desired shape
a.shape = 3, 1000, 30
Here's one using stride_tricks for efficient retrieval of slices:
import numpy as np
from numpy.lib.stride_tricks import as_strided
# create mock data
data = np.random.randint(0, 9, (3, 1000, 100))
fstmone = np.random.randint(30, 101, (3, 1000))
data[np.arange(100) >= fstmone[..., None]] = -1
# count -1s (ok, this is a bit wasteful compared to Divakar's)
aux = np.where(data[..., 30:] == -1, 1, -100)
nmones = np.maximum(np.max(np.cumsum(aux[..., ::-1], axis=-1), axis=-1), 0)
# slice (but this I'd expect to be faster)
sliceable = as_strided(data, data.shape[:2] + (71, 30),
data.strides + data.strides[2:])
result = sliceable[np.arange(3)[:, None], np.arange(1000)[None, :], 70-nmones, :]
Benchmarks: the best solution is a hybrid of Divakar's and mine, and Florian Rhiem's is also quite fast:
import numpy as np
from numpy.lib.stride_tricks import as_strided
# create mock data
data = np.random.randint(0, 9, (3, 1000, 100))
fstmone = np.random.randint(30, 101, (3, 1000))
data[np.arange(100) >= fstmone[..., None]] = -1
def pp(data, N):
    # count -1s
    aux = np.where(data[..., N:] == -1, 1, -data.shape[-1])
    nmones = np.maximum(np.max(np.cumsum(aux[..., ::-1], axis=-1), axis=-1), 0)
    # slice
    sliceable = as_strided(data, data.shape[:2] + (data.shape[-1]-N+1, N),
                           data.strides + data.strides[2:])
    return sliceable[np.arange(data.shape[0])[:, None],
                     np.arange(data.shape[1])[None, :],
                     data.shape[-1]-N-nmones, :]
def Divakar(data, N):
    # mask of value (minus 1 here) to be found
    mask = data==-1
    # Get the first index with the value along the last axis.
    # If it's not found, use the lane length (one past the last index)
    idx = np.where(mask.any(-1), mask.argmax(-1), mask.shape[-1])
    # Get N ranged indices along the last axis
    ind = idx[...,None] + np.arange(-N,0)
    # Finally advanced-index and get the ranged indexed elements as the o/p
    m,n,r = data.shape
    return data[np.arange(m)[:,None,None], np.arange(n)[:,None], ind]
def combined(data, N):
    # mix best of Divakar's and mine
    mask = data==-1
    idx = np.where(mask.any(-1), mask.argmax(-1), mask.shape[-1])
    sliceable = as_strided(data, data.shape[:2] + (data.shape[-1]-N+1, N),
                           data.strides + data.strides[2:])
    return sliceable[np.arange(data.shape[0])[:, None],
                     np.arange(data.shape[1])[None, :],
                     idx-N, :]
def fr(data, N):
    out_shape = data.shape[:-1] + (N,)
    data = data.reshape(-1, data.shape[-1])
    i = np.argmin(data, axis=1)
    i[np.nonzero(data[np.arange(data.shape[0]), i] != -1)] = data.shape[1]
    i = i + data.shape[1] * np.arange(data.shape[0])
    i = i.reshape(data.shape[0], 1) - N + np.arange(N).reshape(1, N)
    data.shape = -1,
    res = data[i]
    res.shape = out_shape
    return res
print(np.all(combined(data, 30) == Divakar(data, 30)))
print(np.all(combined(data, 30) == pp(data, 30)))
print(np.all(combined(data, 30) == fr(data, 30)))
from timeit import timeit
for func in pp, Divakar, combined, fr:
    print(timeit('f(data, 30)', number=100, globals={'f':func, 'data':data}))
Sample output:
True
True
True
0.2767702739802189
0.13680238201050088
0.060565065999981016
0.0795100320247002
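On NumPy 1.20 or later, np.lib.stride_tricks.sliding_window_view can stand in for the hand-computed as_strided call in combined: it derives the strides itself and returns a read-only view, which avoids accidental writes through overlapping memory. This is a sketch (combined_swv is a hypothetical name) that, like the original, assumes every lane has at least N values before its first -1; no timings are claimed for it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def combined_swv(data, N):
    mask = data == -1
    # Index of the first -1 in each lane, or the lane length if there is none
    idx = np.where(mask.any(-1), mask.argmax(-1), mask.shape[-1])
    # Read-only view of every length-N window along the last axis: (m, n, L-N+1, N)
    win = sliding_window_view(data, N, axis=-1)
    m, n = data.shape[:2]
    # Pick, per lane, the window that ends just before the first -1
    return win[np.arange(m)[:, None], np.arange(n)[None, :], idx - N]

# Small example shaped like the question's data
rng = np.random.default_rng(0)
data = rng.integers(0, 9, (3, 4, 12))
first = rng.integers(5, 13, (3, 4))
data[np.arange(12) >= first[..., None]] = -1
print(combined_swv(data, 5).shape)
```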
I have two ndarrays :
a = [[30,40],
[60,90]]
b = [[0,0,1],
[1,0,1],
[1,1,1]]
Please note that a might be larger, but it is always a square array, e.g. (50,50) or (100,100).
The wanted result is :
Result = [[a*0, a*0, a*1],
          [a*1, a*0, a*1],
          [a*1, a*1, a*1]]
I managed to get the right answer with this code, but I think there might be a built-in NumPy function that does this task faster:
totalrows=[]
for row in range(b.shape[0]):
    cells=[]
    for column in range(b.shape[1]):
        print(row, column)
        cells.append(b[row,column]*a)
    totalrows.append(np.concatenate(cells,axis=1))
return np.concatenate(totalrows,axis=0)
Indeed there's a NumPy built-in np.kron for such block-based elementwise multiplication problems. To solve your case, it could be used like so -
np.kron(b,a)
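To see what np.kron computes here: block (i, j) of the result is b[i, j] * a, i.e. out[i*m + k, j*n + l] == b[i, j] * a[k, l] for an (m, n) array a. A broadcasting sketch of the same 2D computation makes that indexing explicit; block_scale is a hypothetical name, and np.kron remains the simpler call.

```python
import numpy as np

def block_scale(b, a):
    q, r = b.shape
    m, n = a.shape
    # t[i, k, j, l] = b[i, j] * a[k, l]; this axis order makes the reshape lay out the blocks
    t = b[:, None, :, None] * a[None, :, None, :]
    return t.reshape(q * m, r * n)

a = np.array([[30, 40], [60, 90]])
b = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1]])
print(block_scale(b, a))
```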
Sample run -
In [50]: a
Out[50]:
array([[30, 40],
[60, 90]])
In [51]: b
Out[51]:
array([[0, 0, 1],
[1, 0, 1],
[1, 1, 1]])
In [52]: np.kron(b,a)
Out[52]:
array([[ 0, 0, 0, 0, 30, 40],
[ 0, 0, 0, 0, 60, 90],
[30, 40, 0, 0, 30, 40],
[60, 90, 0, 0, 60, 90],
[30, 40, 30, 40, 30, 40],
[60, 90, 60, 90, 60, 90]])
3D array case
Now, let's say a is a 3D array of shape (m,n,p) and b is (q,r), and that you want to perform this block-wise multiplication iteratively along the last axis of a. The shapes of the two inputs then multiply along their first two axes to give the output shape. To achieve this, we extend b with a singleton dimension as its last axis, so the final output has shape (m*q,n*r,p*1). The implementation is simply -
np.kron(b[...,None],a)
Shape check -
In [161]: a = np.random.randint(0,99,(4,5,2))
...: b = np.random.randint(0,99,(6,7))
...:
In [162]: np.kron(b[...,None],a).shape
Out[162]: (24, 35, 2)
I have four 2D NumPy arrays, called a, b, c and d, each made of n rows and m columns. I need to give each element of b and d a value calculated as follows (pseudo-code):
min_coords = min_of_neighbors_coords(x, y)
b[x,y] = a[x,y] * a[min_coords];
d[x,y] = c[min_coords];
Where min_of_neighbors_coords is a function that, given the coordinates of an element of the array, returns the coordinates of the neighboring element with the lowest value. E.g., considering the array:
1, 2, 5
3, 7, 2
2, 3, 6
min_of_neighbors_coords(1, 1) will refer to the central element with the value of 7, and will return the tuple (0, 0): the coordinates of the number 1.
I managed to do this using for loops (element by element), but the algorithm is VERY slow, and I'm looking for a way to improve it by avoiding loops and delegating the calculations to NumPy.
Is it possible?
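For reference, here is a runnable loop version of the pseudo-code, mainly useful as a correctness check for vectorized answers. It assumes what the question implies but does not state: the neighborhood is the 3x3 window clipped at the array borders, the element itself is included, and ties go to the first minimum in row-major order. The extra arr parameter and the name bd_loop are hypothetical additions.

```python
import numpy as np

def min_of_neighbors_coords(arr, x, y):
    # 3x3 window around (x, y), clipped at the borders and including (x, y) itself
    r0, r1 = max(x - 1, 0), min(x + 2, arr.shape[0])
    c0, c1 = max(y - 1, 0), min(y + 2, arr.shape[1])
    window = arr[r0:r1, c0:c1]
    # First minimum in row-major order, converted back to coordinates in arr
    dr, dc = np.unravel_index(np.argmin(window), window.shape)
    return (r0 + int(dr), c0 + int(dc))

def bd_loop(a, c):
    b = np.empty_like(a)
    d = np.empty_like(c)
    for x in range(a.shape[0]):
        for y in range(a.shape[1]):
            mc = min_of_neighbors_coords(a, x, y)
            b[x, y] = a[x, y] * a[mc]
            d[x, y] = c[mc]
    return b, d

arr = np.array([[1, 2, 5],
                [3, 7, 2],
                [2, 3, 6]])
print(min_of_neighbors_coords(arr, 1, 1))  # (0, 0)
```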
EDIT I have kept my original answer at the bottom. As Paul points out in the comments, the original answer didn't really answer the OP's question, and could be more easily achieved with an ndimage filter. The following much more cumbersome function should do the right thing. It takes two arrays, a and c, and returns the windowed minimum of a and the values in c at the positions of the windowed minimums in a:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def neighbor_min(a, c):
    # Stack a and c so the argmin positions found in a can index both arrays
    ac = np.concatenate((a[None], c[None]))
    rows, cols = ac.shape[1:]
    ret = np.empty_like(ac)
    # Fill in the center
    win_ac = as_strided(ac, shape=(2, rows-2, cols, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[tuple(np.ogrid[:2, :rows-2, :cols]) +
                    (np.argmin(win_ac[0], axis=2),)]
    win_ac = as_strided(win_ac, shape=(2, rows-2, cols-2, 3),
                        strides=win_ac.strides+win_ac.strides[2:3])
    ret[:, 1:-1, 1:-1] = win_ac[tuple(np.ogrid[:2, :rows-2, :cols-2]) +
                                (np.argmin(win_ac[0], axis=2),)]
    # Fill the top, bottom, left and right borders
    win_ac = as_strided(ac[:, :2, :], shape=(2, 2, cols-2, 3),
                        strides=ac.strides+ac.strides[2:3])
    win_ac = win_ac[tuple(np.ogrid[:2, :2, :cols-2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, 0, 1:-1] = win_ac[:, np.argmin(win_ac[0], axis=0),
                             np.ogrid[:cols-2]]
    win_ac = as_strided(ac[:, -2:, :], shape=(2, 2, cols-2, 3),
                        strides=ac.strides+ac.strides[2:3])
    win_ac = win_ac[tuple(np.ogrid[:2, :2, :cols-2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, -1, 1:-1] = win_ac[:, np.argmin(win_ac[0], axis=0),
                              np.ogrid[:cols-2]]
    win_ac = as_strided(ac[:, :, :2], shape=(2, rows-2, 2, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[tuple(np.ogrid[:2, :rows-2, :2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, 1:-1, 0] = win_ac[:, np.ogrid[:rows-2],
                             np.argmin(win_ac[0], axis=1)]
    win_ac = as_strided(ac[:, :, -2:], shape=(2, rows-2, 2, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[tuple(np.ogrid[:2, :rows-2, :2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, 1:-1, -1] = win_ac[:, np.ogrid[:rows-2],
                              np.argmin(win_ac[0], axis=1)]
    # Fill the corners
    win_ac = ac[:, :2, :2]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, 0, 0] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, :2, -2:]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, 0, -1] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, -2:, -2:]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, -1, -1] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, -2:, :2]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, -1, 0] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    return ret
The return is a (2, rows, cols) array that can be unpacked into the two arrays:
>>> a = np.random.randint(100, size=(5,5))
>>> c = np.random.randint(100, size=(5,5))
>>> a
array([[42, 54, 18, 88, 26],
[80, 65, 83, 31, 4],
[51, 52, 18, 88, 52],
[ 1, 70, 5, 0, 89],
[47, 34, 27, 67, 68]])
>>> c
array([[94, 94, 29, 6, 76],
[81, 47, 67, 21, 26],
[44, 92, 20, 32, 90],
[81, 25, 32, 68, 25],
[49, 43, 71, 79, 77]])
>>> neighbor_min(a, c)
array([[[42, 18, 18, 4, 4],
[42, 18, 18, 4, 4],
[ 1, 1, 0, 0, 0],
[ 1, 1, 0, 0, 0],
[ 1, 1, 0, 0, 0]],
[[94, 29, 29, 26, 26],
[94, 29, 29, 26, 26],
[81, 81, 68, 68, 68],
[81, 81, 68, 68, 68],
[81, 81, 68, 68, 68]]])
The OP's case could then be solved as:
def bd_from_ac(a, c):
    b,d = neighbor_min(a, c)
    return a*b, d
And while there is a serious performance hit, it is pretty fast still:
In [3]: a = np.random.rand(1000, 1000)
In [4]: c = np.random.rand(1000, 1000)
In [5]: %timeit bd_from_ac(a, c)
1 loops, best of 3: 570 ms per loop
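A shorter way to get the same two arrays, sketched here assuming NumPy 1.20 or later, is to pad a with +inf so border windows need no special cases, take all 3x3 windows with sliding_window_view, and use one argmin per window to pick from both a and c. neighbor_min_padded and bd_from_ac_padded are hypothetical names. Ties go to the first minimum in row-major window order, which reproduces the sample output above but is not guaranteed to agree with the separable argmins of neighbor_min on every input. The reshape copies nine values per element, so memory use is roughly nine times the input; no timings are claimed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbor_min_padded(a, c):
    rows, cols = a.shape
    # +inf padding can never be the minimum, so border windows need no special case
    a_pad = np.pad(a.astype(float), 1, mode='constant', constant_values=np.inf)
    c_pad = np.pad(c, 1, mode='edge')  # padded c values are never selected
    # All 3x3 windows centred on the original cells, flattened to 9 values (a copy)
    wa = sliding_window_view(a_pad, (3, 3)).reshape(rows, cols, 9)
    wc = sliding_window_view(c_pad, (3, 3)).reshape(rows, cols, 9)
    # One argmin per window (first minimum in row-major order) indexes both arrays
    k = np.argmin(wa, axis=-1)[..., None]
    a_min = np.take_along_axis(wa, k, axis=-1)[..., 0].astype(a.dtype)
    c_at = np.take_along_axis(wc, k, axis=-1)[..., 0]
    return a_min, c_at

def bd_from_ac_padded(a, c):
    a_min, c_at = neighbor_min_padded(a, c)
    return a * a_min, c_at
```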
You are not really using the coordinates of the minimum neighboring element for anything other than fetching it, so you may as well skip that part and create a min_neighbor function. If you don't want to resort to Cython for fast looping, you will have to use rolling window views, such as those outlined in Paul's link. This will typically convert your (m, n) array into an (m-2, n-2, 3, 3) view of the same data, and you would then apply np.min over the last two axes.
Unfortunately you have to apply it one axis at a time, so you will have to create an (m-2, n-2, 3) copy of your data. Fortunately, you can compute the minimum in two steps, first windowing and minimizing along one axis, then along the other, and obtain the same result. So at most you are going to have intermediate storage the size of your input. If needed, you could even reuse the output array as intermediate storage and avoid memory allocations, but that is left as an exercise...
The following function does that. It is kind of lengthy because it has to deal not only with the central area, but also with the special cases of the four edges and four corners. Other than that it is a pretty compact implementation:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def neighbor_min(a):
    rows, cols = a.shape
    ret = np.empty_like(a)
    # Fill in the center: windows of 3 along the rows, then along the columns
    win_a = as_strided(a, shape=(rows-2, cols, 3),
                       strides=a.strides+a.strides[:1])
    win_a = win_a.min(axis=2)
    win_a = as_strided(win_a, shape=(rows-2, cols-2, 3),
                       strides=win_a.strides+win_a.strides[1:])
    ret[1:-1, 1:-1] = win_a.min(axis=2)
    # Fill the top, bottom, left and right borders
    win_a = as_strided(a[:2, :], shape=(2, cols-2, 3),
                       strides=a.strides+a.strides[1:])
    ret[0, 1:-1] = win_a.min(axis=2).min(axis=0)
    win_a = as_strided(a[-2:, :], shape=(2, cols-2, 3),
                       strides=a.strides+a.strides[1:])
    ret[-1, 1:-1] = win_a.min(axis=2).min(axis=0)
    win_a = as_strided(a[:, :2], shape=(rows-2, 2, 3),
                       strides=a.strides+a.strides[:1])
    ret[1:-1, 0] = win_a.min(axis=2).min(axis=1)
    win_a = as_strided(a[:, -2:], shape=(rows-2, 2, 3),
                       strides=a.strides+a.strides[:1])
    ret[1:-1, -1] = win_a.min(axis=2).min(axis=1)
    # Fill the corners
    ret[0, 0] = a[:2, :2].min()
    ret[0, -1] = a[:2, -2:].min()
    ret[-1, -1] = a[-2:, -2:].min()
    ret[-1, 0] = a[-2:, :2].min()
    return ret
You can now do things like:
>>> a = np.random.randint(10, size=(5, 5))
>>> a
array([[0, 3, 1, 8, 9],
[7, 2, 7, 5, 7],
[4, 2, 6, 1, 9],
[2, 8, 1, 2, 3],
[7, 7, 6, 8, 0]])
>>> neighbor_min(a)
array([[0, 0, 1, 1, 5],
[0, 0, 1, 1, 1],
[2, 1, 1, 1, 1],
[2, 1, 1, 0, 0],
[2, 1, 1, 0, 0]])
And your original question can be solved as:
def bd_from_ac(a, c):
    return a*neighbor_min(a), neighbor_min(c)
As a performance benchmark:
In [2]: m, n = 1000, 1000
In [3]: a = np.random.rand(m, n)
In [4]: c = np.random.rand(m, n)
In [5]: %timeit bd_from_ac(a, c)
1 loops, best of 3: 123 ms per loop
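The two-step window minimum described in this answer can also be sketched compactly with np.lib.stride_tricks.sliding_window_view (assuming NumPy 1.20 or later; neighbor_min_sep is a hypothetical name). Padding with mode='edge' only repeats values that already lie inside each clipped border window, so the same two passes also cover the four edges and corners without special cases. It is not benchmarked here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbor_min_sep(a):
    # Replicate the border so every cell has a full 3x3 neighborhood
    p = np.pad(a, 1, mode='edge')
    # Step 1: minimum over windows of 3 along the rows -> shape (rows, cols + 2)
    rowmin = sliding_window_view(p, 3, axis=0).min(axis=-1)
    # Step 2: minimum over windows of 3 along the columns -> shape (rows, cols)
    return sliding_window_view(rowmin, 3, axis=1).min(axis=-1)

a = np.array([[0, 3, 1, 8, 9],
              [7, 2, 7, 5, 7],
              [4, 2, 6, 1, 9],
              [2, 8, 1, 2, 3],
              [7, 7, 6, 8, 0]])
print(neighbor_min_sep(a))
```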
Finding a[min_coords] is a rolling window operation. Several clever solutions are outlined in this post. You'll want to make the creation of the c[min_coords] array a side effect of whichever solution you choose.
I hope this helps. I can post some sample code later when I have some time.
I'm interested in helping you, and I believe there may be better solutions outside the scope of your question, but before putting my own time into writing code I need some feedback from you, because I am not 100% sure I understand what you need.
One thing to consider: if you are a C# developer, maybe a "brute-force" implementation in C# can outperform a clever NumPy implementation, so you could at least consider testing your rather simple operations implemented in C#. GeoTIFF (which I suppose you are reading) has a relatively friendly specification, and I guess there might be .NET GeoTIFF libraries around.
But supposing you want to give Numpy a try (and I believe you should), let's take a look at what you're trying to achieve:
If you are going to run min_coords(array) on every element of arrays a and c, you might consider stacking nine copies of the same array, each rolled by a different offset, using numpy.dstack() and numpy.roll(). Then you apply numpy.argmin(stacked_array, axis=2) and get an array of values between 0 and 8, where each value maps to a tuple containing the offset indexes.
Using this principle, your min_coords() function would be vectorized, operating on the whole array at once and returning an array of offset indexes into a lookup table of the offsets.
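A minimal sketch of that roll-and-stack idea follows. OFFSETS, min_neighbor_offsets and bd_from_offsets are hypothetical names, and one behavior the description leaves open is an assumption here: np.roll wraps around, so cells on the border see the opposite edge as neighbors. Pad the arrays first if that is not wanted.

```python
import numpy as np

# Lookup table: stack index k -> (row offset, column offset)
OFFSETS = np.array([(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)])

def min_neighbor_offsets(a):
    # Layer k holds a[x + dr, y + dc] at position (x, y), wrapping around at the edges
    stacked = np.dstack([np.roll(a, (-dr, -dc), axis=(0, 1)) for dr, dc in OFFSETS])
    return np.argmin(stacked, axis=2)  # values 0..8, indices into OFFSETS

def bd_from_offsets(a, c):
    off = OFFSETS[min_neighbor_offsets(a)]  # shape (rows, cols, 2)
    rows = (np.arange(a.shape[0])[:, None] + off[..., 0]) % a.shape[0]
    cols = (np.arange(a.shape[1])[None, :] + off[..., 1]) % a.shape[1]
    return a * a[rows, cols], c[rows, cols]

a = np.array([[1, 2, 5],
              [3, 7, 2],
              [2, 3, 6]])
print(OFFSETS[min_neighbor_offsets(a)[1, 1]])
```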
If you have interest in elaborating this, please leave a comment.
Hope this helps!