I have a raster with a set of unique ID patches/regions which I've converted into a two-dimensional Python numpy array. I would like to calculate pairwise Euclidean distances between all regions to obtain the minimum distance separating the nearest edges of each raster patch. As the array was originally a raster, a solution needs to account for diagonal distances across cells (I can always convert any distances measured in cells back to metres by multiplying by the raster resolution).
I've experimented with the cdist function from scipy.spatial.distance as suggested in this answer to a related question, but so far I've been unable to solve my problem using the available documentation. As an end result I would ideally have a 3 by X array in the form of "from ID, to ID, distance", including distances between all possible combinations of regions.
Here's a sample dataset resembling my input data:
import numpy as np
import matplotlib.pyplot as plt
# Sample study area array
example_array = np.array([[0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 2, 0, 2, 2, 0, 6, 0, 3, 3, 3],
[0, 0, 0, 0, 2, 2, 0, 0, 0, 3, 3, 3],
[0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3],
[1, 1, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3],
[1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 3],
[1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0, 5, 5, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]])
# Plot array
plt.imshow(example_array, cmap="Spectral", interpolation='nearest')  # "spectral" was removed in newer matplotlib
Distances between labeled regions of an image can be calculated with the following code,
import itertools
from scipy.spatial.distance import cdist
# make sure the IDs are integers (np.int was removed from NumPy; use the builtin)
example_array = np.asarray(example_array, dtype=int)
# IDs are assumed to be consecutive integers starting at 1;
# regions 1 .. n-1 are compared below (n = maximum label, so with
# this sample data the single-pixel region 6 is skipped; extend the
# ranges to n + 1 to include it)
n = example_array.max()
indexes = []
for k in range(1, n):
    tmp = np.nonzero(example_array == k)
    tmp = np.asarray(tmp).T
    indexes.append(tmp)
# calculating the distance matrix
distance_matrix = np.zeros((n - 1, n - 1), dtype=float)
for i, j in itertools.combinations(range(n - 1), 2):
    # use the squared Euclidean distance (more efficient), and take the
    # square root only of the single element we are interested in
    d2 = cdist(indexes[i], indexes[j], metric='sqeuclidean')
    distance_matrix[i, j] = distance_matrix[j, i] = d2.min() ** 0.5
# mapping the distance matrix to labeled IDs (could be improved/extended)
labels_i, labels_j = np.meshgrid(range(1, n), range(1, n))
results = np.dstack((labels_i, labels_j, distance_matrix)).reshape((-1, 3))
print(distance_matrix)
print(results)
This assumes integer IDs, and would need to be extended if that is not the case. For instance, with the test data above, the calculated distance matrix is,
# From 1 2 3 4 5 # To
[[ 0. 4.12310563 4. 9.05538514 5. ] # 1
[ 4.12310563 0. 3.16227766 10.81665383 8.24621125] # 2
[ 4. 3.16227766 0. 4.24264069 2. ] # 3
[ 9.05538514 10.81665383 4.24264069 0. 3.16227766] # 4
[ 5. 8.24621125 2. 3.16227766 0. ]] # 5
Note that this takes the Euclidean distance between the centres of pixels. For instance, the distance between zones 3 and 5 is 2.0, even though the two regions are separated by only one pixel.
This is a brute-force approach, where we calculate all the pairwise distances between pixels of different regions. It should be sufficient for most applications. Still, if you need better performance, have a look at scipy.spatial.cKDTree, which can compute the minimum distance between two regions more efficiently than cdist.
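For large rasters, the cKDTree idea can be sketched as follows. This is an assumption on my part, not part of the answer above: min_region_distance is a hypothetical helper that builds the tree on one region's pixel coordinates and queries it with the other's, so only nearest neighbours are examined instead of the full distance matrix.

```python
import numpy as np
from scipy.spatial import cKDTree

def min_region_distance(array, id_a, id_b):
    """Minimum centre-to-centre distance (in cells) between two labeled regions."""
    coords_a = np.argwhere(array == id_a)
    coords_b = np.argwhere(array == id_b)
    # Build a KD-tree on one region's pixels; each query then returns
    # the distance to the nearest id_a pixel for an id_b pixel.
    tree = cKDTree(coords_a)
    dists, _ = tree.query(coords_b, k=1)
    return dists.min()
```

The same pairwise loop over itertools.combinations can then call this helper instead of cdist.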
Related
Let's say I have to initialise bi-directional edges between the nodes of the following graph:
I can easily do this using the following code:
import numpy as np
node_num = 3
graph = np.ones([node_num, node_num]) - np.eye(node_num)
Now I am extending this graph in the following way:
What is a simple and efficient way to write the code for this graph?
Assuming you're looking for an adjacency matrix, you could use:
out = np.block([
[1 - np.eye(3), np.eye(3) ],
[ np.eye(3), np.zeros((3, 3))]
]).astype(int)
out:
array([[0, 1, 1, 1, 0, 0], # A
[1, 0, 1, 0, 1, 0], # B
[1, 1, 0, 0, 0, 1], # C
[1, 0, 0, 0, 0, 0], # BC
[0, 1, 0, 0, 0, 0], # AB
[0, 0, 1, 0, 0, 0]]) # AB(red)
but I would suggest just initialising it directly as the output adjacency matrix. I would only use a short one-liner for very simple graphs like your first image, not the second.
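If the pattern in the second image continues (a complete graph on the core nodes, each with one degree-1 node attached), the same block layout can be generated for any size. star_adjacency below is a hypothetical helper illustrating that generalisation, not something from the answer above:

```python
import numpy as np

def star_adjacency(n):
    # Complete graph on n core nodes, each with one extra degree-1
    # node attached (same layout as the np.block call above).
    core = np.ones((n, n), dtype=int) - np.eye(n, dtype=int)
    eye = np.eye(n, dtype=int)
    zeros = np.zeros((n, n), dtype=int)
    return np.block([[core, eye],
                     [eye, zeros]])
```

star_adjacency(3) reproduces the 6x6 matrix shown above.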
I'm fairly new to python and I am having trouble with an array.
I'm having a problem with a symmetric matrix. Being symmetric, X[0][6] is the same as X[6][0]. I'm looking to be able to append all non-zero values in said array into a list - however I don't want to include elements on the diagonal nor duplicates of elements on one of the sides of the diagonal. For example, I only want to append X[0][6] but not also X[6][0].
The 2D array is as follows:
X = [[9, 0, 0, 1, 0, 0, 6, 0],
[0, 4, 0, 0, 0, 0, 0, 1],
[0, 0, 9, 0, 4, 0, 0, 0],
[1, 0, 0, 3, 0, 0, 0, 1],
[0, 0, 4, 0, 8, 0, 0, 3],
[0, 0, 0, 0, 0, 4, 0, 0],
[6, 0, 0, 0, 0, 0, 9, 2],
[0, 1, 0, 1, 3, 0, 2, 8]]
I've attempted a for loop like so:
non_zero_entries = []
for i in X:
    for j in i:
        if j > 0:
            non_zero_entries.append(j)
When I do this however, due to the nature of the symmetry of the array I get the following output which has not only the diagonal but also the duplicates within the matrix:
Out: [9, 1, 6, 4, 1, 9, 4, 1, 3, 1, 4, 8, 3, 4, 6, 9, 2, 1, 1, 3, 2, 8]
Ideally I need to be able to transform my matrix to look like this, so that the diagonal and one side of it become 0.
ideal_X = [[0, 0, 0, 1, 0, 0, 6, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 4, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 3],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 2],
[0, 0, 0, 0, 0, 0, 0, 0]]
This would give the output that I require:
Out: [1, 6, 1, 4, 1, 3, 2]
How would I either transform my matrix into the one I've provided, or is there a simpler way to get my desired output with the initial matrix?
You have the right idea in writing two nested for loops to iterate through the elements of your matrix. The outer for loop iterates once through each row of the matrix, which is already correct; you just need to change the inner for loop so that instead of iterating over every element in the row, it only iterates through the elements on one side of the diagonal.
If you don't care which "side" of the diagonal (upper or lower) you extract the non-zero elements from, it's easiest to get them from the lower side of the diagonal, by doing this:
for row_num in range(len(X)):
    for col_num in range(row_num):
        if X[row_num][col_num] > 0:
            non_zero_entries.append(X[row_num][col_num])
Note that instead of using a "for-each" loop, where the loop variable is each entire row, we use a for loop with a range, where the loop variable is a number. This has equivalent behavior (X[row_num] is the same as i in your original code), but allows us to have a variable that counts which row the loop is currently on. This number, row_num, is equal to the index within the row (the column number) that is the "diagonal" entry. Thus the inner for loop can use a range that ends at row_num, rather than len(X[row_num]), to only iterate over the entries from 0 to the "diagonal" entry.
If you specifically want the entries from the upper side of the diagonal, the inner for loop's range needs to be a little more complicated:
for row_num in range(len(X)):
    for col_num in range(row_num + 1, len(X[row_num])):
        if X[row_num][col_num] > 0:
            non_zero_entries.append(X[row_num][col_num])
In this code, range(row_num+1, len(X[row_num])) produces a range for the inner for loop that starts at the entry after the "diagonal" entry, and ends at the end of the row.
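As an aside, if X is converted to a NumPy array, the same extraction can be done without explicit loops. This is a sketch of an alternative (upper_nonzero is a hypothetical helper) using np.triu_indices with k=1 to select the strict upper triangle:

```python
import numpy as np

def upper_nonzero(matrix):
    arr = np.asarray(matrix)
    # Row/column indices of the strict upper triangle
    # (k=1 shifts past the diagonal, excluding it).
    rows, cols = np.triu_indices(arr.shape[0], k=1)
    vals = arr[rows, cols]
    return vals[vals > 0].tolist()
```

Applied to the X above, this returns the desired [1, 6, 1, 4, 1, 3, 2].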
I am trying to extract elements from a 2-dimensional array, a, using a 2-dimensional array of indices, b, representing x/y coordinates. I have found something similar for a 1D array, but I was unable to successfully apply it to a 2D array: Python - How to extract elements from an array based on an array of indices?
a = np.random.randn(10,7)
a = a.astype(int)
b = np.array([[1, 2], [2,3], [3,5], [2,7], [5,6]])
I have been using the bit of code below, however it returns a 3D matrix with values from the rows of each of the indices:
result2 = np.array(a)[b]
result2
Out[101]:
array([[[ 0, -1, 0, 0, 0, 1, 0],
[ 0, -1, 0, 0, 0, 0, 0]],
[[ 0, -1, 0, 0, 0, 0, 0],
[-1, 0, 0, 1, 0, 0, 0]],
[[-1, 0, 0, 1, 0, 0, 0],
[ 0, 0, -1, -2, 1, 0, 0]],
[[ 0, -1, 0, 0, 0, 0, 0],
[-1, 0, 0, 0, 0, 0, 1]],
[[ 0, 0, -1, -2, 1, 0, 0],
[ 1, 0, 0, 1, 0, -1, 0]]])
How can I modify b in order to index (column 1, row 2) ... (column 2, row 3) ... (column 3, row 5), etc.?
...
This is a minimal reproducible example and my actual data involves me indexing 500 cells in a 100x100 matrix (using an array of x/y coordinates/indices, size (500x2), similar to the above b). Would it be best to use a for loop in this case? Something along the lines of ...
for i in b:
    for j in b:
        result2 = np.array(a)[i, j]
I encountered the same issue not long ago, and the answer is actually quite simple:
result = a[b[:,0], b[:,1]]
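This works because indexing with two integer arrays pairs them elementwise (NumPy "fancy" indexing): b[:, 0] supplies the row of each coordinate and b[:, 1] the matching column, so the result has one element per pair. A minimal sketch with illustrative values (note that the [2, 7] pair in the question's b would actually be out of bounds for a 7-column array, so it is dropped here):

```python
import numpy as np

a = np.arange(70).reshape(10, 7)                # a[i, j] == 7 * i + j
b = np.array([[1, 2], [2, 3], [3, 5], [5, 6]])  # (row, col) coordinate pairs
result = a[b[:, 0], b[:, 1]]                    # one element per pair
# result -> array([ 9, 17, 26, 41])
```

For 500 coordinates in a 100x100 matrix, this single indexing expression replaces the nested loop entirely.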
I have raster containing spatial ecological habitat data which I've converted to a 2-dimensional numpy array. In this array, values of 1 = data, and 0 = no data.
From this data I want to generate an array of containing all data cell pairs where the distance between each cell is less than a maximum Euclidean cutoff distance (i.e. 2 cells apart).
I've found this answer useful, but the answers there appear to first measure all pairwise distances, then subsequently threshold the results by a maximum cutoff. My datasets are large (over 1 million data cells in a 13500*12000 array), so any pairwise distance measure that tries to calculate distances between all pairs of cells will fail: I need a solution which somehow stops looking for possible neighbours outside of a certain search radius (or something similar).
I've experimented with scipy.spatial.distance.pdist, but so far haven't had luck applying it to either my two-dimensional data, or finding a way to prevent pdist from calculating distances between even distant pairs of cells. I've attached an example array and a desired output array for maximum Euclidean cutoff distance = 2 cells:
import numpy as np
import matplotlib.pyplot as plt
# Example 2-D habitat array (1 = data)
example_array = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
# Plot example array
plt.imshow(example_array, cmap="Spectral", interpolation='nearest')  # "spectral" was removed in newer matplotlib
I have to confess my numpy is weak -- maybe there is a way to do it directly. Nonetheless, the problem is not difficult in pure Python. The following code will output pairs of x/y coordinates of your matching data. There are a lot of potential optimizations that could obscure the code and make it go faster, but given the size of your data set and the size (2.0) of your example radius, I doubt any of those are worthwhile (with the possible exception of creating numpy views into the arrays instead of sublists).
Updated -- The code has had a couple of bugs fixed -- (1) it was looking too far to the left on lines that were below the starting point, and (2) it was not doing the right thing near the left edge. The invocation of the function now uses a radius of 2.5 to show how additional pairs can be picked up.
example_array = [[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
def findpairs(mylist, radius=2.0):
    """
    Find pairs with data within a given radius.
    If we work from the top of the array down, we never
    need to look up (because we would already have found
    those pairs), and we never need to look left on the
    same line.
    """
    # Create the parameters of a half circle: the relative
    # beginning and ending X coordinates to search for each
    # Y line, starting at this one and working down. To avoid
    # duplicates and extra work, not only do we never look up,
    # we never look left on the same line as the cell we are
    # matching, but we do on subsequent lines.
    semicircle = []
    x = 1
    while x:
        y = len(semicircle)
        x = int(max(0, (radius ** 2 - y ** 2)) ** 0.5)
        if x == 0 and y > radius:
            # This row is already entirely beyond the radius
            # (e.g. y = 3 for radius 2.5), so don't add it.
            break
        # Don't look back on the same line...
        semicircle.append((-x if y else 1, x + 1))
    # The maximum number of y lines we will search at a time.
    max_y = len(semicircle)
    for y_start in range(len(mylist)):
        sublists = enumerate(mylist[y_start:y_start + max_y], y_start)
        # Materialise the zip: in Python 3 it is a one-shot iterator,
        # but it must be reused for every x_start below.
        sublists = list(zip(semicircle, sublists))
        check = (x for (x, value) in enumerate(mylist[y_start]) if value)
        for x_start in check:
            for (x_lo, x_hi), (y, ylist) in sublists:
                # Deal with the left-edge problem
                x_lo = max(0, x_lo + x_start)
                xlist = ylist[x_lo: x_start + x_hi]
                for x, value in enumerate(xlist, x_lo):
                    if value:
                        yield (x_start, y_start), (x, y)
print(list(findpairs(example_array, 2.5)))
Execution time is going to be highly data dependent. For grins, I created arrays of the size you specified (13500 x 12000) to test timing. I used a bigger radius (3.0 instead of 2.0) and tried two cases: no matches, and every match. To avoid reallocating lists over and over, I simply ran the iterator and tossed the results. The code to do that is below. For a best-case (empty) array, it ran on my machine in 7 seconds; the time for the worst-case (all 1s) array was around 12 minutes.
def dummy(val):
    onelist = 13500 * [val]
    listolists = 12000 * [onelist]
    for i in findpairs(listolists, 3.0):
        pass
dummy(0)
dummy(1)
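As a NumPy/SciPy alternative (my own suggestion, not part of the answer above), scipy.spatial.cKDTree.query_pairs only searches within the given radius, so it never computes distances between distant cells. pairs_within_radius is a hypothetical helper sketching this:

```python
import numpy as np
from scipy.spatial import cKDTree

def pairs_within_radius(array, radius):
    # Coordinates of all data cells (value == 1).
    coords = np.argwhere(np.asarray(array) == 1)
    tree = cKDTree(coords)
    # query_pairs returns index pairs (i, j), i < j, whose
    # Euclidean distance is <= radius; map them back to
    # (row, col) coordinate pairs.
    return [(tuple(coords[i]), tuple(coords[j]))
            for i, j in tree.query_pairs(radius)]
```

For over a million data cells this builds one tree and performs a fixed-radius query, avoiding both the full pairwise matrix and the pure-Python inner loops.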
Using Theano tensor operations, how can I toggle one cell on each row of a matrix based on an integer position indicator at the corresponding row index of a vector (i.e. |V| = number of rows of the matrix)? For example, given a 100x5 matrix of zeros
M = [
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]
] # |M| = 100x5
and a 100-element vector of integers in the range [0, 4],
V = [2, 4, ..., 0, 2] # |V| = 100, max(V) = 4, min(V) = 0
update (or create another) matrix M to
M = [
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
...
[1, 0, 0, 0, 0],
[0, 0, 1, 0, 0]
] # |M| = 100x5
(I know how to do this iteratively using conventional codes, but I want to run it as part of an algorithm on GPU without complicating my input which is currently vector V, so a direct theano implementation would be great.)
I figured out the answer myself. This operation is known as one-hot encoding, and it is supported as to_one_hot in Theano's extra_ops package. Code:
M_one_hot = theano.tensor.extra_ops.to_one_hot(V, 5, dtype='int32')
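For a quick sanity check of the result outside Theano, the same one-hot matrix can be built with plain NumPy by indexing into an identity matrix. This is a sketch using a short example vector, not Theano's implementation:

```python
import numpy as np

V = np.array([2, 4, 0, 2])               # example index vector, values in [0, 4]
# Row i of the result is row V[i] of the identity matrix,
# i.e. a one-hot vector with the 1 at column V[i].
M_one_hot = np.eye(5, dtype='int32')[V]
```

The same trick works for any number of rows and classes, so it is handy for verifying the GPU output on small inputs.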