Optimal path through one-hot vectors - python

I have a collection of one-hot vectors (in numpy):
[[0 0 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 1 0 0]]
My goal is to find the optimal path that reaches all of the vectors, starting from the first vector (which is all 0's), while minimizing the number of steps. The path does not need to be continuous (i.e., if each vector has only one 1, then the number of steps can just be the number of non-zero vectors).
Is there any existing method that optimizes this? It's kind of like a shortest path problem.

Related

How to compare each row of vectors of numpy array to itself and every element

I have a numpy array which contains vectorised data. I need to compute the Euclidean distance from each of these vectors (a row in the array) to itself and to every other row.
The vectors are of the form
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
I know I need two loops, here is what I have so far
def euclidean_distance_loop(termdoc):
    i = 0
    j = 0
    matrix = np.array([])
    while( j < (len(termdoc-1))):
        matrix = np.append(matrix,[euclidean_distance(termdoc[i],termdoc[j])])
        j = j + 1
    return np.array([matrix])
euclidean_distance_loop(termdoc)
I know this is an indexing problem and that I need another index, or an index incremented in another loop, but I'm not sure how to construct it.
You don’t need loops.
import numpy as np
def self_distance(x):
    return np.linalg.norm(x[:,np.newaxis] - x, axis=-1)
See also:
Numpy. Compare all vector row in one array with every other one in the same array
How can the Euclidean distance be calculated with NumPy?
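As a quick check of the broadcasting answer, the sketch below compares it with the two-loop version the question was aiming for. The name pairwise_loop and the sample rows are made up for illustration. Note that the broadcast version builds an intermediate array of shape (n, n, d), so it may use a lot of memory for large inputs.

```python
import numpy as np

def self_distance(x):
    # Broadcasting: (n, 1, d) - (n, d) -> (n, n, d), then norm over the last axis.
    return np.linalg.norm(x[:, np.newaxis] - x, axis=-1)

def pairwise_loop(x):
    # Hypothetical two-loop equivalent: one index per row of the result,
    # one index per column.
    n = len(x)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sqrt(np.sum((x[i] - x[j]) ** 2))
    return out

rows = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist = self_distance(rows)
print(dist)
```

Both functions return a symmetric n x n matrix with zeros on the diagonal.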

How to get confusion matrix for binary image?

I'm trying to produce a confusion matrix for 2 binary images. These are extracted (using binary thresholding) from 2 bands in a GeoTiff image, although I think this information should be irrelevant.
import cv2
import rasterio
from sklearn.metrics import confusion_matrix
dataset = rasterio.open('NDBI.tif')
VH_26Jun2015 = dataset.read(1)
VH_30Sep2015 = dataset.read(3)
GND_Truth = dataset.read(7)
VH_diff = VH_26Jun2015 - VH_30Sep2015
ret,th1 = cv2.threshold(VH_diff,0.02,255,cv2.THRESH_BINARY)
print(confusion_matrix(GND_Truth,th1))
Error 1: I used the code above and ran into the problem mentioned here: ValueError: multilabel-indicator is not supported for confusion matrix
I tried the argmax(axis=1) solution mentioned in that question and elsewhere, but it produced a 1983x1983 matrix. (Error 1 is probably the same one the person in the question above ran into.)
print(confusion_matrix(GND_Truth.argmax(axis=1),th1.argmax(axis=1)))
Output:
[[8 2 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
I checked the contents of the GND_Truth and th1 and verified that they are binary.
numpy.unique(GND_Truth)
Output:
array([0., 1.], dtype=float32)
Error 2: I then tried ravel() to flatten my binary images before passing them to confusion_matrix, as shown below, but that produced a 3x3 matrix, whereas I'm expecting a 2x2 matrix.
print(confusion_matrix(GND_Truth.ravel().astype(int),th1.ravel().astype(int)))
Output:
[[16552434 0 2055509]
[ 6230317 0 1531602]
[ 0 0 0]]
Converting the data astype(int) did not really make a difference. Can you please suggest what might be causing these 2 errors?
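A likely explanation for the 3x3 result: cv2.threshold was called with a maximum value of 255, so th1 holds 0 and 255 while GND_Truth holds 0 and 1. That gives three distinct labels across the two inputs, and confusion_matrix builds one row and column per label. The sketch below reproduces this with small synthetic arrays (assumed values, not the real rasters) and uses only numpy, so the 2x2 count is computed by hand rather than with sklearn.

```python
import numpy as np

# Synthetic stand-ins for the rasters in the question (assumed values).
gnd_truth = np.array([[0., 1., 1.], [0., 0., 1.]], dtype=np.float32)
th1 = np.array([[0., 255., 0.], [0., 255., 255.]], dtype=np.float32)

# Three distinct labels across both inputs, hence a 3x3 matrix.
labels_seen = np.union1d(gnd_truth, th1)
print(labels_seen)

# Map both images to {0, 1} before comparing.
y_true = gnd_truth.ravel().astype(int)
y_pred = (th1.ravel() > 0).astype(int)

# Rows are true labels, columns are predicted labels.
counts = np.bincount(2 * y_true + y_pred, minlength=4).reshape(2, 2)
print(counts)
```

Passing 1 instead of 255 as the maximum value to cv2.threshold should have the same effect as the `> 0` comparison. For Error 1, the 2-D inputs are treated as multilabel indicator matrices, which is why flattening with ravel() is needed in the first place.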

matlab's bwmorph(image, 'spur') in python

I'm porting a MATLAB image processing script over to python/skimage and haven't been able to find an equivalent of MATLAB's bwmorph function in skimage, specifically the 'spur' operation. The MATLAB docs say this about the spur operation:
Removes spur pixels. For example:
0 0 0 0           0 0 0 0
0 0 0 0           0 0 0 0
0 0 1 0  becomes  0 0 0 0
0 1 0 0           0 1 0 0
1 1 0 0           1 1 0 0
I've implemented a version in python that handles the above case fine:
import numpy as np
from scipy import ndimage

def _neighbors_conv(image):
    # np.int and np.bool were removed from recent NumPy; use the builtins.
    image = image.astype(int)
    k = np.array([[1,1,1],[1,0,1],[1,1,1]])
    neighborhood_count = ndimage.convolve(image,k, mode='constant', cval=1)
    neighborhood_count[~image.astype(bool)] = 0
    return neighborhood_count

def spur(image):
    return _neighbors_conv(image) > 1

def bwmorph(image, fn, n=1):
    for _ in range(n):
        image = fn(image)
    return image

t= [[0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0]]
t = np.array(t)
print('neighbor count:')
print(_neighbors_conv(t))
print('after spur:')
print(bwmorph(t,spur).astype(int))
neighbor count:
[[0 0 0 0]
[0 0 1 0]
[0 3 0 0]
[7 5 0 0]]
after spur:
[[0 0 0 0]
[0 0 0 0]
[0 1 0 0]
[1 1 0 0]]
The above works by removing any pixels that only have a single neighboring pixel.
I have noticed that the above implementation behaves differently than matlab's spur operation though. Take this example in matlab:
0 0 0 0 0
0 0 1 0 0
0 1 1 1 1
0 0 1 0 0
0 0 0 0 0
becomes, via bwmorph(t,'spur',1):
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
The spur operation is a bit more complex than looking at the 8-neighbor count. It is not clear to me how to extend my implementation to satisfy this case without making it too aggressive (i.e. removing valid pixels).
What is the underlying logic of matlab's spur or is there a python implementation already available that I can use?
UPDATE:
I have found Octave's implementation of spur, which uses a LUT:
case('spur')
## lut=makelut(inline("xor(x(2,2),(sum((x&[0,1,0;1,0,1;0,1,0])(:))==0)&&(sum((x&[1,0,1;0,0,0;1,0,1])(:))==1)&&x(2,2))","x"),3);
## which is the same as
lut=repmat([zeros(16,1);ones(16,1)],16,1); ## identity
lut([18,21,81,273])=0; ## 4 qualifying patterns
lut=logical(lut);
cmd="BW2=applylut(BW, lut);";
(via https://searchcode.com/codesearch/view/9585333/)
Assuming that is correct I just need to be able to create this LUT in python and apply it...
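A minimal numpy-only sketch of building and applying that LUT in python follows. It assumes the 3x3 neighborhood is indexed column by column (top-left weight 1, center weight 16), which is what the identity line and the four qualifying entries in the Octave snippet imply. The names make_spur_lut and apply_lut are made up here. This mirrors the quoted Octave rule only and has not been checked against MATLAB's own output.

```python
import numpy as np

# Assumed weight layout, read off the Octave LUT: center pixel = 16.
WEIGHTS = np.array([[1,  8,  64],
                    [2, 16, 128],
                    [4, 32, 256]])

def make_spur_lut():
    idx = np.arange(512)
    lut = (idx & 16) != 0             # identity: output equals the center pixel
    # Octave's 1-based entries 18, 21, 81, 273 become 0-based 17, 20, 80, 272:
    # center set plus exactly one diagonal neighbor.
    lut[[17, 20, 80, 272]] = False
    return lut

def apply_lut(bw, lut):
    bw = np.asarray(bw).astype(bool).astype(int)
    padded = np.pad(bw, 1, mode="constant")   # zeros outside the image
    h, w = bw.shape
    index = np.zeros((h, w), dtype=int)
    for r in range(3):
        for c in range(3):
            index += WEIGHTS[r, c] * padded[r:r + h, c:c + w]
    return lut[index]

t = np.array([[0, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])
result = apply_lut(t, make_spur_lut()).astype(int)
print(result)
```

On the 4x4 example from the MATLAB docs, this removes only the pixel at row 1, column 2. It would not reproduce the 5x5 MATLAB example above, because the pixel removed there has a 4-connected neighbor and so is not one of the four qualifying patterns.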
I ended up implementing my own version of spur and the other bwmorph operations. For future internet travelers who have the same need, here is a handy gist of what I ended up using:
https://gist.github.com/bmabey/4dd36d9938b83742a88b6f68ac1901a6

Evenly Split 3D Numpy Arrays of Varying Sizes [duplicate]

I have a 3D image with size Deep x Weight x Height (for example, 10x20x30 means 10 images, each of size 20x30).
Given a patch size pd x pw x ph (such that pd < Deep, pw < Weight, ph < Height), for example 4x4x4, the center point of the patch is located at pd/2 x pw/2 x ph/2. Call the distance the center point moves between time t and time t+1 the stride, for example stride=2.
I want to split the original 3D image into patches with the size and stride given above. How can I do it in python? Thank you.
Use np.lib.stride_tricks.as_strided. This solution does not require the strides to divide the corresponding dimensions of the input stack. It even allows for overlapping patches (Just do not write to the result in this case, or make a copy.). It therefore is more flexible than other approaches:
import numpy as np
from numpy.lib import stride_tricks

def cutup(data, blck, strd):
    sh = np.array(data.shape)
    blck = np.asanyarray(blck)
    strd = np.asanyarray(strd)
    nbl = (sh - blck) // strd + 1
    strides = np.r_[data.strides * strd, data.strides]
    dims = np.r_[nbl, blck]
    data6 = stride_tricks.as_strided(data, strides=strides, shape=dims)
    return data6  # .reshape(-1, *blck)

# demo
x = np.zeros((5, 6, 12), int)
y = cutup(x, (2, 2, 3), (3, 3, 5))
y[...] = 1
print(x[..., 0], '\n')
print(x[:, 0, :], '\n')
print(x[0, ...], '\n')
Output:
[[1 1 0 1 1 0]
[1 1 0 1 1 0]
[0 0 0 0 0 0]
[1 1 0 1 1 0]
[1 1 0 1 1 0]]
[[1 1 1 0 0 1 1 1 0 0 0 0]
[1 1 1 0 0 1 1 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 0 0 1 1 1 0 0 0 0]
[1 1 1 0 0 1 1 1 0 0 0 0]]
[[1 1 1 0 0 1 1 1 0 0 0 0]
[1 1 1 0 0 1 1 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 0 0 1 1 1 0 0 0 0]
[1 1 1 0 0 1 1 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0]]
Explanation. Numpy arrays are organised in terms of strides, one for each dimension: data point [x,y,z] is located in memory at address base + stridex * x + stridey * y + stridez * z.
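To make the address formula concrete, here is a small sketch on a made-up 2x3x4 array. It computes the byte offset of one element from the array's strides and reads the same element back through a flat view.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)   # C-contiguous example array
i, j, k = 1, 2, 3

# Offset in bytes from the start of the buffer: sum of stride * index per axis.
byte_offset = i * x.strides[0] + j * x.strides[1] + k * x.strides[2]

# Convert bytes to an element position and read it from the flat buffer.
flat_index = byte_offset // x.itemsize
value_via_strides = x.ravel()[flat_index]

print(x.strides, byte_offset, value_via_strides, x[i, j, k])
```

The two values printed last agree, which is all that indexing does under the hood for a contiguous array.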
The stride_tricks.as_strided factory lets you directly manipulate the strides and shape of a new array that shares its memory with a given array. Try this only if you know what you're doing, because no checks are performed, meaning you are allowed to shoot yourself in the foot by addressing out-of-bounds memory.
The code uses this function to split each of the three existing dimensions into two new ones: one for the corresponding within-block coordinate (this has the same stride as the original dimension, because adjacent points in a block correspond to adjacent points in the whole stack) and one for the block index along this axis (this has stride = original stride x block stride).
All the code does is compute the correct strides and dimensions (= block dimensions and block counts along the three axes).
Since the data are shared with the original array, when we set all points of the 6d array to 1, they are also set in the original array exposing the block structure in the demo. Note that the commented out reshape in the last line of the function breaks this link, because it forces a copy.
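For readers who would rather avoid raw as_strided, a bounds-checked alternative is sketched below. It swaps in numpy's sliding_window_view (available from NumPy 1.20) and then keeps every stride-th window. The 10x20x30 volume, 4x4x4 patch and stride of 2 are the example numbers from the question; the variable names are made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

volume = np.arange(10 * 20 * 30).reshape(10, 20, 30)
patch_shape = (4, 4, 4)
stride = 2

# All windows at stride 1, as a read-only view: shape (7, 17, 27, 4, 4, 4).
windows = sliding_window_view(volume, patch_shape)

# Keep every `stride`-th window along each axis: shape (4, 9, 14, 4, 4, 4).
patches = windows[::stride, ::stride, ::stride]

print(patches.shape)
```

patches[a, b, c] is the 4x4x4 block that starts at volume[2*a, 2*b, 2*c]. The view is read-only, so copy it before writing.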
The skimage module offers an integrated solution with view_as_blocks.
The source is available online.
Take care to choose Deep, Weight, Height as multiples of pd, pw, ph, because as_strided does not check bounds.

Recursion and Percolation

I'm trying to write a function that will check for undirected percolation in a numpy array. In this case, undirected percolation occurs when there is some kind of path that the liquid can follow (the liquid can travel up, down, and sideways, but not diagonally). Below is an example of an array that could be given to us.
1 0 1 1 0
1 0 0 0 1
1 0 1 0 0
1 1 1 0 0
1 0 1 0 1
The result of percolation in this scenario is below.
1 0 1 1 0
1 0 0 0 0
1 0 1 0 0
1 1 1 0 0
1 0 1 0 0
In the scenario above, the liquid could follow a path and everything with a 1 currently would refill except for the 1's in positions [1,4] and [4,4].
The function I'm trying to write starts at the top of the array and checks to see if it's a 1. If it's a 1, it writes it to a new array. What I want it to do next is check the positions above, below, left, and right of the 1 that has just been assigned.
What I currently have is below.
def flow_from(sites, full, i, j):
    n = len(sites)
    if j>=0 and j<n and i>=0 and i<n: #Check to see that value is in array bounds
        if sites[i,j] == 0:
            full[i,j] = 0
        else:
            full[i,j] = 1
            flow_from(sites, full, i, j + 1)
            flow_from(sites, full, i, j - 1)
            flow_from(sites, full, i + 1, j)
            flow_from(sites, full, i - 1, j)
In this case, sites is the original matrix, for example the first one shown above. full is the new matrix that receives the flow result (the second matrix shown). i and j are the row and column indices used to iterate through the matrix.
Whenever I run this, I get an error that says "RuntimeError: maximum recursion depth exceeded in comparison." I looked into this and I don't think I need to adjust my recursion limit, but I have a feeling there's something blatantly obvious with my code that I just can't see. Any pointers?
Forget about your code block. This is a known problem with a known solution from the scipy library. The code below is adapted from this answer and assumes your data is in an array named A.
import numpy as np
from scipy import ndimage
# Identify the clusters (4-connected by default, so no diagonal links)
lw, num = ndimage.label(A)
area = ndimage.sum(A, lw, index=np.arange(lw.max() + 1))
print(A)
print(lw)
print(area)
This gives:
[[1 0 1 1 0]
[1 0 0 0 1]
[1 0 1 0 0]
[1 1 1 0 0]
[1 0 1 0 1]]
[[1 0 2 2 0]
[1 0 0 0 3]
[1 0 1 0 0]
[1 1 1 0 0]
[1 0 1 0 4]]
[ 0. 9. 2. 1. 1.]
That is, it's labeled all the "clusters" for you and identified the size! From here you can see that the clusters labeled 3 and 4 have size 1 which is what you want to filter away. This is a much more powerful approach because now you can filter for any size.
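As for why the original flow_from hits the recursion limit: it never checks whether a cell has already been filled, so two adjacent 1's keep calling each other forever. The sketch below is one possible fix under that reading, not the asker's final code. It uses an explicit stack instead of recursion and skips cells already marked in full. The function name flow is made up here.

```python
import numpy as np

def flow(sites):
    """Fill every open site (1) reachable from the top row via up/down/left/right."""
    sites = np.asarray(sites)
    n_rows, n_cols = sites.shape
    full = np.zeros_like(sites)
    # Start from every open site in the top row.
    stack = [(0, j) for j in range(n_cols) if sites[0, j] == 1]
    while stack:
        i, j = stack.pop()
        if not (0 <= i < n_rows and 0 <= j < n_cols):
            continue                      # outside the array
        if sites[i, j] == 0 or full[i, j] == 1:
            continue                      # blocked, or already filled
        full[i, j] = 1
        stack.extend([(i, j + 1), (i, j - 1), (i + 1, j), (i - 1, j)])
    return full

sites = np.array([[1, 0, 1, 1, 0],
                  [1, 0, 0, 0, 1],
                  [1, 0, 1, 0, 0],
                  [1, 1, 1, 0, 0],
                  [1, 0, 1, 0, 1]])
full = flow(sites)
print(full)
```

On the example matrix this reproduces the expected result from the question: the isolated 1's at [1,4] and [4,4] stay empty.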
