I'm trying to produce a confusion matrix for two binary images. These are extracted (using binary thresholding) from two bands of a GeoTIFF image, although I think this detail should be irrelevant.
import rasterio
import cv2
from sklearn.metrics import confusion_matrix

dataset = rasterio.open('NDBI.tif')
VH_26Jun2015 = dataset.read(1)
VH_30Sep2015 = dataset.read(3)
GND_Truth = dataset.read(7)
VH_diff = VH_26Jun2015 - VH_30Sep2015
ret, th1 = cv2.threshold(VH_diff, 0.02, 255, cv2.THRESH_BINARY)
print(confusion_matrix(GND_Truth, th1))
Error 1: With the code above I ran into the problem described here: ValueError: multilabel-indicator is not supported for confusion matrix.
I tried the argmax(axis=1) solution mentioned in that question and in other places, but it produces a 1983x1983 matrix. (This Error 1 is probably the same one the person in the question above ran into.)
print(confusion_matrix(GND_Truth.argmax(axis=1),th1.argmax(axis=1)))
Output:
[[8 2 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
I checked the contents of GND_Truth and th1 and verified that they are binary.
numpy.unique(GND_Truth)
Output:
array([0., 1.], dtype=float32)
Error 2: I then tried ravel() instead to flatten my binary images before passing them to confusion_matrix, as shown below, but that results in a 3x3 matrix, whereas I'm expecting a 2x2 matrix.
print(confusion_matrix(GND_Truth.ravel().astype(int),th1.ravel().astype(int)))
Output:
[[16552434        0  2055509]
 [ 6230317        0  1531602]
 [       0        0        0]]
Converting the data with astype(int) did not make a difference. Can you please suggest what might be causing these two errors?
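A diagnostic sketch (not part of the original question) that may narrow this down: confusion_matrix builds one row and one column per distinct label in the union of y_true and y_pred, and cv2.threshold with a maxval of 255 writes 0/255 rather than 0/1, which could explain a third label. Assuming GND_Truth and th1 from the snippet above:

import numpy as np
from sklearn.metrics import confusion_matrix

# Check the label sets actually being compared.
print(np.unique(GND_Truth))              # e.g. [0. 1.]
print(np.unique(th1))                    # THRESH_BINARY with maxval=255 gives [0. 255.]

# Map the thresholded image back to 0/1 so both arrays share one label set,
# and flatten to 1-D integer arrays as confusion_matrix expects.
y_true = GND_Truth.ravel().astype(int)
y_pred = (th1.ravel() > 0).astype(int)
print(confusion_matrix(y_true, y_pred))  # 2x2 once the labels agree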
Related
I'm facing an issue when opening a .tif with rasterio, using the code below.
import rasterio

fp = 'image.tif'
image = rasterio.open(fp)
print(image.read())
When printing the content of the image, I get this
[[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]]
I verified all values and they are all 0. However, when dragging the image into QGIS, I can view it and confirm that the image contains values ranging from 101 to 122 (see the QGIS screenshot).
Any idea how to read the image and get these 101 to 122 values as a numpy array?
Here's a link to the image in question
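One thing worth ruling out first (my own sketch, not from the original post): numpy truncates large arrays when printing, so the "..." can hide non-zero pixels even when only zeros are visible. A per-band summary makes the actual value range explicit:

import rasterio
import numpy as np

with rasterio.open('image.tif') as src:
    print(src.count, src.dtypes, src.nodata)   # band count, dtypes, nodata value
    data = src.read()                          # shape: (bands, rows, cols)
    for i, band in enumerate(data, start=1):
        # Min/max and number of distinct values per band; print() truncation can
        # easily hide non-zero pixels in a large raster.
        print(i, band.min(), band.max(), np.unique(band).size)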
I have a collection of one-hot vectors (in numpy)
[[0 0 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 1 0 0]]
My goal is to find the optimal path that reaches all of the vectors, starting from the first vector (which is all 0's), while minimizing the number of steps. The path does not need to be continuous (i.e., if each vector has only one 1, then the number of steps can just be the number of non-zero vectors).
Is there any existing method that optimizes this? It's kind of like a shortest path problem.
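For the simple case the question describes (each vector has at most one 1, so the step count is just the number of distinct non-zero vectors), a minimal numpy sketch of that baseline, using a small hypothetical array, could look like this:

import numpy as np

# Small hypothetical example: rows are one-hot (or all-zero) vectors.
vectors = np.array([
    [0, 0, 0],   # starting vector, all zeros
    [0, 1, 0],
    [0, 1, 0],   # duplicate of the previous row
    [1, 0, 0],
])

# Baseline step count: number of distinct non-zero rows.
nonzero_rows = vectors[vectors.any(axis=1)]
distinct_rows = np.unique(nonzero_rows, axis=0)
print(len(distinct_rows))   # -> 2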
I am new to DTW (dynamic time warping) and was trying to apply it to a dataset with ~700,000 rows and 9 features. I have two arrays (matrices) of the form:
[
[0 1 0 0 0 0 0 0 0],
[0 0 0 0 1 0 0 0 0],
...
[0 0 0 0 0 0 1 0 0],
[0 0 1 0 0 0 0 0 0],
]
I have explored the fastdtw and dtaidistance packages. fastdtw is able to give an output distance for the above matrices in around 5 minutes. In addition, I would like to visualize the results and apply hierarchical clustering, but I didn't find any function in fastdtw for visualizing the path/results or for clustering.
dtaidistance does provide these functions, but it takes too long to run (I ran it on the same two series above, and it was still running after 15-20 minutes). Is there any way to handle this? Or can I do clustering and visualization with the results of fastdtw?
I would really appreciate some help regarding this.
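Not an answer from the thread, but one hedged option: fastdtw returns the warping path alongside the distance, so the path can be plotted directly with matplotlib. A minimal sketch, assuming x and y stand in for the two (n_rows, 9) arrays:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import euclidean
from fastdtw import fastdtw

# Random stand-ins for the two multivariate series from the question.
x = np.random.rand(1000, 9)
y = np.random.rand(1200, 9)

distance, path = fastdtw(x, y, dist=euclidean)
print(distance)

# path is a list of (i, j) index pairs; plotting it shows how the two series align.
i_idx, j_idx = zip(*path)
plt.plot(i_idx, j_idx)
plt.xlabel('index in x')
plt.ylabel('index in y')
plt.title('DTW warping path (fastdtw)')
plt.show()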
I have a problem with connectedComponents (or connectedComponentsWithStats), an OpenCV (3.3.0) function, in Python (2.7.12). A minimal example is the following:
import numpy as np
import cv2
img = np.zeros((4,4), dtype = np.uint8)
img[1,1] = 255
img[2,2] = 255
output = cv2.connectedComponents(img, 4)
print output[1]
It returns
[[0 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 0]]
which is strange, since I asked for connected components with connectivity 4 (not 8). The two pixels at (1, 1) and (2, 2) are not connected with 4-connectivity and should therefore give two different connected components, labelled 1 and 2 for instance.
Did I make a mistake?
Replacing
output = cv2.connectedComponents(img, 4)
by
output = cv2.connectedComponents(img, connectivity=4)
will give you
[[0 0 0 0]
[0 1 0 0]
[0 0 2 0]
[0 0 0 0]]
Alternatively, provide all three arguments:
output = cv2.connectedComponents(img, 4, cv2.CV_32S)
I'm not 100% sure why; I'll leave that to the Python experts out there. From my understanding, cv2.connectedComponents(img, 4) should work just fine, but it doesn't.
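A possible explanation (my assumption, not something stated in the answer above): in the Python binding, the second positional argument appears to be the optional labels output array rather than connectivity, which would leave connectivity at its default of 8 when calling cv2.connectedComponents(img, 4). Printing the binding's docstring shows the argument order, and passing connectivity by keyword sidesteps the ambiguity:

import numpy as np
import cv2

# The generated docstring lists the Python-level signature and argument order.
print(cv2.connectedComponents.__doc__)

# Same image as in the question; connectivity is passed by keyword to avoid
# any positional ambiguity.
img = np.zeros((4, 4), dtype=np.uint8)
img[1, 1] = 255
img[2, 2] = 255
num_labels, labels = cv2.connectedComponents(img, connectivity=4)
print(labels)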
The following code generates a matrix X (I use Python 2.7):
import random
import numpy as np

num = 10  # the question assumes only 10 numbers are generated
# random.randint's upper bound is inclusive, so cap at 255 to keep every value within 8 bits.
X = [random.randint(0, 2 ** 8 - 1) for _ in range(num)]
# Removes duplicates
X = list(set(X))
# Transforms into string representation
X = [('{0:0' + str(8) + 'b}').format(x) for x in X]
# Transforms each bit into an integer.
X = np.asarray([list(map(int, list(x))) for x in X], dtype=np.int8)
Which is deliberately in this form (assuming I generate only 10 numbers; the example below has fewer rows because duplicates were removed):
[[1 0 1 1 0 0 0 0]
[0 1 0 0 0 1 1 1]
[0 0 0 0 0 0 0 1]
[1 0 0 0 0 1 0 0]
[0 1 1 0 0 1 1 0]
[1 1 0 0 1 1 0 1]
[1 1 1 0 0 1 1 1]
[0 1 0 0 1 1 1 1]]
My goal is to store it and load it again (with square brackets) using numpy. For storing, I use numpy.savetxt('dataset.txt', X, fmt='%d') (which removes the square brackets :( ). The problem is that I want to load it back in the same shape shown above (including the square brackets). Using numpy.loadtxt(StringIO('dataset.txt')) doesn't help, and I am not sure how else to implement this. I tried to find an (efficient) trick to do so, but I am really stuck! Any help is really appreciated.
Thank you
I would use np.save(), which saves it as a binary file, and np.load() to get it back.
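A minimal sketch of that suggestion, assuming X is the int8 array built above:

import numpy as np

# Save in numpy's binary .npy format; dtype and 2-D shape are preserved exactly.
np.save('dataset.npy', X)

# Load it back; printing shows the same bracketed form as the original array.
X_loaded = np.load('dataset.npy')
print(X_loaded)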