I have trained a Convolutional Neural Network (CNN) with Keras, and I am doing the following to find its accuracy on a test data set:
import os
import cv2
import numpy as np

correct_classification = 0
number_of_test_images = 0

for root, dirs, files in os.walk(test_directory):
    for file in files:
        img = cv2.imread(root + '/' + file)
        img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_AREA)
        img = np.expand_dims(img, axis=0)
        img = img / 255.0

        if os.path.basename(root) == 'nevus':
            label = 1
        elif os.path.basename(root) == 'melanoma':
            label = 0

        img_class = model.predict_classes(img)
        prediction = img_class[0]

        if prediction == label:
            correct_classification = correct_classification + 1

        print('This is the prediction: ')
        print(prediction)
        number_of_test_images = number_of_test_images + 1

print('correct results:')
print(correct_classification)
print('number of test images')
print(number_of_test_images)
print('Accuracy:')
print(correct_classification / number_of_test_images * 100)
Is there a way I can find the ROC curve from testing the model on the test data set?
Thanks.
The ROC curve is simply a curve that plots the true positive rate (TPR) against the false positive rate (FPR) at different probability thresholds. So, if this is a binary classification problem, you can simply vary the probability threshold on the predictions for your test dataset and compute the TPR/FPR at each threshold. Essentially, create a table with three columns, [Prob Threshold, TPR, FPR], and plot it. You'll need to use model.predict_proba(...) to get the per-class probabilities and then use those to build the ROC curve.
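For example, here is a minimal sketch with scikit-learn, assuming a binary Keras model and preprocessed test arrays x_test/y_test (those names are hypothetical):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# x_test, y_test: preprocessed test images and their 0/1 labels (hypothetical names)
y_score = model.predict(x_test)            # or model.predict_proba(x_test) on older Sequential models
if y_score.ndim > 1 and y_score.shape[1] == 2:
    y_score = y_score[:, 1]                # keep the probability of class 1
y_score = np.ravel(y_score)

fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label='ROC (AUC = %.3f)' % roc_auc)
plt.plot([0, 1], [0, 1], linestyle='--')   # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()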
For multi-class it gets a little tricky, but you have a few options. You can plot a ROC curve for each class (a one-vs-rest setup), essentially binarizing the primary class against all the other classes. Alternatively, you can do what sklearn does and create a micro-average or macro-average ROC curve.
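For the one-vs-rest and averaged variants, a rough sketch along the lines of what sklearn's documentation describes (y_true and the (n_samples, n_classes) probability matrix y_score are assumptions about your data):

import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

# y_true: integer class labels, y_score: (n_samples, n_classes) predicted probabilities
classes = np.unique(y_true)
y_bin = label_binarize(y_true, classes=classes)

# one ROC curve per class (one-vs-rest)
for i, c in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
    print('class %s: AUC = %.3f' % (c, auc(fpr, tpr)))

# micro-average: every (sample, class) pair counted as one binary decision
fpr_micro, tpr_micro, _ = roc_curve(y_bin.ravel(), y_score.ravel())
print('micro-average AUC = %.3f' % auc(fpr_micro, tpr_micro))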
I am working on a binary segmentation problem using PyTorch. I want to know the correct way of calculating metrics like precision, recall, F1-score, mIoU, etc. for my test set. From the many code examples available online, I found different ways of calculating them, which I have listed below:
Method 1
Calculate the metrics for each image separately.
Sum each metric over all test images.
Divide the summed metrics by the number of test images.
Example:
for i, (x, y) in enumerate(zip(test_x, test_y)):
    ...
    Score += calc_metrics(mask1, pred1)  # Score could be any of the metrics
Final_Score = Score / len(test_x)
Method 2
Add up the TP, FP, TN, and FN pixel counts over all the images and prepare a confusion matrix.
Calculate all metrics at the end using the total TP, FP, TN, and FN pixel counts (the confusion matrix).
Example:
for i, (x, y) in enumerate(zip(test_x, test_y)):
    ...
    FP += float(np.sum((pred1 == 1) & (mask1 == 0)))
    FN += float(np.sum((pred1 == 0) & (mask1 == 1)))
    TP += float(np.sum((pred1 == 1) & (mask1 == 1)))
    TN += float(np.sum((pred1 == 0) & (mask1 == 0)))
Score = calc_metrics(TP, FP, TN, FN)
Method 3
Calculate batch-wise metrics and divide them by the number of test images at the end.
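A minimal sketch of what Method 3 might look like (test_loader, model, and calc_metrics are placeholders; calc_metrics is assumed to return a scalar per batch):

import torch

# Hypothetical sketch of Method 3: compute each metric per batch, then average
# over the number of batches (the wording above divides by the number of test
# images, which matches only when the batch size is 1).
score = 0.0
num_batches = 0
for x_batch, y_batch in test_loader:                      # test_loader is assumed
    pred_batch = (torch.sigmoid(model(x_batch)) > 0.5).float()
    score += calc_metrics(y_batch, pred_batch)            # calc_metrics as in the question
    num_batches += 1

final_score = score / num_batches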
Method 4
Unlike all the above methods, which use 0.5 as the threshold on the predicted mask (after applying a sigmoid), this method uses a range of thresholds, computes each metric at every threshold, and takes the mean of these values at the end.
Example:
for i, (x, y) in enumerate(zip(test_x, test_y)):
    ...
    for t in range(len(threshold)):
        thresh = threshold[t]
        thresh_metric[t] = calc_metrics(mask1, pred1, thresh)
    thresh_final[i, :] = thresh_metric
Score = np.sum(np.mean(thresh_final, axis=1)) / len(test_x)
I am confused about which way to use to report my model's results on the test set.
I'm using a CNN with an autoencoder to cluster different types of RNA. The clusters are calculated from the compressed representations of the different RNAs. Every RNA has a label corresponding to its type; in my case there are 7 different classes. After I get the clustering result I would like to visualize it and see which cluster each RNA ended up in, but right now the y_pred value does not correspond to the RNA class but to the cluster index that was initialized by k-means.
kmeans = KMeans(n_clusters=self.n_clusters, n_init=20)
self.y_pred = kmeans.fit_predict(self.encoder.predict(x))
y_pred_last = np.copy(self.y_pred)
self.model.get_layer(name='clustering').set_weights([kmeans.cluster_centers_])
print(kmeans.labels_)

self.y_pred = q.argmax(1)
if y is not None:
    acc = np.round(metrics.acc(y, self.y_pred), 5)
    nmi = np.round(metrics.nmi(y, self.y_pred), 5)
    ari = np.round(metrics.ari(y, self.y_pred), 5)
    loss = np.round(loss, 5)
    logdict = dict(iter=ite, acc=acc, nmi=nmi, ari=ari, L=loss[0], Lc=loss[1], Lr=loss[2])

optimizer = 'adam'
dcec.compile(loss=['kld', 'mse'], loss_weights=[args.gamma, 1], optimizer=optimizer)
dcec.fit(x, y=y, tol=args.tol, maxiter=args.maxiter,
         update_interval=args.update_interval,
         save_dir=args.save_dir,
         cae_weights=args.cae_weights)
y_pred = dcec.y_pred

result = list(itertools.chain(y))
with open('datapoints.csv', mode='w', newline='') as data_points:
    data_writer = csv.writer(data_points)
    data_writer.writerow(['id', 'ytrue', 'ypred'])
    truth = y
    prediction = dcec.y_pred
    for i in range(len(result)):
        data_writer.writerow([i, truth[i], prediction[i]])
My problem right now is this part: prediction = dcec.y_pred
The output shows me the correct true label but not the "correct" predicted label: it returns a value, but that value does not correspond to the RNA type.
I don't know if this is the right path. Mainly I just want to visualize the clusters and see which RNA types were rightly and wrongly classified.
You might not be using the correct function call to get the prediction from the Keras model. I believe you should be doing something like:
prediction = dcec.predict(x)
Additional details are here: https://keras.io/models/model/
I hope this helps.
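If dcec.predict(x) returns the soft cluster assignments q (as the q.argmax(1) line in your own code suggests), a minimal sketch would be:

# Assumption: dcec.predict(x) returns the soft cluster assignments q
# with shape (n_samples, n_clusters).
q = dcec.predict(x)
y_pred = q.argmax(axis=1)   # hard cluster index per sample, still not an RNA class

Keep in mind these are still cluster indices, not RNA classes; to compare them with the true labels you still have to map each cluster to its dominant class (for example via a confusion matrix or the Hungarian algorithm), which is typically what a clustering-accuracy function like metrics.acc does internally.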
I am trying to implement a bag-of-words classifier myself to classify a dataset I have. To be certain that my implementation is correct, I used just two classes from the Caltech dataset (http://www.vision.caltech.edu/Image_Datasets/Caltech101/) to test it: elephant and electric guitar. As they are totally different visually, I believe a correct implementation of Bag of Visual Words (BOVW) classification could classify these images accurately.
From my understanding (please correct me if I am wrong), the correct BOVW classification happens in three steps:
Detect SIFT 128-dimensional descriptors in the training images and cluster them with k-means.
Assign the SIFT descriptors of the training and testing images to clusters using the k-means model (trained in step 1) and build a histogram of the cluster assignments.
Use these histograms as feature vectors for SVM classification.
As I explained before, I tried to solve a very easy problem of classifying two very distinct classes. I read the training and testing image lists from text files, use the SIFT descriptors of the training images to train a k-means model, use the training and testing images to get the histograms of cluster assignments, and finally use those as feature vectors for classification.
The source code of my solution follows:
import cv2
import numpy as np
from sklearn import svm
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import accuracy_score

#this function will get SIFT descriptors from training images and
#train a k-means classifier
def read_and_clusterize(file_images, num_cluster):
    sift_keypoints = []
    with open(file_images) as f:
        images_names = f.readlines()
        images_names = [a.strip() for a in images_names]
    for line in images_names:
        print(line)
        #read image
        image = cv2.imread(line, 1)
        #convert it to grayscale
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        #SIFT extraction
        sift = cv2.xfeatures2d.SIFT_create()
        kp, descriptors = sift.detectAndCompute(image, None)
        #append the descriptors to a list of descriptors
        sift_keypoints.append(descriptors)
    sift_keypoints = np.asarray(sift_keypoints)
    sift_keypoints = np.concatenate(sift_keypoints, axis=0)
    #with the descriptors detected, let's cluster them
    print("Training kmeans")
    kmeans = MiniBatchKMeans(n_clusters=num_cluster, random_state=0).fit(sift_keypoints)
    #return the learned model
    return kmeans
#with the k-means model found, this code generates the feature vectors
#by building a histogram of classified keypoints in the kmeans classifier
def calculate_centroids_histogram(file_images, model):
    feature_vectors = []
    class_vectors = []
    with open(file_images) as f:
        images_names = f.readlines()
        images_names = [a.strip() for a in images_names]
    for line in images_names:
        print(line)
        #read image
        image = cv2.imread(line, 1)
        #convert it to grayscale
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        #SIFT extraction
        sift = cv2.xfeatures2d.SIFT_create()
        kp, descriptors = sift.detectAndCompute(image, None)
        #classification of all descriptors in the model
        predict_kmeans = model.predict(descriptors)
        #calculates the histogram
        hist, bin_edges = np.histogram(predict_kmeans)
        #histogram is the feature vector
        feature_vectors.append(hist)
        #define the class of the image (elephant or electric guitar)
        class_sample = define_class(line)
        class_vectors.append(class_sample)
    feature_vectors = np.asarray(feature_vectors)
    class_vectors = np.asarray(class_vectors)
    #return vectors and classes we want to classify
    return class_vectors, feature_vectors
def define_class(img_patchname):
    #print(img_patchname)
    print(img_patchname.split('/')[4])
    if img_patchname.split('/')[4] == "electric_guitar":
        class_image = 0
    if img_patchname.split('/')[4] == "elephant":
        class_image = 1
    return class_image
def main(train_images_list, test_images_list, num_clusters):
    #step 1: read and detect SIFT keypoints over the input images (train images) and cluster them via k-means
    print("Step 1: Calculating Kmeans classifier")
    model = read_and_clusterize(train_images_list, num_clusters)

    print("Step 2: Extracting histograms of training and testing images")
    print("Training")
    [train_class, train_featvec] = calculate_centroids_histogram(train_images_list, model)
    print("Testing")
    [test_class, test_featvec] = calculate_centroids_histogram(test_images_list, model)

    #use the training vectors to train the classifier
    print("Step 3: Training the SVM classifier")
    clf = svm.SVC()
    clf.fit(train_featvec, train_class)

    print("Step 4: Testing the SVM classifier")
    predict = clf.predict(test_featvec)

    score = accuracy_score(np.asarray(test_class), predict)

    file_object = open("results.txt", "a")
    file_object.write("%f\n" % score)
    file_object.close()

    print("Accuracy:" + str(score))

if __name__ == "__main__":
    main("train.txt", "test.txt", 1000)
    main("train.txt", "test.txt", 2000)
    main("train.txt", "test.txt", 3000)
    main("train.txt", "test.txt", 4000)
    main("train.txt", "test.txt", 5000)
As you can see, I tried varying the number of clusters in the k-means model quite a lot. However, no matter what I try, the accuracy is always 53.62%, which is terrible considering that the image classes are quite different.
So, is there any problem with my understanding or implementation of BOVW? What have I gotten wrong here?
The solution is simpler than I thought.
In this line:
hist, bin_edges=np.histogram(predict_kmeans)
The number of bins was numpy's default number of bins (I believe it is 10). By doing this instead:
hist, bin_edges=np.histogram(predict_kmeans, bins=num_clusters)
The accuracy increased from the 53.62% I reported to 78.26% using 1000 clusters and, therefore, 1000-dimensional feature vectors.
It looks like you are creating clusters and histograms for each image. But in order to make it work you have to aggregate the SIFT features of all images, cluster those, and then use these common clusters to create the histograms. Also check out https://github.com/shackenberg/Minimal-Bag-of-Visual-Words-Image-Classifier
I'm trying to plot the ROC curve from a modified version of the CIFAR-10 example provided by tensorflow. It's now for 2 classes instead of 10.
The output of the network are called logits and take the form:
[[-2.57313061  2.57966399]
 [ 0.04221377 -0.04033273]
 [-1.42880082  1.43337202]
 [-2.7692945   2.78173304]
 [-2.48195744  2.49331546]
 [ 2.0941515  -2.10268974]
 [-3.51670194  3.53267646]
 [-2.74760485  2.75617766]
 ...]
First of all, what do these logits actually represent? The final layer in the network is a "softmax linear" of form WX+b.
The model is able to calculate accuracy by calling
top_k_op = tf.nn.in_top_k(logits, labels, 1)
Then once the graph has been initialized:
predictions = sess.run([top_k_op])
predictions_int = np.array(predictions).astype(int)
true_count += np.sum(predictions)
...
precision = true_count / total_sample_count
This works fine.
But now how can I plot a ROC curve from this?
I've been trying the "sklearn.metrics.roc_curve()" function (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve) but I don't know what to use as my "y_score" parameter.
Any help would be appreciated!
y_score here should be an array holding the probability that each sample belongs to the positive class (assuming positive was labeled as 1 in your y_true array).
Actually, if your network uses softmax as the last layer, then the model should output the probability of each category for the instance. But the data you've given here doesn't conform to that format. I checked the example code: https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/models/image/cifar10/cifar10.py
It seems to use a layer called softmax_linear. I know little about this example, but I guess you should process the output with something like a logistic/softmax function to turn it into probabilities.
Then just feed it along with your true label 'y_true' to the scikit-learn function:
y_score = np.array(output)[:,1]
roc_curve(y_true, y_score)
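A minimal sketch of that idea (the names logits and labels are assumptions about your evaluation code):

import numpy as np
from sklearn.metrics import roc_curve, auc

# logits: (n_samples, 2) raw outputs of the softmax_linear layer
# labels: (n_samples,) true labels in {0, 1}
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable softmax
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

y_score = probs[:, 1]                       # probability of the positive class
fpr, tpr, thresholds = roc_curve(labels, y_score)
print('AUC =', auc(fpr, tpr))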
import tensorflow as tf

tp = []  # the true positive rate list
fp = []  # the false positive rate list
total = len(fp)
writer = tf.train.SummaryWriter("/tmp/tensorboard_roc")
for idx in range(total):
    summt = tf.Summary()
    summt.value.add(tag="roc", simple_value=tp[idx])
    writer.add_summary(summt, tp[idx] * 100)  # act as global_step
    writer.flush()
then start a tensorboard:
tensorboard --logdir=/tmp/tensorboard_roc
(screenshot: the ROC curve rendered in TensorBoard)
For details and code, you can visit my blog: http://blog.csdn.net/mao_feng/article/details/54731098
I'm just wondering if this is a legitimate way of calculating classification accuracy:
obtain the precision-recall thresholds
for each threshold, binarize the continuous y_scores
calculate the accuracy from the contingency table (confusion matrix)
return the average accuracy over the thresholds
import numpy as np
from sklearn.metrics import precision_recall_curve, confusion_matrix
from sklearn.preprocessing import binarize

precision, recall, thresholds = precision_recall_curve(np.array(np_y_true), np.array(np_y_scores))

accuracy = 0
for threshold in thresholds:
    contingency_table = confusion_matrix(np_y_true, binarize(np_y_scores, threshold=threshold)[0])
    accuracy += (float(contingency_table[0][0]) + float(contingency_table[1][1])) / float(np.sum(contingency_table))

print("Classification accuracy is: {}".format(accuracy / len(thresholds)))
You are heading in the right direction.
The confusion matrix is definitely the right start for computing the accuracy of your classifier. It seems to me that you are aiming at the receiver operating characteristic (ROC).
In statistics, a receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied.
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
The AUC (area under the curve) is a measurement of your classifier's performance. More information and explanation can be found here:
https://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it
http://mlwiki.org/index.php/ROC_Analysis
This is my implementation, which you are welcome to improve/comment:
import numpy as np
import matplotlib.pyplot as plt

def auc(y_true, y_val, plot=False):
    #check input
    if len(y_true) != len(y_val):
        raise ValueError('Label vector (y_true) and corresponding value vector (y_val) must have the same length.\n')
    #empty arrays, true positive and false positive numbers
    tp = []
    fp = []
    #count 1's and -1's in y_true
    cond_positive = list(y_true).count(1)
    cond_negative = list(y_true).count(-1)
    #all possibly relevant bias parameters stored in a list
    bias_set = sorted(list(set(y_val)), key=float, reverse=True)
    bias_set.append(min(bias_set) * 0.9)
    #initialize y_pred array full of negative predictions (-1)
    y_pred = np.ones(len(y_true)) * (-1)
    #the computation time is mainly influenced by this for loop
    #for a contamination rate of 1% it already takes ~8s to terminate
    for bias in bias_set:
        #"lower values tend to correspond to label -1"
        #indices of values which exceed the bias
        posIdx = np.where(y_val > bias)
        #set predicted values to 1
        y_pred[posIdx] = 1
        #the following line simply calculates results which enable a distinction
        #between the cases of true positive and false positive
        results = np.asarray(y_true) + 2 * np.asarray(y_pred)
        #append the number of tp's and fp's
        tp.append(float(list(results).count(3)))
        fp.append(float(list(results).count(1)))
    #calculate true positive / false positive rates
    tpr = np.asarray(tp) / cond_positive
    fpr = np.asarray(fp) / cond_negative
    #optional scatterplot
    if plot == True:
        plt.scatter(fpr, tpr)
        plt.show()
    #calculate AUC
    AUC = np.trapz(tpr, fpr)
    return AUC
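A small usage example with toy data (assuming, as the function does, that labels are in {1, -1} and higher scores mean "more positive"):

import numpy as np

# toy scores from some classifier; labels are in {1, -1} as the function expects
y_true = np.array([1, 1, -1, 1, -1, -1])
y_val = np.array([0.9, 0.45, 0.55, 0.6, 0.4, 0.2])

print(auc(y_true, y_val))   # ~0.89 for this toy example; pass plot=True to see the scatter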