Bag Of Visual Words Implementation in Python is giving terrible accuracy - python

I am trying to implement a bag-of-words classifier myself to classify a dataset I have. To be certain that my implementation is correct, I used just two classes from the Caltech 101 dataset (http://www.vision.caltech.edu/Image_Datasets/Caltech101/) to test it: elephant and electric guitar. As they are totally different visually, I believe a correct implementation of Bag of Visual Words (BOVW) classification should classify these images accurately.
From my understanding (please correct me if I am wrong), correct BOVW classification happens in three steps:

1. Detect SIFT 128-dimensional descriptors in the training images and cluster them with k-means.
2. Put the training and testing images' SIFT descriptors through the k-means classifier (trained in step 1) and build a histogram of the classification results.
3. Use these histograms as feature vectors for SVM classification.

As I explained before, I tried to solve a very easy problem: classifying two very distinct classes. I read the training and testing file lists from a text file, use the training images' SIFT descriptors to train a k-means classifier, use the training and testing images to get the histograms of classifications, and finally use those as feature vectors for classification.
The source code of my solution follows:
import cv2
import numpy as np
from sklearn import svm
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import accuracy_score

#this function will get SIFT descriptors from training images and
#train a k-means classifier
def read_and_clusterize(file_images, num_cluster):
    sift_keypoints = []

    with open(file_images) as f:
        images_names = f.readlines()
        images_names = [a.strip() for a in images_names]

        for line in images_names:
            print(line)
            #read image
            image = cv2.imread(line, 1)
            # Convert them to grayscale
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            # SIFT extraction
            sift = cv2.xfeatures2d.SIFT_create()
            kp, descriptors = sift.detectAndCompute(image, None)
            #append the descriptors to a list of descriptors
            sift_keypoints.append(descriptors)

    sift_keypoints = np.asarray(sift_keypoints)
    sift_keypoints = np.concatenate(sift_keypoints, axis=0)
    #with the descriptors detected, let's cluster them
    print("Training kmeans")
    kmeans = MiniBatchKMeans(n_clusters=num_cluster, random_state=0).fit(sift_keypoints)
    #return the learned model
    return kmeans
#with the k-means model found, this code generates the feature vectors
#by building a histogram of classified keypoints in the kmeans classifier
def calculate_centroids_histogram(file_images, model):
    feature_vectors = []
    class_vectors = []

    with open(file_images) as f:
        images_names = f.readlines()
        images_names = [a.strip() for a in images_names]

        for line in images_names:
            print(line)
            #read image
            image = cv2.imread(line, 1)
            #Convert them to grayscale
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            #SIFT extraction
            sift = cv2.xfeatures2d.SIFT_create()
            kp, descriptors = sift.detectAndCompute(image, None)
            #classification of all descriptors in the model
            predict_kmeans = model.predict(descriptors)
            #calculates the histogram
            hist, bin_edges = np.histogram(predict_kmeans)
            #histogram is the feature vector
            feature_vectors.append(hist)
            #define the class of the image (elephant or electric guitar)
            class_sample = define_class(line)
            class_vectors.append(class_sample)

    feature_vectors = np.asarray(feature_vectors)
    class_vectors = np.asarray(class_vectors)
    #return vectors and classes we want to classify
    return class_vectors, feature_vectors
def define_class(img_patchname):
    #print(img_patchname)
    print(img_patchname.split('/')[4])

    if img_patchname.split('/')[4] == "electric_guitar":
        class_image = 0
    if img_patchname.split('/')[4] == "elephant":
        class_image = 1

    return class_image
def main(train_images_list, test_images_list, num_clusters):
    #step 1: read and detect SIFT keypoints over the input images (train images) and cluster them via k-means
    print("Step 1: Calculating Kmeans classifier")
    model = read_and_clusterize(train_images_list, num_clusters)

    print("Step 2: Extracting histograms of training and testing images")
    print("Training")
    [train_class, train_featvec] = calculate_centroids_histogram(train_images_list, model)
    print("Testing")
    [test_class, test_featvec] = calculate_centroids_histogram(test_images_list, model)

    #use the training vectors to train the classifier
    print("Step 3: Training the SVM classifier")
    clf = svm.SVC()
    clf.fit(train_featvec, train_class)

    print("Step 4: Testing the SVM classifier")
    predict = clf.predict(test_featvec)

    score = accuracy_score(np.asarray(test_class), predict)

    file_object = open("results.txt", "a")
    file_object.write("%f\n" % score)
    file_object.close()

    print("Accuracy:" + str(score))

if __name__ == "__main__":
    main("train.txt", "test.txt", 1000)
    main("train.txt", "test.txt", 2000)
    main("train.txt", "test.txt", 3000)
    main("train.txt", "test.txt", 4000)
    main("train.txt", "test.txt", 5000)
As you can see, I tried varying the number of clusters in the k-means classifier quite a lot. However, no matter what I try, the accuracy is always 53.62%, which is terrible considering that the image classes are quite different.
So, is there a problem with my understanding or implementation of BOVW? What have I gotten wrong here?

The solution is simpler than I thought.
In this line:
hist, bin_edges = np.histogram(predict_kmeans)
the number of bins is NumPy's default (I believe it is 10). By doing this:
hist, bin_edges = np.histogram(predict_kmeans, bins=num_clusters)
the accuracy increased from the 53.62% I reported to 78.26%, using 1000 clusters and, therefore, 1000-dimensional feature vectors.
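For reference, fixing the histogram range (or using np.bincount) keeps bin i aligned with cluster i even when some clusters never appear in a given image; a minimal sketch, reusing the predict_kmeans and num_clusters names from the code above:

import numpy as np

# fixing the range keeps bin i aligned with cluster id i across all images
hist, _ = np.histogram(predict_kmeans, bins=num_clusters, range=(0, num_clusters))
# an equivalent alternative for integer cluster ids
hist = np.bincount(predict_kmeans, minlength=num_clusters)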

It looks like you are creating clusters and histograms for each image separately. But to make this work, you have to aggregate the SIFT features of all images, cluster those, and then use the common clusters to build each image's histogram. Also check out https://github.com/shackenberg/Minimal-Bag-of-Visual-Words-Image-Classifier for a minimal reference implementation.
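For illustration, a minimal sketch of that aggregation step, where per_image_descs is a hypothetical list holding one (n_i, 128) SIFT descriptor array per training image:

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# stack the per-image descriptor arrays so k-means sees one global pool
all_descs = np.vstack(per_image_descs)  # per_image_descs is hypothetical
kmeans = MiniBatchKMeans(n_clusters=1000, random_state=0).fit(all_descs)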

Related

Feature extraction with VGG16 for clustering

I am asking a lot, but I am very stuck on this one...
I have this piece of code I used to extract features with SIFT, and I am trying to adapt it to extract features based on a VGG16 model.
No matter how hard I try, I can't get past it and always raise errors.
So if anyone can help me get the features in a way I can use for clustering afterwards.
Here is the code with SIFT:

# identification of key points and associated descriptors
import time, cv2
import numpy as np

sift_keypoints = []
temps1 = time.time()
sift = cv2.xfeatures2d.SIFT_create(500)

for image_num in range(len(list_photos)):
    if image_num % 100 == 0: print(image_num)
    image = cv2.imread(path + list_photos[image_num], 0)  # read in grayscale
    image = cv2.GaussianBlur(image, (7, 7), cv2.BORDER_DEFAULT)  # apply Gaussian blur filter
    # image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    res = cv2.equalizeHist(image)  # equalize image histogram
    kp, des = sift.detectAndCompute(res, None)
    sift_keypoints.append(des)

sift_keypoints_by_img = np.asarray(sift_keypoints)
sift_keypoints_all = np.concatenate(sift_keypoints_by_img, axis=0)
And here is how I use it for my clustering:

from sklearn import cluster, metrics

# Determine the number of clusters
k = int(round(np.sqrt(len(sift_keypoints_all)), 0))
print("Estimated number of clusters: ", k)
print("Creating", k, "descriptor clusters ...")

# Clustering
kmeans = cluster.MiniBatchKMeans(n_clusters=k, init_size=3*k, random_state=0)
kmeans.fit(sift_keypoints_all)
What should I do to be able to extract features with a VGG model?
Thanks
There is an example of feature extraction with VGG16 in the official Keras documentation [1].
Note that the layers of a convolutional network are successive representations, of varying dimensions, of your picture. Depending on the layer you choose as the output, the clustering results may be very different.
[1] https://keras.io/api/applications/
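As a starting point, here is a minimal sketch along the lines of the Keras applications example, assuming tensorflow.keras and a local file 'photo.jpg'; with pooling='avg', each image becomes a single 512-dimensional vector that can go straight into MiniBatchKMeans:

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# include_top=False drops the classifier head; pooling='avg' averages the
# last convolutional block into one 512-dim vector per image
model = VGG16(weights='imagenet', include_top=False, pooling='avg')

img = image.load_img('photo.jpg', target_size=(224, 224))  # hypothetical file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = model.predict(x)  # shape (1, 512)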

CV2 Contour from VGG16 model

I need to know how to draw a cv2 rectangle contour based on the model I have.
My project is about object detection using the Random Forest algorithm.
I used the VGG16 model to extract features and then classify the result with a Random Forest:
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from sklearn.ensemble import RandomForestClassifier

feature_extractor = VGG16()
feature_extractor.layers.pop()
feature_extractor = Model(inputs=feature_extractor.inputs, outputs=feature_extractor.layers[-1].output)
X_for_RF = feature_extractor.predict(x_train, verbose=1)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_for_RF, y_train)
Example: if I have apple images and I trained a model on them, I now need it to draw a rectangle around the apple when I pass in a new image.
I need to do detection (a contour), not classification.

How can I plot the ROC curve from this data?

I have trained a Convolutional Neural Network (CNN) using Keras, and I am doing the following to find the accuracy on a test data set:
correct_classification = 0
number_of_test_images = 0

for root, dirs, files in os.walk(test_directory):
    for file in files:
        img = cv2.imread(root + '/' + file)
        img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_AREA)
        img = np.expand_dims(img, axis=0)
        img = img / 255.0

        if os.path.basename(root) == 'nevus':
            label = 1
        elif os.path.basename(root) == 'melanoma':
            label = 0

        img_class = model.predict_classes(img)
        prediction = img_class[0]

        if prediction == label:
            correct_classification = correct_classification + 1

        print('This is the prediction: ')
        print(prediction)

        number_of_test_images = number_of_test_images + 1

print('correct results:')
print(correct_classification)
print('number of test images')
print(number_of_test_images)
print('Accuracy:')
print(correct_classification / number_of_test_images * 100)
Is there a way I can find the ROC curve from testing the model on the test data set?
Thanks.
The ROC curve is simply a curve that plots the true positive rate (TPR) against the false positive rate (FPR) at different probability thresholds. So, if this is a binary classification problem, you can simply vary the probability threshold on the predictions for your test dataset and get the TPR/FPR pairs. Essentially, create a table with three columns, [probability threshold, TPR, FPR], and plot it. You'll need model.predict_proba(...) to get the probabilities by class, and then use those to create the ROC curve.
For multi-class it gets a little tricky. You have a few options, though. You can plot a ROC curve for each class (a one-vs-rest setup), essentially binarizing the primary class against all the other classes. Alternatively, you could do what sklearn does and create a micro-average or macro-average ROC curve.
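For the binary case above, sklearn can build the curve directly from the scores; a minimal sketch, assuming y_true and y_score are lists you fill in the test loop (the label and the positive-class probability for each test image):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# y_true: 0/1 labels; y_score: predicted probability of the positive class
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label='ROC curve (AUC = %.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()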

Why does my keras model get good accuracy but bad predictions?

So, I am trying to make a model which can predict doodles. I am using Google's Quick, Draw! data: https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/numpy_bitmap, where the images are rendered into 28x28 grayscale bitmap numpy arrays. I only chose 10 classes and took 60,000 images to train/evaluate. I get a test accuracy of 91%. When I make predictions with data from the test set, it works. But when I make a drawing in Paint and convert it to 28x28, it doesn't make good predictions. What sort of data do I need to have? What kind of preprocessing does the image need?
This is how I preprocessed the data from Google's npy files:
import numpy as np

def load_set(name, path, resultx, resulty, label):
    loaded_set = np.load(path + name + ".npy")
    loaded_set = loaded_set.reshape(loaded_set.shape[0], 1, 28, 28)
    # print(name, loaded_set.shape)
    loaded_set = loaded_set[0:6000, 0:6000, 0:6000, 0:6000]  # keep the first 6000 samples
    resultx = np.append(resultx, loaded_set, axis=0)
    resulty = createLabelArray(label, loaded_set.shape[0], resulty)
    print("loaded " + name)
    return resultx, resulty

def createLabelArray(label, size, result):
    for i in range(0, size):
        result = np.append(result, [[label]], axis=0)
    return result
where label is the label I want for that category.
I shuffle them afterwards and so on.
And this is how I am trying to process new images (drawings by me):
print("[INFO] loading and preprocessing image...")
image = image_utils.load_img(os.path.join(path, name), grayscale=True,target_size=(28, 28))
image = image_utils.img_to_array(image)
print(image.shape)
image = np.expand_dims(image, axis=0)
print(image.shape)
image = image.astype('float32')
image /= 255
return image
Please help, I've been stuck on this for a while now. Thank you
This seems to be a typical case of overfitting.
Try 10-fold cross-validation to get a better estimate of the model's accuracy.
Further, use regularization and dropout in Keras to prevent overfitting.
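For illustration, a minimal sketch of what dropout and L2 regularization look like in a small Keras CNN; the architecture here is an assumption, not the asker's model:

from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1),
                  kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),  # randomly zero 25% of activations during training
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),  # 10 doodle classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])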

Scikit-learn SVM digit recognition

I want to make a program to recognize the digit in an image. I followed the tutorial in scikit-learn.
I can train and fit the SVM classifier as follows.
First, I import the libraries and dataset
from sklearn import datasets, svm, metrics
digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
Second, I create the SVM model and train it with the dataset.
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:n_samples], digits.target[:n_samples])
And then, I try to read my own image and use the function predict() to recognize the digit.
Here is my image:
I reshape the image into (8, 8) and then convert it to a 1D array.
from scipy import misc

img = misc.imread("w1.jpg")
img = misc.imresize(img, (8, 8))
img = img[:, :, 0]
Finally, when I print out the prediction, it returns [1]
predicted = classifier.predict(img.reshape((1, img.shape[0]*img.shape[1])))
print(predicted)
Whatever other images I use, it still returns [1].
When I print out the "default" dataset image of the number "9", it looks like:
My image of the number "9":
You can see that the non-zero values are quite large in my image.
I don't know why. I am looking for help to solve my problem. Thanks.
My best bet would be that there is a problem with your data types and array shapes.
It looks like you are training on numpy arrays that are of the type np.float64 (or possibly np.float32 on 32 bit systems, I don't remember) and where each image has the shape (64,).
Meanwhile your input image for prediction, after the resizing operation in your code, is of type uint8 and shape (1, 64).
I would first try changing the shape of your input image since dtype conversions often just work as you would expect. So change this line:
predicted = classifier.predict(img.reshape((1,img.shape[0]*img.shape[1] )))
to this:
predicted = classifier.predict(img.reshape(img.shape[0]*img.shape[1]))
If that doesn't fix it, you can always try recasting the data type as well with
img = img.astype(digits.images.dtype).
I hope that helps. Debugging by proxy is a lot harder than actually sitting in front of your computer :)
Edit: According to the SciPy documentation, the training data contains integer values from 0 to 16. The values in your input image should be scaled to fit the same interval. (http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
1) You need to create your own training set, based on data similar to what you will be making predictions on. The call to datasets.load_digits() in scikit-learn loads a preprocessed version of the UCI hand-written digits dataset, which, for all we know, could contain very different images from the ones you are trying to recognise.
2) You need to set the parameters of your classifier properly. The call to svm.SVC(gamma=0.001) just picks an arbitrary value for the gamma parameter of SVC, which may not be the best option. In addition, you are not configuring the C parameter, which is pretty important for SVMs. I'd bet that this is one of the reasons why your output is always 1.
3) Whatever final settings you choose for your model, you'll need to use a cross-validation scheme to ensure that the algorithm is learning effectively.
There's a lot of machine learning theory behind this, but, as a good start, I would really recommend having a look at SVM - scikit-learn for a more in-depth description of how the SVC implementation in scikit-learn works, and at GridSearchCV for a simple parameter-tuning technique.
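For example, a minimal GridSearchCV sketch over C and gamma on the digits data; the grid values here are arbitrary starting points:

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(X, digits.target, random_state=0)

# search over C and gamma with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1e-4, 1e-3, 1e-2]}
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))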
It's just a guess, but... the training set from scikit-learn has black numbers on a white background, and you are trying to predict numbers which are white on a black background...
I think you should either train on your own training set, or train on the negative version of your pictures.
I hope this helps!
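If polarity really is the problem, inverting an 8-bit grayscale image is a one-liner (a sketch, assuming img is a uint8 array):

# swap the polarity of an 8-bit grayscale image
img_negative = 255 - img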
If you look at http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits, you can see that each point in the matrix has a value between 0 and 16.
You can try to transform the values of your image to the 0-16 range. I did it, and now the prediction works well for the digit 9, but not for 8 and 6. It doesn't give 1 any more.
from sklearn import datasets, svm, metrics
import cv2
import numpy as np

# Load digit database
digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Train SVM classifier
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:n_samples], digits.target[:n_samples])

# Read image "9"
img = cv2.imread("w1.jpg")
img = img[:, :, 0]
img = cv2.resize(img, (8, 8))

# Normalize the values in the image to 0-16
min_value = np.min(img)
max_value = np.max(img)
normalized_img = np.floor(np.divide((img - min_value).astype(float), (max_value - min_value).astype(float)) * 16)

# Predict
predicted = classifier.predict(normalized_img.reshape((1, normalized_img.shape[0] * normalized_img.shape[1])))
print(predicted)
I solved this problem using the methods below:

1. Check the number of attributes: is it too large or too small?
2. Check the scale of your gray values; I changed mine to [0, 16].
3. Check the data type; I changed mine to uint8.
4. Check the amount of training data: is it too small?

I hope it helps. ^.^
In addition to @carrdelling's response, I will add that you may use the same training set if you normalize your images to have the same range of values.
For example, you could binarize your data (1 if > 0, 0 otherwise), or divide by the maximum intensity in your image to map it to the interval [0, 1].
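For illustration, both normalizations in NumPy (assuming img is a grayscale array):

import numpy as np

# binarize: 1 where the pixel is non-zero, 0 elsewhere
binary = (img > 0).astype(np.float64)
# or rescale by the maximum intensity to the interval [0, 1]
scaled = img / np.max(img)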
You probably want to extract features relevant to your data set from the images and train your model on them.
One example, copied from here:

import cv2

surf = cv2.SURF(400)
kp, des = surf.detectAndCompute(img, None)

But SURF features may not be the most useful or relevant to your dataset and training task. You should try others too, like HOG.
Remember: the more high-level the features you extract, the more general and error-tolerant your model will be on unseen images. However, you may be sacrificing accuracy on your known samples and test cases.
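For comparison, a minimal HOG sketch with scikit-image (assuming a local file 'digit.png'; the cell and block sizes are arbitrary starting points and need an image of at least 16x16 pixels):

from skimage import io, color
from skimage.feature import hog

# hog() returns a 1-D feature vector that can replace raw pixels as SVM input
img = color.rgb2gray(io.imread('digit.png'))  # hypothetical file
features = hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))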
