How to calculate similarity between an image and a text?

How to calculate similarity between an image and a text? - python

I have several images and I want to know if there is any aircraft in the images or not.
I used the clip shown below but the output is [[1.0]], while the image is the face of humans. I think it is because it uses softmax.
I tried to use logits_per_image but the value is not understandable to me tensor([[20.03]]).
Is there any way to know if an image is related to a word in percent or so?
Can I use object detection in my problem to see if there are any aircraft in my image?
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.open('image_4.jpg')
inputs = processor(text=['aircraft'], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
probs.tolist()

Related

Features extraction with VGG16 for clustering

I am asking a lot but I am very stuck on this one...
I have this part of code I used to extract features with SIFT, and I am trying to adapdt it to extract features based on a VGG16 model.
No matter how hard I try, I can't get passed through and always rise errors.
So if anyone can help to get the features in a way to use it for a clustering afterwards.
Here is the code with SIFT :
# identification of key points and associated descriptors
import time, cv2
sift_keypoints = []
temps1=time.time()
sift = cv2.xfeatures2d.SIFT_create(500)
for image_num in range(len(list_photos)) :
if image_num%100 == 0 : print(image_num)
image = cv2.imread(path+list_photos[image_num],0) # convert in gray
image = cv2.GaussianBlur(image,(7,7),cv2.BORDER_DEFAULT) #apply gaussianblur filter
# image = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
res = cv2.equalizeHist(image) # equalize image histogram
kp, des = sift.detectAndCompute(res, None)
sift_keypoints.append(des)
sift_keypoints_by_img = np.asarray(sift_keypoints)
sift_keypoints_all = np.concatenate(sift_keypoints_by_img, axis=0)
And here is how I use it for my clustering :
from sklearn import cluster, metrics
# Determination number of clusters
k = int(round(np.sqrt(len(sift_keypoints_all)),0))
print("Nombre de clusters estimés : ", k)
print("Création de",k, "clusters de descripteurs ...")
# Clustering
kmeans = cluster.MiniBatchKMeans(n_clusters=k, init_size=3*k, random_state=0)
kmeans.fit(sift_keypoints_all)
What should I do to be able to extract features with a VGG model?
Thanks

There is an example regarding feature extraction with VGG16 in the official Keras documentation [1].
Note the layers of a convolutional network are successive representations of varying dimensions of your picture. Depending the layer you choose as output, the results from clustering may be very different.
[1] https://keras.io/api/applications/

CV2 Contour from VGG16 model

I need to know how to make a cv2 rectangle contour from the model I had?
My project is about Object Detection using Random Forests Algorithm
I Used the VGG16 model to extract features and put extract the result by Random forests
feature_extractor=VGG16()
feature_extractor.layers.pop()
feature_extractor=Model(inputs=feature_extractor.inputs,outputs=feature_extractor.layers[-1].output)
X_for_RF = feature_extractor.predict(x_train,verbose=1)
model = RandomForestClassifier(n_estimators = 100, random_state = 42)
model.fit(X_for_RF,y_train)
Example: if i have apple images and i made training model about it .
now i need when i enter image it make rectangle on the apple
i need to make detection contour not classfication

Class Imbalance for image classification

I am working on multi-label image classification where some labels have very few images. How to handle these cases?

Data augmentation, which means making 'clones' (reverse image/ set different angle/ etc.)

Do Image Augmentation for your data-set. Image augmentation means add variation (noise, resize etc) to your training image in a way that your object you are classifying can be seen through naked eye.
Some code for Image augmentation are.
adding Noise
gaussian_noise=iaa.AdditiveGaussianNoise(10,20)
noise_image=gaussian_noise.augment_image(image)
ia.imshow(noise_image)
Cropping
crop = iaa.Crop(percent=(0, 0.3)) # crop image
corp_image=crop.augment_image(image)
ia.imshow(corp_image)
Sheering
shear = iaa.Affine(shear=(0,40))
shear_image=shear.augment_image(image)
ia.imshow(shear_image)
Flipping
#flipping image horizontally
flip_hr=iaa.Fliplr(p=1.0)
flip_hr_image= flip_hr.augment_image(image)
ia.imshow(flip_hr_image)
Now you just need to put that into your data generator and your problem for class imbalance will be solved

While you can augment your data as suggested in the answers, you can use different weights to balance your multi-label loss. If n_c is the number of samples in class c then you can weight your loss value for class c:
l_c' = (1/n_c) * l_c

How to predict a single input(external) image for categorical data using CNN using Keras

I am making a project on handwritten digit recognition using MNIST database and I have trained it for 60,0000 images in the data set and tested it for the 10,000 test images and got results about 99% accurate.
Now I want to input an external image to see whether my handwritten digit is recognized by the CNN or not. So I scanned my own handwritten image, converted it into gray scale and numpy array and feed it into the CNN, but I am always getting the output predicted result as 8 as a one hot encoded vector of numpy array.
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('six.jpg', target_size = (28,28))
test_image = image.img_to_array(test_image, data_format = None)
from numpy import *
test_image= delete(test_image, np.s_[::2], 2)
test_image = np.expand_dims(test_image, axis = 0)
predicted_dig = digit_recogniser.predict(test_image,batch_size= 32)
predicted_digits = np.argmax(np.round(predicted_digits),axis=0)
Can you please help me in figuring out what is the problem with the code and how can I successfully predict the digits individually scanned by/ external inputs? My CNN is fully trained using the MNIST data set. This is a kind of single prediction I want to make with some accuracy on taking random handwritten images of my choice.

Do you match the training data preprocessing during testing?

Why does my keras model get good accuracy but bad predictions?

So, I am trying to make a model which can predict doodles. I am using google's quick draw data :https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/numpy_bitmap which are images rendered into 28x28 greyscale bitmap numpy array. I only chose 10 classes and took 60,000 photos to train/evaluate. I get a test accuracy of 91% . When I try to make predictions with data from test data, it works. But when i make a drawing in paint and convert it into 28x28, it doesn't make good predictions. What sort of data do I need to have? What kind of preprocessing does the image need?
This is how i preprocessed the data from google's npy file
def load_set(name,path,resultx,resulty,label):
loaded_set = np.load(path+name+".npy")
loaded_set = loaded_set.reshape(loaded_set.shape[0],1,28,28)
# print(name,loaded_set.shape)
loaded_set = loaded_set[0:6000,0:6000,0:6000,0:6000]
resultx = np.append(resultx,loaded_set,axis=0)
resulty = createLabelArray(label,loaded_set.shape[0],resulty)
print("loaded "+name)
return resultx,resulty
def createLabelArray(label,size,result):
for i in range(0,size):
result = np.append(result,[[label]],axis=0)
return result
where label is the label i want for that category.
I shuffle them afterwards and everything.
And this is how I am trying to process new images(drawings by me):
print("[INFO] loading and preprocessing image...")
image = image_utils.load_img(os.path.join(path, name), grayscale=True,target_size=(28, 28))
image = image_utils.img_to_array(image)
print(image.shape)
image = np.expand_dims(image, axis=0)
print(image.shape)
image = image.astype('float32')
image /= 255
return image
Please help, I've been stuck on this for a while now. Thank you

Seems to be a typical case of overfitting.
Please try 10-fold cross-validation to get accuracy of model.
Further use regularization and dropout in keras to prevent overfitting.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to calculate similarity between an image and a text? - python

Related

Features extraction with VGG16 for clustering

CV2 Contour from VGG16 model

Class Imbalance for image classification

How to predict a single input(external) image for categorical data using CNN using Keras

Why does my keras model get good accuracy but bad predictions?

Categories

Resources