Negative confidences in TFlite inference

Negative confidences in TFlite inference - python

I trained my own tflite classification model having 3 classes following this tutorial and now try to test it by applying it to a video feed. Here is my inference code:
import cv2
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image
import tensorflow.lite as tflite
Model_Path = "/path/to/model.tflite"
labels = ["class1", "class2", "class3"]
##Load tflite model and allocate tensors
interpreter = tflite.Interpreter(model_path=Model_Path)
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]["shape"]
vid_file = "/path/to/video.mp4"
# Create a VideoCapture object and read from input file
cap = cv2.VideoCapture(vid_file)
while cap.isOpened():
_, frame = cap.read()
cv_image = preprocess(frame)
##Converting image into tensor
image = np.array(cv_image, dtype=np.float32)
input_tensor = np.array(np.expand_dims(image, 0))
interpreter.set_tensor(input_details[0]["index"], input_tensor)
interpreter.invoke()
output_details = interpreter.get_output_details()
output_data = interpreter.get_tensor(output_details[0]["index"])
pred = np.squeeze((output_data))
classi = np.argmax(pred)
# write prediction in the corner
cv2.putText(
frame,
labels[classi],
(10, 50),
cv2.FONT_HERSHEY_SIMPLEX,
1,
(255, 255, 255),
2,
cv2.LINE_AA,
)
cv2.namedWindow("cv_image", cv2.WINDOW_NORMAL)
cv2.imshow("cv_image", frame)
##Use p to pause the video and use q to termiate the program
key = cv2.waitKey(1) & 0xFF
if key == ord("q"):
break
elif key == ord("p"):
cv2.waitKey(0)
continue
cap.release()
out.release()
cv2.destroyAllWindows()
with preprocess() defined as:
def preprocess(image):
*** some image cropping, just as for training data ***
# resize image to 224x224
image = cv2.resize(image, (224, 224))
new_img = image.astype(np.float32)
new_img /= 255.0
return image
The prediction seems to be okay using argmax, but if I look at the confidence values, they are all negative (most of the time):
[-2.3782427 -1.6677225 -3.0637422]
[-2.4214256 -1.2143787 -3.4843316]
[-1.6566806 -2.1574929 -3.1999807]
[-1.9782547 -2.7043173 -2.0971687]
This is quite problematic, because on one hand it makes me doubt that everything works really as it should, and on the other I cannot have any post-processing logic to rule out false positives (like 2 classes with more than 50% or so).
Does anyone know what the issue could be? Previously I made the mistake that the preprocessing didn't normalise the image as done in the training. Could I still have a difference that I don't see?

Related

Run Tensorflow Keras custom model in OpenCV

So I was basically trying to figure out how to import a Tensorflow Keras CNN Model in OpenCV. The Docs I found on Github, weren't helpful and also were not clear about what to do EXACTLY. I have searched whole Youtube for tutorials, but nobody seems to have imported a custom made model before haha. I have basically tried everything...
Saving model as pickle (.p) and reading it in OpenCv (gave me this error: "Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://7b832872-99f5-4b67-8675-14f2423877df/variables/variables You may be trying to load on a different device from the computational device. Consider setting the experimental_io_device option in tf.saved_model.LoadOptions to the io_device such as '/job:localhost'."). I can't figure out what this is...
I also tried importing with tf.keras.models.load_model("saved_model.pb"), which also didnt seem to work and throw me the following error: File "h5py\h5f.pyx", line 106, in h5py.h5f.open
OSError: Unable to open file (file signature not found). It seems like I need a .h5 file, which I dont know how to get from my current model.
The next thing I tried was using cv2.dnn.readNetFromTensorflow(). For this to work you need the Tensorflow .pb file, which I have and (i guess its optional) the .pbtxt file. So the first problem was this error, which appeared after pasing my saved_model.pb: Failed to parse GraphDef file: saved_model.pb in function 'cv::dnn::ReadTFNetParamsFromBinaryFileOrDie'. I checked on that and some people wrote you should check the file for corruption and if it the name is written correctly, which I guess it is. Model passed with 0.986 Accuracy in my tests.
No I am on the end with my energy and dont't know what to do. I can't be the only one to have these issues, but certainly, it should be easy to use a tensorflow model in opencv, according to the docs...
I will now share for you the code I am using for creating the model as also the code for reading it in OpenCv. Versions of OpenCV, Python, Tensorflow, cuda, cudnn and pickle I will include below.
Any of your help is greatly appreciated!
This is the Code for creating Tensorflow Model
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.utils.np_utils import to_categorical
from keras.layers import Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
import cv2
from sklearn.model_selection import train_test_split
import tensorflow as tf
import pickle
import os
import pandas as pd
import random
from keras.preprocessing.image import ImageDataGenerator
import time
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.compat.v1.Session(config=config)
# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))
################# Parameters #####################
path = "dataset" # folder with all the class folders
labelFile = 'labels.csv' # file with all names of classes
batch_size_val=10 # how many to process together
steps_per_epoch_val=300
epochs_val=100
imageDimesions = (300,300,3)
testRatio = 0.2
validationRatio = 0.2
###################################################
############################### Importing of the Images
count = 0
images = []
classNo = []
myList = os.listdir(path)
print("Total Classes Detected:",len(myList))
noOfClasses=len(myList)
print("Importing Classes.....")
for x in range (0,len(myList)):
myPicList = os.listdir(path+"/"+str(count))
for y in myPicList:
curImg = cv2.imread(path+"/"+str(count)+"/"+y)
images.append(curImg)
classNo.append(count)
print(count, end =" ")
count +=1
print(" ")
images = np.array(images)
classNo = np.array(classNo)
############################### Split Data
X_train, X_test, y_train, y_test = train_test_split(images, classNo, test_size=testRatio)
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=validationRatio)
# X_train = ARRAY OF IMAGES TO TRAIN
# y_train = CORRESPONDING CLASS ID
############################### TO CHECK IF NUMBER OF IMAGES MATCHES TO NUMBER OF LABELS FOR EACH DATA SET
print("Data Shapes")
print("Train",end = "");print(X_train.shape,y_train.shape)
print("Validation",end = "");print(X_validation.shape,y_validation.shape)
print("Test",end = "");print(X_test.shape,y_test.shape)
assert(X_train.shape[0]==y_train.shape[0]), "The number of images in not equal to the number of lables in training set"
assert(X_validation.shape[0]==y_validation.shape[0]), "The number of images in not equal to the number of lables in validation set"
assert(X_test.shape[0]==y_test.shape[0]), "The number of images in not equal to the number of lables in test set"
assert(X_train.shape[1:]==(imageDimesions))," The dimesions of the Training images are wrong "
assert(X_validation.shape[1:]==(imageDimesions))," The dimesionas of the Validation images are wrong "
assert(X_test.shape[1:]==(imageDimesions))," The dimesionas of the Test images are wrong"
############################### READ CSV FILE
data=pd.read_csv(labelFile)
print("data shape ",data.shape,type(data))
############################### DISPLAY SOME SAMPLES IMAGES OF ALL THE CLASSES
num_of_samples = []
cols = 5
num_classes = noOfClasses
fig, axs = plt.subplots(nrows=num_classes, ncols=cols, figsize=(5, 300))
fig.tight_layout()
for i in range(cols):
for j,row in data.iterrows():
x_selected = X_train[y_train == j]
axs[j][i].imshow(x_selected[random.randint(0, len(x_selected)- 1), :, :], cmap=plt.get_cmap("gray"))
axs[j][i].axis("off")
if i == 2:
axs[j][i].set_title(str(j)+ "-"+str(row["Name"]))
num_of_samples.append(len(x_selected))
############################### DISPLAY A BAR CHART SHOWING NO OF SAMPLES FOR EACH CATEGORY
print(num_of_samples)
plt.figure(figsize=(12, 4))
plt.bar(range(0, num_classes), num_of_samples)
plt.title("Distribution of the training dataset")
plt.xlabel("Class number")
plt.ylabel("Number of images")
plt.show()
############################### PREPROCESSING THE IMAGES
def grayscale(img):
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
return img
def equalize(img):
img =cv2.equalizeHist(img)
return img
def preprocessing(img):
img = grayscale(img) # CONVERT TO GRAYSCALE
img = equalize(img) # STANDARDIZE THE LIGHTING IN AN IMAGE
img = img/255 # TO NORMALIZE VALUES BETWEEN 0 AND 1 INSTEAD OF 0 TO 255
return img
X_train=np.array(list(map(preprocessing,X_train))) # TO IRETATE AND PREPROCESS ALL IMAGES
X_validation=np.array(list(map(preprocessing,X_validation)))
X_test=np.array(list(map(preprocessing,X_test)))
cv2.imshow("GrayScale Images",X_train[random.randint(0,len(X_train)-1)]) # TO CHECK IF THE TRAINING IS DONE PROPERLY
############################### ADD A DEPTH OF 1
X_train=X_train.reshape(X_train.shape[0],X_train.shape[1],X_train.shape[2],1)
X_validation=X_validation.reshape(X_validation.shape[0],X_validation.shape[1],X_validation.shape[2],1)
X_test=X_test.reshape(X_test.shape[0],X_test.shape[1],X_test.shape[2],1)
############################### AUGMENTATAION OF IMAGES: TO MAKEIT MORE GENERIC
dataGen= ImageDataGenerator(width_shift_range=0.1, # 0.1 = 10% IF MORE THAN 1 E.G 10 THEN IT REFFERS TO NO. OF PIXELS EG 10 PIXELS
height_shift_range=0.1,
zoom_range=0.2, # 0.2 MEANS CAN GO FROM 0.8 TO 1.2
shear_range=0.1, # MAGNITUDE OF SHEAR ANGLE
rotation_range=10) # DEGREES
dataGen.fit(X_train)
batches= dataGen.flow(X_train,y_train,batch_size=20) # REQUESTING DATA GENRATOR TO GENERATE IMAGES BATCH SIZE = NO. OF IMAGES CREAED EACH TIME ITS CALLED
X_batch,y_batch = next(batches)
# TO SHOW AGMENTED IMAGE SAMPLES
fig,axs=plt.subplots(1,15,figsize=(20,5))
fig.tight_layout()
for i in range(15):
axs[i].imshow(X_batch[i].reshape(imageDimesions[0],imageDimesions[1]))
axs[i].axis('off')
plt.show()
y_train = to_categorical(y_train,noOfClasses)
y_validation = to_categorical(y_validation,noOfClasses)
y_test = to_categorical(y_test,noOfClasses)
############################### CONVOLUTION NEURAL NETWORK MODEL
def myModel():
no_Of_Filters=60
size_of_Filter=(5,5) # THIS IS THE KERNEL THAT MOVE AROUND THE IMAGE TO GET THE FEATURES.
# THIS WOULD REMOVE 2 PIXELS FROM EACH BORDER WHEN USING 32 32 IMAGE
size_of_Filter2=(3,3)
size_of_pool=(2,2) # SCALE DOWN ALL FEATURE MAP TO GERNALIZE MORE, TO REDUCE OVERFITTING
no_Of_Nodes = 500 # NO. OF NODES IN HIDDEN LAYERS
model= Sequential()
model.add((Conv2D(no_Of_Filters,size_of_Filter,input_shape=(imageDimesions[0],imageDimesions[1],1),activation='relu'))) # ADDING MORE CONVOLUTION LAYERS = LESS FEATURES BUT CAN CAUSE ACCURACY TO INCREASE
model.add((Conv2D(no_Of_Filters, size_of_Filter, activation='relu')))
model.add(MaxPooling2D(pool_size=size_of_pool)) # DOES NOT EFFECT THE DEPTH/NO OF FILTERS
model.add((Conv2D(no_Of_Filters//2, size_of_Filter2,activation='relu')))
model.add((Conv2D(no_Of_Filters // 2, size_of_Filter2, activation='relu')))
model.add(MaxPooling2D(pool_size=size_of_pool))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(no_Of_Nodes,activation='relu'))
model.add(Dropout(0.5)) # INPUTS NODES TO DROP WITH EACH UPDATE 1 ALL 0 NONE
model.add(Dense(noOfClasses,activation='softmax')) # OUTPUT LAYER
# COMPILE MODEL
model.compile(Adam(lr=0.001),loss='categorical_crossentropy',metrics=['accuracy'])
return model
############################### TRAIN
model = myModel()
print(model.summary())
history=model.fit_generator(dataGen.flow(X_train,y_train,batch_size=int(batch_size_val)),steps_per_epoch=int(steps_per_epoch_val),epochs=int(epochs_val),validation_data=(X_validation,y_validation),shuffle=1)
############################### PLOT
plt.figure(1)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['training','validation'])
plt.title('loss')
plt.xlabel('epoch')
plt.figure(2)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['training','validation'])
plt.title('Acurracy')
plt.xlabel('epoch')
plt.show()
score =model.evaluate(X_test,y_test,verbose=0)
print('Test Score:',score[0])
print('Test Accuracy:',score[1])
#model.save(r"C:\Path\To\My\Directory\DetectBrick")
#print("model saved!!")
pickle_out= open(r"C:\Path\To\My\Directory\DetectBrick\model_trained.p","wb") # wb = WRITE BYTE
pickle.dump(model,pickle_out)
pickle_out.close()
cv2.waitKey(0)
This is the code for opening Model in OpenCV (at least trying to :) )
import numpy as np
import cv2
import pickle
from tensorflow import keras
import tensorflow as tf
import h5py
framewidth = 640
frameheight = 480
brightness = 180
threshold = 0.7
font = cv2.FONT_HERSHEY_SIMPLEX
camera = cv2.VideoCapture(0)
camera.set(3, framewidth)
camera.set(4, framewidth)
camera.set(10, framewidth)
#pb="saved_model.pb"
#pbtxt = "" #don't know if I need it .pbtxt
#model = cv2.dnn.readNetFromTensorflow(pb) #pbtxt file would be second parameter, #but dont know if needed
pickle_in = open("model_trained.p", "rb")
model = pickle.load(pickle_in)
def grayscale(img):
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
return img
def equalize(img):
img = cv2.equalizeHist(img)
return img
def preprocessing(img):
img = grayscale(img)
img = equalize(img)
img = img/255
return img
def getClassName(classNo):
if classNo == 0: return '3003'
elif classNo == 1: return '3010'
while camera.IsOpened():
boolean, frameoriginal = camera.read()
img = np.asarray(frameoriginal)
img = cv2.resize(img, (32,32))
img = preprocessing(img)
cv2.imshow("processed image", img)
img = img.reshape(1, 32, 32, 1)
cv2.putText(frameoriginal, "Klasse: ", (20,35), font, 0.75, (0,0,255), 2, cv2.LINE_AA)
cv2.putText(frameoriginal, "Genauigkeit: ", (20,75), font, 0.75, (255,0,0), 2, cv2.LINE_AA)
predictions = model.predict([img])
classIndex = model.predict_classes([img])
probabilityValue = np.amax(predictions)
if probabilityValue > threshold:
cv2.putText(frameoriginal, str(classIndex)+ " " + str(getClassName(classIndex)), (120,35), font, 0.75, (0,0,255),2, cv2.LINE_AA)
cv2.putText(frameoriginal, str(round(probabilityValue*100, 2)) + "%", (180,75), font, 0.75, (0,0,255),2, cv2.LINE_AA)
cv2.imshow("result", frameoriginal)
if cv2.waitKey(2) & OxFF == ord('q'):
break
cv2.destroyAllWindows()
camera.realease()
And here are the Versions I am using
Python: 3.10.5
Tensorflow (GPU): 2.9.1
CUDA: 11.2
Cudnn: 8.1
Pickle: 4.0
System: Windows 11
CPU: AMD Ryzen 5 5600G
GPU: GTX 1660 Super 6GB

How to properly use pre-trained CNN for image prediction on a folder of images

I am trying to build a CNN model and use it on 2833 images to see if it can predict a selection (of my own choice) of three features and the popularity score from a tabular dataset. So far my code looks like this:
import os
import cv2
import argparse
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image as image_utils
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
# Construct argument parser and parse the arguments
argument_parser = argparse.ArgumentParser()
# First two arguments specifies our only argument "image" with both short-/longhand versions where either
# can be used
# This is a required argument, noted by required=True, the help gives additional info in the terminal
# if needed
argument_parser.add_argument("-i", "--image", required=True, help="path to the input image")
# Set path to files
img_path = "images/"
files = os.listdir(img_path)
print("[INFO] loading and processing images...")
# Loop through images
for filename in files:
# Load original via OpenCV, so we can draw on it and display it on our screen
original = cv2.imread(filename)
# Load image while resizing to 224x224 pixels, then convert to a NumPy array because load_img returns
# Pillow format
image = image_utils.load_img(filename, target_size=(224, 224))
image = image_utils.img_to_array(image)
"""
PRE-PROCESS
The image is now a NumPy array of shape (224, 224, 3). 224 pixels tall, 224 pixels wide, 3 channels =
Red, Green, Blue. We need to expand to (1, 3, 224, 224) because when classifying images using Deep
Learning and Convolutional Neural Networks, we often send several images (instead of one) through
the network in “batches” for efficiency. We also subtract the mean RGB pixel intensity from the
ImageNet dataset.
"""
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)
# Load Keras and classify the image
print("[INFO] loading network...")
model = VGG16(weights="imagenet") # Load the VGG16 network pre-trained on the ImageNet dataset
print("[INFO] classifying image...")
predictions = model.predict(image) # Classify the image (NumPy array with 1000 entries)
P = decode_predictions(predictions) # Get the ImageNet Unique ID of the label, along with human-readable label
print(P)
# Loop over the predictions and display the rank-5 (5 epochs) predictions + probabilities to our terminal
for (i, (imagenetID, label, prob)) in enumerate(P[0]):
print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))
# Load the image via OpenCV, draw the top prediction on the image, and display the
image to our screen
original = cv2.imread(filename)
(imagenetID, label, prob) = P[0][0]
cv2.putText(original, "Label: {}, {:.2f}%".format(label, prob * 100), (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.imshow("Classification", original)
cv2.waitKey(0)
I followed this article on how to do it, and it worked on one image. But when I tried to put the code inside a loop, I get this error message:
[ WARN:0#44.040] global D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp (239) cv::findDecoder imread_('100.png'): can't open/read file: check file path/integrity
Traceback (most recent call last):
File "C:\PATH\test_imagenet.py", line 28, in <module>
image = image_utils.load_img(filename, target_size=(224, 224))
File "C:\PATH\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\preprocessing\image.py", line 313, in load_img
return image.load_img(path, grayscale=grayscale, color_mode=color_mode,
File "C:\PATH\AppData\Local\Programs\Python\Python39\lib\site-packages\keras_preprocessing\image\utils.py", line 113, in load_img
with open(path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '100.png'
As you can see, I have the file in the project, so I don't know why it doesn't find it. How do I do this correctly for a file of images, instead of for one image only?

Please find the working code below;
import os
import cv2
import argparse
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image as image_utils
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
# Construct argument parser and parse the arguments
argument_parser = argparse.ArgumentParser()
# First two arguments specifies our only argument "image" with both short-/longhand versions where either
# can be used
# This is a required argument, noted by required=True, the help gives additional info in the terminal
# if needed
argument_parser.add_argument("-i", "--image", required=True, help="path to the input image")
# Set path to files
img_path = "/content/train/"
files = os.listdir(img_path)
print("[INFO] loading and processing images...")
for filename in files:
# Passing the entire path of the image file
file= os.path.join(img_path, filename)
# Load original via OpenCV, so we can draw on it and display it on our screen
original = cv2.imread(file)
image = image_utils.load_img(file, target_size=(224, 224))
image = image_utils.img_to_array(image)
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)
print("[INFO] loading network...")
model = VGG16(weights="imagenet") # Load the VGG16 network pre-trained on the ImageNet dataset
print("[INFO] classifying image...")
predictions = model.predict(image) # Classify the image (NumPy array with 1000 entries)
P = decode_predictions(predictions) # Get the ImageNet Unique ID of the label, along with human-readable label
print(P)
# Loop over the predictions and display the rank-5 (5 epochs) predictions + probabilities to our terminal
for (i, (imagenetID, label, prob)) in enumerate(P[0]):
print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))
original = cv2.imread(file)
(imagenetID, label, prob) = P[0][0]
cv2.putText(original, "Label: {}, {:.2f}%".format(label, prob * 100), (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.imshow(original)
cv2.waitKey(0)
Output is as follows:
Let us know if the issue still persists. Thanks!

ValueError: could not broadcast input array from shape (224,224,4) into shape (224,224,3) , error while testing with GRAYSCALE IMAGES

The following code works great with RGB images but not working with GRAYSCALE images, Also I need to know why grayimages are having shape as (224,224,4) , according to my knowledge it should be (224,224,1).
import silence_tensorflow.auto
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
np.set_printoptions(suppress=True)
model = tensorflow.keras.models.load_model('models/keras_model.h5')
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
size = (224, 224)
def classify(img_path):
image = Image.open(img_path)
image = ImageOps.fit(image, size, Image.ANTIALIAS)
image_array = np.asarray(image)
print(image_array.shape)
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
data[0] = normalized_image_array
prediction = model.predict(data)
print(prediction)
if prediction[0][-1] == 1:
return False
else:
return True

For the benefit of community providing solution here
Grayscale images have 1 channel, RGB images have 3, and
RGBA has 4 channels last channel represents alpha. You can try image = Image.open(img_path).convert('RGB') (paraphrased from Frightera)
Working code as shown below
import silence_tensorflow.auto
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
np.set_printoptions(suppress=True)
model = tensorflow.keras.models.load_model('models/keras_model.h5')
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
size = (224, 224)
def classify(img_path):
image = Image.open(img_path).convert('RGB')
image = ImageOps.fit(image, size, Image.ANTIALIAS)
image_array = np.asarray(image)
print(image_array.shape)
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
data[0] = normalized_image_array
prediction = model.predict(data)
print(prediction)
if prediction[0][-1] == 1:
return False
else:
return True

List index out of range error when using mouse events functions with tensorflow for object detection

I am using Tensorflow for object detection. I successfully trained the neural network and it can detect the object I want to detect in the livestream. It does this by making a bounding box around the object.
Now I want to mark areas in the video frame(initially) such that if the object comes in the marked area and is detected,(i.e. if a bounding box is made in the marked area) then I want to print a message in the terminal.
For this purpose, I am using OpenCV. I found a nice tutorial on how to use Mouse callback functions to do this. The link is given below.
https://www.pyimagesearch.com/2015/03/09/capturing-mouse-click-events-with-python-and-opencv/
But there is an error when I execute my code. The error is shown below.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-1-10159c26292b> in click_and_crop(event, x, y, flags, params)
201 refPt.append((x,y))
202 cropping = False
--> 203 cv2.rectangle(image_np,refPt[0],refPt[1],(0,255,0),2)
204 ret, image_np = cap.read()
205 # Expand dimensions since the model expects images to have
shape: [1, None, None, 3]
IndexError: list index out of range
My main program is as follows:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from PIL import Image
import cv2
cap = cv2.VideoCapture(0)
# This is needed since the notebook is stored in the object_detection
folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops
if tf.__version__ < '1.4.0' and tf.__version__ != '1.10.0':
raise ImportError('Please upgrade your tensorflow installation to v1.4.* or
later!')
# ## Env setup
# In[3]:
# This is needed to display the images.
#get_ipython().run_line_magic('matplotlib', 'inline')
# ## Object detection imports
# Here are the imports from the object detection module.
# In[5]:
from utils import label_map_util
from utils import visualization_utils as vis_util
# # Model preparation
# ## Variables
#
# Any model exported using the `export_inference_graph.py` tool can be
loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb
file.
#
# In[6]:
# What model to download.
MODEL_NAME = 'car_inference_graph'
# Path to frozen detection graph. This is the actual model that is used for
the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('training', 'object-detection.pbtxt')
NUM_CLASSES = 1
# ## Download Model
# ## Load a (frozen) Tensorflow model into memory.
# In[7]:
detection_graph = tf.Graph()
with detection_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')
# ## Loading label map
# Label maps map indices to category names, so that when our convolution
network predicts `5`, we know that this corresponds to `airplane`. Here we
use internal utility functions, but anything that returns a dictionary
mapping integers to appropriate string labels would be fine
# In[8]:
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map,
max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# ## Helper code
# In[9]:
def load_image_into_numpy_array(image):
(im_width, im_height) = image.size
return np.array(image.getdata()).reshape(
(im_height, im_width, 3)).astype(np.uint8)
# # Detection
# In[10]:
# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images
to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR,
'image{}.jpg'.format(i)) for i in range(1,44) ]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
# In[11]:
def run_inference_for_single_image(image, graph):
with graph.as_default():
with tf.Session() as sess:
# Get handles to input and output tensors
ops = tf.get_default_graph().get_operations()
all_tensor_names = {output.name for op in ops for output in op.outputs}
tensor_dict = {}
for key in [
'num_detections', 'detection_boxes', 'detection_scores',
'detection_classes', 'detection_masks'
]:
tensor_name = key + ':0'
if tensor_name in all_tensor_names:
tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
tensor_name)
if 'detection_masks' in tensor_dict:
# The following processing is only for single image
detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
# Reframe is required to translate mask from box coordinates to image
coordinates and fit the image size.
real_num_detection = tf.cast(tensor_dict['num_detections'][0],
tf.int32)
detection_boxes = tf.slice(detection_boxes, [0, 0],
[real_num_detection, -1])
detection_masks = tf.slice(detection_masks, [0, 0, 0],
[real_num_detection, -1, -1])
detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
detection_masks, detection_boxes, image.shape[0], image.shape[1])
detection_masks_reframed = tf.cast(
tf.greater(detection_masks_reframed, 0.5), tf.uint8)
# Follow the convention by adding back the batch dimension
tensor_dict['detection_masks'] = tf.expand_dims(
detection_masks_reframed, 0)
image_tensor =
tf.get_default_graph().get_tensor_by_name('image_tensor:0')
# Run inference
output_dict = sess.run(tensor_dict,
feed_dict={image_tensor: np.expand_dims(image, 0)})
# all outputs are float32 numpy arrays, so convert types as appropriate
output_dict['num_detections'] = int(output_dict['num_detections'][0])
output_dict['detection_classes'] = output_dict[
'detection_classes'][0].astype(np.uint8)
output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
output_dict['detection_scores'] = output_dict['detection_scores'][0]
if 'detection_masks' in output_dict:
output_dict['detection_masks'] = output_dict['detection_masks'][0]
return output_dict
# In[12]:
# In[10]:
with detection_graph.as_default():
with tf.Session(graph=detection_graph) as sess:
while True:
refPt = [] #ROI code starts from here
cropping = False
def click_and_crop(event,x,y,flags,params):
global refPt,cropping
if event == cv2.EVENT_LBUTTONDOWN:
refPt = [(x,y)]
cropping = True
elif event == cv2.EVENT_LBUTTONUP:
refPt.append((x,y))
cropping = False
cv2.rectangle(image_np,refPt[0],refPt[1],(0,255,0),2) # ROI code end
ret, image_np = cap.read()
# Expand dimensions since the model expects images to have shape: [1,
None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was
detected.
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represent how level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
scores = detection_graph.get_tensor_by_name('detection_scores:0')
classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# Actual detection.
(boxes, scores, classes, num_detections) = sess.run(
[boxes, scores, classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
vis_util.visualize_boxes_and_labels_on_image_array(
image_np,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=8)
cv2.imshow("object detection", image_np)
cv2.setMouseCallback("object detection", click_and_crop)
if cv2.waitKey(25) & 0xFF == ord('q'):
cv2.destroyAllWindows()
cap.release()
break
Using this code:
I can see the livestream properly.
The desired object is successfully detected.
But as soon as I drag the left mouse button to draw ROI in the frame, I get
the error mentioned above.
I understand this has something to do with refPt[0],refPt[1] but I don't understand where to make the necessary changes!
Technical information:
Tensorflow 1.10
OS - Ubuntu 18.04
Python 3.6
OpenCV 3.4.2
Please help.
Thanks :)

You have a couple of problems in your code, but basically the problem can be summarized in the scope of the variables you are using.
avoid creating a function in a loop... this will redefine a function every time... not a problem here, but it is better not to do it.
You have a refPt = [] inside the while, this will empty the array at every iteration... as in case 1, it should be outside. Anyways you have in the function refPt = [(x,y)] that will remove old values and "clean" the variable.
Inside the function you have cv2.rectangle(image_np,refPt[0],refPt[1],(0,255,0),2) which changes image_np, but this image gets changed locally and not globally.
In the loop you have ret, image_np = cap.read() which will remove any rectangle almost immediately without being displayed.... you need to draw the rectangle on the new images. Something like:
ret, image_np = cap.read()
# if no image was obtained quit the loop
if !ret:
break
tmpPt = refPt.copy() # to avoid it being changed in the callback
if len(tmpPt ) ==2:
cv2.rectangle(image_np,tmpPt [0],tmpPt [1],(0,255,0),2)
It is recommended to use cv2.setMouseCallback("object detection", click_and_crop) outside the loop... You can use cv2.namedWindow("object detection") to create the window without having the image.
These are the problems I see, maybe you encounter more once these are corrected...One more thing, you are just drawing a rectangle, but I do not see that you are actually using it to select a roi (cropping the image to the rectangle size), I do not know if this is intended...
I hope this helps you, and if you have a question, just ask in a comment.
UPDATE
To make myself a little bit clear and to add the selection first and then the detection part, the code should look like this:
refPt = []
cropping = False
def click_and_crop(event,x,y,flags,params):
global refPt,cropping
if event == cv2.EVENT_LBUTTONDOWN:
refPt = [(x,y)]
cropping = True
elif event == cv2.EVENT_LBUTTONUP:
refPt.append((x,y))
cropping = False
cv2.namedWindow("object detection")
cv2.setMouseCallback("object detection", click_and_crop)
detect = False
with detection_graph.as_default():
with tf.Session(graph=detection_graph) as sess:
while True:
ret, image_np = cap.read()
# if no image was obtained quit the loop
if !ret:
break
tmpPt = refPt.copy() # to avoid it being changed in the callback
if len(tmpPt ) ==2:
cv2.rectangle(image_np,tmpPt [0],tmpPt [1],(0,255,0),2)
if detect:
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represent how level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
scores = detection_graph.get_tensor_by_name('detection_scores:0')
classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# Actual detection.
(boxes, scores, classes, num_detections) = sess.run(
[boxes, scores, classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
vis_util.visualize_boxes_and_labels_on_image_array(
image_np,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=8)
cv2.imshow("object detection", image_np)
key = cv2.waitKey(25) & 0xFF
if key == ord('q'):
cv2.destroyAllWindows()
cap.release()
break
elif key == ord('s'):
detect = True # start detecting
Once again this will only draw the rectangle... it does not crop

Open CV ValueError: total size of new array must be unchanged

I am new to OpenCV and TensorFlow. I am trying to get a live camera preview and use the live camera feed for TensorFlow prediction. Here is the part of code for live preview and prediction:
image = np.zeros((64, 64, 3))
softmax_pred = tf.nn.softmax(conv_net(x, weights, biases, image_size, 1.0))
cam = cv2.VideoCapture(0)
while True:
ret_val, img = cam.read()
img = cv2.flip(img,1)
cv2.imshow('my webcam',img)
img = img.resize((64,64))
image = array(img).reshape(1,64,64,3)
image.astype(float)
result = sess.run(softmax_pred, feed_dict={x: image})
I am not sure what's wrong here. I am getting this error:
image = array(img).reshape(1,64,64,3)
ValueError: total size of new array must be unchanged
My Tensor placeholder for image has the shape Tensor '(?, 64, 64, 3)'. I did the same for jpeg image by manually loading an image from disk and reshaping that image to (1,64,643) and it works fine.Here is the code for manually loading an image and then predicting:
img = Image.open('/home/pragyan/Documents/miniProject/PredictImages/IMG_4804.JPG')
img = img.resize((64, 64))
image = array(img).reshape(1,64,64,3)
image.astype(float)
result = sess.run(softmax_pred, feed_dict={x: image})
The above code works but while reshaping a live frame from webcam gives me this error(ValueError: total size of new array must be unchanged). Is there a way to fix this? I am not able to understand how to fix it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Negative confidences in TFlite inference - python

Related

Run Tensorflow Keras custom model in OpenCV

How to properly use pre-trained CNN for image prediction on a folder of images

ValueError: could not broadcast input array from shape (224,224,4) into shape (224,224,3) , error while testing with GRAYSCALE IMAGES

List index out of range error when using mouse events functions with tensorflow for object detection

Open CV ValueError: total size of new array must be unchanged

Categories

Resources