Running a pre-trained ONNX model - image recognition - Python

I am trying to run a pre-trained ONNX model (trained in a third-party labeling tool) for image recognition. The model was trained on labels pre-defined in the tool, and the next step is to run it outside the tool. To do that, I take a sample image and pass it through the model to get the identified labels as output. I am stuck on how to prepare the inputs the model expects.
How can I adjust my inputs in the following code?
import cv2
import numpy as np
import onnxruntime
import pytesseract
import PyPDF2
# Load the image
image = cv2.imread("example.jpg")
# Check if the image has been loaded successfully
if image is None:
    raise ValueError("Failed to load the image")
# Get the shape of the image
height, width = image.shape[:2]
# Make sure the height and width are positive
if height <= 0 or width <= 0:
    raise ValueError("Invalid image size")
# Set the desired size of the resized image
dsize = (640, 640)
# Resize the image using cv2.resize
resized_image = cv2.resize(image, dsize)
# Display the resized image
cv2.imshow("Resized Image", resized_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Load the ONNX model
session = onnxruntime.InferenceSession("ic/model.onnx")
# Check if the model has been loaded successfully
if session is None:
    raise ValueError("Failed to load the model")
# Get the input names and shapes of the model
inputs = session.get_inputs()
for i, input_info in enumerate(inputs):
    print(f"Input {i}: name = {input_info.name}, shape = {input_info.shape}")
# Run the ONNX model
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
prediction = session.run([output_name], {input_name: image})[0]
# Postprocess the prediction to obtain the labels
labels = postprocess(prediction)
# Use PyTesseract to extract the text from the image
text = pytesseract.image_to_string(image)
# Print the labels and the text
print("Labels:", labels)
print("Text:", text)
The code throws the following error:
ValueError: Model requires 4 inputs. Input Feed contains 1
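The error says the session was fed one array while the model declares four inputs, so `session.run` needs a feed dictionary with one entry per input name. What follows is a minimal sketch that assumes nothing about what the four inputs mean (that depends on how the labeling tool exported the model). The helper `build_feed` is hypothetical: it fills every declared input with a zero array of the declared shape and dtype, and lets you override the inputs you actually know, such as the image tensor. The type table covers only a few common cases and falls back to float32.

```python
import numpy as np

# A few common ONNX Runtime type strings and their NumPy dtypes (not exhaustive).
ORT_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
    "tensor(uint8)": np.uint8,
    "tensor(bool)": np.bool_,
}


def build_feed(input_infos, overrides=None):
    """Return {input name: array} with one entry per declared model input.

    `input_infos` is what session.get_inputs() returns: objects exposing
    .name, .shape and .type. Entries in `overrides` are used as given;
    every other input gets a zero placeholder.
    """
    overrides = overrides or {}
    feed = {}
    for info in input_infos:
        if info.name in overrides:
            feed[info.name] = overrides[info.name]
            continue
        # Dynamic dimensions appear as None or as a symbolic string; use 1 for them.
        shape = [dim if isinstance(dim, int) else 1 for dim in info.shape]
        dtype = ORT_TO_NUMPY.get(info.type, np.float32)
        feed[info.name] = np.zeros(shape, dtype=dtype)
    return feed
```

With a real session this would be used as `feed = build_feed(session.get_inputs(), {image_input_name: tensor})` followed by `session.run(None, feed)`, where `image_input_name` and `tensor` are placeholders for the model's actual image input and the preprocessed image. Zero placeholders only confirm that the plumbing works; meaningful predictions need real values for every input.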

Related

How to properly use pre-trained CNN for image prediction on a folder of images

I am trying to build a CNN model and use it on 2833 images to see if it can predict a selection (of my own choice) of three features and the popularity score from a tabular dataset. So far my code looks like this:
import os
import cv2
import argparse
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image as image_utils
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
# Construct argument parser and parse the arguments
argument_parser = argparse.ArgumentParser()
# First two arguments specifies our only argument "image" with both short-/longhand versions where either
# can be used
# This is a required argument, noted by required=True, the help gives additional info in the terminal
# if needed
argument_parser.add_argument("-i", "--image", required=True, help="path to the input image")
# Set path to files
img_path = "images/"
files = os.listdir(img_path)
print("[INFO] loading and processing images...")
# Loop through images
for filename in files:
    # Load original via OpenCV, so we can draw on it and display it on our screen
    original = cv2.imread(filename)
    # Load image while resizing to 224x224 pixels, then convert to a NumPy array because load_img returns
    # Pillow format
    image = image_utils.load_img(filename, target_size=(224, 224))
    image = image_utils.img_to_array(image)
    """
    PRE-PROCESS
    The image is now a NumPy array of shape (224, 224, 3). 224 pixels tall, 224 pixels wide, 3 channels =
    Red, Green, Blue. We need to expand to (1, 3, 224, 224) because when classifying images using Deep
    Learning and Convolutional Neural Networks, we often send several images (instead of one) through
    the network in “batches” for efficiency. We also subtract the mean RGB pixel intensity from the
    ImageNet dataset.
    """
    image = np.expand_dims(image, axis=0)
    image = preprocess_input(image)
    # Load Keras and classify the image
    print("[INFO] loading network...")
    model = VGG16(weights="imagenet") # Load the VGG16 network pre-trained on the ImageNet dataset
    print("[INFO] classifying image...")
    predictions = model.predict(image) # Classify the image (NumPy array with 1000 entries)
    P = decode_predictions(predictions) # Get the ImageNet Unique ID of the label, along with human-readable label
    print(P)
    # Loop over the predictions and display the rank-5 (5 epochs) predictions + probabilities to our terminal
    for (i, (imagenetID, label, prob)) in enumerate(P[0]):
        print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))
    # Load the image via OpenCV, draw the top prediction on the image, and display the
    # image to our screen
    original = cv2.imread(filename)
    (imagenetID, label, prob) = P[0][0]
    cv2.putText(original, "Label: {}, {:.2f}%".format(label, prob * 100), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("Classification", original)
    cv2.waitKey(0)
I followed this article on how to do it, and it worked on one image. But when I tried to put the code inside a loop, I get this error message:
[ WARN:0#44.040] global D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp (239) cv::findDecoder imread_('100.png'): can't open/read file: check file path/integrity
Traceback (most recent call last):
  File "C:\PATH\test_imagenet.py", line 28, in <module>
    image = image_utils.load_img(filename, target_size=(224, 224))
  File "C:\PATH\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\preprocessing\image.py", line 313, in load_img
    return image.load_img(path, grayscale=grayscale, color_mode=color_mode,
  File "C:\PATH\AppData\Local\Programs\Python\Python39\lib\site-packages\keras_preprocessing\image\utils.py", line 113, in load_img
    with open(path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '100.png'
As you can see, I have the file in the project, so I don't know why it isn't found. How do I do this correctly for a folder of images, instead of for one image only?
Please find the working code below:
import os
import cv2
import argparse
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image as image_utils
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
# Construct argument parser and parse the arguments
argument_parser = argparse.ArgumentParser()
# First two arguments specifies our only argument "image" with both short-/longhand versions where either
# can be used
# This is a required argument, noted by required=True, the help gives additional info in the terminal
# if needed
argument_parser.add_argument("-i", "--image", required=True, help="path to the input image")
# Set path to files
img_path = "/content/train/"
files = os.listdir(img_path)
print("[INFO] loading and processing images...")
for filename in files:
    # Pass the full path of the image file, not just its name
    file = os.path.join(img_path, filename)
    # Load original via OpenCV, so we can draw on it and display it on our screen
    original = cv2.imread(file)
    image = image_utils.load_img(file, target_size=(224, 224))
    image = image_utils.img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = preprocess_input(image)
    print("[INFO] loading network...")
    model = VGG16(weights="imagenet") # Load the VGG16 network pre-trained on the ImageNet dataset
    print("[INFO] classifying image...")
    predictions = model.predict(image) # Classify the image (NumPy array with 1000 entries)
    P = decode_predictions(predictions) # Get the ImageNet Unique ID of the label, along with human-readable label
    print(P)
    # Loop over the top-5 predictions and display them with their probabilities in the terminal
    for (i, (imagenetID, label, prob)) in enumerate(P[0]):
        print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))
    original = cv2.imread(file)
    (imagenetID, label, prob) = P[0][0]
    cv2.putText(original, "Label: {}, {:.2f}%".format(label, prob * 100), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    # cv2.imshow needs a window name as its first argument
    cv2.imshow("Classification", original)
    cv2.waitKey(0)
Output is as follows:
Let us know if the issue still persists. Thanks!
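The original loop failed because `os.listdir` returns bare file names such as `100.png`, which are then resolved against the current working directory instead of the image folder. A minimal sketch of collecting full paths up front, using only the standard library; the helper name `list_image_paths` and the extension list are illustrative assumptions, not part of the answer above.

```python
import os

# Extensions treated as images here; adjust to the data set (an assumption).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def list_image_paths(folder, extensions=IMAGE_EXTENSIONS):
    """Return the sorted full paths of image files directly inside `folder`."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # Skip sub-directories and files that do not look like images.
        if os.path.isfile(full_path) and name.lower().endswith(extensions):
            paths.append(full_path)
    return paths
```

Looping over `list_image_paths(img_path)` then gives paths that `cv2.imread` and `load_img` can open regardless of the working directory. Loading the network once before the loop, rather than on every iteration, would also avoid repeating the most expensive step.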

'TFLiteKerasModelConverterV2' object has no attribute 'predict'

I am trying to predict values by loading a saved version of my model.
Here is the code for it:
def classifier(img, weights_file):
    # Load the model
    model = tf.lite.TFLiteConverter.from_keras_model(weights_file)
    # Create the array of the right shape to feed into the keras model
    data = np.ndarray(shape=(1, 200, 200, 3), dtype=np.float32)
    image = img
    # image sizing
    size = (200, 200)
    image = ImageOps.fit(image, size, Image.ANTIALIAS)
    # turn the image into a numpy array
    image_array = np.asarray(image)
    # Normalize the image
    normalized_image_array = image_array.astype(np.float32) / 255
    # Load the image into the array
    data[0] = normalized_image_array
    # run the inference
    prediction_percentage = model.predict(data)
    prediction = prediction_percentage.round()
    return prediction, prediction_percentage
My model throws the error "'TFLiteKerasModelConverterV2' object has no attribute 'predict'".
Can anyone please tell me what I can change here?
You are creating a TFLiteConverter object from your weights file, and a converter has no predict method. The correct way to load model weights is load_weights, called on a model instance. Try:
model.load_weights(weights_file)
However, you would first need to define the model the same way as you did when training it. If you have saved your model in SavedModel format, use
model = tf.keras.models.load_model(weights_file)
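Put together, the function from the question could look roughly like the sketch below. It assumes the file passed in is a complete saved Keras model (not weights only) and that the image has already been resized to the 200 x 200 x 3 input the question uses; `prepare_batch` and `classify` are illustrative names. TensorFlow is imported inside `classify` so the preprocessing helper can be tried on its own.

```python
import numpy as np


def prepare_batch(image_array):
    """Scale an H x W x 3 uint8 image to [0, 1] float32 and add a batch axis."""
    scaled = np.asarray(image_array, dtype=np.float32) / 255.0
    return np.expand_dims(scaled, axis=0)


def classify(image_array, model_path):
    """Load a saved Keras model from `model_path` and predict on one image."""
    import tensorflow as tf  # lazy import: only needed when a model is actually loaded

    # load_model returns a Keras model, which does have a predict method
    model = tf.keras.models.load_model(model_path)
    prediction_percentage = model.predict(prepare_batch(image_array))
    return prediction_percentage.round(), prediction_percentage
```

`TFLiteConverter` is meant for converting a model to the TensorFlow Lite format. Running an already converted `.tflite` file is a different path (`tf.lite.Interpreter`) and is not covered by this sketch.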

Using Tkinter to display images from a numpy array

I'm inexperienced with Python and am using Tkinter for the first time to make a UI that displays the results of my digit classification program on the MNIST dataset. I have a question about displaying images in Tkinter when they come from a NumPy array rather than a file path on my PC. The code I have tried so far is:
img = PhotoImage(test_images[0])
window.create_image(20,20, image=img)
This was unsuccessful, and I'm not sure how else to approach it. Below is a picture of the image plotted from the array that I would like to display in the UI, and below the image is the code that shows how I'm loading and plotting the images, in case that helps. Sorry if this is an easy fix that I'm missing; I'm very new to this. Cheers
https://i.gyazo.com/8962f16b4562c0c15c4ff79108656087.png
# Load the data set
train_images = mnist.train_images()  # training images
train_labels = mnist.train_labels()  # training labels
test_images = mnist.test_images()  # test images
test_labels = mnist.test_labels()  # test labels
# normalise the pixel values of the images to make the network easier to train
train_images = (train_images/255) - 0.5
test_images = (test_images/255) - 0.5
# Flatten the images in to a 784 dimensional vector to pass into the neural network
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))
# Print shape of images
print(train_images.shape) # 60,000 rows and 784 columns
print(test_images.shape)
for i in range(0,15):
    first_image = test_images[i]
    first_image = np.array(first_image, dtype='float')
    pixels = first_image.reshape((28,28))
    plt.imshow(pixels)
    plt.show()
Error Message:
Traceback (most recent call last):
  File "C:/Users/Ben/Desktop/Python Projects/newdigitclassifier/classifier.py", line 122, in <module>
    img = PhotoImage(test_images[0])
  File "C:\Users\Ben\AppData\Local\Programs\Python\Python36\lib\tkinter\__init__.py", line 3545, in __init__
    Image.__init__(self, 'photo', name, cnf, master, **kw)
  File "C:\Users\Ben\AppData\Local\Programs\Python\Python36\lib\tkinter\__init__.py", line 3491, in __init__
    if not name:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Here is the solution:
import cv2
import tkinter as tk
from PIL import Image, ImageTk
import tensorflow as tf
# initializing window and image properties
HEIGHT = 200
WIDTH = 200
IMAGE_HEIGHT = 200
IMAGE_WIDTH = 200
# loading mnist dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
def imageShow(index):
    root = tk.Tk()
    # resizing image into larger image
    img_array = cv2.resize(x_train[index], (IMAGE_HEIGHT,IMAGE_WIDTH), interpolation = cv2.INTER_AREA)
    img = ImageTk.PhotoImage(image=Image.fromarray(img_array))
    canvas = tk.Canvas(root,width=WIDTH,height=HEIGHT)
    canvas.pack()
    canvas.create_image(IMAGE_HEIGHT/2,IMAGE_WIDTH/2, image=img)
    root.mainloop()
imageShow(5)
The dataset is imported from TensorFlow.
I have added an extra step to resize the images.
And the result looks like this
An implementation of a similar task that depends on numpy and TkInter only can be found here:
https://gist.github.com/FilipDominec/14761052f42d80d283bd3adcf7eb5347
It is a "ripple tank simulator" example and I tried to optimize its speed as much as possible.
It also allows for choosing color map, embossing, zooming in and out etc.
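If the images have already been normalised and flattened as in the question (`(x / 255) - 0.5`, then reshaped to 784 values), they need to be converted back to 8-bit pixels before `Image.fromarray` can turn them into a greyscale image. A minimal sketch of that reverse step, using NumPy only; the helper name `to_display_pixels` is an illustrative assumption.

```python
import numpy as np


def to_display_pixels(flat_image, side=28):
    """Undo the (x / 255) - 0.5 scaling and reshape to a uint8 side x side image."""
    restored = (np.asarray(flat_image, dtype=np.float64) + 0.5) * 255.0
    # Round to the nearest integer and clamp to the valid 8-bit range.
    pixels = np.clip(np.rint(restored), 0, 255).astype(np.uint8)
    return pixels.reshape(side, side)
```

The result can then be passed to `ImageTk.PhotoImage(image=Image.fromarray(...))` as in the first answer. The `PhotoImage` should be created after the Tk root window exists, and a reference to it should be kept for as long as it is displayed.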

RuntimeError: size mismatch, m1: [28 x 28], m2: [784 x 128]

After training my model, I tried to plot a graph of the softmax output, but it resulted in the runtime error mentioned in the title.
Here is the code snippet:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import helper
# Test out your network!
dataiter = iter(testloader)
images, labels = dataiter.next()
img = images[1]
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
# Plot the image and probabilities
helper.view_classify(img, ps, version='Fashion')
The problem is with this part (I guess).
img = images[1]
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
Problem: the image you are loading has dimensions 28x28, but the first dimension of the input to the model is generally the batch size. Since there is only one image, you have to make the first dimension have size 1. To do that, use img = img.view((-1,) + img.shape) or img = img.unsqueeze(dim=0). Also, it seems that the first layer's weight is 784 x 128, i.e. the image should be flattened into a vector before being fed to the model. For that, use img = img.view(1, -1).
So, in total, you need to do
img = images[1]
img = img.unsqueeze(dim=0)
img = img.view(1, -1)
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
Or you can use one command instead of two (unsqueeze is unnecessary):
img = images[1]
img = img.view(1, -1)
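The shape arithmetic behind the error can be checked without the trained model. The sketch below uses NumPy arrays as stand-ins for the PyTorch tensors (`reshape` plays the role of `view`), with random values in place of the real image and first-layer weight.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))      # one 28 x 28 image, like m1 in the error
weight = rng.random((784, 128))   # first-layer weight, like m2 in the error

try:
    image @ weight                # (28, 28) @ (784, 128): inner sizes 28 and 784 differ
    mismatch = False
except ValueError:
    mismatch = True

flat = image.reshape(1, -1)       # (1, 784): a batch of one flattened image
hidden = flat @ weight            # (1, 784) @ (784, 128) gives (1, 128)
print(mismatch, flat.shape, hidden.shape)
```

The product only works once the image is a single row of 784 values, which is what `img.view(1, -1)` produces.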

Open CV ValueError: total size of new array must be unchanged

I am new to OpenCV and TensorFlow. I am trying to get a live camera preview and use the live camera feed for TensorFlow prediction. Here is the part of code for live preview and prediction:
image = np.zeros((64, 64, 3))
softmax_pred = tf.nn.softmax(conv_net(x, weights, biases, image_size, 1.0))
cam = cv2.VideoCapture(0)
while True:
    ret_val, img = cam.read()
    img = cv2.flip(img,1)
    cv2.imshow('my webcam',img)
    img = img.resize((64,64))
    image = array(img).reshape(1,64,64,3)
    image.astype(float)
    result = sess.run(softmax_pred, feed_dict={x: image})
I am not sure what's wrong here. I am getting this error:
image = array(img).reshape(1,64,64,3)
ValueError: total size of new array must be unchanged
My tensor placeholder for the image has the shape (?, 64, 64, 3). I did the same for a JPEG image by manually loading it from disk and reshaping it to (1,64,64,3), and it works fine. Here is the code for manually loading an image and then predicting:
img = Image.open('/home/pragyan/Documents/miniProject/PredictImages/IMG_4804.JPG')
img = img.resize((64, 64))
image = array(img).reshape(1,64,64,3)
image.astype(float)
result = sess.run(softmax_pred, feed_dict={x: image})
The above code works, but reshaping a live frame from the webcam gives me this error (ValueError: total size of new array must be unchanged). Is there a way to fix this? I don't understand what is going wrong.
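A likely cause, offered as an assumption since no confirmed answer is given here: the frame returned by `cam.read()` is a NumPy array, not a PIL image. `PIL.Image.resize` returns a new image, but `ndarray.resize` changes the array in place and returns None, so `img` becomes None and `array(img)` holds a single element that cannot be reshaped to 1 x 64 x 64 x 3. With OpenCV the usual replacement is `cv2.resize(img, (64, 64))`. The sketch below uses a NumPy nearest-neighbour stand-in for `cv2.resize` so that it runs without a camera or OpenCV; the frame size is hypothetical.

```python
import numpy as np


def resize_nearest(frame, height, width):
    """Nearest-neighbour resize; a stand-in for cv2.resize(frame, (width, height))."""
    rows = np.arange(height) * frame.shape[0] // height
    cols = np.arange(width) * frame.shape[1] // width
    return frame[rows[:, None], cols]


def frame_to_batch(frame, size=64):
    """Turn an H x W x 3 camera frame into a float32 batch of shape (1, size, size, 3)."""
    small = resize_nearest(frame, size, size)
    # astype returns a new array, so its result has to be kept
    return small.astype(np.float32).reshape(1, size, size, 3)


frame = np.zeros((480, 640, 3), dtype=np.uint8)  # hypothetical webcam frame
# ndarray.resize works in place and returns None, unlike PIL's Image.resize
result_of_resize = frame.copy().resize((64, 64), refcheck=False)
batch = frame_to_batch(frame)
print(result_of_resize, batch.shape)
```

Note also that `image.astype(float)` on its own line discards its result in both snippets of the question; the converted array has to be assigned back.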
