PyTorch model prediction fail for single item

I use PyTorch and transfer learning to train a mobilenet_v2-based classifier. I train with batches of 20 images, and my test accuracy is ~80%.
When I use the model on a single image for an individual prediction, the output is the wrong class.
At the same time, if I take a batch from my test dataset and insert my single image in place of element 0, prediction 0 is the correct class. So the model works for a batch but not for an individual item.
If I just repeat my single image and pass 10 copies of it, the prediction is still wrong. So my prediction accuracy somehow depends on the other items in the batch.
My code for the individual item test:
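import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
# load one batch from test set (20)
dataiter = iter(test_loader)
# next(dataiter) also works on PyTorch versions where dataiter.next() was removed
images, labels = next(dataiter)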
# load image we want get prediction for
img_path = "dataset/Barber50/Barber50-25r.jpg"
image = Image.open(img_path)
image_cropped = transforms.CenterCrop(img_size)(transforms.Resize(img_size)(image))
image_tensor = transforms.Normalize(mean=mean, std=std)(transforms.ToTensor()(image_cropped))
# insert the image into batch from test set
images[0] = image_tensor
# show image
fig = plt.figure(figsize=(5, 5))
inp = images[0].numpy().transpose((1, 2, 0))
inp = std * inp + mean
inp = np.clip(inp, 0, 1)
plt.imshow(inp)
# move data to GPU, switch to eval
if train_on_gpu: images = images.cuda()
model.eval()
# get prediction. PROBLEM HERE
output = model(images[:1]) # predicts wrong class if :1 and correct class if :10
# turn prediction into class label
output = F.softmax(output, dim=1)
_, pred = torch.max(output, 1)
print(classes[pred[0]])

Related

How can I use a function or loop on this resnet50 code to predict the components of multiple images (within a folder), instead of just one?

How can I do this for multiple images (within a folder) and put them into a Dataframe?
This is the code for analysing one image:
import numpy as np
from keras.preprocessing import image
from keras.applications import resnet50
import warnings
warnings.filterwarnings('ignore')
# Load Keras' ResNet50 model that was pre-trained against the ImageNet database
model = resnet50.ResNet50()
# Load the image file, resizing it to 224x224 pixels (required by this model)
img = image.load_img("rgotunechair10.jpg", target_size=(224, 224))
# Convert the image to a numpy array
x = image.img_to_array(img)
# Add a fourth dimension since Keras expects a list of images
x = np.expand_dims(x, axis=0)
# Scale the input image to the range used in the trained network
x = resnet50.preprocess_input(x)
# Run the image through the deep neural network to make a prediction
predictions = model.predict(x)
# Look up the names of the predicted classes. Index zero is the results for the first image.
predicted_classes = resnet50.decode_predictions(predictions, top=9)
image_components = []
for x,y,z in predicted_classes[0]:
    image_components.append(y)
print(image_components)
This is the output:
['desktop_computer', 'desk', 'monitor', 'space_bar', 'computer_keyboard', 'typewriter_keyboard', 'screen', 'notebook', 'television']
How can I do this for multiple images (within a folder) and put them into a Dataframe?

First of all, move the code for analyzing the image to a function. Instead of printing the result, you will return it there:
import warnings
import numpy as np
from keras.preprocessing import image
from keras.applications import resnet50
warnings.filterwarnings('ignore')
# Load Keras' ResNet50 model that was pre-trained against the ImageNet database.
# Loading it once, outside the function, avoids rebuilding it for every image.
model = resnet50.ResNet50()
def run_resnet50(image_name):
    # Load the image file, resizing it to 224x224 pixels (required by this model)
    img = image.load_img(image_name, target_size=(224, 224))
    # Convert the image to a numpy array
    x = image.img_to_array(img)
    # Add a fourth dimension since Keras expects a list of images
    x = np.expand_dims(x, axis=0)
    # Scale the input image to the range used in the trained network
    x = resnet50.preprocess_input(x)
    # Run the image through the deep neural network to make a prediction
    predictions = model.predict(x)
    # Look up the names of the predicted classes. Index zero is the results for the first image.
    predicted_classes = resnet50.decode_predictions(predictions, top=9)
    # Keep only the class names; each entry is (class_id, class_name, score)
    return [name for _, name, _ in predicted_classes[0]]
Then, get all images inside the desired folder (for instance, the current directory):
import os
images_path = '.'
images = [f for f in os.listdir(images_path) if f.endswith('.jpg')]
Run the function on all images, get the result:
result = [run_resnet50(img_name) for img_name in images]
This result will be a list of lists. Then you could just move it to a DataFrame. If you want to keep the image name for each result, use a dictionary instead.
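As a minimal sketch of that last step, assuming pandas is installed: the helper name results_to_dataframe and the sample labels below are made up for illustration, and in practice the dictionary would be filled by calling run_resnet50 on each file name.

```python
import pandas as pd


def results_to_dataframe(results_by_image):
    """Turn {image_name: [label, label, ...]} into a DataFrame.

    One row per image; columns label_1..label_n hold the predicted
    class names in the order they were returned.
    """
    df = pd.DataFrame.from_dict(results_by_image, orient="index")
    df.columns = [f"label_{i + 1}" for i in range(df.shape[1])]
    df.index.name = "image"
    return df


# Hypothetical predictions; in practice something like:
# results_by_image = {name: run_resnet50(name) for name in images}
results_by_image = {
    "chair.jpg": ["desk", "monitor", "screen"],
    "mug.jpg": ["coffee_mug", "cup", "teapot"],
}
df = results_to_dataframe(results_by_image)
print(df)
```

Keying the dictionary by file name keeps each row traceable to its image, which a plain list of lists would lose.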

How to change output of a layer in a pre-trained CNN model in Keras?

I am running VGG16 in Keras for image classification as follows:
model = VGG16()
image = load_img('mug.jpg', target_size=(224, 224))
image = img_to_array(image)
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
image = preprocess_input(image)
yhat = model.predict(image)
label = decode_predictions(yhat)
label = label[0][0]
# print the output
print('%s (%.2f%%)' % (label[1], label[2]*100))
Now I want to view the output of the first layer and change it/add noise to it and see how the classification changes. I am not sure how to do this and could not find any suitable resources that matched my query.
I am new to Keras, so any help on this aspect will be highly appreciated. Thank You!

The output of any layer can be obtained by
model.layers[index].output
so in your case you can do
outputlayer1 = model.layers[0].output
outputlayer1 += noise
Later, to do a forward pass, you can iterate over the remaining layers and call each one in turn. For the forward pass, refer to the call function described at https://keras.io/api/layers/base_layer/
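The idea can be illustrated without Keras. The following is a framework-free sketch, not the Keras API: a toy two-layer network with random weights (all names and sizes are made up) where the first layer's output is computed, perturbed with Gaussian noise, and then fed through the remaining layer to see whether the predicted class changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 8 inputs -> 16 hidden units (ReLU) -> 3 classes (softmax).
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)


def first_layer(x):
    """Output of the layer we want to inspect and perturb."""
    return np.maximum(x @ W1 + b1, 0.0)


def remaining_layers(h):
    """Everything after the first layer: a dense layer plus softmax."""
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def predict_with_noise(x, noise_std):
    """Forward pass with Gaussian noise added to the first layer's output."""
    h = first_layer(x)
    h_noisy = h + rng.normal(scale=noise_std, size=h.shape)
    return remaining_layers(h_noisy)


x = rng.normal(size=(1, 8))
clean = remaining_layers(first_layer(x))
noisy = predict_with_noise(x, noise_std=0.5)
print("clean class:", clean.argmax(axis=1)[0])
print("noisy class:", noisy.argmax(axis=1)[0])
```

With a real Keras model the two halves would correspond to the layers before and after the one being perturbed.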

Improve real-life results of neural network trained with mnist dataset

I've built a neural network with keras using the mnist dataset and now I'm trying to use it on photos of actual handwritten digits. Of course I don't expect the results to be perfect but the results I currently get have a lot of room for improvement.
For starters I test it with some photos of individual digits written in my clearest handwriting. They are square and they have the same dimensions and color as the images in the mnist dataset. They are saved in a folder called individual_test like this for example: 7(2)_digit.jpg.
The network often is terribly sure of the wrong result which I'll give you an example for:
The results I get for this picture are the following:
result: 3 . probabilities: [1.9963557196245318e-10, 7.241294497362105e-07, 0.02658148668706417, 0.9726449251174927, 2.5416460047722467e-08, 2.6078915027483163e-08, 0.00019745019380934536, 4.8302300825753264e-08, 0.0005754049634560943, 2.8358477788259506e-09]
So the network is 97% sure this is a 3 and this picture is by far not the only case. Out of 38 pictures only 16 were correctly recognised. What shocks me is the fact that the network is so sure of its result although it couldn't be farther from the correct result.
EDIT
After adding a threshold to prepare_image (img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]) the performance has slightly improved. It now gets 19 out of 38 pictures right but for some images including the one shown above it still is pretty sure of the wrong result. This is what I get now:
result: 3 . probabilities: [1.0909866760000497e-11, 1.1584616004256532e-06, 0.27739930152893066, 0.7221096158027649, 1.900260038212309e-08, 6.555900711191498e-08, 4.479645940591581e-05, 6.455550760620099e-07, 0.0004443934594746679, 1.0013242457418414e-09]
So it now is only 72% sure of its result which is better but still ...
What can I do to improve the performance? Can I prepare my images better? Or should I add my own images to the training data? And if so, how would I do such a thing?
EDIT
This is what the picture displayed above looks like after applying prepare_image to it:
After using threshold this is what the same picture looks like:
In comparison: This is one of the pictures provided by the mnist dataset:
They look fairly similar to me. How can I improve this?
Here's my code (including threshold):
# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from keras.utils import np_utils
# numpy is necessary since keras uses numpy arrays
import numpy as np
# imports for pictures
import matplotlib.pyplot as plt
import PIL
import cv2
# imports for tests
import random
import os
class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = np_utils.to_categorical(y_train)
        y_test = np_utils.to_categorical(y_test)
        num_classes = y_test.shape[1]
        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # Compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test
    def predict_result(self, img, show = False):
        """ predicts the number in a picture (vector) """
        assert type(img) == np.ndarray and img.shape == (784,)
        if show:
            img = img.reshape((28, 28))
            # show the picture
            plt.imshow(img, cmap='Greys')
            plt.show()
            img = img.reshape(img.shape[0] * img.shape[1])
        num_pixels = img.shape[0]
        # the actual number
        res_number = np.argmax(self.model.predict(img.reshape(-1,num_pixels)), axis = 1)
        # the probabilities
        res_probabilities = self.model.predict(img.reshape(-1,num_pixels))
        return (res_number[0], res_probabilities.tolist()[0]) # we only need the first element since they only have one
    def prepare_image(self, img, show = False):
        """ prepares the partial images used in partial_img_rec by transforming them
        into numpy arrays that the network will be able to process """
        # convert to greyscale
        img = img.convert("L")
        # rescale image to 28 *28 dimension
        img = img.resize((28,28), PIL.Image.ANTIALIAS)
        # inverse colors since the training images have a black background
        #img = PIL.ImageOps.invert(img)
        # transform to vector
        img = np.asarray(img, "float32")
        img = img / 255.
        img[img < 0.5] = 0.
        img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]
        if show:
            plt.imshow(img, cmap = "Greys")
        # flatten image to 28*28 = 784 vector
        num_pixels = img.shape[0] * img.shape[1]
        img = img.reshape(num_pixels)
        return img
    def partial_img_rec(self, image, upper_left, lower_right, results=[], show = False):
        """ partial is a part of an image """
        left_x, left_y = upper_left
        right_x, right_y = lower_right
        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results
        partial = image.crop((left_x, left_y, right_x, right_y))
        if show:
            partial.show()
        partial = self.prepare_image(partial)
        step = height // 10
        # is there a number in this part of the image?
        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is at least 50% sure
        if prop[res] >= 0.5:
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height)
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if there is no number found we take smaller steps
            step = height // 20
        print("step: ", step)
        # recursive call with modified positions ( move on step variables )
        return self.partial_img_rec(image, (left_x + step, left_y), (right_x + step, right_y), results = results)
    def individual_digits(self, img):
        """ uses partial_img_rec to predict individual digits in square images """
        assert type(img) == PIL.JpegImagePlugin.JpegImageFile or type(img) == PIL.PngImagePlugin.PngImageFile or type(img) == PIL.Image.Image
        return self.partial_img_rec(img, (0,0), (img.size[0], img.size[1]), results=[])
    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (shape: square)
        saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir(".\\individual_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            image = PIL.Image.open(".\\individual_test\\" + imageName).convert("L")
            # only square images in this test
            if image.size[0] != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size,". It has to be a square.")
                continue
            predicted_res = self.individual_digits(image)
            if predicted_res == []:
                print("No prediction possible for ", imageName)
            else:
                predicted_res = predicted_res[0]
            if predicted_res != correct_res:
                print("error in partial_img-rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                cnt_wrong += 1
            else:
                cnt_right += 1
                print("correctly predicted ",imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong," digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100," %.")
    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """
        #assert type(img) == myImage
        width, height = img.size
        # start with the first square part of the image
        res_list = self.partial_img_rec(img, (0,0),(height ,height), results = [])
        res_str = ""
        for elem in res_list:
            res_str += str(elem)
        return res_str
    def test_multiple_digits(self):
        """ tests the function 'multiple_digits' using some images saved in the folder 'multi_test'.
        These images contain multiple handwritten digits without much whitespace surrounding them.
        The correct solutions are saved in the files' names followed by the character '_'. """
        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir(".\\multi_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            image = PIL.Image.open(".\\multi_test\\" + imageName).convert("L")
            correct_res = imageName.split("_")[0]
            predicted_res = self.multiple_digits(image)
            if correct_res == predicted_res:
                cnt_right += 1
            else:
                cnt_wrong += 1
                print("Error in multiple_digits! The network predicted ", predicted_res, " but the correct result would have been ", correct_res)
        print("The network predicted correctly ", cnt_right, " out of ", cnt_right + cnt_wrong, " pictures. That's a success rate of ", cnt_right / (cnt_right + cnt_wrong) * 100, "%.")
network = mnist_network()
# this is the image shown above
result = network.individual_digits(PIL.Image.open(".\\individual_test\\7(2)_digit.jpg"))

Update:
You have three options to achieve better performance in this particular task:
Use a convolutional network, as it performs better on tasks with spatial data such as images and generalizes better than a classifier like this one.
Create or generate more pictures of your own kind of digits and train your network on them so that it can learn them too.
Preprocess your images so that they are better aligned with the original MNIST images on which you trained your network.
I've just run an experiment. I looked at the MNIST images for each digit, then took your images and applied the preprocessing I proposed to you earlier:
1. I applied a threshold, but only downwards, to eliminate the background noise, because the original MNIST data has a minimal threshold only for the blank background:
image[image < 0.1] = 0.
2. Surprisingly, the size of the number inside the image proved to be crucial, so I scaled the number down inside the 28 x 28 image, i.e. there is more padding around the number.
3. I inverted the images, since the MNIST data from Keras is inverted as well.
image = ImageOps.invert(image)
4. Finally, I scaled the data the same way as for training:
image = image / 255.
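Steps 1 and 2 can be sketched with plain NumPy. This is only an illustration under assumptions: the function name, the 20-pixel target box (the size MNIST digits are commonly described as being normalized to), and the nearest-neighbour resize are my choices, not taken from the experiment above. The input is expected to be already inverted and scaled to [0, 1].

```python
import numpy as np


def fit_digit_to_canvas(img, box=20, canvas=28, noise_floor=0.1):
    """Rescale the digit to fit a box x box area and pad it to canvas x canvas.

    `img` is a 2-D float array in [0, 1] with a bright digit on a dark
    background (i.e. already inverted and scaled).
    """
    img = img.copy()
    img[img < noise_floor] = 0.0  # step 1: drop faint background noise

    # Bounding box of everything that is not background.
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    if rows.size == 0:  # blank image: nothing to fit
        return np.zeros((canvas, canvas), dtype=img.dtype)
    digit = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # step 2: nearest-neighbour resize so the longer side equals `box`
    h, w = digit.shape
    scale = box / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    row_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    col_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = digit[np.ix_(row_idx, col_idx)]

    # Paste into the middle of an empty canvas, which adds the padding.
    out = np.zeros((canvas, canvas), dtype=img.dtype)
    top, left = (canvas - new_h) // 2, (canvas - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out


# A digit-like blob that fills the whole 28x28 frame (no padding at all).
oversized = np.ones((28, 28), dtype=np.float32)
fitted = fit_digit_to_canvas(oversized)
print(fitted.shape, int(fitted.sum()))
```

A smoother interpolation (for example from PIL or OpenCV) would likely give results closer to MNIST than nearest-neighbour; it is used here only to keep the sketch dependency-free.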
After the preprocessing I trained the model on the MNIST dataset with the parameters epochs=12, batch_size=200, and got these results:
Result: 1 with probabilities: 0.6844741106033325
result: 1 . probabilities: [2.0584749904628552e-07, 0.9875971674919128, 5.821426839247579e-06, 4.979299319529673e-07, 0.012240586802363396, 1.1566483948399764e-07, 2.382085284580171e-08, 0.00013023221981711686, 9.620113416985987e-08, 2.5273093342548236e-05]
Result: 6 with probabilities: 0.9221984148025513
result: 6 . probabilities: [9.130864782491699e-05, 1.8290626258021803e-07, 0.00020504613348748535, 2.1564576968557958e-07, 0.0002401985548203811, 0.04510130733251572, 0.9221984148025513, 1.9014490248991933e-07, 0.03216308355331421, 3.323434683011328e-08]
Result: 7 with probabilities: 0.7105212807655334
Note:
result: 7 . probabilities: [1.0372193770535887e-08, 7.988557626958936e-06, 0.00031014863634482026, 0.0056108818389475346, 2.434678014751057e-09, 3.2280522077599016e-07, 1.4190952857262573e-09, 0.9940618872642517, 1.612859932720312e-06, 7.102244126144797e-06]
Your number 9 was a bit tricky:
As I figured out, the model trained on the MNIST dataset picked up two main "features" of a 9: the upper part and the lower part. An upper part with a nice round shape, as in your image, is not a 9 but mostly a 3 for a model trained on MNIST. The lower part of a 9 is mostly a straightened curve in the MNIST dataset. So basically your perfectly shaped 9 is always a 3 for your model because of the MNIST samples, unless you retrain the model with a sufficient number of samples of your kind of 9. To check my reasoning I ran a sub-experiment with 9s:
My 9 with skewed upper parts (mostly OK for 9 as per MNIST) but with slightly curly bottom (Is not OK for 9 as per MNIST):
Result: 9 with probabilities: 0.5365301370620728
My 9 with skewed upper parts (mostly OK for 9 as per MNIST) and with straight bottom (Is OK for 9 as per MNIST):
Result: 9 with probabilities: 0.923724353313446
Your 9 with the misinterpreted shape properties:
Result: 3 with probabilities: 0.8158268928527832
result: 3 . probabilities: [9.367801249027252e-05, 3.9978775021154433e-05, 0.0001467708352720365, 0.8158268928527832, 0.0005801069783046842, 0.04391581565141678, 6.44062723154093e-08, 7.099170943547506e-06, 0.09051419794559479, 0.048875387758016586]
Finally, just a proof of the importance of image scaling (padding), which I described as crucial above:
Result: 3 with probabilities: 0.9845736622810364
Result: 9 with probabilities: 0.923724353313446
So we can see that our model picked up some features that it always classifies as 3 when the shape inside the image is oversized, with little padding.
I think we can get better performance with a CNN, but the way of sampling and preprocessing is always crucial for getting the best performance in an ML task.
I hope it helps.
Update 2:
I found another issue, which I checked and confirmed: the placement of the number inside the image is crucial as well, which makes sense for this type of NN. A good example is the numbers 7 and 9, which are placed off center in the MNIST dataset, near the bottom of the image; this resulted in harder or false classification when the new number to classify was placed in the center of the image. I tested the theory by shifting the 7s and 9s toward the bottom, leaving more space at the top of the image, and the result was almost 100% accuracy.
As this is a spatial type of problem, I guess that a CNN could eliminate it more effectively. However, it would be better if MNIST were aligned to the center, or we can do the alignment programmatically to avoid the issue.
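One hedged way to do the alignment programmatically is to shift each image so that its intensity center of mass lands in the middle of the frame, which is how the MNIST images are commonly described as having been centered. With this rule, top-heavy digits such as 7 and 9 end up shifted toward the bottom. The function name below is made up, and the wrap-around of np.roll is only safe because the border is assumed to be empty.

```python
import numpy as np


def center_by_mass(img):
    """Shift a 2-D image so its intensity center of mass sits at the middle."""
    total = img.sum()
    if total == 0:
        return img.copy()  # blank image: nothing to align
    h, w = img.shape
    cy = (img.sum(axis=1) * np.arange(h)).sum() / total
    cx = (img.sum(axis=0) * np.arange(w)).sum() / total
    shift_y = int(round((h - 1) / 2 - cy))
    shift_x = int(round((w - 1) / 2 - cx))
    # np.roll wraps around; that is harmless as long as the border is empty.
    return np.roll(img, (shift_y, shift_x), axis=(0, 1))


# A small bright square in the top-left corner of an empty 28x28 image.
img = np.zeros((28, 28))
img[2:8, 3:9] = 1.0
centered = center_by_mass(img)
print(np.argwhere(centered > 0).min(axis=0), np.argwhere(centered > 0).max(axis=0))
```

The same function would be applied to every new image before flattening it for the network, so that test images and training images follow the same placement rule.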

What was your test score on the MNIST dataset?
One thing that comes to mind is that your images are missing thresholding.
Thresholding is a technique where pixel values below a certain value are set to zero. See the OpenCV thresholding examples. You probably need to use inverse thresholding and check your results again.
Do let me know if there is some progress.
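For reference, inverse binary thresholding can be written in a few lines of NumPy. This sketch mirrors the rule used by OpenCV's THRESH_BINARY_INV mode (pixels above the threshold become 0, all others become the maximum value); the function name and the sample values are illustrative only.

```python
import numpy as np


def inverse_binary_threshold(img, thresh, max_value=1.0):
    """Pixels above `thresh` become 0; all other pixels become `max_value`."""
    return np.where(img > thresh, 0.0, max_value)


# Dark ink (low values) on light paper (high values), scaled to [0, 1].
scan = np.array([[0.95, 0.90, 0.20],
                 [0.85, 0.10, 0.15],
                 [0.92, 0.88, 0.91]])
binary = inverse_binary_threshold(scan, thresh=0.5)
print(binary)
```

The result is a bright digit on a black background, which is the polarity of the MNIST training images.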

The main problem you have is that the images you are testing are different from the MNIST images, probably due to the image preparation you have done. Can you show one of the images you are testing with after applying prepare_image to it?

tensorflow input pipeline returns multiple values

I'm trying to make an input pipeline in TensorFlow for image classification, so I want to make batches of images and corresponding labels. The TensorFlow documentation suggests that we can use tf.train.batch to make batches of inputs:
train_batch, train_label_batch = tf.train.batch(
    [train_image, train_image_label],
    batch_size=batch_size,
    num_threads=1,
    capacity=10*batch_size,
    enqueue_many=False,
    shapes=[[224,224,3], [len(labels),]],
    allow_smaller_final_batch=True
)
However, I'm wondering whether it would be a problem if I feed them into the graph like this:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=Model(train_batch)))
The question is: does the operation in the cost function dequeue the images and their corresponding labels together, or does it return them separately, so that training would use mismatched images and labels?

There are several things you need to consider to preserve the ordering of images and labels.
Let's say we need a function that gives us images and labels.
def _get_test_images(_train=False):
    """
    Gets the test images and labels as a batch
    Inputs:
    ======
    _train : Boolean if images are from training set
    random_crop : Boolean if random cropping is allowed
    random_flip : Boolean if random horizontal flip is allowed
    distortion : Boolean if distortions are allowed
    Outputs:
    ========
    images_batch : Batch of images containing BATCH_SIZE images at a time
    label_batch : Batch of labels corresponding to the images in images_batch
    idx : Batch of indexes of images
    """
    #get images and labels
    _,_img_names,_img_class,index= _get_list(_train = _train)
    #total number of distinct images used for train will be equal to the images
    #fed in tf.train.slice_input_producer as _img_names
    img_path,label,idx = tf.train.slice_input_producer([_img_names,_img_class,index],shuffle=False)
    img_path,label,idx = tf.convert_to_tensor(img_path),tf.convert_to_tensor(label),tf.convert_to_tensor(idx)
    img_path = tf.cast(img_path,dtype=tf.string)
    #read file
    image_file = tf.read_file(img_path)
    #decode jpeg/png/bmp
    #tf.image.decode_image won't give shape out. So it will give error while resizing
    image = tf.image.decode_jpeg(image_file)
    #image preprocessing
    image = tf.image.resize_images(image, [IMG_DIM,IMG_DIM])
    float_image = tf.cast(image,dtype=tf.float32)
    #subtracting mean and divide by standard deviation
    float_image = tf.image.per_image_standardization(float_image)
    #set the shape
    float_image.set_shape(IMG_SIZE)
    labels_original = tf.cast(label,dtype=tf.int32)
    img_index = tf.cast(idx,dtype=tf.int32)
    #parameters for shuffle
    batch_size = BATCH_SIZE
    min_fraction_of_examples_in_queue = 0.3
    num_preprocess_threads = 1
    num_examples_per_epoch = MAX_TEST_EXAMPLE
    min_queue_examples = int(num_examples_per_epoch *
                             min_fraction_of_examples_in_queue)
    images_batch, label_batch,idx = tf.train.batch(
        [float_image,label,img_index],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)
    # Display the training images in the visualizer.
    tf.summary.image('images', images_batch)
    return images_batch, label_batch,idx
Here, tf.train.slice_input_producer([_img_names,_img_class,index],shuffle=False) is an interesting thing to look at: if you set shuffle=True, it will shuffle all three arrays in coordination.
The second thing is num_preprocess_threads. As long as you use a single thread for the dequeue operation, batches will come out in a deterministic way, but more than one thread will shuffle the arrays randomly. For example, for image 0001.jpg whose true label is 1 you might get 2 or 4. Once it is dequeued, it is in tensor form, and tf.nn.softmax_cross_entropy_with_logits shouldn't have a problem with such tensors.

Inception: How to process image to use with Inception

I want to make tensorflow's inception v3 to give out tags for an image. My goal is to convert a JPEG image to input that is accepted by inception neural network. I don't know how to process the images first so that it can run with Google Inception's v3 model. The original tensorflow project is here:
https://github.com/tensorflow/models/tree/master/inception
Originally, all the images are in a dataset and the entire dataset is first passed to input() or distorted_inputs() in ImageProcessing.py . The images in dataset are processed and passed to the train() or eval() methods (both of these work). The problem is I want a function to print out tags for one specific image (not dataset).
Below is the code for inference function that is used to generate tag with google inception. inceptionv4 function is a convolutional neural network implemented in tensorflow.
def inference(images, num_classes, for_training=False, restore_logits=True,
              scope=None):
    """Build Inception v3 model architecture.
    See here for reference: http://arxiv.org/abs/1512.00567
    Args:
      images: Images returned from inputs() or distorted_inputs().
      num_classes: number of classes
      for_training: If set to `True`, build the inference model for training.
        Kernels that operate differently for inference during training
        e.g. dropout, are appropriately configured.
      restore_logits: whether or not the logits layers should be restored.
        Useful for fine-tuning a model with different num_classes.
      scope: optional prefix string identifying the ImageNet tower.
    Returns:
      Logits. 2-D float Tensor.
      Auxiliary Logits. 2-D float Tensor of side-head. Used for training only.
    """
    # Parameters for BatchNorm.
    batch_norm_params = {
        # Decay for the moving averages.
        'decay': BATCHNORM_MOVING_AVERAGE_DECAY,
        # epsilon to prevent 0s in variance.
        'epsilon': 0.001,
    }
    # Set weight_decay for weights in Conv and FC layers.
    with slim.arg_scope([slim.ops.conv2d, slim.ops.fc], weight_decay=0.00004):
        with slim.arg_scope([slim.ops.conv2d],
                            stddev=0.1,
                            activation=tf.nn.relu,
                            batch_norm_params=batch_norm_params):
            logits, endpoints = inception_v4(
                images,
                dropout_keep_prob=0.8,
                num_classes=num_classes,
                is_training=for_training,
                scope=scope)
    # Add summaries for viewing model statistics on TensorBoard.
    _activation_summaries(endpoints)
    # Grab the logits associated with the side head. Employed during training.
    auxiliary_logits = endpoints['AuxLogits']
    return logits, auxiliary_logits
This is my attempt to process the image before it is passed to inference function.
def process_image(self, image_path):
    filename_queue = tf.train.string_input_producer(image_path)
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    img = tf.image.decode_jpeg(value)
    height = self.image_size
    width = self.image_size
    image_data = tf.cast(img, tf.float32)
    image_data = tf.reshape(image_data, shape=[1, height, width, 3])
    return image_data
I wanted to process an image file simply so that I can pass it to the inference function. And that inference prints out the tags. The above code didn't work and printed error:
ValueError: Shape () must have rank at least 1
I would appreciate it if anyone could provide any insight into this problem.

Inception just needs (299,299,3) images with inputs scaled between -1 and 1. See the code below. I just convert the images with this and put them in a TFRecord (and then a queue) to run my stuff.
import numpy as np
from PIL import Image
def load_image(self, image_path):
    img = Image.open(image_path)
    # Resize to 299x299 and force three RGB channels
    new_img = img.resize((299, 299), Image.BILINEAR).convert("RGB")
    # Array of shape (299, 299, 3) with values in [0, 255]
    data = np.asarray(new_img, dtype=np.float32)
    # Scale pixel values from [0, 255] to [-1, 1]
    return 2 * (data / 255) - 1
