How to resize image regions for CNN? - python

I am using AlexNet for object recognition. I trained my model on images of size (277,277). I then used the Selective Search algorithm to extract regions from an image and fed those regions to the network for testing/prediction.
However, when I resize the image regions (from Selective Search), I get an error.
Code for resizing training images:
try:
    img_array = cv2.imread(os.path.join(path, img))
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
    gray_img = cv2.cvtColor(new_array, cv2.COLOR_BGR2GRAY)
    training_data.append([gray_img, class_num])
except Exception as e:
    pass  # handler body not shown in the original post
Code for resizing the selected image regions:
img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.4, min_size=10)
for r in regions:
    x, y, w, h = r['rect']
    segment = img[y:y + h, x:x + w]
    gray_img = cv2.resize(segment, (277, 277))
    gray_img = cv2.cvtColor(gray_img, cv2.COLOR_BGR2GRAY)
    gray_img = np.array(gray_img).reshape(-1, 277, 277, 1)
    gray_img = gray_img / 255.0
    prediction = model.predict(gray_img)
It fails on the last line, i.e.:
prediction = model.predict(gray_img)
and the error is:
Error: Error when checking input: expected conv2d_1_input to have shape (227, 227, 1) but got array with shape (277, 277, 1)
Both shapes look the same to me, so why does it raise this error?

Your model is expecting a tensor as input, but you are trying to evaluate it on a NumPy array. Instead, use a placeholder of the given shape and feed your array into that placeholder in a session.
# define a placeholder for input
image = tf.placeholder(dtype=tf.float32, name="image", shape=[277,277,1])
prediction = model.predict(image)
# evaluate each of your resized images in a session
with tf.Session() as sess:
    for r in regions:
        x, y, w, h = r['rect']
        # rest of your code from the loop here
        gray_img = gray_img / 255.
        p = sess.run(prediction, feed_dict={image: gray_img})
        print(p)  # to print the prediction of your model for this image
Maybe you should take a look at this question: What's the difference between tf.placeholder and tf.Variable?
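It may also be worth rereading the error message in the question: it reports an expected shape of (227, 227, 1) and a received shape of (277, 277, 1), which are not the same numbers. The sketch below assumes the network really was built for 227x227 inputs, as the message suggests. It derives the resize target from the model's input shape instead of hard-coding it. The tuple `expected` stands in for `model.input_shape`, and both helper names are hypothetical.

```python
import numpy as np

def target_size(input_shape):
    """Return (width, height) for cv2.resize from a Keras-style
    input shape such as (None, 227, 227, 1)."""
    _, height, width, _ = input_shape
    return (width, height)

def matches_model(batch, input_shape):
    """True when every non-batch dimension of `batch` equals the model's."""
    return tuple(batch.shape[1:]) == tuple(input_shape[1:])

# Stand-in for model.input_shape; the numbers come from the error message.
expected = (None, 227, 227, 1)

wrong = np.zeros((1, 277, 277, 1), dtype=np.float32)      # what the loop produced
right = np.zeros((1,) + expected[1:], dtype=np.float32)   # what the model accepts

print(target_size(expected))           # (227, 227)
print(matches_model(wrong, expected))  # False
print(matches_model(right, expected))  # True
```

In the region loop this would mean calling `cv2.resize(segment, target_size(model.input_shape))` and reshaping with the same height and width.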


Grad Cam outputs for all the images are the same

I am using Grad-CAM to see which regions of the test images are most important for the predictions of ResNet50. The output I get is not what I expect.
Code snippets:
from tensorflow.keras.models import Model
import tensorflow as tf
import numpy as np
import cv2
class GradCAM:
    def __init__(self, model, classIdx, layerName=None):
        # store the model, the class index used to measure the class
        # activation map, and the layer to be used when visualizing
        # the class activation map
        self.model = model
        self.classIdx = classIdx
        self.layerName = layerName
        # if the layer name is None, attempt to automatically find
        # the target output layer
        if self.layerName is None:
            self.layerName = self.find_target_layer()
    def find_target_layer(self):
        # attempt to find the final convolutional layer in the network
        # by looping over the layers of the network in reverse order
        for layer in reversed(self.model.layers):
            # check to see if the layer has a 4D output
            if len(layer.output_shape) == 4:
                return layer.name
        # otherwise, we could not find a 4D layer so the GradCAM
        # algorithm cannot be applied
        raise ValueError("Could not find 4D layer. Cannot apply GradCAM.")
    def compute_heatmap(self, image, eps=1e-8):
        # construct our gradient model by supplying (1) the inputs
        # to our pre-trained model, (2) the output of the (presumably)
        # final 4D layer in the network, and (3) the output of the
        # softmax activations from the model
        gradModel = Model(
            inputs=[self.model.inputs],
            outputs=[self.model.get_layer(self.layerName).output, self.model.output])
        # record operations for automatic differentiation
        with tf.GradientTape() as tape:
            # cast the image tensor to a float-32 data type, pass the
            # image through the gradient model, and grab the loss
            # associated with the specific class index
            inputs = tf.cast(image, tf.float32)
            (convOutputs, predictions) = gradModel(inputs)
            loss = predictions[:, tf.argmax(predictions[0])]
        # use automatic differentiation to compute the gradients
        grads = tape.gradient(loss, convOutputs)
        # compute the guided gradients
        castConvOutputs = tf.cast(convOutputs > 0, "float32")
        castGrads = tf.cast(grads > 0, "float32")
        guidedGrads = castConvOutputs * castGrads * grads
        # the convolution and guided gradients have a batch dimension
        # (which we don't need) so let's grab the volume itself and
        # discard the batch
        convOutputs = convOutputs[0]
        guidedGrads = guidedGrads[0]
        # compute the average of the gradient values, and using them
        # as weights, compute the ponderation of the filters with
        # respect to the weights
        weights = tf.reduce_mean(guidedGrads, axis=(0, 1))
        cam = tf.reduce_sum(tf.multiply(weights, convOutputs), axis=-1)
        # grab the spatial dimensions of the input image and resize
        # the output class activation map to match the input image
        # dimensions
        (w, h) = (image.shape[2], image.shape[1])
        heatmap = cv2.resize(cam.numpy(), (w, h))
        # normalize the heatmap such that all values lie in the range
        # [0, 1], scale the resulting values to the range [0, 255],
        # and then convert to an unsigned 8-bit integer
        numer = heatmap - np.min(heatmap)
        denom = (heatmap.max() - heatmap.min()) + eps
        heatmap = numer / denom
        heatmap = (heatmap * 255).astype("uint8")
        # return the resulting heatmap to the calling function
        return heatmap
    # the default colormap was truncated in the original post;
    # cv2.COLORMAP_VIRIDIS is an assumption
    def overlay_heatmap(self, heatmap, image, alpha=0.5,
                        colormap=cv2.COLORMAP_VIRIDIS):
        # apply the supplied color map to the heatmap and then
        # overlay the heatmap on the input image
        heatmap = cv2.applyColorMap(heatmap, colormap)
        output = cv2.addWeighted(image, alpha, heatmap, 1 - alpha, 0)
        # return a 2-tuple of the color mapped heatmap and the output,
        # overlaid image
        return (heatmap, output)
Code snippet for visualising the heatmap:
import random
import matplotlib.pyplot as plt
num_images = 5
random_indices = random.sample(range(len(X_test)), num_images)
for idx in random_indices:
    image = X_test[idx]  # assuming the image array is the first element in the tuple
    # print(image)
    # image = cv2.resize(image, (224, 224))
    image1 = image.astype('float32') / 255
    image1 = np.expand_dims(image1, axis=0)
    preds = model.predict(image1)
    i = np.argmax(preds[0])
    icam = GradCAM(model, i, 'conv5_block3_out')
    heatmap = icam.compute_heatmap(image1)
    heatmap = cv2.resize(heatmap, (224, 224))
    (heatmap, output) = icam.overlay_heatmap(heatmap, image, alpha=0.5)
    fig, ax = plt.subplots(1, 3)
The output:
The problem is that, although the original images in the output are different, the heatmaps and the Grad-CAM overlays are the same for every image. I don't know the reason for this.
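To check intermediate values by hand, the weighting step inside compute_heatmap can be sketched in plain NumPy. This is a simplified illustration, not the class above: it assumes the feature map and its gradients are already available as (H, W, C) arrays, it skips the guided-gradient masking, and the function name `grad_cam_map` is hypothetical.

```python
import numpy as np

def grad_cam_map(conv_outputs, grads, eps=1e-8):
    """Channel-weighted sum of a feature map, normalised to [0, 1].

    conv_outputs, grads: arrays of shape (H, W, C), batch axis removed.
    """
    # one weight per channel: the spatial mean of that channel's gradient
    weights = grads.mean(axis=(0, 1))              # shape (C,)
    # weight every channel and collapse the channel axis
    cam = (conv_outputs * weights).sum(axis=-1)    # shape (H, W)
    return (cam - cam.min()) / (cam.max() - cam.min() + eps)

# Toy input: channel 0 fires at the top-left pixel, channel 1 at the bottom-right.
conv = np.zeros((2, 2, 2), dtype=np.float32)
conv[0, 0, 0] = 1.0
conv[1, 1, 1] = 1.0
# Gradients say that only channel 0 matters for the chosen class.
grads = np.zeros((2, 2, 2), dtype=np.float32)
grads[..., 0] = 1.0

cam = grad_cam_map(conv, grads)
print(cam.shape)  # (2, 2)
print(cam)        # the largest value sits at the top-left pixel
```

Because the map depends on both the feature map and the gradients, printing `convOutputs` and `grads` for two different test images is one way to see which of the two stops changing when the heatmaps come out identical.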

Negative confidences in TFlite inference

I trained my own TFLite classification model with 3 classes following this tutorial and am now trying to test it by applying it to a video feed. Here is my inference code:
import cv2
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image
import tensorflow.lite as tflite
Model_Path = "/path/to/model.tflite"
labels = ["class1", "class2", "class3"]
## Load tflite model and allocate tensors
interpreter = tflite.Interpreter(model_path=Model_Path)
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]["shape"]
vid_file = "/path/to/video.mp4"
# Create a VideoCapture object and read from input file
cap = cv2.VideoCapture(vid_file)
while cap.isOpened():
    _, frame = cap.read()
    cv_image = preprocess(frame)
    ## Converting image into tensor
    image = np.array(cv_image, dtype=np.float32)
    input_tensor = np.array(np.expand_dims(image, 0))
    interpreter.set_tensor(input_details[0]["index"], input_tensor)
    interpreter.invoke()
    output_details = interpreter.get_output_details()
    output_data = interpreter.get_tensor(output_details[0]["index"])
    pred = np.squeeze((output_data))
    classi = np.argmax(pred)
    # write prediction in the corner
    # (the text, font, scale and thickness arguments were lost from the
    # original post; the values used here are assumptions)
    cv2.putText(frame, labels[classi], (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
    cv2.namedWindow("cv_image", cv2.WINDOW_NORMAL)
    cv2.imshow("cv_image", frame)
    ## Use p to pause the video and use q to terminate the program
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        break
    elif key == ord("p"):
        cv2.waitKey(-1)  # assumed: block until the next key press
with preprocess() defined as:
def preprocess(image):
    # ... some image cropping, just as for training data ...
    # resize image to 224x224
    image = cv2.resize(image, (224, 224))
    new_img = image.astype(np.float32)
    new_img /= 255.0
    return image
The prediction seems to be okay using argmax, but if I look at the confidence values, they are all negative (most of the time):
[-2.3782427 -1.6677225 -3.0637422]
[-2.4214256 -1.2143787 -3.4843316]
[-1.6566806 -2.1574929 -3.1999807]
[-1.9782547 -2.7043173 -2.0971687]
This is quite problematic, because on one hand it makes me doubt that everything works really as it should, and on the other I cannot have any post-processing logic to rule out false positives (like 2 classes with more than 50% or so).
Does anyone know what the issue could be? Previously I made the mistake of not normalising the image in preprocessing the way it was done in training. Could there still be a difference that I am not seeing?
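Two things may be worth checking here, offered as guesses rather than a confirmed diagnosis. First, preprocess() as posted returns image rather than the normalised new_img, so the division by 255 never reaches the model. Second, negative values are what raw logits look like when an exported model has no final softmax layer; whether that applies to this model is an assumption. Under that assumption, a minimal sketch of turning one of the printed vectors into probabilities (the helper name `softmax` is ours, not part of the TFLite API):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()          # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

# First output vector from the question.
logits = [-2.3782427, -1.6677225, -3.0637422]
probs = softmax(logits)

print(probs)                 # three values between 0 and 1
print(int(probs.argmax()))   # 1, the same class that argmax picks on the logits
```

Softmax preserves the ordering of the scores, so the argmax prediction stays the same while the values become usable for thresholds such as "more than 50%".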

ValueError: could not broadcast input array from shape (224,224,4) into shape (224,224,3) , error while testing with GRAYSCALE IMAGES

The following code works well with RGB images but not with grayscale images. I also need to know why the grayscale images have shape (224,224,4); as far as I know it should be (224,224,1).
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
model = tensorflow.keras.models.load_model('models/keras_model.h5')
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
size = (224, 224)
def classify(img_path):
    image = Image.open(img_path)
    image = ImageOps.fit(image, size, Image.ANTIALIAS)
    image_array = np.asarray(image)
    normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
    data[0] = normalized_image_array
    prediction = model.predict(data)
    if prediction[0][-1] == 1:
        return False
    return True
Providing the solution here for the benefit of the community.
Grayscale images have 1 channel, RGB images have 3, and RGBA images have 4; the last channel represents alpha. You can try image = Image.open(img_path).convert('RGB') (paraphrased from Frightera).
Working code is shown below:
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
model = tensorflow.keras.models.load_model('models/keras_model.h5')
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
size = (224, 224)
def classify(img_path):
    # force three channels regardless of the file's own mode
    image = Image.open(img_path).convert('RGB')
    image = ImageOps.fit(image, size, Image.ANTIALIAS)
    image_array = np.asarray(image)
    normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
    data[0] = normalized_image_array
    prediction = model.predict(data)
    if prediction[0][-1] == 1:
        return False
    return True
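The broadcast error itself can be reproduced without any image file. The sketch below uses synthetic arrays in place of decoded images (an assumption) and shows why a 4-channel array cannot be written into a 3-channel slot. Slicing off the alpha channel is used here only as a NumPy illustration; it is a different technique from the convert('RGB') call in the working code.

```python
import numpy as np

# Same buffer as in the question: one image slot with 3 channels.
data = np.zeros((1, 224, 224, 3), dtype=np.float32)

# Stand-in for an image that decodes to RGBA (4 channels).
rgba = np.zeros((224, 224, 4), dtype=np.uint8)

try:
    data[0] = rgba          # 4 channels do not fit into 3
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Keeping only the first three channels makes the shapes agree.
rgb = rgba[..., :3]
data[0] = rgb
print(rgb.shape)            # (224, 224, 3)
```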

How to get training data for Keras Sequential CNN into the correct tensor shape?

I have a 4-dimensional tensor of image pixel data (Red (height, width), Green (height, width), Blue (height, width), 14000 examples) and a CSV file containing the bounding-box coordinates for each image, i.e. (Image name, X1, Y1, X2, Y2). The CSV also has 14000 rows, one per example.
How do I feed this data to my neural network? Currently, if I try feeding the tensor, it passes the entire array of 14000 examples against one row of (X1,Y1,X2,Y2), whereas it should pass one array for each row of x1,y1,x2,y2.
Any idea how to fix this?
Here's the code and the associated error:
train_csv = pd.read_csv('datasets/training.csv').values
test_csv = pd.read_csv('datasets/test.csv').values
y_train = train_csv[:,[1,2,3,4]] #done
x_train_names = train_csv[:,0] #obtained names of images in array
#### load images into an array ####
X_train = []
path = "datasets/images/images/"
imagelist = listdir(path)
for i in range(len(x_train_names)):
    img_name = x_train_names[i]
    img = Image.open(path + str(img_name))
    arr = array(img)
    X_train.append(arr)
#### building a very basic classifier, just to get some result ####
classifier = Sequential()
# the activation value was truncated in the original post; 'relu' is
# assumed to match the layers below
classifier.add(Convolution2D(64,(3,3),input_shape=(64,64,3), activation = 'relu'))
classifier.add(Convolution2D(32,(2,2), activation = 'relu'))
classifier.add(Dense(16, activation = 'relu'))
classifier.compile('adam','binary_crossentropy',['accuracy'])
# any further arguments were truncated in the original post
classifier.fit(x=X_train, y=y_train, steps_per_epoch=80, batch_size=32)
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 14000 arrays:
[array([[[141, 154, 144],
[141, 154, 144],
[141, 154, 144],
[149, 159, 150],
[150, 160, 151],
[150, 160, 151]],
[[140, 153, 143],
EDIT: I converted all my images to grayscale so I don't get a memory error. This means that my X_train should have 1 dimension along the number of channels (earlier, RGB). Here's my edited code:
y_train = train_csv[:,[1,2,3,4]] #done
x_train_names = train_csv[:,0] #obtained names of images in array
# load images into an array
path = "datasets/images/images/"
imagelist = listdir(path)
img_name = x_train_names[0]
img = Image.open(path + str(img_name))
X_train = np.ndarray((14000,img.height,img.width,1))
for i in range(len(x_train_names)):
    img_name = x_train_names[i]
    img = Image.open(path + str(img_name)).convert('L')
    ## converting image to grayscale because I get a memory error otherwise
    X_train[i,:,:,:] = np.asarray(img)
ValueError: could not broadcast input array from shape (480,640) into shape (480,640,1)
(At X_train[i,:,:,:] = np.asarray(img) line)
The first step is always to find out which input shape your first convolution layer expects. The documentation of tf.nn.conv2d states that the expected shape of the 4D input tensor is [batch, in_height, in_width, in_channels].
To load the data we can use a numpy ndarray. For that we should know the number of images you want to load, as well as the dimensions of the images:
path = "datasets/images/images/"
imagelist = listdir(path)
img_name = x_train_names[0]
img = Image.open(path + str(img_name))
X_train = np.ndarray((len(imagelist),img.height,img.width,3))
for i in range(len(x_train_names)):
    img_name = x_train_names[i]
    img = Image.open(path + str(img_name))
    X_train[i,:,:,:] = np.asarray(img)
The shape property of your X_train tensor should then give you:
> (len(x_train_names), img.height, img.width, 3)
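The "Expected to see 1 array(s), but instead got the following list of 14000 arrays" message in the question comes from passing a Python list of per-image arrays. As an alternative to preallocating, a list of equally sized images can be stacked into one array with the [batch, height, width, channels] layout. This is a minimal sketch with synthetic images standing in for the loaded files (an assumption), and it only works when every image has the same shape.

```python
import numpy as np

# Five synthetic 64x64 RGB images in place of the files read from disk.
images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(5)]

# np.stack adds a new leading axis, giving a single 4D array.
batch = np.stack(images, axis=0)

print(type(images).__name__)  # list
print(batch.shape)            # (5, 64, 64, 3)
```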
To load the images in multiple batches you could do something like this:
#### Build and compile your classifier up here ####
num_batches = 5
len_batch = np.floor(len(x_train_names)/num_batches).astype(int)
X_train = np.ndarray((len_batch,img.height,img.width,3))
for batch_idx in range(num_batches):
    idx_start = batch_idx*len_batch
    idx_end = (batch_idx+1)*len_batch  # slice end is exclusive, so no -1
    x_train_names_batch = x_train_names[idx_start:idx_end]
    for i in range(len(x_train_names_batch)):
        img_name = x_train_names_batch[i]
        img = Image.open(path + str(img_name))
        X_train[i,:,:,:] = np.asarray(img)
    # train on this batch together with its matching rows of labels
    classifier.fit(x=X_train, y=y_train[idx_start:idx_end], steps_per_epoch=num_batches, batch_size=len(x_train_names_batch), epochs=2)
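For the grayscale variant in the question's edit, the error "could not broadcast input array from shape (480,640) into shape (480,640,1)" arises because a converted 'L' image has no channel axis. A minimal sketch of adding one follows; the synthetic `gray` array is an assumption standing in for np.asarray(img) after convert('L').

```python
import numpy as np

# Target buffer with an explicit single channel, as in the edited question.
X_train = np.zeros((2, 480, 640, 1), dtype=np.float32)

# Stand-in for np.asarray(img) after img.convert('L'): a 2D array.
gray = np.ones((480, 640), dtype=np.uint8)

try:
    X_train[0, :, :, :] = gray      # (480, 640) does not broadcast to (480, 640, 1)
    failed = False
except ValueError:
    failed = True

# Append a trailing channel axis so the shapes match.
gray_with_channel = gray[..., np.newaxis]
X_train[1, :, :, :] = gray_with_channel

print(failed)                    # True
print(gray_with_channel.shape)   # (480, 640, 1)
```

np.expand_dims(gray, axis=-1) is an equivalent way to write the same step.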

Open CV ValueError: total size of new array must be unchanged

I am new to OpenCV and TensorFlow. I am trying to get a live camera preview and use the live camera feed for TensorFlow prediction. Here is the part of code for live preview and prediction:
image = np.zeros((64, 64, 3))
softmax_pred = tf.nn.softmax(conv_net(x, weights, biases, image_size, 1.0))
cam = cv2.VideoCapture(0)
while True:
    ret_val, img = cam.read()
    img = cv2.flip(img,1)
    cv2.imshow('my webcam',img)
    img = img.resize((64,64))
    image = array(img).reshape(1,64,64,3)
    # the session object was lost from the original post; `sess` is assumed
    result = sess.run(softmax_pred, feed_dict={x: image})
I am not sure what's wrong here. I am getting this error:
image = array(img).reshape(1,64,64,3)
ValueError: total size of new array must be unchanged
My tensor placeholder for the image has the shape '(?, 64, 64, 3)'. I did the same for a JPEG image by manually loading it from disk and reshaping it to (1,64,64,3), and it works fine. Here is the code for manually loading an image and then predicting:
img = Image.open('/home/pragyan/Documents/miniProject/PredictImages/IMG_4804.JPG')
img = img.resize((64, 64))
image = array(img).reshape(1,64,64,3)
# the session object was lost from the original post; `sess` is assumed
result = sess.run(softmax_pred, feed_dict={x: image})
The above code works, but reshaping a live frame from the webcam gives me this error (ValueError: total size of new array must be unchanged). Is there a way to fix this? I don't understand how to fix it.
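One likely difference between the two snippets: the image loaded from disk is a PIL image, whose resize() resamples the pixels and returns a new image, while the webcam frame is a NumPy array, where resize() and reshape() only rearrange the existing elements. The sketch below illustrates that distinction under stated assumptions: a 480x640 frame size, and a hypothetical nearest-neighbour helper used as a dependency-free stand-in for cv2.resize.

```python
import numpy as np

def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resample; a stand-in for cv2.resize(frame, (out_w, out_h))."""
    in_h, in_w = frame.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return frame[rows[:, None], cols]

# Synthetic webcam frame; the 480x640 size is an assumption.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

try:
    frame.reshape(1, 64, 64, 3)   # 921600 values cannot become 12288 values
    reshape_ok = True
except ValueError:
    reshape_ok = False

small = resize_nearest(frame, 64, 64)   # actually reduces the pixel count
batch = small.reshape(1, 64, 64, 3)     # now only adds the batch axis

print(reshape_ok)    # False
print(batch.shape)   # (1, 64, 64, 3)
```

In the webcam loop, replacing img.resize((64,64)) with img = cv2.resize(img, (64, 64)) would play the role of resize_nearest here.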

