I am solving a multi-view classification problem using a pretrained VGG16 model. I have 4 views as inputs, each of size (64,64,3), but VGG16 uses an input size of (224,224,3).
To solve the problem, I am supposed to write my own data loader instead of using quick built-in methods like Keras load_img() or OpenCV imread(), so I am doing all of this with plain numpy arrays.
I am trying to resize my inputs from 64x64 to 224x224, but I am unable to do it; it keeps throwing one error or another. This is my code for the data loader:
import os
import numpy as np
import matplotlib.pyplot as plt

def data_loader(dataframe, classDict, basePath, batch_size=16):
    while True:
        x_batch = np.zeros((batch_size, 4, 64, 64, 3))  # Create a zeros array for images
        y_batch = np.zeros((batch_size, 20))  # Create a zeros array for classes
        for i in range(0, batch_size):
            rndNumber = np.random.randint(len(dataframe))
            *images, class_id = dataframe.iloc[rndNumber]
            for j in range(4):
                x_batch[i,j] = plt.imread(os.path.join(basePath, images[j])) / 255.
                # x_batch[i,j] = x_batch[i,j].resize(1, 224, 224, 3) #<--- Try(1)
            class_id = classDict[class_id]
            y_batch[i, class_id] = 1.0
        # yield {'image1': np.resize(x_batch[:, 0],(batch_size, 224, 224, 3)), #<--- Try(2)
        #        'image2': np.resize(x_batch[:, 1],(1, 224, 224, 3)),
        #        'image3': np.resize(x_batch[:, 2],(1, 224, 224, 3)),
        #        'image4': np.resize(x_batch[:, 3],(1, 224, 224, 3)) }, {'class_out': y_batch}
        # 'yield' is used like return, except that the function returns a generator
        yield {'image1': x_batch[:, 0],
               'image2': x_batch[:, 1],
               'image3': x_batch[:, 2],
               'image4': x_batch[:, 3], }, {'class_out': y_batch}

## Testing the data loader
example, lbl = next(data_loader(df_train, classDictTrain, basePath))
print(example['image1'].shape) #example['image1'][0].shape
I have made several attempts to resize the images. I am listing them below with the error message I receive for each try:
Try(1): Using x_batch[i,j] = x_batch[i,j].resize(1, 224, 224, 3) >> Error: ValueError: cannot resize this array: it does not own its data
Try(2): Using yield {'image1': np.resize(x_batch[:, 0],(batch_size, 224, 224, 3)), ....... } >> The output shape is (16, 224, 224, 3), which seems fine, but when I plot it the result does not look like the original image,
whereas I need the original image, just bigger in size.
Please tell me what I am doing wrong and how I can fix it.
If I understand your problem correctly, you have a 64x64 image and you want to upscale it to 224x224. Note that the latter resolution contains many more pixels, so you cannot simply force a reshape: the original image has far fewer pixels.
You have to upsample the image, generating the missing pixels. One tool you can try is PIL's resize function, which can be used with different resampling filters.
As far as I know, numpy does not easily support upscaling filters. Check out this post to understand how to convert a PIL image to a numpy array and you are ready to go.
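To illustrate why Try(2) scrambles the picture and what upsampling does instead, here is a minimal numpy-only sketch. It uses nearest-neighbour index mapping, which is a much simpler stand-in for PIL's resampling filters and not a replacement for them. The helper name upscale_nearest is hypothetical, and the random array stands in for one decoded 64x64 view.

```python
import numpy as np

def upscale_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array via index mapping.

    Hypothetical helper for illustration; PIL offers better filters
    (bilinear, bicubic, ...) when using it is allowed.
    """
    in_h, in_w = img.shape[:2]
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    # Broadcasting rows (out_h, 1) against cols (out_w,) selects a full grid.
    return img[rows[:, None], cols]

small = np.random.rand(64, 64, 3)        # stand-in for one 64x64 RGB view
big = upscale_nearest(small, 224, 224)   # same picture, more pixels
print(big.shape)

# np.resize, by contrast, only repeats the flattened data until the new
# shape is filled, so pixels land in unrelated positions.
tiled = np.resize(small, (224, 224, 3))
print(tiled.shape)
```

In the data loader, the same helper could be applied to each view before it is written into a batch array allocated with shape (batch_size, 4, 224, 224, 3).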
I would like to save a tensor image as a transparent image because I want to merge two images. I have tried different solutions, but there are always tensor reshape problems and I am unable to do so. The shape of the tensor is torch.Size([1, 1, 256, 256])
from torchvision.utils import save_image
image = net_G(input_image)
# transform = T.ToPILImage()
# image = torch.squeeze(image, 0).shape
# img = transform(image)
save_image(img, full_output_dir+'/%s.jpg' % name, transparency=255)
I have created my own custom dataset (with 2 classes) with the following code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
ds_train = tf.keras.preprocessing.image_dataset_from_directory(
    labels = 'inferred', # from subfolders in alphabetical order
    label_mode = "int",
    class_names = ["CVS", "No_CVS"],
    color_mode = 'rgb',
    batch_size = 2,
    image_size = (224, 224),
    shuffle = True, # randomized order of images
    seed = 123, #set the seed if train, valid images are the same when you run again
    validation_split = 0.1,
    subset = "training"
)
ds_train results in:
<BatchDataset shapes: ((None, 224, 224, 3), (None,)), types: (tf.float32, tf.int32)>
Now, I want to visualize my data by looking at 9 images:
for i, (image, label) in enumerate(ds_train.take(9)):
    ax = plt.subplot(3, 3, i + 1)
However, I get the following error:
line 61, in
TypeError: Invalid shape (2, 224, 224, 3) for image data
I'm looking for a way to resolve this, and be able to plot my images with matplotlib.
More importantly, it seems that the dataset cannot be used when training the model either, as I get this error:
ValueError: Input 0 is incompatible with layer EfficientNet: expected shape=(None, 224, 224, 3), found shape=(2, None, 224, 224, 3)
This happens after running the Keras example code I found here (where I created ds_train with image_dataset_from_directory instead of the tfds.load() function).
So I think something is going wrong in the way I created ds_train. Any suggestions are very welcome.
It seems like you are leaving the batch_size in, when you do:
With your original code you still won't be able to see 9 images because of your batch_size. I think it will be fine if you do it like:
No errors should be thrown like TypeError: Invalid shape...:
Furthermore you can do following to see batch_size:
for img_batch_size, labels_batch_size in train_df:
In your case, img_batch_size.shape should print (2,224,224,3), where this tuple is the shape of the image tensor for one batch.
For the input_shape problem, you need to post your model so we can see what's wrong with the input shape.
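As a rough illustration of the batch dimension issue, the sketch below uses a plain numpy array as a stand-in for one element of the batched dataset (an assumption made so that it runs without TensorFlow or image files). Each element holds batch_size images, so a single image has to be indexed out of the batch before it has a shape that an image plot accepts.

```python
import numpy as np

# Stand-in for one (image, label) element of a dataset built with batch_size=2.
image_batch = np.random.rand(2, 224, 224, 3).astype(np.float32)
label_batch = np.array([0, 1], dtype=np.int32)

# The whole batch is 4-D, which is what triggers "Invalid shape (2, 224, 224, 3)".
print(image_batch.shape)

# Indexing along the first axis yields one (224, 224, 3) image per step.
singles = [image_batch[i] for i in range(image_batch.shape[0])]
for img, lbl in zip(singles, label_batch):
    print(img.shape, int(lbl))
```

With batch_size=2, taking 9 elements yields 9 batches (18 images), so showing exactly 9 images means indexing into the batches rather than plotting each element directly.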
First off, why am I using Keras? I'm trying to stay as high level as possible, which doesn't mean I'm scared of low-level Tensorflow; I just want to see how far I can go while keeping my code as simple and readable as possible.
I need my Keras model (custom-built using the Keras functional API) to read the left image from a stereo pair and minimize a loss function that needs to access both the right and left images. I want to store the data in a tf.data.Dataset.
What I tried:
1. Reading the dataset as (left image, right image), i.e. as tensors with shape ((W, H, 3), (W, H, 3)), then use function closure: define a keras_loss(left_images) that returns a loss(y_true, y_pred), with y_true being a tf.Tensor that holds the right image. The problem with this approach is that left_images is a tf.data.Dataset and Tensorflow complains (rightly so) that I'm trying to operate on a dataset instead of a tensor.
2. Reading the dataset as (left image, (left image, right image)), which should make y_true a tf.Tensor with shape ((W, H, 3), (W, H, 3)) that holds both the right and left images. The problem with this approach is that it...does not work and raises the following error:
ValueError: Error when checking model target: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), for inputs ['tf_op_layer_resize/ResizeBilinear']
but instead got the following list of 2 arrays: [<tf.Tensor 'args_1:0'
shape=(None, 512, 256, 3) dtype=float32>, <tf.Tensor 'args_2:0'
shape=(None, 512, 256, 3) dtype=float32>]...
So, is there anything I did not consider? I read the documentation and found nothing about what gets considered as y_pred and what as y_true, nor about how to convert a dataset into a tensor smartly and without loading it all in memory.
My model is designed as such:
def my_model(input_shape):
    width = input_shape[0]
    height = input_shape[1]
    inputs = tf.keras.Input(shape=input_shape)
    # < a few more layers >
    outputs = tf.image.resize(tf.nn.sigmoid(tf.slice(disp6, [0, 0, 0, 0], [-1, -1, -1, 2])), tf.Variable([width, height]))
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
And my dataset is built as such (in case 2, while in case 1 only the function read_stereo_pair_from_line() changes):
def read_img_from_file(file_name):
    img = tf.io.read_file(file_name)
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_png(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [args.input_width, args.input_height])

def read_stereo_pair_from_line(line):
    split_line = tf.strings.split(line, ' ')
    return read_img_from_file(split_line[0]), (read_img_from_file(split_line[0]), read_img_from_file(split_line[1]))

# Dataset loading
list_ds = tf.data.TextLineDataset('test/files.txt')
images_ds = list_ds.map(lambda x: read_stereo_pair_from_line(x))
images_ds = images_ds.batch(1)
Solved. I just needed to read the dataset as (left image, [left image, right image]) instead of (left image, (left image, right image)) i.e. make the second item a list and not a tuple. I can then access the images as input_r = y_true[:, 1, :, :] and input_l = y_true[:, 0, :, :]
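The shapes behind this fix can be sketched with plain numpy, under the assumption that a list of two same-shaped images ends up as one stacked array while a tuple stays a pair of separate arrays. The sizes mirror the (512, 256, 3) images from the error message, and np.stack plus an added batch axis stand in for what the dataset pipeline and batch(1) would produce.

```python
import numpy as np

W, H = 512, 256
left = np.zeros((W, H, 3), dtype=np.float32)   # stand-in for the left image
right = np.ones((W, H, 3), dtype=np.float32)   # stand-in for the right image

# A list of two images becomes a single array with a new leading axis...
pair = np.stack([left, right], axis=0)          # (2, W, H, 3)
# ...and batching with batch size 1 adds the batch axis in front.
y_true = pair[np.newaxis, ...]                  # (1, 2, W, H, 3)

# Inside the loss, the two views are recovered by indexing axis 1.
input_l = y_true[:, 0, :, :]
input_r = y_true[:, 1, :, :]
print(y_true.shape, input_l.shape, input_r.shape)
```

Because the target is now a single tensor, the model sees one target for its one output, which is consistent with the "Expected to see 1 array(s)" message going away.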
I am using transfer learning to recognize objects. I used a pretrained VGG16 model as the base model and added my classifier on top of it using Keras. I then trained the model on my data, and the model works well. I want to see the features generated by the intermediate layers of the model for a given input. I used the following code for this purpose:
def ModeloutputAtthisLayer(model, layernme, imgnme, width, height):
    layer_name = layernme
    intermediate_layer_model = Model(inputs=model.input,
                                     outputs=model.get_layer(layer_name).output)
    img = image.load_img(imgnme, target_size=(width, height))
    imageArray = image.img_to_array(img)
    image_batch = np.expand_dims(imageArray, axis=0)
    processed_image = preprocess_input(image_batch.copy())
    intermediate_output = intermediate_layer_model.predict(processed_image)
    print("outshape of ", layernme, "is ", intermediate_output.shape)
In the code, I used np.expand_dims to add one extra dimension for the batch, as the input to the network should have the form (batchsize, height, width, channels). This code works fine. The shape of the feature output is (1, 224, 224, 64).
Now I wish to display this as an image. I understand that an additional dimension was added for the batch, so I should remove it. To do so, I used the following line of code:
imge = np.squeeze(intermediate_output, axis=0)
However it throws an error:
"Invalid dimensions for image data"
I wonder how I can display the extracted features as an image. Any suggestions, please?
Your feature shape is (1,224,224,64), and you cannot directly plot a 64-channel image. What you can do is plot the individual channels independently, like the following:
import math
imge = np.squeeze(intermediate_output, axis=0)
filters = imge.shape[2]
plt.figure(1, figsize=(32, 32)) # figure of size 32x32 inches
n_columns = 8
n_rows = math.ceil(filters / n_columns) # 64 filters / 8 columns = 8 rows
for i in range(filters):
    plt.subplot(n_rows, n_columns, i+1)
    plt.title('Filter ' + str(i))
    plt.imshow(imge[:,:,i], interpolation="nearest", cmap="gray")
This will plot 64 images in 8 rows and 8 columns.
A possible way to go consists in combining the 64 channels into a single-channel image through a weighted sum like this:
weighted_imge = np.sum(imge*weights, axis=-1)
where weights is an array with 64 weighting coefficients.
If you wish to give all the channels the same weight you could simply compute the average:
weighted_imge = np.mean(imge, axis=-1)
import numpy as np
import matplotlib.pyplot as plt
intermediate_output = np.random.randint(size=(1, 224, 224, 64),
low=0, high=2**8, dtype=np.uint8)
imge = np.squeeze(intermediate_output, axis=0)
weights = np.random.random(size=(imge.shape[-1],))
weighted_imge = np.sum(imge*weights, axis=-1)
In [33]: intermediate_output.shape
Out[33]: (1, 224, 224, 64)
In [34]: imge.shape
Out[34]: (224, 224, 64)
In [35]: weights.shape
Out[35]: (64,)
In [36]: weighted_imge.shape
Out[36]: (224, 224)
I'm fairly new to TensorFlow and image classification, so I may be missing key knowledge, which is probably why I'm facing this issue.
I've built a ResNet50 model in TensorFlow for the purpose of image classification of Dog Breeds using the ImageNet library and I have successfully trained a neural network which can detect various Dog Breeds.
I'm now at the point in which I would like to pass a random image of a dog to my model for it to spit out an output on what it thinks the dog breed is. However, when I run this function, dog_breed_predictor("<file path to image>"), I get the error expected global_average_pooling2d_1_input to have shape (1, 1, 2048) but got array with shape (7, 7, 2048) when it tries to execute the line Resnet50_model.predict(bottleneck_feature) and I don't know how to get around this.
Here's the code. I've provided all that I feel is relevant to the problem.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from tqdm import tqdm
from sklearn.datasets import load_files
np_utils = tf.keras.utils
# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets
# load train, test, and validation datasets
train_files, train_targets = load_dataset('dogImages/dogImages/train')
valid_files, valid_targets = load_dataset('dogImages/dogImages/valid')
test_files, test_targets = load_dataset('dogImages/dogImages/test')
#define Resnet50 model
Resnet50_model = ResNet50(weights="imagenet")
def path_to_tensor(img_path):
    #loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    #convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    #convert 3D tensor into 4D tensor with shape (1, 224, 224, 3)
    return np.expand_dims(x, axis=0)

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    #returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(Resnet50_model.predict(img))

###returns True if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151))
###Obtain bottleneck features from another pre-trained CNN
bottleneck_features = np.load("bottleneck_features/DogResnet50Data.npz")
train_DogResnet50 = bottleneck_features["train"]
valid_DogResnet50 = bottleneck_features["valid"]
test_DogResnet50 = bottleneck_features["test"]
###Define your architecture
Resnet50_model = tf.keras.Sequential()
Resnet50_model.add(tf.contrib.keras.layers.Dense(133, activation="softmax"))
###Compile the model
Resnet50_model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
###Train the model
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath="saved_models/weights.best.ResNet50.hdf5",
                                                  verbose=1, save_best_only=True)
Resnet50_model.fit(train_DogResnet50, train_targets,
                   validation_data=(valid_DogResnet50, valid_targets),
                   epochs=20, batch_size=20, callbacks=[checkpointer])
###Load the model weights with the best validation loss.
###Calculate classification accuracy on the test dataset
Resnet50_predictions = [np.argmax(Resnet50_model.predict(np.expand_dims(feature, axis=0))) for feature in test_DogResnet50]
#Report test accuracy
test_accuracy = 100*np.sum(np.array(Resnet50_predictions)==np.argmax(test_targets, axis=1))/len(Resnet50_predictions)
print("Test accuracy: %.4f%%" % test_accuracy)
def extract_Resnet50(tensor):
    from keras.applications.resnet50 import ResNet50, preprocess_input
    return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

def dog_breed_predictor(img_path):
    #determine the predicted dog breed
    breed = dog_breed(img_path)
    #display the image
    img = cv2.imread(img_path)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    #display relevant predictor result
    if dog_detector(img_path):
        print("This is a dog and its breed is: " + str(breed))
    elif face_detector(img_path):
        print("This is a human but it looks like a: " + str(breed))
    else:
        print("I don't know what this is.")
The image I'm feeding into my function is from the same dataset that was used to train the model - I wanted to see myself if the model is working as intended - so this error makes it extra confusing. What could I be doing wrong?
Thanks to nessuno's assistance, I figured out the issue. The problem was indeed with the pooling layer of ResNet50.
The following code in my script above:
return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))
returns a shape of (1, 7, 7, 2048) (admittedly though, I do not fully understand why). To get around this, I added in the parameter pooling="avg" as so:
return ResNet50(weights='imagenet', include_top=False, pooling="avg").predict(preprocess_input(tensor))
This instead returns a shape of (1, 2048) (again, admittedly, I do not know why.)
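As for why the shapes change: without the top layers, ResNet50 ends in a convolutional feature map, and for a 224x224 input the network's overall 32x downsampling leaves a 7x7 grid with 2048 channels. Passing pooling="avg" applies global average pooling to that map, which averages each channel over the 7x7 grid. The numpy sketch below only mimics those shapes with random data; it is not the Keras implementation.

```python
import numpy as np

# Stand-in for the output of ResNet50(include_top=False) on one 224x224 image:
# batch of 1, 7x7 spatial grid (224 / 32 = 7), 2048 channels.
feature_map = np.random.rand(1, 7, 7, 2048).astype(np.float32)

# Global average pooling: average every channel over the two spatial axes.
pooled = feature_map.mean(axis=(1, 2))

print(feature_map.shape)
print(pooled.shape)
```

The pooled result is one 2048-element feature vector per image, which is why the spatial dimensions disappear from the shape.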
However, the model still expects a 4-D shape. To get around this I added in the following code in my dog_breed() function:
print(bottleneck_feature.shape) #returns (1, 2048)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.
and this returns a shape of (1, 1, 1, 1, 2048). For some reason, the model still complained it was a 3D shape when I only added 2 more dimensions, but stopped when I added a 3rd (this is peculiar, and I would like to find out more about why this is.).
So overall, my dog_breed() function went from:
def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]
to this:
def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    print(bottleneck_feature.shape) #returns (1, 2048)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #no shape error any more
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]
whilst ensuring the parameter pooling="avg" is added to my call to ResNet50.
The documentation of ResNet50 says something about the constructor parameter input_shape (emphasis is mine):
input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3, 224, 224) (with 'channels_first' data format). It should have exactly 3 inputs channels, and width and height should be no smaller than 197. E.g. (200, 200, 3) would be one valid value.
My guess is that since you specified include_top to False the network definition pads the input to a bigger shape than 224x224, so when you extract the features you end up with a feature map and not with a feature vector (and that's the cause of your error).
Just try to specify an input_shape in this way:
return ResNet50(weights='imagenet',
                include_top=False,
                input_shape=(224, 224, 3)).predict(preprocess_input(tensor))