I'm fine-tuning the GoogleNet network with Caffe on my own dataset. If I use IMAGE_DATA layers as input, learning takes place. However, I need to switch to an HDF5 layer for further extensions that I require. When I use HDF5 layers, no learning takes place.
I am using the exact same input images, and the labels match as well. I have also checked that the data in the .h5 files can be loaded correctly. It can, and Caffe also finds the number of examples I feed it as well as the correct number of classes (2).
This leads me to think that the issue lies in the transformations I am performing manually (since HDF5 layers do not perform any built-in transformations). The code for these is below. I do the following:
Convert image from RGB to BGR
Resize it to 256x256 so I can subtract the mean file from ImageNet (included in the Caffe library)
Since the original GoogleNet prototxt does not divide by 255, I also do not (see here)
I resize the image down to 224x224, which is the crop size required by GoogleNet
I transpose the image as needed to satisfy CxHxW, as required by Caffe
At the moment I am not performing data augmentation, which could be turned on if I let oversample=True.
Can anyone see anything wrong with this approach? Is data augmentation so critical that no learning would take place without it?
The HDF5 conversion code
import numpy as np
import caffe
from scipy import ndimage

IMG_RESHAPE = 224
IMG_UNCROPPED = 256

def resize_convert(img_names, path=None, oversample=False):
    '''
    Load images, set to BGR mode and transpose to CxHxW
    and subtract the Imagenet mean. If oversample is True,
    perform data augmentation.

    Parameters:
    ---------
    img_names (list): list of image names to be processed.
    path (string): path to images.
    oversample (bool): if True then data augmentation is performed
        on each image, and 10 crops of size 224x224 are produced
        from each image. If False, then a single 224x224 is produced.
    '''
    path = path if path is not None else ''
    if oversample == False:
        all_imgs = np.empty((len(img_names), 3, IMG_RESHAPE, IMG_RESHAPE), dtype='float32')
    else:
        all_imgs = np.empty((len(img_names), 3, IMG_UNCROPPED, IMG_UNCROPPED), dtype='float32')
    #load the imagenet mean
    mean_val = np.load('/path/to/imagenet/ilsvrc_2012_mean.npy')
    for i, img_name in enumerate(img_names):
        img = ndimage.imread(path+img_name, mode='RGB') # Read as HxWxC
        #subtract the mean of Imagenet
        #First, resize to 256 so we can subtract the mean of dims 256x256
        img = img[...,::-1] #Convert RGB TO BGR
        img = caffe.io.resize_image(img, (IMG_UNCROPPED, IMG_UNCROPPED), interp_order=1)
        img = np.transpose(img, (2, 0, 1)) #HxWxC => CxHxW
        #Since mean is given in Caffe channel order: 3xWxH
        #Assume it also is given in BGR order
        img = img - mean_val
        #set to 0-1 range => I don't think googleNet requires this
        #I tried both and it didn't make a difference
        #img = img/255
        #resize images down since GoogleNet accepts 224x224 crops
        if oversample == False:
            img = np.transpose(img, (1,2,0)) # CxHxW => HxWxC
            img = caffe.io.resize_image(img, (IMG_RESHAPE, IMG_RESHAPE), interp_order=1)
            img = np.transpose(img, (2,0,1)) #convert to CxHxW for Caffe
        all_imgs[i, :, :, :] = img
    #oversampling requires HxWxC order
    if oversample:
        all_imgs = np.transpose(all_imgs, (0, 2, 3, 1)) # NxCxHxW => NxHxWxC
        all_imgs = caffe.io.oversample(all_imgs, (IMG_RESHAPE, IMG_RESHAPE))
        all_imgs = np.transpose(all_imgs, (0, 3, 1, 2)) # NxHxWxC => NxCxHxW for Caffe
    return all_imgs
Relevant differences between IMAGE_DATA and HDF5 prototxt files
name: "GoogleNet"
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/train_list.txt"
    batch_size: 32
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/valid_list.txt"
    batch_size: 10
  }
  include: { phase: TEST }
}
Update
When I say no learning is taking place, I mean that my training loss does not go down consistently when using HDF5 data, unlike with IMAGE_DATA. In the images below, the first plot shows the change in training loss for the IMAGE_DATA network, and the second shows it for the HDF5 network.
One possibility that I am considering is that the network is overfitting to each of the .h5 that I am feeding it. At the moment I am using data augmentation, but all of the augmented examples are stored into a single .h5 file, along with other examples. However, because all of the augmented versions of a single input image are all contained within the same .h5 file, I think this could cause the network to overfit to that specific .h5 file. However, I am not sure whether this is what the second plot suggests.
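One way to rule out the per-file concern above is to shuffle all examples, augmented or not, before splitting them across .h5 files, so that no single file holds every variant of one source image. Below is a minimal NumPy sketch of that idea, assuming the data and labels already sit in memory as arrays. The names `shuffle_and_shard` and `shard_size` are hypothetical, and writing each shard to disk is left out.

```python
import numpy as np

def shuffle_and_shard(data, labels, shard_size, seed=0):
    """Shuffle examples and labels together, then split them into shards."""
    assert len(data) == len(labels)
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(data))  # one permutation for both arrays
    data, labels = data[order], labels[order]
    shards = []
    for start in range(0, len(data), shard_size):
        stop = start + shard_size
        shards.append((data[start:stop], labels[start:stop]))
    return shards

# Toy example: 10 "images" of shape 3x4x4 whose pixel values equal their label,
# so it is easy to verify that data and labels stay aligned after shuffling.
labels = np.arange(10, dtype=np.float32)
data = np.ones((10, 3, 4, 4), dtype=np.float32) * labels[:, None, None, None]
shards = shuffle_and_shard(data, labels, shard_size=4)
for shard_data, shard_labels in shards:
    print(shard_data.shape, shard_labels)
```

Each tuple in `shards` would then go into its own .h5 file. This is only a sketch of the shuffling step, not a claim about what causes the loss curve in the second plot.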
I faced the same problem and found that, for some reason, doing the transformations manually as in your code caused the images to come out all black (all zeros). Try debugging your code to see whether that is happening.
The solution is to use the methodology explained in the Caffe tutorial here:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
See the part that reads
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # move image channels to outermost dimension
transformer.set_mean('data', mu) # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0)) # swap channels from RGB to BGR
then few lines down
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
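To check whether manually transformed images come out all black, it can help to print a few statistics for a batch before writing it to HDF5. The following is a rough NumPy sketch, assuming the batch is an N x C x H x W float array. `summarize_batch` is a hypothetical helper, not part of Caffe.

```python
import numpy as np

def summarize_batch(batch):
    """Return basic statistics for an N x C x H x W array."""
    flat = batch.reshape(len(batch), -1)
    return {
        "shape": batch.shape,
        "dtype": str(batch.dtype),
        "min": float(batch.min()),
        "max": float(batch.max()),
        "mean": float(batch.mean()),
        # Number of images in which every pixel has the same value
        "constant_images": int(np.sum(flat.max(axis=1) == flat.min(axis=1))),
    }

# Toy batch: one random image and one all-zero image
rng = np.random.RandomState(0)
batch = np.zeros((2, 3, 224, 224), dtype=np.float32)
batch[0] = rng.uniform(-128, 128, size=(3, 224, 224))
stats = summarize_batch(batch)
print(stats)
```

A non-zero `constant_images` count, or a min and max that are both 0, would point to the kind of all-black output described above.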
Related
I am trying to modify the model from this website and call my own model with it.
Here is my question.
def prepare(filepath):
    IMG_SIZE = 70  # 50 in txt-based
    img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)  # read in the image, convert to grayscale
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))  # resize image to match model's expected sizing
    return new_array.reshape(-1, IMG_SIZE, IMG_SIZE, 1)  # return the image with shaping that TF wants.
My model input is (180x180x3), and I can't convert the image to grayscale because I get an index out of range error.
Since I know my model has 3 channels, I would like to change the reshape to new_array.reshape(-1, IMG_SIZE, IMG_SIZE, 3), but then the value printed by
print(prediction[0][0])
is not the number 0 or 1, so I can't classify my picture.
Please help me figure out what is happening, for either question 1 or 2.
I appreciate all of your help.
I expect only 1 or 0, so that I can assign the label "Pass" or "Fail".
In the prepare function, the image is being read in as grayscale and then being resized to (IMG_SIZE, IMG_SIZE). If your model expects 3 channels (RGB) but the image is being converted to grayscale (1 channel), then the model will not be able to process the image correctly and you will not get the expected output.
To fix this issue, you can change the following line:
img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE) # read in the image, convert to grayscale
to:
img_array = cv2.imread(filepath) # read in the image
This will read in the image with 3 colour channels. Note that OpenCV returns them in BGR order, not RGB.
Regarding the second issue, if the prediction is not 0 or 1, then it is likely that the model is not outputting a binary classification. You can try to check the output of the model to see what it is predicting. You can do this by printing out the output of the model before the argmax operation is applied.
If the model is not outputting a binary classification, you may need to modify the model or use a different model that is designed for binary classification.
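If the model does end in a single sigmoid unit or a softmax layer, its raw output is a probability rather than exactly 0 or 1, and it has to be mapped to a label. A minimal sketch of that mapping follows, using NumPy arrays as stand-ins for the model output. The label order (0 = "Fail", 1 = "Pass") and the 0.5 threshold are assumptions, and `to_label` is a hypothetical helper.

```python
import numpy as np

LABELS = ["Fail", "Pass"]  # assumed order: index 0 = Fail, index 1 = Pass

def to_label(prediction, threshold=0.5):
    """Map one row of model output to a class label."""
    prediction = np.asarray(prediction, dtype=np.float32).ravel()
    if prediction.size == 1:
        # Single sigmoid unit: the value is the probability of class 1
        index = int(prediction[0] >= threshold)
    else:
        # One score per class (e.g. softmax): take the largest
        index = int(np.argmax(prediction))
    return LABELS[index]

print(to_label([0.83]))      # stand-in for prediction[0] from a sigmoid model
print(to_label([0.12]))
print(to_label([0.7, 0.3]))  # stand-in for a two-unit softmax output
```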
I've tried to compile the code from this website
https://keras.io/examples/vision/oxford_pets_image_segmentation/
The model does work and I obtain good results, but the last part (to actually see the image created) does not work in Jupyter Notebook (it says that the kernel is overrated) and I don't know why. I tried it on multiple computers, but the result is always the same.
I kept the same code; I just changed num_classes to 256 and reshaped the input images to 256x25
Here is the part that does not work:
# Generate predictions for all images in the validation set
val_gen = OxfordPets(batch_size, img_size, val_input_img_paths, val_target_img_paths)
val_preds = model.predict(val_gen)
def display_mask(i):
    """Quick utility to display a model's prediction."""
    mask = np.argmax(val_preds[i], axis=-1)
    mask = np.expand_dims(mask, axis=-1)
    img = PIL.ImageOps.autocontrast(keras.preprocessing.image.array_to_img(mask))
    display(img)
# Display results for validation image #10
i = 10
# Display input image
display(Image(filename=val_input_img_paths[i]))
# Display ground-truth target mask
img = PIL.ImageOps.autocontrast(load_img(val_target_img_paths[i]))
display(img)
# Display mask predicted by our model
display_mask(i)  # Note that the model only sees inputs at 150x150.
I would appreciate any help, and thank you in advance for your time.
I have tried the tensorflow example with zalando mnist here:
https://www.tensorflow.org/tutorials/keras/basic_classification
After that I swapped the clothing images for the handwritten MNIST database, which also works.
Now I want to train the AI on the handwritten MNIST database, take a picture of my own handwritten "1" and let the AI guess the number.
I appended some lines of code after the training of the AI.
What I tried is this:
ownPicArr = imageio.imread(filename) #it is a 28x28 PNG file
ownPicArr = ownPicArr / 255.0
pred = model.predict(ownPicArr)
I got following error:
ValueError: Error when checking input: expected flatten_input to have 3 dimensions, but got array with shape (28, 28)
How can I solve this problem? Thank you...
Here is how you could perform the prediction using OpenCV, including the case where the colours of your picture are inverted:
import cv2
import numpy as np
image = cv2.imread(imagePath, cv2.IMREAD_GRAYSCALE)  # read as a 2-D grayscale array
image = cv2.resize(image, (28, 28))  # match the 28x28 training images
image = image.astype(np.float32) / 255.0  # same scaling as in the question
# image = 1.0 - image  # uncomment if the digit is dark on a light background
batch = np.expand_dims(image, 0)  # batch of one image: shape (1, 28, 28)
pred = model.predict(batch)
First we read the image using OpenCV, which stores it as an array. Reading it as grayscale keeps a single channel, which matches the (28, 28) input the model in the question expects. After resizing and scaling the image, we create a batch of a single image, with the datatype set to float32 (or whatever datatype matches your model), and finally make the prediction.
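The shape error in the question comes from the missing batch dimension: the model wants (batch, 28, 28) and received (28, 28). The sketch below shows only the array handling, in NumPy and without a real model. The synthetic picture and the `scores` row are made-up stand-ins for a loaded PNG and for the output of `model.predict`.

```python
import numpy as np

# Stand-in for a 28x28 grayscale picture (values 0..255),
# with a dark digit on a light background.
own_pic = np.full((28, 28), 255, dtype=np.uint8)
own_pic[4:24, 13:15] = 0  # a crude vertical stroke, like a "1"

x = own_pic.astype(np.float32) / 255.0
x = 1.0 - x               # MNIST digits are light on dark, so invert
x = np.expand_dims(x, 0)  # (28, 28) -> (1, 28, 28)
print(x.shape)

# Hypothetical model output for that batch: one row of 10 class scores.
scores = np.array([[0.01, 0.90, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]])
digit = int(np.argmax(scores[0]))  # index of the highest score
print(digit)
```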
I want to capture frames from a video with Python and OpenCV and then classify the captured Mat images with TensorFlow. The problem is that I don't know how to convert the Mat format to a 3D Tensor variable. This is how I am doing it now with TensorFlow (loading the image from file):
image_data = tf.gfile.FastGFile(imagePath, 'rb').read()
with tf.Session() as sess:
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
I will appreciate any help, thanks in advance
Load the OpenCV image using imread, then convert it to a numpy array.
For feeding into Inception v3, you need to use the Mul:0 tensor as the entry point. It expects a 4-dimensional tensor with the layout [batch, height, width, channel].
The last three come straight from a cv::Mat; the batch dimension just needs to be added in front as axis 0 with size 1, since you do not want to feed a batch of images but a single image.
The code looks like:
#Loading the file
img2 = cv2.imread(file)
#Format for the Mul:0 Tensor
img2= cv2.resize(img2,dsize=(299,299), interpolation = cv2.INTER_CUBIC)
#Numpy array
np_image_data = np.asarray(img2)
#maybe insert float conversion here - see edit remark!
np_final = np.expand_dims(np_image_data,axis=0)
#now feeding it into the session:
#[... initialization of session and loading of graph etc]
predictions = sess.run(softmax_tensor,
{'Mul:0': np_final})
#fin!
Kind regards,
Chris
Edit: I just noticed that the Inception network wants intensity values normalized as floats to [-0.5, 0.5], so please use this code to convert them before building the RGB image:
np_image_data=cv2.normalize(np_image_data.astype('float'), None, -0.5, .5, cv2.NORM_MINMAX)
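One thing to keep in mind with min-max normalization is that it stretches each image's own darkest and brightest pixel to the ends of the target range, so a low-contrast frame is rescaled differently from a high-contrast one. The NumPy sketch below contrasts that with a fixed mapping of 0..255 to [-0.5, 0.5]. Both helper names are hypothetical, and which scaling a given network expects depends on how it was trained.

```python
import numpy as np

def scale_fixed(img_uint8):
    """Map 0..255 to -0.5..0.5 with the same constants for every image."""
    return img_uint8.astype(np.float32) / 255.0 - 0.5

def scale_minmax(img_uint8):
    """Map the image's own min..max to -0.5..0.5 (per-image stretching)."""
    img = img_uint8.astype(np.float32)
    span = img.max() - img.min()
    if span == 0:
        return np.zeros_like(img)  # constant image: nothing to stretch
    return (img - img.min()) / span - 0.5

# A low-contrast frame: every pixel value lies between 100 and 120
rng = np.random.RandomState(0)
frame = rng.randint(100, 121, size=(299, 299, 3)).astype(np.uint8)

fixed = scale_fixed(frame)
stretched = scale_minmax(frame)
print("fixed:   ", fixed.min(), fixed.max())
print("min-max: ", stretched.min(), stretched.max())
```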
With TensorFlow 2.0 and OpenCV 4.2.0, you can convert it this way:
import numpy as np
import tensorflow as tf
import cv2 as cv
width = 32
height = 32
#Load image by OpenCV
img = cv.imread('img.jpg')
#Resize to respect the input_shape
inp = cv.resize(img, (width , height ))
#Convert img to RGB
rgb = cv.cvtColor(inp, cv.COLOR_BGR2RGB)
#Optional, but I recommend it (float conversion and converting img to a tensor image)
rgb_tensor = tf.convert_to_tensor(rgb, dtype=tf.float32)
#Add dims to rgb_tensor
rgb_tensor = tf.expand_dims(rgb_tensor , 0)
#Now you can use rgb_tensor to predict a label, for example:
#Load pretrain model, made from: https://www.tensorflow.org/tutorials/images/cnn
model = tf.keras.models.load_model('cifar10_model.h5')
#Create probability model
probability_model = tf.keras.Sequential([model,
tf.keras.layers.Softmax()])
#Predict label
predictions = probability_model.predict(rgb_tensor, steps=1)
It looks like you're using the pre-trained and pre-defined Inception model, which has a tensor named DecodeJpeg/contents:0. If so, this tensor expects a scalar string containing the bytes for a JPEG image.
You have a couple of options, one is to look further down the network for the node where the JPEG is converted to a matrix. I'm not sure what the MAT format is, but this will be a [height, width, colour_depth] representation. If you can get your image in that format you can replace the DecodeJpeg... string with the name of the node you want to feed into.
The other option is to simply convert your images to JPEGs and feed them straight in.
You should be able to convert the opencv mat format to a numpy array as:
np_image_data = np.asarray(image_data)
Once you have the data as a numpy array, you can pass it to TensorFlow through a feeding mechanism, as in the link that @thesonyman101 referenced:
feed_dict = {some_tf_input:np_image_data}
predictions = sess.run(some_tf_output, feed_dict=feed_dict)
In my case I had to read an image from file, do some processing and then inject it into Inception to obtain the output of a features layer, called last layer.
My solution is short but effective.
img = cv2.imread(file)
# ... do some processing
img_as_string = cv2.imencode('.jpg', img)[1].tobytes()  # JPEG-encode the array and get raw bytes
features = sess.run(last_layer, {'DecodeJpeg/contents:0': img_as_string})
Based on the Tensorflow tutorial for ConvNet, some points are not readily apparent to me:
are the images being distorted actually added to the pool of original images?
or are the distorted images used instead of the originals?
how many distorted images are being produced? (i.e. what augmentation factor was defined?)
The flow of functions for the tutorial seems to be as follows:
cifar_10_train.py
def train():
    """Train CIFAR-10 for a number of steps."""
    with tf.Graph().as_default():
        [...]
        # Get images and labels for CIFAR-10.
        images, labels = cifar10.distorted_inputs()
        [...]
cifar10.py
def distorted_inputs():
    """Construct distorted input for CIFAR training using the Reader ops.
    Returns:
      images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
      labels: Labels. 1D tensor of [batch_size] size.
    Raises:
      ValueError: If no data_dir
    """
    if not FLAGS.data_dir:
        raise ValueError('Please supply a data_dir')
    data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
    return cifar10_input.distorted_inputs(data_dir=data_dir,
                                          batch_size=FLAGS.batch_size)
and finally cifar10_input.py
def distorted_inputs(data_dir, batch_size):
    """Construct distorted input for CIFAR training using the Reader ops.
    Args:
      data_dir: Path to the CIFAR-10 data directory.
      batch_size: Number of images per batch.
    Returns:
      images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
      labels: Labels. 1D tensor of [batch_size] size.
    """
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) for i in xrange(1, 6)]
    for f in filenames:
        if not tf.gfile.Exists(f):
            raise ValueError('Failed to find file: ' + f)
    # Create a queue that produces the filenames to read.
    filename_queue = tf.train.string_input_producer(filenames)
    # Read examples from files in the filename queue.
    read_input = read_cifar10(filename_queue)
    reshaped_image = tf.cast(read_input.uint8image, tf.float32)
    height = IMAGE_SIZE
    width = IMAGE_SIZE
    # Image processing for training the network. Note the many random
    # distortions applied to the image.
    # Randomly crop a [height, width] section of the image.
    distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
    # Randomly flip the image horizontally.
    distorted_image = tf.image.random_flip_left_right(distorted_image)
    # Because these operations are not commutative, consider randomizing
    # the order their operation.
    distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
    distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
    # Subtract off the mean and divide by the variance of the pixels.
    float_image = tf.image.per_image_whitening(distorted_image)
    # Ensure that the random shuffling has good mixing properties.
    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                             min_fraction_of_examples_in_queue)
    print('Filling queue with %d CIFAR images before starting to train. '
          'This will take a few minutes.' % min_queue_examples)
    # Generate a batch of images and labels by building up a queue of examples.
    return _generate_image_and_label_batch(float_image, read_input.label,
                                           min_queue_examples, batch_size,
                                           shuffle=True)
are the images being distorted actually added to the pool of original images?
It depends on the definition of the pool. In TensorFlow, you have ops, which are the basic objects in your network graph. Here, data production is an op itself. Thus you do not have a finite set of training samples; instead you have a potentially infinite set of samples generated from the training set.
or are the distorted images used instead of the originals?
As you can see from the source you included, a sample is taken from the training batch and then randomly transformed, so there is a very small probability of using an unaltered image (especially since cropping is used, which always modifies the image).
how many distorted images are being produced? (i.e. what augmentation factor was defined?)
There is no such thing; this is a never-ending process. Think about it in terms of random access to a possibly infinite source of data, as this is what is effectively happening here. Every single batch can be different from the previous one.
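The same idea can be sketched outside TensorFlow as a generator that never ends: each batch is drawn from the originals and distorted on the fly, so no augmented copy is ever stored and there is no fixed augmentation factor. This is a NumPy illustration only, not the tutorial's implementation. `distorted_batches` is a hypothetical name, the toy sizes (five 32x32 images, 24x24 crops) are assumptions, and only random crops and flips are shown.

```python
import numpy as np

def distorted_batches(images, batch_size, crop, rng):
    """Yield an endless stream of randomly cropped and flipped batches."""
    n, height, width, channels = images.shape
    while True:
        idx = rng.randint(0, n, size=batch_size)  # sample originals with replacement
        batch = np.empty((batch_size, crop, crop, channels), images.dtype)
        for k, i in enumerate(idx):
            top = rng.randint(0, height - crop + 1)
            left = rng.randint(0, width - crop + 1)
            patch = images[i, top:top + crop, left:left + crop]
            if rng.rand() < 0.5:
                patch = patch[:, ::-1]  # horizontal flip
            batch[k] = patch
        yield batch

rng = np.random.RandomState(0)
originals = rng.rand(5, 32, 32, 3).astype(np.float32)  # 5 toy "images"
stream = distorted_batches(originals, batch_size=4, crop=24, rng=rng)
first, second = next(stream), next(stream)
print(first.shape, second.shape)
```

Calling `next(stream)` again always produces another freshly distorted batch, which is the sense in which the source of data is "possibly infinite".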