I've tried to compile the code from this website
https://keras.io/examples/vision/oxford_pets_image_segmentation/
the model does work and i obtain good results but the last part (to actually see the image created) does not work on jupyter notebook (it says that the kernel is overrated) and i don't know why, i tried it on multiple computers but the result is always the same
I kept the same code i just changed num_classes for 256 and I reshaped the input images in 256x25
Here is part that does not work :
// Generate predictions for all images in the validation set
val_gen = OxfordPets(batch_size, img_size, val_input_img_paths, val_target_img_paths)
val_preds = model.predict(val_gen)
def display_mask(i):
"""Quick utility to display a model's prediction."""
mask = np.argmax(val_preds[i], axis=-1)
mask = np.expand_dims(mask, axis=-1)
img = PIL.ImageOps.autocontrast(keras.preprocessing.image.array_to_img(mask))
display(img)
// Display results for validation image #10
i = 10
// Display input image
display(Image(filename=val_input_img_paths[i]))
// Display ground-truth target mask
img = PIL.ImageOps.autocontrast(load_img(val_target_img_paths[i]))
display(img)
// Display mask predicted by our model
display_mask(i) // Note that the model only sees inputs at 150x150.
I would enjoy any help and thank you in advanced for your time.
I have trained a classifier with this: https://teachablemachine.withgoogle.com/
Then I set up a python environment where I can run the model. I heard that with some tweaks such model could be turned into a deep dream like model.
Does anyone know how I could tweak the model with keras to generate pictures that it learned co classify? Is it even possible?
Here is my current code:
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = tensorflow.keras.models.load_model('C:/Users/me/Downloads/keras_model.h5')
# Create the array of the right shape to feed into the keras model
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image
image = Image.open('C:/Users/me/Downloads/0a8d8fa2c09ed00a54b6590f2fa01436.jpg')
#resize the image to a 224x224 with the same strategy as in TM2:
#resizing the image to be at least 224x224 and then cropping from the center
size = (224, 224)
image = ImageOps.fit(image, size, Image.ANTIALIAS)
#turn the image into a numpy array
image_array = np.asarray(image)
# display the resized image
image.show()
# Normalize the image
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# run the inference
prediction = model.predict(data)
print(prediction)
The idea is quite simple. you need to feed the image to the model and then maximize the activation of certain layers wrt the image itself not the weights of the model (changing the layers will change the result)
tensorflow made an awesome notebook, check it out here for more information and detailed examples
I have built a neural network to detect handwritten digits using the MNIST dataset.
The network takes an input shape of (28,28) as the training MNIST images are 28x28 grayscale.
I now want to test my neural network on some of my own handwriting.
The images I have are not 28x28 grayscale images so I am trying to convert them so that my model will accept them to make predictions.
Currently I have the following:
img = Image.open('image.png').convert('LA')
newImg = img.resize((28,28), Image.ANTIALIAS)
toPredict = np.array(new_img)
However this is giving my an numpy array of shape (28, 28, 2) I don't understand this.
After conversion to grayscale and resizing I should have a 28x28 shaped array (28 pixels height multiplied by 28 pixels width).
I don't understand why the shape is not that.
Can anyone help me get the shape to be 28x28 (and explain why it isn't already) so I can pass this to my neural network?
Thank you!
You're almost there.
img = Image.open('image.png').convert('LA') is 28x28x2 because it is greyscale with an alpha channel.
Instead convert it to just greyscale with:
img = Image.open('image.png').convert('L')
You can see more information on the modes here:
https://pillow.readthedocs.io/en/latest/handbook/concepts.html#modes
I have tried the tensorflow example with zalando mnist here:
https://www.tensorflow.org/tutorials/keras/basic_classification
After that I changed the clothes images with handwritten mnist database, which also works.
Now I want to train the AI with the mnist handwritten database, take a picture from my handwritten "1" and let the KI guess the number.
I appended after the trainig of the KI some lines of code.
What I tried is this:
ownPicArr = imageio.imread(filename) #it is a 28x28 PNG file
ownPicArr = ownPicArr / 255.0
pred = model.predict(ownPicArr)
I got following error:
ValueError: Error when checking input: expected flatten_input to have 3 dimensions, but got array with shape (28, 28)
How to solve this problem? Thnak you...
Even if the colours of your picture were inverted, this is how you could perform the predictions using OpenCV
import os, cv2
image=cv2.imread(imagePath)
image_from_array = Image.fromarray(image, 'RGB')
size_image = image_from_array.resize((28,28))
p = np.expand_dims(size_image, 0)
img = tf.cast(p, tf.float32)
pred = model.predict(img)
First we read the image using OpenCV, which stores it as an array. We then convert the array and also specify the colour channels. After Resizing the image we create a batch of a single image and then after changing the datatype to float32 to or the datatype matching your model we finally make predictions
I'm fine-tuning the GoogleNet network with Caffe to my own dataset. If I use IMAGE_DATA layers as input learning takes place. However, I need to switch to an HDF5 layer for further extensions that I require. When I use HDF5 layers no learning takes place.
I am using the exact same input images, and the labels match also. I have also checked to ensure that the data in .h5 files can be loaded correctly. It does, and Caffe is also able to find the number of examples I feed it as well as the correct number of classes (2).
This leads me to think that the issue lies in the transformations I am performing manually (since HDF5 layers do not perform any built-in transformations). The code for these is below. I do the following:
Convert image from RGB to BGR
Resize it to 256x256 so I can subtract the mean file from ImageNet (included in the Caffe library)
Since the original GoogleNet prototxt does not divide by 255, I also do not (see here)
I resize the image down to 224x224, which is the crop size required by GoogleNet
I transpose the image as needed to satisfy CxHxW, as required by Caffe
At the moment I am not performing data augmentation, which could be turned on if I let oversample=True.
Can anyone see anything wrong with this approach? Is data augmentation so critical that no learning would take place without it?
The HDF5 conversion code
IMG_RESHAPE = 224
IMG_UNCROPPED = 256
def resize_convert(img_names, path=None, oversample=False):
'''
Load images, set to BGR mode and transpose to CxHxW
and subtract the Imagenet mean. If oversample is True,
perform data augmentation.
Parameters:
---------
img_names (list): list of image names to be processed.
path (string): path to images.
oversample (bool): if True then data augmentation is performed
on each image, and 10 crops of size 224x224 are produced
from each image. If False, then a single 224x224 is produced.
'''
path = path if path is not None else ''
if oversample == False:
all_imgs = np.empty((len(img_names), 3, IMG_RESHAPE, IMG_RESHAPE), dtype='float32')
else:
all_imgs = np.empty((len(img_names), 3, IMG_UNCROPPED, IMG_UNCROPPED), dtype='float32')
#load the imagenet mean
mean_val = np.load('/path/to/imagenet/ilsvrc_2012_mean.npy')
for i, img_name in enumerate(img_names):
img = ndimage.imread(path+img_name, mode='RGB') # Read as HxWxC
#subtract the mean of Imagenet
#First, resize to 256 so we can subtract the mean of dims 256x256
img = img[...,::-1] #Convert RGB TO BGR
img = caffe.io.resize_image(img, (IMG_UNCROPPED, IMG_UNCROPPED), interp_order=1)
img = np.transpose(img, (2, 0, 1)) #HxWxC => CxHxW
#Since mean is given in Caffe channel order: 3xWxH
#Assume it also is given in BGR order
img = img - mean_val
#set to 0-1 range => I don't think googleNet requires this
#I tried both and it didn't make a difference
#img = img/255
#resize images down since GoogleNet accepts 224x224 crops
if oversample == False:
img = np.transpose(img, (1,2,0)) # CxHxW => HxWxC
img = caffe.io.resize_image(img, (IMG_RESHAPE, IMG_RESHAPE), interp_order=1)
img = np.transpose(img, (2,0,1)) #convert to CxHxW for Caffe
all_imgs[i, :, :, :] = img
#oversampling requires HxWxC order
if oversample:
all_imgs = np.transpose(all_imgs, (0, 3, 1, 2))
all_imgs = caffe.io.oversample(all_imgs, (IMG_RESHAPE, IMG_RESHAPE))
all_imgs = np.transpose(all_imgs, (0,2,3,1)) #convert to CxHxW for Caffe
return all_imgs
Relevant differences between IMAGE_DATA and HDF5 prototxt files
name: "GoogleNet"
layers {
name: "data"
type: HDF5_DATA
top: "data"
top: "label"
hdf5_data_param {
source: "/path/to/train_list.txt"
batch_size: 32
}
include: { phase: TRAIN }
}
layers {
name: "data"
type: HDF5_DATA
top: "data"
top: "label"
hdf5_data_param {
source: "/path/to/valid_list.txt"
batch_size:10
}
include: { phase: TEST }
}
Update
When I say no learning is taking place I mean that my training loss is not going down consistently when using HDF5 data compared to the IMG_Data. In the images below, the first plot is plot the change in the training loss for the IMG_DATA network, and the other is the HDF5 data network.
One possibility that I am considering is that the network is overfitting to each of the .h5 that I am feeding it. At the moment I am using data augmentation, but all of the augmented examples are stored into a single .h5 file, along with other examples. However, because all of the augmented versions of a single input image are all contained within the same .h5 file, I think this could cause the network to overfit to that specific .h5 file. However, I am not sure whether this is what the second plot suggests.
I faced the same problem and found out that for some reason doing the transformation manually as you are doing in your code causes the images to be all black (all zeros). try to debug your code and see if that is happening.
the solution is to use the same methodology explained in the Caffe tutorial here
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
the part where you see
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # move image channels to outermost dimension
transformer.set_mean('data', mu) # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0)) # swap channels from RGB to BGR
then few lines down
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)