(Attempt #2, see below for history)
I'm a total newbie to deep learning and Keras, and I'm trying to play with the MNIST sample, following the tutorial at https://elitedatascience.com/keras-tutorial-deep-learning-in-python.
Starting from that sample, I made some minimal adjustments to the code to make it compatible with the latest API. I then added a small Python script which loads the saved model and a BMP image file (specified as the script's first argument) and outputs the class prediction as an exit code.
I trained the model to an accuracy of 0.9911 and saved it.
Yet, when testing with some BMP samples, I get inconsistent results: only some of the digits are recognized correctly. I'm probably missing something obvious in preprocessing my data. After Toyo's reply, I changed both scripts, following https://nextjournal.com/schmudde/ml4a-mnist, so I quote the newer versions here. The older ones are at the bottom.
I also added a sample visualization of the preprocessed array representing an image, and looking at it I can see the digit, 'white on black', as expected (in the prediction preprocessing I added color inversion by setting each value to 255 - value, as my input BMPs are 'black on white').
Yet, the issue is still there.
Thanks!
New Scripts
Training:
# numpy with a specific seed for reproducibility
import numpy as np
np.random.seed(123)
# keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
# load pre-shuffled MNIST data into train and test sets
# MNIST: 60k samples of 28x28 grayscale (0-255) images,
# and an additional 10k samples for testing.
((x_train, y_train), (x_test, y_test)) = mnist.load_data()
n_train, height, width = x_train.shape
n_test, _, _ = x_test.shape
print('x_train shape: {}'.format(x_train.shape))
print('n_train: {}'.format(n_train))
print('height: {}'.format(height))
print('width: {}'.format(width))
print('n_test: {}'.format(n_test))
# we require shape (count, height, width, channel), where count
# is the count of our samples, and channel is the number of channels,
# here equal to 1 as these are grayscale images (for RGB it would be 3).
x_train = x_train.reshape(n_train, height, width, 1).astype('float32')
x_test = x_test.reshape(n_test, height, width, 1).astype('float32')
# normalize from [0, 255] to [0, 1]
x_train /= 255
x_test /= 255
# show image array
print('Sample array: {}\n'.format(x_train[0].shape))
print(x_train[0])
from matplotlib import pyplot as plt
plt.imshow(x_train[0].reshape(28, 28))
plt.show()
# the labels need to be converted into one-hot vectors:
# nc-element arrays (nc is the number of classes) that are 0
# for every class except for a 1 at the index of the class
# the label is assigned to.
# convert integer labels into one-hot vectors
n_classes = 10
y_train = np_utils.to_categorical(y_train, n_classes)
y_test = np_utils.to_categorical(y_test, n_classes)
# define model architecture.
model = Sequential()
# number of convolutional filters
n_filters = 32
# convolution filter size
n_conv = 3
# pooling window size
n_pool = 2
# (1) convolution
model.add(Convolution2D(
n_filters,
kernel_size=(n_conv, n_conv),
# we have a 28x28 single channel (grayscale) image
# so the input shape should be (28, 28, 1)
input_shape=(height, width, 1)
))
model.add(Activation('relu'))
model.add(Convolution2D(n_filters, kernel_size=(n_conv, n_conv)))
model.add(Activation('relu'))
# apply pooling to summarize the features extracted thus far
model.add(MaxPooling2D(pool_size=(n_pool, n_pool)))
# (2) connected layers
model.add(Dropout(0.25))
# flatten the data for the 1D layers
model.add(Flatten())
# Dense(n_outputs)
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
# (3) the softmax output layer gives us a probability for each class
model.add(Dense(n_classes))
model.add(Activation('softmax'))
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
# how many examples to look at during each update step
batch_size = 128
# how many times to run through the full set of examples
n_epochs = 10
model.fit(x_train,
y_train,
batch_size=batch_size,
epochs=n_epochs,
validation_data=(x_test, y_test))
# how'd we do?
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print('loss:', loss)
print('accuracy:', accuracy)
# save the model. Use load_model(path) to load it later
model.save('/projects/python/mnist/mnist.h5')
Prediction:
# get the path to the image to be recognized from 1st arg
import sys
if len(sys.argv) < 2:
    print('Missing image file path')
    exit(-1)
# load the pretrained MNIST model
print('Loading model...')
import numpy as np
import keras
model = keras.models.load_model('/projects/python/mnist/mnist.h5')
# load the image ensuring that its size is 28x28
print('Loading image ' + sys.argv[1])
from keras.preprocessing import image
x = image.load_img(sys.argv[1], color_mode='grayscale', target_size=(28, 28))
# convert image to NumPy array
x = image.img_to_array(x, data_format='channels_last')
print('np from image:\n')
print(x.shape)
print(x)
# add a major dimension to represent the number of samples, i.e. 1
x = x.reshape(1, 28, 28, 1)
# convert data type to float32 and normalize 0-255 values to range [0, 1]
x = x.astype('float32')
# invert colors, we need white on black
x = 255 - x
x /= 255
print('np before predicting:\n')
print(x.shape)
print(x)
from matplotlib import pyplot as plt
plt.imshow(x.reshape(28, 28))
plt.show()
# predict
classes = model.predict_classes(x)
print(classes[0])
exit(classes[0])
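Note: predict_classes has been removed from Sequential models in recent versions of Keras/TensorFlow. If it is not available in your installation, an equivalent sketch using the same model and preprocessed array x is:
import numpy as np
probabilities = model.predict(x)             # shape (1, 10): one probability per digit
classes = np.argmax(probabilities, axis=-1)  # index of the most likely class
print(classes[0])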
Old scripts
# numpy with a specific seed for reproducibility
import numpy as np
np.random.seed(123)
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
from matplotlib import pyplot as plt
# load pre-shuffled MNIST data into train and test sets
# 60k samples of 28x28 images
((x_train, y_train), (x_test, y_test)) = mnist.load_data()
print(x_train.shape)
plt.imshow(x_train[0])
input("Press enter to continue...")
# preprocess input data, as Theano requires explicit depth:
# from shape (n, width, height) to (n, depth=1, width, height)
x_train = x_train.reshape(x_train.shape[0], 1, 28, 28)
x_test = x_test.reshape(x_test.shape[0], 1, 28, 28)
# convert our data type to float32 and normalize our data values to the range [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# convert 1-dimensional class arrays (with values 0-9)
# to 10-dimensional class matrices (with values 0-1)
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)
# define model architecture
model = Sequential()
# input_shape is the shape of each single sample:
# here it is depth=1, width=28, height=28
# 32 convolution filters
# 3 rows in each convolution kernel
# 3 cols in each convolution kernel
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(1, 28, 28), data_format="channels_first"))
model.add(Convolution2D(32, (3, 3), activation='relu', data_format="channels_first"))
# reduce the model's parameters by sliding a 2x2 pooling filter
# across the previous layer and taking the max of the 4 values in the 2x2 filter
model.add(MaxPooling2D(pool_size=(2,2)))
# regularize the model to prevent overfitting
model.add(Dropout(0.25))
# flatten (make 1-dimensional)
model.add(Flatten())
# dense: 128=output size of the layer
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
# final layer has size 10 for the 10 digits
model.add(Dense(10, activation='softmax'))
# compile model (loss function and optimizer)
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# fit model on training data
model.fit(x_train, y_train,
batch_size=32, epochs=10, verbose=1)
# evaluate model on test data
score = model.evaluate(x_test, y_test, verbose=0)
# save the model. Use load_model(path) to load it later
model.save('/projects/python/mnist/mnist.h5')
The consuming script:
# get the path to the image to be recognized from 1st arg
import sys
if len(sys.argv) < 2:
    print('Missing image file path')
    exit(-1)
# load the pretrained MNIST model
print('Loading model...')
import numpy as np
import keras
model = keras.models.load_model('/projects/python/mnist/mnist.h5')
# load the image ensuring that its size is 28x28
print('Loading image ' + sys.argv[1])
from keras.preprocessing import image
x = image.load_img(sys.argv[1], color_mode='grayscale', target_size=(28, 28))
# convert image to NumPy array
x = image.img_to_array(x, data_format='channels_first')
# add a major dimension to represent the number of samples, i.e. 1
x = x.reshape(1, 1, 28, 28)
# convert data type to float32 and normalize 0-255 values to range [0, 1]
x = x.astype('float32')
x /= 255
print(x.shape)
print(x)
# evaluate
classes = model.predict_classes(x)
print(classes[0])
exit(classes[0])
Related
I trained a neural network on GTSRB and am trying to use the saved model (.h5), but I get the following error when calling model.predict:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 30, 3)
the training code:
import cv2
import numpy as np
import os
import sys
import tensorflow as tf
from sklearn.model_selection import train_test_split
EPOCHS = 10
IMG_WIDTH = 30
IMG_HEIGHT = 30
NUM_CATEGORIES = 43
TEST_SIZE = 0.4
def main():
    # Check command-line arguments
    if len(sys.argv) not in [2, 3]:
        sys.exit("Usage: python traffic.py data_directory [model.h5]")
    # Get image arrays and labels for all image files
    images, labels = load_data(sys.argv[1])
    # Split data into training and testing sets
    labels = tf.keras.utils.to_categorical(labels)
    x_train, x_test, y_train, y_test = train_test_split(
        np.array(images), np.array(labels), test_size=TEST_SIZE
    )
    # Get a compiled neural network
    model = get_model()
    # Fit model on training data
    model.fit(x_train, y_train, epochs=EPOCHS)
    # Evaluate neural network performance
    model.evaluate(x_test, y_test, verbose=2)
    # Save model to file
    if len(sys.argv) == 3:
        filename = sys.argv[2]
        model.save(filename)
        print(f"Model saved to {filename}.")
def load_data(data_dir):
    """
    Load image data from directory `data_dir`.
    Assume `data_dir` has one directory named after each category, numbered
    0 through NUM_CATEGORIES - 1. Inside each category directory will be some
    number of image files.
    Return tuple `(images, labels)`. `images` should be a list of all
    of the images in the data directory, where each image is formatted as a
    numpy ndarray with dimensions IMG_WIDTH x IMG_HEIGHT x 3. `labels` should
    be a list of integer labels, representing the categories for each of the
    corresponding `images`.
    """
    images = list()
    labels = list()
    for folder in os.listdir(data_dir):
        folder_path = os.path.join(data_dir, folder)
        if os.path.isdir(folder_path):
            for photo in os.listdir(folder_path):
                img = cv2.imread(os.path.join(folder_path, photo))
                img = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))
                images.append(img)
                labels.append(int(folder))
    return images, labels
def get_model():
    """
    Returns a compiled convolutional neural network model. Assume that the
    `input_shape` of the first layer is `(IMG_WIDTH, IMG_HEIGHT, 3)`.
    The output layer should have `NUM_CATEGORIES` units, one for each category.
    """
    # Create a convolutional neural network
    model = tf.keras.models.Sequential([
        # Convolutional layer. Learn 32 filters using a 3x3 kernel
        tf.keras.layers.Conv2D(
            32, (3, 3), activation="relu", input_shape=(IMG_WIDTH, IMG_HEIGHT, 3)
        ),
        # Max-pooling layer, using 2x2 pool size
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(
            32, (3, 3), activation="relu", input_shape=(IMG_WIDTH, IMG_HEIGHT, 3)
        ),
        # Flatten units
        tf.keras.layers.Flatten(),
        # Add a hidden layer with dropout
        tf.keras.layers.Dense(NUM_CATEGORIES * 16, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        # Add an output layer with NUM_CATEGORIES units, one per category
        tf.keras.layers.Dense(NUM_CATEGORIES, activation="softmax")
    ])
    # Compile the neural network
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"]
    )
    return model
if __name__ == "__main__":
    main()
the code to use the saved model to classify a photo
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image
import numpy
import cv2
PHOTO_PATH = r'C:\Users\m\ai50\projects\2020\x\traffic\gtsrb\x.ppm'
MODEL_PATH = r'C:\Users\m\ai50\projects\2020\x\traffic\model.h5'
model = load_model(MODEL_PATH, compile=True)
img = cv2.imread(PHOTO_PATH)
img = cv2.resize(img, (30, 30))
img = numpy.array(img)
print(img.shape)
result = model.predict(img)
print(result)
Add a dimension to your image array. The .predict() method expects input with four dimensions: (batch, height, width, channels). Adding a dimension effectively makes this a batch with one image.
result = model.predict(img[numpy.newaxis])
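Equivalently, a minimal sketch assuming img is the (30, 30, 3) array produced above: numpy.expand_dims makes the added batch dimension explicit, and argmax turns the softmax output into a category index.
import numpy
batch = numpy.expand_dims(img, axis=0)  # shape becomes (1, 30, 30, 3)
result = model.predict(batch)
print(result.argmax())                  # index of the predicted category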
I've watched a tutorial about image recognition in Python and used its code to train a network. It compiles and trains fine, but how do I use it for prediction on new images? Maybe something like model.predict(y)?
Here is the code:
import numpy
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Flatten, Activation
from keras.layers import Dropout
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras.optimizers import SGD
numpy.random.seed(42)
#Loading data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
batch_size = 32
nb_classes = 10
#Number of epochs
epochNumber = 25
#Image size
img_rows, img_cols = 32, 32
#RGB
img_channels = 3
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
#To catogories
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
#Creating a model
model = Sequential()
#Adding layers
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=(32, 32, 3), activation='relu'))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes, activation='softmax'))
#Optimization
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy'])
#Training model
model.fit(X_train, Y_train,
batch_size=batch_size,
epochs=epochNumber,
validation_split=0.1,
shuffle=True,
verbose=2)
scores = model.evaluate(X_test, Y_test, verbose=0)
print("Accuracy on test data: %.2f%%" % (scores[1]*100))
Then, what do I do to predict?
target = "C://Users//Target.png"
print(model.predict(target))
How do I correctly use model.predict, and how do I convert the result into user-friendly output?
Note: if you are using keras package instead of tf.keras, replace tf.keras with keras in all the following code snippets.
To load a single image, you can use tf.keras.preprocessing.image.load_img:
image = tf.keras.preprocessing.image.load_img(image_path, target_size=(img_rows, img_cols))
This would load the image into PIL format; therefore, we need to convert it to numpy array before feeding it to our model:
import numpy as np
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr]) # Convert single image to a batch.
Now, you might be tempted to rush into calling the predict method on input_arr. However, you should first apply the same preprocessing steps used in the training phase during prediction as well:
input_arr = input_arr.astype('float32') / 255. # This is VERY important
Now, it's ready to be given to the model for prediction:
predictions = model.predict(input_arr)
Bonus: Since your model is a classifier and it's using Softmax activation at the top, the predictions variable would contain the probabilities for each class. To find out the predicted class, we use argmax from Numpy to find the index of the class with the highest probability:
predicted_class = np.argmax(predictions, axis=-1)
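Putting these steps together, here is a minimal end-to-end sketch, assuming a trained model and an image_path variable (img_rows and img_cols are 32 for CIFAR-10 as in your code):
import numpy as np
import tensorflow as tf

image = tf.keras.preprocessing.image.load_img(image_path, target_size=(img_rows, img_cols))
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr])                  # a batch containing a single image
input_arr = input_arr.astype('float32') / 255.     # same preprocessing as during training
predictions = model.predict(input_arr)             # shape (1, nb_classes)
predicted_class = np.argmax(predictions, axis=-1)  # e.g. array([3])
print(predicted_class[0])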
You can use cv2 to read in the image. Make sure that whatever processing you applied to the input images during training you also apply to the image you read in with cv2. Be careful: cv2 reads images in BGR format. If you trained your model on RGB images, you need to convert the cv2 image to RGB as shown in the code below. Then you want to make the image 32 x 32 x 3, so if it is not that size, use cv2 to resize it. I assume you rescaled your training images, so you need to rescale the cv2 image as well. The code is below.
import cv2
import numpy as np
img=cv2.imread(f_path) # where f_path is the path to the image file
img=cv2.resize(img, (32,32), interpolation = cv2.INTER_AREA)
# cv2 reads images in BGR format; if you trained your model with images in
# RGB format you need to convert the cv2 image.
# Uncomment the line below if that is the case.
#img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img=img/255
img=np.expand_dims(img, axis=0) # add the batch dimension: shape (1, 32, 32, 3)
predictions=model.predict(img)
pre_class=predictions.argmax()
# this will give you an integer value
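To turn that integer into user-friendly output, you can index into a list of class names; a small sketch assuming the standard CIFAR-10 label order:
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']
print(labels[pre_class]) # e.g. 'cat'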
I am a total beginner in Keras. I implemented the following code, which I found on the web, and successfully trained it to 97% accuracy. I am having a bit of a problem during prediction.
The following code for training:
from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adam
from keras.utils import np_utils
import numpy as np
#seed = 7
#np.random.seed(seed)
batch_size = 50
nb_classes = 10
nb_epoch = 150
data_augmentation = False
# input image dimensions
img_rows, img_cols = 32, 32
# the CIFAR10 images are RGB
img_channels = 3
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
# let's train the model using SGD + momentum (how original).
#sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
sgd= Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy'])
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_test, Y_test),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # this will do preprocessing and realtime data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images
    # compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied)
    datagen.fit(X_train)
    # fit the model on the batches generated by datagen.flow()
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_test, Y_test))
model.save('model3.h5')
The model was saved successfully, and I implemented the following prediction code.
Code for Prediction:
import keras
import tensorflow as tf
import h5py
from keras.models import load_model
import cv2
import numpy as np
model = load_model('model3.h5')
print('Model Loaded')
dim = (32,32)
img = cv2.imread('download.jpg')
img = cv2.resize(img,dim)
Array = [np.array(img)]
Prediction = model.predict(Array)
print(Prediction)
Error generated:
Using TensorFlow backend.
Model Loaded
Traceback (most recent call last):
File "E:\Prediction\Prediction.py", line 16, in <module>
Prediction = model.predict(Array)
File "C:\Users\Dilip\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1149, in predict
x, _, _ = self._standardize_user_data(x)
File "C:\Users\Dilip\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
exception_prefix='input')
File "C:\Users\Dilip\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training_utils.py", line 128, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (32, 32, 3)
>>>
I know the problem is that the input image is not in the proper shape; I tried to convert it to (1, 32, 32, 3) but failed.
Help here please.
It appears you are missing the classes in your code for prediction. Try this instead:
import cv2
import tensorflow as tf
# write the 10 classes here (nb_classes)
CATEGORIES = ['1','2','3','4','5','6','7','8','9','10']
def prepare(filepath):
    IMG_SIZE = 32
    img_array = cv2.imread(filepath, cv2.IMREAD_COLOR)
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
    new_array = new_array / 255.0  # rescale exactly as in training
    return new_array.reshape(-1, IMG_SIZE, IMG_SIZE, 3)  # img_channels = 3
model = tf.keras.models.load_model('model3.h5')
prediction = model.predict(prepare('download.jpg'))
# the softmax output is a vector of 10 probabilities; argmax gives the class index
print(CATEGORIES[int(prediction[0].argmax())])
I want to use the code below with my own input images instead of the MNIST images. However, I am having a hard time getting several color .jpg images into a NumPy array similar to the X_train used in the code below. I have a folder called data with another folder called train that includes several images I would like to use as my X_train. I can generate the labels for them and one-hot encode them. I just don't know how to get all my images into a nice array like X_train. Help? I did look here, but I got a ValueError: setting an array element with a sequence when I copied and pasted the summarized code.
import numpy as np
np.random.seed(123) # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
# 7. Define model architecture
model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1,28,28)))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
# 8. Compile model
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# 9. Fit model on training data
model.fit(X_train, Y_train,
batch_size=32, nb_epoch=10, verbose=1)
# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
I am assuming that you are using Theano and that your jpgs have 3 bands. In addition, your jpgs should have the same input shape as the one you indicate in the first convolutional layer (28x28 pixels). In that case you can rearrange all your jpgs with the following lines:
#create random data
no_of_jpgs = 10
jpgs = [np.random.randint(0, 255, (28, 28, 3)) for i in range(no_of_jpgs)]
jpgs = np.array(jpgs)
#move the channel axis to the front (Theano expects channels first)
jpgs = jpgs.transpose(0, 3, 1, 2)
now you have an array of shape (n_samples, n_bands, x, y)
In addition you should change your input_shape so that it supports 3 bands:
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(3, 28,28)))
If you have jpgs with a different shape or more bands, just change the input_shape values in the first convolutional layer.
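If you want to build the array from actual files rather than random data, here is a minimal sketch under the same Theano/channels-first assumption; the folder name data/train and the 28x28 size are placeholders for your own setup:
import os
import numpy as np
from keras.preprocessing import image

train_dir = 'data/train'  # hypothetical folder containing your jpgs
imgs = []
for fname in sorted(os.listdir(train_dir)):
    if fname.lower().endswith('.jpg'):
        img = image.load_img(os.path.join(train_dir, fname), target_size=(28, 28))
        imgs.append(image.img_to_array(img, data_format='channels_first'))
X_train = np.array(imgs).astype('float32') / 255  # shape: (n_samples, 3, 28, 28)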
I am pretty new to machine learning, so I am playing around with examples and such.
The image size specified in the code is (28, 28), but for some reason I keep getting the same ValueError.
I can't figure out why this is happening.
Here's the code:
import pandas as pd
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils import np_utils
# input image dimensions
img_rows, img_cols = 28, 28
batch_size = 128 # Number of images used in each optimization step
nb_classes = 10 # One class per digit
nb_epoch = 35 # Number of times the whole data is used to learn
# Read the train and test datasets
train = pd.read_csv("../input/train.csv").values
test = pd.read_csv("../input/test.csv").values
# Reshape the data to be used by a Theano CNN. Shape is
# (nb_of_samples, nb_of_color_channels, img_width, img_heigh)
X_train = train[:, 1:].reshape(train.shape[0], 1, img_rows, img_cols)
X_test = test.reshape(test.shape[0], 1, img_rows, img_cols)
y_train = train[:, 0] # First data is label (already removed from X_train)
# Make the value floats in [0;1] instead of int in [0;255]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# convert class vectors to binary class matrices (ie one-hot vectors)
Y_train = np_utils.to_categorical(y_train, nb_classes)
#Display the shapes to check if everything's ok
print('X_train shape:', X_train.shape)
print('Y_train shape:', Y_train.shape)
print('X_test shape:', X_test.shape)
model = Sequential()
# For an explanation on conv layers see http://cs231n.github.io/convolutional-networks/#conv
# By default the stride/subsample is 1
# border_mode "valid" means no zero-padding.
# If you want zero-padding add a ZeroPadding layer or, if stride is 1 use border_mode="same"
model.add(Convolution2D(12, 5, 5, border_mode='valid',input_shape=(1,img_rows, img_cols)))
model.add(Activation('relu'))
# For an explanation on pooling layers see http://cs231n.github.io/convolutional-networks/#pool
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.15))
model.add(Convolution2D(24, 5, 5))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.15))
# Flatten the 3D output to 1D tensor for a fully connected layer to accept the input
model.add(Flatten())
model.add(Dense(180))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes)) #Last layer with one output per class
model.add(Activation('softmax')) #We want a score simlar to a probability for each class
# The function to optimize is the cross entropy between the true label and the output (softmax) of the model
# We will use adadelta to do the gradient descent see http://cs231n.github.io/neural-networks-3/#ada
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=["accuracy"])
# Make the model learn
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1)
# Predict the label for X_test
yPred = model.predict_classes(X_test)
# Save prediction in file for Kaggle submission
np.savetxt('mnist-pred.csv', np.c_[range(1,len(yPred)+1),yPred], delimiter=',', header = 'ImageId,Label', comments = '', fmt='%d')
So the problem is with the convolution sizes used. Convolution operations usually reduce the dimensions of the image. Similarly, each pooling operation reduces the size. You have very small images yet applied a model architecture designed for bigger ones; thus at some point, after one of the convolutions/poolings, the output image is actually smaller than the following filter size, and this is an ill-defined operation.
To temporarily fix the problem, remove the second convolution and max-pooling layers, since these operations (with the parameters provided) cannot be performed on such small data. In general you should first understand how convolution works rather than applying someone else's model, since the parameters are crucial for good performance: if you apply transformations which reduce the resolution too much, you will be unable to learn anything. Once you have some intuition for how convolution works, you can go back and try different architectures, but there is no single "magical" equation to figure out the architecture, so I cannot provide parameters that will "just work". Start by removing the additional convolution and pooling, and then go back and try other possibilities once you have a better understanding of your data and model.