Siamese network for feature similarity

I have around 20k images from different domains, with features already extracted using GLCM and HOG. Each image has around 2000 features. I want to measure the similarity between these feature vectors using a Siamese network. I stored everything in a dataframe, but I'm not sure how to give the input features to the neural net.
The only options I see are 1D CNN or Dense layers.
from tensorflow.keras import activations, layers, models
n_features = 2000  # length of each image's GLCM + HOG feature vector
encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(units=1024, activation=activations.relu, input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation=activations.relu))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation=activations.relu))
encoder.add(layers.Dropout(0.1))
In the code above we only give the number of features as input to the encoder, but the number of features is the same for both of my images.
Should I train two encoders separately and join them at the end to form an embedding layer?
And then how should I test it?

For a Siamese network you want one network that you train on different sets of data.
So say you have two sets of data, X0 and X1, with the same shape; you would do:
import tensorflow as tf
from tensorflow.keras import models
from tensorflow.keras import layers
# number of features
n_features = 2000
# fake data w/batch size 4
X0 = tf.random.normal([4, n_features])
X1 = tf.random.normal([4, n_features])
# siamese encoder model
encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(
    units=1024, activation="relu", input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation="relu"))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation="relu"))
encoder.add(layers.Dropout(0.1))
# send both sets of data through the same model (shared weights)
enc0 = encoder(X0)
enc1 = encoder(X1)
# compare the two outputs; note that the Keras CosineSimilarity *loss*
# returns the negative cosine similarity, so flip its sign
neg_cos = tf.keras.losses.CosineSimilarity(
    reduction=tf.keras.losses.Reduction.NONE)(enc0, enc1)
print(f"cosine similarity of output: {-neg_cos.numpy()}")
# the data is random, so the values change from run to run; the original run
# printed the raw (negated) loss: [-0.5785658, -0.6405066, -0.57274437, -0.6017716]
# now do optimization ...
There are numerous ways to compare the outputs. Cosine similarity is just one of them, included here for illustration; you may need a different metric.
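For the "now do optimization ..." step, one common choice for Siamese training (an illustration, not the only option) is the contrastive loss: pairs labelled 1 (same class) are pulled together, and pairs labelled 0 are pushed at least `margin` apart. Below is a minimal NumPy sketch, assuming Euclidean distance between the embeddings and a `margin` hyperparameter you would tune; in Keras you would express the same formula with backend ops as a custom loss.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for a batch of embedding pairs.

    emb_a, emb_b: arrays (batch, dim) produced by the shared encoder
    same: array (batch,), 1 if the pair comes from the same class, else 0
    margin: distance beyond which dissimilar pairs are no longer penalised
    """
    # Euclidean distance between the two embeddings of each pair
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    # similar pairs are pulled together: penalise d^2
    pull = same * d ** 2
    # dissimilar pairs are pushed apart until they are at least `margin` apart
    push = (1 - same) * np.maximum(margin - d, 0.0) ** 2
    return np.mean(pull + push)
```

Minimizing this over many labelled pairs trains the encoder so that distances between embeddings reflect class membership.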

There is only one network, which is just duplicated: all weights are shared. So you are training one network and simply running it twice at each step of learning.
You should pick two samples from your dataset and label the pair 1 if they come from the same class and 0 otherwise.
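A minimal NumPy sketch of that pairing step, assuming your feature vectors are in an array `features` (e.g. `df.to_numpy()` from your dataframe) with matching class ids in `labels`. `make_pairs` is a hypothetical helper, and the half-positive/half-negative split is an illustrative choice.

```python
import numpy as np

def make_pairs(features, labels, n_pairs, rng=None):
    """Sample random pairs, labelled 1 (same class) or 0 (different class).

    features: array (n_samples, n_features), e.g. the GLCM + HOG vectors
    labels:   array (n_samples,) of class ids; needs at least two classes
    Returns (x_a, x_b, y) ready for a two-input Siamese model.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    idx_a, idx_b = [], []
    # first half: positive pairs, two samples that share a class
    for _ in range(n_pairs // 2):
        cls = rng.choice(labels)
        members = np.flatnonzero(labels == cls)
        i, j = rng.choice(members, size=2, replace=len(members) < 2)
        idx_a.append(i)
        idx_b.append(j)
    # second half: negative pairs, redrawn until the classes differ
    while len(idx_a) < n_pairs:
        i, j = rng.integers(len(labels), size=2)
        if labels[i] != labels[j]:
            idx_a.append(i)
            idx_b.append(j)
    idx_a, idx_b = np.array(idx_a), np.array(idx_b)
    y = (labels[idx_a] == labels[idx_b]).astype("float32")
    return features[idx_a], features[idx_b], y
```

The three arrays can then be fed to the two-input model in this answer as `model.fit([x_a, x_b], y)`.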
from tensorflow.keras import models
from tensorflow.keras import layers
import tensorflow.keras.backend as K
n_features = 2000
def cos_similarity(x):
    x1, x2 = x
    # reduce over the feature axis only, so we get one similarity per pair
    # (summing over everything would mix all pairs of the batch together)
    dot = K.sum(x1 * x2, axis=-1, keepdims=True)
    norms = K.sqrt(K.sum(x1 * x1, axis=-1, keepdims=True)) * K.sqrt(K.sum(x2 * x2, axis=-1, keepdims=True))
    return dot / (norms + K.epsilon())
inp1 = layers.Input(shape=(n_features,))
inp2 = layers.Input(shape=(n_features,))
encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(
    units=1024, activation="relu", input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation="relu"))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation="relu"))
encoder.add(layers.Dropout(0.1))
# the same encoder (same weights) processes both inputs
out1 = encoder(inp1)
out2 = encoder(inp2)
similarity = layers.Lambda(cos_similarity)([out1, out2])
model = models.Model(inputs=[inp1, inp2], outputs=[similarity])
model.compile(optimizer='adam', loss='mse')  # targets: 1 = same class, 0 = different
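For testing, first compute the HOG features of your images (the ~2000-dimensional vectors you already have). Then either score a pair directly with
model.predict([hog_feature_a, hog_feature_b])
which returns their similarity, or get the embedding of each image with
encoder.predict(hog_feature)
and compare the embeddings with the same cosine similarity.
By the way, I recommend not combining HOG features with a Siamese network. Instead, extract the image features with this network itself: change the input shape and train it on the images directly.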
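A sketch of the second testing option, assuming you have already embedded the query and all stored feature vectors with the trained `encoder` (for example `gallery = encoder.predict(all_features)`). `most_similar` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def most_similar(query_emb, gallery_emb, k=5):
    """Rank gallery items by cosine similarity to one query embedding.

    query_emb:   array (dim,), e.g. encoder.predict(query_features[None])[0]
    gallery_emb: array (n_items, dim), e.g. encoder.predict(all_features)
    Returns (indices, scores) of the k most similar items, best first.
    """
    eps = 1e-12  # guards against all-zero embeddings (possible with ReLU)
    q = query_emb / (np.linalg.norm(query_emb) + eps)
    g = gallery_emb / (np.linalg.norm(gallery_emb, axis=1, keepdims=True) + eps)
    scores = g @ q                      # cosine similarity with every item
    order = np.argsort(-scores)[:k]     # highest similarity first
    return order, scores[order]
```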


Multi-Multi-Class Classification in Tensorflow/Keras

I already posted this question on CrossValidated, but thought the StackOverflow community, being bigger, might be able to answer this question faster.
I'd like to build a model that can output results for several multi-class classification problems at once. Suppose you have diagnostic data about a product that needs to be repaired and you want to predict the quantity of various part numbers that will be needed to repair the product. The input data is the same for all part numbers to be predicted.
Here's a concrete example. You have 2 part numbers that can get replaced, part A and part B. For part A you can replace 0, 1, 2, or 3 of them on the product. For part B you can replace 0, 2 or 4 (replaced in pairs). How can a TensorFlow/Keras neural network be configured so that the output probabilities of replacing part A 0, 1, 2, and 3 times sum to 1, with similar behavior for part B (its probabilities also sum to 1)?
Simple code like the code below would treat all of the values as coming from the same discrete probability distribution. How can this be modified to create 2 discrete probability distributions in the output:
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(7, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
UPDATE
Based on the comment(s), will something like this work?
References this question
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten, Concatenate
from mypackage import get_my_data, compiler_args
data = get_my_data() # obviously, this is a stand-in for however you get your data.
input_layer = Input(data.shape[1:])
hidden = Flatten()(input_layer)
hidden = Dense(192, activation='relu')(hidden)
main_output = Dense(192, activation='relu')(hidden)
# I'm going to build each individual parallel set of layers separately
part_a = Dense(10, activation='relu')(main_output)
output_a = Dense(4, activation='softmax')(part_a) # multi-class classification for part A
part_b = Dense(10, activation='relu')(main_output) # note that it is main_output again
output_b = Dense(3, activation='softmax')(part_b) # multi-class classification for part B
final_output = Concatenate()([output_a, output_b]) # Combine the outputs into final output layer
model = tf.keras.Model(input_layer, final_output)
model.compile(**compiler_args)
model.summary()
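To check what the UPDATE's `final_output` represents, here is a NumPy sketch with made-up logits: the concatenated vector holds two separate distributions (each slice sums to 1), so the matching loss is the sum of one cross-entropy per head. In Keras this is usually expressed by giving the model two outputs, `tf.keras.Model(input_layer, [output_a, output_b])`, with one `categorical_crossentropy` loss per output, which Keras adds up; a single loss and `accuracy` metric over the concatenated 7-vector would mix the two heads. `two_head_loss` is a hypothetical helper for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# made-up logits for one product: 4 classes for part A (0, 1, 2, 3 replaced)
# and 3 classes for part B (0, 2, 4 replaced)
logits_a = np.array([[2.0, 1.0, 0.1, -1.0]])
logits_b = np.array([[0.5, 0.5, 3.0]])
p_a, p_b = softmax(logits_a), softmax(logits_b)
final = np.concatenate([p_a, p_b], axis=-1)  # shape (1, 7), like final_output

# each head is its own distribution: the slices sum to 1, the whole vector to 2
print(final[:, :4].sum(), final[:, 4:].sum())

def two_head_loss(final, y_a, y_b, n_a=4):
    """Sum of per-head categorical cross-entropies; y_a and y_b are one-hot."""
    eps = 1e-12
    ce_a = -np.sum(y_a * np.log(final[:, :n_a] + eps), axis=-1)
    ce_b = -np.sum(y_b * np.log(final[:, n_a:] + eps), axis=-1)
    return np.mean(ce_a + ce_b)
```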

Is it possible to create multiple instances of the same CNN that take in multiple images and are concatenated into a dense layer? (keras)

Similar to this question, I'm looking to have several image input layers that go through one larger CNN (e.g. XCeption minus dense layers), and then have the output of the one CNN across all images be concatenated into a dense layer.
Is this possible with Keras or is it even possible to train a network from the ground-up with this architecture?
I'm essentially looking to train a model that takes in a larger but fixed number of images per sample (i.e. 3+ image inputs with similar visual features), but not to explode the number of parameters by training several CNNs at once. The idea is to train only one CNN, that can be used for all the outputs. Having all images go into the same dense layers is important so the model can learn the associations across multiple images, which are always ordered based on their source.

You can easily achieve this using the Keras functional API the following way.
from tensorflow.keras import layers, models, applications
# Multiple inputs
in1 = layers.Input(shape=(128,128,3))
in2 = layers.Input(shape=(128,128,3))
in3 = layers.Input(shape=(128,128,3))
# CNN output: one Xception instance, shared by all three inputs
cnn = applications.xception.Xception(include_top=False, input_shape=(128, 128, 3))
out1 = cnn(in1)
out2 = cnn(in2)
out3 = cnn(in3)
# Flattening the output for the dense layer
fout1 = layers.Flatten()(out1)
fout2 = layers.Flatten()(out2)
fout3 = layers.Flatten()(out3)
# Getting the dense output (one Dense layer, also shared)
dense = layers.Dense(100, activation='softmax')
dout1 = dense(fout1)
dout2 = dense(fout2)
dout3 = dense(fout3)
# Concatenating the final output
out = layers.Concatenate(axis=-1)([dout1, dout2, dout3])
# Creating the model
model = models.Model(inputs=[in1,in2,in3], outputs=out)
model.summary()
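What makes this work is that `cnn` and `dense` are each a single layer object called three times, so all branches reuse the same weights and the parameter count does not grow with the number of inputs. A toy NumPy sketch of that idea, with a small dense map standing in for Xception (all sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# one set of weights, standing in for the shared CNN: a dense map from
# 12 "pixel" values to 4 features, purely for illustration
W = rng.normal(size=(12, 4))
b = np.zeros(4)

def shared_encoder(x):
    return np.maximum(x @ W + b, 0.0)  # the same W and b for every input branch

images = [rng.normal(size=(2, 12)) for _ in range(3)]  # 3 inputs, batch of 2
features = np.concatenate([shared_encoder(x) for x in images], axis=-1)

n_params = W.size + b.size  # independent of how many branches reuse it
print(features.shape, n_params)  # (2, 12) 52
```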

Training on sequences of sentences using Keras

I am working on a project where I have to use a combination of numeric and text data in a neural network to make predictions of a system's availability for the next hour. Instead of trying to use separate neural networks and doing something weird/unclear (to me) at the end to produce the desired output, I decided to use Keras' merge layer with two networks (one for numeric data, one for text). The idea is that I feed the model a sequence of performance metrics for the previous 6 hours in the shape of (batch_size, 6hrs, num_features). Alongside the input I am giving to the network that handles numeric data, I am giving the second network another sequence of the size (batch_size, max_alerts_per_sequence, max_sentence length).
Any sequence of numeric data within a time range can have a variable number of events (text data) associated with it. For the sake of simplicity, I only allow a maximum of 50 events to accompany a sequence of performance data. Each event is hash encoded by word and padded. I have tried using a flatten layer to reduce the input shape from (50, 30) to (1500) so that the model can train on every event in these "sequences" (to clarify: I pass the model 50 sentences with 30 encoded elements each for every sequence of performance data).
My question is: Due to the fact that I need the NN to look at all events for a given sequence of performance metrics, how can I make the NN for text based data train on sequences of sentences?
My Model:
#LSTM Module for performance metrics
input = Input(shape=(shape[1], shape[2]))
lstm1 = Bidirectional(LSTM(units=lstm_layer_count, activation='tanh', return_sequences=True, input_shape=shape))(input)
dropout1 = Dropout(rate=0.2)(lstm1)
lstm2 = Bidirectional(LSTM(units=lstm_layer_count, activation='tanh', return_sequences=False))(dropout1)
dropout2 = Dropout(rate=0.2)(lstm2)
#LSTM Module for text based data
tInput = Input(shape=(50, 30))
flatten = Flatten()(tInput)
embed = Embedding(input_dim=vocabsize + 1, output_dim= 50 * 30, input_length=30*50)(flatten)
magic = Bidirectional(LSTM(100))(embed)
tOut = Dense(1, activation='relu')(magic)
#Merge the layers
concat = Concatenate()([dropout2, tOut])
output = Dense(units=1, activation='sigmoid')(concat)
nn = keras.models.Model(inputs=[input, tInput], outputs = output)
opt = keras.optimizers.SGD(lr=0.1, momentum=0.8, nesterov=True, decay=0.001)
nn.compile(optimizer=opt, loss='mse', metrics=['accuracy', coeff_determination])

So, as far as I understand, you have a sequence of at most 50 events that you want to make predictions for. These events have text data attached, which can be treated as another sequence of word embeddings. Here is an article about a similar architecture.
I would propose a solution which uses LSTMs for the text part and 1D convolutions for the "real" sequence part. The text features are then concatenated with the numerical data. This amounts to running 50 LSTMs (one per message), which can be time-consuming to train, even if you use shared weights. It would also be possible to use only convolution layers for the text part, which is faster but does not model long-term dependencies. (In my experience, these long-term dependencies are often not that important in text mining.)
Text -> LSTM or 1DConv -> concat with numerical data -> 1DConv -> Output
Here is some example code, which shows how to use shared weights:
from tensorflow.keras import regularizers
from tensorflow.keras.layers import (LSTM, Bidirectional, Conv1D, Dense, Embedding,
                                     GlobalMaxPooling1D, Input, TimeDistributed, concatenate)
from tensorflow.keras.models import Model
# num_features (vocabulary size), embedding_size, seq_length, number_of_messages,
# filter_size, kernel1-kernel3 and kernel_reg are hyperparameters defined elsewhere
numeric_input = Input(shape=(x_numeric_train.values.shape[1],), name='numeric_input')
nlp_seq = Input(shape=(number_of_messages, seq_length,), name='nlp_input')
# shared layers: TimeDistributed applies the same weights to every message
emb = TimeDistributed(Embedding(input_dim=num_features, output_dim=embedding_size,
    input_length=seq_length, mask_zero=True,
    input_shape=(seq_length, )))(nlp_seq)
x = TimeDistributed(Bidirectional(LSTM(32, dropout=0.3, recurrent_dropout=0.3, kernel_regularizer=regularizers.l2(0.01))))(emb)
c1 = Conv1D(filter_size, kernel1, padding='valid', activation='relu', strides=1, kernel_regularizer=regularizers.l2(kernel_reg))(x)
p1 = GlobalMaxPooling1D()(c1)
c2 = Conv1D(filter_size, kernel2, padding='valid', activation='relu', strides=1, kernel_regularizer=regularizers.l2(kernel_reg))(x)
p2 = GlobalMaxPooling1D()(c2)
c3 = Conv1D(filter_size, kernel3, padding='valid', activation='relu', strides=1, kernel_regularizer=regularizers.l2(kernel_reg))(x)
p3 = GlobalMaxPooling1D()(c3)
x = concatenate([p1, p2, p3, numeric_input])
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_seq, numeric_input], outputs=[x])
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
And training:
model.fit([x_train, x_numeric_train], y_train)
# where x_train is an array of shape (num_samples, num_messages, seq_length)
A complex model like this needs a lot of data to converge. For less data a simpler solution could be implemented by aggregating the events to have only one sequence. For example the text data of all events can be treated as one single text (with a separator token), instead of multiple texts, while the numerical data can be summed up, averaged or even combined into a fixed length list. But this depends on your data.
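The aggregation idea can be sketched in NumPy, assuming each sample's events arrive as the zero-padded (50, 30) array of token ids described in the question and that one unused id is reserved as the separator. `merge_events` and `sep_id` are hypothetical names.

```python
import numpy as np

def merge_events(events, sep_id, max_len, pad_id=0):
    """Join the padded token ids of all events into one sequence.

    events: int array (max_events, max_sentence_len), e.g. (50, 30), padded with pad_id
    sep_id: token id reserved as a separator (an assumption: pick an unused id)
    Returns an int array (max_len,), padded or truncated with pad_id.
    """
    merged = []
    for ev in events:
        tokens = ev[ev != pad_id]      # drop the padding inside this event
        if tokens.size == 0:           # a fully padded slot means "no event"
            continue
        if merged:
            merged.append(sep_id)      # separator between consecutive events
        merged.extend(tokens.tolist())
    merged = merged[:max_len]          # truncate overly long histories
    out = np.full(max_len, pad_id, dtype=events.dtype)
    out[:len(merged)] = merged
    return out
```

Each sample then yields a single text sequence that one Embedding + LSTM (or Conv1D) branch can process.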
As I am working on something similar, I will update this answer with code later on.

How to handle variable sized input in CNN with Keras?

I am trying to perform the usual classification on the MNIST database but with randomly cropped digits.
Images are cropped the following way: the first/last row and/or column is randomly removed.
I would like to use a Convolutional Neural Network using Keras (and Tensorflow backend) to perform convolution and then the usual classification.
Inputs are of variable size and I can't manage to get it to work.
Here is how I cropped the digits:
import numpy as np
from keras.utils import to_categorical
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.images
X = np.expand_dims(X, axis=3)
X_crop = list()
for index in range(len(X)):
    X_crop.append(X[index, np.random.randint(0,2):np.random.randint(7,9), np.random.randint(0,2):np.random.randint(7,9), :])
# the crops have different shapes, so this becomes a 1-D array of objects
# (recent NumPy versions require dtype=object for such ragged input)
X_crop = np.array(X_crop, dtype=object)
y = to_categorical(digits.target)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_crop, y, train_size=0.8, test_size=0.2)
And here is the architecture of the model I want to use
from keras.layers import Dense, Dropout
from keras.layers.convolutional import Conv2D
from keras.models import Sequential
model = Sequential()
model.add(Conv2D(filters=10,
kernel_size=(3,3),
input_shape=(None, None, 1),
data_format='channels_last'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=100, batch_size=16, validation_data=(X_test, y_test))
Does someone have an idea on how to handle variable sized input in my neural network?
And how to perform classification?

TL;DR - go to point 4
So - before we get to the point - let's fix some problems with your network:
1. Your network will not work because of the activation: with categorical_crossentropy you need a softmax activation:
model.add(Dense(10, activation='softmax'))
2. Vectorize spatial tensors: as Daniel mentioned, at some stage you need to switch from spatial tensors (images) to vectors. Currently, applying Dense to the output of a Conv2D is equivalent to a (1, 1) convolution, so the output of your network is spatial, not vectorized, which causes a dimensionality mismatch (you can check this by running your network or by looking at model.summary()). To change that, you need to use either GlobalMaxPooling2D or GlobalAveragePooling2D, e.g.:
from keras.layers import GlobalMaxPooling2D
model.add(Conv2D(filters=10,
kernel_size=(3, 3),
input_shape=(None, None, 1),
padding="same",
data_format='channels_last'))
model.add(GlobalMaxPooling2D())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
3. Concatenated numpy arrays need to have the same shape: if you check the shape of X_crop, you'll see that it's not a spatial (4-D) array but a 1-D array of objects. That's because you built it from matrices with different shapes. Sadly, it's impossible to overcome this issue, as a numpy.array needs a fixed shape.
4. How to make your network train on examples of different shapes: the most important thing here is to understand two things. First, in a single batch every image must have the same size. Second, calling fit once per batch is a bad idea, because each call sets up a fresh training loop (its metrics, callbacks and progress bar are reset). So here is what needs to be done:
a. Write a function which crops a single batch - e.g. a get_cropped_batches_generator which given a matrix cuts a batch out of it and crops it randomly.
b. Use train_on_batch method. Here is an example code:
import numpy as np
from six import next
batches_generator = get_cropped_batches_generator(X, batch_size=16)
losses = list()
for epoch_nb in range(nb_of_epochs):
    epoch_losses = list()
    for batch_nb in range(nb_of_batches):
        # cropped_x has a different shape for different batches (in general)
        cropped_x, cropped_y = next(batches_generator)
        current_loss = model.train_on_batch(cropped_x, cropped_y)
        epoch_losses.append(current_loss)
    # average over the epoch (axis=0 also handles [loss, metric, ...] lists)
    losses.append(np.mean(epoch_losses, axis=0))
final_loss = np.mean(losses, axis=0)
So, a few comments on the code above. First, train_on_batch doesn't display the nice Keras progress bar, and it returns the loss for a single batch only (a list [loss, metric, ...] if the model was compiled with metrics); that's why I added the logic to average the losses. You could also use keras.utils.Progbar to get a progress bar. Second, you need to implement get_cropped_batches_generator yourself; I haven't written that code, to keep my answer a little bit clearer. You could ask another question on how to implement it. Last thing: I use six to keep compatibility between Python 2 and Python 3.
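One possible implementation of the generator left open above, as a NumPy sketch. It assumes the uncropped images shaped (n_samples, height, width, channels), such as the expanded `X` from the question before cropping, plus one-hot labels. Unlike the call in the answer, it takes the labels `y` as a second argument, since the training loop needs `cropped_y`. The crop window is drawn once per batch, so each yielded batch is a regular array; the `min_side` parameter is an illustrative choice.

```python
import numpy as np

def get_cropped_batches_generator(X, y, batch_size=16, min_side=6, rng=None):
    """Yield (cropped_x, cropped_y) batches forever.

    X: array (n_samples, height, width, channels), uncropped images
    y: array (n_samples, n_classes) of one-hot labels
    All images in one batch share the same random crop, so every batch is a
    regular array, while the crop size changes from batch to batch.
    batch_size must not exceed n_samples (sampling is without replacement).
    """
    rng = np.random.default_rng(rng)
    n, height, width = X.shape[:3]
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        # random crop window of at least min_side x min_side pixels
        crop_h = rng.integers(min_side, height + 1)
        crop_w = rng.integers(min_side, width + 1)
        top = rng.integers(0, height - crop_h + 1)
        left = rng.integers(0, width - crop_w + 1)
        yield X[idx, top:top + crop_h, left:left + crop_w, :], y[idx]
```

With 8x8 digits and `min_side=6`, the crops range from 6x6 to 8x8, similar to the cropping in the question.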

Usually, a model containing Dense layers cannot have variable-size inputs, unless the outputs are also variable. But see the workaround below and also the other answer, which uses GlobalMaxPooling2D; the workaround is equivalent to GlobalAveragePooling2D. These layers eliminate the variable size before a Dense layer by collapsing the spatial dimensions.
For an image classification case, you may want to resize the images outside the model.
When my images are in numpy format, I resize them like this:
import numpy as np
from PIL import Image
im = Image.fromarray(imgNumpy)
im = im.resize(newSize, Image.LANCZOS)  # newSize is (width, height); options other than LANCZOS work as well
imgNumpy = np.asarray(im)
Why?
A convolutional layer has its weights as filters. There is a static filter size, and the same filter is applied to the image over and over.
But a dense layer has its weights based on the input. If there is 1 input, there is a set of weights. If there are 2 inputs, you've got twice as many weights. But weights must be trained, and changing the number of weights will definitely change the result of the model.
As @Marcin commented, what I've said is true when the input shape for Dense layers has two dimensions: (batchSize, inputFeatures).
But actually keras dense layers can accept inputs with more dimensions. These additional dimensions (which come out of the convolutional layers) can vary in size. But this would make the output of these dense layers also variable in size.
Nonetheless, at the end you will need a fixed size for classification: 10 classes and that's it. For reducing the dimensions, people often use Flatten layers, and the error will appear here.
A possible fishy workaround (not tested):
At the end of the convolutional part of the model, use a lambda layer to condense all the values in a fixed size tensor, probably taking a mean of the side dimensions and keeping the channels (channels are not variable)
Suppose the last convolutional layer is:
model.add(Conv2D(filters,kernel_size,...))
#so its output shape is (None,None,None,filters) = (batchSize,side1,side2,filters)
Let's add a lambda layer to condense the spatial dimensions and keep only the filters dimension:
import keras.backend as K
def collapseSides(x):
    axis = 1     # if you're using the channels_last format (default)
    # axis = -1  # use this instead if you're using the channels_first format
    # with channels_last, x has shape (batchSize, side1, side2, filters)
    step1 = K.mean(x, axis=axis)  # mean of side1
    # this will result in a tensor shape of (batchSize, filters)
    return K.mean(step1, axis=axis)  # mean of side2
Since the number of filters is fixed (you have gotten rid of the None dimensions), the dense layers should probably work:
model.add(Lambda(collapseSides,output_shape=(filters,)))
model.add(Dense.......)
.....
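A quick NumPy check of what collapseSides computes, assuming channels_last feature maps: the two successive means produce a (batchSize, filters) array whatever the spatial size, and the result equals a mean over both spatial axes at once, i.e. global average pooling. The random arrays are placeholders for real feature maps.

```python
import numpy as np

# channels_last feature maps with two different spatial sizes, 10 filters each
small = np.random.default_rng(0).normal(size=(4, 5, 5, 10))
large = np.random.default_rng(1).normal(size=(4, 9, 7, 10))

def collapse_sides_np(x, axis=1):
    # the same two-step mean as collapseSides: side1 first, then side2
    return x.mean(axis=axis).mean(axis=axis)

print(collapse_sides_np(small).shape, collapse_sides_np(large).shape)  # (4, 10) (4, 10)
```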
In order for this to possibly work, I suggest that the number of filters in the last convolutional layer be at least 10.
With this, you can make input_shape=(None,None,1)
If you're doing this, remember that you can only pass input data with a fixed size per batch. So you have to separate your entire data in smaller batches, each batch having images all of the same size. See here: Keras misinterprets training data shape
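Splitting the data into same-size batches can be sketched as a bucketing helper, assuming the images are kept as a Python list of arrays (a ragged collection cannot be one NumPy array). `batches_by_shape` is hypothetical; each batch it yields could be passed to `model.train_on_batch(x, y)`.

```python
import numpy as np
from collections import defaultdict

def batches_by_shape(images, labels, batch_size):
    """Group images by shape and yield (x, y) batches of identically sized images."""
    buckets = defaultdict(list)
    for i, img in enumerate(images):
        buckets[img.shape].append(i)  # one bucket per distinct image shape
    labels = np.asarray(labels)
    for indices in buckets.values():
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            x = np.stack([images[i] for i in chunk])  # safe: all the same shape
            yield x, labels[chunk]
```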

How to increase accuracy of neural networks

I'm trying to build a simple neural network to classify product images into different labels (product types), i.e., given a new product image, tell which product category (books, toys, electronics, etc.) it belongs to.
I have a couple of product images under each product number, and each product number has a label (i.e., product type) in an Excel sheet.
Below is my code:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Activation
from keras.optimizers import SGD
from keras.layers import Dense
from keras.utils import to_categorical
from imutils import paths
import numpy as np
import argparse
import cv2
import os
import xlwt
import xlrd
import glob2
import pickle
def image_to_feature_vector(image, size=(32,32)):
    # resize to 32x32 and flatten: 32 * 32 * 3 = 3072 values per colour image
    return cv2.resize(image, size).flatten()
def read_data(xls = "/Desktop/num_to_product_type.xlsx"):
    # note: xlrd >= 2.0 no longer reads .xlsx files; use xlrd < 2.0 or save the sheet as .xls
    book = xlrd.open_workbook(xls)
    sheet = book.sheet_by_index(0)
    d = {}
    for row_index in range(1, sheet.nrows): # skip heading row
        prod_type, prod_num = sheet.row_values(row_index, end_colx=2)
        prod_type = str(prod_type)
        prod_num = str(prod_num)
        d[prod_num] = prod_type
    return d
def main():
    try:
        imagePaths=[]
        print("[INFO] describing images...")
        for path, subdirs, files in os.walk(r'/Desktop/data'):
            for filename in files:
                imagePaths.append(os.path.join(path, filename))
        files = glob2.glob('/Desktop/data/**/.DS_Store')
        for i in files:
            imagePaths.remove(i)
    except Exception:
        pass
    dd = read_data()
    # initialize the data matrix and labels list
    data = []
    labels1 = []
    for (i, imagePath) in enumerate(imagePaths):
        image = cv2.imread(imagePath)
        #print(image.shape)
        subdir = imagePath.split('/')[-2]
        for k, v in dd.items():
            if k == subdir:
                label = v
                break
        features = image_to_feature_vector(image)
        data.append(features)
        labels1.append(label)
        # show an update every 1,000 images
        if i > 0 and i % 1000 == 0:
            print("[INFO] processed {}/{}".format(i, len(imagePaths)))
    print("String Labels")
    print(labels1)
    # encode the labels, converting them from strings to integers
    le = LabelEncoder()
    labels = le.fit_transform(labels1)
    print(labels)
    d={}
    d[labels[0]] = labels1[0]
    for i in range(1,len(labels)-1):
        if labels[i-1] != labels[i] and labels[i] == labels[i+1]:
            d[labels[i]] = labels1[i]
    data = np.array(data) / 255.0
    labels = to_categorical(labels, 51)
    print("To_Categorical")
    print(labels)
    print("[INFO] constructing training/testing split...")
    (trainData, testData, trainLabels, testLabels) = train_test_split(
        data, labels, test_size=0.25, random_state=42)
    model = Sequential()
    model.add(Dense(768, input_dim=3072, kernel_initializer="uniform",
                    activation="relu"))
    model.add(Dense(384, kernel_initializer="uniform", activation="relu"))
    model.add(Dense(51))
    model.add(Activation("softmax"))
    print("[INFO] compiling model...")
    sgd = SGD(learning_rate=0.125)
    model.compile(loss="categorical_crossentropy", optimizer=sgd,
                  metrics=["accuracy"])
    model.fit(trainData, trainLabels, epochs=50, batch_size=750)
    # #Test the model
    #show the accuracy on the testing set
    print("[INFO] evaluating on testing set...")
    (loss, accuracy) = model.evaluate(testData, testLabels,
                                      batch_size=128, verbose=1)
    print("[INFO] loss={:.4f}, accuracy: {:.4f}%".format(loss,
                                                         accuracy * 100))
if __name__ == '__main__':
    main()
The neural network is a 3-2-3-51 feedforward neural network. Layer 0 contains 3 inputs. Layers 1 & 2 are hidden layers containing 2 & 3 nodes resp. Layer 3 is the output layer which has 51 nodes (i.e., for 51 product category type). However, with this I'm getting very low accuracy, only about 45-50%.
Is there something wrong with what I'm doing? How do you increase the accuracy of a neural network? I read somewhere that it can be done with "cross-validation and hyperparameter tuning", but how is it done? Sorry, I'm very new to neural networks and just trying something new. Thanks.

Hyper-parameter validation
Why did you choose a 3-2-3-51 ANN instead of a 3-6-51 or a 3-4-4-4-4-51?
Normally we don't know the exact topology (number of layers, number of neurons per layer, connections between neurons) that we need to reach an 80% accuracy or whatever makes us happy. This is where hyper-parameter training comes into play. With it, we tell our program to try with several different topologies until it finds one that is good enough for us.
How do you tell your program which topologies to try? You can do it with another ANN or with an evolutionary algorithm, which generates pseudo-random topologies, evaluates each one and gives it a score; then the topologies with higher scores are combined, and so on.
Doing this will for sure help you increase your overall score (provided there is a good solution for your problem)
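The search loop itself can be sketched without an evolutionary algorithm: the snippet below uses plain random search, a simpler technique than the one described above, just to show the structure (sample a topology, score it, keep the best). `score_fn` is an assumption: you would implement it by building, training and evaluating a model with the given hidden-layer widths.

```python
import random

def random_search(score_fn, n_trials=20, max_layers=4,
                  widths=(16, 32, 64, 128, 256), seed=0):
    """Try random hidden-layer topologies and keep the best-scoring one.

    score_fn: callable taking a tuple of hidden-layer widths and returning a
              validation score (higher is better); supplied by you.
    """
    rnd = random.Random(seed)
    best_topology, best_score = None, float("-inf")
    for _ in range(n_trials):
        n_layers = rnd.randint(1, max_layers)  # inclusive on both ends
        topology = tuple(rnd.choice(widths) for _ in range(n_layers))
        score = score_fn(topology)
        if score > best_score:
            best_topology, best_score = topology, score
    return best_topology, best_score
```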
Cross validation
How do you know how many iterations to run your algorithm for? What is your stopping criterion?
There is a recurring problem with ANNs called memorization (overfitting). If you run your learning algorithm for 1 million iterations you will normally get a better training score than if you run it for just 10 iterations, but that can be due to memorization of your training set: your ANN learns only to predict the outcomes for those training examples, and will do poorly on data it has not seen before.
One way of solving that problem is cross-validation, which here means splitting your data into 2 groups: train and validation (strictly speaking, a hold-out validation set used for early stopping). You train your ANN only on the train set, for as many iterations as you want, but in parallel you test it on the validation set to know when to stop. If over the last 10 iterations your training accuracy keeps going up but your validation accuracy keeps going down, you can conclude that your ANN is memorizing, so you stop the learning algorithm and choose the ANN as it was 10 iterations ago.
Of course, 10 is just an example; you can try different values, or even include this value in your hyper-parameter search so you don't need to hardcode it.
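The stopping rule described above (with a "patience" of N iterations) can be sketched as a small function over the history of validation scores; it is an illustration, not a library API. In Keras the same idea is available as the `EarlyStopping` callback with its `patience` and `restore_best_weights` arguments.

```python
def early_stopping_epoch(val_scores, patience=10):
    """Return the epoch whose model you would keep under early stopping.

    val_scores: validation accuracy after each epoch (higher is better)
    patience:   stop once the best score hasn't improved for this many epochs
    """
    best_epoch, best = 0, float("-inf")
    for epoch, score in enumerate(val_scores):
        if score > best:
            best_epoch, best = epoch, score
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch
```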
I recommend you take a look at the materials of this course in Coursera where they explain very clearly concepts like these.
(BTW: normally you split your data into 3 sets: train, validation and test. The test set is used to see how your ANN will behave on totally unseen data; you don't use it to make any decision during training.)
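A minimal NumPy sketch of such a three-way split, assuming X and y are arrays with one row per sample. The 70/15/15 proportions are an illustrative choice, and `train_val_test_split` is a hypothetical helper (scikit-learn's `train_test_split` applied twice achieves the same).

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut the data into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```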

For creating an image classifier in keras I would suggest trying a convolutional neural network as they tend to work much better for images. Also, normalizing between layers can help with accuracy during training which should help yield a better validation/test accuracy. (The same concept as normalizing data before training.)
For a keras convolutional layer simply call model.add(Conv2D(params)) and to normalize between layers you can call model.add(BatchNormalization())
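To show what "normalizing between layers" means, here is a NumPy sketch of the training-mode computation of batch normalization: each feature is standardized with the mean and variance of the current batch, then scaled and shifted by the learned parameters `gamma` and `beta`. At inference time Keras' `BatchNormalization` uses moving averages of these statistics instead; that part is omitted here. `eps` mirrors the layer's default of 1e-3.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over the batch axis (axis 0).

    x: activations (batch, features); gamma, beta: learned scale and shift.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, ~unit variance per feature
    return gamma * x_hat + beta
```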
Convolutional neural networks are more advanced but better suited to images. The difference is that a convolutional layer is, at a high level, just a "mini" neural network scanning over patches of the image. This is important because, for example, you can have the EXACT same object in two images, but if it is in different places in those images, a plain fully connected network would see two different objects instead of the same object in different places.
So this "mini" neural network that scans the image in patches (the patch size is the kernel size) is more inclined to pick up on similar features of objects. The object features are then trained into the network, so even if the object is present in different areas of your images it can be more accurately recognized as the same thing. This is the key to why a convolutional neural network is better for working with images.
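The "mini network scanning patches" idea can be illustrated with a hand-written NumPy convolution (stride 1, no padding, one channel, and one hand-picked kernel instead of learned weights). Because the same weights are applied at every position, moving an object in the image moves the response pattern by the same amount, which is the property the explanation above relies on.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D image (no padding, stride 1).

    Like Keras' Conv2D this computes a cross-correlation: the same small set
    of weights is applied to every patch of the image.
    """
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.array([[1.0, -1.0], [1.0, -1.0]])      # a tiny vertical-edge detector
img = np.zeros((6, 6))
img[1:3, 1] = 1.0                                  # a small "object"
shifted = np.roll(img, shift=(2, 2), axis=(0, 1))  # the same object, moved
r1, r2 = conv2d_valid(img, kernel), conv2d_valid(shifted, kernel)
# the response pattern simply moves with the object
print(np.array_equal(r1[:3, :3], r2[2:5, 2:5]))  # True
```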
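Here is a basic example in Keras 2 with normalization, based on an NVIDIA model architecture. Note that it takes images of shape (height, width, channels) as input, not the flattened vectors produced by image_to_feature_vector...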
from keras.models import Sequential
from keras.layers import (BatchNormalization, Conv2D, Cropping2D, Dense,
                          Dropout, Flatten, Lambda)
model = Sequential()
# crop the images to get rid of irrelevant features if needed...
# input_shape is (height, width, rgb_depth); (32, 32, 3) matches the 32x32 resize in the question
model.add(Cropping2D(cropping=((0, 0), (0,0)), input_shape=(32, 32, 3)))
model.add(Lambda(lambda x: (x - 128) / 128)) # scale raw 0-255 pixels to roughly [-1, 1]
model.add(Conv2D(24, (2,2), strides=(2,2), padding='valid', activation='elu')) # 1st convolution
model.add(BatchNormalization()) # normalize between layers
model.add(Conv2D(36, (2,2), strides=(2,2), padding='valid', activation='elu')) # 2nd convolution
model.add(BatchNormalization())
model.add(Conv2D(48, (1,1), strides=(2,2), padding='valid', activation='elu')) # 3rd convolution
model.add(BatchNormalization())
# model.add(Conv2D(64, (3,3), strides=(1,1), padding='valid', activation='elu')) # 4th convolution
# model.add(BatchNormalization())
# model.add(Conv2D(64, (3,3), strides=(1,1), padding='valid', activation='elu')) # 5th convolution
# model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten()) # flatten the dimensions
model.add(Dense(100, activation='elu')) # 1st fully connected layer
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(51, activation= 'softmax')) # label output as probabilities
Lastly, hyperparameter tuning is just adjusting batch sizes, epochs, learning rates etc to achieve the best result. All you can do there is experiment and see what works best.
