I already posted this question on CrossValidated, but thought the StackOverflow community, being bigger, might be able to answer it faster.
I'd like to build a model that can output results for several multi-class classification problems at once. Suppose you have diagnostic data about a product that needs to be repaired and you want to predict the quantity of various part numbers that will be needed to repair the product. The input data is the same for all part numbers to be predicted.
Here's a concrete example. You have 2 part numbers that can get replaced, part A and part B. For part A you can replace 0, 1, 2, or 3 of them on the product. For part B you can replace 0, 2, or 4 (they are replaced in pairs). How can a TensorFlow/Keras neural network be configured so that the output probabilities of replacing part A 0, 1, 2, and 3 times sum to 1, with the same behavior for part B (its probabilities also sum to 1)?
Simple code like the example below would treat all of the values as coming from a single discrete probability distribution. How can this be modified to produce 2 discrete probability distributions in the output?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(7, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
UPDATE
Based on the comments, will something like this work? (It references this question.)
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten, Concatenate
from mypackage import get_my_data, compiler_args
data = get_my_data() # obviously, this is a stand-in for however you get your data.
input_layer = Input(data.shape[1:])
hidden = Flatten()(input_layer)
hidden = Dense(192, activation='relu')(hidden)
main_output = Dense(192, activation='relu')(hidden)
# I'm going to build each individual parallel set of layers separately
part_a = Dense(10, activation='relu')(main_output)
output_a = Dense(4, activation='softmax')(part_a) # multi-class classification for part A
part_b = Dense(10, activation='relu')(main_output) # note that it is main_output again
output_b = Dense(3, activation='softmax')(part_b) # multi-class classification for part B
final_output = Concatenate()([output_a, output_b]) # Combine the outputs into final output layer
model = tf.keras.Model(input_layer, final_output)
model.compile(**compiler_args)
model.summary()
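If the two softmax heads are kept as separate model outputs instead of being concatenated, each head can be trained with its own categorical cross-entropy loss, which is likely what the stand-in compiler_args would need to express. A minimal sketch under that assumption:
# hedged sketch: keep output_a and output_b as two separate outputs
two_head_model = tf.keras.Model(input_layer, [output_a, output_b])
two_head_model.compile(optimizer='adam',
                       loss=['categorical_crossentropy', 'categorical_crossentropy'],
                       metrics=['accuracy'])
# targets are then supplied per head: two_head_model.fit(x, [y_a, y_b], ...)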
Related
It seems Sequential and Model([input], [output]) give the same results when I just build a model layer by layer.
However, when I use the following two models with the same input, they give me different results. By the way, the input shape is (None, 15, 2) and the output shape is (None, 1, 2).
Sequential model:
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv1D(filters=4, kernel_size=7, activation='relu'),
        tf.keras.layers.Conv1D(filters=6, kernel_size=11, activation='relu'),
        tf.keras.layers.LSTM(100, return_sequences=True, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.LSTM(100, activation='relu'),
        tf.keras.layers.Dense(2, activation='relu'),
        tf.keras.layers.Reshape((1, 2))
    ]
)
Model([input], [output]) model:
LOOK_BACK = 15  # from the stated input shape (None, 15, 2)
input_layer = tf.keras.layers.Input(shape=(LOOK_BACK, 2))
conv = tf.keras.layers.Conv1D(filters=4, kernel_size=7, activation='relu')(input_layer)
conv = tf.keras.layers.Conv1D(filters=6, kernel_size=11, activation='relu')(conv)
lstm = tf.keras.layers.LSTM(100, return_sequences=True, activation='relu')(conv)
dropout = tf.keras.layers.Dropout(0.2)(lstm)
lstm = tf.keras.layers.LSTM(100, activation='relu')(dropout)
dense = tf.keras.layers.Dense(2, activation='relu')(lstm)
output_layer = tf.keras.layers.Reshape((1,2))(dense)
model = tf.keras.models.Model([input_layer], [output_layer])
The result of the Sequential model:
mse: 21.679258038588586
rmse: 4.65609901511862
mae: 3.963341420395535
And the result of Model([input],[output]) model:
mse: 36.85855652774293
rmse: 6.071124815694612
mae: 4.4878270279889065
The first version uses the Sequential model, while the Model([inputs], [outputs]) version uses the Functional API.
The first is easier to use, but only works for single-input, single-output feed-forward models (in the sense of Keras layers).
The second is more complex but removes those constraints, allowing you to create many more kinds of models.
So, your main point is right: any Sequential model can be rewritten as a functional model. You can double-check this by comparing the architectures using the summary function and by plotting the models.
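For instance (a minimal sketch; here model_seq and model_func stand for the Sequential and functional models above, which are both named model in your snippets):
model_seq.build(input_shape=(None, 15, 2))  # a Sequential model needs a build or a first call before summary()
model_seq.summary()
model_func.summary()
# optionally render the graphs to image files (requires pydot and graphviz)
tf.keras.utils.plot_model(model_seq, to_file='model_seq.png', show_shapes=True)
tf.keras.utils.plot_model(model_func, to_file='model_func.png', show_shapes=True)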
However, this only shows that the architectures are the same, not that the weights are!
Assuming you are fitting both models with the same data and the same compile and fit params (by the way, include those in your question), there is a lot of randomness in the training process which may lead to different results. So, try the following to compare them better:
Remove as much randomness as possible by setting seeds, both globally in your code and for each layer instantiation.
Avoid data augmentation if you are using it.
Use the same validation/train split for both models: to be sure, you can split the dataset yourself.
Do not use shuffling in data generators nor during training.
Here you can read more about producing reproducible results in Keras.
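As a concrete illustration of the seeding advice (a minimal sketch; the exact utilities available depend on your TensorFlow version):
import tensorflow as tf

# seed Python, NumPy and TensorFlow in one call (TF >= 2.7)
tf.keras.utils.set_random_seed(42)
# optionally force deterministic ops as well (TF >= 2.8); this can slow training down
tf.config.experimental.enable_op_determinism()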
Even after following those tips, your results may not be deterministic, and hence not the same. So finally, and maybe most important: do not compare single runs. Train and evaluate each model several times (for instance, 20) and then compare the average MAE along with its standard deviation.
If after all this your results are still so different, please update your question with them.
I have around 20k images from different domains with the features already extracted using GLCM and HOG. The feature dimension is around 2000 for each image. I want to find the similarity between features using a Siamese network. I stored everything in a dataframe. I'm not sure how we can give the input features to the neural net.
The only possibility seems to be using 1D-CNN / Dense layers:
from tensorflow.keras import activations, layers, models

n_features = 2000  # feature dimension per image

encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(units=1024, activation=activations.relu, input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation=activations.relu))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation=activations.relu))
encoder.add(layers.Dropout(0.1))
In the code above we only give the number of features as input to the encoder, but the number of features is the same for both of my images.
Should I train two encoders separately and join them at the end to form an embedding layer?
But how should I test?
For a Siamese network you want a single network, and you train it on different pairs of data.
So say you have two sets of data X0 and X1 that have the same shape; you would do:
import tensorflow as tf
from tensorflow.keras import models
from tensorflow.keras import layers

# number of features
n_features = 2000

# fake data w/ batch size 4
X0 = tf.random.normal([4, n_features])
X1 = tf.random.normal([4, n_features])

# siamese encoder model
encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(
    units=1024, activation="relu", input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation="relu"))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation="relu"))
encoder.add(layers.Dropout(0.1))

# send both sets of data through the same model (the weights are shared)
enc0 = encoder(X0)
enc1 = encoder(X1)

# compare the two outputs, one similarity value per pair in the batch
compared = tf.keras.losses.CosineSimilarity(
    reduction=tf.keras.losses.Reduction.NONE)(enc0, enc1)
print(f"cosine similarity of output: {compared.numpy()}")
# cosine similarity of output: [-0.5785658, -0.6405066, -0.57274437, -0.6017716]

# now do optimization ...
There are numerous ways to compare the outputs, cosine similarity being one of them; I just included it for illustration, and you may require some other metric.
There is only one network; it is just applied twice. All weights are shared, so you are training one network, you just run it twice at each step of learning.
You should pick two samples from your dataset and label the pair 1 if they come from the same class and 0 otherwise.
from tensorflow.keras import models
from tensorflow.keras import layers
import tensorflow.keras.backend as K

n_features = 2000

def cos_similarity(x):
    # per-sample cosine similarity between the two encodings
    x1, x2 = x
    return K.sum(x1 * x2, axis=-1) / (
        K.sqrt(K.sum(x1 * x1, axis=-1)) * K.sqrt(K.sum(x2 * x2, axis=-1)))

inp1 = layers.Input(shape=(n_features,))
inp2 = layers.Input(shape=(n_features,))

encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(
    units=1024, activation="relu", input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation="relu"))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation="relu"))
encoder.add(layers.Dropout(0.1))

# the same encoder instance is applied to both inputs, so weights are shared
out1 = encoder(inp1)
out2 = encoder(inp2)

similarity = layers.Lambda(cos_similarity)([out1, out2])
model = models.Model(inputs=[inp1, inp2], outputs=[similarity])
model.compile(optimizer='adam', loss='mse')
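Before training, pairs and 0/1 labels have to be built from the labeled dataset. A minimal sketch of one way to do that (features and class_ids are hypothetical stand-ins for however your dataframe stores the feature vectors and their classes):
import numpy as np

def make_pairs(features, class_ids, n_pairs, seed=0):
    # sample random index pairs; label 1 if both samples share a class, else 0
    rng = np.random.default_rng(seed)
    idx_a = rng.integers(0, len(features), size=n_pairs)
    idx_b = rng.integers(0, len(features), size=n_pairs)
    labels = (class_ids[idx_a] == class_ids[idx_b]).astype('float32')
    return [features[idx_a], features[idx_b]], labels

pair_x, pair_y = make_pairs(features, class_ids, n_pairs=10000)
model.fit(pair_x, pair_y, batch_size=64, epochs=10)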
For testing, first compute the HOG features (which, as you said, have 2000 dimensions) for each of the two images you want to compare, then run
model.predict([hog_features_1, hog_features_2])
and the output is the similarity. (Note the model has two inputs, so you pass a pair of feature batches.)
By the way, I recommend not using HOG features with a Siamese network at all. Extract the image features with the network itself: change the input shape and train directly on images.
I got the following data sample:
[1,2,1,4,5],[1,2,1,4,5],[0,2,7,0,1] with a label of [1,0,1]
....
[1,9,1,4,5],[1,5,1,4,5],[0,7,7,0,1] with a label of [0,1,1]
I can't train it on a single series of [1,2,1,4,5] with a label of 1 or 0, as the whole row carries meaningful context information, so all 15 input digits should be inferred together.
It's not your typical classification, and it doesn't seem to be a regression problem either. Also, the data is not image-related; it's taken from a scientific domain.
Obviously, I am feeding the data as a flat 15-node input to the net:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential(
    [
        Dense(units=16, input_shape=scaled_train_samples[0].shape, activation='relu'),
        Dense(units=32, activation='relu'),
        Dense(units=3, activation='???'),  # which activation goes here?
    ])
Which output activation function would be ideal in such a case?
I would recommend having 3 outputs from the network. Since the data affects all 3 "sub-labels", the network only branches apart at the classification layer. If you want, you can add more layers to each specific branch.
I'm assuming that each "sub-label" is a binary classification, which is why I chose sigmoid (it returns a value from 0 to 1, so a larger number means the network favors class 1 over class 0).
To do this, you would have to switch to the Functional API, like this:
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

visible = Input(shape=scaled_train_samples[0].shape)
hidden = Dense(16, activation='relu')(visible)
hidden = Dense(32, activation='relu')(hidden)
hidden = Dense(16, activation='relu')(hidden)

out1 = Dense(units=1, activation='sigmoid', name='OUT1')(hidden)
out2 = Dense(units=1, activation='sigmoid', name='OUT2')(hidden)
out3 = Dense(units=1, activation='sigmoid', name='OUT3')(hidden)

finalModel = Model(inputs=visible, outputs=[out1, out2, out3])

optimizer = Adam(learning_rate=.0001)
losses = {
    'OUT1': 'binary_crossentropy',
    'OUT2': 'binary_crossentropy',
    'OUT3': 'binary_crossentropy',
}
finalModel.compile(optimizer=optimizer, loss=losses,
                   metrics={'OUT1': 'accuracy', 'OUT2': 'accuracy', 'OUT3': 'accuracy'})
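When fitting, the three targets are then passed per output head, for example (a sketch; trainX and y are stand-ins for your input array and an (N, 3) label matrix):
# each column of the (N, 3) label matrix feeds one output head
finalModel.fit(trainX,
               {'OUT1': y[:, 0], 'OUT2': y[:, 1], 'OUT3': y[:, 2]},
               epochs=50, batch_size=32)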
I'm trying to build a simple neural network to classify product images into different labels (product types). I.e., given a new product image, tell which product category type (books, toys, electronics etc.) it belongs to.
I have a couple of product images under each product number, and each product number has a label (i.e., product type) in an Excel sheet.
Below is my code:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split  # cross_validation was removed from sklearn
from keras.models import Sequential
from keras.layers import Activation
from keras.optimizers import SGD
from keras.layers import Dense
from keras.utils import np_utils
from imutils import paths
import numpy as np
import argparse
import cv2
import os
import xlwt
import xlrd
import glob2
import pickle
def image_to_feature_vector(image, size=(32, 32)):
    return cv2.resize(image, size).flatten()

def read_data(xls="/Desktop/num_to_product_type.xlsx"):
    book = xlrd.open_workbook(xls)
    sheet = book.sheet_by_index(0)
    d = {}
    for row_index in range(1, sheet.nrows):  # skip heading row
        prod_type, prod_num = sheet.row_values(row_index, end_colx=2)
        prod_type = str(prod_type)
        prod_num = str(prod_num)  # was assigned to a misspelled variable before
        d[prod_num] = prod_type
    return d
def main():
    try:
        imagePaths = []
        print("[INFO] describing images...")
        for path, subdirs, files in os.walk(r'/Desktop/data'):
            for filename in files:
                imagePaths.append(os.path.join(path, filename))
        files = glob2.glob('/Desktop/data/**/.DS_Store')
        for i in files:
            imagePaths.remove(i)
    except:
        pass

    dd = read_data()

    # initialize the data matrix and labels list
    data = []
    labels1 = []
    for (i, imagePath) in enumerate(imagePaths):
        image = cv2.imread(imagePath)
        # print(image.shape)
        subdir = imagePath.split('/')[-2]
        for k, v in dd.items():
            if k == subdir:
                label = v
                break
        features = image_to_feature_vector(image)
        data.append(features)
        labels1.append(label)
        # show an update every 1,000 images
        if i > 0 and i % 1000 == 0:
            print("[INFO] processed {}/{}".format(i, len(imagePaths)))

    print("String Labels")
    print(labels1)

    # encode the labels, converting them from strings to integers
    le = LabelEncoder()
    labels = le.fit_transform(labels1)
    print(labels)

    # map each encoded label back to its original string label
    d = {}
    d[labels[0]] = labels1[0]
    for i in range(1, len(labels) - 1):
        if labels[i - 1] != labels[i] and labels[i] == labels[i + 1]:
            d[labels[i]] = labels1[i]
    data = np.array(data) / 255.0
    labels = np_utils.to_categorical(labels, 51)
    print("To_Categorical")
    print(labels)

    print("[INFO] constructing training/testing split...")
    (trainData, testData, trainLabels, testLabels) = train_test_split(
        data, labels, test_size=0.25, random_state=42)

    model = Sequential()
    model.add(Dense(768, input_dim=3072, kernel_initializer="uniform",
                    activation="relu"))
    model.add(Dense(384, kernel_initializer="uniform", activation="relu"))
    model.add(Dense(51))
    model.add(Activation("softmax"))

    print("[INFO] compiling model...")
    sgd = SGD(lr=0.125)
    model.compile(loss="categorical_crossentropy", optimizer=sgd,
                  metrics=["accuracy"])
    model.fit(trainData, trainLabels, epochs=50, batch_size=750)

    # show the accuracy on the testing set
    print("[INFO] evaluating on testing set...")
    (loss, accuracy) = model.evaluate(testData, testLabels,
                                      batch_size=128, verbose=1)
    print("[INFO] loss={:.4f}, accuracy: {:.4f}%".format(loss,
                                                         accuracy * 100))

if __name__ == '__main__':
    main()
The neural network is a 3072-768-384-51 feedforward neural network. Layer 0 contains 3072 inputs (the flattened 32x32x3 pixel vector). Layers 1 & 2 are hidden layers containing 768 & 384 nodes respectively. Layer 3 is the output layer, which has 51 nodes (i.e., for the 51 product category types). However, with this I'm getting very low accuracy, only about 45-50%.
Is there something wrong that I'm doing? How do you increase the accuracy of the neural network? I read somewhere that it can be done by "cross-validation and hyperparameter tuning", but how is that done? Sorry, I'm very new to neural networks, just trying something new. Thanks.
Hyper-parameter validation
Why did you choose a 3072-768-384-51 ANN instead of, say, a 3072-1024-51 or a 3072-512-512-512-51 one?
Normally we don't know the exact topology (number of layers, number of neurons per layer, connections between neurons) that we need to reach an 80% accuracy or whatever makes us happy. This is where hyper-parameter tuning comes into play. With it, we tell our program to try several different topologies until it finds one that is good enough for us.
How do you tell your program which topologies to try? We do it with another ANN or an evolutionary algorithm, which generates pseudo-random topologies, evaluates each one, and gives each topology a score; then the topologies with higher scores are combined, and, well, you know how it works.
Doing this will surely help you increase your overall score (provided there is a good solution for your problem).
Cross validation
How do you know how many iterations to run your algorithm for? What is your stopping criterion?
There is a recurring problem with ANNs called memorization (overfitting). If you run your learning algorithm for 1 million iterations you will normally get a better score than if you run it for just 10 iterations, but that can be due to memorization of your training set: your ANN learns only to predict the outcomes of those training samples, but will do poorly trying to predict data it has not seen before.
One way of solving that problem is cross-validation, which means you split your data into 2 groups: train and validation. Then you train your ANN only with your train set for as many iterations as you want, but in parallel you test your ANN with the validation set to know when to stop. If after 10 iterations your train accuracy keeps going up, but your validation accuracy keeps going down, then you can determine your ANN is memorizing, so you stop your learning algorithm and keep the ANN as it was 10 iterations ago.
Of course, 10 is just an example; you can try different values, or even put this in your hyper-parameter tuning so you don't need to hardcode the value.
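In Keras this early-stopping behavior is built in as a callback. A minimal sketch (assuming model, trainData and trainLabels as defined in your code):
from keras.callbacks import EarlyStopping

# stop when validation loss has not improved for 10 epochs,
# and roll back to the best weights seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)
model.fit(trainData, trainLabels,
          validation_split=0.2,  # hold out 20% of the training data for validation
          epochs=500, batch_size=750,
          callbacks=[early_stop])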
I recommend you take a look at the materials of this course on Coursera, where concepts like these are explained very clearly.
(BTW: normally you split your input set into 3: train, validation and test. The test set is used to see how your ANN behaves with totally unseen data; you don't use that test set to make any decision in your training.)
For creating an image classifier in Keras I would suggest trying a convolutional neural network, as they tend to work much better for images. Also, normalizing between layers can help with accuracy during training, which should help yield a better validation/test accuracy. (The same concept as normalizing data before training.)
For a Keras convolutional layer, simply call model.add(Conv2D(params)), and to normalize between layers you can call model.add(BatchNormalization()).
Convolutional neural networks are more advanced but better suited for images. The difference is that, at a high level, a convolution is just a "mini" neural network scanning over patches of the image. This is important because, for example, you can have the EXACT same object in two images, but if it is in different places in the image, a normal neural network would view that as two different objects, versus the same object in different places in the images...
So this "mini" neural network that scans the image in patches (the patch size is often referred to as the kernel size) is more inclined to pick up on similar features of objects. The object features are then trained into the network, so even if the object is present in different areas of your images it can be more accurately recognized as the same thing. This is the key to why a convolutional neural network is better for working with images.
Here is a basic example in Keras 2 with normalization, loosely based on an NVIDIA model architecture...
from keras.models import Sequential
from keras.layers import (BatchNormalization, Conv2D, Cropping2D,
                          Dense, Dropout, Flatten, Lambda)

model = Sequential()
# crop the images to get rid of irrelevant features if needed...
model.add(Cropping2D(cropping=((0, 0), (0, 0)), input_shape=(32, 32, 3)))  # your input shape (height, width, rgb depth)
model.add(Lambda(lambda x: (x - 128) / 128))  # normalize all pixels to a mean of 0 +-1
model.add(Conv2D(24, (2, 2), strides=(2, 2), padding='valid', activation='elu'))  # 1st convolution
model.add(BatchNormalization())  # normalize between layers
model.add(Conv2D(36, (2, 2), strides=(2, 2), padding='valid', activation='elu'))  # 2nd convolution
model.add(BatchNormalization())
model.add(Conv2D(48, (1, 1), strides=(2, 2), padding='valid', activation='elu'))  # 3rd convolution
model.add(BatchNormalization())
# model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='valid', activation='elu'))  # 4th convolution
# model.add(BatchNormalization())
# model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='valid', activation='elu'))  # 5th convolution
# model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten())  # flatten the dimensions
model.add(Dense(100, activation='elu'))  # 1st fully connected layer
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(51, activation='softmax'))  # label output as probabilities
Lastly, hyperparameter tuning is just adjusting batch sizes, epochs, learning rates, etc. to achieve the best result. All you can do there is experiment and see what works best.
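One common way to automate that experimentation is a cross-validated grid search over a scikit-learn wrapper for the Keras model. A sketch, assuming the scikeras package and a hypothetical build_model() function that returns a compiled Keras model:
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV

# wrap the Keras model so scikit-learn can drive it
clf = KerasClassifier(model=build_model, verbose=0)
param_grid = {
    'batch_size': [128, 256, 750],
    'epochs': [20, 50],
}
# 3-fold cross-validated search over the grid
grid = GridSearchCV(estimator=clf, param_grid=param_grid, cv=3)
grid_result = grid.fit(trainData, trainLabels)
print('best score: {:.4f} with {}'.format(grid_result.best_score_,
                                          grid_result.best_params_))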
I'm trying to combine two outputs that are produced by the same network, which makes predictions on a 4-class task and a 10-class task. I then combine these outputs to give a length-14 array that I use as my end target.
While this seems to work, the predictions are always for a single class, so it produces a probability distribution that is only concerned with selecting 1 of the 14 options instead of 2. What I actually need is 2 predictions, one for each task, all produced by the same model.
from keras.layers import Input, LSTM, Dense, concatenate
from keras.models import Model

input = Input(shape=(100, 100), name='input')
lstm = LSTM(128)(input)
output1 = Dense(4, activation='softmax', name='output1')(lstm)
output2 = Dense(10, activation='softmax', name='output2')(lstm)
output3 = concatenate([output1, output2])
model = Model(inputs=[input], outputs=[output3])
My issue here is determining an appropriate loss function and method of prediction. For prediction I can simply grab the output of each head after the softmax; however, I'm unsure how to set the loss function for each of these outputs so they can be trained.
Any ideas?
Thanks a lot
You don't need to concatenate the outputs; your model can have two outputs:
input = Input(shape=(100, 100), name='input')
lstm = LSTM(128)(input)
output1 = Dense(4, activation='softmax', name='output1')(lstm)
output2 = Dense(10, activation='softmax', name='output2')(lstm)
model = Model(inputs=[input], outputs=[output1, output2])
Then to train this model, you typically use two losses that are weighted to produce a single loss:
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'categorical_crossentropy'],
              loss_weights=[0.2, 0.8])
Just make sure to format your data right, as now each input sample corresponds to two labeled outputs. For more information check the Functional API Guide.
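For example, fitting and predicting then look like this (a sketch; x, y4 and y10 are stand-ins for your input array and the one-hot labels of the 4-class and 10-class tasks):
# y4 has shape (N, 4) and y10 has shape (N, 10), both one-hot encoded
model.fit(x, [y4, y10], epochs=10, batch_size=32)
# prediction returns one array per output head
pred4, pred10 = model.predict(x)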