How to choose dimensionality of the Dense layer in LSTM?

How to choose dimensionality of the Dense layer in LSTM? - python

I have a task of multi-label text classification. My dataset has 1369 classes:
# data shape
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
(54629, 500)
(23413, 500)
(54629, 1369)
(23413, 1369)
For this task, I've decided to use LSTM NN with the next parameters:
# define model
maxlen = 400
inp = Input(shape=(maxlen, ))
embed_size = 128
x = Embedding(max_features, embed_size)(inp)
x = LSTM(60, return_sequences=True,name='lstm_layer')(x)
x = GlobalMaxPool1D()(x)
x = Dropout(0.1)(x)
x = Dense(2000, activation="relu")(x)
x = Dropout(0.1)(x)
x = Dense(1369, activation="sigmoid")(x)
model = Model(inputs=inp, outputs=x)
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy']
batch_size = 32
epochs = 2
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
Question: Are there any scientific methods for determining Dense and LSTM dimensionality (in my example, LSTM dimension=60, I Dense dimension=2000, and II Dense dimension=1369)?
If there are no scientific methods, maybe there are some heuristics or tips on how to do this with data with similar dimension.
I randomly chose these parameters. I would like to improve the accuracy of the model and correctly approach to solving similar problems.

I heard that optimizing hyper parameters is an np problem, even there is a better way to do it, it may not worth it for your project given the overhead cost.
For the dimension of LSTM layer, I heard some empirically well working numbers from some conference talks, such as 128 or 256 units and 3 stacked layers. If you can plot your loss along training, and you saw the loss decrease dramatically in the first several epoch but then stopped decreasing, you may want to increase the capacity of your model. This means to make it either deeper or wider. Otherwise, should have less parameters as possible.
For the dimension of dense layer, if your task is many-to-many which means you have a label of certain dimension, then you have to have same number of that dimension as number of units in the dense layer.

Related

Time Series Forecasting model with LSTM in Tensorflow predicts a constant

I am building a hurricane track predictor using satellite data. I have a multiple to many output in a multilayer LSTM model, with input and output arrays following the structure [samples[time[features]]]. I have as features of inputs and outputs the coordinates of the hurricane, WS, and other dimensions.
The problem is that the error reduction, and as a consequence, the model predicts always a constant. After reading several posts, I standardized the data, removed some unnecessary layers, but still, the model always predicts the same output.
I think the model is big enough, activation functions make sense, given that the outputs are all within [-1;1].
So my questions are : What am I doing wrong ?
The model is the following:
class Stacked_LSTM():
def __init__(self, training_inputs, training_outputs, n_steps_in, n_steps_out, n_features_in, n_features_out, metrics, optimizer, epochs):
self.training_inputs = training_inputs
self.training_outputs = training_outputs
self.epochs = epochs
self.n_steps_in = n_steps_in
self.n_steps_out = n_steps_out
self.n_features_in = n_features_in
self.n_features_out = n_features_out
self.metrics = metrics
self.optimizer = optimizer
self.stop = EarlyStopping(monitor='loss', min_delta=0.000000000001, patience=30)
self.model = Sequential()
self.model.add(LSTM(360, activation='tanh', return_sequences=True, input_shape=(self.n_steps_in, self.n_features_in,))) #, kernel_regularizer=regularizers.l2(0.001), not a good idea
self.model.add(layers.Dropout(0.1))
self.model.add(LSTM(360, activation='tanh'))
self.model.add(layers.Dropout(0.1))
self.model.add(Dense(self.n_features_out*self.n_steps_out))
self.model.add(Reshape((self.n_steps_out, self.n_features_out)))
self.model.compile(optimizer=self.optimizer, loss='mae', metrics=[metrics])
def fit(self):
return self.model.fit(self.training_inputs, self.training_outputs, callbacks=[self.stop], epochs=self.epochs)
def predict(self, input):
return self.model.predict(input)
Notes
1) In this particular problem, the time series data is not "continuous", because one time serie belongs to a particular hurricane. I have therefore adapted the training and test samples of the time series to each hurricane. The implication of this is that I cannot use the function stateful=True in my layers because it would then mean that the model doesn't makes any difference between the different hurricanes (if my understanding is correct).
2) No image data, so no convolutionnal model needed.

Few suggestions, based on my experience:
4 layers of LSTM is too much. Stick to two, maximum three.
Don't use relu as activations for LSTMs.
Do not use BatchNormalization for time-series.
Other than these, I'd also suggest removing the dense layers between two LSTM layers.

How can i improve my CNN's accuracy evolution?

So, i'm trying to create a CNN which can predict if there is any "support devices" in a x-ray thorax image, but when training my model it seems it's not learning anything.
I'm using a dataset called "CheXpert" which has over 200.000 images to use. After doing some "cleaning", the final dataset ended up with 100.000 images.
As far as the model is concerned, i imported the convolutional base of the vgg16 pretrained model and added by my self 2 fully conected layers. Then, i freezed all the convolutional base and make only trainable the fully conected layers. Here's the code:
from keras.layers import GlobalAveragePooling2D
from keras.models import Model
pretrained_model = VGG16(weights='imagenet', include_top=False)
pretrained_model.summary()
for layer in pretrained_model.layers:
layer.trainable = False
x = pretrained_model.output
x = GlobalAveragePooling2D()(x)
dropout = Dropout(0.25)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
x = dropout(x)
x = Dense(1024, activation = 'relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
final_model = Model(inputs=pretrained_model.input, outputs=predictions)
final_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
As far as i know, the normal behavior should be that the accuracy should start low and then grow up with the epochs. But here it only oscillates through the same values (0.93 and 0.95). I'm sorry i cannot upload images to show you the graphs.
To sum up, i want to know if that little variance in the accuracy means that the model is not learning anything.
I have an hypothesis: from all the 100.000 images of the dataset, 95.000 have the label "1" and only 5.000 have the label "0". I think that if diminish the images with "1" equate them with the images with "0" the results would change.

The lack of images labeled "0" doesn't help the CNN for sure. I also suggest to lower the learning rate and play around with the batch size to see if something changes.
I wish it helps.

Because of imbalance training data, I suggest that you can set "class_weight" during the training step. The more data you have, the lower class_weight you set.
class_weight = {0: 1.5, 1: 0.5}
model.fit(X, Y, class_weight=class_weight)
You can check the augment of class_weight in keras document.
class_weight: Optional dictionary mapping class indices (integers) to
a weight (float) value, used for weighting the loss function (during
training only). This can be useful to tell the model to "pay more
attention" to samples from an under-represented class.

Could not increase accuracy from a fixed threshold using Keras Dense layer ANN

I'm learning the simplest neural networks using Dense layers using Keras. I'm trying to implement face recognition on a relatively small dataset (In total ~250 images with 50 images per class).
I've downloaded the images from google images and resized them to 100 * 100 png files. Then I've read those files into a numpy array and also created a one hot label array for training my model.
Here is my code for processing the training data:
X, Y = [], []
feature_map = {
'Alia Bhatt': 0,
'Dipika Padukon': 1,
'Shahrukh khan': 2,
'amitabh bachchan': 3,
'ayushmann khurrana': 4
}
for each_dir in os.listdir('.'):
if os.path.isdir(each_dir):
for each_file in os.listdir(each_dir):
X.append(cv2.imread(os.path.join(each_dir, each_file), -1).reshape(1, -1))
Y.append(feature_map[os.path.basename(each_file).split('-')[0]])
X = np.squeeze(X)
X = X / 255.0 # normalize the training data
Y = np.array(Y)
Y = np.eye(5)[Y]
print (X.shape)
print (Y.shape)
This is printing (244, 40000) and (244, 5). Here is my model:
model = Sequential()
model.add(Dense(8000, input_dim = 40000, activation = 'relu'))
model.add(Dense(1200, activation = 'relu'))
model.add(Dense(700, activation = 'relu'))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(5, activation = 'softmax'))
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=25, batch_size=15)
When I train the model, It stuck at the accuracy 0.2172, which is almost the same as random predictions (0.20).
I've also tried to train mode with grayscale images but still not getting expected accuracy. Also tried with different network architectures by changing the number of hidden layers and neurons in hidden layers.
What am I missing here? Is my dataset too small? or am I missing any other technical detail?
For more details of code, here is my notebook: https://colab.research.google.com/drive/1hSVirKYO5NFH3VWtXfr1h6y0sxHjI5Ey

Two suggestions I can make:
Your data set is probably too small. If you are splitting training and validation at 80/20, that means you are only training on 200 images, which is probably too small. Try increasing your data set to see if results improve.
I would recommend adding Dropout to each layer of your network as your training set is so small. Your network is most likely over-fitting your training data set since it is so small, and Dropout is an easy way to help avoid this problem.
Let me know if these suggestions make a difference!

I agree that the dataset is too small, 50 instances of each person is probably not enough. You can use data augmentation with the keras ImageDataGenerator method to increase the number of images, and rewrite your numpy reshaping code as a pre-processing function for the generator. I also noticed that you haven't shuffled the data, so the network is likely predicting the first class for everything (which is maybe why the accuracy is near random chance).
If increasing the dataset size doesn't help, you'll probably have to play around with the learning rate for the Adam optimizer.

Keras accuracy and actual accuracy are exactly reverse of each other

I'm learning Neural Networks and currently implemented object classification on CFAR-10 dataset using Keras library. Here is my definition of a neural network defined by Keras:
# Define the model and train it
model = Sequential()
model.add(Dense(units = 60, input_dim = 1024, activation = 'relu'))
model.add(Dense(units = 50, activation = 'relu'))
model.add(Dense(units = 60, activation = 'relu'))
model.add(Dense(units = 70, activation = 'relu'))
model.add(Dense(units = 30, activation = 'relu'))
model.add(Dense(units = 10, activation = 'sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=10000)
So I've 1 input layer having the input of dimensions 1024 or (1024, ) (each image of 32 * 32 *3 is first converted to grayscale resulting in dimensions of 32 * 32), 5 hidden layers and 1 output layer as defined in the above code.
When I train my model over 50 epochs, I got the accuracy of 0.9 or 90%. Also when I evaluate it using test dataset, I got the accuracy of approx. 90%. Here is the line of code which evaluates the model:
print (model.evaluate(X_test, y_test))
This prints following loss and accuracy:
[1.611809492111206, 0.8999999761581421]
But When I calculate the accuracy manually by making predictions on each test data images, I got accuracy around 11% (This is almost the same as probability randomly making predictions). Here is my code to calculate it manually:
wrong = 0
for x, y in zip(X_test, y_test):
if not (np.argmax(model.predict(x.reshape(1, -1))) == np.argmax(y)):
wrong += 1
print (wrong)
This prints out 9002 out of 10000 wrong predictions. So what am I missing here? Why both accuracies are exactly reverse (100 - 89 = 11%) of each other? Any intuitive explanation will help! Thanks.
EDIT:
Here is my code which processes the dataset:
# Process the training and testing data and make in Neural Network comfortable
# convert given colored image to grayscale
def rgb2gray(rgb):
return np.dot(rgb, [0.2989, 0.5870, 0.1140])
X_train, y_train, X_test, y_test = [], [], [], []
def process_batch(batch_path, is_test = False):
batch = unpickle(batch_path)
imgs = batch[b'data']
labels = batch[b'labels']
for img in imgs:
img = img.reshape(3,32,32).transpose([1, 2, 0])
img = rgb2gray(img)
img = img.reshape(1, -1)
if not is_test:
X_train.append(img)
else:
X_test.append(img)
for label in labels:
if not is_test:
y_train.append(label)
else:
y_test.append(label)
process_batch('cifar-10-batches-py/data_batch_1')
process_batch('cifar-10-batches-py/data_batch_2')
process_batch('cifar-10-batches-py/data_batch_3')
process_batch('cifar-10-batches-py/data_batch_4')
process_batch('cifar-10-batches-py/data_batch_5')
process_batch('cifar-10-batches-py/test_batch', True)
number_of_classes = 10
number_of_batches = 5
number_of_test_batch = 1
X_train = np.array(X_train).reshape(meta_data[b'num_cases_per_batch'] * number_of_batches, -1)
print ('Shape of training data: {0}'.format(X_train.shape))
# create labels to one hot format
y_train = np.array(y_train)
y_train = np.eye(number_of_classes)[y_train]
print ('Shape of training labels: {0}'.format(y_train.shape))
# Process testing data
X_test = np.array(X_test).reshape(meta_data[b'num_cases_per_batch'] * number_of_test_batch, -1)
print ('Shape of testing data: {0}'.format(X_test.shape))
# create labels to one hot format
y_test = np.array(y_test)
y_test = np.eye(number_of_classes)[y_test]
print ('Shape of testing labels: {0}'.format(y_test.shape))

The reason why this is happening is due to the loss function that you are using. You are using binary cross entropy where you should be using categorical cross entropy as the loss. Binary is only for a two-label problem but you have 10 labels here due to CIFAR-10.
When you show the accuracy metric, it is in fact misleading you because it is showing binary classification performance. The solution is to retrain your model by choosing categorical_crossentropy.
This post has more details: Keras binary_crossentropy vs categorical_crossentropy performance?
Related - this post is answering a different question, but the answer is essentially what your problem is: Keras: model.evaluate vs model.predict accuracy difference in multi-class NLP task
Edit
You mentioned that the accuracy of your model is hovering at around 10% and not improving in your comments. Upon examining your Colab notebook and when you change to categorical cross-entropy, it appears that you are not normalizing your data. Because the pixel values are originally unsigned 8-bit integer, when you create your training set it promotes the values to floating-point, but because of the dynamic range of the data, your neural network has a hard time learning the right weights. When you try to update the weights, the gradients are so small that there are essentially no updates and hence your network is performing just like random chance. The solution is to simply divide your training and test dataset by 255 before you proceed:
X_train /= 255.0
X_test /= 255.0
This will transform your data so that the dynamic range scales from [0,255] to [0,1]. Your model will have an easier time training due to the smaller dynamic range, which should help gradients propagate and not vanish because of the larger scale before normalizing. Because your original model specification has a significant number of dense layers, due to the dynamic range of your data the gradient updates will most likely vanish which is why the performance is poor initially.
When I run your notebook, I get 37% accuracy. This is not unexpected with CIFAR-10 and only a fully-connected / dense network. Also when you run your notebook now, the accuracy and the fraction of wrong examples match.
If you want to increase accuracy, I have a couple of suggestions:
Actually include colour information. Each object in CIFAR-10 has a distinct colour profile that should help in discrimination
Add Convolutional layers. I'm not sure where you are in your learning, but convolutional layers help in learning and extracting the right features in the image so that the most optimal features are presented to the dense layers so that classification on these features increases accuracy. Right now you're classifying raw pixels, which is not advisable given how noisy they can be, or due to how unconstrained things can get (rotation, translation, skew, scale, etc.).

Constant Output and Prediction Syntax with LSTM Keras Network

I am new to neural networks and have two, probably pretty basic, questions. I am setting up a generic LSTM Network to predict the future of sequence, based on multiple Features.
My training data is therefore of the shape (number of training sequences, length of each sequence, amount of features for each timestep).
Or to make it more specific, something like (2000, 10, 3).
I try to predict the value of one feature, not of all three.
Problem:
If I make my Network deeper and/or wider, the only output I get is the constant mean of the values to be predicted. Take this setup for example:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z0)
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(128, activation='softsign', recurrent_activation='softsign')(z)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history= model.fit(trainX, trainY,validation_split=0.1, epochs=200, batch_size=32,
callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
EarlyStopping(patience=50, verbose=1)])
If I just use one layer, like:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(4, activation='soft sign', recurrent_activation='softsign')(z0)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history= model.fit(trainX, trainY,validation_split=0.1, epochs=200, batch_size=32,
callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
EarlyStopping(patience=200, verbose=1)])
The predictions are somewhat reasonable, at least they are not constant anymore.
Why does that happen? Around 2000 samples not that many, but in the case of overfitting, I would expect the predictions to match perfectly...
EDIT: Solved, as stated in the comments, it's just that Keras always expects Batches: Keras
When I use:
`test=model.predict(trainX[0])`
to get the prediction for the first sequence, I get an dimension error:
"Error when checking : expected input_1 to have 3 dimensions, but got array with shape (3, 3)"
I need to feed in an array of sequences like:
`test=model.predict(trainX[0:1])`
This is a workaround, but I am not really sure, whether this has any deeper meaning, or is just a syntax thing...

This is because you have not normalised input data.
Any neural network model will initially have weights normalised around zero. Since your training dataset has all positive values, the model will try to adjust its weights to predict only positive values. However, the activation function (in your case softsign) will map it to 1. So the model can do nothing except adding the bias. That is why you are getting an almost constant line around the average value of the dataset.
For this, you can use a general tool like sklearn to pre-process your data. If you are using pandas dataframe, something like this will help
data_df = (data_df - data_df.mean()) / data_df.std()
Or to have the parameters in the model, you can consider adding batch normalization layer to your model

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.