Simple RNN(LSTM) model doesn't make progress - python

I have a corpus with np_final_x of shape (71520, 2, 50) and np_final_y of shape (71520, 1, 50):
https://www.dropbox.com/s/k15dtcak78jaf34/np_final_x_len_2.npy?dl=0
https://www.dropbox.com/s/555lhbdnkl6gmrq/np_final_y_len_2.npy?dl=0
The task is to predict the next word from the two preceding words, for example:
I use -> this
you give -> me
Each word is encoded as a 50-dimensional vector.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM,Dropout
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
import numpy as np
final_x = np.load('np_final_x_len_2.npy')
final_y = np.load('np_final_y_len_2.npy')
in_out_neurons = 50
n_hidden = 512 # not so much change
#n_hidden = 1 # not so much change
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, 2, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons,activation="tanh")) #not so much change
#model.add(Dense(in_out_neurons,activation="sigmoid")) #not so much change
#model.add(Dense(in_out_neurons, activation="relu")) #not so much change
optimizer = Adam(learning_rate=0.0001)
model.compile(loss="mean_squared_error", optimizer=optimizer)
model.summary()
model.fit(
    final_x, final_y,
    batch_size=400,
    epochs=10,
    validation_split=0.1,
)
However, the loss stays around 0.017~0.019 and barely improves, no matter which parameters I change.
Moreover, the result doesn't change even with n_hidden = 1.
So I guess something is fundamentally wrong.
I'd appreciate any help or hints. Thank you.
Epoch 1/10
161/161 [==============================] - 9s 46ms/step - loss: 0.0195 - val_loss: 0.0192
Epoch 2/10
161/161 [==============================] - 8s 49ms/step - loss: 0.0191 - val_loss: 0.0188
Epoch 3/10
161/161 [==============================] - 8s 52ms/step - loss: 0.0187 - val_loss: 0.0186
Epoch 4/10
161/161 [==============================] - 11s 68ms/step - loss: 0.0185 - val_loss: 0.0184
Epoch 5/10
161/161 [==============================] - 12s 77ms/step - loss: 0.0184 - val_loss: 0.0183
Epoch 6/10
161/161 [==============================] - 13s 83ms/step - loss: 0.0183 - val_loss: 0.0183
Epoch 7/10
161/161 [==============================] - 14s 85ms/step - loss: 0.0183 - val_loss: 0.0182

You want to predict the next word from two word embeddings, so it doesn't make sense to set return_sequences to True in the LSTM. I played a bit with your data and model; my best validation MSE is 0.015. The limitation comes from the data and from how simplistic it is to predict the next word from only two consecutive words (would a human be able to do that?). Also, how did you get the 50-dimensional word embeddings? It would be interesting to map the predictions back to words and see what the model actually produces.
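One way to put the 0.017~0.019 loss in context is to compare it with a trivial baseline that always predicts the mean target vector. The sketch below is only an illustration: it uses random stand-in arrays with the same shapes as in the question (not the real corpus), and it assumes the targets are flattened to (samples, 50) so that they match an LSTM with return_sequences=False.

```python
import numpy as np

# Stand-in arrays with the same shapes as in the question.
# These are random values, NOT the real corpus.
rng = np.random.default_rng(0)
final_x = rng.normal(scale=0.1, size=(1000, 2, 50)).astype("float32")
final_y = rng.normal(scale=0.1, size=(1000, 1, 50)).astype("float32")

# With return_sequences=False the LSTM emits one vector per sample,
# so the target is reshaped from (samples, 1, 50) to (samples, 50).
y_flat = final_y.reshape(len(final_y), -1)

# Baseline: always predict the per-dimension mean of the targets.
mean_vec = y_flat.mean(axis=0, keepdims=True)
baseline_mse = float(np.mean((y_flat - mean_vec) ** 2))

print(y_flat.shape, baseline_mse)
```

If the training loss on the real data is close to this baseline, the network has probably learned little beyond the average embedding, which would be consistent with the loss not changing when n_hidden is reduced to 1.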

Related

Trying to predict numbers in a LSTM and having extremely high loss (even with MinMax Scaler and Dropout)

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dropout
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler  # used below but missing from the original imports

def split_univariate_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
n_steps_in, n_steps_out = 30, 30
X1, y1 = split_univariate_sequence(sumpred, n_steps_in, n_steps_out)
transformer = MinMaxScaler()
X1_transformed = transformer.fit_transform(X1)
n_features = 1
X1_transformed = X1_transformed.reshape((X1_transformed.shape[0], X1_transformed.shape[1], n_features))
model = Sequential()
model.add(LSTM(150, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(Dropout(0.3))
model.add(LSTM(50, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
model.fit(X1_transformed, y1, epochs=1000, verbose=1)
# demonstrate prediction
x_input = sumpred[-30:].reshape(1, -1)
x_input = transformer.transform(x_input)
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=1)
yhat_inverse = transformer.inverse_transform(yhat)
sumpred is a float32 array of shape (144,) with values ranging from 390.624 to 347471. I'm trying to predict the next 30 numbers from the last 30 sumpred values.
When I train the model, I get results like this:
Epoch 990/1000
85/85 [==============================] - 0s 2ms/step - loss: 1031220211.9529
Epoch 991/1000
85/85 [==============================] - 0s 2ms/step - loss: 1087168440.4706
Epoch 992/1000
85/85 [==============================] - 0s 2ms/step - loss: 1011368153.6000
Epoch 993/1000
85/85 [==============================] - 0s 2ms/step - loss: 1104842800.1882
Epoch 994/1000
85/85 [==============================] - 0s 2ms/step - loss: 1086514331.1059
Epoch 995/1000
85/85 [==============================] - 0s 2ms/step - loss: 1050088100.8941
Epoch 996/1000
85/85 [==============================] - 0s 2ms/step - loss: 1003426751.2471
Epoch 997/1000
85/85 [==============================] - 0s 2ms/step - loss: 1139417025.5059
Epoch 998/1000
85/85 [==============================] - 0s 2ms/step - loss: 1129283814.4000
Epoch 999/1000
85/85 [==============================] - 0s 2ms/step - loss: 1107968009.0353
Epoch 1000/1000
85/85 [==============================] - 0s 2ms/step - loss: 1651960831.6235
The values in yhat_inverse are far from what I expected. Other losses, such as mean squared logarithmic error, were no better. Even with the data transformation (MinMaxScaler) and Dropout layers, I still have this issue.
Does anyone have a clue how to improve my model's performance?
Your model is not learning, so first increase the size of the network. Judging by how large the loss is, the network does not have enough capacity for the data you are feeding it.
Remove the dropout layers first, add more layers, and keep them all at 150 units or more.
Dropout is usually added later, once you see overfitting, but your model has not even started learning.
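Separately from network size, one thing worth checking in the question's code is that only X1 is passed through the scaler, while y1 keeps its raw values (up to 347471), so the MSE is computed against unscaled targets. The sketch below is one possible way to scale the whole series before windowing, assuming a plain min-max scaling done by hand in NumPy. The sumpred array here is a synthetic stand-in, and inverse_scale is a hypothetical helper, not part of the original code.

```python
import numpy as np

def split_univariate_sequence(sequence, n_steps_in, n_steps_out):
    """Slide a window over the series and return (inputs, targets)."""
    X, y = [], []
    for i in range(len(sequence) - n_steps_in - n_steps_out + 1):
        end_ix = i + n_steps_in
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:end_ix + n_steps_out])
    return np.array(X), np.array(y)

# Synthetic stand-in for sumpred: 144 float32 values in the stated range.
sumpred = np.linspace(390.624, 347471.0, 144).astype("float32")

# Scale the series once, so inputs AND targets share the same [0, 1] range.
lo, hi = float(sumpred.min()), float(sumpred.max())
scaled = (sumpred - lo) / (hi - lo)

X1, y1 = split_univariate_sequence(scaled, 30, 30)
X1 = X1.reshape((X1.shape[0], X1.shape[1], 1))  # (samples, steps, features)

def inverse_scale(values):
    """Hypothetical helper: map scaled predictions back to original units."""
    return values * (hi - lo) + lo

print(X1.shape, y1.shape)
```

With targets in [0, 1], the reported loss becomes comparable across runs, and predictions can be mapped back with inverse_scale(yhat).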

Why there's a bad accuracy on dataset when it's used both for validation and training?

I trained a model with ResNet50 and got an amazing accuracy of 95% on training set.
I then used the same training set for validation, and the accuracy is very bad (val_acc stays below 0.1).
from keras.preprocessing.image import ImageDataGenerator
train_set = ImageDataGenerator(horizontal_flip=True,rescale=1./255,shear_range=0.2,zoom_range=0.2).flow_from_directory(data,target_size=(256,256),classes=['airplane','airport','baseball_diamond',
'basketball_court','beach','bridge',
'chaparral','church','circular_farmland',
'commercial_area','dense_residential','desert',
'forest','freeway','golf_course','ground_track_field',
'harbor','industrial_area','intersection','island',
'lake','meadow','medium_residential','mobile_home_park',
'mountain','overpass','parking_lot','railway','rectangular_farmland',
'roundabout','runway'],batch_size=31)
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import layers,Model
conv_base = ResNet50(
    include_top=False,
    weights='imagenet')
for layer in conv_base.layers:
    layer.trainable = False
x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(31, activation='softmax')(x)
model = Model(conv_base.input, predictions)
# here you will write the path for train data or if you create your val data then you can test using that too.
# test_dir = ""
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
data,
target_size=(256, 256), classes=['airplane','airport','baseball_diamond',
'basketball_court','beach','bridge',
'chaparral','church','circular_farmland',
'commercial_area','dense_residential','desert',
'forest','freeway','golf_course','ground_track_field',
'harbor','industrial_area','intersection','island',
'lake','meadow','medium_residential','mobile_home_park',
'mountain','overpass','parking_lot','railway','rectangular_farmland',
'roundabout','runway'],batch_size=1,shuffle=True)
model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
model.fit_generator(train_set, steps_per_epoch=1488//31, epochs=10, verbose=True,
                    validation_data=test_generator,
                    validation_steps=test_generator.samples // 31)
Epoch 1/10
48/48 [==============================] - 27s 553ms/step - loss: 1.9631 - acc: 0.4825 - val_loss: 4.3134 - val_acc: 0.0208
Epoch 2/10
48/48 [==============================] - 22s 456ms/step - loss: 0.6395 - acc: 0.8212 - val_loss: 4.7584 - val_acc: 0.0833
Epoch 3/10
48/48 [==============================] - 23s 482ms/step - loss: 0.4325 - acc: 0.8810 - val_loss: 5.3852 - val_acc: 0.0625
Epoch 4/10
48/48 [==============================] - 23s 476ms/step - loss: 0.2925 - acc: 0.9153 - val_loss: 6.0963 - val_acc: 0.0208
Epoch 5/10
48/48 [==============================] - 23s 477ms/step - loss: 0.2275 - acc: 0.9341 - val_loss: 5.6571 - val_acc: 0.0625
Epoch 6/10
48/48 [==============================] - 23s 478ms/step - loss: 0.1855 - acc: 0.9489 - val_loss: 6.2440 - val_acc: 0.0208
Epoch 7/10
48/48 [==============================] - 23s 483ms/step - loss: 0.1704 - acc: 0.9543 - val_loss: 7.4446 - val_acc: 0.0208
Epoch 8/10
48/48 [==============================] - 23s 487ms/step - loss: 0.1828 - acc: 0.9476 - val_loss: 7.5198 - val_acc: 0.0417
What could be the reason?!
You have configured train_set and test_datagen differently: the training generator applies flips, shear, and zoom, while the test generator only rescales. As I mentioned in my comment, if it's the same data it will have the same accuracy. You can only tell that a model is overfitting when you use validation correctly, that is, with unseen data. Using the same data will always give the same accuracy for training and validation.
I'm not sure exactly what is wrong, but it is NOT an overfitting issue. It is clear that your validation data (the same as your training data) is not going in correctly. For one thing, you set the validation batch size to 1 but set validation_steps = test_generator.samples // 31. If test_generator.samples = 1488, then you have 48 steps, but with a batch size of 1 you will only validate 48 samples. You want to set the batch size and steps so that batch_size x validation_steps equals the total number of samples. That way you go through the validation set exactly once.
I also recommend setting shuffle=False for the test generator. Also, why do you bother entering all the class names? If your class directories are named 'airplane', 'airport', 'baseball_diamond', etc., then you don't need to define the classes explicitly; flow_from_directory will do that for you automatically. See the documentation below.
classes: Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute class_indices.
Your training data is actually different from your validation data because you are using data augmentation in the training generator. That's OK; it may lead to a small difference between your training and validation accuracy, but the validation accuracy should be pretty close once you get the validation data to go in correctly.
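The batch size and step arithmetic described above can be checked with a few lines of plain Python. This is only a sketch: samples_seen is a hypothetical helper, and 1488 is the sample count taken from steps_per_epoch in the question.

```python
def samples_seen(batch_size, steps):
    """Number of samples drawn from a generator in `steps` batches."""
    return batch_size * steps

n_samples = 1488
steps = n_samples // 31  # 48, as in the question's validation_steps

# Question's setup: batch_size=1 with 48 steps covers only 48 images.
seen_in_question = samples_seen(1, steps)

# Matching the batch size to the step count covers the whole set once.
seen_with_batch_31 = samples_seen(31, steps)

print(seen_in_question, seen_with_batch_31)  # 48 1488
```

So validation in the question runs on roughly 3% of the images per epoch, which also explains why val_acc only takes values that are multiples of 1/48.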

Overfitting on image classification

I'm working on an image classification problem with the sign language digits dataset, which has 10 categories (digits from 0 to 9). My models overfit heavily for some reason, even though I tried simple ones (like 1 Conv layer), the classical ResNet50, and even the state-of-the-art NASNetMobile.
The images are in color and 100x100 in size. I tried tuning the learning rate, but it doesn't help much, although decreasing the batch size makes the val accuracy start increasing earlier.
I applied augmentation to the images and it didn't help either: my train accuracy can hit 1.0 while val accuracy can't get higher than 0.6.
I looked at the data and it seems to load just fine. Distribution of classes in validation set is fair too. I have 2062 images in total.
When I change my loss to binary_crossentropy it seems to give better results for both train accuracy and val accuracy, but that doesn't seem to be right.
I don't understand what's wrong, could you please help me find out what I'm missing? Thank you.
Here's a link to my notebook: click
This is going to be a very interesting answer. There's so many things you need to pay attention to when looking at a problem. Fortunately, there's a methodology (might be vague, but still a methodology).
TLDR: Start your journey at the data, not the model.
Analysing the data
First, let's look at your data.
You have 10 classes, each image is (100,100), and there are only 2062 images. There's your first problem: that is very little data compared to a standard image classification problem. Therefore, you need to make sure that your data is easy to learn from without sacrificing generalizability (i.e. so that the model can do well on the validation/test sets). How do we do that?
Understand your data
Normalize your data
Reduce the number of features
Understanding data is a recurring theme in the other sections. So I won't have a separate section for that.
Normalizing your data
Here's the first problem. You are rescaling your data to the range [0,1], but you can do much better by standardizing it (i.e. (x - mean(x))/std(x)). Here's how you do that.
def create_datagen():
    return tf.keras.preprocessing.image.ImageDataGenerator(
        samplewise_center=True,
        samplewise_std_normalization=True,
        horizontal_flip=False,
        rotation_range=30,
        shear_range=0.2,
        validation_split=VALIDATION_SPLIT)
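To make the formula (x - mean(x))/std(x) concrete, here is a minimal NumPy sketch of per-image standardization, which is what samplewise_center and samplewise_std_normalization are documented to do (zero mean and unit std per sample). The function name standardize_sample and the small eps guard against division by zero are assumptions for illustration, and the image is random stand-in data.

```python
import numpy as np

def standardize_sample(img, eps=1e-6):
    """Standardize one image to zero mean and unit standard deviation."""
    img = img.astype("float32")
    return (img - img.mean()) / (img.std() + eps)

# Random stand-in for a single 100x100 RGB image with pixel values 0..255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

standardized = standardize_sample(image)
print(float(standardized.mean()), float(standardized.std()))
```

Unlike rescaling to [0,1], this centers every image around zero regardless of how bright or dark the original photo was.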
Another thing you might notice is I've set horizontal_flip=False. This brings me back to the first point. You have to make a judgement call to see what augmentation techniques might make sense.
Brightness/ Shear - Seems okay
Cropping/resizing - Seems okay
Horizontal/Vertical flip - This is not something I'd try at the beginning. If someone shows you a hand sign flipped horizontally or vertically, you might have trouble recognizing some signs.
Reducing the number of features
This is very important. You don't have much data, and you want to get the most out of it. The images have an original size of (100,100). You can do well with significantly smaller images (I tried (64,64), but you might be able to go even lower). So reduce the image size whenever you can.
Next, it doesn't matter whether you see a sign in RGB or grayscale; you can still recognize the sign. But grayscale cuts the number of input values per image by about 66% compared to RGB. So use fewer color channels whenever you can.
This is how you do both:
def create_flow(datagen, subset, directory, hflip=False):
    return datagen.flow_from_directory(
        directory=directory,
        target_size=(64, 64),
        color_mode='grayscale',
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        subset=subset,
        shuffle=True
    )
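As a quick check of how much these two choices shrink the input, the arithmetic can be written out as below. This is just a worked example; n_inputs is a hypothetical helper name.

```python
def n_inputs(height, width, channels):
    """Number of input values per image."""
    return height * width * channels

original = n_inputs(100, 100, 3)   # RGB at the original size
gray = n_inputs(100, 100, 1)       # grayscale, same size
small_gray = n_inputs(64, 64, 1)   # grayscale at (64, 64)

# Fraction of input values removed relative to the original RGB image.
gray_reduction = 1 - gray / original
total_reduction = 1 - small_gray / original

print(original, gray, small_gray)
print(round(gray_reduction, 3), round(total_reduction, 3))
```

Going to grayscale alone removes two thirds of the input values, and combining it with the (64,64) resize removes roughly 86% of them.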
So, to reiterate, you need to spend time understanding the data before you go ahead with a model. This is a bare-minimum list for this problem. Feel free to try other things as well.
Creating the model
So, here are the changes I made to the model.
Added padding='same' to all the convolutional layers. If you don't, the default is padding='valid', which results in an automatic dimensionality reduction. This means that the deeper you go, the smaller your output is going to be. You can see that the model you had ends with a final convolution output of size (3,3). This is probably too small for the dense layer to make sense of. So pay attention to what the dense layer is getting.
Reduced the kernel size - Kernel size is directly related to the number of parameters. So, to reduce the chances of overfitting to your small dataset, go with a smaller kernel size whenever possible.
Removed dropout from convolutional layers - This is something I did as a precaution. Personally, I don't know if dropout works as well with convolutional layers as it does with Dense layers. So I don't want to have an unknown complexity in my model at the beginning.
Removed the last convolutional layer - Reducing the parameters in the model to reduce the chances of overfitting.
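The effect of padding on the feature-map size can be traced with the standard output-size formulas for stride-1 convolutions and 2x2 max pooling. The sketch below is an illustration under those assumptions, using the kernel sizes (5, 3, 3) and the 64x64 input from the model in this answer. The helper names are hypothetical.

```python
def conv_out(n, kernel, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-n // stride)          # ceil(n / stride)
    return (n - kernel) // stride + 1   # 'valid': no padding

def pool_out(n, pool=2):
    """Spatial output size of non-overlapping max pooling."""
    return (n - pool) // pool + 1

def trace(n, kernels, padding):
    """Sizes after each conv + pool block, starting from input size n."""
    sizes = [n]
    for k in kernels:
        n = pool_out(conv_out(n, k, padding))
        sizes.append(n)
    return sizes

same_sizes = trace(64, [5, 3, 3], "same")
valid_sizes = trace(64, [5, 3, 3], "valid")
print(same_sizes)   # [64, 32, 16, 8]
print(valid_sizes)  # [64, 30, 14, 6]
```

With 'valid' padding each block loses a few extra pixels on top of the pooling, and the loss compounds with depth and with larger kernels.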
About the optimizer
After you do these changes, you don't need to change the learning rate of Adam. Adam works pretty well without any tuning. So that's a worry you can leave for later.
About the batch size
You were using a batch size of 8, which is not even big enough to contain one image per class in a batch. Try setting it to a higher value; I set it to 32. Increase the batch size whenever you can, maybe not to very large values, but up to around 128 should be fine for this problem.
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Convolution2D(8, (5, 5), activation='relu', input_shape=(64, 64, 1), padding='same'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Convolution2D(16, (3, 3), activation='relu', padding='same'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Convolution2D(32, (3, 3), activation='relu', padding='same'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.summary()
Final result
By thinking the problem through before jumping into building a model, I achieved significantly better results than what you have.
Your result
Epoch 1/10
233/233 [==============================] - 37s 159ms/step - loss: 2.6027 - categorical_accuracy: 0.2218 - val_loss: 2.7203 - val_categorical_accuracy: 0.1000
Epoch 2/10
233/233 [==============================] - 37s 159ms/step - loss: 1.8627 - categorical_accuracy: 0.3711 - val_loss: 2.8415 - val_categorical_accuracy: 0.1450
Epoch 3/10
233/233 [==============================] - 37s 159ms/step - loss: 1.5608 - categorical_accuracy: 0.4689 - val_loss: 2.7879 - val_categorical_accuracy: 0.1750
Epoch 4/10
233/233 [==============================] - 37s 158ms/step - loss: 1.3778 - categorical_accuracy: 0.5145 - val_loss: 2.9411 - val_categorical_accuracy: 0.1450
Epoch 5/10
233/233 [==============================] - 38s 161ms/step - loss: 1.1507 - categorical_accuracy: 0.6090 - val_loss: 2.5648 - val_categorical_accuracy: 0.1650
Epoch 6/10
233/233 [==============================] - 38s 163ms/step - loss: 1.1377 - categorical_accuracy: 0.6042 - val_loss: 2.5416 - val_categorical_accuracy: 0.1850
Epoch 7/10
233/233 [==============================] - 37s 160ms/step - loss: 1.0224 - categorical_accuracy: 0.6472 - val_loss: 2.3338 - val_categorical_accuracy: 0.2450
Epoch 8/10
233/233 [==============================] - 37s 158ms/step - loss: 0.9198 - categorical_accuracy: 0.6788 - val_loss: 2.2660 - val_categorical_accuracy: 0.2450
Epoch 9/10
233/233 [==============================] - 37s 160ms/step - loss: 0.8494 - categorical_accuracy: 0.7111 - val_loss: 2.4924 - val_categorical_accuracy: 0.2150
Epoch 10/10
233/233 [==============================] - 37s 161ms/step - loss: 0.7699 - categorical_accuracy: 0.7417 - val_loss: 1.9339 - val_categorical_accuracy: 0.3450
My result
Epoch 1/10
59/59 [==============================] - 14s 240ms/step - loss: 1.8182 - categorical_accuracy: 0.3625 - val_loss: 2.1800 - val_categorical_accuracy: 0.1600
Epoch 2/10
59/59 [==============================] - 13s 228ms/step - loss: 1.1982 - categorical_accuracy: 0.5843 - val_loss: 2.2777 - val_categorical_accuracy: 0.1350
Epoch 3/10
59/59 [==============================] - 13s 228ms/step - loss: 0.9460 - categorical_accuracy: 0.6676 - val_loss: 2.5666 - val_categorical_accuracy: 0.1400
Epoch 4/10
59/59 [==============================] - 13s 226ms/step - loss: 0.7066 - categorical_accuracy: 0.7465 - val_loss: 2.3700 - val_categorical_accuracy: 0.2500
Epoch 5/10
59/59 [==============================] - 13s 227ms/step - loss: 0.5875 - categorical_accuracy: 0.8008 - val_loss: 2.0166 - val_categorical_accuracy: 0.3150
Epoch 6/10
59/59 [==============================] - 13s 228ms/step - loss: 0.4681 - categorical_accuracy: 0.8416 - val_loss: 1.4043 - val_categorical_accuracy: 0.4400
Epoch 7/10
59/59 [==============================] - 13s 228ms/step - loss: 0.4367 - categorical_accuracy: 0.8518 - val_loss: 1.7028 - val_categorical_accuracy: 0.4300
Epoch 8/10
59/59 [==============================] - 13s 226ms/step - loss: 0.3823 - categorical_accuracy: 0.8711 - val_loss: 1.3747 - val_categorical_accuracy: 0.5600
Epoch 9/10
59/59 [==============================] - 13s 227ms/step - loss: 0.3802 - categorical_accuracy: 0.8663 - val_loss: 1.0967 - val_categorical_accuracy: 0.6000
Epoch 10/10
59/59 [==============================] - 13s 227ms/step - loss: 0.3585 - categorical_accuracy: 0.8818 - val_loss: 1.0768 - val_categorical_accuracy: 0.5950
Note: I put in only minimal effort here. You can increase your accuracy further by augmenting the data, optimizing the model structure, choosing the right batch size, etc.

How to train and tune an artificial multilayer perceptron neural network using Keras?

I am building my first artificial multilayer perceptron neural network using Keras.
This is my input data:
This is my code which I used to build my initial model which basically follows the Keras example code:
model = Sequential()
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16)
Output:
Epoch 1/20
1213/1213 [==============================] - 0s - loss: 0.1760
Epoch 2/20
1213/1213 [==============================] - 0s - loss: 0.1840
Epoch 3/20
1213/1213 [==============================] - 0s - loss: 0.1816
Epoch 4/20
1213/1213 [==============================] - 0s - loss: 0.1915
Epoch 5/20
1213/1213 [==============================] - 0s - loss: 0.1928
Epoch 6/20
1213/1213 [==============================] - 0s - loss: 0.1964
Epoch 7/20
1213/1213 [==============================] - 0s - loss: 0.1948
Epoch 8/20
1213/1213 [==============================] - 0s - loss: 0.1971
Epoch 9/20
1213/1213 [==============================] - 0s - loss: 0.1899
Epoch 10/20
1213/1213 [==============================] - 0s - loss: 0.1957
Epoch 11/20
1213/1213 [==============================] - 0s - loss: 0.1923
Epoch 12/20
1213/1213 [==============================] - 0s - loss: 0.1910
Epoch 13/20
1213/1213 [==============================] - 0s - loss: 0.2104
Epoch 14/20
1213/1213 [==============================] - 0s - loss: 0.1976
Epoch 15/20
1213/1213 [==============================] - 0s - loss: 0.1979
Epoch 16/20
1213/1213 [==============================] - 0s - loss: 0.2036
Epoch 17/20
1213/1213 [==============================] - 0s - loss: 0.2019
Epoch 18/20
1213/1213 [==============================] - 0s - loss: 0.1978
Epoch 19/20
1213/1213 [==============================] - 0s - loss: 0.1954
Epoch 20/20
1213/1213 [==============================] - 0s - loss: 0.1949
How do I train and tune this model and get my code to output my best predictive model? I am new to neural networks and am just wholly confused as to what is the next step after building the model. I know I want to optimize it, but I'm not sure which features to tweak or if I am supposed to do it manually or how to write code to do so.
Some things that you could do are:
Change your loss function from mean_squared_error to binary_crossentropy. mean_squared_error is intended for regression, but you want to classify your data.
Add show_accuracy=True to your fit() function, which outputs the accuracy of your model at every epoch (in newer Keras versions, pass metrics=['accuracy'] to compile() instead). That information is probably more useful to you than just the loss value.
Add validation_split=0.2 to your fit() function. Currently you are only training on a training set and validating on nothing. That's a no-go in machine learning as you can't be sure that your model hasn't simply memorized the correct answers for your dataset (without really understanding why these answers are correct).
Change from Obama/Romney to Democrat/Republican and add data from previous elections. ~1200 examples is a pretty small dataset for neural networks. Also add columns with valuable information, like unemployment rate or population density. Note that quite a few of the values (like population number) are probably equivalent to providing the name of the state, so e.g. your net will likely learn that Texas means Republican.
If you haven't done that already, normalize all your values to the range of 0 to 1 (by subtracting from each value the minimum of the column and then dividing by the (max - min) of the column). Neural networks can handle normalized data better than unnormalized data.
Try Adam and Adagrad instead of SGD. Sometimes they perform better. (See documentation about optimizers.)
Try Activation('relu'), LeakyReLU, PReLU and ELU instead of Activation('tanh'). Tanh is rarely the best choice. (See advanced activation functions.)
Try increasing/decreasing your dense layers sizes (e.g. from 64 to 128). Also try adding/removing layers.
Try adding BatchNormalization layers (before the Activation layers). (See documentation.)
Try changing the dropout rates (e.g. from 0.5 to 0.25).
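To illustrate the first suggestion in the list above, the sketch below compares how mean squared error and binary cross-entropy penalize an increasingly confident wrong prediction for a single example whose true label is 1. The functions are hand-written NumPy versions for illustration (the eps clipping is an assumption to avoid log(0)), not the Keras implementations.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, with predictions clipped away from 0 and 1."""
    p = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1.0])
for prob in (0.4, 0.1, 0.01):
    y_pred = np.array([prob])
    print(prob, round(mse(y_true, y_pred), 4), round(bce(y_true, y_pred), 4))
```

MSE can never exceed 1 for probabilities, while cross-entropy keeps growing as the prediction becomes more confidently wrong, which gives the optimizer a stronger signal on misclassified examples.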

Keras autoencoder accuracy/loss doesn't change

Here is my code:
AE_0 = Sequential()
encoder = Sequential([Dense(output_dim=100, input_dim=256, activation='sigmoid')])
decoder = Sequential([Dense(output_dim=256, input_dim=100, activation='linear')])
AE_0.add(AutoEncoder(encoder=encoder, decoder=decoder, output_reconstruction=True))
AE_0.compile(loss='mse', optimizer=SGD(lr=0.03, momentum=0.9, decay=0.001, nesterov=True))
AE_0.fit(X, X, batch_size=21, nb_epoch=500, show_accuracy=True)
X has shape (537621, 256). I'm trying to find a way to compress the vectors of size 256 to 100, then to 70, then to 50. I have done this in Lasagne, but in Keras it seems to be easier to work with autoencoders.
Here is the output:
Epoch 1/500
537621/537621 [==============================] - 27s - loss: 0.1339 - acc: 0.0036
Epoch 2/500
537621/537621 [==============================] - 32s - loss: 0.1339 - acc: 0.0036
Epoch 3/500
252336/537621 [=============>................] - ETA: 14s - loss: 0.1339 - acc: 0.0035
And it continues like this, on and on.
It's now fixed on master :) Opening an issue is sometimes the best choice.
https://github.com/fchollet/keras/issues/1604
