Keras SimpleRNNCell appears to fail to distribute learning among all its weights - python

This question is about SimpleRNNCell, a TensorFlow class that implements a basic recurrent neural network cell. Unless there's something fundamentally wrong in my code, it appears that training is not spread over all the available weights but only over a subset of them, which makes the recurrent machinery irrelevant.
I've written a minimal Keras program with just one RNN cell and a dense layer. When I print out the learned weights, the state weight doesn't appear to have changed since its initialization. Here's my code:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import RNN
from keras.layers import SimpleRNN, SimpleRNNCell
from sklearn.preprocessing import MinMaxScaler
from tensorflow import random as rnd
#Fix the seed
rnd.set_seed(0)
#The dataset can be downloaded from https://mantas.info/wp/wp-content/uploads/simple_esn/MackeyGlass_t17.txt
data = np.loadtxt('MackeyGlass_t17.txt')
#Normalize
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data.reshape(-1, 1))
#Split Dataset in Train and Test
train, test = scaled[0:-100], scaled[-100:]
#Split into input and output
train_X, train_y = train[:-1], train[1:]
test_X, test_y = test[:-1], test[1:]
#Reshaping
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
#Batch and epochs
batch_size = 20
epochs = 2
#Design and run the model
model = Sequential()
model.add(RNN(SimpleRNNCell(1)))
#model.add(SimpleRNN(1))  # This generates the same results as the above line
model.add(Dense(train_y.shape[1]))
model.compile(loss='huber', optimizer='adam')
model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, validation_data=(test_X, test_y), verbose=0, shuffle=False)
#Print the weights of the dense layer
for layer in model.layers: print(layer.get_weights())
If I run this code I receive the following output:
[array([[-0.8942287]], dtype=float32), array([[1.]], dtype=float32), array([0.05435111], dtype=float32)]
[array([[-1.272426]], dtype=float32), array([0.04711587], dtype=float32)]
Note that with a scalar series and just one cell, the matrices and vectors reduce to single elements, so I end up with 5 weights: the input weight, the state weight, the input bias, the dense layer weight, and the dense layer bias. All of these weights changed during learning apart from the state weight, which is stuck at its initialization, i.e. 1.0.
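As a sanity check, the values can also be printed next to their variable names, so that each of the five numbers can be matched to kernel, recurrent_kernel (the state weight) or bias. This is just a small sketch relying on the standard layer.weights attribute:
for layer in model.layers:
    for w in layer.weights:
        # w.name identifies kernel / recurrent_kernel / bias, w.numpy() gives its value
        print(w.name, w.numpy())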
Why doesn't the learning process affect the state weight? Is there an obvious mistake in the way I implemented the model?
Note:
Ubuntu 20.04, Python 3.9.15, Tensorflow 2.7.0, no GPU.

Related

Bootstrap like training to LSTM model

I have a dataset of size 273985 x 5 that I'm training as a path prediction problem. I chose an LSTM inspired by this paper: https://ieeexplore.ieee.org/abstract/document/9225479
I have a baseline implementation as such:
# lstm autoencoder recreate sequence
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.callbacks import EarlyStopping
from keras.layers import TimeDistributed
from keras.utils import plot_model
# define input sequence
my_sequence = np.array(sample)
# reshape input into [samples, timesteps, features]
n_in = len(my_sequence)
my_sequence = my_sequence.reshape((1, n_in, 5))
# define model
model = Sequential()
model.add(LSTM(10, activation='sigmoid', input_shape=(n_in,5)))
model.add(RepeatVector(n_in))
model.add(LSTM(10, activation='sigmoid', return_sequences=True))
model.add(TimeDistributed(Dense(5)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(my_sequence, my_sequence, epochs=300, verbose=0)
# structure of the model and the layers
plot_model(model, show_shapes=True, to_file=path)
# demonstrate recreation
predicted = model.predict(my_sequence, verbose=0)
print(predicted)
print(my_sequence)
Right now I am choosing my training sample by hand, but I want to train on my entire dataset much like bootstrapping, where I train on 1-50 and predict the next 50, train on 2-50 and predict the next 50, and so on until the end of the test set, then compare my predictions against the actual values.
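Conceptually, the sliding-window scheme I have in mind is roughly the following (just a sketch; the window lengths and step are placeholders, not tuned values):
import numpy as np

def walk_forward_windows(data, train_len=50, pred_len=50, step=1):
    # Yield (training window, target window) pairs, sliding one step at a time
    for start in range(0, len(data) - train_len - pred_len + 1, step):
        train_window = data[start:start + train_len]
        target_window = data[start + train_len:start + train_len + pred_len]
        yield train_window, target_window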
Would this be done via batching the data or k-fold validation? Also, how would one go about it, and how would one calculate an appropriate evaluation metric?
Thank you!

How do I train a neural network with tensorflow-datasets?

I am attempting to train a neural network on the EMNIST dataset, but when I attempt to flatten my image, it throws the following error:
WARNING:tensorflow:Model was constructed with shape (None, 28, 28) for input Tensor("flatten_input:0", shape=(None, 28, 28), dtype=float32), but it was called on an input with incompatible shape (None, 1, 28, 28).
I can't figure out what the problem is. I have tried changing my preprocessing and removing the batch size from my model.fit and my ds.map.
Here is the full code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
def preprocess(dict):
    image = dict['image']
    image = tf.transpose(image)
    label = dict['label']
    return image, label
train_data, validation_data = tfds.load('emnist/letters', split = ['train', 'test'])
train_data_gen = train_data.map(preprocess).shuffle(1000).batch(32)
validation_data_gen = validation_data.map(preprocess).batch(32)
print(train_data_gen)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28)),
    tf.keras.layers.Dense(128, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation = 'softmax')
])
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
early_stopping = keras.callbacks.EarlyStopping(monitor = 'val_accuracy', patience = 10)
history = model.fit(train_data_gen, epochs = 50, batch_size = 32, validation_data = validation_data_gen, callbacks = [early_stopping], verbose = 1)
model.save('emnistmodel.h5')
There are actually a few things going on here, so let's address them one at a time.
Input shape
So to address your immediate question, you're receiving an incompatible shape error because, well, the shape of the input doesn't match the expected shape.
In this line tf.keras.layers.Flatten(input_shape=(28, 28)), we are telling the model to expect inputs of shape (28, 28), but this isn't accurate. Our inputs actually have shape (28, 28, 1) because we are taking a 28x28 pixel image with 1 channel (as opposed to a colour image which would have 3 channels r, g, and b). So to solve this immediate problem, we simply update the model to use the shape of the input. i.e. tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
Number of output nodes
As Rishabh suggested in his answer, the EMNIST dataset has more than 10 balanced classes. However, in your case you appear to be using EMNIST Letters, which has 26 balanced classes. So your neural net should correspondingly have 27 output nodes (since the class labels go from 1..26 while our output nodes correspond to 0..26) to be able to classify the given data. Of course, giving it extra output nodes will let it run as well, but these add unnecessary weights to train, which increases the training time needed for our model. In short, your final layer should be tf.keras.layers.Dense(27, activation='softmax')
Preprocessing TensorFlow Datasets
Reading your preprocess() function, I believe you're trying to convert the training and validation datasets into tuples of (image, label). Instead of creating our own function, TensorFlow conveniently implements this for us through the parameter as_supervised.
Additionally, I see some extra preprocessing that you're trying to achieve, such as batching and shuffling the data. Again, TensorFlow implements batch_size and shuffle_files (see the common arguments) for us! So loading the dataset would look something like
train_data, validation_data = tfds.load('emnist/letters',
                                        split=['train', 'test'],
                                        shuffle_files=True,
                                        batch_size=32,
                                        as_supervised=True)
Some additional notes
Also, as a suggestion, consider excluding batch_size from model.fit(). Defining the same thing at two different places is a recipe for bugs and unexpected behaviours. Moreover, when using TensorFlow Datasets, it's not necessary because they already generate batches.
Overall your updated program should look something like this
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds
from tensorflow import keras
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
train_data, validation_data = tfds.load('emnist/letters',
                                        split=['train', 'test'],
                                        shuffle_files=True,
                                        batch_size=32,
                                        as_supervised=True)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(27, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
early_stopping = keras.callbacks.EarlyStopping(
    monitor='val_accuracy', patience=10)
history = model.fit(train_data,
                    epochs=50,
                    validation_data=validation_data,
                    callbacks=[early_stopping],
                    verbose=1)
model.save('emnistmodel.h5')
Hope this helps!
Hi @Rattandeep, I just checked the EMNIST dataset. It has 47 different classes, and in your dense layer you have specified 10.
If you change your code from
tf.keras.layers.Dense(10, activation = 'softmax')
To this one, it will work
tf.keras.layers.Dense(47, activation = 'softmax')
Thanks

Keras: unsupervised pre-training kills performance

I'm trying to train a deep classifier in Keras, both with and without pretraining of the hidden layers via stacked autoencoders. My problem is that the pretraining seems to drastically degrade performance (i.e. if pretrain is set to False in the code below, the training error of the final classification layer converges much faster). This seems completely outrageous to me, given that pretraining should only initialize the weights of the hidden layers, and I don't see how that could completely kill the model's performance even if that initialization did not work very well. I cannot include the specific dataset I used, but the effect should occur for any appropriate dataset (e.g. MNIST). What is going on here, and how can I fix it?
EDIT: the code is now reproducible with the MNIST data; the final lines plot the change in the loss function, which is significantly smaller with pre-training.
I have also slightly modified the code and added sample learning curves below:
from functools import partial
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.regularizers import l2
from keras.utils import to_categorical
(inputs_train, targets_train), _ = mnist.load_data()
inputs_train = inputs_train[:1000].reshape(1000, 784)
targets_train = to_categorical(targets_train[:1000])
hidden_nodes = [256] * 4
learning_rate = 0.01
regularization = 1e-6
epochs = 30
def train_model(pretrain):
    model = Sequential()
    layer = partial(Dense,
                    activation='sigmoid',
                    kernel_initializer='random_normal',
                    kernel_regularizer=l2(regularization))
    for i, hn in enumerate(hidden_nodes):
        kwargs = dict(units=hn, name='hidden_{}'.format(i + 1))
        if i == 0:
            kwargs['input_dim'] = inputs_train.shape[1]
        model.add(layer(**kwargs))
    if pretrain:
        # train autoencoders
        inputs_train_ = inputs_train.copy()
        for i, hn in enumerate(hidden_nodes):
            autoencoder = Sequential()
            autoencoder.add(layer(units=hn,
                                  input_dim=inputs_train_.shape[1],
                                  name='hidden'))
            autoencoder.add(layer(units=inputs_train_.shape[1],
                                  name='decode'))
            autoencoder.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
                                loss='binary_crossentropy')
            autoencoder.fit(
                inputs_train_,
                inputs_train_,
                batch_size=32,
                epochs=epochs,
                verbose=0)
            autoencoder.pop()
            model.layers[i].set_weights(autoencoder.layers[0].get_weights())
            inputs_train_ = autoencoder.predict(inputs_train_)
    num_classes = targets_train.shape[1]
    model.add(Dense(units=num_classes,
                    activation='softmax',
                    name='classify'))
    model.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
                  loss='categorical_crossentropy')
    h = model.fit(
        inputs_train,
        targets_train,
        batch_size=32,
        epochs=epochs,
        verbose=0)
    return h.history['loss']
plt.plot(train_model(pretrain=False), label="Without Pre-Training")
plt.plot(train_model(pretrain=True), label="With Pre-Training")
plt.xlabel("Epoch")
plt.ylabel("Cross-Entropy")
plt.legend()
plt.show()

Same model produces consistently different accuracies in Keras and Tensorflow

I'm trying to implement the same model in Keras and in TensorFlow using Keras layers, with custom data. The two models produce consistently different accuracies over many training runs (Keras ~71%, TensorFlow ~65%). I want TensorFlow to do as well as Keras so that I can go into the TensorFlow iterations to tweak some lower-level algorithms.
Here's my original Keras code:
import keras
from keras.layers import Dense, Dropout, Input
from keras.models import Model, Sequential
from keras import backend as K
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
main_input = Input(shape=(input_size,),name='text_vectors')
x = Dense(units=64, activation='relu', name = 'dense1')(main_input)
drop1 = Dropout(0.2,name='dropout1')(x)
auxiliary_input = Input(shape=(num_aux_inputs,), name='aux_input')
x = keras.layers.concatenate([drop1,auxiliary_input])
x = Dense(units=64, activation='relu',name='dense2')(x)
drop2 = Dropout(0.1,name='dropout2')(x)
x = Dense(units=32, activation='relu',name='dense3')(drop2)
main_output = Dense(units=num_classes,
activation='softmax',name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input],
outputs=main_output)
model.compile(loss=keras.losses.categorical_crossentropy, metrics= ['accuracy'],optimizer=keras.optimizers.Adadelta())
history = model.fit([train_x,train_x_auxiliary], train_y, batch_size=128, epochs=20, verbose=1, validation_data=([val_x,val_x_auxiliary], val_y))
loss, accuracy = model.evaluate([val_x,val_x_auxiliary], val_y, verbose=0)
Here is how I moved the Keras layers to TensorFlow, following this article:
import tensorflow as tf
from keras import backend as K
import keras
from keras.layers import Dense, Dropout, Input # Dense layers are "fully connected" layers
from keras.metrics import categorical_accuracy as accuracy
from keras.objectives import categorical_crossentropy
tf.reset_default_graph()
sess = tf.Session()
K.set_session(sess)
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
x = tf.placeholder(tf.float32, shape=[None, input_size], name='X')
x_aux = tf.placeholder(tf.float32, shape=[None, num_aux_inputs], name='X_aux')
y = tf.placeholder(tf.float32, shape=[None, num_classes], name='Y')
# build graph
layer = Dense(units=64, activation='relu', name = 'dense1')(x)
drop1 = Dropout(0.2,name='dropout1')(layer)
layer = keras.layers.concatenate([drop1,x_aux])
layer = Dense(units=64, activation='relu',name='dense2')(layer)
drop2 = Dropout(0.1,name='dropout2')(layer)
layer = Dense(units=32, activation='relu',name='dense3')(drop2)
output_logits = Dense(units=num_classes, activation='softmax',name='main_output')(layer)
loss = tf.reduce_mean(categorical_crossentropy(y, output_logits))
acc_value = tf.reduce_mean(accuracy(y, output_logits))
correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95,epsilon=tf.keras.backend.epsilon()).minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)
epochs = 20 # Total number of training epochs
batch_size = 128 # Training batch size
display_freq = 300 # Frequency of displaying the training results
num_tr_iter = int(len(y_train) / batch_size)
with sess.as_default():
    for epoch in range(epochs):
        print('Training epoch: {}'.format(epoch + 1))
        # Randomly shuffle the training data at the beginning of each epoch
        x_train, x_train_aux, y_train = randomize(x_train, x_train_auxiliary, y_train)
        for iteration in range(num_tr_iter):
            start = iteration * batch_size
            end = (iteration + 1) * batch_size
            x_batch, x_aux_batch, y_batch = get_next_batch(x_train, x_train_aux, y_train, start, end)
            # Run optimization op (backprop)
            feed_dict_batch = {x: x_batch, x_aux: x_aux_batch, y: y_batch, K.learning_phase(): 1}
            optimizer.run(feed_dict=feed_dict_batch)
I also implemented the whole model from scratch in TensorFlow, but it also gives ~65% accuracy, so I decided to try this Keras-layers-within-TF setup to identify the problem.
I've looked up posts on similar problems with Keras and Tensorflow, and have tried the following which didn't help in my case:
Keras's dropout layer is only active in the training phase, so I did the same in my tf code by setting keras.backend.learning_phase().
Keras and TensorFlow have different variable initializations. I've tried initializing my weights in TensorFlow in the following three ways, which are supposed to be the same as Keras's weight initialization, but they didn't affect the accuracies either:
initer = tf.glorot_uniform_initializer()
initer = tf.contrib.layers.xavier_initializer()
initer = tf.random_normal(shape) * (np.sqrt(2.0/(shape[0] + shape[1])))
The optimizers in the two versions are set to be exactly the same! Though it doesn't look like the accuracy depends on the optimizer - I tried using different optimizers in both Keras and TF, and the accuracies each converge to the same values.
Help!
It seems to me that this is most probably a weight initialization problem. What I would suggest is to build the Keras layers first, get their weights before training, and initialize the TF layers with those values.
I have run into that kind of problem and it solved things for me, but that was a long time ago and I don't know if they have since made those initializers the same. At that time, TF and Keras initializations were obviously not the same.
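A minimal sketch of that weight-copying idea, assuming the TF 1.x plus standalone Keras setup from the question (variable names such as keras_model and dense1_tf are illustrative, not from the original code):
# Keep a reference to the TF-side Keras layer object instead of calling it inline
dense1_tf = Dense(units=64, activation='relu', name='dense1_tf')
layer = dense1_tf(x)

# After both graphs are built and the TF variables are initialized, copy the
# freshly initialized weights of the Keras model's layer onto the TF-side layer
w1, b1 = keras_model.get_layer('dense1').get_weights()
dense1_tf.set_weights([w1, b1])
The same copy would have to be repeated for dense2, dense3 and main_output so that both versions start training from identical weights.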
I checked the initializers, seed, parameters and hyperparameters, but the accuracy is still different.
I checked the Keras code: it randomly shuffles the batches of images before feeding them into the network, and this shuffling is different across different engines. So we need to figure out a way to feed the same batches of images to the network in order to get the same accuracy.

Autoencoder Gridsearch Hyperparameter tuning Keras

My data have the same shape; I just generated random numbers here. In reality the data are floats in the range -6 to 6, which I scaled as well. The input layer size and encoding dimension have to remain the same. When I train, the loss starts at 0.631 and stays there the whole time. I changed the learning rate manually. I am new to Python and do not know how to implement a grid search on this code to find the right parameters. What else can I do to tune my network?
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras import optimizers
#Train data
x_train=np.random.rand(2666000)
x_train = (x_train-x_train.min())/(x_train.max()-x_train.min())
x_train=x_train.reshape(-1,2000)
x_test=[]#empty testing later
#Enc Dimension
encoding_dim=100
#Input shape
input_dim = Input(shape=(2000,))
#Encoding Layer
encoded = Dense(encoding_dim, activation='relu')(input_dim)
#Decoding Layer
decoded = Dense(2000, activation='sigmoid')(encoded)
#Model AE
autoencoder = Model(input_dim, decoded)
#Model Encoder
encoder = Model(input_dim, encoded)
#Encoding
encoded_input = Input(shape=(encoding_dim,))
#Decoding
decoder_layer = autoencoder.layers[-1]
#Model Decoder
decoder = Model(encoded_input, decoder_layer(encoded_input))
optimizer = optimizers.Adadelta(lr=0.1, rho=0.95, epsilon=None, decay=0.0)
autoencoder.compile(optimizer=optimizer, loss='binary_crossentropy',
metrics=['accuracy'])
#Train and test
autoencoder_train= autoencoder.fit(x_train, x_train,
epochs=epochs, shuffle=False, batch_size=2048)
I suggest adding more hidden layers. If your loss stays the same it means at least one of two things:
Your data is more or less random and there are no relationships to be drawn
Your model is not complex enough to learn meaningful relationships from your data
A rule of thumb for me is that a model should be powerful enough to overfit the data given enough training iterations.
Unfortunately there is a fine line between sufficiently complex and too complex. You have to play around with the number of hidden layers, the number of units in each layer, and the number of epochs you train your network for. Since you only have two Dense layers, a good starting point would be to increase model complexity, as sketched below.
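For example, a slightly deeper version of the autoencoder from the question might look like the following (the intermediate layer sizes are just illustrative starting points, not tuned values; the 2000-dimensional input and encoding dimension of 100 are kept from the question):
from keras.layers import Input, Dense
from keras.models import Model

input_dim = Input(shape=(2000,))

# Deeper encoder: progressively narrower layers down to the required encoding dimension
encoded = Dense(512, activation='relu')(input_dim)
encoded = Dense(256, activation='relu')(encoded)
encoded = Dense(100, activation='relu')(encoded)

# Mirrored decoder back up to the original 2000 features
decoded = Dense(256, activation='relu')(encoded)
decoded = Dense(512, activation='relu')(decoded)
decoded = Dense(2000, activation='sigmoid')(decoded)

autoencoder = Model(input_dim, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')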
If you insist on using a grid search, Keras has a wrapper for scikit-learn and sklearn has a grid search module. A toy example:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(activation='relu', learn_rate=0.1, init='uniform', optimizer='SGD'):
    # build and return a compiled but untrained keras model using these arguments
    <return a compiled but untrained keras model>

model = KerasClassifier(build_fn=create_model, batch_size=1000, epochs=10)
# now write out all the parameters you want to try out for the grid search;
# each key in param_grid must match an argument of create_model
activation = ['relu', 'tanh', 'sigmoid'...]
learn_rate = [0.1, 0.2, ...]
init = ['uniform', 'normal', 'zero', ...]
optimizer = ['SGD', 'Adam' ...]
param_grid = dict(activation=activation, learn_rate=learn_rate, init=init, optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
result = grid.fit(X, y)
