Adding additional hidden layer and attention layer to LSTM model


from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model
sequence_input = Input(shape=(MAX_LENGTH_SEQUENCE,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)  # embedding_layer is defined elsewhere
l_lstm = Bidirectional(LSTM(10))(embedded_sequences)
preds = Dense(len(macronum), activation='softmax')(l_lstm)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
I need to add an additional hidden layer and an attention layer to the LSTM model above. Usually I construct the model this way:
import tensorflow
model = tensorflow.keras.Sequential()
model.add(tensorflow.keras.layers.LSTM(128, dropout=0.3,
                                       recurrent_dropout=0.2, input_shape=(N, K), return_sequences=True))
#model.add(tensorflow.keras.layers.LSTM(128, dropout=0.3,
#                                       recurrent_dropout=0.2, input_shape=(N, K), return_sequences=True))
model.add(Attention(name='attention_weight'))  # a custom or third-party attention layer, not defined in this snippet
model.add(tensorflow.keras.layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.summary()  # summary() prints by itself and returns None
print(_x_train.shape)
model.fit(_x_train, _y_train, epochs=10, batch_size=1)
scores = model.evaluate(_x_test, _y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
As shown in the second snippet, I normally stack as many layers as I need with model.add, but I'm not familiar with the approach used in the first snippet.
If I want to add an extra LSTM layer and an Attention layer to the first snippet, where should I include them, and what is the right syntax?
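A minimal sketch of how the first (functional-API) model could take an extra LSTM layer and an attention layer. In the functional API you add a layer by calling it on the previous tensor, so each new layer goes between embedded_sequences and preds, and every LSTM that feeds another sequence layer needs return_sequences=True. This assumes TensorFlow 2.x and swaps in Keras's built-in dot-product tf.keras.layers.Attention, used as self-attention (query = value = the LSTM outputs) followed by average pooling over time; it is not the custom Attention(name='attention_weight') layer from the second snippet. VOCAB_SIZE, NUM_CLASSES, the sequence length, and the Embedding standing in for embedding_layer are hypothetical.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical sizes standing in for MAX_LENGTH_SEQUENCE, the vocabulary, and len(macronum).
MAX_LENGTH_SEQUENCE = 50
VOCAB_SIZE = 1000
NUM_CLASSES = 5

sequence_input = layers.Input(shape=(MAX_LENGTH_SEQUENCE,), dtype='int32')
# Hypothetical stand-in for the question's pre-built embedding_layer.
embedded_sequences = layers.Embedding(VOCAB_SIZE, 32)(sequence_input)

# First recurrent layer returns the whole sequence so the next layer can consume it.
x = layers.Bidirectional(layers.LSTM(10, return_sequences=True))(embedded_sequences)
# The extra hidden LSTM layer; still returns sequences so attention sees every timestep.
x = layers.Bidirectional(layers.LSTM(10, return_sequences=True))(x)
# Self-attention: each timestep attends over all timesteps (query = value = x).
attended = layers.Attention()([x, x])
# Collapse the time axis into one vector per example.
pooled = layers.GlobalAveragePooling1D()(attended)
preds = layers.Dense(NUM_CLASSES, activation='softmax')(pooled)

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])

# Usage: a batch of 4 random token sequences gives 4 rows of class probabilities.
dummy = np.random.randint(0, VOCAB_SIZE, size=(4, MAX_LENGTH_SEQUENCE))
probs = model.predict(dummy, verbose=0)
print(probs.shape)
```

Stacking more layers follows the same pattern: assign the output of one layer call to x and pass x to the next.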

Related

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

How to classify job descriptions into their respective industries?
I'm trying to classify text using an LSTM, in particular converting job descriptions into industry categories. Unfortunately, the things I've tried so far have only resulted in 76% accuracy.
What is an effective method to classify text into more than 30 classes using an LSTM?
I have tried three alternatives:
Model_1
Model_1 achieves test accuracy of 65%
embedding_dimension = 80
max_sequence_length = 3000
epochs = 50
batch_size = 100
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Model_2
Model_2 achieves test accuracy of 64%
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(100))
model.add(Dropout(rate=0.5))
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
Model_3
Model_3 achieves test accuracy of 76%
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape, trainable=False))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(100, dropout=0.4, recurrent_dropout=0.4))
model.add(Dense(128, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.039, seed=None)))
model.add(BatchNormalization())
model.add(Dense(64, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)))
model.add(BatchNormalization())
model.add(Dense(32, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)))
model.add(BatchNormalization())
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['acc'])
I'd like to know how to improve the accuracy of the network.

Start with a minimal baseline
You have a simple network at the top of your code, but try this one as your baseline:
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(output_dim//4))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
The intuition here is to see how much work the LSTM can do on its own. We don't need it to output the full 30 output_dims (the number of classes), but rather a smaller set of features to base the class decision on.
Your larger networks have layers like Dense(128) with 100 inputs. That's 100 x 128 = 12,800 connections to learn.
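The 12,800 figure counts only the weight matrix; a Keras Dense layer also adds one bias per unit. A quick check of both sizes, assuming TensorFlow 2.x and taking output_dim = 30 (the number of classes in the question) for illustration:

```python
import tensorflow as tf

# Dense(128) fed by a 100-dimensional LSTM output, as in Model_2 and Model_3.
big = tf.keras.layers.Dense(128)
big.build((None, 100))
print(big.count_params())    # 100*128 weights + 128 biases = 12928

# Baseline: LSTM(output_dim//4) feeding the softmax layer (output_dim = 30 assumed).
output_dim = 30
small = tf.keras.layers.Dense(output_dim)
small.build((None, output_dim // 4))
print(small.count_params())  # 7*30 weights + 30 biases = 240
```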
Handling imbalance right away
Your data may have a lot of class imbalance, so for the next step, let's address that with a custom loss function, top_k_loss. It makes your network train only on the examples in each batch that it is having the most trouble with. This does a great job of handling class imbalance without any other plumbing.
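import tensorflow as tf
def top_k_loss(k=16):
    def loss(y_true, y_pred):
        # Per-example cross-entropy for the whole batch
        y_error_of_true = tf.keras.losses.categorical_crossentropy(y_true=y_true, y_pred=y_pred)
        # Keep only the k largest errors (or the whole batch if it is smaller than k);
        # tf.shape also works when the batch dimension is unknown, unlike y_true.shape[0]
        topk, _ = tf.math.top_k(y_error_of_true, k=tf.minimum(k, tf.shape(y_true)[0]))
        return topk
    return loss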
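To see what this loss actually returns, here is a small check on a hand-made batch; the predicted probabilities are invented for illustration. It assumes TensorFlow 2.x and repeats the loss definition so the snippet runs on its own:

```python
import numpy as np
import tensorflow as tf

def top_k_loss(k=16):
    def loss(y_true, y_pred):
        per_example = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        # Largest k per-example losses, sorted from worst to best.
        topk, _ = tf.math.top_k(per_example, k=tf.minimum(k, tf.shape(y_true)[0]))
        return topk
    return loss

# Four examples of true class 0, with predicted p(class 0) = 0.9, 0.5, 0.1, 0.7.
y_true = tf.constant([[1.0, 0.0, 0.0]] * 4)
y_pred = tf.constant([[0.9, 0.05, 0.05],
                      [0.5, 0.25, 0.25],
                      [0.1, 0.45, 0.45],
                      [0.7, 0.15, 0.15]])

worst = top_k_loss(k=2)(y_true, y_pred).numpy()
print(worst)  # -log(0.1) and -log(0.5): the two hardest examples, worst first
```

Only the two hardest examples contribute; the confident 0.9 and 0.7 predictions are ignored, which is why the loss reported during fit looks worse than the true average loss.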
Use this with a batch size of 128 to 512. You add it to your model's compile call like so:
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy'])
Now, you'll see that model.fit with this loss reports some disappointing numbers. That's because it is only reporting THE WORST 16 out of each training batch. Recompile with your regular loss and run model.evaluate to find out how it does on the training set and again on the test set.
Train for 100 epochs, and at this point you should already see some good results.
Next Steps
Wrap model creation and testing in a function like so:
def run_experiment(lstm_layers=1, lstm_size=output_dim//4, dense_layers=0, dense_size=output_dim//4):
    model = Sequential()
    model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
    # All but the last LSTM layer must return full sequences to feed the next LSTM
    for i in range(lstm_layers-1):
        model.add(LSTM(lstm_size, return_sequences=True))
    model.add(LSTM(lstm_size))
    for i in range(dense_layers):
        model.add(Dense(dense_size, activation='tanh'))
    model.add(Dense(output_dim, activation='softmax'))
    # Train on the hardest examples of each batch only
    model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy'])
    model.fit(x=x, y=y, epochs=100, batch_size=128)
    # Recompile with the regular loss (trained weights are kept) to get honest numbers
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    loss, accuracy = model.evaluate(x=x_test, y=y_test)
    return loss
That runs a whole experiment for you. Now it is a matter of searching for a better architecture. One way to search is randomly, and random search is actually really good. If you want to get fancy, I recommend hyperopt. Don't bother with grid search: random search usually beats it for large search spaces.
from random import randint
best_loss = 10**10
best_config = []
for trial in range(100):
    config = [
        randint(1, 4),   # lstm_layers
        randint(8, 64),  # lstm_size
        randint(0, 8),   # dense_layers
        randint(8, 64),  # dense_size
    ]
    result = run_experiment(*config)
    if result < best_loss:
        best_loss = result  # remember the new best, otherwise every trial counts as "better"
        best_config = config
        print('Found a better loss ', result, ' from config ', config)
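The search bookkeeping can be checked without training anything by swapping run_experiment for a cheap stand-in. fake_experiment below is purely hypothetical and its score is made up; the point is that best_loss must be updated together with best_config, or the loop simply keeps whichever configuration ran last.

```python
from random import randint, seed

def fake_experiment(lstm_layers, lstm_size, dense_layers, dense_size):
    # Hypothetical stand-in for run_experiment: an arbitrary score, lower is better.
    return abs(lstm_layers - 2) + abs(lstm_size - 32) / 8 + dense_layers + abs(dense_size - 16) / 8

seed(0)
best_loss = float('inf')
best_config = []
history = []
for trial in range(50):
    config = [randint(1, 4), randint(8, 64), randint(0, 8), randint(8, 64)]
    result = fake_experiment(*config)
    history.append(result)
    if result < best_loss:
        best_loss = result
        best_config = config

print(best_loss, best_config)
```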

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use Keras to train a simple feedforward network. I tried two different ways of building what I think is the same network, but one performs significantly better. The first, better-performing one is the following:
inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse',optimizer='Adam')
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and this has validation loss of around 5. But when I do model.summary, they look virtually the same. Is there something wrong with the second model?

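I am not sure they are the same: the second model has a relu activation after the last layer (384 units) and the first doesn't, because the default activation of a Keras Dense layer is None (linear). This might be the issue: a relu output can never be negative, so if any of your targets are negative, the second model cannot fit them.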
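A quick way to see the difference, assuming TensorFlow 2.x: a relu output layer clips everything below 0, while the default (activation=None) layer is the identity and can produce any real value. The random input below is only for illustration:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

tf.keras.utils.set_random_seed(0)
x = np.random.randn(8, 64).astype('float32')

linear_out = layers.Dense(384)                   # activation=None: identity output
relu_out = layers.Dense(384, activation='relu')  # outputs clipped at 0

lin = linear_out(x).numpy()
rel = relu_out(x).numpy()
print(lin.min(), rel.min())  # the linear layer goes negative, the relu layer never does
```

With an MSE loss, a target of -1 can at best be predicted as 0 by the relu layer, so every such target adds at least 1 to the squared error no matter how long you train.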

get a list of predictions of a neural network

I created a neural network to classify messages. Now I want to collect the predictions into a list in python. How do I do this?
So here is the model:
model = Sequential()
model.add(layers.Dense(500, activation = "relu", input_shape=(7600,)))
# Hidden - Layers
model.add(layers.Dropout(0.4, noise_shape=None, seed=None))
model.add(layers.Dense(300, activation = "relu"))
model.add(layers.Dropout(0.4, noise_shape=None, seed=None))
model.add(layers.Dense(100, activation = "relu"))
model.add(layers.Dropout(0.4, noise_shape=None, seed=None))
model.add(layers.Dense(20, activation = "softmax"))
model.summary()
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=['accuracy'])
model.fit(np.array(vectorized_training), np.array(y_train_neralnet),
          batch_size=2000,
          epochs=3,
          verbose=1,
          validation_data=(np.array(vectorized_validation), np.array(y_validation_neralnet)))
Here I tried to print the shape of validation_data that is inside of the model.fit() method but it gives an error.
NameError: name 'validation_data' is not defined

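validation_data is only a keyword argument of model.fit(), not a variable, which is why you get the NameError. To collect the predictions, this is what you are looking for: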
preds = model.predict(X_test)
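model.predict returns a NumPy array with one row of 20 class probabilities per message. To get a plain Python list of predicted class indices, take the argmax of each row and call .tolist(). A minimal sketch assuming TensorFlow 2.x, with a small stand-in model and random inputs in place of the real vectorized messages:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in with the same input/output shape as the question's model (7600 features -> 20 classes).
model = tf.keras.Sequential([
    layers.Input(shape=(7600,)),
    layers.Dense(100, activation='relu'),
    layers.Dense(20, activation='softmax'),
])

X_test = np.random.rand(5, 7600).astype('float32')    # hypothetical vectorized messages
probs = model.predict(X_test, verbose=0)               # shape (5, 20)
predicted_classes = np.argmax(probs, axis=1).tolist()  # one class index per message
print(predicted_classes)
```

If you want the probabilities themselves rather than class indices, probs.tolist() gives a list of 20-element lists.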

python keras neural network prediction not working (outputs 0 or 1)

I have created a neural network with Keras for predicting addition.
I have 2 inputs and 1 output (the result of adding the 2 inputs).
I trained my neural network with TensorFlow and then tried to predict sums, but the program returns only 0 or 1 instead of 3, 4, 5, etc.
This is my code :
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:2]
Y = dataset[:,2]
# create model
model = Sequential()
model.add(Dense(12, input_dim=2, init='uniform', activation='relu'))
model.add(Dense(2, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
And my file data.csv:
1,2,3
3,3,6
4,5,9
10,8,18
1,3,4
5,3,8
For example:
1+2=3
3+3=6
4+5=9
...etc.
But I get this as output: 0,1,0,0,1,0,1...
Why didn't I get the output as 3,6,9...?
I updated the code to use another loss function, but I have the same problem:
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:2]
Y = dataset[:,2]
# create model
model = Sequential()
model.add(Dense(12, input_dim=2, init='uniform', activation='relu'))
model.add(Dense(2, init='uniform', activation='relu'))
#model.add(Dense(1, init='uniform', activation='sigmoid'))
model.add(Dense(1, input_dim=2, init='uniform', activation='linear'))
# Compile model
#model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
Output: 1,1,1,3,1,1,... etc.

As @ebeneditos mentioned, you need to change the activation function in the last layer to something other than sigmoid. You can try changing it to linear.
model.add(Dense(1, kernel_initializer='uniform', activation='linear'))
You should also change your loss function to something like mean squared error, as your problem is a regression problem rather than a classification problem (binary_crossentropy is a loss function for binary classification problems). Accuracy is not a meaningful metric for regression either, so track the mean absolute error instead:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])

This is due to the sigmoid function you have in the last layer. It is defined as
σ(x) = 1 / (1 + e^(-x))
so it can only take values between 0 and 1. You should change the last layer's activation function.
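A worked check of that formula with NumPy: whatever the input, the sigmoid output stays between 0 and 1, so rounding it, as the question's code does, can only give 0 or 1. The input values are arbitrary:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-5.0, -0.4, 0.4, 3.0, 18.0])
out = sigmoid(z)
print(out)
print(np.round(out))  # [0. 0. 1. 1. 1.]
```

Even a large pre-activation such as 18 only pushes the output closer to 1, never to 3, 6, or 9.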
You can try this instead (with Dense(8) instead of Dense(2)):
# Create model
model = Sequential()
model.add(Dense(12, input_dim=2, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='linear'))
# Compile model (MAE instead of accuracy, which is meaningless for regression)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
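For a self-contained check of the regression setup, here is a sketch that trains on synthetic pairs instead of the six-row data.csv, which is very little to learn from. It swaps in the simplest possible model, a single linear Dense(1) unit with MSE loss, since a + b is itself a linear function. The data range, learning rate, and epoch count are assumptions, and it requires TensorFlow 2.x:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

tf.keras.utils.set_random_seed(7)

# Hypothetical synthetic training pairs: inputs in [0, 10), target = their sum.
X = np.random.uniform(0, 10, size=(1000, 2)).astype('float32')
Y = X.sum(axis=1)

model = tf.keras.Sequential([
    layers.Input(shape=(2,)),
    layers.Dense(1, activation='linear'),  # unbounded output, unlike sigmoid
])
model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              metrics=['mae'])
model.fit(X, Y, epochs=100, batch_size=32, verbose=0)

pred = model.predict(np.array([[1.0, 2.0], [4.0, 5.0], [10.0, 8.0]], dtype='float32'), verbose=0)
print(pred.ravel().round())  # should be close to 3, 9 and 18 once trained
```

The learned weights end up near 1 and the bias near 0, which is exactly a + b; the hidden relu layers in the answer above are not needed for this particular target.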

Concatenate flatten output with other datasets in Keras (Python)

I have 2 datasets. For the first dataset I want to apply convolution and keep the result of the flatten layer, then concatenate it with the other dataset and do a simple feed-forward pass. Is this possible with Keras?
def build_model(x_train,y_train):
    np.random.seed(7)
    left = Sequential()
    left.add(Conv1D(nb_filter= 6, filter_length=3, input_shape= (48,1),activation = 'relu', kernel_initializer='glorot_uniform'))
    left.add(Conv1D(nb_filter= 6, filter_length=3, activation= 'relu'))
    #model.add(MaxPooling1D())
    print model
    #model.add(Dropout(0.2))
    # flatten layer
    #https://www.quora.com/What-is-the-meaning-of-flattening-step-in-a-convolutional-neural-network
    left.add(Flatten())
    left.add(Reshape((48,1)))
    right = Sequential()
    #model.add(Reshape((48,1)))
    # Compile model
    model.add(Merge([left, right], mode='sum'))
    model.add(Dense(10, 10))
    epochs = 100
    lrate = 0.01
    decay = lrate/epochs
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    #clipvalue=0.5)
    model.compile(loss='mean_squared_error', optimizer='Adam')
    model.fit(x_train,y_train, nb_epoch =epochs, batch_size=10, verbose=1)
    #model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'] , )
    return model

You need to look at the functional API: the Sequential model you are using is not designed to take multiple inputs.
Follow the "Multi-input and multi-output models" example in the Keras documentation and you will have it working in no time!
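A minimal functional-API sketch of the setup described in the question, assuming TensorFlow 2.x: one input goes through the two Conv1D layers and Flatten, the other dataset enters as a second Input, and the two are joined with layers.concatenate before the feed-forward part. The second dataset's width (10 features), the hidden size of 32, and the 10-unit output are hypothetical stand-ins:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Branch 1: sequences of length 48 with one channel, as in the question.
seq_input = layers.Input(shape=(48, 1), name='sequence')
x = layers.Conv1D(filters=6, kernel_size=3, activation='relu',
                  kernel_initializer='glorot_uniform')(seq_input)
x = layers.Conv1D(filters=6, kernel_size=3, activation='relu')(x)
x = layers.Flatten()(x)  # 44 remaining steps * 6 filters = 264 features

# Branch 2: the other dataset, assumed here to have 10 plain features per example.
other_input = layers.Input(shape=(10,), name='other')

merged = layers.concatenate([x, other_input])  # 264 + 10 = 274 features
h = layers.Dense(32, activation='relu')(merged)
output = layers.Dense(10)(h)

model = tf.keras.Model(inputs=[seq_input, other_input], outputs=output)
model.compile(loss='mean_squared_error', optimizer='adam')

# Usage: pass both datasets as a list, in the same order as the inputs.
x_seq = np.random.rand(4, 48, 1).astype('float32')
x_other = np.random.rand(4, 10).astype('float32')
y = np.random.rand(4, 10).astype('float32')
model.fit([x_seq, x_other], y, epochs=1, verbose=0)
print(model.predict([x_seq, x_other], verbose=0).shape)  # (4, 10)
```

Because the two branches are separate Input tensors, there is no need for the Reshape and Merge layers from the Keras 1 era.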
