I have access to a dataframe of 100 persons and how they performed on a certain motion test. This frame contains about 25,000 rows per person since the performance of this person is kept track of (approximately) each centisecond (10^-2). We want to use this data to predict a binary y-label, that is to say, if someone has a motor problem or not.
The columns and some values of the dataset are as follows:
'Person_ID', 'time_in_game', 'python_time', 'permutation_game', 'round', 'level', 'times_level_played_before', 'speed', 'costheta', 'y_label', 'gender', 'age_precise', 'ax_f', 'ay_f', 'az_f', 'acc', 'jerk'
1, 0.25, 1.497942e+09, 2, 1, 'level_B', 1, 0.8, 0.4655, 1, [...]
I reduced the dataset to only 480 rows per person, by just using the row at each half of a second.
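A minimal sketch of this kind of downsampling, assuming the rows of one person are already sorted by time and recorded roughly every 0.01 s, so that keeping every 50th row gives one row per half second. The function name `downsample` and the toy data are illustrative and not part of the original code.

```python
def downsample(rows, step=50):
    """Keep every `step`-th row of a time-ordered sequence."""
    return rows[::step]


# Toy stand-in for one person's recording: 25,000 centisecond samples.
one_person = list(range(25_000))
reduced = downsample(one_person, step=50)
print(len(reduced))  # 500
```

The exact number of rows kept per person depends on the real recording length, which is only approximately 25,000 rows.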
Now I want to use a recurrent neural network to predict the binary y_label.
This code extracts the costheta feature used for the input data X and the y-label for output Y.
X = []
Y = []
for person_id in person_list:
    person_frame = df.loc[df['Person_ID'] == person_id]
    # costheta is a measurement of performance
    coslist = list(person_frame['costheta'])
    X.append(coslist)
    # extract y-label (the same on every row of a person, so take the first)
    score = person_frame['y_label'].iloc[0]
    Y.append(score)
I split the data into training and test sets using a 0.2 test split. Then I tried to create the RNN with Keras as follows:
from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout
model = Sequential()
# different_input_values are the set of possible input values
model.add(Embedding(different_input_values, embedding_size, input_length=480))
# output is binary
model.add(Dense(1, activation='sigmoid'))
Finally, I began training with this code:
batch_size = 64
num_epochs = 100
X_valid, y_valid = X_train[:batch_size], Y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], Y_train[batch_size:]
model.fit(X_train2, y_train2, validation_data=(X_valid, y_valid), batch_size=batch_size, epochs=num_epochs)
However, the resulting accuracy is really low. Depending on the batch size, it varies between 0.4 and 0.6.
12/12 [==============================] - 13s 1s/step - loss: 0.6921 -
acc: 0.7500 - val_loss: 0.7069 - val_acc: 0.4219
My question is: in general, with complicated data like this, how does one efficiently train an RNN? Should one refrain from reducing the data to 480 rows per person and keep it at around 25,000 rows per person? Could multiple metrics, such as acc (acceleration in game) and jerk, give a significant accuracy gain? What significant improvements could one consider?
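On the question of using several metrics: a recurrent layer in Keras expects input of shape (samples, timesteps, features), so adding acc and jerk means going from one value per time step to three. The sketch below shows that layout for one person with toy values; the helper `stack_features` is hypothetical and the numbers are made up for illustration.

```python
def stack_features(*series):
    """Combine equally long per-feature series into [timestep][feature] rows."""
    lengths = {len(s) for s in series}
    if len(lengths) != 1:
        raise ValueError("all feature series must have the same length")
    return [list(values) for values in zip(*series)]


# Toy values for one person at three time steps (the real data would have 480).
costheta = [0.47, 0.52, 0.49]
acc = [1.2, 0.9, 1.1]
jerk = [0.03, -0.01, 0.02]

sample = stack_features(costheta, acc, jerk)
print(len(sample), len(sample[0]))  # 3 3
```

Stacking one such sample per person would give an array of shape (100, 480, 3). Whether the extra features actually improve accuracy is an empirical question that this sketch does not answer.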
My model is like this:
from tensorflow.keras.layers import Input, LSTM, Dense, Activation
from tensorflow.keras.models import Model

def _get_model(input_shape, latent_dim, num_classes):
    inputs = Input(shape=input_shape)
    lstm_lyr, state_h, state_c = LSTM(latent_dim, dropout=0.1, return_state=True)(inputs)
    fc_lyr = Dense(num_classes)(lstm_lyr)
    soft_lyr = Activation('relu')(fc_lyr)
    # two outputs: the activated Dense output and the LSTM cell state
    model = Model(inputs, [soft_lyr, state_c])
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    return model

model = _get_model((n_steps_in, n_features), latent_dim, n_steps_out)
history = model.fit(X_train, Y_train)
During training I get:
Epoch 1/2000
1/1 [==============================] - 1s 698ms/step - loss: 0.2338 - activation_26_loss: 0.1153 - lstm_151_loss: 0.1185 - activation_26_accuracy: 0.0000e+00 - lstm_151_accuracy: 0.0000e+00 - val_loss: 0.2341 - val_activation_26_loss: 0.1160 - val_lstm_151_loss: 0.1181 - val_activation_26_accuracy: 0.0000e+00 - val_lstm_151_accuracy: 0.0000e+00
Epoch 2/2000
1/1 [==============================] - 0s 34ms/step - loss: 0.2328 - activation_26_loss: 0.1153 - lstm_151_loss: 0.1175 - activation_26_accuracy: 0.0000e+00 - lstm_151_accuracy: 0.0000e+00 - val_loss: 0.2329 - val_activation_26_loss: 0.1160 - val_lstm_151_loss: 0.1169 - val_activation_26_accuracy: 0.0000e+00 - val_lstm_151_accuracy: 0.0000e+00
Epoch 3/2000
1/1 [==============================] - 0s 38ms/step - loss: 0.2316 - activation_26_loss: 0.1153 - lstm_151_loss: 0.1163 - activation_26_accuracy: 0.0000e+00 - lstm_151_accuracy: 0.0000e+00 - val_loss: 0.2315 - val_activation_26_loss: 0.1160 - val_lstm_151_loss: 0.1155 - val_activation_26_accuracy: 0.0000e+00 - val_lstm_151_accuracy: 0.0000e+00
When I look at the history:
print(history.history.keys())
dict_keys(['loss', 'activation_26_loss', 'lstm_151_loss', 'activation_26_accuracy', 'lstm_151_accuracy', 'val_loss', 'val_activation_26_loss', 'val_lstm_151_loss', 'val_activation_26_accuracy', 'val_lstm_151_accuracy'])
Which ones are the training loss and training accuracy?
Since there are only 2 outputs, why are there 3 losses (loss, activation_26_loss and lstm_151_loss) but 2 accuracies (activation_26_accuracy and lstm_151_accuracy)? What does each loss and each accuracy stand for?
Three losses (2+1): one loss for each individual output, plus one that combines the two as a weighted sum. By default each weight is 1.0, so the total is simply their sum (in your log, 0.1153 + 0.1185 = 0.2338). You can set both the losses and their weights explicitly.
Two accuracies since there are 2 outputs. Metrics are only there for the user to monitor and don't affect the training of the neural network.
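The relationship between the three reported losses can be checked against the first epoch of the log in the question. This sketch assumes the Keras default of a loss weight of 1.0 per output when `loss_weights` is not given; `total_loss` is an illustrative helper, not a Keras function.

```python
def total_loss(losses, weights=None):
    """Weighted sum of per-output losses (weights default to 1.0 each)."""
    if weights is None:
        weights = [1.0] * len(losses)
    return sum(w * l for w, l in zip(weights, losses))


# Values taken from "Epoch 1/2000" in the training log.
activation_26_loss = 0.1153
lstm_151_loss = 0.1185

combined = total_loss([activation_26_loss, lstm_151_loss])
print(round(combined, 4))  # 0.2338
```

The result matches the `loss: 0.2338` printed for that epoch.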
Detailed explanation:
Let's first look at what you are doing here. (I am referring to the previous question you asked to get the shapes for the inputs.)
import numpy as np
from tensorflow.keras import layers, Model, utils

def _get_model(input_shape, latent_dim, num_classes):
    inputs = layers.Input(shape=input_shape)
    # return_state=True makes the LSTM return (output, hidden state, cell state)
    lstm_lyr, state_h, state_c = layers.LSTM(latent_dim, dropout=0.1, return_state=True)(inputs)
    fc_lyr = layers.Dense(num_classes)(lstm_lyr)
    soft_lyr = layers.Activation('relu')(fc_lyr)
    model = Model(inputs, [soft_lyr, state_c])  # <------- One input, 2 outputs
    model.compile(optimizer='adam', loss='mse')
    return model

# Dummy data
X = np.random.random((100, 15, 5))
y1 = np.random.random((100, 4))
y2 = np.random.random((100, 7))
model = _get_model((15, 5), 7, 4)
You are building a supervised model that takes an input of shape (15,5) and produces 2 outputs: first a (4,) vector that should contain probability values for the 4 classes, and second a (7,) vector that should contain the cell states of the 7 LSTM units. The loss you are using to train the model to predict both outputs is mse.
Since this is a supervised model, you have to provide it with samples of inputs and outputs. If you have 100 samples, your inputs will have shape (100,15,5) and your outputs will have shapes (100,4) and (100,7), since you have 2 outputs.
Loss(y_actual, y_pred) is a function that tells the neural network how far its prediction is from the actual value. Based on this, the network updates itself (specifically its weights, using backpropagation) so that its predictions move closer to the actual values and the loss decreases.
If the above points are clear, let's look at what this network is doing specifically.
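As a concrete illustration of such a loss function, mean squared error can be computed by hand as below. This is a plain-Python sketch for a single sample, not the Keras implementation, and the example vectors are made up.

```python
def mse(y_actual, y_pred):
    """Mean squared error between two equally long sequences."""
    if len(y_actual) != len(y_pred):
        raise ValueError("y_actual and y_pred must have the same length")
    return sum((a - p) ** 2 for a, p in zip(y_actual, y_pred)) / len(y_actual)


y_actual = [1.0, 0.0, 0.0, 0.0]
far = [0.25, 0.25, 0.25, 0.25]   # an uninformed prediction
near = [0.9, 0.05, 0.03, 0.02]   # a prediction close to the target

print(mse(y_actual, far) > mse(y_actual, near))  # True
```

A prediction closer to the actual values gives a smaller loss, which is the signal the optimizer follows.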
Your current model has one input and 2 outputs.
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
Since you have defined mse as the loss, both outputs are trained to minimize mse. These are 2 of the 3 losses: activation_26_loss, which is the loss for the first output (the final Dense layer followed by its activation), and lstm_151_loss, which is the loss on the LSTM cell state. Unless you name the layers yourself, Keras generates these names automatically by appending a counter to the layer type.
The value named simply loss is the weighted sum of the other 2 losses; with the default weight of 1.0 each, it is just their sum. I'll talk about this more later.
The metrics=['accuracy'] argument is just a metric for users to track. Since there are 2 outputs, you get 2 different accuracy metrics, one for each output. They don't affect the neural network's training.
Now, when working with neural networks, it's important to know which loss to use where. Here is a table describing what loss and activation functions to use for which type of network.
As you can see, it's a good practice to use softmax and categorical_crossentropy for multi-class problems. So let's try to recreate the model with this change. We want each output to have a different loss to minimize.
Also, let's say the first output is more important than the second. We can also tell the model how to weigh the losses so that it prioritizes which loss to focus on more and by how much.
import numpy as np
from tensorflow.keras import layers, Model, utils

def _get_model(input_shape, latent_dim, num_classes):
    inputs = layers.Input(shape=input_shape)
    lstm_lyr, state_h, state_c = layers.LSTM(latent_dim, dropout=0.1, return_state=True)(inputs)
    fc_lyr = layers.Dense(num_classes)(lstm_lyr)
    soft_lyr = layers.Activation('softmax')(fc_lyr)  # <--- Softmax for the first output's activation
    model = Model(inputs, [soft_lyr, state_c])
    model.compile(optimizer='adam',
                  loss=['categorical_crossentropy', 'mse'],  # <--- 2 losses, one for each output
                  loss_weights=[0.4, 0.6])  # <--- 2 loss weights for final loss
    return model

# Dummy data
X = np.random.random((100, 15, 5))
y1 = np.random.random((100, 4))
y2 = np.random.random((100, 7))
model = _get_model((15, 5), 7, 4)
utils.plot_model(model, show_layer_names=False, show_shapes=True)
Here, the final loss (named simply loss) is the sum of the 2 separate losses, weighted by 0.4 and 0.6 respectively.
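To make the softmax and categorical_crossentropy pairing mentioned above concrete, here is what the two functions compute for a single sample. This is a plain-Python sketch rather than the Keras implementation; the logits are made-up values and the `eps` clipping constant is an assumption chosen for numerical safety.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))


logits = [2.0, 0.5, 0.1, -1.0]   # raw Dense outputs for 4 classes
probs = softmax(logits)
y_true = [1, 0, 0, 0]            # one-hot label: class 0 is correct
loss = categorical_crossentropy(y_true, probs)
print(round(sum(probs), 6))  # 1.0
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the correct class, so it shrinks as that probability approaches 1.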
Hope this clarifies what you are trying to achieve.
ON A SIDE NOTE: I am curious as to how you are getting the actual values of the final cell state in order to train the model to predict a cell state. Do let me know if that is your intention. It's not very clear what your final goal here is (as I had also asked on your previous question).
I am using the Hugging Face TFBertModel to do a classification task. I am using the bare TFBertModel with an added dense head layer rather than TFBertForSequenceClassification, since I didn't see how I could use the latter with pretrained weights to only fine-tune the model.
As far as I know, fine-tuning should give me about 80% accuracy or more with both BERT and ALBERT, but I am not coming anywhere near that number:
Train on 3600 samples, validate on 400 samples
Epoch 1/2
3600/3600 [==============================] - 177s 49ms/sample - loss: 0.6531 - accuracy: 0.5792 - val_loss: 0.5296 - val_accuracy: 0.7675
Epoch 2/2
3600/3600 [==============================] - 172s 48ms/sample - loss: 0.6288 - accuracy: 0.6119 - val_loss: 0.5020 - val_accuracy: 0.7850
More epochs don't make much difference.
I am using the CoLA public data set for fine-tuning. This is what the data looks like:
gj04 1 Our friends won't buy this analysis, let alone the next one we propose.
gj04 1 One more pseudo generalization and I'm giving up.
gj04 1 One more pseudo generalization or I'm giving up.
gj04 1 The more we study verbs, the crazier they get.
And this is the code that loads the data into python:
import csv

def get_cola_data(max_items=None):
    x = []
    y = []
    with open('cola_public/raw/in_domain_train.tsv') as csv_file:
        reader = csv.reader(csv_file, delimiter='\t')
        for row in reader:
            # Assumed CoLA column layout: source, label, original annotation, sentence
            x.append(row[3])
            y.append(int(row[1]))
    if max_items is not None:
        x = x[:max_items]
        y = y[:max_items]
    return x, y
I verified that the data is in the format that I want it to be in the lists, and this is the code of the model itself:
#!/usr/bin/env python
import tensorflow as tf
from tensorflow import keras
from transformers import BertTokenizer, TFBertModel
import numpy as np
from cola_public import get_cola_data

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = TFBertModel.from_pretrained('bert-base-uncased')
bert_model.trainable = False  # freeze BERT so only the head is trained

x_input = keras.Input(shape=(512,), dtype=tf.int64)
x_mask = keras.Input(shape=(512,), dtype=tf.int64)
_, output = bert_model([x_input, x_mask])
output = keras.layers.Dense(1)(output)
model = keras.Model(
    inputs=[x_input, x_mask],
    outputs=output,
)
# (model.compile(...) call omitted here)

train_data_x, train_data_y = get_cola_data(max_items=4000)
encoded_data = [tokenizer.encode_plus(data, add_special_tokens=True, pad_to_max_length=True) for data in train_data_x]
train_data_x = np.array([data['input_ids'] for data in encoded_data])
mask_data_x = np.array([data['attention_mask'] for data in encoded_data])
train_data_y = np.array(train_data_y)

# Arguments after the inputs are inferred from the log above
# (2 epochs, 3600 training / 400 validation samples).
model.fit(
    [train_data_x, mask_data_x],
    train_data_y,
    epochs=2,
    validation_split=0.1,
)

cmd_input = ''
while True:
    print("Type an opinion: ")
    cmd_input = input()
    # print('Your opinion is: %s' % cmd_input)
    if cmd_input == 'exit':
        break
    cmd_input_tokens = tokenizer.encode_plus(cmd_input, add_special_tokens=True, pad_to_max_length=True)
    cmd_input_ids = np.array([cmd_input_tokens['input_ids']])
    cmd_mask = np.array([cmd_input_tokens['attention_mask']])
    result = model.predict([cmd_input_ids, cmd_mask])
Nothing I have tried changes this: using a different dataset, using a different number of items from the dataset, adding a dropout layer before the last dense layer, adding another dense layer with more units before the last one, or using ALBERT instead of BERT. I always get low accuracy and high loss, and the validation accuracy is often higher than the training accuracy.
I get the same results if I try to use BERT/ALBERT for an NER task, which makes me believe I am systematically making some fundamental mistake in fine-tuning.
I know that I have bert_model.trainable = False, and that is what I want: I want to train only the last head and not the pretrained weights, and I know that people train that way successfully. Even if I also train the pretrained weights, the results are much worse.
I can see that the model is underfitting badly, but I just can't put my finger on where I could improve, especially since people tend to get good results with just a single dense layer on top of the model.
The default learning rate is too high for BERT. Try setting it to one of the learning rates recommended in Appendix A.3 of the original paper: 5e-5, 3e-5 or 2e-5.
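A minimal sketch of passing one of these learning rates to Keras explicitly instead of relying on the Adam default of 1e-3. It assumes TensorFlow 2 with its bundled Keras. The small stand-in model, the helper name `compile_with_small_lr` and the choice of a from-logits binary cross-entropy loss (to match a `Dense(1)` head without activation) are assumptions for illustration, not taken from the question's code.

```python
import tensorflow as tf

LEARNING_RATE = 3e-5  # one of the suggested values: 5e-5, 3e-5, 2e-5


def compile_with_small_lr(model, learning_rate=LEARNING_RATE):
    """Compile `model` with Adam at an explicit, small learning rate."""
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(
        optimizer=optimizer,
        # Assumption: a single-logit binary head, i.e. Dense(1) without sigmoid.
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return optimizer


# Stand-in for the classification head; 768 is the bert-base hidden size.
head = tf.keras.Sequential([
    tf.keras.Input(shape=(768,)),
    tf.keras.layers.Dense(1),
])
optimizer = compile_with_small_lr(head)
```

In the question's script the same helper would be called on the full `model` before `model.fit`.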
I'm working on a LSTM model in Keras with the goal of next word prediction utilizing BERT word vectors as a part of my inputs for the model.
This is a multi-class categorical problem, and I've done some weird steps to simplify English into clusters of words using BERT and stop-words and k-means, and for my initial practice model I'm using 144 target categories. I plan to up that to about 1000 after working out some kinks.
Here's the architecture of my Keras model:
model = Sequential()
model.add(LSTM(32, input_shape=(SENTENCE_LENGTH, COM_WORDS), dropout=0.2))
optimizer = Adam(lr=lr)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model.fit(X, y, validation_split=0.05, batch_size=128, epochs=epochs)
My loss starts at around 6 and goes down, which isn't unusual as far as I know. I then tried to incorporate class weights, since the model was over-predicting common words like 'the', which is expected. So I used this code to make the weights:
max_count = 0
for word in range(COM_WORDS):
    if Ys.count(word) > max_count:
        max_count = Ys.count(word)
class_weights = {}
for word in range(COM_WORDS):
    class_weights[word] = (max_count - Ys.count(word) + 1)
So my most common y-input would have a value of 1 in the dictionary, and a y-input that is only represented once would be weighted at the count of the most common y-input: around 1 million in this case. Then I added it to my fit() and restarted the model.
When I run my model with the weights, I get an insanely high loss (this is just a batch of 100,000 of all my inputs being run):
Epoch 1/3
950000/950000 [==============================] - 160s 168us/step - loss: 3014409.5359 - acc: 0.1261 - val_loss: 2808283.0898 - val_acc: 0.1604
The accuracy is fine though! Not too different than when I didn't use weights.
Does this high loss matter? Is it just a reflection of my huge weight numbers, or is it indicating something sinister? Are loss numbers relative?
Side question: Should I use a better method to weight my inputs?
Thank you!
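On the weighting scheme in the question above: Keras multiplies each sample's loss by the weight of its class, so weights of up to about a million inflate the reported loss by a comparable factor, and loss values from runs with different weights are not directly comparable. The sketch below reproduces the question's scheme in a single pass with `Counter`, and shows one common alternative, inverse-frequency "balanced" weights of the form n_samples / (n_classes * count). Both helper names are illustrative, and whether the alternative works better for this model is an assumption to be tested.

```python
from collections import Counter


def count_gap_weights(labels, num_classes):
    """Same scheme as in the question: max_count - count + 1 per class."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {c: max_count - counts.get(c, 0) + 1 for c in range(num_classes)}


def balanced_weights(labels, num_classes):
    """Inverse-frequency weights: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {
        c: n_samples / (num_classes * counts[c])
        for c in range(num_classes)
        if counts[c] > 0  # classes that never occur get no weight
    }


# Toy labels: class 0 is common, class 2 is rare.
labels = [0] * 6 + [1] * 3 + [2] * 1
print(count_gap_weights(labels, 3))  # {0: 1, 1: 4, 2: 6}
print(balanced_weights(labels, 3))
```

The balanced weights stay close to 1 on average, which keeps the loss on a scale similar to the unweighted run.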
I am using the Keras fit_generator(datagen.flow()) function to train my Inception model, and I am confused about the number of images it uses in every epoch. Can anyone please explain how this works? My code is below.
I am using this Keras documentation.
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range = 15, horizontal_flip = True)
# Fitting the model with
history = inc_model.fit_generator(datagen.flow(X_train, train_labels, batch_size=10), epochs=20, validation_data = (X_test, test_labels), callbacks=None)
The total number of images in X_train is 4676. However, every time I run this history line, I get
Epoch 1/20
936/936 [========================] - 167s 179ms/step - loss: 1.4236 - acc: 0.3853 - val_loss: 1.0858 - val_acc: 0.5641
Why is it not taking the whole of my X_train images?
Also, if I change batch_size from 10 to, let's say, 15, it starts taking even fewer images, such as
Epoch 1/20
Thank you.
The 936 and 436 refer to the number of batches (steps) per epoch, not the number of images. You set your batch size to 10 and 15, so in each case the model is trained on up to 936 x 10 and 436 x 15 samples per epoch. That is more than the 4676 images you report. Note that ImageDataGenerator does not store extra images; it produces randomly transformed versions of the existing ones on the fly, so the model sees differently augmented data on every pass.
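When `steps_per_epoch` is not passed, Keras takes the number of steps from the length of the iterator returned by `datagen.flow(...)`, which is the number of batches needed to cover the array once. A quick way to compute that expected value is sketched below; the helper name `steps_per_epoch` is illustrative.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Batches needed to cover n_samples once; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(4676, 10))  # 468
print(steps_per_epoch(4676, 15))  # 312
```

These values differ from the step counts shown in the logs, so it is worth checking `X_train.shape[0]` and the batch size that was actually in effect for those runs.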
I am fitting a recurrent neural net in Python using the Keras library. I fit the model with different numbers of epochs by changing the nb_epoch parameter of Sequential.fit(). Currently I'm using a for loop that starts fitting all over again each time I change nb_epoch, which is a lot of repeated work. Here is my code (the loop is at the bottom, if you want to skip the other details):
from __future__ import division
import numpy as np
import pandas
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.learning_curve import learning_curve
### Here I do the data processing to create trainX, testX
#model create:
model = Sequential()
#this is the epoch array for different nb_epoch
### Here I define model architecture
model.compile(loss="mse", optimizer="rmsprop")
#### Defining arrays for different epoch number
epoch_array = range(100, 2100,100)
# I create the following arrays/matrices to store the result of NN fit
# different epoch number.
train_DBN_fitted_Y = np.zeros(shape=(len(epoch_array),trainX.shape[0]))
test_DBN_fitted_Y = np.zeros(shape=(len(epoch_array),testX.shape[0]))
### Following loop is the heart of the question
i = 0
for epoch in epoch_array:
    model.fit(trainX, trainY,
              batch_size=16, nb_epoch=epoch, validation_split=0.05, verbose=2)
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)
    trainPredict = trainPredict.reshape(trainPredict.shape[0])
    testPredict = testPredict.reshape(testPredict.shape[0])
    train_DBN_fitted_Y[i] = trainPredict
    test_DBN_fitted_Y[i] = testPredict
    i = i + 1
This loop is very inefficient. For example, when it sets nb_epoch = 100, it starts training from epoch = 1 and finishes at epoch = 100, like this:
Epoch 1/100
0s - loss: 1.9508 - val_loss: 296.7801
Epoch 100/100
0s - loss: 7.6575 - val_loss: 366.2218
In the next iteration of the loop, where nb_epoch = 200, it starts training from epoch = 1 again and finishes at epoch = 200. What I want instead is for this iteration to continue from where the last iteration left off, i.e. from epoch = 100 on to epoch = 101 and so on.
How can I modify this loop to achieve this?
Calling fit repeatedly trains your model further, starting from the state in which the previous call left it. For it not to continue, fit would have to reset the weights of your model, which it does not do. You just don't see this, because the epoch counter always starts again at 1.
So in the end the problem is only that the printed epoch numbers are misleading, not that training restarts.
If this is bothering you, you can implement your own fit by calling model.train_on_batch periodically.
You can use the initial_epoch parameter of fit (see the docs).
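A sketch of how `initial_epoch` could be combined with the loop from the question. In Keras, `epochs` is the index of the final epoch rather than the number of additional epochs, so each call needs the previous checkpoint as `initial_epoch` and the next checkpoint as `epochs`. The helper `epoch_schedule` is hypothetical, and the commented `fit` call assumes a Keras version that uses `epochs` in place of the older `nb_epoch`.

```python
def epoch_schedule(checkpoints):
    """Yield (initial_epoch, epochs) pairs so each fit call resumes the count."""
    previous = 0
    for target in checkpoints:
        yield previous, target
        previous = target


schedule = list(epoch_schedule(range(100, 2100, 100)))
print(schedule[:3])  # [(0, 100), (100, 200), (200, 300)]

# Hypothetical use inside the loop from the question (model, trainX and
# trainY are assumed to exist already):
# for i, (initial_epoch, epochs) in enumerate(schedule):
#     model.fit(trainX, trainY, batch_size=16, validation_split=0.05,
#               initial_epoch=initial_epoch, epochs=epochs, verbose=2)
#     train_DBN_fitted_Y[i] = model.predict(trainX).reshape(-1)
#     test_DBN_fitted_Y[i] = model.predict(testX).reshape(-1)
```

Each call then trains 100 further epochs and prints epoch numbers that continue from the previous call, so the predictions stored at index i correspond to 100 * (i + 1) epochs in total.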