Word2Vec embedding to LSTM layers? - python

I am working on a neural network that should predict the next activity and the outcome (both or just one, depending on the self.net_out parameter) of a trace (a sequence of events taken from an event log). The inputs of the net are windows (prefixes) of a trace of a fixed size. Right now it looks like this:
def nn(self, params):
    # done in this function so that, in case, win_size easily can become a parameter
    X_train, Y_train, Z_train = self.build_windows(self.traces_train, self.win_size)

    if(self.net_embedding==0):
        if(self.net_out!=2):
            Y_train = self.leA.fit_transform(Y_train)
            Y_train = to_categorical(Y_train)
            label = Y_train
        if(self.net_out!=1):
            Z_train = self.leO.fit_transform(Z_train)
            Z_train = to_categorical(Z_train)
            label = Z_train

    unique_events = len(self.act_dictionary)

    input_act = Input(shape=self.win_size, dtype='int32', name='input_act')
    if(self.net_embedding==0):
        x_act = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1,
                          input_length=self.win_size)(input_act)
    else:
        print("WIP")

    n_layers = int(params["n_layers"]["n_layers"])

    l1 = LSTM(params["shared_lstm_size"], return_sequences=True, kernel_initializer='glorot_uniform',
              dropout=params['dropout'])(x_act)
    l1 = BatchNormalization()(l1)

    if(self.net_out!=2):
        l_a = LSTM(params["lstmA_size_1"], return_sequences=(n_layers != 1),
                   kernel_initializer='glorot_uniform', dropout=params['dropout'])(l1)
        l_a = BatchNormalization()(l_a)
    elif(self.net_out!=1):
        l_o = LSTM(params["lstmO_size_1"], return_sequences=(n_layers != 1),
                   kernel_initializer='glorot_uniform', dropout=params['dropout'])(l1)
        l_o = BatchNormalization()(l_o)

    for i in range(2, n_layers+1):
        if(self.net_out!=2):
            l_a = LSTM(params["n_layers"]["lstmA_size_%s_%s" % (i, n_layers)], return_sequences=(n_layers != i),
                       kernel_initializer='glorot_uniform', dropout=params['dropout'])(l_a)
            l_a = BatchNormalization()(l_a)
        if(self.net_out!=1):
            l_o = LSTM(params["n_layers"]["lstmO_size_%s_%s" % (i, n_layers)], return_sequences=(n_layers != i),
                       kernel_initializer='glorot_uniform', dropout=params['dropout'])(l_o)
            l_o = BatchNormalization()(l_o)

    outputs = []
    if(self.net_out!=2):
        output_l = Dense(self.outsize_act, activation='softmax', name='act_output')(l_a)
        outputs.append(output_l)
    if(self.net_out!=1):
        output_o = Dense(self.outsize_out, activation='softmax', name='outcome_output')(l_o)
        outputs.append(output_o)

    model = Model(inputs=input_act, outputs=outputs)
    print(model.summary())

    opt = Adam(lr=params["learning_rate"])

    if(self.net_out==0):
        loss = {'act_output': 'categorical_crossentropy', 'outcome_output': 'categorical_crossentropy'}
        loss_weights = [params['gamma'], 1-params['gamma']]
    if(self.net_out==1):
        loss = {'act_output': 'categorical_crossentropy'}
        loss_weights = [1, 1]
    if(self.net_out==2):
        loss = {'outcome_output': 'categorical_crossentropy'}
        loss_weights = [1, 1]

    model.compile(loss=loss, optimizer=opt, loss_weights=loss_weights, metrics=['accuracy'])

    early_stopping = EarlyStopping(monitor='val_loss', patience=20)
    lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, verbose=0, mode='auto',
                                   min_delta=0.0001, cooldown=0, min_lr=0)

    if(self.net_out==0):
        history = model.fit(X_train, [Y_train, Z_train], epochs=3, batch_size=2**params['batch_size'], verbose=2,
                            callbacks=[early_stopping, lr_reducer], validation_split=0.2)
    else:
        history = model.fit(X_train, label, epochs=300, batch_size=2**params['batch_size'], verbose=2,
                            callbacks=[early_stopping, lr_reducer], validation_split=0.2)

    scores = [history.history['val_loss'][epoch] for epoch in range(len(history.history['loss']))]
    score = min(scores)

    # global best_score, best_model
    if self.best_score > score:
        self.best_score = score
        self.best_model = model

    return {'loss': score, 'status': STATUS_OK}
As can be seen, I need to consider two types of embedding. For the one that I have already implemented and tested (self.net_embedding=0), each activity/event in each trace (and consequently in each window) is mapped to an integer; then I apply fit_transform and to_categorical.
The second type of embedding that I have to try uses word2vec. For this I have already changed the format of the input: instead of converting each activity to an integer, I keep it as a string (the actual name of the activity, standardized to just numbers and letters). I don't know how to proceed, though: I guess I should do something like
w2vModel = Word2Vec(X_train, size=params['word2vec_size'], min_count=1)
to get the embedded windows from w2vModel.wv, but how do I then pass these to the LSTM layers? Into what should I change the embedding layer after the input one (where I put print("WIP") for now)?
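One common way to do this (a minimal sketch, not a tested drop-in for the class above) is to keep the integer encoding of the windows exactly as in the net_embedding == 0 case, train gensim's Word2Vec on the string traces, build an embedding matrix whose row i holds the vector of the activity with index i, and load that matrix into the Keras Embedding layer as initial weights. The names traces, vocab and emb_matrix below are illustrative, win_size and w2v_dim stand in for self.win_size and params['word2vec_size'], and the keyword is vector_size in gensim >= 4 (size in older versions):

import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras.layers import Input, Embedding, LSTM
from tensorflow.keras.preprocessing.sequence import pad_sequences

win_size = 4                     # stands in for self.win_size
w2v_dim = 32                     # stands in for params['word2vec_size']
traces = [["register", "check", "approve"], ["register", "reject"]]  # toy activity traces

# 1) train word2vec on the string traces
w2v = Word2Vec(traces, vector_size=w2v_dim, min_count=1, window=3)

# 2) map every activity name to an integer index (0 reserved for padding)
vocab = {act: i + 1 for i, act in enumerate(w2v.wv.index_to_key)}

# 3) embedding matrix: row idx holds the word2vec vector of the activity with index idx
emb_matrix = np.zeros((len(vocab) + 1, w2v_dim))
for act, idx in vocab.items():
    emb_matrix[idx] = w2v.wv[act]

# 4) encode the windows as padded integer sequences, as in the net_embedding == 0 case
X = pad_sequences([[vocab[a] for a in t] for t in traces], maxlen=win_size)

# 5) replace the trainable Embedding layer with one initialized from word2vec
input_act = Input(shape=(win_size,), dtype='int32', name='input_act')
x_act = Embedding(input_dim=len(vocab) + 1, output_dim=w2v_dim,
                  weights=[emb_matrix], trainable=False,
                  input_length=win_size)(input_act)
l1 = LSTM(64, return_sequences=True)(x_act)   # plugs into the existing shared LSTM

With trainable=False the LSTM consumes the fixed word2vec vectors; setting trainable=True lets the network fine-tune them. Alternatively, you can drop the Embedding layer entirely and feed pre-computed vectors of shape (win_size, w2v_dim) straight into the first LSTM.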

Related

Training an RNN/LSTM model got a KeyError equal to the value of length

I am trying to train this model:
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

length = 60
n_features = X_train_s.shape[1]
batch_size = 1

early_stop = EarlyStopping(monitor='val_accuracy', mode='max', verbose=1, patience=5)

generator = TimeseriesGenerator(data=X_train_s,
                                targets=Y_train[['TARGET_KEEP_LONG',
                                                 'TARGET_KEEP_SHORT',
                                                 'TARGET_STAY_FLAT']],
                                length=length,
                                batch_size=batch_size)

RNN_model = Sequential()
RNN_model.add(LSTM(180, activation='relu', input_shape=(length, n_features)))
RNN_model.add(Dense(3))
RNN_model.compile(optimizer='adam', loss='binary_crossentropy')

validation_generator = TimeseriesGenerator(data=X_test_s,
                                           targets=Y_test[['TARGET_KEEP_LONG',
                                                           'TARGET_KEEP_SHORT',
                                                           'TARGET_STAY_FLAT']],
                                           length=length,
                                           batch_size=batch_size)

RNN_model.fit(generator,
              epochs=20,
              validation_data=validation_generator,
              callbacks=[early_stop])
I get the error "KeyError: 60" where actually 60 is the value of the variable "length" (if I change it, the error changes accordingly).
The shapes of the training dataset are
X_test_s.shape
(114125, 89)
same for X_train_s.shape as well as n_features == 89.
It was exhausting to find the cause because of the poor and misleading error message. Anyway, the trouble was the form of the target data set: TimeseriesGenerator does not accept pandas DataFrames, only NumPy arrays. Therefore this
generator = TimeseriesGenerator(data=X_train_s,
                                targets=Y_train[['TARGET_KEEP_LONG', 'TARGET_KEEP_SHORT', 'TARGET_STAY_FLAT']],
                                length=length, batch_size=batch_size)
should have been written as
generator = TimeseriesGenerator(X_train_s,
                                pd.DataFrame.to_numpy(Y_train[['TARGET_KEEP_LONG', 'TARGET_KEEP_SHORT', 'TARGET_STAY_FLAT']]),
                                length=length, batch_size=batch_size)
In the case of just one target, it was enough to write
generator = TimeseriesGenerator(data=X_train_s, targets=Y_train['TARGET_KEEP_LONG'], length=length, batch_size=batch_size)
i.e. just one level of square brackets, not two.
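A toy reproduction of that point (a sketch assuming tf.keras's TimeseriesGenerator and random stand-in data): positional indexing on a DataFrame does a column lookup, which is what produces KeyError: <length>, while a NumPy array works.

import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

data = np.random.rand(100, 3)
targets_df = pd.DataFrame({'TARGET_KEEP_LONG': np.random.randint(0, 2, 100)})

gen = TimeseriesGenerator(data, targets_df.to_numpy(), length=60, batch_size=1)
x, y = gen[0]
print(x.shape, y.shape)   # (1, 60, 3) (1, 1)

# gen_bad = TimeseriesGenerator(data, targets_df, length=60, batch_size=1)
# gen_bad[0]              # targets_df[60] is a column lookup -> KeyError: 60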

Train an LSTM model using multiple datasets in a for loop

I am in the process of training my LSTM neural networks, which shall predict quintiles of stock price distributions. As I would like to train the model not just on one stock but on a sample of 500, I wrote the training loop below, which shall fit the model to each stock, save the model parameters, and then load the parameters again when training the next stock. My question is whether I can write the code in a for loop like below, or whether I can also just use one complete dataset including all 500 stocks, where the data is simply concatenated along the 0 axis.
The idea is that the model iterates over each stock, the best model is then saved by the checkpoint callback and is reloaded again for the fitting of the next stock.
This is the training loop I would like to use:
def compile_and_fit(model_type, model, checkpoint_path, config, stock_data, macro_data, factor_data,
                    patience, batch_size, num_epochs, train_set_ratio, val_set_ratio, Y_name):
    """
    model = NN model,
    data = stock data, factor data, macro data,
    batch_size = timesteps per batch
    alpha adam = learning rate optimizer
    data set ratios = train_set_ratio, val_set_ratio (eg. 0.5)
    """
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='loss',
        patience=patience,
        mode='min')
    cp_callback = tf.keras.callbacks.ModelCheckpoint(
        checkpoint_path,
        monitor='loss',
        verbose=True,
        save_best_only=True,
        save_freq=batch_size,
        mode='min')

    permno_list = stock_data.permno.unique()
    test_data = pd.DataFrame()
    counter = 0
    for p in permno_list:
        # checkpoints
        if counter == 0:
            trained_model = model
            cp_callback = cp_callback
        else:
            trained_model = tf.keras.models.load_model(checkpoint_path)
            cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path, monitor='loss', verbose=True,
                                                             save_best_only=True, save_freq=batch_size, mode='min')

        stock_data_length = len(stock_data.loc[stock_data.permno==p])
        train_data_stocks = stock_data.loc[stock_data.permno==p][0:int(stock_data_length*train_set_ratio)]
        val_data_stocks = stock_data.loc[stock_data.permno==p][int(stock_data_length*train_set_ratio):int(stock_data_length*(val_set_ratio+train_set_ratio))]
        test_data_stocks = stock_data.loc[stock_data.permno==p][int(stock_data_length*(val_set_ratio+train_set_ratio)):]
        test_data = pd.concat([test_data, test_data_stocks], axis=0)

        train_date_index = train_data_stocks.index.values.tolist()
        val_date_index = val_data_stocks.index.values.tolist()

        train_data_factors = factor_data.loc[factor_data.index.isin(train_date_index)]
        train_data_macro = macro_factors.loc[macro_factors.index.isin(train_date_index)]
        train_data_macro_norm = train_data_macro.copy(deep=True)
        for c in train_data_macro_norm.columns:
            train_data_macro_norm[c] = MinMaxScaler([-1, 1]).fit_transform(pd.DataFrame(train_data_macro_norm[c]))
        train_data_merged = pd.concat([train_data_factors, train_data_macro_norm], axis=1)

        val_data_factors = factor_data.loc[factor_data.index.isin(val_date_index)]
        val_data_macro = macro_factors.loc[macro_factors.index.isin(val_date_index)]
        val_data_macro_norm = val_data_macro.copy(deep=True)
        for c in val_data_macro_norm.columns:
            val_data_macro_norm[c] = MinMaxScaler([-1, 1]).fit_transform(pd.DataFrame(val_data_macro_norm[c]))
        val_data_merged = pd.concat([val_data_factors, val_data_macro_norm], axis=1)

        if model_type == 'combined':
            x_train_factors = []
            x_train_macro = []
            y_train = []
            for i in range(batch_size, len(train_data_factors)):
                x_train_factors.append(train_data_factors.values[i-batch_size:i, :])
                x_train_macro.append(train_data_macro_norm.values[i-batch_size:i, :])
                y_train.append(train_data_stocks[Y_name].values[i])
            x_train_factors, x_train_macro, y_train = np.array(x_train_factors), np.array(x_train_macro), np.array(y_train)

            x_val_factors = []
            x_val_macro = []
            y_val = []
            for i in range(batch_size, len(val_data_factors)):
                x_val_factors.append(val_data_factors.values[i-batch_size:i, :])
                x_val_macro.append(val_data_macro_norm.values[i-batch_size:i, :])
                y_val.append(val_data_stocks[Y_name].values[i])
            x_val_factors, x_val_macro, y_val = np.array(x_val_factors), np.array(x_val_macro), np.array(y_val)

            score = trained_model.evaluate([x_train_macro, x_train_factors], y_train, batch_size=batch_size)
            score = list(score)
            score.sort(reverse=True)
            score = score[-2]
            cp_callback.best = score
            trained_model.fit(x=[x_train_macro, x_train_factors], y=y_train, batch_size=batch_size, epochs=num_epochs,
                              validation_data=[[x_val_macro, x_val_factors], y_val],
                              callbacks=[early_stopping, cp_callback])

        if model_type == 'merged':
            x_train_merged = []
            y_train = []
            for i in range(batch_size, len(train_data_merged)):
                x_train_merged.append(train_data_merged.values[i-batch_size:i, :])
                y_train.append(train_data_stocks[Y_name].values[i])
            x_train_merged, y_train = np.array(x_train_merged), np.array(y_train)

            x_val_merged = []
            y_val = []
            for i in range(batch_size, len(val_data_merged)):
                x_val_merged.append(val_data_merged.values[i-batch_size:i, :])
                y_val.append(val_data_stocks[Y_name].values[i])
            x_val_merged, y_val = np.array(x_val_merged), np.array(y_val)

            score = trained_model.evaluate(x_train_merged, y_train, batch_size=batch_size)
            score = list(score)
            score.sort(reverse=True)
            score = score[-2]
            cp_callback.best = score
            trained_model.fit(x=x_train_merged, y=y_train, batch_size=batch_size, epochs=num_epochs,
                              validation_data=[x_val_merged, y_val],
                              callbacks=[early_stopping, cp_callback])

    return trained_model, test_data
If someone has an idea whether this works or not, I would be incredibly grateful!
In my testing I could see the MSE constantly decreasing; however, when the loop continues with the next stock, the MSE starts with a very high value again.
According to this answer:
How can I use multiple datasets with one model in Keras?
you can repeatedly fit the same model on more datasets.
If you want to save the model and load it at each iteration, that should also work, with the caveat that you lose the optimizer state (see Loading a trained Keras model and continue training).
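A minimal sketch of the simpler variant from that answer: keep one compiled model in memory and call fit() once per stock, so the weights and the optimizer state carry over between datasets. The data, the make_windows helper and the shapes below are placeholders, not the asker's actual pipeline.

import numpy as np
import tensorflow as tf

def make_windows(values, window):
    # hypothetical helper: turn a (T, F) array into (T - window, window, F) samples
    x = np.array([values[i - window:i] for i in range(window, len(values))])
    y = values[window:, 0]                      # toy target: next value of the first column
    return x, y

n_features = 5
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, n_features)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

checkpoint = tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='loss',
                                                save_best_only=True, mode='min')

stock_frames = [np.random.rand(200, n_features) for _ in range(3)]   # stand-in for the 500 stocks
for values in stock_frames:
    x, y = make_windows(values, window=10)
    # same model object every iteration -> weights and Adam moments are reused
    model.fit(x, y, epochs=2, batch_size=32, callbacks=[checkpoint], verbose=0)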

Selecting Best Performing Model over Various iterations

I want to create a for loop in order to run my model several times and keep the best performing model from each run. This is because I've noticed that each time I train my model it might perform better on one run and much worse on another, so I want to store each model in a list, or simply select the best one.
I have the process below, but I'm not sure whether this is the most adequate approach, and I'm also not sure how to select the best performing model across all these iterations. Here I am doing it for only 10 iterations, but I want to know if there is a better way of doing this.
My Code Implementation
def build_model(input1, input2):
    """
    Creates a multi-channel ANN, capable of accepting multiple inputs.
    :param: none
    :return: the model of the ANN with a single output given
    """
    input1 = np.expand_dims(input1, 1)
    # Define Inputs for ANN
    input1 = Input(shape=(input1.shape[1],), name="input1")
    input2 = Input(shape=(input2.shape[1],), name="input2")
    # First Branch of ANN (Weight)
    x = Dense(units=1, activation="relu")(input1)
    x = BatchNormalization()(x)
    # Second Branch of ANN (Word Embeddings)
    y = Dense(units=36, activation="relu")(input2)
    y = BatchNormalization()(y)
    # Merge the input models into a single large vector
    combined = Concatenate()([x, y])
    # Apply Final Output Layer
    outputs = Dense(1, name="output")(combined)
    # Create an Interpretation Model (Accepts the inputs from previous branches and has a single output)
    model = Model(inputs=[input1, input2], outputs=outputs)
    # Compile the Model
    model.compile(loss='mse', optimizer=Adam(lr=0.01), metrics=['mse'])
    # Print the Model Summary
    model.summary()
    return model

test_outcomes = []   # list of model scores
r2_outcomes = []     # list of r2 scores
stored_models = []   # list of stored models

for i in range(10):
    model = build_model(x_train['input1'], x_train['input2'])
    print("Model Training")
    model.fit([x_train['input1'], x_train['input2']], y_train,
              batch_size=25, epochs=60, verbose=0,  # validation_split = 0.2
              validation_data=([x_valid['input1'], x_valid['input2']], y_valid))
    # Determine Model Predictions
    print("Model Predictions")
    y_pred = model.predict([x_valid['input1'], x_valid['input2']])
    y_pred = y_pred.flatten()
    # Evaluate the Model
    print("Model Evaluations")
    score = model.evaluate([x_valid['input1'], x_valid['input2']], y_valid, verbose=1)
    test_loss = round(score[0], 3)
    print('Test loss:', test_loss)
    test_outcomes.append(test_loss)
    # Calculate R_Squared
    r_squared = r2_score(y_valid, y_pred)
    print(r_squared)
    r2_outcomes.append(r_squared)
    # Store Final Model
    print("Model Stored")
    stored_models.append(model)   # list of stored models

mean_test = np.mean(test_outcomes)
r2_means = np.mean(r2_outcomes)
Output Example
You should use callbacks.
You can stop training using a callback; here is an example of how to create a custom callback that stops training when a certain accuracy threshold is reached:
# example
acc_threshold = 0.95

class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        # depending on the Keras version and compile() metrics, the key may be 'acc' or 'accuracy'
        if logs.get('acc') > acc_threshold:
            print("\nReached %2.2f%% accuracy, so stopping training!!" % (acc_threshold * 100))
            self.model.stop_training = True

my_callback = myCallback()

model.fit([x_train['input1'], x_train['input2']], y_train,
          batch_size=25, epochs=60, verbose=0,  # validation_split = 0.2
          validation_data=([x_valid['input1'], x_valid['input2']], y_valid),
          callbacks=[my_callback])
You can also use EarlyStopping to monitor metrics (like stopping when the loss isn't improving).
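Putting both suggestions together, a hedged sketch of the restart loop (build_model and the x_train/x_valid/y_* objects are the asker's, assumed to be available): EarlyStopping with restore_best_weights keeps the best epoch inside a run, and only the run with the lowest validation loss is kept across the 10 restarts.

import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

best_loss, best_model = np.inf, None
for i in range(10):
    model = build_model(x_train['input1'], x_train['input2'])
    early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
    model.fit([x_train['input1'], x_train['input2']], y_train,
              batch_size=25, epochs=60, verbose=0,
              validation_data=([x_valid['input1'], x_valid['input2']], y_valid),
              callbacks=[early_stop])
    val_loss = model.evaluate([x_valid['input1'], x_valid['input2']], y_valid, verbose=0)[0]
    if val_loss < best_loss:
        best_loss, best_model = val_loss, model   # keep only the best run instead of all 10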

Encoder-Decoder LSTM model gives 'nan' loss and predictions

I am trying to create a basic encoder-decoder model for training a chatbot. X contains the questions (human dialogues) and Y contains the bot answers. I have padded the sequences to the max size of the input and output sentences. X.shape = (2363, 242, 1) and Y.shape = (2363, 144, 1). But during training the loss is 'nan' for all epochs, and the prediction gives an array with all values 'nan'. I have tried using the 'rmsprop' optimizer instead of 'adam'. I cannot use the loss function 'categorical_crossentropy' as the output is not one-hot encoded but a sequence. What exactly is wrong with my code?
Model
model = Sequential()
model.add(LSTM(units=64, activation='relu', input_shape=(X.shape[1], 1)))
model.add(RepeatVector(Y.shape[1]))
model.add(LSTM(units=64, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(units=1)))
print(model.summary())
model.compile(optimizer='adam', loss='mean_squared_error')
hist = model.fit(X, Y, epochs=20, batch_size=64, verbose=2)
model.save('encoder_decoder_model_epochs20.h5')
Data Preparation
def remove_punctuation(s):
    s = s.translate(str.maketrans('', '', string.punctuation))
    s = s.encode('ascii', 'ignore').decode('ascii')
    return s

def prepare_data(fname):
    word2idx = {'PAD': 0}
    curr_idx = 1
    sents = list()
    for line in open(fname):
        line = line.strip()
        if line:
            tokens = remove_punctuation(line.lower()).split()
            tmp = []
            for t in tokens:
                if t not in word2idx:
                    word2idx[t] = curr_idx
                    curr_idx += 1
                tmp.append(word2idx[t])
            sents.append(tmp)
    sents = np.array(pad_sequences(sents, padding='post'))
    return sents, word2idx

human = 'rdany-conversations/human_text.txt'
robot = 'rdany-conversations/robot_text.txt'

X, input_vocab = prepare_data(human)
Y, output_vocab = prepare_data(robot)

X = X.reshape((X.shape[0], X.shape[1], 1))
Y = Y.reshape((Y.shape[0], Y.shape[1], 1))
First of all, check that you do not have any NaNs in your input. If that is not the case, it might be exploding gradients. Standardize your inputs (min-max or z-scaling), try smaller learning rates, clip the gradients, or try different weight initializations.
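For illustration, a minimal sketch of two of those suggestions applied to the code above (it reuses the asker's X, Y and model; dividing padded token ids by their maximum is only a crude scaling example): scale the inputs and clip the gradient norm with a smaller learning rate.

import numpy as np
from tensorflow.keras.optimizers import Adam

X_scaled = X / float(X.max())                  # crude min-max scaling of the padded id sequences
Y_scaled = Y / float(Y.max())

opt = Adam(learning_rate=1e-4, clipnorm=1.0)   # smaller learning rate + gradient norm clipping
model.compile(optimizer=opt, loss='mean_squared_error')
hist = model.fit(X_scaled, Y_scaled, epochs=20, batch_size=64, verbose=2)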

How to set up an LSTM network to predict a multi-step sequence?

I am learning how to set up an RNN-LSTM network for prediction. I have created a dataset with one input variable:
x y
1 2.5
2 6
3 8.6
4 11.2
5 13.8
6 16.4
...
Using the following Python code, I have created the window data, like [x(t-2), x(t-1), x(t)], to predict [y(t)]:
df = pd.read_excel('dataset.xlsx')

# split a univariate dataset into train/test sets
def split_dataset(data):
    train, test = data[:-328], data[-328:-6]
    return train, test

train, test = split_dataset(df.values)

# scale train and test data
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaler = scaler.fit(train)
    # transform train
    # train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    # test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

scaler, train_scaled, test_scaled = scale(train, test)

def to_supervised(train, n_input, n_out=7):
    # flatten data
    data = train
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            x_input = data[in_start:in_end, 0]
            x_input = x_input.reshape((len(x_input), 1))
            X.append(x_input)
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return np.array(X), np.array(y)

train_x, train_y = to_supervised(train_scaled, n_input=3, n_out=1)
test_x, test_y = to_supervised(test_scaled, n_input=3, n_out=1)

verbose, epochs, batch_size = 0, 20, 16
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]

model = Sequential()
model.add(LSTM(200, return_sequences=False, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

history = model.fit(train_x, train_y, epochs=epochs, verbose=verbose, validation_data=(test_x, test_y))
However, I have some questions about this:
Q1: What is the meaning of units in LSTM? [model.add(LSTM(units, ...))]
(I have tried different units for the model; it became more accurate as units increased.)
Q2: How many layers should I set?
Q3: How can I predict multiple steps, e.g. based on (x(t), x(t-1)), predict y(t) and y(t+1)? I have tried to set n_out = 2 in the to_supervised function, but when I applied the same method, it returned the error below:
train_x, train_y = to_supervised(train_scaled, n_input=3, n_out=2)
test_x, test_y = to_supervised(test_scaled, n_input=3, n_out=2)

verbose, epochs, batch_size = 0, 20, 16
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]

model = Sequential()
model.add(LSTM(200, return_sequences=False, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

history = model.fit(train_x, train_y, epochs=epochs, verbose=verbose, validation_data=(test_x, test_y))
ValueError: Error when checking target: expected dense_27 to have shape (1,) but got array with shape (2,)
Q3 (cont.): What should I add or change in the model setup?
Q3 (cont.): What is return_sequences? When should I set it to True?
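For the direct (non-recursive) variant of Q3, a minimal sketch of the shape fix, reusing the asker's train_x/train_y from to_supervised with n_out = 2: the final Dense layer has to emit one value per forecast step, i.e. train_y.shape[1] units, which is what the error message is complaining about.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_outputs = train_y.shape[1]            # 2 when n_out = 2

model = Sequential()
model.add(LSTM(200, return_sequences=False, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(Dense(n_outputs))             # Dense(2) instead of Dense(1) resolves the shape error
model.compile(loss='mse', optimizer='adam')
history = model.fit(train_x, train_y, epochs=20, verbose=0, validation_data=(test_x, test_y))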
Q1. units in LSTM is the number of neurons in your LSTM layer.
Q2. That depends on your model and data. Try changing them around to see the effect.
Q3. That depends on which approach you take.
Q4. Ideally you'll want to predict a single time step every time. It is possible to predict several at a time, but in my experience you will get better results with the recursive approach described below, e.g.:
use y(t-1), y(t) to predict y_hat(t+1),
THEN
use y(t), y_hat(t+1) to predict y_hat(t+2).
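A hedged sketch of that recursive scheme, reusing the asker's fitted model and scaled test_x (window shape (1, 3, 1)): predict one step, append the prediction to the window, and predict again.

import numpy as np

window = test_x[0:1]                    # one input window, shape (1, 3, 1)
preds = []
for _ in range(2):                      # forecast two steps ahead
    y_hat = model.predict(window, verbose=0)[0, 0]
    preds.append(y_hat)
    # drop the oldest step and append the new prediction as the most recent step
    window = np.concatenate([window[:, 1:, :], [[[y_hat]]]], axis=1)
print(preds)                            # [y_hat(t+1), y_hat(t+2)], still on the scaled range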
Are you sure you're actually using X to predict Y in this case?
What do train x/y and test x/y look like?
Re Q1: It is the number of LSTM cells (=LSTM units), which consist of several neurons themselves but have (in the standard case as given) only one output each. Thus, the number of units corresponds directly to the dimensionality of your output.
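A quick check of that point (a standalone snippet, assuming tf.keras): the LSTM layer's output dimensionality equals units.

import tensorflow as tf

x = tf.random.normal((16, 3, 1))        # (batch, timesteps, features)
out = tf.keras.layers.LSTM(200)(x)
print(out.shape)                        # (16, 200)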
