Keras: Re-use trained weights in a new experiment - python

I am quite new to Keras, so apologies in advance for any silly mistakes. I am currently attempting some good old cross-domain transfer learning between two datasets. I have a model that is trained and run on a voice recognition dataset I have generated (the code is at the bottom of this question because it's quite long).
If I were to train a new model, say model_2, on a different dataset, then I'd get a baseline from the initial random distribution of weights.
What I'd like to do, and this is the bit I don't know how to do, is train model_1 and model_2 and then take the 256-unit and 128-unit dense layers from model_1 (with their trained weights) and use them as the starting point for a model_3, which trains on dataset 2 but is initialised with the weight distribution from model_1.
So, in the end, I have the following:
Model_1 which starts from a random distribution and trains on dataset 1
Model_2 which starts from a random distribution and trains on dataset 2
Model_3 which starts from the distribution trained in Model_1 and trains on dataset 2.
My question is, how would I go about doing step 3 above? I don't want to freeze the weights; I just want an initial distribution, carried over from a past experiment, to train from.
Any help would be greatly appreciated. Thank you! Apologies if I didn't make it quite clear enough what I'm going for.
My code to train Model_1 is as follows:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
from keras.utils import np_utils
from keras.layers.normalization import BatchNormalization
import time
start = time.clock()
# fix random seed for reproducibility
seed = 1
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("voice.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
numVars = len(dataframe.columns) - 1
numClasses = dataframe[numVars].nunique()
X = dataset[:,0:numVars].astype(float)
Y = dataset[:,numVars]
print("THERE ARE " + str(numVars) + " ATTRIBUTES")
print("THERE ARE " + str(numClasses) + " UNIQUE CLASSES")
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
calls = [EarlyStopping(monitor='acc', min_delta=0.0001, patience=100, verbose=2, mode='max', restore_best_weights=True)]
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(BatchNormalization())
    model.add(Dense(256, input_dim=numVars, activation='sigmoid'))
    model.add(Dense(128, activation='sigmoid'))
    model.add(Dense(numClasses, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
estimator = KerasClassifier(build_fn=baseline_model, epochs=2000, batch_size=1000, verbose=1)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold, fit_params={'callbacks':calls})
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
#your code here
print (time.clock() - start)
PS: The input attributes and outputs will all be the same between the two datasets; all that will change are the attribute values. Out of curiosity, can this be done if the two datasets have different numbers of output classes?

In short, to fine-tune Model_3 from Model_1, just call model.load_weights('/path/to/model_1.h5', by_name=True) after model.compile(...). Of course, you must have saved the trained Model_1 first.
If I understood correctly, you have the same number of features and classes in the two datasets, so you do not even need to redesign your model. If you had a different set of classes, you would have to give different names to the last layers of Model_1 and Model_3:
model.add(Dense(numClasses, activation='softmax', name='some_unique_name'))
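Putting it together, a rough sketch of the whole workflow might look like this (X1/dummy_y1 and X2/dummy_y2 are placeholder names for the two prepared datasets, and the build() call is only there so the fresh model has weights to load into; depending on your Keras version you may not need it):
model_1 = baseline_model()
model_1.fit(X1, dummy_y1, epochs=2000, batch_size=1000, callbacks=calls)
model_1.save_weights('model_1.h5')             # keep the trained weights around

model_3 = baseline_model()                     # identical architecture, fresh random init
model_3.build(input_shape=(None, numVars))     # create the weight tensors before loading
model_3.load_weights('model_1.h5')             # start from Model_1's weight distribution
model_3.fit(X2, dummy_y2, epochs=2000, batch_size=1000, callbacks=calls)  # train on dataset 2
If the class counts differ, pass by_name=True to load_weights and give the final layers different names, as shown above.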

Related

tokenizer output unable to be converted to numpy array

I've been following a tutorial to try and understand LSTMs and TensorFlow a bit more. When I run it, the training of the model goes smoothly, but when I try to use the trained tokenizer on the test data and then convert it to a numpy array, it doesn't work, and I'm not really sure what the problem is. The relevant portion that goes wrong is below:
# test model
x_test = np.array(tokenizer.texts_to_sequences([str(txt) for txt in df_test['text'].values]))
The error it presents is as below:
Traceback (most recent call last):
File "/Users/pranavnair/Documents/Code/wpd/wpd.py", line 85, in <module>
x_test = np.array(x_test_data)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10824,) + inhomogeneous part.
I've tried using np.hstack instead of np.array, and that doesn't fix it. Would appreciate any help at all, thanks in advance.
Full code below for reference
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras import utils
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from keras.layers import Embedding
from keras.optimizers import Adam
# set random seed for reproducibility
RANDOM_SEED = 4
np.random.seed(RANDOM_SEED)
# import datasets
df_neut = pd.read_csv("./input/good.csv")
df_prom = pd.read_csv("./input/promotional.csv")
# clean up data to only include text
df_prom = df_prom.drop(df_prom.columns[1:], axis=1)
df_neut = df_neut.drop(df_neut.columns[1:], axis=1)
# combine datasets
df_neut.insert(1, 'label', 0) # neutral labels
df_prom.insert(1, 'label', 1) # promotional labels
# merge dataframes
df = pd.concat((df_neut, df_prom), ignore_index=True, axis=0)
# randomize order of dataframes
df = df.reindex(np.random.permutation(df.index))
# split into training and testing datasets
df_train, df_test = train_test_split(df, test_size=0.2, random_state=RANDOM_SEED)
# The maximum number of words to be used. (most frequent)
MAX_NB_WORDS = 50000
# perform data preprocessing using keras tokenizer
text_data = [str(txt) for txt in df_train['text'].values] # convert text data to strings
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?#[\]^_`{|}~', lower=True) # create tokenizer
tokenizer.fit_on_texts(text_data) # make dictionary
# vectorize dataset
x_train = tokenizer.texts_to_sequences(text_data)
# Max number of words in each sequence
MAX_SEQUENCE_LENGTH = 400
# pad sequence lengths
x_train = utils.pad_sequences(x_train, maxlen=MAX_SEQUENCE_LENGTH)
# get test labels
y_train = df_train['label'].values
# create sequential model
model = Sequential()
# create embedding layer
EMBEDDING_DIM = 100
model.add(Embedding(MAX_NB_WORDS+1, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH))
# add LSTM layer to model
model.add(LSTM(80))
# setup model layers
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
# setup binary classification via binary cross entropy loss
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
# train the model
EPOCHS = 4
BATCH_SIZE = 64
history = model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_split=0.15)
# test model
x_test = np.array(tokenizer.texts_to_sequences([str(txt) for txt in df_test['text'].values]))
x_test = utils.pad_sequences(x_test, maxlen=MAX_SEQUENCE_LENGTH)
y_test = np.array(df_test['label'].values)
# evaluate model
scores = model.evaluate(x_test, y_test, batch_size=128)
print("The model has a test loss of %.2f and a test accuracy of %.1f%%" % (scores[0], scores[1]*100))

Why do I get different predictions using Keras sequential neural network in a loop?

I came across a weird difference between keras model.fit() and sklearn model.fit() functions. When model.fit() is called inside a loop I get inconsistent predictions using a Keras sequential model. This is not the case when using an sklearn model. See sample code to reproduce the phenomenon.
from numpy.random import seed
seed(1337)
import tensorflow as tf
tf.random.set_seed(1337)
from sklearn.linear_model import LogisticRegression
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import InputLayer
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
import numpy as np
def get_sequential_dnn(NUM_COLS, NUM_ROWS):
    # code for model
    ...

if __name__ == "__main__":
    input_size = 10
    X, y = make_blobs(n_samples=100, centers=2, n_features=input_size, random_state=1)
    scalar = MinMaxScaler()
    scalar.fit(X)
    X = scalar.transform(X)
    model = get_sequential_dnn(X.shape[1], X.shape[0])
    # print(model.summary())
    # model = LogisticRegression()
    for i in range(2):
        model.fit(X, y, epochs=100, verbose=0, shuffle=False)
        # model.fit(X, y)
        Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=10, random_state=1)
        Xnew = scalar.transform(Xnew)
        # make a prediction
        # ynew = model.predict_proba(Xnew)[:, 1]
        ynew = model.predict_proba(Xnew)
        ynew = np.array(ynew)
        # show the inputs and predicted outputs
        print('--------------')
        for i in range(len(Xnew)):
            print("X=%s \n Predicted=%s" % (Xnew[i], ynew[i]))
The output of this is
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=[0.9931685]
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=[0.35249507]
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=[0.35249507]
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=[1.]
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=[0.17942095]
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=[0.17942095]
Whereas if I use a Logistic Regression (uncomment the commented lines), the predictions are consistent:
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=0.929209043999009
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=0.04643513037543502
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=0.038716408758471876
--------------
X=[0.32799209 0.32682211 0.62699485 0.89987274 0.59894281 0.94662653
0.77125788 0.73345369 0.2153754 0.35317172]
Predicted=0.929209043999009
X=[0.60876924 0.33208319 0.24770841 0.11435312 0.66211608 0.17361879
0.12891829 0.25729677 0.69975833 0.73165292]
Predicted=0.04643513037543502
X=[0.65154993 0.26153846 0.2416324 0.11793901 0.7047334 0.17706289
0.07761879 0.45189967 0.8481064 0.85092378]
Predicted=0.038716408758471876
I get that the obvious solution is to fit the model before the loop, and there is probably strong randomness in how Keras models fit the data to the labels, but there are cases where you need a loop to get prediction scores, for example when performing 10-fold cross validation to get AUC, sensitivity and specificity values on training data. In those situations this randomness is unacceptable.
What is causing this inconsistency and what is the solution to it?
There are a couple of issues with the way you are trying to make reproducible results with Keras.
You are calling fit (when i==1) on the already fitted model (from when i==0), so the optimizer sees a different set of initial weights in each case and you end up with two different models. Solution: get a fresh model every time. This is not the case with sklearn, which starts from freshly initialized weights every time fit is called.
model.fit internally uses the current state of the random number generator. You seeded it outside the loop, so the state will be different when fit is called the second time. Solution: seed inside the loop.
Sample code with issue
# Issue 2 here
tf.random.set_seed(1337)

def get_model():
    model = Sequential()
    model.add(Dense(4, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

X = np.random.randn(10,8)
y = np.random.randn(10,1)
# Issue 1 here
model = get_model()
results = []
for i in range(10):
    model.fit(X, y, epochs=5, verbose=0, shuffle=False)
    results.append(np.sum(model.predict(X)))
assert np.all(np.isclose(results, results[0]))
As you can see, the assert fails.
Corrected code
results = []
for i in range(10):
    tf.random.set_seed(1337)
    model = get_model()
    model.fit(X, y, epochs=5, verbose=0, shuffle=False)
    results.append(np.sum(model.predict(X)))
assert np.all(np.isclose(results, results[0]))
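The same pattern carries over to the cross-validation scenario mentioned in the question. A rough sketch (the KFold import and split are my own addition, not code from the question):
from sklearn.model_selection import KFold

kfold = KFold(n_splits=10, shuffle=True, random_state=1337)
fold_scores = []
for train_idx, val_idx in kfold.split(X):
    tf.random.set_seed(1337)               # reset the RNG state for each fold
    fold_model = get_model()               # freshly initialized model for every fold
    fold_model.fit(X[train_idx], y[train_idx], epochs=5, verbose=0, shuffle=False)
    fold_scores.append(fold_model.evaluate(X[val_idx], y[val_idx], verbose=0))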

Looking for ways and experiences to improve Keras model accuracy

I am an electrical engineer and I am looking for a solution to calculate the DC current of a permanent magnet synchronous motor, so I decided to look into ANN solutions with Keras and the like. Long story short, I'll show you a screenshot of some measured signals.
The first 5 signals are the measured signals. The last one is the DC current, which I want to estimate; its value was recorded with the help of a current clamp. I started building a model in Python and tried some things that I assumed would increase the accuracy of the model, but after all that I am not getting good results, and my hope is that maybe I am choosing the wrong parameters or not an ideal model for this purpose.
Here is my code:
import numpy as np
from keras.layers import Dense, LSTM
from keras.models import Sequential
from keras.callbacks import EarlyStopping
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from matplotlib import pyplot as plt
import seaborn as sns
# Import input (x) and output (y) data and assign them to df
df = pd.read_csv('train_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
plt.figure()
sns.heatmap(df.corr(),annot=True)
plt.show()
# Split the data into input (x) training and testing data, and output (y) training and testing data,
# with training data being 80% of the data, and testing data being the remaining 20% of the data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)#, shuffle=True)
# Scale both training and testing input data
X_train = preprocessing.maxabs_scale(X_train)
X_test = preprocessing.maxabs_scale(X_test)
model = Sequential()
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(1, input_shape=(4,)))
model.compile(optimizer="adam", loss="msle", metrics=['mean_squared_logarithmic_error','accuracy'])
# Pass several parameters to 'EarlyStopping' function and assign it to 'earlystopper'
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1, mode='auto')
model.summary()
history = model.fit(X_train, y_train, epochs = 2000, validation_split = 0.3, verbose = 2, callbacks = [earlystopper])
# Runs model (the one with the activation function, although this doesn't really matter as they perform the same)
# with its current weights on the training and testing data
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
# Calculates and prints r2 score of training and testing data
print("The R2 score on the Train set is:\t{:0.3f}".format(r2_score(y_train, y_train_pred)))
print("The R2 score on the Test set is:\t{:0.3f}".format(r2_score(y_test, y_test_pred)))
df = pd.read_csv('test_two_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
X_validate = preprocessing.maxabs_scale(X)
y_pred = model.predict(X_validate)
plt.plot(Y)
plt.plot(y_pred)
plt.show()
(weight_0,bias_0) = model.layers[0].get_weights()
(weight_1,bias_1) = model.layers[1].get_weights()
One limitation is that I can't use LSTM layers or other complex algorithms, because I will later need to implement the trained model on a microcontroller in a motor application.
I would appreciate any pointers that could make my model a little more accurate.
Finally, here is a figure showing the worst prediction performance. Orange is the prediction and blue is the measured current.
The training dataset was this one.
The correlation between the individual values can be found here. Since the values of id and ud have no correlation to idc, I decided to delete them.
The most important thing to keep in mind when trying to improve the accuracy of the model is to ALWAYS normalise the input data, which basically means rescaling real-valued numeric attributes into the range 0 to 1. I am not able to understand the way you are providing the training data to the model; could you please explain that? It would help in understanding and identifying the scope for higher accuracy.
As for parameters, I would suggest adding a tuning algorithm to obtain an optimized value for each parameter.
It is always good practice to include hidden layers, which can provide better feature extraction.
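As a rough sketch of the normalisation point (fitting the scaler on the training split only and reusing it for the test data and the second CSV, rather than scaling each set independently as the posted code does with maxabs_scale):
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                    # rescales each feature into the range [0, 1]
X_train = scaler.fit_transform(X_train)    # fit the scaler on the training data only
X_test = scaler.transform(X_test)          # apply the same scaling to the test data
X_validate = scaler.transform(X)           # ...and to the features from the second CSV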

Machine Learning Model overfitting

So I built a GRU model and I'm comparing 3 different datasets on the same model. I was running the first dataset with the number of epochs set to 25, but I noticed that my validation loss starts increasing just after the 6th epoch. Doesn't that indicate overfitting? Am I doing something wrong?
import pandas as pd
import tensorflow as tf
from keras.layers.core import Dense
from keras.layers.recurrent import GRU
from keras.models import Sequential
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from google.colab import files
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
tbc=TensorBoardColab() # Tensorboard
df10=pd.read_csv('/content/drive/My Drive/Isolation Forest/IF 10 PERCENT.csv',index_col=None)
df2_10= pd.read_csv('/content/drive/My Drive/2019 Dataframe/2019 10minutes IF 10 PERCENT.csv',index_col=None)
X10_train= df10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_train=X10_train.values
y10_train= df10['Power_kW']
y10_train=y10_train.values
X10_test= df2_10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_test=X10_test.values
y10_test= df2_10['Power_kW']
y10_test=y10_test.values
# scaling values for model
x_scale = MinMaxScaler()
y_scale = MinMaxScaler()
X10_train= x_scale.fit_transform(X10_train)
y10_train= y_scale.fit_transform(y10_train.reshape(-1,1))
X10_test= x_scale.fit_transform(X10_test)
y10_test= y_scale.fit_transform(y10_test.reshape(-1,1))
X10_train = X10_train.reshape((-1,1,12))
X10_test = X10_test.reshape((-1,1,12))
# creating model using Keras
model10 = Sequential()
model10.add(GRU(units=512, return_sequences=True, input_shape=(1,12)))
model10.add(GRU(units=256, return_sequences=True))
model10.add(GRU(units=256))
model10.add(Dense(units=1, activation='sigmoid'))
model10.compile(loss=['mse'], optimizer='adam',metrics=['mse'])
model10.summary()
history10=model10.fit(X10_train, y10_train, batch_size=256, epochs=25,validation_split=0.20, verbose=1, callbacks=[TensorBoardColabCallback(tbc)])
score = model10.evaluate(X10_test, y10_test)
print('Score: {}'.format(score))
y10_predicted = model10.predict(X10_test)
y10_predicted = y_scale.inverse_transform(y10_predicted)
y10_test = y_scale.inverse_transform(y10_test)
plt.plot( y10_predicted, label='Predicted')
plt.plot( y10_test, label='Measurements')
plt.legend()
plt.savefig('/content/drive/My Drive/Figures/Power Prediction 10 Percent.png')
plt.show()
LSTMs (and also GRUs, in spite of their lighter construction) are notorious for overfitting easily.
Reduce the number of units (the output size) in each of the layers (e.g. 32 in layer 1 and 64 in layer 2); you could also eliminate the last GRU layer altogether.
Second of all, you are using the activation 'sigmoid', but your loss function + metric is mse.
Ensure that your problem is either a regression or a classification one. If it is indeed a regression, then the activation function of the last layer should be 'linear'. If it is a classification one, you should change your loss function to binary_crossentropy and your metric to 'accuracy'.
Therefore, the plot displayed is misleading for the moment. If you make the modifications I suggested and still get such a train/validation loss plot, then we can state for sure that you have an overfitting case.
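Sketching those suggestions in code (treating the task as regression, which the mse loss implies; the 32/64 unit counts are just the ballpark figures mentioned above):
# slimmed-down model, assuming the same (1, 12) input shape as in the question
model10 = Sequential()
model10.add(GRU(units=32, return_sequences=True, input_shape=(1, 12)))
model10.add(GRU(units=64))                        # the third GRU layer is dropped entirely
model10.add(Dense(units=1, activation='linear'))  # linear output for a regression target
model10.compile(loss='mse', optimizer='adam', metrics=['mse'])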

How to optimize accuracy of ANN

I have 300 samples across 50 target classes.
This is my sample dataset, with 98 features:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
dataset = pd.read_csv(root_path + 'pima-indians-diabetes.data.csv', header=None)
X= dataset.iloc[:,0:8]
y= dataset.iloc[:,8]
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3)
from keras import Sequential
from keras.layers import Dense
classifier = Sequential()
#First Hidden Layer
classifier.add(Dense(units = 10, activation='relu',kernel_initializer='random_normal', input_dim=8))
#Second Hidden Layer
classifier.add(Dense(units = 10, activation='relu',kernel_initializer='random_normal'))
#Output Layer
classifier.add(Dense(units = 1, activation='sigmoid',kernel_initializer='random_normal'))
#Compiling the neural network
classifier.compile(optimizer ='adam',loss='binary_crossentropy', metrics =['accuracy'])
#Fitting the data to the training dataset
classifier.fit(X_train,y_train, batch_size=2, epochs=10)
I get 19% accuracy here, and I don't know how to optimize my prediction result.
I assume that you have performed a dimensionality reduction technique on your original data with 98 features, and that is why you are using an 8-dimensional input in your model.
I have a few observations on your implementation:
[As a Classification Problem]
As you have mentioned that your samples belong to 50 different classes, the problem is certainly a multiclass classification problem. So, you need to encode your labels first, like:
from keras.utils import to_categorical
y = to_categorical(y, num_classes=50, dtype='float32')
In this case, you need to change the number of output nodes (one per class) and the activation function in the final layer as follows:
classifier.add(Dense(units = 50, activation='softmax'))
Furthermore, you have to use categorical_crossentropy as the loss function while compiling your model.
classifier.compile(optimizer ='adam',loss='categorical_crossentropy', metrics =['accuracy'])
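Assembled in one place, the classification variant might look roughly like this (a sketch only; it assumes X has already been scaled and y holds integer class labels 0-49):
from keras import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

y_cat = to_categorical(y, num_classes=50)                       # one-hot encode the 50 classes
classifier = Sequential()
classifier.add(Dense(units=10, activation='relu', input_dim=8))
classifier.add(Dense(units=10, activation='relu'))
classifier.add(Dense(units=50, activation='softmax'))           # one output node per class
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(X, y_cat, batch_size=2, epochs=10)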
[As a Regression Problem]
You can also treat this as a regression problem, since the output is within the range 0 to 50 (continuous), and keep a single output node in the final layer as you did. But in that case, you should use a linear activation function instead of sigmoid.
So, the final layer should be like:
classifier.add(Dense(units = 1)) # default activation is linear
Additionally, in the case of a regression problem, mean_squared_error is the most relevant cost function to use (assuming there are not many outliers in your dataset), and accuracy is irrelevant as a performance metric (you may instead use mean_absolute_error, which is analogous to the loss). Hence, the second modification is:
classifier.compile(optimizer ='adam',loss='mean_squared_error')
