I couldn't find anything helpful about plotting how the training and validation error of sklearn.neural_network.MLPRegressor converge. I found that there is a loss_curve_ attribute, but what about the validation data?
I have built a simple model on synthetic inputs and outputs (say x = numpy.linspace(0, numpy.pi, 100), y = numpy.sin(x)). I wrote the code below to obtain the variation of sklearn.metrics.mean_squared_error for different hidden layer sizes.
How can I overcome this problem?
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
inputs /= 10
ERE /= 10
scaler = RobustScaler()
inputs = scaler.fit_transform(inputs)
X_train, X_test, y_train, y_test = train_test_split(inputs, ERE,
                                                     train_size=0.8,
                                                     random_state=123)
from sklearn.neural_network import MLPRegressor
hidden_layer_size = (10, )
activation = "tanh"
solver = "adam"
alpha = 1e-4
batch_size = 6
learning_rate = "adaptive"
learning_rate_init = 1e-4
power_t = "sgd"
max_iter = 1000
shuffle = True
random_state = 123
verbose = True
early_stopping = True
validation_fraction = 0.15
n_iter_no_change = 35
from sklearn.metrics import mean_squared_error as mse
import numpy as np
error_scores = np.zeros(shape = (11,))
for _iterator, hidden_layer_size in enumerate(range(1, 110, 10)):
    mlr = MLPRegressor(hidden_layer_sizes=hidden_layer_size,
                       activation=activation,
                       solver=solver,
                       batch_size=batch_size,
                       learning_rate=learning_rate,
                       learning_rate_init=learning_rate_init,
                       shuffle=shuffle,
                       random_state=random_state,
                       early_stopping=early_stopping,
                       validation_fraction=validation_fraction,
                       n_iter_no_change=n_iter_no_change,
                       alpha=alpha)
    mlr.fit(X_train, y_train)
    error_scores[_iterator] = mse(y_test, mlr.predict(X_test))
The MLPRegressor class (well, BaseMultilayerPerceptron really) has an undocumented validation_scores_ attribute which keeps track of the scores on the validation data. However, it is only populated if you pass early_stopping=True when initialising the estimator.
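For example, a minimal plotting sketch could look like this, assuming the fitted regressor is still bound to the mlr variable from the loop above and was trained with early_stopping=True; note that validation_scores_ holds score() values (R² for a regressor), not a loss:
import matplotlib.pyplot as plt

# assumes `mlr` is a fitted MLPRegressor trained with early_stopping=True
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(mlr.loss_curve_)          # training loss per iteration
ax1.set_xlabel("iteration")
ax1.set_ylabel("training loss")
ax2.plot(mlr.validation_scores_)   # validation score (R^2) per iteration
ax2.set_xlabel("iteration")
ax2.set_ylabel("validation score")
plt.tight_layout()
plt.show()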
Related
I am training a keras model to perform some simple categorisation tasks. In my case, the model needs to learn how to make a judgement based on the task cue and a given task stimulus. The task stimulus is an array of 5 numbers, which is randomly generated. To train the model, I don't need more than one epoch, as learning a specific stimulus is not the goal, so I set the number of epochs to 1. However, I don't want the model to reach 100% accuracy on the validation dataset, only 80%.
To achieve this goal, I used a callback function to stop training when the training accuracy reached 80%. But then I found that the accuracy on the validation dataset is much better than the training accuracy (see here). As I only have one epoch, how should I set up the callback function to make sure the model has 80% accuracy on the validation dataset? Thanks in advance!
Here is the code:
import numpy as np
import random
import tensorflow as tf
from tensorflow import keras
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
# random seed
from numpy.random import seed
seed(1234)
random.seed(10)
# set up a simple categorization task
def tasksets(train_num, test_num, task_num):
    x_train, y_train, rules_train = train_sequence(train_num, task_num)
    x_test, y_test, rules_test = train_sequence(test_num, task_num)
    return x_train, x_test, rules_train, y_train, y_test, rules_test
# generating training sequence
def train_sequence(trial_num, task_num):
    x = np.zeros((trial_num, task_num), dtype=np.float64)
    y = np.zeros(trial_num, dtype=np.float64)
    rules = np.zeros(trial_num, dtype=np.float64)
    rulepool = []
    for r in range(task_num):
        rulepool = rulepool + [r]*int(trial_num/task_num)
    random.shuffle(rulepool)
    for i in range(trial_num):
        for t in range(task_num):
            x[i, t] = random.random()  # multi-dimensional stimuli
        rule_idx = rulepool.pop(random.randint(0, len(rulepool)-1))
        rules[i] = rule_idx
        if x[i, rule_idx] <= 0.5:
            answer = 0  # no
        elif x[i, rule_idx] > 0.5:
            answer = 1  # yes
        y[i] = answer
    x = np.reshape(x, (trial_num, task_num))
    y = tf.one_hot(y, 2)
    rules = np.reshape(rules, (trial_num))
    rules = tf.one_hot(rules, depth=task_num)
    return x, y, rules
def build_network(task_num, learning_rate: float = 0.001):
    # nsteps = 1
    input_dims = task_num
    inputs_stimuli = Input(shape=(input_dims,), name="stimulus")
    inputs_rule = Input(shape=(input_dims,), name="rule")
    hid1 = Dense(6, activation='relu', name="stimulus_representation")(inputs_stimuli)
    hid2 = Dense(6, activation='relu', name="rule_representation")(inputs_rule)  # back to dense
    fuse = keras.layers.concatenate([hid1, hid2])  # combine multiple inputs
    decision = Dense(100, activation="relu", name="decision")(fuse)  # back to dense
    output = Dense(2, activation="softmax", name="output")(decision)
    model = Model(inputs=[inputs_stimuli, inputs_rule], outputs=output)
    loss = tf.keras.losses.CategoricalCrossentropy()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss=loss, metrics=["accuracy"])
    return model
# Instantiate a callback object
accuracy_threshold = 0.80
class myCallback(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs={}):
        keys = list(logs.keys())
        print("...Training: end of batch {}; got log keys: {}".format(batch, keys))
        if logs.get('accuracy') > accuracy_threshold:
            print("\nReached %2.2f%% accuracy, so stopping training!!" % (accuracy_threshold*100))
            self.model.stop_training = True
callbacks = myCallback()
# the model is trained until it reaches 80% accuracy in the test
check = 0
test_threshold = 1
task_num = 5
batch_size = 8
model = build_network(task_num)
x_train, x_test, rule_train, y_train, y_test, rule_test = tasksets(10000*task_num, 100*task_num, task_num)
results = model.fit([x_train, rule_train], y_train, epochs=1,
                    batch_size=batch_size, callbacks=callbacks,
                    validation_data=([x_test, rule_test], y_test))
I am trying to build an LSTM model using Keras. I created a simple sine wave example containing more than 1000 points, where the goal is to predict the next point. But the result is not as good as I expected: when I fit the model, the predictions move between 0 and 1 instead of following the sine wave. I have tried changing parameters such as the number of epochs, batch size and learning rate, but it did not get better.
[image: model prediction]
What am I doing wrong?
import joblib
import numpy as np
import matplotlib.pyplot as plt
import copy
import gc
import os
import sys
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
import tensorflow as tf
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from keras.callbacks import Callback
learning_rate = 0.001
len_train = 30
total_predict = 300
len_test = 400
epoch = 100
batch_size = 32
workers = -1
class Callback_Class(Callback):
    def load_data(self, x_test, y_test):
        self.x_test = x_test
        self.y_test = np.array(y_test)

    def model_predict(self, data_close):
        output_predict = []
        for i in range(total_predict):
            if i == 0:
                data_close_ = data_close.reshape(-1, len_train, 1)
            else:
                data_close_ = np.delete(data_close_, 0)
                data_close_ = np.append(data_close_, pred_close)
                data_close_ = data_close_.reshape(-1, len_train, 1)
            pred_close = model.predict(data_close_)
            pred_close = pred_close.ravel()
            pred_close = np.array(pred_close).reshape(len(pred_close), 1)
            pred_cl = sc.inverse_transform(pred_close)
            output_predict.append(pred_cl)
        output_predict = np.array(output_predict)
        return output_predict

    def on_epoch_end(self, epoch, logs=None):
        if epoch % 20 == 0:
            output_predict = self.model_predict(self.x_test)
            fig, ax = plt.subplots(figsize=(12, 6))
            ax.grid(True)
            plt.title(f"Model predict")
            plt.plot(output_predict.ravel(), color="red", label='Predict')
            plt.plot(self.y_test.ravel(), color="blue", label='REAL')
            fig.tight_layout()
            plt.legend(loc='lower left')
            plt.savefig(f'Demo_lstm_epoch_{epoch}.png')
            plt.clf()
            plt.close()
def lstm_reg(input_shape=(60, 1), unit=40, clustering_params=None):
    inputs = Input(input_shape)
    lstm1f = Bidirectional(LSTM(units=32, return_sequences=True))(inputs)
    lstm1f = Bidirectional(LSTM(units=32, return_sequences=False))(lstm1f)
    outputs = Dense(units=1, activation='linear')(lstm1f)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='mean_squared_error', metrics=["accuracy"])
    return model
def create_data_train(data_time_series):
    data_time_series = np.array(data_time_series).ravel()
    X_train = []
    y_train = []
    for i in range(len_train, len(data_time_series)):
        X_train.append(data_time_series[i-len_train:i])
        y_train.append(data_time_series[i])
    X_train, y_train = np.array(X_train), np.array(y_train)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
    return X_train, y_train
x = np.linspace(-20*np.pi, 20*np.pi, 2001)
sin_alpha = np.sin(x).ravel()
sin_alpha_train = np.array(copy.deepcopy(sin_alpha))[:len(sin_alpha)-len_test]
sin_alpha_train = np.array(sin_alpha_train).reshape(len(sin_alpha_train), 1)
sc = MinMaxScaler(feature_range=(0, 1))
sin_alpha_train = sc.fit_transform(sin_alpha_train)
X_train, y_train = create_data_train(sin_alpha_train)
joblib.dump(sc, f'Demo_MinMaxScaler.gz')
sc = joblib.load(f"Demo_MinMaxScaler.gz")
X_test = np.array(copy.deepcopy(sin_alpha))[len(sin_alpha)-len_test:len(sin_alpha)-len_test+len_train]
X_test = np.array(X_test).reshape(len(X_test), 1)
X_test = sc.fit_transform(X_test)
y_test = np.array(copy.deepcopy(sin_alpha))[len(sin_alpha)-len_test+len_train:len(sin_alpha)-len_test+len_train+total_predict]
model = lstm_reg(input_shape=(len_train, 1), unit=int(2*(len_train+len(y_train))/3))
model.summary()
callback_class = Callback_Class()
callback_class.load_data(X_test, y_test)
model.fit(X_train, y_train, epochs=epoch, use_multiprocessing=True, verbose=1, callbacks=[callback_class], workers=workers, batch_size=batch_size)
It seems like you are normalizing your features and your labels in these lines
sc = MinMaxScaler(feature_range=(0, 1))
sin_alpha_train = sc.fit_transform(sin_alpha_train)
X_train, y_train = create_data_train(sin_alpha_train)
Try it without scaling your label set. Since your output layer uses the linear activation function, which is correct for a regression problem, the model should be able to handle unscaled labels. As written, the model only learns targets in the range 0 to 1, while your sine wave goes from -1 to 1.
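A minimal sketch of that idea, reusing the names from the question (assuming sin_alpha_train below is the unscaled series, i.e. taken before the MinMaxScaler step):
from sklearn.preprocessing import MinMaxScaler

# scale only the input windows; leave the sine targets on their natural [-1, 1] scale
sc = MinMaxScaler(feature_range=(0, 1))
scaled_series = sc.fit_transform(sin_alpha_train)   # sin_alpha_train: unscaled, shape (n, 1)

X_train, _ = create_data_train(scaled_series)       # scaled input windows
_, y_train = create_data_train(sin_alpha_train)     # unscaled targets

# with unscaled targets, model.predict(...) already returns values on the
# sine's own scale, so no inverse_transform is needed on the predictions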
I am trying to create a program in Python that uses machine learning to predict the square root of a number. Here is what I have done in my program:
created a csv file with numbers and their squares
extracted the data from csv into suitable variables (X stores squares, y stores numbers)
scaled the data using sklearn's StandardScaler
built the ANN with two hidden layers each of 6 units (no activation functions)
compiled the ANN using SGD as the optimizer and mean squared error as the loss function
trained the model. Loss was around 0.063
tried predicting but the result is something else.
My actual code:
import numpy as np
import tensorflow as tf
import pandas as pd
df = pd.read_csv('CSV/SQUARE-ROOT.csv')
X = df.iloc[:, 1].values
X = X.reshape(-1, 1)
y = df.iloc[:, 0].values
y = y.reshape(-1, 1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_test_sc = sc.fit_transform(X_test)
X_train_sc = sc.fit_transform(X_train)
sc1 = StandardScaler()
y_test_sc1 = sc1.fit_transform(y_test)
y_train_sc1 = sc1.fit_transform(y_train)
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6))
ann.add(tf.keras.layers.Dense(units=6))
ann.add(tf.keras.layers.Dense(units=1))
ann.compile(optimizer='SGD', loss=tf.keras.losses.MeanSquaredError())
ann.fit(x = X_train_sc, y = y_train_sc1, batch_size=5, epochs = 100)
print(sc.inverse_transform(ann.predict(sc.fit_transform([[144]]))))
OUTPUT: array([[143.99747]], dtype=float32)
Shouldn't the output be 12? Why is it giving me the wrong result?
I am attaching the csv file I used to train my model as well: SQUARE-ROOT.csv
TL;DR: You really need those nonlinearities.
The reason behind it not working could be one (or a combination) of several causes, like bad input data range, flaws in your data, over/underfitting, etc.
However, in this specific case the model you build literally can't learn the function you're trying to approximate, because not having nonlinearities makes this a purely linear model, which can't accurately approximate nonlinear functions.
A Dense layer is implemented as follows:
x_res = activ_func(w*x + b)
where x is the layer input, w the weights, b the bias vector and activ_func the activation function (if one is defined).
Your model, then, mathematically becomes (I'm using indices 1 to 3 for the three Dense layers):
pred = w3 * (w2 * ( w1 * x + b1 ) + b2 ) + b3
= w3*w2*w1*x + w3*w2*b1 + w3*b2 + b3
As you see, the resulting model is still linear.
Add activation functions (as in the sketch below) and your model becomes capable of learning nonlinear functions too. From there, experiment with the hyperparameters and see how the performance of your model changes.
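For instance, keeping the same layer sizes but inserting ReLU activations in the hidden layers is enough to break the linearity (a sketch, not the only valid choice; tanh or other nonlinearities would work as well):
import tensorflow as tf

ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))   # nonlinear hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))   # nonlinear hidden layer
ann.add(tf.keras.layers.Dense(units=1))                      # linear output for regression
ann.compile(optimizer='SGD', loss=tf.keras.losses.MeanSquaredError())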
The reason your code does not work is that you apply fit_transform to your test set, which is wrong. You can fix it by replacing fit_transform(test) with transform(test). Although I don't think StandardScaler is necessary here, please try this:
import numpy as np
import tensorflow as tf
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
N = 10000
X = np.arange(1, N).reshape(-1, 1)
y = np.sqrt(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
#X_test_sc = sc.fit_transform(X_test) # wrong!!!
X_test_sc = sc.transform(X_test)
sc1 = StandardScaler()
y_train_sc1 = sc1.fit_transform(y_train)
#y_test_sc1 = sc1.fit_transform(y_test) # wrong!!!
y_test_sc1 = sc1.transform(y_test)
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=32, activation='relu')) # with 10000 samples, you may want a slightly deeper network
ann.add(tf.keras.layers.Dense(units=32, activation='relu'))
ann.add(tf.keras.layers.Dense(units=32, activation='relu'))
ann.add(tf.keras.layers.Dense(units=1))
ann.compile(optimizer='SGD', loss='MSE')
ann.fit(x=X_train_sc, y=y_train_sc1, batch_size=32, epochs=100, validation_data=(X_test_sc, y_test_sc1))
#print(sc.inverse_transform(ann.predict(sc.fit_transform([[144]])))) # wrong!!!
print(sc1.inverse_transform(ann.predict(sc.transform([[144]]))))
I'm using Tensorflow to train a network to predict the third item in a list of numbers.
When I train, the network appears to train quite well and do well on both the training and test set. However, when I evaluate its performance myself, it seems to be doing quite poorly.
For example, at the end of training, Tensorflow says that the validation loss is 2.1 x 10^(-5). However, when I compute it myself, I get 0.17 x 10^0. What am I doing wrong?
Here's code that can be run on Google Colab:
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
def create_dataset(k=5, n=2, example_amount=200):
    '''Create a dataset of numbers where the goal is to always output the nth number'''
    # UPGRADE: this could be done better with numpy to just generate all the examples at once
    example_amount = 1000
    x = []
    y = []
    ans = [x, y]
    for i in range(example_amount):
        example_x = np.random.rand(k)
        example_y = example_x[n]
        x.append(example_x)
        y.append(example_y)
    return ans
def tensorize(tensor_like) -> tf.Tensor:
    '''Turn stuff into tensors'''
    return tf.convert_to_tensor(tensor_like, dtype=tf.float32)
def split_dataset(dataset, train_split=0.8, random_state=42):
    '''
    Takes in a list (or tuple) where index 0 contains the inputs and index 1 contains the outputs
    outputs x_train, x_test, y_train, y_test, train_indexes, test_indexes all as tf.Tensor
    '''
    indices = np.arange(len(dataset[0]))
    return tuple([tensorize(data) for data in train_test_split(dataset[0], dataset[1], indices, train_size=train_split, random_state=random_state)])
# how many numbers in each example
K = 5
# the index of the solution
N = 2
# how many examples
EXAMPLE_AMOUNT = 20000
# what percentage of the examples are in the training set
TRAIN_SPLIT = 0.5
# how long to train for
epochs = 50
dataset = create_dataset(K, N, EXAMPLE_AMOUNT)
x_train, x_test, y_train, y_test, train_indexes, test_indexes = split_dataset(dataset, train_split=TRAIN_SPLIT)
model_input = tf.keras.layers.Input(shape=(K,), name="input")
model_dense1 = tf.keras.layers.Dense(10, name="dense1")(model_input)
model_dense2 = tf.keras.layers.Dense(10, name="dense2")(model_dense1)
model_output = tf.keras.layers.Dense(1, name="output")(model_dense2)
model = tf.keras.Model(inputs=model_input, outputs=model_output)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
history = model.fit(x=x_train, y=y_train, validation_data=(x_test, y_test), epochs=epochs)
# the validation loss as Tensorflow computes it
print(history.history["val_loss"][-1]) # 2.1036579710198566e-05
# the validation loss as I compute it
val_loss = tf.math.reduce_mean(tf.keras.losses.MSE(y_test, model.predict(x_test))).numpy()
print(val_loss) # 0.1655631
What you are missing is the shape of y_test:
y_test.numpy().shape
(500,) <-- causing the behaviour
model.predict(x_test) returns shape (500, 1), so when the two are passed to tf.keras.losses.MSE they are broadcast against each other instead of being compared element-wise, which produces a very different number.
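A tiny standalone check makes the broadcasting visible (the shapes are chosen to match the question; the names here are hypothetical):
import numpy as np

y_true = np.random.rand(500)       # shape (500,),  like y_test
y_pred = np.random.rand(500, 1)    # shape (500, 1), like model.predict(x_test)

# the two shapes broadcast against each other, so every prediction is
# compared with every target instead of only its own
print((y_pred - y_true).shape)                   # (500, 500)
print((y_pred - y_true.reshape(-1, 1)).shape)    # (500, 1), the intended pairing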
Simply reshape it like:
val_loss = tf.math.reduce_mean(tf.keras.losses.MSE(y_test.numpy().reshape(-1,1), model.predict(x_test))).numpy()
print(val_loss) # 1.1548506e-05
Also:
history.history["val_loss"][-1] # 1.1548506336112041e-05
Or you can flatten() both arrays when calculating it:
val_loss = tf.math.reduce_mean(tf.keras.losses.MSE(y_test.numpy().flatten(), model.predict(x_test).flatten())).numpy()
print(val_loss) # 1.1548506e-05
I wrote a small "Linear Regression Neural Network Tensorflow Keras Python program".
The input dataset is straight-line data, y = mx + c.
The predicted y values are not correct: they come out as a roughly horizontal line instead of a line with some slope.
I ran this program on a Windows laptop with TensorFlow, Keras and Jupyter Notebook.
What should I do to fix this program?
Thanks and best regards,
SSJ
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n2 = 50
count = 20
n4 = n2 + count
p = 100
m = 10
c = 5
x = np.linspace(n2, n4, p)
y = m * x + c
x
y
plt.scatter(x,y)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
x_normalizer = preprocessing.Normalization(input_shape=[1,])
x_normalizer.adapt(x)
x_normalized = x_normalizer(x)
y_normalizer = preprocessing.Normalization(input_shape=[1,])
y_normalizer.adapt(y)
y_normalized = x_normalizer(y)
y_model = tf.keras.Sequential([
y_normalizer,
layers.Dense(1)
])
y_model.compile(optimizer='rmsprop', loss='mse', metrics = ['mae'])
y_hist = y_model.fit(x, y, epochs=100, verbose=0, validation_split = 0.2)
hist = pd.DataFrame(y_hist.history)
hist['epoch'] = y_hist.epoch
hist.head()
hist.tail()
xin = [51,53,59,64]
ypred = y_model.predict(xin)
ypred
plt.scatter(x, y)
plt.scatter(xin, ypred, color = 'r')
plt.grid(linestyle = '--')
Use StandardScaler instead of Normalization
Normalizer acts row-wise and StandardScaler column-wise.
Normalizer does not remove the mean and scale by the standard deviation; it scales each whole row to unit norm.
Found here: Difference between StandardScaler and Normalizer
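A toy example of that difference, using sklearn's Normalizer for illustration:
import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer

data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0]])

# StandardScaler works per column: each feature gets zero mean and unit variance
print(StandardScaler().fit_transform(data))

# Normalizer works per row: each sample is rescaled to unit norm
print(Normalizer().fit_transform(data))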
This is how you can process the data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import StandardScaler
x = np.linspace(50, 70, 100).reshape(-1, 1)
y = 10 * x + 5
x_standard_scaler = StandardScaler().fit(x)
y_standard_scaler = StandardScaler().fit(y)
x_scaled = x_standard_scaler.transform(x)
y_scaled = y_standard_scaler.transform(y)
Remember that you need two separate scalers for x and y, so don't use the same object for both. Also, if you want to use the scaler to process new data for testing, keep a reference to it. A good practice is not to refit the scaler on test data.
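If you also need the fitted scalers later (for example in a separate prediction script), one option is to persist them, e.g. with joblib (a sketch; the file names are arbitrary):
import joblib

joblib.dump(x_standard_scaler, 'x_scaler.gz')    # save the already-fitted scalers
joblib.dump(y_standard_scaler, 'y_scaler.gz')
x_standard_scaler = joblib.load('x_scaler.gz')   # reload later, no refitting needed
y_standard_scaler = joblib.load('y_scaler.gz')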
model = Sequential([
Dense(1, input_dim=1, activation='linear'),
])
model.compile(optimizer='rmsprop', loss='mse')
history = model.fit(x_scaled, y_scaled, epochs=1000, verbose=0, validation_split = 0.2).history
pd.DataFrame(history).plot()
plt.show()
As you can see, the model is converging. It's worth plotting the loss history, which helps to tell whether your model is learning or not.
x_test = np.linspace(20, 100, 10).reshape(-1, 1)
y_test = 10 * x_test + 5
x_test_scaled = x_standard_scaler.transform(x_test)
y_test_scaled = y_standard_scaler.transform(y_test)
If you have test data that you want to use for validation or prediction, remember to apply the scaler again, but without fitting it. In most cases it should be fitted on the training data only.
y_test_pred_scaled = model.predict(x_test_scaled)
y_test_pred = y_standard_scaler.inverse_transform(y_test_pred_scaled)
plt.scatter(x_test, y_test, s=30, label='true')
plt.scatter(x_test, y_test_pred, s=15, label='pred')
plt.legend()
plt.show()
If you want to rescale your predictions back to their original range, use inverse_transform. Notice that the prediction on x_test after rescaling is very close to y_test.