Neural network for adding two integer numbers - python

I want to create a neural network that can add two integer numbers. I have designed it as follows. The problem is that I get a really low accuracy of 0.002%. What can I do to increase it?
For creating the data:
import numpy as np
import random

a = []
b = []
c = []
for i in range(1, 1001):
    a.append(random.randint(1, 999))
    b.append(random.randint(1, 999))
    c.append(a[i-1] + b[i-1])

X = np.array([a, b]).transpose()
y = np.array(c).transpose().reshape(-1, 1)
Scaling my data:
from sklearn.preprocessing import MinMaxScaler
minmax = MinMaxScaler()
minmax2 = MinMaxScaler()
X = minmax.fit_transform(X)
y = minmax2.fit_transform(y)
The network:
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
clfa = Sequential()
clfa.add(Dense(input_dim=2, output_dim=2, activation='sigmoid', kernel_initializer='he_uniform'))
clfa.add(Dense(output_dim=2, activation='sigmoid', kernel_initializer='uniform'))
clfa.add(Dense(output_dim=2, activation='sigmoid', kernel_initializer='uniform'))
clfa.add(Dense(output_dim=2, activation='sigmoid', kernel_initializer='uniform'))
clfa.add(Dense(output_dim=1, activation='relu'))
opt = SGD(lr=0.01)
clfa.compile(opt, loss='mean_squared_error', metrics=['acc'])
clfa.fit(X, y, epochs=140)
Output:
Epoch 133/140
1000/1000 [==============================] - 0s 39us/step - loss: 0.0012 - acc: 0.0020
Epoch 134/140
1000/1000 [==============================] - 0s 40us/step - loss: 0.0012 - acc: 0.0020
Epoch 135/140
1000/1000 [==============================] - 0s 41us/step - loss: 0.0012 - acc: 0.0020
Epoch 136/140
1000/1000 [==============================] - 0s 40us/step - loss: 0.0012 - acc: 0.0020
Epoch 137/140
1000/1000 [==============================] - 0s 41us/step - loss: 0.0012 - acc: 0.0020
Epoch 138/140
1000/1000 [==============================] - 0s 42us/step - loss: 0.0012 - acc: 0.0020
Epoch 139/140
1000/1000 [==============================] - 0s 40us/step - loss: 0.0012 - acc: 0.0020
Epoch 140/140
1000/1000 [==============================] - 0s 42us/step - loss: 0.0012 - acc: 0.0020
That is my code with the console output. I have tried many different combinations of optimizers, losses, and activations; also, this data fits a linear regression perfectly.

Two mistakes, several issues.
The mistakes:
This is a regression problem, so the activation of the last layer should be linear, not relu (leaving it unspecified will work, since linear is the default activation in a Keras layer).
Accuracy is meaningless in regression; remove metrics=['acc'] from your model compilation and judge your model's performance by the loss alone.
The issues:
We don't use sigmoid activations for the intermediate layers; change all of them to relu.
Remove the kernel_initializer argument, thus leaving the default glorot_uniform, which is the recommended one.
Stacking several Dense layers with only two nodes each is not a good idea; try reducing the number of layers and increasing the number of nodes per layer. See here for a simple example network for the iris data.
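Putting these fixes together, a minimal sketch of a corrected network might look like the following (the layer width of 32 is an illustrative choice, not a tuned value):

from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

clfa = Sequential()
clfa.add(Dense(32, input_dim=2, activation='relu'))   # relu in the hidden layers
clfa.add(Dense(32, activation='relu'))                # default glorot_uniform initializer
clfa.add(Dense(1))                                    # linear output (the default) for regression

opt = SGD(lr=0.01)
clfa.compile(opt, loss='mean_squared_error')          # no accuracy metric; judge by the loss
clfa.fit(X, y, epochs=140)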

You are trying to fit a linear function, but internally you use sigmoid nodes, which map values to the range (0, 1). Sigmoid is very useful for classification, but not really for regression when the values lie outside (0, 1). It could maybe work if you restricted your random numbers to floating-point values in the interval [0, 1]. Or input all the bits of each number into your network separately, and have it learn an adder.
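As an illustration of the bit-level idea, here is a small sketch assuming 10-bit inputs (enough for values up to 999); to_bits is a hypothetical helper, not from the question:

import numpy as np

def to_bits(n, width=10):
    # little-endian bit vector, e.g. 5 -> [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    return [(n >> k) & 1 for k in range(width)]

# encode each (a, b) pair as a 20-dimensional 0/1 input vector
X_bits = np.array([to_bits(ai) + to_bits(bi) for ai, bi in zip(a, b)])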

Why is my multiclass keras model not training at a high accuracy despite parameters?

First I read in my CSV file, which contained a matrix of 1s and 0s:
import pandas as pd

df = pd.read_csv(url)
print(df.head())
print(df.columns)
Next I gathered the pictures and resized them:
import os

image_directory = 'Directory/'
dir_list = os.listdir(image_directory)
print("Files and directories in '", image_directory, "':")
# print the list
print(dir_list)
They were saved into an X2 variable.
SIZE = 200
X_dataset = []
for i in tqdm(range(df.shape[0])):
    img2 = cv2.imread("Cell{}.png".format(i), cv2.IMREAD_UNCHANGED)
    img = tf.keras.preprocessing.image.load_img(image_directory + df['ID'][i], target_size=(SIZE, SIZE, 3))
    # numpy array of each image at size 200, 200, 3 (color)
    img = np.array(img)
    img = img / 255.
    X_dataset.append(img)

X2 = np.array(X_dataset)
print(X2.shape)
I created the y2 data by taking the CSV data, dropping two columns, and getting a shape of (1000, 16):
y2 = np.array(df.drop(['Outcome', 'ID'], axis=1))
print(y2.shape)
I then did the train_test_split. I wonder if my random_state or test_size is suboptimal:
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, random_state=10, test_size=0.3)
Next, I created a sequential model. SIZE = 200 was set above when resizing, so the input shape is (200, 200, 3).
model2 = Sequential()
model2.add(Conv2D(filters=16, kernel_size=(10, 10), activation="relu", input_shape=(SIZE,SIZE,3)))
model2.add(BatchNormalization())
model2.add(MaxPooling2D(pool_size=(5, 5)))
model2.add(Dropout(0.2))
model2.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(BatchNormalization())
model2.add(Dropout(0.2))
model2.add(Conv2D(filters=64, kernel_size=(5, 5), activation="relu"))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(BatchNormalization())
model2.add(Dropout(0.2))
model2.add(Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(BatchNormalization())
model2.add(Dropout(0.2))
model2.add(Flatten())
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.5))
model2.add(Dense(128, activation='relu'))
model2.add(Dropout(0.5))
model2.add(Dense(16, activation='sigmoid'))
#Do not use softmax for multilabel classification
#Softmax is useful for mutually exclusive classes, either cat or dog but not both.
#Also, softmax outputs all add to 1. So good for multi class problems where each
#class is given a probability and all add to 1. Highest one wins.
#Sigmoid outputs probability. Can be used for non-mutually exclusive problems.
#like multi label, in this example.
#But, also good for binary mutually exclusive (cat or not cat).
model2.summary()
#Binary cross entropy for each label. So not really a binary classification problem, but
#calculating binary cross entropy for each label.
opt = tf.keras.optimizers.Adamax(
    learning_rate=0.02,
    beta_1=0.8,
    beta_2=0.9999,
    epsilon=1e-9,
    name='Adamax')
model2.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy', 'mse' ])
The model uses a custom optimizer, and it has 473,632 trainable parameters.
I then specify the sample weights, which were calculated by taking the largest class count and dividing the other counts by it:
sample_weight = {
    0: 1,
    1: 0.5197368421,
    2: 0.4385964912,
    3: 0.2324561404,
    4: 0.2302631579,
    5: 0.399122807,
    6: 0.08114035088,
    7: 0.5723684211,
    8: 0.08552631579,
    9: 0.2061403509,
    10: 0.3815789474,
    11: 0.125,
    12: 0.08333333333,
    13: 0.1206140351,
    14: 0.1403508772,
    15: 0.4824561404
}
Finally, I ran model.fit:
history = model2.fit(X_train2, y_train2, epochs=25, validation_data=(X_test2, y_test2), batch_size=64, class_weight = sample_weight, shuffle = False)
My issue is that the model maxes out at around 30 to 40% accuracy. I looked into it, and people said tuning the learning rate was important. I also saw that raising the epochs helps to a point, as does lowering the batch size.
Is there anything else I may have missed? I noticed the worse models frequently predicted only one class (100% normal, 0% anything else), but the better model predicted on a sliding scale where some items were at 10% and some at 70%.
I also wonder if I inverted my sample weights; my item 0 has the most items in it... Should they be inverted, so that 1 sample of class 1 counts for 2 samples of class 0?
Thanks.
Things I tried:
Changing the batch size down to 16 or 8. (resulted in longer epoch times, slightly better results)
Changing the learning rate to a lower number (resulted in slightly better results, but over more epochs)
Changing it to 100 epochs (results plateaued around 20 epochs usually.)
Attempting to create more parameters: higher filter counts, a larger initial kernel size, a larger initial pool size, and more and larger Dense layers. (This resulted in it eating the RAM and not getting much better results.)
Changing the optimizer to Adam, RAdam, or Adamax. (Didn't really change much; the other optimizers sucked, though.) I messed with beta_1 and epsilon too.
Revising the CSV. (The data is fairly vague; I had help and it was still hard to tell.)
Removing bad data (I didn't want to get rid of too many pictures.)
Edit: Added sample accuracy. This run was unusually low, but it starts off well enough (accuracy is initially 25.9%):
Epoch 1/25
14/14 [==============================] - 79s 6s/step - loss: 0.4528 - accuracy: 0.2592 - mse: 0.1594 - val_loss: 261.8521 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 2/25
14/14 [==============================] - 85s 6s/step - loss: 0.2817 - accuracy: 0.3188 - mse: 0.1310 - val_loss: 22.7037 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 3/25
14/14 [==============================] - 79s 6s/step - loss: 0.2611 - accuracy: 0.3555 - mse: 0.1243 - val_loss: 11.9977 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 4/25
14/14 [==============================] - 80s 6s/step - loss: 0.2420 - accuracy: 0.3521 - mse: 0.1172 - val_loss: 6.6056 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 5/25
14/14 [==============================] - 80s 6s/step - loss: 0.2317 - accuracy: 0.3899 - mse: 0.1151 - val_loss: 4.9567 - val_accuracy: 0.3881 - val_mse: 0.1415
Epoch 6/25
14/14 [==============================] - 80s 6s/step - loss: 0.2341 - accuracy: 0.3899 - mse: 0.1141 - val_loss: 2.7395 - val_accuracy: 0.3881 - val_mse: 0.1389
Epoch 7/25
14/14 [==============================] - 76s 5s/step - loss: 0.2277 - accuracy: 0.4128 - mse: 0.1107 - val_loss: 2.3758 - val_accuracy: 0.3881 - val_mse: 0.1375
Epoch 8/25
14/14 [==============================] - 85s 6s/step - loss: 0.2199 - accuracy: 0.4106 - mse: 0.1094 - val_loss: 1.4526 - val_accuracy: 0.3881 - val_mse: 0.1319
Epoch 9/25
14/14 [==============================] - 76s 5s/step - loss: 0.2196 - accuracy: 0.4151 - mse: 0.1086 - val_loss: 0.7962 - val_accuracy: 0.3881 - val_mse: 0.1212
Epoch 10/25
14/14 [==============================] - 80s 6s/step - loss: 0.2187 - accuracy: 0.4140 - mse: 0.1087 - val_loss: 0.6308 - val_accuracy: 0.3744 - val_mse: 0.1211
Epoch 11/25
14/14 [==============================] - 81s 6s/step - loss: 0.2175 - accuracy: 0.4071 - mse: 0.1086 - val_loss: 0.5986 - val_accuracy: 0.3242 - val_mse: 0.1170
Epoch 12/25
14/14 [==============================] - 80s 6s/step - loss: 0.2087 - accuracy: 0.3968 - mse: 0.1034 - val_loss: 0.4003 - val_accuracy: 0.3333 - val_mse: 0.1092
Epoch 13/25
12/14 [========================>.....] - ETA: 10s - loss: 0.2092 - accuracy: 0.3945 - mse: 0.1044
Here are some notes that might help:
When using Batch Normalization, avoid too small batch sizes. For more details see the Group Normalization paper by Yuxin Wu and Kaiming He.
It may be worth looking at metrics like AUC and F1 as well, since you have an imbalanced multilabel case. You can add tf.keras.metrics.AUC(curve='PR') to your metrics list.
The training loss seems to have stalled at the end of epoch 13. If the training loss does not decrease anymore, you may want to (1) use a smaller learning rate and/or (2) reduce your dropout parameters. In particular, the relatively large dropout right before your last layer seems suspicious to me. First, try to obtain a model that fits your training dataset well, with low to no dropout; that is an important step. If your model cannot fit the training dataset well without any regularization, it may need more trainable parameters. After achieving a minimal model that fits the training dataset, you can add regularization mechanisms to mitigate the overfitting issue.
Unless you have a good reason to do otherwise, set shuffle=True (which is also the default) to shuffle the training data before each epoch.
Although it is probably not the root cause of your problem, there is a debate on whether the normalization should come before the activation or after that. Some prefer to use it before the activation.
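As a small sketch of the normalization-before-activation variant mentioned above (an illustrative fragment, assuming Activation is imported from tensorflow.keras.layers):

model2.add(Conv2D(filters=16, kernel_size=(10, 10), input_shape=(SIZE, SIZE, 3)))
model2.add(BatchNormalization())   # normalize the pre-activations
model2.add(Activation('relu'))     # apply the activation after normalization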
The following was not clear to me:
"I then specify the sample weight which was calculated by taking the highest sampled number and divided the other numbers by that."
Your class weights may already have been calculated correctly. I'd like to emphasize, though, that an under-represented class should be assigned a larger weight. Refer to this tutorial from TensorFlow as needed.
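As a hedged sketch of that inverse-frequency idea, assuming y_train2 is the (samples, 16) multi-hot label matrix from the question:

import numpy as np

# counts[k] = number of positive examples for label k
counts = y_train2.sum(axis=0)
n_samples = y_train2.shape[0]

# under-represented labels receive larger weights
class_weight = {k: n_samples / (len(counts) * c) for k, c in enumerate(counts)}

Note that Keras's class_weight handling for multi-label targets is limited, so verify the behaviour on your Keras version before relying on it.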

Trying to predict numbers in a LSTM and having extremely high loss (even with MinMax Scaler and Dropout)

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dropout
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler  # needed for the scaling below
def split_univariate_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
n_steps_in, n_steps_out = 30, 30
X1, y1 = split_univariate_sequence(sumpred, n_steps_in, n_steps_out)
transformer = MinMaxScaler()
X1_transformed = transformer.fit_transform(X1)
n_features = 1
X1_transformed = X1_transformed.reshape((X1_transformed.shape[0], X1_transformed.shape[1], n_features))
model = Sequential()
model.add(LSTM(150, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(Dropout(0.3))
model.add(LSTM(50, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
model.fit(X1_transformed, y1, epochs=1000, verbose=1)
# demonstrate prediction
x_input = sumpred[-30:].reshape(1, -1)
x_input = transformer.transform(x_input)
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=1)
yhat_inverse = transformer.inverse_transform(yhat)
sumpred is a float32 array of shape (144,) with values ranging from 390.624 to 347471. I'm trying to predict the next 30 numbers based on the last 30 sumpred values.
When I train the model, I have results like this:
Epoch 990/1000
85/85 [==============================] - 0s 2ms/step - loss: 1031220211.9529
Epoch 991/1000
85/85 [==============================] - 0s 2ms/step - loss: 1087168440.4706
Epoch 992/1000
85/85 [==============================] - 0s 2ms/step - loss: 1011368153.6000
Epoch 993/1000
85/85 [==============================] - 0s 2ms/step - loss: 1104842800.1882
Epoch 994/1000
85/85 [==============================] - 0s 2ms/step - loss: 1086514331.1059
Epoch 995/1000
85/85 [==============================] - 0s 2ms/step - loss: 1050088100.8941
Epoch 996/1000
85/85 [==============================] - 0s 2ms/step - loss: 1003426751.2471
Epoch 997/1000
85/85 [==============================] - 0s 2ms/step - loss: 1139417025.5059
Epoch 998/1000
85/85 [==============================] - 0s 2ms/step - loss: 1129283814.4000
Epoch 999/1000
85/85 [==============================] - 0s 2ms/step - loss: 1107968009.0353
Epoch 1000/1000
85/85 [==============================] - 0s 2ms/step - loss: 1651960831.6235
The values in yhat_inverse are far from what I expect. It was no better with other losses, like mean squared logarithmic error. Even with the data transformation (MinMaxScaler) and the Dropout layers, I'm still having this issue.
Does anyone have a clue how to improve my model's performance?
Your model is not able to learn, so first increase the size of the network. Given how large the loss is, the input values are quite large and you are not giving the neural network enough capacity to learn the data.
Remove the dropouts first, increase the number of layers, and keep them all at 150 units or more.
Dropout is usually added towards the end, when you see overfitting; your model has not even started learning.
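A minimal sketch of that suggestion, reusing the pipeline above with the dropouts removed and all LSTM layers at 150 units (the extra layer is an illustrative choice):

model = Sequential()
model.add(LSTM(150, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(150, activation='relu', return_sequences=True))
model.add(LSTM(150, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
model.fit(X1_transformed, y1, epochs=1000, verbose=1)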

Keras Neural Network Accuracy is always 0 While Training

I'm making a simple classification algo with a Keras neural network. The goal is to take 3 data points on weather and decide whether or not there's a wildfire. Here's an image of the .csv dataset that I'm using to train the model (this image is only the top few lines, not the entire thing):
wildfire weather dataset
As you can see, there are 4 columns, with the fourth being either a "1", meaning "fire", or a "0", meaning "no fire". I want the algo to predict either a 1 or a 0. This is the code that I wrote:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import csv
#THIS IS USED TO TRAIN THE MODEL
# Importing the dataset
dataset = pd.read_csv('Fire_Weather.csv')
dataset.head()
X=dataset.iloc[:,0:3]
Y=dataset.iloc[:,3]
X.head()
obj=StandardScaler()
X=obj.fit_transform(X)
X_train,X_test,y_train,y_test=train_test_split(X, Y, test_size=0.25)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 3))
# classifier.add(Dropout(p = 0.1))
# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
# classifier.add(Dropout(p = 0.1))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 3, epochs = 10)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
print(y_pred)
classifier.save("weather_model.h5")
The problem is that whenever I run this, my accuracy is always "0.0000e+00" and my training output looks like this:
Epoch 1/10
2146/2146 [==============================] - 2s 758us/step - loss: nan - accuracy: 0.0238
Epoch 2/10
2146/2146 [==============================] - 1s 625us/step - loss: nan - accuracy: 0.0000e+00
Epoch 3/10
2146/2146 [==============================] - 1s 604us/step - loss: nan - accuracy: 0.0000e+00
Epoch 4/10
2146/2146 [==============================] - 1s 609us/step - loss: nan - accuracy: 0.0000e+00
Epoch 5/10
2146/2146 [==============================] - 1s 624us/step - loss: nan - accuracy: 0.0000e+00
Epoch 6/10
2146/2146 [==============================] - 1s 633us/step - loss: nan - accuracy: 0.0000e+00
Epoch 7/10
2146/2146 [==============================] - 1s 481us/step - loss: nan - accuracy: 0.0000e+00
Epoch 8/10
2146/2146 [==============================] - 1s 476us/step - loss: nan - accuracy: 0.0000e+00
Epoch 9/10
2146/2146 [==============================] - 1s 474us/step - loss: nan - accuracy: 0.0000e+00
Epoch 10/10
2146/2146 [==============================] - 1s 474us/step - loss: nan - accuracy: 0.0000e+00
Does anyone know why this is happening and what I could do to my code to fix this?
Thank You!
EDIT: I realized that my earlier response was highly misleading, which was thankfully pointed out by @xdurch0 and @Timbus Calin. Here is an edited answer.
Check that all your input values are valid. Are there any nan or inf values in your training data?
Try using different activation functions. ReLU is good, but it is prone to what is known as the dying ReLU problem, where the neural network basically learns nothing because no updates are made to its weights. One possibility is to use LeakyReLU or PReLU.
Try using gradient clipping, a technique for tackling vanishing or exploding gradients (which is likely what is happening in your case). Keras lets you set clipnorm or clipvalue on an optimizer. A short sketch of these points follows after this list.
There are posts on SO that report similar problems, such as this one, which might also be of interest to you.
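As a quick sketch of the first three points (variable names taken from the question; the clipnorm value of 1.0 and the LeakyReLU alpha are illustrative choices):

import numpy as np
from keras.layers import Dense, LeakyReLU
from keras.optimizers import Adam

# 1. check for invalid values before training
print(np.isnan(X_train).any(), np.isinf(X_train).any())

# 2. a LeakyReLU hidden layer instead of plain ReLU
classifier.add(Dense(units=6, input_dim=3))
classifier.add(LeakyReLU(alpha=0.1))

# 3. gradient clipping via the optimizer
classifier.compile(optimizer=Adam(clipnorm=1.0), loss='binary_crossentropy', metrics=['accuracy'])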

EarlyStopping TensorFlow 2.0

I am running a code using Python 3.7.5 with TensorFlow 2.0 for MNIST classification.
I am using EarlyStopping from TensorFlow 2.0 and the callback I have for it is:
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3,
        min_delta=0.001
    )
]
According to the EarlyStopping - TensorFlow 2.0 page, the definition of the min_delta parameter is as follows:
min_delta: Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.
Train on 60000 samples, validate on 10000 samples
Epoch 1/15 60000/60000 [==============================] - 10s 173us/sample - loss: 0.2040 - accuracy: 0.9391 - val_loss: 0.1117 - val_accuracy: 0.9648
Epoch 2/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0845 - accuracy: 0.9736 - val_loss: 0.0801 - val_accuracy: 0.9748
Epoch 3/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0574 - accuracy: 0.9817 - val_loss: 0.0709 - val_accuracy: 0.9795
Epoch 4/15 60000/60000 [==============================] - 9s 149us/sample - loss: 0.0434 - accuracy: 0.9858 - val_loss: 0.0787 - val_accuracy: 0.9761
Epoch 5/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0331 - accuracy: 0.9893 - val_loss: 0.0644 - val_accuracy: 0.9808
Epoch 6/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0275 - accuracy: 0.9910 - val_loss: 0.0873 - val_accuracy: 0.9779
Epoch 7/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0232 - accuracy: 0.9921 - val_loss: 0.0746 - val_accuracy: 0.9805
Epoch 8/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0188 - accuracy: 0.9936 - val_loss: 0.1088 - val_accuracy: 0.9748
Now if I look at the last three epochs, viz. epochs 6, 7, and 8, and at their validation losses ('val_loss'), the values are:
0.0688, 0.0843 and 0.0847.
And the differences between the 3 consecutive terms are: 0.0155 and 0.0004. But isn't the first difference greater than 'min_delta' as defined in the callback?
The code I came up with for EarlyStopping is as follows:
import numpy as np

# list to hold last 'patience = 3' values-
pv = [0.0688, 0.0843, 0.0847]

# numpy array of differences between consecutive elements in 'pv'-
differences = np.diff(pv, n=1)
differences
# array([0.0155, 0.0004])

# minimum change required for monitored metric's improvement-
min_delta = 0.001

# Check whether the consecutive differences are greater than 'min_delta'-
check = differences > min_delta
check
# array([ True, False])

# Condition to see whether all 3 'val_loss' differences are less than 'min_delta'
# for training to stop since EarlyStopping is called-
if np.all(check == False):
    print("Stop Training - EarlyStopping is called")
    # stop training
But according to 'val_loss', not all of the differences between the last 3 epochs are greater than the 'min_delta' of 0.001. For example, the first difference is greater than 0.001 (0.0843 - 0.0688), while the second difference is less than 0.001 (0.0847 - 0.0843).
Also, according to patience parameter definition of "EarlyStopping":
patience: Number of epochs with no improvement after which training will be stopped.
So, EarlyStopping should be called when there is no improvement in 'val_loss' for 3 consecutive epochs, where an absolute change of less than 'min_delta' counts as no improvement.
Then why is EarlyStopping called?
Code for model definition and 'fit()' are:
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
import matplotlib.pyplot as plt
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras import models, layers, datasets
from tensorflow.keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from tensorflow.keras.models import Sequential, Model
# Specify the parameters to be used for layer-wise pruning; NO PRUNING is done here:
pruning_params_unpruned = {
    'pruning_schedule': sparsity.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.0,
        begin_step=0, end_step=0, frequency=100)
}
def pruned_nn(pruning_params):
    """
    Function to define the architecture of a neural network model
    following the 300-100 architecture for the MNIST dataset, using
    the provided parameters to prune the model.
    Input: 'pruning_params' - Python 3 dictionary containing the parameters used for pruning
    Output: the designed and compiled neural network model
    """
    pruned_model = Sequential()
    pruned_model.add(InputLayer(input_shape=(784, )))
    pruned_model.add(Flatten())
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=300, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(layers.Dropout(0.2))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=100, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(layers.Dropout(0.1))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=num_classes, activation='softmax'),
        **pruning_params))

    # Compile the pruned model-
    pruned_model.compile(
        loss=tf.keras.losses.categorical_crossentropy,
        # optimizer='adam',
        optimizer=tf.keras.optimizers.Adam(lr=0.001),
        metrics=['accuracy'])

    return pruned_model
batch_size = 32
epochs = 50

# Instantiate NN-
orig_model = pruned_nn(pruning_params_unpruned)

# Train unpruned Neural Network-
history_orig = orig_model.fit(
    x=X_train, y=y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=callbacks,
    validation_data=(X_test, y_test),
    shuffle=True)
The behaviour of the Early Stopping callback is related to:
Metric or Loss to be monitored
min_delta, which is the minimum change to be considered an improvement between the monitored quantity in the current epoch and the best result seen so far for that metric.
patience, which is the number of epochs without improvement (where an improvement must be a change greater than min_delta) after which training is stopped.
In your case, the best val_loss is 0.0644, and a later value would have to be lower than 0.0634 to be registered as an improvement:
Epoch 6/15 val_loss: 0.0873 | Difference is: + 0.0229
Epoch 7/15 val_loss: 0.0746 | Difference is: + 0.0102
Epoch 8/15 val_loss: 0.1088 | Difference is: + 0.0444
Be aware that the quantities printed in the training log are approximate, so you shouldn't base your conclusions on them; consider instead the true values kept by the callbacks or in the training history.
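A minimal sketch of the callback's bookkeeping (val_loss values taken from the log above) shows why training stops after epoch 8:

import numpy as np

val_losses = [0.1117, 0.0801, 0.0709, 0.0787, 0.0644, 0.0873, 0.0746, 0.1088]
min_delta, patience = 0.001, 3

best, wait = np.inf, 0
for epoch, loss in enumerate(val_losses, start=1):
    if loss < best - min_delta:   # improvement is measured against the best so far
        best, wait = loss, 0
    else:
        wait += 1
        if wait >= patience:
            print("Stop training at epoch", epoch, "- best val_loss:", best)
            break

Epochs 6, 7, and 8 all fail to beat the best value of 0.0644 by more than min_delta, so patience runs out and training stops.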

How to train and tune an artificial multilayer perceptron neural network using Keras?

I am building my first artificial multilayer perceptron neural network using Keras.
This is my input data: [image of the input dataset]
This is the code I used to build my initial model, which basically follows the Keras example code:
model = Sequential()
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16)
Output:
Epoch 1/20
1213/1213 [==============================] - 0s - loss: 0.1760
Epoch 2/20
1213/1213 [==============================] - 0s - loss: 0.1840
Epoch 3/20
1213/1213 [==============================] - 0s - loss: 0.1816
Epoch 4/20
1213/1213 [==============================] - 0s - loss: 0.1915
Epoch 5/20
1213/1213 [==============================] - 0s - loss: 0.1928
Epoch 6/20
1213/1213 [==============================] - 0s - loss: 0.1964
Epoch 7/20
1213/1213 [==============================] - 0s - loss: 0.1948
Epoch 8/20
1213/1213 [==============================] - 0s - loss: 0.1971
Epoch 9/20
1213/1213 [==============================] - 0s - loss: 0.1899
Epoch 10/20
1213/1213 [==============================] - 0s - loss: 0.1957
Epoch 11/20
1213/1213 [==============================] - 0s - loss: 0.1923
Epoch 12/20
1213/1213 [==============================] - 0s - loss: 0.1910
Epoch 13/20
1213/1213 [==============================] - 0s - loss: 0.2104
Epoch 14/20
1213/1213 [==============================] - 0s - loss: 0.1976
Epoch 15/20
1213/1213 [==============================] - 0s - loss: 0.1979
Epoch 16/20
1213/1213 [==============================] - 0s - loss: 0.2036
Epoch 17/20
1213/1213 [==============================] - 0s - loss: 0.2019
Epoch 18/20
1213/1213 [==============================] - 0s - loss: 0.1978
Epoch 19/20
1213/1213 [==============================] - 0s - loss: 0.1954
Epoch 20/20
1213/1213 [==============================] - 0s - loss: 0.1949
How do I train and tune this model and get my code to output the best predictive model? I am new to neural networks and am wholly confused as to what the next step is after building the model. I know I want to optimize it, but I'm not sure which features to tweak, whether I am supposed to do it manually, or how to write code to do so.
Some things that you could do are:
Change your loss function from mean_squared_error to binary_crossentropy. mean_squared_error is intended for regression, but you want to classify your data.
Add show_accuracy=True to your fit() function, which outputs the accuracy of your model at every epoch. That information is probably more useful to you than just the loss value.
Add validation_split=0.2 to your fit() function. Currently you are only training on a training set and validating on nothing. That's a no-go in machine learning as you can't be sure that your model hasn't simply memorized the correct answers for your dataset (without really understanding why these answers are correct).
Change from Obama/Romney to Democrat/Republican and add data from previous elections. ~1200 examples is a pretty small dataset for neural networks. Also add columns with valuable information, like the unemployment rate or population density. Note that quite a few of the values (like population) are probably tantamount to providing the name of the state, so your net will likely learn that, e.g., Texas means Republican.
If you haven't done that already, normalize all your values to the range 0 to 1 (subtract the column minimum from each value, then divide by the column's (max - min)). Neural networks handle normalized data better than unnormalized data.
Try Adam and Adagrad instead of SGD. Sometimes they perform better. (See documentation about optimizers.)
Try Activation('relu'), LeakyReLU, PReLU and ELU instead of Activation('tanh'). Tanh is rarely the best choice. (See advanced activation functions.)
Try increasing/decreasing your dense layers sizes (e.g. from 64 to 128). Also try adding/removing layers.
Try adding BatchNormalization layers (before the Activation layers). (See documentation.)
Try changing the dropout rates (e.g. from 0.5 to 0.25).
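A hedged sketch combining several of these suggestions, written in the same Keras 0.x style as the question (the layer width of 128 and dropout of 0.25 are illustrative choices):

model = Sequential()
model.add(Dense(128, input_dim=14, init='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(128, init='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))

# binary_crossentropy for classification, Adam instead of SGD
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
          validation_split=0.2, show_accuracy=True)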
