When running the first training epoch of a 3-layer ConvNet on CIFAR-10, I am unable to reach a reasonable validation accuracy or to make any progress minimizing the objective function.
Specifically, the accuracy varies on the first iteration, and then settles at 8.7% for the following iterations. What's peculiar is that I've also trained a 2-layer, fully-connected network which does substantially better, consistently getting around 43% accuracy on the validation set.
NOTE: The bulk of the code is from a Jupyter notebook designed as an introduction to barebones TensorFlow (and Keras), provided as part of an assignment for Stanford's CS231n: Convolutional Neural Networks for Visual Recognition. Although I am neither a student of the course nor of the university, I am doing this purely for the experience and out of a newfound interest in CV / deep learning.
My only contributions are the implementation of the forward pass and the initialization of the network's parameters.
The author of the notebook left a comment stating that when correctly implemented this model should achieve above 40% accuracy after the first epoch without any hyperparameter tuning.
Implementation Notes
49,000 / 1,000 train/validation split, batch size = 64
Weights are initialized using Kaiming normalization, biases are initialized to 0
Learning rate = 3e-3
Here are the layers of the convnet in detail (a rough Keras sketch of the same architecture follows this list):
Convolutional layer (with bias) with 32 5x5 filters, with zero-padding 2
ReLU
Convolutional layer (with bias) with 16 3x3 filters, with zero-padding 1
ReLU
Fully-connected layer (with bias) to compute scores for 10 classes
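For reference, here is a rough tf.keras.Sequential sketch of the architecture above (just for illustration, not the assignment's barebones implementation; Conv2D and Dense include biases by default, and 'same' padding gives the zero-padding of 2 and 1 for the 5x5 and 3x3 kernels at stride 1):
import tensorflow as tf

reference_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, padding='same', activation='relu',
                           input_shape=(32, 32, 3)),                   # conv1 + ReLU
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),  # conv2 + ReLU
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),                                         # class scores (logits)
])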
Code
(my code is written between the 'TODO' comment blocks)
import tensorflow as tf
import numpy as np
def load_cifar10(num_training=49000, num_validation=1000, num_test=10000):
cifar10 = tf.keras.datasets.cifar10.load_data()
(X_train, y_train), (X_test, y_test) = cifar10
X_train = np.asarray(X_train, dtype=np.float32)
y_train = np.asarray(y_train, dtype=np.int32).flatten()
X_test = np.asarray(X_test, dtype=np.float32)
y_test = np.asarray(y_test, dtype=np.int32).flatten()
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]
mean_pixel = X_train.mean(axis=(0, 1, 2), keepdims=True)
std_pixel = X_train.std(axis=(0, 1, 2), keepdims=True)
X_train = (X_train - mean_pixel) / std_pixel
X_val = (X_val - mean_pixel) / std_pixel
X_test = (X_test - mean_pixel) / std_pixel
return X_train, y_train, X_val, y_val, X_test, y_test
class Dataset(object):
def __init__(self, X, y, batch_size, shuffle=False):
assert X.shape[0] == y.shape[0], 'Got different numbers of data and labels'
self.X, self.y = X, y
self.batch_size, self.shuffle = batch_size, shuffle
def __iter__(self):
N, B = self.X.shape[0], self.batch_size
idxs = np.arange(N)
if self.shuffle:
np.random.shuffle(idxs)
return iter((self.X[i:i+B], self.y[i:i+B]) for i in range(0, N, B))
def flatten(x):
N = tf.shape(x)[0]
return tf.reshape(x, (N, -1))
def three_layer_convnet(x, params):
conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
scores = None
############################################################################
# TODO: Implement the forward pass for the three-layer ConvNet. #
############################################################################
h1_conv = tf.nn.conv2d(x,
conv_w1 + conv_b1,
strides=[1, 1, 1, 1],
padding='SAME'
)
h1 = tf.nn.relu(h1_conv)
h2_conv = tf.nn.conv2d(h1,
conv_w2 + conv_b2,
strides=[1, 1, 1, 1],
padding='SAME'
)
h2 = tf.nn.relu(h2_conv)
fc_params = flatten(fc_w + fc_b)
h2 = flatten(h2)
scores = tf.matmul(h2, fc_params)
############################################################################
# END OF YOUR CODE #
############################################################################
return scores
def training_step(scores, y, params, learning_rate):
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
loss = tf.reduce_mean(losses)
grad_params = tf.gradients(loss, params)
new_weights = []
for w, grad_w in zip(params, grad_params):
new_w = tf.assign_sub(w, learning_rate * grad_w)
new_weights.append(new_w)
with tf.control_dependencies(new_weights):
return tf.identity(loss)
def check_accuracy(sess, dset, x, scores, is_training=None):
num_correct, num_samples = 0, 0
for x_batch, y_batch in dset:
feed_dict = {x: x_batch, is_training: 0}
scores_np = sess.run(scores, feed_dict=feed_dict)
y_pred = scores_np.argmax(axis=1)
num_samples += x_batch.shape[0]
num_correct += (y_pred == y_batch).sum()
acc = float(num_correct) / num_samples
print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
def kaiming_normal(shape):
if len(shape) == 2:
fan_in, fan_out = shape[0], shape[1]
elif len(shape) == 4:
fan_in, fan_out = np.prod(shape[:3]), shape[3]
return tf.random_normal(shape) * np.sqrt(2.0 / fan_in)
def three_layer_convnet_init():
params = None
############################################################################
# TODO: Initialize the parameters of the three-layer network. #
############################################################################
conv_w1 = tf.Variable(kaiming_normal((5, 5, 3, 32)))
conv_b1 = tf.Variable(tf.zeros((32,)))
conv_w2 = tf.Variable(kaiming_normal((3, 3, 32, 16)))
conv_b2 = tf.Variable(tf.zeros((16,)))
fc_w = tf.Variable(kaiming_normal((32 * 32 * 16, 10)))
fc_b = tf.Variable(tf.zeros((10,)))
params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
############################################################################
# END OF YOUR CODE #
############################################################################
return params
def main():
learning_rate = 3e-3
tf.reset_default_graph()
is_training = tf.placeholder(tf.bool, name='is_training')
X_train, y_train, X_val, y_val, X_test, y_test = load_cifar10()
train_dset = Dataset(X_train, y_train, batch_size=64, shuffle=True)
test_dset = Dataset(X_test, y_test, batch_size=64)
val_dset = Dataset(X_val, y_val, batch_size=64, shuffle=False)
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape, y_train.dtype)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
device = '/cpu:0'
with tf.device(device):
x = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int32, [None])
params = three_layer_convnet_init()
scores = three_layer_convnet(x, params)
loss = training_step(scores, y, params, learning_rate)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for t, (x_np, y_np) in enumerate(train_dset):
feed_dict = {x: x_np, y: y_np}
loss_np = sess.run(loss, feed_dict=feed_dict)
if t % 100 == 0:
print('Iteration %d, loss = %.4f' % (t, loss_np))
check_accuracy(sess, val_dset, x, scores, is_training)
if __name__=="__main__":
main()
EDIT: removed unnecessary comments and code
The problem is here:
h1_conv = tf.nn.conv2d(x,
conv_w1 + conv_b1,
strides=[1, 1, 1, 1],
padding='SAME'
)
This is wrong: here you are adding the bias values (conv_b1) to the filter conv_w1, but the bias has to be added to the output of the conv layer. The right way would be something like this:
h1_conv = tf.nn.conv2d(x,
conv_w1,
strides=[1, 1, 1, 1],
padding='SAME'
)
h1_bias = tf.nn.bias_add(h1_conv, conv_b1)
h1 = tf.nn.relu(h1_bias)
And correct it for h2 too.
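For example (same pattern, keeping the variable names from the question):
h2_conv = tf.nn.conv2d(h1, conv_w2, strides=[1, 1, 1, 1], padding='SAME')
h2_bias = tf.nn.bias_add(h2_conv, conv_b2)   # add the bias to the conv output
h2 = tf.nn.relu(h2_bias)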
Related
I am working on a Pytorch LSTM model that is able to detect patterns in sequence of N variables that leads to a good outcome vs bad outcome.
I tested it with a simple test-case which has 12 training examples.
I see that using a batch size of 1 (i.e. computing the loss and calling .backward() and .step() after every sample) results in the 1s and 0s being separated much more than when the loss is computed once with all samples. The difference is shown below.
Can someone give me the intuition behind this behavior?
Is there a rule of thumb for the batch size to use based on the size of the training data? A smaller batch size makes things slower, but I assume there is also a higher chance that training may not converge?
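For context, the only structural difference between the two runs is how many optimizer steps happen per epoch; here is a toy sketch (the tensor shapes are only illustrative, not my real features):
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the 12-example training set
dataset = TensorDataset(torch.randn(12, 4, 2), torch.randint(0, 2, (12, 1)).float())

# batch_size=1 -> 12 batches -> 12 loss.backward()/optimizer.step() calls per epoch
loader_bs1 = DataLoader(dataset, batch_size=1, shuffle=True)
# batch_size >= 12 -> one batch -> a single step per epoch, on the loss averaged over all samples
loader_full = DataLoader(dataset, batch_size=12, shuffle=True)
print(len(loader_bs1), len(loader_full))   # 12, 1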
Final predictions with batch size 1
Train set targets: [[0.0]], predictions:[0.24779687821865082]
Train set targets: [[1.0]], predictions:[0.9567258954048157]
Train set targets: [[1.0]], predictions:[0.8191764950752258]
Train set targets: [[0.0]], predictions:[0.20435290038585663]
Train set targets: [[0.0]], predictions:[0.1295892596244812]
Train set targets: [[1.0]], predictions:[0.9186112284660339]
Train set targets: [[1.0]], predictions:[0.6797895431518555]
Train set targets: [[1.0]], predictions:[0.9642216563224792]
Train set targets: [[1.0]], predictions:[0.9764360785484314]
Train set targets: [[0.0]], predictions:[0.670409619808197]
Train set targets: [[1.0]], predictions:[0.7026165723800659]
Train set targets: [[1.0]], predictions:[0.8404821157455444]
Test set targets: [[0.0]], predictions:[0.08575308322906494]
Test set targets: [[1.0]], predictions:[0.7602054476737976]
Test set targets: [[1.0]], predictions:[0.7767713069915771]
Final predictions with batch size that includes all samples
Train set
targets: [[0.0], [1.0], [0.0], [1.0], [1.0], [1.0], [1.0], [0.0], [1.0], [1.0], [0.0], [1.0]],
predictions:[0.6307732462882996, 0.6873687505722046, 0.6007956862449646, 0.7481836080551147, 0.7676156759262085, 0.6568607091903687, 0.7259970307350159, 0.597843587398529, 0.6819412708282471, 0.6660482287406921, 0.5785030126571655, 0.7716434597969055]
Test set
Targets: [[0.0], [1.0], [1.0]],
Predictions:[0.36408719420433044, 0.7265898585319519, 0.7854364514350891]
Code
Model
class LSTMModel(nn.Module):
def __init__(self, input_dim, hidden_dim):
super(LSTMModel, self).__init__()
self.input_dim = input_dim
self.hidden_dim = hidden_dim
# LSTM
self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
# Readout layer
self.fc = nn.Linear(hidden_dim, 1)
self.sigmoid = nn.Sigmoid()
def forward(self, inputs):
mlp_out, (hidden, _) = self.lstm(inputs)
output = self.fc(hidden)
output = self.sigmoid(output)
return output
Dataset code
class LSTMDataset(Dataset):
def __init__(self, x, y):
self.x = x
self.y = y
def __len__(self):
return len(self.y)
def __getitem__(self,idx):
inputs = [torch.from_numpy(self.x[idx][ts]).unsqueeze(0) for ts in range(len(self.x[idx]))]
inputs = torch.cat(inputs)
target = torch.Tensor([self.y[idx]])
return inputs, target
Training code
# This function is used to train a single model
def train_lstm(
X_training,
y_training,
X_testing,
y_testing,
hidden_dim=10,
lr=1e-4,
f1_thresh = 0.5,
use_gpu=False,
batch_size=100,
num_epochs=4000,
assym_wt=0, #Equal weights for 0 and 1
):
start_time = time.time()
clf = LSTMModel(len(X_training[0][0]), hidden_dim)
# Move to GPU if available
use_gpu = use_gpu and torch.cuda.is_available()
device = torch.device("cuda" if use_gpu else "cpu")
# Define the loss function and optimizer
optimizer = torch.optim.Adam(clf.parameters(), lr=lr)
clf = clf.to(device)
loss_function = nn.BCELoss()
loss_function = loss_function.to(device)
dataset = LSTMDataset(X_training, y_training)
trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
# Run the training loop
# per_epoch_precision = []
# per_epoch_recall = []
train_preds = []
train_targets = []
for epoch in range(0, num_epochs):
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
clf.train() # set to train mode
for i, data in enumerate(trainloader):
# Get inputs
inputs, targets = data
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = clf(inputs)
# Store predictions/targets in last epoch to compute accuracy stats
if epoch == num_epochs - 1:
train_targets += targets
train_preds += outputs.view(-1).tolist()
print(f'Train set targets: {targets.tolist()}, predictions:{outputs.view(-1).tolist()}')
# Compute loss
targets = torch.FloatTensor(targets).unsqueeze(0)
# Apply asymmetric weights to handle unbalanced datasets
if assym_wt > 0:
loss_function = nn.BCELoss(weight=assym_wt * targets + 1)
loss_function = loss_function.to(device)
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if (epoch % 250) == 249:
print("Loss after epoch %5d: %.3f" % (epoch + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print("Training process has finished.")
train_preds = torch.FloatTensor(train_preds)
train_targets = torch.FloatTensor(train_targets)
clf.eval() # set to eval mode
dataset = LSTMDataset(X_testing, y_testing)
testloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
test_preds = []
test_targets = []
with torch.no_grad():
for i, data in enumerate(testloader):
# Get inputs
inputs, targets = data
# Perform forward pass
preds = clf(inputs)
print(f'Test set targets: {targets.tolist()}, predictions:{preds.view(-1).tolist()}')
# Store predictions and targets so that we can compute stats
test_preds += preds.view(-1).tolist()
test_targets += targets
test_preds = torch.FloatTensor(test_preds)
test_targets = torch.FloatTensor(test_targets)
pr_fig = go.Figure()
pr_fig.update_xaxes(title_text="Recall")
pr_fig.update_yaxes(title_text="Precision")
plot_pr_withtorch(test_targets, test_preds, pr_fig, "Test PR ")
plot_pr_withtorch(train_targets, train_preds, pr_fig, "Train PR ")
pr_fig.show()
train_F1 = get_scores(train_targets, train_preds > f1_thresh, "Train set scores")
test_F1 = get_scores(test_targets, test_preds > f1_thresh, "Test set scores")
print(f"Train time is {time.time() - start_time}")
return train_F1, test_F1
Test case
# test case
train = np.array([[10, 20, 5, 10, 1], [10, 5, 8, 3, 0], [10, 5, 5, 10, 1]])
for ind in range(0, 3):
for spread in [5, 10, 50]:
newrow = spread + train[ind]
newrow[-1] -= spread
# print(newrow)
train = np.vstack([train, newrow])
test = np.array([[0, 8, 3, 7, 1], [19, 8, 12, 3, 0], [1000, 450, 75, 135, 1]])
train_df = pd.DataFrame(train, columns = ['f1_1','f1_2','f2_1','f2_2', 'op'])
test_df = pd.DataFrame(test, columns = ['f1_1','f1_2','f2_1','f2_2', 'op'])
tc_Xtrain = train_df[train_df.columns[~train_df.columns.isin(['op'])]]
tc_ytrain = train_df['op'].astype(np.float32)
tc_Xtest = test_df[test_df.columns[~test_df.columns.isin(['op'])]]
tc_ytest = test_df['op'].astype(np.float32)
#normalize values
scaler = StandardScaler()
tc_Xtrain = scaler.fit_transform(tc_Xtrain)
# print(tc_Xtrain)
# print(f'Type after standard scaler = {type(tc_Xtrain)}')
tc_Xtest = scaler.transform(tc_Xtest)
X_train = np.apply_along_axis(create_ts_features, 1, tc_Xtrain, num_features=2).astype(np.float32)
X_test = np.apply_along_axis(create_ts_features, 1, tc_Xtest, num_features=2).astype(np.float32)
# print(X_train)
# print(tc_ytrain)
train_lstm(X_train, tc_ytrain, X_test, tc_ytest, num_epochs=4000, f1_thresh=0.6, batch_size=100)
Metric computation
def plot_pr_withtorch(target, pred, fig, title):
pr_curve = PrecisionRecallCurve(pos_label=1)
precision, recall, thresholds = pr_curve(pred, target)
print_key_prs(precision, recall, thresholds, title)
N = len(recall)
fig.add_trace(go.Scatter(x=recall[0 : N - 1], y=precision[0 : N - 1], name=title))
def get_scores(y, y_preds, print_label):
print(f"{print_label} summary:")
confusion = confusion_matrix(y, y_preds)
print(f"Confusion matrix: {confusion}")
print("Accuracy: {:.2f}".format(accuracy_score(y, y_preds)))
print("Precision: {:.2f}".format(precision_score(y, y_preds)))
print("Recall: {:.2f}".format(recall_score(y, y_preds)))
F1_score = f1_score(y, y_preds)
print("F1: {:.2f}".format(f1_score(y, y_preds)))
return F1_score
I have built an ANFIS model with TensorFlow for a classification problem. For every epoch I am getting precision and recall of zero. I am using a Gaussian membership function, but when I print sigma it gives 0. I used the code below for training.
## settings
n = X_train.shape[1] # no of input features
m = 2*n # number of fuzzy rules
learning_rate = 0.01
epochs = 1000
################################ train
X_train_t = tf.placeholder(tf.float32, shape=[None, n]) # Train input
y_train_t = tf.placeholder(tf.float32, shape=None) # Train output
mu = tf.get_variable(name="mu", shape=[m * n], initializer=tf.random_normal_initializer(0, 1)) # mean of Gaussian MFS
sigma = tf.get_variable(name="sigma", shape = [m * n], initializer=tf.random_normal_initializer(0, 1)) # std_dev of Gaussian MFS
w = tf.get_variable(name="w", shape= [1, m], initializer=tf.random_normal_initializer(0, 1))
rula = tf.reduce_prod(tf.reshape(tf.exp( -0.5* ((tf.tile(X_train_t, (1, m))- mu)**2) / (sigma**2)),
(-1, m, n)), axis=2) #activations
Y_train_t = tf.reduce_sum(rula*w,axis=1) / tf.clip_by_value(tf.reduce_sum(rula,axis=1), 1e-8, 1e8)
#loss = tf.losses.log_loss(y_train, Y_train) # loss function
loss = tf.losses.sigmoid_cross_entropy(y_train_t, Y_train_t) # loss function
#loss = tf.sqrt(tf.losses.mean_squared_error(y_train, Y_train))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss) # optimizer
################################ test
X_test_t = tf.placeholder(tf.float32, shape=[None, n]) # Test input
y_test_t = tf.placeholder(tf.float32, shape=None) # Train output
rula_test = tf.reduce_prod(tf.reshape(tf.exp( -0.5* ((tf.tile(X_test_t, (1, m))- mu)**2) / (sigma**2)),
(-1, m, n)), axis=2) # rule activation
Y_test_t = tf.reduce_sum(rula_test*w,axis=1) / tf.clip_by_value(tf.reduce_sum(rula_test,axis=1), 1e-8, 1e8)
loss_test = tf.losses.sigmoid_cross_entropy(y_test_t, Y_test_t) # loss function
################################ start session
x_axis = []
tr_loss, te_loss = [],[]
tr_prec, te_prec = [], []
tr_rec, te_rec = [], []
init=tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for e in range(epochs):
Y_train, loss_tr, _ = sess.run([Y_train_t, loss, optimizer], feed_dict={X_train_t: X_train, y_train_t: y_train})
Y_test, loss_te = sess.run([Y_test_t, loss_test], feed_dict={X_test_t: X_test, y_test_t: y_test})
if (e+1) % 10 == 0:
x_axis.append(e+1)
tr_loss.append(loss_tr)
te_loss.append(loss_te)
Y_train = np.where(Y_train > 0, 1, 0)
Y_test = np.where(Y_test > 0, 1, 0)
prec_tr = precision_score(y_train,Y_train)
prec_te = precision_score(y_test,Y_test)
rec_tr = recall_score(y_train,Y_train)
rec_te = recall_score(y_test,Y_test)
tr_prec.append(prec_tr)
te_prec.append(prec_te)
tr_rec.append(rec_tr)
te_rec.append(rec_te)
The code is referenced from https://github.com/subhalingamd/ANFIS-diabetes-prediction/blob/main/main.py
I am new to this algorithm. Please help me figure out where I have gone wrong.
I have tried and failed to make Keras model.fit() work on my multi-output model with a custom loss that uses all outputs' targets and predictions (specifically for 2 outputs) in TF 2.
When I tried to do this on a model made with the Keras functional API, I got the error "SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found ...",
meaning I can't use my loss function, because it returns an eager tensor to a Keras DAG that works with symbolic tensors (the functional API model). To get around this, I used model.add_loss() instead of passing my loss function into model.compile(), but I believe this hogged GPU memory and caused OOM errors.
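For reference, the add_loss() variant I mean looks roughly like this (just a sketch; make_model and discriminative_loss are the ones defined in the code below, and features, sequences, loss_const and optimizer are assumed to be set already):
model = make_model(features, sequences)
piano_pred, noise_pred = model.outputs
# the targets come in as extra model inputs, so the loss tensor can be built symbolically
model.add_loss(tf.math.reduce_mean(
    discriminative_loss(model.inputs[1], model.inputs[2],
                        piano_pred, noise_pred, loss_const)))
model.compile(optimizer=optimizer)   # no loss= argument; the loss was attached via add_loss()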
I've tried workarounds, where I put my functional API model inside a Keras subclassed model or make a completely new Keras subclassed model.
Workaround 1 is shown in the code below; it runs, yet gives me NaNs across the training epochs for a variety of gradient-clipping values, and produces 0-valued outputs.
Workaround 2 gives me an error inside the overridden call() method, because the inputs parameter has different shapes at model compile time and at run time: my model (in a quirky way) has 3 inputs, one being the actual input to the DLNN and the other two being the targets for the input sample. This is so that I can get the targets for each sample into the loss function.
from scipy.io import wavfile
import scipy.signal as sg
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Input, SimpleRNN, Dense, Lambda, TimeDistributed, Layer, LSTM, Bidirectional, BatchNormalization, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.activations import relu
from tensorflow.keras.callbacks import EarlyStopping
import numpy as np
import datetime
import numpy as np
import math
import random
import json
import os
import sys
# Loss function
def discriminative_loss(piano_true, noise_true, piano_pred, noise_pred, loss_const):
last_dim = piano_pred.shape[1] * piano_pred.shape[2]
return (
tf.math.reduce_mean(tf.reshape(noise_pred - noise_true, shape=(-1, last_dim)) ** 2, axis=-1) -
(loss_const * tf.math.reduce_mean(tf.reshape(noise_pred - piano_true, shape=(-1, last_dim)) ** 2, axis=-1)) +
tf.math.reduce_mean(tf.reshape(piano_pred - piano_true, shape=(-1, last_dim)) ** 2, axis=-1) -
(loss_const * tf.math.reduce_mean(tf.reshape(piano_pred - noise_true, shape=(-1, last_dim)) ** 2, axis=-1))
)
def make_model(features, sequences, name='Model'):
input_layer = Input(shape=(sequences, features), dtype='float32',
name='piano_noise_mixed')
piano_true = Input(shape=(sequences, features), dtype='float32',
name='piano_true')
noise_true = Input(shape=(sequences, features), dtype='float32',
name='noise_true')
x = SimpleRNN(features // 2,
activation='relu',
return_sequences=True) (input_layer)
piano_pred = TimeDistributed(Dense(features), name='piano_hat') (x) # source 1 branch
noise_pred = TimeDistributed(Dense(features), name='noise_hat') (x) # source 2 branch
model = Model(inputs=[input_layer, piano_true, noise_true],
outputs=[piano_pred, noise_pred])
return model
# Model "wrapper" for many-input loss function
class RestorationModel2(Model):
def __init__(self, model, loss_const):
super(RestorationModel2, self).__init__()
self.model = model
self.loss_const = loss_const
def call(self, inputs):
return self.model(inputs)
def compile(self, optimizer, loss):
super(RestorationModel2, self).compile()
self.optimizer = optimizer
self.loss = loss
def train_step(self, data):
# Unpack data - what the generator yields
x, piano_true, noise_true = data
with tf.GradientTape() as tape:
piano_pred, noise_pred = self.model((x, piano_true, noise_true), training=True)
loss = self.loss(piano_true, noise_true, piano_pred, noise_pred, self.loss_const)
trainable_vars = self.model.trainable_variables
gradients = tape.gradient(loss, trainable_vars)
self.optimizer.apply_gradients(zip(gradients, trainable_vars))
return {'loss': loss}
def test_step(self, data):
x, piano_true, noise_true = data
piano_pred, noise_pred = self.model((x, piano_true, noise_true), training=False)
loss = self.loss(piano_true, noise_true, piano_pred, noise_pred, self.loss_const)
return {'loss': loss}
def make_imp_model(features, sequences, loss_const=0.05,
optimizer=tf.keras.optimizers.RMSprop(clipvalue=0.7),
name='Restoration Model', epsilon=10 ** (-10)):
# NEW Semi-imperative model
model = RestorationModel2(make_model(features, sequences, name='Training Model'),
loss_const=loss_const)
model.compile(optimizer=optimizer, loss=discriminative_loss)
return model
# MODEL TRAIN & EVAL FUNCTION
def evaluate_source_sep(train_generator, validation_generator,
num_train, num_val, n_feat, n_seq, batch_size,
loss_const, epochs=20,
optimizer=tf.keras.optimizers.RMSprop(clipvalue=0.75),
patience=10, epsilon=10 ** (-10)):
print('Making model...') # IMPERATIVE MODEL - Customize Fit
model = make_imp_model(n_feat, n_seq, loss_const=loss_const, optimizer=optimizer, epsilon=epsilon)
print('Going into training now...')
hist = model.fit(train_generator,
steps_per_epoch=math.ceil(num_train / batch_size),
epochs=epochs,
validation_data=validation_generator,
validation_steps=math.ceil(num_val / batch_size),
callbacks=[EarlyStopping('val_loss', patience=patience, mode='min')])
print(model.summary())
# NEURAL NETWORK DATA GENERATOR
def my_dummy_generator(num_samples, batch_size, train_seq, train_feat):
while True:
for offset in range(0, num_samples, batch_size):
# Initialise x, y1 and y2 arrays for this batch
x, y1, y2 = (np.empty((batch_size, train_seq, train_feat)),
np.empty((batch_size, train_seq, train_feat)),
np.empty((batch_size, train_seq, train_feat)))
yield (x, y1, y2)
def main():
epsilon = 10 ** (-10)
train_batch_size = 5
loss_const, epochs, val_split = 0.05, 10, 0.25
optimizer = tf.keras.optimizers.RMSprop(clipvalue=0.9)
TRAIN_SEQ_LEN, TRAIN_FEAT_LEN = 1847, 2049
TOTAL_SMPLS = 60
# Validation & Training Split
indices = list(range(TOTAL_SMPLS))
val_indices = indices[:math.ceil(TOTAL_SMPLS * val_split)]
num_val = len(val_indices)
num_train = TOTAL_SMPLS - num_val
train_seq, train_feat = TRAIN_SEQ_LEN, TRAIN_FEAT_LEN
print('Train Input Stats:')
print('N Feat:', train_feat, 'Seq Len:', train_seq, 'Batch Size:', train_batch_size)
# Create data generators and evaluate model with them
train_generator = my_dummy_generator(num_train,
batch_size=train_batch_size, train_seq=train_seq,
train_feat=train_feat)
validation_generator = my_dummy_generator(num_val,
batch_size=train_batch_size, train_seq=train_seq,
train_feat=train_feat)
evaluate_source_sep(train_generator, validation_generator, num_train, num_val,
n_feat=train_feat, n_seq=train_seq,
batch_size=train_batch_size,
loss_const=loss_const, epochs=epochs,
optimizer=optimizer, epsilon=epsilon)
if __name__ == '__main__':
main()
Thanks for the help!
Solution: don't pass your loss into model.add_loss(). Instead, concatenate your outputs together, which lets you pass your custom loss into model.compile(), and then deal with the separate outputs inside the custom loss function.
class TimeFreqMasking(Layer):
# Init is for input-independent variables
def __init__(self, epsilon, **kwargs):
super(TimeFreqMasking, self).__init__(**kwargs)
self.epsilon = epsilon
# No build method, b/c passing in multiple inputs to layer (no single shape)
def call(self, inputs):
y_hat_self, y_hat_other, x_mixed = inputs
mask = tf.abs(y_hat_self) / (tf.abs(y_hat_self) + tf.abs(y_hat_other) + self.epsilon)
y_tilde_self = mask * x_mixed
return y_tilde_self
def discrim_loss(y_true, y_pred):
piano_true, noise_true = tf.split(y_true, num_or_size_splits=2, axis=-1)
loss_const = y_pred[-1, :, :][0][0]
piano_pred, noise_pred = tf.split(y_pred[:-1, :, :], num_or_size_splits=2, axis=0)
last_dim = piano_pred.shape[1] * piano_pred.shape[2]
return (
tf.math.reduce_mean(tf.reshape(noise_pred - noise_true, shape=(-1, last_dim)) ** 2) -
(loss_const * tf.math.reduce_mean(tf.reshape(noise_pred - piano_true, shape=(-1, last_dim)) ** 2)) +
tf.math.reduce_mean(tf.reshape(piano_pred - piano_true, shape=(-1, last_dim)) ** 2) -
(loss_const * tf.math.reduce_mean(tf.reshape(piano_pred - noise_true, shape=(-1, last_dim)) ** 2))
)
def make_model(features, sequences, epsilon, loss_const):
input_layer = Input(shape=(sequences, features), name='piano_noise_mixed')
x = SimpleRNN(features // 2,
activation='relu',
return_sequences=True) (input_layer)
x = SimpleRNN(features // 2,
activation='relu',
return_sequences=True) (x)
piano_hat = TimeDistributed(Dense(features), name='piano_hat') (x) # source 1 branch
noise_hat = TimeDistributed(Dense(features), name='noise_hat') (x) # source 2 branch
piano_pred = TimeFreqMasking(epsilon=epsilon,
name='piano_pred') ((piano_hat, noise_hat, input_layer))
noise_pred = TimeFreqMasking(epsilon=epsilon,
name='noise_pred') ((noise_hat, piano_hat, input_layer))
preds_and_gamma = Concatenate(axis=0) ([piano_pred,
noise_pred,
# loss_const_tensor
tf.broadcast_to(tf.constant(loss_const), [1, sequences, features])
])
model = Model(inputs=input_layer, outputs=preds_and_gamma)
model.compile(optimizer=optimizer, loss=discrim_loss)
return model
def dummy_generator(num_samples, batch_size, num_seq, num_feat):
while True:
for _ in range(0, num_samples, batch_size):
x, y1, y2 = (np.random.rand(batch_size, num_seq, num_feat),
np.random.rand(batch_size, num_seq, num_feat),
np.random.rand(batch_size, num_seq, num_feat))
yield ([x, np.concatenate((y1, y2), axis=-1)])
total_samples = 6
batch_size = 2
time_steps = 3
features = 4
loss_const = 2
epochs = 10
val_split = 0.25
epsilon = 10 ** (-10)
optimizer = tf.keras.optimizers.RMSprop(clipvalue=0.9)  # needed by the model.compile() call inside make_model
model = make_model(features, time_steps, epsilon, loss_const)
print(model.summary())
num_val = math.ceil(total_samples * val_split)
num_train = total_samples - num_val
train_dataset = dummy_generator(num_train, batch_size, time_steps, features)
val_dataset = dummy_generator(num_val, batch_size, time_steps, features)
model.fit(train_dataset,
steps_per_epoch=math.ceil(num_train / batch_size),
epochs=epochs,
validation_data=val_dataset,
validation_steps=math.ceil(num_val / batch_size))
As the title says, I am trying to train a neural network to predict outcomes, and I can't figure out what is wrong with my model. I keep getting the exact same accuracy level, and the loss is NaN. I'm so confused... I have looked at other similar questions and still can't seem to get it working. My code for the model and training is below:
import numpy as np
import pandas as pd
import tensorflow as tf
import urllib.request as request
import matplotlib.pyplot as plt
from FlowersCustom import get_MY_data
def get_data():
IRIS_TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'species']
train = pd.read_csv(IRIS_TRAIN_URL, names=names, skiprows=1)
test = pd.read_csv(IRIS_TEST_URL, names=names, skiprows=1)
# Train and test input data
Xtrain = train.drop("species", axis=1)
Xtest = test.drop("species", axis=1)
# Encode target values into binary ('one-hot' style) representation
ytrain = pd.get_dummies(train.species)
ytest = pd.get_dummies(test.species)
return Xtrain, Xtest, ytrain, ytest
def create_graph(hidden_nodes):
# Reset the graph
tf.reset_default_graph()
# Placeholders for input and output data
X = tf.placeholder(shape=Xtrain.shape, dtype=tf.float64, name='X')
y = tf.placeholder(shape=ytrain.shape, dtype=tf.float64, name='y')
# Variables for two group of weights between the three layers of the network
print(Xtrain.shape, ytrain.shape)
W1 = tf.Variable(np.random.rand(Xtrain.shape[1], hidden_nodes), dtype=tf.float64)
W2 = tf.Variable(np.random.rand(hidden_nodes, ytrain.shape[1]), dtype=tf.float64)
# Create the neural net graph
A1 = tf.sigmoid(tf.matmul(X, W1))
y_est = tf.sigmoid(tf.matmul(A1, W2))
# Define a loss function
deltas = tf.square(y_est - y)
loss = tf.reduce_sum(deltas)
# Define a train operation to minimize the loss
# optimizer = tf.train.GradientDescentOptimizer(0.005)
optimizer = tf.train.AdamOptimizer(0.001)
opt = optimizer.minimize(loss)
return opt, X, y, loss, W1, W2, y_est
def train_model(hidden_nodes, num_iters, opt, X, y, loss, W1, W2, y_est):
# Initialize variables and run session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
losses = []
# Go through num_iters iterations
for i in range(num_iters):
sess.run(opt, feed_dict={X: Xtrain, y: ytrain})
local_loss = sess.run(loss, feed_dict={X: Xtrain.values, y: ytrain.values})
losses.append(local_loss)
weights1 = sess.run(W1)
weights2 = sess.run(W2)
y_est_np = sess.run(y_est, feed_dict={X: Xtrain.values, y: ytrain.values})
correct = [estimate.argmax(axis=0) == target.argmax(axis=0)
for estimate, target in zip(y_est_np, ytrain.values)]
acc = 100 * sum(correct) / len(correct)
if i % 10 == 0:
print('Epoch: %d, Accuracy: %.2f, Loss: %.2f' % (i, acc, local_loss))
print("loss (hidden nodes: %d, iterations: %d): %.2f" % (hidden_nodes, num_iters, losses[-1]))
sess.close()
return weights1, weights2
def test_accuracy(weights1, weights2):
X = tf.placeholder(shape=Xtest.shape, dtype=tf.float64, name='X')
y = tf.placeholder(shape=ytest.shape, dtype=tf.float64, name='y')
W1 = tf.Variable(weights1)
W2 = tf.Variable(weights2)
A1 = tf.sigmoid(tf.matmul(X, W1))
y_est = tf.sigmoid(tf.matmul(A1, W2))
# Calculate the predicted outputs
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
y_est_np = sess.run(y_est, feed_dict={X: Xtest, y: ytest})
# Calculate the prediction accuracy
correct = [estimate.argmax(axis=0) == target.argmax(axis=0)
for estimate, target in zip(y_est_np, ytest.values)]
accuracy = 100 * sum(correct) / len(correct)
print('final accuracy: %.2f%%' % accuracy)
def get_inputs_and_outputs(train, test, output_column_name):
Xtrain = train.drop(output_column_name, axis=1)
Xtest = test.drop(output_column_name, axis=1)
ytrain = pd.get_dummies(getattr(train, output_column_name))
ytest = pd.get_dummies(getattr(test, output_column_name))
return Xtrain, Xtest, ytrain, ytest
if __name__ == '__main__':
train, test = get_MY_data('output')
Xtrain, Xtest, ytrain, ytest = get_inputs_and_outputs(train, test, 'output')#get_data()
# Xtrain, Xtest, ytrain, ytest = get_data()
hidden_layers = 10
num_epochs = 500
opt, X, y, loss, W1, W2, y_est = create_graph(hidden_layers)
w1, w2 = train_model(hidden_layers, num_epochs, opt, X, y, loss, W1, W2, y_est)
# test_accuracy(w1, w2)
Here is a screenshot of what the training is printing out:
And this is a screenshot of the Pandas Dataframe that I am using for the input data (5 columns of floats):
And finally, here is the Pandas Dataframe that I am using for the expected outputs (1 column of either -1 or 1):
This is almost always a problem with the input data.
I would suggest inspecting in detail the values you are feeding into the model to make sure the model is receiving what you think it is.
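For example, a quick check along these lines (a sketch only, reusing the variable names from the question):
import numpy as np

# Per-column statistics: look for extreme magnitudes or constant columns
print(Xtrain.describe())
# NaNs or infs anywhere in the features will propagate straight into the loss
print("NaNs in X:", np.isnan(Xtrain.values).sum())
print("infs in X:", np.isinf(Xtrain.values).sum())
# Class balance of the one-hot targets
print(ytrain.sum())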
I am just starting out with Tensorflow, trying to create a classic neural net for binary classification.
# Loading Dependencies
import math
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.python.framework import ops
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
seed = 1234
tf.set_random_seed(seed)
np.random.seed(seed)
# Load and Split data
data = pd.read_json(file)
X = data["X"]
y = data["y"]
X = X.astype(np.float32)
y = y.astype(np.float32)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size = 0.3)
X_train = X_train.reshape(X_train.shape[0], -1).T
y_train = y_train.values.reshape((1, y_train.shape[0]))
X_valid = X_valid.reshape(X_valid.shape[0], -1).T
y_valid = y_valid.values.reshape((1, y_valid.shape[0]))
print("X Train: ", X_train.shape)
print("y Train: ", y_train.shape)
print("X Dev: ", X_valid.shape)
print("y Dev: ", y_valid.shape)
X Train: (16875, 1122)
y Train: (1, 1122)
X Dev: (16875, 482)
y Dev: (1, 482)
The training data contains float numbers, while the labels are just 0 or 1. However, these are also converted to float because I had some issues in the past.
Initializing the parameters
def initialize_parameters(layer_dimensions):
tf.set_random_seed(seed)
layers_count = len(layer_dimensions)
parameters = {}
for layer in range(1, layers_count):
parameters['W' + str(layer)] = tf.get_variable('W' + str(layer),
[layer_dimensions[layer], layer_dimensions[layer - 1]],
initializer = tf.contrib.layers.xavier_initializer(seed = seed))
parameters['b' + str(layer)] = tf.get_variable('b' + str(layer),
[layer_dimensions[layer], 1],
initializer = tf.zeros_initializer())
return parameters
Shapes are:
W1 - (50, 16875)
W2 - (25, 50)
W3 - (10, 25)
W4 - (5, 10)
W5 - (1, 5)
b1 - (50, 1)
b2 - (25, 1)
b3 - (10, 1)
b4 - (5, 1)
b5 - (1, 1)
I am specifying the number and the dimension of each layer when I am calling the model (see below)
Forward Propagation
def forward_propagation(X, parameters):
parameters_count = len(parameters) // 2
A = X
for layer in range(1, parameters_count):
W = parameters['W' + str(layer)]
b = parameters['b' + str(layer)]
Z = tf.add(tf.matmul(W, A), b)
A = tf.nn.relu(Z)
W = parameters['W' + str(parameters_count)]
b = parameters['b' + str(parameters_count)]
Z = tf.add(tf.matmul(W, A), b)
return Z
Compute the cost (I am using the sigmoid function since we are dealing with binary classification)
def compute_cost(Z, Y):
logits = tf.transpose(Z)
labels = tf.transpose(Y)
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = logits, labels = labels))
return cost
Putting it together
def model(X_train, y_train, X_valid, y_valid, layer_dimensions, alpha = 0.0001, epochs = 10):
ops.reset_default_graph()
tf.set_random_seed(seed)
(x_rows, m) = X_train.shape
y_rows = y_train.shape[0]
costs = []
X = tf.placeholder(tf.float32, shape=(x_rows, None), name="X")
y = tf.placeholder(tf.float32, shape=(y_rows, None), name="y")
parameters = initialize_parameters(layer_dimensions)
Z = forward_propagation(X, parameters)
cost = compute_cost(Z, y)
optimizer = tf.train.AdamOptimizer(learning_rate = alpha).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for epoch in range(epochs):
_ , epoch_cost = sess.run([optimizer, cost], feed_dict={X: X_train, y: y_train})
print ("Cost after epoch %i: %f" % (epoch + 1, epoch_cost))
costs.append(epoch_cost)
parameters = sess.run(parameters)
correct_predictions = tf.equal(tf.argmax(Z), tf.argmax(y))
accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"))
print ("Train Accuracy:", accuracy.eval({X: X_train, y: y_train}))
print ("Test Accuracy:", accuracy.eval({X: X_valid, y: y_valid}))
return parameters
Now when I try to train my model, it appears to reach an optimum by the second epoch, and the cost changes very little from that point on:
parameters = model(X_train, y_train, X_valid, y_valid, [X_train.shape[0], 50, 25, 10, 5, 1])
Cost after epoch 1: 8.758244
Cost after epoch 2: 0.693096
Cost after epoch 3: 0.692992
Cost after epoch 4: 0.692737
Cost after epoch 5: 0.697333
Cost after epoch 6: 0.693062
Cost after epoch 7: 0.693151
Cost after epoch 8: 0.693152
Cost after epoch 9: 0.693152
Cost after epoch 10: 0.693155
Now for the predictions
def predict(X, parameters):
parameters_count = len(parameters) // 2
params = {}
for layer in range(1, parameters_count + 1):
params['W' + str(layer)] = tf.convert_to_tensor(parameters['W' + str(layer)])
params['b' + str(layer)] = tf.convert_to_tensor(parameters['b' + str(layer)])
(x_columns, x_rows) = X.shape
X_test = tf.placeholder(tf.float32, shape=(x_columns, x_rows))
Z = forward_propagation(X_test, params)
p = tf.argmax(Z)
sess = tf.Session()
prediction = sess.run(p, feed_dict = {X_test: X})
return prediction
However, this will predict 0 in every case:
predictions = predict(X_valid, parameters)
predictions
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0....
X Train: (16875, 1122)
You have 16,875 features for each sample, but only 1,122 training examples.
I think this may not be enough.
The sample code in the TensorFlow Get Started tutorial only uses 784 features.
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
The MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation). This split is very important: it's essential in machine learning that we have separate data which we don't learn from so that we can make sure that what we've learned actually generalizes!
https://www.tensorflow.org/get_started/mnist/beginners