How to implement a modified cross entropy loss function? - python

I am currently working on a change detection project for my university course and I am stuck writing a custom loss function. I know I have to use a function closure to be able to use data from layers of the model, but I don't know enough TensorFlow/Keras to write efficient code.
The loss function equation
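Roughly, as reconstructed from the code below, the loss has the form

L = -\frac{1}{N} \sum_{n=1}^{N} \frac{1}{P} \sum_{i,j} W^{(n)}_{ij} \Big[ \beta_{-}\, y^{(n)}_{ij} \log \hat{y}^{(n)}_{ij} + \beta_{+}\, \big(1 - y^{(n)}_{ij}\big) \log\big(1 - \hat{y}^{(n)}_{ij}\big) \Big]

where β₋ is the fraction computed by findbetaminus (applied to y_pred in the code), β₊ = 1 − β₋, W is the per-pixel weight matrix from get_cmm, P = 224 × 224 is the number of pixels per image, and N is the batch size.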
This is the modified cross entropy loss equation that I'm trying to turn into code. The loss needs the matrix W, which I have to calculate from the inputs to the model, X1 and X2. So at the moment I have this.
def cmg_loss(X1,X2):
    def loss(y_true,y_pred):
        print(X1)
        if X1.shape[0] == None:
            X1 = tf.reshape(X1,(224,224,3))
            X2 = tf.reshape(X2,(224,224,3))
            cmm = [get_cmm(X1,X2)]
        else:
            cmm = [get_cmm(X1[i],X2[i]) for i in range(X1.shape[0])]

        N = tf.convert_to_tensor(y_true.shape[0],dtype=tf.float32)
        N_val = y_true.shape[0]
        loss = tf.convert_to_tensor(0.0)

        if(N_val == None):
            loss = get_cmgloss(y_true[0],y_pred[0],cmm[0])
            loss = tf.math.multiply(-1.0,loss)
            return tf.math.divide(loss,N)
        else:
            for i in range(N_val):
                print(i)
                print("CMM len ", len(cmm))
                x = get_cmgloss(y_true[i],y_pred[i],cmm[i])
                loss = tf.math.add(loss,get_cmgloss(y_true[i],y_pred[i],cmm[i]))
            loss = tf.math.multiply(-1.0,loss)
            return tf.math.divide(loss,N)
    return loss
def get_cmgloss(y_true,y_pred,W):
    y_true = tf.cast(y_true,dtype=tf.float32)
    y_pred = tf.cast(y_pred, dtype=tf.float32)
    betaminus = findbetaminus(y_pred)
    betaplus = 1 - betaminus
    betaminus = betaminus.astype('float32')
    betaplus = betaplus.astype('float32')
    loss = tf.convert_to_tensor(0.0)
    N = tf.convert_to_tensor(y_true.shape[0] * y_true.shape[0],dtype=tf.float32)
    betaminus_matrix = tf.fill((224,224), betaminus)
    betaplus_matrix = tf.fill((224,224), betaplus)
    one_matrix = tf.fill((224,224), 1.0)

    first_term = tf.math.multiply(betaminus_matrix,tf.math.multiply(y_true,tf.math.log(y_pred)))
    second_term = tf.math.multiply(betaplus_matrix,tf.math.multiply(tf.math.subtract(one_matrix,y_true), tf.math.log(tf.math.subtract(one_matrix,y_pred))))
    sum_first_second = tf.math.add(first_term, second_term)
    prod = tf.math.multiply(W,sum_first_second)

    loss = tf.math.reduce_sum(prod)
    #loss = K.sum(K.sum(betaminus_matrix * y_true * tf.math.log(y_pred),betaplus_matrix * (1 - y_true) * tf.math.log(1 - y_pred)))
    loss = tf.math.multiply(-1.0, loss)
    return tf.math.divide(loss,N)

def findbetaminus(gt):
    count_1 = tf.math.count_nonzero(gt == 1)
    size = gt.shape[0] * gt.shape[1]
    return count_1 / size
def get_cmm(x1,x2):
    b1_diff_sq = tf.math.squared_difference(x1[:,:,0],x2[:,:,0])
    b2_diff_sq = tf.math.squared_difference(x1[:,:,1],x2[:,:,1])
    b3_diff_sq = tf.math.squared_difference(x1[:,:,2],x2[:,:,2])
    sum_3bands = tf.math.add(tf.math.add(b1_diff_sq,b2_diff_sq),b3_diff_sq)
    cmm = tf.math.sqrt(sum_3bands)

    max_val = tf.reduce_max(cmm)
    max_val_matrix = tf.fill((224,224), max_val)
    cmm_bar = tf.divide(cmm,max_val_matrix)

    mean_cmm_bar = tf.reduce_mean(cmm_bar)
    mean_cmm_bar_matrix = tf.fill((224,224), mean_cmm_bar)

    condition = tf.math.greater(mean_cmm_bar_matrix, cmm_bar)
    return tf.where(condition, mean_cmm_bar_matrix, cmm_bar)
It would be a great help if you could guide me on how to develop a loss function that makes use of data from other layers and also calls multiple functions in its computation.

If you want to use more advanced loss functions, you will have to use tf.GradientTape and write the training step yourself instead of using the fit method.
You can find many examples on the web and in the TensorFlow documentation. This requires a little more work, but it is much more powerful, because you can essentially output a list of tensors from your custom Model's call method, then compute the losses you need on them and choose which parameters are updated.
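For example, here is a minimal sketch of that route (illustrative only: the tiny two-input model and the simplified per-pixel weight and cross entropy below are stand-ins, not the exact modified loss or get_cmm from the question):

import tensorflow as tf

# toy two-input change-detection model (placeholder architecture)
inp1 = tf.keras.Input(shape=(224, 224, 3))
inp2 = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Concatenate()([inp1, inp2])
x = tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu")(x)
out = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
model = tf.keras.Model([inp1, inp2], out)
optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(x1, x2, y_true):
    with tf.GradientTape() as tape:
        y_pred = model([x1, x2], training=True)[..., 0]
        # W can be built here because the raw inputs are available,
        # e.g. a per-pixel change magnitude (a simplified stand-in for get_cmm)
        w = tf.sqrt(tf.reduce_sum(tf.math.squared_difference(x1, x2), axis=-1))
        bce = -(y_true * tf.math.log(y_pred + 1e-7)
                + (1.0 - y_true) * tf.math.log(1.0 - y_pred + 1e-7))
        loss = tf.reduce_mean(w * bce)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# training loop (sketch): for (x1, x2), y in dataset: loss = train_step(x1, x2, y)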

Related

Use if/else logic in tensorflow to either add an element to one tensor or another

I am building a custom loss function that needs to know whether the truth and the prediction have N pixels above a threshold. This is because the logic breaks if I supply an empty array from np.where(). I can get around this issue by using try/except to return a 'flagged constant' when the function fails on the empty set, but I'd like to do something different. Here is my current method.
def some_loss(cutoff=20, min_pix=10):
    def gen_loss(y_true, y_pred):
        trues = tf.map_fn(fn = lambda x: x, elems = y_true)
        preds = tf.map_fn(fn = lambda x: x, elems = y_pred)
        for idx in tf.range(tf.shape(y_true)[0]):
            # binarize both by cutoff
            true = y_true[idx]
            pred = y_pred[idx]
            true = tf.where(true < cutoff, 0.0, 1.0)
            pred = tf.where(pred < cutoff, 0.0, 1.0)
            # now I sum each to get the number of pixels above threshold
            n_true, n_pred = tf.reduce_sum(true), tf.reduce_sum(pred)
            # then I create a switch using tf.cond
            switch = tf.cond(tf.logical_or(n_true < min_pix, n_pred < min_pix),
                             lambda: tf.zeros_like(true), lambda: tf.ones_like(true))
            # this essentially allows me to turn off the loss if either condition is met
            # so I then run the function
            loss = get_loss(true, pred) # returns random constant if either is below threshold
            loss += tf.reduce_sum(tf.math.multiply(loss, switch))
        return loss
    return gen_loss
This may work; it compiles and trains a convolutional model. However, I don't like that there are random constants wandering about my loss function, and I'd rather only run get_loss() if both true and pred meet the minimum conditions.
I'd prefer to make two tensors, one with the samples not meeting the condition and the other with the samples meeting it.
Separately, I've tried to use tf.cond to test for each case and call a separate loss function in either case. The code is shown below.
def avgMED(scaler, cutoff=20, min_N=30, c=3):
    def AVGmed(y_true, y_pred):
        const = tf.constant([c],tf.float32) # constant c, multiplied by MED (
        batch_size = tf.cast(tf.shape(y_true)[0], tf.float32)
        MSE = tf.reduce_mean(tf.square(y_true-y_pred))
        y_true = tf.reshape(y_true, shape=(tf.shape(y_true)[0], -1))
        y_pred = tf.reshape(y_pred, shape=(tf.shape(y_pred)[0], -1))
        loss, loss_med = tf.cast(0,dtype=tf.float32), tf.cast(0,dtype=tf.float32)
        # rescale
        y_true = y_true*scaler.scale_
        y_true = y_true+scaler.mean_
        y_pred = y_pred*scaler.scale_
        y_pred = y_pred+scaler.mean_
        trues = tf.map_fn(fn = lambda x: x, elems=y_true)
        preds = tf.map_fn(fn = lambda x: x, elems=y_pred)
        min_nonzero_pixels = tf.reduce_sum(tf.constant(min_N, dtype=tf.float32))
        for idx in tf.range(batch_size):
            idx = tf.cast(idx, tf.int32)
            true = trues[idx]
            pred = preds[idx]
            MSE = tf.reduce_mean(tfm.square(tfm.subtract(true,pred)))
            true = tf.where(true<cutoff,0.0,1.0)
            pred = tf.where(pred<cutoff,0.0,1.0)
            n_true = tf.reduce_sum(true)
            n_pred = tf.reduce_sum(pred)
            loss_TA = tf.cond(tf.logical_or(n_true < min_nonzero_pixels, n_pred < min_nonzero_pixels),
                              get_zero(true,pred), get_MED(true,pred))
            loss_med += loss_TA.read(0)
            loss += loss_med + MSE # do we benefit from reducing across the batch dimension? we should be able to look at familiar batches and see the little increase due to the distance component
            tf.print(n_true,n_pred)
            tf.print(loss_med)
        return loss # this is essentially MSE given c ~ 0. Thus, this will show if there are some weird gradients flowing through that are preventing the model from learning
    return AVGmed

def get_MED(A,B):
    # takes in binary tensors
    indices_A, indices_B = tf.where(A), tf.where(B)
    coordX_A_TA, coordY_A_TA = find_coord(indices_A) # finds x,y coordinates and returns tensor array
    coordX_B_TA, coordY_B_TA = find_coord(indices_B)
    mindists_AB_TA = find_min_distances(coordX_A_TA, coordY_A_TA, coordX_B_TA, coordY_B_TA)
    mindists_BA_TA = find_min_distances(coordX_B_TA, coordY_B_TA, coordX_A_TA, coordY_A_TA)
    # MED = mean error distance
    med_AB = tf.reduce_mean(mindists_AB_TA.read(0))
    med_BA = tf.reduce_mean(mindists_BA_TA.read(0))
    avg_med = tfm.divide(tfm.add(med_AB,med_BA),tf.constant(0.5))
    loss_TA = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
    loss_TA.write(loss_TA.size(), avg_med)
    return loss_TA

def get_zero(A,B):
    loss_TA = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
    loss_TA.write(loss_TA.size(), 0)
    return loss_TA
However, with this framework I am now getting new errors about my generator not having enough data, which is absurd given that the batch size I test with is 10 and steps_per_epoch is 1 on a training set of 100. I also got a warning about not closing the TensorArray, which I expect happens whether the conditional is true or false. I'm inching closer to a solution but could use some guidance on how problematic my TensorFlow logic is.

ListWrapper not allowing multiplication of learning rate and thus no update of weights for neural network

I am new to TensorFlow and neural networks. I am trying to create a NN to estimate y = x^2.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
x_train = tf.constant(value = np.linspace(-10,10,50),dtype='float32')
x_train = tf.reshape(x_train,shape=[50,1])
y_train = x_train**2
layers = [1,3,4,1]
I created a neural network class to obtain my weights and biases and run forward propagation.
class NN(tf.Module):
    def __init__(self,layers,name=None):
        super().__init__(name=name)
        self.layers = layers
        self.weights, self.biases = self.initialze(layers)

    def initialze(self,layers):
        num_layers = len(layers)
        weights = []
        biases = []
        for i in range(num_layers-1):
            in_dim = layers[i]
            out_dim = layers[i+1]
            stddev = np.sqrt(2/(in_dim + out_dim))
            b = tf.Variable(tf.zeros([1,layers[i+1]], dtype='float32'), dtype='float32')
            W = tf.Variable(tf.random.truncated_normal([in_dim, out_dim], stddev=stddev), dtype='float32')
            weights.append(W)
            biases.append(b)
        return weights, biases

    def __call__(self,x):
        Z = x
        num_layers = len(self.layers)
        for i in range(num_layers-1):
            Z = tf.math.add(tf.linalg.matmul(Z ,self.weights[i]),self.biases[i])
        return Z

My_NN = NN(layers)
Next I created a class updat to do backward propagation.
class updat:
    def __init__(self,y_train,x_train):
        self.y_train = y_train
        self.x_train = x_train
        self.l_r = 0.1

    def get_grad(self,My_NN):
        with tf.GradientTape(persistent=True) as tape:
            tape.watch(My_NN.weights)
            tape.watch(My_NN.biases)
            loss = tf.reduce_mean(tf.square(self.y_train-My_NN(self.x_train)))
        dw,db = tape.gradient(loss, [My_NN.weights,My_NN.biases])
        print(dw,'weight')
        print(db,'biases')
        My_NN.weights -= (self.l_r * dw)
        My_NN.biases -= (self.l_r * db)
        del tape
        return loss

    def report(self, loss):
        return f"W = {My_NN.weights.numpy():1.2f}, b = {My_NN.biases.numpy():1.2f}, loss={loss:2.5f}"

    def prop(self,epochs,My_NN):
        for epoch in epochs:
            loss = self.get_grad(My_NN)
            current_loss = loss
            print(f"Epoch {epoch:2d}:")
            print("    ", report(current_loss,My_NN))
But when I run the code
model = updat(y_train,x_train)
epochs = range(10)
model.prop(epochs,My_NN)
I get an error saying
My_NN.weights -= (self.l_r * dw)
My_NN.biases -=(self.l_r * db)
TypeError: can't multiply sequence by non-int of type 'float'
I tried substituting My_NN.weights -= (lr*dw)
with My_NN.weights.assign_sub(lr*dw)
but it still shows
'ListWrapper' object has no attribute 'assign_sub'
Is there any solution for this?
Turning

My_NN.weights -= (self.l_r * dw)
My_NN.biases -= (self.l_r * db)

into

for weight, d_weight in zip(My_NN.weights, dw):
    weight.assign_sub(self.l_r * d_weight)
for bias, d_bias in zip(My_NN.biases, db):
    bias.assign_sub(self.l_r * d_bias)

solves the problem.
This is because My_NN.weights is a list of tf.Variable references and dw is the corresponding list of gradient tensors. We cannot update the variables by reassigning the list from the outside; we have to iterate over it. Additionally, to update a tf.Variable we should use its assign methods (assign, assign_sub, etc.), which is like modifying the content a pointer variable points to in C.
More conveniently, we usually use a tf.keras.optimizers optimizer's apply_gradients(), or even minimize(), to update the variables directly.
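For example, a minimal sketch of that route with the NN class from the question (the optimizer choice and learning rate are illustrative):

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

def train_step(My_NN, x_train, y_train):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(y_train - My_NN(x_train)))
    # tf.Module tracks the tf.Variables created in __init__, so no tape.watch is needed
    variables = My_NN.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss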
For this specific task and your more process-oriented coding approach, here are some suggestions for stable training:
add activations, so the stacked layers do not collapse into a single linear map and the model can fit a non-linear target:
def __call__(self,x):
    Z = x
    num_layers = len(self.layers)
    for i in range(num_layers-2):
        y = tf.math.add(tf.linalg.matmul(Z ,self.weights[i]),self.biases[i])
        Z = tf.nn.relu(y)
    i += 1
    return tf.math.add(tf.linalg.matmul(Z ,self.weights[i]),self.biases[i])
use a lower learning rate:
self.l_r = 0.001 # self.l_r = 0.1
run more epochs:
epochs = range(1000) # epochs = range(10)
Since the initial values of the trainable weights also influence training stability, you may need to re-train several times. In my tests, the above modifications work.

simple example of mxnet model parallelism

The simple examples in the Gluon tutorial for mxnet are very helpful to those of us who are just getting started with mxnet. As yet, there is not a simple example for model parallelism. I see the model parallelism example code for LSTM, but I am new to mxnet and it would help me (and perhaps others) to have a more streamlined example. So, I have created a model parallelism example by working off the regression example in the gluon tutorial, and by mixing in some code from mxnet.gluon.Trainer.
However, I am clearly getting something wrong. The gradients do not seem to be updated. Can anyone assist by identifying the problem(s)? The goal here is to create a linear regression model that has three layers, each held on a different gpu. The model itself is not useful, except as an example to show how initialization and training can occur for model parallelism, when using a custom block and imperative programming.
As I understand it, Trainer() is written for data parallelism. It will not work for model parallelism in that it requires all parameters to be initialized on all GPUs.
import os
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon import Block
# make some data
num_inputs = 2
num_outputs = 1
num_examples = 10000
def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2
X = np.random.normal(0,1, (num_examples, num_inputs))
noise = 0.001 * np.random.normal(0,1, (num_examples))
y = real_fn(X) + noise
y = y.reshape(-1,1)
# configuration
hidden_layers = 2
num_gpus = hidden_layers + 1
ctxList = [mx.gpu(i) for i in range(num_gpus)]
#ctxList = [mx.gpu() for i in range(num_gpus)]
#os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"
print("\n")
# ======================================================================
class myDenseBlock(Block):
    """
    A custom layer
    """
    def __init__(self, layer_number, size_input, size_output, **kwargs):
        super(myDenseBlock, self).__init__(**kwargs)
        self.layer_number = layer_number
        self.size_input = size_input
        self.size_output = size_output
        with self.name_scope():
            # add parameters to the Block's ParameterDict.
            self.w = self.params.get(
                'weight',
                init= mx.init.Xavier(magnitude=2.24),
                shape=(size_input, size_output),
                grad_req = 'write')
            self.b = self.params.get(
                'bias',
                init= mx.init.Constant(0.5),
                shape=(size_output,),
                grad_req = 'write')

    def forward(self, x):
        x = x.as_in_context(ctxList[self.layer_number])
        with x.context:
            linear = nd.dot(x, self.w.data()) + self.b.data()
        return linear
# ======================================================================
# create net
net = gluon.nn.Sequential()
with net.name_scope():
    # initial layer, with X as input
    net.add(myDenseBlock(0,
                         size_input = 2,
                         size_output = 2))
    for ii in range(hidden_layers-1):
        net.add(myDenseBlock(ii+1,
                             size_input = 2,
                             size_output = 2))
    # final block, Y is nx1
    net.add(myDenseBlock(ii+2,
                         size_input = 2,
                         size_output = 1))
# ititialize paramerters for different blocks (layers) on different gpus.
params = net.collect_params()
"""
The parameters are:
sequential0_mydenseblock0_weight
sequential0_mydenseblock0_bias
sequential0_mydenseblock1_weight
sequential0_mydenseblock1_bias
sequential0_mydenseblock2_weight
sequential0_mydenseblock2_bias
"""
print("\ninitializing:")
for i, param in enumerate(params):
    if 'mydenseblock0' in param:
        params[param].initialize(ctx=ctxList[0])
    elif 'mydenseblock1' in param:
        params[param].initialize(ctx=ctxList[1])
    elif 'mydenseblock2' in param:
        params[param].initialize(ctx=ctxList[2])
    print(" ", i, param, " ", params[param].list_data()[0].context)
print("\n")
def square_loss(yhat, y):
    return nd.mean((yhat - y) ** 2)

def mytrainer(updaters, params, ignore_stale_grad=False):
    for i, param in enumerate(params):
        if params[param].grad_req == 'null':
            continue
        if not ignore_stale_grad:
            for data in params[param].list_data():
                if not data._fresh_grad:
                    print(
                        "`%s` on context %s has not been updated"%(params[param].name, str(data.context)))
                    assert False
        for upd, arr, grad in zip(updaters, params[param].list_data(), params[param].list_grad()):
            if not ignore_stale_grad or arr._fresh_grad:
                upd(i, grad, arr)
                arr._fresh_grad = False
batch_size = 100
epochs = 100000
iteration = -1
opt = mx.optimizer.create('adam', learning_rate=0.001, rescale_grad = 1 / batch_size)
updaters = [mx.optimizer.get_updater(opt)]
# the following definition for updaters does not work either
#updaters = [mx.optimizer.get_updater(opt) for _ in ctxList]
results = []
for e in range(epochs):
    train_groups = np.array_split(np.arange(X.shape[0]), X.shape[0]/batch_size)
    for ii, idx in enumerate(train_groups):
        iteration += 1
        xtrain, ytrain = X[idx,:], y[idx]

        xtrain = nd.array(xtrain)
        xtrain = xtrain.as_in_context(ctxList[0])

        ytrain = nd.array(ytrain).reshape((-1, 1))
        ytrain = ytrain.as_in_context(ctxList[0])

        with autograd.record():
            yhat = net(xtrain)
            error = square_loss(yhat, ytrain.as_in_context(ctxList[-1]))
            # Question: does the call to error.backward() go under the indent
            # for autograd.record() or outside the indent? The gluon examples have
            # it both ways
        error.backward()

        mytrainer(updaters, net.collect_params())

        if iteration%10 == 0:
            results.append([iteration, error.asnumpy().item()])
            print(("epoch= {:5,d}, iter= {:6,d}, error= {:6.3E}").format(
                e, iteration, error.asnumpy().item()))
The code fails at the "if not data._fresh_grad" test in mytrainer(). The output is:
initializing:
0 sequential0_mydenseblock0_weight gpu(0)
1 sequential0_mydenseblock0_bias gpu(0)
2 sequential0_mydenseblock1_weight gpu(1)
3 sequential0_mydenseblock1_bias gpu(1)
4 sequential0_mydenseblock2_weight gpu(2)
5 sequential0_mydenseblock2_bias gpu(2)
`sequential0_mydenseblock0_weight` on context gpu(0) has not been updated
I can verify using mx.autograd.get_symbol(error).tojson() that the computational graph only extends to the parameters on gpu(2), and does not reach other gpus.
Yes, per Sergei's comment, moving to v1.0.0 solves this.

Pre-training weights with one loss function and then change loss function to a custom one

I'm using the layers API and I want to pre-train my weights in order to create better initialization values for a loss function that has some requirements on the output values of the network.
At the moment I'm pre-training on some generated data that has random values inside the ranges I want the output to be in. It is trained using just an MSE loss.
However, after I pre-train the network I want to change the loss function (and of course also the labels, but the labels are not a problem). When I change the loss function and try to run the network again I get an error:
InvalidArgumentError: TensorArray has inconsistent shapes.
Index 0 has shape: [1] but index 1 has shape: []
[[Node: map/TensorArrayStack/TensorArrayGatherV3 =
TensorArrayGatherV3[_class=["loc:@map/TensorArray_1"],
dtype=DT_FLOAT, element_shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]
(map/TensorArray_1, map/TensorArrayStack/range, map/while/Exit_1)]]
I don't know if it is impossible to switch loss functions, perhaps because some tensors in Adam need to have a different shape or something.
If anyone knows, please enlighten me.
Custom Loss:
def total_loss(y_true, y_pred):
    def single_loss(total_index):
        def iou_fn(box, y_true_x, y_true_w):
            box_x, box_w = box[0], box[1]
            left = tf.maximum(y_true_x-y_true_w/2., box_x-box_w/2.)
            right = tf.minimum(y_true_x+y_true_w/2., box_x+box_w/2.)
            overlap = right - left
            intersection = tf.maximum(overlap*1., 0.)
            union = y_true_w*1. + box_w*1. - intersection
            return (intersection*1.) / (union*1.)

        cell_size = 16
        box_count = 5
        cell_count = 10
        y_true_single = y_true[total_index]
        y_pred_single = y_pred[total_index]
        coord = tf.constant(5., name='coord')
        noobj = tf.constant(0.5, name='noobj')
        x_error = 0.
        w_error = 0.
        c_error = 0.
        c_noobj_error = 0.
        class_error = 0.
        y_true_x, y_true_w, y_true_class = tf.split(y_true_single[1:4], 3, axis=0)
        for cell_index in range(cell_count):
            cell = y_pred_single[cell_index*cell_size:(cell_index+1)*cell_size]
            boxes = cell[0:-1]
            _class = cell[-1]
            boxes_split = tf.split(boxes, box_count, axis=0)
            ious = []
            for box_index in range(box_count):
                ious.append(iou_fn(boxes_split[box_index], y_true_x, y_true_w))
            index = tf.argmax(ious, output_type=tf.int32)
            x_error = tf.cond((tf.gather(ious, tf.argmax(ious, output_type=tf.int32))[0]>0.)[0],
                lambda: tf.add(x_error, (tf.square(tf.gather(boxes_split, index)[0][0]-y_true_x[0]))),
                lambda: x_error)
            w_error = tf.cond((tf.gather(ious, tf.argmax(ious, output_type=tf.int32))[0]>0.)[0],
                lambda: tf.add(w_error, (tf.square(tf.gather(boxes_split, index)[0][1]-y_true_w[0]))),
                lambda: w_error)
            c_error = tf.cond((tf.gather(ious, tf.argmax(ious, output_type=tf.int32))[0]>0.)[0],
                lambda: tf.add(c_error, (tf.square(tf.gather(ious, index)[0][0]))),
                lambda: c_error)
            class_error = tf.cond((tf.gather(ious, index)[0]>0.)[0],
                lambda: tf.add(class_error, tf.square(_class-y_true_class)),
                lambda: class_error)
            for box_index in range(box_count):
                c_noobj_error = tf.cond((tf.gather(ious, index)[0]>0.)[0],
                    lambda: tf.cond(tf.equal(box_index, index[0]),
                        lambda: c_noobj_error,
                        lambda: tf.add(c_noobj_error, tf.square(tf.gather(ious, index)[0]))),
                    lambda: c_noobj_error)
        loss = x_error*coord + w_error*coord + c_error + c_noobj_error*noobj + class_error
        return loss

    full_return = tf.map_fn(single_loss, tf.range(tf.shape(y_pred)[0]), dtype=tf.float32)
    return full_return

How to use mini-batch instead of SGD

Here is a quick implementation of a one-layer neural network in python:
import numpy as np
# simulate data
np.random.seed(94106)
X = np.random.random((200, 3)) # 100 3d vectors
# first col is set to 1
X[:, 0] = 1
def simu_out(x):
    return np.sum(np.power(x, 2))
y = np.apply_along_axis(simu_out, 1, X)
# code 1 if above average
y = (y > np.mean(y)).astype("float64")*2 - 1
# split into training and testing sets
Xtr = X[:100]
Xte = X[100:]
ytr = y[:100]
yte = y[100:]
w = np.random.random(3)
# 1 layer network. Final layer has one node
# initial weights,
def epoch():
    err_sum = 0
    global w
    for i in range(len(ytr)):
        learn_rate = .1
        s_l1 = Xtr[i].T.dot(w) # signal at layer 1, pre-activation
        x_l1 = np.tanh(s_l1) # output at layer 1, activation
        err = x_l1 - ytr[i]
        err_sum += err
        # see here: https://youtu.be/Ih5Mr93E-2c?t=51m8s
        delta_l1 = 2 * err * (1 - x_l1**2)
        dw = Xtr[i] * delta_l1
        w -= learn_rate * dw
    print("Mean error: %f" % (err_sum / len(ytr)))
epoch()
for i in range(1000):
epoch()
def predict(X):
    global w
    return np.sign(np.tanh(X.dot(w)))
# > 80% accuracy!!
np.mean(predict(Xte) == yte)
It uses stochastic gradient descent for optimization. How would I apply mini-batch gradient descent here instead?
The difference between "classical" SGD and mini-batch gradient descent is that you use multiple samples (a so-called mini-batch) to calculate the update for w. This has the advantage that the steps you take toward the solution are less noisy, as you follow a smoothed gradient.
To do that, you need an inner loop to calculate the update dw, where you iterate over the mini batch. For example (quick-n-dirty code):
def epoch():
    err_sum = 0
    learn_rate = 0.1
    global w
    for i in range(int(np.ceil(len(ytr) / batch_size))):
        # take the i-th mini-batch
        batch = Xtr[i*batch_size:(i+1)*batch_size]
        target = ytr[i*batch_size:(i+1)*batch_size]
        dw = np.zeros_like(w)
        for j in range(len(batch)):
            s_l1 = batch[j].T.dot(w)
            x_l1 = np.tanh(s_l1)
            err = x_l1 - target[j]
            err_sum += err
            delta_l1 = 2 * err * (1 - x_l1**2)
            dw += batch[j] * delta_l1
        w -= learn_rate * (dw / len(batch))
    print("Mean error: %f" % (err_sum / len(ytr)))
This gave an accuracy of 87 percent in a test.
Now, one more thing: you always go through the training set from start to end. You should definitely shuffle the data in each iteration. Always going through in the same order can really affect your performance, especially if you e.g. first have all samples of class A, and then all of class B. This can also make your training go in cycles. So just go through the set in a random order, e.g. with
order = np.random.permutation(len(ytr))
and replace all occurrences of i by order[i] in the epoch() function.
And a more general remark: Global variables are often considered bad design, as you don't have any control over which snippet modifies your variables. Rather pass w as a parameter. The same goes for the learning rate and the batch size.
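Putting both remarks together, a minimal sketch of a reworked epoch (the parameter names and the batch size default are illustrative, and it assumes the numpy setup from the question):

def epoch(w, Xtr, ytr, learn_rate=0.1, batch_size=10):
    # visit the training set in a fresh random order every epoch
    order = np.random.permutation(len(ytr))
    err_sum = 0.0
    for start in range(0, len(ytr), batch_size):
        idx = order[start:start+batch_size]
        batch, target = Xtr[idx], ytr[idx]
        dw = np.zeros_like(w)
        for j in range(len(batch)):
            x_l1 = np.tanh(batch[j].dot(w))
            err = x_l1 - target[j]
            err_sum += err
            dw += batch[j] * 2 * err * (1 - x_l1**2)
        w -= learn_rate * (dw / len(batch))
    print("Mean error: %f" % (err_sum / len(ytr)))
    return w

# usage: for _ in range(1000): w = epoch(w, Xtr, ytr)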
