TensorFlow Dense and ReLU layers causing Segmentation fault error - python

I'm creating a neural network using TensorFlow.
I have some helper functions in the help.py file:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
def read_mnist(folder_path="MNIST_data/"):
return input_data.read_data_sets(folder_path, one_hot=True)
def build_training(y_labels, y_output, learning_rate=0.5):
# Define loss function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_labels, logits=y_output))
#train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(y_output,1), tf.argmax(y_labels,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
return train_step, accuracy
def train_test_model(mnist, x_input, y_labels, accuracy, train_step, steps=1000, batch=100):
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for i in range(steps):
input_batch, labels_batch = mnist.train.next_batch(batch)
feed_dict = {x_input: input_batch, y_labels: labels_batch}
if i%100 == 0:
train_accuracy = accuracy.eval(feed_dict=feed_dict)
print("Step %d, training batch accuracy %g"%(i, train_accuracy))
train_step.run(feed_dict=feed_dict)
print("The end of training!")
print("Test accuracy: %g"%accuracy.eval(feed_dict={x_input: mnist.test.images, y_labels: mnist.test.labels}))
Then I try to use it when training the network.
First I use very simple 1 output layer:
import help
import tensorflow as tf
import numpy as np
# Read mnist data
mnist = help.read_mnist()
image_size = 28
labels_size = 10
# Input layer - flattened images
x_input = tf.placeholder(tf.float32, [None, image_size*image_size])
y_labels = tf.placeholder(tf.float32, [None, labels_size])
# Layers:
# - Input
# - Output (Dense)
# Output dense layer
y_output = tf.layers.dense(inputs=x_input, units=labels_size)
# Define training
train_step, accuracy = help.build_training(y_labels, y_output)
# Run the training & test
help.train_test_model(mnist, x_input, y_labels, accuracy, train_step)
Then I add another ReLU layer:
# Hidden Layer
hidden = tf.layers.dense(inputs=x_input, units=hidden_size, activation=tf.nn.relu)
# Output dense layer
y_output = tf.layers.dense(inputs=hidden, units=labels_size)
I get Segmentation fault error both times.
I tried few things that I found online like reordering numpy and tensorflow import clauses, putting the help.py code in the same file as the network architecture and training process or increasing the memory for the docker image. Nothing worked.
Can someone help?

I upgraded to the next version of TensorFlow, version 1.2. It fixed all the issues.

Related

Tensorflow 1.14 no gradients provided

I'm trying to create an attack on a cnn following the example [here][1]. https://www.anishathalye.com/2017/07/25/synthesizing-adversarial-examples/
Their notebook runs on my system without any issues.
Instead of loading Inception, I want to attack my own network. For simplicity I'm training the network in the same notebook first:
tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=(None, *img_shape), name='input')
y = tf.placeholder(tf.float32, shape=(None, total_labels), name='output')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
logits = conv_net(x, keep_prob, total_labels) # some standard cnn architecture is used
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
So far so good. The network is trained with
sess.run(optimizer, feed_dict={x: feature_batch, y: label_batch, keep_prob: keep_probability})
Now I want to create an attack by training an overlay on the image:
overlay = tf.Variable(tf.zeros(1, *img_shape), name='Overlay')
assign_op = tf.assign(overlay, x)
loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y)
optim = tf.train.GradientDescentOptimizer(0.0001).minimize(loss, var_list=[overlay])
My understanding is that the code above assigns the trainable Variable overlay to my input placeholder x as an input to my network, the loss is then calculated on the network predictions and the optimizer minimizes that loss. By passing var_list I tell the optimizer "Hey, you are training overlay that you cannot see explicitly."
However, this doesn't work and throws this error:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'Overlay:0' shape=(1, 55, 65, 1) dtype=float32_ref>"] and loss Tensor("softmax_cross_entropy_with_logits_3/Reshape_2:0", shape=(?,), dtype=float32).
Clearly I'm missing some step/understanding here. I don't see the source notebook doing anything else.
placeholder is not a trainable and gradients will not propagate through placeholders. If I were you, I would try next workflow: feed the data to net through a varaible at training also.
x = tf.placeholder(size)
x_var = tf.Variable(same_size)
assign = tf.assign(x_var, x)
logits = conv_net(x_var)
...
sess.run(assign, data)
sess.run(minimize)
# not sure, if it would work in a single sess.run([assign, minimize], data)

Same model produces consistently different accuracies in Keras and Tensorflow

I'm trying to implement the same model in Keras, and in Tensorflow using Keras layers, using custom data. The two models produce consistently different accuracies over many times of training (keras ~71%, tensorflow ~65%). I want tensorflow to do as well as keras so I can go into the tensorflow iterations to tweak some lower level algorithms.
Here's my original Keras code:
from keras.layers import Dense, Dropout, Input
from keras.models import Model, Sequential
from keras import backend as K
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
main_input = Input(shape=(input_size,),name='text_vectors')
x = Dense(units=64, activation='relu', name = 'dense1')(main_input)
drop1 = Dropout(0.2,name='dropout1')(x)
auxiliary_input = Input(shape=(num_aux_inputs,), name='aux_input')
x = keras.layers.concatenate([drop1,auxiliary_input])
x = Dense(units=64, activation='relu',name='dense2')(x)
drop2 = Dropout(0.1,name='dropout2')(x)
x = Dense(units=32, activation='relu',name='dense3')(drop2)
main_output = Dense(units=num_classes,
activation='softmax',name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input],
outputs=main_output)
model.compile(loss=keras.losses.categorical_crossentropy, metrics= ['accuracy'],optimizer=keras.optimizers.Adadelta())
history = model.fit([train_x,train_x_auxiliary], train_y, batch_size=128, epochs=20, verbose=1, validation_data=([val_x,val_x_auxiliary], val_y))
loss, accuracy = model.evaluate([val_x,val_x_auxiliary], val_y, verbose=0)
Here's I moved the keras layers to tensorflow following this article:
import tensorflow as tf
from keras import backend as K
import keras
from keras.layers import Dense, Dropout, Input # Dense layers are "fully connected" layers
from keras.metrics import categorical_accuracy as accuracy
from keras.objectives import categorical_crossentropy
tf.reset_default_graph()
sess = tf.Session()
K.set_session(sess)
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
x = tf.placeholder(tf.float32, shape=[None, input_size], name='X')
x_aux = tf.placeholder(tf.float32, shape=[None, num_aux_inputs], name='X_aux')
y = tf.placeholder(tf.float32, shape=[None, num_classes], name='Y')
# build graph
layer = Dense(units=64, activation='relu', name = 'dense1')(x)
drop1 = Dropout(0.2,name='dropout1')(layer)
layer = keras.layers.concatenate([drop1,x_aux])
layer = Dense(units=64, activation='relu',name='dense2')(layer)
drop2 = Dropout(0.1,name='dropout2')(layer)
layer = Dense(units=32, activation='relu',name='dense3')(drop2)
output_logits = Dense(units=num_classes, activation='softmax',name='main_output')(layer)
loss = tf.reduce_mean(categorical_crossentropy(y, output_logits))
acc_value = tf.reduce_mean(accuracy(y, output_logits))
correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95,epsilon=tf.keras.backend.epsilon()).minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)
epochs = 20 # Total number of training epochs
batch_size = 128 # Training batch size
display_freq = 300 # Frequency of displaying the training results
num_tr_iter = int(len(y_train) / batch_size)
with sess.as_default():
for epoch in range(epochs):
print('Training epoch: {}'.format(epoch + 1))
# Randomly shuffle the training data at the beginning of each epoch
x_train, x_train_aux, y_train = randomize(x_train, x_train_auxiliary, y_train)
for iteration in range(num_tr_iter):
start = iteration * batch_size
end = (iteration + 1) * batch_size
x_batch, x_aux_batch, y_batch = get_next_batch(x_train, x_train_aux, y_train, start, end)
# Run optimization op (backprop)
feed_dict_batch = {x: x_batch, x_aux:x_aux_batch, y: y_batch,K.learning_phase(): 1}
optimizer.run(feed_dict=feed_dict_batch)
I also implemented the whole model from scratch in tensorflow, but it also is a ~65% accuracy, so I decided to try this Keras-layers-within-TF set up to identify problems.
I've looked up posts on similar problems with Keras and Tensorflow, and have tried the following which didn't help in my case:
Keras's dropout layer is only active in the training phase, so I did the same in my tf code by setting keras.backend.learning_phase().
Keras and Tensorflow have different variable initializations. I've tried initializing my weights in tensorflow these following 3 ways, which is supposed to be the same as Keras's weight initialization, but they also didn't affect the accuracies:
initer = tf.glorot_uniform_initializer()
initer = tf.contrib.layers.xavier_initializer()
initer = tf.random_normal(shape) * (np.sqrt(2.0/(shape[0] + shape[1])))
The optimizer in the two versions are set to be exactly the same! Though it doesn't look like the accuracy depends on the optimizer - I tried using different optimizers in both keras and tf and the accuracies each converge to the same.
Help!
It seems to me that this is most probably the weight initialization problem. What I would suggest you to do is to initialize keras layers and before training get the layer weights and initialize tf layers with those values.
I have ran into that kind of problems and it solved problems for me but it was a long time ago and I don't know if they made those initializers the same. At that time tf and keras initializations were not the same obviously.
I checked with initializers,seed, parameters and hyperparameters but accuracy is different.
I checked the code for Keras and they randomly shuffle the batch of images and then fed into the network, so this shuffling is different across different engines. So we need to figure out a way in which we can fed the same set of batch images to the network in order to get same accuracy

Tensorflow Histogram error

I new to tensorflow and my code is below:
import tensorflow as tf
logdir="/tmp/mnist_tutorial5/"
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(train_dir=logdir+"data",one_hot = True)
tf.reset_default_graph()
sess = tf.Session()
writer = tf.summary.FileWriter(logdir)
def model(input):
w = tf.Variable(tf.truncated_normal([784,10], stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[10]), name="B")
act = tf.matmul(input,w) + b
tf.summary.histogram("weights",w)
tf.summary.histogram("biases",b)
tf.summary.histogram("activations",act)
return act
def train():
x = tf.placeholder(tf.float32, shape=[None, 784], name="input_img")
y = tf.placeholder(tf.float32, shape=[None, 10], name="labels")
my_label = model(x)
print("linear_regression is completed")
mean_error = tf.reduce_mean(tf.reduce_sum(tf.square(my_label-y)))
tf.summary.scalar("loss", mean_error)
train_step=tf.train.GradientDescentOptimizer(0.0003).minimize(mean_error)
correct_prediction = tf.equal(tf.argmax(my_label, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar("accuracy", accuracy)
sess.run(tf.global_variables_initializer())
summ = tf.summary.merge_all()
for i in range(2000):
batch = mnist.train.next_batch(100)
train_accuracy = sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})
print("%s th iteration"%i)
if i%500==0:
print("over 2")
summarys = sess.run(summ, {x: batch[0], y: batch[1]})##i'm getting error here
print("over 3")
writer.add_summary(summarys,i)
print("one over")
train()
writer.add_graph(sess.graph)
And this is the error I am getting:
InvalidArgumentError (see above for traceback): Nan in summary histogram for: weights
[[Node: weights = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](weights/tag, W/read)]]
first of all its just a one layer network with no hidden layer
you have not applied any kind of activation function
no activation means your outputs are not being squashed
your gradients are getting exploded because of which the weights during updation are being exploded which may result in nan
You are training for 2000 epochs which means obviously after few epochs the weights will be nan
try to use an activation function like sigmoid and add atleast one hidden layer and you will be fine
also reduce the number of epochs ... For mnist classification you do not need the model to train for 2000 epochs ... Waste of time

Inferior performance of Tensorflow compared to sklearn

I'm comparing the performance of Tensorflow with sklearn on two datasets:
A toy dataset in sklearn
MNIST dataset
Here is my code (Python):
from __future__ import print_function
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
from sklearn.datasets import load_digits
import numpy as np
# digits = load_digits()
# data = digits.data
# labels = digits.target
# convert to binary labels
# y = np.zeros((labels.shape[0],10))
# y[np.arange(labels.shape[0]),labels] = 1
x_train = mnist.train.images
y_train = mnist.train.labels
x_test = mnist.test.images
y_test = mnist.test.labels
n_train = mnist.train.images.shape[0]
# import pdb;pdb.set_trace()
# Parameters
learning_rate = 1e-3
lambda_val = 1e-5
training_epochs = 30
batch_size = 200
display_step = 1
# Network Parameters
n_hidden_1 = 300 # 1st layer number of neurons
n_input = x_train.shape[1] # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_classes])
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
# Create model
def multilayer_perceptron(x):
# Hidden fully connected layer with 256 neurons
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
# Activation
layer_1_relu = tf.nn.relu(layer_1)
# Output fully connected layer with a neuron for each class
out_layer = tf.matmul(layer_1_relu, weights['out']) + biases['out']
return out_layer
# Construct model
logits = multilayer_perceptron(X)
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)) + lambda_val*tf.nn.l2_loss(weights['h1']) + lambda_val*tf.nn.l2_loss(weights['out'])
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
# Test model
pred = tf.nn.softmax(logits) # Apply softmax to logits
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# Initializing the variables
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
# Training cycle
for epoch in range(training_epochs):
avg_cost = 0.
total_batch = int(n_train/batch_size)
# Loop over all batches
ptr = 0
for i in range(total_batch):
next_ptr = ptr + batch_size
if next_ptr > len(x_train):
next_ptr = len(x_train)
batch_x, batch_y = x_train[ptr:next_ptr],y_train[ptr:next_ptr]
ptr += batch_size
# Run optimization op (backprop) and cost op (to get loss value)
_, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
Y: batch_y})
# Compute average loss
avg_cost += c / total_batch
# Display logs per epoch step
if epoch % display_step == 0:
print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
print("Optimization Finished!")
print("Accuracy on training set: ", accuracy.eval({X:x_train,Y:y_train}))
print("Accuracy on testing set:", accuracy.eval({X: x_test, Y: y_test}))
print("Experimenting sklearn...")
# now experiment with sklearn
from sklearn.datasets import load_digits
import numpy as np
from sklearn.neural_network import MLPClassifier
import time
# use MLP
t_start = time.time()
print('fitting MLP...')
clf = MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(300,),max_iter=training_epochs)
clf.fit(x_train,y_train)
print('fitted MLP in {:.2f} seconds'.format(time.time() - t_start))
print('predicting...')
labels_predicted = clf.predict(x_test)
print('accuracy: {:.2f} %'.format(np.mean(np.argmax(y_test,axis=1) == np.argmax(labels_predicted,axis=1)) * 100))
The code is adapted from a github repository. For this testing, I'm using a traditional neural network (MLP) with only one hidden layer of size 300.
Following is the result for the both datasets:
sklearn digits: ~83% (tensorflow), ~90% (sklearn)
MNIST: ~94% (tensorflow), ~97% (sklearn)
I'm using the same model for both libraries. All the parameters (number of hidden layers, number of hidden units, learning_rate, l2 regularization constant, number of training epochs, batch size) and optimization algorithms are the same (Adam optimizer, beta parameters for Adam optimizer, no momentum, etc).
I wonder if sklearn has done a magic implementation over tensorflow? Can anyone help answer?
Thank you very much.

what's wrong with my tensorflow code

I just begin to study tensorflow and I want to create a DNN for MNIST. In the tutorial, there is a very simple neural network with 784 input nodes, 10 output nodes and no hidden nodes. I try to modify these codes to create a DNN network. Here is my code. I think I just add a hidden layer with 500 nodes between input and output layers, but the test accuracy is just 10%, which means it is not trained. Do you know what's wrong with my codes?
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
os.chdir('../')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.zeros([784,500]))
B_h1=tf.Variable(tf.zeros([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.zeros([5,5]))
B_h2=tf.Variable(tf.zeros([5]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.zeros([10]))
W_o=tf.Variable(tf.zeros([500,10]))
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
number_steps = 10000
batch_size = 100
for _ in range(number_steps):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
OK, according to #lejlot's suggestion, I change my code as following.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
os.chdir('../')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.random_normal([784,500]))
B_h1=tf.Variable(tf.random_normal([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.random_normal([500,500]))
B_h2=tf.Variable(tf.random_normal([500]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.random_normal([10]))
W_o=tf.Variable(tf.random_normal([500,10]))
y= tf.matmul(h1,W_o)+B_o # notice no activation
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
number_steps = 10000
batch_size = 100
for i in range(number_steps):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
if i % 1000==0:
acc=sess.run(accuracy,feed_dict={x: mnist.test.images, y_: mnist.test.labels})
print('Current loop %d, Accuracy: %g'%(i,acc))
# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
There are two modification:
change the initial value of W_h1 and B_h1 with tf.random_normal
change the define of y and cross_entropy
The modification dose work. But I still don't know what's wrong with my original code. I call the tf.global_variables_initializer().run(), and I think this function will random the value of W_h1 and B_h1. Besides, if I define y and cross_entropy as following, it doesn't work.
y= tf.nn.softmax(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),reduction_indices=[1]))
First of all this is not valid classifier model.
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
You are using explicit equation for cross entropy which requires y to be a (row-wise) probability distribution, yet you produce y by applying relu, meaning that you are simply outputing some non-negative numbers. In fact, if you ever output zeros, your code will produce NaNs and fail (as log of 0 is minus infinity).
You should use
y = tf.nn.softmax(tf.matmul(h1,W_o)+B_o)
instead. Or even better (for better numerical stability):
y= tf.matmul(h1,W_o)+B_o # notice no activation
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
-tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
reduction_indices=[1]))
update
Second issue is initialisation - you cannot initialise neural network weights to zeros, they have to be random numbers, typically sampled from low-variance zero-mean Gaussians. Global initialiser does not randomise weights, it simply runs all the initialisation ops - if the initialisation ops are constant ones (like zeros), it simply makes sure these zeros are assigned to variables, nothing else (thus it can be used to reset the network etc.). Zero initialisation works only for convex problems, such as logistic regression, but cannot work for complex model like neural network.

Categories

Resources