TensorFlow MLP example outputs binary instead of decimal - python

I'm trying to train a multilayer perceptron to classify inputs as true or false, based on the given input.
So far I'm using the example:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py
But this gives me the output as a binary value, and I would rather have a decimal or percentage-based output.
What I've tried:
I've tried switching the optimizer to the other available ones, with no success.
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

The optimizer will not change the output that the layers actually produce.
The provided example uses ReLU for the layers, which is fine for classification, but it is not suited to modeling a probability. You would be better off with a sigmoid function instead.
The sigmoid function can be used to model a probability, whereas ReLU models an arbitrary non-negative real number.
In order to make it work for the provided example, change the multilayer_perceptron function to:
def multilayer_perceptron(_X, _weights, _biases):
    # Hidden layer with sigmoid activation
    layer_1 = tf.sigmoid(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']), name="sigmoid_l1")
    # Hidden layer with sigmoid activation
    layer_2 = tf.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']), name="sigmoid_l2")
    # Output layer with linear activation
    return tf.matmul(layer_2, _weights['out'], name="matmul_lout") + _biases['out']
It basically replaces the ReLU activation with a sigmoid one.
Then, for the evaluation, use softmax as follows:
output1 = tf.nn.softmax((multilayer_perceptron(x, weights, biases)), name="output")
avd = sess.run(output1, feed_dict={x: features_t})
It will give you a value between 0 and 1 for each class. You will probably also have to increase the number of epochs for this to work.
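If you specifically want a percentage rather than a raw probability, you can simply rescale the fetched softmax output; a minimal numpy sketch, where the dummy array stands in for the avd result above:

import numpy as np

# Dummy stand-in for `avd`, the fetched softmax output: shape (num_samples, num_classes)
avd = np.array([[0.91, 0.09],
                [0.23, 0.77]])

percentages = avd * 100.0            # per-class confidence in percent
predicted = np.argmax(avd, axis=1)   # the hard 0/1 label, if you still need it
print(percentages)                   # [[91.  9.] [23. 77.]]
print(predicted)                     # [0 1]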

Related

Altering pytorch resnet head from sigmoid to Softmax

I'm new to PyTorch. I wrote the code below to do prediction with ResNet, using Sigmoid for binary classification. I now need to change it to softmax because I might have more than 2 classes.
I understand that in PyTorch, unlike Keras, the softmax is included in CrossEntropyLoss, so I'm not sure how to change the top layer to make the model use softmax:
model = torchvision.models.resnet50(pretrained=False)
model.fc = torch.nn.Sequential(
    torch.nn.Linear(in_features=2048, out_features=1),
    torch.nn.Sigmoid()
)
model = model.cpu()
and later:
lossFunc=torch.nn.BCELoss(class_weights)
You can try this:
model.fc[0] = torch.nn.Linear(in_features=2048, out_features=10)
model.fc[1] = torch.nn.Softmax(dim=1)
where 10 is the number of classes; set it to whatever your problem needs. Note that the argument to torch.nn.Softmax is the dimension to normalize over (dim=1 for a batch of class scores), not the number of classes, so it is the Linear layer that has to output one value per class.
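That said, since (as you note) CrossEntropyLoss already applies the softmax internally, the more common pattern is to keep a plain Linear head that outputs raw logits and only apply softmax at inference time. A rough sketch of that approach (num_classes and the dummy batch are placeholder values):

import torch
import torchvision

num_classes = 10  # example value; set to however many classes you have

model = torchvision.models.resnet50(pretrained=False)
# Bare linear head: CrossEntropyLoss applies (log-)softmax internally,
# so the model itself should output raw logits during training.
model.fc = torch.nn.Linear(in_features=2048, out_features=num_classes)

loss_func = torch.nn.CrossEntropyLoss()   # expects integer class labels, not one-hot

# At inference time, apply softmax explicitly if you want probabilities:
model.eval()
with torch.no_grad():
    logits = model(torch.randn(4, 3, 224, 224))   # dummy batch
    probs = torch.nn.functional.softmax(logits, dim=1)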

Activation functions: Softmax vs Sigmoid

I've been trying to build an image classifier with a CNN. There are 2300 images in my dataset and two categories: men and women. Here's the model I used:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(min_delta = 0.001, patience = 30, restore_best_weights = True)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(256, (3, 3), input_shape=X.shape[1:], activation = 'relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(256, (3, 3), input_shape=X.shape[1:], activation = 'relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(tf.keras.layers.Dense(64))
model.add(tf.keras.layers.Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
h = model.fit(xtrain, ytrain, validation_data=(xval, yval), batch_size=32, epochs=30, callbacks=[early_stopping], verbose=0)
The accuracy of this model is 0.501897 and the loss 7.595693 (the model is stuck on these numbers in every epoch), but if I replace the Softmax activation with Sigmoid, accuracy is about 0.98 and loss 0.06. Why does such a strange thing happen with Softmax? All the info I could find says these two activations are similar and softmax is even better, but I couldn't find anything about this kind of abnormality. I'd be glad if someone could explain what the problem is.
Summary of your results:
a) CNN with Softmax activation function -> accuracy ~ 0.50, loss ~ 7.60
b) CNN with Sigmoid activation function -> accuracy ~ 0.98, loss ~ 0.06
TLDR
Update:
Now that I see you are also using only 1 output neuron with Softmax, you will not be able to capture the second class in binary classification. With Softmax you need to define K neurons in the output layer, where K is the number of classes you want to predict, whereas with Sigmoid one output neuron is sufficient for binary classification.
So, in short, this is what should change in your code when using softmax for 2 classes:
#use 2 neurons with softmax
model.add(tf.keras.layers.Dense(2, activation='softmax'))
Additionally:
When doing binary classification, a sigmoid function is more suitable, as it is simply computationally cheaper than the more general softmax function (which is normally used for multi-class prediction when you have K > 2 classes).
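As a side note on why one sigmoid neuron is enough in the binary case: a sigmoid over a single logit z gives exactly the same probability as a softmax over the two logits [z, 0]. A quick check of that identity with scipy (the value of z is arbitrary):

import numpy as np
from scipy.special import expit, softmax

z = 1.7  # arbitrary logit
print(expit(z))                          # sigmoid(z) ~ 0.8455
print(softmax(np.array([z, 0.0]))[0])    # same value: 2-class softmax over [z, 0] reduces to sigmoid(z)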
Further Reading:
Some attributes of selected activation functions
If the short answer above is not enough for you, here, in short, are some things I've learned from my research about activation functions in NNs:
To begin with, let's be clear about the terms activation and activation function.
activation (alpha): the state of a neuron. The state of a neuron in a hidden or output layer is quantified by the weighted sum of the input signals from the previous layer.
activation function f(alpha): a function that transforms an activation into a neuron's output signal. It is usually a non-linear and differentiable function, for instance the sigmoid function. A lot of application and research has used the sigmoid function (see Bengio & Courville, 2016, p. 67 ff.). Mostly the same activation function is used throughout a neural network, but it is possible to use several (e.g. different ones in different layers).
Now to the effects of activation functions:
The choice of activation function can have an immense impact on the learning of neural networks (as you have seen in your example). Historically it was common to use the sigmoid function, as it is a good way to model a saturating neuron. Today, especially in CNNs, other activation functions, including only-partially-linear ones like ReLU, are preferred over the sigmoid function. There are many different functions, just to name some: sigmoid, tanh, relu, prelu, elu, maxout, max, argmax, softmax, etc.
Now let's only compare sigmoid, relu/maxout and softmax:
# pseudo code / formula
sigmoid = f(alpha)   = 1 / (1 + exp(-alpha))
relu    = f(alpha)   = max(0, alpha)
maxout  = f(alpha)   = max(alpha_1, alpha_2)
softmax = f(alpha_j) = exp(alpha_j) / sum_k(exp(alpha_k))
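If it helps, here is a rough numpy translation of the formulas above (maxout is shown in its simplest two-input form, and the softmax shifts by the max purely for numerical stability):

import numpy as np

def sigmoid(alpha):
    return 1.0 / (1.0 + np.exp(-alpha))

def relu(alpha):
    return np.maximum(0.0, alpha)

def maxout(alpha_1, alpha_2):
    # simplest form: element-wise max of two linear pre-activations
    return np.maximum(alpha_1, alpha_2)

def softmax(alpha):
    e = np.exp(alpha - np.max(alpha))   # subtract the max for numerical stability
    return e / e.sum()

a = np.array([-2.0, 0.5, 3.0])
print(sigmoid(a), relu(a), softmax(a))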
sigmoid:
in binary classification preferably used for output layer
values can range between [0,1], suitable for a probabilistic interpretation (+)
saturated neurons can eliminate gradient (-)
not zero centered (-)
exp() is computationally expensive (-)
relu:
no saturated neurons in positive regions (+)
computationally less expensive (+)
not zero centered (-)
saturated neurons in negative regions (-)
maxout:
positive attributes of relu (+)
doubles the number of parameters per neuron, normally requires an increased learning effort (-)
softmax:
can be seen as a generalization of the sigmoid function
mainly being used as output activation function in multi-class prediction problems
values range between [0,1], suitable for a probabilistic interpretation (+)
computationally more expensive because of exp() terms (-)
Some good references for further reading:
http://cs231n.stanford.edu/2020/syllabus
http://deeplearningbook.org (Bengio & Courville)
https://arxiv.org/pdf/1811.03378.pdf
https://papers.nips.cc/paper/2018/file/6ecbdd6ec859d284dc13885a37ce8d81-Paper.pdf
The reason why you see those different results is the size of your output layer: it is 1 neuron.
Softmax by definition requires more than 1 output neuron to make sense. A single Softmax neuron will always output 1 (look up the formula and think about it). That is why you see ~50% accuracy: the network always predicts class 1.
Sigmoid doesn't have this problem; its single output can take any value between 0 and 1, which is why the network can actually train.
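You can check this degenerate behaviour directly; a quick sketch with plain tf.nn.softmax on some made-up single-logit rows:

import tensorflow as tf

# One output neuron: softmax normalizes each logit against itself, so the result is always 1.
logits = tf.constant([[-5.0], [0.0], [42.0]])   # three samples, one "class" each
print(tf.nn.softmax(logits, axis=-1))           # [[1.], [1.], [1.]]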
If you want to test softmax, you have to create one output neuron for each class and then one-hot encode your ytrain and yval (look up one-hot encoding for more explanation). In your case this means: label 0 -> [1, 0], label 1 -> [0, 1]; the index of the 1 encodes the class. I'm not sure, but in that case I believe you'd use categorical cross entropy. I was not able to tell conclusively from the docs, but it seems to me that binary cross entropy expects 1 output neuron that's either 0 or 1 (where Sigmoid is the correct activation to use), whereas categorical cross entropy expects one output neuron per class, where Softmax makes sense. You could use Sigmoid even for the multi-output case, but it's not common.
So in short, it seems to me that binary cross entropy expects the class encoded by the value of the single neuron, whereas categorical cross entropy expects the class encoded by which output neuron is the most active (in simplified terms).
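To make that concrete, here is a small self-contained sketch of the softmax + one-hot + categorical cross entropy combination; the tiny model and random data are just placeholders for the question's CNN:

import numpy as np
import tensorflow as tf

# Dummy stand-ins for the question's xtrain/ytrain so the snippet runs on its own.
xtrain = np.random.rand(16, 32, 32, 3).astype("float32")
ytrain = np.random.randint(0, 2, size=(16,))

# Labels 0/1 one-hot encoded: 0 -> [1, 0], 1 -> [0, 1]
ytrain_onehot = tf.keras.utils.to_categorical(ytrain, num_classes=2)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation='softmax'),   # one output neuron per class
])
model.compile(loss='categorical_crossentropy',        # pairs with one-hot targets
              optimizer='adam', metrics=['accuracy'])
model.fit(xtrain, ytrain_onehot, epochs=1, verbose=0)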

Change Sigmoid output to probability

I have already trained a neural network whose last layer uses sigmoid. If I cannot retrain the network with softmax, can I convert the final predictions into probabilities? Currently the output of
pred = fin_model.predict_proba(x_train)
is like
array([[0.65247375, 0.45892698],
       [0.65919983, 0.4590024 ],
       [0.15964866, 0.47771254],
       [0.53297156, 0.47564888],
       [0.16078213, 0.4779702 ]], dtype=float32)
The sum of each row, e.g. 0.6524 + 0.4589, is not 1, so these cannot be probabilities. Is there a way to change them to probabilities?
The sigmoid function always returns a value between 0 and 1 and is mainly used in binary classification. A sigmoid activation will push one class close to 0 (<= 0.5) and the other close to 1 (> 0.5).
To use sigmoid, you need to define final layer as:
model.add(Dense(1, activation_function='sigmoid'))
However, you can also use a Softmax activation for binary classification. Softmax converts a vector of values into a probability distribution, i.e. an output vector with values in the range (0, 1) that sum to 1.
It can be declared in the final layer as:
model.add(Dense(2, activation_function='softmax'))
You can get more details on softmax and sigmoid here.

Tensorflow: Sigmoid cross entropy loss does not force network outputs to be 0 or 1

I would like to learn image segmentation in TensorFlow with values in {0.0, 1.0}. I have two images, ground_truth and prediction, each of shape (120, 160). The ground_truth image pixels only contain values that are either 0.0 or 1.0.
The prediction image is the output of a decoder, and its last two layers are a tf.layers.conv2d_transpose and a tf.layers.conv2d, like so:
# transforms (?,120,160,30) -> (?,120,160,15)
outputs = tf.layers.conv2d_transpose(outputs, filters=15, kernel_size=1, strides=1, padding='same')
# ReLU
outputs = activation(outputs)
# transforms (?,120,160,15) -> (?,120,160,1)
outputs = tf.layers.conv2d(outputs, filters=1, kernel_size=1, strides=1, padding='same')
The last layer does not carry an activation function, and thus its output is unbounded. I use the following loss function:
logits = tf.reshape(predicted, [-1, predicted.get_shape()[1] * predicted.get_shape()[2]])
labels = tf.reshape(ground_truth, [-1, ground_truth.get_shape()[1] * ground_truth.get_shape()[2]])
loss = 0.5 * tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,logits=logits))
This setup converges nicely. However, I have realized that the outputs of my last NN layer at validation time seem to be in [-inf, inf]. If I visualize the output, I can see that the object is not actually segmented, since almost all pixels are "activated". The distribution of values for a single output of the last conv2d layer spans this whole range (plot omitted).
Question:
Do I have to post-process the outputs (clip negative values, run the output through a sigmoid activation, etc.)? What do I need to do to force my output values to be in {0, 1}?
Solved it. The problem was that tf.nn.sigmoid_cross_entropy_with_logits runs the logits through a sigmoid internally, which is of course not applied at validation time, since the loss op is only called during training. The solution therefore is:
make sure to run the network outputs through tf.nn.sigmoid at validation/test time, like this:
return output if is_training else tf.nn.sigmoid(output)
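If you additionally want hard {0, 1} values rather than probabilities, you can threshold the sigmoid output at prediction time; a small sketch against current TensorFlow, where output is just a random stand-in for the unbounded logits of the last conv2d layer (0.5 is merely a common default threshold):

import tensorflow as tf

# Random stand-in for the unbounded logits coming out of the last conv2d layer.
output = tf.random.normal([1, 120, 160, 1])

probs = tf.nn.sigmoid(output)              # values in (0, 1), a soft prediction
mask = tf.cast(probs > 0.5, tf.float32)    # hard {0.0, 1.0} segmentation mask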

How can I set Bias and change Sigmoid to ReLU function in ANN?

I'm trying to create a data-prediction model with an artificial neural network. The following code is part of a Python-based ANN pieced together from several books. The error rate between the predicted value and the actual value won't drop below 19%. I tried to increase the number of hidden layers, but it did not noticeably affect the error rate. I think this is probably a limitation of the sigmoid function and of not using a bias. I looked around for a month and found out how to build ReLU and a bias, but I could not find what range a bias or ReLU should take.
Q1: How do I convert sigmoid to ReLU, and Q2: how do I add a bias to my code?
Q3: Also, if I change sigmoid to ReLU, do I have to keep my dataset in the 0.0~1.0 range? I ask because the sigmoid function accepts data in the 0.0~1.0 range, but I don't know what range ReLU allows.
I'm sorry to ask such an elementary question.
import numpy
import scipy.special

class neuralNetwork:
    # initialize the neural network
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # set the number of nodes in the input, hidden and output layers
        self.inodes = input_nodes
        self.hnodes = hidden_nodes
        self.onodes = output_nodes
        # link weight matrices, wih and who
        self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
        # learning rate
        self.lr = learning_rate
        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)
        pass

    # train the neural network
    def train(self, inputs_list, targets_list):
        # convert inputs and targets lists to 2d arrays
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
        # output layer error is the (target - actual)
        output_errors = targets - final_outputs
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)
        # update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
        pass

    # query the neural network
    def query(self, inputs_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        # calculate signals into final output layer and apply the activation
        final_inputs = numpy.dot(self.who, hidden_outputs)
        final_outputs = self.activation_function(final_inputs)
        return final_outputs
Your question is very broad, and there are a lot of concepts behind ReLU vs. sigmoid.
But in short:
Sigmoid saturates and kills gradients (look at gradient descent), and it is not zero-centered, because the output of sigmoid satisfies 0 < output < 1. I can see that for sigmoid you are using scipy, but ReLU is easy. ReLU is defined by the following function:
f(x) = max(0, x)
This means: if the input is greater than zero, return the input, else return 0. ReLU is preferred for hidden layers, and others, like softmax, for output layers.
I would suggest reading up on the different activation functions, why we need activation functions in a neural net, how sigmoid kills gradients, and why it converges slowly.
Q1: How do I convert sigmoid to ReLU, and Q2: how do I add a bias to my code?
Simply write a method of your own based on the ReLU function above and update the following line:
self.activation_function = lambda x: numpy.maximum(0, x)  # instead of lambda x: scipy.special.expit(x)
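One caveat the line above doesn't cover: the weight updates in train() use the sigmoid derivative (final_outputs*(1.0-final_outputs) and hidden_outputs*(1.0-hidden_outputs)), so if you swap in ReLU you would also swap those terms for the ReLU derivative. A minimal sketch of what that could look like:

import numpy

def relu(x):
    # element-wise max(0, x); works on whole numpy arrays, unlike a bare max(0, x)
    return numpy.maximum(0.0, x)

def relu_derivative(x):
    # gradient of ReLU: 1 where the pre-activation was positive, 0 elsewhere
    return (x > 0.0).astype(float)

# Inside __init__:
#     self.activation_function = relu
# Inside train(), the weight updates would then use, for example:
#     self.who += self.lr * numpy.dot(output_errors * relu_derivative(final_inputs),
#                                     numpy.transpose(hidden_outputs))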
Q3: Also, if I change sigmoid to ReLU, do I have to keep my dataset in the 0.0~1.0 range?
The answer depends on your network and your data, but yes, you should normalize the data. There is no particular range your data has to be squeezed into, because ReLU returns 0 when the input is less than zero and returns the input itself when it is >= 0, so there is no bounded range as with sigmoid.
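If you do decide to normalize, two common (but by no means mandatory) choices are min-max scaling to [0, 1] and standardization; a small numpy sketch with made-up data:

import numpy

data = numpy.array([[3.0, 200.0], [7.0, 150.0], [5.0, 50.0]])

# Min-max scaling to [0, 1], column by column
minmax = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

# Standardization to zero mean, unit variance, column by column
standardized = (data - data.mean(axis=0)) / data.std(axis=0)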
If you want to see how ReLU works and can be used, the following detailed examples will help, although they are written using a framework (PyTorch) to build and train the network:
PyTorch Basic Projects Link
ReLU vs sigmoid vs TanH Video
