Difference between log_loss in sklearn and BCELoss in PyTorch? - python

Looking at the documentation for log_loss in sklearn and BCELoss in PyTorch, these should be the same, i.e. the normal log loss with weights applied. However, they behave differently, both with and without weights applied. Can anyone explain it to me? I could not find the source code for BCELoss (which refers to binary_cross_entropy internally).
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.metrics import log_loss

input = torch.randn((3, 1), requires_grad=True)
target = torch.ones((3, 1), requires_grad=False)
w = torch.randn((3, 1), requires_grad=False)

# ----- With weights
w = F.sigmoid(w)
criterion_test = nn.BCELoss(weight=w)
print(criterion_test(input=F.sigmoid(input), target=F.sigmoid(target)))
print(log_loss(y_true=target.detach().numpy(),
               y_pred=F.sigmoid(input).detach().numpy(),
               sample_weight=w.detach().numpy().reshape(-1),
               labels=np.array([0., 1.])))
print("")
print("")

# ----- Without weights
criterion_test = nn.BCELoss()
print(criterion_test(input=F.sigmoid(input), target=F.sigmoid(target)))
print(log_loss(y_true=target.detach().numpy(),
               y_pred=F.sigmoid(input).detach().numpy(),
               labels=np.array([0., 1.])))

Regarding the computation without weights, using BCEWithLogitsLoss you get the same result as for sklearn.metrics.log_loss:
import torch
import torch.nn as nn
from sklearn.metrics import log_loss
import numpy as np

input = torch.randn((3, 1), requires_grad=True)
target = torch.ones((3, 1), requires_grad=False)

# ----- Without weights
criterion = torch.nn.BCEWithLogitsLoss()
criterion(input, target)
print('{:.6f}'.format(criterion(input, target)))
print('{:.6f}'.format(log_loss(y_true=target.detach().numpy(),
                               y_pred=torch.sigmoid(input).detach().numpy(),
                               labels=np.array([0., 1.]))))
Note that:
This loss combines a Sigmoid layer and the BCELoss in one single
class. This version is more numerically stable than using a plain
Sigmoid followed by a BCELoss as, by combining the operations into one
layer, we take advantage of the log-sum-exp trick for numerical
stability.
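As a rough illustration of why the fused version is more numerically stable (this is just the standard log-sum-exp trick, not PyTorch's exact internal code), consider a large logit:

import torch

x = torch.tensor([100.0])  # a large logit
z = torch.tensor([1.0])    # target

# Naive: compute sigmoid first, then the log terms.
# sigmoid(100) saturates to exactly 1.0, so log(1 - p) is -inf and 0 * (-inf) gives nan.
p = torch.sigmoid(x)
naive = -(z * torch.log(p) + (1 - z) * torch.log(1 - p))

# Stable fused formulation: max(x, 0) - x*z + log(1 + exp(-|x|))
stable = torch.clamp(x, min=0) - x * z + torch.log1p(torch.exp(-x.abs()))

print(naive.item())   # nan
print(stable.item())  # ~0, which is what binary_cross_entropy_with_logits returns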

Actually, I found out. It turns out that BCELoss and log_loss normalize differently when sample weights are supplied: BCELoss (with the default reduction='mean') divides the weighted sum of the per-sample losses by the number of samples, while log_loss divides it by the sum of the weights, so the two only agree when the weights sum to the number of samples. Interesting.
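A minimal sketch of that normalization difference (keeping hard 0/1 targets on both sides so only the weighting differs; the variable names are made up):

import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import log_loss

torch.manual_seed(0)
p = torch.sigmoid(torch.randn(3, 1))   # predicted probabilities
t = torch.ones(3, 1)                   # hard targets
w = torch.sigmoid(torch.randn(3, 1))   # positive sample weights

bce = nn.BCELoss(weight=w)(p, t)       # sum(w_i * l_i) / N
skl = log_loss(y_true=t.numpy().ravel(),
               y_pred=p.numpy().ravel(),
               sample_weight=w.numpy().ravel(),
               labels=np.array([0., 1.]))  # sum(w_i * l_i) / sum(w_i)

# Rescaling by N / sum(w_i) makes the two agree (up to float precision).
print(float(bce) * t.numel() / float(w.sum()), skl)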

Related

Subset model outputs in custom loss function in tensorflow/keras

I am interested in using a neural network to estimate the parameters of a linear regression. To do this I am creating a network that makes a two-parameter prediction, and I am trying to write a custom loss function that will determine how well the two parameters do as a slope and intercept in a logistic regression model, using a third dataset as a predictor in the logistic regression.
So I have a matrix of predictors X, with dimensions 10,000 by 20, and a binary outcome variable y. Additionally, I have a 10,000-observation linear_predictor that I want to use in the custom loss function to evaluate the two outputs of the model.
import numpy as np
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Dense
import tensorflow as tf

# create some dummy data
X = np.random.rand(10_000, 20)
y = (np.random.rand(10_000) > 0.8).astype(int)
linear_predictor = np.random.rand(10_000)

# define custom loss function
def CustomLoss(y_true, y_pred, input_):
    y_estim = y_pred[:, 0] * input_ + y_pred[:, 1]
    y_estim = tf.gather(y_pred, 0, axis=1) * input_ + tf.gather(y_pred, 1, axis=1)
    return tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, y_estim)

# create inputs to model
lp_input = Input(shape=linear_predictor.shape)
X_input = Input(shape=X.shape)
y_input = Input(shape=y.shape)

# create network
hidden1 = Dense(32, activation='relu')(X_input)
hidden2 = Dense(8, activation='relu')(hidden1)
output = Dense(2, activation='linear')(hidden2)
model = Model([y_input, X_input, lp_input], output)

# add loss function
model.add_loss(CustomLoss(y_input, output, lp_input))

# fit model
model.fit(x=X_input, y=y_input, epochs=3)
However, I am unable to get the CustomLoss function to work. Something is going wrong with subsetting the model's two-parameter output to get one parameter to use as the slope and the other as the intercept.
The error I am getting is:
ValueError: Exception encountered when calling layer "tf.math.multiply_1" (type TFOpLambda).
Dimensions must be equal, but are 2 and 10000 for '{{node tf.math.multiply_1/Mul}} = Mul[T=DT_FLOAT](
Placeholder, Placeholder_1)' with input shapes: [?,2], [?,10000].
Call arguments received by layer "tf.math.multiply_1" (type TFOpLambda):
• x=tf.Tensor(shape=(None, 2), dtype=float32)
• y=tf.Tensor(shape=(None, 10000), dtype=float32)
• name=None
This suggests that the variable y_pred is not being subset, even though I have tried the method recommended here with numpy-like indexing (y_pred[:1]) as well as the gather_nd method here, among others.
I think this should be possible; any help is appreciated.
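As a sanity check on the slicing itself (independent of the graph/shape issue above), here is a minimal eager-mode sketch with made-up values showing that y_pred[:, 0] and y_pred[:, 1] do pull out per-sample columns of a (batch, 2) tensor:

import tensorflow as tf

# Hypothetical stand-in for a (batch, 2) model output and a per-sample predictor.
y_pred = tf.constant([[1.0, 2.0],
                      [3.0, 4.0],
                      [5.0, 6.0]])
lp = tf.constant([10.0, 20.0, 30.0])

slope = y_pred[:, 0]      # shape (3,) -- first column
intercept = y_pred[:, 1]  # shape (3,) -- second column
y_estim = slope * lp + intercept
print(y_estim.numpy())    # [ 12.  64. 156.]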

Using Gradient Tape for Jacobian of LSTM model - Python

I am building a sequence-to-one prediction model using an LSTM. My data has 4 input variables and 1 output variable which needs to be predicted. The data is a time series. The total length of the data is 38265 (total number of timesteps). The total data is in a DataFrame of size 38265 x 5.
I want to use the previous 20 timesteps data of the 4 input variables to make prediction of my output variable. I am using the below code for this purpose.
model = Sequential()
model.add(LSTM(units=120, activation='relu', return_sequences=False,
               input_shape=(train_in.shape[1], 5)))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
I want to calculate the Jacobian of the output variable w.r.t. the LSTM model function using tf.GradientTape. Can anyone help me out with this?
The Jacobian of the output with respect to the LSTM input can be obtained as follows:
Using tf.GradientTape(), we can compute the Jacobian arising from the gradient flow.
However, to get the Jacobian, the input needs to be in the form of a tf.EagerTensor, which is usually available when we want the Jacobian of the output (after executing y = model(x)). The following code snippet shows this idea:
# Get the Jacobian for each persistent gradient evaluation
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='relu'))

x = tf.constant([[5., 6., 3.]])

with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    # Forward pass
    tape.watch(x)
    y = model(x)
    loss = tf.reduce_mean(y**2)

print('Gradients\n')
jacobian_wrt_loss = tape.jacobian(loss, x)
print(f'{jacobian_wrt_loss}\n')
jacobian_wrt_y = tape.jacobian(y, x)
print(f'{jacobian_wrt_y}\n')
But for getting intermediate outputs, such as in this case, many samples use Keras. When we separate the outputs coming from model.layers.output, the type we get is a KerasTensor instead of an EagerTensor.
However, for creating the Jacobian we need an EagerTensor. (This was after many failed attempts with tf.function wrapping, since eager execution is already present in TF > 2.0.)
So, alternatively, an auxiliary model can be created with only the layers required (in this case, just the Input and LSTM layers). The output of this model will be a tf.EagerTensor, which is useful for creating the Jacobian tensor. This is shown in the following snippet:
# General syntax for getting Jacobians for each layer output
import numpy as np
import tensorflow as tf

tf.executing_eagerly()

x = tf.constant([[15., 60., 32.]])
x_inp = tf.keras.layers.Input(tensor=tf.constant([[15., 60., 32.]]))

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2, activation='relu', name='dense_1'))
model.add(tf.keras.layers.Dense(2, activation='relu', name='dense_2'))

aux_model = tf.keras.Sequential()
aux_model.add(tf.keras.layers.Dense(2, activation='relu', name='dense_1'))
# model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    # Forward pass
    tape.watch(x)
    x_y = model(x)
    act_y = aux_model(x)
    print(x_y, type(x_y))
    ops = [layer.output for layer in model.layers]
    # inps = [layer.input for layer in model.layers]

print('Jacobian of Full FFNN\n')
jacobian = tape.jacobian(x_y, x)
print(f'{jacobian[0]}\n')

print('Jacobian of FFNN with Just first Dense\n')
jacobian = tape.jacobian(act_y, x)
print(f'{jacobian[0]}\n')
Here I have used a simple FFNN consisting of 2 Dense layers, but I want to evaluate w.r.t the output of the first Dense layer. Hence I created an auxiliary model having just 1 Dense layer and determined the output of the Jacobian from it.
The details can be found here.
With help from @Abhilash Majumder, I have done it this way. I am posting it here so that it might help someone in the future.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import load_model

tf.compat.v1.enable_eager_execution()  # This will enable eager execution, which is a must.
tf.executing_eagerly()  # Check whether eager execution is enabled. Should give "True".

data = pd.read_excel("FileName or Location ")
# My data is in the form of a dataframe with 127549 rows and 5 columns (127549 x 5)

a = data[:20]    # shape is (20,5)
b = data[50:70]  # shape is (20,5)
A = [a, b]       # making a list
A = np.array(A)  # convert into an array of size (2,20,5)
At = tf.convert_to_tensor(A, np.float32)  # convert into a tensor
At.shape  # TensorShape([Dimension(2), Dimension(20), Dimension(5)])

model = load_model('EKF-LSTM-1.h5')  # Load the trained model
# I have a trained model which is shown in the question above.
# Output of this model is a single value

with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    tape.watch(At)
    y1 = model(At)  # defining your output as a function of the input variables

print(y1, type(y1))
# output
# tf.Tensor([[0.04251503],[0.04634088]], shape=(2, 1), dtype=float32) <class
# 'tensorflow.python.framework.ops.EagerTensor'>

jacobian = tape.jacobian(y1, At)  # jacobian of the output w.r.t. both inputs
jacobian.shape
Output
TensorShape([Dimension(2), Dimension(1), Dimension(2), Dimension(20), Dimension(5)])
Here I calculated the Jacobian w.r.t. 2 inputs, each of size (20,5). If you want to calculate it w.r.t. only one input of size (20,5), then use this:
jacobian=tape.jacobian(y1,At[0]) #jacobian of output w.r.t only 1st input in 'At'
jacobian.shape
Output
TensorShape([Dimension(1), Dimension(1), Dimension(1), Dimension(20), Dimension(5)])
For those looking to compute the Jacobian over a batch of inputs and outputs where each output[i] depends only on the corresponding input[i] (the cross terms for i != j are zero), consider the batch_jacobian method.
This will reduce the number of dimensions in your computed Jacobian tensor by one and could be the difference between running out of memory and not.
See: batch_jacobian in the TensorFlow GradientTape docs.
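A minimal sketch of the difference, using a made-up Dense layer instead of the LSTM above:

import tensorflow as tf

x = tf.random.normal((8, 5))          # batch of 8 independent inputs
dense = tf.keras.layers.Dense(3)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = dense(x)                      # shape (8, 3); y[i] depends only on x[i]

full = tape.jacobian(y, x)            # shape (8, 3, 8, 5) -- cross terms are all zero
batched = tape.batch_jacobian(y, x)   # shape (8, 3, 5)    -- one dimension fewer
print(full.shape, batched.shape)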

Pixel-wise weighted loss function in Keras - TensorFlow 2.0

I'm trying to write a pixel-wise weighted loss function for my model written in Keras, but in TensorFlow 2.0 it seems that this is not possible anymore, i.e. it is not possible to have a loss function with inputs other than y_true and y_pred.
I used to write it as follows:
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K

def my_keras_model():
    input = Input((256, 256, 1), name='input')
    weight = Input((256, 256, 1), name='weights')
    c1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='glorot_uniform', padding='same')(input)
    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c1)
    model = Model(inputs=[input, weight], outputs=outputs)
    model.compile(optimizer=Adam(learning_rate=0.001, name='adam'), loss=my_weighted_loss(weight))
    return model

def my_weighted_loss(weight):
    def loss(y_true, y_pred):
        return K.mean(weight * K.binary_crossentropy(y_true, y_pred), axis=-1)
    return loss
Any idea of how to do it in TF 2?
One "hacky" way of implementing this would be adding the original input to the output, and writing your own loss function. This way you can do
weight = y_true[...,0]
y_true = y_true[...,1:]
I would also love to hear a better answer :)
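A minimal sketch of that idea (assuming a single-channel mask and a weight map of the same spatial shape; all names here are made up):

import numpy as np
from tensorflow.keras import backend as K

def weighted_bce(y_true_packed, y_pred):
    weight = y_true_packed[..., 0:1]  # first channel: pixel weights
    y_true = y_true_packed[..., 1:]   # remaining channel(s): the real target
    return K.mean(weight * K.binary_crossentropy(y_true, y_pred), axis=-1)

# When fitting, concatenate the weight map and the mask along the channel axis:
# y_packed = np.concatenate([weight_maps, masks], axis=-1)
# model.compile(optimizer='adam', loss=weighted_bce)
# model.fit(images, y_packed, ...)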
Actually, it is possible to implement weight maps and do the computation inside the model. Since the loss is binary cross-entropy, define your model so that it takes the image, the weight map and the mask as inputs:
model = Model(inputs=[image, weight, mask], outputs=outputs)
If you are feeding it from a tf dataset, the element spec would look like:
output_types=((tf.float32, tf.float32, tf.float32), tf.float32)
output_shapes=(([1024,1024,1], [1024,1024,1], [1024,1024,1]), [1024,1024,1])
Now compute the loss function inside the model (you have access to y_true there as well):
bce = -(y_true * K.log(y_pred + epsilon) + (1 - y_true) * K.log(1 - y_pred + epsilon))
Here the model output would be this computed loss.
In case you need the computation outside your model, just use a Lambda layer for the weights:
weights_out = layers.Lambda(lambda x: x)(weights)
and then also output this layer from your model. The model then has two outputs (as a tuple), from which the pixel-wise weighted loss can be calculated:
model = Model(inputs=[image, weights, mask], outputs=[outputs, weights_out])
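Putting the first variant together, a minimal sketch of computing the weighted loss inside the model via model.add_loss (the shapes, layer sizes and names here are invented, not the asker's actual network):

import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras import backend as K

image = layers.Input((256, 256, 1), name='image')
weight = layers.Input((256, 256, 1), name='weight')
mask = layers.Input((256, 256, 1), name='mask')

x = layers.Conv2D(16, 3, padding='same', activation='relu')(image)
pred = layers.Conv2D(1, 1, activation='sigmoid')(x)

# Pixel-wise weighted binary cross-entropy, registered as a model-internal loss.
eps = K.epsilon()
bce = -(mask * K.log(pred + eps) + (1 - mask) * K.log(1 - pred + eps))

model = Model(inputs=[image, weight, mask], outputs=pred)
model.add_loss(K.mean(weight * bce))
model.compile(optimizer='adam')  # no external loss needed
# model.fit([images, weight_maps, masks], epochs=...)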

Neural network with no hidden layers and a linear activation function should approximate a linear regression?

To my understanding, a neural network will produce the same equation form as a linear regression assuming that you use no hidden layers and a linear activation function, i.e. y = SUM(w_i * x_i) + b, where i ranges over the features you have.
I've tried to prove this to myself by taking the weights and bias of a linear regression, inputting them into a neural network and seeing if the results are the same. They are not.
I am wondering if my understanding is incorrect, or my code is, or maybe it's both.
from sklearn.linear_model import LinearRegression
import tensorflow as tf
from tensorflow import keras
import numpy as np
linearModel = LinearRegression()
linearModel.fit(np.array(normTrainFeaturesDf), np.array(trainLabelsDf))
# Gets the weights of the linear model and the intercept in a form that can be passed into the neural network
linearWeights = np.array(linearModel.coef_)
intercept = np.array([linearModel.intercept_])
trialWeights = np.reshape(linearWeights, (len(linearWeights), 1))
trialWeights = trialWeights.astype('float32')
intercept = intercept.astype('float32')
newTrialWeights = [trialWeights, intercept]
# Create a neural network and set the weights of the model to the linear model
nnModel = keras.Sequential([keras.layers.Dense(1, activation='linear', input_shape=[len(normTrainFeaturesDf.keys())]),])
nnModel.set_weights(newTrialWeights)
# Print predictions of both models (the results are vastly different)
print(linearModel.predict(np.array(normTestFeaturesDf)))
print(nnModel.predict(normTestFeaturesDf).flatten())
Yes, a neural network with a single layer and no activation function is equivalent to linear regression.
Defining some variables you did not include:
normTrainFeaturesDf = np.random.rand(100, 10)
normTestFeaturesDf = np.random.rand(10, 10)
trainLabelsDf = np.random.rand(100)
Then the output is as expected:
>>> linear_model_preds = linearModel.predict(np.array(normTestFeaturesDf))
>>> nn_model_preds = nnModel.predict(normTestFeaturesDf).flatten()
>>> print(linear_model_preds)
>>> print(nn_model_preds)
[0.46030349 0.69676376 0.43064266 0.4583325 0.50750268 0.51753189
0.47254946 0.50654825 0.52998559 0.35908762]
[0.46030346 0.69676375 0.43064266 0.45833248 0.5075026 0.5175319
0.47254944 0.50654817 0.52998555 0.3590876 ]
The numbers are identical, except for small variations due to float precision.
>>> np.allclose(linear_model_preds, nn_model_preds)
True

Tensorflow 2.0 doesn't compute the gradient

I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using VGG16). To do so I create a random image, feed it through the network up to the desired convolutional layer, choose the feature map, and find the gradients with respect to the input. The idea is to change the input in a way that maximizes the activation of the desired feature map. Using TensorFlow 2.0 I have a GradientTape that follows the function and then computes the gradient; however, the gradient returns None. Why is it unable to compute the gradient?
import tensorflow as tf
import matplotlib.pyplot as plt
import time
import numpy as np
from tensorflow.keras.applications import vgg16

class maxFeatureMap():

    def __init__(self, model):
        self.model = model
        self.optimizer = tf.keras.optimizers.Adam()

    def getNumLayers(self, layer_name):
        for layer in self.model.layers:
            if layer.name == layer_name:
                weights = layer.get_weights()
                num = weights[1].shape[0]
                return ("There are {} feature maps in {}".format(num, layer_name))

    def getGradient(self, layer, feature_map):
        pic = vgg16.preprocess_input(np.random.uniform(size=(1, 96, 96, 3)))  ## Creates values between 0 and 1
        pic = tf.convert_to_tensor(pic)
        model = tf.keras.Model(inputs=self.model.inputs,
                               outputs=self.model.layers[layer].output)
        with tf.GradientTape() as tape:
            ## predicts the output of the model and only chooses the feature_map indicated
            predictions = model.predict(pic, steps=1)[0][:, :, feature_map]
            loss = tf.reduce_mean(predictions)
        print(loss)
        gradients = tape.gradient(loss, pic[0])
        print(gradients)
        self.optimizer.apply_gradients(zip(gradients, pic))

model = vgg16.VGG16(weights='imagenet', include_top=False)
x = maxFeatureMap(model)
x.getGradient(1, 24)
This is a common pitfall with GradientTape; the tape only traces tensors that are set to be "watched" and by default tapes will watch only trainable variables (meaning tf.Variable objects created with trainable=True). To watch the pic tensor, you should add tape.watch(pic) as the very first line inside the tape context.
Also, I'm not sure if the indexing (pic[0]) will work, so you might want to remove that -- since pic has just one entry in the first dimension it shouldn't matter anyway.
Furthermore, you cannot use model.predict because this returns a numpy array, which basically "destroys" the computation graph chain so gradients won't be backpropagated. You should simply use the model as a callable, i.e. predictions = model(pic).
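Putting those suggestions together, a rough sketch of what the tape block could look like (this swaps the Adam/apply_gradients step for a plain gradient-ascent update, and the layer index, image size and step size are arbitrary):

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import vgg16

base = vgg16.VGG16(weights='imagenet', include_top=False)
sub_model = tf.keras.Model(inputs=base.inputs, outputs=base.layers[1].output)

pic = tf.Variable(vgg16.preprocess_input(np.random.uniform(size=(1, 96, 96, 3))),
                  dtype=tf.float32)
feature_map = 24

with tf.GradientTape() as tape:
    tape.watch(pic)                  # not strictly needed for a tf.Variable
    activations = sub_model(pic)     # call the model directly, not model.predict
    loss = tf.reduce_mean(activations[0][:, :, feature_map])

grads = tape.gradient(loss, pic)     # same shape as pic, no longer None
# Gradient *ascent* step to increase the chosen feature map's activation.
pic.assign_add(0.1 * grads / (tf.norm(grads) + 1e-8))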
Did you define your own loss function? Did you convert a tensor to numpy inside your loss function?
As a beginner, I ran into the same problem:
When using tape.gradient(loss, variables), it turned out to be None because I had converted a tensor to a numpy array inside my own loss function. It seems to be a silly but common beginner mistake.
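A tiny illustration of that failure mode (made-up numbers):

import tensorflow as tf

w = tf.Variable(2.0)

# Broken: converting to numpy means the tape has nothing to differentiate through.
with tf.GradientTape() as tape:
    loss = tf.constant(float((w * 3.0).numpy()) ** 2)
print(tape.gradient(loss, w))  # None

# Working: stay in TensorFlow ops end to end.
with tf.GradientTape() as tape:
    loss = (w * 3.0) ** 2
print(tape.gradient(loss, w))  # tf.Tensor(36.0, ...)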
FYI: when GradientTape is not working, there is also the possibility of a TensorFlow issue. Checking the TensorFlow GitHub for known issues with the functions being used is one useful problem-determination step, e.g. "Gradients do not exist for variables after tf.concat()" (#37726).
