I am trying to create a custom loss function in Keras for a generator that produces a matrix. The matrix consists of a large number of elements and a small number of their centers. Centers have high values compared to elements: elements have values <0.1, while centers should reach values >0.5. It is important that the centers are at exactly the correct indices, while fitting the elements is less important. That is why I am trying to create a loss that would do the following:
1. Select all elements from y_true where the value is >0.5; in numpy I would do indices = np.argwhere(y_true>0.5).
2. Compare the values at the given indices for y_true and y_pred, something like loss=(K.square(y_pred[indices]-y_true[indices])).
3. Select all other elements: indices_low = np.argwhere(y_true<0.5).
4. Same as step 2, saved e.g. as loss_low.
5. Return the weighted loss, e.g. return loss*100+loss_low, simply to give higher weight to the more important data.
However, I cannot find a way to achieve this in the Keras backend. I found a question about tf.where while looking for something similar to my problem, but there seems to be nothing like tf.argwhere (I can't find it in the docs, nor by browsing the net/SO). So how can I achieve this?
Note that the number and positions of centers can vary, and the generator is bad at the start, so it will either generate no centers or way more than it should. That is why I think I can't simply use tf.where. I might be incorrect here, as I am new to custom loss functions; any thoughts are welcome.
EDIT
After all it seems K.tf.where was exactly what I was looking for, so I have tried it out:
def custom_mse():
    def mse(y_true, y_pred):
        indices = K.tf.where(y_true>0.5)
        loss = K.square(y_true[indices]-y_pred[indices])
        indices = K.tf.where(y_true<0.5)
        loss_low = K.square(y_true[indices]-y_pred[indices])
        return 100*loss+loss_low
    return mse
but this keeps throwing an error:
ValueError: Shape must be rank 1 but is rank 3 for 'loss_1/Generator_loss/strided_slice' (op: 'StridedSlice') with input shapes: [?,?,?,?], [1,?,4], [1,?,4], [1].
How can I use the where output?
After a while I finally found the correct solution, so it might help somebody in the future:
Firstly, my code was biased by my long-time work with numpy and Pandas, so I expected that tf elements could be addressed as y_true[indices]. There are actually built-in functions, tf.gather and tf.gather_nd, for getting elements of a tensor. However, since the number of elements in the two losses is different, I can't use them here, because adding the losses together leads to a size-mismatch error.
This led me to a different approach, thanks to this Q&A. Understanding the code in the accepted answer I have found that you can use tf.where not only to get indices, but as well to apply masks to your tensors. The final solution for my problem is then to apply two masks on the input tensor and calculate two losses, one where I count loss for higher values and one where I count loss for lower values, then multiply the loss that should have higher weight.
def custom_mse():
    def mse(y_true, y_pred):
        great = K.tf.greater(y_true,0.5)
        loss = K.square(tf.where(great, y_true, tf.zeros(tf.shape(y_true)))-tf.where(great, y_pred, tf.zeros(tf.shape(y_pred))))
        lower = K.tf.less(y_true,0.5)
        loss_low = K.square(tf.where(lower, y_true, tf.zeros(tf.shape(y_true)))-tf.where(lower, y_pred, tf.zeros(tf.shape(y_pred))))
        return 100*loss+loss_low
    return mse
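The two-mask loss amounts to a per-element weight map: the squared error times 100 where y_true > 0.5 and times 1 where y_true < 0.5. A minimal NumPy sketch (NumPy stands in for the Keras backend here, and the sample arrays are made up for illustration) shows both why boolean indexing produces mismatched sizes and that the two-mask loss matches the weight-map form:

```python
import numpy as np

# Made-up sample data: one "center" (> 0.5) among low-valued elements.
y_true = np.array([[0.05, 0.90, 0.02],
                   [0.03, 0.01, 0.08]])
y_pred = np.array([[0.10, 0.40, 0.00],
                   [0.00, 0.60, 0.05]])

great = y_true > 0.5
lower = y_true < 0.5

# Boolean indexing yields arrays of different lengths, so they cannot be added.
n_centers = y_true[great].shape[0]
n_elements = y_true[lower].shape[0]

# Two-mask version: zero out the unselected positions, keep the full shape.
loss = np.square(np.where(great, y_true, 0.0) - np.where(great, y_pred, 0.0))
loss_low = np.square(np.where(lower, y_true, 0.0) - np.where(lower, y_pred, 0.0))
two_mask = 100 * loss + loss_low

# Equivalent weight-map version: one squared error, scaled per element.
weights = np.where(great, 100.0, np.where(lower, 1.0, 0.0))
weighted = weights * np.square(y_true - y_pred)
```

In both forms, elements exactly equal to 0.5 get weight 0, because neither mask selects them.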
I have a program in which I'm trying to calculate the Jacobian of a neural network. In order to properly define the Jacobian, I used tf.reshape to turn the data into vectors (as far as I know, the Jacobian dy/dx is only defined when y and x are vectors, not matrices or tensors).
This is my code:
#tf.function
def A_calculator():
    with tf.GradientTape(watch_accessed_variables=False) as gtape:
        noise=tf.random.normal([1000, 100])
        gtape.watch(noise)
        fakenoise=tf.reshape(gen(noise),[1000,-1])
        reshaped_noise=tf.reshape(noise,[1000,-1])
    #calculate jacobian
    Jz=gtape.batch_jacobian(fakenoise,reshaped_noise)
    return Jz
where gen is a neural network that returns an image (a generator).
My problem is that Jz is always a tensor whose elements are all zero.
I searched for a solution, but the closest thing I found was here (this is what made me suspect that the problem is tf.reshape). The solution there doesn't solve my problem, though, as I want to reshape after I pass the value into the function gen. Does anybody know how to solve this, or why Jz is always a tensor of zeros?
Reshaping every tensor is unnecessary, as reshaping a (1000,100) tensor to (1000,-1) results in the same shape. Skip reshaping altogether at all stages.
Please also check the generator; it could take a lot of time to produce the "fakenoise".
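One possible explanation for the all-zero result, offered as an assumption rather than a confirmed diagnosis: fakenoise is computed from noise, not from reshaped_noise, so the tape sees no path from the source passed to batch_jacobian to the target. A minimal sketch, assuming TensorFlow 2.x and using a hypothetical one-layer stand-in for gen, differentiates with respect to the watched noise tensor itself:

```python
import tensorflow as tf

# Hypothetical stand-in for the generator `gen`; any differentiable model would do.
gen = tf.keras.Sequential([tf.keras.layers.Dense(6, activation="tanh")])

def jacobian_of_generator(batch_size=4, latent_dim=3):
    noise = tf.random.normal([batch_size, latent_dim])
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(noise)
        # Flatten the output per sample; the input is already 2-D.
        fake = tf.reshape(gen(noise), [batch_size, -1])
    # Differentiate with respect to the tensor that was actually fed to gen.
    return tape.batch_jacobian(fake, noise)

jz = jacobian_of_generator()  # shape: (batch, output_dim, latent_dim)
```

The batch size and dimensions here are small illustrative values, not the 1000 and 100 from the question.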
I am writing a custom loss function which calculates mean squared error while ignoring nans. The issue is that my data is an image which occasionally has NaN pixels. I simply want to ignore these nan pixels and calculate the summed squared error between prediction and data, then calculate mean over examples. If I were to write a function for this in Tensorflow I would write:
def nanmean_squared_error(y_true, y_pred):
    residuals = y_true - y_pred
    residuals_no_nan = tf.where(tf.is_nan(residuals), tf.zeros_like(residuals), residuals)
    sum_residuals = tf.reduce_sum(residuals_no_nan, [1, 2])
    return sum_residuals
But this code does not work as a custom Keras loss function.
I believe I can use keras.backend.switch/zeros_like/sum instead of the tensorflow versions. But I cannot find any replacement for tf.is_nan. Does anyone have a suggestion on how to implement this?
It seems it doesn't work because you are not taking absolute or squared values.
If you mean "squared" error, there must be a square in your code (otherwise you will have negative errors, and everything will blow up to huge negative errors).
def nanmean_squared_error(y_true, y_pred):
    residuals = y_true - y_pred
    # Zero out NaNs before squaring: squaring first leaves NaN in the gradient
    residuals_no_nan = tf.where(tf.is_nan(residuals), tf.zeros_like(residuals), residuals)
    sum_residuals = tf.reduce_sum(K.square(residuals_no_nan), [1, 2])
    return sum_residuals
But to be honest, I'd probably try to replace the image nans with a certain value before entering the model. I don't know what kind of problems may appear from having nans all around, considering gradients, all intermediate layers, etc.
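The NaN-ignoring sum can be checked on a small example. The sketch below uses NumPy in place of the TensorFlow ops, with made-up 2x2 "images"; it zeroes the residual wherever it is NaN, squares, and sums over the two image axes:

```python
import numpy as np

def nan_sum_squared_error(y_true, y_pred):
    """Sum of squared errors per example, skipping NaN pixels."""
    residuals = y_true - y_pred
    # Replace NaN residuals with 0 so they contribute nothing to the sum.
    residuals_no_nan = np.where(np.isnan(residuals), 0.0, residuals)
    return np.sum(np.square(residuals_no_nan), axis=(1, 2))

# Made-up batch of two 2x2 images; the first one has a NaN pixel.
y_true = np.array([[[1.0, np.nan], [3.0, 4.0]],
                   [[0.0, 0.0], [0.0, 0.0]]])
y_pred = np.array([[[0.0, 5.0], [3.0, 2.0]],
                   [[1.0, 1.0], [1.0, 1.0]]])

per_example = nan_sum_squared_error(y_true, y_pred)  # one value per image
batch_mean = per_example.mean()                      # mean over examples
```

The TensorFlow counterparts of np.where and np.isnan are tf.where and tf.is_nan (tf.math.is_nan in TensorFlow 2.x).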
I'd like to use a neural network to predict a scalar value which is the sum of a function of the input values and a random value (I'm assuming gaussian distribution) whose variance also depends on the input values. Now I'd like to have a neural network that has two outputs - the first output should approximate the deterministic part - the function, and the second output should approximate the variance of the random part, depending on the input values. What loss function do I need to train such a network?
(It would be nice if there was an example in Python for TensorFlow, but I'm also interested in general answers. I'm also not quite clear on how I could write something like that in Python code: none of the examples I have found so far show how to address individual outputs from the loss function.)
You can use dropout for that. With a dropout layer you can make several different predictions based on different settings of which nodes dropped out. Then you can simply count the outcomes and interpret the result as a measure for uncertainty.
For details, read:
Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning." International Conference on Machine Learning. 2016.
Since I've found nothing simple to implement, I wrote something myself, that models that explicitly: here is a custom loss function that tries to predict mean and variance. It seems to work but I'm not quite sure how well that works out in practice, and I'd appreciate feedback. This is my loss function:
def meanAndVariance(y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
    """Loss function that has the values of the last axis in y_pred
    approximate the mean and variance of each value in the last axis of y_true."""
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    # Even positions hold the predicted means, odd positions the predicted variances.
    mean = y_pred[..., 0::2]
    variance = y_pred[..., 1::2]
    res = K.square(mean - y_true) + K.square(variance - K.square(mean - y_true))
    return K.mean(res, axis=-1)
The output dimension is twice the label dimension - mean and variance of each value in the label. The loss function consists of two parts: a mean squared error that has the mean approximate the mean of the label value, and the variance that approximates the difference of the value from the predicted mean.
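For comparison, a commonly used alternative for this setup is the Gaussian negative log-likelihood, which is a different technique from the loss above. The sketch below is written in NumPy rather than TensorFlow, keeps the same interleaved output layout, and assumes (as an illustration, not something taken from the post) that the odd positions hold the log-variance so that the variance stays positive:

```python
import numpy as np

def gaussian_nll(y_true, y_pred):
    """Gaussian negative log-likelihood, up to an additive constant."""
    mean = y_pred[..., 0::2]      # even positions: predicted mean
    log_var = y_pred[..., 1::2]   # odd positions: predicted log-variance (assumed)
    nll = 0.5 * (log_var + np.square(y_true - mean) / np.exp(log_var))
    return np.mean(nll, axis=-1)

y_true = np.array([[3.0]])
# Same predicted mean (1.0), three candidate variances: 1, 4 and 16.
losses = [gaussian_nll(y_true, np.array([[1.0, np.log(v)]]))[0]
          for v in (1.0, 4.0, 16.0)]
```

For a fixed mean, this loss is smallest when the predicted variance equals the squared residual (4 here), which is the behaviour the variance output is meant to learn.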
When using dropout (or any other stochastic regularization method) to estimate the uncertainty, make sure to also check out our recent work on providing a sampling-free approximation of Monte-Carlo dropout.
https://arxiv.org/pdf/1908.00598.pdf
We essentially follow your idea: treat the activations as random variables and then propagate the mean and variance to the output layer using error propagation. Consequently, we obtain two outputs: the mean and the variance.
I am implementing a custom loss function in Keras. The output of the model is a 10-dimensional softmax layer. To calculate the loss, I first need to find the index of the output firing 1 and then subtract that value from the true value. I'm doing the following:
from keras import backend as K
def diff_loss(y_true,y_pred):
    # find the indices of neuron firing 1
    true_ind=K.tf.argmax(y_true,axis=0)
    pred_ind=K.tf.argmax(y_pred,axis=0)
    # cast it to float32
    x=K.tf.cast(true_ind,K.tf.float32)
    y=K.tf.cast(pred_ind,K.tf.float32)
    return K.abs(x-y)
but it gives error "raise ValueError("None values not supported.")
ValueError: None values not supported."
What's the problem here?
This happens because your function is not differentiable. It's made of constants.
There is simply no solution for this if you want argmax as result.
An approach to test
Since you're using "softmax", that means that only one class is correct (you don't have two classes at the same time).
And since you want index differences, maybe you could work with a single continuous result (continuous values are differentiable).
Work with only one output ranging from -0.5 to 9.5, and take the classes by rounding the result.
That way, you can have the last layer with only one unit:
lastLayer = Dense(1,activation = 'sigmoid', ....) #or another kind if it's not dense
And change the range with a lambda layer:
lambdaLayer = Lambda(lambda x: 10*x - 0.5)
Now your loss can be a simple 'mae' (mean absolute error).
The downside of this attempt is that the 'sigmoid' activation is not evenly distributed between the classes. Some classes will be more probable than others. But since it's important to have a limit, it seems at first the best idea.
This will only work if your classes follow a logical increasing sequence. (I guess they do, otherwise you wouldn't be trying that kind of loss, right?)
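The range mapping can be sketched without Keras. The NumPy snippet below uses made-up sigmoid outputs and true class indices; it applies the same 10*x - 0.5 scaling as the Lambda layer, rounds to get classes, and computes the mean absolute error on the continuous values:

```python
import numpy as np

sigmoid_out = np.array([0.01, 0.52, 0.99])   # made-up outputs of the sigmoid unit
true_classes = np.array([0.0, 5.0, 7.0])     # made-up class indices as floats

# Same scaling as the Lambda layer: maps (0, 1) onto (-0.5, 9.5).
scaled = 10 * sigmoid_out - 0.5

# Rounding recovers a class index between 0 and 9.
predicted_classes = np.round(scaled).astype(int)

# 'mae' is computed on the continuous values, so it stays differentiable.
mae = np.mean(np.abs(true_classes - scaled))
```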
The function keras.metrics.binary_accuracy is very straightforward:
def binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
https://github.com/fchollet/keras/blob/master/keras/metrics.py#L20
However the function keras.metrics.categorical_accuracy has something different:
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
https://github.com/fchollet/keras/blob/master/keras/metrics.py#L24
I am confused about why this function uses K.cast instead of K.mean, because I think that this function should return a number, just like keras.metrics.binary_accuracy.
The reason for the cast is that argmax returns an integer: the index of the highest value. But the result must be a float.
The argmax function:
The argmax function will also reduce the rank of the inputs. Notice it uses axis=-1, meaning it will take the index of the maximum value in the last axis, eliminating that axis, but keeping the other axes.
Supposing your input had shape (10 samples, 5 features), the returned tensor would be just (10 samples,)
The mean function with axis=-1:
Normally, the mean function returns a scalar, but if you look closely at binary_accuracy, you will also notice that, by using axis=-1 in the mean function, it doesn't reduce the input to a single scalar value. It reduces the tensor exactly the same way argmax does, but in this case, calculating a mean value.
An input (10,5) would come out also as (10,).
Final result:
So, we can conclude that both metrics return tensors with the same shape. Now, the reason neither of them reduces everything to a scalar value is that Keras offers more possibilities, such as sample weighting and a few other additional operations with the loss (including your own custom losses, if you want to take Keras' as a base). These rely on having the loss separated by sample.
Later somewhere, Keras will calculate the final mean.
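The shapes can be checked with a NumPy stand-in for the backend calls (random data, 10 samples with 5 features each):

```python
import numpy as np

rng = np.random.default_rng(0)
y_pred = rng.random((10, 5))
# One-hot labels built from random class indices.
y_true = np.eye(5)[rng.integers(0, 5, size=10)]

# categorical_accuracy: argmax removes the last axis, the cast keeps one value per sample.
categorical = (np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)).astype("float32")

# binary_accuracy: the mean over the last axis also leaves one value per sample.
binary = np.mean(y_true == np.round(y_pred), axis=-1)

# The reduction to a single number happens afterwards.
final_categorical = categorical.mean()
```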