How do I implement this metric in Keras? My code below gives the wrong result!
Note that I'm undoing a previous log(x + 1) transformation via exp(x) - 1, and negative predictions are clipped to 0:
def rmsle_cust(y_true, y_pred):
    first_log = K.clip(K.exp(y_pred) - 1.0, 0, None)
    second_log = K.clip(K.exp(y_true) - 1.0, 0, None)
    return K.sqrt(K.mean(K.square(K.log(first_log + 1.) - K.log(second_log + 1.)), axis=-1))
For comparison, here's the standard numpy implementation:
def rmsle_cust_py(y, y_pred, **kwargs):
    # undo 1 + log
    y = np.exp(y) - 1
    y_pred = np.exp(y_pred) - 1
    y_pred[y_pred < 0] = 0.0
    to_sum = [(math.log(y_pred[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i, pred in enumerate(y_pred)]
    return (sum(to_sum) * (1.0 / len(y))) ** 0.5
What am I doing wrong? Thanks!
EDIT: Setting axis=0 seems to give a value very close to the correct one, but I'm not sure, since all the code I've seen uses axis=-1.
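For illustration, here is a small numpy sketch of my own (assuming the model output has shape (length, 1)) showing how the placement of the mean relative to the square root changes the result:
import numpy as np

np.random.seed(1)
y_true = np.random.rand(8, 1)   # assumed output shape (length, 1)
y_pred = np.random.rand(8, 1)
sq = (np.log1p(y_pred) - np.log1p(y_true)) ** 2

# axis=-1: each sample is averaged over its single element, the square root is
# taken per sample, and Keras then averages those roots over the batch
print(np.mean(np.sqrt(sq.mean(axis=-1))))

# no axis (or axis=0 for this shape): the mean runs over the whole batch
# before the square root, which is the usual RMSLE definition
print(np.sqrt(sq.mean()))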
I ran into the same problem and searched for it; here is what I found:
https://www.kaggle.com/jpopham91/rmlse-vectorized
After modifying it a bit, this seems to work for me; the rmsle_K method is implemented with Keras and TensorFlow.
import numpy as np
import math
from keras import backend as K
import tensorflow as tf
def rmsle(y, y0):
    assert len(y) == len(y0)
    return np.sqrt(np.mean(np.power(np.log1p(y) - np.log1p(y0), 2)))

def rmsle_loop(y, y0):
    assert len(y) == len(y0)
    terms_to_sum = [(math.log(y0[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i, pred in enumerate(y0)]
    return (sum(terms_to_sum) * (1.0 / len(y))) ** 0.5

def rmsle_K(y, y0):
    return K.sqrt(K.mean(K.square(tf.log1p(y) - tf.log1p(y0))))
r = rmsle(y=[5, 20, 12], y0=[8, 16, 12])
r1 = rmsle_loop(y=[5, 20, 12], y0=[8, 16, 12])
r2 = rmsle_K(y=[5., 20., 12.], y0=[8., 16., 12.])
print(r)
print(r1)
sess = tf.Session()
print(sess.run(r2))
Result:
Using TensorFlow backend
0.263978210565
0.263978210565
0.263978
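On a newer TensorFlow, where tf.log1p has moved to tf.math.log1p and sessions are gone, a roughly equivalent sketch (assuming tf.keras and eager execution) would be:
import tensorflow as tf
from tensorflow.keras import backend as K

def rmsle_K(y, y0):
    return K.sqrt(K.mean(K.square(tf.math.log1p(y) - tf.math.log1p(y0))))

print(rmsle_K(tf.constant([5., 20., 12.]), tf.constant([8., 16., 12.])).numpy())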
Because the numpy implementation builds a list (to_sum), I suspect your numpy array has shape (length,).
And in Keras, since you get different results with axis=0 and axis=-1, you probably have some shape like (length, 1).
Also, when creating the to_sum list, you're using y[i] and y_pred[i], which means you're taking elements along axis=0 in the numpy implementation.
The numpy implementation also sums everything when calculating the mean in sum(to_sum), so you really don't need to use any axis in K.mean.
If you make sure your model's output shape is either (length,) or (length,1), you can use just K.mean(value) without passing the axis parameter.
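Putting that together with the original metric, a minimal sketch (assuming the same exp/clip inversion of the log transform as in the question) would be:
from keras import backend as K

def rmsle_cust(y_true, y_pred):
    # undo the log(x + 1) transform and clip negative predictions to 0
    first = K.clip(K.exp(y_pred) - 1.0, 0, None)
    second = K.clip(K.exp(y_true) - 1.0, 0, None)
    # no axis argument: average over every element before the square root
    return K.sqrt(K.mean(K.square(K.log(first + 1.) - K.log(second + 1.))))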
I found an unexpected behavior with nn.LayerNorm.
There seems to be some approximation of the mean that leads to a numerical error. For example, when nn.LayerNorm is applied to a tensor whose elements are all equal, the result should be a tensor of all zeros (since X - E[X] = 0), but it is not: some elements have values on the order of 1e-4.
Here is an example.
import torch
import torch.nn as nn
c = 25
x = torch.ones(1,112,112,128)*c
layer = nn.LayerNorm(normalized_shape=128)
y = layer(x)
print(y.detach().numpy().max())
Output:
0.00024414062
Note that the output depends on the factor c:
with c = 1 the output is correct, that is, 0;
with c = 2500 the output is on the order of 1e-2. The discrepancy grows with the magnitude of the inputs (see the sketch below).
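A small sketch of my own that sweeps c makes the scaling visible:
import torch
import torch.nn as nn

layer = nn.LayerNorm(normalized_shape=128)
for c in (1.0, 25.0, 2500.0):
    x = torch.ones(1, 112, 112, 128) * c
    # maximum absolute deviation from the expected all-zeros output
    print(c, layer(x).abs().max().item())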
This discrepancy is not present when the behavior expected from nn.LayerNorm is implemented from scratch:
from typing import Tuple
def layer_norm(
    x: torch.Tensor, dim: Tuple[int], eps: float = 1e-5
) -> torch.Tensor:
    mean = torch.mean(x, dim=dim, keepdim=True)
    var = x.var(dim=dim, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)

y2 = layer_norm(x, dim=3)
print(y2.detach().numpy().max())
Output:
0.0
Can anyone explain this behavior?
Thanks.
python==3.9.7
torch=='1.11.0+cu102'
OS: Ubuntu 20.04.4 LTS x86_64
I'm implementing Andrew Ng's Coursera course in Python and I'm doing Ex2 right now, Logistic Regression. I'm trying to use SciPy's optimize.minimize but I can't seem to get it to run correctly. I'll try to give as brief a summary of my code as possible while being thorough. I'm using Python3. Here is my variable setup, I move everything to numpy after using pandas to read in the csv file:
import numpy as np
import pandas as pd
from scipy.optimize import fmin_bfgs
from scipy import optimize as opt
from scipy.optimize import minimize
class Ex2:
    def __init__(self):
        self.pandas_data = pd.read_csv("ex2data1.txt", skipinitialspace=True)
        self.data = self.pandas_data.values
        self.data = np.insert(self.data, 0, 1, axis=1)
        self.x = self.data[:, 0:3]
        self.y = self.data[:, 3:]
        self.theta = np.zeros(shape=(self.x.shape[1]))
x: (100, 3) numpy ndarray
y: (100, 1) numpy ndarray
theta: (3,) numpy ndarray (1-d)
Then, I define a sigmoid, cost and gradient function to give to Scipy's minimize:
@staticmethod
def sigmoid(x):
    return 1/(1 + np.exp(x))

def cost(self, theta):
    x = self.x
    y = self.y
    m = len(y)
    h = self.sigmoid(x.dot(theta))
    j = (1/m) * ((-y.T.dot(np.log(h))) - ((1-y).T.dot(np.log(1-h))))
    return j[0]

def grad(self, theta):
    x = self.x
    y = self.y
    theta = np.expand_dims(theta, axis=0)
    m = len(y)
    h = self.sigmoid(x.dot(theta.T))
    grad = (1/m) * (x.T.dot(h-y))
    grad = np.squeeze(grad)
    return grad
These take theta, a 1-D numpy ndarray. Cost returns a scalar (the cost associated with the theta given) and gradient returns a 1-D numpy ndarray of updates for theta.
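As a quick sanity check of that calling convention, here is a toy example of my own (unrelated to the course data) where fun returns a scalar and jac returns a 1-D array shaped like x:
import numpy as np
from scipy.optimize import minimize

fun = lambda th: float((th ** 2).sum())   # scalar cost
jac = lambda th: 2.0 * th                 # 1-D gradient, same shape as th

res = minimize(fun, np.ones(3), jac=jac)
print(res.success, res.x)                 # converges to values near [0, 0, 0]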
When I then run this code:
def run(self):
    options = {'maxiter': 100}
    print(minimize(self.cost, self.theta, jac=self.grad, options=options))

ex2 = Ex2()
ex2.run()
I get:
fun: 0.69314718055994529
hess_inv: array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
jac: array([ -0.1 , -12.00921659, -11.26284221])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 106
nit: 0
njev: 94
status: 2
success: False
x: array([ 0., 0., 0.])
Process finished with exit code 0
Can't quite get the formatting right on the output, apologies. That's the gist of what I'm doing; am I returning something from cost or gradient incorrectly? That seems most likely to me, but I've been trying various combinations and formats of return values and nothing seems to work. Any help is greatly appreciated.
Edit: Among other things, to debug this I've made sure that cost and grad are returning what I expect, which they are (cost: float, grad: 1-D ndarray). Running both on an initial theta array of zeros gives me the same values as I get in Octave (which I know to be correct thanks to the provided code for the exercises). However, giving these values to the minimize function does not seem to be minimizing the theta values as expected.
If anyone stumbles across this and happens to have the same problem, I figured out that in my sigmoid function I should have had
return 1/(1 + np.exp(-x))
but had
return 1/(1 + np.exp(x))
After fixing that, the minimize function converged normally.
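One way to catch this kind of mismatch early is to compare the analytic gradient against a numerical one; here is a sketch of my own using scipy.optimize.check_grad (with made-up data, not the course's ex2data1.txt):
import numpy as np
from scipy.optimize import check_grad

def make_cost_grad(sigmoid, X, y):
    def cost(theta):
        h = sigmoid(X.dot(theta))
        return float((-(y * np.log(h)) - (1 - y) * np.log(1 - h)).mean())
    def grad(theta):
        return X.T.dot(sigmoid(X.dot(theta)) - y) / len(y)
    return cost, grad

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = rng.integers(0, 2, size=20).astype(float)
theta0 = 0.1 * np.ones(3)

for name, sig in [("exp(x)", lambda z: 1 / (1 + np.exp(z))),
                  ("exp(-x)", lambda z: 1 / (1 + np.exp(-z)))]:
    cost, grad = make_cost_grad(sig, X, y)
    # the broken sigmoid gives a large discrepancy, the correct one a tiny one
    print(name, check_grad(cost, grad, theta0))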
The easiest thing might be for me to just post the numpy code that I'm trying to reproduce directly in Theano, if that's possible:
import numpy as np
from theano import shared

tensor = shared(np.random.randn(7, 16, 16)).eval()
tensor2 = tensor[0, :, :]
tensor2[tensor2 < 1] = 0.0
tensor2[tensor2 > 0] = 1.0
new_tensor = [tensor2]
for i in range(1, tensor.shape[0]):
    new_tensor.append(np.multiply(tensor2, tensor[i, :, :]))
output = np.array(new_tensor).reshape(7, 16, 16)
If it's not immediately obvious, what I'm trying to do is take the values from one matrix of a tensor made up of 7 different matrices and apply them to the other matrices in the tensor.
Really, the problem I'm solving is writing conditional statements in an objective function for a fully convolutional network in Keras. Basically, the loss for some of the feature map values is going to be calculated (and subsequently weighted) differently from others, depending on some of the values in one of the feature maps.
You can easily implement conditionals with the switch operation.
Here is the equivalent code:
import theano
from theano import tensor as T
import numpy as np

def _check_new(var):
    shape = var.shape[0]
    t_1, t_2 = T.split(var, [1, shape - 1], 2, axis=0)
    ones = T.ones_like(t_1)
    cond = T.gt(t_1, ones)
    mask = T.repeat(cond, t_2.shape[0], axis=0)
    out = T.switch(mask, t_2, T.zeros_like(t_2))
    output = T.join(0, cond, out)
    return output

def _check_old(var):
    tensor = var.eval()
    tensor2 = tensor[0, :, :]
    tensor2[tensor2 < 1] = 0.0
    tensor2[tensor2 > 0] = 1.0
    new_tensor = [tensor2]
    for i in range(1, tensor.shape[0]):
        new_tensor.append(np.multiply(tensor2, tensor[i, :, :]))
    output = theano.shared(np.array(new_tensor).reshape(7, 16, 16))
    return output

tensor = theano.shared(np.random.randn(7, 16, 16))
out1 = _check_new(tensor).eval()
out2 = _check_old(tensor).eval()

print(out1)
print('----------------')
print(((out1 - out2) ** 2).mean())
Note: since you're masking on the first filter, I needed to use split and join operations.
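For the Keras objective mentioned in the question, the same idea can be sketched with the backend's switch; this is a rough illustration of my own, with a made-up weighting rule and channels-last tensors assumed:
from keras import backend as K

def conditional_loss(y_true, y_pred):
    # Hypothetical rule, for illustration only: positions where the first
    # feature map of y_true is greater than 1 get weight 2.0, the rest 0.5.
    gate = K.cast(K.greater(y_true[..., :1], 1.0), K.floatx())
    gate = gate * K.ones_like(y_pred)  # broadcast the mask to y_pred's shape
    weights = K.switch(K.greater(gate, 0.5),
                       2.0 * K.ones_like(y_pred),
                       0.5 * K.ones_like(y_pred))
    return K.mean(weights * K.square(y_pred - y_true), axis=-1)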
I'm starting to play around with Theano, so I tried computing a simple function and testing the output. However, when I compare a Theano-compiled version against a non-Theano version, the outputs are a bit different...
The code:
import numpy as np
import theano.tensor as T
from theano import function
np.random.seed(1)
S = np.random.rand(4,3)
Q = np.random.rand(4,3)
def MSE(a, b):
    n = min(a.shape[0], b.shape[0])
    fhat = T.dvector('fhat')
    y = T.dvector('y')
    mse = ((y - fhat) ** 2).sum() / n
    mse_f = function([y, fhat], mse)
    return mse_f(a, b)

for row in range(S.shape[0]):
    print(MSE(S[row], Q[row]))

for i in range(S.shape[0]):
    print(((S[i] - Q[i]) ** 2).sum() / S.shape[0])
the outputs:
# from MSE function
0.0623486922837
0.0652202301174
0.151698460419
0.187325204482
# non theano output
0.0467615192128
0.0489151725881
0.113773845314
0.140493903362
What am I overlooking here?
In the expression in this statement
print(((S[i] - Q[i])**2).sum() / S.shape[0])
you should divide by S.shape[1], not S.shape[0].
You created S using S = np.random.rand(4, 3), so S.shape is (4, 3) and each row has length S.shape[1] == 3, which is also what n evaluates to inside MSE.
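With that fix, the comparison loop from the question (reusing its S and Q) becomes:
for i in range(S.shape[0]):
    # divide by the row length, matching n = min(a.shape[0], b.shape[0]) in MSE
    print(((S[i] - Q[i]) ** 2).sum() / S.shape[1])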
I would like to perform a multidimensional ODR with scipy.odr. I read the API documentation; it says that multi-dimensionality is possible, but I cannot make it work. I cannot find a working example on the internet, and the API is really crude and gives no hints on how to proceed.
Here is my MWE:
import numpy as np
import scipy.odr
def linfit(beta, x):
    return beta[0]*x[:, 0] + beta[1]*x[:, 1] + beta[2]
n = 1000
t = np.linspace(0, 1, n)
x = np.full((n, 2), float('nan'))
x[:,0] = 2.5*np.sin(2*np.pi*6*t)+4
x[:,1] = 0.5*np.sin(2*np.pi*7*t + np.pi/3)+2
e = 0.25*np.random.randn(n)
y = 3*x[:,0] + 4*x[:,1] + 5 + e
print(x.shape)
print(y.shape)
linmod = scipy.odr.Model(linfit)
data = scipy.odr.Data(x, y)
odrfit = scipy.odr.ODR(data, linmod, beta0=[1., 1., 1.])
odrres = odrfit.run()
odrres.pprint()
It raises the following exception:
scipy.odr.odrpack.odr_error: number of observations do not match
This seems to be related to my matrix shapes, but I do not know how I must shape them properly. Does anyone know?
Firstly, in my experience scipy.odr mostly uses arrays, not matrices. The library seems to make a large number of size checks along the way, and getting it to work with multiple variables can be quite troublesome.
This is the workflow I usually use to get it to work (it worked at least on Python 2.7):
import numpy as np
import scipy.odr
n = 1000
t = np.linspace(0, 1, n)
def linfit(beta, x):
    return beta[0]*x[0] + beta[1]*x[1] + beta[2]  # notice the changed indices for x
x1 = 2.5*np.sin(2*np.pi*6*t)+4
x2 = 0.5*np.sin(2*np.pi*7*t + np.pi/3)+2
x = np.row_stack( (x1, x2) ) #odr doesn't seem to work with column_stack
e = 0.25*np.random.randn(n)
y = 3*x[0] + 4*x[1] + 5 + e #indices changed
linmod = scipy.odr.Model(linfit)
data = scipy.odr.Data(x, y)
odrfit = scipy.odr.ODR(data, linmod, beta0=[1., 1., 1.])
odrres = odrfit.run()
odrres.pprint()
So using 1-D arrays, stacking them with row_stack, and addressing each variable by a single index seems to work.
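Equivalently, if you already have the (n, 2) column-stacked array from the original question, transposing it to shape (2, n) should also satisfy scipy.odr, since it expects the observations along the last axis; a sketch of that variant:
import numpy as np
import scipy.odr

def linfit(beta, x):
    return beta[0]*x[0] + beta[1]*x[1] + beta[2]  # x has shape (2, n)

n = 1000
t = np.linspace(0, 1, n)
x_cols = np.column_stack((2.5*np.sin(2*np.pi*6*t) + 4,
                          0.5*np.sin(2*np.pi*7*t + np.pi/3) + 2))  # (n, 2) as in the question
y = 3*x_cols[:, 0] + 4*x_cols[:, 1] + 5 + 0.25*np.random.randn(n)

data = scipy.odr.Data(x_cols.T, y)  # transpose to (2, n)
odrfit = scipy.odr.ODR(data, scipy.odr.Model(linfit), beta0=[1., 1., 1.])
odrfit.run().pprint()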