I'm starting to play around with Theano, so I tried computing a simple function and testing the output. However, when I compare a Theano-compiled version against a plain NumPy version, the outputs differ:
The code:
import numpy as np
import theano.tensor as T
from theano import function
np.random.seed(1)
S = np.random.rand(4,3)
Q = np.random.rand(4,3)
def MSE(a, b):
    n = min(a.shape[0], b.shape[0])
    fhat = T.dvector('fhat')
    y = T.dvector('y')
    mse = ((y - fhat)**2).sum() / n
    mse_f = function([y, fhat], mse)
    return mse_f(a, b)
for row in range(S.shape[0]):
    print(MSE(S[row], Q[row]))
for i in range(S.shape[0]):
    print(((S[i] - Q[i])**2).sum() / S.shape[0])
the outputs:
# from MSE function
0.0623486922837
0.0652202301174
0.151698460419
0.187325204482
# non theano output
0.0467615192128
0.0489151725881
0.113773845314
0.140493903362
What am I overlooking here?
In this statement
print(((S[i] - Q[i])**2).sum() / S.shape[0])
you should divide by S.shape[1], not S.shape[0].
You created S with S = np.random.rand(4,3), so S.shape is (4, 3): four rows, each of length three. The length of each row is S.shape[1]. Inside MSE, n = min(a.shape[0], b.shape[0]) is the length of the 1-D rows you pass in, i.e. 3, which is why the Theano version divides by the right number.
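With that one-character fix, a sketch of the corrected loop (it then matches the Theano output):
for i in range(S.shape[0]):
    print(((S[i] - Q[i])**2).sum() / S.shape[1])  # divide by the row length, not the row count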
I understand that most FFT/IFFT routines have an error floor. I was expecting NumPy's FFT to have an error floor of the same order as FFTW's (say 1e-15), but the following experiment shows errors on the order of 1e-5.
Consider calculating the IDFT of a box. It is well known that the result is the sinc-like Dirichlet kernel. But that is not what I get from numpy.fft.irfft. In fact, even the first sample, which should simply equal the width of the box divided by the number of FFT points, is off by about 4e-5, as the following example shows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import diric
N = 40960
K = 513
X = np.ones(K, dtype=complex)
x = np.fft.irfft(X, N)
print("x[0] = %g: expected %g - error = %g" % (x[0], (2*K+1)/N, x[0]-(2*K+1)/N))
# expected IDFT of a box is Dirichlet function (see
# https://en.wikipedia.org/wiki/Discrete_Fourier_transform#Some_discrete_Fourier_transform_pairs)
y = diric(2*np.pi*np.arange(N)/N, 2*K+1) * (2*K+1) / N
plt.figure()
plt.plot(x[:1024] - y[:1024])
plt.title('error')
plt.show(block=True)
It looks like the error is sinusoidal in form.
Has anybody experienced the same issue? Am I misunderstanding something about NumPy's FFT pack, or is it just not accurate?
Update
Here is the equivalent of part of the script in Octave:
N = 40960;
K = 513;
X = zeros(1, N);
X(1:K) = 1;
X(N-K:N) = 1;
x = ifft(X);
fprintf("x[0] = %g, expected = %g - error = %g\n", x(1), (2*K+1)/N, x(1)-(2*K+1)/N);
The error on x[0] is practically zero in Octave. (I did not check other samples because I am not aware of an equivalent of the diric function in Octave.)
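For reference, here is a sketch of the same full-spectrum check in NumPy (my own construction of the conjugate-symmetric box, not part of the original scripts); it should reproduce the near-zero Octave error:
import numpy as np

N = 40960
K = 513
X = np.zeros(N, dtype=complex)
X[:K+1] = 1   # DC bin plus K positive-frequency bins
X[-K:] = 1    # K mirrored negative-frequency bins (conjugate symmetry)
x = np.fft.ifft(X)
print("x[0] = %g, expected = %g - error = %g" % (x[0].real, (2*K+1)/N, x[0].real - (2*K+1)/N))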
Thanks to MarkDickinson, I realized that my math was wrong: irfft treats its input as the half-spectrum, i.e. the DC bin plus the positive frequencies, so a box spanning 2*K+1 bins of the full spectrum corresponds to K+1 ones in the half-spectrum, not K. The correct comparison is carried out by:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import diric
N = 40960
K = 513
X = np.ones(K+1, dtype=complex)
x = np.fft.irfft(X, N)
print("x[0] = %g: expected %g - error = %g" % (x[0], (2*K+1)/N, x[0]-(2*K+1)/N))
# expected IDFT of a box is Dirichlet function (see
# https://en.wikipedia.org/wiki/Discrete_Fourier_transform#Some_discrete_Fourier_transform_pairs)
y = diric(2*np.pi*np.arange(N)/N, 2*K+1) * (2*K+1) / N
plt.figure()
plt.plot(x[:1024] - y[:1024])
plt.title('error')
plt.show(block=True)
which shows that irfft is accurate, as the resulting error plot confirms.
NumPy is correct; my math was incorrect. I am sorry for posting this misleading question. I don't know what the standard procedure is in these cases: should I delete my question, or leave it here with this answer? I just don't want it to undermine NumPy or challenge its accuracy (this was clearly a false alarm).
I have an array of scalars of m rows and n columns. I have a Variable(m) and a Variable(n) that I would like to find solutions for.
The two variables represent values that need to be broadcast over the columns and rows respectively.
I was naively thinking of writing the variables as Variable((m, 1)) and Variable((1, n)), and adding them together as if they're ndarrays. However, that doesn't work, as broadcasting is not allowed.
import cvxpy as cp
import numpy as np
# Problem data.
m = 3
n = 4
np.random.seed(1)
data = np.random.randn(m, n)
# Construct the problem.
x = cp.Variable((m, 1))
y = cp.Variable((1, n))
objective = cp.Minimize(cp.sum(cp.abs(x + y + data)))
# or:
#objective = cp.Minimize(cp.sum_squares(x + y + data))
prob = cp.Problem(objective)
result = prob.solve()
print(x.value)
print(y.value)
This fails on the x + y expression: ValueError: Cannot broadcast dimensions (3, 1) (1, 4).
Now I'm wondering two things:
Is my problem indeed solvable using convex optimization?
If yes, how can I express it in a way that cvxpy understands?
I'm very new to the concept of convex optimization, as well as cvxpy, and I hope I described my problem well enough.
I offered to show you how to represent this as a linear program, so here goes. I'm using Pyomo, since I'm more familiar with it, but you could do something similar in PuLP.
To run this, you will first need to install Pyomo and a linear programming solver like glpk. glpk should work for reasonably sized problems, but if you find it takes too long to solve, you could try a (much faster) commercial solver like CPLEX or Gurobi.
You can install Pyomo via pip install pyomo or conda install -c conda-forge pyomo. You can install glpk from https://www.gnu.org/software/glpk/ or via conda install glpk. (I believe PuLP ships with a bundled solver, CBC, so that might save you a step.)
Here's the script. Note that this calculates absolute error as a linear expression by defining one variable for the positive component of the error and another for the negative part. Then it seeks to minimize the sum of both. In this case, the solver will always set one to zero since that's an easy way to reduce the error, and then the other will be equal to the absolute error.
import random
import pyomo.environ as po
random.seed(1)
# ~50% sparse data set, big enough to populate every row and column
m = 10 # number of rows
n = 10 # number of cols
data = {
    (r, c): random.random()
    for r in range(m)
    for c in range(n)
    if random.random() >= 0.5
}
# define a linear program to find vectors
# x in R^m, y in R^n, such that x[r] + y[c] is close to data[r, c]
# create an optimization model object
model = po.ConcreteModel()
# create indexes for the rows and columns
model.ROWS = po.Set(initialize=range(m))
model.COLS = po.Set(initialize=range(n))
# create indexes for the dataset
model.DATAPOINTS = po.Set(dimen=2, initialize=data.keys())
# data values
model.data = po.Param(model.DATAPOINTS, initialize=data)
# create the x and y vectors
model.X = po.Var(model.ROWS, within=po.NonNegativeReals)
model.Y = po.Var(model.COLS, within=po.NonNegativeReals)
# create dummy variables to represent errors
model.ErrUp = po.Var(model.DATAPOINTS, within=po.NonNegativeReals)
model.ErrDown = po.Var(model.DATAPOINTS, within=po.NonNegativeReals)
# Force the error variables to match the error
def Calculate_Error_rule(model, r, c):
    pred = model.X[r] + model.Y[c]
    err = model.ErrUp[r, c] - model.ErrDown[r, c]
    return (model.data[r, c] + err == pred)
model.Calculate_Error = po.Constraint(
    model.DATAPOINTS, rule=Calculate_Error_rule
)
# Minimize the total error
def ClosestMatch_rule(model):
    return sum(
        model.ErrUp[r, c] + model.ErrDown[r, c]
        for (r, c) in model.DATAPOINTS
    )
model.ClosestMatch = po.Objective(
    rule=ClosestMatch_rule, sense=po.minimize
)
# Solve the model
# get a solver object
opt = po.SolverFactory("glpk")
# solve the model
# turn off "tee" if you want less verbose output
results = opt.solve(model, tee=True)
# show solution status
print(results)
# show verbose description of the model
model.pprint()
# show X and Y values in the solution
for r in model.ROWS:
    print('X[{}]: {}'.format(r, po.value(model.X[r])))
for c in model.COLS:
    print('Y[{}]: {}'.format(c, po.value(model.Y[c])))
Just to complete the story, here's a solution that's closer to your original example. It uses cvxpy, but with the sparse-data approach from my other answer.
I don't know the "official" way to do elementwise calculations with cvxpy, but it seems to work OK to just use the standard Python sum function over a collection of individual cp.abs(...) expressions.
This gives a solution that is very slightly worse than the linear program's, but you may be able to fix that by adjusting the solution tolerance (see the note after the script).
import cvxpy as cp
import random
random.seed(1)
# Problem data.
# ~50% sparse data set
m = 10 # number of rows
n = 10 # number of cols
data = {
    (i, j): random.random()
    for i in range(m)
    for j in range(n)
    if random.random() >= 0.5
}
# Construct the problem.
x = cp.Variable(m)
y = cp.Variable(n)
objective = cp.Minimize(
    sum(
        cp.abs(x[i] + y[j] + data[i, j])
        for (i, j) in data.keys()
    )
)
prob = cp.Problem(objective)
result = prob.solve()
print(x.value)
print(y.value)
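If the small gap versus the linear program matters, tightening the solver tolerance may help. A hedged sketch (this assumes the ECOS solver is used; the tolerance keyword names are ECOS-specific):
result = prob.solve(solver=cp.ECOS, abstol=1e-9, reltol=1e-9)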
I don't quite get the bigger picture, so this is just some hacky stuff based on one assumption:
you want some cvxpy equivalent of numpy's broadcasting behaviour for arrays of shapes (m, 1) + (1, n)
So numpy-wise:
m = 3
n = 4
np.random.seed(1)
a = np.random.randn(m, 1)
b = np.random.randn(1, n)
a
array([[ 1.62434536],
[-0.61175641],
[-0.52817175]])
b
array([[-1.07296862, 0.86540763, -2.3015387 , 1.74481176]])
a + b
array([[ 0.55137674, 2.48975299, -0.67719333, 3.36915713],
[-1.68472504, 0.25365122, -2.91329511, 1.13305535],
[-1.60114037, 0.33723588, -2.82971045, 1.21664001]])
Let's mimic this with np.kron, which has a cvxpy-equivalent:
aLifted = np.kron(np.ones((1,n)), a)
bLifted = np.kron(np.ones((m,1)), b)
aLifted
array([[ 1.62434536, 1.62434536, 1.62434536, 1.62434536],
[-0.61175641, -0.61175641, -0.61175641, -0.61175641],
[-0.52817175, -0.52817175, -0.52817175, -0.52817175]])
bLifted
array([[-1.07296862, 0.86540763, -2.3015387 , 1.74481176],
[-1.07296862, 0.86540763, -2.3015387 , 1.74481176],
[-1.07296862, 0.86540763, -2.3015387 , 1.74481176]])
aLifted + bLifted
array([[ 0.55137674, 2.48975299, -0.67719333, 3.36915713],
[-1.68472504, 0.25365122, -2.91329511, 1.13305535],
[-1.60114037, 0.33723588, -2.82971045, 1.21664001]])
Let's check cvxpy semi-blindly (we only check dimensions; too lazy to set up a full problem and fix the variables to check the output :-D):
import cvxpy as cp
x = cp.Variable((m, 1))
y = cp.Variable((1, n))
cp.kron(np.ones((1,n)), x) + cp.kron(np.ones((m, 1)), y)
# Expression(AFFINE, UNKNOWN, (3, 4))
# looks good!
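For completeness, a sketch of the full problem wired up with this lifting (same data as in the question; I have not benchmarked this):
import cvxpy as cp
import numpy as np

m, n = 3, 4
np.random.seed(1)
data = np.random.randn(m, n)

x = cp.Variable((m, 1))
y = cp.Variable((1, n))
# lift both variables to shape (m, n) so elementwise addition is allowed
expr = cp.kron(np.ones((1, n)), x) + cp.kron(np.ones((m, 1)), y) + data
prob = cp.Problem(cp.Minimize(cp.sum(cp.abs(expr))))
prob.solve()
print(x.value)
print(y.value)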
Now some caveats:
I don't know how efficiently cvxpy can reason about this matrix form internally.
It's unclear whether this is more efficient than a simple list-comprehension-based form using cp.vstack and co (it probably is).
This operation itself kills all sparsity: both lifted vectors are dense, so your matrix is dense.
cvxpy, and more or less all convex-optimization solvers, rely on some sparsity assumption, so scaling this problem up to machine-learning dimensions will not make you happy.
There is probably a much more concise mathematical theory for your problem than (sparsity-assuming, pretty general) convex optimization (the DCP rules implemented in cvxpy are a subset).
import numpy as np
from matplotlib import pyplot as plot
def sigmoid(x):
    return 1.0/(1+np.asmatrix(np.exp(-x)))

def graD(X,y,alpha,s0,numda):
    m=np.size(X,0)
    n=np.size(X,1)
    X0=X[:,0]
    X1=X[:,1:]
    theta=np.asmatrix(np.zeros(np.size(X,1))).T
    s=100
    lit=0
    Jlist=[]
    while abs(s)>s0 and lit<=10000:
        theta0=theta[0]
        theta1=theta[1:]
        theta0-=(float(alpha)/m)*X0.T*(sigmoid(X*theta)-y)
        theta1-=float(alpha)*((1.0/m)*X1.T*(sigmoid(X*theta)-y)+float(numda)/m*theta1)
        theta=np.vstack((np.asmatrix(theta0),np.asmatrix(theta1)))
        Jlist.append( cost(X,y,theta,numda) )
        lit+=1
        s=sum((float(1.0)/m)*X.T*(sigmoid(X*theta)-y))/float(n)
    plot.scatter( range(0, len(Jlist)), Jlist )
    return theta

def cost(X,y,theta,numda):
    m=X.shape[0]
    J = (-1.0/m)*( (-y).T*np.log( sigmoid(X*theta) ) - (1-y).T*np.log(1- sigmoid(X*theta) ) ) + (numda/ (m*2)) * (theta[0,1:].T * theta[0,1:] )
    return J
I printed out the result of the function cost, which calculates the cost function for logistic regression, but I found that it is an empty matrix with length 0.
I also tried it separately:
cost(X, y, theta, 30)
Out[69]: matrix([], shape=(0, 0), dtype=float64)
and the problem persists. I am new to ML and Python, and I really cannot solve this problem.
I think that what you want to do is:
def cost(X, y, theta, numda):
    m = X.shape[0]
    J = (-1.0/m)*(np.dot(-y.T, np.log(sigmoid(np.dot(X, theta)))) - np.dot(1-y.T, np.log(1-sigmoid(np.dot(X, theta))))) + (numda/(m*2)) * (np.linalg.norm(theta[1:])**2)
    return J
In numpy, * is the element-wise product for plain ndarrays, while np.dot() is the matrix product; I think that was your main confusion. Note also that the empty result in your version comes from theta[0,1:], which slices an empty range out of the (n, 1) column matrix theta; the regularization term needs theta[1:].
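A quick toy illustration of the difference:
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(a * b)         # element-wise: [[ 5 12] [21 32]]
print(np.dot(a, b))  # matrix product: [[19 22] [43 50]]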
How do I implement this metric in Keras? My code below gives the wrong result!
Note that I'm undoing a previous log(x + 1) transformation via exp(x) - 1, and that negative predictions are clipped to 0:
from keras import backend as K

def rmsle_cust(y_true, y_pred):
    first_log = K.clip(K.exp(y_pred) - 1.0, 0, None)
    second_log = K.clip(K.exp(y_true) - 1.0, 0, None)
    return K.sqrt(K.mean(K.square(K.log(first_log + 1.) - K.log(second_log + 1.)), axis=-1))
For comparison, here's the standard numpy implementation:
import math
import numpy as np

def rmsle_cust_py(y, y_pred, **kwargs):
    # undo 1 + log
    y = np.exp(y) - 1
    y_pred = np.exp(y_pred) - 1
    y_pred[y_pred < 0] = 0.0
    to_sum = [(math.log(y_pred[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i, pred in enumerate(y_pred)]
    return (sum(to_sum) * (1.0/len(y))) ** 0.5
What am I doing wrong? Thanks!
EDIT: Setting axis=0 seems to give a value very close to the correct one, but I'm not sure, since all the code I've seen uses axis=-1.
I ran into the same problem and searched for it; here is what I found:
https://www.kaggle.com/jpopham91/rmlse-vectorized
After modifying it a bit, this seems to work for me: the rmsle_K method, implemented with Keras and TensorFlow.
import numpy as np
import math
from keras import backend as K
import tensorflow as tf
def rmsle(y, y0):
    assert len(y) == len(y0)
    return np.sqrt(np.mean(np.power(np.log1p(y)-np.log1p(y0), 2)))

def rmsle_loop(y, y0):
    assert len(y) == len(y0)
    terms_to_sum = [(math.log(y0[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i, pred in enumerate(y0)]
    return (sum(terms_to_sum) * (1.0/len(y))) ** 0.5

def rmsle_K(y, y0):
    return K.sqrt(K.mean(K.square(tf.log1p(y) - tf.log1p(y0))))
r = rmsle(y=[5, 20, 12], y0=[8, 16, 12])
r1 = rmsle_loop(y=[5, 20, 12], y0=[8, 16, 12])
r2 = rmsle_K(y=[5., 20., 12.], y0=[8., 16., 12.])
print(r)
print(r1)
sess = tf.Session()
print(sess.run(r2))
Result:
Using TensorFlow backend
0.263978210565
0.263978210565
0.263978
Since you build a plain Python list (to_sum) in the numpy implementation, I suspect your numpy array has shape (length,).
And in Keras, since you got different results with axis=0 and axis=-1, you probably have some shape like (length, 1).
Also, when creating the to_sum list, you use y[i] and y_pred[i], which means you take elements along axis=0 in the numpy implementation.
The numpy implementation then sums everything when calculating the mean in sum(to_sum). So you really don't need to pass any axis to K.mean.
If you make sure your model's output shape is either (length,) or (length, 1), you can use just K.mean(value) without the axis parameter.
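For instance, a sketch of the metric from the question with the axis argument dropped (same clipping and log1p-undo as before):
def rmsle_cust(y_true, y_pred):
    first_log = K.clip(K.exp(y_pred) - 1.0, 0, None)
    second_log = K.clip(K.exp(y_true) - 1.0, 0, None)
    # no axis argument: K.mean averages over every element, like the numpy version
    return K.sqrt(K.mean(K.square(K.log(first_log + 1.) - K.log(second_log + 1.))))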
The easiest thing might be for me to just post the numpy code that I'm trying to perform directly in Theano, if that's possible:
import numpy as np
from theano import shared

tensor = shared(np.random.randn(7, 16, 16)).eval()
tensor2 = tensor[0,:,:]
tensor2[tensor2 < 1] = 0.0
tensor2[tensor2 > 0] = 1.0
new_tensor = [tensor2]
for i in range(1, tensor.shape[0]):
    new_tensor.append(np.multiply(tensor2, tensor[i,:,:]))
output = np.array(new_tensor).reshape(7,16,16)
If it's not immediately obvious, what I'm trying to do is use the values from one matrix of a tensor made up of 7 different matrices and apply that to the other matrices in the tensor.
Really, the problem I'm solving is doing conditional statements in an objective function for a fully convolutional network in Keras. Basically, the loss for some of the feature-map values is going to be calculated (and subsequently weighted) differently from others, depending on some of the values in one of the feature maps.
You can easily implement conditionals with Theano's switch statement.
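As a minimal sketch of switch in isolation (a toy example of my own), it selects elementwise between two tensors based on a condition:
import theano.tensor as T

a = T.matrix('a')
# keep entries greater than zero, zero out the rest
masked = T.switch(T.gt(a, 0), a, T.zeros_like(a))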
Here is the equivalent code:
import theano
from theano import tensor as T
import numpy as np
def _check_new(var):
    shape = var.shape[0]
    t_1, t_2 = T.split(var, [1, shape-1], 2, axis=0)
    ones = T.ones_like(t_1)
    cond = T.gt(t_1, ones)
    mask = T.repeat(cond, t_2.shape[0], axis=0)
    out = T.switch(mask, t_2, T.zeros_like(t_2))
    output = T.join(0, cond, out)
    return output

def _check_old(var):
    tensor = var.eval()
    tensor2 = tensor[0,:,:]
    tensor2[tensor2 < 1] = 0.0
    tensor2[tensor2 > 0] = 1.0
    new_tensor = [tensor2]
    for i in range(1, tensor.shape[0]):
        new_tensor.append(np.multiply(tensor2, tensor[i,:,:]))
    output = theano.shared(np.array(new_tensor).reshape(7,16,16))
    return output
tensor = theano.shared(np.random.randn(7, 16, 16))
out1 = _check_new(tensor).eval()
out2 = _check_old(tensor).eval()
print(out1)
print('----------------')
print(((out1-out2) ** 2).mean())
Note: since you're masking on the first filter, I needed to use the split and join operations.