fmin_cg function usage for minimizing neural network cost function - python

I am trying to port some of my code from MATLAB into Python and am running into problems with the scipy.optimize.fmin_cg function. This is the code I have at the moment:
My cost function:
def nn_costfunction2(nn_params,*args):
    Theta1, Theta2 = reshapeTheta(nn_params)
    input_layer_size, hidden_layer_size, num_labels, X, y, lam = args[0], args[1], args[2], args[3], args[4], args[5]
    m = X.shape[0] #Length of vector
    X = np.hstack((np.ones([m,1]),X)) #Add in the bias unit
    layer1 = sigmoid(Theta1.dot(np.transpose(X))) #Calculate first layer
    layer1 = np.vstack((np.ones([1,layer1.shape[1]]),layer1)) #Add in bias unit
    layer2 = sigmoid(Theta2.dot(layer1))
    y_matrix = np.zeros([y.shape[0],layer2.shape[0]]) #Create a matrix where vector position of one corresponds to label
    for i in range(y.shape[0]):
        y_matrix[i,y[i]-1] = 1
    #Cost function
    J = (1/m)*np.sum(np.sum(-y_matrix.T.conj()*np.log(layer2),axis=0)-np.sum((1-y_matrix.T.conj())*np.log(1-layer2),axis=0))
    #Add in regularization
    J = J+(lam/(2*m))*np.sum(np.sum(Theta1[:,1:].conj()*Theta1[:,1:])+np.sum(Theta2[:,1:].conj()*Theta2[:,1:]))
    #Backpropagation with vectorization and regularization
    delta_3 = layer2 - y_matrix.T
    r2 = delta_3.T.dot(Theta2[:,1:])
    z_2 = Theta1.dot(X.T)
    delta_2 = r2*sigmoidGradient(z_2).T
    t1 = (lam/m)*Theta1[:,1:]
    t1 = np.hstack((np.zeros([t1.shape[0],1]),t1))
    t2 = (lam/m)*Theta2[:,1:]
    t2 = np.hstack((np.zeros([t2.shape[0],1]),t2))
    Theta1_grad = (1/m)*(delta_2.T.dot(X))+t1
    Theta2_grad = (1/m)*(delta_3.dot(layer1.T))+t2
    nn_params = np.hstack([Theta1_grad.flatten(),Theta2_grad.flatten()]) #Unroll parameters
    return nn_params
My call to the function:
args = (input_layer_size, hidden_layer_size, num_labels, X, y, lam)
fmin_cg(nn_costfunction2,nn_params, args=args,maxiter=50)
This gives the following error:
File "C:\WinPython3\python-3.3.2.amd64\lib\site-packages\scipy\optimize\optimize.py", line 588, in approx_fprime
grad[k] = (f(*((xk+d,)+args)) - f0) / d[k]
ValueError: setting an array element with a sequence.
I tried various permutations in passing arguments to fmin_cg but this is the farthest I got. Running the cost function on its own does not throw any errors in this form.

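The first argument of the cost function must be a 1-D array, so the Theta1 and Theta2 used in J have to be derived from nn_params. You also need to return J: fmin_cg expects the cost function to return a scalar, whereas yours currently returns the unrolled gradient.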

Try adding the epsilon argument (the finite-difference step size, here a value you choose) to the function call:
fmin_cg(nn_costfunction2, nn_params, args=args, epsilon=epsilon, maxiter=50)

I think this issue arises because you let nnCostFunction2 return both the cost and the gradient.
But scipy.optimize.fmin_cg only accepts a single cost output from the function passed as f.
So keep only the single J (cost) output in nnCostFunction2 and pass the gradient function separately as fprime.
This is my call, which works:
scipy.optimize.fmin_cg(nnCostFunction, initial_rand_theta, backpropagate, \
args=(hidden_s, input_s, num_labels, X, y, lamb), maxiter=1000, \
disp=True, full_output=True)
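A minimal sketch of the split described in these answers. It assumes a MATLAB-style function that returns (J, grad). fmin_cg's f must return only the scalar cost, and the gradient goes to fprime as a 1-D array with the same shape as x0. The cost below is a toy quadratic. cost_and_grad, cost_only and grad_only are hypothetical names that stand in for nn_costfunction2 and its backpropagation part.

```python
import numpy as np
from scipy.optimize import fmin_cg, minimize


def cost_and_grad(params, a, b):
    """Hypothetical MATLAB-style cost: returns (J, grad) for J = 0.5 * ||a*params - b||^2."""
    r = a * params - b          # elementwise residual
    J = 0.5 * np.sum(r ** 2)    # scalar cost
    grad = a * r                # dJ/dparams, same shape as params
    return J, grad


def cost_only(params, *args):
    # fmin_cg's f must return a single scalar
    return cost_and_grad(params, *args)[0]


def grad_only(params, *args):
    # fmin_cg's fprime must return a 1-D array shaped like params
    return cost_and_grad(params, *args)[1]


a = np.array([1.0, 2.0, 4.0])
b = np.array([3.0, -2.0, 8.0])
x0 = np.zeros(3)

# Cost and gradient passed separately; args are appended after params
xopt = fmin_cg(cost_only, x0, fprime=grad_only, args=(a, b), maxiter=50, disp=False)

# Alternative: minimize accepts the (J, grad) tuple directly when jac=True
res = minimize(cost_and_grad, x0, args=(a, b), jac=True, method="CG")
print(xopt, res.x)
```

The wrappers call cost_and_grad twice per point. For a neural network, caching the last (params, J, grad) would avoid the duplicate forward pass. The minimize(..., jac=True) form is closer to how a MATLAB cost function returning [J, grad] is normally used.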

Related

Trying to write a custom loss function in tensorflow

I am trying to make a custom loss function in which I apply an inverse fast Fourier transform to a set of data and then do the following calculations. When I run this model, the gradient comes back as an array of 14 Nones and I get the following error: ValueError: No gradients provided for any variable. This is the loss function code snippet:
loss = []
for i in range(batchsize):
    x3 = tf.signal.ifft(data.numpy()[:, i])
    loss.append(tf.reduce_max(KB.square(abs_with_grad(x3)), axis = -1) / tf.reduce_mean(KB.square(abs_with_grad(x3)), axis = -1))
x = tf.reduce_sum(loss, axis=-1)
return x
I suspect the IFFT is not differentiable, but I don't know how to fix this problem. Any help or hints are much appreciated.
Edit: some input/output conditioning happens before the data is passed to the loss function. Here is the loss function with those operations done inside:
def PAPR_Loss(y_true, y_pred):
    batchsize = 400
    print(y_pred)
    datanp = np.zeros((N, batchsize), dtype = complex)
    for i in range(batchsize):
        counter_input = 0
        counter_output = 0
        for j in range(0, N):
            if j not in Reserved_phases:
                datanp[j, i] = y_true.numpy()[i, counter_input] + 1j* y_true.numpy()[i, counter_input+1]
                counter_input += 2
            else:
                datanp[j, i] = y_pred.numpy()[0, counter_output] + 1j* y_pred.numpy()[0, counter_output+1]
                counter_output += 2
    data = tf.Variable(datanp, dtype=tf.complex64, trainable= True)
    print(data)
    loss = np.zeros(batchsize, dtype = float)
    for i in range(batchsize):
        x3 = tf.signal.ifft(data.numpy()[:, i])
        loss[i] = tf.reduce_max(KB.square(abs_with_grad(x3.numpy())), axis = -1) / tf.reduce_mean(KB.square(abs_with_grad(x3.numpy())), axis = -1)
    print(loss)
    return loss
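For reference, the loop computes the peak-to-average power ratio (PAPR) of each column's IFFT: max |x|^2 divided by mean |x|^2. Below is a minimal NumPy sketch of that computation, vectorized over the batch. papr_per_column is a hypothetical name. Because it runs in NumPy, it produces values only, not gradients.

That is also the likely source of the None gradients above. Calls such as .numpy(), the np.zeros buffers and the freshly created tf.Variable all cut the link back to y_pred, which leaves TensorFlow nothing to differentiate. A differentiable version would keep every step as TensorFlow ops on y_pred. Note that tf.signal.ifft transforms the innermost axis, so the data would need to be laid out as (batch, N).

```python
import numpy as np


def papr_per_column(freq_data):
    """Peak-to-average power ratio of the IFFT of each column.

    freq_data: complex array of shape (N, batch).
    Returns a float array of shape (batch,).
    """
    time_data = np.fft.ifft(freq_data, axis=0)   # IFFT along the N axis, one column per sample
    power = np.abs(time_data) ** 2               # instantaneous power |x|^2
    return power.max(axis=0) / power.mean(axis=0)


N = 8
batch = np.zeros((N, 2), dtype=complex)
batch[:, 0] = 1.0   # flat spectrum: the IFFT is a single spike, so PAPR equals N
batch[3, 1] = 1.0   # one nonzero bin: constant envelope, so PAPR equals 1
papr = papr_per_column(batch)
print(papr)
```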

Pytorch: multiplication between parameters is inplace for LBFGS optimizer?

I am trying to solve a kind of inverse problem by backpropagation with PyTorch. I am trying to recover the parameters (r, theta) that generate a vector field U(r,theta).
When I tried to use the LBFGS optimizer from PyTorch, I realized that the operation
r*theta
is detected as in-place and thus not supported for the backward computation of the gradient, whereas
r+theta is not.
How can I overcome this? I actually need to recover fields that use transformations of the form r*theta.
Here is an example of code that reproduces the error. It runs fine if you change
field = Wrong_U_param(r, theta, positions)
to
field = U_param(r, theta, positions)
in the loop. It also works if you replace the r*theta operation with r.item()*theta (but then it does not optimize over r, since no gradient depends on r anymore).
I tried to use torch.mul() to compute the product, but it also fails.
The error message is the following:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
and anomaly detection points to this very product.
Thank you for your help!
import numpy as np
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
import torch.optim as optim
from geomloss import SamplesLoss
torch.autograd.set_detect_anomaly(True)

def model(field):
    return field

def U_param(r, theta, pos):
    result = r + theta + 0. * pos
    return result

def Wrong_U_param(r, theta, pos):
    result = r * theta + 0. * pos
    return result

def learn_U_param(Zobs, ngrad, params, r_guess=0., theta_guess=0., lambd=1.):
    Npts = params[0]
    positions = torch.tensor(np.arange(0, 1, 1 / Npts) + 1 / 2 / Npts).reshape((Npts, 1))
    lab = torch.tensor(np.arange(0, Npts))
    r = torch.tensor(float(r_guess)).to(device)
    r.requires_grad = True
    theta = torch.tensor(float(theta_guess)).to(device)
    theta.requires_grad = True
    r_hist = [r.item()]
    theta_hist = [theta.item()]
    loss_hist = []
    optimizer = optim.LBFGS([r, theta])
    for i in range(ngrad):
        field = Wrong_U_param(r, theta, positions)
        Z = model(field)
        Loss = SamplesLoss(loss="sinkhorn", p=2, blur=.05)
        Wass = Loss(lab, Z, positions, lab, Zobs, positions)
        def closure():
            optimizer.zero_grad()
            Wass.backward(retain_graph=True)
            return Wass
        optimizer.step(closure)
        optimizer.zero_grad()
        r_hist.append(r.item())
        theta_hist.append(theta.item())
        loss_hist.append(Wass.item())
    return r_hist, theta_hist, loss_hist

N=100
r = 2
theta = 2
params = [N]
positions = torch.tensor(np.arange(0, 1, 1 / N) + 1 / 2 / N).reshape((N, 1))
Zobs = U_param(r, theta, positions)
ngrad = 10
print(learn_U_param(Zobs, ngrad, params, r_guess=0.1, theta_guess=0.1, lambd=1.))
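A likely explanation, offered as a hedged note rather than a confirmed diagnosis, is that the two operations need different things from autograd when it runs backward:

$$
\frac{\partial (r\theta)}{\partial r} = \theta, \qquad
\frac{\partial (r\theta)}{\partial \theta} = r, \qquad
\frac{\partial (r+\theta)}{\partial r} = \frac{\partial (r+\theta)}{\partial \theta} = 1
$$

When autograd records r*theta, it saves r and theta for the backward pass. It saves nothing for r+theta.

In the loop above, Wass is built once, outside closure. LBFGS, however, calls closure several times per step and updates r and theta in place between those calls. The second Wass.backward(...) therefore finds its saved values modified and raises the in-place error. The sum does not raise, but it silently reuses a stale graph.

The pattern shown in the torch.optim.LBFGS documentation rebuilds the loss inside closure. Here that would mean computing field, Z and Wass there, so that each call runs a fresh forward pass.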

Incremental Bayesian updates with multi-dimensional parameters

I am trying to use PyMC3 for a Bayesian model that I would like to retrain repeatedly on new, unseen data. I think I need to update the priors with the posterior of the previously trained model every time I see new data, similar to the approach in https://docs.pymc.io/notebooks/updating_priors.html. That notebook uses the following function, which estimates a KDE from the samples, and replaces each of the original parameter definitions in the model with a call to from_posterior.
import numpy as np
from scipy import stats
from pymc3 import Interpolated

def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)
And here is my original model.
def create_model(batsmen, bowlers, id1, id2, X):
    testval = [[-5,0,1,2,3.5,5] for i in range(0, 9)]
    l = [i for i in range(9)]
    model = pm.Model()
    with model:
        delta_1 = pm.Uniform("delta_1", lower=0, upper=1)
        delta_2 = pm.Uniform("delta_2", lower=0, upper=1)
        inv_sigma_sqr = pm.Gamma("sigma^-2", alpha=1.0, beta=1.0)
        inv_tau_sqr = pm.Gamma("tau^-2", alpha=1.0, beta=1.0)
        mu_1 = pm.Normal("mu_1", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(batsmen))
        mu_2 = pm.Normal("mu_2", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(bowlers))
        delta = pm.math.ge(l, 3) * delta_1 + pm.math.ge(l, 6) * delta_2
        eta = [pm.Deterministic("eta_" + str(i), delta[i] + mu_1[id1[i]] - mu_2[id2[i]]) for i in range(9)]
        cutpoints = pm.Normal("cutpoints", mu=0, sigma=1/pm.math.sqrt(inv_sigma_sqr), transform=pm.distributions.transforms.ordered, shape=(9,6), testval=testval)
        X_ = [pm.OrderedLogistic("X_" + str(i), cutpoints=cutpoints[i], eta=eta[i], observed=X[i]-1) for i in range(9)]
    return model
The problem is that some of my parameters, such as mu_1, are multidimensional. This is why I get the following error:
ValueError: points have dimension 1, dataset has dimension 1500
because of the line y = stats.gaussian_kde(samples)(x).
Can someone please help me make this work for multidimensional parameters? I don't properly understand what a KDE is or how the code computes it.
Thank you in advance!
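One way to handle a vector-valued parameter such as mu_1 is to run the same 1-D KDE on each column of its posterior draws. This sketch assumes that treating the components as independent is acceptable, which drops any posterior correlation between them.

Some background on the pieces involved:

- gaussian_kde estimates a density by placing a Gaussian bump on every sample and averaging the bumps. By default the bump width follows Scott's rule.
- from_posterior evaluates that estimate on a 100-point grid and pads both ends with zero density.
- gaussian_kde reads a 2-D array as (dimensions, points). A (draws, k) array is therefore treated as having one dimension per draw, which matches the "dataset has dimension 1500" in the error.

kde_grid and kde_grids_per_component below are hypothetical helpers that return only the (x, y) grids. Each grid could then be wrapped in its own Interpolated variable (with names such as f"mu_1_{j}", also hypothetical) and stacked back into a vector inside the model.

```python
import numpy as np
from scipy import stats


def kde_grid(samples, n_points=100):
    """1-D KDE evaluated on a padded grid, mirroring from_posterior without PyMC3."""
    samples = np.asarray(samples, dtype=float)
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin, smax, n_points)
    y = stats.gaussian_kde(samples)(x)        # density estimate at each grid point
    # pad with zero density 3*width beyond the sampled range, as in from_posterior
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return x, y


def kde_grids_per_component(samples):
    """samples: draws of shape (n_draws,) or (n_draws, k); returns one (x, y) grid per component."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        return [kde_grid(samples)]
    return [kde_grid(samples[:, j]) for j in range(samples.shape[1])]


# Synthetic stand-in for trace["mu_1"]: 1500 draws of a 3-component parameter
rng = np.random.default_rng(0)
draws = rng.normal(loc=[0.0, 5.0, -2.0], scale=1.0, size=(1500, 3))
grids = kde_grids_per_component(draws)
print([x[np.argmax(y)] for x, y in grids])   # grid point with the highest density, per component
```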

Scipy `fmin_cg` args are not match with my functions args

I am trying to build a linear regression model and find the optimal parameters using the fmin_cg optimizer.
I have two functions for this job: linear_reg_cost, which is the cost function, and linear_reg_grad, which is its gradient. Both functions take the same arguments.
def hypothesis(x,theta):
    return np.dot(x,theta)
Cost function:
def linear_reg_cost(x_flatten, y, theta_flatten, lambda_, num_of_features,num_of_samples):
    x = x_flatten.reshape(num_of_samples, num_of_features)
    theta = theta_flatten.reshape(n,1)
    loss = hypothesis(x,theta)-y
    regularizer = lambda_*np.sum(theta[1:,:]**2)/(2*m)
    j = np.sum(loss ** 2)/(2*m)
    return j
Gradient function:
def linear_reg_grad(x_flatten, y, theta_flatten, lambda_, num_of_features,num_of_samples):
    x = x_flatten.reshape(num_of_samples, num_of_features)
    m,n = x.shape
    theta = theta_flatten.reshape(n,1)
    new_theta = np.zeros(shape=(theta.shape))
    loss = hypothesis(x,theta)-y
    gradient = np.dot(x.T,loss)
    new_theta[0:,:] = gradient/m
    new_theta[1:,:] = gradient[1:,:]/m + lambda_*(theta[1:,]/m)
    return new_theta
and fmin_cg:
theta = np.ones(n)
from scipy.optimize import fmin_cg
new_theta = fmin_cg(f=linear_reg_cost, x0=theta, fprime=linear_reg_grad,args=(x.flatten(), y, lambda_, m,n))
Note: I pass x flattened and reshape it back into a matrix inside the cost and gradient functions.
The error output:
<ipython-input-98-b29c1b8f6e58> in linear_reg_grad(x_flatten, y, theta_flatten, lambda_, num_of_features, num_of_samples)
1 def linear_reg_grad(x_flatten, y, theta_flatten, lambda_,num_of_features, num_of_samples):
----> 2 x = x_flatten.reshape(num_of_samples, num_of_features)
3 m,n = x.shape
4 theta = theta_flatten.reshape(n,1)
5 new_theta = np.zeros(shape=(theta.shape))
ValueError: cannot reshape array of size 2 into shape (2,12)
Note: x.shape = (12,2), y.shape = (12,1), theta.shape = (2,). So num_of_features = 2 and num_of_samples = 12. But the error shows that theta is being passed in where I expected my input x. Why is this happening even though I explicitly assigned args in fmin_cg? And how should I solve this problem?
Thanks for any advice
All of your implementations are correct, but you have a small mistake.
Make sure the arguments are passed in the same order to both of your functions.
Your problem is the order of num_of_features and num_of_samples. You can swap their positions in either linear_reg_grad or linear_reg_cost. Of course, you should then change the order in the args argument of scipy.optimize.fmin_cg as well.
The second important point is that fmin_cg always passes the variable it is optimizing (x0, and then each updated estimate) as the first positional argument of your cost and gradient functions, followed by the entries of args in order. So in your solution, the first parameter of both functions must be theta, not your input x.
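Here is a minimal sketch of the calling convention the answer describes, using synthetic data. It assumes a bias column in x, and it keeps y and theta 1-D. That 1-D choice is made here so that x.dot(theta) - y cannot silently broadcast to an (m, m) array, which would happen if a (12,) prediction were subtracted from a (12, 1) y.

The parameter vector comes first in both signatures, and args supplies the remaining arguments in exactly the order the functions declare them. In this version the regularization term is also added to the cost.

```python
import numpy as np
from scipy.optimize import fmin_cg


def linear_reg_cost(theta, x, y, lambda_):
    """Regularized squared-error cost; theta is the 1-D vector fmin_cg optimizes."""
    m = x.shape[0]
    residual = x.dot(theta) - y                 # shape (m,)
    reg = lambda_ * np.sum(theta[1:] ** 2)      # intercept theta[0] is not regularized
    return (np.sum(residual ** 2) + reg) / (2 * m)


def linear_reg_grad(theta, x, y, lambda_):
    """Gradient of linear_reg_cost, returned flat with the same shape as theta."""
    m = x.shape[0]
    residual = x.dot(theta) - y
    grad = x.T.dot(residual) / m
    grad[1:] += lambda_ * theta[1:] / m
    return grad


# Synthetic data: 12 samples, bias column plus one feature -> x.shape == (12, 2)
x_feature = np.arange(12, dtype=float)
x = np.column_stack([np.ones(12), x_feature])
y = 1.0 + 2.0 * x_feature                       # 1-D target, shape (12,)
lambda_ = 1.0
theta0 = np.ones(x.shape[1])

# fmin_cg calls linear_reg_cost(theta, x, y, lambda_): theta first, then args in order
theta_opt = fmin_cg(linear_reg_cost, theta0, fprime=linear_reg_grad,
                    args=(x, y, lambda_), disp=False)
print(theta_opt)
```

Because the cost is quadratic, the result can be checked against the regularized normal equations (X^T X + lambda L) theta = X^T y. Here L is the identity matrix with its top-left entry set to zero, so that the intercept is not penalized.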

scipy.optimize.minimize function with L-BFGS-B method maxiter attribute not working

I have a simple cost function, which I want to optimize using scipy.optimize.minimize function.
opt_solution = scipy.optimize.minimize(costFunction, theta, args = (training_data,), method = 'L-BFGS-B', jac = True, options = {'maxiter': 100})
where costFunction is the function to be optimized and theta holds the parameters to be optimized. Inside costFunction, I print the value of the cost function. But the maxiter parameter seems to have no effect: whether I set it to 10 or to 100000, the run takes the same time. I was also expecting the number of printed cost values to equal maxiter, so it looks as if maxiter is ignored. What might be the problem?
The cost function is:
def costFunction(self, theta, input):
    """ Extract weights and biases from 'theta' input """
    W1 = theta[self.limit0 : self.limit1].reshape(self.hidden_size, self.visible_size)
    W2 = theta[self.limit1 : self.limit2].reshape(self.visible_size, self.hidden_size)
    b1 = theta[self.limit2 : self.limit3].reshape(self.hidden_size, 1)
    b2 = theta[self.limit3 : self.limit4].reshape(self.visible_size, 1)
    """ Compute output layers by performing a feedforward pass
    Computation is done for all the training inputs simultaneously """
    hidden_layer = self.sigmoid(numpy.dot(W1, input) + b1)
    output_layer = self.sigmoid(numpy.dot(W2, hidden_layer) + b2)
    """ Compute intermediate difference values using Backpropagation algorithm """
    diff = output_layer - input
    sum_of_squares_error = 0.5 * numpy.sum(numpy.multiply(diff, diff)) / input.shape[1]
    weight_decay = 0.5 * self.lamda * (numpy.sum(numpy.multiply(W1, W1)) + numpy.sum(numpy.multiply(W2, W2)))
    cost = sum_of_squares_error + weight_decay
    # Error terms for the output and hidden layers (sigmoid derivative is a * (1 - a))
    del_out = numpy.multiply(diff, numpy.multiply(output_layer, 1 - output_layer))
    del_hid = numpy.multiply(numpy.dot(numpy.transpose(W2), del_out), numpy.multiply(hidden_layer, 1 - hidden_layer))
    """ Compute the gradient values by averaging partial derivatives
    Partial derivatives are averaged over all training examples """
    W1_grad = numpy.dot(del_hid, numpy.transpose(input))
    W2_grad = numpy.dot(del_out, numpy.transpose(hidden_layer))
    b1_grad = numpy.sum(del_hid, axis = 1)
    b2_grad = numpy.sum(del_out, axis = 1)
    W1_grad = W1_grad / input.shape[1] + self.lamda * W1
    W2_grad = W2_grad / input.shape[1] + self.lamda * W2
    b1_grad = b1_grad / input.shape[1]
    b2_grad = b2_grad / input.shape[1]
    """ Transform numpy matrices into arrays """
    W1_grad = numpy.array(W1_grad)
    W2_grad = numpy.array(W2_grad)
    b1_grad = numpy.array(b1_grad)
    b2_grad = numpy.array(b2_grad)
    """ Unroll the gradient values and return as 'theta' gradient """
    theta_grad = numpy.concatenate((W1_grad.flatten(), W2_grad.flatten(),
                                    b1_grad.flatten(), b2_grad.flatten()))
    # Update counter value
    self.counter += 1
    print "Index ", self.counter, "cost ", cost
    return [cost, theta_grad]
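maxiter gives the maximum number of iterations SciPy will perform before giving up on improving the solution. But the optimizer may well be satisfied with a solution and stop earlier.
If you look at the docs for minimize with the 'L-BFGS-B' method, you will notice two tolerance options, ftol and gtol, that can also stop the iteration. (ftol plays the role of the factr parameter of the lower-level fmin_l_bfgs_b function.) There is also maxfun, which caps the number of function evaluations.
In simple cases like yours, especially when the cost function also provides the gradient (as jac=True in your call indicates), convergence often happens within the first few iterations, well before the maxiter limit is reached. Note also that your print runs once per function evaluation, and L-BFGS-B can evaluate the function more than once per iteration during its line search. The number of printed values therefore need not equal the number of iterations.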
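A small experiment along the lines of this answer. It uses SciPy's built-in Rosenbrock function and its gradient (rosen, rosen_der) as a stand-in for costFunction.

With a tiny maxiter, the run stops on the iteration limit. With a huge maxiter, it stops on its own convergence tests long before the limit. The evaluation counter (a global introduced here for illustration) mirrors the print in costFunction, and it shows that evaluations and iterations are counted separately.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

calls = 0


def cost_and_grad(theta):
    """Return (cost, gradient) as jac=True expects, counting every evaluation."""
    global calls
    calls += 1
    return rosen(theta), rosen_der(theta)


x0 = np.array([-1.2, 1.0])

# Tight limit: the run stops because maxiter is reached.
res_small = minimize(cost_and_grad, x0, jac=True, method="L-BFGS-B",
                     options={"maxiter": 5})

# Generous limit: the run stops on its ftol/gtol tests long before maxiter matters.
calls = 0
res_big = minimize(cost_and_grad, x0, jac=True, method="L-BFGS-B",
                   options={"maxiter": 100000})
calls_big = calls

print(res_small.nit, res_small.success, res_small.message)
print(res_big.nit, calls_big, res_big.success, res_big.message)
```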
