Derivative of a sum in Theano - python

I want to calculate the gradient and Hessian of the following sum. As far as I know, Theano should be able to do that, but I can't figure out how.
X is a matrix of size M x N, y is a vector of length M, and beta is a vector of length N.
One way to compute the sum is with the scan() function, which I used like this:
res, ups = theano.scan(
    lambda v, w: v * np.log(1 / (1 + np.exp(-1 * w.dot(beta))))
    + (1 - v) * np.log(1 / (1 + np.exp(w.dot(beta)))),
    sequences=[y, X])
t7 = theano.function(inputs=[X, y, beta], outputs=res)
and that works fine as far as I can tell. However, I can't use this as an input to the grad() function with respect to beta.
So what I would like to know is whether there is a way either to use the output of scan() as input to grad(), or to compute the sum differently.
(I first tried SymPy, but SymPy can't lambdify IndexedBase objects, so I can compute the gradient but can't use it as a function. Maybe that helps?)
The sum adds up a function of the dot product of a row of X with beta, while the binary vector y decides which of two functions is used for that row. For y_i = 1 the term is
log(1/(1+exp(-X_i*beta)))
and for y_i = 0 it is log(1/(1+exp(X_i*beta))).
Hope that helps?
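One likely reason grad() rejects the scan output is that it is a length-M vector of per-row terms, while grad() expects a scalar cost; summing it first (for example res.sum()) would be the usual remedy. Independently of Theano, this particular sum is the logistic log-likelihood, so its gradient and Hessian have closed forms. The following is a minimal NumPy sketch of those formulas, not Theano code; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Sum over rows of y_i*log(1/(1+exp(-z_i))) + (1-y_i)*log(1/(1+exp(z_i)))."""
    z = X @ beta
    # log(1/(1+exp(-z))) == -logaddexp(0, -z), evaluated in a numerically stable way
    return np.sum(-y * np.logaddexp(0.0, -z) - (1.0 - y) * np.logaddexp(0.0, z))

def gradient(beta, X, y):
    """Closed-form gradient: X^T (y - sigmoid(X beta))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

def hessian(beta, X, y):
    """Closed-form Hessian: -X^T diag(p * (1 - p)) X."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return -(X.T * (p * (1.0 - p))) @ X

# Small illustrative problem (made-up data): M = 5 rows, N = 3 columns
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
beta = rng.normal(size=3)

g = gradient(beta, X, y)   # shape (3,)
H = hessian(beta, X, y)    # shape (3, 3), symmetric
```

These formulas can also serve as a reference to check whatever a symbolic framework returns for the same inputs.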

Related

Scipy optimize to target

I am trying to optimize a function to get it as close to zero as possible.
The function is:
def goal_seek_func(x: float) -> float:
    lcos_list_temp = [energy_output[i] * x for i in range(life)]
    npv_lcos_temp = npv(cost_capital, lcos_list_temp)
    total = sum([cost_energy_capacity,
                 cost_power_conversion,
                 balance_of_plant,
                 cost_construction_commissioning,
                 npv_o_m,
                 npv_eol,
                 npv_cost_charging,
                 npv_lcos_temp,
                 ])
    return total
All the variables are calculated earlier in the code.
The function is linear in x: as x gets smaller, so does total.
I am trying to find the value of x where total is as close to 0 as possible.
I have tried to use:
scipy.optimize.minimize_scalar(goal_seek_func)
but this clearly minimizes the equation to -inf. I have read the docs, but cannot see where to define a target output of the function. Where can I define this, or is there a better method?
"I am trying to find the value of x where total is as close to 0 as possible."
Then you want to solve the equation goal_seek_func(x) = 0 rather than minimize goal_seek_func(x). These are not the same problem: minimizing looks for the lowest value of the function, which for an unbounded linear function is -inf, whereas solving looks for the x at which the function equals zero. That said, you can easily solve the equation by minimizing the square of your objective function:
res = scipy.optimize.minimize_scalar(lambda x: goal_seek_func(x)**2)
If the objective value res.fun is zero, res.x solves your equation. Otherwise, res.x is at least the best possible value.
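As an alternative to minimizing the square, a bracketing root finder solves the equation directly. The sketch below uses scipy.optimize.brentq on a hypothetical linear stand-in for goal_seek_func (the name total_cost, its coefficients, and the bracket [0, 100] are made up for illustration). brentq requires the function to have opposite signs at the two ends of the bracket.

```python
from scipy.optimize import brentq

def total_cost(x: float) -> float:
    # Hypothetical linear stand-in for goal_seek_func: total grows with x
    return 35.0 * x - 1200.0

# total_cost(0) < 0 and total_cost(100) > 0, so a root lies inside the bracket
root = brentq(total_cost, 0.0, 100.0)
print(root, total_cost(root))
```

For a function that really is linear, two evaluations would be enough to solve for the root by hand; the root finder is mainly useful if the function later becomes nonlinear.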

Is there an efficient function to calculate a product?

I'm looking for a numpy function (or a function from any other package) that would efficiently evaluate the product
$\prod_{i=1}^{N} f(x_i)$
with f being a vector-valued function of a vector-valued input x. The product is taken to be a simple component-wise multiplication.
The issue here is that both the length of each x vector and the total number of result vectors (f of x) to be multiplied (N) are very large, on the order of millions. Therefore, it is impossible to generate all the results at once (they wouldn't fit in memory) and then multiply them afterwards using np.multiply.reduce or the like.
A toy example of the type of code I would like to replace is:
import numpy as np
x = np.ones(1000000)
prod = f(x)
for i in range(2, 1000000):
    prod *= f(i * np.ones(1000000))
with f a vector-valued function with the dimension of its output equal to the dimension of its input.
To be sure: I'm not looking for equivalent code, but for a single, highly optimized function. Is there such a thing?
For those familiar with Wolfram Mathematica: It would be the equivalent to Product. In Mathematica, I would be able to simply write Product[f[i ConstantArray[1,1000000]],{i,1000000}].
Numpy ufuncs all have a reduce method. np.multiply is a ufunc. So it's a one-liner:
np.multiply.reduce(v)
Where v is the vector of values you compute in what is hopefully an equally efficient manner.
To compute the vector, just apply your function to the input:
v = f(x)
So with your example:
np.multiply.reduce(np.sin(x))
Alternative
A simpler way to phrase the same thing is np.prod:
np.prod(v)
You can also use the prod method directly on your vector:
v.prod()
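The reductions above work on values that are already in memory. For the case described in the question, where the result vectors cannot all be held at once, one option is to keep a single accumulator and multiply each new result into it in place. This is only a sketch: running_product is a hypothetical helper, not a NumPy function, and the tiny inputs stand in for the million-element vectors.

```python
import numpy as np

def running_product(f, inputs):
    """Component-wise product of f(x) over an iterable of input vectors.

    Only the accumulator and one result vector are alive at a time.
    """
    prod = None
    for x in inputs:
        fx = f(x)
        if prod is None:
            prod = np.array(fx, dtype=float)   # copy, so f's output is not modified
        else:
            np.multiply(prod, fx, out=prod)    # in-place update, no new allocation
    return prod

# Toy usage: a generator yields the inputs lazily, f is the identity
inputs = (i * np.ones(4) for i in range(1, 6))
result = running_product(lambda x: x, inputs)
print(result)
```

With very many factors the product can overflow or underflow; summing logarithms of positive factors is a common workaround, at the cost of an extra log and exp.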

Linear combination of function objects in python

Problem: I want to numerically integrate a function f(t,N) that may be written as a linear combination of N other known functions g_1(t), ..., g_N(t).
My Solution I: I know the functions g_i and also the coefficients, so my initial idea was to create a row vector of coefficients and a column vector containing the lambda functions g_i, and then use np.dot for the inner product to get the function object I want. Unfortunately, you cannot add two function objects or multiply a function object by a scalar.
My Solution II: Of course I can do something like (basically defining point wise what I want):
def f(t, N, a, g):
    """
    a = numpy array of coefficients
    g = numpy array of lambda functions corresponding to functions g_i
    """
    res = 0
    for i in range(N):  # range replaces Python 2's xrange
        res += a[i] * g[i](t)
    return res
But the for loop is of course not ideal, especially when:
I need to evaluate this function at many time steps t
I pass this function f to a numerical integration routine such as scipy.integrate.quad.
Briefly:
In Cython you could speed up indexing using memoryviews.
If these equations are linear you could superimpose them using sympy:
Example:
import sympy as sy
x, y = sy.symbols('x y')
g0 = x*0.33 + 6
g1 = x*0.72 + 1.3
g2 = x*11.2 - 6.5
gn = x*3.3 - 7.3
G = [g0, g1, g2, gn]
# this is superposition: sum the expressions first, then substitute
print(sum(G).subs(x, 15.1))
# equivalently, substitute into each expression and then sum
print(sum(gi.subs(x, 15.1) for gi in G))
'''
output:
228.305000000000
228.305000000000
'''
If it's not what you want, give some example input and output so that I'm not guessing blindly.
With little RAM available, you could pass the final expression to numexpr and evaluate it on some input. Otherwise it's best to work on numpy arrays.
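To illustrate working on numpy arrays, here is a minimal sketch that builds the combined function once as a closure and evaluates the sum as a dot product. The helper name linear_combination and the choice of sin and cos as the g_i are assumptions made for the example.

```python
import numpy as np
from scipy.integrate import quad

def linear_combination(a, g):
    """Return f(t) = sum_i a[i] * g[i](t) as a single callable."""
    a = np.asarray(a, dtype=float)

    def f(t):
        # Stack the g_i(t) values: shape (N,) for scalar t, (N, len(t)) for array t
        values = np.array([gi(t) for gi in g])
        return a @ values

    return f

# Illustrative choice: f(t) = 1*sin(t) + 2*cos(t)
a = [1.0, 2.0]
g = [np.sin, np.cos]
f = linear_combination(a, g)

# f works on whole arrays of time steps and can be handed to quad
samples = f(np.linspace(0.0, 1.0, 5))
integral, abs_err = quad(f, 0.0, np.pi / 2)
print(integral)
```

The Python-level loop over the g_i remains, but each g_i is evaluated on the whole array t at once, so the cost per time step is small when the g_i are themselves vectorized.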

TensorFlow: Compute Hessian matrix (and higher order derivatives)

I would like to be able to compute higher order derivatives for my loss function. At the very least I would like to be able to compute the Hessian matrix. At the moment I am computing a numerical approximation to the Hessian but this is more expensive, and more importantly, as far as I understand, inaccurate if the matrix is ill-conditioned (with very large condition number).
Theano implements this through symbolic looping, see here, but Tensorflow does not seem to support symbolic control flow yet, see here. A similar issue has been raised on TF github page, see here, but it looks like nobody has followed up on the issue for a while.
Is anyone aware of more recent developments or ways to compute higher order derivatives (symbolically) in TensorFlow?
Well, you can, with little effort, compute the Hessian matrix!
Suppose you have two variables:
x = tf.Variable(np.random.random_sample(), dtype=tf.float32)
y = tf.Variable(np.random.random_sample(), dtype=tf.float32)
and a function defined using these 2 variables:
f = tf.pow(x, cons(2)) + cons(2) * x * y + cons(3) * tf.pow(y, cons(2)) + cons(4) * x + cons(5) * y + cons(6)
where:
def cons(x):
    return tf.constant(x, dtype=tf.float32)
So in algebraic terms, this function is
$f(x, y) = x^2 + 2xy + 3y^2 + 4x + 5y + 6$
Now we define a function that computes the Hessian:
def compute_hessian(fn, variables):
    mat = []
    for v1 in variables:
        temp = []
        for v2 in variables:
            # differentiate twice: first w.r.t. v2, then w.r.t. v1
            temp.append(tf.gradients(tf.gradients(fn, v2)[0], v1)[0])
        # tf.gradients returns None when there is no gradient, so replace None with 0
        temp = [cons(0) if t is None else t for t in temp]
        temp = tf.stack(temp)  # tf.pack was renamed tf.stack in TensorFlow 1.0
        mat.append(temp)
    mat = tf.stack(mat)
    return mat
and call it with:
# arg1: our defined function, arg2: list of tf variables associated with the function
hessian = compute_hessian(f, [x, y])
Now we grab a TensorFlow session, initialize the variables, and run hessian:
sess = tf.Session()
sess.run(tf.global_variables_initializer())  # replaces the deprecated tf.initialize_all_variables()
print(sess.run(hessian))
Note: Since the function we used is quadratic in nature (and we are differentiating twice), the hessian returned will have constant values irrespective of the variables.
The output is:
[[ 2. 2.]
 [ 2. 6.]]
A word of caution: Hessian matrices (or more generally, tensors) are expensive to compute and store. It is worth reconsidering whether you really need the full Hessian or just some of its properties. A number of them, including traces, norms, and top eigenvalues, can be obtained without the explicit Hessian matrix, using only a Hessian-vector product oracle. In turn, Hessian-vector products can be implemented efficiently, including in leading autodiff frameworks such as TensorFlow and PyTorch.
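To make the idea of a Hessian-vector product concrete without depending on any particular autodiff framework, the sketch below approximates H·v by a central difference of the gradient, using the quadratic from the answer above with its gradient written out by hand. Autodiff frameworks compute the same quantity exactly; the finite-difference version and the names grad_f and hessian_vector_product are stand-ins chosen for illustration.

```python
import numpy as np

def grad_f(p):
    """Hand-written gradient of f(x, y) = x^2 + 2xy + 3y^2 + 4x + 5y + 6."""
    x, y = p
    return np.array([2 * x + 2 * y + 4, 2 * x + 6 * y + 5])

def hessian_vector_product(grad, p, v, eps=1e-4):
    """Approximate H(p) @ v from two gradient evaluations, never forming H."""
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    return (grad(p + eps * v) - grad(p - eps * v)) / (2 * eps)

point = [0.3, -1.2]
hv_x = hessian_vector_product(grad_f, point, [1.0, 0.0])  # first column of H
hv_y = hessian_vector_product(grad_f, point, [0.0, 1.0])  # second column of H
print(hv_x, hv_y)
```

Each product costs two gradient evaluations regardless of the number of variables, which is what makes iterative methods for eigenvalues or traces feasible when the full matrix is too large to store.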

Fitting a sum to data in Python

Given that the fitting function is of type:
$f(x) = \sum_{i=1}^{N} r_i \frac{(s_i x)^2}{1 + (s_i x)^2}$
I intend to fit this function to my experimental data (x, y = f(x)), but I have some doubts:
How do I define my fitting function when there's a summation involved?
Once the function is defined, i.e. def func(..) return ..., is it still possible to use curve_fit from scipy.optimize? There is now a set of parameters s_i and r_i involved, compared to the usual fitting cases with a few single parameters.
Finally, are such cases treated completely differently?
I feel a bit lost here, thanks for any help.
This is well within reach of scipy.optimize.curve_fit (or just scipy.optimize.leastsq). The fact that a sum is involved does not matter at all, nor that you have arrays of parameters. The only thing to note is that curve_fit passes the parameters to your fit function as individual arguments, while leastsq passes a single vector.
Here's a solution:
import numpy as np
from scipy.optimize import curve_fit, leastsq
def f(x,r,s):
    """ The fit function, applied to every x_k for the vectors r_i and s_i. """
    x = x[...,np.newaxis] # add an axis for the summation
    # by virtue of numpy's fantastic broadcasting rules,
    # the following will be evaluated for every combination of k and i.
    x2s2 = (x*s)**2
    return np.sum(r * x2s2 / (1 + x2s2), axis=-1)
# fit using curve_fit
popt,pcov = curve_fit(
    lambda x,*params: f(x,params[:N],params[N:]),
    X,Y,
    np.r_[R0,S0],
)
R = popt[:N]
S = popt[N:]
# fit using leastsq
popt,ier = leastsq(
    lambda params: f(X,params[:N],params[N:]) - Y,
    np.r_[R0,S0],
)
R = popt[:N]
S = popt[N:]
A few things to note:
To start, we need the 1d arrays X and Y of measurements to fit to, the 1d arrays R0 and S0 as initial guesses, and N, the length of those two arrays.
I separated the implementation of the actual model f from the objective functions supplied to the fitters. Those I implemented using lambda functions. Of course, one could also use ordinary def ... functions and combine them into one.
The model function f uses numpy's broadcasting to sum over a set of parameters (along the last axis) while evaluating many x at once (along any axes before the last). Both fit functions would complain if x had more than one axis; .ravel() helps there.
We concatenate the fit parameters R and S into a single parameter vector using numpy's shorthand np.r_[R,S].
curve_fit passes every parameter to the objective function as a separate argument. We want them as a vector, so we use *params, which collects all remaining arguments in a single tuple.
leastsq passes a single params vector. However, it neither supplies x nor compares the result to y, so both are bound directly into the objective function.
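The curve_fit variant can be tried end to end on synthetic data. The sketch below is only an illustration under stated assumptions: the "true" parameters, the noise-free data, and the initial guesses are all made up, and the model f is the one from the solution above with np.asarray added so that it also accepts the tuples produced by *params.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, r, s):
    """Sum over i of r_i * (x*s_i)^2 / (1 + (x*s_i)^2), for every x_k."""
    x = np.asarray(x)[..., np.newaxis]      # add an axis for the summation
    x2s2 = (x * np.asarray(s)) ** 2
    return np.sum(np.asarray(r) * x2s2 / (1 + x2s2), axis=-1)

# Made-up ground truth used only to generate noise-free test data
N = 2
R_true = np.array([1.0, 2.0])
S_true = np.array([0.5, 3.0])
X = np.linspace(0.1, 10.0, 60)
Y = f(X, R_true, S_true)

# Initial guesses, deliberately a little off
R0 = np.array([0.8, 2.3])
S0 = np.array([0.6, 2.5])

popt, pcov = curve_fit(
    lambda x, *params: f(x, params[:N], params[N:]),
    X, Y,
    p0=np.r_[R0, S0],
)
R_fit, S_fit = popt[:N], popt[N:]
print(R_fit, S_fit)
```

Because each s_i enters only through its square, a fit may return s_i with either sign, and the terms may come back in a different order; comparing the fitted curve with the data is a more reliable check than comparing parameters one by one.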
In order to use scipy.optimize.leastsq to estimate multiple parameters, you need to pack them into an array and unpack them inside your function. You can then do anything you want with them. For example, if your s_i are the first 3 and your r_i are the next three parameters in your array p, you would just set ssum=p[:3].sum() and rsum=p[3:6].sum(). But again, your parameters are not identified (according to your comment), so estimation is pointless.
For an example of using leastsq, see the Cookbook's Fitting Data example.
