can't understand the grad() method in custom Theano Op


When I read the documentation on creating a new Op, I couldn't understand the grad() method in the examples: http://deeplearning.net/software/theano/extending/extending_theano.html#example-op-definition. Why does it return output_grads[0] * 2 rather than 2? And what does output_grads[0] represent?
If output_grads[0] represents a chained derivative with respect to the input x, then in the next example, http://deeplearning.net/software/theano/extending/extending_theano.html#example-props-definition, why does grad() return a * output_grads[0] + b (which should be self.a * output_grads[0] + self.b) rather than self.a * output_grads[0]?
What about a more complicated custom Op, such as y = exp(x1)/(a*(x1**3)+log(x2))? How would I write its grad()? And if the inputs are vectors or matrices, how should grad() be written?

As the extended .grad() documentation points out, the output_grads argument is
dC/df
(where f is one of your Op's outputs and C is the cost on which you called theano.tensor.grad(...)).
The page also says that the .grad(...) method of an Op must return
dC/dx = dC/df * df/dx
(where x is an input to your Op).
I think the ax+b example is just wrong. If you look at the actual code, for example, the Sigmoid or the XLogX,
it seems to just implement the chain rule.
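To make the chain rule concrete, here is a minimal NumPy sketch rather than real Theano code. The names double_forward, double_grad and cost are hypothetical; double_grad only mimics the shape of an Op's grad(inputs, output_grads) method for f(x) = 2*x, and the cost C = sum(f**2) is an arbitrary choice for illustration.

```python
import numpy as np

def double_forward(x):
    # The Op's output: f(x) = 2 * x (elementwise)
    return 2.0 * x

def double_grad(inputs, output_grads):
    # output_grads[0] plays the role of dC/df; df/dx = 2,
    # so dC/dx = dC/df * 2 (one gradient per input).
    return [output_grads[0] * 2.0]

def cost(x):
    # Arbitrary scalar cost built on top of the Op's output
    return np.sum(double_forward(x) ** 2)

x = np.array([1.0, -2.0, 3.0])
f = double_forward(x)
dC_df = 2.0 * f                        # gradient of sum(f**2) w.r.t. f
dC_dx = double_grad([x], [dC_df])[0]   # chain rule through the Op

# Central finite differences as an independent check
eps = 1e-6
numeric = np.array([(cost(x + eps * e) - cost(x - eps * e)) / (2 * eps)
                    for e in np.eye(len(x))])
print(dC_dx, numeric)
```

Returning just 2 would discard dC/df, so the gradient of anything computed downstream of the Op would be lost.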
Disclaimer: I haven't implemented a custom Op so far and was looking into this myself and this is how I understood it.
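For the more complicated function from the question, the same pattern seems to apply elementwise. The sketch below again uses plain NumPy, not Theano; forward and grad are hypothetical helper names, a is treated as a fixed constant, and the inputs are assumed to be arrays of the same shape with a*x1**3 + log(x2) nonzero.

```python
import numpy as np

def forward(x1, x2, a):
    # y = exp(x1) / (a * x1**3 + log(x2)), elementwise
    return np.exp(x1) / (a * x1**3 + np.log(x2))

def grad(x1, x2, a, output_grad):
    # output_grad is dC/dy and has the same shape as y
    denom = a * x1**3 + np.log(x2)
    y = np.exp(x1) / denom
    dy_dx1 = y - np.exp(x1) * 3.0 * a * x1**2 / denom**2
    dy_dx2 = -np.exp(x1) / (denom**2 * x2)
    # One entry per input, each multiplied by the incoming gradient
    return [output_grad * dy_dx1, output_grad * dy_dx2]

x1 = np.array([0.5, 1.0, 1.5])
x2 = np.array([2.0, 3.0, 4.0])
a = 0.7
g1, g2 = grad(x1, x2, a, np.ones_like(x1))  # C = sum(y), so dC/dy = 1

# Finite-difference check (valid because the function is elementwise)
eps = 1e-6
num1 = (forward(x1 + eps, x2, a) - forward(x1 - eps, x2, a)) / (2 * eps)
num2 = (forward(x1, x2 + eps, a) - forward(x1, x2 - eps, a)) / (2 * eps)
print(g1, num1)
print(g2, num2)
```

Because the operation is elementwise, vectors and matrices need no special handling here: the returned gradients simply have the same shape as the corresponding inputs.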


How do I set 2 expressions equal to each other and solve for x [duplicate]

Let's say I have an equation:
2x + 6 = 12
With algebra we can see that x = 3. How can I make a program in Python that can solve for x? I'm new to programming, and I looked at eval() and exec() but I can't figure out how to make them do what I want. I do not want to use external libraries (e.g. SAGE), I want to do this in just plain Python.

How about SymPy? Their solver looks like what you need. Have a look at their source code if you want to build the library yourself…
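For reference, a minimal sketch of what this looks like with SymPy (an external library, so it does not satisfy the "plain Python" constraint in the question):

```python
from sympy import Eq, solve, symbols

x = symbols('x')
# Eq(lhs, rhs) states the equation 2x + 6 = 12; solve returns a list of roots
solution = solve(Eq(2 * x + 6, 12), x)
print(solution)  # [3]
```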

There are two ways to approach this problem: numerically and symbolically.
To solve it numerically, you have to first encode it as a "runnable" function - stick a value in, get a value out. For example,
def my_function(x):
    return 2*x + 6
It is quite possible to parse a string to automatically create such a function; say you parse 2x + 6 into a list, [6, 2] (where the list index corresponds to the power of x - so 6*x^0 + 2*x^1). Then:
def makePoly(arr):
    def fn(x):
        return sum(c*x**p for p,c in enumerate(arr))
    return fn
my_func = makePoly([6, 2])
my_func(3) # returns 12
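The parsing step mentioned above could be sketched as follows. This is only an illustration under narrow assumptions: parse_poly is a hypothetical helper, it accepts terms such as 6, 2x, -x or 3x^2 joined by + and -, and it silently skips anything it does not recognise.

```python
import re

# sign, optional number, optional 'x', optional '^power'
TERM = re.compile(r'([+-]?)(\d+(?:\.\d+)?)?(?:(x)(?:\^(\d+))?)?')

def parse_poly(text):
    """Turn e.g. '2x + 6' into [6.0, 2.0] (list index = power of x)."""
    coeffs = {}
    for sign, num, var, power in TERM.findall(text.replace(' ', '')):
        if not num and not var:
            continue  # empty match, nothing to read
        value = float(num) if num else 1.0
        if sign == '-':
            value = -value
        p = int(power) if power else (1 if var else 0)
        coeffs[p] = coeffs.get(p, 0.0) + value
    return [coeffs.get(p, 0.0) for p in range(max(coeffs) + 1)]

print(parse_poly('2x + 6'))        # [6.0, 2.0]
print(parse_poly('3x^2 - x + 1'))  # [1.0, -1.0, 3.0]
```

The resulting list can be passed straight to makePoly.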
You then need another function which repeatedly plugs an x-value into your function, looks at the difference between the result and what it wants to find, and tweaks its x-value to (hopefully) minimize the difference.
def dx(fn, x, delta=0.001):
    # forward-difference estimate of the slope of fn at x
    return (fn(x+delta) - fn(x))/delta
def solve(fn, value, x=0.5, maxtries=1000, maxerr=0.00001):
    for tries in range(maxtries):  # range replaces Python 2's xrange
        err = fn(x) - value
        if abs(err) < maxerr:
            return x
        slope = dx(fn, x)
        x -= err/slope  # Newton step
    raise ValueError('no solution found')
There are lots of potential problems here: finding a good starting x-value, assuming that the function actually has a solution (e.g. there are no real-valued solutions to x^2 + 2 = 0), hitting the limits of computational accuracy, etc. But in this case, the error minimization function is suitable and we get a good result:
solve(my_func, 16) # returns (x =) 5.000000000000496
Note that this solution is not absolutely, exactly correct. If you need it to be perfect, or if you want to try solving families of equations analytically, you have to turn to a more complicated beast: a symbolic solver.
A symbolic solver, like Mathematica or Maple, is an expert system with a lot of built-in rules ("knowledge") about algebra, calculus, etc; it "knows" that the derivative of sin is cos, that the derivative of kx^p is kpx^(p-1), and so on. When you give it an equation, it tries to find a path, a set of rule-applications, from where it is (the equation) to where you want to be (the simplest possible form of the equation, which is hopefully the solution).
Your example equation is quite simple; a symbolic solution might look like:
=> LHS([6, 2]) RHS([16])
# rule: pull all coefficients into LHS
LHS, RHS = [lh-rh for lh,rh in izip_longest(LHS, RHS, 0)], [0]
=> LHS([-10,2]) RHS([0])
# rule: solve first-degree poly
if RHS==[0] and len(LHS)==2:
    LHS, RHS = [0,1], [-LHS[0]/LHS[1]]
=> LHS([0,1]) RHS([5])
and there is your solution: x = 5.
I hope this gives the flavor of the idea; the details of implementation (finding a good, complete set of rules and deciding when each rule should be applied) can easily consume many man-years of effort.
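As a rough runnable counterpart to the two rules above (Python 3, exact arithmetic via fractions), assuming the same coefficient-list encoding; solve_first_degree is a hypothetical name and this covers only the first-degree case:

```python
from fractions import Fraction
from itertools import zip_longest

def solve_first_degree(lhs, rhs):
    """lhs and rhs are coefficient lists, index = power of x."""
    # Rule: pull all coefficients into the LHS
    lhs = [Fraction(l) - Fraction(r)
           for l, r in zip_longest(lhs, rhs, fillvalue=0)]
    # Rule: solve a first-degree polynomial c0 + c1*x = 0
    if len(lhs) == 2 and lhs[1] != 0:
        return -lhs[0] / lhs[1]
    raise ValueError('no rule applies')

print(solve_first_degree([6, 2], [16]))  # 5
```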

Python may be good, but it isn't God...
There are a few different ways to solve equations. SymPy has already been mentioned, if you're looking for analytic solutions.
If you're happy to just have a numerical solution, Numpy has a few routines that can help. If you're just interested in solutions to polynomials, numpy.roots will work. Specifically for the case you mentioned:
>>> import numpy
>>> numpy.roots([2,-6])
array([3.0])
For more complicated expressions, have a look at scipy.optimize.fsolve.
Either way, you can't escape using a library.
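A minimal sketch of the fsolve route for the equation in the question, rewritten as a residual that is zero at the solution (residual is just an illustrative name):

```python
from scipy.optimize import fsolve

def residual(x):
    # Zero exactly when 2x + 6 = 12
    return 2 * x + 6 - 12

root = fsolve(residual, x0=0.0)  # returns an array, one entry per unknown
print(root)
```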

If you only want to solve the extremely limited set of equations mx + c = y for positive integer m, c, y, then this will do:
import re
def solve_linear_equation ( equ ):
    """
    Given an input string of the format "3x+2=6", solves for x.
    The format must be as shown - no whitespace, no decimal numbers,
    no negative numbers.
    """
    match = re.match(r"(\d+)x\+(\d+)=(\d+)", equ)
    m, c, y = match.groups()
    m, c, y = float(m), float(c), float(y) # Convert from strings to numbers
    x = (y-c)/m
    print ("x = %f" % x)
Some tests:
>>> solve_linear_equation("2x+4=12")
x = 4.000000
>>> solve_linear_equation("123x+456=789")
x = 2.707317
>>>
If you want to recognise and solve arbitrary equations, like sin(x) + e^(i*pi*x) = 1, then you will need to implement some kind of symbolic maths engine, similar to maxima, Mathematica, MATLAB's solve() or Symbolic Toolbox, etc. As a novice, this is beyond your ken.

Use a different tool. Something like Wolfram Alpha, Maple, R, Octave, Matlab or any other algebra software package.
As a beginner you should probably not attempt to solve such a non-trivial problem.

Modular (pythonic) way for fitting combined parameters of a composite function

My question is about fitting parameters of a complicated model composed of different parametric functions.
More precisely, I want to describe a complicated experiment.
The experiment produces a one-dimensional array of measured data, data, where each entry corresponds to (a set of) experimental control variables x.
I have a theoretical model (actually multiple models, see below) model(x,pars), which takes x and a lot of parameters pars to give a prediction for data. However, not all parameters are known and I need to fit them.
Moreover, some details of the model are not yet certain. Because of that, I actually have a family of multiple models which are in some parts very similar, but where some internal component of the model is different (but a large part of the model is the same).
Unfortunately, switching one component for another might introduce new (unknown) parameters, that is we now have modelA(x,parsA) and modelB(x,parsB)
which have different parameters.
Basically, the model is composed of functions f(x, pars, vals_of_subfuncs) where x is the independent variable, pars are some explicit parameters of f, and vals_of_subfuncs are the results of evaluating some lower-level functions, which themselves depend on their own parameters (and maybe the results of their own lower-level functions etc.)
Obviously, no recursion is possible, and there is a lowest level of functions that do not rely on the values of other functions.
The situation is best illustrated in this picture:
[Figure: Modular model architecture]
The independent variable is x (blue), parameters are a,b,c,d (red), and the values of subfunctions appear as green arrows into nodes that represent functions.
In (1), we have a lowest-level function G(x; (a,b); {}) with no sub-functions and a higher-level function F(x; c; G(x; (a,b))) whose evaluation gives the model result, which depends on x and pars=(a,b,c).
In (2) and (3) we change a component of the model, (F->F') and (G->G'), respectively. This changes the parameter dependence of the final model.
Now I am looking for the most pythonic/modular way to implement parameter fitting in this situation, without having to rewrite the fit function every time I swap or change a component of my model and thereby possibly introduce a new parameter.
At the moment, I am trying to find solutions to this problem using lmfit. I also thought about maybe trying to use sympy to work with symbolic "parameters", but I don't think all the functions that appear can be easily written as expressions that can be evaluated by asteval.
Does anyone know of a natural way to approach such a situation?

I think this question would definitely be improved with a more concrete example, (that is, with actual code). If I understand correctly, you have a general model
def model_func(x, a, b, c, d):
    gresult = G(x, a, b, d)
    return F(x, b, c, gresult)
but you also want to control whether d and b are really variables, and whether c gets passed to F. Is that correct?
If that is correct (or at least captures the spirit), then I think you can do this with lmfit (disclaimer: I'm a lead author) with a combination of adding keyword arguments to the model function and setting some Parameter values as fixed.
For example, you might do some rearranging like this:
def G(x, a, b=None, d=None):
    if b is not None and d is None:
        return calc_g_without_d(x, a, b)
    return calc_g_with_d(x, a, d)
def F(x, gresult, b, c=None):
    if c is None:
        return calc_f_without_c(x, gresult, b)
    return calc_f_with_c(x, gresult, b, c)
def model_func(x, a, b, c, d, g_with_d=True, f_with_c=True):
    if g_with_d:
        gresult = G(x, a, d=d)  # pass d by keyword so it is not bound to b
    else:
        gresult = G(x, a, b=b)
    if f_with_c:
        return F(x, gresult, b, c=c)
    else:
        return F(x, gresult, b)
Now, when you make your model you can override the default values f_with_c and/or g_with_d:
import numpy as np
import lmfit
mymodel = lmfit.Model(model_func, f_with_c=False)
params = mymodel.make_params(a=100, b=0.2201, c=2.110, d=0)
and then evaluate the model with mymodel.eval() or run a fit with mymodel.fit(), passing in explicit values for the keyword arguments f_with_c and/or g_with_d, like
test = mymodel.eval(params, x=np.linspace(-1, 1, 41),
                    f_with_c=False, g_with_d=False)
or
result = mymodel.fit(ydata, params, x=xdata, g_with_d=False)
I think the way you have it specified, you'd want to make sure that d was not a variable in the fit when g_with_d=False, and there are cases where you would want b to not vary in the fit. You can do that with
params['b'].vary = False
params['d'].vary = False
as needed. I can imagine your actual problem is slightly more involved than that, but I hope that helps get you started in the right direction.

Thanks for the answers.
I think lmfit might be able to do what I want, but I will have to implement the "modularity" myself.
The example I gave was just conceptual and a minimal model. In general, the "networks" of functions and their dependencies are much more intricate than what I show in the example.
My current plan is as follows:
I will write a class Network for the "network", which contains certain Nodes.
Nodes specify their possible "symbolic" dependence on subNodes, explicit parameters and independent variables.
The Network class will have routines to check that the network constructed this way is consistent. Moreover, it will have an (lmfit) Parameters object (i.e. the union of all the parameters that the nodes explicitly depend on) and provide some method to generate an lmfit Model from that.
Then I will use lmfit for the fitting.
At least this is the plan.
If I succeed in building this, I will update this post with my code.
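A rough sketch of what such a node structure might look like is given below. It is not the poster's code and does not use lmfit: Node and fit are hypothetical names, the fit is done with scipy.optimize.least_squares as a stand-in, and the two component functions are arbitrary toy choices.

```python
import numpy as np
from scipy.optimize import least_squares

class Node:
    """A model component called as func(x, *own_params, *child_values)."""
    def __init__(self, func, params, children=()):
        self.func = func
        self.params = list(params)
        self.children = list(children)

    def param_names(self):
        # Union of the children's parameters and this node's own, in order
        names = []
        for child in self.children:
            for n in child.param_names():
                if n not in names:
                    names.append(n)
        for n in self.params:
            if n not in names:
                names.append(n)
        return names

    def evaluate(self, x, values):
        child_vals = [c.evaluate(x, values) for c in self.children]
        own = [values[n] for n in self.params]
        return self.func(x, *own, *child_vals)

def fit(root, x, data, guess):
    names = root.param_names()
    def residual(theta):
        return root.evaluate(x, dict(zip(names, theta))) - data
    sol = least_squares(residual, [guess[n] for n in names])
    return dict(zip(names, sol.x))

# Toy network: G depends on (a, b); F depends on c and on G's value
G = Node(lambda x, a, b: a * x + b, ['a', 'b'])
F = Node(lambda x, c, g: g + c * x**2, ['c'], [G])

x = np.linspace(-1.0, 1.0, 21)
data = 2.0 * x + 1.0 + 0.5 * x**2  # synthetic data, no noise
best = fit(F, x, data, {'a': 1.0, 'b': 1.0, 'c': 1.0})
print(best)
```

Swapping G or F for another Node changes the parameter list automatically, so the fit function itself never needs to be rewritten.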

Since you brought up sympy, I think you should take a look at symfit, which does precisely what you ask for in the last paragraph. With symfit you can write symbolic expressions, which are then fitted with scipy. It will make it very easy for you to combine your different submodels willy-nilly.
Let me implement your second example using symfit:
from symfit import variables, parameters, Fit, Model
a, b, c = parameters('a, b, c')
x, G, F = variables('x, G, F')
model_dict = {
    G: a * x + b,
    F: b * G + c * x
}
model = Model(model_dict)
print(model.connectivity_mapping)
I choose these rather trivial functions, but you can obviously choose whatever you want. To see that this model matches your illustration, this is what connectivity_mapping prints:
{F: {b, G, x, c}, G: {b, a, x}}
So you see that this is really a mapping representing what you drew. (The arguments are in no particular order within each set, but they will be evaluated in the right order, e.g. G before F.) To then fit to your data, simply do
fit = Fit(model, x=xdata, F=Fdata)
fit_results = fit.execute()
And that's it! I hope this makes it clearer why I think symfit does fit your use case. I'm sorry I couldn't clarify that earlier, I was still finalizing this feature into the API so up to now it only existed in the development branch. But I made a release with this and many other features just now :).
Disclaimer: I'm the author of symfit.

Python how to get function formula given its inputs and results

Assume we have a function whose formula is unknown. Given a few inputs and results of this function, how can we recover its formula?
For example we have inputs x and y and result r in format (x,y,r)
[ (2,4,8) , (3,6,18) ]
And the desired function can be
f(x,y) = x * y

As you post the question, the problem is too generic. If you want to find any formula mapping the given inputs to the given result, there are simply too many possible formulas. In order to make sense of this, you need to somehow restrict the set of functions to consider. For example you could say that you're only interested in polynomial solutions, i.e. where
r = sum a_ij * x^i * y^j for i from 0 to n and j from 0 to n - i
then you have a system of equations, with the a_ij as parameters to solve for. The higher the degree n the more such parameters you'd have to find, so the more input-output combinations you'd need to know. Variations of this use rational functions (so you divide by another polynomial), or allow some trigonometric functions, or something like that.
If your setup were particularly easy, you'd have just linear equations, i.e. r = a*x + b*y + c. As you can see, even that has three parameters a,b,c so you can't uniquely find all three of them just given the two inputs you provided in your question. And even then the result would not be the r = x*y you were aiming for, since that's technically of degree 2.
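The counting argument for the linear case can be checked numerically. This is a small sketch using NumPy with the two data points from the question; the variable names are illustrative only.

```python
import numpy as np

data = [(2, 4, 8), (3, 6, 18)]  # (x, y, r) triples from the question

# One row [x, y, 1] per data point for the model r = a*x + b*y + c
A = np.array([[x, y, 1.0] for x, y, _ in data])
r = np.array([r for _, _, r in data], dtype=float)

rank = np.linalg.matrix_rank(A)
print(rank, A.shape[1])  # rank is smaller than the number of unknowns

# lstsq returns just one of infinitely many exact solutions (minimum norm)
coef, *_ = np.linalg.lstsq(A, r, rcond=None)
print(coef, A @ coef)
```

Since the rank (2) is below the number of parameters (3), the data cannot pin down a, b and c uniquely.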
If you want to point out that r = x*y is a particularly simple formula, and you would like to look for simple formulas, then one approach would be enumerating formulas in order of increasing complexity. But if you do this without parameters (since ugly parameters will make a simple formula like a*x + b*y + c appear complex), then it's hard to guide this enumeration towards the one you want, so you'd really have to enumerate all possible formulas, which will become infeasible very quickly.

Integrating a function with Python (sympy, quad) where the result is another function I want to plot

I want to integrate a function using python, where the output is a new function rather than a numerical value. For example, I have the equation (from Arnett 1982 -- analytical description of a supernova):
def A(z,tm,tni):
    y=tm/(2*tni)
    tm=8.8 # diffusion parameter
    tni=8.77 # efolding time of Ni56
    return 2*z*np.exp((-2*z*y)+(z**2))
I want to then find the integral of A, and then plot the results. First, I naively tried scipy.integrate.quad:
def Arnett(t,z,tm,tni,tco,Mni,Eni,Eco):
    x=t/tm
    Eni=3.90e+10 # Heating from Ni56 decay
    Eco=6.78e+09 # Heating from Co56 decay
    tni=8.77 # efolding time of Ni56
    tco=111.3 # efolding time of Co56
    tm=8.8 # diffusion parameter
    f=integrate.quad(A(z,tm,tni),0,x) #integral of A
    h=integrate.quad(B(z,tm,tni,tco),0,x) #integral of B
    g=np.exp((-(x/tm)**2))
    return Mni*g*((Eni-Eco)*f+Eco*h)
Here B is also a pre-defined function (not presented here). Both A and B are functions of z, whereas the final equation is a function of time, t. (I believe this is where my code fails.)
The integrals of A and B run from zero to x, where x is a function of time t. Attempting to run the code as it stands gives me an error: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
So after a short search I thought that maybe sympy would be the way to go. However I am failing with this as well.
I wonder if anyone has a helpful suggestion how to complete this task please?
Many thanks,
Zach

You can integrate A analytically. Assuming I'm not missing something silly due to being up way too late, does the following help?
import sys
import sympy as sy
sys.displayhook = sy.pprint
A, y, z, tm, t, tni = sy.symbols('A, y, z, tm, t, tni')
A = 2*z*sy.exp(-2*z*y + z**2)
expr = sy.integrate(A, (z,0,t)) # patience - this takes a while
expr
# check:
(sy.diff(expr,t).simplify() - A.replace(z,t)).simplify()
# thus, the result:
expr.replace(y,tm/(2*tni)).replace(t,t/tm)
The last line yields the integral of your A function in analytic form, though it does require evaluating the imaginary error function (which you can do with scipy.special.erfi()).

I think what you are looking for are lambda expressions (if I understood correctly what you said).
They allow you to define an anonymous function inside A and return it, so that you get your B function. It should work something like this:
def A(parameters):
    return lambda x: x * parameters # for simplicity I applied a multiplication
                                    # but you can apply anything you want to x
B = A(args)
x = B(2)
Hope I could provide you with a decent response!

I think the error you get comes from an incorrect call to scipy.integrate.quad:
The first argument needs to be just the function name, integration is then performed over the first variable of this function. The values of the other variables can be passed to the function via the args keyword.
The output of scipy.integrate.quad contains not only the value of the integral, but also an error estimate. So a tuple of 2 values is returned!
In the end the following function should work:
import numpy as np
from scipy import integrate
def Arnett(t, z, Mni, tm=8.8, tni=8.77, tco=111.3, Eni=3.90e+10,
           Eco=6.78e+09):
    x=t/tm
    f,err=integrate.quad(A,0,x,args=(tm,tni)) #integral of A
    h,err=integrate.quad(B,0,x,args=(tm,tni,tco)) #integral of B
    g=np.exp((-(x/tm)**2))
    return Mni*g*((Eni-Eco)*f+Eco*h)
But an even better solution would probably be integrating A and B analytically and then evaluating the expression as murison suggested.
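Note that quad expects scalar integration limits, so to get a curve for plotting, the integral has to be evaluated once per time value. A minimal sketch for A alone, assuming the A from the question with y = tm/(2*tni); integral_of_A is a hypothetical helper name:

```python
import numpy as np
from scipy import integrate

def A(z, tm, tni):
    y = tm / (2 * tni)
    return 2 * z * np.exp(-2 * z * y + z**2)

def integral_of_A(t, tm=8.8, tni=8.77):
    """Integral of A from 0 to x = t/tm, one quad call per time value."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.array([integrate.quad(A, 0.0, ti / tm, args=(tm, tni))[0]
                     for ti in t])

t = np.linspace(0.0, 20.0, 5)
values = integral_of_A(t)  # array with the same length as t, ready to plot
print(values)
```

Passing a whole array as the upper limit is one plausible source of the "truth value of an array" error quoted in the question.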

Python fsolve ValueError

Why does the following code return a ValueError?
from scipy.optimize import fsolve
import numpy as np
def f(p,a=0):
    x,y = p
    return (np.dot(x,y)-a,np.outer(x,y)-np.ones((3,3)),x+y-np.array([1,2,3]))
x,y = fsolve(f,(np.ones(3),np.ones(3)),9)
ValueError: setting an array element with a sequence.

The basic problem here is that your function f does not satisfy the criteria required for fsolve to work. These criteria are described in the documentation - although arguably not very clearly.
The particular things that you need to be aware of are:
1. The input to the function that will be solved for must be an n-dimensional vector (referred to in the docs as ndarray), such that the value of x you want is the solution to f(x, *args) = 0.
2. The output of f must be the same shape as the x input to f.
Currently, your function takes a 2-member tuple of 1x3 arrays (in p) and a fixed scalar offset (in a). It returns a 3-member tuple of types (scalar, 3x3 array, 1x3 array).
As you can see, neither condition 1 nor 2 is met.
It is hard to advise you on exactly how to fix this without knowing the equation you are trying to solve. It seems you are trying to solve some particular equation f(x,y,a) = 0 for x and y with x0 = (1,1,1) and y0 = (1,1,1) and a = 9 as a fixed value. You might be able to do this by passing in x and y concatenated (e.g. pass in p0 = (1,1,1,1,1,1) and in the function use x = p[:3] and y = p[3:]), but then you must modify your function to output a 6-dimensional vector as well. This depends on the exact function you are solving for, and I can't work this out from the output of your existing f (i.e. a tuple built from a dot product, an outer product and a sum).
Note that arguments that you don't pass in the vector (e.g. a in your case) will be treated as fixed values and won't be varied as part of the optimisation or returned as part of any solution.
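The concatenation idea can be sketched as follows. The system solved here (x + y = (1,2,3) together with x - y = 0) is a made-up example chosen only so that there are six equations in six unknowns; it is not the system from the question.

```python
import numpy as np
from scipy.optimize import fsolve

target_sum = np.array([1.0, 2.0, 3.0])

def f(p):
    # Unpack the flat 6-vector into the two 3-vectors
    x, y = p[:3], p[3:]
    # Return six residuals, so the output has the same shape as the input
    return np.concatenate([x + y - target_sum, x - y])

p0 = np.ones(6)
solution = fsolve(f, p0)
x, y = solution[:3], solution[3:]
print(x, y)
```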
Note for those who like the full story...
As the docs say:
fsolve is a wrapper around MINPACK’s hybrd and hybrj algorithms.
If we look at the MINPACK hybrd documentation, the conditions for the input and output vectors are more clearly stated. See the relevant bits below (I've cut some stuff out for clarity - indicated with ... - and added the comment to show that the input and output must be the same shape - indicated with <--)
1 Purpose.
The purpose of HYBRD is to find a zero of a system of N nonlinear
functions in N variables by a modification of the Powell
hybrid method. The user must provide a subroutine which calculates
the functions. The Jacobian is then calculated by a forward-difference
approximation.
2 Subroutine and type statements.
SUBROUTINE HYBRD(FCN,N,X, ...
...
FCN is the name of the user-supplied subroutine which calculates
the functions. FCN must be declared in an EXTERNAL statement
in the user calling program, and should be written as follows.
SUBROUTINE FCN(N,X,FVEC,IFLAG)
INTEGER N,IFLAG
DOUBLE PRECISION X(N),FVEC(N) <-- input X is an array length N, so is output FVEC
----------
CALCULATE THE FUNCTIONS AT X AND
RETURN THIS VECTOR IN FVEC.
----------
RETURN
END
N is a positive integer input variable set to the number of
functions and variables.
X is an array of length N. On input X must contain an initial
estimate of the solution vector. On output X contains the
final estimate of the solution vector.
