I have a problem: I have two lists of x and y values, and I would like to create a function based on them. Specifically, I would like to build a function of this form:
f(x) = a * (x-b)**c
I already know scipy.interpolate but I couldn't find anything to return a function like this one.
Is there a reasonably easy way to find the best such function, by searching for the values of a, b and c that fit the data most closely?
thanks for your help!
Edit:
here is what my current values of x and y look like:
I created this function :
def problem(values):
    s = sum((y - values[0]*(x - values[1])**values[2])**2 for x, y in zip(X, Y))
    return s
and I tried to find the best values of a,b and c with scipy.optimize.minimize but I don't know with which values of a,b and c I should start...
values = minimize(problem,(a,b,c))
(Edited to account for the OP's added code and sub-question.)
The general idea is to use a least-squares minimization to find the "best" values of a, b, and c. First define a function whose parameters are a, b, c and that returns the sum of the squares of the differences between the given y values and the calculated values of a * (x-b)**c. (That function can be written as a one-liner.) Then use an optimization routine, such as one found in scipy, to minimize that function. The resulting values of a, b, c are what you want--use them to define your desired function.
There are a few details to examine, such as restrictions on the allowed values of a, b, c, but those depend somewhat on your lists of x and y values.
Now that you have shown a graph of your x and y values, I see that your values are all positive and the function is generally increasing. For that common situation I would use the initial values
a = 1.0
b = 0.0
c = 1.0
That gives a straight line through the origin, in fact the line y = x, which is often a decent first guess. In your case the x and y values have very different scales, with y about a hundred times larger than x, so you would probably get better results by changing the value of a:
a = 100.0
b = 0.0
c = 1.0
I can see even better values and some restrictions on the end values but I would prefer to keep this answer more general and useful for other similar problems.
Your function problem() looks correct to me, though I would have written it a little differently for better clarity. Be sure to test it.
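A minimal runnable sketch of this approach. The data here are synthetic, generated from known parameters a=120, b=0.5, c=1.3 (all assumed for illustration); scipy.optimize.curve_fit performs exactly this least-squares minimization:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a * (x - b) ** c

# Synthetic data from known parameters (values assumed for illustration)
X = np.linspace(5.0, 15.0, 50)
Y = f(X, 120.0, 0.5, 1.3)

# Least-squares fit, starting from a=100, b=0, c=1;
# b is bounded below min(X) so that (x - b) stays positive
popt, pcov = curve_fit(f, X, Y, p0=(100.0, 0.0, 1.0),
                       bounds=([0.0, -5.0, 0.1], [1e4, 4.0, 3.0]))
a, b, c = popt  # close to 120, 0.5, 1.3
```

With the fitted values in hand, the desired function is simply lambda x: a * (x - b)**c.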
def problem(a, b, c, d):
    return a * (x[d] - b)**c
I guess this is what you are after, with d being an index into the x array. I'm not sure where y comes into it.
I am having trouble pulling values out of a function that I am trying to optimize. The code looks similar to what follows. I want to minimize c by changing x through scipy.optimize.minimize, but I am also trying to pull a and b out of the function. What is the best way to do this?
def function(x, inputs):
    a = ...  # some math
    b = ...  # some math
    c = ...  # some math
    return c
You should be able to just compute a and b outside of the function. It doesn't need to be part of the function if the result is not optimized. Just use x, inputs, and c to compute a and b outside of the function.
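A runnable sketch of that suggestion (the particular "math" inside the function is a made-up quadratic, purely for illustration): keep a helper that returns all three values, hand only c to the optimizer, then call the helper once more at the optimum to recover a and b:

```python
from scipy.optimize import minimize

def intermediates(x, inputs):
    # Hypothetical stand-ins for the 'math' in the question
    a = x[0] + inputs
    b = 2.0 * a
    c = (x[0] - 3.0) ** 2 + b
    return a, b, c

def objective(x, inputs):
    # Only c is handed to the optimizer
    return intermediates(x, inputs)[2]

inputs = 1.5
res = minimize(objective, x0=[0.0], args=(inputs,))
a, b, c = intermediates(res.x, inputs)  # a and b recomputed outside the optimizer
```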
Assume we have a function with an unknown formula. Given a few inputs and the corresponding results of this function, how can we recover the function's formula?
For example we have inputs x and y and result r in format (x,y,r)
[ (2,4,8) , (3,6,18) ]
And the desired function can be
f(x,y) = x * y
As you post the question, the problem is too generic. If you want to find any formula mapping the given inputs to the given result, there are simply too many possible formulas. In order to make sense of this, you need to somehow restrict the set of functions to consider. For example you could say that you're only interested in polynomial solutions, i.e. where
r = sum a_ij * x^i * y^j for i from 0 to n and j from 0 to n - i
then you have a system of equations, with the a_ij as parameters to solve for. The higher the degree n the more such parameters you'd have to find, so the more input-output combinations you'd need to know. Variations of this use rational functions (so you divide by another polynomial), or allow some trigonometric functions, or something like that.
If your setup were particularly easy, you'd have just linear equations, i.e. r = a*x + b*y + c. As you can see, even that has three parameters a,b,c so you can't uniquely find all three of them just given the two inputs you provided in your question. And even then the result would not be the r = x*y you were aiming for, since that's technically of degree 2.
If you want to point out that r = x*y is a particularly simple formula, and you would like to look for simple formulas, then one approach would be enumerating formulas in order of increasing complexity. But if you do this without parameters (since ugly parameters will make a simple formula like a*x + b*y + c appear complex), then it's hard to guide this enumeration towards the one you want, so you'd really have to enumerate all possible formulas, which will become infeasible very quickly.
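To illustrate the polynomial approach: with enough input/output pairs, the coefficients of a fixed set of monomials can be solved for by linear least squares. The extra data points below are assumptions, chosen to be consistent with r = x*y, since the two given points cannot determine four coefficients:

```python
import numpy as np

# (x, y, r) samples consistent with r = x*y; all but the first two are assumed
data = [(2, 4, 8), (3, 6, 18), (1, 5, 5), (4, 2, 8), (2, 7, 14)]

# Monomial basis (a subset of degree <= 2, for brevity): 1, x, y, x*y
A = np.array([[1.0, x, y, x * y] for x, y, _ in data])
r = np.array([float(ri) for _, _, ri in data])

coef, residuals, rank, sv = np.linalg.lstsq(A, r, rcond=None)
# coef comes out close to [0, 0, 0, 1], i.e. r = x*y
```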
I noticed previous versions of my question suggested the use of queries, but I have unique data frames that do not have the same column names. I want to code this formula without for loops and only with apply function:
Here are the variables, initialized. mu = μ, and the other variables are as follows:
mu=pd.DataFrame(0, index=['A','B','C'], columns=['x','y'])
pij=pd.DataFrame(np.random.randn(500,3),columns=['A','B','C'])
X=pd.DataFrame(np.random.randn(500,2),columns=['x','y'])
Next, I am able to use nested for loops to solve this
for j in range(len(mu)):
    for i in range(len(X)):
        mu.iloc[j, :] += pij.iloc[i, j] * X.iloc[i][['x', 'y']]
    mu.iloc[j, :] = mu.iloc[j, :] / pij.iloc[:, j].sum()
mu
x y
A 0.147804 0.169263
B -0.299590 -0.828494
C -0.199637 0.363423
My question is if it is possible to not use the nested for loops or even remove one for loop to solve this. I have made feeble attempts to no avail.
Even my initial attempts result in multiple NaN's.
The code you pasted suggests you meant the index on mu on the left hand side of the formula to be j, so I'll assume that's the case.
Also since you generated random matrices for your example, my results will turn out different than yours, but I checked that your pasted code gives the same results as my code on the matrices I generated.
The numerator of the RHS of the formula can be computed with the appropriate transpose and matrix multiplication:
>>> num = pij.transpose().dot(X)
>>> num
x y
A -30.352924 -22.405490
B 14.889298 -16.768464
C -24.671337 9.092102
The denominator is simply summing over columns:
>>> denom = pij.sum()
>>> denom
A 23.460325
B 20.106702
C -46.519167
dtype: float64
Then the "division" is element-wise division by column:
>>> num.divide(denom, axis='index')
x y
A -1.293798 -0.955037
B 0.740514 -0.833974
C 0.530348 -0.195449
I'd normalize pij first then take inner product with X. The formula looks like:
mu = (pij / pij.sum()).T.dot(X)
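A quick sanity check that the one-liner matches the nested-loop version (smaller random matrices with a fixed seed are assumed here so the comparison runs quickly):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pij = pd.DataFrame(rng.standard_normal((100, 3)), columns=['A', 'B', 'C'])
X = pd.DataFrame(rng.standard_normal((100, 2)), columns=['x', 'y'])

# Vectorized form: normalize pij by its column sums, then inner product with X
mu_fast = (pij / pij.sum()).T.dot(X)

# Nested-loop form from the question, for comparison
mu_slow = pd.DataFrame(0.0, index=['A', 'B', 'C'], columns=['x', 'y'])
for j in range(len(mu_slow)):
    for i in range(len(X)):
        mu_slow.iloc[j, :] += pij.iloc[i, j] * X.iloc[i]
    mu_slow.iloc[j, :] = mu_slow.iloc[j, :] / pij.iloc[:, j].sum()
```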
I want to integrate a function using python, where the output is a new function rather than a numerical value. For example, I have the equation (from Arnett 1982 -- analytical description of a supernova):
def A(z, tm, tni):
    tm = 8.8    # diffusion parameter
    tni = 8.77  # e-folding time of Ni56
    y = tm/(2*tni)
    return 2*z*np.exp(-2*z*y + z**2)
I want to then find the integral of A, and then plot the results. First, I naively tried scipy.integrate.quad:
def Arnett(t, z, tm, tni, tco, Mni, Eni, Eco):
    x = t/tm
    Eni = 3.90e+10  # heating from Ni56 decay
    Eco = 6.78e+09  # heating from Co56 decay
    tni = 8.77      # e-folding time of Ni56
    tco = 111.3     # e-folding time of Co56
    tm = 8.8        # diffusion parameter
    f = integrate.quad(A(z, tm, tni), 0, x)       # integral of A
    h = integrate.quad(B(z, tm, tni, tco), 0, x)  # integral of B
    g = np.exp(-(x/tm)**2)
    return Mni*g*((Eni - Eco)*f + Eco*h)
where B is also a pre-defined function (not presented here). Both A and B are functions of z; however, the final equation is a function of time, t. (I believe this is where my code fails.)
The integrals of A and B run from zero to x, where x is a function of time t. Attempting to run the code as it stands gives me an error: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
So after a short search I thought that maybe sympy would be the way to go. However I am failing with this as well.
I wonder if anyone has a helpful suggestion for how to complete this task?
Many thanks,
Zach
You can integrate A analytically. Assuming I'm not missing something silly due to being up way too late, does the following help?
import sys
import sympy as sy

sys.displayhook = sy.pprint

A, y, z, tm, t, tni = sy.symbols('A, y, z, tm, t, tni')
A = 2*z*sy.exp(-2*z*y + z**2)
expr = sy.integrate(A, (z, 0, t))  # patience - this takes a while
expr
# check:
(sy.diff(expr, t).simplify() - A.replace(z, t)).simplify()
# thus, the result:
expr.replace(y, tm/(2*tni)).replace(t, t/tm)
The last line yields the integral of your A function in analytic form, though it does require evaluating the imaginary error function (which you can do with scipy.special.erfi()).
I think what you are looking for are lambda expressions (if I understood correctly what you said; see here for extra information and some examples of lambda functions). They allow you to define an anonymous function inside A and return it, so that you get your B function. It should work something like this:
def A(parameters):
    # for simplicity a multiplication is applied here,
    # but you can apply anything you want to x
    return lambda x: x * parameters

B = A(args)
x = B(2)
Hope I could provide you with a decent response!
I think the error you get comes from an incorrect call to scipy.integrate.quad:
The first argument needs to be just the function name, integration is then performed over the first variable of this function. The values of the other variables can be passed to the function via the args keyword.
The output of scipy.integrate.quad contains not only the value of the integral, but also an error estimate. So a tuple of 2 values is returned!
In the end the following function should work:
def Arnett(t, z, Mni, tm=8.8, tni=8.77, tco=111.3, Eni=3.90e+10, Eco=6.78e+09):
    x = t/tm
    f, err = integrate.quad(A, 0, x, args=(tm, tni))       # integral of A
    h, err = integrate.quad(B, 0, x, args=(tm, tni, tco))  # integral of B
    g = np.exp(-(x/tm)**2)
    return Mni*g*((Eni - Eco)*f + Eco*h)
But an even better solution would probably be integrating A and B analytically and then evaluating the expression as murison suggested.
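For completeness, a self-contained version of the corrected call, with just the A integral (B omitted; the constants follow the question, while the upper limit x is an assumed sample value):

```python
import numpy as np
from scipy import integrate

def A(z, tm, tni):
    y = tm / (2 * tni)
    return 2 * z * np.exp(-2 * z * y + z**2)

tm, tni = 8.8, 8.77  # diffusion parameter and Ni56 e-folding time
x = 0.5              # upper limit, e.g. x = t/tm for some time t (assumed)

# quad integrates over A's first argument; tm and tni go in via args,
# and a (value, error-estimate) tuple comes back
f, err = integrate.quad(A, 0, x, args=(tm, tni))
```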
I am looking for the correct approach to use a variable number of parameters as input for the optimizer in scipy.
I have a set of input parameters p1,...,pn and I calculate a quality criteria with a function func(p1,...,pn). I want to minimize this value.
The input parameters are either 0 or 1 indicating they should be used or not. I cannot simply delete all unused ones from the parameter list, since my function for the quality criteria requires them to be "0" to remove unused terms from equations.
def func(parameters):
    ...  # calculate one scalar as quality criterion

solution = optimize.fmin_l_bfgs_b(func, parameters, approx_grad=1,
                                  bounds=((0.0, 5.0), ..., (0.0, 5.0)))  # this will vary all parameters
Within my code the optimizer runs without errors, but of course all given parameters are changed to achieve the best solution.
Is there a way to have e.g. 10 input parameters for func, but only 5 of them are used in the optimizer?
So far I can only think of changing my func definition in a way that I will not need the "0" input from unused parameters. I would appreciate any ideas how to avoid that.
Thanks a lot for the help!
If I understand correctly, you are asking for a constrained best fit: rather than finding the best [p0,p1,p2...p10] for function func(), you want to find the best [p0, p1, ...p5] for function func() under the condition that p6=fixed6, p7=fixed7, p8=fixed8... and so on.
Translating this into Python code is straightforward if you use args=(something,) in scipy.optimize.fmin_l_bfgs_b. First, write a partially fixed function func_fixed():
def func_fixed(p_var, p_fixed):
    # this only works if both are lists; if they are numpy arrays,
    # use hstack, append or similar
    return func(p_var + p_fixed)

solution = optimize.fmin_l_bfgs_b(func_fixed,
                                  x0=guess_parameters,
                                  approx_grad=your_grad,
                                  bounds=your_bounds,
                                  args=(your_fixed_parameters,),  # this is the deal
                                  other_things)
It is not necessary to have func_fixed(), you can use lambda. But it reads much easier this way.
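A runnable version of the same pattern (the quadratic func and the particular fixed values are assumptions for illustration): two parameters are optimized, the last two are held fixed via args:

```python
from scipy import optimize

def func(p):
    # Toy quality criterion over four parameters (assumed for illustration)
    return (p[0] - 1.0)**2 + (p[1] - 2.0)**2 + p[2]**2 + p[3]**2

def func_fixed(p_var, p_fixed):
    return func(list(p_var) + list(p_fixed))

fixed = [0.0, 0.0]  # values for the two parameters that are not optimized
solution = optimize.fmin_l_bfgs_b(func_fixed,
                                  x0=[0.0, 0.0],
                                  approx_grad=1,
                                  bounds=[(0.0, 5.0), (0.0, 5.0)],
                                  args=(fixed,))
p_opt, f_opt, info = solution  # optimum at p0=1, p1=2
```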
I recently solved a similar problem where I want to optimise a different subset of parameters at each run but need all parameters to calculate the objective function. I added two arguments to my objective function:
an index array x_idx which indicates which parameters to optimise, i.e. 0 = don't optimise, 1 = optimise
an array x0 with the initial values of all parameters
In the objective function I set the list of the parameters according to the index array either to the parameters which are to be optimised or the initial values.
import numpy
import scipy.optimize

def objective_function(x_optimised, x_idx, x0):
    x = []
    j = 0
    for i, idx in enumerate(x_idx):
        if idx == 1:
            x.append(x_optimised[j])
            j = j + 1
        else:
            x.append(x0[i])
    x = numpy.array(x)
    return sum(x**2)

if __name__ == '__main__':
    x_idx = [1, 1, 0]
    x0 = [1.1, 1.3, 1.5]
    x_initial = [x for i, x in enumerate(x0) if x_idx[i] == 1]
    xopt, fopt, iters, funcalls, warnflag = scipy.optimize.fmin(objective_function,
                                                                x_initial, args=(x_idx, x0,),
                                                                maxfun=200, full_output=True)
    print(xopt)