I'm searching for the minimum in a 100-dimensional space. I'm using gp_minimize from skopt (Python 3.6).
space = [(0., 1.) for _ in range(100)]
res = gp_minimize(f, space)
However, I also have a constraint that the value in each subsequent dimension is not larger than the value in the previous dimension. For example, in the 5-d case, the point [1, 0.9, 0.9, 0.8, 0.7] is OK, while the point [1, 0.3, 0.5, 0.4, 0.2] is not.
How to add this constraint using skopt?
The best way I found is to modify the function f. Choose an upper bound for f, and everywhere in the domain where f is not supposed to be evaluated, have it return this upper bound.
It is straightforward to see that this is a mathematically sound approach, as it doesn't change the minimum and constrains your search space in roughly the same way that Lagrange multipliers do. However, I don't know if it plays nicely with the algorithm, since I don't really know how Bayesian optimization handles wide plateaus.
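A minimal sketch of that idea (f is the original objective from the question; UPPER_BOUND is an assumed value at least as large as f anywhere in the feasible region):

from skopt import gp_minimize

UPPER_BOUND = 10.0  # assumption: f never exceeds this on the feasible set

def f_constrained(x):
    # monotonicity constraint: each value must not exceed the previous one
    if any(x[i + 1] > x[i] for i in range(len(x) - 1)):
        return UPPER_BOUND
    return f(x)  # the original objective

space = [(0., 1.) for _ in range(100)]
res = gp_minimize(f_constrained, space)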
Related
How can I optimize a function with fixed steps? I have developed a function that takes five thresholds as input and that I want to optimize. I tried to optimize it with different solvers, but the steps the solvers take are so tiny that the function never converges to a good solution.
The defined thresholds vary from 0 to 1, and I want them to take steps of 0.01. For example, in the case of threshold_0, I want it to vary from the initial guess 0.6 to 0.61 or 0.59, etc., depending on the error result.
from scipy import optimize

initial_guess = [0.6, 0.3, 0.6, 0.5, 0.5]

def get_sobel3d_accuracy_from_thresholds(thresholds, array_dicts, ponderation_dict):
    ...
    return error

result = optimize.minimize(
    get_sobel3d_accuracy_from_thresholds,   # function to optimize
    initial_guess,
    args=(array_dicts, ponderation_dict),   # extra fixed args
    method='nelder-mead',
    options={'xatol': 1e-8, 'disp': True})
What I want to get is a solution that minimizes the error returned from the function get_sobel3d_accuracy_from_thresholds, as follows:
optimized_thresholds = [0.61, 0.3, 0.81, 0.52, 0.44]
I would also like to fix boundaries for the thresholds from 0 to 1, but I think that can be done only with some solvers, right?
bounds = [(0, 1) for n in range(0,5)]
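For the bounds part, here is a minimal sketch of passing them to a solver that accepts a bounds argument (L-BFGS-B here; the other names are the ones from the snippet above):

from scipy import optimize

bounds = [(0, 1) for _ in range(5)]
result = optimize.minimize(
    get_sobel3d_accuracy_from_thresholds,
    initial_guess,
    args=(array_dicts, ponderation_dict),  # extra fixed args
    method='L-BFGS-B',                     # a method that supports bounds
    bounds=bounds)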
thank you all.
I am trying to minimize a function with scipy.optimize with three input variables, two of which are bounded and one has to be chosen from a set of values. To ensure that the third variable is chosen from a predefined set of values, I introduced the following constraint:
from scipy.optimize import rosen, shgo
import numpy as np

# Set of values from which the third variable to be optimized can be chosen
Z = np.array([-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1])

def Rosen_Test(x):  # arbitrary objective function
    print(x)
    return rosen(x)**2 - np.sin(x[0])

def Cond_1(x):
    if x[2] in Z:
        return 1
    else:
        return -1

bounds = [(-512, 512)] * 3
conds = ({'type': 'ineq', 'fun': Cond_1})
result = shgo(Rosen_Test, bounds, constraints=conds)
print(result)
However, when looking at the print output from Rosen_Test, it is evident that the condition is not being enforced - perhaps the condition is not defined correctly?
I was wondering if anyone has any ideas to ensure that the third variable can be chosen from a set.
Note: the shgo method was chosen because constraints can be introduced and changed. Also, I am open to using other optimization packages if this condition can be met.
The inequality constraints do not work like that.
As mentioned in the docs they are defined as
g(x) <= 0
and you need to write g(x) so that it works like that. In your case that is not what happens: you are only returning a single scalar for one dimension. You need to return a vector with three dimensions, of shape (3,).
In your case you could try to use an equality constraint instead, as this allows a slightly better hack. But I am still not sure it will work, since these optimizers are not really built for this kind of condition.
And the whole thing will probably leave the optimizer with a rather bumpy and discontinuous objective function. You can read up on Mixed-Integer Nonlinear Programming (MINLP); maybe start here.
There is one more reason why your approach won't work as expected:
since optimizers work with floating point numbers, they will likely never hit a number in your array exactly when guessing new solutions.
This illustrates the issue:
import numpy as np
Z = np.array([-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1])
print(0.7999999 in Z) # False, this is what the optimizer will find
print(0.8 in Z) # True, this is what you want
Maybe you should try to define your problem in a way that allows to use an inequality constraint on the whole range of Z.
But let's see how it could work.
An equality constraint is defined as
h(x) == 0
So you could use
def Cond_1(x):
    if x[2] in Z:
        return np.zeros_like(x)
    else:
        return np.ones_like(x) * 1.0  # maybe multiply with some scalar?
The idea is to return an array [0.0, 0.0, 0.0] that satisfies the equality constraint if the number is found. Else return [1.0, 1.0, 1.0] to show that it is not satisfied.
Caveats:
1.)
You might have to tune this to return an array like [0.0, 0.0, 1.0] to show the optimizer which dimension you are unhappy about so the optimizer can make better guesses by only adjusting a single dimension.
2.)
You might have to return a larger value than 1.0 to signal a non-satisfied equality constraint. This depends on the implementation. The optimizer could think that 1.0 is fine because it is close to 0.0, so maybe you have to try something like [0.0, 0.0, 999.0].
This solves the problem with the dimension, but it still will not find any numbers, due to the floating point issue mentioned above.
But we can try to hack around this like so:
import numpy as np
Z = np.array([-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1])
def Cond_1(x):
    # how close you want to get to the numbers in your array
    tolerance = 0.001
    delta = np.abs(x[2] - Z)
    print(delta)
    print(np.min(delta) < tolerance)
    if np.min(delta) < tolerance:
        return np.zeros_like(x)
    else:
        # maybe you have to multiply this with some scalar
        # I have no clue how it is implemented
        # we need a value stating to the optimizer "NOT THIS ONE!!!"
        return np.ones_like(x) * 1.0

sol = np.array([0.5123, 0.234, 0.2])
print(Cond_1(sol))  # [0. 0. 0.] -> constraint satisfied

sol = np.array([0.5123, 0.234, 0.202])
print(Cond_1(sol))  # [1. 1. 1.] -> constraint not satisfied
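Putting the pieces together, here is a sketch of how this hack could be passed to shgo (reusing np, Z and Cond_1 from the snippet above; whether shgo actually honours the hack is exactly the open question mentioned earlier):

from scipy.optimize import rosen, shgo

def Rosen_Test(x):
    return rosen(x)**2 - np.sin(x[0])

bounds = [(-512, 512)] * 3
conds = ({'type': 'eq', 'fun': Cond_1},)   # equality constraint instead of 'ineq'
result = shgo(Rosen_Test, bounds, constraints=conds)
print(result)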
Here are some recommendations on optimization. To make sure it works in a reliable way, try to start the optimization at different initial values. Global optimization algorithms might not take initial values when used with boundaries; the optimizer discretizes the space in some way itself.
What you could do to check the reliability of your optimization and get better overall results:
Optimize on the complete region [-512, 512] (for all three dimensions)
Try 1/2 of that: [-512, 0] and [0, 512] (8 sub-optimizations, 2 for each dimension)
Try 1/3 of that: [-512, -171], [-171, 170], [170, 512] (27 sub-optimizations, 3 for each dimension)
Now compare the converged results to see if the complete global optimization found the same result
If the global optimizer did not find the "real" minimum but the sub-optimizations did:
your objective function is too difficult on the whole domain
try a different global optimizer
tune the parameters (maybe the 999 for the equality constraint)
I often use the sub-optimization as part of the normal process, not only for testing. Especially for blackbox problems.
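A rough sketch of that sub-optimization idea, reusing Rosen_Test and conds from above (the split points follow the numbers listed above):

import numpy as np
from itertools import product
from scipy.optimize import shgo

edges = [-512, 0, 512]                         # the 1/2 split from above
intervals = list(zip(edges[:-1], edges[1:]))   # [(-512, 0), (0, 512)]

results = []
for box in product(intervals, repeat=3):       # 2**3 = 8 sub-boxes
    results.append(shgo(Rosen_Test, list(box), constraints=conds))

best = min(results, key=lambda r: r.fun)
print(best.x, best.fun)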
Please also see these answers:
Scipy.optimize Inequality Constraint - Which side of the inequality is considered?
Scipy minimize constrained function
I'm trying to calculate the maximum likelihood estimate (MLE) for the following probability density function (PDF):
I'm computing it by minimising the objective function (negative log-likelihood) without relying on any predefined log-likelihood python built-in modules whatsoever. The code is:
# Alpha Distribution (PDF)
def AD(z, *params):
    a, scale = z
    diameters = params
    return -np.sum(np.log((((diameters)/(a**2) * np.exp(-diameters/a))) / scale))

# load data
currpath = ('path')
os.chdir(currpath)
diameters = scipy.io.loadmat('data.mat')["m1"]

# minimise
x0 = [1, 1]  # initial guesses
res = optimize.minimize(AD, x0, args=diameters, method='Nelder-Mead', tol=1e-6)
print(res.x)
My data vector (here already sorted) comprises a number of diameters in the following form (0.19, 0.19, 0.19, 0.2, 0.21, 0.21, 0.22, 0.22, 0.22, 0.25, 0.27 ...).
First question: Since I'm fairly new to the topic of MLE, is the form of my data vector correct? I'm not completely sure whether I use a data vector containing every observed diameter (like shown above), or a data vector which only contains the "possible" diameters (which would be: 0.19, 0.2, 0.21, 0.22, 0.25, 0.27 ...), or just the frequencies of the observed diameters (which would be: 3, 1, 2, 3, 1, 1 ...). I think the first option is the right one, but I just wanted to be completely sure.
Second question: If I wish to use a cumulative distribution function (CDF) instead of a PDF to perform my MLE on, I would have to change my PDF function to a CDF, right? I was just wondering if I could alternatively somehow modify my data vector and still use the PDF.
However, for the minimisation in python (if I understood it correctly) I had to rethink the definition of my variables. That means, normally I would assume that the parameters of my PDF (here "a" and "scale") are the variables which should be passed to "args" in "optimize.minimize". However, in the documentation it is stated, that args should contain the "constant" parameters, therefore I used my data vector as a constant "parameter vector" for the minimisation.
Third question: Is this assumption an error in reasoning?
Fourth question: Is the optimisation method "Nelder-Mead" appropriate? I'm not really familiar with optimisation methods and not sure which of the options I should use/is the best.
Finally, the program returns an error "TypeError: bad operand type for unary -: 'tuple'", where I have no clue how to deal with it, since I'm not passing any tuples to the minimisation function ...
Fifth question: Where does the tuple come from and how can I solve this error?
I'd appreciate any help you could give me very much!
Best regards!
PS: Since this post is kind of a mixture between general math and programming, I wasn't completely sure if this is the right place to put the question. Sorry if I'm mistaken!
First, apart from the first part (before the multiplication operator), we are discussing what is generally called maximum likelihood estimation (MLE) for the exponential distribution. It has just been reparameterised in terms of something called a.
We want to estimate this single parameter based on a sample of diameters; there is no scale parameter. Under MLE, we pretend that the sample is fixed and treat the parameter as something that can be varied. We form the likelihood of the sample by taking the product of the density functions (not the cdfs) where each density function is to be calculated for one element of the sample.
(Likelihood is, in concept, like throwing a die twice. In ultra ugly terms, we could say that the likelihood of getting two ones in a row might be (1/6)(1/6).)
We want to maximise this likelihood. However, to make the optimisation problem mathematically and/or computationally tractable we take the function's logarithm. Since all of its constituent factors are densities, less than one, this function must be everywhere less than zero. Thus, the maximisation problem becomes one of minimisation.
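For reference, here is the likelihood being built, written out for the density from the question (with the extra scale factor dropped):

L(a) = \prod_{i=1}^{n} \frac{d_i}{a^2}\, e^{-d_i/a},
\qquad
\log L(a) = \sum_{i=1}^{n} \Big( \log d_i - \frac{d_i}{a} - 2 \log a \Big)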
If you want to avoid almost all of the algebra then you would:
Write a function to calculate the density function for a given diameter and parameter value.
Write another function that accepts a density-function parameter value as its first Python parameter and the sample as its second. Have it call the first function once for each sample value, take the log of each result, and return the sum of these.
Call minimize with the second function as its first argument, a reasonable guess for the density-function parameter (in a list) as the second argument, and the sample for args. Nelder-Mead is probably OK.
Edit: In a nutshell:
diameters = [0.19, 0.19, 0.19, 0.2, 0.21, 0.21, 0.22, 0.22, 0.22, 0.25, 0.27]

from scipy.optimize import minimize
from math import exp, log

def pdf(d, a):
    result = d*exp(-d/a)/a**2
    return result

def log_L(a, diameters):
    result = sum(log(pdf(d, a)) for d in diameters)
    return result

res = minimize(log_L, [1], args=diameters)
print(res)
Output:
fun: -337.80985348524604
hess_inv: array([[ 8.71770021e+10]])
jac: array([ -7.62939453e-06])
message: 'Optimization terminated successfully.'
nfev: 93
nit: 30
njev: 31
status: 0
success: True
x: array([ 2157576.39996697])
Addendum:
The wikipedia article offers the following form for the pdf of the exponential.
The constant 'lambda' can be viewed as a value that scales the integral of the remainder of the expression from zero to infinity to one. We can ignore it and equate the exponents of your pdf, without the scaling factor, and the exponential. We have to remember that d takes the role of x.
Solve for 'lambda'.
We see that this is the normalising expression in your pdf. In other words, the alpha is an exponential expressed with different parameters.
Here is another approach, assuming that you're analysing data and not simply working out the details of MLE.
scipy provides means for generating samples from arbitrary distributions. Here I define just the pdf for your alpha. Your parameter a becomes p because a is used as the lower limit for the distribution support, which I define to be zero.
I draw a sample of size 100 with p set somewhat arbitrarily to 0.4. I did a little experimentation, trying to find a value that would give me a sample whose lowest 11 values would approximate those in your sample.
The scipy rv_continuous object has a method called fit that will attempt calculation of MLE estimates of location, scale and 'shape'. In this case, the value for shape, about 0.36, is not all that far from 0.4.
from scipy.stats import rv_continuous
import numpy as np

class Alpha(rv_continuous):
    'alpha distribution'
    def _pdf(self, x, p):
        return x*np.exp(-x/p)/p**2

alpha = Alpha(a=0, shapes='p')
sample = sorted(alpha.rvs(size=100, p=0.4))

for a in sample[:12]:
    print('{:10.2f}'.format(a))

print(Alpha(a=0, shapes='p').fit(sample))
I don't believe that your sample is alpha-distributed. The values seem to be too 'uniform' compared with what I could generate. But I've been wrong before.
I would suggest plotting your sample cdf to see if you can recognise what it is.
Incidentally, when I changed the sign of the log-likelihood in the other answer the code croaked. I suspect that the alpha is just a poor fit.
0.00
0.03
0.04
0.04
0.08
0.09
0.09
0.11
0.12
0.14
0.19
0.20
(1.0902616847853124, -0.039102949269294023, 0.35922022997329517)
I have 4 different distributions which I've fitted to a sample of observations. Now I want to compare my results and find the best solution. I know there are a lot of different methods to do that, but I'd like to use a quantile-quantile (q-q) plot.
The formulas for my 4 distributions are:
where K0 is the modified Bessel function of the second kind and zeroth order, and Γ is the gamma function.
My sample style looks roughly like this: (0.2, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.6, 0.7 ...), so I have multiple identical values and also gaps in between them.
I've read the instructions on this site and tried to implement them in python. So, like in the link:
1) I sorted my data from the smallest to the largest value.
2) I computed "n" evenly spaced points on the interval (0,1), where "n" is my sample size.
3) And this is the point I can't manage.
As far as I understand, I should now use the values I calculated beforehand (those evenly spaced values), put them in the inverse functions of my above distributions and thus compute the theoretical quantiles of my distributions.
For reference, here are the inverse functions (partly calculated with wolframalpha, and as far it was possible):
where W is the Lambert W-function and everything in brackets afterwards is the argument.
The problem is that apparently there is no inverse function for the first distribution. The next one would probably produce complex values (negative under the root, because b = 0.55 according to the fit), and the last two involve a Lambert W-function (where I'm unsure how to implement them in python).
So my question is, is there a way to calculate the q-q plots without the analytical expressions of the inverse distribution functions?
I'd appreciate any help you could give me very much!
A simpler and more conventional way to go about this is to compute the log likelihood for each model and choose that one that has the greatest log likelihood. You don't need the cdf or quantile function for that, only the density function, which you have already.
The log likelihood is just the sum of log p(x|model) where p(x|model) is the probability density of datum x under a given model. Here "model" = model with parameters selected by maximizing the log likelihood over the possible values of the parameters.
You can be more careful about this by integrating the log likelihood over the parameter space, taking into account also any prior probability assigned to each model; that would be a Bayesian approach.
It sounds like you are essentially looking to choose a model by minimizing the Kolmogorov-Smirnov (KS) statistic, which, despite its heavy name, is pretty simple -- it is the maximum difference between the fitted CDF and the empirical CDF. That's defensible, but I think comparing log likelihoods is more conventional, and also simpler, since you need only the pdf.
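For illustration, here is a small sketch of that comparison using two stand-in scipy.stats distributions (the four densities from the question are not built into scipy, so gamma and lognorm here are only placeholders):

import numpy as np
from scipy import stats

sample = np.array([0.2, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.6, 0.7])

candidates = {'gamma': stats.gamma, 'lognorm': stats.lognorm}

loglik = {}
for name, dist in candidates.items():
    params = dist.fit(sample, floc=0)                 # MLE of the parameters, location fixed at 0
    loglik[name] = np.sum(dist.logpdf(sample, *params))

best = max(loglik, key=loglik.get)
print(loglik, '-> highest log likelihood:', best)

The same loop works for your own densities once they are wrapped in rv_continuous subclasses, as in the answer below.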
It happens that there is an easier way. It's taken me a day or two to dig around until I was pointed toward the right method in scipy.stats. I was looking for the wrong sort of name!
First, build a subclass of rv_continuous to represent one of your distributions. We know the pdf for your distributions, so that's what we define. In this case there's just one parameter. If more are needed just add them to the def statement and use them in the return statement as required.
>>> from scipy import stats
>>> param = 3/2
>>> from math import exp
>>> class NoName(stats.rv_continuous):
... def _pdf(self, x, param):
... return param*exp(-param*x)
...
Now create an instance of this object, declare the lower end of its support (ie, the lowest value that the r.v. can assume), and what the parameters are called.
>>> noname = NoName(a=0, shapes='param')
I don't have an actual sample of values to play with. I'll create a pseudo-random sample.
>>> sample = noname.rvs(size=100, param=param)
Sort it to make it into the so-called 'empirical cdf'.
>>> empirical_cdf = sorted(sample)
The sample has 100 elements, therefore generate 100 points at which to sample the inverse cdf, or quantile function, as discussed in the paper you referenced.
>>> theoretical_points = [(_-0.5)/len(sample) for _ in range(1, 1+len(sample))]
Get the quantile function values at these points.
>>> theoretical_cdf = [noname.ppf(_, param=param) for _ in theoretical_points]
Plot it all.
>>> from matplotlib import pyplot as plt
>>> plt.plot([0,3.5], [0, 3.5], 'b-')
[<matplotlib.lines.Line2D object at 0x000000000921B400>]
>>> plt.scatter(empirical_cdf, theoretical_cdf)
<matplotlib.collections.PathCollection object at 0x000000000921BD30>
>>> plt.show()
Here's the Q-Q plot that results.
Darn it ... Sorry, I was fixated on a slick solution that would somehow bypass the missing inverse CDF and calculate the quantiles directly (and avoid any numerical approaches). But it can also be done by simple brute force.
First you define a fine grid of candidate quantiles for your distributions yourself (for instance ten times finer than the original/empirical quantiles). Then you calculate the corresponding CDF values. Then you compare these values one by one with the ones calculated in step 2 of the question. The candidate quantiles whose CDF values show the smallest deviations are the ones you were looking for.
The precision of this solution is limited by the resolution of the quantiles you defined yourself.
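A rough sketch of that brute force, with an exponential pdf as a stand-in for one of the four fitted densities (all names here are illustrative):

import numpy as np

def pdf(x, lam=2.0):
    # stand-in density; swap in one of the four fitted pdfs
    return lam * np.exp(-lam * x)

n = 50                                              # sample size
probs = [(i - 0.5) / n for i in range(1, n + 1)]    # the evenly spaced points from step 2

x_grid = np.linspace(0.0, 10.0, 5001)               # fine grid over the support
cdf_grid = np.cumsum(pdf(x_grid)) * (x_grid[1] - x_grid[0])
cdf_grid = cdf_grid / cdf_grid[-1]                  # normalise away the discretisation error

# for each probability, take the grid point whose CDF value deviates least
theoretical_quantiles = [x_grid[np.argmin(np.abs(cdf_grid - p))] for p in probs]
print(theoretical_quantiles[:5])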
But maybe I'm wrong and there is a more elegant way to solve this problem, then I would be happy to hear it!
I have a big list of numbers, and I would like to create a distribution out of this data, plot it, then find the p-value for every number in my list with regards to the distribution.
Is it possible to do this in python? I can't find it in the matplotlib documentation. Should I be using something else?
I would suggest to look into the stats module of scipy; it offers numerous statistical functions for things like this. For plotting, I would still use matplotlib.
You can use the searchsorted function from the numpy module, which will give you the order of a set of values in an ordered array. You can then transform it into a p-value just by normalizing by the length of the original array:
import numpy as np

data = sorted(np.random.rand(10))
new_data = np.random.rand(5)

pvals = np.searchsorted(data, new_data) / len(data)
print(pvals)
# one p-value per element of new_data; for the sorted original data itself the
# p-values would be exactly array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
Well, in fact, if you want the p-values of the original numbers you don't need any special function at all: the p-values are just the rank in the sorted dataset divided by its length.
If you need the p-values of new values with respect to your original ones, you can use the snippet above.