Program error and questions regarding max. log-likelihood

Program error and questions regarding max. log-likelihood - python

I'm trying to calculate the maximum log-likelihood (MLE) for the following probability density function (PDF):
I'm computing it by minimising the objective function (negative log-likelihood) without relying on any predefined log-likelihood python built-in modules whatsoever. The code is:
# Alpha Distribution (PDF)
def AD(z, *params):
a, scale = z
diameters = params
return -np.sum(np.log((((diameters)/(a**2) * np.exp(-diameters/a))) / scale))
# load data
currpath = ('path')
os.chdir(currpath)
diameters = scipy.io.loadmat('data.mat')["m1"]
# minimise
x0 = [1,1] # initial guesses
res = optimize.minimize(AD, x0, args = diameters, method='Nelder-Mead',
tol=1e-6)
print(res.x)
My data vector (here already sorted) comprises a number of diameters in the following form (0.19, 0.19, 0.19, 0.2, 0.21, 0.21, 0.22, 0.22, 0.22, 0.25, 0.27 ...).
First question: Since I'm fairly new to the topic of MLE, is the form of my data vector correct? I'm not completely sure whether I use a data vector containing every observed diameter (like shown above), or a data vector which only contains the "possible" diameters (which would be: 0.19, 0.2, 0.21, 0.22, 0.25, 0.27 ...), or just the frequencies of the observed diameters (which would be: 3, 1, 2, 3, 1, 1 ...). I think the first option is the right one, but I just wanted to be completely sure.
Second question: If I wish to use a cumulative distribution function (CDF) instead of a PDF to perform my MLE on, I would have to change my PDF function to a CDF, right? I was just wondering if I could alternatively somehow modify my data vector and still use the PDF.
However, for the minimisation in python (if I understood it correctly) I had to rethink the definition of my variables. That means, normally I would assume that the parameters of my PDF (here "a" and "scale") are the variables which should be passed to "args" in "optimize.minimize". However, in the documentation it is stated, that args should contain the "constant" parameters, therefore I used my data vector as a constant "parameter vector" for the minimisation.
Third question: Is this assumption an error in reasoning?
Fourth question: Is the optimisation method "Nelder-Mead" appropriate? I'm not really familiar with optimisation methods and not sure which of the options I should use/is the best.
Finally, the program returns an error "TypeError: bad operand type for unary -: 'tuple'", where I have no clue how to deal with it, since I'm not passing any tuples to the minimisation function ...
Fifth question: Where does the tuple come from and how can I solve this error?
I'd appreciate any help you could give me very much!
Best regards!
PS: Since this post is kind of a mixture between general math and programming, I wasn't completely sure if this is the right place to put the question. Sorry if I'm mistaken!

First, apart from the first part (before the multiplication operator), we are discussing what is generally called maximum likelihood estimation (MLE) for the exponential distribution. It has just been reparameterised in terms of something called a.
We want to estimate this single parameter based on a sample of diameters; there is no scale parameter. Under MLE, we pretend that the sample is fixed and treat the parameter as something that can be varied. We form the likelihood of the sample by taking the product of the density functions (not the cdfs) where each density function is to be calculated for one element of the sample.
(Likelihood is, in concept, like throwing a die twice. In ultra ugly terms, we could say that the likelihood of getting two ones in a row might be (1/6)(1/6).)
We want to maximise this likelihood. However, to make the optimisation problem mathematically and/or computationally tractable we take the function's logarithm. Since all of its constituent functions are densities, less than one, this function must be everywhere less than zero. Thus, the maximisation problem becomes one of minimisatiion.
If you want to avoid almost all of the algebra then you would:
Write a function to calculate the density function for a given diameter and parameter value.
Write another function that would accept a density function parameter value as its Python parameter, and the sample as its second. Make it call the first function once for each sample value, take the log of each of these and return the sum of these.
Call minimize with the second function as its first argument, some reasonable guess for the density function parameter, in a list, as the second argument, the sample for args. Nelder-Mead is probably ok.
Edit: In a nutshell:
diameters =[ 0.19, 0.19, 0.19, 0.2, 0.21, 0.21, 0.22, 0.22, 0.22, 0.25, 0.27]
from scipy.optimize import minimize
from math import exp, log
def pdf(d, a):
result = d*exp(-d/a)/a**2
return result
def log_L(a, diameters):
result = sum(log(pdf(d, a)) for d in diameters)
return result
res = minimize(log_L, [1], args=diameters)
print (res)
Output:
fun: -337.80985348524604
hess_inv: array([[ 8.71770021e+10]])
jac: array([ -7.62939453e-06])
message: 'Optimization terminated successfully.'
nfev: 93
nit: 30
njev: 31
status: 0
success: True
x: array([ 2157576.39996697])
Addendum:
The wikipedia article offers the following form for the pdf of the exponential.
The constant 'lambda' can be viewed as a value that scales the integral of the remainder of the expression from zero to infinity to one. We can ignore it and equate the exponents of your pdf, without the scaling factor, and the exponential. We have to remember that d takes the role of x.
Solve for 'lambda'.
We see that this is the normalising expression in your pdf. In other words, the alpha is an exponential expressed with different parameters.

Here is another approach, assuming that you're analysing data and not simply working out the details of MLE.
scipy provides means for generating samples from arbitrary distributions. Here I define just the pdf for your alpha. Your parameter a becomes p because a is used as the lower limit for the distribution support, which I define to be zero.
I draw a sample of size 100 with p set somewhat arbitrarily to 0.4. I did a little experimentation, trying to find a value that would give me a sample whose lowest 11 values would approximate those in your sample.
The scipy rv_continuous object has a method called fit that will attempt calculation of MLE estimates of location, scale and 'shape'. In this case, the value for shape, about 0.36, is not all that far from 0.4.
from scipy.stats import rv_continuous
import numpy as np
class Alpha(rv_continuous):
'alpha distribution'
def _pdf(self, x, p):
return x*np.exp(-x/p)/p**2
alpha = Alpha(a=0, shapes='p')
sample = sorted(alpha.rvs(size=100,p=0.4))
for a in sample[:12]:
print ('{:10.2f}'.format(a))
print (Alpha(a=0, shapes='p').fit(sample))
I don't believe that your sample is alpha-distributed. The values seem to be too 'uniform' compared with what I could generate. But I've been wrong before.
I would suggest plotting your sample cdf to see if you can recognise what it is.
Incidentally, when I changed the sign of the log-likelihood in the other answer the code croaked. I suspect that the alpha is just a poor fit.
0.00
0.03
0.04
0.04
0.08
0.09
0.09
0.11
0.12
0.14
0.19
0.20
(1.0902616847853124, -0.039102949269294023, 0.35922022997329517)

Related

Can i tell numpy curve_fit to find the best parameters that meet some conditions?

I have this set of experimental data:
x_data = np.array([0, 2, 5, 10, 15, 30, 60, 120])
y_data = np.array([1.00, 0.71, 0.41, 0.31, 0.29, 0.36, 0.26, 0.35])
t = np.linspace(min(x_data), max(x_data), 151)
scatter plot
I want to fit them with a curve that follows an exponential behaviour for t < t_lim and a linear behaviour for t > t_lim, where t_lim is a value that i can set as i want. I want to use curve_fit to find the best fit. I would like to find the best fit meeting these two conditions:
The end point of the first behaviour (exponential) must be the starting point of the second behaviour (linear): in other words, I don't want the jump discontinuity in the middle.
I would like the second behaviour (linear) to be descending.
I solved in this way:
t_lim = 15
def y(t, k, m, q):
return np.concatenate((np.exp(-k*t)[t<t_lim], (m*t + q)[t>=t_lim]))
popt, pcov = curve_fit(y, x_data, y_data, p0=[0.5, -0.005, 0.005])
y_model = y(t, k_opt, m_opt, q_opt)
I obtain this kind of curve:
chart_plot
I don't know how to tell python to find the best values of m, k, q that meet the two conditions (no jump discontinuity, and m < 0)

Instead of trying to add these conditions as explicit constraints, I'd go about modifying the form of y so that these conditions are always satisfied.
For example, try replacing m with -m**2. That way, the coefficient in the linear part will always be negative.
For the continuity condition, how about this: For an exponential with a given decay factor and a linear curve with a given slope which are supposed to meet at a given t_lim there's only exactly one value for q that will satisfy that condition. You can explicitly compute that value and just plug that in.
Basically, q won't be a fit parameter anymore; instead, inside of y, you'd compute the correct q value based on k, m, t_lim.

This post is not a direct answer to the question. This is a preliminary study.
First : Fitting to a simple exponential function with only a constant (without decreasing or increasing linear part) :
The result is not bad considering the wide scatter on the right part.
Second : Fitting to an exponential function with a linear function (without taking account of the expected decreasing on the right).
The slope of the linear part is very low : 0.000361
But the slope is positive which is not as wanted.
Since the scatter is very large one suspects that the slope of the linear function might be governed mainly by the scatter. In order to check this hypothesis one make the same fitting calculus whitout one point. Taking only the seven first points (that is forgetting the eighth point) the result is :
Now the slope is negative as wanted. But this is an untruthful result.
Of course if some technical reason implies that the slope is necessarily negative one could use a picewise function made of an exponenlial and a linear function. But what is the credibility of such a model ?
This doesn't answer to the question. Neverthelss I hope that this inspection will be of interest.
For information :
The usual nonlinear regression methods are often non convergent in case of large scatter due to the difficulty to set initial values of the parameters sufficienly close to the unknown correct values. In order to avoid the difficulty the above fittings where made with a non usual method which doesn't requires "guessed" initial value. For the principle refer to : https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
In the referenced document the case of the function exponential and linear isn't fully treated. In order to overcome this deficiency the method is shown below with the numerical calculus (MathsCAD).
If more accuracy is needed use a nonlinear regression software with the values of p,a,b,c found above as initial values to start the iterative calculus.

Is there a way to optimize function thresholds with fixed steps?

How can I optimize a function with fixed steps? I have developed a function with five thresholds as entry that I want to optimize. I actually tried to optimize them with different solvers, but the steps that the solver takes are so tiny that the function never converge in a good solution.
Defined thresholds vary from 0 to 1, and I want them to take steps of 0.01. For example, in case of threshold_0, I want It to vary from initial guess 0.6 to 0.61 or 0.59, etc. depending on error result.
from scipy import optimize
initial_guess = [0.6,0.3,0.6,0.5,0.5]
def get_sobel3d_accuracy_from_thresholds(thresholds,array_dicts,ponderation_dict):
...
return error
result = optimize.minimize(
get_sobel3d_accuracy_from_thresholds, # function to optimize
initial_guess,
args=(array_dicts,ponderation_dict), # extra fixed args
method='nelder-mead',
options={'xatol': 1e-8, 'disp': True})
What I want to get is a solution that minimizes de error returned from the function get_sobel3d_accuracy_from_thresholds as follows:
optimized_thresholds = [0.61, 0.3, 0.81, 0.52, 0.44]
I would also like to fix boundaries for thresholds from 0 to 1, but I think that It can be done only with some solvers, right?
bounds = [(0, 1) for n in range(0,5)]
thank you all.

Seeming inconsistency in scipy.integrate.quad operation under change of variable

I have an algorith that seeks to multiply two functions and integrate the result. One is a small normal distribution around 1.0, with a small standard deviation of abouty .1; the other is also a normal, but much larger, with mean 100,000 and stdev 15,000.
The code looks like so:
def integrand(x, z, mu, sigma) :
# goal is to return return f_x(x)*f_y(z/x)*1/abs(x)
# return normal(x, mu, sigma) * normal(z/x, 100000, 15000)/abs(x)
return normal(z/x, mu, sigma) * normal(x, 100000, 15000)/abs(x)
pResult = quad(integrand, -10000000,10000000,
args=(_z, MU, SIGMA),
points = [1.0, 2.0, 100000.0],
epsabs = 1, epsrel = .01)
This is a formula for multiplying random variables, Z = XY. It would seem that I could swap where the z/x goes in the integrand, but in trying to see which way runs faster (by commenting out one of the two return statements in integrand()) I was surprised to get different results (one is 114,221.4 and the other is 116,174.2). Any idea why?
Also, when I cycle through a wide range of z values and plot P(z) vs z, the graphs look different, One looks right (like a normal distribution - using the first return statement) and the other looks like a top-hat with soft corners.
The math for this algorithm: https://en.wikipedia.org/wiki/Product_distribution

Scaling matters, as I noted in the answer to your previous question, and recommended to use points to help quad locate relatively small features.
You rescale the function but keep the same interval of integration. Assuming the function essentially "fits" in the interval (the tails outside of the interval are negligible), the difference is due to the algorithm being able or unable to locate the small feature of the function (the narrow Gaussian) within the interval of integration. The more narrow is the function support, the more likely is quad to get integration wrong.
I also recommend putting accuracy as the top priority; speed comes after that. It's easy to write a function that returns "0" quickly.

How to generate a Q-Q plot manually without inverse distribution function in python

I have 4 different distributions which I've fitted to a sample of observations. Now I want to compare my results and find the best solution. I know there are a lot of different methods to do that, but I'd like to use a quantile-quantile (q-q) plot.
The formulas for my 4 distributions are:
where K0 is the modified Bessel function of the second kind and zeroth order, and Γ is the gamma function.
My sample style looks roughly like this: (0.2, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.6, 0.7 ...), so I have multiple identical values and also gaps in between them.
I've read the instructions on this site and tried to implement them in python. So, like in the link:
1) I sorted my data from the smallest to the largest value.
2) I computed "n" evenly spaced points on the interval (0,1), where "n" is my sample size.
3) And this is the point I can't manage.
As far as I understand, I should now use the values I calculated beforehand (those evenly spaced values), put them in the inverse functions of my above distributions and thus compute the theoretical quantiles of my distributions.
For reference, here are the inverse functions (partly calculated with wolframalpha, and as far it was possible):
where W is the Lambert W-function and everything in brackets afterwards is the argument.
The problem is, apparently there doesn't exist an inverse function for the first distribution. The next one would probably produce complex values (negative under the root, because b = 0.55 according to the fit) and the last two of them have a Lambert W-Function (where I'm unsecure how to implement them in python).
So my question is, is there a way to calculate the q-q plots without the analytical expressions of the inverse distribution functions?
I'd appreciate any help you could give me very much!

A simpler and more conventional way to go about this is to compute the log likelihood for each model and choose that one that has the greatest log likelihood. You don't need the cdf or quantile function for that, only the density function, which you have already.
The log likelihood is just the sum of log p(x|model) where p(x|model) is the probability density of datum x under a given model. Here "model" = model with parameters selected by maximizing the log likelihood over the possible values of the parameters.
You can be more careful about this by integrating the log likelihood over the parameter space, taking into account also any prior probability assigned to each model; that would be a Bayesian approach.
It sounds like you are essentially looking to choose a model by minimizing the Kolmogorov-Smirnov (KS) statistic, which despite it's heavy name, is pretty simple -- it is the difference between the would-be quantile function and the empirical quantile. That's defensible, but I think comparing log likelihoods is more conventional, and also simpler since you need only the pdf.

It happens that there is an easier way. It's taken me a day or two to dig around until I was pointed toward the right method in scipy.stats. I was looking for the wrong sort of name!
First, build a subclass of rv_continuous to represent one of your distributions. We know the pdf for your distributions, so that's what we define. In this case there's just one parameter. If more are needed just add them to the def statement and use them in the return statement as required.
>>> from scipy import stats
>>> param = 3/2
>>> from math import exp
>>> class NoName(stats.rv_continuous):
... def _pdf(self, x, param):
... return param*exp(-param*x)
...
Now create an instance of this object, declare the lower end of its support (ie, the lowest value that the r.v. can assume), and what the parameters are called.
>>> noname = NoName(a=0, shapes='param')
I don't have an actual sample of values to play with. I'll create a pseudo-random sample.
>>> sample = noname.rvs(size=100, param=param)
Sort it to make it into the so-called 'empirical cdf'.
>>> empirical_cdf = sorted(sample)
The sample has 100 elements, therefore generate 100 points at which to sample the inverse cdf, or quantile function, as discussed in the paper your referenced.
>>> theoretical_points = [(_-0.5)/len(sample) for _ in range(1, 1+len(sample))]
Get the quantile function values at these points.
>>> theoretical_cdf = [noname.ppf(_, param=param) for _ in theoretical_points]
Plot it all.
>>> from matplotlib import pyplot as plt
>>> plt.plot([0,3.5], [0, 3.5], 'b-')
[<matplotlib.lines.Line2D object at 0x000000000921B400>]
>>> plt.scatter(empirical_cdf, theoretical_cdf)
<matplotlib.collections.PathCollection object at 0x000000000921BD30>
>>> plt.show()
Here's the Q-Q plot that results.

Darn it ... Sorry, I was fixated on a slick solution to somehow bypass the missing inverse CDF and calculate the quantiles directly (and avoid any numerically approaches). But it can also be done by simple brute force.
At first you have to define the quantiles for your distributions yourself (for instance ten times more accurate than the original/empirical quantiles). Then you need to calculate the corresponding CDF values. Then you have to compare these values one by one with the ones which were calculated in step 2 in the question. The according quantiles of the CDF values with the smallest deviations are the ones you were looking for.
The precision of this solution is limited by the resolution of the quantiles you defined yourself.
But maybe I'm wrong and there is a more elegant way to solve this problem, then I would be happy to hear it!

How to set optimization variable resolution in scipy.optimize.minimize

Background: A ship is berthed to a jetty using 24 mooring lines and 4 fenders. These mooring lines needs to be pre-tensioned to a design value by experienced engineers. Pre-tensioning is done by setting the appropriate length of each mooring line. Static simulation is done to obtain tension on the lines and compression on the fenders. This is an iterative process as small change in mooring line lenght may cause significant variation in the tension.
Problem Description:
An objective function is set up to take mooring line lenghs as an input array and return the sum of absolute differences between target and achieved pretension values.
Now, I am using the scipy.optimize.minimize function with following options:
target_wire_lenghts = {'Line1': (48.0, 49.0),'Line2': (48.0, 49.0),'Line3': (45.0,46.0),
'Line4': (10.0,11.0),'Line5': (8.0,9.0),'Line6': (7.0,8.0),
'Line7': (46.0,47.0),'Line8': (48.0,49.0),'Line9': (50.0,51.0),
'Line10': (33.0,34.0),'Line11': (31.0,32.0),'Line12': (29.0,30.0),
'Line13': (32.0,33.0),'Line14': (34.0,35.0),'Line15': (36.0,37.0),
'Line16': (48.0,49.0),'Line17': (46.0,47.0),'Line18': (45.0,46.0),
'Line19': (8.0,9.0), 'Line20': (8.0,9.0), 'Line21': (9.0,10.0),
'Line22': (44.0,45.0),'Line23': (45.0,46.0), 'Line24': (46.0,48.0)}
# Bounds
bounds = list(target_wire_lenghts.values())
# Initial guess
x0 = [np.mean([min, max], axis=0) for min,max in bounds]
# Options
options = {'ftol' : 0.1,
'xtol' : 0.1,
'gtol' : 0.1,
'maxiter' : 100,
'accuracy' : 0.1}
result = minimize(objfn, x0, method = 'TNC', bounds = bounds, options = options)
print(result)
However, the optimizer is not varying the input array. The results are the same as initial input array x0(See the length column below). I tried playing around with the optional tolerance parameters of the 'TNC' solver, but do not see any improvement. Also, notice that eventhough I have set the maxiter = 100, the iteration went to 130.
Please suggest what mistake am I making while calling the minimize function.
EDIT: I figured the optimization was running, but changing the variables by 0.000001 at a time. The option parameter eps (Step size used for numerical approximation of the jacobian.) when set to 0.01, the optimization looked working. Unfortunately, it still was not able to reach a reasonable solution. I tried doing an unbounded optimization, with initial guess x0 being very close to the answer (which I found by manually altering each variable), and then the optimizer was able to give a better solution than my manual one.
So the question now is how to do a 24 variable optimization quickly with bad initial guess? Could multi objective optimization be the answer, where reaching each line pre-tension is an objective?

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Program error and questions regarding max. log-likelihood - python

Related

Can i tell numpy curve_fit to find the best parameters that meet some conditions?

Is there a way to optimize function thresholds with fixed steps?

Seeming inconsistency in scipy.integrate.quad operation under change of variable

How to generate a Q-Q plot manually without inverse distribution function in python

How to set optimization variable resolution in scipy.optimize.minimize

Categories

Resources