Burr distribution in scipy.stats - python

I am trying to calculate the mean and std for burr distribution, but I am not quite sure how to input this. The pdf I am using is: f(x) = (alpha*gamma*lambda**alpha*x**(gamma-1))/(lambda+x**gamma)**(alpha+1) from the IFoA Formulae.
I have calculated the parameters to be: alpha = 2.3361635751273977, lambda = 10.596809948869414 and gamma = 0.5 in order to get mean = 500 and std = 600.
Could someone suggest how I should input the data into scipy.stats.burr or scipy.stats.burr12?

You need burr12 here, not burr. (The difference is in the sign of the power of x that sits inside another power. Confusingly, it's burr12 that is usually called simply Burr outside of SciPy, not the thing that SciPy calls burr.)
The Burr XII PDF is written in SciPy as c*d*x**(c-1)*(1+x**c)**(-d-1) where c, d are positive shape parameters. Your formula
(alpha*gamma*lamda**alpha*x**(gamma-1)) / (lamda+x**gamma)**(alpha+1)
has lambda in place of 1, so there is some scaling involved. SciPy docs say
burr12.pdf(x, c, d, loc, scale) is identically equivalent to burr12.pdf(y, c, d) / scale with y = (x - loc) / scale.
So, in order for lamda+x**gamma to be a constant multiple of 1 + (x/scale)**gamma, we need scale to be lamda**(1/gamma). The exponents correspond to SciPy notation as c = gamma and d = alpha. Let's test this:
from scipy.stats import burr12
alpha = 2.3361635751273977
lamda = 10.596809948869414
gamma = 0.5
scale = lamda**(1/gamma)
c = gamma
d = alpha
print(burr12.mean(c, d, loc=0, scale=scale))
print(burr12.std(c, d, loc=0, scale=scale))
which prints
500.0
600.0

Related

How to estimate standard deviation in images with speckle noise?

I am using this function that I found on the web, to add speckle noise to images for research purposes:
def add_speckle(k,theta,img):
gauss = np.random.gamma(k, theta, img.size)
gauss = gauss.reshape(img.shape[0], img.shape[1], img.shape[2]).astype('uint8')
noise = img + img * gauss
return noise
My issue is that I want to estimate/define the speckle noise I add as a standard deviation(sigma) parameter, and this function that I found depends on the gamma distribution or random.gamma() which it depends on the k,theta(shape,scale) parameters, which you can see in the gamma pdf equation down below:
according to my knowledge, the variance can be calculated in gamma distribution as follows:
so standard deviation or sigma is equivalent to:
I want to add speckle noise as sigma dependent, so am saying there should be a way to estimate that sigma from k,theta(shape,scale) that we make the input with, so the speckle_adding() function would look like something like this:
def add_speckle(sigma,img):
edited : for the answer in the comments :
def add_speckle(sigma,mean,img):
theta = sigma ** 2 / mean
k = mean / theta
gauss = np.random.gamma(k,theta,img.size)
gauss = gauss.reshape(img.shape[0],img.shape[1],img.shape[2]).astype('uint8')
noise = img + img * gauss
print("k=",k)
print("theta=",theta)
return noise
img = cv2.imread('/content/Hand.jpeg')
cv2.imwrite('speckle12.jpg',add_speckle(8,3,img))
thanks sir for your help, but i really understand why k,theta values changes each time i change values of mean while sigma is constant, i think it must not changes??
As you have noticed that sigma = k ** 0.5 * theta, there are infinite possibilities for parameters in the gamma distribution if only sigma is given (eg. if sigma is 1, (k, theta) can be (1,1) or (4, 0.5) and so on).
If you really want to generate the speckle with statistical inferences as input, I suggest you to add mean as the second input so that the required (k, theta) can be calculated.
The first moment (ie. mean) of a gamma distribution is simply k * theta.
Example:
def add_speckle(mean, sigma, img):
# find theta
theta = sigma ** 2 / mean
k = mean / theta
# your code proceeds...

Symbolic Calculus and Integration in Python

I am trying to numerically compute a double integral.
The issue is that (I think) I need a mix of symbolic integration and numerical integration.
The integral looks something like this:
I cannot use numpy.integrate because it is not just a double integral because of the power (1/a) in the middle.
I cannot get a number for the innermost integral (to then raise to the power) because it ends up being a function that depends on x which I would then need to integrate.
I tried with symbolic calculus, using a nested sym.integrate like here
sym.integrate((sym.integrate(sym.exp(-(w**2)/(2*sigmaw)-alpha*((x-w)**2)/(2*sigma)),(w,-sym.oo, sym.oo)))**(1/alpha),(x,-sym.oo, sym.oo))
however, it just spits back the expression itself and no number.
I think I would need to get a symbolic expression for the inner integral to use as a function for numerical integration.
Is it even possible?
If not in python, with another language like R?
Any experience with things of this sort?
I worked with Maxima (https://maxima.sourceforge.io) since OP seems to be saying the exact system used isn't too important.
The integrand is just a product of Gaussian bumps, so its integral over the real line is not too hard. Maxima doesn't have the strongest integrator in the world, but anyway it seems to handle this problem okay.
Start by assuming all the parameters are positive; if not specified, Maxima will ask for the sign during the calculation.
(%i2) assume (alpha > 0, sigmaw > 0, sigma > 0);
(%o2) [alpha > 0, sigmaw > 0, sigma > 0]
Define the inner integrand.
(%i3) I: exp(-(w**2)/(2*sigmaw)-alpha*((x-w)**2)/(2*sigma));
2 2
alpha (x - w) w
(- --------------) - --------
2 sigma 2 sigmaw
(%o3) %e
Compute the inner integral.
(%i4) I1: integrate (I, w, minf, inf);
(%o4) (sqrt(2) sqrt(%pi) sqrt(sigma) sqrt(sigmaw)
2
alpha x
- ------------------------
2 alpha sigmaw + 2 sigma
%e )/sqrt(alpha sigmaw + sigma)
The pretty-printer (ASCII art) display is hard to read here, maybe this 1-d representation makes more sense. grind produces the 1-d display.
(%i5) grind(%);
(sqrt(2)*sqrt(%pi)*sqrt(sigma)*sqrt(sigmaw)
*%e^-((alpha*x^2)/(2*alpha*sigmaw+2*sigma)))
/sqrt(alpha*sigmaw+sigma)$
(%o5) done
Define the outer integrand.
(%i7) I2: I1^(1/alpha);
1 1 1 1
------- ------- ------- -------
2 alpha 2 alpha 2 alpha 2 alpha
(%o7) (2 %pi sigma sigmaw
2
x 1
- ------------------------ -------
2 alpha sigmaw + 2 sigma 2 alpha
%e )/(alpha sigmaw + sigma)
Compute the outer integral. The final result is named foo here.
(%i9) foo: integrate (I2, x, minf, inf);
1 1 1 1
------- + 1/2 ------- ------- -------
2 alpha 2 alpha 2 alpha 2 alpha
(%o9) (%pi 2 sigma sigmaw
1
-------
2 alpha
sqrt(2 alpha sigmaw + 2 sigma))/(alpha sigmaw + sigma)
Evaluate the outer integral for specific values of the parameters.
(%i10) ev (foo, alpha = 3, sigma = 3/7, sigmaw = 7/4);
1/6 1/6 1/6 1/3 2/3
2 3 7 159 %pi
(%o10) ----------------------------
sqrt(14)
(%i11) float(%);
(%o11) 5.790416728790489
Compute a numerical approximation. Note quad_qagi is suitable for infinite intervals.
(%i12) ev (quad_qagi (lambda([x], quad_qagi (I, w, minf, inf)[1]^(1/alpha)), x, minf, inf),
alpha = 3, sigma = 3/7, sigmaw = 7/4);
(%o12) [5.790416728790598, 7.216782674725913E-9, 270, 0]
Looks like that supports the symbolic result.
(%i13) first(%) - %o11;
(%o13) 1.092459456231154E-13
The outer integral again, in 1-d display which might be useful for copying into another program:
(%i14) grind(foo);
(%pi^(1/(2*alpha)+1/2)*2^(1/(2*alpha))*sigma^(1/(2*alpha))
*sigmaw^(1/(2*alpha))
*sqrt(2*alpha*sigmaw+2*sigma))
/(alpha*sigmaw+sigma)^(1/(2*alpha))$
(%o14) done
I recommend pretty strongly to try to get to a symbolic result if possible; numerical integration is often tricky. In the example given, if it turned out that you could only do the inner integral but not the outer one, that would still be a pretty big win. You could plug the symbolic solution for the inner integral into a numerical approximation for the outer one.
this doesn't answer your question but it will surely help you, as other have already pointed out other useful tools.
for the integration at hand, you don't really need to do symbolic integration.
numerical integration is simply summing on a defined finite grid, and integrating over w is simply summing over the w axis, same as x.
the main problem is how to choose the integration grid, since it cannot be infinite, for gaussians I'd say at least 10 times their sigma for as low error as you can get, as for the grid spacing, I'd make it as small as you can wait for it to run.
so for the above integration, this would be equivalent, make sure you don't increase the grid steps until you have a picture of how much memory it will need, or else your pc will hang.
import numpy as np
# define constants
sigmaw = 0.1
sigma = 0.1
alpha = 0.2
# define grid
max_w = 2
min_w = -max_w
min_x = -3
max_x = -min_x
steps_w = 2000 # don't increase this too much or you'll run out of memory
steps_x = 1000 # don't increase this too much or you'll run out of memory
dw = (max_w - min_w) / steps_w
dx = (max_x - min_x) / steps_x
x_vec = np.linspace(min_x, max_x, steps_x)
w_vec = np.linspace(min_w, max_w, steps_w)
x, w = np.meshgrid(x_vec, w_vec, sparse=True)
# do integration
inner_term = np.exp(-(w ** 2) / (2 * sigmaw) - alpha * ((x - w) ** 2) / (2 * sigma))
inner_integral = np.sum(inner_term, axis=0) * dw
del inner_term # to free some memory
inner_integral_powered = inner_integral ** (1 / alpha)
del inner_integral # to free some memory
outer_integral = np.sum(inner_integral_powered) * dx
print(outer_integral)
Numerical integration works by sampling the integrand at some values of the argument. In particular, the Newton-Cotes formulas sample uniformly, while different flavors of Gaussian integration sample irregularly.
So in your case, the integrator will require an evaluation of the inner integral for various values of x to integrate on x, implying each time a numerical integration on w with known x.
Note that as your domain is unbounded, you will have to use a change of variable to make it finite.
If the inner integral has an analytical expression, you can of course use it and integrate numerically on x.

Difference in result between fmin and fminsearch in Matlab and Python

My objective is to perform an Inverse Laplace Transform on some decay data (NMR T2 decay via CPMG). For that, we were provided with the CONTIN algorithm. This algorithm was adapted to Matlab by Iari-Gabriel Marino, and it works very well. I want to adapt this code into Python. The core of the problem is with scipy.optimize.fmin, which is not minimizing the mean square deviation (MSD) in any way similar to Matlab's fminsearch. The latter results in a good minimization, while the former doesn't.
I have gone through line by line of my adapted code in Python, and the original Matlab. I checked every matrix and every output. I used this to identify that the critical point is in fmin. I also tried scipy.optimize.minimize and other minimization algorithms, but none gave even remotely satisfactory results.
I have made two MWE, for Python and Matlab, to make it reproducible to all. The example data were obtained from the documentation of the matlab function. Apologies if this is long code, but I don't really know how to shorten it without sacrificing readability and clarity. I tried to have the lines match as closely as possible. I am using Python 3.7.3, scipy v1.3.0, numpy 1.16.2, Matlab R2018b, on Windows 8.1. It's a relatively recent Anaconda install (<2 months).
My code:
import numpy as np
from scipy.optimize import fmin
import matplotlib.pyplot as plt
def msd(g, y, A, alpha, R, w, constraints):
""" msd: mean square deviation. This is the function to be minimized by fmin"""
if 'zero_at_extremes' in constraints:
g[0] = 0
g[-1] = 0
if 'g>0' in constraints:
g = np.abs(g)
r = np.diff(g, axis=0, n=2)
yfit = A # g
# Sum of weighted square residuals
VAR = np.sum(w * (y - yfit) ** 2)
# Regularizor
REG = alpha ** 2 * np.sum((r - R # g) ** 2)
# output to be minimized
return VAR + REG
# Objective: match this distribution
g0 = np.array([0, 0, 10.1625, 25.1974, 21.8711, 1.6377, 7.3895, 8.736, 1.4256, 0, 0]).reshape((-1, 1))
s0 = np.logspace(-3, 6, len(g0)).reshape((-1, 1))
t = np.linspace(0.01, 500, 100).reshape((-1, 1))
sM, tM = np.meshgrid(s0, t)
A = np.exp(-tM / sM)
np.random.seed(1)
# Creates data from the initial distribution with some random noise.
data = (A # g0) + 0.07 * np.random.rand(t.size).reshape((-1, 1))
# Parameters and function start
alpha = 1E-2 # regularization parameter
s = np.logspace(-3, 6, 20).reshape((-1, 1)) # x of the ILT
g0 = np.ones(s.size).reshape((-1, 1)) # guess of y of ILT
y = data # noisy data
options = {'maxiter':1e8, 'maxfun':1e8} # for the fmin function
constraints=['g>0', 'zero_at_extremes'] # constraints for the MSD function
R=np.zeros((len(g0) - 2, len(g0)), order='F') # Regularizor
w=np.ones(y.reshape(-1, 1).size).reshape((-1, 1)) # Weights
sM, tM = np.meshgrid(s, t, indexing='xy')
A = np.exp(-tM/sM)
g0 = g0 * y.sum() / (A # g0).sum() # Makes a "better guess" for the distribution, according to algorithm
print('msd of input data:\n', msd(g0, y, A, alpha, R, w, constraints))
for i in range(5): # Just for testing. If this is extremely high, ~1000, it's still bad.
g = fmin(func=msd,
x0 = g0,
args=(y, A, alpha, R, w, constraints),
**options,
disp=True)[:, np.newaxis]
msdfit = msd(g, y, A, alpha, R, w, constraints)
if 'zero_at_extremes' in constraints:
g[0] = 0
g[-1] = 0
if 'g>0' in constraints:
g = np.abs(g)
g0 = g
print('New guess', g)
print('Final msd of g', msdfit)
# Visualize the fit
plt.plot(s, g, label='Initial approximation')
plt.plot(np.logspace(-3, 6, 11), np.array([0, 0, 10.1625, 25.1974, 21.8711, 1.6377, 7.3895, 8.736, 1.4256, 0, 0]), label='Distribution to match')
plt.xscale('log')
plt.legend()
plt.show()
Matlab:
% Objective: match this distribution
g0 = [0 0 10.1625 25.1974 21.8711 1.6377 7.3895 8.736 1.4256 0 0]';
s0 = logspace(-3,6,length(g0))';
t = linspace(0.01,500,100)';
[sM,tM] = meshgrid(s0,t);
A = exp(-tM./sM);
rng(1);
% Creates data from the initial distribution with some random noise.
data = A*g0 + 0.07*rand(size(t));
% Parameters and function start
alpha = 1e-2; % regularization parameter
s = logspace(-3,6,20)'; % x of the ILT
g0 = ones(size(s)); % initial guess of y of ILT
y = data; % noisy data
options = optimset('MaxFunEvals',1e8,'MaxIter',1e8); % constraints for fminsearch
constraints = {'g>0','zero_at_the_extremes'}; % constraints for MSD
R = zeros(length(g0)-2,length(g0));
w = ones(size(y(:)));
[sM,tM] = meshgrid(s,t);
A = exp(-tM./sM);
g0 = g0*sum(y)/sum(A*g0); % Makes a "better guess" for the distribution
disp('msd of input data:')
disp(msd(g0, y, A, alpha, R, w, constraints))
for k = 1:5
[g,msdfit] = fminsearch(#msd,g0,options,y,A,alpha,R,w,constraints);
if ismember('zero_at_the_extremes',constraints)
g(1) = 0;
g(end) = 0;
end
if ismember('g>0',constraints)
g = abs(g);
end
g0 = g;
end
disp('New guess')
disp(g)
disp('Final msd of g')
disp(msdfit)
% Visualize the fit
semilogx(s, g)
hold on
semilogx(logspace(-3,6,11), [0 0 10.1625 25.1974 21.8711 1.6377 7.3895 8.736 1.4256 0 0])
legend('First approximation', 'Distribution to match')
hold off
function out = msd(g,y,A,alpha,R,w,constraints)
% msd: The mean square deviation; this is the function
% that has to be minimized by fminsearch
% Constraints and any 'a priori' knowledge
if ismember('zero_at_the_extremes',constraints)
g(1) = 0;
g(end) = 0;
end
if ismember('g>0',constraints)
g = abs(g); % must be g(i)>=0 for each i
end
r = diff(diff(g(1:end))); % second derivative of g
yfit = A*g;
% Sum of weighted square residuals
VAR = sum(w.*(y-yfit).^2);
% Regularizor
REG = alpha^2 * sum((r-R*g).^2);
% Output to be minimized
out = VAR+REG;
end
Here is the optimization in Python
Here is the optimization in Matlab
I have checked the output of MSD of g0 before starting, and both give the value of 2651. After minimization, Python goes up, to 4547, and Matlab goes down to 0.1381.
I think the problem is one of the following. It's in my implementation, that is, I am using fmin wrong, or there's some other passage I got wrong, but I can't figure out what. The fact the MSD increases when it should have decreased with a minimization function is damning. Reading the documentation, the scipy implementation is different from Matlab's (they use the Nelder Mead method described in Lagarias, per their documentation), while scipy uses the original Nelder Mead). Maybe that affects significantly? Or perhaps my initial guess is too bad for scipy's algorithm?
So, quite a long time since I posted this, but I wanted to share what I ended up learning and doing.
The Inverse Laplace Transform for CPMG data is a bit of a misnomer, and it's more properly called just inversion. The general problem is solving a Fredholm integral of the first kind. One way of doing this is the Tikhonov regularization method. Turns out, you can describe this problem quite easily using numpy, and solve it with a scipy package, so I don't have to "reinvent" the wheel with this.
I used the solution shown in this post, and the names here reflect that solution.
def tikhonov_regularized_inversion(
kernel: np.ndarray, alpha: float, data: np.ndarray
) -> np.ndarray:
data = data.reshape(-1, 1)
I = alpha * np.eye(*kernel.shape)
C = np.concatenate([kernel, I], axis=0)
d = np.concatenate([data, np.zeros_like(data)])
x, _ = nnls(C, d.flatten())
Here, kernel is a matrix containing all the possible exponential decay curves, and my solution judges the contribution of each decay curve in the data I received. First, I stack my data as a column, then pad it with zeros, creating the vector d. I then stack my kernel on top of a diagonal matrix containing the regularization parameter alpha along the diagonal, of the same size as the kernel. Last, I call the convenient nnls, a non negative least square solver in scipy.optimize. This is because there's no reason to have a negative contribution, only no contribution.
This solved my problem, it's quick and convenient.

Illogical parameters returned by scipy.curve_fit

I'm modelling a ball falling through fluid in Python and fitting the model function to a set of data points using the damping coefficients (a and b) and the density of the fluid, but the fitted value for the fluid density is coming back negative and I have no idea what is wrong in the code. My code is below:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from scipy.optimize import curve_fit
##%%Parameters and constants
m = 0.1 #mass of object in kg
g = 9.81 #acceleration due to gravity in m/s**2
rho = 700 #density of object in kg/m**3
v0 = 0 #velocity at t=0
y0 = 0 #position at t=0
V = m / rho #volume in cubic meters
r = ((3/4)*(V/np.pi))**(1/3) #radius of the sphere
asample = 0.0001 #sample value for a
bsample = 0.0001 #sample value for b
#%%Defining integrating function
##fuction => y'' = g*(1-(rhof/rho))-((a/m)y'+(b/m)y'**2)
## y' = v
## v' = g*(1-rhof/rho)-((a/m)v+(b/m)v**2)
def sinkingball(Y, time, a, b, rhof):
return [Y[1],(1/m)*(V*g*(rho-rhof)-a*Y[1]-b*(Y[1]**2))]
def balldepth(time, a, b, rhof):
solutions = odeint(sinkingball, [y0,v0], time, args=(a, b, rhof))
return solutions[:,0]
time = np.linspace(0,15,151)
# imported some experimental values and named the array data
a, b, rhof = curve_fit(balldepth, time, data, p0=(asample, bsample, 100))[0]
print(a,b,rhof)
Providing the output you actually get would be helpful, and the comment about time not being used by sinkingball() is worth following.
You might find lmfit (https://lmfit.github.io/lmfit-py) useful. This provides a higher-level interface to curve-fitting that allows, among other things, placing bounds on parameters so that they can remain physically sensible. I think your problem would translate from curve_fit to lmfit as:
from lmfit import Model
def balldepth(time, a, b, rhof):
solutions = odeint(sinkingball, [y0,v0], time, args=(a, b, rhof))
return solutions[:,0]
# create a model based on the model function "balldepth"
ballmodel = Model(balldepth)
# create parameters, which will be named using the names of the
# function arguments, and provide initial values
params = bollmodel.make_params(a=0.001, b=0.001, rhof=100)
# you wanted rhof to **not** vary in the fit:
params['rhof'].vary = False
# set min/max values on `a` and `b`:
params['a'].min = 0
params['b'].min = 0
# run the fit
result = ballmodel.fit(data, params, time=time)
# print out full report of results
print(result.fit_report())
# get / print out best-fit parameters:
for parname, param in result.params.items():
print("%s = %f +/- %f" % (parname, param.value, param.stderr))

passing a float to a function with a bounded domain

I am trying to calculate the angle between some vectors in python2.7.
I am using the following identity to find the angle.
theta = acos(v . w / |v||w|)
For a particular instance my code is:
v = numpy.array([1.0, 1.0, 1.0])
w = numpy.array([1.0, 1.0, 1.0])
a = numpy.dot(v, w) / (numpy.linalg.norm(v) * numpy.linalg.norm(w))
theta = math.acos(a)
When I run this I get the error ValueError: math domain error
I assume this is because acos is only defined on the domain [-1,1] and my value 'a' is a float which is very close to 1 but actually a little bit bigger. I can confirm this with print Decimal(a) and I get 1.0000000000000002220446...
What is the best way to work around this problem?
All I can think of is to check for any values of 'a' being bigger than 1 (or less than -1) and round them to precisely 1. This seems like a tacky work around. Is there a neater / more conventional way to solve this problem?
numpy.clip:
was used in Angles between two n-dimensional vectors in Python
numpy.nan_to_num: also looks like a good patch if you re-arrange the math
and could be used with your code modified with my theta = atan2(b,a) formulation to avoid 1 + eps trouble with acos (which was my my 1st pass with: b = np.nan_to_num(np.sqrt(1 - a ** 2)))
But I have issues with the near universal use of the dot product alone with acos for the angle between vectors problem, particularly in 2 and 3 D where we have a np.cross product
I prefer forming the cross product b "sine" term, passing both the unnormalized a "cosine" term and my b to atan2:
import numpy as np
v = np.array([1.0, 1.0, 1.0])
w = np.array([1.0, 1.0, 1.0])
a = np.dot(v, w)
c = np.cross(v, w)
b = np.sqrt(np.dot(c,c))
theta = np.arctan2(b,a)
the atan2(b, a) formulation won't throw an exception with 1 + eps float errors from linalg.norm floating point tolerance if you use the normed args - and atan2 doesn't need normed args anyway
I believe it is more numerically robust with the cross product b term and atan2 giving better accuracy overall than just using the information in the a dot product "cosine" term with acos
edit: (a bit of a Math explaination, not the same a, b, c as in the code above, somewhat mixed up vector math typed in text since MathJax doesn't seem to be enabled on this forum )
a * b = |a||b| cos(w)
c = a x b = |a||b| sin(w) c_unit_vector
sqrt(c * c) = |a||b| sin(w) since c_unit_vector * c_unit_vector = 1
so we end up with atan( |a||b| sin(w), |a||b| cos(w) ) and the |a||b| cancel out in the ratio calculation internal to atan

Categories

Resources