I am using scipy.stats to fit my data.
scipy.stats.invgauss.fit(my_data_array)
scipy.stats.wald.fit(my_data_array)
According to the Wikipedia article http://en.wikipedia.org/wiki/Inverse_Gaussian_distribution, the Wald distribution is just another name for the inverse Gaussian, yet the two functions above give me different fitting parameters: scipy.stats.invgauss.fit returns three parameters while scipy.stats.wald.fit returns two.
What is the difference between these two distributions in scipy.stats?
I tried to find the answer in the docs, http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wald.html and http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.invgauss.html, but got no clue.
The link to the scipy.stats wald distribution has the answer to your question.
wald is a special case of invgauss with mu == 1.
So the following should produce the same answer:
import numpy as np
import scipy.stats as st
my_data = np.random.randn(1000)
wald_params = st.wald.fit(my_data)                 # returns (loc, scale)
invgauss_params = st.invgauss.fit(my_data, f0=1)   # f0=1 fixes the shape parameter mu at 1
wald_params and invgauss_params are the same, except that invgauss_params has a leading 1: that is the shape parameter mu, which is fixed at 1 in the wald distribution (I fixed it explicitly with the argument f0=1).
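As a quick sanity check, you can also verify directly that the wald pdf matches the invgauss pdf with the shape parameter fixed at 1 (a minimal sketch):
import numpy as np
import scipy.stats as st
x = np.linspace(0.1, 5, 50)
# wald.pdf(x) should equal invgauss.pdf(x, mu=1)
print(np.allclose(st.wald.pdf(x), st.invgauss.pdf(x, 1)))  # True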
Hope that helps.
I am working in Python and I have some performance data of some actions:
DailyReturn = [0.325, -0.287, ...]
I've been trying to fit a normal distribution and a Student's t-distribution to the density histogram of that data, to use as a PDF. I would like to get the fitted parameters, the standard errors of the parameters, and the value of the log-likelihood by the method of MLE (maximum likelihood). But I have run into some issues. At the moment I have this idea:
import numpy as np
import math
import scipy.optimize as optimize
import statistics
def llnorm(par, data):
    # negative log-likelihood of a normal sample
    mu, sigma = par
    ll = -np.sum(-0.5*math.log(2*math.pi*sigma**2) - (data - mu)**2/(2*sigma**2))
    return ll
data = np.asarray(DailyReturn)
result = optimize.minimize(llnorm, [statistics.mean(data), statistics.stdev(data)], args=(data,))
But I'm not sure about it, and I'm lost with the Student's t-distribution. Is there an easier way to do it?
In scipy.stats you will find several distributions, among them Student's t and the normal.
These distribution objects have a fit method; the scipy.stats documentation shows an example for the normal distribution.
Your approach is correct for the normal distribution, but there is little point in optimizing numerically in that case, since the MLE solution is essentially given by the sample mean and standard deviation you are passing as starting values.
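For the Student's t, a minimal sketch using scipy.stats.t.fit (assuming DailyReturn holds your data); the log-likelihood can be recovered by summing logpdf at the fitted parameters:
import numpy as np
import scipy.stats as st
data = np.asarray(DailyReturn)
# MLE fit of the t distribution: returns (df, loc, scale)
df, loc, scale = st.t.fit(data)
# log-likelihood at the fitted parameters
loglik = np.sum(st.t.logpdf(data, df, loc=loc, scale=scale))
print(df, loc, scale, loglik)
Note that fit does not report standard errors; those would have to come from, e.g., the inverse Hessian of the negative log-likelihood returned by optimize.minimize.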
Why is the p-value of the KS test between array x and array y less than 0.05? As you can see, they are actually drawn from the same distribution (a standard normal). I cannot find the reason and I'm very confused. Thank you in advance!
import scipy.stats as st
import numpy as np
np.random.seed(12)
x = np.random.normal(0,1,size=1000)
y = np.random.normal(0,1,size=1000)
st.ks_2samp(x,y)
Out[9]: KstestResult(statistic=0.066, pvalue=0.025633868930359294)
This is correct behavior. Remember that a low p-value gives you grounds to reject the null hypothesis, which here says that the two samples came from the same distribution. But when the null hypothesis is true, p-values are uniformly distributed, so about 5% of same-distribution sample pairs will still produce p < 0.05 purely by chance; this seed happens to be one of those cases. And rejecting the null hypothesis is not the same as affirming that the two samples came from different distributions; it just means you cannot conclude that they came from the same one.
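A minimal sketch illustrating this: repeating the experiment over many fresh sample pairs, the rejection rate at the 0.05 level stays near 5% even though every pair comes from the same distribution:
import numpy as np
import scipy.stats as st
rng = np.random.default_rng(0)
n_trials = 500
rejections = 0
for _ in range(n_trials):
    x = rng.normal(0, 1, size=1000)
    y = rng.normal(0, 1, size=1000)
    if st.ks_2samp(x, y).pvalue < 0.05:
        rejections += 1
print(rejections / n_trials)  # should land close to 0.05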
I'm new to Bayesian stats and I'm trying to estimate the posterior for a Poisson likelihood with a gamma prior in Python. The parameter I'm trying to estimate is the lambda parameter of the Poisson distribution. I think the posterior will take the form of a gamma distribution (conjugate prior?), but I don't want to rely on that. The only thing I'm given is the data (named "my_data"). Here's my code:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.stats
x=np.linspace(1,len(my_data),len(my_data))
lambda_estimate=np.mean(my_data)
prior = scipy.stats.gamma.pdf(x, alpha, beta)  # the parameters don't matter for now
likelihood_temp = lambda yi, a: scipy.stats.poisson.pmf(yi, a)
likelihood = lambda y, a: np.log(np.prod([likelihood_temp(data, a) for data in my_data]))
posterior=likelihood(my_data,lambda_estimate) * prior
When I try to plot the posterior I get an empty plot. I plotted the prior and it looks fine, so I think the issue is the likelihood. I took the log because the data is fairly large and I didn't want things to get unstable. Can anyone point out the issues in my code? Any help would be appreciated.
In Bayesian statistics, one goal is to calculate the posterior distribution of the parameter (lambda) given the data and the prior, over a range of possible values for lambda. In your code, you are calculating the prior over the array x, but you are taking a single value for lambda to calculate the likelihood. The posterior and likelihood should be computed over x as well, something like:
posterior = [likelihood(my_data, lambda_i) for lambda_i in x] * prior
(assuming you are not taking the logs of the prior and likelihood)
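A minimal sketch of that grid approach (assuming my_data holds the count data from the question, and using illustrative values for alpha and beta), working in log space to avoid underflow and normalizing at the end:
import numpy as np
import scipy.stats
alpha, beta = 2.0, 0.5                 # illustrative prior hyperparameters
lambdas = np.linspace(0.01, 20, 500)   # grid of candidate lambda values
log_prior = scipy.stats.gamma.logpdf(lambdas, alpha, scale=1/beta)
log_lik = np.array([np.sum(scipy.stats.poisson.logpmf(my_data, lam)) for lam in lambdas])
log_post = log_prior + log_lik
log_post -= log_post.max()             # stabilize before exponentiating
posterior = np.exp(log_post)
posterior /= np.trapz(posterior, lambdas)  # normalize so it integrates to 1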
You might want to take a look at the PyMC3 library.
I would recommend having a look at the conjugate_prior module.
You could just type:
from conjugate_prior import GammaPoisson
model = GammaPoisson(prior_a, prior_b)
model = model.update(...)
credible_interval = model.posterior(lower_bound, upper_bound)
I'm having trouble finding quantile functions for well-known probability distributions in Python, do they exist? In particular, is there an inverse normal distribution function? I couldn't find anything in either Numpy or Scipy.
Check the .ppf() method of any distribution class in scipy.stats.
This is the equivalent of a quantile function (also known as the percent point function or inverse CDF).
An example with the exponential distribution from scipy.stats:
# analysis libs
import scipy.stats
import numpy as np
# plotting libs
import matplotlib as mpl
import matplotlib.pyplot as plt
# Example with the exponential distribution
c = 0
lamb = 2
# Create a frozen exponential distribution instance with specified parameters
exp_obj = scipy.stats.expon(loc=c, scale=1/lamb)
x_in = np.linspace(0, 0.99, 200)  # probabilities in [0, 1); ppf(1) is infinite for the exponential
y_out = exp_obj.ppf(x_in)
plt.plot(x_in,y_out) # graphically check the results of the inverse CDF
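For the inverse normal specifically, a minimal example:
import scipy.stats
# 97.5th percentile of the standard normal, approximately 1.96
print(scipy.stats.norm.ppf(0.975))
# the same quantile for a normal with mean 10 and standard deviation 2
print(scipy.stats.norm.ppf(0.975, loc=10, scale=2))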
It seems newer, but NumPy also provides np.quantile for computing empirical quantiles of an array. Maybe you can have a look (not tested).
random.gauss(mu, sigma)
Above is a function allowing you to randomly draw a number from a normal distribution with a given mean and standard deviation. But how can we draw values from a normal distribution defined by more than just the first two moments?
something like:
random.gauss(mu, sigma, skew, kurtosis)
How about using scipy? You can pick the distribution you want from continuous distributions in the scipy.stats library.
The generalized gamma distribution has non-zero skew and kurtosis, but you'll have a little work to do to figure out what parameters to use to specify the distribution so as to get a particular mean, variance, skew, and kurtosis. Here's some code to get you started:
import scipy.stats
import matplotlib.pyplot as plt
distribution = scipy.stats.norm(loc=100,scale=5)
sample = distribution.rvs(size=10000)
plt.hist(sample)
plt.show()
print(distribution.stats('mvsk'))
This displays a histogram of a 10,000 element sample from a normal distribution with mean 100 and variance 25, and prints the distribution's statistics:
(array(100.0), array(25.0), array(0.0), array(0.0))
Replacing the normal distribution with the generalized gamma distribution,
distribution = scipy.stats.gengamma(100, 70, loc=50, scale=10)
you get the statistics [mean, variance, skew, kurtosis]
(array(60.67925117494595), array(0.00023388203873597746), array(-0.09588807605341435), array(-0.028177799805207737)).
Try to use this:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.sandbox.distributions.extras.pdf_mvsk.html#statsmodels.sandbox.distributions.extras.pdf_mvsk
Return the Gaussian expanded pdf function given the first moment, second central moment, skew, and Fisher (excess) kurtosis.
Parameters: mvsk : list of mu, mc2, skew, kurt
Looks good to me. There's a link to the source on that page.
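A minimal usage sketch with illustrative moment values (pdf_mvsk builds the pdf from a Gram-Charlier expansion, so it can dip slightly below zero in the tails for extreme moments):
import numpy as np
from statsmodels.sandbox.distributions.extras import pdf_mvsk
# mu=0, variance=1, skew=0.5, excess kurtosis=1 (illustrative values)
pdffunc = pdf_mvsk([0.0, 1.0, 0.5, 1.0])
x = np.linspace(-4, 4, 200)
y = pdffunc(x)  # evaluate the expanded pdf on a grid
print(y[:5])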
Oh, and here's the other StackOverflow question that pointed me there:
Apply kurtosis to a distribution in python