My question might come across as simple, but I have not been able to work out a solution. Here it is: I want to write an exponential power distribution function, which is available in scipy, but without using scipy. How do I go about it?
Here are my efforts so far:
import math
import numpy as np
def ExpPowerFun(x, b, size=1000):
    distribution = b * x**(b - 1) * math.exp(1 + x**b - math.exp(x**b))
    return distribution
I used this equation based on the scipy docs. To be fair, writing a function from this equation doesn't do much on its own: as you can see, it returns only one value. I want to generate a distribution of random numbers based on scipy's exponential power distribution, without using scipy.
I have looked at the exponpow_gen class in the GitHub code. However, it relies on scipy.special (imported there as sc), so it's of no use to me unless there is a workaround that avoids scipy.
I can't figure out how to go about it. Again, this might be a simple task, but I am stuck. Please help.
The simplest way to generate random numbers from a given distribution is to use the inverse of its CDF. The PPF (percent point function) will give you the distribution you need when you apply it to uniformly distributed numbers.
For your case, the PPF (taken directly from the scipy source code, with some modifications) is:
np.power(np.log(1-np.log(1-x)), 1.0/b)
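(For reference, this comes from inverting the exponential power CDF, F(x) = 1 - exp(1 - exp(x^b)): setting u = F(x) and solving for x gives x = (log(1 - log(1 - u)))^(1/b), which is exactly the expression above applied to uniform samples u.)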
Hence your code should look like this:
def ExpPowerFun(b, size=1000):
    x = np.random.rand(size)
    return np.power(np.log(1 - np.log(1 - x)), 1.0/b)
import matplotlib.pyplot as plt
plt.hist(ExpPowerFun(2.7,10000),20)
plt.show()
Edit: the uniform distribution has to be on [0, 1], of course, since the probabilities range from 0% to 100%.
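As a quick sanity check (just a sketch; the PDF expression is the one from the question and b = 2.7 is the example value used above), you can overlay the analytic PDF on a normalized histogram of the samples:

import numpy as np
import matplotlib.pyplot as plt

def exp_power_pdf(x, b):
    # PDF from the question: b * x**(b-1) * exp(1 + x**b - exp(x**b))
    return b * x**(b - 1) * np.exp(1 + x**b - np.exp(x**b))

def exp_power_rvs(b, size=1000):
    # inverse-CDF sampling as in the answer above
    u = np.random.rand(size)
    return np.power(np.log(1 - np.log(1 - u)), 1.0 / b)

b = 2.7
samples = exp_power_rvs(b, 100000)
xs = np.linspace(0.01, samples.max(), 400)
plt.hist(samples, bins=50, density=True, label="samples")
plt.plot(xs, exp_power_pdf(xs, b), label="analytic PDF")
plt.legend()
plt.show()

The histogram and the analytic curve should lie on top of each other if the sampler is correct.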
Related
As the title states, I am trying to generate random numbers from a custom continuous probability density function, which is:
0.001257 * x^4 * e^(-0.285714 * x)
To do so, I use (on Python 3) scipy.stats.rv_continuous and then rvs() to generate them:
from decimal import Decimal
from scipy import stats
import numpy as np
class my_distribution(stats.rv_continuous):
    def _pdf(self, x):
        return (Decimal(0.001257) * Decimal(x)**(4) * Decimal(np.exp(-0.285714 * x)))
distribution = my_distribution()
distribution.rvs()
Note that I used Decimal to get rid of an OverflowError: (34, 'Result too large').
Still, I get the error RuntimeError: Failed to converge after 100 iterations.
What's going on there? What's the proper way to achieve what I need to do?
I've found out the reason for your issue.
rvs by default uses numerical integration, which is a slow process and can fail in some cases. Your PDF is presumably one of those cases: it grows without bound as x goes to negative infinity.
For this reason, you should specify the distribution's support as follows (the following example shows that the support is in the interval [-4, 4]):
distribution = my_distribution(a=-4, b=4)
With this interval, the PDF will be bounded from above, allowing the integration (and thus the random variate generation) to work as normal. Note that by default, rv_continuous assumes the distribution is supported on the entire real line.
However, this will only work for the particular PDF you give here, not necessarily for arbitrary PDFs.
Usually, when you only give a PDF to your rv_continuous subclass, the subclass's rvs, mean, etc., will be very slow, because these methods need to integrate the PDF every time a random variate is generated or a statistic is calculated. For example, random variate generation requires numerically integrating the PDF, and this process can fail to converge depending on the PDF.
In future cases when you're dealing with arbitrary distributions, and particularly when speed is at a premium, you will thus need to add an _rvs method that uses its own sampler. One example is the much simpler rejection sampler given in the answer to a related question, sketched below.
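A minimal standalone sketch of such a rejection sampler for the PDF in this question (the truncation to [0, 60] and the grid-scanned envelope constant are assumptions made here; rejection sampling does not require the PDF to be normalized):

import numpy as np

def pdf(x):
    # the (possibly unnormalized) PDF from the question
    return 0.001257 * x**4 * np.exp(-0.285714 * x)

def rejection_sample(size, lo=0.0, hi=60.0, rng=None):
    # lo/hi truncate the support; the envelope is found by scanning a grid
    rng = np.random.default_rng() if rng is None else rng
    envelope = pdf(np.linspace(lo, hi, 10001)).max() * 1.05
    out = []
    while len(out) < size:
        x = rng.uniform(lo, hi, size)
        u = rng.uniform(0.0, envelope, size)
        out.extend(x[u < pdf(x)])   # keep points under the PDF curve
    return np.array(out[:size])

samples = rejection_sample(10000)

A sampler like this can then be wired into an _rvs override so that rvs no longer falls back on numerical CDF inversion.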
See also my section "Sampling from an Arbitrary Distribution".
I am attempting to calculate integrals between two limits using python/scipy.
I am using online calculators to double check my results (http://www.wolframalpha.com/widgets/view.jsp?id=8c7e046ce6f4d030f0b386ea5c17b16a, http://www.integral-calculator.com/), and my results disagree when I have certain limits set.
The code used is:
import numpy as np
from scipy import integrate

def integrand(x):
    return np.exp(-0.5*x**2)

def int_test(a, b):
    # a and b are the lower and upper bounds of the integration
    return integrate.quad(integrand, a, b)
When setting the limits (a, b) to (-np.inf, 1), I get answers that agree (2.10894...); however, if I set them to (-np.inf, 300), I get an answer of zero.
On further investigation using:
for i in range(50):
    print(i, int_test(-np.inf, i))
I can see that the result goes wrong at i=36.
I was wondering if there was a way to avoid this?
Thanks,
Matt
I am guessing this has to do with the infinite bounds. scipy.integrate.quad is a wrapper around quadpack routines.
https://people.sc.fsu.edu/~jburkardt/f_src/quadpack/quadpack.html
In the end, these routines choose suitable subintervals and try to obtain the value of the integral through function evaluations and numerical integration. This works fine for finite integrals, assuming you know roughly how finely the function needs to be sampled.
For infinite integrals, it depends on how well the algorithm chooses its subintervals and how accurately they are evaluated.
My advice: do NOT use numerical integration software AT ALL if you are interested in accurate values for infinite integrals.
If your problem can be solved analytically, try that, or confine yourself to finite bounds; for an integrand like this one, splitting the integral at a finite point also helps, as sketched below.
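Here is a sketch of that workaround for this particular integrand: split the infinite integral at a finite point near where the mass lies, so QUADPACK's subinterval selection stays close to the peak. The split point 0 is an assumption that works here because the integrand is centred at 0.

import numpy as np
from scipy import integrate

def integrand(x):
    return np.exp(-0.5 * x**2)

def int_test_split(a, b, split=0.0):
    # integrate the two halves separately and add them
    left, _ = integrate.quad(integrand, a, split)
    right, _ = integrate.quad(integrand, split, b)
    return left + right

print(int_test_split(-np.inf, 300))   # ~2.5066 = sqrt(2*pi), no longer 0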
I have an array of data and I need to calculate the area under the curve, so I use NumPy's trapz and SciPy's integrate.simps for numerical integration, and both give me a really nice result.
The problem now is that I need the error for each one, or at least the error for the trapezoidal rule. The trouble is that the error formula requires the function itself, which I obviously don't have. I have been searching for a way to obtain the error but always come back to the same point...
Here are the pages for scipy.integrate (http://docs.scipy.org/doc/scipy/reference/integrate.html) and numpy.trapz (http://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html). I have tried and read a lot of code on numerical integration and would prefer to use the existing functions...
Any ideas please?
While cel is right that you cannot determine an integration error if you don't know the function, there is something you can do.
You can use curve fitting to fit a function through the available data points. You can then use that function for error estimation.
If you expect the data to fit a certain kind of function, like a sine, log, or exponential, it is good to use that as the basis for curve fitting.
For instance, if you are measuring the drag on a moving car, it is known that this is mostly proportional to the velocity squared because of air resistance.
However, if you do not have any knowledge of the underlying function, then given N data points there is a polynomial of degree N-1 that passes exactly through all of them. Determining such a polynomial from the data amounts to solving a system of linear equations; see e.g. polynomial interpolation. You could use this polynomial as an estimate of the unknown real function. Note, however, that outside the range of the data points this polynomial may be wildly inaccurate. A rough sketch of the idea follows.
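A sketch of the approach (the data below and the choice of a cubic fit are just assumptions for illustration): fit a polynomial to the points, integrate it analytically, and compare with the trapezoidal result to get a crude error indication.

import numpy as np

# example data (assumed for this sketch)
x = np.linspace(0.0, 5.0, 12)
y = np.sin(x)

trap_area = np.trapz(y, x)

# fit a cubic through the data and integrate it exactly
poly = np.poly1d(np.polyfit(x, y, deg=3))
poly_int = np.polyint(poly)
poly_area = poly_int(x[-1]) - poly_int(x[0])

# the difference gives a rough indication of the integration error
print(trap_area, poly_area, abs(trap_area - poly_area))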
I am using Python (mostly the SimPy package, but I think that is irrelevant to the question) to model some systems and run simulations. For this purpose I need to produce random numbers that follow certain distributions. I have done alright so far with distributions like the exponential and the normal by importing the random module (e.g. from random import *) and using the expovariate or normalvariate methods. However, I cannot find any method in random that produces numbers following the Erlang distribution. So:
Is there some method that I overlooked?
Do I have to import some other library?
Can I use some workaround? (I think that I can use the exponential distribution to produce random "Erlang" numbers, but I am not sure how. A piece of code might help me.)
Thank you in advance!
The Erlang distribution is a special case of the gamma distribution, which is available as numpy.random.gamma (reference). Just use an integer value for the k ("shape") argument. See also scipy.stats.gamma for functions such as the PDF, CDF, etc. For example:
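(A small sketch; treating the second Erlang parameter as a rate and passing scale = 1/lambda is an assumption about the parameterization you want.)

import numpy as np

k, lam = 3, 0.5                                     # integer shape and rate
samples = np.random.gamma(shape=k, scale=1.0 / lam, size=10000)
print(samples.mean())                               # should be close to k / lam = 6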
As the previous answer stated, the Erlang distribution is a special case of the gamma distribution. As far as I know, however, you do not need the numpy package: random numbers from a gamma distribution can be generated in Python using random.gammavariate(alpha, beta).
Usage:
import random
print(random.gammavariate(3, 1))
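The workaround hinted at in the question also works: an Erlang(k, lambda) variate is the sum of k independent exponential(lambda) variates, for example:

import random

def erlang_variate(k, lam):
    # sum of k independent exponentials with rate lam
    return sum(random.expovariate(lam) for _ in range(k))

print(erlang_variate(3, 0.5))   # the mean over many draws approaches k / lam = 6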
I am using Python 2.6 on Windows and working with the OpenCV core module. I have searched around for the kmedoids function defined in Pycluster, but did not find an accurate answer.
I have installed Pycluster 1.50 on Windows 7. Can somebody explain how to use Euclidean distance, L1 and L2 distance, Hellinger distance, and chi-square distance with kmedoids?
Through searching, this is what I know so far:
import Pycluster
from Pycluster import distancematrix, kmedoids
The kmedoids function takes four arguments (as shown below), one of which is a distance. But I am unable to understand how to specify different distance measures in the kmedoids function:
clusterid, error, nfound = kmedoids(distance, nclusters=2, npass=1, initialid=None)
Any help regarding the matter would be highly appreciated.
As Shambool points out, the documentation gives you the answer. You don't pass a distance function directly; you pass a pairwise distance matrix. So compute that first, with whatever distance metric you want, and then pass it along to kmedoids. For example:
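A sketch of what that looks like (the random data, the metric codes, and the chi-square helper are assumptions for illustration; per the Pycluster manual, dist='e' selects the Euclidean distance and dist='b' the city-block/L1 distance, and a full square distance matrix can also be passed to kmedoids, of which only the lower-left part is read; check your version's manual):

import numpy as np
from Pycluster import distancematrix, kmedoids

data = np.random.rand(20, 5)                 # example data: 20 items, 5 features

# built-in metrics: dist='e' -> Euclidean, dist='b' -> city-block (L1)
dist = distancematrix(data, dist='e')
clusterid, error, nfound = kmedoids(dist, nclusters=2, npass=10)

# for metrics Pycluster does not provide (e.g. chi-square or Hellinger),
# build the pairwise matrix yourself and pass it to kmedoids instead
def chi_square(p, q):
    # assumes non-negative feature vectors
    return 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-12))

n = len(data)
custom = np.array([[chi_square(data[i], data[j]) for j in range(n)] for i in range(n)])
clusterid, error, nfound = kmedoids(custom, nclusters=2, npass=10)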
It seems you didn't even bother to look at the documentation; on pages 28-29 this is clearly explained.