Overflow in exponential function while trying to integrate - Python
I want to numerically integrate a discrete dataset (given as a pandas Series) - here orange - which is multiplied by a given analytical exponential function (the derivative of a Fermi-Dirac distribution) - here blue. However, I fail when the exponent becomes large (e.g. for small T), because the derivative fermi_dT(E, mu, T) overflows. I couldn't find a way to rewrite fermi_dT(E, mu, T) in a form that avoids this.
Below is a minimal example (without pandas Series), where I simulated the dataset with a Gaussian.
If T < 30, I get an overflow. Does anyone see a clever way to get around this?
import numpy as np
from scipy import integrate
import matplotlib.pyplot as plt
scale_plot = 1e6
kB = 8.618292134831462e-5  # Boltzmann constant in eV/K
Ef = 2.0
def gaussian(E, amp, E0, sig):
    return amp * np.exp(-(E - E0)**2 / sig)

def fermi_dT(E, mu, T):
    return ((np.exp((E - mu) / (kB * T)) * (E - mu)) / ((1 + np.exp((E - mu) / (kB * T)))**2 * kB * T**2))
T = 100.0
energies = np.arange(1.,3.,0.001)
plt.plot(energies, (energies-Ef)*fermi_dT(energies, Ef, T))
plt.plot(energies, gaussian(energies, 1e-5, 1.8, .01))
plt.plot(energies, gaussian(energies, 1e-5, 1.8, .01)*(energies-Ef)*fermi_dT(energies, Ef, T)*scale_plot)
plt.show()
cum = integrate.cumtrapz(gaussian(energies, 1e-5, 1.8, .01)*(energies-Ef)*fermi_dT(energies, Ef, T), energies)
print(cum[-1])
This kind of numerical issue is quite common when dealing with exponentials. The trick is to first compute the log, and only then apply the exponential:
log(a * exp(b) / (1 + c * exp(d)) ** k) = log(a) + b - k * log(1 + exp(log(c) + d))
Now, you need to find a way to compute log(1 + exp(x)) accurately. Lucky for you, people have done it before, according to this post. So maybe you could rewrite fermi_dT using log1p:
import numpy as np

def softplus(x, limit=30):
    # Stable log(1 + exp(x)): above the cutoff, np.exp(x) would overflow,
    # but there log1p(exp(x)) is approximately x anyway.
    val = np.empty_like(x)
    val[x >= limit] = x[x >= limit]
    val[x < limit] = np.log1p(np.exp(x[x < limit]))
    return val

def fermi_dT(E, mu, T):
    a = (E - mu) / (kB * T ** 2)
    b = d = (E - mu) / (kB * T)
    k = 2
    # Work in log space, splitting on the sign of (E - mu) so that
    # the argument of np.log is always positive.
    val = np.empty_like(E)
    pos = E - mu >= 0
    neg = ~pos
    val[pos] = np.exp(np.log(a[pos]) + b[pos] - k * softplus(d[pos]))
    val[neg] = -np.exp(np.log(-a[neg]) + b[neg] - k * softplus(d[neg]))
    return val
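As a quick sanity check (a sketch reusing kB and the energy grid from the question), you can confirm the rewrite stays finite at a temperature where the naive version overflowed:

energies = np.arange(1., 3., 0.001)
stable = fermi_dT(energies, 2.0, 10.0)   # T = 10 overflowed with the naive formula
print(np.isfinite(stable).all())         # expect True: no overflow at low T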
Related
How to use Gradient Descent to solve this multiple terms trigonometry function?
The question is like this: f(x) = A sin(2π * L * x) + B cos(2π * M * x) + C sin(2π * N * x), where L, M, N are constant integers with 0 <= L, M, N <= 100, and A, B, C can be any integers. Here is the given data:

x = [0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1,0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18,0.19,0.2,0.21,0.22,0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.3,0.31,0.32,0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.4,0.41,0.42,0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.5,0.51,0.52,0.53,0.54,0.55,0.56,0.57,0.58,0.59,0.6,0.61,0.62,0.63,0.64,0.65,0.66,0.67,0.68,0.69,0.7,0.71,0.72,0.73,0.74,0.75,0.76,0.77,0.78,0.79,0.8,0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88,0.89,0.9,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99]

y = [4,1.240062433,-0.7829654986,-1.332487982,-0.3337640721,1.618033989,3.512512389,4.341307895,3.515268061,1.118929599,-2.097886967,-4.990538967,-6.450324073,-5.831575611,-3.211486891,0.6180339887,4.425660706,6.980842552,7.493970785,5.891593744,2.824429495,-0.5926374511,-3.207870455,-4.263694544,-3.667432785,-2,-0.2617162175,0.5445886005,-0.169441247,-2.323237059,-5.175570505,-7.59471091,-8.488730333,-7.23200463,-3.924327772,0.6180339887,5.138501587,8.38127157,9.532377045,8.495765687,5.902113033,2.849529206,0.4768388529,-0.46697525,0.106795821,1.618033989,3.071952496,3.475795162,2.255463709,-0.4905371745,-4,-7.117914956,-8.727599664,-8.178077181,-5.544088451,-1.618033989,2.365340134,5.169257268,5.995297102,4.758922924,2.097886967,-0.8873135564,-3.06024109,-3.678989552,-2.666365632,-0.6180339887,1.452191817,2.529722611,2.016594378,-0.01374122059,-2.824429495,-5.285215072,-6.302694708,-5.246870619,-2.210419738,2,6.13956874,8.965976562,9.68000641,8.201089581,5.175570505,1.716858387,-1.02183483,-2.278560533,-1.953524751,-0.6180339887,0.7393509358,1.129293593,-0.02181188158,-2.617913164,-5.902113033,-8.727381729,-9.987404016,-9.043589913,-5.984648344,-1.618033989,2.805900027,6.034770001,7.255101454,6.368389697]
Gradient descent is not well suited for optimisation over integers. You can try a naive relaxation where you solve in floats and hope the rounded solution is still OK.

from autograd import grad, numpy as jnp
import numpy as np

def cast(params):
    [A, B, C, L, M, N] = params
    L = jnp.minimum(jnp.abs(L), 100)
    M = jnp.minimum(jnp.abs(M), 100)
    N = jnp.minimum(jnp.abs(N), 100)
    return A, B, C, L, M, N

def pred(params, x):
    [A, B, C, L, M, N] = cast(params)
    return A * jnp.sin(2 * jnp.pi * L * x) + B * jnp.cos(2 * jnp.pi * M * x) + C * jnp.sin(2 * jnp.pi * N * x)

# x and y are the same lists as in the question above

def loss(params):
    p = pred(params, np.array(x))
    return jnp.mean((np.array(y) - p) ** 2)

params = np.array([np.random.random() * 100 for _ in range(6)])
for _ in range(10000):
    g = grad(loss)
    params = params - 0.001 * g(params)

print("Relaxed solution", cast(params), "loss=", loss(params))
constrained_params = np.round(cast(params))
print("Integer solution", constrained_params, "loss=", loss(constrained_params))
print()

Since the problem will have a lot of local minima, you might need to run it multiple times.
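For those repeated runs, a minimal restart wrapper could look like the sketch below, reusing loss, cast, and grad from above (the restart count of 20 is an arbitrary choice):

g = grad(loss)
best_loss, best_params = np.inf, None
for restart in range(20):
    params = np.array([np.random.random() * 100 for _ in range(6)])  # fresh random start
    for _ in range(10000):
        params = params - 0.001 * g(params)
    rounded = np.round(cast(params))
    if loss(rounded) < best_loss:
        best_loss, best_params = loss(rounded), rounded
print("Best integer solution", best_params, "loss=", best_loss)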
It's quite hard to use gradient descent to find a solution to this problem, because it tends to get stuck when changing the L, M, or N parameters. The gradients for those can push it away from the right solution unless it is already very close to an optimum. There are ways to get around this, such as basin hopping or random search, but because of the function you're trying to learn, you have a better alternative. Since you're trying to learn a sinusoid function, you can use an FFT to find the frequencies of the sine waves. Once you have those frequencies, you can find the amplitudes and phases used to generate the same signal. Pardon the messiness of this code; this is my first time using an FFT.

import scipy.fft
import numpy as np
import math

def get_top_frequencies(x, y, num_freqs):
    x = np.array(x)
    y = np.array(y)

    # Find timestep (assume constant timestep)
    dt = abs(x[0] - x[-1]) / (len(x) - 1)

    # Take discrete FFT of y
    spectral = scipy.fft.fft(y)
    freq = scipy.fft.fftfreq(y.shape[0], d=dt)

    # Cut off top half of frequencies. Assumes input signal is real, not complex.
    spectral = spectral[:int(spectral.shape[0] / 2)]
    # Double amplitudes to correct for cutting off top half.
    spectral *= 2
    # Adjust amplitude by sampling timestep
    spectral *= dt

    # Get amplitudes for all frequencies. This takes the magnitude of the complex number.
    spectral_amplitude = np.abs(spectral)

    # Pick frequencies with highest amplitudes
    highest_idx = np.argsort(spectral_amplitude)[::-1][:num_freqs]

    # Find amplitude, frequency, and phase components of each term
    highest_amplitude = spectral_amplitude[highest_idx]
    highest_freq = freq[highest_idx]
    highest_phase = np.angle(spectral[highest_idx]) / math.pi

    # Convert it into a Python function
    function = ["def func(x):", "    return ("]
    for i, components in enumerate(zip(highest_amplitude, highest_freq, highest_phase)):
        amplitude, freq, phase = components
        plus_sign = " +" if i != (num_freqs - 1) else ""
        term = f"{amplitude:.2f} * math.cos(2 * math.pi * {freq:.2f} * x + math.pi * {phase:.2f}){plus_sign}"
        function.append("        " + term)
    function.append("    )")
    return "\n".join(function)

# x and y are the same lists as in the question above

print(get_top_frequencies(x, y, 3))

That produces this function:

def func(x):
    return (
        5.00 * math.cos(2 * math.pi * 10.00 * x + math.pi * 0.50) +
        4.00 * math.cos(2 * math.pi * 5.00 * x + math.pi * -0.00) +
        2.00 * math.cos(2 * math.pi * 3.00 * x + math.pi * -0.50)
    )

This is not quite the format you specified - you asked for two sines and one cosine, and for no phase parameter. However, using the trigonometric identity cos(x) = sin(pi/2 - x), you can convert this into an equivalent expression that matches what you want:

def func(x):
    return (
        5.00 * math.sin(2 * math.pi * -10.00 * x) +
        4.00 * math.cos(2 * math.pi * 5.00 * x) +
        2.00 * math.sin(2 * math.pi * 3.00 * x)
    )

And there's the original function!
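As a quick check (a sketch, assuming you paste the generated func back in and that x and y still hold the data above), you can compare the reconstruction against the samples:

import math

max_err = max(abs(func(xi) - yi) for xi, yi in zip(x, y))
print(max_err)   # expect a value near zero if the frequencies were recovered correctly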
How to calculate a double integral accurately using Python
I'm trying to calculate a double integral given by:

import numpy as np
import scipy.special as sc
from numpy.lib.scimath import sqrt as csqrt
from scipy.integrate import dblquad

def g_re(alpha, beta, k, m):
    psi = csqrt(alpha ** 2 + beta ** 2 - k ** 2)
    return np.real(
        sc.jv(m, alpha) * sc.jv(m, beta) * sc.jv(m, alpha) * np.sin(beta)
        * sc.jv(m, -1j * psi) * np.exp(-psi) / (alpha ** 2 * psi)
    )

def g_im(alpha, beta, k, m):
    psi = csqrt(alpha ** 2 + beta ** 2 - k ** 2)
    return np.imag(
        sc.jv(m, alpha) * sc.jv(m, beta) * sc.jv(m, alpha) * np.sin(beta)
        * sc.jv(m, -1j * psi) * np.exp(-psi) / (alpha ** 2 * psi)
    )

k = 5
m = 0
tuple_args = (k, m)
ans = dblquad(g_re, 0.0, np.inf, 0, np.inf, args=tuple_args)[0]
ans += 1j * dblquad(g_im, 0.0, np.inf, 0, np.inf, args=tuple_args)[0]

The integration intervals are along the positive real axis ([0, np.inf)). When calculating, I got the following warnings:

/tmp/a.py:10: RuntimeWarning: invalid value encountered in multiply
  sc.jv(m, alpha)
/home/nschloe/.local/lib/python3.9/site-packages/scipy/integrate/quadpack.py:879: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze the integrand in order to determine the difficulties. If the position of a local difficulty can be determined (singularity, discontinuity) one will probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special-purpose integrator should be used.
  quad_r = quad(f, low, high, args=args, full_output=self.full_output,

I subdivided the domain of integration but I still got the same warning. Could you help me, please?
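One thing that may be worth trying (a sketch, not a verified fix for this integrand): since the exp(-psi) factor decays rapidly once alpha**2 + beta**2 >> k**2, you can truncate the infinite limits at a finite radius and raise the subdivision limit through scipy.integrate.nquad. The cutoff R = 50 here is an arbitrary guess and should be increased until the result stabilises.

from scipy.integrate import nquad

R = 50.0                                   # truncation radius for both variables
opts = [{"limit": 200}, {"limit": 200}]    # more subdivisions per axis than the default 50
re_part = nquad(g_re, [[0, R], [0, R]], args=(k, m), opts=opts)[0]
im_part = nquad(g_im, [[0, R], [0, R]], args=(k, m), opts=opts)[0]
ans = re_part + 1j * im_part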
Is this correct for modeling gravity as a second order ODE?
This is my first question on here, so apologies if the formatting is off. I want to model Newton's Universal Law of Gravitation as a second-order differential equation in Python, but the resulting graph doesn't make sense. For reference, here's the equation and here's the result. This is my code:

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

# dy/dt
def model(r, t):
    g = 6.67408 * (10 ** -11)
    m = 5.972 * 10 ** 24
    M = 1.989 * 10 ** 30
    return -m * r[1] + ((-g * M * m) / r ** 2)

r0 = [(1.495979 * 10 ** 16), 299195800]
t = np.linspace(-(2 * 10 ** 17), (2 * 10 ** 17))
r = odeint(model, r0, t)

plt.plot(t, r)
plt.xlabel('time')
plt.ylabel('r(t)')
plt.show()

I used this website as a base for the code. I have virtually no experience with using Python as an ODE solver. What am I doing wrong? Thank you!
To integrate a second-order ODE, you need to treat it as two first-order ODEs. In the link you posted, all the examples are second order, and they do this.

m * d^2 r / dt^2 = -g M m / r^2

Let

r = u[0]
dr/dt = u[1]

Then

(1) d/dt(u[0]) = u[1]
m * d/dt(u[1]) = -g M m / u[0]^2  =>  (2) d/dt(u[1]) = -g M / u[0]^2

In Python this looks like:

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def model(u, t):
    g = 6.67408 * (10 ** -11)
    M = 1.989 * 10 ** 30
    return (u[1], (-g * M) / (u[0] ** 2))

r0 = [(1.495979 * 10 ** 16), 299195800]
t = np.linspace(0, 5 * (10 ** 15), 500000)
r_t = odeint(model, r0, t)
r_t = r_t[:, 0]

plt.plot(t, r_t)
plt.xlabel('time')
plt.ylabel('r(t)')
plt.show()

I also made some changes to your time list. The resulting graph (not reproduced here) looks essentially linear, which makes sense to me: you have a mass escaping from a large mass, but at an incredible starting distance and speed, so r(t) should be pretty much linear in time. Then I brought the speed of 299195800 down to 0 and plotted the result again.
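As a side note, the same system can be written against the newer scipy.integrate.solve_ivp interface; a sketch with the same parameters (note that solve_ivp expects the argument order (t, u), the reverse of odeint):

import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt

def model(t, u):
    g = 6.67408e-11
    M = 1.989e30
    return [u[1], -g * M / u[0] ** 2]   # [dr/dt, d2r/dt2]

sol = solve_ivp(model, (0, 5e15), [1.495979e16, 299195800], dense_output=True)
t = np.linspace(0, 5e15, 1000)
plt.plot(t, sol.sol(t)[0])              # r(t) is the first state component
plt.xlabel('time')
plt.ylabel('r(t)')
plt.show()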
What would be a computationally faster way to implement this 2D numerical integration?
I am interested in doing a 2D numerical integration. Right now I am using scipy.integrate.dblquad, but it is very slow. Please see the code below. I need to evaluate this integral hundreds of times with completely different parameters, so I want to make the processing as fast and efficient as possible. The code is:

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(q, z, t):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

y = np.empty([len(q)])
for n in range(len(q)):
    y[n] = integrate.dblquad(lambda t, z: f(q[n], z, t), 0, 50, lambda z: 10, lambda z: 60)[0]

end = time.time()
print(end - start)

The time taken is 212.96751403808594 seconds, which is too much. Please suggest a better way to achieve what I want to do. I tried to do some searching before coming here, but didn't find any solution. I have read that quadpy can do this job better and much faster, but I have no idea how to implement it. Please help.
You could use Numba or a low-level callable.

Almost your example

I simply pass the function directly to scipy.integrate.dblquad instead of your method using lambdas to generate functions.

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.

y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner, args=(q[n],))[0]

end = time.time()
print(end - start)
# 143.73969149589539

This is already a tiny bit faster (143 vs. 151 s), but its only purpose is to give a simple example to optimize.

Simply compiling the functions using Numba

To get this to run you additionally need Numba and numba-scipy. The purpose of numba-scipy is to provide wrapped functions from scipy.special.

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time
import numba as nb

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

# error_model="numpy" -> don't check for division by zero
@nb.njit(error_model="numpy", fastmath=True)
def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.

y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner, args=(q[n],))[0]

end = time.time()
print(end - start)
# 8.636585235595703

Using a low-level callable

The scipy.integrate functions also provide the possibility to pass a C callback function instead of a Python function. These functions can be written, for example, in C, Cython or Numba, which I use in this example. The main advantage is that no Python interpreter interaction is necessary on each function call.

An excellent answer by @Jacques Gaudin shows an easy way to do this, including additional arguments.

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time
import numba as nb
from numba import cfunc
from numba.types import intc, CPointer, float64
from scipy import LowLevelCallable

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def jit_integrand_function(integrand_function):
    jitted_function = nb.njit(integrand_function)

    # error_model="numpy" -> don't check for division by zero
    @cfunc(float64(intc, CPointer(float64)), error_model="numpy", fastmath=True)
    def wrapped(n, xx):
        ar = nb.carray(xx, n)
        return jitted_function(ar[0], ar[1], ar[2])
    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.

y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner, args=(q[n],))[0]

end = time.time()
print(end - start)
# 3.2645838260650635
Generally it is much, much faster to do a summation via matrix operations than to use scipy.integrate.quad (or dblquad). You could rewrite your f(q, z, t) to take in q, z and t vectors and return a 3D array of f-values using np.tensordot, then multiply your area element (dt * dz) with the function values and sum them using np.sum. If your area element is not constant, you have to make an array of area elements and use np.einsum. To take your integration limits into account, you can use a masked array to mask the function values outside your integration limits before summing. Note that np.einsum ignores masks, so if you use einsum you can use np.where to set function values outside your integration limits to zero.

Example (with a constant area element and simple integration limits):

import numpy as np
import scipy.special as ss
import time

def f(q, t, z):
    # Making 3D arrays before computation for readability. You can save some
    # time by using tensordot directly when computing the output.
    Mq = np.tensordot(q, np.ones((len(t), len(z))), axes=0)
    Mt = np.tensordot(np.ones(len(q)), np.tensordot(t, np.ones(len(z)), axes=0), axes=0)
    Mz = np.tensordot(np.ones((len(q), len(t))), z, axes=0)
    return Mt * 0.5 * (ss.erf((Mt - Mz) / 3) - 1) * (Mq * Mt) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(-0.5 * ((Mz - 40) / 2) ** 2)

q = np.linspace(0.03, 1, 1000)
t = np.linspace(0, 50, 250)
z = np.linspace(10, 60, 250)

# If you have a constant dA you can shave some time by computing dA without
# using np.diff. If dA is variable, you have to make an array of dA values
# and use np.einsum instead of np.sum.
t0 = time.process_time()
dA = np.diff(t)[0] * np.diff(z)[0]
func_vals = f(q, t, z)
I = np.sum(func_vals * dA, axis=(1, 2))
t1 = time.process_time()

This took 18.5 s on my 2012 MacBook Pro (2.5 GHz i5) with dA = 0.04. Doing things this way also lets you easily choose between precision and efficiency, and set dA to a value that makes sense when you know how your function behaves. However, note that if you want a larger number of points, you have to split up your integral, or else you risk maxing out your memory: 1000 x 1000 x 1000 doubles require 8 GB of RAM. So if you are doing very big integrations with high precision, it can be worth doing a quick check on the memory required before running.
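If memory does become the bottleneck, one option is to process q in chunks so that only a slice of the 3D array exists at any time; a minimal sketch reusing f, q, t, z, and dA from above (the chunk size of 100 is an arbitrary choice):

chunk = 100
I = np.empty(len(q))
for i in range(0, len(q), chunk):
    vals = f(q[i:i + chunk], t, z)                   # shape (chunk, len(t), len(z))
    I[i:i + chunk] = np.sum(vals * dA, axis=(1, 2))  # integrate over t and z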
Fitting data with a custom distribution using scipy.stats
So I noticed that there is no implementation of the skewed generalized t distribution in scipy. It would be useful for me to fit this distribution to some data I have. Unfortunately, fit doesn't seem to be working for me in this case. To explain further, I have implemented it like so:

import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.special import beta

class sgt(st.rv_continuous):

    def _pdf(self, x, mu, sigma, lam, p, q):
        v = q ** (-1 / p) * \
            ((3 * lam ** 2 + 1) * (beta(3 / p, q - 2 / p) / beta(1 / p, q))
             - 4 * lam ** 2 * (beta(2 / p, q - 1 / p) / beta(1 / p, q)) ** 2) ** (-1 / 2)
        m = 2 * v * sigma * lam * q ** (1 / p) * beta(2 / p, q - 1 / p) / beta(1 / p, q)
        fx = p / (2 * v * sigma * q ** (1 / p) * beta(1 / p, q) *
                  (abs(x - mu + m) ** p / (q * (v * sigma) ** p) *
                   (lam * np.sign(x - mu + m) + 1) ** p + 1) ** (1 / p + q))
        return fx

    def _argcheck(self, mu, sigma, lam, p, q):
        s = sigma > 0
        l = -1 < lam < 1
        p_bool = p > 0
        q_bool = q > 0
        all_bool = s & l & p_bool & q_bool
        return all_bool

This all works fine, and I can generate random variables with given parameters with no problem. The _argcheck is required, as a simple positive-parameters-only check is not suitable.

sgt_inst = sgt(name='sgt')
vars = sgt_inst.rvs(mu=1, sigma=3, lam=-0.1, p=2, q=50, size=100)

However, when I try to fit these parameters, I get an error:

sgt_inst.fit(vars)

RuntimeWarning: invalid value encountered in subtract
  numpy.max(numpy.abs(fsim[0] - fsim[1:])) <= fatol):

and it just returns. What I find strange is that when I implement the example custom Gaussian distribution shown in the docs, the fit method runs with no problem. Any ideas?
As the fit docstring says,

Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates, self._fitstart(data) is called to generate such.

Calling sgt_inst._fitstart(data) returns (1.0, 1.0, 1.0, 1.0, 1.0, 0, 1) (the first five are shape parameters, the last two are loc and scale). It looks like _fitstart is not a sophisticated process. The parameter lam it picks does not meet your _argcheck requirement.

Conclusion: provide your own starting parameters for fit, e.g.,

sgt_inst.fit(data, 0.5, 0.5, -0.5, 2, 10)

returns

(1.4587093459289049, 5.471769032259468, -0.02391466905874927, 7.072893261471524, 0.741434497805832, -0.07012808188413872, 0.5308181287869771)

for my random data.
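Alternatively, if you want fit to work without hand-picked guesses each time, you could override _fitstart in the subclass itself; a sketch (the starting values here are heuristics I chose so that _argcheck passes, not anything scipy prescribes):

class sgt(st.rv_continuous):
    # _pdf and _argcheck as defined in the question above

    def _fitstart(self, data, args=None):
        # (mu, sigma, lam, p, q) shape guesses, then loc and scale;
        # lam must lie strictly inside (-1, 1) to satisfy _argcheck.
        return (np.mean(data), np.std(data), 0.0, 2.0, 10.0, 0.0, 1.0)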