I have code which generates a normal distribution as a PDF, centered at the mean 400, with standard deviation 40:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
muPrev, sigmaPrev = 400, 40.
a = np.random.normal(muPrev, sigmaPrev, 100000)
# histogram of the samples, normalised to a density
count, bins, ignored = plt.hist(a, 1000, density=True)
# overlay the analytical Gaussian PDF
plt.plot(bins, 1/(sigmaPrev * np.sqrt(2 * np.pi)) *
         np.exp(-(bins - muPrev)**2 / (2 * sigmaPrev**2)), linewidth=3, color='r')
plt.show()
and I can visualise it. But what if I wanted to convert this into a log-normal distribution, so that I get the values of mu and sigma that correspond to it as a log-normal distribution?
What is posted by @SamMason is not correct. It only appears to work because your sd is small relative to your mean.
OK, here is the correct way to get the parameters of the log-normal distribution.
You have predefined values of mean (corresponding to your Gaussian mean) and sd (again, your Gaussian sd).
Mean = exp(μ + σ²/2)
Var = (exp(σ²) − 1)·exp(2μ + σ²)
Here μ and σ are the log-normal (NOT Gaussian) parameters. You have to find them.
1. Compute the mean from your Gaussian mean (OK, that one is easy: they are equal).
2. Compute the variance from your Gaussian sd (square it).
3. Using the formulas above, solve the system of two non-linear equations to get μ and σ.
4. Plug μ and σ into your sampling routine and draw samples.
UPDATE
Mean² = exp(2μ + σ²)
Var/Mean² = exp(σ²) − 1
So here is your σ. To be more elaborate:
Sd²/Mean² = exp(σ²) − 1
exp(σ²) = 1 + Sd²/Mean²
σ² = ln(1 + Sd²/Mean²)
From the first equation you can now get μ:
2μ + σ² = ln(Mean²)
2μ = ln(Mean²) − σ² = ln(Mean²) − ln(1 + Sd²/Mean²) = ln(Mean²/(1 + Sd²/Mean²))
Please check the math, but this is the way to get PRECISE log-normal μ, σ parameters to match the desired Mean and Sd.
@SamMason's approximation works, I believe, only if in the expression
σ² = ln(1 + Sd²/Mean²)
the term Sd²/Mean² is much smaller than 1. Then ln(1 + Sd²/Mean²) ≈ Sd²/Mean², so σ ≈ Sd/Mean.
UPDATE II
2μ = ln(Mean²/(1 + Sd²/Mean²)) = ln(Mean⁴/(Mean² + Sd²))
μ = ½·ln(Mean⁴/(Mean² + Sd²)) = ln(Mean²/√(Mean² + Sd²))
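To make this concrete, here is a small NumPy sketch of these formulas (an added illustration, using the question's Mean = 400 and Sd = 40):
import numpy as np
mean, sd = 400.0, 40.0
# exact log-normal parameters from the formulas above
sigma = np.sqrt(np.log(1 + sd**2 / mean**2))
mu = np.log(mean**2 / np.sqrt(mean**2 + sd**2))
samples = np.random.lognormal(mu, sigma, 100_000)
print(samples.mean(), samples.std())  # should be close to 400 and 40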
You could directly generate samples for a log-normal distribution with https://numpy.org/doc/stable/reference/random/generated/numpy.random.lognormal.html; alternatively, you can exponentiate your existing normal draws:
log_norm = np.exp(a)
Note that if you want to generate the log-normal directly, you need to calculate the appropriate underlying mean and variance: https://en.wikipedia.org/wiki/Log-normal_distribution
To give a more complete answer, here's some code that draws a figure with two plots: one showing your existing Gaussian draws and another showing log-normal draws. I keep the first and second moments (i.e. mean and variance) the same by setting the log-normal parameters to mu=log(mu) and sigma=sd/mu.
import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt
mu, sd = 400, 40
n = 100_000
# draw samples from distributions
a = np.random.normal(mu, sd, n)
b = np.random.lognormal(np.log(mu), sd / mu, n)
# use Scipy for analytical PDFs
d1 = sps.norm(mu, sd)
# warning: scipy parameterises its distributions very strangely
d2 = sps.lognorm(sd / mu, scale=mu)
# bins to use for histogram and x for PDFs
lo, hi = np.min([a, b]), np.max([a, b])
dx = (hi - lo) * 0.06
bins = np.linspace(lo, hi, 101)
x = np.linspace(lo - dx, hi + dx, 501)
# draw figure
fig, [ax1, ax2] = plt.subplots(nrows=2, sharex=True, sharey=True, figsize=(8, 5))
ax1.set_title("Normal draws")
ax1.set_xlim(lo - dx, hi + dx)
ax1.hist(a, bins, density=True, alpha=0.5)
ax1.plot(x, d1.pdf(x))
ax1.plot(x, d2.pdf(x), '--')
ax2.set_title("Log-Normal draws")
ax2.hist(b, bins, density=True, alpha=0.5, label="Binned density")
ax2.plot(x, d1.pdf(x), '--', label="Normal PDF")
ax2.plot(x, d2.pdf(x), label="Log-Normal PDF")
ax2.legend()
fig.supylabel("Density")
plt.show()
which produces the following output:
Because the distributions are so close here, I've included dashed lines to show the other distribution for easier comparison. Note that the log-normal distribution will always be slightly right-skewed, more so as the variance increases.
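To illustrate that growing skew, here is a small check (an added illustration, not part of the original answer) using SciPy's sample skewness on log-normal draws with increasing sd:
import numpy as np
import scipy.stats as sps
# sample skewness grows with the log-space sigma (here sigma = sd/mu)
for sd in (40, 100, 200):
    s = np.random.lognormal(np.log(400), sd / 400, 100_000)
    print(sd, sps.skew(s))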
I'm using multi-taper analysis with the spectrum library in Python (https://pyspectrum.readthedocs.io/en/latest/install.html), but I can't fully understand the amplitude of the output.
Here is a piece of code for illustration:
from spectrum import *
import numpy as np
from numpy import fft
import matplotlib.pyplot as plt
N = 500
dt = 2*10**-3
# Creating a signal with 2 sine waves.
x = np.linspace(0.0, N*dt, N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
# classical FFT
yf = fft.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*dt), N//2)
# The multitapered method
NW=2.5
k=4
[tapers, eigen] = dpss(N, NW, k)
Sk_complex, weights, eigenvalues=pmtm(y, e=eigen, v=tapers, NFFT=500, show=False)
Sk = abs(Sk_complex)
Sk = np.mean(Sk * np.transpose(weights), axis=0)
# plotting both results
plt.plot(xf, abs(yf[0:N//2])*dt*2)
plt.plot(xf, Sk[0:N//2])
plt.show()
Both results are similar and find the frequency peaks at 50 and 80 Hz.
The classical FFT also finds the correct amplitudes (1 and 0.5),
but the multi-taper method does not: in this example the amplitudes are around 5 times too large.
Does anyone know how to properly scale/display the results?
Thanks
From my understanding, there are a couple of factors at play here.
First, to get the multitaper estimate of the power spectral density, you should compute it like this:
Sk = abs(Sk_complex)**2
Sk = np.mean(Sk * np.transpose(weights), axis=0) * dt
I.e., you need to average the power spectrum, not the Fourier components.
Second, to get the power spectrum from your fft estimate, you just need to divide the energy spectrum by N and multiply by dt, as you did (and you need the **2 to get the power from the Fourier components):
plt.plot(xf,abs(yf[0:N//2])**2 / N * dt)
plt.plot(xf,Sk[0:N//2])
Finally, what should be directly comparable is not so much the amplitude in the power spectral density, but the total power. You can look at:
print(np.sum(abs(yf[0:N//2])**2/N * dt), np.sum(Sk[0:N//2]))
which match very closely.
So your whole code becomes:
from spectrum import *
import numpy as np
from numpy import fft
import matplotlib.pyplot as plt
N = 500
dt = 2*10**-3
# Creating a signal with 2 sine waves.
x = np.linspace(0.0, N*dt, N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
# classical FFT
yf = fft.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*dt), N//2)
# The multitapered method
NW=2.5
k=4
[tapers, eigen] = dpss(N, NW, k)
Sk_complex, weights, eigenvalues=pmtm(y, e=eigen, v=tapers, NFFT=N, show=False)
Sk = abs(Sk_complex)**2
Sk = np.mean(Sk * np.transpose(weights), axis=0) * dt
# plotting both results on a linear scale
plt.figure()
plt.plot(xf, abs(yf[0:N//2])**2 / N * dt)
plt.plot(xf, Sk[0:N//2])
# plotting both results on a log scale
plt.figure()
plt.semilogy(xf, abs(yf[0:N//2])**2 / N * dt)
plt.semilogy(xf, Sk[0:N//2])
# comparing total power
print(np.sum(abs(yf[0:N//2])**2 / N * dt), np.sum(Sk[0:N//2]))
plt.show()
I'm trying to find the best parameters (a, b, and c) of the following function (general formula of circle, ellipse, or rhombus):
(|x|/a)^c + (|y|/b)^c = 1
for two arrays of independent data (x and y) in Python. My main objective is to estimate the best values of (a, b, and c) based on my x and y variables. I am using the curve_fit function from scipy, so here is my code with demo x and y.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
alpha = 5
beta = 3
N = 500
DIM = 2
np.random.seed(2)
theta = np.random.uniform(0, 2*np.pi, (N,1))
eps_noise = 0.2 * np.random.normal(size=[N,1])
circle = np.hstack([np.cos(theta), np.sin(theta)])
B = np.random.randint(-3, 3, (DIM, DIM))
noisy_ellipse = circle.dot(B) + eps_noise
X = noisy_ellipse[:,0:1]
Y = noisy_ellipse[:,1:]
def func(xdata, a, b, c):
    x, y = xdata
    return (np.abs(x)/a)**c + (np.abs(y)/b)**c
xdata = np.transpose(np.hstack((X, Y)))
ydata = np.ones((xdata.shape[1],))
pp, pcov = curve_fit(func, xdata, ydata, maxfev = 1000000, bounds=((0, 0, 1), (50, 50, 2)))
plt.scatter(X, Y, label='Data Points')
x_coord = np.linspace(-5,5,300)
y_coord = np.linspace(-5,5,300)
X_coord, Y_coord = np.meshgrid(x_coord, y_coord)
Z_coord = func((X_coord,Y_coord),pp[0],pp[1],pp[2])
plt.contour(X_coord, Y_coord, Z_coord, levels=[1], colors=('g'), linewidths=2)
plt.legend()
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
By using this code, the parameters are [4.69949891, 3.65493859, 1.0] for a, b, and c.
The problem is that I usually get a value of c at the lowest end of its bound, while for this demo data the c parameter is supposed to be very close to 2, as the data represent an ellipse.
Any help and suggestions for solving this issue are appreciated.
A curve whose equation is (|x/a|)^c + (|y/b|)^c = 1 is called a "superellipse":
http://mathworld.wolfram.com/Superellipse.html
For large c the superellipse tends to a rectangular shape.
For c=2 the curve is an ellipse, or a circle in the particular case a=b.
For c close to 1 the superellipse tends to a rhombus shape.
For c between 0 and 1 the superellipse looks like a (squashed) astroid with sharp vertices. This kind of shape will not be considered below. The first three shapes are sketched in the code right after this list.
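As an added illustration (not part of the original answer; the semi-axes a = 2, b = 1 are assumed), the following draws these shapes for a few values of c using the polar form of the superellipse:
import numpy as np
import matplotlib.pyplot as plt
a, b = 2.0, 1.0
theta = np.linspace(0, 2 * np.pi, 400)
for c in (1.0, 2.0, 8.0):  # rhombus, ellipse, near-rectangle
    # polar radius of the superellipse in the direction theta
    r = ((np.abs(np.cos(theta)) / a) ** c
         + (np.abs(np.sin(theta)) / b) ** c) ** (-1.0 / c)
    plt.plot(r * np.cos(theta), r * np.sin(theta), label=f"c = {c:g}")
plt.gca().set_aspect('equal')
plt.legend()
plt.show()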
Before looking at the OP's actual question, it is of interest to study the regression behaviour when fitting a superellipse to scattered data. A short experimental and simplified approach helps in understanding the mathematical difficulty, prior to the programming difficulties.
When the scatter increases, the computed value of c (corresponding to the minimum of the MSE) decreases. Also, the minimum becomes more and more difficult to localize. This is certainly a difficulty for the software.
For even larger scatter, the value c = 1 leads to a rhombus shape.
So it is not surprising that in the highly scattered example published by the OP the software gave a rhombus as the fitted curve.
If this is not the expected result, one has to choose a goal other than the minimum MSE. For example, if the goal is to obtain an elliptic shape, one has to set c = 2. The result then has a worse MSE than the preceding rhombus shape, but the elliptic fit is well achieved.
NOTE: In case of large scatter the result depends a lot on the choice of fitting criterion (MSE, MAE, ..., and with respect to which variable). This can cause very different results from one software package to another if the fitting criteria (sometimes not made explicit) differ.
Among the fitting criteria, if it is specified that the rhombus shape is excluded, one has to define a more representative criterion and/or model and implement it in the software.
IMPORTANCE OF THE FITTING CRITERION:
In order to show how important the choice of fitting criterion is, especially for highly scattered data, we will repeat the study with a different criterion.
Instead of the preceding criterion, which was the MSE of the errors on the superellipse equation itself,
MSE₁ = (1/n) Σₖ [ (|xₖ|/a)^c + (|yₖ|/b)^c − 1 ]²
we choose a different criterion, for example the MSE of the errors on the radial coordinate in a polar system,
MSE₂ = (1/n) Σₖ [ rₖ − r(θₖ) ]²
where (rₖ, θₖ) are the polar coordinates of the k-th data point and
r(θ) = ( (|cos θ|/a)^c + (|sin θ|/b)^c )^(−1/c)
is the radius of the superellipse in the direction θ.
Some results from the empirical study, for increasing scatter:
We observe that the numerical calculus with the second criterion is more robust than with the first. Cases with higher scatter can be treated with the second fitting criterion.
The drawback is that this second criterion is probably not implemented in the available software packages. So one has to implement the above formulas in existing software if possible, or write a program especially adapted; a sketch of such an implementation is given below.
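Here is a minimal sketch of such an implementation (an added illustration, assuming scipy.optimize.least_squares and the polar form r(θ) above; the demo data are made up):
import numpy as np
from scipy.optimize import least_squares

def radial_residuals(params, x, y):
    a, b, c = params
    r_data = np.hypot(x, y)      # polar radius of each data point
    theta = np.arctan2(y, x)     # polar angle of each data point
    # radius of the superellipse in the direction theta
    r_model = ((np.abs(np.cos(theta)) / a) ** c
               + (np.abs(np.sin(theta)) / b) ** c) ** (-1.0 / c)
    return r_data - r_model

# made-up demo data: noisy ellipse with a = 4, b = 3, c = 2
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
x = 4 * np.cos(t) + 0.2 * rng.normal(size=500)
y = 3 * np.sin(t) + 0.2 * rng.normal(size=500)

res = least_squares(radial_residuals, x0=(1.0, 1.0, 1.5), args=(x, y),
                    bounds=((0.1, 0.1, 1.0), (50.0, 50.0, 10.0)))
print(res.x)  # should land near (4, 3, 2)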
Nevertheless, this discussion about fitting criteria is somewhat beside the point, because the fitting criterion should not result from mathematical considerations only. If the problem comes from a practical need in physics or technology, the fitting criterion should be derived from that reality, not chosen freely.
I have modified your code (which you took from https://stackoverflow.com/a/47881806/10640534) quite a lot, but I think I have what you expect. I am using a different equation, which I found here. I have also used the new NumPy random generators, but I believe that is only aesthetic for this problem. I am drawing the ellipse using patches from matplotlib, which is indeed aesthetic, but definitely a way better solution to represent your conic. Importantly, I am using the dogbox method for curve_fit because other methods do not converge; occasionally the ellipse is not matched, and decreasing the added noise (e.g., rng.normal(0, 1, (500, 2)) / 1e2 instead of rng.normal(0, 1, (500, 2)) / 1e1) helps. Anyway, snippet and figure below.
import numpy as np
from numpy.random import default_rng
from matplotlib.patches import Ellipse
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(data, a, b, h, k, A):
    x, y = data
    return ((((x - h) * np.cos(A) + (y - k) * np.sin(A)) / a) ** 2
            + (((x - h) * np.sin(A) - (y - k) * np.cos(A)) / b) ** 2)
rng = default_rng(3)
numPoints = 500
center = rng.random(2) * 10 - 5
theta = rng.uniform(0, 2 * np.pi, (numPoints, 1))
circle = np.hstack([np.cos(theta), np.sin(theta)])
ellipse = (circle.dot(rng.random((2, 2)) * 2 * np.pi - np.pi)
           + (center[0], center[1]) + rng.normal(0, 1, (500, 2)) / 1e1)
pp, pcov = curve_fit(func, (ellipse[:, 0], ellipse[:, 1]), np.ones(numPoints),
                     p0=(1, 1, center[0], center[1], np.pi / 2),
                     method='dogbox')
plt.scatter(ellipse[:, 0], ellipse[:, 1], label='Data Points')
plt.gca().add_patch(Ellipse(xy=(pp[2], pp[3]), width=2 * pp[0],
                            height=2 * pp[1], angle=pp[4] * 180 / np.pi,
                            fill=False))
plt.gca().set_aspect('equal')
plt.tight_layout()
plt.show()
To incorporate the value of the exponent, I have used your equation and generated an ellipse according to this answer. This results in:
import numpy as np
from numpy.random import default_rng
from matplotlib.patches import Ellipse
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit, root
from scipy.special import ellipeinc
def angles_in_ellipse(num, a, b):
    assert(num > 0)
    assert(a < b)
    angles = 2 * np.pi * np.arange(num) / num
    if a != b:
        e = (1.0 - a ** 2.0 / b ** 2.0) ** 0.5
        tot_size = ellipeinc(2.0 * np.pi, e)
        arc_size = tot_size / num
        arcs = np.arange(num) * arc_size
        res = root(lambda x: (ellipeinc(x, e) - arcs), angles)
        angles = res.x
    return angles
def func(data, a, b, c):
    x, y = data
    return (np.absolute(x) / a) ** c + (np.absolute(y) / b) ** c
a = 10
b = 20
n = 100
phi = angles_in_ellipse(n, a, b)
e = (1.0 - a ** 2.0 / b ** 2.0) ** 0.5
arcs = ellipeinc(phi, e)
noise = default_rng(0).normal(0, 1, n) / 2
pp, pcov = curve_fit(func, (b * np.sin(phi) + noise,
                            a * np.cos(phi) + noise),
                     np.ones(n), method='lm')
plt.scatter(b * np.sin(phi) + noise, a * np.cos(phi) + noise,
            label='Data Points')
plt.gca().add_patch(Ellipse(xy=(0, 0), width=2 * pp[0], height=2 * pp[1],
                            angle=0, fill=False))
plt.gca().set_aspect('equal')
plt.tight_layout()
plt.show()
As you decrease the noise values, pp will tend to (b, a, 2).
I've been trying to plot a graph of the 1D horizontal diffraction pattern and have written the following code:
import math
import cmath
import numpy as np
import matplotlib.pyplot as plt
lamda=0.2
k=(2*math.pi)/lamda
z=0.005
def expfunc(x, xp):
    return cmath.exp(1j*k*((x-xp)**2)/(2*z))

def X(xp1, xp2, x, xp, expfunc, N):
    h = (xp2 - xp1)/N
    y = 0.0
    for i in np.arange(1, N/2 + 1):  # summing odd order y terms
        y += 4*expfunc(x, xp)
        xp += 2*h
    xp = xp1 + 2*h
    for i in np.arange(0, N/2):  # summing even order y terms
        y += 2*expfunc(x, xp)
        xp += 2*h
    integral = (h/3)*(y + expfunc(x, xp1) + expfunc(x, xp2))
    integral = (integral.real)**2
    return integral
NumPoints = 90000
xmin = 0
xmax =20
dx = (xmax - xmin) / (NumPoints - 1)
xvals = [0.0] * NumPoints
yvals = np.zeros(NumPoints)
for i in range(NumPoints):
    xvals[i] = xmin + i * dx
    yvals[i] = X(xmin, xmax, xvals[i], 0.1, expfunc, 200)
plt.plot(xvals,yvals)
plt.show()
The graph is meant to be a sinc function; however, the graph I get is all over the place when I vary the parameters N (the number of intervals) and z (the distance from the screen). I fail to see what is wrong with my code.
Thanks
There seems to be a problem with the frequency. Are you maybe missing a minus sign in expfunc(), at return cmath.exp(1j*k*...)?
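In code, the suggested change would be (a sketch of the hint above, not a verified fix):
def expfunc(x, xp):
    # note the minus sign in the exponent, as suggested above
    return cmath.exp(-1j*k*((x-xp)**2)/(2*z))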
Is there some simple tool in numpy which, given a value x, returns three random coordinates whose modulus is x?
Well, I don't think you'll find something in numpy for this purpose, but this will be pretty fast:
from numpy.random import normal
from numpy.linalg import norm
from numpy import allclose
def random_vec(modulus):
    while True:
        y = normal(size=(3,))
        if not allclose(y, 0):
            y *= modulus / norm(y)
            return y
Above I am assuming that by the modulus of a vector you mean its L2 norm. Notice that we must check that at least one coordinate is not too close to zero (or exactly zero!), so that we do not have numerical rounding problems when we rescale the components.
EDIT: now using normal() instead of rand().
The reason why we pick the coordinates from a normal distribution (and then of course rescale them) in order to obtain a random point on the sphere of radius modulus is explained here. Read also unutbu's comments below.
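As a quick usage check (an added illustration, using the function defined above):
from numpy.linalg import norm
v = random_vec(2.5)
print(v, norm(v))  # a random 3-vector whose norm is 2.5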
Assuming you mean you want the 3-dimensional Cartesian coordinates (X,Y,Z), you can do this with two random choices of angle in spherical polar coordinates and then converting back to Cartesian:
import numpy as np
# Example data: three values of x = sqrt(X^2 + Y^2 + Z^2)
x = np.array([1, 2, 2.5])
n = len(x)
theta = np.arccos(2 * np.random.random(n) - 1)
phi = np.random.random(n) * 2. * np.pi
X = x * np.sin(theta) * np.cos(phi)
Y = x * np.sin(theta) * np.sin(phi)
Z = x * np.cos(theta)
gives (for example):
[-2.60520852 0.01145881 1.01482376]
[-0.85300437 0.29508229 -1.54315779]
[ 1.21871742 -0.95540313 3.54806944]
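As a sanity check (an added illustration for the code above), the moduli of the generated coordinates should recover x:
print(np.sqrt(X**2 + Y**2 + Z**2))  # should recover the input moduli x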