How to sample from a given probability distribution? - python

I am plotting a histogram with the same probability distribution of particle in a ground state one dimensional box, using random numbers. But when compared with the original distribution, the values are getting cut at the top.
Here is the code
from numpy import*
from matplotlib.pyplot import*
lit=[]
def f(x):
return 2*(sin(pi*x)**2)
for i in range(0,10000):
x=random.uniform(0,1)
y=random.uniform(0,1)
if y<f(x):
lit.append(x)
l=linspace(0,1,10000)
hist(lit,bins=100,density=True)
plot(l,f(l))
show()
The graph produced is:
How to improve this code to match the original?

Your issue is that your density function f ranges in [0,2] but you draw y from [0,1].
Using y=np.random.uniform(0,2) will fix it.
But a better approach is to remap the uniform density to your desired function directly:
from scipy.interpolate import interp1d
l = np.linspace(0,1,100)
g = interp1d(l-np.sin(2*np.pi*l)/2/np.pi, l) # inverse for integral of f
# (not sure if analytical solution to above exists... so using interpolation instead)
lit = g(np.random.rand(10000))
In the general case where f may not have an analytical integral, you can use numerical integral too:
dl = l[1]
g = interp1d(np.cumsum(f(l))*dl, l)

Related

Odd result in python scipy FFT

I have access scipy and want to create a FFT about simple Gaussian function which is exp(-t^2). And also it's well known that fourier transform of exp(−t^2) is √πexp(−π^2*k^2). But FFT of exp(-t^2) was not same as √πexp(−π^2*k^2).
I have tried the following code:
import scipy.fftpack as fft
from scipy import integrate
import numpy as np
import matplotlib.pyplot as plt
#FFT
N=int(1e+3)
T=0.01 #sample period
t = np.linspace(0,N*T, N)
h=np.exp(-t**2)
H_shift=2*np.abs(fft.fftshift(np.fft.fft(h)/N))
freq=fft.fftshift(fft.fftfreq(h.shape[0],t[1]-t[0]))
#Comparing FFT with fourier transform
def f(x):
return np.exp(-x**2)
def F(k):
return (np.pi**0.5)*np.exp((-np.pi**2)*(k**2))
plt.figure(num=1)
plt.plot(freq,F(freq),label=("Fourier Transform"))
plt.legend()
plt.figure(num=2)
plt.plot(freq,H_shift,label=("FFT"))
plt.legend()
plt.show()
#Checking Parseval's Theorm
S_h=integrate.simps(h**2,t)
#0.62665690150683084
S_H_s=integrate.simps(H_shift**2,freq)
#0.025215875346935791
S_F=integrate.simps(F(freq)**2,freq)
#1.2533141373154999
The graph I plotted is not same, also values of FFT do not follow Parseval's theorm. . It has to be S_H_s=S_h*2, but my result was not. I think that S_H_s which is result of FFT is wrong value Because of S_F=S_h*2.
Is there any problem in my code?? Help is greatly appreciated! Thanks in advance.
I suggest you plot your input signal h and verify that it looks like a Gaussian.
Spoiler alert: it doesn't, it is half a Gaussian!
By cutting it like this, you introduce a lot of high frequencies that you see in your plot.
To do this experiment correctly, follow this recipe to create your input signal:
t = np.linspace(-(N/2)*T,(N/2-1)*T, N)
h = np.exp(-t**2)
h = fft.ifftshift(h)
The ifftshift function serves to move the t=0 location to the leftmost array element. Note that t here is constructed carefully such that t=0 is exactly in the right place for this to work correctly, assuming an even-sized N. You can verify that fft.ifftshift(t)[0] is 0.0.

Scipy: efficiently generate a series of integration (integral function)

I have a function, I want to get its integral function, something like this:
That is, instead of getting a single integration value at point x, I need to get values at multiple points.
For example:
Let's say I want the range at (-20,20)
def f(x):
return x**2
x_vals = np.arange(-20, 21, 1)
y_vals =[integrate.nquad(f, [[0, x_val]]) for x_val in x_vals ]
plt.plot(x_vals, y_vals,'-', color = 'r')
The problem
In the example code I give above, for each point, the integration is done from scratch. In my real code, the f(x) is pretty complex, and it's a multiple integration, so the running time is simply too slow(Scipy: speed up integration when doing it for the whole surface?).
I'm wondering if there is any way of efficient generating the Phi(x), at a giving range.
My thoughs:
The integration value at point Phi(20) is calucation from Phi(19), and Phi(19) is from Phi(18) and so on. So when we get Phi(20), in reality we also get the series of (-20,-19,-18,-17 ... 18,19,20). Except that we didn't save the value.
So I'm thinking, is it possible to create save points for a integrate function, so when it passes a save point, the value would get saved and continues to the next point. Therefore, by a single process toward 20, we could also get the value at (-20,-19,-18,-17 ... 18,19,20)
One could implement the strategy you outlined by integrating only over the short intervals (between consecutive x-values) and then taking the cumulative sum of the results. Like this:
import numpy as np
import scipy.integrate as si
def f(x):
return x**2
x_vals = np.arange(-20, 21, 1)
pieces = [si.quad(f, x_vals[i], x_vals[i+1])[0] for i in range(len(x_vals)-1)]
y_vals = np.cumsum([0] + pieces)
Here pieces are the integrals over short intervals, which get summed to produce y-values. As written, this code outputs a function that is 0 at the beginning of the range of integration which is -20. One can, of course, subtract the y-value that corresponds to x=0 in order to have the same normalization as on your plot.
That said, the split-and-sum process is unnecessary. When you find an indefinite integral of f, you are really solving the differential equation F' = f. And SciPy has a built-in method for that, odeint. Just use it:
import numpy as np
import scipy.integrate as si
def f(x):
return x**2
x_vals = np.arange(-20, 21, 1)
y_vals = si.odeint(lambda y,x: f(x), 0, x_vals)
The output is essential identical to the first version (within tiny computational errors), with less code. The reason for using lambda y,x: f(x) is that the first argument of odeint must be a function taking two arguments, the right-hand side of the equation y' = f(y, x).
For the equivalent version of user3717023's answer using scipy's solve_ivp you need to keep in mind the different ordering of x and y in the function f (different from the odeint version).
Further, keep in mind that you can only compute the solution up to a constant. So you might want to shift the result according to some given condition. In the example here (with the function f(x)=x^2 as given by the OP), I shifted the numeric solution such that it goes through the origin, matching the simplest analytic solution F(x)=x^3/3.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
def f(x):
return x**2
xs = np.linspace(-20, 20, 1001)
# This is the integration step:
sol = solve_ivp(lambda x, y: f(x), t_span=(xs[0], xs[-1]), y0=[0], t_eval=xs)
plt.plot(sol.t, sol.t**3/3, ls='-', c='C0', label="analytic: $F(x)=x^3/3$")
plt.plot(sol.t, sol.y[0], ls='--', c='C1', label="numeric solution")
plt.plot(sol.t, sol.y[0] - sol.y[0][sol.t.size//2], ls='-.', c='C3', label="shifted solution going through origin")
plt.legend()
In case you don't have an analytical version of the function f, but only xs and ys as data points, then you can use scipy's interp1d function to interpolate between the data points and pass on that interpolating function the same way as before:
from scipy.interpolate import interp1d
f = interp1d(xs, ys)

Fitting a straight line to a log-log curve in matplotlib

I have a plot with me which is logarithmic on both the axes. I have pyplot's loglog function to do this. It also gives me the logarithmic scale on both the axes.
Now, using numpy I fit a straight line to the set of points that I have. However, when I plot this line on the plot, I cannot get a straight line. I get a curved line.
The blue line is the supposedly "straight line". It is not getting plotted straight. I want to fit this straight line to the curve plotted by red dots
Here is the code I am using to plot the points:
import numpy
from matplotlib import pyplot as plt
import math
fp=open("word-rank.txt","r")
a=[]
b=[]
for line in fp:
string=line.strip().split()
a.append(float(string[0]))
b.append(float(string[1]))
coefficients=numpy.polyfit(b,a,1)
polynomial=numpy.poly1d(coefficients)
ys=polynomial(b)
print polynomial
plt.loglog(b,a,'ro')
plt.plot(b,ys)
plt.xlabel("Log (Rank of frequency)")
plt.ylabel("Log (Frequency)")
plt.title("Frequency vs frequency rank for words")
plt.show()
To better understand this problem, let's first talk about plain ol' linear regression (the polyfit function, in this case, is your linear regression algorithm).
Suppose you have a set of data points (x,y), shown below:
You want to create a model that predicts y as a function of x, so you use linear regression. That uses the model:
y = mx + b
and computes the values of m and b that best predict your data, using some linear algebra.
Next, you use your model to predict values of y as a function of x. You do this by picking a set of values for x (think linspace) and computing the corresponding values of y. Plotting these (x,y) pairs gives you your regression line.
Now, let's talk about logarithmic regression. In this case, we still have two variables, y versus x, and we are still interested in their relationship, i.e., being able to predict y given x. The only difference is, now y and x happen to be logarithms of two other variables, which I'll call log(F) and log(R). Thus far, this is nothing more than a simple change of name.
The linear regression also works the same way. You're still regressing y versus x. The linear regression algorithm doesn't care that y and x are actually log(F) and log(R) - it makes no difference to the algorithm.
The last step is a little bit different - and this is where you're getting tripped up in your plot above. What you're doing is computing
F = m R + b
but this is incorrect, because the relationship between F and R is not linear. (That's why you're using a log-log plot.)
Instead, you should compute
log(F) = m log(R) + b
If you transform this (raise 10 to the power of both sides and rearrange), you get
F = c R^m
where c = 10^b. This is the relationship between F and R: it is a power law relationship. (Power law relationships are what log-log plots are best at.)
In your code, you're using A and B when calling polyfit, but you should be using log(A) and log(B).
Your linear fit is not performed on the same data as shown in the loglog-plot.
Make a and b numpy arrays like this
a = numpy.asarray(a, dtype=float)
b = numpy.asarray(b, dtype=float)
Now you can perform operations on them. What the loglog-plot does, is to take the logarithm to base 10 of both a and b. You can do the same by
logA = numpy.log10(a)
logB = numpy.log10(b)
This is what the loglog plot visualizes. Check this by ploting both logA and logB as a regular plot. Repeat the linear fit on the log data and plot your line in the same plot as the logA, logB data.
coefficients = numpy.polyfit(logB, logA, 1)
polynomial = numpy.poly1d(coefficients)
ys = polynomial(b)
plt.plot(logB, logA)
plt.plot(b, ys)
The other answers offer great explanations and a solution. However I would like to propose a solution that helped myself a lot and maybe will help you as well.
Another simple way of writing a line fit for log-log scale is the function powerfit in the code below. It takes in the original x and y data and by using a number of new x-points you can get a straight line on log-log scale. In the current case the values xnew are the same as x (which are both b).
The advantage of defining new x-coordinates is that you can get as few or as many points of the powerfitted line for whatever purpose you might need them.
import numpy as np
from matplotlib import pyplot as plt
import math
def powerfit(x, y, xnew):
"""line fitting on log-log scale"""
k, m = np.polyfit(np.log(x), np.log(y), 1)
return np.exp(m) * xnew**(k)
fp=open("word-rank.txt","r")
a=[]
b=[]
for line in fp:
string=line.strip().split()
a.append(float(string[0]))
b.append(float(string[1]))
ys = powerfit(b, a, b)
plt.loglog(b,a,'ro')
plt.plot(b,ys)
plt.xlabel("Log (Rank of frequency)")
plt.ylabel("Log (Frequency)")
plt.title("Frequency vs frequency rank for words")
plt.show()

How to weigh a function with 2 variables with a Gaussian distribution in python?

I've been working with this for the last days and I couldn't see yet where is the problem.
I'm trying to weight a function with 2 variables f(q,r) within a Gaussian distribution g(r) with a specific mean value (R0) and deviation (sigma). This is needed because the theoretical function f(q) has a certain dispersity in its r variable when analyzed experimentally. Therefore, we use a probability density function to weigh our function in the r variable.
I include the code, which works, but doesn't give the expected result (the weighted curve should be smoother as the polydispersity grows (higher sigma) as it is shown below. As you can see, I integrated the convolution of the 2 functions f(r,q)*g(r) from r = 0 to r = +inf.
The result is plotted to compare the weigh result with the simple function:
from scipy.integrate import quad, quadrature
import numpy as np
import math as m
import matplotlib.pyplot as plt
#function weighted with a probability density function (gaussian)
def integrand(r,q):
#gaussian function normalized
def gauss_nor(r):
#gaussian function
def gauss(r):
return m.exp(-((r-R0)**2)/(2*sigma**2))
return (m.exp(-((r-R0)**2)/(2*sigma**2)))/(quad(gauss,0,np.inf)[0])
#function f(r,q)
def f(r,q):
return 3*(np.sin(q*r)-q*r*np.cos(q*r))/((r*q)**3)
return gauss_nor(r)*f(r,q)
#quadratic integration of the integrand (from 0 to +inf)
#integrand is function*density_function (gauss)
def function(q):
return quad(integrand, 0, np.inf, args=(q))[0]
#parameters used in the function
R0=20
sigma=5
#range to plot q
q=np.arange(0.001,2.0,0.005)
#vector where the result of the integral will be saved
function_vec = np.vectorize(function)
#vector for the squared power of the integral
I=[]
I=(function_vec(q))**2
#function without density function
I0=[]
I0=(3*(np.sin(q*R0)-q*R0*np.cos(q*R0))/((R0*q)**3))**2
#plot of weighted and non-weighted functions
p1,=plt.plot(q,I,'b')
p3,=plt.plot(q,I0,'r')
plt.legend([p1,p3],('Weighted','No weighted'))
plt.yscale('log')
plt.xscale('log')
plt.show()
Thank you very much. I've been with this problems for some days already and I haven't found the mistake.
Maybe somebody know how to weigh a function with a PDF in an easier way.
I simplified your code, the output is the same as yours. I think it's already very smooth, there are some very sharp peak in the log-log graph, just because the curve has zero points. So it's not smooth in a log-log graph, but it's smooth in a normal X-Y graph.
import numpy as np
def gauss(r):
return np.exp(-((r-R0)**2)/(2*sigma**2))
def f(r,q):
return 3*(np.sin(q*r)-q*r*np.cos(q*r))/((r*q)**3)
R0=20
sigma=5
qm, rm = np.ogrid[0.001:2.0:0.005, 0.001:40:1000j]
gr = gauss(rm)
gr /= np.sum(gr)
fm = f(rm, qm)
fm *= gr
plot(qm.ravel(), fm.sum(axis=1)**2)
plt.yscale('log')
plt.xscale('log')

Fourier transform of a Gaussian is not a Gaussian, but thats wrong! - Python

I am trying to utilize Numpy's fft function, however when I give the function a simple gausian function the fft of that gausian function is not a gausian, its close but its halved so that each half is at either end of the x axis.
The Gaussian function I'm calculating is
y = exp(-x^2)
Here is my code:
from cmath import *
from numpy import multiply
from numpy.fft import fft
from pylab import plot, show
""" Basically the standard range() function but with float support """
def frange (min_value, max_value, step):
value = float(min_value)
array = []
while value < float(max_value):
array.append(value)
value += float(step)
return array
N = 256.0 # number of steps
y = []
x = frange(-5, 5, 10/N)
# fill array y with values of the Gaussian function
cache = -multiply(x, x)
for i in cache: y.append(exp(i))
Y = fft(y)
# plot the fft of the gausian function
plot(x, abs(Y))
show()
The result is not quite right, cause the FFT of a Gaussian function should be a Gaussian function itself...
np.fft.fft returns a result in so-called "standard order": (from the docs)
If A = fft(a, n), then A[0]
contains the zero-frequency term (the
mean of the signal), which is always
purely real for real inputs. Then
A[1:n/2] contains the
positive-frequency terms, and
A[n/2+1:] contains the
negative-frequency terms, in order of
decreasingly negative frequency.
The function np.fft.fftshift rearranges the result into the order most humans expect (and which is good for plotting):
The routine np.fft.fftshift(A)
shifts transforms and their
frequencies to put the zero-frequency
components in the middle...
So using np.fft.fftshift:
import matplotlib.pyplot as plt
import numpy as np
N = 128
x = np.arange(-5, 5, 10./(2 * N))
y = np.exp(-x * x)
y_fft = np.fft.fftshift(np.abs(np.fft.fft(y))) / np.sqrt(len(y))
plt.plot(x,y)
plt.plot(x,y_fft)
plt.show()
Your result is not even close to a Gaussian, not even one split into two halves.
To get the result you expect, you will have to position your own Gaussian with the center at index 0, and the result will also be positioned that way. Try the following code:
from pylab import *
N = 128
x = r_[arange(0, 5, 5./N), arange(-5, 0, 5./N)]
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(r_[y[N:], y[:N]])
plot(r_[y_fft[N:], y_fft[:N]])
show()
The plot commands split the arrays in two halfs and swap them to get a nicer picture.
It is being displayed with the center (i.e. mean) at coefficient index zero. That is why it appears that the right half is on the left, and vice versa.
EDIT: Explore the following code:
import scipy
import scipy.signal as sig
import pylab
x = sig.gaussian(2048, 10)
X = scipy.absolute(scipy.fft(x))
pylab.plot(x)
pylab.plot(X)
pylab.plot(X[range(1024, 2048)+range(0, 1024)])
The last line will plot X starting from the center of the vector, then wrap around to the beginning.
A fourier transform implicitly repeats indefinitely, as it is a transform of a signal that implicitly repeats indefinitely. Note that when you pass y to be transformed, the x values are not supplied, so in fact the gaussian that is transformed is one centred on the median value between 0 and 256, so 128.
Remember also that translation of f(x) is phase change of F(x).
Following on from Sven Marnach's answer, a simpler version would be this:
from pylab import *
N = 128
x = ifftshift(arange(-5,5,5./N))
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(fftshift(y))
plot(fftshift(y_fft))
show()
This yields a plot identical to the above one.
The key (and this seems strange to me) is that NumPy's assumed data ordering --- in both frequency and time domains --- is to have the "zero" value first. This is not what I'd expect from other implementations of FFT, such as the FFTW3 libraries in C.
This was slightly fudged in the answers from unutbu and Steve Tjoa above, because they're taking the absolute value of the FFT before plotting it, thus wiping away the phase issues resulting from not using the "standard order" in time.

Categories

Resources