I do not understand why the following code works with the normal distribution but not with another custom function.
Here is the example where I tried to sample the normal distribution:
n = 100000
xx = np.random.uniform(-5, 5, n)
rho = mpl.pylab.normpdf(xx, 0, 1)
rnd = np.random.rand(n)
ix = np.where(rho > rnd)
xx = xx[ix]
h = plt.hist(xx, bins=20, normed=True)
# plot density
x = np.linspace(-5, 5, 100)
plt.plot(x, mpl.pylab.normpdf(x, 0, 1))
It works and I get the expected result.
Now if I change the density, it is not sampled correctly. I checked that the density is properly normalized, and it is, so I do not understand where I am going wrong:
n = 100000
xx = np.random.uniform(0, 1, n)
rho = 2 * np.sin(2 * xx * np.pi)**2
rnd = np.random.rand(n)
ix = np.where(rho > rnd)
xx = xx[ix]
h = plt.hist(xx, bins=20, normed=True)
# plot density
x = np.linspace(0, 1, 100)
print(np.trapz(2 * np.sin(2 * x * np.pi)**2, x))
plt.plot(x, 2 * np.sin(2 * x * np.pi)**2)
You are doing rejection sampling.
In the first case the maximum value of the pdf is less than 1 (it is 1/sqrt(2*pi) ≈ 0.399) and you are drawing rnd from [0, 1], so the comparison values cover the full range of the pdf; you are just throwing away more samples than necessary, since the maximum is strictly less than 1. In the second case the maximum of the pdf is 2, but you are still drawing rnd from [0, 1] in the line
rnd = np.random.rand(n)
You should change that line so it samples uniformly from [0, 2]. Note that the somewhat flat tops of your histogram correspond to the parts of [0, 1] where the pdf is greater than 1: your code has no way of treating some of those values differently from others.
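For completeness, a minimal sketch of the corrected second case (same variable names as in the question, using density=True as the modern spelling of normed=True); the only real change is that the comparison values are drawn uniformly from [0, 2], the maximum of the density:
import numpy as np
import matplotlib.pyplot as plt

n = 100000
xx = np.random.uniform(0, 1, n)
rho = 2 * np.sin(2 * np.pi * xx)**2
rnd = 2 * np.random.rand(n)              # uniform on [0, 2], the maximum of the pdf
xx = xx[rho > rnd]                       # keep only the accepted samples

plt.hist(xx, bins=20, density=True)
x = np.linspace(0, 1, 100)
plt.plot(x, 2 * np.sin(2 * np.pi * x)**2)
plt.show()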
You're rejecting too much in the first example, and not enough in the second.
The optimal case is when you sample Y uniformly from 0 to PDFmax.
In the first case, you should call
rnd = np.random.rand(n) / np.sqrt(2.0 * np.pi)
In the second case:
rnd = 2.0 * np.random.rand(n)
I am trying to do weighted least-squares fitting and came across numpy.linalg.lstsq. I need to fit with weighted least squares, and the following works:
# Generate some synthetic data from the model.
N = 50
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
#do the fitting
err = 1/yerr**2
W = np.sqrt(np.diag(err))
x = x.flatten()
y = y.flatten()
A = np.vstack([x, np.ones(len(x))]).T
xw = np.dot(W,A)
yw = np.dot(W,y)
m, b = np.linalg.lstsq(xw, yw)[0]
which gives me the best-fit slope and intercept. Now suppose I have two datasets with the same slope but different intercepts. How would I do a joint fit so that I get the best-fit slope plus two intercepts? I still need the weighted least-squares version. For the unweighted case, I found that the following works:
(m, b1, b2), _, _, _ = np.linalg.lstsq(np.stack([np.concatenate((x1, x2)),
                                                 np.concatenate([np.ones(len(x1)), np.zeros(len(x2))]),
                                                 np.concatenate([np.zeros(len(x1)), np.ones(len(x2))])]).T,
                                       np.concatenate((y1, y2)))
First of all, I would rewrite your first approach, since in my opinion it can be written more clearly like this:
weights = 1 / yerr
m, b = np.linalg.lstsq(np.c_[weights * x, weights], weights * y, rcond=None)[0]
To fit 2 datasets you can stack the 2 arrays, but set some elements of the design matrix to 0.
np.random.seed(12)
N = 3
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
M = 2
x1 = np.sort(10 * np.random.rand(M))
yerr1 = 0.1 * 0.5 * np.random.rand(M)
y1 = 10.0 * x1 + 25
y1 += yerr1 * np.random.randn(M)
#do the fitting
weights = 1 / yerr
weights1 = 1 / yerr1
first_column = np.r_[weights * x, weights1 * x1]
second_column = np.r_[weights, [0] * x1.size]
third_column = np.r_[[0] * x.size, weights1]
a = np.c_[first_column, second_column, third_column]
print(a)
# [[ 4.20211437 2.72576342 0. ]
# [ 24.54293941 9.32075195 0. ]
# [ 13.22997409 1.78771428 0. ]
# [126.37829241 0. 26.03711851]
# [686.96961895 0. 124.44253391]]
c = np.r_[weights * y, weights1 * y1]
print(c)
# [ 83.66073785 383.70595203 159.12058215 1914.59065915 9981.85549321]
m, b1, b2 = np.linalg.lstsq(a, c, rcond=None)[0]
print(m, b1, b2)
# 10.012202998026055 14.841412336510793 24.941219918240172
EDIT
If you want different slopes and one intercept, you can do it this way. It is probably easier to grasp the general idea from the one-slope, two-intercepts case. Take a look at the array a: you construct it from the weights, just as you do c, so it is now an unweighted problem. You are looking for a vector = [slope, intercept1, intercept2] such that a @ vector = c (as closely as possible, by minimizing the sum of squared differences). By putting zeros in a we make the problem separable: the upper part of a varies slope and intercept1, and the lower part varies slope and intercept2. The two-slopes case is similar, with vector = [slope1, slope2, intercept], as shown below.
first_column = np.r_[weights * x, [0] * x1.size]
second_column = np.r_[[0] * x.size, weights1 * x1]
third_column = np.r_[weights, weights1]
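Putting it together, a minimal sketch of the two-slopes, one-intercept fit (reusing x, y, weights and x1, y1, weights1 from the snippet above):
a = np.c_[first_column, second_column, third_column]  # weighted design matrix
c = np.r_[weights * y, weights1 * y1]                 # weighted targets
m1, m2, b = np.linalg.lstsq(a, c, rcond=None)[0]      # vector = [slope1, slope2, intercept]
print(m1, m2, b)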
I have a problem. I'm doing an assignment for my class and I'm doing my best, but the teacher does not seem to care, so I have to track down the problem myself while meeting his requirements.
I had to write a program; what it does is not important, so I won't bother explaining it. I just need to make a histogram to show the results. The problem is that I can't use .hist(), because we need to build OUR OWN histogram via .bar() from the matplotlib library.
Here is the code:
import random
import math
import matplotlib.pyplot as plt
import numpy as np
list = []
N = 1000
for i in range(N):
    x_1 = random.random()
    x_2 = random.random()
    xx = ((-2 * math.log(x_1)) ** (1 / 2)) * math.sin(2 * math.pi * x_2)
    xy = ((-2 * math.log(x_1)) ** (1 / 2)) * math.cos(2 * math.pi * x_2)
    list.append(xx)
    list.append(xy)
plt.hist(list, alpha=0.5)
plt.show()
I need to change plt.hist() to plt.bar(); doing so, I end up with this:
plt.bar(list, y_pos, align='center', alpha=0.5)
And the bars overlap, so the histogram is unclear. The teacher's assistant told me to sum up the bars like this: when a value is between, say, 1 and 1.99 you add it to bar 1, when between 2 and 2.99 to bar 2, etc.
I don't know how to do this, please help.
First off, using numpy allows vectorization and makes everything much faster. Also, please don't use list as a variable name, because it shadows the built-in type of that name.
Calculating a histogram "by hand" isn't that difficult.
First you need to decide on some bins; usually they are regularly spaced over the complete domain of the data. By default, plt.hist uses 10 bins, equally dividing the range from the minimum to the maximum of the data.
Then you create an array to count the number of x values that fall inside each of the 10 bins. To know in which of the bins (numbered 0 to 9) a particular x falls, subtract the minimum x and divide by the range; this gives a number between 0 and 1. Multiplying by the number of bins gives a float between 0 and 10, and converting that to an integer gives the index of the bin that should be incremented by one. Multiplying by a factor slightly lower than 1 prevents the maximum x from being put into bin index 10, which doesn't exist (see the short snippet below).
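In code, the bin index of a single value x could look like this (a minimal sketch, assuming minx, maxx and bins are already defined):
# map x from [minx, maxx] to a bin index in 0 .. bins-1; the factor slightly
# below 1 keeps x == maxx out of the non-existent bin number `bins`
bin_index = int((x - minx) / (maxx - minx) * bins * 0.999999)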
The code below first creates a bar plot in yellow and then draws a standard histogram in transparent red on top of it. As both coincide perfectly, together they show an orange histogram.
import matplotlib.pyplot as plt
import numpy as np
N = 1000
x_1 = np.random.random(N)
x_2 = np.random.random(N)
xx = ((-2 * np.log(x_1)) ** (1 / 2)) * np.sin(2 * np.pi * x_2)
xy = ((-2 * np.log(x_1)) ** (1 / 2)) * np.cos(2 * np.pi * x_2)
lst = np.concatenate([xx, xy])
minx = min(lst)
maxx = max(lst)
bins = 10
bin_counts = np.zeros(bins)
bin_factor = bins * 0.999999 / (maxx - minx)
for x in lst:
    bin_counts[int((x - minx) * bin_factor)] += 1
plt.bar(np.linspace(minx, maxx, bins, endpoint=False), bin_counts, width=(maxx - minx) / bins,
        alpha=.5, ec='w', align='edge', color='yellow')
plt.hist(lst, alpha=0.5, bins=bins, ec='w', color='red')
plt.show()
To draw a standard gaussian normal over the histogram, it needs to be scaled by the number of samples and adjusted for the width of the bars:
from scipy.stats import norm
x = np.linspace(minx, maxx, 200)
plt.plot(x, norm.pdf(x, 0, 1)*len(lst)/bin_factor, color='green', lw=2)
PS: You might want to read more about the Box-Muller transform.
To have bars that are only 0.1 wide, you could change the code as follows. You would also need a larger N to avoid the bars having very irregular heights.
minx = -2
maxx = 2
bins = 40
bin_counts = np.zeros(bins)
factor = bins * 0.999999 / (maxx - minx)
for x in lst:
    if minx <= x <= maxx:  # this test is needed when minx is larger than the real minimum (similar for maxx)
        bin_counts[int((x - minx) * factor)] += 1
# optionally show ticks every .1 steps
plt.xticks(np.arange(minx, maxx+0.001, 0.1), rotation=90)
The following plot uses N = 10000
Whoa, excellent. This looks awesome. I hope there won't be any problems with the x-axis labels. Would there be a way to divide them by 10, so I can get, for example, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3?
I've also used that method for another task.
import math
import random
from scipy import constants
import matplotlib.pyplot as plt
import numpy as np
kb = constants.Boltzmann
m = 5e-26
var = (kb*300/m)**(1/2)
v_list = []
v = 0
N = 1000
for i in range(N):
    x_1 = random.random()
    x_2 = random.random()
    n1 = ((-2 * math.log(x_1, math.e)) ** (1 / 2)) * math.cos(2 * math.pi * x_2) * var
    n2 = ((-2 * math.log(x_1, math.e)) ** (1 / 2)) * math.sin(2 * math.pi * x_2) * var
    y_1 = random.random()
    y_2 = random.random()
    n3 = ((-2 * math.log10(y_1)) ** (1 / 2)) * math.cos(2 * math.pi * y_2) * var
    v += math.sqrt(n1*n1+n2*n2+n3*n3)
    if math.sqrt(n1*n1+n2*n2+n3*n3) <= 900:
        v_list.append(math.sqrt(n1*n1+n2*n2+n3*n3))
    else:
        continue
minx = min(v_list)
maxx = max(v_list)
bins = 10
bin_counts = np.zeros(bins)
factor = bins * 0.999999 / (maxx - minx)
for x in v_list:
    bin_counts[int((x - minx) * factor)] += 1
plt.bar(np.linspace(minx, maxx, bins, endpoint=False), bin_counts, width=(maxx - minx) / bins,
        alpha=0.5, ec='w', align='edge', color='green')
plt.show()
I had some problems with plt.savefig(), but it worked later. I also could not convert the math calls to numpy, so I left it as is.
EDIT: I may have a cool task with the Mandelbrot set later. I was looking for an answer all over the place but could not manage it; it uses some strange approach that I could not figure out how to solve. On top of that, the teachers are not willing to simply tell me the answer. I had high hopes for these Python lessons, but what I'm left with is simply disgust.
I am struggling to save the plot from matplotlib without background and borders.
In particular I would like to export in two formats:
svg, with only the plot lines (no background, no axis, no frames, no borders)
png, with transparent background and the bounding box fitting exactly the plot.
To be more precise, here is an example:
import numpy as np
import matplotlib.pyplot as plt

i = 1000
j = 1024
periods = 6
x = np.array([np.linspace(0, (2 * (periods) * np.pi), i)]).T
x = np.repeat(x, j, axis=1)
n = (1 * (np.random.normal(size=(j))) *
np.random.uniform(low=1, high=1, size=j))[:, np.newaxis]
n = np.repeat(n, i, axis=1).T
y = np.sin(x) * (np.sin(n)+4) + 0.5 * n**2
xout = np.array([np.linspace(0, 2 * (2 * (periods) * np.pi), 2 * i)]).T
xout = np.repeat(xout, j, axis=1)
yout = np.concatenate((y, y[::-1]))
f1 = plt.figure("GOOD1")
plt.axis('off')
plt.plot(x, y, 'b', alpha=(0.015625))
plt.savefig("GOOD1.svg", bbox_inches='tight', transparent=True)
plt.savefig("GOOD1.png", bbox_inches='tight', transparent=True)
This looks similar to what I would like to export, but there is still a small space on the right in both the png and the svg; the svg also has a background.
The second question is about the alpha channel.
I understand it is limited to 8 bits, and hence decreasing the alpha value much further (e.g. to 0.005) makes the plot disappear entirely.
Referring to the previous code, is there a way to plot even more lines with increased transparency without losing the plot altogether?
I'm not sure how to answer the second part of your question. I believe this code takes care of the border and background issues, though.
import numpy as np
import matplotlib.pyplot as plt
i = 1000
j = 1024
periods = 6
x = np.array([np.linspace(0, (2 * (periods) * np.pi), i)]).T
x = np.repeat(x, j, axis=1)
n = (1 * (np.random.normal(size=(j))) *
np.random.uniform(low=1, high=1, size=j))[:, np.newaxis]
n = np.repeat(n, i, axis=1).T
y = np.sin(x) * (np.sin(n)+4) + 0.5 * n**2
xout = np.array([np.linspace(0, 2 * (2 * (periods) * np.pi), 2 * i)]).T
xout = np.repeat(xout, j, axis=1)
yout = np.concatenate((y, y[::-1]))
f1 = plt.figure("GOOD1")
ax = f1.add_subplot(111)
ax.axis('off')
ax.set_position([0, 0, 1, 1])
ax.plot(x, y, 'b', alpha=(0.015625))
ax.set_xlim((x.min(), x.max()))
ax.set_ylim((y.min(), y.max()))
f1.patch.set_alpha(0.)
ax.patch.set_alpha(0.)
f1.savefig("GOOD1.svg", bbox_inches=0, transparent=True)
f1.savefig("GOOD1.png", bbox_inches=0, transparent=True)
I try to implement the Fourier series function according to the following formulas:
S_f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L} \right)
where
a_n = \frac{1}{L} \int_{-L}^{L} f(x) \cos\frac{n\pi x}{L} \, dx
and
b_n = \frac{1}{L} \int_{-L}^{L} f(x) \sin\frac{n\pi x}{L} \, dx
Here is my approach to the problem:
import numpy as np
import pylab as py
# Define "x" range.
x = np.linspace(0, 10, 1000)
# Define "T", i.e functions' period.
T = 2
L = T / 2
# "f(x)" function definition.
def f(x):
    return np.sin(np.pi * 1000 * x)
# "a" coefficient calculation.
def a(n, L, accuracy = 1000):
    a, b = -L, L
    dx = (b - a) / accuracy
    integration = 0
    for i in np.linspace(a, b, accuracy):
        x = a + i * dx
        integration += f(x) * np.cos((n * np.pi * x) / L)
        integration *= dx
    return (1 / L) * integration
# "b" coefficient calculation.
def b(n, L, accuracy = 1000):
    a, b = -L, L
    dx = (b - a) / accuracy
    integration = 0
    for i in np.linspace(a, b, accuracy):
        x = a + i * dx
        integration += f(x) * np.sin((n * np.pi * x) / L)
        integration *= dx
    return (1 / L) * integration
# Fourier series.
def Sf(x, L, n = 10):
    a0 = a(0, L)
    sum = 0
    for i in np.arange(1, n + 1):
        sum += ((a(i, L) * np.cos(n * np.pi * x)) + (b(i, L) * np.sin(n * np.pi * x)))
    return (a0 / 2) + sum
# x axis.
py.plot(x, np.zeros(np.size(x)), color = 'black')
# y axis.
py.plot(np.zeros(np.size(x)), x, color = 'black')
# Original signal.
py.plot(x, f(x), linewidth = 1.5, label = 'Signal')
# Approximation signal (Fourier series coefficients).
py.plot(x, Sf(x, L), color = 'red', linewidth = 1.5, label = 'Fourier series')
# Specify x and y axes limits.
py.xlim([0, 10])
py.ylim([-2, 2])
py.legend(loc = 'upper right', fontsize = '10')
py.show()
...and here is what I get after plotting the result:
I've read How to calculate a Fourier series in Numpy? and have already implemented that approach. It works great, but it uses the exponential method, whereas I want to focus on trigonometric functions and the rectangular method for calculating the integrations of the a_n and b_n coefficients.
Thank you in advance.
UPDATE (SOLVED)
Finally, here is a working example of the code. However, I'll spend more time on it, so if there is anything that can be improved, it will be done.
from __future__ import division
import numpy as np
import pylab as py
# Define "x" range.
x = np.linspace(0, 10, 1000)
# Define "T", i.e functions' period.
T = 2
L = T / 2
# "f(x)" function definition.
def f(x):
    return np.sin((np.pi) * x) + np.sin((2 * np.pi) * x) + np.sin((5 * np.pi) * x)
# "a" coefficient calculation.
def a(n, L, accuracy = 1000):
    a, b = -L, L
    dx = (b - a) / accuracy
    integration = 0
    for x in np.linspace(a, b, accuracy):
        integration += f(x) * np.cos((n * np.pi * x) / L)
    integration *= dx
    return (1 / L) * integration
# "b" coefficient calculation.
def b(n, L, accuracy = 1000):
    a, b = -L, L
    dx = (b - a) / accuracy
    integration = 0
    for x in np.linspace(a, b, accuracy):
        integration += f(x) * np.sin((n * np.pi * x) / L)
    integration *= dx
    return (1 / L) * integration
# Fourier series.
def Sf(x, L, n = 10):
    a0 = a(0, L)
    sum = np.zeros(np.size(x))
    for i in np.arange(1, n + 1):
        sum += ((a(i, L) * np.cos((i * np.pi * x) / L)) + (b(i, L) * np.sin((i * np.pi * x) / L)))
    return (a0 / 2) + sum
# x axis.
py.plot(x, np.zeros(np.size(x)), color = 'black')
# y axis.
py.plot(np.zeros(np.size(x)), x, color = 'black')
# Original signal.
py.plot(x, f(x), linewidth = 1.5, label = 'Signal')
# Approximation signal (Fourier series coefficients).
py.plot(x, Sf(x, L), '.', color = 'red', linewidth = 1.5, label = 'Fourier series')
# Specify x and y axes limits.
py.xlim([0, 5])
py.ylim([-2.2, 2.2])
py.legend(loc = 'upper right', fontsize = '10')
py.show()
Consider developing your code in a different way, block by block. You should be surprised if code like this worked on the first try. Debugging is one option, as @tom10 said. The other option is rapid prototyping of the code, step by step, in the interpreter, even better with ipython.
Above, you are expecting that b_1000 is non-zero, since the input f(x) is a sinusoid with a 1000 in it. You are also expecting all other coefficients to be zero, right?
Then you should focus on the function b(n, L, accuracy = 1000) only. Looking at it, 3 things are going wrong. Here are some hints.
The multiplication by dx is inside the loop. Are you sure about that?
In the loop, i is supposed to be an integer, right? Is it really an integer? By prototyping or debugging you would discover this.
Be careful whenever you write (1 / L) or a similar expression: if you're using Python 2.7, it is likely doing integer division. In that case, at least put a from __future__ import division at the top of your source. Read this PEP if you don't know what I am talking about.
If you address these 3 points, b() will work. Then fix a() in a similar fashion.
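For reference, a minimal sketch of b() with those three points addressed (iterate over the sample points directly, multiply by dx once after the loop, and force float division); it assumes f(x) is defined as in the question and mirrors the fix in the UPDATE above:
import numpy as np

def b(n, L, accuracy=1000):
    lo, hi = -L, L
    dx = (hi - lo) / float(accuracy)         # float division, safe on Python 2.7 too
    integration = 0.0
    for x in np.linspace(lo, hi, accuracy):  # loop over the sample points themselves
        integration += f(x) * np.sin((n * np.pi * x) / L)
    integration *= dx                        # multiply by dx once, outside the loop
    return (1.0 / L) * integration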
My goal is to make a density heat map plot of a sphere in 2D. The plotting code (shown further below) works when I use rectangular domains; however, I am trying to use it for a circular domain. The radius of the sphere is 1. The code I have so far is:
from pylab import *
import numpy as np
from matplotlib.colors import LightSource
from numpy.polynomial.legendre import leggauss, legval
xi = 0.0
xf = 1.0
numx = 500
yi = 0.0
yf = 1.0
numy = 500
def f(x):
    if 0 <= x <= 1:
        return 100
    if -1 <= x <= 0:
        return 0
deg = 1000
xx, w = leggauss(deg)
L = np.polynomial.legendre.legval(xx, np.identity(deg))
integral = (L * (f(x) * w)[None,:]).sum(axis = 1)
c = (np.arange(1, 500) + 0.5) * integral[1:500]
def r(x, y):
    return np.sqrt(x ** 2 + y ** 2)
theta = np.arctan2(y, x)
x, y = np.linspace(0, 1, 500000)
def T(x, y):
    return (sum(r(x, y) ** l * c[:,None] *
                np.polynomial.legendre.legval(xx, identity(deg)) for l in range(1, 500)))
T(x, y) should equal the sum over l of the coefficients c_l times r(x, y) raised to the power l, times the Legendre polynomial P_l whose argument is cos(theta).
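In code form, what I am after would be something like this rough sketch of that formula (not the code I actually ran; it just spells out the sum over l, using c from above where c[0] corresponds to l = 1):
def T(x, y):
    # sum_{l=1}^{499} c_l * r(x, y)**l * P_l(cos(theta))
    rr = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    total = np.zeros_like(rr, dtype=float)
    for l in range(1, 500):
        # unit coefficient vector picks out the single Legendre polynomial P_l
        Pl = np.polynomial.legendre.legval(np.cos(theta), np.eye(500)[l])
        total += c[l - 1] * rr ** l * Pl
    return total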
In the question python: integrating a piecewise function, I learned how to use the Legendre polynomials in a summation, but that method is slightly different, and for the plotting I need a function T(x, y).
This is the plotting code.
densityinterpolation = 'bilinear'
densitycolormap = cm.jet
densityshadedflag = False
densitybarflag = True
gridflag = True
plotfilename = 'laplacesphere.eps'
x = arange(xi, xf, (xf - xi) / (numx - 1))
y = arange(yi, yf, (yf - yi) / (numy - 1))
X, Y = meshgrid(x, y)
z = T(X, Y)
if densityshadedflag:
    ls = LightSource(azdeg = 120, altdeg = 65)
    rgb = ls.shade(z, densitycolormap)
    im = imshow(rgb, extent = [xi, xf, yi, yf], cmap = densitycolormap)
else:
    im = imshow(z, extent = [xi, xf, yi, yf], cmap = densitycolormap)
im.set_interpolation(densityinterpolation)
if densitybarflag:
    colorbar(im)
grid(gridflag)
show()
I made the plot in Mathematica as a reference for what my end goal is.
If you set the values outside of the disk domain (or whichever domain you want) to float('nan'), those points will be ignored when plotting (leaving them in white color).
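For example, a minimal sketch of the masking idea on a regular grid (the values array here is a placeholder standing in for T(X, Y)):
import numpy as np
import matplotlib.pyplot as plt

# regular grid over the square [-1, 1] x [-1, 1]
x = np.linspace(-1, 1, 500)
y = np.linspace(-1, 1, 500)
X, Y = np.meshgrid(x, y)

# placeholder field; in the question this would be T(X, Y)
values = X**2 + Y**2

# set everything outside the unit disk to NaN so imshow leaves it blank
values = np.where(X**2 + Y**2 <= 1.0, values, np.nan)

plt.imshow(values, extent=[-1, 1, -1, 1], origin='lower', cmap='jet')
plt.colorbar()
plt.show()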