I'm trying to plot the time evolution of an Ornstein-Uhlenbeck process, which is a stochastic process, and then find the probability distribution at each time step. I'm able to plot the graph for 1000 realizations of the process. Each realization has 1000 time steps, with a step width of .001. I used a 1000 x 1000 array to store the data. Each row holds the values of one realization, and the i-th column holds the values at the i-th time step for all 1000 realizations.
Now I want to bin the results at each time step and then plot the probability distribution corresponding to each time step. I'm quite confused about how to do it (I tried modifying code from the IPython Cookbook, where they don't store each realization in memory).
The code that I made from the IPython Cookbook:
import numpy as np
import matplotlib.pyplot as plt
sigma = 1. # Standard deviation.
mu = 10. # Mean.
tau = .05 # Time constant.
dt = .001 # Time step.
T = 1. # Total time.
n = int(T / dt) # Number of time steps.
ntrails = 1000 # Number of Realizations.
t = np.linspace(0., T, n) # Vector of times.
sigmabis = sigma * np.sqrt(2. / tau)
sqrtdt = np.sqrt(dt)
x = np.zeros((ntrails,n)) # Vector containing all successive values of our process
for j in range(ntrails): # Euler Method
    for i in range(n - 1):
        x[j,i + 1] = x[j,i] + dt * (-(x[j,i] - mu) / tau) + sigmabis * sqrtdt * np.random.randn()
for k in range(ntrails): #plotting 1000 realizations
    plt.plot(t, x[k])
# Time averaging of each time stamp using bin
# Really lost from this point onwards.
bins = np.linspace(-2., 15., 100)
fig, ax = plt.subplots(1, 1, figsize=(12, 4))
for i in range(ntrails):
    hist, _ = np.histogram(x[:,[i]], bins=bins)
    ax.plot(hist)
Graph for 1000 realizations of Ornstein- Uhlenbeck Process:
Distribution generated from the code above:
I'm really lost with assigning of the bin value and plotting the histogram using it. I want to know whether my code is correct for plotting distributions corresponding to each time step, using bin. If not please tell me what modifications I need to make to my code.
The last for loop should iterate over n, not ntrails (which happen to have the same value here), but otherwise the code and plots look correct (apart from a few minor issues, such as that it takes 101 breaks to get 100 bins, so your code should probably read bins = np.linspace(-2., 15., 101)).
Your plots could be improved a bit though. A good guiding principle is to use as little ink as necessary to communicate the point that you are trying to make. You are always trying to plot all the data, which ends up obscuring your plots. Also, you could benefit from paying more attention to colour. Colour should carry meaning, or not be used at all.
Here would be my suggestions:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['axes.spines.top'] = False
mpl.rcParams['axes.spines.right'] = False
sigma = 1. # Standard deviation.
mu = 10. # Mean.
tau = .05 # Time constant.
dt = .001 # Time step.
T = 1 # Total time.
n = int(T / dt) # Number of time steps.
ntrails = 10000 # Number of Realizations.
t = np.linspace(0., T, n) # Vector of times.
sigmabis = sigma * np.sqrt(2. / tau)
sqrtdt = np.sqrt(dt)
x = np.zeros((ntrails,n)) # Vector containing all successive values of our process
for j in range(ntrails): # Euler Method
    for i in range(n - 1):
        x[j,i + 1] = x[j,i] + dt * (-(x[j,i] - mu) / tau) + sigmabis * sqrtdt * np.random.randn()
fig, ax = plt.subplots()
for k in range(200): # plotting fewer realizations shows the distribution better in this case
    ax.plot(t, x[k], color='k', alpha=0.02)
# Really lost from this point onwards.
bins = np.linspace(-2., 15., 101) # you need 101 breaks to get 100 bins
fig, ax = plt.subplots(1, 1, figsize=(12, 4))
# plotting a smaller selection of time points spaced out using a log scale prevents
# the asymptotic approach to the mean from dominating the plot
for k, i in enumerate(np.logspace(0, np.log10(n)-1, 21)):
    hist, _ = np.histogram(x[:,[int(i)]], bins=bins)
    ax.plot(hist, color=plt.cm.plasma(k/20)) # k/20 runs from 0 to 1, so the colour encodes the order of the time points
plt.show()
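As a rough sketch of the same idea without the Python loop over realizations: the Euler step can be applied to all realizations at once, and np.histogram with density=True returns a normalised estimate of the probability density instead of raw counts. The names rng, ntrials, time_indices and densities are my own choices here, and the seed and the number of realizations are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed only to make the sketch repeatable

sigma, mu, tau = 1.0, 10.0, 0.05  # same parameters as above
dt, T = 0.001, 1.0
n = int(T / dt)                   # number of time steps
ntrials = 2000                    # number of realizations (arbitrary)

# Euler-Maruyama step applied to every realization at once:
# row = realization, column = time step.
x = np.zeros((ntrials, n))
noise_scale = sigma * np.sqrt(2.0 / tau) * np.sqrt(dt)
for i in range(n - 1):
    drift = dt * (mu - x[:, i]) / tau
    x[:, i + 1] = x[:, i] + drift + noise_scale * rng.standard_normal(ntrials)

# 101 edges give 100 bins; density=True normalises each histogram so that
# it integrates to 1 over the bins.
bins = np.linspace(-2.0, 15.0, 101)
bin_width = bins[1] - bins[0]
time_indices = [1, 10, 100, n - 1]
densities = {
    i: np.histogram(x[:, i], bins=bins, density=True)[0]
    for i in time_indices
}
```

Each entry of densities can then be plotted against the bin centres, 0.5 * (bins[:-1] + bins[1:]), so the x-axis shows the value of the process rather than the bin number.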
I originally posted this in physics stack exchange but they requested it be posted here as well....
I am trying to create a known signal with a known wavelength, amplitude, and phase. I then want to break this signal apart into all of its frequencies, find the amplitude, phase, and wavelength for each frequency, then create an equation for each frequency based on these new wavelengths, amplitudes, and phases. In theory, the equations should be identical to the individual signals. However, they are not. I am almost positive it is an issue with phase but I cannot figure out how to resolve it. I will post the exact code to reproduce this below. Please help, as my phase, wavelength, and amplitudes will vary once I get more complicated signals, so it needs to work for any combination of these.
import numpy as np
from matplotlib import pyplot as plt
from scipy import fftpack
# create signal
time_vec = np.arange(1, 11, 1)
wavelength = 1/.1
phase = 0
amp = 10
created_signal = amp * np.sin((2 * np.pi / wavelength * time_vec) + phase)
# plot it
fig, axs = plt.subplots(2, 1, figsize=(10,6))
axs[0].plot(time_vec, created_signal, label='exact_data')
# get fft and freq array
sig_fft = fftpack.fft(created_signal)
sample_freq = fftpack.fftfreq(created_signal.size, d=1)
# do inverse fft and verify same curve as original signal. This is fine!
filtered_signal = fftpack.ifft(sig_fft)
filtered_signal += np.mean(created_signal)
# create individual signals for each frequency
filtered_signals = []
for i in range(len(sample_freq)):
    high_freq_fft = sig_fft.copy()
    high_freq_fft[np.abs(sample_freq) < np.nanmin(sample_freq[i])] = 0
    high_freq_fft[np.abs(sample_freq) > np.nanmax(sample_freq[i])] = 0
    filtered_sig = fftpack.ifft(high_freq_fft)
    filtered_sig += np.mean(created_signal)
    filtered_signals.append(filtered_sig)
# get phase, amplitude, and wavelength for each individual frequency
sig_size = len(created_signal)
wavelength = []
ph = []
amp = []
indices = []
for j in range(len(sample_freq)):
    wavelength.append(1 / sample_freq[j])
    indices.append(int(sig_size * sample_freq[j]))
for j in indices:
    phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
    ph.append([phase])
    amp.append([np.sqrt((sig_fft[j].real * sig_fft[j].real) + (sig_fft[j].imag * sig_fft[j].imag)) / (sig_size / 2)])
# create an equation for each frequency based on each phase, amp, and wavelength found from above.
def eqn(filtered_si, wavelength, time_vec, phase, amp):
    return amp * np.sin((2 * np.pi / wavelength * time_vec) + phase)
def find_equations(filtered_signals_mean, high_freq_fft, wavelength, filtered_signals, time_vec, ph, amp):
    equations = []
    for i in range(len(wavelength)):
        temp = eqn(filtered_signals[i], wavelength[i], time_vec, ph[i], amp[i])
        equations.append(temp + filtered_signals_mean)
    return equations
filtered_signals_mean = np.abs(np.mean(filtered_signals))
equations = find_equations(filtered_signals_mean, sig_fft, wavelength,
filtered_signals, time_vec, ph, amp)
# at this point each equation, for each frequency should match identically each signal from each frequency,
# however, the phase seems wrong and they do not match!!??
axs[0].plot(time_vec, filtered_signal, '--', linewidth=3, label='filtered_sig_combined')
axs[1].plot(time_vec, filtered_signals[1], label='filtered_sig[-1]')
axs[1].plot(time_vec, equations[1], label='equations[-1]')
axs[0].legend()
axs[1].legend()
fig.tight_layout()
plt.show()
These are issues with your code:
filtered_signal = fftpack.ifft(sig_fft)
filtered_signal += np.mean(created_signal)
This only works because np.mean(created_signal) is approximately zero. The IFFT already takes the DC component into account, the zero frequency describes the mean of the signal.
filtered_signals = []
for i in range(len(sample_freq)):
    high_freq_fft = sig_fft.copy()
    high_freq_fft[np.abs(sample_freq) < np.nanmin(sample_freq[i])] = 0
    high_freq_fft[np.abs(sample_freq) > np.nanmax(sample_freq[i])] = 0
    filtered_sig = fftpack.ifft(high_freq_fft)
    filtered_sig += np.mean(created_signal)
    filtered_signals.append(filtered_sig)
Here you are, in the first half of the iterations, going through all the frequencies, taking both the negative and positive frequencies into account. For example, when i=1, you take both the -0.1 and the 0.1 frequencies. In the second half of the iterations sample_freq[i] is negative, so np.abs(sample_freq) > sample_freq[i] holds for every component, everything is set to zero, and you are applying the IFFT to a zero signal.
So the filtered_signals[1] contains a sine wave constructed by both the -0.1 and the 0.1 frequency components. This is good. Otherwise it would be a complex-valued function.
for j in range(len(sample_freq)):
    wavelength.append(1 / sample_freq[j])
    indices.append(int(sig_size * sample_freq[j]))
Here the second half of the indices array contains negative values. Not sure what you were planning with this, but it causes subsequent code to index from the end of the array.
for j in indices:
    phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
    ph.append([phase])
    amp.append([np.sqrt((sig_fft[j].real * sig_fft[j].real) + (sig_fft[j].imag * sig_fft[j].imag)) / (sig_size / 2)])
Here, because the indices are not the same as the j in the previous loop, ph[j] doesn't always correspond to wavelength[j]; they refer to values from different frequency components in about half the cases. But we shouldn't be evaluating those cases anyway. The code assumes a real-valued input, for which the magnitudes and phases of only the positive frequencies are sufficient to reconstruct the signal. You should skip all the negative frequencies here.
Next, you build sine waves using the collected information, but using a time_vec that starts at 1, not at 0 as the FFT assumes. And therefore the signal is shifted with respect to the expected value. Furthermore, when phase==0, you should create an even signal (i.e. a cosine, not a sine).
Thus, changing the following two lines of code will create the correct output:
time_vec = np.arange(0, 10, 1)
and
def eqn(filtered_si, wavelength, time_vec, phase, amp):
    return amp * np.cos((2 * np.pi / wavelength * time_vec) + phase)
#                   ^^^
Note that these two changes correct the plotted graph, but they don't correct all the issues in the code discussed above.
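For what it's worth, here is a minimal sketch of the whole decomposition done only on the non-negative frequencies, assuming a real-valued signal sampled at t = 0, 1, ..., N-1. It uses numpy.fft.rfft instead of scipy.fftpack, which is my substitution, and the test signal (two sinusoids plus an offset) is made up for illustration. Every component is rebuilt as a cosine, because the phase returned by the FFT is measured relative to a cosine that starts at t = 0.

```python
import numpy as np

N = 10
t = np.arange(N)                       # time starts at 0, as the FFT assumes
signal = (10 * np.sin(2 * np.pi * t / 10)
          + 3 * np.cos(2 * np.pi * 3 * t / 10 + 0.4)
          + 1.5)                       # made-up test signal

spectrum = np.fft.rfft(signal)         # non-negative frequencies only
freqs = np.fft.rfftfreq(N, d=1.0)      # cycles per sample

# Each positive-frequency bin stands for itself and its negative twin,
# hence the factor 2. The DC bin (and the Nyquist bin when N is even)
# has no twin, so it is not doubled.
amps = 2 * np.abs(spectrum) / N
amps[0] /= 2
if N % 2 == 0:
    amps[-1] /= 2
phases = np.angle(spectrum)            # same as arctan2(imag, real)

# One cosine per frequency bin; their sum should give back the signal.
components = [a * np.cos(2 * np.pi * f * t + p)
              for a, f, p in zip(amps, freqs, phases)]
reconstructed = np.sum(components, axis=0)
```

In this sketch the 10 * sin(...) term shows up in bin 1 with amplitude 10 and phase -pi/2, which is just the statement that a sine is a cosine delayed by a quarter period.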
I solved this finally after 2 days of frustration. I still have no idea why this is the way it is so any insight would be great. The solution is to use the phase produced by arctan2(Im, Re) and modify it according to this equation.
phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
formula = ((((wavelength[j]) / 2) - 2) * np.pi) / wavelength[j]
ph.append([phase + formula])
I had to derive this equation from data but I still do not know why this works. Please let me know. Finally!!
I've constructed a figure containing 3 different plots representing a position of 3 different masses over some time range. I want to find the period of each. I'm not familiar with the FFT function that I've come across while searching for ways to find the period online. How do I go about this?
Below is the kernel for the plot and the figure; I won't include the code used to build all these variables as it would be quite extensive.
I know I could just start drawing vertical lines and then estimate by eye, but I'd much rather do it in code than with that method.
#using the times in days
T9 = 500
dt9 = 0.5
num9 = int(T9/dt9) # np.linspace needs an integer number of samples
times9 = np.linspace(0, T9, num9)
xpos_q9_m1_AU_new = xpos_q9_m1_AU[:-1]
xpos_q9_m2_AU_new = xpos_q9_m2_AU[:-1]
xpos_q9_m3_AU_new = xpos_q9_m3_AU[:-1]
plt.plot(times9, xpos_q9_m1_AU_new)
plt.plot(times9, xpos_q9_m2_AU_new)
plt.plot(times9, xpos_q9_m3_AU_new)
plt.xlabel('Time (days)')
plt.ylabel('X Positions (AU)')
plt.title('X Position of the Kepler 16 System over Time')
plt.legend(['Body 1', 'Body 2', 'Body 3'])
plt.savefig('q9_plot.png');
What you're looking for is a Fourier Transform. This function determines what frequencies make up a wave. Scipy has a module that does this for you nicely:
import numpy as np
from scipy.fft import fft
# Number of sample points
N = 600
# sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N * T, N)
y = np.sin(50.0 * 2.0 * np.pi * x)
yf = fft(y)
xf = np.linspace(0.0, 1.0 / (2.0 * T), N // 2)
import matplotlib.pyplot as plt
plt.plot(xf, 2.0 / N * np.abs(yf[0:N // 2]))
plt.grid()
plt.show()
This gives you a graph with a peak near the frequency of the signal (50 in this example). The period is the reciprocal of that frequency.
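To go from the spectrum to an actual number for the period, one option is to take the frequency of the largest peak and invert it. The sketch below assumes evenly spaced samples with dt = 0.5 days, as in the question; the position array is a made-up stand-in for one of the xpos_q9_..._AU arrays, and true_period is only there so the result can be checked.

```python
import numpy as np

dt = 0.5                              # sample spacing in days
times = np.arange(1000) * dt          # 1000 evenly spaced samples, 500 days in total
true_period = 50.0                    # made-up period for the stand-in signal
position = 0.3 + 1.2 * np.sin(2 * np.pi * times / true_period + 0.7)

# Remove the mean so the zero-frequency bin does not dominate the spectrum.
spectrum = np.abs(np.fft.rfft(position - position.mean()))
freqs = np.fft.rfftfreq(position.size, d=dt)   # cycles per day

# Skip bin 0 (zero frequency), take the strongest remaining peak and invert it.
peak = np.argmax(spectrum[1:]) + 1
estimated_period = 1.0 / freqs[peak]           # in days
```

The frequency bins are spaced 1 / (N * dt) apart, so for a period that does not fit a whole number of times into the record the estimate is only as fine as that spacing allows; a longer record gives a finer estimate.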
I am using pylab to plot a graph in Python (example code shown below). The plot appears correctly, however, I can not find a way of removing the coloured axis join lines (shown on the graph below), which make the graph fairly unsightly.
I have searched the forums and not found a similar question, so any help would be appreciated.
Thank you
Code extract used for plot:
Code based on example given here: http://code.activestate.com/recipes/578256-script-that-compares-various-interest-rate-term-st/
from pylab import plot, title, xlabel, ylabel, show
r0 = 0.5 # current UK funding rate
b = 2.0 # 1 % long term interest rate
a = 0.1#speed of reversion
beta = 0.2#SD
n = 1 # number of simulation trials
T = 15. # time of projection
m = 15. # subintervals
dt = T/m # difference in time each subinterval
r = np.zeros(shape=(n, m), dtype=float) # matrix to hold short rate paths
#loop used to simulate interest rates and plot points
for i in np.arange(1,m):
    r[j,i] = r[j,i-1] + a*(b-r[j,i-1])*dt + beta*sqrt(dt)*standard_normal();
    plot(np.arange(0, T, dt), r[j],linestyle='--')
show()
If I understand correctly, you are just plotting all the lines for j index.
What you want is probably just r[0,:] for the first simulation. If so, after the i and j for-loops, do this:
figure() # create a new figure canvas
plot(np.arange(0, T, dt), r[0,:], linestyle='--')
Does this solve the problem?
(edit)
Then, probably the problem is that what you need is in the intermediate results. I took simply the max of intermediate result and plotted it as thicker line.
from pylab import *
r0 = 0.5 # current UK funding rate
b = 2.0 # 1 % long term interest rate
a = 0.1#speed of reversion
beta = 0.2#SD
n = 1 # number of simulation trials
T = 15. # time of projection
m = 15 # subintervals (an integer, since it is used as an array size and a loop bound)
dt = T/m # difference in time each subinterval
r = np.zeros(shape=(n, m), dtype=float) # matrix to hold short rate paths
temp = [] # to save intermediate results
j = 0
clf()
x = np.arange(0, T, dt)
#loop used to simulate interest rates and plot points
for i in np.arange(1,m):
    r[j,i] = r[j,i-1] + a*(b-r[j,i-1])*dt + beta*sqrt(dt)*standard_normal()
    temp.append(r[j,:])
    plot(x, r[j,:],linestyle='--')
results = np.array(temp)
plot( x, results.max(axis=0), linewidth=2 )
(edit2)
actually, just the final result is the same thing as max. so
plot(x, results[-1,:])
is enough...
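Put differently, the extra lines go away if the whole path is simulated first and plotted once afterwards. A minimal sketch of that, using the same update rule and parameter values as above; the function name simulate_short_rate, the seed argument, and starting the path at r0 are my own assumptions:

```python
import numpy as np

def simulate_short_rate(r0=0.5, b=2.0, a=0.1, beta=0.2, T=15.0, m=15, seed=None):
    """Simulate one short-rate path; nothing is plotted inside the loop."""
    rng = np.random.default_rng(seed)
    dt = T / m                        # length of each subinterval
    r = np.empty(m)
    r[0] = r0                         # assumption: the path starts at the current rate
    for i in range(1, m):
        r[i] = r[i - 1] + a * (b - r[i - 1]) * dt + beta * np.sqrt(dt) * rng.standard_normal()
    return np.arange(m) * dt, r

times, path = simulate_short_rate(seed=1)
```

A single call such as plot(times, path, linestyle='--') after this then draws exactly one line per simulated path.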
I am trying to take the FFT and plot it. Problem is, my code works for small frequencies (like 50) but doesn't work for the bigger frequencies I need. What is going on with my code?! I expect to see a spike at the frequency of the sine wave I input, but the spike is at different frequencies depending on the sample spacing I use.
bins = 600
ss = 2048
freq = 44100
centerfreq = freq*bins/ss
# Number of samplepoints
N = ss
# sample spacing
T = 1 / 800.
x = np.linspace(0.0, N*T, N)
y = sin(2*np.pi*centerfreq*x)
yf = fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N/2)
plt.plot(xf, 2.0/N * np.abs(yf[0:N/2]), 'r')
The code is right; you need to brush up on your Fourier theory and the Nyquist sampling theorem and make sure the numbers make sense. The problem is with your x-axis scale. The plot function pairs the first item in x with the first item in y, so if x is not scaled to your expectations, you are in for a surprise. You see the same thing if you plot a sine wave expecting degrees and get radians, for instance. It's your duty to scale the axis so that it lines up with your expectation.
Refer to this SO answer https://stackoverflow.com/a/25735436/2061422.
import numpy as np
import matplotlib.pyplot as plt
from numpy.fft import fft # explicit imports for me to get going
bins = 600
ss = 2048
freq = 44100
centerfreq = freq*bins/ss
print(centerfreq)
# Number of samplepoints
N = ss
# sample spacing
T = 1. / freq # i have decreased the spacing considerably
x = np.linspace(0.0, N*T, N)
sample_spacing = x[1] - x[0] # but this is the real sample spacing
y = np.sin(2*np.pi*centerfreq*x)
yf = fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
freqs = np.fft.fftfreq(len(y), sample_spacing) # read the manual on this fella.
plt.plot(freqs[:N//2], 1.0/N * np.abs(yf[0:N//2]), 'r')
plt.grid()
plt.show()
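One way to see why the spike moves with the sample spacing, as a sketch rather than a definitive diagnosis: a sine at centerfreq can only show up at its own frequency if the sampling rate is more than twice that frequency. Sampled at 44100 Hz it does; sampled at 800 Hz, as in the question, it folds down to a much lower frequency (aliasing). The helper peak_frequency and the use of np.arange for exactly spaced samples are my own choices.

```python
import numpy as np

bins, ss, fs = 600, 2048, 44100
centerfreq = fs * bins / ss           # frequency of the input sine, about 12.9 kHz
N = ss

def peak_frequency(sample_rate):
    """Frequency (in Hz) of the largest FFT peak of the sine sampled at sample_rate."""
    t = np.arange(N) / sample_rate    # sample times with spacing exactly 1 / sample_rate
    y = np.sin(2 * np.pi * centerfreq * t)
    freqs = np.fft.rfftfreq(N, d=1.0 / sample_rate)
    return freqs[np.argmax(np.abs(np.fft.rfft(y)))]

peak_ok = peak_frequency(fs)          # sampled fast enough
peak_aliased = peak_frequency(800.0)  # sampled far too slowly
```

With these numbers peak_ok lands on centerfreq, while peak_aliased comes out near 120 Hz, which is centerfreq minus 16 times the 800 Hz sampling rate.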
I have an application that requires a disk populated with 'n' points in a quasi-random fashion. I want the points to be somewhat random, but still have a more or less regular density over the disk.
My current method is to place a point, check if it's inside the disk, and then check if it is also far enough away from all other points already kept. My code is below:
import os
import random
import math
# ------------------------------------------------ #
# geometric constants
center_x = -1188.2
center_y = -576.9
center_z = -3638.3
disk_distance = 2.0*5465.6
disk_diam = 5465.6
# ------------------------------------------------ #
pts_per_disk = 256
closeness_criteria = 200.0
min_closeness_criteria = disk_diam/closeness_criteria
disk_center = [(center_x-disk_distance),center_y,center_z]
pts_in_disk = []
while len(pts_in_disk) < (pts_per_disk):
    potential_pt_x = disk_center[0]
    potential_pt_dy = random.uniform(-disk_diam/2.0, disk_diam/2.0)
    potential_pt_y = disk_center[1]+potential_pt_dy
    potential_pt_dz = random.uniform(-disk_diam/2.0, disk_diam/2.0)
    potential_pt_z = disk_center[2]+potential_pt_dz
    potential_pt_rad = math.sqrt((potential_pt_dy)**2+(potential_pt_dz)**2)
    if potential_pt_rad < (disk_diam/2.0):
        far_enough_away = True
        for pt in pts_in_disk:
            if math.sqrt((potential_pt_x - pt[0])**2+(potential_pt_y - pt[1])**2+(potential_pt_z - pt[2])**2) > min_closeness_criteria:
                pass
            else:
                far_enough_away = False
                break
        if far_enough_away:
            pts_in_disk.append([potential_pt_x,potential_pt_y,potential_pt_z])
outfile_name = "pt_locs_x_lo_"+str(pts_per_disk)+"_pts.txt"
outfile = open(outfile_name,'w')
for pt in pts_in_disk:
    outfile.write(" ".join([("%.5f" % (pt[0]/1000.0)),("%.5f" % (pt[1]/1000.0)),("%.5f" % (pt[2]/1000.0))])+'\n')
outfile.close()
In order to get the most even point density, what I do is basically iteratively run this script using another script, with the 'closeness' criteria reduced for each successive iteration. At some point, the script can not finish, and I just use the points of the last successful iteration.
So my question is rather broad: is there a better way to do this? My method is ok for now, but my gut says that there is a better way to generate such a field of points.
An illustration of the output is graphed below, one with a high closeness criteria, and another with a 'lowest found' closeness criteria (what I want).
A simple solution based on Disk Point Picking from MathWorld:
import numpy as np
import matplotlib.pyplot as plt
n = 1000
r = np.random.uniform(low=0, high=1, size=n) # radius
theta = np.random.uniform(low=0, high=2*np.pi, size=n) # angle
x = np.sqrt(r) * np.cos(theta)
y = np.sqrt(r) * np.sin(theta)
# for plotting circle line:
a = np.linspace(0, 2*np.pi, 500)
cx,cy = np.cos(a), np.sin(a)
fg, ax = plt.subplots(1, 1)
ax.plot(cx, cy,'-', alpha=.5) # draw unit circle line
ax.plot(x, y, '.') # plot random points
ax.axis('equal')
ax.grid(True)
fg.canvas.draw()
plt.show()
It gives.
Alternatively, you also could create a regular grid and distort it randomly:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
n = 20
tt = np.linspace(-1, 1, n)
xx, yy = np.meshgrid(tt, tt) # create unit square grid
s_x, s_y = xx.ravel(), yy.ravel()
ii = np.argwhere(s_x**2 + s_y**2 <= 1).ravel() # mask off unwanted points
x, y = s_x[ii], s_y[ii]
triang = tri.Triangulation(x, y) # create triangluar grid
# distort the grid
g = .5 # distortion factor
rx = x + np.random.uniform(low=-g/n, high=g/n, size=x.shape)
ry = y + np.random.uniform(low=-g/n, high=g/n, size=y.shape)
rtri = tri.Triangulation(rx, ry, triang.triangles) # distorted grid
# for circle:
a = np.linspace(0, 2*np.pi, 500)
cx,cy = np.cos(a), np.sin(a)
fg, ax = plt.subplots(1, 1)
ax.plot(cx, cy,'k-', alpha=.2) # circle line
ax.triplot(triang, "g-", alpha=.4)
ax.triplot(rtri, 'b-', alpha=.5)
ax.axis('equal')
ax.grid(True)
fg.canvas.draw()
plt.show()
It gives
The triangles are just there for visualization. The obvious disadvantage is that depending on your choice of grid, either in the middle or on the borders (as shown here), there will be more or less large "holes" due to the grid discretization.
If you have a defined area like a disc (circle) that you wish to generate random points within you are better off using an equation for a circle and limiting on the radius:
x^2 + y^2 = r^2 (0 < r < R)
or parametrized to two variables
cos(a) = x/r
sin(a) = y/r
sin^2(a) + cos^2(a) = 1
To generate something like the pseudo-random distribution with low density you should take the following approach:
For randomly distributed ranges of r and a choose n points.
This allows you to generate your distribution to roughly meet your density criteria.
To understand why this works, imagine your circle first divided into small rings of width dr, then divided into pie slices of angle da. Your randomness now has equal probability over the whole boxed area around the circle. If you divide the areas of allowed randomness throughout your circle you will get a more even distribution around the overall circle and small random variation within the individual areas, giving you the pseudo-random look and feel you are after.
Now your job is just to generate n points for each given area. You will want n to be dependent on r, as the area of each division changes as you move out from the centre of the circle. You can proportion this to the exact change in area each space brings:
for the n-th to n+1-th ring:
d(Area,n,n-1) = Area(n) - Area(n-1)
The area of any given ring is:
Area = pi*(dr*n)^2 - pi*(dr*(n-1))^2
So the difference becomes:
d(Area,n,n-1) = [pi*(dr*n)^2 - pi*(dr*(n-1))^2] - [pi*(dr*(n-1))^2 - pi*(dr*(n-2))^2]
d(Area,n,n-1) = pi*[(dr*n)^2 - 2*(dr*(n-1))^2 + (dr*(n-2))^2]
You could expound this to gain some insight on how much n should increase but it may be faster to just guess at some percentage increase (30%) or something.
The example I have provided is a small subset and decreasing da and dr will dramatically improve your results.
Here is some rough code for generating such points:
import random
import math
R = 10.
n_rings = 10.
n_angles = 10.
dr = R/n_rings
da = 2*math.pi/n_angles
base_points_per_division = 3
increase_per_level = 1.1
points = []
ring = 0
while ring < n_rings:
    angle = 0
    while angle < n_angles:
        for i in range(int(base_points_per_division)):
            ra = angle*da + da*random.random() # random angle inside this slice
            rr = ring*dr + dr*random.random() # random radius inside this ring
            x = rr*math.cos(ra)
            y = rr*math.sin(ra)
            points.append((x,y))
        angle += 1
    base_points_per_division = base_points_per_division*increase_per_level
    ring += 1
I tested it with the parameters:
n_rings = 20
n_angles = 20
base_points = .9
increase_per_level = 1.1
And got the following results:
It looks more dense than your provided image, but I imagine further tweaking of those variables could be beneficial.
You can add an additional part to scale the density properly by calculating the number of points per ring.
points_per_ring = density*math.pi*(dr**2)*(2*n+1)
points_per_division = points_per_ring/n_angles
This will provide an even better scaled distribution.
density = .03
points = []
ring = 0
while ring < n_rings:
    angle = 0
    base_points_per_division = density*math.pi*(dr**2)*(2*ring+1)/n_angles
    while angle < n_angles:
        for i in range(int(base_points_per_division)):
            ra = angle*da + min(da,da*random.random())
            rr = ring*dr + dr*random.random()
            x = rr*math.cos(ra)
            y = rr*math.sin(ra)
            points.append((x,y))
        angle += 1
    ring += 1
Giving better results using the following parameters
R = 1.
n_rings = 10.
n_angles = 10.
density = 10/(dr*da) # ~ ten points per unit area
With a graph...
And for fun you can graph the divisions to see how well they match your distribution, and adjust.
Depending on how random the points need to be, it may be simple enough to just make a grid of points within the disk, and then displace each point by some small but random amount.
It may be that you want more randomness, but if you just want to fill your disc with an even-looking distribution of points that aren't on an obvious grid, you could try a spiral with a random phase.
import math
import random
import pylab
n = 300
alpha = math.pi * (3 - math.sqrt(5)) # the "golden angle"
phase = random.random() * 2 * math.pi
points = []
for k in range(n):
    theta = k * alpha + phase
    r = math.sqrt(float(k)/n)
    points.append((r * math.cos(theta), r * math.sin(theta)))
pylab.scatter(*zip(*points))
pylab.show()
Probability theory ensures that the rejection method is an appropriate method
to generate uniformly distributed points within the disk, D(0,r), centered at origin and of radius r. Namely, one generates points within the square [-r,r] x [-r,r], until a point falls within the disk:
do{
    generate P in [-r,r]x[-r,r];
}while(P[0]**2+P[1]**2>r**2);
return P;
unif_rnd_disk is a generator function implementing this rejection method:
import matplotlib.pyplot as plt
import numpy as np
import itertools
def unif_rnd_disk(r=1.0):
    pt=np.zeros(2)
    while True:
        yield pt
        while True:
            pt=-r+2*r*np.random.random(2)
            if (pt[0]**2+pt[1]**2<=r**2): # compare squared distance with the squared radius
                break
G=unif_rnd_disk()# generator of points in disk D(0,r=1)
X,Y=zip(*[pt for pt in itertools.islice(G, 1, 1000)])
plt.scatter(X, Y, color='r', s=3)
plt.axis('equal')
If we want to generate points in a disk centered at C(a,b), we have to apply a translation to the points in the disk D(0,r):
C=[2.0, -3.5]
plt.scatter(C[0]+np.array(X), C[1]+np.array(Y), color='r', s=3)
plt.axis('equal')
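The same rejection idea can also be written in batches rather than one point at a time. This is only a sketch: the function name uniform_points_in_disk, the batch size of 2 * n, and the example radius and centre are my own choices. Candidates are drawn in the bounding square, those with squared distance at most r**2 are kept, and the translation to the centre is applied at the end.

```python
import numpy as np

def uniform_points_in_disk(n, r=1.0, center=(0.0, 0.0), rng=None):
    """Return n points uniformly distributed in the disk of radius r around center."""
    rng = np.random.default_rng() if rng is None else rng
    kept = np.empty((0, 2))
    while len(kept) < n:
        # Draw a batch in the square [-r, r] x [-r, r] ...
        candidates = rng.uniform(-r, r, size=(2 * n, 2))
        # ... and keep only the points that fall inside the disk.
        inside = (candidates ** 2).sum(axis=1) <= r ** 2
        kept = np.vstack([kept, candidates[inside]])
    return kept[:n] + np.asarray(center)   # translate to the requested centre

points = uniform_points_in_disk(1000, r=2.0, center=(2.0, -3.5),
                                rng=np.random.default_rng(0))
```

About pi/4 (roughly 79%) of the candidates fall inside the disk, so one batch of 2 * n is usually enough. Note that this gives uniformly random points only; it does not enforce any minimum spacing between them.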