Let's say that I have the following data (measurements):
As you can see, there are a lot of sharp points (i.e. where the slope changes a lot). It would therefore be good to take some more measurements around those points. To do that I wrote a script:
I calculate the curvature of 3 consecutive points:
Menger curvature: https://en.wikipedia.org/wiki/Menger_curvature#Definition
Then I decide which values I should resample, based on the curvature.
...and I iterate until the average curvature goes down... but it does not work, because it goes up. Do you know why?
Here is the complete code (I stop it once the length of the x values reaches 60):
import numpy as np
import matplotlib.pyplot as plt
def curvature(A, B, C):
    """Calculates the Menger curvature for three points, given as numpy arrays.
    Sources:
    Menger curvature: https://en.wikipedia.org/wiki/Menger_curvature#Definition
    Area of a triangle given 3 points: https://math.stackexchange.com/questions/516219/finding-out-the-area-of-a-triangle-if-the-coordinates-of-the-three-vertices-are
    """
    # Pre-check: make sure the input points are all numpy arrays
    if not all(isinstance(p, np.ndarray) for p in (A, B, C)):
        print("The input points need to be numpy arrays, currently it is a ", type(A))
    # Augment columns
    A_aug = np.append(A, 1)
    B_aug = np.append(B, 1)
    C_aug = np.append(C, 1)
    # Calculate the (signed) area of the triangle
    matrix = np.column_stack((A_aug, B_aug, C_aug))
    area = 1/2 * np.linalg.det(matrix)
    # Special case: two or more points are equal
    if np.all(A == B) or np.all(B == C):
        curvature = 0
    else:
        curvature = 4*area / (np.linalg.norm(A-B) * np.linalg.norm(B-C) * np.linalg.norm(C-A))
    # Return Menger curvature
    return curvature
def values_to_calculate(x, curvature_list, max_curvature):
    """Calculates the new x values which need to be sampled:
    a point between the three points that were used to calculate the curvature."""
    i = 0
    new_x = np.empty(0)
    for curvature in curvature_list:
        if curvature > max_curvature:
            new_x = np.append(new_x, x[i] + (x[i+2]-x[i])/3)
        i = i + 1
    return new_x
def plot(x, y, title, xLabel, yLabel):
    """Just to visualize"""
    # Plot the points and connect them
    plt.scatter(x, y)
    plt.plot(x, y, '-o')
    # Title and axis labels
    plt.title(title)
    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.grid(True, which='both')
    plt.axhline(y=0, color='k')
    # Display the figure without blocking, so the loop can keep updating it
    plt.show(block=False)
    plt.pause(0.05)
### STARTS HERE
# Get x values of the initial measurements
x = np.arange(0, 10, 1)
# The function that simulates the measurements
def function(x):
    return 1 + np.sin(x)*np.cos(x)**2
y = function(x)
# Plot it
plot(x, y, title='Data', xLabel='Time', yLabel='Amplitude')
continue_Loop = True
while continue_Loop:
    # Calculate the curvature for every triple of consecutive points
    curvature_list = np.empty(0)
    for i in range(len(x)-2):
        # Get the three points
        A = np.array([x[i], y[i]])
        B = np.array([x[i+1], y[i+1]])
        C = np.array([x[i+2], y[i+2]])
        # Calculate the curvature
        curvature_value = abs(curvature(A, B, C))
        curvature_list = np.append(curvature_list, curvature_value)
    print("len: ", len(x))
    print("average curvature: ", np.average(curvature_list))
    # Calculate the points that need to be added
    x_new = values_to_calculate(x, curvature_list, max_curvature=0.3)
    # Add those values to the current x list:
    x = np.sort(np.append(x, x_new))
    # STOPPED IT AFTER len(x) == 60
    if len(x) >= 60:
        continue_Loop = False
    # Evaluate the function at the new sample points
    y = function(x)
    # Plot it
    plot(x, y, title='Data', xLabel='Time', yLabel='Amplitude')
This is how it should look:
EDIT:
If you let it run even further... :
To summarize my comments above:
you are computing the average curvature of your curve, which has no reason to go to 0: at every point, no matter how close your samples get, the Menger curvature converges to the true curvature of the curve at that point, not to 0.
an alternative would be to use the absolute change of the derivative between two points: keep sampling until abs(d(df/dx)) < some_threshold, where d(df/dx) = (df/dx)[n] - (df/dx)[n-1]
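A minimal sketch of that derivative-change criterion (the threshold value, the cap on the number of points, and inserting the midpoint of each flagged span are assumptions, not part of the original code):
import numpy as np
def function(x):
    return 1 + np.sin(x)*np.cos(x)**2
x = np.arange(0, 10, 1.0)
threshold = 0.2  # hypothetical stopping threshold
while len(x) < 200:
    y = function(x)
    dfdx = np.diff(y) / np.diff(x)  # slope of each segment
    ddf = np.abs(np.diff(dfdx))  # change of slope between adjacent segments
    refine = np.where(ddf > threshold)[0]  # three-point spans that need more samples
    if len(refine) == 0:
        break
    # insert a new sample in the middle of each flagged span (np.unique also sorts)
    x = np.unique(np.append(x, x[refine] + (x[refine + 2] - x[refine]) / 2))
For a smooth function the slope changes shrink as the sampling gets denser, so abs(d(df/dx)) does fall below the threshold and this loop terminates on its own.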
I have several data points in 3 dimensional space (x, y, z) and have interpolated them using scipy.interpolate.Rbf. This gives me a spline nicely representing the surface of my 3D object. I would now like to determine several x and y pairs that have the same, arbitrary z value. I would like to do that in order to compute the cross section of my 3D object at any given value of z. Does someone know how to do that? Maybe there is also a better way to do that instead of using scipy.interpolate.Rbf.
Up to now I have evaluated the cross sections by making a contour plot using matplotlib.pyplot and extracting the displayed segments.
[Figure: 3D points and interpolated spline]
[Figure: segments extracted using a contour plot]
I was able to solve the problem. I calculated the area by triangulating the x-y data and cutting the triangles with the z-plane I wanted to compute the cross-sectional area of (z = z0). Specifically, I searched for those triangles whose z-values lie both above and below z0. Then I calculated the x and y values at which the sides of these triangles cross z0. Then I use scipy.spatial.ConvexHull to sort the intersected points. Using the shoelace formula I can then determine the area.
I have attached the example code here:
import numpy as np
from scipy import spatial
import matplotlib.pyplot as plt
# Generation of random test data
n = 500
x = np.random.random(n)
y = np.random.random(n)
z = np.exp(-2*(x-.5)**2-4*(y-.5)**2)
z0 = .75
# Triangulation of the test data
triang = spatial.Delaunay(np.array([x, y]).T)
# Determine all triangles where not all points are above or below z0, i.e. the triangles that intersect z0
tri_inter = np.zeros_like(triang.simplices, dtype=int)  # The triangles which intersect the plane at z0, filled below
i = 0
for tri in triang.simplices:
    if ~np.all(z[tri] > z0) and ~np.all(z[tri] < z0):
        tri_inter[i, :] = tri
        i += 1
tri_inter = tri_inter[~np.all(tri_inter == 0, axis=1)]  # Remove all rows with only 0
# The number of interpolated values for x and y has twice the length of the triangles
# Because each triangle intersects the plane at z0 twice
x_inter = np.zeros(tri_inter.shape[0]*2)
y_inter = np.zeros(tri_inter.shape[0]*2)
for j, tri in enumerate(tri_inter):
    # Determine which of the three points are above and which are below z0
    points_above = []
    points_below = []
    for i in tri:
        if z[i] > z0:
            points_above.append(i)
        else:
            points_below.append(i)
    # Calculate the intersections and put the values into x_inter and y_inter
    t = (z0 - z[points_below[0]]) / (z[points_above[0]] - z[points_below[0]])
    x_new = t * (x[points_above[0]] - x[points_below[0]]) + x[points_below[0]]
    y_new = t * (y[points_above[0]] - y[points_below[0]]) + y[points_below[0]]
    x_inter[j*2] = x_new
    y_inter[j*2] = y_new
    if len(points_above) > len(points_below):
        t = (z0 - z[points_below[0]]) / (z[points_above[1]] - z[points_below[0]])
        x_new = t * (x[points_above[1]] - x[points_below[0]]) + x[points_below[0]]
        y_new = t * (y[points_above[1]] - y[points_below[0]]) + y[points_below[0]]
    else:
        t = (z0 - z[points_below[1]]) / (z[points_above[0]] - z[points_below[1]])
        x_new = t * (x[points_above[0]] - x[points_below[1]]) + x[points_below[1]]
        y_new = t * (y[points_above[0]] - y[points_below[1]]) + y[points_below[1]]
    x_inter[j*2+1] = x_new
    y_inter[j*2+1] = y_new
# sort points to calculate area
hull = spatial.ConvexHull(np.array([x_inter, y_inter]).T)
x_hull, y_hull = x_inter[hull.vertices], y_inter[hull.vertices]
# Calculation of the area using the shoelace formula
area = 0.5*np.abs(np.dot(x_hull,np.roll(y_hull,1))-np.dot(y_hull,np.roll(x_hull,1)))
print('Area:', area)
plt.figure()
plt.plot(x_inter, y_inter, 'ro')
plt.plot(x_hull, y_hull, 'b--')
plt.triplot(x, y, triangles=tri_inter, color='k')
plt.show()
Description:
In the following Python code, I am producing a Gaussian PDF, namely p(y). I am trying to find the area confined between the curve and any horizontal line in the range [min_p, max_p] through the method of rectangular summation. My main problem is in the implementation of the function that is supposed to calculate this area iteratively for any particular element of the p array (as defined in the code) and plot it as a function of the given value in the above range.
Issue:
The code runs without errors, but it does not produce the desired monotonically increasing/decreasing function.
import matplotlib
from scipy.stats import norm
from scipy import integrate
import matplotlib.pyplot as pyplot
import numpy as np
N = 10 # Number of sigmas away from central value
M, K = 2**10, 2**10 # Number of grid points along y and p(y)
mean, sigma = 10.0, 1.0  # Mean value and standard deviation of a Gaussian probability distribution (PDF)
ymin, ymax = -N*sigma+mean, N*sigma+mean  # Minimum and maximum of the grid along the y-axis
ylims = [ymin, ymax]
y = np.linspace(ylims[0], ylims[1], M)  # The values of the y-axis on grid points
pdf = norm.pdf(y, loc=mean, scale=sigma)  # Definition of the normalized probability distribution function (PDF)
min_p, max_p = min(pdf), max(pdf)  # The minimum and maximum values of p(y)
p = np.linspace(min_p, max_p, K) #The values of p(y)-axis on grid points
def Area(p):  # Calculating the area under the PDF for which the probability is more than the p_j-value, for a particular jth index in the p array above
    Areas = []
    for yi in y.flat:
        for p_j in p.flat:
            Area = 0.0
            delta_y = (ymax - ymin) / (M-1)  # The spacing between grid points along the y-axis
            if (pdf[yi] > p_j): Area += (delta_y * pdf.sum())
            else: Area += 0.0
            Areas.append(Area)
    return Areas
pyplot.plot(p, Area(p), '-')
pyplot.axis([min_p, max_p, 0, 1])
pyplot.show()
After making quite a few changes here is the code I think you are trying to produce:
from scipy.stats import norm
import numpy as np
import pylab as p
%matplotlib inline
N = 10 # Number of sigmas away from central value
M, K = 2**10, 2**10 # Number of grid points along x and y
mean, sigma = 10.0, 1.0  # Mean value and standard deviation of a Gaussian probability distribution (PDF)
xmin, xmax = -N*sigma+mean, N*sigma+mean  # Minimum and maximum of the grid along the x-axis
xx = np.linspace(xmin, xmax, M)  # The values of the x-axis on grid points
pdf = norm.pdf(xx, loc=mean, scale=sigma)  # Definition of the normalized probability distribution function (PDF)
min_p, max_p = min(pdf), max(pdf)
pp = np.linspace(min_p, max_p, K)  # The values of the y-axis on grid points
delta_x = xx[1]-xx[0]  # spacing between grid points along the x-axis
delta_p = pp[1]-pp[0] # spacing between grid points on y-axis
def Area(p):  # Area between the PDF and horizontal lines, summed as boxes of delta_x * delta_p
    areas = []
    area = 0
    for i in range(M):  # M grid points along x (nx was undefined in the original)
        for pi in pp:
            if (pi < pdf[i]):
                area += delta_x * delta_p
        areas.append(area)
    return areas
p.subplot(121)
p.plot(xx,pdf)
p.subplot(122)
p.plot(xx, Area(p), '-')
It shows the probability density function and its integral.
The thing is that you don't need to actually add up little boxes of delta_x * delta_p. For each step in x (I changed your variables for clarity) you just need to add the rectangle delta_x * pdf[i].
from scipy.stats import norm
import numpy as np
import pylab as p
%matplotlib inline
N = 10 # Number of sigmas away from central value
M, K = 2**10, 2**10 # Number of grid points along x and y
mean, sigma = 10.0, 1.0  # Mean value and standard deviation of a Gaussian probability distribution (PDF)
xmin, xmax = -N*sigma+mean, N*sigma+mean  # Minimum and maximum of the grid along the x-axis
xx = np.linspace(xmin, xmax, M)  # The values of the x-axis on grid points
pdf = norm.pdf(xx, loc=mean, scale=sigma)  # Definition of the normalized probability distribution function (PDF)
delta_x = xx[1]-xx[0]  # spacing between grid points along the x-axis
def Area(p):  # Cumulative integral of the PDF, one rectangle per x step
    areas = []
    area = 0
    for i in range(M):  # M grid points along x (nx was undefined in the original)
        area += delta_x * pdf[i]
        areas.append(area)
    return areas
p.subplot(121)
p.plot(xx,pdf)
p.subplot(122)
p.plot(xx, Area(p), '-')
To actually do the integral above any given line p = const, you only need to subtract that fixed value from pdf[i], and possibly add a conditional depending on what you mean by the integral (subtract the negative parts or ignore them) in your particular context.
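For example, a minimal sketch of that idea, reusing xx, pdf and delta_x from the snippet above and ignoring the negative parts:
def area_above(p_const):
    # integrate max(pdf - p_const, 0) over x with the rectangle rule
    return (np.maximum(pdf - p_const, 0) * delta_x).sum()
print(area_above(0.1))  # area enclosed between the PDF and the line p = 0.1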
linspace generates a linear space. How can I generate a grid using an arbitrary density function?
Say, I would like to have a grid from 0 to 1, with 100 grid points, and where the density of points is given by (x - 0.5)**2 - how would I create such a grid in Python?
That is, I want many grid points where the function (x - 0.5)**2 is large, and few points where the function is small. I do not want a grid that has values according to this function.
For example like this:
x = (np.linspace(0.5,1.5,100)-0.5)**2
The start and end values have to be chosen so that f(start) = 0 and f(end)=1.
In that case the following solution should work. Be sure that func is positive throughout the range...
import numpy as np
from matplotlib import pyplot as plt
def func(x):
    return (x-0.5)**2
start = 0
end = 1
npoints = 100
x = np.linspace(start,end,npoints)
fx = func(x)
# take density (or intervals) as inverse of fx
# g in [0,1] controls how much warping you want.
# g = 0: fully warped
# g = 1: linearly spaced
g = 0
density = (1+g*(fx-1))/fx
# sum the intervals to get new grid
x_density = np.cumsum(density)
# rescale to match old range
x_density -= x_density.min()
x_density/= x_density.max()
x_density *= (end-start)
x_density += start
fx_density = func(x_density)
plt.plot(x,fx,'ok',ms = 10,label = 'linear')
plt.plot(x_density,fx_density,'or',ms = 10,label = 'warped')
plt.legend(loc = 'upper center')
plt.show()
I have an application that requires a disk populated with 'n' points in a quasi-random fashion. I want the points to be somewhat random, but still have a more or less regular density over the disk.
My current method is to place a point, check if it's inside the disk, and then check if it is also far enough away from all other points already kept. My code is below:
import os
import random
import math
# ------------------------------------------------ #
# geometric constants
center_x = -1188.2
center_y = -576.9
center_z = -3638.3
disk_distance = 2.0*5465.6
disk_diam = 5465.6
# ------------------------------------------------ #
pts_per_disk = 256
closeness_criteria = 200.0
min_closeness_criteria = disk_diam/closeness_criteria
disk_center = [(center_x-disk_distance),center_y,center_z]
pts_in_disk = []
while len(pts_in_disk) < pts_per_disk:
    potential_pt_x = disk_center[0]
    potential_pt_dy = random.uniform(-disk_diam/2.0, disk_diam/2.0)
    potential_pt_y = disk_center[1] + potential_pt_dy
    potential_pt_dz = random.uniform(-disk_diam/2.0, disk_diam/2.0)
    potential_pt_z = disk_center[2] + potential_pt_dz
    potential_pt_rad = math.sqrt(potential_pt_dy**2 + potential_pt_dz**2)
    if potential_pt_rad < (disk_diam/2.0):
        far_enough_away = True
        for pt in pts_in_disk:
            if math.sqrt((potential_pt_x - pt[0])**2 + (potential_pt_y - pt[1])**2 + (potential_pt_z - pt[2])**2) > min_closeness_criteria:
                pass
            else:
                far_enough_away = False
                break
        if far_enough_away:
            pts_in_disk.append([potential_pt_x, potential_pt_y, potential_pt_z])
outfile_name = "pt_locs_x_lo_"+str(pts_per_disk)+"_pts.txt"
outfile = open(outfile_name,'w')
for pt in pts_in_disk:
    outfile.write(" ".join(["%.5f" % (pt[0]/1000.0), "%.5f" % (pt[1]/1000.0), "%.5f" % (pt[2]/1000.0)]) + '\n')
outfile.close()
In order to get the most even point density, what I do is basically iteratively run this script using another script, with the 'closeness' criterion reduced for each successive iteration. At some point, the script can no longer finish, and I just use the points of the last successful iteration.
So my question is rather broad: is there a better way to do this? My method is ok for now, but my gut says that there is a better way to generate such a field of points.
An illustration of the output is graphed below, one with a high closeness criteria, and another with a 'lowest found' closeness criteria (what I want).
A simple solution based on Disk Point Picking from MathWorld:
import numpy as np
import matplotlib.pyplot as plt
n = 1000
r = np.random.uniform(low=0, high=1, size=n) # radius
theta = np.random.uniform(low=0, high=2*np.pi, size=n) # angle
x = np.sqrt(r) * np.cos(theta)
y = np.sqrt(r) * np.sin(theta)
# for plotting circle line:
a = np.linspace(0, 2*np.pi, 500)
cx,cy = np.cos(a), np.sin(a)
fg, ax = plt.subplots(1, 1)
ax.plot(cx, cy,'-', alpha=.5) # draw unit circle line
ax.plot(x, y, '.') # plot random points
ax.axis('equal')
ax.grid(True)
fg.canvas.draw()
plt.show()
It gives:
Alternatively, you also could create a regular grid and distort it randomly:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
n = 20
tt = np.linspace(-1, 1, n)
xx, yy = np.meshgrid(tt, tt) # create unit square grid
s_x, s_y = xx.ravel(), yy.ravel()
ii = np.argwhere(s_x**2 + s_y**2 <= 1).ravel() # mask off unwanted points
x, y = s_x[ii], s_y[ii]
triang = tri.Triangulation(x, y) # create triangular grid
# distort the grid
g = .5 # distortion factor
rx = x + np.random.uniform(low=-g/n, high=g/n, size=x.shape)
ry = y + np.random.uniform(low=-g/n, high=g/n, size=y.shape)
rtri = tri.Triangulation(rx, ry, triang.triangles) # distorted grid
# for circle:
a = np.linspace(0, 2*np.pi, 500)
cx,cy = np.cos(a), np.sin(a)
fg, ax = plt.subplots(1, 1)
ax.plot(cx, cy,'k-', alpha=.2) # circle line
ax.triplot(triang, "g-", alpha=.4)
ax.triplot(rtri, 'b-', alpha=.5)
ax.axis('equal')
ax.grid(True)
fg.canvas.draw()
plt.show()
It gives:
The triangles are just there for visualization. The obvious disadvantage is that depending on your choice of grid, either in the middle or on the borders (as shown here), there will be more or less large "holes" due to the grid discretization.
If you have a defined area like a disc (circle) that you wish to generate random points within you are better off using an equation for a circle and limiting on the radius:
x^2 + y^2 = r^2 (0 < r < R)
or parametrized to two variables
cos(a) = x/r
sin(a) = y/r
sin^2(a) + cos^2(a) = 1
To generate something like the pseudo-random distribution with low density you should take the following approach:
For randomly distributed ranges of r and a choose n points.
This allows you to generate your distribution to roughly meet your density criteria.
To understand why this works, imagine your circle first divided into small rings of thickness dr, and then imagine your circle divided into pie slices of angle da. Your randomness now has equal probability over the whole boxed area around the circle. If you divide the areas of allowed randomness throughout your circle, you will get a more even distribution around the overall circle and small random variation for the individual areas, giving you the pseudo-random look and feel you are after.
Now your job is just to generate n points for each given area. You will want to have n be dependent on r, as the area of each division changes as you move out of the circle. You can proportion this to the exact change in area each division brings:
for the n-th to n+1-th ring:
d(Area,n,n-1) = Area(n) - Area(n-1)
The area of any given ring is:
Area = pi*(dr*n)^2 - pi*(dr*(n-1))^2
So the difference becomes:
d(Area,n,n-1) = [pi*(dr*n)^2 - pi*(dr*(n-1))^2] - [pi*(dr*(n-1))^2 - pi*(dr*(n-2))^2]
d(Area,n,n-1) = pi*[(dr*n)^2 - 2*(dr*(n-1))^2 + (dr*(n-2))^2]
You could expand this to gain some insight into how much n should increase, but it may be faster to just guess at some percentage increase (30%) or something.
The example I have provided is a small subset and decreasing da and dr will dramatically improve your results.
Here is some rough code for generating such points:
import random
import math
R = 10.
n_rings = 10.
n_angles = 10.
dr = 10./n_rings
da = 2*math.pi/n_angles
base_points_per_division = 3
increase_per_level = 1.1
points = []
ring = 0
while ring < n_rings:
    angle = 0
    while angle < n_angles:
        for i in range(int(base_points_per_division)):
            ra = angle*da + da*random.random()
            rr = ring*dr + dr*random.random()
            x = rr*math.cos(ra)
            y = rr*math.sin(ra)
            points.append((x, y))
        angle += 1
    base_points_per_division = base_points_per_division*increase_per_level
    ring += 1
I tested it with the parameters:
n_rings = 20
n_angles = 20
base_points = .9
increase_per_level = 1.1
And got the following results:
It looks more dense than your provided image, but I imagine further tweaking of those variables could be beneficial.
You can add an additional part to scale the density properly by calculating the number of points per ring.
points_per_ring = density*math.pi*(dr**2)*(2*n+1)
points_per_division = points_per_ring/n_angles
This will provide an even better scaled distribution.
density = .03
points = []
ring = 0
while ring < n_rings:
    angle = 0
    base_points_per_division = density*math.pi*(dr**2)*(2*ring+1)/n_angles
    while angle < n_angles:
        for i in range(int(base_points_per_division)):
            ra = angle*da + min(da, da*random.random())
            rr = ring*dr + dr*random.random()
            x = rr*math.cos(ra)
            y = rr*math.sin(ra)
            points.append((x, y))
        angle += 1
    ring += 1
Giving better results using the following parameters
R = 1.
n_rings = 10.
n_angles = 10.
density = 10/(dr*da) # ~ ten points per unit area
With a graph...
and for fun you can graph the divisions to see how well it is matching your distribution and adjust.
Depending on how random the points need to be, it may be simple enough to just make a grid of points within the disk, and then displace each point by some small but random amount.
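For instance, a minimal sketch of that idea on the unit disk (the grid spacing h and the jitter amplitude 0.4*h are arbitrary choices):
import numpy as np
import matplotlib.pyplot as plt
h = 0.1  # grid spacing
gx, gy = np.meshgrid(np.arange(-1, 1, h), np.arange(-1, 1, h))
# displace every grid point by a small random amount (at most 0.4*h)
px = gx.ravel() + np.random.uniform(-0.4*h, 0.4*h, gx.size)
py = gy.ravel() + np.random.uniform(-0.4*h, 0.4*h, gy.size)
inside = px**2 + py**2 <= 1  # keep only the points that fall inside the disk
plt.scatter(px[inside], py[inside], s=4)
plt.axis('equal')
plt.show()
Keeping the jitter below half the grid spacing guarantees a minimum distance between points, which plays the role of the closeness criterion from the question.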
It may be that you want more randomness, but if you just want to fill your disc with an even-looking distribution of points that aren't on an obvious grid, you could try a spiral with a random phase.
import math
import random
import pylab
n = 300
alpha = math.pi * (3 - math.sqrt(5)) # the "golden angle"
phase = random.random() * 2 * math.pi
points = []
for k in range(n):
    theta = k * alpha + phase
    r = math.sqrt(float(k)/n)
    points.append((r * math.cos(theta), r * math.sin(theta)))
pylab.scatter(*zip(*points))
pylab.show()
Probability theory ensures that the rejection method is an appropriate method
to generate uniformly distributed points within the disk, D(0,r), centered at origin and of radius r. Namely, one generates points within the square [-r,r] x [-r,r], until a point falls within the disk:
do {
    generate P in [-r,r]x[-r,r];
} while (P[0]**2 + P[1]**2 > r**2);
return P;
unif_rnd_disk is a generator function implementing this rejection method:
import matplotlib.pyplot as plt
import numpy as np
import itertools
def unif_rnd_disk(r=1.0):
    pt = np.zeros(2)
    while True:
        yield pt
        while True:
            pt = -r + 2*r*np.random.random(2)
            if pt[0]**2 + pt[1]**2 <= r**2:
                break
G = unif_rnd_disk()  # generator of points in the disk D(0, r=1)
X,Y=zip(*[pt for pt in itertools.islice(G, 1, 1000)])
plt.scatter(X, Y, color='r', s=3)
plt.axis('equal')
If we want to generate points in a disk centered at C(a,b), we have to apply a translation to the points in the disk D(0,r):
C=[2.0, -3.5]
plt.scatter(C[0]+np.array(X), C[1]+np.array(Y), color='r', s=3)
plt.axis('equal')
I would like to use the Fourier transform to find the center of a simulated entity under periodic boundary conditions; periodic boundary conditions mean that whenever something exits through one side of the box, it wraps around to appear on the opposite side, just like in the classic game Asteroids.
So what I have, for each time frame, is a matrix (Nx3), with N the number of points, in xyz. What I want to do is determine the center of that cloud, even if it has moved over the periodic boundary and is, so to speak, stuck in between.
My idea for a solution would now be to do a (mass weighted) histogram of these points, then perform an FFT on that, and use the phase of the first Fourier coefficient to determine where in the box the maximum would be.
As a test case I have used:
import numpy as np
Points_x = np.random.randn(10000)
Box_min = -10
Box_max = 10
X = np.linspace( Box_min, Box_max, 100 )
### make a Histogram of the points
Histogram_Points = np.bincount( np.digitize( Points_x, X ), minlength=100 )
### make an artificial shift over the periodic boundary
Histogram_Points = np.r_[ Histogram_Points[45:], Histogram_Points[:45] ]
So now I can use FFT since it expects a periodic function anyways.
## doing fft
F = np.fft.fft(Histogram_Points)
## getting rid of everything but first harmonic
F[2:] = 0.
## back transforming
First_harmonic = np.fft.ifft(F)
That way I get a sine wave with its maximum exactly where the maximum of the histogram is.
Now I'd like to extract the position of the maximum not by taking the max function on the sine vector, but somehow it should be retrievable from the first (not the 0th) Fourier coefficient, since that should somehow contain the phase shift of the sine to have its maximum exactly at the maximum of the histogram.
Indeed, plotting
Cos_approx = cos( linspace(0,2*pi,100) * angle(F[1]) )
will give
But I can't figure out how to get the position of the peak from this angle.
Using the FFT is overkill when all you need is one Fourier coefficient. Instead, you can simply compute the dot product of your data with
w = np.exp(-2j*np.pi*np.arange(N) / N)
where N is the number of points. (The time to compute all the Fourier coefficients with the FFT is O(N*log(N)). Computing just one coefficient is O(N).)
Here's a script similar to yours. The data is put in y; the coordinates of the data points are in x.
import numpy as np
N = 100
# x coordinates of the data
xmin = -10
xmax = 10
x = np.linspace(xmin, xmax, N, endpoint=False)
# Generate data in y.
n = 35
y = np.zeros(N)
y[:n] = 1 - np.cos(np.linspace(0, 2*np.pi, n))
y[:n] /= 0.7 + 0.3*np.random.rand(n)
m = 10
y = np.r_[y[m:], y[:m]]
# Compute coefficient 1 of the discrete Fourier transform.
w = np.exp(-2j*np.pi*np.arange(N) / N)
F1 = y.dot(w)
print("F1 =", F1)
# Get the angle of F1 (in the interval [0, 2*pi]).
angle = np.angle(F1.conj())
if angle < 0:
    angle += 2*np.pi
center_x = xmin + (xmax - xmin) * angle / (2*np.pi)
print("center_x = ", center_x)
# Create the first sinusoidal mode for the plot.
mode1 = (F1.real * np.cos(2*np.pi*np.arange(N)/N) -
         F1.imag * np.sin(2*np.pi*np.arange(N)/N)) / np.abs(F1)
import matplotlib.pyplot as plt
plt.clf()
plt.plot(x, y)
plt.plot(x, mode1)
plt.axvline(center_x, color='r', linewidth=1)
plt.show()
This generates the plot:
To answer the question "Why F1.conj()?":
The complex conjugate of F1 is used because of the minus sign in
w = np.exp(-2j*np.pi*np.arange(N) / N) (which I used because it
is a common convention).
Since w can be written
w = np.exp(-2j*np.pi*np.arange(N) / N)
= cos(-2*pi*arange(N)/N) + 1j*sin(-2*pi*arange(N)/N)
= cos(2*pi*arange(N)/N) - 1j*sin(2*pi*arange(N)/N)
the dot product y.dot(w) is basically a projection of y onto
cos(2*pi*arange(N)/N) (the real part of F1) and -sin(2*pi*arange(N)/N)
(the imaginary part of F1). But when we figure out the phase of
the maximum, it is based on the functions cos(...) and sin(...). Taking
the complex conjugate accounts for the opposite sign of the sin()
function. If w = np.exp(2j*np.pi*np.arange(N) / N) were used instead, the
complex conjugate of F1 would not be needed.
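A quick numerical check of that last statement, using a made-up signal with a single spike at index 30 so the expected answer is known in advance:
import numpy as np
N = 100
m = 30
y = np.zeros(N)
y[m] = 1.0  # delta spike at index m
w_plus = np.exp(2j*np.pi*np.arange(N) / N)
F1_plus = y.dot(w_plus)  # with the + sign, no conjugate is needed
angle = np.angle(F1_plus) % (2*np.pi)
print(angle / (2*np.pi) * N)  # prints (approximately) 30.0, recovering the spike position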
You could calculate the circular mean directly on your data.
When calculating the circular mean, your data is mapped to -pi..pi. This mapped data is interpreted as angle to a point on the unit circle. Then the mean value of x and y component is calculated. The next step is to calculate the resulting angle and map it back to the defined "box".
import numpy as np
import matplotlib.pyplot as plt
Points_x = np.random.randn(10000)+1
Box_min = -10
Box_max = 10
Box_width = Box_max - Box_min
#Maps Points to Box_min ... Box_max with periodic boundaries
Points_x = (Points_x%Box_width + Box_min)
#Map Points to -pi..pi
Points_map = (Points_x - Box_min)/Box_width*2*np.pi-np.pi
#Calc circular mean
Pmean_map = np.arctan2(np.sin(Points_map).mean() , np.cos(Points_map).mean())
#Map back
Pmean = (Pmean_map+np.pi)/(2*np.pi) * Box_width + Box_min
#Plotting the result
plt.figure(figsize=(10,3))
plt.subplot(121)
plt.hist(Points_x, 100);
plt.plot([Pmean, Pmean], [0, 1000], c='r', lw=3, alpha=0.5);
plt.subplot(122,aspect='equal')
plt.plot(np.cos(Points_map), np.sin(Points_map), '.');
plt.ylim([-1, 1])
plt.xlim([-1, 1])
plt.grid()
plt.plot([0, np.cos(Pmean_map)], [0, np.sin(Pmean_map)], c='r', lw=3, alpha=0.5);