Note: This question is building on another question of mine:
Two dimensional FFT using python results in slightly shifted frequency
I have some data, basically a function E(x,y) with (x,y) being a (discrete) subset of R^2, mapping to real numbers. In the (x,y) plane I have a fixed distance between data points in both the x and the y direction (0.2). I want to analyze the frequency spectrum of my E(x,y) signal using a two-dimensional fast Fourier transform (FFT) in Python.
As far as I know, no matter which frequencies are actually contained in my signal, using the FFT I will only be able to see frequencies below the Nyquist limit Ny, which is Ny = sampling frequency / 2. In my case I have a real-space spacing of 0.2, leading to a sampling frequency of 1 / 0.2 = 5, and therefore my Nyquist limit is Ny = 5 / 2 = 2.5.
If my signal does have frequencies above the Nyquist limit, they will be "folded" back into the Nyquist domain, leading to false results (aliasing). But even though I might sample with too low a frequency, it should in theory not be possible to see any frequencies above the Nyquist limit, correct?
So here is my issue: analyzing my signal should only lead to frequencies of 2.5 at most, but I clearly get frequencies higher than that. Given that I am pretty sure about the theory here, there has to be some mistake in my code. I will provide a shortened version of the code, containing only the information necessary for this issue:
simulationArea =... # length of simulation area in x and y direction
x = np.linspace(0, simulationArea, numberOfGridPointsInX, endpoint=False)
y = x
xx, yy = np.meshgrid(x, y)
Ex = np.genfromtxt('E_field_x100.txt') # this is the actual signal to be analyzed, which may have arbitrary frequencies
FTEx = np.fft.fft2(Ex) # calculating fft coefficients of signal
dx = x[1] - x[0] # calculating spacing of signals in real space. 'print(dx)' results in '0.2'
sampleFrequency = 1.0 / dx
nyquisitFrequency = sampleFrequency / 2.0
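half = len(FTEx) // 2 # integer division, so that `half` can be used as a slice index
fig, axarr = plt.subplots(2, 1, squeeze=False) # squeeze=False keeps axarr 2D, so axarr[i, 0] works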
im1 = axarr[0, 0].imshow(Ex,
origin='lower',
cmap='jet',
extent=(0, simulationArea, 0, simulationArea))
axarr[0, 0].set_xlabel('X', fontsize=14)
axarr[0, 0].set_ylabel('Y', fontsize=14)
axarr[0, 0].set_title('$E_x$', fontsize=14)
fig.colorbar(im1, ax=axarr[0, 0])
im2 = axarr[1, 0].matshow(2 * abs(FTEx[:half, :half]) / half,
aspect='equal',
origin='lower',
interpolation='nearest')
axarr[1, 0].set_xlabel('Frequency wx')
axarr[1, 0].set_ylabel('Frequency wy')
axarr[1, 0].xaxis.set_ticks_position('bottom')
axarr[1, 0].set_title('$FFT(E_x)$', fontsize=14)
fig.colorbar(im2, ax=axarr[1, 0])
The result of this is:
How is that possible? When i am using the same code for very simple signals, it works just fine (e.g. a sine wave in x or y direction with a specific frequency).
Ok here we go! Here’s a couple of simple functions and a complete example that you can use: it’s got a little bit of extra cruft related to plotting and for data generation but the first function, makeSpectrum shows you how to use fftfreq and fftshift plus fft2 to achieve what you want. Let me know if you have questions.
import numpy as np
import numpy.fft as fft
import matplotlib.pylab as plt
def makeSpectrum(E, dx, dy, upsample=10):
    """
    Convert a time-domain array `E` to the frequency domain via 2D FFT. `dx` and
    `dy` are sample spacing in x (left-right, 1st axis) and y (up-down, 0th
    axis) directions. An optional `upsample > 1` will zero-pad `E` to obtain an
    upsampled spectrum.

    Returns `(spectrum, xf, yf)` where `spectrum` contains the 2D FFT of `E`. If
    `Ny, Nx = spectrum.shape`, `xf` and `yf` will be vectors of length `Nx` and
    `Ny` respectively, containing the frequencies corresponding to each pixel of
    `spectrum`.

    The returned spectrum is zero-centered (via `fftshift`). The 2D FFT, and
    this function, assume your input `E` has its origin at the top-left of the
    array. If this is not the case, i.e., your input `E`'s origin is translated
    away from the first pixel, the returned `spectrum`'s phase will *not* match
    what you expect, since a translation in the time domain is a modulation of
    the frequency domain. (If you don't care about the spectrum's phase, i.e.,
    only magnitude, then you can ignore all these origin issues.)
    """
    zeropadded = np.array(E.shape) * upsample
    F = fft.fftshift(fft.fft2(E, zeropadded)) / E.size
    xf = fft.fftshift(fft.fftfreq(zeropadded[1], d=dx))
    yf = fft.fftshift(fft.fftfreq(zeropadded[0], d=dy))
    return (F, xf, yf)
def extents(f):
    "Convert a vector into the 2-element extents vector imshow needs"
    delta = f[1] - f[0]
    return [f[0] - delta / 2, f[-1] + delta / 2]
def plotSpectrum(F, xf, yf):
    "Plot a spectrum array and vectors of x and y frequency spacings"
    plt.figure()
    plt.imshow(abs(F),
               aspect="equal",
               interpolation="none",
               origin="lower",
               extent=extents(xf) + extents(yf))
    plt.colorbar()
    plt.xlabel('f_x (Hz)')
    plt.ylabel('f_y (Hz)')
    plt.title('|Spectrum|')
    plt.show()
if __name__ == '__main__':
    # In seconds
    x = np.linspace(0, 4, 20)
    y = np.linspace(0, 4, 30)
    # Uncomment the next two lines and notice that the spectral peak is no
    # longer equal to 1.0! That's because `makeSpectrum` expects its input's
    # origin to be at the top-left pixel, which isn't the case for the following
    # two lines.
    # x = np.linspace(.123 + 0, .123 + 4, 20)
    # y = np.linspace(.123 + 0, .123 + 4, 30)
    # Sinusoid frequency, in Hz
    x0 = 1.9
    y0 = -2.9
    # Generate data
    im = np.exp(2j * np.pi * (y[:, np.newaxis] * y0 + x[np.newaxis, :] * x0))
    # Generate spectrum and plot
    spectrum, xf, yf = makeSpectrum(im, x[1] - x[0], y[1] - y[0])
    plotSpectrum(spectrum, xf, yf)
    # Report peak
    peak = spectrum[:, np.isclose(xf, x0)][np.isclose(yf, y0)]
    peak = peak[0, 0]
    print('spectral peak={}'.format(peak))
Results in the following image, and prints out, spectral peak=(1+7.660797103157986e-16j), which is exactly the correct value for the spectrum at the frequency of a pure complex exponential.
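As a small additional check of the Nyquist reasoning in the question, here is a minimal sketch (not part of the answer's code) showing that the frequency axis returned by np.fft.fftfreq never exceeds the Nyquist limit, and that an under-sampled wave shows up at its aliased frequency. The grid size n = 50 and the test frequency 4.0 are assumptions chosen purely for illustration; only dx = 0.2 comes from the question.

```python
import numpy as np

dx = 0.2                       # sample spacing from the question
n = 50                         # hypothetical number of grid points (assumption)
freqs = np.fft.fftfreq(n, d=dx)
nyquist = 0.5 / dx             # 2.5 cycles per unit length

# A wave at 4.0 cycles per unit length lies above the Nyquist limit,
# so it is expected to fold back to |5.0 - 4.0| = 1.0.
x = np.arange(n) * dx
f_true = 4.0                   # hypothetical test frequency (assumption)
signal = np.cos(2 * np.pi * f_true * x)

spectrum = np.abs(np.fft.fft(signal))
peak_freq = abs(freqs[np.argmax(spectrum)])

print("largest frequency on the axis:", np.abs(freqs).max())
print("apparent frequency of the 4.0 wave:", peak_freq)
```

If a plot shows tick values above the Nyquist limit, the likely cause is that the axes are labelled with array indices rather than with the values from fftfreq.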
I've read the documentation and searched Stack Overflow for the answer to this question, but can't find it. Sorry if it has already been answered.
I'm working with the results of an np.fft.fft2(Z) where Z is some 2d NumPy array. I would expect positive frequencies to be stored in values less than the Nyquist wavenumber in both x and y directions. From my tests, it seems this is the approach Matlab takes. In NumPy documentation they write positive frequencies are stored below the Nyquist number and negative frequencies above; this does not seem to be the case for fft2.
Some positive frequencies terms are stored at locations greater than the Nyquist wavenumber. For example, a mode at location (127,1) with associated amplitude stored at (1,127), will produce a 2D sinusoid with 4 peaks indicating that the wavenumber should be around 4, not 127.
I can't tell which is the positive and negative frequency in my example above because they are not following standard ordering.
So the main question I have is what kind of order does the fft2 follow for storing positive and negative frequencies?
I didn't post any examples because my question is a universal one and shouldn't be problem specific.
import numpy as np
from heapq import nlargest
## Setting up a simple example
lx = 4.0
ly = 4.0;
lz = 1.5;
nx = 128;
ny = 128;
L = 1.0
H = .4
x = np.linspace(0, lx, nx);
y = np.linspace(0, ly, ny);
x0 = 2.0;
y0 = 2.0;
z1 = np.zeros([ny,nx])
zm= np.zeros([ny,nx])
for j in range(1,ny):
    for i in range(1,nx):
        if np.sqrt(abs(x[i] - x0)** 2 + abs(y[j] - y0) ** 2) < L:
            if np.abs(x[i] - x0) < L:
                z1[j, i] = H * np.cos(np.pi * abs(x[i] - x0) / (2 *L))**2
z1 = z1+np.transpose(z1)/2.0
## Here I take the fft
nf = np.shape(z1)[0]/2
fz1 = np.fft.fft2(z1)
spec_fz1 = np.abs(fz1)**2
valmax = nlargest(1000, spec_fz1.flatten())
def return_xy(mode, spec_topo):
    # Collect the (row, column) indices of every entry equal to `mode`
    kxky = np.array([])
    for i in range(np.shape(spec_topo)[0]):
        for j in range(np.shape(spec_topo)[1]):
            if spec_topo[i, j] == mode:
                kxky = np.append(kxky, [i, j])
    if len(kxky) > 1:
        return kxky
    else:
        return kxky[0]
## Here I search for amplitude pairs above nyquist number
for i in range(1, len(valmax), 2):
    xy = return_xy(valmax[i], spec_fz1)
    if len(xy) > 2:
        if ((xy[0] > nf or xy[1] > nf) and (xy[2] > nf or xy[3] > nf)):
            print('both index locations above nyquist frequency')
    else:
        xy2 = return_xy(valmax[i+1], spec_fz1)
        if ((xy[0] > nf or xy[1] > nf) and (xy2[0] > nf or xy2[1] > nf)):
            print('both index locations above nyquist frequency')
After sorting by the largest amplitude, at the 21st index two amplitude pairs are stored at (127,1) and (1,127), which is above the Nyquist number. How should I interpret this wavenumber? Note: return_xy does the same thing as np.where.
I think this bit of code demonstrates how the 2D DFT output of np.fft.fft2 is organized:
import numpy as np
import matplotlib.pyplot as plt
n = 16
x = np.arange(n) / n * 2 * np.pi
y = np.arange(n) / n * 2 * np.pi
for kx in range(4):
    for ky in range(4):
        f = np.cos(kx * x[None,:] + ky * y[:,None])
        F = np.fft.fft2(f)
        plt.subplot(4, 4, 1 + ky * 4 + kx)
        plt.imshow(np.abs(F))
        plt.axis('off')
        plt.title(f'kx = {kx}, ky = {ky}', fontsize=10)
plt.tight_layout()
plt.show()
We can see that the origin, kx=0 and ky=0 is at the top-left of the array. For a horizontal wave with exactly one period in the input, we see we have a pair of peaks at kx=1 and kx=N-1 (which is equivalent to kx=-1). With two periods in the input, kx=2 and kx=-2, etc. Vertical waves produce the same result but along the vertical axis, and diagonal waves at 45 degrees have the peaks at 45 degrees.
This is the exact same ordering as the 1D DFT (np.fft.fft) produces. The 2D DFT is simply the 1D DFT applied along the columns, and then along the rows of the result (or the other way around, it doesn't matter).
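The separability claim can be checked numerically. The following is a small sketch on an arbitrary random test array (the array and its shape are assumptions for illustration); it applies np.fft.fft along one axis and then the other, in both orders, and compares against np.fft.fft2.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 6))   # arbitrary test array (assumption)

# 1D DFT along the columns (axis 0), then along the rows (axis 1)
cols_then_rows = np.fft.fft(np.fft.fft(a, axis=0), axis=1)
# Same thing in the opposite order
rows_then_cols = np.fft.fft(np.fft.fft(a, axis=1), axis=0)
# Direct 2D DFT
direct = np.fft.fft2(a)

print(np.allclose(cols_then_rows, direct))
print(np.allclose(rows_then_cols, direct))
```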
As for the test shown in the question, it is the superposition of two sine waves (one horizontal and one vertical) multiplied by a round window (a "pillbox" function). In the Fourier domain (continuous world), this corresponds to four impulse functions (two along the horizontal axis for the one sine wave, two along the vertical axis for the other sine wave), convolved with the Bessel function of the first kind of order 1 (J1). Because the sine waves have a low frequency, the four impulse functions are close together, and after the convolution appear as a somewhat wider Bessel function, centered around the origin:
plt.imshow(np.log(np.abs(fz1) + 1e-6))
plt.show()
What we see is the peak centered on the origin (at the top-left corner), with things to the left of the origin wrapped around to the right edge, and things to the top of the origin wrapped around to the bottom edge. Applying np.fft.fftshift moves the origin to the middle of the array, yielding a more recognizable shape.
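To translate an array index into a signed wavenumber, np.fft.fftfreq can be used. The sketch below assumes n = 128 as in the question and a sample spacing of 1/n, so that the returned values are integer wavenumbers; under that assumption index 127 corresponds to wavenumber -1, which is why the pair (127, 1) and (1, 127) describes a low-frequency mode and not a mode near the Nyquist number.

```python
import numpy as np

n = 128
# With d = 1/n the returned values are integer wavenumbers:
# 0, 1, ..., 63, -64, ..., -1
k = np.fft.fftfreq(n, d=1.0 / n)

# Signed wavenumbers for the array location (127, 1)
signed = (int(k[127]), int(k[1]))
print(signed)

# After fftshift the wavenumber axis runs monotonically from -64 to 63
k_shifted = np.fft.fftshift(k)
print(k_shifted[0], k_shifted[-1])
```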
I have several points on the unit sphere that are distributed according to the algorithm described in https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf (and implemented in the code below). On each of these points, I have a value that in my particular case represents 1 minus a small error. The errors are in [0, 0.1] if this is important, so my values are in [0.9, 1].
Sadly, computing the errors is a costly process and I cannot do this for as many points as I want. Still, I want my plots to look like I am plotting something "continuous".
So I want to fit an interpolation function to my data, to be able to sample as many points as I want.
After a little bit of research I found scipy.interpolate.SmoothSphereBivariateSpline which seems to do exactly what I want. But I cannot make it work properly.
Question: what can I use to interpolate (spline, linear interpolation, anything would be fine for the moment) my data on the unit sphere? An answer can be either "you misused scipy.interpolation, here is the correct way to do this" or "this other function is better suited to your problem".
Sample code that should be executable with numpy and scipy installed:
import typing as ty
import numpy
import scipy.interpolate
def get_equidistant_points(N: int) -> ty.List[numpy.ndarray]:
    """Generate approximately N points evenly distributed across the 3-d sphere.

    This function tries to find approximately N points (might be a little less
    or more) that are evenly distributed across the 3-dimensional unit sphere.
    The algorithm used is described in
    https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf.
    """
    # Unit sphere
    r = 1
    points: ty.List[numpy.ndarray] = list()
    a = 4 * numpy.pi * r ** 2 / N
    d = numpy.sqrt(a)
    m_v = int(numpy.round(numpy.pi / d))
    d_v = numpy.pi / m_v
    d_phi = a / d_v
    for m in range(m_v):
        v = numpy.pi * (m + 0.5) / m_v
        m_phi = int(numpy.round(2 * numpy.pi * numpy.sin(v) / d_phi))
        for n in range(m_phi):
            phi = 2 * numpy.pi * n / m_phi
            points.append(
                numpy.array(
                    [
                        numpy.sin(v) * numpy.cos(phi),
                        numpy.sin(v) * numpy.sin(phi),
                        numpy.cos(v),
                    ]
                )
            )
    return points
def cartesian2spherical(x: float, y: float, z: float) -> numpy.ndarray:
    r = numpy.linalg.norm([x, y, z])
    theta = numpy.arccos(z / r)
    phi = numpy.arctan2(y, x)
    return numpy.array([r, theta, phi])
n = 100
points = get_equidistant_points(n)
# Random here, but costly in real life.
errors = numpy.random.rand(len(points)) / 10
# Change everything to spherical to use the interpolator from scipy.
ideal_spherical_points = numpy.array([cartesian2spherical(*point) for point in points])
r_interp = 1 - errors
theta_interp = ideal_spherical_points[:, 1]
phi_interp = ideal_spherical_points[:, 2]
# Change phi coordinate from [-pi, pi] to [0, 2pi] to please scipy.
phi_interp[phi_interp < 0] += 2 * numpy.pi
# Create the interpolator.
interpolator = scipy.interpolate.SmoothSphereBivariateSpline(
theta_interp, phi_interp, r_interp
)
# Creating the finer theta and phi values for the final plot
theta = numpy.linspace(0, numpy.pi, 100, endpoint=True)
phi = numpy.linspace(0, numpy.pi * 2, 100, endpoint=True)
# Creating the coordinate grid for the unit sphere.
X = numpy.outer(numpy.sin(theta), numpy.cos(phi))
Y = numpy.outer(numpy.sin(theta), numpy.sin(phi))
Z = numpy.outer(numpy.cos(theta), numpy.ones(100))
thetas, phis = numpy.meshgrid(theta, phi)
heatmap = interpolator(thetas, phis)
Issue with the code above:
With the code as-is, I have a
ValueError: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots.
that is raised when initialising the interpolator instance.
The issue above seems to say that I should change the value of s, which is one of the parameters of scipy.interpolate.SmoothSphereBivariateSpline. I tested different values of s ranging from 0.0001 to 100000; the code above always raises either the exception described above or:
ValueError: Error code returned by bispev: 10
Edit: I am including my findings here. They can't really be considered as a solution, that is why I am editing and not posting as an answer.
With more research I found this question: Using Radial Basis Functions to Interpolate a Function on a Sphere. The author has exactly the same problem as me and uses a different interpolator: scipy.interpolate.Rbf. I changed the above code by replacing the interpolator and plotting:
# Create the interpolator.
interpolator = scipy.interpolate.Rbf(theta_interp, phi_interp, r_interp)
# Creating the finer theta and phi values for the final plot
plot_points = 100
theta = numpy.linspace(0, numpy.pi, plot_points, endpoint=True)
phi = numpy.linspace(0, numpy.pi * 2, plot_points, endpoint=True)
# Creating the coordinate grid for the unit sphere.
X = numpy.outer(numpy.sin(theta), numpy.cos(phi))
Y = numpy.outer(numpy.sin(theta), numpy.sin(phi))
Z = numpy.outer(numpy.cos(theta), numpy.ones(plot_points))
thetas, phis = numpy.meshgrid(theta, phi)
heatmap = interpolator(thetas, phis)
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm
colormap = cm.inferno
normaliser = mpl.colors.Normalize(vmin=numpy.min(heatmap), vmax=1)
scalar_mappable = cm.ScalarMappable(cmap=colormap, norm=normaliser)
scalar_mappable.set_array([])
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(
X,
Y,
Z,
facecolors=colormap(normaliser(heatmap)),
alpha=0.7,
cmap=colormap,
)
plt.colorbar(scalar_mappable)
plt.show()
This code runs smoothly and gives the following result:
The interpolation seems OK except along one line where it is discontinuous, just like in the question that led me to this class. One of the answers suggests using a different distance, better suited to spherical coordinates: the haversine distance.
def haversine(x1, x2):
    theta1, phi1 = x1
    theta2, phi2 = x2
    return 2 * numpy.arcsin(
        numpy.sqrt(
            numpy.sin((theta2 - theta1) / 2) ** 2
            + numpy.cos(theta1) * numpy.cos(theta2) * numpy.sin((phi2 - phi1) / 2) ** 2
        )
    )
# Create the interpolator.
interpolator = scipy.interpolate.Rbf(theta_interp, phi_interp, r_interp, norm=haversine)
which, when executed, gives a warning:
LinAlgWarning: Ill-conditioned matrix (rcond=1.33262e-19): result may not be accurate.
self.nodes = linalg.solve(self.A, self.di)
and a result that is not at all the one expected: the interpolated function has values that go as low as -1, which is clearly wrong.
You can use Cartesian coordinate instead of Spherical coordinate.
The default norm parameter ('euclidean') used by Rbf is sufficient
# interpolation
x, y, z = numpy.array(points).T
interpolator = scipy.interpolate.Rbf(x, y, z, r_interp)
# predict
heatmap = interpolator(X, Y, Z)
Here the result:
ax.plot_surface(
X, Y, Z,
rstride=1, cstride=1,
# or rcount=50, ccount=50,
facecolors=colormap(normaliser(heatmap)),
cmap=colormap,
alpha=0.7, shade=False
)
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
You can also use a cosine distance if you want (norm parameter):
def cosine(XA, XB):
    if XA.ndim == 1:
        XA = numpy.expand_dims(XA, axis=0)
    if XB.ndim == 1:
        XB = numpy.expand_dims(XB, axis=0)
    return scipy.spatial.distance.cosine(XA, XB)
In order to better see the differences,
I stacked the two images, subtracted them and inverted the layer.
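One way to see why Cartesian coordinates avoid the discontinuous line is to compare distances across the phi = 0 / phi = 2*pi seam. This is only an illustrative sketch: the helper name to_cartesian and the two sample azimuths are assumptions, not part of the code above. Two points that are neighbours on the sphere look far apart when (theta, phi) is treated as a flat plane, while their Cartesian distance is small.

```python
import numpy as np

def to_cartesian(theta, phi):
    """Unit-sphere point for polar angle theta and azimuth phi (hypothetical helper)."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

theta = np.pi / 2                        # both points on the equator
phi_a, phi_b = 0.01, 2 * np.pi - 0.01    # azimuths on either side of the seam

# Euclidean distance when (theta, phi) is treated as a flat plane
d_angles = np.hypot(0.0, phi_b - phi_a)
# Euclidean distance between the same two points in 3D
d_cartesian = np.linalg.norm(to_cartesian(theta, phi_a) - to_cartesian(theta, phi_b))

print(d_angles, d_cartesian)
```

A radial basis function built on the first distance treats the two points as unrelated, which is consistent with a visible seam; the second distance keeps them close.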
I'm looking for a clarification of Fourier transform principles.
I try to do something quite simple: Create a signal (sine wave with a given frequency and phase shift) and recreate its params with Fourier transform. Frequency estimations work fine, but when it comes to phase, it looks like I get systematic shift (-pi/2).
import numpy as np
import matplotlib.pyplot as plt
duration = 4.0 # length of window (in sec)
ticks_per_sec = 400.0 # sampling rate (samples per second)
samples = int(ticks_per_sec*duration)
phase_shift = np.pi / 2 # sin wave shift in angle
freq = 1 # sine wave freq in Hz
phase_shift = round(ticks_per_sec/freq * phase_shift/(2*np.pi)) # angle value translated to no of ticks
t = np.arange(phase_shift, samples+phase_shift) / ticks_per_sec
s = 1 * np.sin(2.0 * np.pi * freq * t)
N = s.size
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
axs[0].grid(True)
axs[0].set_ylabel("Amplitude")
axs[0].set_xlabel("Time [s]")
axs[0].set_title(f"F: {freq}, Phase shift: {phase_shift} ticks.")
axs[0].plot(np.arange(samples)/ticks_per_sec, s)
f = np.linspace(0, ticks_per_sec, N)
fft = np.fft.fft(s)
peak_pos = np.argmax(np.abs(fft[:N//2]))
axs[1].set_ylabel("Amplitude")
axs[1].set_xlabel("Frequency [Hz]")
axs[1].set_title(f"Peak bar: {peak_pos}")
barlist = axs[1].bar(f[:N // 2], np.abs(fft)[:N // 2] * (1 / (N//2)), width=1.5) # 1 / N is a normalization factor
barlist[peak_pos].set_color('r')
axs[2].set_ylabel("Angle")
axs[2].set_xlabel("Frequency [Hz]")
axs[2].set_title(f"Peak angle: {np.angle(fft[peak_pos])}")
barlist = axs[2].bar(f[:N // 2], np.angle(fft)[:N // 2], width=1.5)
barlist[peak_pos].set_color('r')
fig.show()
Plotted Results of the code above
Please help me if there's a bug in my code that I can't notice, or I misunderstand something.
Thank you in advance.
Your code is just fine, this is not a programming issue.
Let's recall that a sine wave can be expressed as a cosine wave with a phase shift (or vice versa). Now remember that the sine function has an inherent phase shift of -pi/2 relative to the cosine in the real Fourier basis.
This means that your code should output a pi/2 phase angle when replacing np.sin by np.cos, i.e. returns input phase_shift, or equivalently, returns a phase angle of zero when specifying phase_shift = np.pi / 2, i.e. phase shift and sine phase compensate each other.
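This can be verified with a short sketch that uses the same sampling rate, duration and frequency as the question but no extra phase shift (the variable names here are my own). Under these assumptions the peak bin of a pure sine has phase -pi/2, and that of a pure cosine has phase 0.

```python
import numpy as np

fs = 400.0        # sampling rate, as in the question
duration = 4.0    # window length in seconds
freq = 1.0        # signal frequency in Hz

t = np.arange(int(fs * duration)) / fs
k = int(round(freq * duration))   # index of the FFT bin that holds `freq`

phase_sin = np.angle(np.fft.fft(np.sin(2 * np.pi * freq * t))[k])
phase_cos = np.angle(np.fft.fft(np.cos(2 * np.pi * freq * t))[k])

print(phase_sin, phase_cos)
```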
I have xy coordinates that represents a subject over a given space. It is referenced from another point and is therefore off centre. As in the longitudinal axes is not aligned along the x-axis.
The randomly generated ellipse below provides an indication of this:
import numpy as np
from matplotlib.pyplot import scatter
xx = np.array([-0.51, 51.2])
yy = np.array([0.33, 51.6])
means = [xx.mean(), yy.mean()]
stds = [xx.std() / 3, yy.std() / 3]
corr = 0.8 # correlation
covs = [[stds[0]**2 , stds[0]*stds[1]*corr],
[stds[0]*stds[1]*corr, stds[1]**2]]
m = np.random.multivariate_normal(means, covs, 1000).T
scatter(m[0], m[1])
To straighten the coordinates I was thinking of applying the vector to a rotation matrix.
Would something like this work?
angle = 65.
theta = (angle/180.) * np.pi
rotMatrix = np.array([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
This may also seem like a silly question but is there a way to determine if the resulting vector of xy coordinates is perpendicular? Or will you just have to play around with the rotation angle?
You can use sklearn.decomposition.PCA (principal component analysis) with n_components=2 to extract the smallest angle required to rotate the point cloud such that its major axis is horizontal.
Runnable example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
np.random.seed(1)
xx = np.array([-0.51, 51.2])
yy = np.array([0.33, 51.6])
means = [xx.mean(), yy.mean()]
stds = [xx.std() / 3, yy.std() / 3]
corr = 0.8 # correlation
covs = [[stds[0]**2, stds[0]*stds[1]*corr],
[stds[0]*stds[1]*corr, stds[1]**2]]
m = np.random.multivariate_normal(means, covs, 1000)
pca = PCA(2)
# This was in my first answer attempt: fit_transform works fine, but it randomly
# flips (mirrors) points across one of the principal axes.
# m2 = pca.fit_transform(m)
# Workaround: get the rotation angle from the PCA components and manually
# build the rotation matrix.
# Fit the PCA object, but do not transform the data
pca.fit(m)
# pca.components_ : array, shape (n_components, n_features)
# cos theta
ct = pca.components_[0, 0]
# sin theta
st = pca.components_[0, 1]
# One possible value of theta that lies in [0, pi]
t = np.arccos(ct)
# If t is in quadrant 1, rotate CLOCKwise by t
if ct > 0 and st > 0:
    t *= -1
# If t is in Q2, rotate COUNTERclockwise by the complement of theta
elif ct < 0 and st > 0:
    t = np.pi - t
# If t is in Q3, rotate CLOCKwise by the complement of theta
elif ct < 0 and st < 0:
    t = -(np.pi - t)
# If t is in Q4, rotate COUNTERclockwise by theta, i.e., do nothing
elif ct > 0 and st < 0:
    pass
# Manually build the ccw rotation matrix
rotmat = np.array([[np.cos(t), -np.sin(t)],
[np.sin(t), np.cos(t)]])
# Apply rotation to each row of m
m2 = (rotmat @ m.T).T
# Center the rotated point cloud at (0, 0)
m2 -= m2.mean(axis=0)
fig, ax = plt.subplots()
plot_kws = {'alpha': 0.75,
            'edgecolor': 'white',
            'linewidths': 0.75}
ax.scatter(m[:, 0], m[:, 1], **plot_kws)
ax.scatter(m2[:, 0], m2[:, 1], **plot_kws)
Output
Warning: pca.fit_transform() sometimes flips (mirrors) the point cloud
The principal components can randomly come out as either positive or negative. In some cases, your point cloud may appear flipped upside down or even mirrored across one of its principal axes. (To test this, change the random seed and re-run the code until you observe flipping.) There's an in-depth discussion here (based in R, but the math is relevant). To correct this, you'd have to replace the fit_transform line with manual flipping of one or both components' signs, then multiply the sign-flipped component matrix by the point cloud array.
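As a rough illustration of handling the sign ambiguity, here is a NumPy-only sketch that does not use sklearn. It takes the major axis from an eigen-decomposition of the sample covariance, and the sign convention (make the axis point toward +x) is my own assumption rather than anything PCA prescribes. The test data is a hypothetical correlated cloud.

```python
import numpy as np

rng = np.random.default_rng(1)
cov = [[1.0, 0.8], [0.8, 1.0]]           # hypothetical correlated cloud
pts = rng.multivariate_normal([0.0, 0.0], cov, size=2000)

# Principal axes of the sample covariance (eigh returns ascending eigenvalues)
evals, evecs = np.linalg.eigh(np.cov(pts.T))
major = evecs[:, -1]
if major[0] < 0:
    # An eigenvector is only defined up to sign; pick the one pointing toward +x
    major = -major

# Rotate by minus the angle of the major axis so that it lands on the x axis
angle = np.arctan2(major[1], major[0])
c, s = np.cos(-angle), np.sin(-angle)
rot = np.array([[c, -s], [s, c]])
aligned = pts @ rot.T

print(np.degrees(angle))
print(np.cov(aligned.T))
```

Because the rotation is a proper rotation (determinant +1), the cloud is never mirrored, whichever sign the eigen-solver returns.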
Indeed, a very useful concept here is a linear transformation of a vector v performed by a matrix A. If you treat your scatter points as the tips of vectors originating from (0,0), then it is very easy to rotate them by any angle theta. A matrix that performs such a rotation by theta would be
A = [[cos(theta) -sin(theta)]
     [sin(theta)  cos(theta)]]
Evidently, when theta is 90 degrees this results in
A = [[0 -1]
     [1  0]]
And to apply the rotation you would only need to perform the matrix multiplication w = A v
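As a quick sanity check of w = A v, the sketch below builds a counterclockwise rotation matrix and applies it to the unit vector along x. The helper name rotation_matrix is an assumption for illustration. A 90 degree counterclockwise rotation should map (1, 0) to (0, 1).

```python
import numpy as np

def rotation_matrix(theta):
    """Counterclockwise rotation by theta radians (hypothetical helper)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])
w = rotation_matrix(np.pi / 2) @ v

print(np.round(w, 12))
```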
With this, the current goal is to perform a matrix multiplication of the vectors stored in m with x,y tips as m[0],m[1]. The rotated vectors are going to be stored in m2. Below is the relevant code to do so. Note that I have transposed m for an easier computation of the matrix multiplication (performed with @) and that the rotation angle is 90 degrees counterclockwise.
import numpy as np
import matplotlib.pyplot as plt
xx = np.array([-0.51, 51.2])
yy = np.array([0.33, 51.6])
means = [xx.mean(), yy.mean()]
stds = [xx.std() / 3, yy.std() / 3]
corr = 0.8 # correlation
covs = [[stds[0]**2 , stds[0]*stds[1]*corr],
[stds[0]*stds[1]*corr, stds[1]**2]]
m = np.random.multivariate_normal(means, covs, 1000).T
plt.scatter(m[0], m[1])
theta_deg = 90
theta_rad = np.deg2rad(theta_deg)
A = np.matrix([[np.cos(theta_rad), -np.sin(theta_rad)],
[np.sin(theta_rad), np.cos(theta_rad)]])
m2 = np.zeros(m.T.shape)
for i,v in enumerate(m.T):
    w = A @ v.T
    m2[i] = w
m2 = m2.T
plt.scatter(m2[0], m2[1])
This leads to the rotated scatter plot:
You can be sure that the rotated version is exactly 90 degrees counterclockwise with the linear transformation.
Edit
To find the rotation angle needed to align the scatter plot with the x axis, a good approach is to find the linear approximation of the scattered data with numpy.polyfit. This yields a linear function by providing the slope and the y-axis intercept b. Then get the rotation angle as the arctan of the slope and compute the transformation matrix as before. You can do this by adding the following part to the code
slope, b = np.polyfit(m[0], m[1], 1)  # fit y = slope*x + b (x values go first)
x = np.arange(min(m[0]), max(m[0]), 1)
y_line = slope*x + b
plt.plot(x, y_line, color='r')
theta_rad = -np.arctan(slope)
This results in the plot you were seeking
Edit 2
Because @Peter Leimbigler pointed out that numpy.polyfit does not find the correct global direction of the scattered data, I thought that you could instead get an average slope by averaging the x and y parts of the data. This gives another slope, called slope2 (depicted in green now), to apply the rotation. So simply,
slope, b = np.polyfit(m[0], m[1], 1)  # fit y = slope*x + b (x values go first)
x = np.arange(min(m[0]), max(m[0]), 1)
y_line = slope*x + b
slope2 = np.mean(m[1])/np.mean(m[0])
y_line2 = slope2*x + b
plt.plot(x, y_line, color='r')
plt.plot(x, y_line2, color='g')
theta_rad = -np.arctan(slope2)
And by applying the linear transformation with the rotation matrix you get
If the slopes of the two lines multiplied together equal -1, then they are perpendicular.
The other case this is true, is when one slope is 0 and the other is undefined (a perfectly horizontal line and a perfectly vertical line).
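Both cases can be covered by a single test on direction vectors, since two lines are perpendicular exactly when the dot product of their directions is zero. The following is a small sketch; the function name are_perpendicular and the tolerance are assumptions for illustration. A line of slope s has direction (1, s), and a vertical line has direction (0, 1).

```python
import numpy as np

def are_perpendicular(d1, d2, tol=1e-9):
    """True if the direction vectors d1 and d2 are at right angles (hypothetical helper)."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    return abs(np.dot(d1, d2)) <= tol * np.linalg.norm(d1) * np.linalg.norm(d2)

# Slopes 2 and -0.5 multiply to -1
print(are_perpendicular((1, 2), (1, -0.5)))
# Horizontal line (slope 0) and vertical line (undefined slope)
print(are_perpendicular((1, 0), (0, 1)))
# Slopes 1 and 0 are not perpendicular
print(are_perpendicular((1, 1), (1, 0)))
```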
I would like to use the Fourier transform to find the center of a simulated entity under periodic boundary conditions. Periodic boundary conditions mean that whenever something exits through one side of the box, it is wrapped around to appear on the opposite side, just like in the classic game Asteroids.
So what I have for each time frame is an (Nx3) matrix, with N the number of points in xyz. What I want to do is determine the center of that cloud even if it has moved across the periodic boundary and is, so to speak, stuck in between.
My idea for a solution is to make a (mass-weighted) histogram of these points, perform an FFT on it, and use the phase of the first Fourier coefficient to determine where in the box the maximum is.
as a test case I have used
import numpy as np
Points_x = np.random.randn(10000)
Box_min = -10
Box_max = 10
X = np.linspace( Box_min, Box_max, 100 )
### make a Histogram of the points
Histogram_Points = np.bincount( np.digitize( Points_x, X ), minlength=100 )
### make an artifical shift over the periodic boundary
Histogram_Points = np.r_[ Histogram_Points[45:], Histogram_Points[:45] ]
So now I can use FFT since it expects a periodic function anyways.
## doing fft
F = np.fft.fft(Histogram_Points)
## getting rid of everything but first harmonic
F[2:] = 0.
## back transforming
Fist_harmonic = np.fft.ifft(F)
That way I get a sine wave with its maximum exactly where the maximum of the histogram is.
Now I'd like to extract the position of the maximum not by taking the max function on the sine vector, but somehow it should be retrievable from the first (not the 0th) Fourier coefficient, since that should somehow contain the phase shift of the sine to have its maximum exactly at the maximum of the histogram.
Indeed, plotting
Cos_approx = cos( linspace(0,2*pi,100) * angle(F[1]) )
will give
But I can't figure out how to get the position of the peak from this angle.
Using the FFT is overkill when all you need is one Fourier coefficient. Instead, you can simply compute the dot product of your data with
w = np.exp(-2j*np.pi*np.arange(N) / N)
where N is the number of points. (The time to compute all the Fourier coefficients with the FFT is O(N*log(N)). Computing just one coefficient is O(N).)
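As a quick check that this dot product really is the first FFT coefficient, the following sketch compares the two on random test data (the data is an assumption, used only for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
y = rng.random(N)                  # arbitrary test data (assumption)

# Single DFT coefficient (index 1) computed as a dot product: O(N)
w = np.exp(-2j * np.pi * np.arange(N) / N)
F1 = y.dot(w)

# Same coefficient taken from the full FFT: O(N log N)
F1_fft = np.fft.fft(y)[1]

print(np.isclose(F1, F1_fft))
```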
Here's a script similar to yours. The data is put in y; the coordinates of the data points are in x.
import numpy as np
N = 100
# x coordinates of the data
xmin = -10
xmax = 10
x = np.linspace(xmin, xmax, N, endpoint=False)
# Generate data in y.
n = 35
y = np.zeros(N)
y[:n] = 1 - np.cos(np.linspace(0, 2*np.pi, n))
y[:n] /= 0.7 + 0.3*np.random.rand(n)
m = 10
y = np.r_[y[m:], y[:m]]
# Compute coefficient 1 of the discrete Fourier transform.
w = np.exp(-2j*np.pi*np.arange(N) / N)
F1 = y.dot(w)
print("F1 =", F1)
# Get the angle of F1 (in the interval [0,2*pi]).
angle = np.angle(F1.conj())
if angle < 0:
    angle += 2*np.pi
center_x = xmin + (xmax - xmin) * angle / (2*np.pi)
print("center_x = ", center_x)
# Create the first sinusoidal mode for the plot.
mode1 = (F1.real * np.cos(2*np.pi*np.arange(N)/N) -
F1.imag*np.sin(2*np.pi*np.arange(N)/N))/np.abs(F1)
import matplotlib.pyplot as plt
plt.clf()
plt.plot(x, y)
plt.plot(x, mode1)
plt.axvline(center_x, color='r', linewidth=1)
plt.show()
This generates the plot:
To answer the question "Why F1.conj()?":
The complex conjugate of F1 is used because of the minus sign in
w = np.exp(-2j*np.pi*np.arange(N) / N) (which I used because it
is a common convention).
Since w can be written
w = np.exp(-2j*np.pi*np.arange(N) / N)
= cos(-2*pi*arange(N)/N) + 1j*sin(-2*pi*arange(N)/N)
= cos(2*pi*arange(N)/N) - 1j*sin(2*pi*arange(N)/N)
the dot product y.dot(w) is basically a projection of y onto
cos(2*pi*arange(N)/N) (the real part of F1) and -sin(2*pi*arange(N)/N)
(the imaginary part of F1). But when we figure out the phase of
the maximum, it is based on the functions cos(...) and sin(...). Taking
the complex conjugate accounts for the opposite sign of the sin()
function. If w = np.exp(2j*np.pi*np.arange(N) / N) were used instead, the
complex conjugate of F1 would not be needed.
You could calculate the circular mean directly on your data.
When calculating the circular mean, your data is mapped to -pi..pi. This mapped data is interpreted as angle to a point on the unit circle. Then the mean value of x and y component is calculated. The next step is to calculate the resulting angle and map it back to the defined "box".
import numpy as np
import matplotlib.pyplot as plt
Points_x = np.random.randn(10000)+1
Box_min = -10
Box_max = 10
Box_width = Box_max - Box_min
#Maps Points to Box_min ... Box_max with periodic boundaries
Points_x = (Points_x%Box_width + Box_min)
#Map Points to -pi..pi
Points_map = (Points_x - Box_min)/Box_width*2*np.pi-np.pi
#Calc circular mean
Pmean_map = np.arctan2(np.sin(Points_map).mean() , np.cos(Points_map).mean())
#Map back
Pmean = (Pmean_map+np.pi)/(2*np.pi) * Box_width + Box_min
#Plotting the result
plt.figure(figsize=(10,3))
plt.subplot(121)
plt.hist(Points_x, 100);
plt.plot([Pmean, Pmean], [0, 1000], c='r', lw=3, alpha=0.5);
plt.subplot(122,aspect='equal')
plt.plot(np.cos(Points_map), np.sin(Points_map), '.');
plt.ylim([-1, 1])
plt.xlim([-1, 1])
plt.grid()
plt.plot([0, np.cos(Pmean_map)], [0, np.sin(Pmean_map)], c='r', lw=3, alpha=0.5);
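The same idea can be wrapped in a reusable function. This is only a sketch: the name periodic_center and the test cluster (centered at 9.5 so that it straddles the upper wall of the box) are assumptions for illustration. It maps positions to angles, averages the unit vectors, and maps the mean angle back into the box.

```python
import numpy as np

def periodic_center(points, box_min, box_max):
    """Circular mean of 1D positions inside a periodic box (hypothetical helper)."""
    width = box_max - box_min
    # Map positions to angles in [0, 2*pi)
    ang = (np.asarray(points, dtype=float) - box_min) / width * 2 * np.pi
    # Angle of the averaged unit vector
    mean_ang = np.arctan2(np.sin(ang).mean(), np.cos(ang).mean())
    # Map the angle back into [box_min, box_max)
    return box_min + (mean_ang % (2 * np.pi)) / (2 * np.pi) * width

rng = np.random.default_rng(0)
box_min, box_max = -10.0, 10.0
# Hypothetical cluster centered at 9.5, so part of it wraps around to near -10
pts = 9.5 + 0.5 * rng.standard_normal(5000)
wrapped = (pts - box_min) % (box_max - box_min) + box_min

center = periodic_center(wrapped, box_min, box_max)
print(center)
```

A plain arithmetic mean of `wrapped` would land somewhere in the middle of the box, because the wrapped points near -10 pull it away from the true center.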