I do np.fft.fft() to decompose discrete-signal to frequencies.
Then I pick top N freqs and now I want to draw a signal of the sum of those frequencies using the following formula :
amp * np.sin( 2 * np.pi * freq * time + phase )
I extract the freqs based on their position in the array (fftfreqs), amplitudes with np.abs(complex_num) and phases with np.angle(complex_num, deg=True), from the data I got from the fft() call.
It does not seem to work very well.
What am I doing wrong?
Do I understand correctly that the complex number contains the amplitude and phase of the sinusoid? Do I have to use both sin and cos for the drawing?
The documentation is very sparse on exact interpretation of the result of fft().
PS> I know I can use np.fft.ifft() to draw close approximation, but this way of doing it is limited to the time-frame of the original signal.
I want to draw/fit the original signal with the top N freqs in the original timeframe, but also to draw the "continuation" of the signal after this timeframe using those freqs i.e. the extended signal.
I found my mistakes. First, the correct way to calculate the waves is (summed over the top N freqs):
cos_amp * np.cos( 2 * np.pi * freq * time) + sin_amp * np.sin( 2 * np.pi * freq * time)
not just sin(). Second, the abs-amplitude and angle-phase extraction I used does not fit the sin() formula above (and deg=True returns degrees, while np.sin expects radians).
The real part contains the cos-amplitude and the negative imaginary part contains the sin-amplitude, i.e. once I get the complex array out of fft(), I do something like this:
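import numpy as np
# signal: 1-D real array; topN: number of frequencies to keep
N = len(signal)
fft_res = np.fft.fft(signal)
# find top freqs from the PSD, using positive-frequency bins only (skip DC)
pos = np.arange(1, N // 2)
topN_freq_idx = pos[np.argsort(np.abs(fft_res[pos])**2)[-topN:]]
all_freqs = np.fft.fftfreq(N)
freqs = all_freqs[topN_freq_idx]
norm = fft_res[topN_freq_idx] / (N/2)
cos_amp = np.real(norm)
sin_amp = -np.imag(norm)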
I figured out this from here :
http://www.dspguide.com/ch8/5.htm
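For anyone who wants the whole thing in one place, here is a minimal sketch of the reconstruction plus the "continuation" past the original timeframe. The function name extend_with_top_freqs and the test signal are made up for illustration. It assumes a real-valued signal and uses np.fft.rfft so that only positive frequencies are ranked.

```python
import numpy as np

def extend_with_top_freqs(signal, top_n, n_total):
    """Rebuild `signal` from its top_n strongest frequencies, evaluated on n_total samples."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)          # positive-frequency half only
    freqs = np.fft.rfftfreq(n)              # cycles per sample
    power = np.abs(spectrum) ** 2
    power[0] = 0.0                          # the mean (DC) is added back separately
    if n % 2 == 0:
        power[-1] = 0.0                     # skip the Nyquist bin, which scales by n, not n/2
    idx = np.argsort(power)[-top_n:]        # indices of the top_n strongest bins
    norm = spectrum[idx] / (n / 2)
    cos_amp, sin_amp = norm.real, -norm.imag
    t = np.arange(n_total)                  # may run past the original n samples
    arg = 2 * np.pi * np.outer(t, freqs[idx])
    return spectrum[0].real / n + np.cos(arg) @ cos_amp + np.sin(arg) @ sin_amp

# Hypothetical test signal: two sinusoids that fit the 100-sample window exactly
t = np.arange(100)
signal = 2 + 3 * np.cos(2 * np.pi * 0.05 * t + 0.4) + 1.5 * np.sin(2 * np.pi * 0.12 * t)
extended = extend_with_top_freqs(signal, top_n=2, n_total=250)
```

The same waves can also be written as np.abs(norm) * np.cos(arg + np.angle(norm)), provided cos is used and the angle stays in radians. Note that the extension is simply the periodic repetition implied by the chosen frequencies, so it only matches the "true" continuation if the signal really is a sum of those sinusoids.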
I want to shift an audio file from an audible frequency band to one that is higher than 20kHz using python.
I have been searching online for how to do this, and the closest I have got is using fft.fftshift.
The code is as follows:
samplerate, data = wavfile.read('./aud/vader.wav')
fft_out = fft(data)
shifted = np.fft.fftshift(fft_out)
I want 'shifted' to be the shifted version of fft_out. It should be shifted by a magnitude of 20kHz
fftshift doesn't shift the original 20kHz frequency band to a higher band.
As you can read in the manual:
numpy.fft.fftshift ... Shift the zero-frequency component to the
center of the spectrum. This function swaps half-spaces for all axes
listed
If you want to shift the band 0Hz-20kHz, for example, to the band 100kHz-120kHz, you can multiply the audio signal with a cosine function with frequency 100kHz.
This produces the two bands 80kHz-100kHz and 100kHz-120 kHz.
From the resulting spectrum you have to delete the lower band 80kHz-100kHz. You have to do this for the positive frequencies as well as for the negative frequencies, which are mirrored on the zero axis.
Why does this work?
cos(x) * cos(y) = 0.5 * ( cos(x+y) + cos(x-y) )
If x contains the frequencies of your audio band and y is the shifting frequency of 100kHz, you get (with t the time):
cos(2pi*audiofrequency*t) * cos(2pi*100kHz*t) = 0.5 * ( cos(2pi*(100kHz + audiofrequency)*t) + cos(2pi*(100kHz - audiofrequency)*t) )
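A rough sketch of the multiply-then-filter idea described above, using a test tone instead of the wav file. The function name shift_band_up and the 300 kHz sample rate are assumptions for illustration. The sample rate has to exceed twice the highest shifted frequency, so a typical 44.1 kHz recording would need to be resampled first. Using rfft/irfft takes care of the mirrored negative frequencies automatically.

```python
import numpy as np

def shift_band_up(x, fs, f_shift):
    """Shift the spectrum of real signal x up by f_shift Hz, keeping only the upper band."""
    n = len(x)
    t = np.arange(n) / fs
    mixed = x * np.cos(2 * np.pi * f_shift * t)   # creates bands at f_shift - f and f_shift + f
    spectrum = np.fft.rfft(mixed)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[freqs < f_shift] = 0.0               # delete the lower band
    return 2.0 * np.fft.irfft(spectrum, n)        # factor 2 undoes the 0.5 from the product

# Hypothetical example: a 1 kHz tone moved to 101 kHz
fs = 300_000
t = np.arange(3000) / fs
tone = np.cos(2 * np.pi * 1_000 * t)
shifted = shift_band_up(tone, fs, 100_000)
peak_hz = np.fft.rfftfreq(len(shifted), d=1.0 / fs)[np.argmax(np.abs(np.fft.rfft(shifted)))]
```

Zeroing FFT bins is a brick-wall filter, which is fine for a sketch but can ring on real audio; a proper band-pass filter is probably the better choice there.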
As the title says, I am trying to reproduce the following statement from (1)
Empirically, many authors have found that the spectral power of
natural images falls with frequency, f, according to a power law,
1/f**p, with estimated values for p typically near 2.
For this purpose, I used the code from (2) with some minor modifications
import matplotlib.image as mpimg
import numpy as np
import scipy.stats as stats
import pylab as pl
from scipy.stats import linregress
from skimage import color
# Load image in greyscale
image = color.rgb2gray(mpimg.imread("clouds.png"))
image_X,image_Y = image.shape[0], image.shape[1]
# Do FFT
fourier_image = np.fft.fftn(image)
fourier_amplitudes = np.abs(fourier_image)**2
# Get wave vector
kfreq_X = np.fft.fftfreq(image_X) * image_X
kfreq_Y = np.fft.fftfreq(image_Y) * image_Y
kfreq2D = np.meshgrid(kfreq_X, kfreq_Y)
knrm = np.sqrt(kfreq2D[0]**2 + kfreq2D[1]**2)
knrm = knrm.flatten()
fourier_amplitudes = fourier_amplitudes.flatten()
kbins = np.arange(0.5, max(image_X, image_Y)/2 + 1., 1.)
kvals = 0.5 * (kbins[1:] + kbins[:-1])
Abins, _, _ = stats.binned_statistic(knrm, fourier_amplitudes,
statistic = "mean",
bins = kbins)
Abins *= 4. * np.pi / 3. * (kbins[1:]**3 - kbins[:-1]**3)
# print(linregress(np.log(kvals[20:-300]), np.log(Abins[20:-300])))
# pl.plot(kvals, Abins)
pl.loglog(kvals, Abins)
pl.xlabel("$k$")
pl.ylabel("$P(k)$")
pl.tight_layout()
pl.savefig("cloud_power_spectrum.png", dpi = 300, bbox_inches = "tight")
When I look at power spectrum of the image used in tutorial (2) my p estimate is around -1.3.
When I tried with my own RGB image, I did not get a nice power spectrum distribution as described and shown in (2). Instead, there are a few peaks.
Questions
1. Since I am not able to get even close to p~-2 for any of the images, I was wondering if my code is correct?
2. If the code is okay, is there any other reason why I am not able to get p~-2?
3. Are those peaks in RGB image some artefact of my conversion to grayscale maybe or this is expected behaviour?
(1) https://www.cns.nyu.edu/pub/eero/simoncelli01-reprint.pdf
(2) https://bertvandenbroucke.netlify.app/2019/05/24/computing-a-power-spectrum-in-python/
Bert Vandenbroucke updated the blog post from which OP obtained the code on March 10, 2021 to address a number of issues, among them:
Bin area formula 4. * np.pi / 3. * (kbins[1:]**3 - kbins[:-1]**3) (3D spherical shell bin volume) was corrected to np.pi * (kbins[1:]**2 - kbins[:-1]**2) (2D ring bin area).
There is now a warning that frequencies above 1D Nyquist (0.5 cycle per pixel), that is, frequencies between 1D and 2D Nyquist (0.5*sqrt(2) cycle per pixel), are not correctly sampled. The original code completely broke down with a Nyquist checkerboard (alternating black and white pixels) with width = height = (power of 2).
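To make the two points above concrete, here is a sketch of the binning with the 2D ring area and with bins cut off at the 1D Nyquist limit. This is not the blog's code: the function name radial_power_spectrum, the use of np.histogram instead of scipy.stats.binned_statistic, and scaling both axes by the shorter image side are my own choices.

```python
import numpy as np

def radial_power_spectrum(image):
    """Azimuthally binned power spectrum of a 2D greyscale array."""
    ny, nx = image.shape
    power = np.abs(np.fft.fftn(image)) ** 2
    scale = min(ny, nx)                      # same wavenumber units on both axes
    ky = np.fft.fftfreq(ny) * scale
    kx = np.fft.fftfreq(nx) * scale
    # indexing="ij" keeps the grid shape equal to image.shape for non-square images
    kyy, kxx = np.meshgrid(ky, kx, indexing="ij")
    knrm = np.hypot(kyy, kxx).ravel()
    kbins = np.arange(0.5, scale // 2 + 1.0, 1.0)   # stop at the 1D Nyquist limit
    kvals = 0.5 * (kbins[1:] + kbins[:-1])
    sums, _ = np.histogram(knrm, bins=kbins, weights=power.ravel())
    counts, _ = np.histogram(knrm, bins=kbins)
    mean_power = sums / counts
    ring_area = np.pi * (kbins[1:] ** 2 - kbins[:-1] ** 2)   # 2D ring, not 3D shell
    return kvals, mean_power * ring_area

# Hypothetical test image: a single horizontal wave with 8 cycles across 64 pixels
yy, xx = np.mgrid[0:64, 0:64]
image = np.cos(2 * np.pi * 8 * xx / 64)
kvals, pk = radial_power_spectrum(image)
peak_k = kvals[np.argmax(pk)]
```

Keep in mind that multiplying by the ring area (which grows roughly like k) changes the log-log slope by about +1 compared with the mean power per mode, so it matters which of the two quantities is compared against the 1/f**p statement.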
Suppose one wanted to find the period of a given sinusoidal wave signal. From what I have read online, it appears that the two main approaches employ either fourier analysis or autocorrelation. I am trying to automate the process using python and my usage case is to apply this concept to similar signals that come from the time-series of positions (or speeds or accelerations) of simulated bodies orbiting a star.
For simple-examples-sake, consider x = sin(t) for 0 ≤ t ≤ 10 pi.
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
## sample data
t = np.linspace(0, 10 * np.pi, 100)
x = np.sin(t)
fig, ax = plt.subplots()
ax.plot(t, x, color='b', marker='o')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
Given a sine-wave of the form x = a sin(b(t+c)) + d, the period of the sine-wave is obtained as 2 * pi / b. Since b=1 (or by visual inspection), the period of our sine wave is 2 * pi. I can check the results obtained from other methods against this baseline.
Attempt 1: Autocorrelation
As I understand it (please correct me if I'm wrong), correlation can be used to see if one signal is a time-lagged copy of another signal (similar to how cosine and sine differ by a phase difference). So autocorrelation is testing a signal against itself to measure the times at which the time-lag repeats said signal. Using the example posted here:
result = np.correlate(x, x, mode='full')
Since x and t each consist of 100 elements and result consists of 199 elements, I am not sure why I should arbitrarily select the last 100 elements.
print("\n autocorrelation (shape={}):\n{}\n".format(result.shape, result))
autocorrelation (shape=(199,)):
[ 0.00000000e+00 -3.82130761e-16 -9.73648712e-02 -3.70014208e-01
-8.59889695e-01 -1.56185995e+00 -2.41986054e+00 -3.33109112e+00
-4.15799070e+00 -4.74662427e+00 -4.94918053e+00 -4.64762251e+00
-3.77524157e+00 -2.33298717e+00 -3.97976240e-01 1.87752669e+00
4.27722402e+00 6.54129270e+00 8.39434617e+00 9.57785701e+00
9.88331103e+00 9.18204933e+00 7.44791758e+00 4.76948221e+00
1.34963425e+00 -2.50822289e+00 -6.42666652e+00 -9.99116299e+00
-1.27937834e+01 -1.44791297e+01 -1.47873668e+01 -1.35893098e+01
-1.09091510e+01 -6.93157447e+00 -1.99159756e+00 3.45267493e+00
8.86228186e+00 1.36707567e+01 1.73433176e+01 1.94357232e+01
1.96463736e+01 1.78556800e+01 1.41478477e+01 8.81191526e+00
2.32100171e+00 -4.70897483e+00 -1.15775811e+01 -1.75696560e+01
-2.20296487e+01 -2.44327920e+01 -2.44454330e+01 -2.19677060e+01
-1.71533510e+01 -1.04037163e+01 -2.33560966e+00 6.27458308e+00
1.45655029e+01 2.16769872e+01 2.68391837e+01 2.94553896e+01
2.91697473e+01 2.59122266e+01 1.99154591e+01 1.17007613e+01
2.03381596e+00 -8.14633251e+00 -1.78184255e+01 -2.59814393e+01
-3.17580589e+01 -3.44884934e+01 -3.38046447e+01 -2.96763956e+01
-2.24244433e+01 -1.26974172e+01 -1.41464998e+00 1.03204331e+01
2.13281784e+01 3.04712823e+01 3.67721634e+01 3.95170295e+01
3.83356037e+01 3.32477037e+01 2.46710643e+01 1.33886439e+01
4.77778141e-01 -1.27924775e+01 -2.50860560e+01 -3.51343866e+01
-4.18671622e+01 -4.45258983e+01 -4.27482779e+01 -3.66140001e+01
-2.66465884e+01 -1.37700036e+01 7.76494745e-01 1.55574483e+01
2.90828312e+01 3.99582426e+01 4.70285203e+01 4.95000000e+01
4.70285203e+01 3.99582426e+01 2.90828312e+01 1.55574483e+01
7.76494745e-01 -1.37700036e+01 -2.66465884e+01 -3.66140001e+01
-4.27482779e+01 -4.45258983e+01 -4.18671622e+01 -3.51343866e+01
-2.50860560e+01 -1.27924775e+01 4.77778141e-01 1.33886439e+01
2.46710643e+01 3.32477037e+01 3.83356037e+01 3.95170295e+01
3.67721634e+01 3.04712823e+01 2.13281784e+01 1.03204331e+01
-1.41464998e+00 -1.26974172e+01 -2.24244433e+01 -2.96763956e+01
-3.38046447e+01 -3.44884934e+01 -3.17580589e+01 -2.59814393e+01
-1.78184255e+01 -8.14633251e+00 2.03381596e+00 1.17007613e+01
1.99154591e+01 2.59122266e+01 2.91697473e+01 2.94553896e+01
2.68391837e+01 2.16769872e+01 1.45655029e+01 6.27458308e+00
-2.33560966e+00 -1.04037163e+01 -1.71533510e+01 -2.19677060e+01
-2.44454330e+01 -2.44327920e+01 -2.20296487e+01 -1.75696560e+01
-1.15775811e+01 -4.70897483e+00 2.32100171e+00 8.81191526e+00
1.41478477e+01 1.78556800e+01 1.96463736e+01 1.94357232e+01
1.73433176e+01 1.36707567e+01 8.86228186e+00 3.45267493e+00
-1.99159756e+00 -6.93157447e+00 -1.09091510e+01 -1.35893098e+01
-1.47873668e+01 -1.44791297e+01 -1.27937834e+01 -9.99116299e+00
-6.42666652e+00 -2.50822289e+00 1.34963425e+00 4.76948221e+00
7.44791758e+00 9.18204933e+00 9.88331103e+00 9.57785701e+00
8.39434617e+00 6.54129270e+00 4.27722402e+00 1.87752669e+00
-3.97976240e-01 -2.33298717e+00 -3.77524157e+00 -4.64762251e+00
-4.94918053e+00 -4.74662427e+00 -4.15799070e+00 -3.33109112e+00
-2.41986054e+00 -1.56185995e+00 -8.59889695e-01 -3.70014208e-01
-9.73648712e-02 -3.82130761e-16 0.00000000e+00]
Attempt 2: Fourier
Since I am not sure where to go from the last attempt, I sought a new attempt. To my understanding, Fourier analysis basically shifts a signal from/to the time-domain (x(t) vs t) to/from the frequency domain (x(t) vs f=1/t); the signal in frequency-space should appear as a sinusoidal wave that dampens over time. The period is obtained from the most observed frequency since this is the location of the peak of the distribution of frequencies.
Since my values are all real-valued, applying the Fourier transform should mean my output values are all complex-valued. I wouldn't think this is a problem, except for the fact that scipy has methods for real values. I do not fully understand the differences between all of the different scipy methods. That makes the algorithm proposed in this posted solution hard for me to follow (i.e., how/why is the threshold value picked?).
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
threshold = 0
idx = np.where(abs(omega)>threshold)[0][-1]
max_f = abs(freq[idx])
print(max_f)
This outputs 0.01, meaning the period is 1/0.01 = 100. This doesn't make sense either.
Attempt 3: Power Spectral Density
According to the scipy docs, I should be able to estimate the power spectral density (psd) of the signal using a periodogram (which, according to wikipedia, is the fourier transform of the autocorrelation function). By selecting the dominant frequency fmax at which the signal peaks, the period of the signal can be obtained as 1 / fmax.
freq, pdensity = signal.periodogram(x)
fig, ax = plt.subplots()
ax.plot(freq, pdensity, color='r')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
The periodogram shown below peaks at 49.076... at a frequency of fmax = 0.05. So, period = 1/fmax = 20. This doesn't make sense to me. I have a feeling it has something to do with the sampling rate, but don't know enough to confirm or progress further.
I realize I have some fundamental gaps in my understanding of how these things work. There are a lot of resources online, but it's hard to find this needle in the haystack. Can someone help me learn more about this?
Let's first look at your signal (I've added endpoint=False to make the division even):
t = np.linspace(0, 10*np.pi, 100, endpoint=False)
x = np.sin(t)
Let's divide out the radians (essentially by taking t /= 2*np.pi) and create the same signal by relating to frequencies:
fs = 20 # Sampling rate of 100/5 = 20 (e.g. Hz)
f = 1 # Signal frequency of 1 (e.g. Hz)
t = np.linspace(0, 5, 5*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
This makes it more salient that f/fs == 1/20 == 0.05 (i.e. the periodicity of the signal is exactly 20 samples). Frequencies in a digital signal always relate to its sampling rate, as you have already guessed. Note that the actual signal is exactly the same no matter what the values of f and fs are, as long as their ratio is the same:
fs = 1 # Natural units
f = 0.05
t = np.linspace(0, 100, 100*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
In the following I'll use these natural units (fs = 1). The only difference will be in t and hence the generated frequency axes.
Autocorrelation
Your understanding of what the autocorrelation function does is correct. It detects the correlation of a signal with a time-lagged version of itself. It does this by sliding the signal over itself as seen in the right column here (from Wikipedia):
Note that as both inputs to the correlation function are the same, the resulting signal is necessarily symmetric. That is why the output of np.correlate is usually sliced from the middle:
acf = np.correlate(x, x, 'full')[-len(x):]
Now index 0 corresponds to 0 delay between the two copies of the signal.
Next you'll want to find the index or delay that presents the largest correlation. Due to the shrinking overlap this will by default also be index 0, so the following won't work:
acf.argmax() # Always returns 0
Instead, I recommend finding the largest peak, where a peak is defined to be any index with a larger value than both its direct neighbours:
inflection = np.diff(np.sign(np.diff(acf))) # Find the second-order differences
peaks = (inflection < 0).nonzero()[0] + 1 # Find where they are negative
delay = peaks[acf[peaks].argmax()] # Of those, find the index with the maximum value
Now delay == 20, which tells you that the signal has a frequency of 1/20 of its sampling rate:
signal_freq = fs/delay # Gives 0.05
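Putting the autocorrelation steps together for the original sin(t) example, as a small sketch. The function name period_from_autocorrelation and the mean subtraction are additions of mine. The result is converted back to the units of t by dividing the delay by the sampling rate.

```python
import numpy as np

def period_from_autocorrelation(x, fs=1.0):
    """Estimate the period of x (in units of 1/fs) from the largest non-zero-lag ACF peak."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # remove any DC offset first
    acf = np.correlate(x, x, mode="full")[-len(x):]    # lags 0 .. len(x)-1
    inflection = np.diff(np.sign(np.diff(acf)))
    peaks = (inflection < 0).nonzero()[0] + 1          # local maxima, lag 0 excluded
    if peaks.size == 0:
        raise ValueError("no autocorrelation peak found")
    delay = peaks[acf[peaks].argmax()]                 # delay in samples
    return delay / fs

t = np.linspace(0, 10 * np.pi, 100, endpoint=False)
x = np.sin(t)
period = period_from_autocorrelation(x, fs=1.0 / (t[1] - t[0]))
```

Here the delay is 20 samples and the sample spacing is pi/10, so the period comes out as 2*pi. The resolution is limited to whole samples, so for a period that is not a multiple of the sample spacing the result will be off by up to half a sample.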
Fourier transform
You used the following to calculate the FFT:
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
These functions are designed for complex-valued signals. They will work for real-valued signals, but you'll get a symmetric output, as the negative-frequency components are the complex conjugates of the positive-frequency components (so their magnitudes are identical). NumPy provides separate functions for real-valued signals:
ft = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0]) # Get frequency axis from the time axis
mags = abs(ft) # We don't care about the phase information here
Let's have a look:
plt.plot(freqs, mags)
plt.show()
Note two things: the peak is at frequency 0.05, and the maximum frequency on the axis is 0.5 (the Nyquist frequency, which is exactly half the sampling rate). If we had picked fs = 20, this would be 10.
Now let's find the maximum. The thresholding method you have tried can work, but the target frequency bin is selected blindly and so this method would suffer in the presence of other signals. We could just select the maximum value:
signal_freq = freqs[mags.argmax()] # Gives 0.05
However, this would fail if, e.g., we have a large DC offset (and hence a large component in index 0). In that case we could just select the highest peak again, to make it more robust:
inflection = np.diff(np.sign(np.diff(mags)))
peaks = (inflection < 0).nonzero()[0] + 1
peak = peaks[mags[peaks].argmax()]
signal_freq = freqs[peak] # Gives 0.05
If we had picked fs = 20, this would have given signal_freq == 1.0 due to the different time axis from which the frequency axis was generated.
Periodogram
The method here is essentially the same. The autocorrelation function of x has the same time axis and period as x, so we can use the FFT as above to find the signal frequency:
pdg = np.fft.rfft(acf)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0])
plt.plot(freqs, abs(pdg))
plt.show()
This curve obviously has slightly different characteristics from the direct FFT on x, but the main takeaways are the same: the frequency axis ranges from 0 to 0.5*fs, and we find a peak at the same signal frequency as before: freqs[abs(pdg).argmax()] == 0.05.
Edit:
To measure the actual periodicity of np.sin, we can just use the "angle axis" that we passed to np.sin instead of the time axis when generating the frequency axis:
freqs = np.fft.rfftfreq(len(x), 2*np.pi*f*(t[1]-t[0]))
rad_period = 1/freqs[mags.argmax()] # 6.283185307179586
Though that seems pointless, right? We pass in 2*np.pi and we get 2*np.pi. However, we can do the same with any regular time axis, without presupposing pi at any point:
fs = 10
t = np.arange(1000)/fs
x = np.sin(t)
rad_period = 1/np.fft.rfftfreq(len(x), 1/fs)[abs(np.fft.rfft(x)).argmax()] # 6.25
Naturally, the true value now lies in between two bins. That's where interpolation comes in and the associated need to choose a suitable window function.
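One common way to do that interpolation, sketched under my own assumptions (a Hann window and a parabola fitted through the log-magnitudes of the peak bin and its two neighbours). The function name interpolated_peak_frequency is hypothetical, and other windows or estimators may well do better.

```python
import numpy as np

def interpolated_peak_frequency(x, fs):
    """Estimate the dominant frequency of x with sub-bin resolution."""
    n = len(x)
    mags = np.abs(np.fft.rfft(x * np.hanning(n)))   # window to reduce leakage
    k = int(np.argmax(mags[1:-1])) + 1              # skip first/last bin so k-1 and k+1 exist
    a, b, c = np.log(mags[k - 1:k + 2])
    offset = 0.5 * (a - c) / (a - 2 * b + c)        # vertex of the parabola, in bins
    return (k + offset) * fs / n

fs = 10
t = np.arange(1000) / fs
x = np.sin(t)
rad_period_raw = 1 / np.fft.rfftfreq(len(x), 1 / fs)[abs(np.fft.rfft(x)).argmax()]
rad_period_interp = 1 / interpolated_peak_frequency(x, fs)
```

For this example the interpolated estimate should land noticeably closer to 2*pi than the raw 6.25, although it is still an approximation whose bias depends on the window.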
So, I am probably missing something obvious, but I have searched through lots of tutorials and documentation and can't seem to find a straight answer. How do you find the frequency axis of a function that you performed an fft on in Python(specifically the fft in the scipy library)?
I am trying to get a raw EMG signal, perform a bandpass filter on it, and then perform an fft to see the remaining frequency components. However, I am not sure how to find an accurate x component list. The specific signal I am working on currently was sampled at 1000 Hz and has 5378 samples.
Is it just creating a linear x starting from 0 and going to the length of the fft'd data? I see a lot of people creating a linspace from 0 to sample points times the sample spacing. But what would be my sample spacing in this case? Would it just be samples/sampling rate? Or is it something else completely?
Here is an example.
First create a sine wave with a pre-determined sampling interval. We will combine two sine waves with frequencies 10 and 20. Remember that high frequencies might be aliased if the time interval is large.
#Import the necessary packages
from scipy import fftpack
import matplotlib.pyplot as plt
import numpy as np
# signal frequencies in hertz: 10 Hz and 20 Hz
freq_sampling1 = 10
freq_sampling2 = 20
amplitude1 = 2 # amplitude of first sine wave
amplitude2 = 4 # amplitude of second sine wave
time = np.linspace(0, 6, 500, endpoint=False) # 500 samples from 0 to 6 (6 excluded), so the time interval equals 6/500
y = amplitude1*np.sin(2*np.pi*freq_sampling1*time) + amplitude2*np.sin(2*np.pi*freq_sampling2*time)
plt.figure(figsize=(10, 4))
plt.plot(time,y, 'k', lw=0.8)
plt.xlim(0,6)
plt.show()
Notice in the figure that two sine waves are superimposed. One with freq. 10 and amplitude 2 and the other with freq. 20 and amplitude 4.
# apply fft function
yf = fftpack.fft(y, time.size)
amp = np.abs(yf) # get amplitude spectrum
freq = np.linspace(0.0, 1.0/(2.0*(6/500)), time.size//2, endpoint=False) # get freq axis: bins k/(N*dt) for k < N/2
# plot the amp spectrum
plt.figure(figsize=(10,6))
plt.plot(freq, (2/amp.size)*amp[0:amp.size//2])
plt.show()
Notice in the amplitude spectrum that the two frequencies are recovered, while the amplitude is zero at other frequencies. The amplitude values are also 2 and 4 respectively.
Instead, you can use fftpack.fftfreq to obtain the frequency axis, as suggested by tom10.
The code then changes to
yf = fftpack.fft(y, time.size)
amp = np.abs(yf) # get amplitude spectrum
freq = fftpack.fftfreq(time.size, 6/500)
plt.figure(figsize=(10,6))
plt.plot(freq[0:freq.size//2], (2/amp.size)*amp[0:amp.size//2])
plt.show()
We are only plotting the positive part of the amplitude spectrum [0:amp.size//2]
Once you feed your window of samples into the FFT call it will return an array of complex numbers ... the frequency separation between each element of the returned array is determined by
freq_resolution = sampling_freq / number_of_samples
the 0th element is your DC offset which will be zero if your input curve is balanced straddling the zero crossing point ... so in your case
freq_resolution = 1000 / 5378
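For the numbers in the question (1000 Hz, 5378 samples) the frequency axis can be built directly with np.fft.rfftfreq, where the sample spacing is simply 1 / sampling rate. In this sketch the emg array is random placeholder data, not a real recording.

```python
import numpy as np

fs = 1000.0                      # sampling rate in Hz (from the question)
n = 5378                         # number of samples (from the question)
rng = np.random.default_rng(0)
emg = rng.standard_normal(n)     # placeholder standing in for the filtered EMG signal

spectrum = np.fft.rfft(emg)                  # positive-frequency half of the FFT
freqs = np.fft.rfftfreq(n, d=1.0 / fs)       # matching frequency axis in Hz
freq_resolution = freqs[1] - freqs[0]        # equals fs / n
```

The axis runs from 0 Hz to the Nyquist frequency of 500 Hz in steps of 1000 / 5378 Hz, and freqs has the same length as spectrum, so the two can be plotted against each other directly.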
In general, for efficiency, you will want to feed a power-of-2 number of samples into your FFT call, which matters if you are, say, sliding your window of samples forward in time and repeatedly calling FFT on each window
To calculate the magnitude of a frequency in a given freq_bin (an element of the returned complex array)
X = A + jB
A on real axis
B on imag axis
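for the above formula it's
mag = 2.0 * math.sqrt(A*A + B*B) / number_of_samples
phase = math.atan2(B, A)
You iterate across each element only up to the Nyquist limit; doubling the magnitude accounts for the mirrored negative-frequency half of the spectrum of a real signal (the DC bin, and the Nyquist bin when the sample count is even, are not doubled)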
So yes its a linear increment with same frequency spacing between each freq_bin
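A quick check of the magnitude and phase formulas with NumPy, using a made-up 50 Hz cosine of amplitude 3 and phase 0.5 rad. With this convention the phase is relative to a cosine, not a sine.

```python
import numpy as np

fs, n = 1000.0, 1000
t = np.arange(n) / fs
x = 3.0 * np.cos(2 * np.pi * 50.0 * t + 0.5)   # hypothetical test signal

X = np.fft.rfft(x)
k = int(np.argmax(np.abs(X)))                  # bin with the largest magnitude
A, B = X[k].real, X[k].imag
mag = 2.0 * np.sqrt(A * A + B * B) / n         # amplitude of the sinusoid in bin k
phase = np.arctan2(B, A)                       # atan2 keeps the correct quadrant
peak_freq = k * fs / n                         # linear frequency axis: k * freq_resolution
```

This recovers an amplitude of 3, a phase of 0.5 and a frequency of 50 Hz because the tone fits the window exactly; for tones between bins the energy leaks into neighbouring bins and the single-bin amplitude comes out too low.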
I use the following code to generate the fibonacci lattice, see page 4 for the unit sphere. I think the code is working correctly. Next, I have a list of points (specified by latitude and longitude in radians, just as the generated fibonacci lattice points). For each of the points I want to find the index of the closest point on the fibonacci lattice. I.e. I have latitude and longitude and want to get i. How would I do this?
I specifically don't want to iterate over all the points from the lattice and find the one with minimal distance, as in practice I generate much more than just 50 points and I don't want the runtime to be O(n*m) if O(m) is possible.
FWIW, when talking about distance, I mean haversine distance.
#!/usr/bin/env python2
import math
import sys
n = 50
phi = (math.sqrt(5.0) + 1.0) / 2.0
phi_inv = phi - 1.0
ga = 2.0 * phi_inv * math.pi
for i in xrange(-n, n + 1):
longitude = ga * i
longitude = (longitude % phi) - phi if longitude < 0 else longitude % phi
latitude = math.asin(2.0 * float(i) / (2.0 * n + 1.0))
print("{}-th point: ".format(i + n + 1))
print("\tLongitude is {}".format(longitude))
print("\tLatitude is {}".format(latitude))
# Given latitude and longitude of point A, determine index i of point which is closest to A
# ???
What you are probably looking for is a spatial index: https://en.wikipedia.org/wiki/Spatial_database#Spatial_index. Since you only care about nearest neighbor search, you might want to use something relatively simple like http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html.
Note that spatial indexes usually consider points on a plane rather than a sphere. To adapt it to your situation, you'll probably want to split up the sphere into several regions that can be approximated by rectangles. You can then find several of the nearest neighbors according to the rectangular approximation and compute their actual haversine distances to identify the true nearest neighbor.
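One way to sidestep the plane-versus-sphere issue, sketched here as an assumption rather than the only option: convert latitude/longitude to 3D unit vectors and build the k-d tree on those. The straight-line (chord) distance between two points on the unit sphere grows monotonically with their great-circle distance, so the Euclidean nearest neighbour is also the haversine nearest neighbour. The lattice below uses lon = 2*pi*i/golden ratio wrapped to [0, 2*pi), which may differ from the wrapping in the question's code.

```python
import numpy as np
from scipy.spatial import cKDTree

def fibonacci_lattice(n):
    """Indices, latitudes and longitudes (radians) of the 2n+1 lattice points."""
    i = np.arange(-n, n + 1)
    lat = np.arcsin(2.0 * i / (2.0 * n + 1.0))
    golden = (1.0 + np.sqrt(5.0)) / 2.0
    lon = np.mod(2.0 * np.pi * i / golden, 2.0 * np.pi)
    return i, lat, lon

def to_unit_vectors(lat, lon):
    """Convert latitude/longitude in radians to points on the unit sphere."""
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

n = 50
idx, lat, lon = fibonacci_lattice(n)
lattice_xyz = to_unit_vectors(lat, lon)
tree = cKDTree(lattice_xyz)                       # built once for all queries

# Hypothetical query points, uniformly distributed on the sphere
rng = np.random.default_rng(0)
q_lat = np.arcsin(rng.uniform(-1.0, 1.0, 200))
q_lon = rng.uniform(0.0, 2.0 * np.pi, 200)
query_xyz = to_unit_vectors(q_lat, q_lon)
_, pos = tree.query(query_xyz)                    # position of the nearest lattice point
nearest_i = idx[pos]                              # lattice index i in -n..n
```

Each query costs roughly O(log n) on average instead of O(n), which is not the O(m) total asked for but avoids the full O(n*m) scan.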
It's somewhat easier to use spherical coordinates here.
Your spherical coordinates are given by lat = arcsin(2 * i / (2 * N + 1)), and lon = 2 * PI * i / the golden ratio.
Reversing this is not a dead end - it's a great way to determine latitude. The issue with the reverse approach is only that it fails to represent longitude.
sin(lat) = 2 * i / (2 * N + 1)
i = (2 * N + 1) * sin(lat) / 2
This i is an exact representation of the index of a point matching the latitude of your input point. The next step is your choice - brute force, or choosing a different spiral.
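The latitude inversion above, written out in Python as a small sketch. The function name nearest_latitude_index is made up. Note that it returns the lattice index whose latitude is closest to the input, which is a starting point for a local search and not necessarily the nearest lattice point overall.

```python
import math

def nearest_latitude_index(lat, n):
    """Index i in -n..n of the lattice point whose latitude is closest to lat (radians)."""
    i = (2 * n + 1) * math.sin(lat) / 2
    return max(-n, min(n, round(i)))      # clamp, since lat near the poles can overshoot

n = 50
# Round trip: every lattice latitude maps back to its own index
round_trip = [nearest_latitude_index(math.asin(2.0 * i / (2.0 * n + 1.0)), n)
              for i in range(-n, n + 1)]
```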
The Fibonacci spiral is great at covering a sphere, but one of its properties is that it does not preserve locality between consecutive indices. Thus, if you want to find the closest points, you have to search a wide range, and it is difficult to even estimate bounds for this search. Brute force is expensive. However, this is already a significant improvement over the original problem of checking every point: you can threshold your results and bound your search in any way you like and get approximately accurate results. If you want to accomplish this in a more deterministic way, though, you'll have to dig deeper.
My solution to this problem looks a bit like this (and apologies, this is written in C# not Python)
// Take a stored index on a spiral on a sphere and convert it to a normal vector
public Vector3 UI2N(uint i)
{
double h = -1 + 2 * ((double)i / n); // cast so that i/n is not truncated by integer division
double phi = math.acos(h);
double theta = sqnpi*phi;
return new Vector3((float)(math.cos(theta) * math.sin(phi)), (float)math.cos(phi), (float)(math.sin(theta) * math.sin(phi)));
}
// Take a normalized vector and return the closest matching index on a spiral on a sphere
public uint N2UI(Vector3 v)
{
double iT = sqnpi * math.acos(v.y); // theta calculated to match latitude
double vT = math.atan2(v.z, v.x); // theta calculated to match longitude
double iTR = (iT - vT + math.PI_DBL)%(twoPi); // Remainder from iTR, preserving the coarse number of turns
double nT = iT - iTR + math.PI_DBL; // new theta, containing info from both
return (uint)math.round(n * (math.cos(nT / sqnpi) + 1) / 2);
}
Where n is the spiral's resolution, and sqnpi is sqrt(n * PI).
This is not the most efficient possible implementation, nor is it particularly clear. However, it is a middle ground where I can attempt to explain it.
The spiral I am using is one I found here:
https://web.archive.org/web/20121103201321/http://groups.google.com/group/sci.math/browse_thread/thread/983105fb1ced42c/e803d9e3e9ba3d23#e803d9e3e9ba3d23%22%22
(Orion's spiral is basically the one I'm using here)
From this I can reverse the function to get both a coarse and a fine measure of Theta (distance along the spiral), and combine them to find the best-fitting index. The way this works is that iT is cumulative, but vT is periodic. vT is a more correct measure of the longitude, but iT is a more correct measure of latitude.
I strongly encourage anyone reading this to try things other than what I'm doing with my code, as I know that it can be improved from here - that's what I did, and I would do well to do more. Using doubles is absolutely necessary here with the current implementation - otherwise too much information would be lost, particularly with the trig functions and the conversion to uint.