I'm trying to generate some synthetic data from an experiment. I have the theoretical PSD in the positive frequency domain and calculate a time series from it. Then I do some manipulation on my data, do an FFT, and make a fit in the frequency domain. I will first show you some code, then explain the main problem:
The frequency bins at which I evaluate the theoretical PSD are given by:
bins = np.linspace(1,1e3,2**13)
From that I calculate my time series by:
for i in range(bins.shape[0]):
    signal += dataCOS[i] * np.cos(2*np.pi * t * bins[i] + random.uniform(0, 2*np.pi))
where dataCOS is the value of the PSD at the corresponding bin and t is given by
t_full = np.linspace(0,1,2**13, endpoint = False)
My problem is that I need an FFT of my time series that does not smear out. Usually that is not possible, but if I'm correct and add exactly one frequency bin per time bin in such a way that they match in frame (highest and lowest frequency are still visible), it should work. So my question is: how do bins and t_full have to be shaped?
Here are my thoughts: both need to have the same number of points, and the highest and lowest frequency must be visible in the time domain. But I'm not sure; I hope you can help me out.
EDIT: First, to illustrate my problem a bit more, see the pictures attached. In the first you have the classic result of a back-transformation with too many points in the time domain, but with all the bins being periodic. In the second it's the same, but now they are not periodic anymore and one gets the expected smearing. Of course, one could improve this with a filter, but if you want to fit the initial red function, you run into major problems. The last picture is the result I wanted: all functions are periodic within the time frame, so there is no smearing out. Second, every pair of t_bins holds exactly one frequency I added before, so the cloud at the computational limit vanishes.
To get that, here is the needed spacing of bins and t:
bins = np.linspace(0,2**14,2**14, endpoint = False)
t_full = np.linspace(0,1,2*bins.shape[0], endpoint = False)
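A minimal sketch of why this spacing avoids smearing, scaled down to 64 frequencies so it runs instantly. The amplitudes in `amps` and the seed are hypothetical stand-ins for the PSD values, not taken from the original data. Each integer frequency completes a whole number of periods in the 1 s window, so it lands exactly on one FFT bin:

```python
import numpy as np

rng = np.random.default_rng(0)

n_freqs = 64                                         # small stand-in for 2**14
bins = np.arange(n_freqs)                            # integer frequencies 0 .. n_freqs-1 (Hz)
t = np.linspace(0, 1, 2 * n_freqs, endpoint=False)   # 1 s window, fs = 2 * n_freqs

amps = np.zeros(n_freqs)
amps[[5, 20, 41]] = [1.0, 0.5, 0.25]                 # hypothetical amplitudes at three bins
phases = rng.uniform(0, 2 * np.pi, n_freqs)

# Sum of cosines, one per frequency bin, each with a random phase
signal = (amps[:, None]
          * np.cos(2 * np.pi * bins[:, None] * t + phases[:, None])).sum(axis=0)

# One-sided amplitude spectrum: bin k of rfft corresponds to k Hz here
spectrum = np.abs(np.fft.rfft(signal)) * 2 / t.size
recovered = spectrum[:n_freqs]                       # matches amps, with no leakage
```

With non-integer frequencies in `bins`, the same calculation would spread each tone over neighbouring bins.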
You probably just need to apply a suitable window function prior to the FFT to prevent spectral leakage. I'm not a Python user, so I can't give code, but the formula is quite simple to apply. Try a von Hann or Hamming window (they are similar, but give slightly different shaped peaks).
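Since the answer above gives no code, here is a rough NumPy sketch of the idea. The sampling rate, length, and the 50.5 Hz test tone are assumptions chosen so the tone falls between two FFT bins; `np.hanning` is NumPy's von Hann window:

```python
import numpy as np

fs = 1000.0                          # assumed sampling rate in Hz
n = 1000
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 50.5 * t)     # 50.5 Hz falls between two FFT bins

window = np.hanning(n)               # von Hann window
spec_rect = np.abs(np.fft.rfft(x))            # no window (rectangular)
spec_hann = np.abs(np.fft.rfft(x * window))   # windowed before the FFT
freqs = np.fft.rfftfreq(n, d=1 / fs)

# Leakage far away from the peak, relative to the peak height
far = freqs > 200
leak_rect = spec_rect[far].max() / spec_rect.max()
leak_hann = spec_hann[far].max() / spec_hann.max()
```

The window trades a slightly wider main peak for much lower leakage far from it, which is worth keeping in mind if the peak shape itself is what gets fitted.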
I am doing an automatic music recognition project with a deep learning model. For my data preprocessing, I am trying to calculate the Constant Q Transform for polyphonic 88-key piano audio using Python's Librosa library. However, I do not understand what I should set fmin, n_bins, and bins_per_octave to in Librosa's cqt() method to do this. Specifically:
What exactly is a bin? Do the upper and lower boundaries of a bin correspond to the frequencies of two consecutive notes? In other words, because an 88-key piano has 7 octaves each with 12 unique notes, should I set n_bins = 7 * 12 = 84 or equivalently bins_per_octave = 7? Or should several bins correspond to a single note interval?
Is fmin supposed to be the deepest note on the 88-key piano, i.e. the A note with a frequency of about 27.5 Hz?
Why do we need fmin? Is this some sort of reference point, similar to the equation from amplitude to decibels?
What are the differences between n_bins and bins_per_octave and which is better to use? For example, this research paper here uses both.
When is it appropriate to use Librosa's chroma_cqt method?
I'm not an expert in CQT, but I can maybe help to answer some of these questions. According to this Wikipedia page, CQT can be thought of as a series of filters on the signal. Each filter isolates some frequency band of the signal, and then the amplitude of the filtered signal is the amplitude which is output for that frequency.
So to your first question, a bin is a filter which isolates a particular part of the signal in the frequency domain. The upper and lower bound of the filter is a bit unclear to me exactly, but certainly the idea is that each bin is centered at frequencies which I'll describe after this, and the bins ideally wouldn't overlap, but also not lose any data if you attempted to reconstruct the original signal.
For your case, I would set fmin to the lowest note on the piano, like you said, 27.5 Hz. Then I would leave bins_per_octave at the default (12), since you'd like to match the bins to each note on the piano. Finally, the number of bins would be 88, since you have 88 keys. You won't capture any harmonics of the keys (especially the higher ones), but maybe that's okay for your case.
To explain more about how frequencies are chosen, the idea is to mimic how humans hear frequencies. We are more discerning at lower frequencies, and less so at high frequencies, with a roughly logarithmic response. So the internal formula for each bin's frequency is probably something like:
f_min * 2 ** (i / bins_per_octave)
where i is the bin index and in range [0, n_bins).
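To make the formula concrete, here is a small NumPy sketch that evaluates it for the piano settings suggested above. It only illustrates the geometric spacing; it is not Librosa's internal code:

```python
import numpy as np

fmin = 27.5               # A0, lowest key on an 88-key piano (Hz)
n_bins = 88               # one bin per key
bins_per_octave = 12      # one bin per semitone

i = np.arange(n_bins)                                  # bin indices 0 .. n_bins-1
center_freqs = fmin * 2.0 ** (i / bins_per_octave)     # geometric spacing

a4 = center_freqs[48]     # four octaves above A0, i.e. 440 Hz
c8 = center_freqs[-1]     # highest key, roughly 4186 Hz
```

Every 12 bins the frequency doubles, so with these settings each bin center coincides with one equal-tempered piano key.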
I have no idea when chroma_cqt is best used, so hopefully someone else can help with that :)
I want to know how much energy is at a specific frequency in a signal. I am using FFT to get the spectrum, and the frequency step is determined by the length of my signal.
My spectrum looks, for example, like this :
I want to get the spectrum peak at a specific frequency, -0.08. However, the discretization of the spectrum only gives me peaks at -0.0729 and -0.0833.
Is there a way to shift the spectrum to make sure there is a data point at the frequency I want? Or a way to get the value without necessarily using fft?
Thank you very much!
What you're actually doing when you take a DFT (or any Fourier transform) is measuring how much of your signal "intersects" with sines of certain frequencies. This is done by summing the product of your signal with the complex conjugate of the wave of whatever frequency. Technically, this is called an inner product, which is a generalization of the dot product, and measures how "close" one signal is to another. So if you're only interested in one frequency, don't take the whole DFT; just compute the one term you want.
I'm not sure what your units are, so I'll assume you want the peak at f0 = -0.08 Hz (If your units are something else, like normalized to the sampling frequency, then you'll need to account for that). This corresponds to the complex exponential exp(2*pi*j*f0*t). Because you're sampling, your t is discrete, so t = n/fs, where fs is the sampling frequency (in Hz).
# assuming signal is a numpy array, f0 = -0.08 and fs is the sampling frequency
import numpy as np
w = np.exp(-2 * np.pi * 1j * f0 * np.arange(len(signal)) / fs)
peak = np.abs(np.sum(signal * w))
There are different definitions of the DFT; I'm pretty sure numpy's corresponds to what I have above. The extra minus in the exponential is because it's the complex conjugate.
Notice that it's unlikely that w is actually periodic. If the number of samples is large enough this doesn't really matter. A good heuristic is at least 10 periods.
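A self-contained sketch of the same idea, wrapped in a helper (`dft_at` is a hypothetical name). The 96-sample complex test tone and `fs = 1.0` (normalized frequency) are assumptions, chosen so the FFT bins fall at -0.0729 and -0.0833 as in the question:

```python
import numpy as np

def dft_at(signal, f0, fs):
    """Magnitude of the inner product of `signal` with a complex exponential at f0."""
    n = np.arange(len(signal))
    w = np.exp(-2j * np.pi * f0 * n / fs)    # conjugate of exp(2*pi*j*f0*t)
    return np.abs(np.sum(signal * w))

fs = 1.0                                     # assumed: frequencies normalized to fs
n_samples = 96                               # bin spacing 1/96, i.e. about 0.0104
n = np.arange(n_samples)
x = np.exp(2j * np.pi * (-0.08) * n / fs)    # hypothetical complex tone at -0.08

exact = dft_at(x, -0.08, fs)                 # evaluated exactly at the tone frequency
on_bin = dft_at(x, -8 / 96, fs)              # evaluated at the nearest FFT bin (-0.0833)
fft_mag = np.abs(np.fft.fft(x))              # on_bin equals fft_mag[-8]
```

Evaluated at the true frequency, the sum recovers the full amplitude (96 for this unit tone), whereas the nearest FFT bin reports a smaller value because the tone sits between bins.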
If you have discrete data but need an output for a continuous variable, you'll necessarily need some kind of interpolation function. For a value-on-request style, I would advise SciPy's interp1d. I believe it's the fastest way to achieve your intended results.
I'm using a slightly modified version of this python code to do frequency analysis:
FFT wrong value?
Let's say I have a pack of sine waves in the time domain that are very close together in frequency, while sharing the same amplitude. This is what they look like in the frequency domain, using an FFT on 1024 samples from which I strip out the second half, giving 512 bins of resolution:
This is when I apply an FFT over the same group of waves but this time with 128 samples (64 bins):
I expected a plateau-ish frequency response but it looks like the waves in the center are being cancelled. What are those "horns" I see? Is this normal?
I believe your result is correct. The peaks are at ±f1 and ±f2, corresponding to the respective frequency components of the two signals shown in your first plot.
I assume that you are shifting the DC component back to the center? What "waves in the center" are you referring to?
There are a couple of other potential issues that you should be aware of:
Aliasing: by inspection it appears that you have enough samples across your signal, but keep in mind that artificial (aliased) frequencies can appear in the FFT if there are not enough sample points to capture the underlying frequency. Specifically, if your highest frequency is f, then your sample spacing must be Δx = 1/(2*f) or smaller.
Windowing: your signal is windowed (has a finite extent), so there will also be some broadening, ringing, or redistribution of power about each spatial frequency due to edge effects.
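To illustrate the aliasing point, here is a small sketch with made-up numbers: a 70 Hz sinusoid sampled at only 100 Hz, where the spacing criterion would require at least 140 Hz. The tone shows up at the wrong frequency:

```python
import numpy as np

fs = 100.0                            # assumed sampling rate (Hz); Nyquist is 50 Hz
n = 200
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 70.0 * t)      # 70 Hz tone, above the Nyquist frequency

freqs = np.fft.rfftfreq(n, d=1 / fs)
spectrum = np.abs(np.fft.rfft(x))
peak = freqs[np.argmax(spectrum)]     # lands at fs - 70 = 30 Hz, not at 70 Hz
```

The sampled data are indistinguishable from those of a 30 Hz sinusoid, so no processing after sampling can recover the original frequency.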
Since I don't know the details of your data, I went ahead and created a sinusoid and then sampled the data close to what appears to be your sampling rate. For example, below is a sinusoid with 64 points and with a signal frequency at 10 cycles (count the peaks):
The FFT result is then:
which shows the same qualitative features as yours, but without having your data, it's difficult for me to match your exact situation (spacing and taper).
Next I applied a super-Gauss window function (shown below) to simulate the finite extent of your data:
After applying the window to the input signal we have:
The corresponding FFT result shows some additional power redistribution, due to the finite extent of the data:
Although I can't match your exact situation, I believe your results appear as expected and some qualitative features of your data have been identified. Hope this helps.
Sine waves closely spaced in the frequency domain will occasionally nearly cancel out in the time domain. Since your second FFT is 8 times shorter than your first FFT, you may have windowed just such a short region of cancellation. Try a different time location for the shorter window (or different phases for the sinusoids) to see something different.
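A rough sketch of that effect, using assumed values (11 equal-amplitude cosines spaced 1 Hz apart, sampled at 1024 Hz). Two 128-sample windows taken from the same signal contain very different amounts of energy, so their FFTs look very different too:

```python
import numpy as np

fs = 1024.0                               # assumed sampling rate in Hz
t = np.arange(1024) / fs                  # one second of samples
freqs = np.arange(100, 111)               # 11 hypothetical tones, 1 Hz apart
x = np.cos(2 * np.pi * freqs[:, None] * t).sum(axis=0)

def rms(segment):
    """Root-mean-square level of a block of samples."""
    return np.sqrt(np.mean(segment ** 2))

rms_start = rms(x[:128])                  # window where the tones add up in phase
rms_mid = rms(x[512:640])                 # window where they nearly cancel

spec_start = np.abs(np.fft.rfft(x[:128]))
spec_mid = np.abs(np.fft.rfft(x[512:640]))
```

Plotting `spec_start` against `spec_mid` shows how strongly a short FFT depends on where the window falls within the beat pattern.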
I want to amplify audio input at specific frequencies, and I use numpy.fft.
So my question is: When changing the amplitudes of the signal, what happens with phase?
For example, if I multiply amplitudes in some frequency range, by some factor, let's say 2, do I need to change the phases, and if so, what should I do with them?
I've done the amplification without changing phases, and the result was not what I wanted. It's pretty much the same signal, with some unwanted noise.
You shouldn't need to change the phase for something like this. More likely the problem is that you need to be a bit more gentle about applying the boost. It sounds like you are taking some frequency window and multiplying by a constant while leaving everything else unchanged. This will cause ringing in the time domain with a very long tail. You need to smooth the transition from the gain=1 region to the gain=2 region, for instance by using a Gaussian gain profile with code that looks something like this:
import numpy as np

def boost(x, t, f0, df):
    # x: samples; t: uniformly spaced sample times
    # f0, df: center frequency and bandwidth of the gain region
    f = np.fft.rfft(x)
    # rfft returns only non-negative frequencies; rfftfreq gives the matching axis
    freqs = np.fft.rfftfreq(len(x), t[1] - t[0])
    gain_window = 1 + np.exp(-(freqs - f0)**2 / df**2)  # gain 2 at f0, falling smoothly to 1
    f = f * gain_window
    return np.fft.irfft(f, n=len(x))  # n keeps the original length for odd-sized input
If that works, you can experiment with more aggressive functions that have sharper turn-on and a flatter top.
The FFT may not actually be what you want. FFTs are not normally used for real-time / streaming applications. This is because in the naive approach you have to collect the whole sample buffer before you start processing. For simple filtering applications it is often easier to do filtering directly in the time domain. This is what FIR and IIR filters do.
In order to filter with the Fourier transform in real time, you have to break your data stream into overlapping blocks of a fixed length, FFT, filter, inverse FFT, and stitch them back together without introducing glitches. This is possible, but it is tricky to get right. For a full-blown multi-channel EQ it might still be the best option, but you should at least consider time-domain filtering.
If this is not a real-time application, then FFT is the way to go. For medium sized data sets (up to a few hundred megabytes) you can just FFT the whole data set. For much larger data sets you still have to break the data up into blocks, but they can be much larger blocks and you don't have to worry about the latency introduced.
Also, remember the FFT treats the signal as periodic, so if your signal doesn't go to zero at the beginning and end you will need to do some sort of windowing.
I don't know what to do after obtaining a set of complex numbers from an FFT on a wav file. How can I obtain the corresponding frequencies? This is the output I got after performing the FFT:
[ 12535945.00000000 +0.j -30797.74496367 +6531.22295858j
-26330.14948055-11865.08322966j ..., 34265.08792783+31937.15794965j
-26330.14948055+11865.08322966j -30797.74496367 -6531.22295858j]
Actually, the abs(x) operation only converts a real/imaginary pair from your result list into a magnitude. Do that unless you want to keep the phase information for future use. After conversion, each number in the result list represents the magnitude of the signal at a certain frequency in your spectrum, so frequency is represented by list index. When you plot the data on an XY graph, what you see is the magnitude of the frequencies that your source signal contains. Don't forget that only the first half of your data carries independent information: for a real-valued input, the second half is the complex-conjugate mirror image of the first half.
For example, say you run a 1024-point FFT on a wav file that contains data sampled at 10 kHz. The FFT will take that 10 kHz spectrum and divide it into 1024 'bins'. Then the FFT will determine how much of each chunk of spectrum is present in the source wav file. Your output should be those bins. Generally, when I do a frequency analysis, the actual numbers I get back aren't what's important; it's the magnitudes relative to surrounding bins that I'm interested in.
For a little more detail, we're relying on the principle of superposition which states that any time-varying signal containing many frequencies can be split up into many signals containing one component frequency each and vice versa. So the FFT output reflects this property. Each value of your output list represents a magnitude for a signal at a single frequency (usually called a 'bin') that is present in your source signal. Combine all those signals together and you should get your source signal back.
Oh, and in case you didn't know, only the first half of your result list is useful because of the Nyquist-Shannon sampling theorem, which says that a sampled system can only represent frequencies up to half the sampling frequency. So if you sample a signal at 10 kHz, you can only reproduce frequencies up to 5 kHz from the data taken during your sampling. The same principle is the reason why only the first half of your FFT data is useful: for real input, the second half holds the negative frequencies and mirrors the first half.
Sorry for the long-winded explanation, your question doesn't indicate what experience you have so I thought an explanation of the general gist of an FFT is needed.
As @KennyTM already explained on the duplicate question:
The frequency is determined by the index of the array. Each element corresponds to a frequency.
To determine the frequency that each element represents, you need to know the sampling frequency of your data and the length of the array.
Basically, it would be something like:
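import numpy as np
sampling_freq = 1000.0 # in Hz
# frequencies from 0 up to the Nyquist frequency (sampling_freq / 2); exact for even x.size
freq = np.linspace(0, sampling_freq / 2.0, x.size // 2 + 1)
This gives the frequencies for one half of the FFT array (which is symmetric about the center for real input). My memory is rusty, though, so this may be a bit off...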
Either way, numpy has a helper function to do it for you: numpy.fft.fftfreq
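A short sketch of the helper in use. The 10 kHz sampling rate and the 1250 Hz test tone are assumptions standing in for the WAV data; for real input, `np.fft.rfft` together with `np.fft.rfftfreq` returns just the non-negative half of the spectrum:

```python
import numpy as np

sampling_freq = 10_000.0                 # assumed sampling rate in Hz
n = 1024
t = np.arange(n) / sampling_freq
x = np.sin(2 * np.pi * 1250.0 * t)       # hypothetical tone standing in for the WAV samples

spectrum = np.fft.rfft(x)                            # complex values, one per frequency bin
freqs = np.fft.rfftfreq(n, d=1.0 / sampling_freq)    # frequency in Hz of each bin
magnitudes = np.abs(spectrum)                        # how strongly each frequency is present

peak_freq = freqs[np.argmax(magnitudes)]             # frequency of the strongest bin
```

The array index only selects the bin. The frequency comes from `freqs`, and `abs` gives the magnitude at that frequency, not the frequency itself.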
If I'm not mistaken, the frequency can be obtained by calculating the magnitude of the complex number. So a simple abs(x) on each of those complex numbers should return the frequency.