If I have a power spectrum that has been computed using the Welch method in scipy.signal, is there any way I can retrieve the original signal? If not, what data can I get that tells me something about the signal, given the power spectrum?
Is there any way I can retrieve the original signal?
It is impossible to recover the original signal from its power spectral density. Welch's method is calculated as

$$\hat{P}_W(\omega) = \frac{1}{KU} \sum_{k=0}^{K-1} \left| \sum_{n=0}^{L-1} w[n]\, x[n + kR]\, e^{-j\omega n} \right|^2$$

where $K$ is the number of segments averaged together, $L$ is the number of samples in each segment's Fourier transform, $R$ is the decimation factor (the number of samples "jumped" when moving to the next segment), $w[n]$ is a window function (e.g. Hann, Hamming), and $U$ is a normalization factor equal to the energy of the window function:

$$U = \sum_{n=0}^{L-1} w^2[n]$$
Two important things to notice:
The Fourier transform (the sum indexed by n) has an absolute value squared around it. This means that you have lost all phase information of your signal. See this post for the importance of the phase information in a signal. Using only the magnitude information, it is impossible to tell what the original signal/image was; the short sketch after this list demonstrates this.
The equation above is averaging multiple PSD estimates (modified periodograms, to be specific). In the same way that a simple average loses the detailed information contained in the individual samples x[n] and how they vary over time, Welch's method also loses how the signal varies over time. To get any information about how the signal varies over time, you would need to compute a spectrogram.
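To make the first point concrete, here is a minimal numpy sketch (the test signals and names are mine, not from the question) showing two different signals with identical magnitude spectra, and therefore identical PSDs:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)   # an arbitrary test signal
y = np.roll(x, 100)            # the same samples, circularly shifted in time

# A circular shift only changes the phase of the DFT, so the magnitude
# spectra (and any PSD built from them) are indistinguishable.
print(np.allclose(np.abs(np.fft.fft(x)), np.abs(np.fft.fft(y))))   # True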
If not, what data can I get that can tell me something about the signal given the power spectrum?
As the name implies, the power spectral density (PSD) tells you the density of energy at each frequency. You can identify whether the majority of energy is in low, mid, or high frequencies. The averaging of Welch's method does a decent job at reducing stochastic noise, so the steady state signatures present in your signal should be well separated from the noise. Assuming your L is sufficiently large and you are not aliasing your data, you should be able to easily estimate the power level and frequency of any signatures.
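For illustration, here is a short sketch (with a made-up sampling rate and test signal) of pulling the dominant signature's frequency and power level out of a Welch PSD:

import numpy as np
from scipy import signal

fs = 1000.0                                    # assumed sampling rate, Hz
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 123.0 * t) + 0.5 * np.random.randn(t.size)

f, Pxx = signal.welch(x, fs=fs, nperseg=1024)  # averaged PSD estimate
k = np.argmax(Pxx)
print(f"dominant signature near {f[k]:.1f} Hz, PSD level {Pxx[k]:.3g}")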
I set up a sensor which measures temperature data every 3 seconds. I collected the data for 3 days and have 60,000 rows in my csv export. Now I would like to forecast the next few days. When looking at the data you can already see a "seasonality" which reflects the fridge's heating and cooling cycle, so I guess it shouldn't be too difficult to predict. I am not really sure if my data is too granular and if I should do some kind of undersampling. I thought about using a seasonal ARIMA model but I am having difficulties with picking parameters. As the seasonality in the data is pretty obvious, is there maybe a model that fits better? Please bear with me; I'm pretty new to machine learning.
When the goal is to forecast rising temperatures, you can forecast the lower and upper peaks, i.e., their height and spacing. Assuming (as a simplified model) that the temperature change in between is linear, we can model each complete peak starting from a first lower peak of the temperature curve up to the next upper peak and down to the next lower peak. A complete peak can then be seen as a triangle, which is easy to integrate (calculate its area plus the area of the rectangle below it). The estimate is obtained by integrating a number of complete peaks that have already been measured. By repeating this procedure, we can then run a linear regression on the average temperatures and alert when the slope is above a defined threshold.
As this only catches a certain kind of error, one can do the same for the average distances between the upper peaks, and likewise for the lower peaks: take the times between them over a certain period, fit a curve (linear regression may well be sufficient), and alert when the slope of the curve indicates that the distances are growing too long.
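A minimal sketch of the slope-alert idea, under stated assumptions (synthetic data standing in for the fridge log; the find_peaks distance and the threshold are guesses):

import numpy as np
from scipy.signal import find_peaks

# Synthetic stand-in for the fridge log: ~30-minute heating/cooling
# triangles plus a slow upward drift that the alert should catch.
t = np.arange(0, 3 * 24 * 3600, 30.0)           # 3 days, one sample per 30 s
cycle = 2.0 * np.abs(((t / 1800.0) % 2) - 1)    # triangle wave, period 1 h
temps = 4.0 + cycle + 1e-6 * t

# Average temperature over each complete cycle (lower peak to lower peak).
lows, _ = find_peaks(-temps, distance=60)       # minima; distance is a guess
cycle_means = np.array([temps[a:b].mean() for a, b in zip(lows[:-1], lows[1:])])

# Linear regression on the per-cycle averages; alert on a steep slope.
slope, _ = np.polyfit(np.arange(cycle_means.size), cycle_means, 1)
if slope > 1e-3:                                # threshold is illustrative
    print("warning: average temperature is trending upwards")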
It's mission impossible. If the fridge works without interference, the graph always looks the same. A change can be caused, for example, by an open door, a breakdown, or a major change in external conditions, but you cannot predict such events. Instead, you can try to warn about the possibility of problems in the near future, for example based on a constant increase in average temperature. This situation may indicate a leak in the cooling system.
By the way, have you reconsidered logging the temperature every 3 seconds? This is usually unjustified, because it is physically impossible for the temperature to change by a measurable amount in such a short interval. Our team usually sets the logging interval to 30 or 60 seconds in such cases, sometimes even longer, depending on the size of the chamber, the way the air is circulated, the ratio of volume to the power of the refrigeration unit, etc.
I am wondering what the difference is between filtering and interpolating data.
I am now comparing

itp = interp1d(x, y, kind='nearest')

and

savgol_filter(itp(xx), window_size, poly_order)
I understand that a filter removes noise from the data, so that it becomes smoother.
But interpolation does the same.
My purpose is to smooth the data so that it is ever-rising.
If values break the rise, adjust only those values;
if the data is already rising, make no adjustment.
What would you recommend using?
Thanks!
Suppose you have a series of discrete data points, measured, for example, at specific times.
Interpolation is a way to guess the value of the series at a time in between two measurements. For example, the temperature is measured every hour, but you want the temperature value every half-hour. If the data is noisy, the interpolation will be noisy too.
Filtering is a way to reduce the noise in the data. The value given by a measurement is the real value plus random noise. Over multiple measurements, the real value is assumed to remain identical, while the noise changes sign and magnitude. Therefore, by taking the average over enough measurements, the contribution of the noise averages to zero.
Fitting a model to the data is another, similar way to remove the noise. In fact, taking the average is the same as fitting the data with a horizontal straight line as the model. A linear regression fits with an arbitrary line, i.e. it finds the straight line that best describes the data.
The Savitzky–Golay filter performs successive fits over successive parts of the data series (a sliding window) in order to reduce the noise locally.
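For instance, here is a sketch of the asker's combination (the data and the window/polynomial parameters are illustrative, not a recommendation):

import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

x = np.linspace(0, 10, 40)
y = np.sin(x) + 0.2 * np.random.randn(x.size)   # noisy measurements

# Interpolation alone: fills in between samples but keeps the noise.
itp = interp1d(x, y, kind='nearest')
xx = np.linspace(x.min(), x.max(), 400)

# Local polynomial fits over a sliding window reduce the noise.
y_smooth = savgol_filter(itp(xx), window_length=51, polyorder=3)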
I want to know how much energy is at a specific frequency in a signal. I am using FFT to get the spectrum, and the frequency step is determined by the length of my signal.
My spectrum looks, for example, like this:
I want to get the spectrum peak at a specific frequency, -0.08. However, the discretization of the spectrum only gives me peaks at -0.0729 and -0.0833.
Is there a way to shift the spectrum to make sure there is a data point at the frequency I want? Or a way to get the value without necessarily using fft?
Thank you very much!
What you're actually doing when you take a DFT (or any Fourier transform) is measuring how much of your signal "intersects" with sines of certain frequencies. This is done by summing the product of your signal with the complex conjugate of a wave of the frequency in question. Technically, this is called an inner product, which is a generalization of the dot product, and measures how "close" one signal is to another. So if you're only interested in one frequency, don't take the whole DFT; just compute the one term you want.
I'm not sure what your units are, so I'll assume you want the peak at f0 = -0.08 Hz (if your units are something else, like normalized to the sampling frequency, then you'll need to account for that). This corresponds to the complex exponential exp(2*pi*j*f0*t). Because you're sampling, your t is discrete, so t = n/fs, where fs is the sampling frequency (in Hz).
# assuming you're using numpy arrays
import numpy as np

w = np.exp(-2 * np.pi * 1j * f0 * np.arange(len(signal)) / fs)
peak = np.abs(np.sum(signal * w))
There are different definitions of the DFT; I'm pretty sure numpy's corresponds to what I have above. The extra minus in the exponential is because it's the complex conjugate.
Notice that it's unlikely that w is actually periodic over your window, i.e. that an integer number of its periods fits in your samples. If the number of samples is large enough this doesn't really matter. A good heuristic is at least 10 periods.
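As a quick synthetic check of the snippet above (the sampling rate, duration, and the normalization by the length are my own choices):

import numpy as np

fs = 10.0                                 # assumed sampling rate, Hz
t = np.arange(0, 200, 1 / fs)             # 200 s, i.e. 16 periods of 0.08 Hz
sig = np.cos(2 * np.pi * 0.08 * t)

f0 = -0.08
w = np.exp(-2 * np.pi * 1j * f0 * np.arange(len(sig)) / fs)
print(abs(np.sum(sig * w)) / len(sig))    # ≈ 0.5 for a unit-amplitude cosine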
If you have discrete data but need an output for a continuous variable, you'll necessarily need some kind of interpolation function. For a value-by-request style I would advise SciPy's interp1d (see this example of the use of an interp1d function). I believe it's the fastest way to achieve your intended results.
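A hedged sketch of that approach, assuming the spectrum comes from np.fft (the test signal and sampling rate are made up):

import numpy as np
from scipy.interpolate import interp1d

fs = 12.0                                         # assumed sampling rate, Hz
x = np.cos(2 * np.pi * 0.08 * np.arange(4096) / fs)

freqs = np.fft.fftshift(np.fft.fftfreq(x.size, d=1 / fs))   # ascending axis
mags = np.fft.fftshift(np.abs(np.fft.fft(x)))

spectrum = interp1d(freqs, mags, kind='cubic')    # continuous-in-f lookup
print(spectrum(-0.08))                            # magnitude estimate at -0.08 Hz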
I'm using a slightly modified version of the Python code from this question to do frequency analysis: FFT wrong value?
Let's say I have a pack of sine waves in the time domain that are very close together in frequency, while sharing the same amplitude. This is what they look like in the frequency domain, using an FFT on 1024 samples from which I strip out the second half, giving 512 bins of resolution:
This is when I apply an FFT over the same group of waves, but this time with 128 samples (64 bins):
I expected a plateau-ish frequency response but it looks like the waves in the center are being cancelled. What are those "horns" I see? Is this normal?
I believe your result is correct. The peaks are at ±f1 and ±f2, corresponding to the respective frequency components of the two signals shown in your first plot.
I assume that you are shifting the DC component back to the center? What "waves in the center" are you referring to?
There are a couple of other potential issues that you should be aware of:
Aliasing: by inspection it appears that you have enough samples across your signal, but keep in mind that artificial (aliased) frequencies can be created by the FFT if there are not enough sample points to capture the underlying frequency. Specifically, if your frequency is f, then you need your data sample spacing to be Δx = 1/(2*f) or smaller.
Windowing: your signal is windowed (has a finite extent), so there will also be some broadening, ringing, or redistribution of power about each spatial frequency due to edge effects.
Since I don't know the details of your data, I went ahead and created a sinusoid and then sampled the data close to what appears to be your sampling rate. For example, below is a sinusoid with 64 points and with a signal frequency at 10 cycles (count the peaks):
The FFT result is then:
which shows the same quantitative features as yours, but without having your data, it's difficult for me to match your exact situation (spacing and taper).
Next I applied a super-Gauss window function (shown below) to simulate the finite extent of your data:
After applying the window to the input signal we have:
The corresponding FFT result shows some additional power redistribution, due to the finite extent of the data:
Although I can't match your exact situation, I believe your results appear as expected and some qualitative features of your data have been identified. Hope this helps.
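A rough reconstruction of that experiment (the super-Gauss width and exponent are guesses on my part):

import numpy as np

n = np.arange(64)
sig = np.sin(2 * np.pi * 10 * n / 64)            # 64 samples, 10 cycles

# Super-Gaussian taper to mimic the finite extent of the data
window = np.exp(-(((n - 31.5) / 22.0) ** 8))

spec_raw = np.abs(np.fft.fft(sig))               # peaks at bins ±10
spec_win = np.abs(np.fft.fft(sig * window))      # power redistributed near ±10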
Sine waves closely spaced in the frequency domain will occasionally nearly cancel out in the time domain. Since your second FFT is 8 times shorter than your first FFT, you may have windowed just such a short region of cancellation. Try a different time location for the shorter time window (or different phases of the sinusoids) to see something different.
I don't know what to do after obtaining a set of complex numbers from an FFT on a wav file. How can I obtain the corresponding frequencies? This is the output I got after performing the FFT, shown below:
[ 12535945.00000000 +0.j -30797.74496367 +6531.22295858j
-26330.14948055-11865.08322966j ..., 34265.08792783+31937.15794965j
-26330.14948055+11865.08322966j -30797.74496367 -6531.22295858j]
Actually, the abs(x) operation only converts a real/imaginary pair from your result list into a magnitude. Do that unless you want to keep the imaginary portion for future use. After conversion, each number in the result list represents the magnitude of the signal at a certain frequency in your frequency spectrum, so frequency is represented by the list index. When you plot the data on an XY graph, what you see is the magnitude of the frequencies that your source signal contains. Don't forget that only the first half of the data is valid: because the input signal is real-valued, the other half is a mirror image (the complex conjugate) of the first half.
For example, say you run a 1024-point FFT on a wav file sampled at 10 kHz. The FFT divides that spectrum into 1024 'bins' and determines how much of each chunk of spectrum is present in the source wav file. Your output is those bins. Generally when I do a frequency analysis, the actual numbers I get back aren't what's important; it's the magnitudes relative to the surrounding bins that I'm interested in.
For a little more detail, we're relying on the principle of superposition which states that any time-varying signal containing many frequencies can be split up into many signals containing one component frequency each and vice versa. So the FFT output reflects this property. Each value of your output list represents a magnitude for a signal at a single frequency (usually called a 'bin') that is present in your source signal. Combine all those signals together and you should get your source signal back.
Oh, and in case you didn't know, only the first half of your result list is valid due to Nyquist's rule (or law, I'm not sure), which says that a sampling system can only reproduce frequencies of at most half the sampling frequency. So if you sample a signal at 10 kHz, you can only reproduce frequencies up to 5 kHz from the data taken during your sampling. The same principle is the reason why only the first half of your FFT data is valid: the second half mirrors the first.
Sorry for the long-winded explanation, your question doesn't indicate what experience you have so I thought an explanation of the general gist of an FFT is needed.
As @KennyTM already explained on the duplicate question:
The frequency is determined by the index of the array; each element corresponds to a frequency.
To determine the frequency that each element represents, you need to know the sampling frequency of your data and the length of the array.
Basically, it would be something like:
import numpy as np

sampling_freq = 1000.0  # in Hz
freq = np.linspace(0, sampling_freq / 2.0, (x.size // 2) + 1)
For one half of the fft array (which is symmetric about the center). My memory is rusty, though, so this may be a bit off...
Either way, numpy has a helper function to do it for you: numpy.fft.fftfreq
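For example (assuming a 1024-point transform and a 1 kHz sampling rate):

import numpy as np

sampling_freq = 1000.0                             # Hz
freqs = np.fft.fftfreq(1024, d=1.0 / sampling_freq)
# freqs[0] is DC; freqs[1:512] run up to just below the 500 Hz Nyquist
# frequency; the second half holds the corresponding negative frequencies.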
Note that calculating the magnitude of a complex number with abs(x) gives the strength of the signal at that frequency bin, not the frequency itself; the frequency is determined by the bin's index, as explained above.