DSP - get the amplitude of all the frequencies - python

This question is related to:
DSP : audio processing : square or log to leverage fft?
in which I was unsure about the right algorithm to choose.
Now,
Goal:
I want to get all the frequencies of my signal, which I read from an audio file.
Context:
I use numpy and scikits.audiolab. I have done a lot of reading on the DSP subject, visited dspguru.com, and read papers and nice blogs around the net.
The code I use is this one :
import numpy as np
from scikits.audiolab import Sndfile
f = Sndfile('first.ogg', 'r')
# Sndfile instances can be queried for the audio file meta-data
fs = f.samplerate
nc = f.channels
enc = f.encoding
print(fs,nc,enc)
# Reading is straightforward
data = f.read_frames(10)
print(data)
print(np.fft.rfft(data))
I am new to DSP.
My question
I would like to be able to separate all the frequencies of a signal so that I can compare different signals.
I use numpy.fft.rfft on the array of sound, but this operation alone is not enough. What is the best way to correctly get the magnitudes of all the frequencies?
I saw that multiplying each resulting value by its complex conjugate removes the imaginary part and turns the whole thing into a real number.
Now what, please? Is that it?
If you need me to clarify anything, just ask.
Thanks a lot!

You say "I want to get all the frequencies of my signal, that I get from an audio file." but what you really want is the magnitude of the frequencies.
In your code, it looks like (I don't know python) you only read the first 10 samples. Assuming your file is mono, that's fine, but you probably want to look at a larger set of samples, say 1024 samples. Once you do that, of course, you'll want to repeat on the next set of N samples. You may or may not want to overlap the sets of samples, and you may want to apply a window function, but what you've done here is a good start.
What sleepyhead says is true. The output of the fft is complex. To find the magnitude of a given frequency, you need to find the length or absolute value of the complex number, which is simply sqrt( r^2 + i^2 ).
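Put together, a minimal sketch of one such analysis block, assuming the Sndfile object f and sample rate fs from the question, a mono file, and numpy's rfft:

import numpy as np

N = 1024                               # analysis block size, as suggested above
data = f.read_frames(N)                # one block of samples (mono assumed)
window = np.hanning(N)                 # optional window function
spectrum = np.fft.rfft(data * window)
magnitudes = np.abs(spectrum)          # sqrt(r^2 + i^2) for every bin
freqs = np.fft.rfftfreq(N, d=1.0/fs)   # frequency of each bin, in Hz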

Mathematically, the Fourier transform returns complex values because it is a transform with the kernel exp(-i*omega*t). So the PC gives you the spectrum as complex numbers corresponding to the cosine and sine transforms. In order to get the amplitude you just need to take the absolute value: np.abs(spectrum). In order to get the power spectrum, square the absolute value. The complex representation is valuable because you can get not only the amplitude but also the phase of the frequencies, which may be useful in DSP as well.
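For instance, a short sketch, assuming data holds a block of real-valued samples as in the question:

import numpy as np

spectrum = np.fft.rfft(data)       # complex spectrum of a real signal
amplitude = np.abs(spectrum)       # amplitude of each frequency bin
power = amplitude**2               # power spectrum
phase = np.angle(spectrum)         # phase of each frequency bin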

If I understood correctly, you want to walk over all of the data (the whole sound) and capture the amplitudes along the way. To do that, loop over the file, reading 1024 samples at a time (note that comparing a numpy array to '' does not work as a stop condition; use the frame count instead):
remaining = f.nframes
while remaining > 0:
    data = f.read_frames(min(1024, remaining))
    print(data)
    print(np.fft.rfft(data))
    remaining -= len(data)

Related

Do you know why the filtered output always starts at value zero? scipy filter python eeg

This is my first question on Stack Overflow.
For EEG filtering I try to use lfilter from scipy, with the following function:
from scipy.signal import butter, lfilter

def butter_lowpass_filter(data):
    b, a = butter(3, 0.05)   # 3rd-order Butterworth, normalized cutoff 0.05
    y = lfilter(b, a, data)
    return y
but every time I call the function and pass my data to it as a NumPy array, the result starts from zero. Why does the Butterworth filter always start from 0? I need to measure in real time.
I already tried to solve this problem using the question below, but without result:
How to filter/smooth with SciPy/Numpy?
That is not good for me, because every time I get the output shown in the attached picture.
This behavior is fine. However, it would create a spike in the beginning of your data. To avoid this you should subtract the first value (or the mean of the first N values) of your EEG so the data itself will also start at zero, or close to zero. The process can be referred to as baseline correction or, in some cases when you remove a straight line from start to finish, as detrending.
Note that filtering EEG is a whole science, you may want to look at packages designed for that, such as MNE python (here is their summary on filters)
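For illustration, a minimal sketch of that baseline correction applied before the question's filter (the choice of N = 10 samples for the baseline is an arbitrary assumption):

import numpy as np
from scipy.signal import butter, lfilter

def butter_lowpass_filter(data, baseline_n=10):
    # Baseline correction: shift the data so it starts near zero
    corrected = data - np.mean(data[:baseline_n])
    b, a = butter(3, 0.05)
    return lfilter(b, a, corrected)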

Slicing audio signal to detect pitch

I am using Librosa to transcribe monophonic guitar audio signals.
I thought it would be a good start to "slice" the signal at the onset times, to detect note changes at the correct time.
Librosa provides a function that detects the local minima before the onset times. I checked those timings and they are correct.
Here is the waveform of the original signal and the times of the minima.
[ 266240 552960 840704 1161728 1427968 1735680 1994752]
The melody played is E4, F4, F#4 ..., B4.
Therefore the results should ideally be: 330Hz, 350Hz, ..., 493Hz (approximately).
As you can see, the times in the minima array, represent the time just before the note was played.
However, on the sliced signal (10-12 seconds, with only one note per slice), my frequency detection methods give really poor results. I am confused because I can't see any bugs in my code:
y, sr = librosa.load(filename, sr=40000)
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
oenv = librosa.onset.onset_strength(y=y, sr=sr)
onset_bt = librosa.onset.onset_backtrack(onset_frames, oenv)
# Converting those times from frames to samples.
new_onset_bt = librosa.frames_to_samples(onset_bt)
slices = np.split(y, new_onset_bt[1:])
for s in slices:
    print(freq_from_hps(s, 40000))
    print(freq_from_autocorr(s, 40000))
    print(freq_from_fft(s, 40000))
Where the freq_from functions are taken directly from here.
I would assume this is just bad precision from the methods, but I get some crazy results. Specifically, freq_from_hps returns:
1.33818658287
1.2078047577
0.802142642257
0.531096911977
0.987532329094
0.559638134414
0.953497587952
0.628980979055
These values are supposed to be the 8 pitches of the 8 corresponding slices (in Hz!).
freq_from_fft returns similar values whereas freq_from_autocorr returns some more "normal" values but also some random values near 10000Hz:
242.748000585
10650.0394232
275.25299319
145.552578747
154.725859019
7828.70876515
174.180627765
183.731497068
This is the spectrogram from the whole signal:
And this is, for example, the spectrogram of slice 1 (the E4 note):
As you can see, the slicing has been done correctly. However there are several issues. First, there is an octave issue in the spectrogram. I was expecting some issues with that. However, the results I get from the 3 methods mentioned above are just very weird.
Is this an issue with my signal processing understanding or my code?
Is this an issue with my signal processing understanding or my code?
Your code looks fine to me.
The frequencies you want to detect are the fundamental frequencies of your pitches (the problem is also known as "f0 estimation").
So before using something like freq_from_fft I'd bandpass filter the signal to get rid of garbage transients and low frequency noise—the stuff that's in the signal, but irrelevant to your problem.
Think about which range your fundamental frequencies are going to be in. For an acoustic guitar that's E2 (82 Hz) to F6 (1,397 Hz). That means you can get rid of anything below ~80 Hz and above ~1,400 Hz (for a bandpass example, see here, or the sketch after the next paragraph). After filtering, do your peak detection to find the pitches (assuming the fundamental actually has the most energy).
Another strategy might be to ignore the first X samples of each slice, as they tend to be percussive rather than harmonic in nature and won't give you much information anyway. So, of each slice, just look at the last ~90% of the samples.
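A minimal sketch of both suggestions, assuming scipy is available; the 80-1,400 Hz band and the ~10% attack skip are the values from the two paragraphs above, and slices and freq_from_fft come from the question:

import numpy as np
from scipy.signal import butter, lfilter

def bandpass(signal, sr, lo=80.0, hi=1400.0, order=4):
    # Butterworth bandpass covering the guitar's fundamental range
    nyq = sr / 2.0
    b, a = butter(order, [lo / nyq, hi / nyq], btype='band')
    return lfilter(b, a, signal)

for s in slices:
    s = s[len(s) // 10:]              # drop the percussive first ~10%
    print(freq_from_fft(bandpass(s, 40000), 40000))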
That all said, there is a large body of work for f0 or fundamental frequency estimation. A good starting point are ISMIR papers.
Last, but not least, Librosa's piptrack function may do just what you want.
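A minimal sketch of piptrack usage, with y and sr as in the question and fmin/fmax set to the guitar range suggested above:

import numpy as np
import librosa

pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=80, fmax=1400)
best_bins = magnitudes.argmax(axis=0)                 # strongest bin per frame
f0 = pitches[best_bins, np.arange(pitches.shape[1])]  # one pitch estimate per frame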

Fitting on a semi-logarithmic scale and transferring it back to normal?

I am working with an IFFT and have a set of real and imaginary values with their respective frequencies (x-axis). The frequencies are not equidistant, so I can't use a discrete IFFT, and I am unable to fit my data correctly because the values are so jumpy at the beginning. So my plan is to "stretch out" my frequency data points on a log10 scale, fit them (with polyfit), and then return, somehow, to the normal scale.
import pylab as p

f = data[0:27, 0]    # x-values (frequencies)
re = data[0:27, 5]   # y-values (real part)
lgf = p.log10(f)
polylog_re = p.poly1d(p.polyfit(lgf, re, 6))
The fit definitely works better (http://imgur.com/btmC3P0), but is it possible to transform my polynomial back to the normal x-scale? Right now I'm using those logarithmic fits for my IFFT and taking the log10 of my transformed values for plotting etc., but that probably defies all mathematical logic and results in errors.
Your fit is perfectly valid, but it is not a regular polynomial fit. By using log10(x) you have changed the model function to a polynomial in the logarithm: y(x) = sum_i a_i * (log10(x))^i. If this is okay for you, you are done. If you want to do some more maths with it, I would suggest using the natural logarithm instead of the one to base 10.
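To illustrate with the question's own variables: the fitted polynomial already maps ordinary x-values to y, as long as you feed it log10(x), so there is nothing to transform back; the 500-point grid here is an arbitrary choice:

import numpy as np
import pylab as p

x_new = np.linspace(f.min(), f.max(), 500)   # ordinary (non-log) x-axis
y_new = polylog_re(np.log10(x_new))          # evaluate the log-domain polynomial
p.semilogx(x_new, y_new)
p.show()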

DSP : audio processing : square or log to leverage fft?

Context:
I am discovering the vast field of DSP. Yes, I'm a beginner.
My goal:
Apply an FFT to an audio array given by audiolab to get the different frequencies of the signal.
Question:
I just cannot figure out what to do with the numpy array that audiolab gives me, containing the audio data:
import numpy as np
from scikits.audiolab import Sndfile
f = Sndfile('first.ogg', 'r')
# Sndfile instances can be queried for the audio file meta-data
fs = f.samplerate
nc = f.channels
enc = f.encoding
print(fs,nc,enc)
# Reading is straightforward
data = f.read_frames(10)
print(data)
print(np.fft.fft(data))
Now I have got my data.
Readings
I read those two nice articles here:
Analyze audio using Fast Fourier Transform (the accepted answer is wonderful)
and
http://www.onlamp.com/pub/a/python/2001/01/31/numerically.html?page=2
Now there are two techniques: apparently one suggests squaring the values (first link), whereas the other suggests a log, specifically 10*log10(abs(1e-20 + value)) (the tiny constant avoids taking the log of zero).
Which one is the best?
SUM UP:
I would like to get the Fourier analysis of my array, but each of those two answers seems to only emphasize the signal rather than isolate its components.
I may be wrong, I am still a noob.
What should I really do then?
Thanks,
UPDATE:
I asked this question:
DSP - get the amplitude of all the frequencies, which is related to this one.
Your question seems pretty confused, but you've obviously tried something, which is great. Let me take a step back and suggest an overall route for you:
Start by breaking your audio into chunks of some size, say N.
Perform the FFT on each chunk of N samples.
THEN worry about displaying the data as RMS (the square approach) or dB (the ln-based approach).
Really, you can think of those values as scaling factors for display.
If you need help with the FFT itself, my blog post on pitch detection with the FFT may help: http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
Adding to the answer given by @Bjorn Roche:
Here is a simple code for plotting frequency spectrum, using dB scale.
It uses matplotlib for plotting.
import numpy as np
import pylab

# for a real signal
def plotfftspectrum(signal, dt):   # dt is the sample spacing (1/samplerate)
    n = signal.size
    spectrum = np.abs(np.fft.fft(signal))
    spectrum = 20*np.log10(spectrum/spectrum.max())   # dB scale
    frequencies = np.fft.fftfreq(n, dt)
    # plot only the first n//2 bins: a real signal has a symmetric spectrum
    pylab.plot(frequencies[:n//2], spectrum[:n//2])
    pylab.show()
You can use it after reading at least some samples of your data, e.g. 1024:
data = f.read_frames(1024)
plotfftspectrum(data, 1./f.samplerate)
where the second argument is the sample spacing, i.e. the inverse of your sample rate.

How to get data from a notepad file and use FFT (fast Fourier transform) on it

I am a bit of a novice with programming as we are being made to do it in our physics degree. I am using Python 2.
I've been given a txt file with two columns of data, the first few lines look like this:
0.000000000000000000e+00 7.335686114232199684e-02
1.999999999999999909e-07 7.571960558042964973e-01
3.999999999999999819e-07 9.909475704320810374e-01
5.999999999999999728e-07 3.412754086075696081e-01
7.999999999999999638e-07 -5.558766000866324219e-01
9.999999999999999547e-07 -9.810046985453722002e-01
1.199999999999999946e-06 -5.436864816312496629e-01
1.399999999999999937e-06 2.645021165628647641e-01
1.599999999999999928e-06 9.667259209284312371e-01
1.799999999999999919e-06 7.395753817164774091e-01
1.999999999999999909e-06 7.289488801158025555e-02
2.200000000000000112e-06 -7.925906572709742193e-01
2.399999999999999891e-06 -9.727702002847055107e-01
2.599999999999999671e-06 -1.772398644968510018e-01
2.799999999999999873e-06 6.627909312992285029e-01
3.000000000000000076e-06 1.022032186188189362e+00
3.199999999999999855e-06 5.531242183135693935e-01
and on it goes for many hundreds of lines.
The question asks:
This week you have been provided with files consisting of a simulated NMR time-domain response following an external impulse. This free induction decay (FID) is characterized by a frequency, an initial amplitude and a decay constant. The first data set has a single oscillation frequency and the second contains a mixture of two frequencies.
Write a program to evaluate the fast Fourier transform of both signals and plot them in the frequency domain.
Could someone give me an example of how I might go about doing this? Unfortunately we are not given much guidance in the lab, just some online tutorials and otherwise told to google stuff.
I'll turn my comment into an answer:
It is actually very easy. Load your data into a numpy array using numpy.genfromtxt(), and then you can choose some form of FFT from numpy.fft.
As this is your exercise I won't write down exact code, but that basically sums it up.
For reading the .txt file, you'll want to do something like this (not the fastest, but the clearest):
column1 = []
column2 = []
infile = open("MyFile.txt", "r")
for l in infile.readlines():
    if l.strip():
        v1 = float(l.split()[0])
        v2 = float(l.split()[1])
        column1.append(v1)
        column2.append(v2)
infile.close()
For the FFT, look into numpy.
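For orientation, a minimal sketch of the numpy route described above, using the placeholder file name from the reading example and matplotlib for plotting:

import numpy as np
import matplotlib.pyplot as plt

t, y = np.genfromtxt("MyFile.txt").T      # two columns: time, signal
dt = t[1] - t[0]                          # sample spacing (2e-7 s in this data)
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=dt)
plt.plot(freqs, spectrum)
plt.xlabel("Frequency (Hz)")
plt.show()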
