signal.spectrogram returns too many hz

signal.spectrogram returns too many hz - python

I was just getting started with a code to pre-process some audio data in order to lately feed a neural network with it. Before explaining more deeply my actual problem, mention that I took the reference for how to do the project from this site. Also used some code taken from this post and read for more info in the signal.spectogram doc and this post.
For now with all of the sources mentioned before, I managed to get the wav audio file as a numpy array and plot both its amplitude and spectrogram. Theese represent a recording of me saying the word "command" in Spanish.
The strange fact here is that I search on the internet and found that human voice spectrum moves between 80 and 8k Hz, so just to get sure I compared this output with the one Audacity spectrogram returned. As you can see, this seems to be more coherent with the info found, as the frequency range is the one supposed to be for humans.
So that takes me to final question: Am I doing something wrong in the process of reading the audio or generating the spectrogram or maybe am I having plot issues?
By the way I'm new to both python and signal processing so thx in advance for your patience.
Here is the code I'm actually using:
def espectrograma(wav):
sample_rate, samples = wavfile.read(wav)
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nperseg=320, noverlap=16, scaling='density')
#dBS = 10 * np.log10(spectrogram) # convert to dB
plt.subplot(2,1,1)
plt.plot(samples[0:3100])
plt.subplot(2,1,2)
plt.pcolormesh(times, frequencies, spectrogram)
plt.imshow(spectrogram,aspect='auto',origin='lower',cmap='rainbow')
plt.ylim(0,30)
plt.ylabel('Frecuencia [kHz]')
plt.xlabel('Fragmento[20ms]')
plt.colorbar()
plt.show()

The computation of the spectrogram seems fine to me. If you plot the spectrogram in log scale you should observe something more similar to the audition plots you referenced. So uncomment your line
#dBS = 10 * np.log10(spectrogram) # convert to dB
and then use the variable dBS for the plotting instead of spectrogram in
plt.pcolormesh(times, frequencies, spectrogram)
plt.imshow(spectrogram,aspect='auto',origin='lower',cmap='rainbow')

The spectrogram uses a fourier transform to convert your timeseries data into frequency domain.
The maximum frequency that can be measured is (sampling frequency) / 2, so in this case it may seem like your sampling frequency is 60KHz?
Anyway, regarding your question. It may be correct that the human voice spectrum lies within this range, but the fourier transform is never perfect. I would simply adjust your Y-Axis to specifically look at these frequencies.
It seems to me that you are calculating your spectrogram correctly, at least as long as you are reading the sample_rate and samples correctly..

Related

How to smooth frequency spectrum of time series?

I tried to reproduce Watson's spectrum plot from these set of slides (PDF p. 30, p.29 of the slides), that came from this data of housing building permits.
Watson achieves a very smooth spectrum curve in which it is very easy to tell the peak frequencies.
When I tried to run a FFT on the data, I get a really noisy spectrum curve and I wonder if there is an intermediate step that I am missing.
I ran the fourier analysis on python, using scipy package fftpack as follows:
from scipy import fftpack
fs = 1 / 12 # monthly
N = data.shape[0]
spectrum = fftpack.fft(data.PERMITNSA.values)
freqs = fftpack.fftfreq(len(spectrum)) #* fs
plt.plot(freqs[:N//2], 20 * np.log10(np.abs(spectrum[:N//2])))
Could anyone help me with the missing link?
The original data is:
Below is the Watson's spectrum curve, the one I tried to reproduce:
And these are my results:

The posted curve doesn't look realistic. But there are many methods to get a smooth result with a similar amount of "curviness", using various kinds of resampling and/or plot interpolation.
One method I like is to chop the data into segments (windows, possibly overlapped) roughly 4X longer than the maximum number of "bumps" you want to see, maybe a bit longer. Then window each segment before using a much longer (size of about the resolution of the final plot you want) zero-padded FFT. Then average the results of the multiple FFTs of the multiple windowed segments. This works because a zero-padded FFT is (almost) equivalent to a highest-quality Sinc interpolating low-pass filter.

How can I improve the look of scipy's spectrograms?

I need to generate spectrograms for audio files with Python and I'm following the solution given here. However, the spectrograms I'm getting don't look very "populated," and not at all like other spectrograms I get from other software.
This is the code I used for the particular image I'm showing here:
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy import signal
from scipy.io import wavfile
sample_rate, samples = wavfile.read('audio-mono.wav')
frequencies, times, spectrogram = signal.spectrogram(samples[:700000], sample_rate)
cMap = cm.get_cmap('gray', 3000) # Maybe I'm not understanding this very well
fig = plt.figure(figsize=(4,2), dpi=400, frameon=False)
plt.pcolormesh(times, frequencies, spectrogram, cmap=cMap)
plt.savefig('spectrogram.png')
The following images are spectrograms from Audacity and Aegisub, respectively, both for the same file for which the third image's spectrogram was created (with scipy).
To create this spectrogram, trying to see if it was a figure-size/resolution issue, I tried a couple of things things, one by one, and the end result is this image (with both of them applied).
First, when extracting the .wav file from the .mp4 file, I set the sampling rate to 10 KHz to avoid having such a big y-axis in the plot and see if this helps. This is why you see a max of 5,000. I though I could live with some frequencies neglected given that I care, most of all, about speech frequencies.
Then, to get a better zoom, I created a spectrogram with only the first 700,000 elements of the samples array (see code), which, in the case of this file, represent about 70 seconds. This didn't help either. I even tried to create the spectrogram with the same slice of the samples array, but by taking only every tenth value, then every twentieth, and so on, but this only made the spectrogram have horizontal lines instead of dots. This is not applied here in the figure I'm showing you, because I realized it's far from helping. I also tinkered with the figure size and the resolution, but it didn't really help either.
As you can see in the first figure, the y-axis goes from 0 to 5 KHz, and many frequencies have some intensity at that level. Also, the only moment in that 70-second span with complete silence is around the 35-second mark. The accuracy of this becomes obvious when listening to the file.
In the second figure, there is no y-axis mark, but I can see it has a bigger range than the 5 KHz, which I think accounts for the difference with the first figure. I'm pretty sure that, unfortunately, I can't change this view range. However, this spectrogram also shows the moment of complete silence accurately, and it is at least properly "populated" in the rest of it.
By seeing the third figure (the one I generated with scipy), one could easily think there are several parts of complete silence in those first 70 seconds, which is far from true. I'd like it to look more like those above it, because I know they're much more accurate, but I don't really know how I can do it, and this one won't work at all.
I'm pretty sure there is something I can do, but I think I still don't know scipy enough to know what it is.
Thanks in advance.
EDIT 1
PLOTTED THE SPECTROGRAM WITHOUT SPECIFYING A COLORMAP
You can see the plot looks a bit more populated, but still not even close to the other ones.
EDIT 2
Considering the idea given in the first comment of this question, I used a manipulated version of the gray colormap to have black as the first entry (as normal) but with the second entry being the color that's normally halfway, and then 2,999 colors from there up to white. Please excuse me if I'm using wrong terminology here or if this is not correctly phrased. I'm still trying to understand how to work with colormaps.
The code used to create and plot the spectrogram is the same. The only difference is the colormap used, which I manipulated as follows:
import numpy as np
from matplotlib.colors import ListedColormap
cMap = cm.get_cmap('gray', 3000)
new_colors = cMap(np.linspace(0.5, 1, 3000))
black = [0, 0, 0, 1]
new_colors[0, :] = black
new_cmp = ListedColormap(new_colors)
Using new_cmp as the colormap for the pcolormesh() function, I get the following spectrogram.
This is much, much better than the original, and looks much more like the ones from Audacity and Aegisub. However, I'd like to know if there is a better approach I can take to make my spectrograms look better, if there could be something else that's causing this to not look so much as the sample ones, and if there is a better way to do what I did with the colormap. As I said, I'm still struggling with them.
EDIT 3
I'm now sharing the audio I used to create these spectrograms here.

Using Fourier Transform to Transform Image into Sound (I don't think it's working)

Backstory
I started messing with electronics, and realized I need an oscilloscope. I went to buy the oscilloscope (for like $40) online and watched tutorials on how to use them. I stumbled upon a video using the "X-Y" function of the oscilloscope to draw images; I thought that was cool. I tried searching how to do this from scratch and learned you need to convert the image into the frequency domain and some how convert that to an audio signal and send the signal to the two channels on the oscilloscope from the left and right channels from the audio output. So now I am trying to do the image processing part.
What I Got So Far
Choosing an Image
First thing I did was to create an nxn image using some drawing software. I've read online that the total number of pixels of the image should be a power of two. I don't know why, but I created 256x256 pixel images to minimize calculation time. Here is the image I used for this example.
I kept the image simple, so I can vividly see the symmetry when it is transformed. Therefore, if there is no symmetry, then there must be something wrong.
The MATLAB Code
The first thing I did was read the image, convert to gray scale, change data type, and grab the size of the image (for size variability for later use).
%Read image
img = imread('tets.jpg');
%Convert image to gray scale
grayImage = rgb2gray(img);
%Incompatability of data type. uint8 type vs double
grayImage = double(grayImage);
%Grab size of image
[nx, ny, nz] = size(grayImage);
The Algorithm
This is where things get a bit hazy. I am somewhat familiar with the Fourier Transform due to some Mechanical Engineering classes, but the topic was broadly introduced and never really fundamentally part of the course. It was more like, "Hey, check out this thing; but use the Laplace Transformation instead."
So somehow you have to incorporate spatial, amplitude, frequency, and time when doing the calculation. I understand that the spatial coordinates is just the location of each pixel on the image in a matrix or bitmap. I also understand that the amplitude is just the gray scale value from 0-255 of a certain pixel. However, I don't necessarily know how to incorporate frequency and time based on the pixel itself. I think I read somewhere that the frequency increases as the y location of the pixel increases, and the time variable increases with the x location. Here's the link (read first part of Part II).
So I tried following the formula as well as other formulas online and this is what I got for the MATLAB code.
if nx ~= ny
error('Image size must be NxN.'); %for some reason
else
%prepare transformation matrix
DFT = zeros(nx,ny);
%compute transformation for each pixel
for ii = 1:1:nx
for jj = 1:1:ny
amplitude = grayImage(ii,jj);
DFT(ii,jj) = amplitude * exp(-1i * 2 * pi * ((ii*ii/nx) + (jj*jj/ny)));
end
end
%plot of complex numbers
plot(DFT, '*');
%calculate magnitude and phase
magnitudeAverage = abs(DFT)/nx;
phase = angle(DFT);
%plot magnitudes and phase
figure;
plot(magnitudeAverage);
figure;
plot(phase);
end
This code simply tries to follow this discrete fourier transform example video that I found on YouTube. After the calculation I plotted the complex numbers in complex domain. This appears to be in polar coordinates; I don't know why. As stated in the video about the Nyquist Limit, I plotted the average magnitude too. As well as the phase angles of the complex numbers. I'll just show you the plots!
The Plots
Complex Numbers
This is the complex plot; I believe it's in polar form instead of cartesian, but I don't know. It appears symmetric too.
Average Amplitude Vs. Sample
The vertical axis is amplitude, and the horizontal axis is the sample number. This looks like the deconstruction of the signal, but then again I don't really know what I am looking at.
Phase Angle Vs. Sample
The vertical axis is the phase angle, and the horizontal axis is the sample number. This looks the most promising because it looks like a plot in the frequency domain, but this isn't suppose to be a plot in the frequency domain; rather, its a plot in the sample domain? Again, I don't know what I am looking at.
I Need Help Understanding
I need to somehow understand these plots, so I know I am getting the right plot. I believe there may be something very wrong in the algorithm because it doesn't necessarily implement the frequency and time component. So maybe you can tell me how that is done? Or at least guide me?
TLDR;
I am trying to convert images into sound files to display on an oscilloscope. I am stuck on the image processing part. I believe there is something wrong with the MATLAB code (check above) because it doesn't necessarily include the frequency and time component of each pixel. I need help with the code and understanding how to interpret the result, so I know the transfromations are correct-ish.

FFT on high Audio Freqs

Hi I am trying to plot fft data from real time audio input for my project work . I need to see the spectral components as I have to perform some actions based on frequency response the code that I am modifying is based on here
http://flothesof.github.io/pyqt-microphone-fft-application.html
I am interested in frequency response around 18KHz , and I am sampling at 44100 Hz but the above code works only till 6800Hz after that it simply plots garbage .CHUNK size is 2048 . what should I do to get frequency around 18KHz.Also I dont see aliasing around 6800Hz on overshooting it.
This is my first question on stackoverflow am sorry if it made you face palm . I dont have a DSP background but I am studying it on my own.
Thanks priest

Profiling Python - Streaming Audio and spectrum

I'm trying to modify this example: https://svn.enthought.com/enthought/browser/Chaco/trunk/examples/advanced/spectrum.py. Unfortunately I have not been able to get it to scale. If I double the sampling rate, the graph lags from the sound input. I'd like to find out which part of the code is the bottleneck. I tried to use cProfile but didn't investigate very far.

I wrote the original version of spectrum.py, and I believe that the bottleneck is in the drawing, in particular the spectrogram plot. If you change the code to not draw every time it computes an FFT, it should keep up better.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

signal.spectrogram returns too many hz - python

Related

How to smooth frequency spectrum of time series?

How can I improve the look of scipy's spectrograms?

Using Fourier Transform to Transform Image into Sound (I don't think it's working)

FFT on high Audio Freqs

Profiling Python - Streaming Audio and spectrum

Categories

Resources