How do I stretch the x-axis of a matplotlib spectrogram? - python

Sorry if this is a really obvious question. I am using matplotlib to generate some spectrograms for use as training data in a machine learning model. The spectrograms are of short clips of music and I want to simulate speeding up or slowing down the song by a random amount to create variations in the data. I have shown my code below for generating each spectrogram. I have temporarily modified it to produce 2 images starting at the same point in the song, one with variation and one without, in order to compare them and see if it is working as intended.
from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np
BPM_VARIATION_AMOUNT = 0.2
FRAME_RATE = 22050
CHUNK_SIZE = 2
BUFFER = FRAME_RATE * 5
def generate_random_specgram(track):
    # Read audio data from file
    audio = AudioSegment.from_file(track.location)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    start = np.random.randint(BUFFER, len(samples) - BUFFER)
    chunk = samples[start:start + int(CHUNK_SIZE * FRAME_RATE)]
    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, track.bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs=FRAME_RATE)
    plt.savefig(filename)
    plt.close()
    # Perform random variations to the BPM
    frame_rate = FRAME_RATE
    bpm = track.bpm
    variation = 1 - BPM_VARIATION_AMOUNT + (
        np.random.random() * BPM_VARIATION_AMOUNT * 2)
    bpm *= variation
    bpm = round(bpm, 2)
    # I thought this next line should have been /= but that stretched the wrong way?
    frame_rate *= (bpm / track.bpm)
    # Read audio data from file
    chunk = samples[start:start + int(CHUNK_SIZE * frame_rate)]
    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs=frame_rate)
    plt.savefig(filename)
    plt.close()
I thought by changing the Fs parameter given to the specgram function this would stretch the data along the x-axis but instead it seems to be resizing the whole graph and introducing white space at the top of the image in strange and unpredictable ways. I'm sure I'm missing something but I can't see what it is. Below is an image to illustrate what I'm getting.

The frame rate is a fixed number that depends only on your data. If you change it, you will effectively "stretch" the x-axis, but in the wrong way. For example, if you have 1000 data points that correspond to 1 second, your frame rate (or, better, sampling frequency) is 1000 Hz. If your signal is a simple 200 Hz sine whose frequency increases slightly over time, the specgram will be:
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)
frame_rate = 1000
plt.specgram(signal, Fs=frame_rate);
If you change the framerate you will have a wrong x and y-axis scale. If you set the framerate to be 500 you will have:
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)
frame_rate = 500
plt.specgram(signal, Fs=frame_rate);
The plot looks very similar, but this time it is wrong: the x-axis spans almost 2 seconds when it should span only 1, and the starting frequency reads as 100 Hz instead of 200 Hz.
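The effect of a wrong sampling frequency on both axes can be checked numerically without plotting. The following is a minimal sketch using a plain FFT rather than plt.specgram; the helper name peak_and_duration is made up for illustration, and a constant 200 Hz tone is assumed instead of the slowly rising one above so that the peak is unambiguous.

```python
import numpy as np

def peak_and_duration(signal, fs):
    """Return (peak frequency in Hz, duration in s) as read off axes built from fs."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[np.argmax(spectrum)], len(signal) / fs

true_fs = 1000
t = np.arange(true_fs) / true_fs        # exactly 1 second of samples
signal = np.sin(2 * np.pi * 200 * t)    # constant 200 Hz tone (assumption)

right = peak_and_duration(signal, fs=1000)  # axes built with the true rate
wrong = peak_and_duration(signal, fs=500)   # axes built with half the true rate
print("Fs=1000:", right)
print("Fs=500: ", wrong)
```

Halving the declared rate halves every frequency reading and doubles the apparent duration, which is the distortion described above.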
To conclude, the sampling frequency you set needs to be the correct one. If you want to stretch the plot you can use something like plt.xlim(0.2, 0.4). If you want to avoid the white band on top of the plot you can manually set the ylim to be half the frame rate:
plt.ylim(0, frame_rate/2)
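This works because the highest frequency a signal sampled at Fs can represent is Fs/2 (the Nyquist frequency, from the Nyquist-Shannon sampling theorem), so the spectrogram data spans exactly 0 to frame_rate/2 and any axis range beyond that is blank.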

The solution to my problem was to set the xlim and ylim of the plot. Here is the code from my testing file in which I finally got rid of all the odd whitespace:
from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt
BUFFER = 5
FRAME_RATE = 22050
SAMPLE_LENGTH = 2
def plot(audio_file, bpm, variation=1):
    audio = AudioSegment.from_file(audio_file)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    chunk_length = int(FRAME_RATE * SAMPLE_LENGTH * variation)
    start = np.random.randint(
        BUFFER * FRAME_RATE,
        len(samples) - (BUFFER * FRAME_RATE) - chunk_length)
    chunk = samples[start:start + chunk_length]
    plt.figure(figsize=(5.12, 2.56)).add_axes([0, 0, 1, 1])
    plt.specgram(chunk, Fs=FRAME_RATE * variation)
    plt.xlim(0, SAMPLE_LENGTH)
    plt.ylim(0, FRAME_RATE / 2 * variation)
    plt.savefig('specgram-%f.png' % (bpm * variation))
    plt.close()

Related

How to compute dBm from FFT results?

I computed a 4 Hz sine wave, applied an FFT and calculated the amplitude. The amplitude is an array of length 500; I want to convert each element of that array to dBm and draw a spectrogram. However, I can't seem to get the calculation right.
I saw this general formula:
valueDBFS = 20*np.log10(abs(value))
so I tried using it, but I get only negative results.
Here is my full code (edited):
# Python example - Fourier transform using numpy.fft method
import numpy as np
import matplotlib.pyplot as plotter
from os import times
from PIL import Image
import numpy as np
# How many time points are needed i,e., Sampling Frequency
samplingFrequency = 100
# At what intervals time points are sampled
samplingInterval = 1 / samplingFrequency
# Begin time perod of the signals
beginTime = 0
# End time period of the signals
endTime = 10
# Frequency of the signals
signal1Frequency = 4
signal2Frequency = 70
# Time points
time = np.arange(beginTime, endTime, samplingInterval)
# Create two sine waves
amplitude1 = 100 * np.sin(2*np.pi*signal1Frequency*time)
fourierTransform = np.fft.fft(amplitude1)
fourierTransform = fourierTransform[range(int(len(amplitude1)/2))] # Exclude sampling frequency
tpCount = len(amplitude1)
values = np.arange(int(tpCount/2))
timePeriod = tpCount/samplingFrequency
frequencies = values/timePeriod
valueDBFS = 20*np.log10(abs(fourierTransform))
print(valueDBFS)
#SPECTROGRAM
w, h = 500, 500
data = np.zeros((h, w, 3), dtype=np.uint8)
time = time[:len(time)//2]
for i in range(500):
    for j in range(500):
        color = abs(fourierTransform)[i]
        data[i,j] = [color, color, color]
img = Image.fromarray(data, 'RGB')
img.show()
The maximum value of your amplitude is 1, and log10(1) is 0, so everything below that gives a negative logarithm; for example, log10(0.9) ≈ -0.0458.
So that part of your code works fine: the logs should be negative in your example. Try defining your amplitude like this:
amplitude1 = 100 * np.sin(2*np.pi*signal1Frequency*time)
That should give plenty of positive results.
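To see how the input amplitude moves the result, here is a minimal sketch that normalizes the FFT magnitude so that a sine of amplitude A reads as A, then converts to decibels relative to a reference. The function name spectrum_db and the ref and floor parameters are assumptions for illustration. Note that this yields dB relative to ref, not true dBm, which would need a physical reference (1 mW) that the sample values alone do not provide.

```python
import numpy as np

def spectrum_db(signal, ref=1.0, floor=1e-12):
    """Single-sided amplitude spectrum in dB relative to ref (hypothetical helper)."""
    n = len(signal)
    # Scale by 2/N so a sine of amplitude A peaks at A
    # (the factor 2 is not correct for the DC and Nyquist bins).
    magnitude = 2.0 * np.abs(np.fft.rfft(signal)) / n
    # Clamp tiny values so log10 never sees zero.
    return 20.0 * np.log10(np.maximum(magnitude, floor) / ref)

fs = 100                       # sampling frequency, as in the question
n = 1000                       # 10 seconds of samples
t = np.arange(n) / fs
unit_db = spectrum_db(np.sin(2 * np.pi * 4 * t))         # amplitude 1
loud_db = spectrum_db(100 * np.sin(2 * np.pi * 4 * t))   # amplitude 100

peak_freq = int(np.argmax(loud_db)) * fs / n
print(peak_freq, unit_db.max(), loud_db.max())
```

With ref=1.0, the amplitude-1 sine peaks at about 0 dB and the amplitude-100 sine at about 40 dB; every other bin is far below that, which is why most of the printed values are negative either way.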

Time steps difference in spectrogram

I have an audio file of 10 seconds in length. If I generate the spectrogram using matplotlib, then I get a different number of timesteps as compared to the spectrogram generated by librosa.
Here is the code:
fs = 8000
nfft = 200
noverlap = 120
hop_length = 120
audio = librosa.core.load(path, sr=fs)
# Spectogram generated using matplotlib
spec, freqs, bins, _ = plt.specgram(audio, nfft, fs, noverlap = noverlap)
print(spec.shape) # (101, 5511)
# Using librosa
spectrogram_librosa = np.abs(librosa.stft(audio,
n_fft=n_fft,
hop_length=hop_length,
win_length=nfft,
window='hann')) ** 2
spectrogram_librosa_db = librosa.power_to_db(spectrogram_librosa, ref=np.max)
print(spectrogram_librosa_db.shape) # (101, 3676)
Can someone explain why there is such a large difference in the number of time steps, and how to make sure that both generate the same output?
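This is because noverlap in plt.specgram is the number of points by which consecutive audio segments overlap, whereas hop_length in librosa is the step between the starts of consecutive segments. The two are related by hop_length = NFFT - noverlap, so noverlap = 120 with NFFT = 200 corresponds to hop_length = 80, not 120.
That being said, there is still a 2-frame difference between the two results, most likely due to boundary handling: librosa.stft pads the signal by default (center=True), whereas plt.specgram does not.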
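The frame counts follow from simple arithmetic, sketched below. Both function names are made up for illustration, and the formulas assume an unpadded sliding window for plt.specgram and the default center=True padding for librosa.stft. The sample count of 441000 is an assumption too: it is a length that happens to reproduce the shapes reported in the question.

```python
def specgram_frames(n_samples, nfft, noverlap):
    """Frames from an unpadded sliding window with step nfft - noverlap."""
    hop = nfft - noverlap
    return 1 + (n_samples - nfft) // hop

def stft_frames_centered(n_samples, hop_length):
    """Frames when the signal is padded so that frames are centered on each hop."""
    return 1 + n_samples // hop_length

n_samples = 441000  # assumed length, consistent with the question's output

print(specgram_frames(n_samples, nfft=200, noverlap=120))   # step of 80 samples
print(stft_frames_centered(n_samples, hop_length=120))      # step of 120 samples
print(stft_frames_centered(n_samples, hop_length=80))       # matching step of 80
```

With matching steps the two counts differ only by the couple of extra frames that the padding adds at the boundaries.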
import numpy as np
import librosa
import matplotlib.pyplot as plt
path = librosa.util.example_audio_file()
fs = 8000
nfft = 200
noverlap = 120 # overlap
hop_length = 80 # step
audio, fs = librosa.core.load(path, sr=fs)
# Spectogram generated using matplotlib
spec, freqs, bins, _ = plt.specgram(
audio, NFFT=nfft, Fs=fs, noverlap=noverlap,
)
spec = np.log10(spec + 1e-14)
print(spec.shape) # (101, 6144)
# Using librosa
spectrogram_librosa = (
np.abs(
librosa.stft(
audio,
n_fft=nfft,
hop_length=hop_length,
win_length=nfft,
window="hann",
)
)
** 2
)
spectrogram_librosa_db = librosa.power_to_db(spectrogram_librosa, ref=np.max)
print(spectrogram_librosa_db.shape) # (101, 6146)
fig, ax = plt.subplots(2)
ax[0].pcolorfast(spec)
ax[1].pcolorfast(spectrogram_librosa_db)
plt.show()
This outputs the following picture:

From Amplitude or FFT to dB

I have Python code that performs an FFT on a wav file and plots amplitude vs. time and amplitude vs. frequency. I want to calculate dB from these graphs (they are long arrays). I do not need exact dBA; I just want to see a linear relationship after my calculations. I have a dB meter to compare against. Here is my code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import scipy.io.wavfile as wavfile
import scipy
import scipy.fftpack
import numpy as np
from matplotlib import pyplot as plt
fs_rate, signal = wavfile.read("output.wav")
print ("Frequency sampling", fs_rate)
l_audio = len(signal.shape)
print ("Channels", l_audio)
if l_audio == 2:
    signal = signal.sum(axis=1) / 2
N = signal.shape[0]
print ("Complete Samplings N", N)
secs = N / float(fs_rate)
print ("secs", secs)
Ts = 1.0/fs_rate # sampling interval in time
print ("Timestep between samples Ts", Ts)
t = scipy.arange(0, secs, Ts) # time vector as scipy arange field / numpy.ndarray
FFT = abs(scipy.fft(signal))
FFT_side = FFT[range(N//4)] # one side FFT range
freqs = scipy.fftpack.fftfreq(signal.size, t[1]-t[0])
fft_freqs = np.array(freqs)
freqs_side = freqs[range(N//4)] # one side frequency range
fft_freqs_side = np.array(freqs_side)
makespositive = signal[44100:]*(-1)
logal = np.log10(makespositive)
sn1 = np.mean(logal[1:44100])
sn2 = np.mean(logal[44100:88200])
sn3 = np.mean(logal[88200:132300])
sn4 = np.mean(logal[132300:176400])
print(sn1)
print(sn2)
print(sn3)
print(sn4)
abs(FFT_side)
for a in range(500):
    FFT_side[a] = 0
plt.subplot(311)
p1 = plt.plot(t[44100:], signal[44100:], "g") # plotting the signal
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.subplot(312)
p1 = plt.plot(t[44100:], logal, "r") # plotting the signal
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.subplot(313)
p3 = plt.plot(freqs_side, abs(FFT_side), "b") # plotting the positive fft spectrum
plt.xlabel('Frequency (Hz)')
plt.ylabel('Count single-sided')
plt.show()
First plot is amplitude vs time, second one is logarithm of previous graph and the last one is FFT.
In the sn1, sn2 part I tried to calculate dB from the signal: first I took the log and then calculated the mean value for each second. It did not give me a clear relationship. I also tried this, and it did not work either.
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wf
fs, signal = wf.read('output.wav') # Load the file
ref = 32768 # 0 dBFS is 32768 with an int16 signal
N = 8192
win = np.hamming(N)
x = signal[0:N] * win # Take a slice and multiply by a window
sp = np.fft.rfft(x) # Calculate real FFT
s_mag = np.abs(sp) * 2 / np.sum(win) # Scale the magnitude of FFT by window and factor of 2,
# because we are using half of FFT spectrum
s_dbfs = 20 * np.log10(s_mag / ref) # Convert to dBFS
freq = np.arange((N / 2) + 1) / (float(N) / fs) # Frequency axis
plt.plot(freq, s_dbfs)
plt.grid(True)
So which steps should I perform? (Sum/mean all freq amplitudes then take log or reverse, or perform it for signal etc.)
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wf
fs, signal = wf.read('db1.wav')
signal2 = signal[44100:]
chunk_size = 44100
num_chunk = len(signal2) // chunk_size
sn = []
for chunk in range(0, num_chunk):
    sn.append(np.mean(signal2[chunk*chunk_size:(chunk+1)*chunk_size].astype(float)**2))
print(sn)
logsn = 20*np.log10(sn)
print(logsn)
Output:
[4.6057844427695475e+17, 5.0025315250895744e+17, 5.028593412665193e+17, 4.910948397471887e+17]
[353.26607217 353.98379668 354.02893044 353.82330741]
A decibel meter measures a signal's mean power. So from your time signal recording you can calculate the mean signal power with:
chunk_size = 44100
num_chunk = len(signal) // chunk_size
sn = []
for chunk in range(0, num_chunk):
    # Cast to float before squaring so integer samples cannot overflow
    segment = signal[chunk*chunk_size:(chunk+1)*chunk_size].astype(float)
    sn.append(np.mean(segment**2))
Then the corresponding mean signal power in decibels is simply given by:
logsn = 10*np.log10(sn)
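The two steps can be combined into one function. This is a minimal sketch, assuming a mono signal; the name chunk_power_db and the ref parameter (the amplitude treated as 0 dB) are made up for illustration. Because power is already a squared quantity, the factor is 10 rather than 20.

```python
import numpy as np

def chunk_power_db(signal, chunk_size, ref=1.0):
    """Mean power of each full chunk, in dB relative to a signal of amplitude ref."""
    x = np.asarray(signal, dtype=np.float64)   # float avoids integer overflow when squaring
    n_chunks = len(x) // chunk_size
    chunks = x[:n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    power = np.mean(chunks ** 2, axis=1)
    # A completely silent chunk has zero power and would give -inf here.
    return 10.0 * np.log10(power / ref ** 2)

fs = 44100
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 441 * t)   # synthetic test tone, amplitude 1
tone[fs:] *= 0.5                     # halve the amplitude in the second chunk

levels = chunk_power_db(tone, chunk_size=fs)
print(levels)
```

Halving the amplitude lowers the level by about 6 dB, the kind of relative change that can be compared against a meter even though the absolute offset depends on the recording chain.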
An equivalent relationship could also be obtained for a frequency-domain signal using Parseval's theorem, but in your case that would require unnecessary FFT computations (the relationship is mostly useful when you already have to compute the FFT for other purposes).
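For reference, the Parseval relationship can be verified numerically. The sketch below uses random noise as a stand-in signal (an assumption, any real-valued array works) and NumPy's unnormalized FFT convention, under which the mean of the squared samples equals the sum of the squared FFT magnitudes divided by N squared.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)          # stand-in signal for illustration

# Mean power computed directly in the time domain
time_power = np.mean(x ** 2)

# Same quantity from the full (two-sided) FFT via Parseval's theorem
X = np.fft.fft(x)
freq_power = np.sum(np.abs(X) ** 2) / len(x) ** 2

print(time_power, freq_power)
```

Both numbers agree to floating-point precision, so the FFT route gives nothing new here unless the spectrum is needed anyway.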
Note, however, that depending on what you compare there may be some (hopefully small) discrepancies. For example, a non-linear amplifier or speakers would affect the relationship. Similarly, ambient noise would add to the power measured by the decibel meter.

Trying to convert PCM to frequency chart but result looks very strange near 0

I tried to convert PCM data from a wav file into a frequency chart using an FFT.
Here are my charts:
0.00s, 512 samples
3.15s, 512 samples
The sound file is almost silent, with some knocking sounds starting at about 3s.
I noticed that the value near 0 Hz is very high. How can that be?
Another strange point is that the value is 0 for frequencies above about 16000 Hz.
Here is my code:
import soundfile as sf
import numpy as np
import math
import matplotlib.pyplot as plt
_audio_path = 'source_normal.wav'
def plot_data(pcm_data, samplerate, current_time):
    x_axis = np.arange(0, len(pcm_data) - 1) / len(pcm_data) * samplerate
    complex_data = [x+0j for x in pcm_data]
    result = np.fft.fft(complex_data)
    length = len(pcm_data) // 2
    amplitudes = [math.sqrt(x.imag * x.imag + x.real * x.real) for x in result[:length]]
    plt.plot(x_axis[:length], amplitudes)
    plt.title('{}s sample count: {}'.format(current_time, len(pcm_data)))
    plt.xlabel('{}Hz'.format(samplerate))
    plt.show()
def baz():
    data, samplerate = sf.read(_audio_path, dtype='int16')
    window = 512
    total_number_of_data = len(data)
    current_index = 0 # 144000
    while current_index < total_number_of_data:
        d = data[current_index:current_index+window]
        current_time = current_index / samplerate
        print('current time: {}'.format(current_index / samplerate))
        plot_data(d, samplerate, current_time)
        current_index += window
if __name__ == '__main__':
    baz()
I am not familiar with DSP and have never tried this before, so I think my code has some mistakes. Please help, thank you.
Here is my sound file.
The high value you see near 0 on the first plot is caused by the constant (DC) component of the window. Try removing it: subtract the window's mean from every sample in the window before taking the FFT.
The trailing zeros are just amplitudes small enough to look like zero on the plot. Print their values to check.
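Removing the constant component can be sketched as follows. The function name magnitude_spectrum and the synthetic test window (a tone sitting exactly on bin 32 plus a large offset) are assumptions for illustration, not taken from the question's sound file.

```python
import numpy as np

def magnitude_spectrum(window, remove_dc=True):
    """Magnitude of the real FFT, optionally after subtracting the window's mean."""
    x = np.asarray(window, dtype=np.float64)
    if remove_dc:
        x = x - x.mean()          # shift the window so its average is zero
    return np.abs(np.fft.rfft(x))

fs = 44100
n = 512
t = np.arange(n) / fs
tone_freq = fs * 32 / n           # frequency that falls exactly on bin 32
samples = 1000 + 50 * np.sin(2 * np.pi * tone_freq * t)   # large offset + small tone

with_dc = magnitude_spectrum(samples, remove_dc=False)
without_dc = magnitude_spectrum(samples)

print(np.argmax(with_dc), np.argmax(without_dc))
```

Before the mean is subtracted, bin 0 dominates the spectrum and hides the tone; afterwards the largest bin is the tone itself.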

Not able to recreate same sound using FFT

I am trying to recreate musical note using top 10 frequencies returned by Fourier Transform (FFT). Resulting sound does not match the original sound. Not sure if I am not finding frequencies correctly or not generating sound from it correctly. The goal of this code is to match the original sound.
Here is my code:
import numpy as np
from scipy.io import wavfile
from scipy.fftpack import fft
import matplotlib.pyplot as plt
i_framerate = 44100
fs, data = wavfile.read('./Flute.nonvib.ff.A4.stereo.wav') # load the data
def findFrequencies(arr_data, i_framerate = 44100, i_top_n =5):
    a = arr_data.T[0] # this is a two channel soundtrack, I get the first track
    # b=[(ele/2**8.)*2-1 for ele in a] # this is 8-bit track, b is now normalized on [-1,1)
    y = fft(a) # calculate fourier transform (complex numbers list)
    xf = np.linspace(0,int(i_framerate/2.0),int((i_framerate/2.0))+1) /2 # Need to find out this last /2 part
    yf = np.abs(y[:int((i_framerate//2.0))+1])
    plt.plot(xf,yf)
    yf_top_n = np.argsort(yf)[-i_top_n:][::-1]
    amp_top_n = yf[yf_top_n] / np.max(yf[yf_top_n])
    freq_top_n = xf[yf_top_n]
    return freq_top_n, amp_top_n
def createSoundData(a_freq, a_amp, i_framerate=44100, i_time = 1, f_amp = 1000.0):
    n_samples = i_time * i_framerate
    x = np.linspace(0,i_time, n_samples)
    y = np.zeros(n_samples)
    for i in range(len(a_freq)):
        y += np.sin(2 * np.pi * a_freq[i] * x)* f_amp * a_amp[i]
    data2 = np.c_[y,y] # 2 Channel sound
    return data2
top_freq , top_freq_amp = findFrequencies(data, i_framerate = 44100 , i_top_n = 200)
print('Frequencies: ',top_freq)
print('Amplitudes : ',top_freq_amp)
soundData = createSoundData(top_freq, top_freq_amp,i_time = 2, f_amp = 50 / len(top_freq))
wavfile.write('createsound_A4_v6.wav',i_framerate,soundData)
The top 10 spectral frequencies in a musical note are not the same as the center frequencies of the top 10 FFT result bin magnitudes. The actual frequency peaks can be between the FFT bins.
Not only can the frequency peak information be between FFT bins, but the phase information required to reproduce any note transients (attack, decay, etc.) can also be between bins. Spectral information that is between FFT bins is carried by a span (up to the full width) of the complex FFT result.
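One common way to estimate a peak that lies between bins is parabolic interpolation of the log magnitude around the largest bin. This technique is not part of the answer above; it is a minimal sketch on a synthetic 445 Hz tone, with the function name interpolated_peak made up for illustration. It refines only the frequency estimate and does not recover the phase or transient information the answer mentions.

```python
import numpy as np

def interpolated_peak(signal, fs):
    """Return (bin-center frequency, parabolically refined frequency) of the largest peak."""
    n = len(signal)
    mag = np.abs(np.fft.rfft(signal * np.hanning(n)))   # Hann window limits leakage
    k = int(np.argmax(mag[1:-1])) + 1                   # skip the edge bins
    a, b, c = np.log(mag[k - 1:k + 2])                  # log magnitudes around the peak
    offset = 0.5 * (a - c) / (a - 2 * b + c)            # vertex of the fitted parabola, in bins
    return k * fs / n, (k + offset) * fs / n

fs = 44100
n = 4096                                 # bin spacing is fs / n, roughly 10.8 Hz
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 445.0 * t)     # true frequency lies between two bins

bin_freq, refined_freq = interpolated_peak(tone, fs)
print(bin_freq, refined_freq)
```

The bin-center estimate is off by several hertz, while the refined estimate lands much closer to 445 Hz, which illustrates why picking the top bins by magnitude alone gives frequencies that sound out of tune.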
