I have a 10-second audio file. If I generate its spectrogram using matplotlib, I get a different number of time steps than with the spectrogram generated by librosa.
Here is the code:
fs = 8000
nfft = 200
noverlap = 120
hop_length = 120
audio, _ = librosa.core.load(path, sr=fs)

# Spectrogram generated using matplotlib
spec, freqs, bins, _ = plt.specgram(audio, NFFT=nfft, Fs=fs, noverlap=noverlap)
print(spec.shape)  # (101, 5511)

# Spectrogram generated using librosa
spectrogram_librosa = np.abs(librosa.stft(audio,
                                          n_fft=nfft,
                                          hop_length=hop_length,
                                          win_length=nfft,
                                          window='hann')) ** 2
spectrogram_librosa_db = librosa.power_to_db(spectrogram_librosa, ref=np.max)
print(spectrogram_librosa_db.shape)  # (101, 3676)
Can someone explain why there is such a large difference in the number of time steps, and how to make sure that both generate the same output?
This is because noverlap in plt.specgram is the number of points by which consecutive segments overlap, whereas hop_length in librosa is the step between segments: hop_length = nfft - noverlap.
That said, there is still a two-frame difference between the two results, which is most likely a boundary effect (librosa.stft pads the signal at both ends by default).
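The frame counts can be predicted directly; a small sketch of the arithmetic, using the question's variables and assuming librosa's default center=True padding:

n = len(audio)

# matplotlib segments the unpadded signal:
frames_mpl = 1 + (n - nfft) // (nfft - noverlap)

# librosa.stft pads n_fft // 2 samples at each end by default (center=True):
frames_librosa = 1 + n // hop_length

With hop_length = nfft - noverlap = 80, the two counts differ by about nfft // hop_length = 2 frames, which matches the two-frame gap seen below; passing center=False to librosa.stft removes the padding and with it the discrepancy. Here is a corrected comparison: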
import numpy as np
import librosa
import matplotlib.pyplot as plt

path = librosa.util.example_audio_file()

fs = 8000
nfft = 200
noverlap = 120   # overlap
hop_length = 80  # step: nfft - noverlap

audio, fs = librosa.core.load(path, sr=fs)

# Spectrogram generated using matplotlib
spec, freqs, bins, _ = plt.specgram(audio, NFFT=nfft, Fs=fs, noverlap=noverlap)
spec = np.log10(spec + 1e-14)
print(spec.shape)  # (101, 6144)

# Spectrogram generated using librosa
spectrogram_librosa = np.abs(
    librosa.stft(audio,
                 n_fft=nfft,
                 hop_length=hop_length,
                 win_length=nfft,
                 window='hann')
) ** 2
spectrogram_librosa_db = librosa.power_to_db(spectrogram_librosa, ref=np.max)
print(spectrogram_librosa_db.shape)  # (101, 6146)

fig, ax = plt.subplots(2)
ax[0].pcolorfast(spec)
ax[1].pcolorfast(spectrogram_librosa_db)
plt.show()
This outputs the following picture:
I computed a 4 Hz sine wave, applied an FFT, and calculated the amplitude; the amplitude is an array of length 500. I want to convert each element in that array to dBm form and draw a spectrogram, but I can't seem to get the calculation right.
I saw this general formula:
valueDBFS = 20*np.log10(abs(value))
so I tried using it, but I get only negative results.
Here is my full code (edited):
# Python example - Fourier transform using the numpy.fft module
import numpy as np
import matplotlib.pyplot as plotter
from PIL import Image

# How many time points are needed, i.e., the sampling frequency
samplingFrequency = 100

# At what intervals the time points are sampled
samplingInterval = 1 / samplingFrequency

# Begin time period of the signals
beginTime = 0

# End time period of the signals
endTime = 10

# Frequencies of the signals
signal1Frequency = 4
signal2Frequency = 70

# Time points
time = np.arange(beginTime, endTime, samplingInterval)

# Create a sine wave (only signal1 is used below)
amplitude1 = 100 * np.sin(2 * np.pi * signal1Frequency * time)

fourierTransform = np.fft.fft(amplitude1)
fourierTransform = fourierTransform[range(int(len(amplitude1) / 2))]  # keep the positive-frequency half

tpCount = len(amplitude1)
values = np.arange(int(tpCount / 2))
timePeriod = tpCount / samplingFrequency
frequencies = values / timePeriod

valueDBFS = 20 * np.log10(abs(fourierTransform))
print(valueDBFS)

# SPECTROGRAM
w, h = 500, 500
data = np.zeros((h, w, 3), dtype=np.uint8)
time = time[:len(time) // 2]
for i in range(500):
    for j in range(500):
        color = abs(fourierTransform)[i]
        data[i, j] = [color, color, color]
img = Image.fromarray(data, 'RGB')
img.show()
The maximum value of your amplitude is 1, and log10(1) is 0; everything else will be less than that - for example, log10(0.9) = -0.046.
So that part of your code works fine, and the logs should be negative in your example! Try defining your amplitude like this:
amplitude1 = 100 * np.sin(2*np.pi*signal1Frequency*time)
That should give plenty of positive results.
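If the goal is decibels relative to full scale (dBFS), values at or below 0 are actually the expected outcome, since the reference is the largest value. A minimal sketch, assuming the peak FFT bin is taken as full scale:

import numpy as np

t = np.arange(0, 10, 1 / 100)            # 10 s sampled at 100 Hz
x = 100 * np.sin(2 * np.pi * 4 * t)      # 4 Hz sine, amplitude 100
spectrum = np.abs(np.fft.rfft(x))

# referenced to the largest bin: the 4 Hz peak sits at roughly 0 dBFS
# and every other bin is negative by construction
value_dbfs = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
print(value_dbfs.max())                  # approximately 0.0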
Sorry if this is a really obvious question. I am using matplotlib to generate some spectrograms for use as training data in a machine learning model. The spectrograms are of short clips of music and I want to simulate speeding up or slowing down the song by a random amount to create variations in the data. I have shown my code below for generating each spectrogram. I have temporarily modified it to produce 2 images starting at the same point in the song, one with variation and one without, in order to compare them and see if it is working as intended.
from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np

BPM_VARIATION_AMOUNT = 0.2
FRAME_RATE = 22050
CHUNK_SIZE = 2
BUFFER = FRAME_RATE * 5

def generate_random_specgram(track):
    # Read audio data from file
    audio = AudioSegment.from_file(track.location)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    start = np.random.randint(BUFFER, len(samples) - BUFFER)
    chunk = samples[start:start + int(CHUNK_SIZE * FRAME_RATE)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, track.bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs=FRAME_RATE)
    plt.savefig(filename)
    plt.close()

    # Perform random variations to the BPM
    frame_rate = FRAME_RATE
    bpm = track.bpm
    variation = 1 - BPM_VARIATION_AMOUNT + (
        np.random.random() * BPM_VARIATION_AMOUNT * 2)
    bpm *= variation
    bpm = round(bpm, 2)
    # I thought this next line should have been /= but that stretched the wrong way?
    frame_rate *= (bpm / track.bpm)

    # Read audio data from file
    chunk = samples[start:start + int(CHUNK_SIZE * frame_rate)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs=frame_rate)
    plt.savefig(filename)
    plt.close()
I thought that by changing the Fs parameter given to the specgram function I would stretch the data along the x-axis, but instead it seems to resize the whole graph and introduce white space at the top of the image in strange and unpredictable ways. I'm sure I'm missing something, but I can't see what it is. Below is an image to illustrate what I'm getting.
The frame rate is a fixed number that depends only on your data; if you change it, you will effectively "stretch" the x-axis, but in the wrong way. For example, if you have 1000 data points that correspond to 1 second, your frame rate (better called the sampling frequency) is 1000. If your signal is a simple 200 Hz sine whose frequency increases slightly over time, the specgram will be:
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)
frame_rate = 1000
plt.specgram(signal, Fs=frame_rate);
If you change the framerate you will have a wrong x and y-axis scale. If you set the framerate to be 500 you will have:
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)
frame_rate = 500
plt.specgram(signal, Fs=frame_rate);
The plot is very similar, but this time it is wrong: you have almost 2 seconds on the x-axis when you should have only 1, and the starting frequency reads 100 Hz instead of 200 Hz.
To conclude, the sampling frequency you set needs to be the correct one. If you want to stretch the plot you can use something like plt.xlim(0.2, 0.4). If you want to avoid the white band on top of the plot you can manually set the ylim to be half the frame rate:
plt.ylim(0, frame_rate/2)
This works because a real signal sampled at rate Fs can only contain frequencies up to Fs/2 (the Nyquist-Shannon sampling theorem), so the top half of the plot is necessarily empty.
The solution to my problem was to set the xlim and ylim of the plot. Here is the code from my testing file in which I finally got rid of all the odd whitespace:
from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt

BUFFER = 5
FRAME_RATE = 22050
SAMPLE_LENGTH = 2

def plot(audio_file, bpm, variation=1):
    audio = AudioSegment.from_file(audio_file)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    chunk_length = int(FRAME_RATE * SAMPLE_LENGTH * variation)
    start = np.random.randint(
        BUFFER * FRAME_RATE,
        len(samples) - (BUFFER * FRAME_RATE) - chunk_length)
    chunk = samples[start:start + chunk_length]

    plt.figure(figsize=(5.12, 2.56)).add_axes([0, 0, 1, 1])
    plt.specgram(chunk, Fs=FRAME_RATE * variation)
    plt.xlim(0, SAMPLE_LENGTH)
    plt.ylim(0, FRAME_RATE / 2 * variation)
    plt.savefig('specgram-%f.png' % (bpm * variation))
    plt.close()
I want to get the frequency using pyaudio and plot it in a diagram via matplotlib. I used pyaudio to get the data from my audio input, which works fine, but I have no idea how to get the frequency out of the raw signal. I found a piece of code which should do the job, but I don't know how to apply it to my code.
Here I set up the microphone and prepare for recording:
# constants
CHUNK = 1024 * 2 # samples per frame
FORMAT = pyaudio.paInt16 # audio format (bytes per sample?)
CHANNELS = 1 # single channel for microphone
RATE = 44100 # samples per second
# pyaudio class instance
mic = pyaudio.PyAudio()
# stream object to get data from microphone
stream = mic.open(
format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK
)
This is the part of my code where I get the data from my mic:
data = stream.read(CHUNK)
# convert data to integers, make np array, then offset it by 127
data_int = struct.unpack(str(2 * CHUNK) + 'B', data)
# take every other value and offset by 127
data_np = np.array(data_int, dtype='b')[::2]
data_np = [i + 127 for i in data_np]
I just put this in a while loop and plotted it as a live plot.
Here's the full code:
import pyaudio                    # for capturing the audio signal
import struct                     # for converting the binary data from the signal to integers
import matplotlib.pyplot as plt   # for displaying the audio signal
import numpy as np

# functions
def plot_setup():
    # create matplotlib figure and axes
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # variable for plotting
    x = np.arange(0, 2 * CHUNK, 2)
    # create a line object with placeholder data
    line, = ax.plot(x, [128 for i in range(2048)], '-')
    # basic formatting for the axes
    ax.set_title('AUDIO WAVEFORM')
    ax.set_xlabel('samples')
    ax.set_ylabel('volume')
    ax.set_ylim(0, 255)
    ax.set_xlim(0, 2 * CHUNK)
    plt.xticks([0, CHUNK, 2 * CHUNK])
    plt.yticks([0, 128, 255])
    # show the plot
    plt.show(block=False)
    return fig, line

def measure():
    # binary data
    data = stream.read(CHUNK)
    # convert data to integers, make np array, then offset it by 127
    data_int = struct.unpack(str(2 * CHUNK) + 'B', data)
    # take every other value and offset by 127
    data_np = np.array(data_int, dtype='b')[::2]
    data_np = [i + 127 for i in data_np]
    line.set_ydata(data_np)
    try:
        fig.canvas.draw()
        fig.canvas.flush_events()
    except:
        return 0

# constants
CHUNK = 1024 * 2          # samples per frame
FORMAT = pyaudio.paInt16  # audio format (bytes per sample?)
CHANNELS = 1              # single channel for microphone
RATE = 44100              # samples per second

# pyaudio class instance
mic = pyaudio.PyAudio()

# stream object to get data from microphone
stream = mic.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=CHUNK
)

if __name__ == "__main__":
    fig, line = plot_setup()
    while True:
        m = measure()
        if m == 0:
            break
And this is the output I get:
The final diagram should look exactly the same, except that I want the frequency to be on the y-axis.
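A minimal sketch of the missing step, assuming the RATE constant and the data_np buffer from the code above: take the real FFT of each chunk and plot its magnitude against the bin frequencies, which puts frequency on the x-axis (swap the plot arguments to put it on the y-axis instead):

import numpy as np
import matplotlib.pyplot as plt

samples = np.asarray(data_np, dtype=float) - 127      # undo the offset applied above
spectrum = np.abs(np.fft.rfft(samples)) / len(samples)
freqs = np.fft.rfftfreq(len(samples), d=1.0 / RATE)   # bin frequencies in Hz, up to RATE/2
plt.plot(freqs, spectrum)
plt.xlabel('frequency [Hz]')
plt.ylabel('magnitude')
plt.show()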
I am trying to recreate a musical note using the top 10 frequencies returned by the Fourier transform (FFT). The resulting sound does not match the original, and I am not sure whether I am finding the frequencies incorrectly or generating the sound from them incorrectly. The goal of this code is to match the original sound.
Here is my code:
import numpy as np
from scipy.io import wavfile
from scipy.fftpack import fft
import matplotlib.pyplot as plt

i_framerate = 44100
fs, data = wavfile.read('./Flute.nonvib.ff.A4.stereo.wav')  # load the data

def findFrequencies(arr_data, i_framerate=44100, i_top_n=5):
    a = arr_data.T[0]  # this is a two-channel soundtrack, take the first channel
    # b = [(ele / 2 ** 8.) * 2 - 1 for ele in a]  # for an 8-bit track, normalize to [-1, 1)
    y = fft(a)  # calculate the Fourier transform (list of complex numbers)
    xf = np.linspace(0, int(i_framerate / 2.0), int((i_framerate / 2.0)) + 1) / 2  # Need to find out this last /2 part
    yf = np.abs(y[:int((i_framerate // 2.0)) + 1])
    plt.plot(xf, yf)

    yf_top_n = np.argsort(yf)[-i_top_n:][::-1]
    amp_top_n = yf[yf_top_n] / np.max(yf[yf_top_n])
    freq_top_n = xf[yf_top_n]
    return freq_top_n, amp_top_n

def createSoundData(a_freq, a_amp, i_framerate=44100, i_time=1, f_amp=1000.0):
    n_samples = i_time * i_framerate
    x = np.linspace(0, i_time, n_samples)
    y = np.zeros(n_samples)
    for i in range(len(a_freq)):
        y += np.sin(2 * np.pi * a_freq[i] * x) * f_amp * a_amp[i]
    data2 = np.c_[y, y]  # 2-channel sound
    return data2

top_freq, top_freq_amp = findFrequencies(data, i_framerate=44100, i_top_n=200)
print('Frequencies: ', top_freq)
print('Amplitudes : ', top_freq_amp)

soundData = createSoundData(top_freq, top_freq_amp, i_time=2, f_amp=50 / len(top_freq))
wavfile.write('createsound_A4_v6.wav', i_framerate, soundData)
The top 10 spectral frequencies in a musical note are not the same as the center frequencies of the top 10 FFT result bin magnitudes. The actual frequency peaks can lie between the FFT bins.
Not only can the frequency peak information lie between FFT bins; the phase information required to reproduce any note transients (attack, decay, etc.) can be between bins as well. Spectral information that sits between FFT bins is carried by a span (up to the full width) of the complex FFT result.
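One common remedy for the first point, as a hedged sketch rather than part of the original answer: estimate each peak's true frequency by quadratic interpolation through the log magnitudes of the three bins around a local maximum k of a magnitude spectrum mag:

import numpy as np

def interpolated_peak_hz(mag, k, framerate, n_fft):
    # fit a parabola through the log magnitudes at bins k-1, k, k+1;
    # p is the peak's offset from bin k, always within (-0.5, 0.5)
    alpha, beta, gamma = np.log(mag[k - 1]), np.log(mag[k]), np.log(mag[k + 1])
    p = 0.5 * (alpha - gamma) / (alpha - 2 * beta + gamma)
    return (k + p) * framerate / n_fft

This recovers frequencies between bins; recovering the transients would additionally require keeping the complex phase, as noted above.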
I'd like to create a basic high-pass FIR filter by windowing in Python.
My code is below and is deliberately written out step by step - I'm aware you can (most likely) do this in a single line of Python, but I'm learning. I have used a basic sinc function with a rectangular window. My output works for signals that are combined additively (f1 + f2) but not multiplicatively (f1 * f2), where f1 = 25 kHz and f2 = 1 MHz.
My question is: have I misunderstood something fundamental, or is my code wrong?
In summary, I'd like to extract just the high-frequency signal (f2 = 1 MHz) and filter everything else out. I've also included screenshots of what is generated for (f1 + f2) and (f1 * f2):
import numpy as np
import matplotlib.pyplot as plt

# create an array of 1024 points sampled at 40MHz
# [each sample is 25ns apart]
Fs = 40e6
T = 1/Fs
t = np.arange(0, (1024*T), T)

# create an ip signal sampled at Fs, using two frequencies
F_low = 25e3   # 25kHz
F_high = 1e6   # 1MHz
ip = np.sin(2*np.pi*F_low*t) + np.sin(2*np.pi*F_high*t)
#ip = np.sin(2*np.pi*F_low*t) * np.sin(2*np.pi*F_high*t)
op = [0]*len(ip)

# Define -
# Fsample = 40MHz
# Fcutoff = 900kHz,
# this gives the normalised transition freq, Ft
Fc = 0.9e6
Ft = Fc/Fs
Length = 101
M = Length - 1
Weight = []
for n in range(0, Length):
    if n != (M/2):
        Weight.append(-np.sin(2*np.pi*Ft*(n-(M/2))) / (np.pi*(n-(M/2))))
    else:
        Weight.append(1 - 2*Ft)

for n in range(len(Weight), len(ip)):
    y = 0
    for i in range(0, len(Weight)):
        y += Weight[i] * ip[n-i]
    op[n] = y
plt.subplot(311)
plt.plot(Weight,'ro', linewidth=3)
plt.xlabel( 'weight number' )
plt.ylabel( 'weight value' )
plt.grid()
plt.subplot(312)
plt.plot( ip,'r-', linewidth=2)
plt.xlabel( 'sample length' )
plt.ylabel( 'ip value' )
plt.grid()
plt.subplot(313)
plt.plot( op,'k-', linewidth=2)
plt.xlabel( 'sample length' )
plt.ylabel( 'op value' )
plt.grid()
plt.show()
You've misunderstood something fundamental. The windowed sinc filter is designed to separate linearly combined frequencies, i.e. frequencies combined through addition, not frequencies combined through multiplication. Multiplying two sinusoids instead produces sum and difference frequencies, sin(2*pi*f1*t) * sin(2*pi*f2*t) = 0.5*[cos(2*pi*(f2-f1)*t) - cos(2*pi*(f2+f1)*t)], so your product signal contains components at 975 kHz and 1.025 MHz and no 25 kHz component at all; both lie above the 900 kHz cutoff, so the high-pass filter rightly passes the whole signal. See chapter 5 of The Scientist and Engineer's Guide to Digital Signal Processing for more details.
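You can verify the sum and difference components directly; a small numpy check (not part of the original answer, with the signal length chosen to give a 1 kHz bin spacing):

import numpy as np

Fs = 40e6
N = 40000                                 # 1 ms of signal -> 1 kHz bin spacing
t = np.arange(N) / Fs
prod = np.sin(2 * np.pi * 25e3 * t) * np.sin(2 * np.pi * 1e6 * t)
spectrum = np.abs(np.fft.rfft(prod))
freqs = np.fft.rfftfreq(N, d=1 / Fs)
print(freqs[np.argsort(spectrum)[-2:]])   # 975 kHz and 1.025 MHz; no 25 kHz component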
Code based on scipy.signal will provide similar results to your code:
from pylab import *
import scipy.signal as signal
# create an array of 1024 points sampled at 40MHz
# [each sample is 25ns apart]
Fs = 40e6
nyq = Fs / 2
T = 1/Fs
t = np.arange(0,(1024*T),T)
# create an ip signal sampled at Fs, using two frequencies
F_low = 25e3 # 25kHz
F_high = 1e6 # 1MHz
ip_1 = np.sin(2*np.pi*F_low*t) + np.sin(2*np.pi*F_high*t)
ip_2 = np.sin(2*np.pi*F_low*t) * np.sin(2*np.pi*F_high*t)
Fc = 0.9e6
Length = 101
# create a low pass digital filter
a = signal.firwin(Length, cutoff = F_high / nyq, window="hann")
# create a high pass filter via spectral inversion
a = -a
a[Length // 2] = a[Length // 2] + 1  # integer index; Length/2 is a float in Python 3
figure()
plot(a, 'ro')
# apply the high pass filter to the two input signals
op_1 = signal.lfilter(a, 1, ip_1)
op_2 = signal.lfilter(a, 1, ip_2)
figure()
plot(ip_1)
figure()
plot(op_1)
figure()
plot(ip_2)
figure()
plot(op_2)
Impulse Response:
Linearly Combined Input:
Filtered Output:
Non-linearly Combined Input:
Filtered Output: