I want to write a signal to a .wav file, but when I do this using
scipy.io.wavfile.write it just creates a .wav with no sound.
The .wav has the correct length, but it is silent.
I looked for a solution to this problem but couldn't find help.
My code is below:
import scipy as sp
import scipy.io.wavfile  # needed so that sp.io.wavfile resolves
import numpy as np

dt = np.dtype(np.int32)
sig = np.fromfile(filename, dtype=dt, count=-1, sep='')  # filename is defined elsewhere
sp.io.wavfile.write('sound.wav', int(fS), sig)  # fS (the sample rate) is defined elsewhere
As a test, I also wrote a little function:
import math
import matplotlib.pyplot as plt

def write_wav_sin(name, fs, f):
    x = np.linspace(0, 10, 10*fs)
    dt = np.dtype(np.float32)
    sig = np.sin(2*math.pi*f*x, dtype=dt)
    print(type(sig[0]))
    sp.io.wavfile.write(name, fs, sig)
    plt.plot(x, sig)
This test works, but my other code doesn't.
Does anyone know why I have this problem?
Check the range of values in sig by printing sig.min() and sig.max(). The values are not scaled by wavfile.write, so it might be that you have a file with values so low that you can't hear them.
Try scaling up the 32 bit integer values, or writing the data as normalized 32 bit floating point. For example, this converts sig to 32 bit floating point values in the range [-1, 1] before saving it:
m = np.max(np.abs(sig))
sigf32 = (sig/m).astype(np.float32)
sp.io.wavfile.write('sound.wav', int(fS), sigf32)
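If you would rather keep integer samples, here is a minimal sketch of the scaling route (assuming the same sig and fS as in the question):
import numpy as np
from scipy.io import wavfile

m = np.max(np.abs(sig))
sig32 = (sig * ((2**31 - 1) / m)).astype(np.int32)  # map the peak near the int32 limit
wavfile.write('sound_int32.wav', int(fS), sig32)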
Finally I divided my whole signal so that its maximum amplitude was much smaller (my signal sometimes had an amplitude of 500000; to write it to a WAV I divided it by 250000).
With that trick I can listen to the sound, but there is something weird, like additional artifacts/noise (I compared it to a .wav obtained with MATLAB from the same file). Note that sig/250000 yields float64 samples, and any value whose magnitude still exceeds 1.0 will clip on playback, which may explain the artifacts.
The code I used is:
import scipy as sp
import scipy.io.wavfile  # needed so that sp.io.wavfile resolves
import numpy as np

dt = np.dtype(np.int32)
sig = np.fromfile(filename, dtype=dt, count=-1, sep='')
sp.io.wavfile.write('sound.wav', int(fS), sig/250000)  # int32 / int -> float64 samples
Here's a commented example of how to generate a basic wave file with a set duration, frequency, volume, and number of samples, using NumPy and Python's wave library.
import numpy as ny
import wave

class SoundFile:
    def __init__(self, signal):
        # https://docs.python.org/3.6/library/wave.html#wave.open
        self.file = wave.open('test.wav', 'wb')
        self.signal = signal
        self.sr = 44100

    def write(self):
        # https://docs.python.org/3.6/library/wave.html#wave.Wave_write.setparams
        # (nchannels, sampwidth, framerate, nframes, comptype, compname)
        self.file.setparams((1, 2, self.sr, 44100 * 4, 'NONE', 'noncompressed'))
        # https://docs.python.org/3.6/library/wave.html#wave.Wave_write.writeframes
        # convert to raw int16 bytes so the data matches sampwidth=2
        self.file.writeframes(self.signal.astype(ny.int16).tobytes())
        self.file.close()

# signal settings
duration = 4                            # duration in seconds
samplerate = 44100                      # samples per second (Hz)
samples = duration * samplerate         # total number of samples
frequency = 440                         # tone frequency in Hz
period = samplerate / float(frequency)  # samples per cycle
omega = ny.pi * 2 / period              # angular frequency per sample
volume = 16384                          # amplitude (int16 max is 32767)

# create sine wave
xaxis = ny.arange(samples, dtype=float)
ydata = volume * ny.sin(xaxis * omega)

# pad/trim to the exact length
signal = ny.resize(ydata, (samples,))

# create sound file
f = SoundFile(signal)
f.write()
print('sound file created')
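To sanity-check the generated file, you can read the header back with the same wave module (a quick sketch):
import wave

with wave.open('test.wav', 'rb') as w:
    # expected from setparams above: 1 channel, 2-byte samples, 44100 Hz, 176400 frames
    print(w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())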
I did my best to comment, update, and modify this source from a random blogger.
Related
Sorry if this is a really obvious question. I am using matplotlib to generate some spectrograms for use as training data in a machine learning model. The spectrograms are of short clips of music and I want to simulate speeding up or slowing down the song by a random amount to create variations in the data. I have shown my code below for generating each spectrogram. I have temporarily modified it to produce 2 images starting at the same point in the song, one with variation and one without, in order to compare them and see if it is working as intended.
from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np

BPM_VARIATION_AMOUNT = 0.2
FRAME_RATE = 22050
CHUNK_SIZE = 2
BUFFER = FRAME_RATE * 5

def generate_random_specgram(track):
    # Read audio data from file
    audio = AudioSegment.from_file(track.location)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    start = np.random.randint(BUFFER, len(samples) - BUFFER)
    chunk = samples[start:start + int(CHUNK_SIZE * FRAME_RATE)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, track.bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs=FRAME_RATE)
    plt.savefig(filename)
    plt.close()

    # Perform random variations to the BPM
    frame_rate = FRAME_RATE
    bpm = track.bpm
    variation = 1 - BPM_VARIATION_AMOUNT + (
        np.random.random() * BPM_VARIATION_AMOUNT * 2)
    bpm *= variation
    bpm = round(bpm, 2)
    # I thought this next line should have been /= but that stretched the wrong way?
    frame_rate *= (bpm / track.bpm)

    # Read audio data from file
    chunk = samples[start:start + int(CHUNK_SIZE * frame_rate)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs=frame_rate)
    plt.savefig(filename)
    plt.close()
I thought that changing the Fs parameter given to the specgram function would stretch the data along the x-axis, but instead it seems to resize the whole graph and introduce white space at the top of the image in strange and unpredictable ways. I'm sure I'm missing something but I can't see what it is. Below is an image to illustrate what I'm getting.
The framerate is a fixed number that only depends on your data; if you change it, you will effectively "stretch" the x-axis, but in the wrong way. For example, if you have 1000 data points that correspond to 1 second, your framerate (or better, sampling frequency) will be 1000. If your signal is a simple 200 Hz sine whose frequency slightly increases in time, the specgram will be:
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)
frame_rate = 1000
plt.specgram(signal, Fs=frame_rate);
If you change the framerate you will have wrong x- and y-axis scales. If you set the framerate to 500 you will have:
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)
frame_rate = 500
plt.specgram(signal, Fs=frame_rate);
The plot is very similar, but this time it is wrong: you have almost 2 seconds on the x-axis while you should only have 1; moreover, the starting frequency you read is 100 Hz instead of 200 Hz.
To conclude, the sampling frequency you set needs to be the correct one. If you want to stretch the plot you can use something like plt.xlim(0.2, 0.4). If you want to avoid the white band on top of the plot you can manually set the ylim to be half the frame rate:
plt.ylim(0, frame_rate/2)
This works because of simple properties of the Fourier transform and the Nyquist-Shannon sampling theorem.
The solution to my problem was to set the xlim and ylim of the plot. Here is the code from my testing file in which I finally got rid of all the odd whitespace:
from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt

BUFFER = 5
FRAME_RATE = 22050
SAMPLE_LENGTH = 2

def plot(audio_file, bpm, variation=1):
    audio = AudioSegment.from_file(audio_file)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    chunk_length = int(FRAME_RATE * SAMPLE_LENGTH * variation)
    start = np.random.randint(
        BUFFER * FRAME_RATE,
        len(samples) - (BUFFER * FRAME_RATE) - chunk_length)
    chunk = samples[start:start + chunk_length]
    plt.figure(figsize=(5.12, 2.56)).add_axes([0, 0, 1, 1])
    plt.specgram(chunk, Fs=FRAME_RATE * variation)
    plt.xlim(0, SAMPLE_LENGTH)
    plt.ylim(0, FRAME_RATE / 2 * variation)
    plt.savefig('specgram-%f.png' % (bpm * variation))
    plt.close()
I have a wav file which contains a recorded chirp sound. Some basic facts about it:

Frequency sampling: 44100
Number of Channels: 1
Complete Samplings N: 90405
secs: 2.05

The chirp sound itself is only 50 ms long.
The image of the chirp: https://i.imgur.com/Hr4u5tx.jpg
The code I have so far reads the wav file and carries out some basic processing:
import numpy as np
import scipy
import scipy.fftpack
from scipy.io import wavfile
import matplotlib.pyplot as plt

fs_rate, signal = wavfile.read("chirp.wav")
print("Frequency sampling", fs_rate)
l_audio = len(signal.shape)
print("Channels", l_audio)
if l_audio == 2:
    signal = signal.sum(axis=1) / 2
N = signal.shape[0]
print("Complete Samplings N", N)
secs = N / float(fs_rate)
print("secs", secs)
Ts = 1.0/fs_rate  # sampling interval in time
print("Timestep between samples Ts", Ts)
t = scipy.arange(0, secs, Ts)  # time vector (older SciPy; use np.arange on newer versions)
FFT = abs(scipy.fft(signal))   # scipy.fft was a function in older SciPy versions
FFT_side = FFT[range(N//2)]    # one-sided FFT range
freqs = scipy.fftpack.fftfreq(signal.size, t[1]-t[0])
fft_freqs = np.array(freqs)
freqs_side = freqs[range(N//2)]  # one-sided frequency range
fft_freqs_side = np.array(freqs_side)

plt.subplot(311)
p1 = plt.plot(t, signal, "g")  # plotting the signal
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.savefig('chirp.jpg')
Problem:
Using Python, how do I tell where the first sample point of the chirp is located in the audio file, i.e. the first point at which the chirp was received?
The signal contains background noise. The result I am expecting should say: this is where your chirp signal starts, and it is at a frequency of 2 kHz.
PS: This is not a homework problem. I am learning DSP, sort of self-study.
If you know the chirp sequence, you could correlate against it to get the start of the chirp in the stream:
import numpy as np
import scipy.signal as sig

h = np.array(chirp_sequence)  # the known chirp template
rxy = sig.correlate(signal, h)  # full cross-correlation
start_idx = np.argmax(np.abs(rxy)) - (len(h) - 1)  # align the peak to the chirp start
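For completeness, a self-contained sketch of the idea (my own example, with made-up parameters): hide a known 50 ms chirp starting at 2 kHz in noise, then locate its first sample by cross-correlation.
import numpy as np
import scipy.signal as sig

fs = 44100
t = np.arange(int(0.05 * fs)) / fs                    # 50 ms template
template = sig.chirp(t, f0=2000, f1=4000, t1=t[-1])   # chirp starting at 2 kHz

rng = np.random.default_rng(1)
stream = 0.1 * rng.standard_normal(2 * fs)            # 2 s of background noise
true_start = 60000
stream[true_start:true_start + len(template)] += template

rxy = sig.correlate(stream, template, mode='valid')
start_idx = np.argmax(np.abs(rxy))                    # estimated first sample
print(true_start, start_idx)                          # the two should match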
I have Python code which performs an FFT on a wav file and plots the amplitude vs time / amplitude vs frequency graphs. I want to calculate dB from these graphs (they are long arrays). I do not want to calculate exact dBA; I just want to see a linear relationship after my calculations. I have a dB meter, and I will compare against it. Here is my code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import scipy.io.wavfile as wavfile
import scipy
import scipy.fftpack
import numpy as np
from matplotlib import pyplot as plt
fs_rate, signal = wavfile.read("output.wav")
print ("Frequency sampling", fs_rate)
l_audio = len(signal.shape)
print ("Channels", l_audio)
if l_audio == 2:
    signal = signal.sum(axis=1) / 2
N = signal.shape[0]
print ("Complete Samplings N", N)
secs = N / float(fs_rate)
print ("secs", secs)
Ts = 1.0/fs_rate # sampling interval in time
print ("Timestep between samples Ts", Ts)
t = scipy.arange(0, secs, Ts)  # time vector (older SciPy; use np.arange on newer versions)
FFT = abs(scipy.fft(signal))   # scipy.fft was a function in older SciPy versions
FFT_side = FFT[range(N//4)]    # one-sided FFT range
freqs = scipy.fftpack.fftfreq(signal.size, t[1]-t[0])
fft_freqs = np.array(freqs)
freqs_side = freqs[range(N//4)] # one side frequency range
fft_freqs_side = np.array(freqs_side)
makespositive = signal[44100:]*(-1)  # negating does not make all values positive; np.log10 returns NaN for the rest
logal = np.log10(makespositive)
sn1 = np.mean(logal[1:44100])
sn2 = np.mean(logal[44100:88200])
sn3 = np.mean(logal[88200:132300])
sn4 = np.mean(logal[132300:176400])
print(sn1)
print(sn2)
print(sn3)
print(sn4)
abs(FFT_side)  # note: this line has no effect; the result is discarded
for a in range(500):
    FFT_side[a] = 0  # zero out the lowest 500 frequency bins
plt.subplot(311)
p1 = plt.plot(t[44100:], signal[44100:], "g") # plotting the signal
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.subplot(312)
p1 = plt.plot(t[44100:], logal, "r") # plotting the signal
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.subplot(313)
p3 = plt.plot(freqs_side, abs(FFT_side), "b") # plotting the positive fft spectrum
plt.xlabel('Frequency (Hz)')
plt.ylabel('Count single-sided')
plt.show()
The first plot is amplitude vs time, the second is the logarithm of the previous graph, and the last one is the FFT.
In the sn1, sn2 part I tried to calculate dB from the signal: first I took the log and then calculated the mean value for each second. It did not give me a clear relationship. I also tried the following, and it did not work either.
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wf
fs, signal = wf.read('output.wav') # Load the file
ref = 32768 # 0 dBFS is 32768 for an int16 signal
N = 8192
win = np.hamming(N)
x = signal[0:N] * win # Take a slice and multiply by a window
sp = np.fft.rfft(x) # Calculate real FFT
s_mag = np.abs(sp) * 2 / np.sum(win) # Scale the magnitude of FFT by window and factor of 2,
# because we are using half of FFT spectrum
s_dbfs = 20 * np.log10(s_mag / ref) # Convert to dBFS
freq = np.arange((N / 2) + 1) / (float(N) / fs) # Frequency axis
plt.plot(freq, s_dbfs)
plt.grid(True)
So which steps should I perform? (Sum/average all frequency amplitudes and then take the log, or the reverse? Or do it on the time signal, etc.?)
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wf
fs, signal = wf.read('db1.wav')
signal2 = signal[44100:]
chunk_size = 44100
num_chunk = len(signal2) // chunk_size
sn = []
for chunk in range(0, num_chunk):
    sn.append(np.mean(signal2[chunk*chunk_size:(chunk+1)*chunk_size].astype(float)**2))
print(sn)
logsn = 20*np.log10(sn)
print(logsn)
Output:
[4.6057844427695475e+17, 5.0025315250895744e+17, 5.028593412665193e+17, 4.910948397471887e+17]
[353.26607217 353.98379668 354.02893044 353.82330741]
A decibel meter measures a signal's mean power. So from your time signal recording you can calculate the mean signal power with:
chunk_size = 44100
num_chunk = len(signal) // chunk_size
sn = []
for chunk in range(0, num_chunk):
    # cast to float first: squaring raw int16 samples would overflow
    sn.append(np.mean(signal[chunk*chunk_size:(chunk+1)*chunk_size].astype(float)**2))
Then the corresponding mean signal power in decibels is simply given by:
logsn = 10*np.log10(sn)
An equivalent relationship could also be obtained for a frequency domain signal with the use of Parseval's theorem, but in your case that would require unnecessary FFT computations (this relationship is mostly useful when you already have to compute the FFT for other purposes).
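For reference, a minimal sketch of that Parseval relationship on a synthetic signal (my own illustration):
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(44100)             # one second of synthetic "signal"

p_time = np.mean(x**2)                     # mean power in the time domain
X = np.fft.fft(x)
p_freq = np.sum(np.abs(X)**2) / len(x)**2  # same quantity via Parseval's theorem

print(p_time, p_freq)                      # the two values agree
print(10 * np.log10(p_time))               # mean power in decibels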
Note however that depending on what you compare there may be some (hopefully small) discrepancies. For example, the use of non-linear amplifiers and speakers would affect the relationship. Similarly, ambient noise would add to the power measured by the decibel meter.
I tried to convert PCM data from a wav file and FFT it into a frequency chart. Here are my charts.

(Plots omitted: "0.00s sample count: 512" and "3.15s sample count: 512")

The sound file is almost silent, with a knocking sound that starts at about 3 s.
I noticed that the value near 0 is very high. How can that be?
Another strange point is that the value is 0 for frequencies greater than about 16000.
Here is my code:
import soundfile as sf
import numpy as np
import math
import matplotlib.pyplot as plt
_audio_path = 'source_normal.wav'
def plot_data(pcm_data, samplerate, current_time):
x_axis = np.arange(0, len(pcm_data) - 1) / len(pcm_data) * samplerate
complex_data = [x+0j for x in pcm_data]
result = np.fft.fft(complex_data)
length = len(pcm_data) // 2
amplitudes = [math.sqrt(x.imag * x.imag + x.real * x.real) for x in result[:length]]
plt.plot(x_axis[:length], amplitudes)
plt.title('{}s sample count: {}'.format(current_time, len(pcm_data)))
plt.xlabel('{}Hz'.format(samplerate))
plt.show()
def baz():
data, samplerate = sf.read(_audio_path, dtype='int16')
window = 512
total_number_of_data = len(data)
current_index = 0 # 144000
while current_index < total_number_of_data:
d = data[current_index:current_index+window]
current_time = current_index / samplerate
print('current time: {}'.format(current_index / samplerate))
plot_data(d, samplerate, current_time)
current_index += window
if __name__ == '__main__':
baz()
I am not familiar with DSP and have never tried this before, so I think my code has some mistakes. Please help, thank you.
Here is my sound file.
The high value you see near 0 on the first plot is caused by the constant (DC) component in the window. Try normalization: shift all of the window's values by subtracting its average.
The tail zeros are just amplitudes small enough to look like zeros. Check their actual values to make sure ;)
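A minimal sketch of that normalization (my own illustration; the synthetic pcm_data stands in for one 512-sample window from the question):
import numpy as np

# synthetic stand-in for one window: quiet noise riding on a DC offset
pcm_data = np.random.default_rng(0).integers(-50, 50, 512) + 1000
window = pcm_data.astype(float)
window -= window.mean()                 # remove the constant (DC) component
spectrum = np.abs(np.fft.rfft(window))  # the huge value at 0 Hz is gone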
I am trying to recreate a musical note using the top 10 frequencies returned by the Fourier transform (FFT). The resulting sound does not match the original. I am not sure whether I am failing to find the frequencies correctly or failing to generate the sound from them. The goal of this code is to match the original sound.
Here is my code:
import numpy as np
from scipy.io import wavfile
from scipy.fftpack import fft
import matplotlib.pyplot as plt

i_framerate = 44100
fs, data = wavfile.read('./Flute.nonvib.ff.A4.stereo.wav')  # load the data

def findFrequencies(arr_data, i_framerate=44100, i_top_n=5):
    a = arr_data.T[0]  # this is a two channel soundtrack, I get the first track
    # b = [(ele/2**8.)*2-1 for ele in a]  # this is 8-bit track, b is now normalized on [-1,1)
    y = fft(a)  # calculate fourier transform (complex numbers list)
    xf = np.linspace(0, int(i_framerate/2.0), int((i_framerate/2.0))+1) / 2  # Need to find out this last /2 part
    yf = np.abs(y[:int((i_framerate//2.0))+1])
    plt.plot(xf, yf)
    yf_top_n = np.argsort(yf)[-i_top_n:][::-1]
    amp_top_n = yf[yf_top_n] / np.max(yf[yf_top_n])
    freq_top_n = xf[yf_top_n]
    return freq_top_n, amp_top_n

def createSoundData(a_freq, a_amp, i_framerate=44100, i_time=1, f_amp=1000.0):
    n_samples = i_time * i_framerate
    x = np.linspace(0, i_time, n_samples)
    y = np.zeros(n_samples)
    for i in range(len(a_freq)):
        y += np.sin(2 * np.pi * a_freq[i] * x) * f_amp * a_amp[i]
    data2 = np.c_[y, y]  # 2 channel sound
    return data2

top_freq, top_freq_amp = findFrequencies(data, i_framerate=44100, i_top_n=200)
print('Frequencies: ', top_freq)
print('Amplitudes : ', top_freq_amp)

soundData = createSoundData(top_freq, top_freq_amp, i_time=2, f_amp=50 / len(top_freq))
wavfile.write('createsound_A4_v6.wav', i_framerate, soundData)
The top 10 spectral frequencies in a musical note are not the same as the center frequencies of the top 10 FFT result bin magnitudes. The actual frequency peaks can be between the FFT bins.
Not only can the frequency peak information be between FFT bins, but the phase information required to reproduce any note transients (attack, decay, etc.) can also be between bins. Spectral information that is between FFT bins is carried by a span (up to the full width) of the complex FFT result.
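To make the "between bins" point concrete, here is a small sketch (my own illustration, not from the answer) that refines a peak-bin estimate with parabolic interpolation:
import numpy as np

fs = 44100
n = 4096
true_freq = 440.7                        # deliberately between FFT bins
t = np.arange(n) / fs
x = np.sin(2 * np.pi * true_freq * t)

mag = np.abs(np.fft.rfft(x * np.hanning(n)))
k = int(np.argmax(mag))                  # coarse peak bin
bin_freq = k * fs / n                    # its center frequency (off by up to half a bin)

# parabolic interpolation on the log magnitude of the three bins around the peak
a, b, c = np.log(mag[k - 1:k + 2])
delta = 0.5 * (a - c) / (a - 2 * b + c)  # fractional-bin offset in (-0.5, 0.5)
refined_freq = (k + delta) * fs / n

print(bin_freq, refined_freq)            # the refined estimate is much closer to 440.7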