I want to create a .wav file in Python, using the numpy and scipy libraries, in which multiple tones are played. The way I intend to do it is by storing my frequencies in one array and the generated signals in another. I've managed to create a file with the desired playtime, but it doesn't play any sound. Am I missing something?
Thank you.
import numpy as np
from scipy.io import wavfile
freq =np.array([440,493,523,587,659,698,783,880]) #tone frequencies
fs=22050 #sample rate
duration=1 #signal duration
music=[]
t=np.arange(0,duration,1./fs) #time
for i in range(0,len(freq)):
    x=10000*np.cos(2*np.pi*freq[i]*t) #generated signals
    music=np.hstack((music,x))
wavfile.write('music.wav',fs,music)
The vector that you are using to create the wave file contains 64-bit floats, and scipy.io.wavfile picks the sample format from the array's dtype (as is mentioned in the docs), so it writes 64-bit floating-point samples, which most players do not support.
Changing the last line to
wavfile.write('music.wav',fs,music.astype(np.dtype('i2')))
should produce a file that can be played properly.
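Putting the fix together, a minimal sketch of the whole script might look like this. It keeps the frequencies, sample rate and amplitude from the question; the names `freqs` and `amplitude` are only illustrative, and the list comprehension replaces the repeated `np.hstack` calls purely as a style choice.

```python
import numpy as np
from scipy.io import wavfile

freqs = np.array([440, 493, 523, 587, 659, 698, 783, 880])  # tone frequencies in Hz
fs = 22050         # sample rate in Hz
duration = 1       # seconds per tone
amplitude = 10000  # peak value, well inside the int16 range of +/-32767

t = np.arange(0, duration, 1.0 / fs)  # time axis for a single tone

# One cosine per frequency, joined end to end
music = np.concatenate([amplitude * np.cos(2 * np.pi * f * t) for f in freqs])

# Convert to 16-bit integers before writing so ordinary players can read the file
wavfile.write('music.wav', fs, music.astype(np.int16))
```

Reading the file back with `wavfile.read('music.wav')` should then return an `int16` array.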
I am currently working on augmenting audio in Python. I've been using Librosa due to its speed and simplicity, but I need to fall back on PyDub for some other utilities, such as applying gain.
Is there a mathematical way to add gain to the NumPy array returned by librosa.load? In PyDub it is quite easy, but I have to constantly convert from PyDub's get_array_of_samples() to an np.array and then to the 32-bit float representation on the [-1, 1) scale that Librosa uses by default. I'd rather keep it all in one library for simplicity.
Normalizing the audio signal to 0 dB beforehand would be useful too. I am a bit new to a lot of the terminology used in audio signal processing.
This is what I am currently doing. Down the road I would like to make this a class method which starts with using librosa's numpy array, so if there is a way to mathematically add specified gain in a certain unit to a numpy array from librosa that would be ideal.
Thanks
import librosa
import numpy as np
from pydub import AudioSegment, effects
pydub_audio = AudioSegment.from_file(audio_file_path)
pydub_audio = pydub_audio.set_frame_rate(16000) # resample to a 16 kHz frame rate
print("Original dBFS is {}".format(pydub_audio.dBFS))
pydub_audio = pydub_audio.apply_gain(20) # apply 20 dB of gain to introduce clipping
#pydub_audio = effects.normalize(pydub_audio)
print("New dBFS is {}".format(pydub_audio.dBFS))
pydub_array = pydub_audio.get_array_of_samples()
pydub_array = np.array(pydub_array)
print("PyDub audio type is {}".format(pydub_array.dtype))
pydub_array_32bitfloat = pydub_array.astype(np.float32, order = 'C') / 32768 # rescaling to between [-1, 1] like librosa
print("Rescaled Pydub type is {}".format(pydub_array_32bitfloat.dtype))
import soundfile as sf
sf.write(r"test_pydub_gain.wav", pydub_array_32bitfloat, samplerate = 16000, format = 'wav')
Thinking about it (if I am not wrong), mathematically the gain in decibels is:
gain_dB = 20 * log10(level2 / level1)
so I would multiply all elements of the array by
10**(gain_dB/20) to apply the gain.
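As a rough sketch of that idea in plain NumPy: the helper names `apply_gain` and `peak_normalize` are made up for illustration, the synthetic tone stands in for the array `librosa.load` would return, and "normalizing to 0 dB" is assumed here to mean peak normalization (largest absolute sample scaled to 1.0).

```python
import numpy as np

def apply_gain(samples, gain_db):
    """Scale float samples by a gain given in decibels (hypothetical helper)."""
    return samples * 10 ** (gain_db / 20)

def peak_normalize(samples):
    """Scale so the largest absolute sample is 1.0 (hypothetical helper)."""
    peak = np.max(np.abs(samples))
    return samples if peak == 0 else samples / peak

# Stand-in for the float array returned by librosa.load: a quiet 440 Hz tone
sr = 16000
y = 0.05 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

louder = apply_gain(y, 20)            # +20 dB multiplies the amplitude by 10
clipped = np.clip(louder, -1.0, 1.0)  # hard-limit to the range a float WAV expects
normalized = peak_normalize(y)        # peak now sits at full scale
```

Nothing here clips on its own; values pushed beyond [-1, 1] only get cut off once they are clipped explicitly or converted to an integer format.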
I'm trying to create a single audio file out of multiple wav files. Using tkinter and pygame.mixer, I've converted key presses into a dictionary that stores audio samples and the times at which they're invoked: {sound1:10000, sound2:10001, etc...}
So far I've devised a way to add blocks of silence:
import numpy as np
import scipy.io.wavfile

def change_speed(seconds):
    '''modifies the metronome beat to loop at different speeds. This is done by creating a new wav file.'''
    #original wav file is 0.1 seconds long, so subtract that from time added
    seconds-=0.1
    #read the original wav file
    rate, data = scipy.io.wavfile.read('Downloads\\sounds\\metronome_original.wav')
    #use sample rate of the original file (should be 44100) to create a new block of silence
    #with the same dtype and channel layout as the original samples
    add_secs = np.zeros((round(rate*seconds),) + data.shape[1:], dtype=data.dtype)
    #concatenate new block to original
    new = np.concatenate((data, add_secs))
    scipy.io.wavfile.write('Downloads\\sounds\\metronome.wav', rate, new)
Is there some way to combine overlapping arrays like [[0,0,1,1,2,0], [0,0,0,3,2,1]] into a single wav file?
Update:
To be more specific, I'm trying to merge two audio samples that overlap in playtime, like a DJ who starts playing one song before the other one finishes. Is there a way to do this with integer or byte arrays generated in python?
Here's how I'd do it:
from scipy.io import wavfile
import numpy as np
wav1 = [0,0,1,1,2,0]
wav2 = [0,0,0,3,2,1]
combined = np.hstack([wav1, wav2])
N = 2400 # Samples per second.
wavfile.write('combined.wav', rate=N, data=combined.astype(np.int16))
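Note that np.hstack plays the second clip after the first one has finished. If the clips are meant to overlap in time, as the update to the question describes, one common approach is to add the samples together instead. The sketch below assumes mono int16 data; `mix_at` and its `start` offset are hypothetical names introduced only for this example.

```python
import numpy as np
from scipy.io import wavfile

def mix_at(track, clip, start):
    """Add clip into track beginning at sample index start, growing track if needed."""
    end = start + len(clip)
    # Use a wider integer type so the sum cannot wrap around before clipping
    out = np.zeros(max(len(track), end), dtype=np.int32)
    out[:len(track)] += track
    out[start:end] += clip
    return out

wav1 = np.array([0, 0, 1, 1, 2, 0], dtype=np.int16)
wav2 = np.array([0, 0, 0, 3, 2, 1], dtype=np.int16)

# start=0 lines the two clips up sample for sample
mixed = mix_at(wav1, wav2, start=0)
# Bring the sum back into the int16 range before writing
mixed = np.clip(mixed, -32768, 32767).astype(np.int16)

wavfile.write('mixed.wav', 2400, mixed)
```

A larger `start` delays the second clip, so `mix_at(wav1, wav2, start=2)` would begin wav2 while the last four samples of wav1 are still playing.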
I have a question about the difference between the load function of librosa and the read function of scipy.io.wavfile.
from scipy.io import wavfile
import librosa
fs, data = wavfile.read(name)
data, fs = librosa.load(name)
Both calls load the same audio file, yet the data values returned by the two functions are different. I want to know why.
From the docstring of librosa.core.load:
Load an audio file as a floating point time series.
Audio will be automatically resampled to the given rate (default sr=22050).
To preserve the native sampling rate of the file, use sr=None.
scipy.io.wavfile.read does not automatically resample the data, and the samples are not converted to floating point if they are integers in the file.
It's worth also mentioning that librosa.load() normalizes the data (so that all the data points are between 1 and -1), whereas wavfile.read() does not.
The data is different because scipy does not normalize the input signal.
Here is a snippet showing how to change scipy output to match librosa's:
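import librosa
import scipy.io.wavfile
nbits = 16  # assumes a 16-bit mono PCM file
l_wave, rate = librosa.core.load(path, sr=None)
rate, s_wave = scipy.io.wavfile.read(path)
s_wave = s_wave / 2 ** (nbits - 1)  # in-place /= would fail on an integer array
all(s_wave == l_wave)
# True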
librosa.core.load has support for 24-bit audio files and 96 kHz sample rates. Because of this, together with the conversion to float and the default resampling, it can be considerably slower than scipy.io.wavfile.read in many cases.
In Python, I have an array of floats representing the voltages of an analog signal.
Can anyone explain how I can change the array into a .wav format? I have seen this
Do I first need to change the data format from [1.23,1.24,1.25,1.26] (for example) to 1.231.241.251.26 before adding the headers so that it's read correctly?
I eventually plan on using an FFT on the values to derive the fundamental frequencies. Is there a better way to store the values in this case?
Thank you
If you know the sampling frequency of your signal and the data is already scaled appropriately (for example, divided by max(abs(data))), then you can do it very easily using scipy:
from __future__ import print_function
import scipy.io.wavfile as wavf
import numpy as np
if __name__ == "__main__":
    samples = np.random.randn(44100)
    fs = 44100
    out_f = 'out.wav'
    wavf.write(out_f, fs, samples)
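If the data is not scaled yet, a minimal sketch of that step might look like this. The four voltages are the example values from the question, the 44100 Hz rate is an assumption, and converting to 16-bit integers is a choice made here for player compatibility rather than something scipy requires.

```python
import numpy as np
from scipy.io import wavfile

fs = 44100  # assumed sampling frequency in Hz
voltages = np.array([1.23, 1.24, 1.25, 1.26])  # example values from the question

# Divide by the largest magnitude so every value lies in [-1, 1]
scaled = voltages / np.max(np.abs(voltages))

# Map [-1, 1] onto the int16 range and write the result as 16-bit PCM
pcm = np.round(scaled * 32767).astype(np.int16)
wavfile.write('voltages.wav', fs, pcm)
```

For the FFT itself there is no need to go through a file: the float array can be passed to the transform directly.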
You can also use the standard wave module.
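A rough sketch with the wave module, assuming mono 16-bit output; the 440 Hz test tone and the file name `out_wave.wav` are placeholders for this example. The wave module writes raw bytes, so the floats have to be converted to integers first.

```python
import wave
import numpy as np

fs = 44100                                   # sample rate in Hz
t = np.arange(fs) / fs                       # one second of time stamps
samples = 0.5 * np.sin(2 * np.pi * 440 * t)  # float samples in [-1, 1]

# Convert to little-endian 16-bit integers, the byte layout WAV files use
pcm = (samples * 32767).astype('<i2')

with wave.open('out_wave.wav', 'wb') as wf:
    wf.setnchannels(1)   # mono
    wf.setsampwidth(2)   # 2 bytes per sample = 16 bits
    wf.setframerate(fs)
    wf.writeframes(pcm.tobytes())
```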
I was wondering if it's possible to get the frequencies present in a file with NumPy, and then alter those frequencies and create a new WAV file from them? I would like to do some filtering on a file, but I have yet to see a way to read a WAV file into NumPy, filter it, and then output the filtered version. If anyone could help, that would be great.
SciPy provides functions for doing FFTs on NumPy arrays, and also provides functions for reading and writing them to WAV files. e.g.
from scipy.io.wavfile import read, write
from scipy.fftpack import rfft, irfft
import numpy as np
rate, input = read('input.wav')
transformed = rfft(input)
filtered = function_that_does_the_filtering(transformed)
output = irfft(filtered)
write('output.wav', rate, output)
(input, transformed and output are all numpy arrays)
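To make the filtering step concrete, here is one possible sketch of a crude low-pass filter that zeroes every frequency bin above a cutoff. It uses numpy.fft instead of scipy.fftpack, because numpy's rfft returns complex bins that line up with rfftfreq; `lowpass_fft`, the cutoff and the synthetic test signal are all made up for the example.

```python
import numpy as np

def lowpass_fft(signal, rate, cutoff_hz):
    """Zero all frequency bins above cutoff_hz and transform back (hypothetical helper)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)  # frequency of each bin in Hz
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(signal))

# Synthetic input: a 200 Hz tone plus an unwanted 3000 Hz tone
rate = 8000
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

filtered = lowpass_fft(signal, rate, cutoff_hz=1000)  # only the 200 Hz tone remains
```

With a real file, the array from `read` would take the place of `signal`, and the float result would typically be converted back to the original integer dtype before calling `write`.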