I'm trying to create a single audio file out of multiple wav files. Using tkinter and pygame.mixer, I've converted key presses into a dictionary that stores audio samples and the times at which they're invoked, e.g. {sound1: 10000, sound2: 10001, etc...}.
So far I've devised a way to add blocks of silence:
import numpy as np
import scipy.io.wavfile

def change_speed(seconds):
    '''Modifies the metronome beat to loop at different speeds. This is done by creating a new wav file.'''
    # The original wav file is 0.1 seconds long, so subtract that from the time added
    seconds -= 0.1
    # Read the original wav file: returns (sample_rate, data)
    rate, data = scipy.io.wavfile.read('Downloads\\sounds\\metronome_original.wav')
    # Use the sample rate of the original file (should be 44100) to build a block of silence
    add_secs = np.zeros((round(rate * seconds), 1), dtype=np.int16)
    # Concatenate the silence onto the original samples
    new = np.concatenate((data, add_secs))
    scipy.io.wavfile.write('Downloads\\sounds\\metronome.wav', rate, new)
Is there some way to combine overlapping arrays like [[0,0,1,1,2,0], [0,0,0,3,2,1]] into a single wav file?
Update:
To be more specific, I'm trying to merge two audio samples that overlap in playtime, like a DJ who starts playing one song before the other one finishes. Is there a way to do this with integer or byte arrays generated in python?
Here's how I'd do it:
from scipy.io import wavfile
import numpy as np

wav1 = [0, 0, 1, 1, 2, 0]
wav2 = [0, 0, 0, 3, 2, 1]
combined = np.hstack([wav1, wav2])

N = 2400  # Samples per second.
wavfile.write('combined.wav', rate=N, data=combined.astype(np.int16))
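Note that np.hstack just plays the two clips back to back. If you want them to overlap in time, as in the DJ example from the update, one option is to pad to a common length and sum the samples. A minimal sketch (the 3-sample offset, output name, and rate are illustrative):
import numpy as np
from scipy.io import wavfile

wav1 = np.array([0, 0, 1, 1, 2, 0], dtype=np.int16)
wav2 = np.array([0, 0, 0, 3, 2, 1], dtype=np.int16)

offset = 3  # sample index at which the second clip starts
length = max(len(wav1), offset + len(wav2))

# Start from silence, then add each clip at its own start position.
mix = np.zeros(length, dtype=np.int32)  # int32 to avoid overflow while summing
mix[:len(wav1)] += wav1
mix[offset:offset + len(wav2)] += wav2

# Clip back into the int16 range before writing.
mix = np.clip(mix, -32768, 32767).astype(np.int16)
wavfile.write('mixed.wav', 2400, mix)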
I'm learning how to use the py-webrtcvad library for Python and trying to pick out segments containing speech in a .wav file. Right now, I don't know how to split a .wav file into smaller frames to feed into the vad object. Here's my code below:
import wave
import webrtcvad

spf = wave.open("test4.wav", "rb")
signal = spf.readframes(-1)

vad = webrtcvad.Vad(3)
sample_rate = 16000
frame_duration = 10  # ms

# TODO: turn signal into frames and run the print statement below in a for loop
print('Contains speech: %s' % (vad.is_speech(signal, sample_rate)))
As I have read from https://github.com/wiseman/py-webrtcvad, the frames fed into the vad object must be 16-bit and either 10, 20 or 30 ms long. My sample is about 9 s long and I don't know how to split it into the appropriate frames. I have tested the example.py in the repo, but it is a bit too complicated for me, so I tried to do a very simple example first.
Thank you.
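For what it's worth, a minimal way to do the splitting is to slice the raw byte string into fixed-size frames. This sketch assumes the file is already 16-bit mono PCM at one of the rates webrtcvad accepts (8, 16, 32 or 48 kHz); the variable names are illustrative:
import wave
import webrtcvad

vad = webrtcvad.Vad(3)

with wave.open("test4.wav", "rb") as spf:
    sample_rate = spf.getframerate()  # must be 8000, 16000, 32000 or 48000
    signal = spf.readframes(spf.getnframes())

frame_duration = 10  # ms; must be 10, 20 or 30
bytes_per_frame = int(sample_rate * frame_duration / 1000) * 2  # 2 bytes per 16-bit sample

for start in range(0, len(signal) - bytes_per_frame + 1, bytes_per_frame):
    frame = signal[start:start + bytes_per_frame]
    print('Contains speech: %s' % vad.is_speech(frame, sample_rate))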
I'm learning how to modify different types of audio files (.wav, .mp3, etc.) in Python 3 using the wave module; for this question, specifically the .wav format. I know there are ISO standards for audio formats, and as a side note any references on the standards for the .wav format are greatly appreciated.
But in terms of my question: when using the Python 3 wave module, is there a more efficient way to skip the RIFF and FMT headers and other containers, and go straight to the data chunk to modify its contents?
This crude example simply converts a two-channel .wav file to a single-channel .wav file while modifying all sample values to (0, 0).
import wave
import struct

# Open files
inf = wave.open(r"piano2.wav", 'rb')
outf = wave.open(r"output.wav", 'wb')

# Input parameters
ip = list(inf.getparams())
print('Input Parameters:', ip)
# Example output: Input Parameters: [2, 2, 48000, 302712, 'NONE', 'not compressed']

# Output parameters: same as input, but one channel
op = ip[:]
op[0] = 1
outf.setparams(op)

number_of_channels, sample_width, frame_rate, number_of_frames, comp_type, comp_name = ip
format = '<{}h'.format(number_of_channels)
print('# Channels:', format)

# Read frame by frame
for index in range(number_of_frames):
    frame = inf.readframes(1)
    data = struct.unpack(format, frame)
    # Here, I change data to (0, 0), testing purposes
    print('Before Audio Data:', data)
    print('After Modifying Audio Data', (0, 0))
    # Change audio data
    data = (0, 0)
    value = data[0]
    value = (value * 2) // 3
    outf.writeframes(struct.pack('<h', value))

# Close in and out files
inf.close()
outf.close()
Is there a better practice, or reference material, for simply modifying the data segments of .wav files?
Say you wanted to literally add a sound at a specific timestamp; that would be the kind of result my question is after.
Performance comparison
Let's first examine 3 ways to read WAVE files.
The slowest one - wave module
As you might have noticed already, the wave module can be painfully slow. Consider this code:
import wave
import struct

wavefile = wave.open('your.wav', 'r')  # check e.g. freesound.org for samples
length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)
For a WAVE as defined below:
Input File : 'audio.wav'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:09:35.71 = 27634080 samples ~ 43178.2 CDDA sectors
File Size : 55.3M
Bit Rate : 768k
Sample Encoding: 16-bit Signed Integer PCM
it took on average 27.7 s to load the full audio. The flip side is that the wave module is available out of the box and will work on any system.
The convenient one - audiofile
A much more convenient and faster solution is e.g. audiofile. According to the project description, its focus is on reading speed.
import audiofile as af
signal, sampling_rate = af.read('audio.wav')
This gave me on average 33 ms to read the mentioned file.
The fastest one - numpy
If we decide to skip the header (as the OP asks) and go solely for speed, numpy is a great choice:
import numpy as np
byte_length = np.fromfile(filename, dtype=np.int32, count=1, offset=40)[0]
data = np.fromfile(filename, dtype=np.int16, count=byte_length // np.dtype(np.int16).itemsize, offset=44)
The header structure (that tells us what offset to use) is defined here.
The execution of that code takes ~6 ms, roughly 5x less than audiofile. Naturally it comes with a price / preconditions: we need to know in advance what the data type is and where the data starts.
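If you would rather not assume the canonical 44-byte header (extra chunks such as LIST are common), a small sketch like the one below walks the RIFF chunks to locate the data chunk first; this chunk-walking helper is my own illustration, not part of the original measurement:
import numpy as np

def find_data_chunk(filename):
    """Return (offset, size) of the 'data' chunk by walking the RIFF chunks."""
    with open(filename, 'rb') as f:
        f.seek(12)  # skip 'RIFF', the file size and 'WAVE'
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'data' chunk found")
            chunk_id = header[:4]
            chunk_size = int.from_bytes(header[4:8], 'little')
            if chunk_id == b'data':
                return f.tell(), chunk_size
            f.seek(chunk_size + (chunk_size & 1), 1)  # chunks are word-aligned

offset, size = find_data_chunk('audio.wav')
data = np.fromfile('audio.wav', dtype=np.int16,
                   count=size // np.dtype(np.int16).itemsize, offset=offset)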
Modifying the audio
Once you have the audio in a numpy array, you can modify it at will; you can also decide to stream the file rather than reading everything at once. Be warned though: since sound is a wave, simply injecting new data at an arbitrary time t will in a typical scenario distort that audio (unless what was there was silence).
As for writing the stream back, "modifying the container" would be terribly slow in Python. That's why you should either use arrays or switch to a more suitable language (e.g. C).
If we go with arrays, we should mind that numpy knows nothing about the WAVE format and therefore we'd have to define the header ourselves and write individual bytes. Perfectly feasible exercise, but clunky. Luckily, scipy provides a convenient function that has the benefits of numpy speed (it uses numpy underneath), while making the code much more readable:
from scipy.io.wavfile import write

fs = np.fromfile('audio.wav', dtype=np.int32, count=1, offset=24)[0]  # we need the sample rate, stored at byte offset 24
new_data = np.append(data, np.zeros(2 * fs, dtype=np.int16))  # append 2 seconds of silence
with open('audio_out.wav', 'wb') as fout:
    write(fout, fs, new_data)
This could also be done in a loop, where you read a chunk with numpy / scipy, modify the array (data) and write it to the output file as you go.
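A rough sketch of that loop idea, assuming 16-bit PCM and using the stdlib wave module for the output so the header is handled for us (the chunk size and the volume change are illustrative):
import wave
import numpy as np

CHUNK = 1 << 16  # frames per chunk

with wave.open('audio.wav', 'rb') as fin, wave.open('audio_out.wav', 'wb') as fout:
    fout.setparams(fin.getparams())  # same channels / sample width / rate
    remaining = fin.getnframes()
    while remaining > 0:
        n = min(CHUNK, remaining)
        remaining -= n
        data = np.frombuffer(fin.readframes(n), dtype=np.int16).copy()
        data //= 2  # example modification: halve the volume
        fout.writeframes(data.tobytes())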
I basically have this audio file that is 16-bit PCM WAV mono at 44100 Hz, and I'm trying to convert it into a spectrogram. But I want a spectrogram of the audio every 20 ms (trying this for speech recognition), and whenever I try to compare what I have to Audacity, it's really different. I'm kind of new to Python, so I was trying to base this off of my Java knowledge. Any help would be appreciated. I think I'm splitting the read samples incorrectly (what I did was split the array every 220 elements, since I believe audio data is just samples in the time domain, to get it down to 20 ms of audio).
Here's the code I have right now:
import librosa.display
import matplotlib.pyplot as plt
import numpy

audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)

new = numpy.zeros(shape=(220, 1))
counter = 0
for i in range(0, len(audioData), 882):
    new = audioData[i: i + 882]
    STFT = librosa.stft(new, n_fft=882)
    print(type(STFT))
    audioDatainDB = librosa.amplitude_to_db(abs(STFT))
    print(type(audioDatainDB))
    librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
    # plt.figure(figsize=(20, 10))
    plt.show()
    counter += 1
    print("Your local debug print statement ", counter)
As for the values, I was playing around with them quite a bit trying to get it to work, but no luck so far ;/
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know it's not 20 ms in the Audacity one, but you can see the two don't look even a bit similar.
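For comparison, here is a minimal sketch of how I would get a 20 ms analysis as a single spectrogram: compute one STFT over the whole file with a 20 ms window, instead of a separate figure per chunk. The 10 ms hop and the Hann window are illustrative choices, not something from the post above:
import librosa
import librosa.display
import matplotlib.pyplot as plt

audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)

win = int(0.020 * sampleRate)  # 20 ms analysis window (882 samples at 44100 Hz)
hop = int(0.010 * sampleRate)  # 10 ms hop between frames

STFT = librosa.stft(audioData, n_fft=win, hop_length=hop, window='hann')
audioDatainDB = librosa.amplitude_to_db(abs(STFT))

librosa.display.specshow(audioDatainDB, sr=sampleRate, hop_length=hop,
                         x_axis='time', y_axis='hz')
plt.show()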
It's only my second day with Python, and I'm already in trouble...
I've got a lot of CD-standard (16-bit, 44100 Hz) stereo wave files and need to find their average (arithmetic mean) FIR. The algorithm is easy to state: sum the amplitudes at each frequency and divide by the number of files. The resulting FIR is then plotted and written to a text file as a table.
I rolled through some similar posts, like the exciting Python Scipy FFT wav files, but there are still too many things (even the basics) that I lose touch with, and compiler mistakes follow every time I try to repeat the examples.
I would appreciate any help that can move me out of this dead end. So, these are my shy paces...
As the number of files may vary, it is probably useful to have a list of the files at hand:
import os

a = os.path.expanduser(u"~")  # absolute user path
b = "integrator\\files"       # base folder with the files in it
c = os.path.join(a, b)
flist = os.listdir(c)
images = filter(lambda x: x.endswith('.wav'), flist)  # filter out non-wavs
for i in range(len(flist)):
    print(flist[i])
print()
And it works fine for me! But I still cannot work out how to organize reading the multiple files and calculating their mean FIR array.
From what I've seen, I need something like a "global package":
import glob
import mainfile

files = glob.glob('./*.wav')
for ele in files:
    mainfile.f(ele)
quit()
Where mainfile.py looks something like this:
import matplotlib.pyplot as plt
from scipy.io import wavfile  # get the api
from scipy.fftpack import fft
from pylab import *

def f(filename):
    fs, data = wavfile.read(filename)  # load the data
    a = data.T[0]  # this is a two-channel soundtrack, I take the first track
    b = [(ele / 2 ** 16.) * 2 - 1 for ele in a]  # this is a 16-bit track, now normalized on [-1, 1)
    c = fft(b)  # create a list of complex numbers
    d = len(c) / 2  # you only need half of the fft list
And here I just don't know what I should do with the 'd's: sum them in a loop, or...? Also, this code example only handles one channel for plotting, while I need the output FIR as a sequence of pairs, one for each channel. It's also still not clear how to switch the FFT window to Hanning with an FFT size of at least 65536 (oh yes, I know the calculations are slow as hell).
In the end we can plot and save the graph:
plt.plot(abs(c[:(d-1)]),'r')
savefig(filename+'.png',bbox_inches='tight')
... and somehow write the average FIR to a text file as a table.
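As a rough sketch of the averaging itself (my own illustration, assuming every file is at least 65536 samples long and that all files share the same sample rate): apply a Hanning window, take a fixed-size FFT of each channel of every file, accumulate the magnitudes, and divide by the number of files at the end. The output format here is simply one frequency/left/right triple per line:
import glob
import numpy as np
from scipy.io import wavfile

N = 65536  # FFT size requested above
window = np.hanning(N)
files = glob.glob('./*.wav')

total = np.zeros((N // 2, 2))  # accumulated magnitude spectra, one column per channel
for name in files:
    fs, data = wavfile.read(name)  # stereo: shape (n_samples, 2)
    chunk = data[:N].astype(np.float64) / 2 ** 15  # normalize 16-bit samples to [-1, 1)
    for ch in range(2):
        spec = np.fft.rfft(chunk[:, ch] * window, n=N)
        total[:, ch] += np.abs(spec)[:N // 2]

average = total / len(files)

freqs = np.fft.rfftfreq(N, d=1.0 / fs)[:N // 2]
with open('average_fir.txt', 'w') as out:
    for f_hz, left, right in zip(freqs, average[:, 0], average[:, 1]):
        out.write('%g\t%g\t%g\n' % (f_hz, left, right))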
I'd be happy enough if this script worked as a console application (though at first I dreamt of a minimalistic GUI with the ability to choose any folder containing the files, an overview button, and a progress bar to make sure the app is still breathing while it slowly scythes through ten or twenty-five wavs with the FFT).
Got C:\Anaconda2 (with numpy, scipy and matplotlib properly installed) on Windows 7 x86 PC
Thank you in advance!
With regards,
Me.
I want to create a .wav file in Python using the numpy and scipy libraries, where multiple tones are played. The way I intend to do it is by storing my frequencies in an array, and then the generated signals are stored in another one. I've managed to create a file with the desired playtime, but it doesn't play any sound. Am I missing something?
Thank you.
import numpy as np
from scipy.io import wavfile

freq = np.array([440, 493, 523, 587, 659, 698, 783, 880])  # tone frequencies
fs = 22050    # sample rate
duration = 1  # signal duration in seconds
music = []
t = np.arange(0, duration, 1. / fs)  # time axis
for i in range(0, len(freq)):
    x = 10000 * np.cos(2 * np.pi * freq[i] * t)  # generated signal
    music = np.hstack((music, x))
wavfile.write('music.wav', fs, music)
The vector that you are using to create the wave file contains floats, so scipy.io writes them out as 64-bit floating-point samples (as mentioned in the docs), which is not supported by most players.
Changing the last line to
wavfile.write('music.wav', fs, music.astype(np.dtype('i2')))
should produce a file that can be played properly.
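As an alternative (my own note, not part of the original answer), you can also normalize the signal to [-1, 1] and write 32-bit float samples, which scipy supports as well and most modern players handle:
music_f32 = (music / np.max(np.abs(music))).astype(np.float32)
wavfile.write('music_float.wav', fs, music_f32)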