Python - Overlay more than 3 WAV files end to end

I am trying to overlap the end of one WAV file with the first 20% of the next file. There is a variable number of files to overlap (usually around 5-6).
I have tried using pydub, by expanding the following example for overlaying 2 WAV files:
from pydub import AudioSegment
sound1 = AudioSegment.from_wav("/path/to/file1.wav")
sound2 = AudioSegment.from_wav("/path/to/file2.wav")
# mix sound2 with sound1, starting at 70% into sound1
output = sound1.overlay(sound2, position=0.7 * len(sound1))
# save the result
output.export("mixed_sounds.wav", format="wav")
I then wrote the following program:
for i in range(0,len(files_to_combine)-1):
    if 'full_wav' in locals():
        prev_wav = full_wav
    else:
        prev = files_to_combine[i]
        prev_wav = AudioSegment.from_wav(prev)
    next = files_to_combine[i+1]
    next_wav = AudioSegment.from_wav(next)
    new_wave = prev_wav.overlay(next_wav,position=len(prev_wav) - 0.3 * len(next_wav))
    new_wave.export('partial_wav.wav', format='wav')
    full_wav = AudioSegment.from_wav('partial_wav.wav')
However, when I look at the final wave file, only the first 2 files in the list files_to_combine were actually combined, not the rest. The idea was to keep rewriting partial_wav.wav until it finally contains the full sound, with all files overlapped nearly end to end. To debug this, I stored new_wave in a different file for every combination. The first of these files is identical to the last: it only shows the first 2 wave files combined instead of the entire thing. Furthermore, I expected len(partial_wav) to increase gradually with every iteration. However, it stays the same after the first combination:
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
MAIN QUESTION
How do I overlap the end of one wav file (about the last 30%) with the beginning of the next for more than 3 wave files?

I believe you can just keep cascading AudioSegments until you reach your final segment, as below.
Working Code:
from pydub import AudioSegment
from pydub.playback import play
sound1 = AudioSegment.from_wav("SineWave_440Hz.wav")
sound2 = AudioSegment.from_wav("SineWave_150Hz.wav")
sound3 = AudioSegment.from_wav("SineWave_660Hz.wav")
# mix sound2 with sound1, starting at 70% into sound1
tmpsound = sound1.overlay(sound2, position=0.7 * len(sound1))
# mix sound3 with sound1+sound2, starting at 30% into sound1+sound2
output = tmpsound.overlay(sound3, position=0.3 * len(tmpsound))
play(output)
output.export("mixed_sounds.wav", format="wav")
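One thing worth noting about the loop in the question: as far as I know from pydub's documentation, overlay() returns a segment with the same length as the segment it is called on, which would explain why the length stays at 237. Below is a minimal sketch of one way around that, not a definitive implementation: compute where each file should start, then overlay everything onto a silent segment that is already long enough. The names overlap_offsets and mix_end_to_end are hypothetical helpers (not part of pydub), and the 0.3 ratio is taken from the question's code.

```python
def overlap_offsets(lengths_ms, ratio=0.3):
    """Return (start offsets, total length) in ms for end-to-end overlapped clips.

    Each clip after the first starts `ratio` of its own length before the
    end of everything placed so far.
    """
    offsets = []
    total = 0
    for i, length in enumerate(lengths_ms):
        if i == 0:
            start = 0
        else:
            start = max(0, total - round(ratio * length))
        offsets.append(start)
        total = max(total, start + length)
    return offsets, total


def mix_end_to_end(paths, ratio=0.3):
    """Overlay the WAV files in `paths` so that consecutive files overlap."""
    # Imported here so the offset helper stays usable without pydub installed.
    from pydub import AudioSegment

    segments = [AudioSegment.from_wav(p) for p in paths]
    offsets, total = overlap_offsets([len(s) for s in segments], ratio)
    # overlay() keeps the length of the base segment, so start from silence
    # that is already as long as the finished mix.
    mixed = AudioSegment.silent(duration=total, frame_rate=segments[0].frame_rate)
    for segment, start in zip(segments, offsets):
        mixed = mixed.overlay(segment, position=start)
    return mixed
```

With the list from the question, this could be used as mix_end_to_end(files_to_combine).export("full.wav", format="wav"), with no intermediate partial_wav.wav needed.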

Related

Extract Timestamps of an audio file when loud noises occur, python

I have an audio file in WAV format, and I would like to extract the timestamps at which the loudness is significantly high.
For example, consider the commentary of a sports game: my goal is to identify the timestamps in the audio where the commentator shouts for a specific highlight in the ongoing game.
A Python solution is preferred.
Expected output:
start(seconds) end(seconds)
[0.81, 2.429] etc
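My attempt so far:
from pydub.silence import detect_nonsilent

def target_amplitude(sound, target_dBFS):
    # shift the clip's average loudness to the target level
    diff_in_dBFS = target_dBFS - sound.dBFS
    return sound.apply_gain(diff_in_dBFS)

# vid_aud is assumed to be an AudioSegment loaded elsewhere
verified_sound = target_amplitude(vid_aud, -20.0)
nonsilent_data = detect_nonsilent(verified_sound, min_silence_len=500, silence_thresh=-20, seek_step=1)
time_list = []
for start_ms, end_ms in nonsilent_data:
    # convert milliseconds to seconds
    time_list.append([start_ms / 1000, end_ms / 1000])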
This is not actually very hard. The wave module can read a wave file, and numpy can tell you which array elements are outside of a range.
import wave
import numpy as np
w = wave.open('sound.wav')
sam = w.readframes(w.getnframes())
sam = np.frombuffer(sam, dtype=np.int16)
bigpos = np.where( sam > 20000 )
bigneg = np.where( sam < -20000 )
This assumes you have a wave file. If you have an MP3, you'll have to decode it first.
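The arrays from np.where hold sample indices, not times, so one more step is needed to get the start/end pairs from the question. Here is a rough sketch of that step, assuming a mono file (for interleaved stereo the indices would first have to be divided by the channel count). indices_to_intervals is a hypothetical helper, and the 0.5 second gap used to merge nearby loud samples is an arbitrary starting value.

```python
def indices_to_intervals(indices, sample_rate, max_gap_s=0.5):
    """Group sorted sample indices into (start_seconds, end_seconds) pairs.

    Indices that are at most `max_gap_s` seconds apart are treated as
    belonging to the same loud passage.
    """
    max_gap = int(max_gap_s * sample_rate)
    intervals = []
    start = prev = None
    for idx in indices:
        idx = int(idx)  # also accepts numpy integers
        if start is None:
            start = prev = idx
        elif idx - prev <= max_gap:
            prev = idx
        else:
            # gap too large: close the current passage and begin a new one
            intervals.append((start / sample_rate, prev / sample_rate))
            start = prev = idx
    if start is not None:
        intervals.append((start / sample_rate, prev / sample_rate))
    return intervals
```

With the variables above, the input could be something like np.where((sam > 20000) | (sam < -20000))[0], with the sample rate taken from w.getframerate().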

Split audio on timestamps librosa

I have an audio file and I want to split it every 2 seconds. Is there a way to do this with librosa?
So if I had a 60 seconds file, I would split it into 30 two second files.
librosa is first and foremost a library for audio analysis, not audio synthesis or processing. It does support writing simple audio files, but the documentation for that function also states:
This function is deprecated in librosa 0.7.0. It will be removed in 0.8. Usage of write_wav should be replaced by soundfile.write.
Given this information, I'd rather use a tool like sox to split audio files.
From "Split mp3 file to TIME sec each using SoX":
You can run SoX like this:
sox file_in.mp3 file_out.mp3 trim 0 2 : newfile : restart
It will create a series of files with a 2-second chunk of the audio each.
If you'd rather stay within Python, you might want to use pysox for the job.
You can split your file using librosa by running the following code. I have added comments so that the steps are easy to follow.
import librosa

# First load the file (file_name is the path of the audio file)
audio, sr = librosa.load(file_name)
# Get number of samples for 2 seconds; replace 2 by any number
buffer = 2 * sr
samples_total = len(audio)
samples_wrote = 0
counter = 1
while samples_wrote < samples_total:
    # check if the buffer is not exceeding total samples
    if buffer > (samples_total - samples_wrote):
        buffer = samples_total - samples_wrote
    block = audio[samples_wrote : (samples_wrote + buffer)]
    out_filename = "split_" + str(counter) + "_" + file_name
    # Write 2 second segment
    librosa.output.write_wav(out_filename, block, sr)
    counter += 1
    samples_wrote += buffer
[Update]
librosa.output.write_wav() has since been removed from librosa, so soundfile.write() has to be used instead.
Import the required library:
import soundfile as sf
Then replace
librosa.output.write_wav(out_filename, block, sr)
with
sf.write(out_filename, block, sr)
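For reference, the same splitting logic can be written a little more compactly by computing the chunk boundaries up front. This is only a sketch under a few assumptions: the input is a WAV file, chunk_bounds and split_file are hypothetical helper names, and sr=None is passed so that librosa keeps the file's own sample rate instead of resampling.

```python
import os


def chunk_bounds(total_samples, sample_rate, seconds=2.0):
    """Return (start, end) sample indices for consecutive fixed-length chunks."""
    size = int(seconds * sample_rate)
    if size <= 0:
        raise ValueError("chunk length must cover at least one sample")
    # The last chunk is simply shorter if the file does not divide evenly.
    return [(start, min(start + size, total_samples))
            for start in range(0, total_samples, size)]


def split_file(path, seconds=2.0):
    """Write consecutive chunks of `path` next to it and return their paths."""
    # Third-party imports kept local so chunk_bounds works without them.
    import librosa
    import soundfile as sf

    audio, sr = librosa.load(path, sr=None)  # sr=None keeps the original rate
    folder, name = os.path.split(path)
    written = []
    for n, (start, end) in enumerate(chunk_bounds(len(audio), sr, seconds), 1):
        out_path = os.path.join(folder, "split_{}_{}".format(n, name))
        sf.write(out_path, audio[start:end], sr)
        written.append(out_path)
    return written
```

Building the output name from os.path.split also avoids a broken path when file_name contains a directory, which the string concatenation above would not handle.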

split_on_silence modifies the original audio

After using split_on_silence, the audio is altered.
For example:
Original: hello, my name is John.
Chunks:
chunk1: ell
chunk2: name
My code:
from pydub import AudioSegment
from pydub.silence import split_on_silence
song = AudioSegment.from_wav("videofr.wav")
# split track where silence is 0.2 seconds or more and get chunks
chunks = split_on_silence(song,
    # must be silent for at least 0.2 seconds or 200 ms
    min_silence_len=200,
)
cpt = 0
print(len(song))
for i, chunk in enumerate(chunks):
    print(chunk)
    print(len(chunk))
    cpt = cpt + 1
    chunk.export(".//chunk{0}.wav".format(i), format="wav")
Try varying the min_silence_len and silence_thresh values to get as close as possible to the actual silence duration and dBFS level in your recording.
For example:
chunks = split_on_silence(song,
    # must be silent for at least 0.2 seconds or 200 ms
    min_silence_len=200,
    # consider it silent if quieter than -16 dBFS
    silence_thresh=-16
)
You can verify the actual values by loading your file in Audacity and checking the silence duration and amplitude at the end of sentences.
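If measuring in Audacity is inconvenient, a common alternative is to set the threshold relative to the clip's own average loudness (song.dBFS) and to keep a little padding around each chunk with keep_silence. The sketch below assumes that approach; relative_silence_thresh and split_keeping_edges are hypothetical helpers, and the 16 dB margin and 150 ms padding are arbitrary starting values to tune, not recommended settings.

```python
def relative_silence_thresh(average_dbfs, margin_db=16.0):
    """Return a threshold `margin_db` below the clip's average loudness (dBFS)."""
    return average_dbfs - margin_db


def split_keeping_edges(path, min_silence_len=200, margin_db=16.0, keep_silence=150):
    """Split a WAV file on silence, using a threshold relative to its loudness."""
    # Imported here so the helper above can be used without pydub installed.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    song = AudioSegment.from_wav(path)
    return split_on_silence(
        song,
        min_silence_len=min_silence_len,
        # quieter than this counts as silence
        silence_thresh=relative_silence_thresh(song.dBFS, margin_db),
        # milliseconds of audio kept at both ends of each chunk
        keep_silence=keep_silence,
    )
```

A threshold that sits too high for a quiet recording is one plausible reason for soft word edges being cut off, as in "ell" instead of "hello", so lowering it is usually the first thing to try.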

Alsaaudio record and playback

I was just playing around with sound input and output on a Raspberry Pi using Python.
My plan was to read the input of a microphone, manipulate it and play back the manipulated audio. For now I am only trying to read and play back the audio.
The reading seems to work, since I wrote the read data into a wave file in the last step, and the wave file seemed fine.
But the playback is only noise.
Playing the wave file worked as well, so the headset is fine.
I think there may be a problem in my settings or the output format.
The code:
import alsaaudio as audio
import time
import audioop
#Input & Output Settings
periodsize = 1024
audioformat = audio.PCM_FORMAT_FLOAT_LE
channels = 16
framerate=8000
#Input Device
inp = audio.PCM(audio.PCM_CAPTURE,audio.PCM_NONBLOCK,device='hw:1,0')
inp.setchannels(channels)
inp.setrate(framerate)
inp.setformat(audioformat)
inp.setperiodsize(periodsize)
#Output Device
out = audio.PCM(audio.PCM_PLAYBACK,device='hw:0,0')
out.setchannels(channels)
out.setrate(framerate)
out.setformat(audioformat)
out.setperiodsize(periodsize)
#Reading the Input
allData = bytearray()
count = 0
while True:
    #reading the input into one long bytearray
    l,data = inp.read()
    for b in data:
        allData.append(b)
    #Just an ending condition
    count += 1
    if count == 4000:
        break
    time.sleep(.001)
#splitting the bytearray into period sized chunks
list1 = [allData[i:i+periodsize] for i in range(0, len(allData), periodsize)]
#Writing the output
for arr in list1:
    # I tested writing the arr's to a wave file at this point
    # and the wave file was fine
    out.write(arr)
Edit: Maybe I should mention that I am using Python 3.
I just found the answer. The format audioformat = audio.PCM_FORMAT_FLOAT_LE isn't the one used by my headset (I had just copied and pasted it without a second thought).
I found out about my microphone's format (and additional information) by running speaker-test in the console.
Since my speaker's format is S16_LE, the code works fine with audioformat = audio.PCM_FORMAT_S16_LE
Consider using plughw (the ALSA plugin layer, which supports resampling and format conversion), at least for the sink part of the chain:
#Output Device
out = audio.PCM(audio.PCM_PLAYBACK,device='plughw:0,0')
This should help to negotiate the sampling rate as well as the data format.
periodsize is better estimated as a fraction of the sample rate, for example:
periodsize = framerate // 8  # 8 periods per second at an 8000 Hz sampling rate
and the sleep time is better estimated as half of the time needed to play one period:
sleeptime = 1.0 / 16  # 1.0 is one second; 16 = 2 * 8 periods per second
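The arithmetic above can be wrapped in a small helper. This is a sketch under the assumption that periodsize counts frames (as I understand pyalsaaudio's documentation), so the number of bytes in one period also depends on the channel count and the sample width. period_settings is a hypothetical name, and the example values (mono, 2 bytes per sample for S16_LE) are assumptions rather than the settings from the question.

```python
def period_settings(framerate, channels, sample_bytes, periods_per_second=8):
    """Return (periodsize in frames, bytes per period, sleep time in seconds)."""
    periodsize = framerate // periods_per_second
    # one frame holds one sample for every channel
    bytes_per_period = periodsize * channels * sample_bytes
    # sleep for half the time it takes to play one period
    sleeptime = periodsize / framerate / 2.0
    return periodsize, bytes_per_period, sleeptime


# Example: 8000 Hz, mono, S16_LE (2 bytes per sample)
print(period_settings(8000, 1, 2))
```

If that assumption holds, the recorded bytearray in the question would be cut into chunks of bytes_per_period bytes rather than periodsize bytes before being passed to out.write().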

Python, pydub splitting an audio file

Hi, I am using pydub to split an audio file, giving the ranges of the segments to take from the original.
What I have is:
from pydub import AudioSegment
sound_file = AudioSegment.from_mp3("C:\\audio file.mp3")
# milliseconds in the sound track
ranges = [(30000,40000),(50000,60000),(80000,90000),(100000,110000),(150000,180000)]
for x, y in ranges:
    new_file = sound_file[x : y]
    new_file.export("C:\\" + str(x) + "-" + str(y) +".mp3", format="mp3")
It works well for the first 3 new files, but not for the rest: they are not split at the given ranges.
Does the problem lie in the way I specify the ranges?
Thank you.
Add-on:
Even in a simple case, for example
sound_file[150000:180000]
exported to an MP3 file, it works but only cuts the 50000:80000 part. It seems not to read the correct range.
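Try this, it might work:
import pydub
import numpy as np
sound_file = pydub.AudioSegment.from_mp3("a.mp3")
frame_rate = sound_file.frame_rate
channels = sound_file.channels
# the sample array is indexed by sample (interleaved per channel), not by millisecond
samples = np.array(sound_file.get_array_of_samples())
# milliseconds in the sound track
ranges = [(30000,40000),(50000,60000),(80000,90000),(100000,110000),(150000,180000)]
for x, y in ranges:
    # convert milliseconds to positions in the sample array
    start = int(x * frame_rate / 1000) * channels
    end = int(y * frame_rate / 1000) * channels
    new_file = samples[start:end]
    song = pydub.AudioSegment(new_file.tobytes(), frame_rate=frame_rate, sample_width=sound_file.sample_width, channels=channels)
    song.export(str(x) + "-" + str(y) + ".mp3", format="mp3")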
