Split audio on timestamps with librosa - Python

I have an audio file and I want to split it every 2 seconds. Is there a way to do this with librosa?
So if I had a 60-second file, I would split it into 30 two-second files.

librosa is first and foremost a library for audio analysis, not audio synthesis or processing. It does offer basic support for writing simple audio files (see here), but the documentation also states:
This function is deprecated in librosa 0.7.0. It will be removed in 0.8. Usage of write_wav should be replaced by soundfile.write.
Given this information, I'd rather use a tool like sox to split audio files.
From "Split mp3 file to TIME sec each using SoX":
You can run SoX like this:
sox file_in.mp3 file_out.mp3 trim 0 2 : newfile : restart
It will create a series of files, each containing a 2-second chunk of the audio.
If you'd rather stay within Python, you might want to use pysox for the job.
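For example, here is a minimal pysox sketch of the same split. The input file name is an assumption, and since pysox has no direct equivalent of newfile : restart, we loop over trim windows instead:
import math
import sox

in_file = 'file_in.mp3'  # hypothetical input file
total = sox.file_info.duration(in_file)  # length in seconds

for i in range(math.ceil(total / 2)):
    tfm = sox.Transformer()
    tfm.trim(2 * i, min(2 * (i + 1), total))  # cut one 2-second window
    tfm.build(in_file, 'file_out_%03d.mp3' % (i + 1))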

You can split your file with librosa by running the following code. I have added the necessary comments so that you understand the steps carried out.
import librosa

file_name = "input.wav"  # path to the file you want to split

# First load the file
audio, sr = librosa.load(file_name)

# Get the number of samples for 2 seconds; replace 2 by any number
buffer = 2 * sr

samples_total = len(audio)
samples_wrote = 0
counter = 1

while samples_wrote < samples_total:
    # check that the buffer does not exceed the total samples
    if buffer > (samples_total - samples_wrote):
        buffer = samples_total - samples_wrote

    block = audio[samples_wrote : (samples_wrote + buffer)]
    out_filename = "split_" + str(counter) + "_" + file_name

    # Write the 2-second segment
    librosa.output.write_wav(out_filename, block, sr)
    counter += 1
    samples_wrote += buffer
[Update]
librosa.output.write_wav() has been removed from librosa, so we now have to use soundfile.write().
Import the required library:
import soundfile as sf
and replace
librosa.output.write_wav(out_filename, block, sr)
with
sf.write(out_filename, block, sr)
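Put together, a minimal updated version of the loop above might look like this (the input file name is an assumption):
import librosa
import soundfile as sf

file_name = "input.wav"  # hypothetical input file
audio, sr = librosa.load(file_name)
buffer = 2 * sr  # samples per 2-second block

for counter, start in enumerate(range(0, len(audio), buffer), start=1):
    block = audio[start:start + buffer]  # the last block may be shorter
    sf.write("split_" + str(counter) + "_" + file_name, block, sr)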

Related

Extract Timestamps of an audio file when loud noises occur, python

I have an audio file in wav format and would like to extract the particular timestamps in the audio where the loudness is significantly high.
For example, consider the speech commentary of a sports game; my goal is to identify the timestamps in the audio where the commentator shouts for a specific highlight in the ongoing game.
Python is the priority.
Expected output:
start(seconds) end(seconds)
[0.81, 2.429] etc
from pydub import AudioSegment
from pydub.silence import detect_nonsilent

def target_amplitude(sound, target_dBFS):
    diff_in_dBFS = target_dBFS - sound.dBFS
    return sound.apply_gain(diff_in_dBFS)

# vid_aud is the AudioSegment loaded elsewhere in my code
verified_sound = target_amplitude(vid_aud, -20.0)
nonsilent_data = detect_nonsilent(verified_sound, min_silence_len=500, silence_thresh=-20, seek_step=1)

time_list = []
for chunks in nonsilent_data:
    chunk = [chunk / 1000 for chunk in chunks]
    time_list.append(chunk)
This is not actually very hard. The wave module can read a wave file. numpy can tell you which array elements are outside of a range.
import wave
import numpy as np

w = wave.open('sound.wav')
sam = w.readframes(w.getnframes())
sam = np.frombuffer(sam, dtype=np.int16)  # assumes 16-bit samples

bigpos = np.where(sam > 20000)   # indices of loud positive samples
bigneg = np.where(sam < -20000)  # indices of loud negative samples
This assumes you have a wave file. If you have an MP3, you'll have to decode it first.
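To get the [start, end] second pairs the question asks for, one approach is to group the loud sample indices into runs and divide by the frame rate. A minimal sketch along those lines; the 20000 threshold and the half-second gap between runs are assumptions, and it still presumes 16-bit mono:
import wave
import numpy as np

w = wave.open('sound.wav')
rate = w.getframerate()
sam = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# indices of loud samples; cast first so abs() cannot overflow at -32768
loud = np.where(np.abs(sam.astype(np.int32)) > 20000)[0]

if loud.size:
    # start a new run wherever loud samples are more than half a second apart
    breaks = np.where(np.diff(loud) > rate // 2)[0]
    starts = np.r_[loud[0], loud[breaks + 1]]
    ends = np.r_[loud[breaks], loud[-1]]
    for s, e in zip(starts, ends):
        print([round(s / rate, 3), round(e / rate, 3)])  # [start, end] in seconds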

Apply Loop for Subprocessing Call FFMPEG in Python

Suppose I have an MP3 audio file with length 00:04:00 (240 seconds). I want to extract parts of the file, each spanning 2 seconds, so it would be:
File_01 00:00:00-00:00:02, File_02 00:00:02-00:00:04, File_03 00:00:04-00:00:06 ... File_120 00:03:58-00:04:00.
I am using Python, calling the subprocess module to run an ffmpeg command. What I did was simply put it in a loop like this:
count = 0
count2 = 2
count3 = 1

while count2 <= audio_length:
    ffmpeg = 'ffmpeg -i input.mp3 -c copy -ss %d -to %d output%d.wav' % (count, count2, count3)
    subprocess.call(ffmpeg, shell=True)
    count = count + 2
    count2 = count2 + 2
    count3 = count3 + 1
However, the subprocess part takes a long time and seems stuck. I've searched for some insights, but I haven't found any that mention looping. Any help appreciated.
If you are open to using an external library and working with an offline audio file, pydub has a built-in utility to make playable audio chunks of a given length.
Simply call the make_chunks method from pydub.utils with a chunk size and export the playable audio chunks.
I took a 34.6-second file and split it into 18 chunks of 2 seconds each. The last chunk may be shorter than 2 seconds depending on the total length; in my case it was 0.6 seconds.
Working Code:
from pydub import AudioSegment
from pydub.utils import make_chunks

audiofile = 'example.wav'

# set chunk duration in milliseconds
chunk_duration = 2000  # 2 seconds

# Convert audio to audio segment
audio_segment = AudioSegment.from_wav(audiofile)
print("audio length in seconds={}".format(len(audio_segment) / 1000.0))

# make chunks
chunks = make_chunks(audio_segment, chunk_duration)

end = 0
for idx, chunk in enumerate(chunks):
    start = end
    end = start + (chunk_duration // 1000)
    count = idx + 1
    print("Exporting File_{}_{}:{}.wav".format(count, start, end))
    chunk.export("File_{}_{}:{}.wav".format(count, start, end))
Output:
$python splitaudio.py
audio length in seconds=34.6
Exporting File_1_0:2.wav
Exporting File_2_2:4.wav
Exporting File_3_4:6.wav
Exporting File_4_6:8.wav
Exporting File_5_8:10.wav
Exporting File_6_10:12.wav
Exporting File_7_12:14.wav
Exporting File_8_14:16.wav
Exporting File_9_16:18.wav
Exporting File_10_18:20.wav
Exporting File_11_20:22.wav
Exporting File_12_22:24.wav
Exporting File_13_24:26.wav
Exporting File_14_26:28.wav
Exporting File_15_28:30.wav
Exporting File_16_30:32.wav
Exporting File_17_32:34.wav
Exporting File_18_34:36.wav
The answer from Anil_M is completely valid, but I thought it good to mention two other ways that are fast and do not require scripting in Python (plus they offer a ton of extra features should you need them).
With ffmpeg that you have tried already:
ffmpeg -i input.mp3 -f segment -segment_time 2 -c copy output%03d.mp3
And SoX:
sox input.mp3 output.mp3 trim 0 2 : newfile : restart
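If you'd still rather drive this from Python, a single subprocess call of the ffmpeg command above replaces the whole loop; a minimal sketch:
import subprocess

# one ffmpeg invocation segments the whole file; no per-chunk loop needed
subprocess.run([
    'ffmpeg', '-i', 'input.mp3',
    '-f', 'segment', '-segment_time', '2',
    '-c', 'copy', 'output%03d.mp3',
], check=True)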

Alsaaudio record and playback

I was just playing around with sound input and output on a Raspberry Pi using Python.
My plan was to read the input of a microphone, manipulate it, and play back the manipulated audio. At the moment I am trying to read and play back the audio.
The reading seems to work, since I wrote the read data into a wave file in the last step, and the wave file seemed fine.
But the playback is only noise.
Playing the wave file worked as well, so the headset is fine.
I think maybe I have some problem in my settings or the output format.
The code:
import alsaaudio as audio
import time
import audioop

# Input & Output Settings
periodsize = 1024
audioformat = audio.PCM_FORMAT_FLOAT_LE
channels = 16
framerate = 8000

# Input Device
inp = audio.PCM(audio.PCM_CAPTURE, audio.PCM_NONBLOCK, device='hw:1,0')
inp.setchannels(channels)
inp.setrate(framerate)
inp.setformat(audioformat)
inp.setperiodsize(periodsize)

# Output Device
out = audio.PCM(audio.PCM_PLAYBACK, device='hw:0,0')
out.setchannels(channels)
out.setrate(framerate)
out.setformat(audioformat)
out.setperiodsize(periodsize)

# Reading the Input
allData = bytearray()
count = 0
while True:
    # reading the input into one long bytearray
    l, data = inp.read()
    for b in data:
        allData.append(b)
    # Just an ending condition
    count += 1
    if count == 4000:
        break
    time.sleep(.001)

# splitting the bytearray into period-sized chunks
list1 = [allData[i:i + periodsize] for i in range(0, len(allData), periodsize)]

# Writing the output
for arr in list1:
    # I tested writing the arr's to a wave file at this point
    # and the wave file was fine
    out.write(arr)
Edit: Maybe I should mention that I am using Python 3.
I just found the answer: audioformat = audio.PCM_FORMAT_FLOAT_LE isn't the format used by my headset (I just copied and pasted it without a second thought).
I found out my microphone's format (and additional information) by running speaker-test in the console.
Since my speaker's format is S16_LE, the code works fine with audioformat = audio.PCM_FORMAT_S16_LE.
Consider using plughw (the ALSA subsystem that supports resampling/conversion) for at least the sink part of the chain:
# Output Device
out = audio.PCM(audio.PCM_PLAYBACK, device='plughw:0,0')
This should help negotiate the sampling rate as well as the data format.
periodsize is better estimated as a fraction of the sample rate, e.g.:
periodsize = framerate / 8  # 1/8 of a second at an 8000 Hz sample rate
and sleeptime is better estimated as half the time necessary to play one periodsize:
sleeptime = 1.0 / 16  # half of 1/8 of a second at an 8000 Hz sample rate
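A minimal sketch of those suggestions combined, assuming the same alsaaudio API as above and a mono 8000 Hz S16_LE stream:
import alsaaudio as audio

framerate = 8000
periodsize = framerate // 8  # frames per period: 1/8 of a second
sleeptime = 1.0 / 16         # half the playback time of one period, in seconds

out = audio.PCM(audio.PCM_PLAYBACK, device='plughw:0,0')  # plughw negotiates rate/format
out.setchannels(1)
out.setrate(framerate)
out.setformat(audio.PCM_FORMAT_S16_LE)
out.setperiodsize(periodsize)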

Python - Overlay more than 3 WAV files end to end

I am trying to overlap the end of one wav file with 20% of the start of the next file. There are a variable number of files to overlap in this way (usually around 5-6).
I have tried expanding the following pydub snippet for overlaying 2 wav files:
from pydub import AudioSegment

sound1 = AudioSegment.from_wav("/path/to/file1.wav")
sound2 = AudioSegment.from_wav("/path/to/file2.wav")

# mix sound2 with sound1, starting at 70% into sound1
output = sound1.overlay(sound2, position=0.7 * len(sound1))

# save the result
output.export("mixed_sounds.wav", format="wav")
And wrote the following program:
for i in range(0, len(files_to_combine) - 1):
    if 'full_wav' in locals():
        prev_wav = full_wav
    else:
        prev = files_to_combine[i]
        prev_wav = AudioSegment.from_wav(prev)
    next = files_to_combine[i + 1]
    next_wav = AudioSegment.from_wav(next)
    new_wave = prev_wav.overlay(next_wav, position=len(prev_wav) - 0.3 * len(next_wav))
    new_wave.export('partial_wav.wav', format='wav')
    full_wav = AudioSegment.from_wav('partial_wav.wav')
However, when I look at the final wave file, only the first 2 files in the list files_to_combine were actually combined, not the rest. The idea was to continuously rewrite partial_wav.wav until it finally contains the full near-end-to-end overlapped sound. To debug this, I stored new_wave in a different file for every combination. The first wave file is the last: it only shows the first 2 wave files combined instead of the entire thing. Furthermore, I expected len(partial_wav) to gradually increase with every iteration. However, it remains the same after the first combination:
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
MAIN QUESTION
How do I overlap the end of one wav file (about the last 30%) with the beginning of the next for more than 3 wave files?
I believe you can just keep cascading AudioSegments until your final segment, as below.
Working Code:
from pydub import AudioSegment
from pydub.playback import play
sound1 = AudioSegment.from_wav("SineWave_440Hz.wav")
sound2 = AudioSegment.from_wav("SineWave_150Hz.wav")
sound3 = AudioSegment.from_wav("SineWave_660Hz.wav")
# mix sound2 with sound1, starting at 70% into sound1
tmpsound = sound1.overlay(sound2, position=0.7 * len(sound1))
# mix sound3 with sound1+sound2, starting at 30% into sound1+sound2
output = tmpsound.overlay(sound3, position=0.3 * len(tmpsound))
play(output)
output.export("mixed_sounds.wav", format="wav")
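One caveat worth noting: AudioSegment.overlay never extends the base segment, which is also why len(partial_wav) stayed constant in the question. If the output should actually grow as files are chained, pydub's append with a crossfade is a closer fit; a minimal sketch, assuming a hypothetical files_to_combine list and a 30% overlap:
from pydub import AudioSegment

files_to_combine = ["a.wav", "b.wav", "c.wav"]  # hypothetical inputs

combined = AudioSegment.from_wav(files_to_combine[0])
for path in files_to_combine[1:]:
    nxt = AudioSegment.from_wav(path)
    # crossfade over the first 30% of the next segment;
    # the crossfade cannot be longer than either segment
    overlap_ms = min(int(0.3 * len(nxt)), len(combined))
    combined = combined.append(nxt, crossfade=overlap_ms)

combined.export("mixed_sounds.wav", format="wav")
Note that append crossfades (fades one segment into the next) rather than mixing both at full level, so the result differs slightly from overlay.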

search a 2GB WAV file for dropouts using wave module

What is the best way to analyze a 2GB WAV file (1 kHz tone) for audio dropouts using the wave module? I tried the script below
import wave

file1 = wave.open("testdropout.wav", "r")
file2 = open("silence.log", "w")

for i in xrange(file1.getnframes()):
    frame = file1.readframes(i)
    zero = True
    for j in xrange(len(frame)):
        # check if amplitude is greater than 0
        # the ord() function converts the hex values to integers
        if ord(frame[j]) > 0:
            zero = False
            break
    if zero:
        print >> file2, 'dropout at second %s' % (file1.tell() / file1.getframerate())

file1.close()
file2.close()
I haven't used the wave module before, but file1.readframes(i) looks wrong: it reads 1 frame when you're at the first frame, 2 frames when you're at the second frame, 10 frames when you're at the tenth frame, and a 2GB CD-quality file has hundreds of millions of frames. By the time you're at frame 100,000 you're reading 100,000 frames per iteration, getting slower each time through the loop as well.
And from my comment: in Python 2, range() generates an in-memory list of the full size first and xrange() doesn't, but not using range at all helps even more.
And push the looping down into the lower layers with any() to make the code shorter, and possibly faster:
import wave

file1 = wave.open("testdropout.wav", "r")
file2 = open("silence.log", "w")

chunksize = file1.getframerate()
chunk = file1.readframes(chunksize)

while chunk:
    if not any(ord(sample) for sample in chunk):
        print >> file2, 'dropout at second %s' % (file1.tell() / chunksize)
    chunk = file1.readframes(chunksize)

file1.close()
file2.close()
This should read the file in 1-second chunks.
I think a simple solution to this would be to consider that the frame rate on audio files is pretty high. A sample file on my computer happens to have a framerate of 8,000. That means for every second of audio, I have 8,000 samples. If you have missing audio, I'm sure it will exist across multiple frames within a second, so you can essentially reduce your comparisons as drastically as your standards would allow. If I were you, I would try iterating over every 1,000th sample instead of every single sample in the audio file. That basically means it will examine every 1/8th of a second of audio to see if it's dead. Not as precise, but hopefully it will get the job done.
import wave

file1 = wave.open("testdropout.wav", "r")
file2 = open("silence.log", "w")

for i in range(file1.getnframes()):
    frame = file1.readframes(i)
    zero = True
    for j in range(0, len(frame), 1000):
        # check if amplitude is greater than 0
        # the ord() function converts the hex values to integers
        if ord(frame[j]) > 0:
            zero = False
            break
    if zero:
        print >> file2, 'dropout at second %s' % (file1.tell() / file1.getframerate())

file1.close()
file2.close()
At the moment, you're reading the entire file into memory, which is not ideal. If you look at the methods available on a Wave_read object, one of them is setpos(pos), which sets the position of the file pointer to pos. If you update this position, you should be able to keep only the frame you want in memory at any given time, preventing memory errors. Below is a rough outline:
import wave

file1 = wave.open("testdropout.wav", "r")
file2 = open("silence.log", "w")

def scan_frame(frame):
    for j in range(len(frame)):
        # check if amplitude is less than 0
        # It makes more sense here to check for the desired case (low amplitude)
        # rather than breaking at higher amplitudes
        if ord(frame[j]) <= 0:
            return True

for i in range(file1.getnframes()):
    frame = file1.readframes(1)  # only read the frame at the current file position
    zero = scan_frame(frame)
    if zero:
        print >> file2, 'dropout at second %s' % (file1.tell() / file1.getframerate())
    pos = file1.tell()  # states current file position
    file1.setpos(pos + len(frame))  # or pos + 1, or whatever a single unit in a
                                    # wave file is, I'm not entirely sure

file1.close()
file2.close()
Hope this can help!
