I have an audio file in WAV format, and I would like to extract the timestamps at which the loudness is significantly high.
For example, consider the speech commentary of a sports game: my goal is to identify the timestamps in the audio where the commentator shouts during a highlight of the ongoing game.
Python is preferred.
Expected output:
start(seconds) end(seconds)
[0.81, 2.429] etc
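from pydub.silence import detect_nonsilent

def target_amplitude(sound, target_dBFS):
    # Apply the gain needed to bring the average loudness to target_dBFS
    diff_in_dBFS = target_dBFS - sound.dBFS
    return sound.apply_gain(diff_in_dBFS)

# vid_aud is a pydub AudioSegment loaded earlier (not shown)
verified_sound = target_amplitude(vid_aud, -20.0)
nonsilent_data = detect_nonsilent(verified_sound, min_silence_len=500, silence_thresh=-20, seek_step=1)
time_list = []
for start_ms, end_ms in nonsilent_data:
    # detect_nonsilent reports milliseconds; convert to seconds
    time_list.append([start_ms / 1000, end_ms / 1000])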
This is not actually very hard. The wave module can read a wave file, and numpy can tell you which array elements are outside of a range.
import wave
import numpy as np
w = wave.open('sound.wav')
sam = w.readframes(w.getnframes())
sam = np.frombuffer(sam, dtype=np.int16)
bigpos = np.where( sam > 20000 )
bigneg = np.where( sam < -20000 )
This assumes you have a wave file with 16-bit samples. If you have an MP3, you'll have to decode it first.
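To get from the sample indices above to the [start, end] pairs in seconds that the question asks for, one option is to group nearby loud samples into intervals and divide by the sample rate. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name loud_intervals, the max_gap merging parameter, and the 16-bit mono input are illustrative choices that do not come from the original answer.

```python
import numpy as np

def loud_intervals(samples, sample_rate, threshold=20000, max_gap=0.5):
    """Return [start, end] pairs (seconds) where |sample| exceeds threshold.

    Loud samples that are at most max_gap seconds apart are merged into
    one interval. Assumes mono, 16-bit samples (hypothetical helper).
    """
    # Widen to int32 first so that abs(-32768) does not overflow int16
    loud = np.flatnonzero(np.abs(samples.astype(np.int32)) > threshold)
    if loud.size == 0:
        return []
    gap_samples = int(max_gap * sample_rate)
    # Positions in `loud` after which a new interval begins
    breaks = np.flatnonzero(np.diff(loud) > gap_samples)
    starts = np.concatenate(([loud[0]], loud[breaks + 1]))
    ends = np.concatenate((loud[breaks], [loud[-1]]))
    return [[round(float(s) / sample_rate, 3), round(float(e) / sample_rate, 3)]
            for s, e in zip(starts, ends)]

# Synthetic example: 5 seconds at 1000 Hz with two loud bursts
demo = np.zeros(5000, dtype=np.int16)
demo[1000:1500] = 25000
demo[3000:3200] = -25000
print(loud_intervals(demo, 1000))
```

With the wave file from the answer, the call would be loud_intervals(sam, w.getframerate()). A stereo file interleaves its channels in the buffer, so it would need to be reduced to one channel first.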
I have an audio file and I want to split it every 2 seconds. Is there a way to do this with librosa?
So if I had a 60 seconds file, I would split it into 30 two second files.
librosa is first and foremost a library for audio analysis, not audio synthesis or processing. It does provide support for writing simple audio files, but the documentation for that function also states:
This function is deprecated in librosa 0.7.0. It will be removed in 0.8. Usage of write_wav should be replaced by soundfile.write.
Given this information, I'd rather use a tool like sox to split audio files.
From "Split mp3 file to TIME sec each using SoX":
You can run SoX like this:
sox file_in.mp3 file_out.mp3 trim 0 2 : newfile : restart
It will create a series of files with a 2-second chunk of the audio each.
If you'd rather stay within Python, you might want to use pysox for the job.
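If installing pysox is not an option, the same SoX command can also be launched from Python with the standard subprocess module. This is only a sketch: the helper names build_sox_split_command and split_with_sox are hypothetical, and it assumes the sox executable is installed and on the PATH.

```python
import subprocess

def build_sox_split_command(file_in, file_out, seconds):
    """Build the argument list for the SoX call shown above (hypothetical helper)."""
    return ["sox", file_in, file_out,
            "trim", "0", str(seconds), ":", "newfile", ":", "restart"]

def split_with_sox(file_in, file_out, seconds=2):
    # Requires the sox executable on PATH;
    # check=True raises CalledProcessError if sox exits with an error.
    subprocess.run(build_sox_split_command(file_in, file_out, seconds), check=True)

print(build_sox_split_command("file_in.mp3", "file_out.mp3", 2))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.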
You can split your file with librosa by running the following code. I have added comments so that you can follow the steps.
import librosa

# First load the file
audio, sr = librosa.load(file_name)
# Get number of samples for 2 seconds; replace 2 by any number
buffer = 2 * sr
samples_total = len(audio)
samples_wrote = 0
counter = 1
while samples_wrote < samples_total:
    # Check that the buffer does not exceed the remaining samples
    if buffer > (samples_total - samples_wrote):
        buffer = samples_total - samples_wrote
    block = audio[samples_wrote : (samples_wrote + buffer)]
    out_filename = "split_" + str(counter) + "_" + file_name
    # Write 2 second segment
    librosa.output.write_wav(out_filename, block, sr)
    counter += 1
    samples_wrote += buffer
[Update]
librosa.output.write_wav() has been removed from librosa, so we now have to use soundfile.write() instead.
Import the required library:
import soundfile as sf
Then replace
librosa.output.write_wav(out_filename, block, sr)
with
sf.write(out_filename, block, sr)
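The slicing logic of the loop can also be written more compactly. The following sketch only shows the chunking step on a NumPy array; the function name split_fixed is hypothetical, and loading and writing the audio are left to librosa and soundfile as in the answer.

```python
import numpy as np

def split_fixed(audio, sr, seconds=2.0):
    """Split a 1-D sample array into consecutive chunks of `seconds` each.

    The last chunk is shorter when the length is not an exact multiple.
    """
    step = int(seconds * sr)
    return [audio[i:i + step] for i in range(0, len(audio), step)]

# Synthetic example: 5 seconds of "audio" at 10 samples per second
chunks = split_fixed(np.arange(50), sr=10, seconds=2.0)
print([len(c) for c in chunks])
```

Each chunk could then be written with sf.write(out_filename, chunk, sr), building out_filename from the chunk index as in the loop above.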
After using split_on_silence, the audio is cut incorrectly.
For example:
Original: hello, my name is John.
Chunks:
chunk1: ell
chunk2: name
My code:
from pydub import AudioSegment
from pydub.silence import split_on_silence
song = AudioSegment.from_wav("videofr.wav")
# split track where silence is 0.2 seconds or more and get chunks
chunks = split_on_silence(song,
    # must be silent for at least 0.2 seconds or 200 ms
    min_silence_len=200,
)
cpt = 0
print(len(song))
for i, chunk in enumerate(chunks):
    print(chunk)
    print(len(chunk))
    cpt = cpt + 1
    chunk.export(".//chunk{0}.wav".format(i), format="wav")
Try varying the min_silence_len and silence_thresh values until they match the actual silence duration and dBFS level as closely as possible.
For example:
chunks = split_on_silence(song,
    # must be silent for at least 0.2 seconds or 200 ms
    min_silence_len=200,
    # consider it silent if quieter than -16 dBFS
    silence_thresh=-16
)
You can verify the actual values by loading your file in Audacity and checking the silence duration and amplitude at the end of sentences.
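The same check can be approximated in code by measuring the level of short windows in dBFS and looking at how quiet the pauses really are. This is a rough sketch rather than pydub's own implementation: the function name window_dbfs, the 50 ms window, and the 16-bit full-scale value of 32768 are assumptions made for illustration.

```python
import numpy as np

def window_dbfs(samples, sample_rate, window_ms=50, full_scale=32768.0):
    """Return the RMS level in dBFS of each consecutive window.

    Assumes mono 16-bit samples; all-zero windows come out as -inf.
    """
    win = max(1, int(sample_rate * window_ms / 1000))
    n = len(samples) // win
    frames = samples[:n * win].astype(np.float64).reshape(n, win)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    with np.errstate(divide="ignore"):  # log10(0) is expected for silence
        return 20 * np.log10(rms / full_scale)

# Synthetic example: 100 ms of digital silence, then 100 ms at half scale
demo = np.concatenate((np.zeros(100), np.full(100, 16384))).astype(np.int16)
levels = window_dbfs(demo, sample_rate=1000)
print(levels)
```

If the pauses between words sit at, say, -35 dBFS, a silence_thresh of -16 would treat much of the quieter speech as silence, which would match the clipped chunks described in the question.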
I was just playing around with sound input and output on a Raspberry Pi using Python.
My plan was to read the input of a microphone, manipulate it, and play back the manipulated audio. For now, I am only trying to read and play back the audio.
The reading seems to work: I wrote the captured data into a wave file in the last step, and the wave file sounded fine.
But the playback produces only noise.
Playing the wave file worked as well, so the headset is fine.
I think there may be a problem in my settings or the output format.
The code:
import alsaaudio as audio
import time
import audioop
#Input & Output Settings
periodsize = 1024
audioformat = audio.PCM_FORMAT_FLOAT_LE
channels = 16
framerate=8000
#Input Device
inp = audio.PCM(audio.PCM_CAPTURE,audio.PCM_NONBLOCK,device='hw:1,0')
inp.setchannels(channels)
inp.setrate(framerate)
inp.setformat(audioformat)
inp.setperiodsize(periodsize)
#Output Device
out = audio.PCM(audio.PCM_PLAYBACK,device='hw:0,0')
out.setchannels(channels)
out.setrate(framerate)
out.setformat(audioformat)
out.setperiodsize(periodsize)
#Reading the Input
allData = bytearray()
count = 0
while True:
    # reading the input into one long bytearray
    l, data = inp.read()
    for b in data:
        allData.append(b)
    # Just an ending condition
    count += 1
    if count == 4000:
        break
    time.sleep(.001)
#splitting the bytearray into period sized chunks
list1 = [allData[i:i+periodsize] for i in range(0, len(allData), periodsize)]
#Writing the output
for arr in list1:
    # I tested writing the arr's to a wave file at this point
    # and the wave file was fine
    out.write(arr)
Edit: Maybe I should mention that I am using Python 3.
I just found the answer. The format audioformat = audio.PCM_FORMAT_FLOAT_LE isn't the one used by my headset (I had just copied and pasted it without a second thought).
I found out about my microphone's format (and additional information) by running speaker-test in the console.
Since my speaker's format is S16_LE, the code works fine with audioformat = audio.PCM_FORMAT_S16_LE.
Consider using plughw (the ALSA plugin layer, which supports resampling and format conversion), at least for the sink part of the chain:
#Output Device
out = audio.PCM(audio.PCM_PLAYBACK,device='plughw:0,0')
This should help to negotiate the sampling rate as well as the data format.
It is better to derive periodsize as a fraction of the sample rate, for example:
periodsize = framerate // 8 (one eighth of a second of audio at the 8000 Hz sampling rate)
Likewise, sleeptime is better set to half the time needed to play one period:
sleeptime = 1.0 / 16 (1.0 is one second; 16 = 2 * 8, i.e. half of a 1/8-second period)
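The rule of thumb above can be written as a small helper so that both values follow from the sample rate. This is a sketch of the arithmetic only; the function name period_settings and the periods_per_second parameter are hypothetical and not part of alsaaudio.

```python
def period_settings(framerate, periods_per_second=8):
    """Derive periodsize (in frames) and a polling sleep time (in seconds).

    periodsize covers 1/periods_per_second of a second of audio, and the
    sleep time is half the time it takes to play one period.
    """
    periodsize = framerate // periods_per_second  # integer number of frames
    period_duration = periodsize / framerate      # seconds per period
    sleeptime = period_duration / 2
    return periodsize, sleeptime

periodsize, sleeptime = period_settings(8000)
print(periodsize, sleeptime)
```

The integer division matters on Python 3, where framerate / 8 would produce a float even though the period size is a whole number of frames.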
I am trying to overlap the end of one wav file with the first 20% of the next file, and so on for a variable number of files (usually around 5-6).
I have tried using pydub by expanding the following example, which overlays 2 wav files:
from pydub import AudioSegment
sound1 = AudioSegment.from_wav("/path/to/file1.wav")
sound2 = AudioSegment.from_wav("/path/to/file2.wav")
# mix sound2 with sound1, starting at 70% into sound1
output = sound1.overlay(sound2, position=0.7 * len(sound1))
# save the result
output.export("mixed_sounds.wav", format="wav")
And wrote the following program:
for i in range(0, len(files_to_combine) - 1):
    if 'full_wav' in locals():
        prev_wav = full_wav
    else:
        prev = files_to_combine[i]
        prev_wav = AudioSegment.from_wav(prev)
    next = files_to_combine[i+1]
    next_wav = AudioSegment.from_wav(next)
    new_wave = prev_wav.overlay(next_wav, position=len(prev_wav) - 0.3 * len(next_wav))
    new_wave.export('partial_wav.wav', format='wav')
    full_wav = AudioSegment.from_wav('partial_wav.wav')
However, when I look at the final wave file, only the first 2 files in the list files_to_combine were actually combined, not the rest. The idea was to keep rewriting partial_wav.wav until it finally contains the full wav file of the sounds overlapped nearly end to end. To debug this, I stored new_wave in a different file for every combination. The first of these files is the same as the last: it only contains the first 2 wave files combined instead of the entire thing. Furthermore, I expected len(partial_wav) to gradually increase with every iteration. However, it stays the same after the first combination:
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
MAIN QUESTION
How do I overlap the end of one wav file (about the last 30%) with the beginning of the next for more than 3 wave files?
I believe you can just keep cascading AudioSegments until you reach your final segment, as below.
Working code:
from pydub import AudioSegment
from pydub.playback import play
sound1 = AudioSegment.from_wav("SineWave_440Hz.wav")
sound2 = AudioSegment.from_wav("SineWave_150Hz.wav")
sound3 = AudioSegment.from_wav("SineWave_660Hz.wav")
# mix sound2 with sound1, starting at 70% into sound1
tmpsound = sound1.overlay(sound2, position=0.7 * len(sound1))
# mix sound3 with sound1+sound2, starting at 30% into sound1+sound2
output = tmpsound.overlay(sound3, position=0.3 * len(tmpsound))
play(output)
output.export("mixed_sounds.wav", format="wav")
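One thing to keep in mind is that pydub's documentation describes the result of overlay as always having the same length as the segment it is called on, which would explain why the length in the question stays at 237. To make the result grow with each file, the overlap has to extend past the end of the previous audio. The sketch below illustrates that idea on plain NumPy arrays rather than AudioSegments; the function name chain_with_overlap is hypothetical, and it assumes all clips share the same sample rate and are mono.

```python
import numpy as np

def chain_with_overlap(clips, overlap=0.3):
    """Join clips so each one starts `overlap` * its length before the
    end of the audio accumulated so far; overlapping samples are summed."""
    result = np.asarray(clips[0], dtype=np.float64)
    for clip in clips[1:]:
        clip = np.asarray(clip, dtype=np.float64)
        # Never overlap more than what has been accumulated so far
        n_overlap = min(int(overlap * len(clip)), len(result))
        start = len(result) - n_overlap
        out = np.zeros(start + len(clip))
        out[:len(result)] += result  # everything so far
        out[start:] += clip          # next clip, shifted to its start
        result = out
    return result

# Three 10-sample clips with 50% overlap: 10 -> 15 -> 20 samples
mixed = chain_with_overlap([np.ones(10), np.ones(10), np.ones(10)], overlap=0.5)
print(len(mixed), mixed)
```

With pydub itself, a comparable effect could probably be obtained by padding the accumulated segment with AudioSegment.silent(...) before each overlay, so that the base segment is long enough to hold the next file.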
How can I produce real-time audio output from music made with music21? Failing that, how can I produce any audio output from music made with music21 via open-source software? Thanks for the help.
As you've seen, music21 isn't designed to be a music playback system, but it IS designed to be embedded within other playback systems or to call them from within the system. We're not planning on putting too much work into playback systems (because of the hardware support, our being a tiny research lab, the work still needing to be done on musical analysis, etc.), but your solution is so elegant that it is now included in all versions of music21 (post v1.1) as the music21.midi.realtime module. Here's an example that takes music21's ability to dynamically allocate midi channels with different pitch-bend objects in order to simulate microtonal playback (a major problem for most midi playback):
# Set up a detuned piano
# (where each key has a random
# but consistent detuning from 30 cents flat to sharp)
# and play a Bach Chorale on it in real time.
from music21 import *
import random
keyDetune = []
# one entry per MIDI note number (0-127)
for i in range(0, 128):
    keyDetune.append(random.randint(-30, 30))
b = corpus.parse('bach/bwv66.6')
for n in b.flat.notes:
    n.microtone = keyDetune[n.pitch.midi]
sp = midi.realtime.StreamPlayer(b)
sp.play()
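For a sense of scale, a detuning in cents maps to a frequency ratio of 2 ** (cents / 1200), since an octave is 1200 cents. The short sketch below works through that arithmetic; the helper name cents_to_ratio is hypothetical and not part of music21.

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch offset in cents (1200 cents = one octave)."""
    return 2 ** (cents / 1200)

# A4 (440 Hz) detuned by the extremes used in the example above
sharp = 440 * cents_to_ratio(30)
flat = 440 * cents_to_ratio(-30)
print(round(sharp, 2), round(flat, 2))
```

So a key detuned by the maximum of 30 cents is off by roughly 7 to 8 Hz around A4, which is clearly audible but well short of a semitone (100 cents).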
The StreamPlayer's .play() function can also take busyFunction and busyArgs and busyWaitMilliseconds arguments which specify a function to call with arguments at most every busyWaitMilliseconds (could be more if your system is slower). There is also an endFunction and endArgs that will be called at the end, in case you want to set up some sort of threaded playback. -- Myke Cuthbert (Music21 creator)
So here's what I found out: a Python script that works on Windows XP. It needs pygame in addition to music21.
# genPlayM21Score.py Generates and Plays 2 Music21 Scores "on the fly".
#
# see way below for source notes
from music21 import *
# we create the music21 Bottom Part, and do this explicitly, one object at a time.
n1 = note.Note('e4')
n1.duration.type = 'whole'
n2 = note.Note('d4')
n2.duration.type = 'whole'
m1 = stream.Measure()
m2 = stream.Measure()
m1.append(n1)
m2.append(n2)
partLower = stream.Part()
partLower.append(m1)
partLower.append(m2)
# For the music21 Upper Part, we automate the note creation procedure
data1 = [('g4', 'quarter'), ('a4', 'quarter'), ('b4', 'quarter'), ('c#5', 'quarter')]
data2 = [('d5', 'whole')]
data = [data1, data2]
partUpper = stream.Part()
def makeUpperPart(data):
    for mData in data:
        m = stream.Measure()
        for pitchName, durType in mData:
            n = note.Note(pitchName)
            n.duration.type = durType
            m.append(n)
        partUpper.append(m)
makeUpperPart(data)
# Now, we can add both Part objects into a music21 Score object.
sCadence = stream.Score()
sCadence.insert(0, partUpper)
sCadence.insert(0, partLower)
# Now, let's play the MIDI of the sCadence Score [from memory, ie no file write necessary] using pygame
import cStringIO  # Python 2 only; on Python 3, io.BytesIO plays the same role
# for music21 <= v.1.2:
if hasattr(sCadence, 'midiFile'):
    sCadence_mf = sCadence.midiFile
else:  # for >= v.1.3:
    sCadence_mf = midi.translate.streamToMidiFile(sCadence)
sCadence_mStr = sCadence_mf.writestr()
sCadence_mStrFile = cStringIO.StringIO(sCadence_mStr)
import pygame
freq = 44100 # audio CD quality
bitsize = -16 # signed 16 bit (a negative size means signed samples)
channels = 2 # 1 is mono, 2 is stereo
buffer = 1024 # number of samples
pygame.mixer.init(freq, bitsize, channels, buffer)
# optional volume 0 to 1.0
pygame.mixer.music.set_volume(0.8)
def play_music(music_file):
    """
    stream music with mixer.music module in blocking manner
    this will stream the sound from disk while playing
    """
    clock = pygame.time.Clock()
    try:
        pygame.mixer.music.load(music_file)
        print("Music file %s loaded!" % music_file)
    except pygame.error:
        print("File %s not found! (%s)" % (music_file, pygame.get_error()))
        return
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        # check if playback has finished
        clock.tick(30)
# play the MIDI data we just generated (held in memory, not saved to disk)
play_music(sCadence_mStrFile)
#============================
# now let's make a new music21 Score by reversing the upperPart notes
data1.reverse()
data2 = [('d5', 'whole')]
data = [data1, data2]
partUpper = stream.Part()
makeUpperPart(data)
sCadence2 = stream.Score()
sCadence2.insert(0, partUpper)
sCadence2.insert(0, partLower)
# now let's play the new Score
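# same version check as above: .midiFile only exists in music21 <= v.1.2
if hasattr(sCadence2, 'midiFile'):
    sCadence2_mf = sCadence2.midiFile
else:
    sCadence2_mf = midi.translate.streamToMidiFile(sCadence2)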
sCadence2_mStr = sCadence2_mf.writestr()
sCadence2_mStrFile = cStringIO.StringIO(sCadence2_mStr)
play_music(sCadence2_mStrFile)
## SOURCE NOTES
## There are 3 sources for this mashup:
# 1. Source for the Music21 Score Creation http://web.mit.edu/music21/doc/html/quickStart.html#creating-notes-measures-parts-and-scores
# 2. Source for the Music21 MidiFile Class Behaviour http://mit.edu/music21/doc/html/moduleMidiBase.html?highlight=midifile#music21.midi.base.MidiFile
# 3. Source for the pygame player: http://www.daniweb.com/software-development/python/code/216979/embed-and-play-midi-music-in-your-code-python