I have a number of MP3 files containing lectures where the speaker talks very slowly and I would like to alter the MP3 file so that the playback rate is about 1.5 times as fast as normal.
Can someone suggest a good Python library for this? By the way, I'm running Python 2.6 on Windows.
Thanks in advance.
I wrote a library, pydub, which is mainly designed for manipulating audio.
I've created an experimental time-stretching algorithm if you're interested in seeing how these sorts of things work.
Essentially you want to throw away a portion of your data, but you can't just play back the waveform faster, because then it'll all be high-pitched (as synthesizerpatel mentioned). Instead you want to throw away chunks (20 Hz is the lowest frequency a human can hear, so 50 ms chunks do not cause audible frequency changes, though there are other artifacts).
PS - I get 50 ms like so:
20 Hz == 20 cycles per second
or
20 cycles per 1000 ms
or
1000 ms / 20 cycles == 50 ms per cycle
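The chunk-dropping idea can be sketched with numpy. This is a simplified illustration, not pydub's implementation: it assumes a mono numpy array of samples, keeps 50 ms chunks at their original rate (so the pitch is unchanged) and skips enough chunks to reach the target speed. There is no crossfading, so chunk boundaries may click, which is one of the "other artifacts" mentioned above. `naive_speedup` is a hypothetical name.

```python
import numpy as np


def naive_speedup(samples, frame_rate, speed=1.5, chunk_ms=50):
    """Speed up audio by dropping whole chunks instead of resampling.

    Each kept chunk is played at its original rate, so pitch is preserved.
    No crossfade is applied, so chunk boundaries may click.
    """
    chunk_len = max(1, int(frame_rate * chunk_ms / 1000))  # 50 ms -> samples
    n_chunks = int(np.ceil(len(samples) / chunk_len))
    n_out = int(np.ceil(n_chunks / speed))  # output is about len / speed long
    pieces = []
    for k in range(n_out):
        src = int(k * speed)  # at 1.5x this picks chunks 0, 1, 3, 4, 6, ...
        pieces.append(samples[src * chunk_len:(src + 1) * chunk_len])
    if not pieces:
        return samples[:0]
    return np.concatenate(pieces)
```

If your pydub version includes it, `pydub.effects.speedup` follows the same chunk-dropping idea and also crossfades the chunk boundaries, which reduces the clicks.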
pymedia includes a recode_audio.py example that converts between arbitrary input and output formats. This of course requires installing pymedia as well.
Note that, as Nick T points out, if you just change the sample rate without resampling you'll get high-pitched 'fast' audio, so you'll want to combine time-stretching with changing the sample rate.
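You can try the _spawn method in Pydub's audio_segment.py. Here is some example code:
from pydub import AudioSegment
import os
def speed_change(sound, speed=1.0):
    # Override the frame rate so the same samples play `speed` times faster
    # (this shifts the pitch by the same factor)
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={"frame_rate": int(sound.frame_rate * speed)})
    # Convert back to the original, standard frame rate so encoders and players accept it
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
in_path = 'your/path/of/input_file/hello.mp3'
ex_path = 'your/path/of/output_dir'  # a directory; the file name is joined below
sound = AudioSegment.from_file(in_path)
# generate a slower audio for example
slower_sound = speed_change(sound, 0.5)
slower_sound.export(os.path.join(ex_path, 'slower.mp3'), format="mp3")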
Related
I'm building a simple Python application that involves altering the speed of an audio track.
(I acknowledge that changing the frame rate of audio also changes its pitch, and I don't mind the pitch being altered.)
I have tried abhi krishnan's pydub solution, which looks like this:
from pydub import AudioSegment
sound = AudioSegment.from_file(…)
def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
However, the sped-up audio sounds distorted or crackly, which doesn't happen when I do the same thing in Audacity. I'd like to reproduce in Python the way Audacity (or other digital audio editors) changes the speed of audio tracks.
I presume the quality loss is caused by the original audio's low frame rate (8 kHz), and that .set_frame_rate(sound.frame_rate) has to resample the sped-up audio back down to that low original frame rate. Simple attempts at changing the frame rate of the original audio, of the audio with the altered frame rate, or of the audio to be exported didn't work out.
Is there a way, in Pydub or another Python module, to perform this task the same way Audacity does?
I'll assume you want to play audio back at, say, 1.5x the speed of the original. That is equivalent to resampling the audio down to 2/3 of its original number of samples and pretending the sampling rate hasn't changed. If that's what you're after, I suspect most DSP packages support it (search for "audio resampling").
You can try scipy.signal.resample_poly()
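import numpy as np
from scipy.signal import resample_poly
samples = np.array(sound.get_array_of_samples(), dtype=np.float64)  # raw_data is bytes; assumes mono
dec_data = resample_poly(samples, up=2, down=3)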
dec_data should have 2/3 as many samples as the original audio. If you play dec_data at the sound's original sampling rate, you should get a sped-up version. The downside of resample_poly is that it needs a rational factor, and large numerators or denominators can give less ideal output. You can try scipy's resample function or look for other packages that support audio resampling.
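To go from an arbitrary speed factor to the `up`/`down` pair that resample_poly needs, Python's `fractions.Fraction` can approximate the factor with a small rational number. A minimal sketch, assuming mono 16-bit samples in a numpy-compatible array; `speed_up_by_resampling` is a hypothetical name, and like changing the frame rate this raises the pitch as well:

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def speed_up_by_resampling(samples, speed=1.5, max_denominator=100):
    """Return int16 samples that play `speed` times faster at the original rate.

    Like changing the frame rate, this also raises the pitch by `speed`.
    """
    ratio = Fraction(speed).limit_denominator(max_denominator)  # 1.5 -> 3/2
    # Keep len/speed samples: upsample by the denominator, downsample by the numerator.
    out = resample_poly(np.asarray(samples, dtype=np.float64),
                        up=ratio.denominator, down=ratio.numerator)
    # The anti-aliasing filter can overshoot slightly, so clip before casting.
    return np.clip(np.round(out), -32768, 32767).astype(np.int16)
```

For a mono 16-bit pydub segment, the result could be wrapped back into audio with something like `sound._spawn(fast.tobytes())`, which keeps the original frame rate, so playback is `speed` times faster.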
I'm making a Raspberry Pi bat detector using a USB-powered ultrasonic microphone. I want to be able to record bats while excluding insects and other non-bat noises. Recording needs to be sound-triggered to avoid filling the SD card too quickly and to aid with analysis. This website explains how to do this with SoX:
rec -c 1 -r 192000 record.wav sinc 10k silence 1 0.1 1% trim 0 5
This records for 5 seconds after a trigger sound of at least 0.1 seconds and includes a 10 kHz high-pass filter. This is a good start, but what I'd really like is an advanced filter that excludes crickets and other non-bat noises. Insect and bat calls overlap in frequency, so a high-pass or band-pass filter won't do.
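The SoX pipeline above (a 10 kHz high-pass, then a level threshold) can be approximated in Python so that it fits into a chunk-based recording loop. This is a rough analogue, not SoX's implementation: `sinc 10k` is a Kaiser-windowed sinc filter, whereas this sketch swaps in a 4th-order Butterworth high-pass from scipy.signal, and the 1% threshold is applied to the RMS of each chunk rather than through SoX's `silence` logic. `make_highpass` and `chunk_triggers` are hypothetical names.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def make_highpass(rate, cutoff_hz=10_000, order=4):
    """Design the high-pass filter once, outside the recording loop."""
    return butter(order, cutoff_hz, btype="highpass", fs=rate, output="sos")


def chunk_triggers(chunk, sos, threshold=0.01):
    """True if the high-passed chunk's RMS reaches `threshold` of full scale.

    `chunk` is int16 audio (e.g. array('h', stream.read(CHUNK))); 0.01 mirrors
    the 1% in the SoX command. Filter state is not carried between chunks,
    which can add a small transient at the start of each chunk.
    """
    x = np.asarray(chunk, dtype=np.float64) / 32768.0  # int16 -> [-1, 1)
    filtered = sosfilt(sos, x)
    rms = np.sqrt(np.mean(filtered ** 2)) if len(filtered) else 0.0
    return bool(rms >= threshold)
```

In a pyaudio loop, `sos = make_highpass(RATE)` would be computed once and `chunk_triggers(data_chunk, sos)` would replace a plain peak-level check.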
The Elekon Batlogger does this with a period trigger that analyses zero crossings. From the Batlogger website:
The difference in sound production of bats (vocal cords) and insects (stridulation) affects the period continuity. The period trigger takes advantage of this:
The trigger fires when ProdVal and DivVal are lower than the set limits, i.e. when the values are within the yellow range.
(Values are the defaults):
ProdVal = 8, higher values trigger more easily
DivVal = 20, higher values trigger more easily
Translated text from the image:
Bat: Tonal signal
Period constant => zero crossings / time = stable
Insects: scratching
Period not constant => zero crossings / time = differs
MN => mean value of the number of periods per measurement interval
SD => standard deviation of the number of periods
Higher values trigger more easily, even at low frequencies (including insects!)
And vice versa
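The period-trigger idea (zero crossings per unit time stay stable for tonal bat calls but vary for stridulating insects) can be sketched in Python with numpy, which would fit into a pyaudio loop like the one in the update below. This only approximates the description above and is not Elekon's algorithm: ProdVal and DivVal are not reproduced, and the function names and thresholds (`hysteresis`, `max_cv`, `min_crossings`, `min_active`) are hypothetical and would need tuning against the example recordings. The sketch counts crossings in 1 ms sub-intervals with a small dead band so that low-level noise around zero is ignored, keeps the intervals that show activity, and fires when the spread of the counts (SD) is small relative to their mean (MN).

```python
import numpy as np


def crossings_per_interval(chunk, rate, interval_ms=1.0, hysteresis=0.02):
    """Count zero crossings in each `interval_ms` window of an int16 chunk.

    A crossing only counts when the signal swings from below -hysteresis to
    above +hysteresis (fractions of full scale) or back, so low-level noise
    around zero is ignored.
    """
    x = np.asarray(chunk, dtype=np.float64) / 32768.0
    n = max(2, int(rate * interval_ms / 1000))
    counts = []
    for start in range(0, len(x) - n + 1, n):
        w = x[start:start + n]
        state = np.sign(w) * (np.abs(w) > hysteresis)  # -1, 0 or +1
        state = state[state != 0]  # drop samples inside the dead band
        counts.append(int(np.count_nonzero(state[1:] != state[:-1])))
    return np.array(counts)


def period_trigger(chunk, rate, max_cv=0.2, min_crossings=4, min_active=3):
    """Hypothetical bat trigger: crossings per interval must be stable.

    Intervals with at least `min_crossings` crossings count as active; the
    trigger fires when there are enough of them and their coefficient of
    variation (SD / MN) is at most `max_cv`.
    """
    counts = crossings_per_interval(chunk, rate)
    active = counts[counts >= min_crossings]
    if len(active) < min_active:
        return False
    mn, sd = active.mean(), active.std()
    return bool(sd / mn <= max_cv)
```

The per-chunk work is one short numpy loop, so it should be light enough for on-the-fly use, but it has not been benchmarked on a Raspberry Pi.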
Is there a way to implement this (or something to the same effect) in Raspberry Pi OS? The language I'm most familiar with is R. Based on answers to this question it seems like R would be suitable for this problem, although if R isn't the best choice then I'm open to other suggestions.
I'd really appreciate some working code for recording audio and filtering as described above. My desired output is 5-second files that contain bat calls, not insects or noise. It needs to be efficient in terms of CPU/power use and needs to work on the fly.
Example recordings of bats and insects here.
UPDATE:
I've got a basic sound-activated script working in Python (based on this answer) but I'm not sure how to include an advanced filter in this:
import pyaudio
import wave
from array import array
import time
FORMAT=pyaudio.paInt16
CHANNELS=1
# note: 44.1 kHz only captures frequencies up to ~22 kHz; ultrasonic bat calls
# need a higher rate (e.g. 192000, as in the SoX command above)
RATE=44100
CHUNK=1024
RECORD_SECONDS=5
audio=pyaudio.PyAudio()
stream=audio.open(format=FORMAT,channels=CHANNELS,
                  rate=RATE,
                  input=True,
                  frames_per_buffer=CHUNK)
nighttime=True # I will expand this later
while nighttime:
    data=stream.read(CHUNK)
    data_chunk=array('h',data)
    vol=max(data_chunk)
    if(vol>=3000):
        print("recording triggered")
        frames=[]
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("recording saved")
        # write to file
        words = ["RECORDING-", time.strftime("%Y%m%d-%H%M%S"), ".wav"]
        FILE_NAME= "".join(words)
        wavfile=wave.open(FILE_NAME,'wb')
        wavfile.setnchannels(CHANNELS)
        wavfile.setsampwidth(audio.get_sample_size(FORMAT))
        wavfile.setframerate(RATE)
        wavfile.writeframes(b''.join(frames))
        wavfile.close()
    # check if still nighttime
    nighttime=True # I will expand this later
stream.stop_stream()
stream.close()
audio.terminate()
TL;DR
R should be capable of doing this in post-processing. If you want this done on a live audio recording/stream, I'd suggest you look for a different tool.
Longer Answer
R is capable of processing audio files through several packages (the most notable seems to be tuneR), but I'm fairly sure that will be limited to post-collection processing, i.e. analysing the files you have already collected, rather than 'live' filtering of streaming audio input.
There are a couple of approaches you could take to 'live' filtering of insects and other unwanted sounds. One would be to record files as in your script above, then write R code to process them (you could automate this on a schedule with cron, for example) and discard parts or files that don't match your criteria. If you are concerned about SD card space, you can also offload these files to another location after processing (e.g. upload them to another drive). You could make the schedule fairly short (at the cost of more CPU usage on the Pi) to get an 'almost-live' processing approach.
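The record-then-clean-up approach could look like this in Python (the language of the question's script), with a cron job running it every few minutes. This is a sketch under assumptions: recordings are mono 16-bit PCM WAV files matching a glob pattern such as `RECORDING-*.wav`, and `keep` is a hypothetical function you supply that returns True for files worth keeping. `dry_run` defaults to True so nothing is deleted until the criterion has been checked.

```python
import glob
import os
import wave

import numpy as np


def read_wav_int16(path):
    """Read a mono 16-bit PCM WAV file into (samples, rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError(f"{path}: expected mono 16-bit PCM")
        rate = wf.getframerate()
        data = wf.readframes(wf.getnframes())
    return np.frombuffer(data, dtype=np.int16), rate


def sweep_recordings(pattern, keep, dry_run=True):
    """Delete recordings for which keep(samples, rate) is False.

    Returns the paths that were removed (or, with dry_run, would be removed).
    """
    removed = []
    for path in sorted(glob.glob(pattern)):
        samples, rate = read_wav_int16(path)
        if not keep(samples, rate):
            removed.append(path)
            if not dry_run:
                os.remove(path)
    return removed
```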
Another approach would be to look more at the sox documentation and see if there are options in there to achieve what you want based on streaming audio input, or see if there is another tool that you can stream input to, that will allow that sort of filtering.
Good luck!
I am trying to do a project, and in part of the project I have the user say a word which gets recorded. This word then gets the silence around it cut out, and there is a button that plays back their word without the silence. I am using librosa's librosa.effects.trim command to achieve this.
For example:
def record_audio():
    global myrecording
    global yt
    playsound(beep1)
    myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
    sd.wait()
    playsound(beep2)
    #trimming the audio
    yt, index = librosa.effects.trim(myrecording, top_db=60)
However, when I play the audio back, I can tell that it is not trimming the recording. The variable explorer shows that myrecording and yt are the same length. I can hear it when I play what is supposed to be the trimmed audio clip back as well. I don't get any error messages when this occurs either. Is there any way to get librosa to actually clip the audio? I have tried adjusting top_db and that did not fix it. Aside from that, I am not quite sure what I could be doing wrong.
For a real answer, you'd have to post a sample recording so that we could inspect what exactly is going on.
In lieu of that, I'd like to refer to this GitHub issue, where one of the main authors of librosa offers advice for a very similar issue.
In essence: You want to lower the top_db threshold and reduce frame_length and hop_length. E.g.:
yt, index = librosa.effects.trim(myrecording, top_db=50, frame_length=256, hop_length=64)
Decreasing hop_length effectively increases the resolution for trimming. Decreasing top_db makes the function less sensitive, i.e., low-level noise is also regarded as silence. Using a computer microphone, you probably have quite a bit of low-level background noise.
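Two further points, both hedged. First, an assumption worth checking against your setup: sounddevice's `sd.rec` returns a 2-D array of shape (frames, channels) even when channels=1, while recent librosa versions treat the last axis as time, so passing `myrecording.flatten()` (or `myrecording[:, 0]`) to `librosa.effects.trim` may matter. Second, to see how top_db, frame_length and hop_length interact, here is a simplified numpy re-implementation of the idea behind trim: per-frame RMS in dB relative to the loudest frame, keeping everything from the first to the last frame within top_db of that peak. It is not librosa's exact code (librosa centers and pads its frames, so boundaries differ slightly), and `simple_trim` is a hypothetical name.

```python
import numpy as np


def simple_trim(y, top_db=60.0, frame_length=2048, hop_length=512):
    """Trim leading/trailing frames quieter than `top_db` below the loudest frame.

    Returns (trimmed, (start, end)) like librosa.effects.trim, but frame
    boundaries are computed more simply than in librosa.
    """
    y = np.asarray(y, dtype=np.float64).reshape(-1)  # (N, 1) from sd.rec -> (N,)
    if y.size == 0:
        return y, (0, 0)
    n_frames = 1 + max(0, (len(y) - frame_length) // hop_length)
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    ref = rms.max()
    if ref == 0:  # all silence: nothing to keep
        return y[:0], (0, 0)
    db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)  # 0 dB = loudest frame
    loud = np.flatnonzero(db > -top_db)
    start = int(loud[0] * hop_length)
    end = int(min(len(y), loud[-1] * hop_length + frame_length))
    return y[start:end], (start, end)
```

A single loud spike before the speech puts an early frame within top_db of the peak, so `start` stays at 0; this is the effect described in the update below.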
If none of this helps, you might want to consider using SoX, or its Python wrapper pysox. It also has a trim function.
Update: Look at the waveform of your audio. Does it have a spike somewhere at the beginning, perhaps a crack sound? That will keep librosa from trimming correctly. Perhaps manually throwing away the first second (= fs samples) and then trimming solves the issue:
librosa.effects.trim(myrecording[fs:], top_db=50, frame_length=256, hop_length=64)
I have .wav files sampled at 192kHz and want to split them based on time to many smaller files while keeping the same sample rate.
To start with, I thought I would just open and re-save the wav file using pydub to learn how to do this. However, when I save it, the new file is much smaller and I'm not sure why; perhaps the sample rate is lower? I also can't open the new file with the audio analysis program I usually use (Song Scope).
So I have two questions:
- How do I open, read, copy and resave a wav file using pydub without changing it? (Sorry, I know this is probably easy; I just can't find it yet.)
- Are Python and Pydub a sensible choice for what I am trying to do? Or is there a much simpler way?
What exactly I am trying to do is:
Split about 10 high sample frequency wav files (~1 GB each) into many (about 100) small wav files. (I plan to make a list of start and end times for each of the smaller wav files needed, then get Python to open, copy and resave the wav file data between those times.)
I assume it is possible since I've seen questions for lower frequency wav files, but if you know otherwise or know of a simpler way please let me know. Thanks!!
My code so far is as follows:
from pydub import AudioSegment
# Input audio file to be sliced
audio = AudioSegment.from_wav("20190212_164446.wav")
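# export() defaults to format="mp3", so without format="wav" the .wav file actually contains MP3 data
audio.export("newWavFile.wav", format="wav")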
(I put the wav file and ffmpeg into the same directory as the Python file to save time, since I was having a lot of trouble getting pydub to find ffmpeg.)
In case it's relevant, the files are of bat calls; these bats make calls between around 1 kHz and 50 kHz, which is quite low frequency for bats. I'm trying to crop out the actual calls from some very long files.
I know this is a basic question, I just couldn't find the answer yet, please also feel free to direct me to the answer if it's a duplicate.
thanks!!
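For the splitting itself, Python's standard-library `wave` module can copy frame ranges without decoding or resampling, so each output keeps the input's sample rate, sample width and channel count. A minimal sketch, assuming plain PCM WAV input (older Python versions' `wave` module may reject WAVE_FORMAT_EXTENSIBLE headers) and a list of (start, end) times in seconds; `split_wav` and `out_pattern` are hypothetical names.

```python
import wave


def split_wav(src_path, segments, out_pattern="segment_{:03d}.wav"):
    """Copy [start_s, end_s) slices of a WAV file into separate WAV files.

    Header parameters (channels, sample width, sample rate) are copied
    unchanged, so a 192 kHz input gives 192 kHz outputs.
    Returns the list of paths written.
    """
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start_s, end_s) in enumerate(segments):
            start = max(0, min(total, int(round(start_s * rate))))
            end = max(start, min(total, int(round(end_s * rate))))
            src.setpos(start)  # seek without reading the frames before it
            frames = src.readframes(end - start)
            out_path = out_pattern.format(i)
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed on close
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Because `setpos` seeks directly to each start frame, only the requested slices are read, which matters for ~1 GB inputs.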
I'm new to working with large amounts of data. I have a pretty big data set (around 1 million audio files each a couple seconds long), and I'm trying to load the data in an efficient manner for visualization purposes (and eventually to use as training data in a neural network).
What I've tried so far is using librosa (used librosa.load(filename)) but this took a couple hours just to load 10,000 of the files. I tried to find out if I could use a GPU to speed it up (fumbled around with Numba) but I'm not clear if this is even a valid problem for a GPU to solve.
I feel like I'm missing something really obvious. Can someone more experienced tell me what to do? I am having a hard time trying to find the solution on the Internet. Thanks for the help!
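One likely cost worth ruling out first: `librosa.load` resamples every file to 22050 Hz by default, and `librosa.load(filename, sr=None)` keeps each file's native rate instead. Beyond that, loading is mostly file I/O and decoding, which usually parallelizes well across files on the CPU. A minimal sketch with the standard library, assuming the files are 16-bit PCM WAV (other formats would need a different decoder); `load_wav` and `load_many` are hypothetical names.

```python
import wave
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def load_wav(path):
    """Decode one 16-bit PCM WAV file to float32 in [-1, 1) at its native rate."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
    return samples.reshape(-1, channels), rate  # shape (frames, channels)


def load_many(paths, max_workers=8):
    """Load files concurrently; results come back in the same order as paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_wav, paths))
```

Threads help most when disk reads dominate; if decoding turns out to be CPU-bound, swapping in `concurrent.futures.ProcessPoolExecutor` is a small change.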
You could use pygame.
In this mini program I made, I tested how long it takes to load a sound file that is about 10 seconds long:
import pygame
import time
pygame.init()
time_now = time.time()
pygame.mixer.music.load('music.wav')  # returns None; music is streamed, so the whole file is not decoded here
print(time.time() - time_now)
And the result is:
0.0
And if you want to play that file, you do:
pygame.mixer.music.play(loops=0, start=0.0)  # loops: number of repeats; start: position in seconds
Loading all of them will still take about 1-4 hours.
For further info, go to https://www.pygame.org/docs/ref/music.html .