Concatenated audio clips come out broken - python

I am concatenating a couple of audio clips using moviepy, but roughly every other time there is a hiss or other stray noise at the point where two files join. How can I fix this?
Code:
import os
from moviepy.editor import AudioFileClip, concatenate_audioclips

cwd = os.getcwd()
clips = []
for x in os.listdir(os.path.join(cwd, "SpeechFolder")):
    clips.append(AudioFileClip(os.path.join(cwd, "SpeechFolder", x)))
speech = concatenate_audioclips(clips)
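One frequent cause of clicks or hiss at the joins is clips decoded with mismatched sample rates or sample widths. Below is a hedged sketch of that mitigation, not a confirmed fix for this exact case; the 44100 Hz rate, 2-byte width, and sorted file order are my assumptions:

import os
from moviepy.editor import AudioFileClip, concatenate_audioclips

folder = os.path.join(os.getcwd(), "SpeechFolder")
# Decode every file at the same sample rate (fps) and sample width, so the
# concatenated stream does not switch formats at the clip boundaries.
clips = [
    AudioFileClip(os.path.join(folder, name), fps=44100, nbytes=2)
    for name in sorted(os.listdir(folder))
]
speech = concatenate_audioclips(clips)
speech.write_audiofile("speech.wav", fps=44100)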

Related

Problem Finding Differences Between Objects

I'm a self-taught Python student in the early stages of the process. I've decided to write code that does some reverse engineering on Adobe Premiere's PRPROJ files. These files are gzipped XML, and I've managed to parse them, extract most of the attributes I wanted, and store them in objects. I've since found out that what I'm trying to do is essentially an open-source API for reading PRPROJ files. I might learn something along the way and still have a lot to learn, so thanks for your patience.
Now, when trying to find the inner timecodes of audio clips in timelines, I couldn't find the right criteria to distinguish between clip kinds in the XML. Premiere seems to know the difference between them, but I can't find it.
My first hypothesis was that the difference was about audio having embedded timecode or not. My second was file formats (WAVs, AIFFs, MP3s, etc.), and now I'm drawing a total blank.
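For reference, loading a project as described above can be sketched like this. Only the gzip + XML detail is taken from the description; the ClipLogging element name is hypothetical:

import gzip
import xml.etree.ElementTree as ET

# A PRPROJ file is a gzip-compressed XML document, per the description above.
with gzip.open("project.prproj", "rb") as f:
    tree = ET.parse(f)
root = tree.getroot()

# Example: print the media in/out points of every clip-logging node.
for clip_loggings in root.iter("ClipLogging"):  # element name is hypothetical
    print(clip_loggings.findtext("MediaInPoint"), clip_loggings.findtext("MediaOutPoint"))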
Audio clips in timelines are the result of different XML objects combined in this way:
Object Structure in XML
So I've made a Premiere project containing different kinds of audio clips and tried different code to actually retrieve In and Out points for the audio selection.
Representation of Timeline and Clip's in and outs
I've managed to successfully retrieve 3 of the 4 inPoints and outPoints, by adding each XML object's inPoint + MediaInPoint and outPoint + MediaOutPoint, properly divided by the corresponding MediaFrameRate attribute (for some reason still unknown to me, some kinds use the clip's MediaFrameRate attributes, while others use the ProjectSettings' frame rate).
**Sample 1 (.mov's audio coming from video file)**
<MediaInPoint>0</MediaInPoint>
<MediaOutPoint>23843635200000</MediaOutPoint>
<MediaFrameRate>4233600000</MediaFrameRate>
It was successfully calculated this way:
cframerate = int(clip_loggings.find('MediaFrameRate').text)
cinpoint = int(clip_loggings.find('MediaInPoint').text)
cinpoint = round((media_outpoint / cframerate))
clip_timecode_in = round((clip_timecode_in / cframerate)) + cinpoint
clip_timecode_out = round((clip_timecode_out - 1) / cframerate) + cinpoint - 1
**Sample 2 (mp3 file; this particular file, I found, sources its timecode from the Project's)**
<MediaInPoint>0</MediaInPoint>
<MediaOutPoint>52835328000000</MediaOutPoint>
<MediaFrameRate>5760000</MediaFrameRate>
VideoSettings' <FrameRate>8475667200</FrameRate>
proj_fr_ref = root.find('ProjectSettings/VideoSettings').get('ObjectRef')
cinpoint = int(clip_loggings.find('MediaInPoint').text)
cinpoint = round((media_outpoint / cframerate))
for proj_frs in root.findall('VideoSettings'):
    if proj_frs.get('ObjectID') == proj_fr_ref:
        if clip_speed > 0:
            cframerate = int(proj_frs.find('FrameRate').text)
clip_timecode_in = round((clip_timecode_in) / cframerate) + cinpoint  # Seems to be linked to Project's VideoSettings' FrameRate. Why???
clip_timecode_out = round((clip_timecode_out - 1) / cframerate) + cinpoint - 1
**Sample 3 (WAV file; the '2000' is a factor I found empirically)**
<MediaInPoint>472035821292000</MediaInPoint>
<MediaOutPoint>8715173208084000</MediaOutPoint>
<MediaFrameRate>5292000</MediaFrameRate>
<TimecodeFormat>200</TimecodeFormat>
cinpoint = int(clip_loggings.find('MediaInPoint').text)
cinpoint = round((media_outpoint / media_framerate / 2000))
clip_timecode_in = round((clip_timecode_in / cframerate / 2000)) + cinpoint
clip_timecode_out = round((clip_timecode_out - 1) / cframerate / 2000) + cinpoint - 1
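A hedged observation that may tie these samples together (my assumption, not something stated in the question): Premiere is widely reported to use 254016000000 ticks per second internally, which would make MediaFrameRate the number of ticks per frame (or per audio sample). Under that reading, Sample 1's 4233600000 works out to 60 fps, Sample 2's 5760000 to 44100 audio samples per second, and Sample 3's 5292000 to 48000 samples per second, so the empirical 2000 is just 48000 / 24, a conversion from 48 kHz samples to 24 fps frames:

# Values from Sample 3; the ticks-per-second constant is an assumption
# about Premiere's internals, not taken from the project file itself.
TICKS_PER_SECOND = 254016000000
media_framerate = 5292000                          # ticks per audio sample
samples_per_second = TICKS_PER_SECOND / media_framerate
print(samples_per_second)        # 48000.0 -> a 48 kHz WAV
print(samples_per_second / 24)   # 2000.0  -> the empirical factor at 24 fps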
**Sample 4 (Criteria Pending)**
<CaptureMode>1</CaptureMode>
<ClipName>Sample 4.wav</ClipName>
<MediaInPoint>0</MediaInPoint>
<MediaOutPoint>186823687680000</MediaOutPoint>
<MediaFrameRate>5292000</MediaFrameRate>
<TimecodeFormat>200</TimecodeFormat>
The last one, the second WAV (Sample 4), I couldn't make work, and it made me realize my criteria were wrong. What could it be?
Please help me! The complete information needed is uploaded to a Google Drive folder here: https://drive.google.com/drive/folders/1zbK42WFh4SN-8-ppo7QMSkTsXlZEv9MB

Concatenate a video, image and audio using ffmpeg

I am trying to concatenate a group of images with associated audio, with a video clip at the start and end of the video. Whenever I concatenate an image with its associated audio, it doesn't play back correctly in VLC media player: it only displays the image for a frame before cutting to black while the audio continues playing. I came across this GitHub issue: https://github.com/kkroening/ffmpeg-python/issues/274 where the accepted solution was the one I implemented, but one of the comments mentioned this same issue of incorrect playback and an error on YouTube.
def generate_clip(img):
    """
    Generates a clip from an image and a wav file,
    helper function for export_video
    """
    transition_cond = os.path.exists("static/transitions/" + img + ".mp4")
    chart_path = os.path.exists("charts/" + img + ".png")
    if transition_cond:
        clip = ffmpeg.input("static/transitions/" + img + ".mp4")
    elif chart_path:
        clip = ffmpeg.input("charts/" + img + ".png")
    else:
        clip = ffmpeg.input("static/transitions/Transition.jpg")
    audio_clip = ffmpeg.input("audio/" + img + ".wav")
    clip = ffmpeg.concat(clip, audio_clip, v=1, a=1)
    clip = ffmpeg.filter(clip, "setdar", "16/9")
    return clip
def export_video(CHARTS):
    """
    Combines the charts from charts/ and the audio from audio/
    to generate one final video that will be uploaded to Youtube
    """
    clips = []
    intro = generate_clip("Intro")
    clips.append(intro)
    for key in CHARTS.keys():
        value = CHARTS.get(key)
        value.insert(0, key)
        subclip = []
        for img in value:
            subclip.append(generate_clip(img))
        concat_clip = ffmpeg.concat(*subclip)
        clips.append(concat_clip)
    outro = generate_clip("Outro")
    clips.append(outro)
    concat_clip = ffmpeg.concat(*clips)
    concat_clip.output("export/export.mp4").run(overwrite_output=True)
It is unfortunate that the concat filter does not offer the shortest option like overlay does. Anyway, the issue here is that the image2 demuxer uses 25 fps by default, so a video stream built from a single image only lasts 1/25 of a second. There are several ways to address this, but you first need to get the duration of the paired audio files. To incorporate the duration information into the ffmpeg command, you can:
1. Use the tpad filter for each video (in series with setdar) to pad the video duration to match the audio. The padded amount should be 1/25 second less than the audio duration.
2. Specify the -loop 1 input option so the image loops (indefinitely), then specify an additional -t {duration} input option to limit the number of loops (see the sketch after this list). Caution: the video duration may not be exact.
3. Specify -r {1/duration} so the image lasts as long as the audio, and use the fps filter on each input to set the output frame rate.
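For a single image/audio pair, option 2 can be sketched as one plain ffmpeg command. This is a hedged illustration, not code from the answer: the file names are invented, the WAV is assumed to be exactly 4 seconds long, and -pix_fmt yuv420p is added only for player compatibility:
ffmpeg -loop 1 -t 4 -i chart.png -i chart.wav -vf "setdar=16/9,fps=30" -pix_fmt yuv420p -shortest out.mp4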
I'm not familiar with ffmpeg-python, so I cannot provide a solution with it, but if you're interested, I'd be happy to post equivalent code using my ffmpegio package.
[edit]
ffmpegio Solution
Here is how I'd code the 3rd solution with ffmpegio:
import ffmpegio
from os import path

def generate_clip(img):
    """
    Generates a clip from an image and a wav file,
    helper function for export_video
    """
    transition_cond = path.exists("static/transitions/" + img + ".mp4")
    chart_path = path.exists("charts/" + img + ".png")
    if transition_cond:
        video_file = "static/transitions/" + img + ".mp4"
    elif chart_path:
        video_file = "charts/" + img + ".png"
    else:
        video_file = "static/transitions/Transition.jpg"
    audio_file = "audio/" + img + ".wav"
    video_opts = {}
    if not transition_cond:
        # audio_streams_basic() returns audio duration in seconds as Fraction;
        # set the "framerate" of the video to be the reciprocal
        info = ffmpegio.probe.audio_streams_basic(audio_file)
        video_opts["r"] = 1 / info[0]["duration"]
    return [(video_file, video_opts), (audio_file, None)]
def export_video(CHARTS):
    """
    Combines the charts from charts/ and the audio from audio/
    to generate one final video that will be uploaded to Youtube
    """
    # get all input files (video/audio pairs)
    clips = [
        generate_clip("Intro"),
        *(generate_clip(img) for key, value in CHARTS.items() for img in value),
        generate_clip("Outro"),
    ]
    # number of clips
    nclips = len(clips)
    # filter chains to set DAR and fps of all video streams
    vfilters = (f"[{2*n}:v]setdar=16/9,fps=30[v{n}]" for n in range(nclips))
    # concatenation filter input: [v0][1:a][v1][3:a][v2][5:a]...
    concatfilter = "".join((f"[v{n}][{2*n+1}:a]" for n in range(nclips))) + f"concat=n={nclips}:v=1:a=1[vout][aout]"
    # form the full filtergraph
    fg = ";".join((*vfilters, concatfilter))
    # set output file and options
    output = ("export/export.mp4", {"map": ["[vout]", "[aout]"]})
    # run ffmpeg
    ffmpegio.ffmpegprocess.run(
        {
            "inputs": [input for pair in clips for input in pair],
            "outputs": [output],
            "global_options": {"filter_complex": fg},
        },
        overwrite=True,
    )
Since this code does not use the read/write features, ffmpegio-core package suffices:
pip install ffmpegio-core
Make sure that FFmpeg binary can be found by ffmpegio. See the installation doc.
Here are the direct links to the documentation of the functions used:
ffmpegprocess.run
ffmpeg_args dict argument
probe.audio_streams_basic (ignore the documentation error: duration and start_time are both of Fraction type)
The code has not been fully validated. If you encounter a problem, it might be easiest to post it on the GitHub Discussions to proceed.

Split audio on timestamps librosa

I have an audio file and I want to split it every 2 seconds. Is there a way to do this with librosa?
So if I had a 60 seconds file, I would split it into 30 two second files.
librosa is first and foremost a library for audio analysis, not audio synthesis or processing. The support for writing simple audio files is given (see here), but it is also stated there:
This function is deprecated in librosa 0.7.0. It will be removed in 0.8. Usage of write_wav should be replaced by soundfile.write.
Given this information, I'd rather use a tool like sox to split audio files.
From "Split mp3 file to TIME sec each using SoX":
You can run SoX like this:
sox file_in.mp3 file_out.mp3 trim 0 2 : newfile : restart
It will create a series of files with a 2-second chunk of the audio each.
If you'd rather stay within Python, you might want to use pysox for the job.
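A minimal pysox sketch of that job might look like the following (hedged: the input file name is invented, and since pysox's Transformer.trim cuts one segment per call, the 2-second windows are produced in a loop rather than with SoX's newfile/restart trick):

import sox

in_file = "input.wav"  # hypothetical input path
duration = sox.file_info.duration(in_file)  # total length in seconds

start = 0.0
counter = 1
while start < duration:
    tfm = sox.Transformer()
    tfm.trim(start, min(start + 2.0, duration))  # cut one 2-second window
    tfm.build(in_file, "split_" + str(counter) + ".wav")
    start += 2.0
    counter += 1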
You can split your file using librosa by running the following code. I have added the comments necessary for you to understand the steps carried out.
import librosa

file_name = "my_audio.wav"  # path to the input file

# First load the file
audio, sr = librosa.load(file_name)
# Get number of samples for 2 seconds; replace 2 by any number
buffer = 2 * sr
samples_total = len(audio)
samples_wrote = 0
counter = 1
while samples_wrote < samples_total:
    # check if the buffer is not exceeding total samples
    if buffer > (samples_total - samples_wrote):
        buffer = samples_total - samples_wrote
    block = audio[samples_wrote : (samples_wrote + buffer)]
    out_filename = "split_" + str(counter) + "_" + file_name
    # Write 2 second segment
    librosa.output.write_wav(out_filename, block, sr)
    counter += 1
    samples_wrote += buffer
[Update]
librosa.output.write_wav() has been removed from librosa, so we now have to use soundfile.write().
Import the required library:
import soundfile as sf
Then replace
librosa.output.write_wav(out_filename, block, sr)
with
sf.write(out_filename, block, sr)

Python - Overlay more than 3 WAV files end to end

I am trying to overlap the end of one wav file with 20% of the start of the next file. Like this, there are a variable number of files to overlap (usually around 5-6).
I have tried expanding the following pydub implementation for overlaying 2 wav files:
from pydub import AudioSegment
sound1 = AudioSegment.from_wav("/path/to/file1.wav")
sound2 = AudioSegment.from_wav("/path/to/file2.wav")
# mix sound2 with sound1, starting at 70% into sound1
output = sound1.overlay(sound2, position=0.7 * len(sound1))
# save the result
output.export("mixed_sounds.wav", format="wav")
And wrote the following program:
for i in range(0, len(files_to_combine) - 1):
    if 'full_wav' in locals():
        prev_wav = full_wav
    else:
        prev = files_to_combine[i]
        prev_wav = AudioSegment.from_wav(prev)
    next = files_to_combine[i + 1]
    next_wav = AudioSegment.from_wav(next)
    new_wave = prev_wav.overlay(next_wav, position=len(prev_wav) - 0.3 * len(next_wav))
    new_wave.export('partial_wav.wav', format='wav')
    full_wav = AudioSegment.from_wav('partial_wav.wav')
However, when I look at the final wave file, only the first 2 files in the list files_to_combine were actually combined, not the rest. The idea was to continuously rewrite partial_wav.wav until it finally contains the full wav file of the near end-to-end overlapped sounds. To debug this, I stored new_wave in a different file for every combination. The first wave file is the same as the last: it only shows the first 2 wave files combined instead of the entire thing. Furthermore, I expected len(partial_wav) to gradually increase with every iteration. However, it remains the same after the first combination:
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
partial_wave : 237
MAIN QUESTION
How do I overlap the end of one wav file (about the last 30%) with the beginning of the next for more than 3 wave files?
I believe you can just keep cascading AudioSegments until your final segment, as below.
Working Code:
from pydub import AudioSegment
from pydub.playback import play
sound1 = AudioSegment.from_wav("SineWave_440Hz.wav")
sound2 = AudioSegment.from_wav("SineWave_150Hz.wav")
sound3 = AudioSegment.from_wav("SineWave_660Hz.wav")
# mix sound2 with sound1, starting at 70% into sound1
tmpsound = sound1.overlay(sound2, position=0.7 * len(sound1))
# mix sound3 with sound1+sound2, starting at 30% into sound1+sound2
output = tmpsound.overlay(sound3, position=0.3 * len(tmpsound))
play(output)
output.export("mixed_sounds.wav", format="wav")
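A side note beyond the answer above (my own sketch, hedged): overlay() never extends the segment it is called on, which is also why the asker's partial_wav.wav stopped growing. If the goal is chaining many files with an end-to-start overlap, AudioSegment.append() with a crossfade preserves the full combined length; the file names here are invented, and the 30% figure is the question's:

from pydub import AudioSegment

files = ["a.wav", "b.wav", "c.wav", "d.wav"]  # hypothetical input list
segments = [AudioSegment.from_wav(f) for f in files]

combined = segments[0]
for seg in segments[1:]:
    # overlap: 30% of the incoming clip, in milliseconds
    overlap = int(0.3 * len(seg))
    combined = combined.append(seg, crossfade=overlap)

combined.export("overlapped.wav", format="wav")

Note that append() crossfades (fades one clip out while fading the next in) rather than mixing at full level; if a plain mix is required, overlay each clip onto a silent segment of the final target length instead.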

Python, pydub splitting an audio file

Hi, I am using pydub to split an audio file, giving the ranges to take segments from the original.
What I have is:
from pydub import AudioSegment
sound_file = AudioSegment.from_mp3("C:\\audio file.mp3")
# milliseconds in the sound track
ranges = [(30000,40000),(50000,60000),(80000,90000),(100000,110000),(150000,180000)]
for x, y in ranges:
    new_file = sound_file[x : y]
    new_file.export("C:\\" + str(x) + "-" + str(y) + ".mp3", format="mp3")
It works well for the first 3 new files, but not for the rest: they are not split at the given ranges.
Does the problem lie in the way I give the ranges?
Thank you.
Add-on:
Even when it's made simple, for example
sound_file[150000:180000]
exported to an mp3 file, it works but cuts the 50000:80000 part instead. It seems not to read the correct range.
Try this, it might work:
import pydub
import numpy as np

sound_file = pydub.AudioSegment.from_mp3("a.mp3")
sound_file_Value = np.array(sound_file.get_array_of_samples())
# interleaved samples per millisecond (accounts for multi-channel audio)
samples_per_ms = sound_file.frame_rate * sound_file.channels // 1000
# milliseconds in the sound track
ranges = [(30000,40000),(50000,60000),(80000,90000),(100000,110000),(150000,180000)]
for x, y in ranges:
    # convert the millisecond range to sample indices before slicing
    new_file = sound_file_Value[x * samples_per_ms : y * samples_per_ms]
    song = pydub.AudioSegment(new_file.tobytes(), frame_rate=sound_file.frame_rate, sample_width=sound_file.sample_width, channels=sound_file.channels)
    song.export(str(x) + "-" + str(y) + ".mp3", format="mp3")
