Suppose I have an MP3 audio file with length 00:04:00 (240 seconds). I want to extract parts of the file, each 2 seconds long, so it would be:
File_01 00:00:00-00:00:02, File_02 00:00:02-00:00:04, File_03 00:00:04-00:00:06 ... File_120 00:03:58-00:04:00.
I am using Python, calling the subprocess module to run ffmpeg. What I did was simply put it in a loop like this:
import subprocess

count = 0
count2 = 2
count3 = 1
while count2 <= audio_length:
    ffmpeg = 'ffmpeg -i input.mp3 -c copy -ss %d -to %d output%d.wav' % (count, count2, count3)
    subprocess.call(ffmpeg, shell=True)
    count = count + 2
    count2 = count2 + 2
    count3 = count3 + 1
However, the subprocess calls take a long time and the script seems stuck. I've searched for insights, but I haven't found any that mention looping. Any help appreciated.
If you are open to using an external library and working with an offline audio file, pydub has a built-in utility to make chunks of playable audio of a given length.
Simply call the make_chunks method from pydub.utils, provide the chunk size, and export the playable audio chunks.
I took a 34.6-second file and split it into 18 chunks of 2 seconds each. The last chunk may be shorter than 2 seconds depending on the total length; in my case it was 0.6 seconds.
Working Code:
from pydub import AudioSegment
from pydub.utils import make_chunks

audiofile = 'example.wav'

# set chunk duration in milliseconds
chunk_duration = 2000  # 2 seconds

# Convert audio to audio segment
audio_segment = AudioSegment.from_wav(audiofile)
print("audio length in seconds={}".format(len(audio_segment) / 1000.0))

# make chunks
chunks = make_chunks(audio_segment, chunk_duration)

end = 0
for idx, chunk in enumerate(chunks):
    start = end
    end = start + (chunk_duration // 1000)
    count = idx + 1
    print("Exporting File_{}_{}:{}.wav".format(count, start, end))
    # export() defaults to mp3, so pass format="wav" explicitly
    chunk.export("File_{}_{}:{}.wav".format(count, start, end), format="wav")
Output:
$ python splitaudio.py
audio length in seconds=34.6
Exporting File_1_0:2.wav
Exporting File_2_2:4.wav
Exporting File_3_4:6.wav
Exporting File_4_6:8.wav
Exporting File_5_8:10.wav
Exporting File_6_10:12.wav
Exporting File_7_12:14.wav
Exporting File_8_14:16.wav
Exporting File_9_16:18.wav
Exporting File_10_18:20.wav
Exporting File_11_20:22.wav
Exporting File_12_22:24.wav
Exporting File_13_24:26.wav
Exporting File_14_26:28.wav
Exporting File_15_28:30.wav
Exporting File_16_30:32.wav
Exporting File_17_32:34.wav
Exporting File_18_34:36.wav
The answer from Anil_M is completely valid, but I thought it would be good to mention two other ways that are fast and do not require scripting in Python (plus they offer a ton of extra features should you need them).
With ffmpeg, which you have already tried:
ffmpeg -i input.mp3 -f segment -segment_time 2 -c copy output%03d.mp3
And SoX:
sox input.mp3 output.mp3 trim 0 2 : newfile : restart
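And if you want to stay in Python as in the question, the ffmpeg variant above can be driven through subprocess; a minimal sketch (one ffmpeg process for the whole job, and an argument list instead of shell=True so there are no quoting issues; assumes input.mp3 is in the working directory):

import subprocess

# Split input.mp3 into 2-second pieces in a single ffmpeg invocation
subprocess.run(
    ["ffmpeg", "-i", "input.mp3",
     "-f", "segment", "-segment_time", "2",
     "-c", "copy",
     "output%03d.mp3"],
    check=True,
)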
Related
Is it normal that the gzip algorithm can make files larger after compression?
For example, I need to split a large file of 8.2 MB into 101,024 small chunks of 81 bytes each and compress them using the gzip library. After it's done, I see that the folder with the gzipped files has become larger: it is now 13 MB, compared with the total size of the chunks without compression. Here is a piece of the code:
import gzip
import os

def gzip_it(filenumber, chunk, path=FOLDER_PATH, prefix=FILE_NAME_PREFIX):
    with gzip.open(os.path.join(path, prefix + "{:07d}".format(filenumber) + ".gz"), mode="wb") as chunk_file:
        chunk_file.write(gzip.compress(chunk))

def split_and_write(file, thread_num):
    spare_to_distribute_inner = SPARE_TO_DISTRIBUTE
    initial_position = 0 if thread_num == 0 else BYTES_PER_THREAD * thread_num
    initial_file_num = 0 if thread_num == 0 else FILES_PER_THREAD * thread_num
    with open(file, mode="rb") as file:
        file.seek(initial_position)
        while initial_file_num < FILES_PER_THREAD * (thread_num + 1):
            if spare_to_distribute_inner:
                chunk = file.read(CHUNK_FILE_SIZE + 1)
                gzip_it(initial_file_num, chunk)
                initial_file_num += 1
                initial_position += (CHUNK_FILE_SIZE + 1)
                spare_to_distribute_inner -= 1
            else:
                if initial_file_num == FILES_TOTAL - 1:
                    chunk = file.read(CHUNK_FILE_SIZE + SPARE_TO_DISTRIBUTE_REMAINDER)
                    gzip_it(initial_file_num, chunk)
                    make_marker_file(str(SOURCE_FILE_SIZE).encode())
                    break
                else:
                    chunk = file.read(CHUNK_FILE_SIZE)
                    gzip_it(initial_file_num, chunk)
                    initial_file_num += 1
                    initial_position += CHUNK_FILE_SIZE

def main():
    for thread in range(VIRTUAL_THREADS):
        pool.submit(split_and_write, "cry_cmake.exe", thread)
Yes, it is completely normal that files become larger after compression. This usually happens with files that are already compressed.
What you are doing is wrong: your chunks are too small to be compressed meaningfully. Try making chunks of 1 MiB or more.
Basically, during compression the algorithm looks for repeated sequences and shortens them, creating an initial dictionary that maps the original sequences to the shortened versions.
If the chunks are that small, it can't really find long repeated sequences, and it needs to repeat this initial dictionary for every chunk.
Why do you want to split the original file first and compress each mini-chunk by itself? In most use cases, people compress first and split afterwards.
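To make the overhead concrete, here is a minimal sketch (the data is made up, standing in for an already-compressed file): every gzip member repeats its own header, trailer, and block bookkeeping, so 81-byte chunks compressed one by one grow the total far more than compressing the data as a single stream.

import gzip
import os

# ~1 MiB of incompressible data, like an already-compressed file
data = os.urandom(1024 * 1024)

whole = len(gzip.compress(data))
chunked = sum(len(gzip.compress(data[i:i + 81]))
              for i in range(0, len(data), 81))

print("original size:   ", len(data))
print("one gzip stream: ", whole)    # slightly larger: nothing to compress
print("81-byte streams: ", chunked)  # much larger: ~18 bytes of gzip
                                     # header/trailer per tiny chunk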
An alternative for your case would be to split the original file into the mini-chunks but not compress each of them separately; instead, put all of them in one directory and then make a .tgz out of the directory:
tar -c -z -f result.tgz chunks_directory/
Then the compression takes place after tar has bundled all the files, and after unpacking you will get all the mini-chunk files back.
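From Python, the same idea is a few lines with the standard library (a minimal sketch, assuming the chunks are in chunks_directory/):

import tarfile

# Bundle the uncompressed chunks, gzipping the whole archive once
with tarfile.open("result.tgz", "w:gz") as tar:
    tar.add("chunks_directory")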
I have an audio file and I want to split it every 2 seconds. Is there a way to do this with librosa?
So if I had a 60-second file, I would split it into 30 two-second files.
librosa is first and foremost a library for audio analysis, not audio synthesis or processing. Support for writing simple audio files is there (see here), but the documentation also states:
This function is deprecated in librosa 0.7.0. It will be removed in 0.8. Usage of write_wav should be replaced by soundfile.write.
Given this information, I'd rather use a tool like sox to split audio files.
From "Split mp3 file to TIME sec each using SoX":
You can run SoX like this:
sox file_in.mp3 file_out.mp3 trim 0 2 : newfile : restart
It will create a series of files, each containing a 2-second chunk of the audio.
If you'd rather stay within Python, you might want to use pysox for the job.
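For completeness, here is a rough pysox sketch of the same split (the file names are made up; pysox has no direct counterpart of SoX's newfile/restart pseudo-effects, so it builds one output file per window):

import sox

# Split input.mp3 into consecutive 2-second windows
duration = sox.file_info.duration("input.mp3")
start, index = 0.0, 1
while start < duration:
    tfm = sox.Transformer()
    tfm.trim(start, min(start + 2.0, duration))  # keep only this window
    tfm.build("input.mp3", "output_%03d.mp3" % index)
    start += 2.0
    index += 1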
You can split your file using librosa by running the following code. I have added the necessary comments so that you understand the steps carried out.
import librosa

# First load the file
audio, sr = librosa.load(file_name)

# Get number of samples for 2 seconds; replace 2 by any number
buffer = 2 * sr

samples_total = len(audio)
samples_wrote = 0
counter = 1

while samples_wrote < samples_total:
    # check that the buffer is not exceeding total samples
    if buffer > (samples_total - samples_wrote):
        buffer = samples_total - samples_wrote

    block = audio[samples_wrote : (samples_wrote + buffer)]
    out_filename = "split_" + str(counter) + "_" + file_name

    # Write 2 second segment
    librosa.output.write_wav(out_filename, block, sr)
    counter += 1
    samples_wrote += buffer
[Update]
librosa.output.write_wav() has been removed from librosa, so we now have to use soundfile.write().
Import the required library:
import soundfile as sf
and replace
librosa.output.write_wav(out_filename, block, sr)
with
sf.write(out_filename, block, sr)
I would like to generate text files for frames extracted with ffmpeg, each containing the subtitle shown in the frame (if any), from a video into which I have burned the subtitles, also using ffmpeg.
I use a Python script with pysrt to open the SubRip file and generate the text files.
Each frame is named with its frame number by ffmpeg, and since the frames are extracted at a constant rate, I can easily retrieve the time position of a frame using the formula t1 = fnum/fps, where fnum is the frame number taken from the filename and fps is the frequency passed to ffmpeg for the frame extraction.
Even though I am using the same subtitle file that was burned into the video to retrieve the text positions on the timeline, I still get accuracy errors: mostly, some text files are missing, and some are present that shouldn't be.
Because time is not really continuous when talking about frames, I have tried recalibrating t using the fps of the video with the hardcoded subtitles; let's call it vfps (I have ensured that the video fps is the same before and after subtitle burning). I get the formula: t2 = int(t1*vfps)/vfps.
It still is not 100% accurate.
For example, my video is at 30fps (vfps=30) and I extracted frames at 4fps (fps=4).
The extracted frame 166 (fnum=166) shows no subtitle. In the subrip file, the previous subtitle ends at t_prev=41.330 and the next subtitle begins at t_next=41.400, which means that t_sub should satisfy: t_prev < t_sub and t_sub < t_next, but I can't make this happen.
Formulas I have tried:
t1 = fnum/fps # 41.5 > t_next
t2 = int(fnum*vfps/fps)/vfps # 41.5 > t_next
# is it because of an indexing problem? No:
t3 = (fnum-1)/fps # 41.25 < t_prev
t4 = int((fnum-1)*vfps/fps)/vfps # 41.23333333 < t_prev
t5 = int(fnum*vfps/fps - 1)/vfps # 41.466666 > t_next
t6 = int((fnum-1)*vfps/fps + 1)/vfps # 41.26666 < t_prev
Command used:
# burning subtitles
# (previously)
# ffmpeg -r 25 -i nosub.mp4 -vf subtitles=sub.srt withsub.mp4
# now:
ffmpeg -i nosub.mp4 -vf subtitles=sub.srt withsub.mp4
# frames extraction
ffmpeg -i withsub.mp4 -vf fps=4 extracted/%05d.bmp -hide_banner
Why does this happen and how can I solve this?
One thing I have noticed: if I extract frames from the original video and from the subtitled one and take the difference of the frames, the result is not only the subtitles; there are variations in the background (which shouldn't happen). If I do the same experiment using the same video twice, the difference is null, which means the frame extraction is consistent.
Code for the difference:
ffmpeg -i withsub.mp4 -vf fps=4 extracted/%05d.bmp -hide_banner
ffmpeg -i nosub.mp4 -vf fps=4 extracted_no_sub/%05d.bmp -hide_banner
for img in extracted_no_sub/*.bmp; do
    convert extracted/${img##*/} $img -compose minus -composite diff/${img##*/}
done
Thanks.
You can extract frames with accurate timestamps like this:
ffmpeg -i nosub.mp4 -vf subtitles=sub.srt,settb=AVTB,select='if(eq(n\,0)\,1\,floor(4*t)-floor(4*prev_t))' -vsync 0 -r 1000 -frame_pts true extracted/%08d.bmp
This will extract the first frame from each quarter second. The output filename is 8 characters long, where the first 5 digits are seconds and the last three are milliseconds. You can change the field size based on the maximum file duration.
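With that filename scheme, mapping a frame back to its subtitle from Python is direct; here is a sketch with pysrt (the helper name and paths are made up for illustration):

import pysrt
from pysrt import SubRipTime

def frame_subtitle(filename, subs):
    # e.g. "extracted/00041366.bmp" -> 41 s, 366 ms
    stem = filename.rsplit("/", 1)[-1].split(".")[0]
    t = SubRipTime(seconds=int(stem[:5]), milliseconds=int(stem[5:]))
    # keep subtitles whose display interval contains t
    hits = subs.slice(starts_before=t, ends_after=t)
    return " ".join(item.text for item in hits)

subs = pysrt.open("sub.srt")
print(frame_subtitle("extracted/00041366.bmp", subs))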
I was just playing around with sound input and output on a Raspberry Pi using Python.
My plan was to read the input of a microphone, manipulate it, and play back the manipulated audio. For now, I have tried just reading and playing back the audio.
The reading seems to work, since I wrote the read data into a wave file in the last step, and the wave file seemed fine.
But the playback is only noise.
Playing the wave file worked as well, so the headset is fine.
I think maybe I have some problem in my settings or the output format.
The code:
import alsaaudio as audio
import time
import audioop

# Input & Output Settings
periodsize = 1024
audioformat = audio.PCM_FORMAT_FLOAT_LE
channels = 16
framerate = 8000

# Input Device
inp = audio.PCM(audio.PCM_CAPTURE, audio.PCM_NONBLOCK, device='hw:1,0')
inp.setchannels(channels)
inp.setrate(framerate)
inp.setformat(audioformat)
inp.setperiodsize(periodsize)

# Output Device
out = audio.PCM(audio.PCM_PLAYBACK, device='hw:0,0')
out.setchannels(channels)
out.setrate(framerate)
out.setformat(audioformat)
out.setperiodsize(periodsize)

# Reading the Input
allData = bytearray()
count = 0
while True:
    # reading the input into one long bytearray
    l, data = inp.read()
    for b in data:
        allData.append(b)
    # Just an ending condition
    count += 1
    if count == 4000:
        break
    time.sleep(.001)

# splitting the bytearray into period sized chunks
list1 = [allData[i:i+periodsize] for i in range(0, len(allData), periodsize)]

# Writing the output
for arr in list1:
    # I tested writing the arr's to a wave file at this point
    # and the wave file was fine
    out.write(arr)
Edit: Maybe I should mention that I am using Python 3.
I just found the answer. audioformat = audio.PCM_FORMAT_FLOAT_LE is not the format used by my headset (I had just copied and pasted it without a second thought).
I found out about my microphone's format (and additional information) by running speaker-test in the console.
Since my speaker's format is S16_LE, the code works fine with audioformat = audio.PCM_FORMAT_S16_LE
Consider using plughw (the ALSA subsystem supporting resampling/conversion), at least for the sink part of the chain:
#Output Device
out = audio.PCM(audio.PCM_PLAYBACK,device='plughw:0,0')
This should help negotiate the sampling rate as well as the data format.
periodsize is better estimated as a fraction of the sample rate, for example:
periodsize = framerate / 8 (one eighth of a second at an 8000 Hz sample rate)
and the sleep time is better estimated as half of the time necessary to play one period:
sleeptime = 1.0 / 16 (1.0 is one second; half of the 1/8-second period at an 8000 Hz sample rate)
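Put together, a sketch of the output side with these suggestions applied (the device name and S16_LE format are taken from this thread, not verified against your hardware):

import alsaaudio as audio

framerate = 8000
periodsize = framerate // 8   # 1/8 s worth of frames per period
sleeptime = 1.0 / 16          # half the period duration, in seconds

# plughw lets ALSA resample/convert if the hardware doesn't support
# the requested rate or format natively
out = audio.PCM(audio.PCM_PLAYBACK, device='plughw:0,0')
out.setchannels(16)
out.setrate(framerate)
out.setformat(audio.PCM_FORMAT_S16_LE)
out.setperiodsize(periodsize)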
Rather than crawl PubChem's website, I'd prefer to be nice and generate the images locally from the PubChem ftp site:
ftp://ftp.ncbi.nih.gov/pubchem/specifications/
The only problem is that I'm limited to OS X and Linux, and I can't seem to find a way of programmatically generating the 2D images that they have on their site. See this example:
https://pubchem.ncbi.nlm.nih.gov/compound/6#section=Top
Under the heading "2D Structure" we have this image here:
https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=6&t=l
That is what I'm trying to generate.
If you want something working out of the box, I would suggest using molconvert from ChemAxon's Marvin (https://www.chemaxon.com/products/marvin/), which is free for academics. It can be used easily from the command line, and it supports plenty of input and output formats. So for your example it would be:
molconvert "png" -s "C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl" -o cdnb.png
Resulting in the following image:
It also allows you to set parameters such as width, height, quality, background color and so on.
However, if you are a programmer, I would definitely recommend RDKit. Below is code which generates images for a pair of compounds given as SMILES.
from rdkit import Chem
from rdkit.Chem import Draw

ms_smis = [["C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl", "cdnb"],
           ["C1=CC(=CC(=C1)N)C(=O)N", "3aminobenzamide"]]
ms = [[Chem.MolFromSmiles(x[0]), x[1]] for x in ms_smis]

for m in ms:
    Draw.MolToFile(m[0], m[1] + ".svg", size=(800, 800))
This gives you the following images:
So I also emailed the PubChem guys and they got back to me very quickly with this response:
The only bulk access we have to images is through the download
service: https://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi
You can request up to 50,000 images at a time.
Which is better than I was expecting, but still not amazing, since it requires downloading things that I could in theory generate locally. So I'm leaving this question open until some kind soul writes an open-source library to do the same.
Edit:
I figured I might as well save people some time if they are doing the same thing as I am. I've created a Ruby gem, backed by Mechanize, to automate the downloading of images. Please be kind to their servers and only download what you need.
https://github.com/zachaysan/pubchem
gem install pubchem
An open source option is the Indigo Toolkit, which also has pre-compiled packages for Linux, Windows, and MacOS and language bindings for Python, Java, .NET, and C libraries. I chose the 1.4.0 beta.
I had a similar interest to yours in converting SMILES to 2D structures, and I adapted my Python code to address your question and to capture timing information. It uses the PubChem FTP (Compound/Extras) download of CID-SMILES.gz. The following script is a local SMILES-to-2D-structure converter that reads a range of rows from the PubChem CID-SMILES file of isomeric SMILES (which contains over 102 million compound records) and converts the SMILES to PNG images of the 2D structures. In three tests of 1000 SMILES-to-structure conversions each, it took 35, 50, and 60 seconds to convert 1000 SMILES at file row offsets of 0, 100,000, and 10,000,000 on my Windows 10 laptop (Intel i7-7500U CPU, 2.70 GHz) with a solid-state drive, running Python 3.7.4. The 3000 files totaled 100 MB in size.
from indigo import *
from indigo.renderer import *
import subprocess
import datetime

def timerstart():
    # start timer and print time, return start time
    start = datetime.datetime.now()
    print("Start time =", start)
    return start

def timerstop(start):
    # end timer and print time and elapsed time, return elapsed time
    endtime = datetime.datetime.now()
    elapsed = endtime - start
    print("End time =", endtime)
    print("Elapsed time =", elapsed)
    return elapsed

numrecs = 1000
recoffset = 0  # 10000000  # record offset

starttime = timerstart()

indigo = Indigo()
renderer = IndigoRenderer(indigo)

# set render options
indigo.setOption("render-atom-color-property", "color")
indigo.setOption("render-coloring", True)
indigo.setOption("render-comment-position", "bottom")
indigo.setOption("render-comment-offset", "20")
indigo.setOption("render-background-color", 1.0, 1.0, 1.0)
indigo.setOption("render-output-format", "png")

# set data path (including data file) and output file path
datapath = r'../Download/CID-SMILES'
pngpath = r'./2D/'

# read subset of rows from data file
mycmd = "head -" + str(recoffset + numrecs) + " " + datapath + " | tail -" + str(numrecs)
print(mycmd)
(out, err) = subprocess.Popen(mycmd, stdout=subprocess.PIPE, shell=True).communicate()
lines = str(out.decode("utf-8")).split("\n")

count = 0
for line in lines:
    try:
        cols = line.split("\t")  # split on tab
        key = cols[0]            # cid in cols[0]
        smiles = cols[1]         # smiles in cols[1]
        mol = indigo.loadMolecule(smiles)
        s = "CID=" + key
        indigo.setOption("render-comment", s)
        #indigo.setOption("render-image-size", 200, 250)
        #indigo.setOption("render-image-size", 400, 500)
        renderer.renderToFile(mol, pngpath + key + ".png")
        count += 1
    except:
        print("Error processing line after", str(count), ":", line)
        pass

elapsedtime = timerstop(starttime)
print("Converted", str(count), "SMILES to PNG")