Using FFMPEG to convert Numpy text file to mp3 - python

I have a text file that contains the output of a Numpy array and would like to use FFMPEG to convert that into an mp3 file.
I've been following this tutorial, and to put it in perspective I have essentially written the audio_array that is created to a text file, which I would like to read later on in my program into FFMPEG and convert back into an mp3 file.
I tried modifying the part below as follows, but it doesn't seem to give me an output:
pipe = sp.Popen([ FFMPEG_BIN,
        '-y', # (optional) means overwrite the output file if it already exists.
        "-f", 's16le', # means 16bit input
        "-acodec", "pcm_s16le", # means raw 16bit input
        '-r', "44100", # the input will have 44100 Hz
        '-ac','2', # the input will have 2 channels (stereo)
        '-i', '[path/to/my/text_file.txt]',
        '-vn', # means "don't expect any video input"
        '-acodec', "libfdk_aac" # output audio codec
        '-b', "3000k", # output bitrate (=quality). Here, 3000kb/second
        'my_awesome_output_audio_file.mp3'],
        stdin=sp.PIPE,stdout=sp.PIPE, stderr=sp.PIPE)
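One possible approach, sketched under assumptions: with -f s16le, FFmpeg would read the characters of the text file as if they were sample bytes, so the numbers first need to be parsed and packed as 16-bit little-endian samples, then piped to stdin. The sketch assumes the file holds whitespace-separated sample values (for example as written by numpy.savetxt), one row per frame for stereo. The helper names and the 192k bitrate are hypothetical. Note also that -ar (not -r) sets the audio sample rate, that libfdk_aac is an AAC encoder rather than an MP3 encoder, and that the missing comma after "libfdk_aac" in the command above joins it with '-b' into a single string.

```python
import struct
import subprocess

def text_to_pcm_s16le(text):
    """Parse whitespace-separated sample values and pack them as little-endian int16."""
    samples = [int(float(token)) for token in text.split()]
    return struct.pack('<%dh' % len(samples), *samples)

def build_ffmpeg_cmd(out_path, rate=44100, channels=2):
    """Read raw s16le PCM from stdin; FFmpeg picks the encoder from the output extension."""
    return ['ffmpeg', '-y',
            '-f', 's16le', '-ar', str(rate), '-ac', str(channels),
            '-i', 'pipe:',
            '-vn', '-b:a', '192k',  # hypothetical bitrate, adjust as needed
            out_path]

def text_file_to_mp3(txt_path, out_path):
    """Convert the text dump to PCM bytes and pipe them to FFmpeg (requires the ffmpeg CLI)."""
    with open(txt_path) as f:
        pcm = text_to_pcm_s16le(f.read())
    # run() writes the input to stdin, closes it, and waits for FFmpeg to finish.
    subprocess.run(build_ffmpeg_cmd(out_path), input=pcm, check=True)
```

Calling text_file_to_mp3('path/to/text_file.txt', 'output.mp3') would then produce the file, provided the values fit in the int16 range.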

Related

Convert ogg byte array to wav byte array Python

I want to convert an ogg byte array/bytes with Opus codec to a wav byte array/bytes without saving to disk. I have downloaded audio from the Telegram API and it is in byte array format with an .ogg extension. I do not want to save it to the filesystem, to eliminate filesystem I/O latency.
Currently I save the audio file in .ogg format using the code below, based on the Telegram API (for reference: https://docs.python-telegram-bot.org/en/stable/telegram.file.html#telegram.File.download_to_drive)
# listen for audio messages
async def audio(update, context):
    newFile = await context.bot.get_file(update.message.voice.file_id)
    await newFile.download_to_drive(output_path)
I am using the code
subprocess.call(["ffmpeg", "-i", output_path, output_path.replace(".ogg", ".wav"), '-y'], stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)
to convert ogg file to wav file. But this is not what I want.
I want the code
async def audio(update, context):
    newFile = await context.bot.get_file(update.message.voice.file_id)
    byte_array = await newFile.download_as_bytearray()
to get byte_array and now I want this byte_array to be converted to wav without saving to disk and without using ffmpeg. Let me know in comments if something is unclear. Thanks!
Note: I have set up a Telegram bot at the backend which listens for audio sent to a private chat, which I do manually for testing purposes.
We may write the OGG data to FFmpeg stdin pipe, and read the encoded WAV data from FFmpeg stdout pipe.
My following answer describes how to do it with video, and we may apply the same solution to audio.
The example assumes that the OGG data is already downloaded and stored in bytes array (in the RAM).
Piping architecture:
 --------------------  Encoded      ---------  Encoded      ------------
| Input OGG encoded  | OGG data    | FFmpeg  | WAV data    | Store to   |
| stream             | ----------> | process | ----------> | BytesIO    |
 --------------------  stdin PIPE   ---------  stdout PIPE  ------------
The implementation is equivalent to the following shell command:
Linux: cat input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
Windows: type input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
The example uses ffmpeg-python module, but it's just a binding to FFmpeg sub-process (FFmpeg CLI must be installed, and must be in the execution path).
Execute FFmpeg sub-process with stdin pipe as input and stdout pipe as output:
ffmpeg_process = (
    ffmpeg
    .input('pipe:', format='ogg')
    .output('pipe:', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True)
)
The input format is set to ogg, the output format is set to wav (use default encoding parameters).
Assuming the audio file is relatively large, we can't write the entire OGG data at once, because doing so (without "draining" stdout pipe) causes the program execution to halt.
We may have to write the OGG data (in chunks) in a separate thread, and read the encoded data in the main thread.
Here is a sample for the "writer" thread:
def writer(ffmpeg_proc, ogg_bytes_arr):
    chunk_size = 1024  # Define chunk size to 1024 bytes (the exact size is not important).
    n_chunks = len(ogg_bytes_arr) // chunk_size  # Number of chunks (without the remainder smaller chunk at the end).
    remainder_size = len(ogg_bytes_arr) % chunk_size  # Remainder bytes (assume total size is not a multiple of chunk_size).
    for i in range(n_chunks):
        ffmpeg_proc.stdin.write(ogg_bytes_arr[i*chunk_size:(i+1)*chunk_size])  # Write chunk of data bytes to stdin pipe of FFmpeg sub-process.
    if (remainder_size > 0):
        ffmpeg_proc.stdin.write(ogg_bytes_arr[chunk_size*n_chunks:])  # Write remainder bytes of data bytes to stdin pipe of FFmpeg sub-process.
    ffmpeg_proc.stdin.close()  # Close stdin pipe - closing stdin finishes encoding the data, and closes FFmpeg sub-process.
The "writer thread" writes the OGG data in small chunks.
The last chunk is smaller (assuming the length is not a multiple of the chunk size).
At the end, the stdin pipe is closed.
Closing stdin finishes encoding the data, and closes the FFmpeg sub-process.
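The chunking logic can also be sketched more compactly by stepping through the data with range(): the final slice is simply shorter, so no separate remainder handling is needed. The helper name iter_chunks is hypothetical, and ffmpeg_proc is assumed to be a process object with an open stdin pipe, as in the writer above.

```python
def iter_chunks(data, chunk_size=1024):
    """Yield consecutive slices of data; the final slice may be shorter."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

def writer(ffmpeg_proc, ogg_bytes_arr):
    """Write the OGG data to the stdin pipe chunk by chunk, then close the pipe."""
    for chunk in iter_chunks(ogg_bytes_arr):
        ffmpeg_proc.stdin.write(chunk)
    ffmpeg_proc.stdin.close()  # Signals end of input to the sub-process.
```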
In the main thread, we start the writer thread and read the encoded WAV data from the stdout pipe (in chunks):
thread = threading.Thread(target=writer, args=(ffmpeg_process, ogg_bytes_array))
thread.start()
while thread.is_alive():
    wav_chunk = ffmpeg_process.stdout.read(1024)  # Read chunk with arbitrary size from stdout pipe
    out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".
For reading the remaining data, we may use ffmpeg_process.communicate():
# Read the last encoded chunk.
wav_chunk = ffmpeg_process.communicate()[0]
out_stream.write(wav_chunk) # Write the encoded chunk to the "in-memory file".
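As an alternative to polling thread.is_alive(), the read loop can run until the pipe reports end-of-file, since read() returns empty bytes once the other side has closed it. This is a minimal sketch; the function name drain is hypothetical, and any object with a read() method (such as ffmpeg_process.stdout) is assumed to work as the stream.

```python
from io import BytesIO

def drain(stream, sink, chunk_size=1024):
    """Copy stream into sink until read() returns empty bytes (EOF); return the byte count."""
    total = 0
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        sink.write(chunk)
        total += len(chunk)
    return total

# Stand-in for a pipe, just to show the call shape.
demo_out = BytesIO()
drain(BytesIO(b'RIFF' + b'\x00' * 2000), demo_out)
```

With the FFmpeg process, this would be called as drain(ffmpeg_process.stdout, out_stream), followed by thread.join() and ffmpeg_process.wait().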
Complete code sample:
import ffmpeg
from io import BytesIO
import threading

async def download_audio(update, context):
    # This function is not used here - the audio is read from a file instead (just for testing).
    newFile = await context.bot.get_file(update.message.voice.file_id)
    bytes_array = await newFile.download_as_bytearray()
    return bytes_array

# Equivalent Linux shell command:
# cat input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
# Equivalent Windows shell command:
# type input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav

# Writer thread - write the OGG data to FFmpeg stdin pipe in small chunks of 1KBytes.
def writer(ffmpeg_proc, ogg_bytes_arr):
    chunk_size = 1024  # Define chunk size to 1024 bytes (the exact size is not important).
    n_chunks = len(ogg_bytes_arr) // chunk_size  # Number of chunks (without the remainder smaller chunk at the end).
    remainder_size = len(ogg_bytes_arr) % chunk_size  # Remainder bytes (assume total size is not a multiple of chunk_size).
    for i in range(n_chunks):
        ffmpeg_proc.stdin.write(ogg_bytes_arr[i*chunk_size:(i+1)*chunk_size])  # Write chunk of data bytes to stdin pipe of FFmpeg sub-process.
    if (remainder_size > 0):
        ffmpeg_proc.stdin.write(ogg_bytes_arr[chunk_size*n_chunks:])  # Write remainder bytes of data bytes to stdin pipe of FFmpeg sub-process.
    ffmpeg_proc.stdin.close()  # Close stdin pipe - closing stdin finishes encoding the data, and closes FFmpeg sub-process.

if False:
    # We may assume that ogg_bytes_array is the output of download_audio
    # (download_audio is a coroutine, so in real use it must be awaited inside an async handler).
    ogg_bytes_array = download_audio(update, context)
else:
    # The example reads the OGG data from a file (for testing).
    with open('input.ogg', 'rb') as f:
        ogg_bytes_array = f.read()

# Execute FFmpeg sub-process with stdin pipe as input and stdout pipe as output.
ffmpeg_process = (
    ffmpeg
    .input('pipe:', format='ogg')
    .output('pipe:', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True)
)

# Open in-memory file for storing the encoded WAV file
out_stream = BytesIO()

# Start a thread that writes the OGG data in small chunks.
# We need the thread because writing too much data to stdin pipe at once causes a deadlock.
thread = threading.Thread(target=writer, args=(ffmpeg_process, ogg_bytes_array))
thread.start()

# Read encoded WAV data from stdout pipe of FFmpeg, and write it to out_stream
while thread.is_alive():
    wav_chunk = ffmpeg_process.stdout.read(1024)  # Read chunk with arbitrary size from stdout pipe
    out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".

# Read the last encoded chunk.
wav_chunk = ffmpeg_process.communicate()[0]
out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".
out_stream.seek(0)  # Seek to the beginning of out_stream
ffmpeg_process.wait()  # Wait for FFmpeg sub-process to end

# Write out_stream to file - just for testing:
with open('test.wav', "wb") as f:
    f.write(out_stream.getbuffer())
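A shorter variant is possible with the standard subprocess module alone, sketched here under assumptions: Popen.communicate() feeds stdin and drains stdout at the same time, so it avoids the pipe deadlock without an explicit writer thread. The function name convert_in_memory and the cmd parameter are hypothetical; the default command mirrors the shell command shown earlier and requires the FFmpeg CLI in the execution path.

```python
import subprocess

FFMPEG_CMD = ['ffmpeg', '-f', 'ogg', '-i', 'pipe:', '-f', 'wav', 'pipe:']

def convert_in_memory(input_bytes, cmd=FFMPEG_CMD):
    """Send input_bytes to the child's stdin and return everything it writes to stdout."""
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    # communicate() writes the input, closes stdin, reads stdout to EOF, and waits.
    out, _ = proc.communicate(input=input_bytes)
    if proc.returncode != 0:
        raise RuntimeError('child process exited with code %d' % proc.returncode)
    return out
```

With the bot handler from the question, wav_bytes = convert_in_memory(bytes(byte_array)) would hold the WAV data in memory.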

Can I pass a list of image into the input method of ffmpeg-python

My task involves using ffmpeg to create video from image sequence.
The code below solves the problem.
import ffmpeg
video = ffmpeg.input('/path/to/images/*.jpg', pattern_type='glob', framerate=20).output('video.mp4').run()
However since the image data we are getting follows the pattern
1.jpg,
2.jpg,
3.jpg
.
.
20.jpg
.
.
100.jpg
the video gets created in glob order: 1.jpg, 100.jpg, 11.jpg, 12.jpg, ... 2.jpg, 20.jpg, 21.jpg ..., which is very unpleasant to watch.
Is there any way I can pass a list, or anything other than a path/glob pattern, so that the images are sorted in order?
As a bonus, I would be happy if I could choose which files to pass to the input() method.
You may use Concat demuxer:
Create a file mylist.txt with all the image files in the following format:
file '/path/to/images/1.jpg'
file '/path/to/images/2.jpg'
file '/path/to/images/3.jpg'
file '/path/to/images/20.jpg'
file '/path/to/images/100.jpg'
You may create mylist.txt manually, or create the text file using Python code.
Use the following command (you may select different codec):
ffmpeg.input('mylist.txt', r='20', f='concat', safe='0').output('video.mp4', vcodec='libx264').run()
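Creating mylist.txt from Python could look like the sketch below. It assumes every file name is a plain integer plus an extension (1.jpg, 2.jpg, ... 100.jpg, as in the question) and that the paths contain no single quotes; the function names are hypothetical.

```python
import os

def numeric_key(path):
    """Sort key: the file name without extension, as an integer (20 for '/path/to/images/20.jpg')."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem)

def write_concat_list(image_paths, list_path='mylist.txt'):
    """Write the paths in numeric order, one "file '...'" line each; return the sorted list."""
    ordered = sorted(image_paths, key=numeric_key)
    with open(list_path, 'w') as f:
        for path in ordered:
            f.write("file '%s'\n" % path)
    return ordered
```

The list of paths can come from glob.glob('/path/to/images/*.jpg'), or be hand-picked, which also covers choosing which files to include.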
Second option:
Writing JPEG data into stdin PIPE of FFmpeg sub-process.
Create a list of JPEG file names (Python list).
Execute FFmpeg sub-process, with stdin PIPE as input, and jpeg_pipe input format.
Iterate the list, read the content of each file and write it to stdin PIPE.
Close stdin PIPE.
Here is a code sample:
import ffmpeg
# List of JPEG files
jpeg_files = ['/tmp/0001.jpg', '/tmp/0002.jpg', '/tmp/0003.jpg', '/tmp/0004.jpg', '/tmp/0005.jpg']
# Execute FFmpeg sub-process, with stdin pipe as input, and jpeg_pipe input format
process = ffmpeg.input('pipe:', r='20', f='jpeg_pipe').output('/tmp/video.mp4', vcodec='libx264').overwrite_output().run_async(pipe_stdin=True)
# Iterate jpeg_files, read the content of each file and write it to stdin
for in_file in jpeg_files:
    with open(in_file, 'rb') as f:
        # Read the JPEG file content to jpeg_data (bytes array)
        jpeg_data = f.read()
        # Write JPEG data to stdin pipe of FFmpeg process
        process.stdin.write(jpeg_data)
# Close stdin pipe - FFmpeg finishes encoding the output file.
process.stdin.close()
process.wait()

Why is mp3/wav duration different when I convert a numpy array with ffmpeg into audiofile (python)?

I want to convert a numpy array, which should contain 60 s of raw audio, into a .wav and an .mp3 file. I am trying to convert the array to the desired formats with ffmpeg (version 3.4.6). For comparison I also use the module soundfile.
Only the .wav file created by soundfile has the expected length of exactly 60 s. The .wav file created by ffmpeg is a little shorter and the .mp3 file is about 32 s long.
I want all exports to be the same length. What am I doing wrong?
Here is a sample code:
import subprocess as sp
import numpy as np
import soundfile as sf
def data2audiofile(filename,data):
    out_cmds = ['ffmpeg',
                '-f', 'f64le', # input 64bit float little endian
                '-ar', '44100', # input samplerate 44100 Hz
                '-ac','1', # input 1 channel (mono)
                '-i', '-', # inputfile via pipe
                '-y', # overwrite outputfile if it already exists
                filename]
    pipe = sp.Popen(out_cmds, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
    pipe.stdin.write(data)

data = (np.random.randint(low=-32000, high=32000, size=44100*60)/32678).astype('<f8')
data2audiofile('ffmpeg_mp3.mp3',data)
data2audiofile('ffmpeg_wav.wav',data)
sf.write('sf_wav.wav',data,44100)
Here are the resulting files displayed in Audacity (screenshot not preserved).
You need to close pipe.stdin and wait for the sub-process to end.
Closing pipe.stdin flushes stdin pipe.
The subject is explained here: Writing to a python subprocess pipe:
The key is to close stdin (flush and send EOF) before calling wait
Add the following code lines after pipe.stdin.write(data):
pipe.stdin.close()
pipe.wait()
You can also try setting a large buffer size in sp.Popen:
pipe = sp.Popen(out_cmds, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE, bufsize=10**8)
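As a sanity check on how much data has to pass through the pipe, the expected duration of raw PCM input follows directly from its byte count. The sketch below assumes the f64le mono format from the question (8 bytes per sample); the function name is hypothetical. Any bytes FFmpeg has not consumed before the script ends are missing from the output, which would explain the shorter files.

```python
def expected_duration(n_bytes, sample_rate=44100, channels=1, bytes_per_sample=8):
    """Duration in seconds of raw PCM data: bytes / (rate * channels * bytes per sample)."""
    return n_bytes / (sample_rate * channels * bytes_per_sample)

# 60 s of mono float64 audio at 44100 Hz is 44100 * 60 * 8 = 21,168,000 bytes.
full_length = expected_duration(44100 * 60 * 8)
```

For the NumPy array in the question, the byte count is available as data.nbytes.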

FFMpeg giving invalid argument error with python subprocess

I am trying to convert a file or microphone stream to a 22050 Hz sample rate and double the tempo. I can do it in the terminal with the command below:
#ffmpeg -i test.mp3 -af asetrate=44100*0.5,aresample=44100,atempo=2 output.mp3
But I cannot run this command with Python's subprocess. I have tried many things, but it fails every time. Generally I get errors such as: Requested output format 'asetrate' (or 'aresample' or 'atempo') is not a suitable output format. Invalid argument. How can I run it and read the stream through a pipe?
song = subprocess.Popen(["ffmpeg.exe", "-i", sys.argv[1], "-f", "asetrate", "22050", "wav", "pipe:1"],
stdout=subprocess.PIPE)
Your two commands are different. Try:
song = subprocess.Popen(["ffmpeg", "-i", sys.argv[1], "-af", "asetrate=22050,aresample=44100,atempo=2", "-f", "wav", "pipe:1"],
                        stdout=subprocess.PIPE)
-af is for the audio filter.
-f is to manually set the muxer/output format.
ffmpeg treats whatever is supplied to -af as a single argument, which it then parses internally into separate filters, so splitting the filters into separate list items before passing them via Popen does not achieve the same thing.
The initial example using the terminal should be created using Popen as
subprocess.Popen([
'ffmpeg', '-i', 'test.mp3', '-af', 'asetrate=44100*0.5,aresample=44100,atempo=2',
'output.mp3',
])
So for your actual example with a pipe, try the following instead:
song = subprocess.Popen(
    ["ffmpeg.exe", "-i", sys.argv[1], "-af", "asetrate=22050", "-f", "wav", "pipe:1"],
    stdout=subprocess.PIPE
)
You will then need to call song.communicate() to get the output produced by ffmpeg.exe.
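To keep the filter chain as one argument while still building it from separate pieces, the filters can be joined with commas before the list is handed to Popen. This is a minimal sketch; build_filter_cmd and its parameters are hypothetical names, and only the -i, -af and -f options discussed above are used.

```python
def build_filter_cmd(in_path, filters, out_format='wav', ffmpeg_bin='ffmpeg'):
    """Build an FFmpeg argument list; the filters become the single comma-joined -af value."""
    return [ffmpeg_bin, '-i', in_path,
            '-af', ','.join(filters),
            '-f', out_format, 'pipe:1']

cmd = build_filter_cmd('test.mp3', ['asetrate=22050', 'aresample=44100', 'atempo=2'])
```

The resulting list can be passed as subprocess.Popen(cmd, stdout=subprocess.PIPE), and the audio read with communicate().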

Send args to subprocess while using stdin

I'm trying to take a screenshot then run a command on that screenshot without saving to disk.
The actual command I want to run is visgrep image.png pattern.pat
visgrep must have two args: the image file and a .pat file.
Here is what I have so far.
p = subprocess.Popen(['import', '-crop', '305x42+1328+281', '-window', 'root', '-depth', '8', 'png:' ], stdout=subprocess.PIPE,)
cmd = ['visgrep']
subprocess.call(cmd, stdin=p.stdout)
Obviously this fails as visgrep must have two args.
So how can I do visgrep image.png pattern.pat but substituting 'image.png' with the output of ImageMagick's import?
Do I need to use xargs? Is there a better way to accomplish what I'm trying?
On Linux you can use /dev/stdin as the file name, but it does not always work. If it does not work with visgrep, you must use a temporary file (there is no shame in that).
PS. Shouldn't png: be png:-?
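The temporary-file route could be sketched as follows. It assumes the screenshot bytes have already been read from the stdout of import, and that the target command takes the image path first and the remaining arguments after it, as visgrep does in the question. The function name run_on_temp_file is hypothetical.

```python
import os
import subprocess
import tempfile

def run_on_temp_file(image_bytes, cmd, extra_args=(), suffix='.png'):
    """Write image_bytes to a temporary file, run cmd + [tmp_path] + extra_args, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(image_bytes)
        # The file is closed before the command runs, so the child can open it freely.
        return subprocess.run(list(cmd) + [path] + list(extra_args),
                              stdout=subprocess.PIPE)
    finally:
        os.remove(path)
```

With the Popen object from the question, this would be called as run_on_temp_file(p.communicate()[0], ['visgrep'], ['pattern.pat']).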
According to this answer, changing the argument png: to png:- will cause the import command to output to standard out instead of a file. I am unfamiliar with visgrep, so I'm not sure how to tell it to read the source image from stdin.
From the ImageMagick documentation:
STDIN, STDOUT, and file descriptors
Unix and Windows permit the output of one command to be piped to the
input of another. ImageMagick permits image data to be read and
written from the standard streams STDIN (standard in) and STDOUT
(standard out), respectively, using a pseudo-filename of -. In this
example we pipe the output of convert to the display program:
$ convert logo: gif:- | display gif:-
The second explicit format "gif:" is optional in the preceding
example. The GIF image format has a unique signature within the image
so ImageMagick's display command can readily recognize the format as
GIF. The convert program also accepts STDIN as input in this way:
$ convert rose: gif:- | convert - -resize "200%" bigrose.jpg
You can use the same filename convention with the import command.
So, try:
p = subprocess.Popen(['import', '-crop', '305x42+1328+281',
'-window', 'root', '-depth', '8', 'png:-' ],
stdout=subprocess.PIPE,)
