Convert OGG byte array to WAV byte array in Python

I want to convert an OGG byte array/bytes (Opus codec) to a WAV byte array/bytes without saving to disk. I have downloaded audio from the Telegram API and it is a byte array with an .ogg extension. I do not want to save it to the filesystem, to eliminate filesystem I/O latency.
Currently, I save the audio file in .ogg format using the code below (Telegram API reference: https://docs.python-telegram-bot.org/en/stable/telegram.file.html#telegram.File.download_to_drive):
# listen for audio messages
async def audio(update, context):
    newFile = await context.bot.get_file(update.message.voice.file_id)
    await newFile.download_to_drive(output_path)
I am using

subprocess.call(["ffmpeg", "-y", "-i", output_path, output_path.replace(".ogg", ".wav")], stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)

to convert the OGG file to a WAV file. But this is not what I want.
I want code like

async def audio(update, context):
    newFile = await context.bot.get_file(update.message.voice.file_id)
    byte_array = await newFile.download_as_bytearray()

to get byte_array, and I now want this byte_array converted to WAV without saving to disk and without using ffmpeg. Let me know in the comments if something is unclear. Thanks!
Note: I have set up a Telegram bot on the backend which listens for audio sent to a private chat, which I do manually for testing purposes.

We may write the OGG data to FFmpeg's stdin pipe, and read the encoded WAV data from FFmpeg's stdout pipe.
My following answer describes how to do it with video, and we may apply the same solution to audio.
The example assumes that the OGG data is already downloaded and stored in a bytes array (in RAM).
Piping architecture:
 --------------------   Encoded      ----------   Encoded       ----------
| Input OGG encoded  |  OGG data    |  FFmpeg  |  WAV data     | Store to |
| stream             | -----------> |  process | ------------> | BytesIO  |
 --------------------   stdin PIPE   ----------   stdout PIPE   ----------
The implementation is equivalent to the following shell command:
Linux: cat input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
Windows: type input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
The example uses the ffmpeg-python module, but it's just a binding to an FFmpeg sub-process (the FFmpeg CLI must be installed and in the execution path).
Execute FFmpeg sub-process with stdin pipe as input and stdout pipe as output:
ffmpeg_process = (
    ffmpeg
    .input('pipe:', format='ogg')
    .output('pipe:', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True)
)
The input format is set to ogg and the output format is set to wav (using default encoding parameters).
Assuming the audio file is relatively large, we can't write the entire OGG data at once: writing without "draining" the stdout pipe eventually fills the OS pipe buffer and blocks both processes (a deadlock).
We therefore write the OGG data (in chunks) in a separate thread, and read the encoded data in the main thread.
Here is a sample for the "writer" thread:
def writer(ffmpeg_proc, ogg_bytes_arr):
    chunk_size = 1024  # Chunk size of 1024 bytes (the exact size is not important).
    n_chunks = len(ogg_bytes_arr) // chunk_size  # Number of full chunks (excluding the smaller remainder chunk at the end).
    remainder_size = len(ogg_bytes_arr) % chunk_size  # Remainder bytes (the total size may not be a multiple of chunk_size).

    for i in range(n_chunks):
        ffmpeg_proc.stdin.write(ogg_bytes_arr[i*chunk_size:(i+1)*chunk_size])  # Write a chunk of data bytes to the stdin pipe of the FFmpeg sub-process.

    if remainder_size > 0:
        ffmpeg_proc.stdin.write(ogg_bytes_arr[chunk_size*n_chunks:])  # Write the remainder bytes to the stdin pipe.

    ffmpeg_proc.stdin.close()  # Closing stdin finishes encoding the data and closes the FFmpeg sub-process.
The "writer thread" writes the OGG data in small chucks.
The last chunk is smaller (assume the length is not a multiple of chuck size).
At the end, stdin pipe is closed.
Closing stdin finish encoding the data, and closes FFmpeg sub-process.
In the main thread, we start the writer thread, and read the encoded WAV data from the stdout pipe (in chunks):

thread = threading.Thread(target=writer, args=(ffmpeg_process, ogg_bytes_array))
thread.start()

while thread.is_alive():
    wav_chunk = ffmpeg_process.stdout.read(1024)  # Read a chunk of arbitrary size from the stdout pipe.
    out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".
For reading the remaining data, we may use ffmpeg_process.communicate():
# Read the last encoded chunk.
wav_chunk = ffmpeg_process.communicate()[0]
out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".
Complete code sample:
import ffmpeg
from io import BytesIO
import threading


async def download_audio(update, context):
    # This method is not used below - the example reads the audio from a file instead (just for testing).
    newFile = await context.bot.get_file(update.message.voice.file_id)
    bytes_array = await newFile.download_as_bytearray()
    return bytes_array


# Equivalent Linux shell command:
# cat input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
# Equivalent Windows shell command:
# type input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav


# Writer thread - write the OGG data to FFmpeg stdin pipe in small chunks of 1KByte.
def writer(ffmpeg_proc, ogg_bytes_arr):
    chunk_size = 1024  # Chunk size of 1024 bytes (the exact size is not important).
    n_chunks = len(ogg_bytes_arr) // chunk_size  # Number of full chunks (excluding the smaller remainder chunk at the end).
    remainder_size = len(ogg_bytes_arr) % chunk_size  # Remainder bytes (the total size may not be a multiple of chunk_size).

    for i in range(n_chunks):
        ffmpeg_proc.stdin.write(ogg_bytes_arr[i*chunk_size:(i+1)*chunk_size])  # Write a chunk of data bytes to the stdin pipe of the FFmpeg sub-process.

    if remainder_size > 0:
        ffmpeg_proc.stdin.write(ogg_bytes_arr[chunk_size*n_chunks:])  # Write the remainder bytes to the stdin pipe.

    ffmpeg_proc.stdin.close()  # Closing stdin finishes encoding the data and closes the FFmpeg sub-process.


if False:
    # We may assume that ogg_bytes_array is the output of the download_audio method
    # (in real code this coroutine would have to be awaited).
    ogg_bytes_array = download_audio(update, context)
else:
    # The example reads the OGG data from a file (for testing).
    with open('input.ogg', 'rb') as f:
        ogg_bytes_array = f.read()

# Execute FFmpeg sub-process with stdin pipe as input and stdout pipe as output.
ffmpeg_process = (
    ffmpeg
    .input('pipe:', format='ogg')
    .output('pipe:', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True)
)

# Open an in-memory file for storing the encoded WAV file.
out_stream = BytesIO()

# Start a thread that writes the OGG data in small chunks.
# We need the thread because writing too much data to the stdin pipe at once causes a deadlock.
thread = threading.Thread(target=writer, args=(ffmpeg_process, ogg_bytes_array))
thread.start()

# Read encoded WAV data from the stdout pipe of FFmpeg, and write it to out_stream.
while thread.is_alive():
    wav_chunk = ffmpeg_process.stdout.read(1024)  # Read a chunk of arbitrary size from the stdout pipe.
    out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".

# Read the last encoded chunk.
wav_chunk = ffmpeg_process.communicate()[0]
out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".

out_stream.seek(0)  # Seek to the beginning of out_stream.

ffmpeg_process.wait()  # Wait for the FFmpeg sub-process to end.

# Write out_stream to a file - just for testing:
with open('test.wav', 'wb') as f:
    f.write(out_stream.getbuffer())
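Since out_stream is an ordinary file-like object, the WAV result can also be consumed directly in memory instead of being written to disk. A minimal sketch with the standard wave module (assuming out_stream holds the complete output produced above; note that a WAV written to a pipe may carry placeholder size fields, because FFmpeg cannot seek back to patch the header):

import wave

out_stream.seek(0)  # Rewind to the beginning before reading.
with wave.open(out_stream, 'rb') as wf:
    # Stream parameters come from the fmt chunk of the WAV header.
    print(wf.getnchannels(), wf.getsampwidth(), wf.getframerate())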

Related

Pipe bytes from subprocess to file-like object in Python

I'd like to accomplish the following in Python. I want to call a subprocess (ffmpeg in this case, using the ffmpy3 wrapper) and directly pipe the process' output onto a file-like object that can be consumed by another function's open() call. Since audio and video data can become quite big, I explicitly don't ever want to load the process' output into memory as a whole, but only "stream" it in a buffered fashion. Here is some example code.
async def convert_and_process(file: FileIO):
    ff = ffmpy3.FFmpeg(
        inputs={str(file.name): None},
        outputs={'pipe:1': '-y -ac 1 -ar 16000 -acodec pcm_s16le -f wav'}
    )
    stdout: StreamReader = (await ff.run_async(stdout=subprocess.PIPE)).stdout
    with wave.open(help_needed, 'rb') as wf:
        # do stuff with wave file
        pass
Here is the code of run_async; it's just a simple wrapper around asyncio.create_subprocess_exec().
My problem is basically to turn the StreamReader returned by run_async() into a file-like object that can be consumed by wave.open(). Moreover, does this approach actually avoid loading all output into memory, as Popen.wait() or Popen.communicate() would do?
I was thinking that os.pipe() might be useful, but I'm not sure how.
If your example is a true representation of your ultimate goal (reading audio samples in blocks), then you can accomplish it much more easily with just FFmpeg and subprocess.Popen.stdout. If there is more to it than using the wave library to read a memory-mapped .wav file, then please ignore this answer or clarify.
First, a shameless plug: if you are willing to try another library, my ffmpegio can do what you want. Here is an example:
import ffmpegio

# audio stream reader
with ffmpegio.open(file, 'ra', blocksize=1024, ac=1, ar=16000,
                   sample_fmt='s16le') as f:
    for block in f:  # block: [1024 x channels] ndarray
        do_your_thing(block)
The blocksize argument sets the number of samples to retrieve at a time (so 1024 audio samples in this example).
This library is still pretty young, so if you run into any issues, please report them on its GitHub Issues board.
Second, if you prefer to implement it yourself, it's actually fairly straightforward if you know the FFmpeg output stream formats AND you need only one stream (multiple streams could also be done easily under non-Windows, I think). For your example above, try the following:
import numpy as np

ff = ffmpy3.FFmpeg(
    inputs={str(file.name): None},
    outputs={'pipe:1': '-y -ac 1 -ar 16000 -acodec pcm_s16le -f s16le'}
)
stdout = (await ff.run_async(stdout=subprocess.PIPE)).stdout

nsamples = 1024  # read 1024 samples at a time
itemsize = 2     # bytes per sample: int16 x 1 channel

while True:
    try:
        b = stdout.read(nsamples*itemsize)
        # you may need to check for len(b) == 0 as well, not sure atm
    except BrokenPipeError:
        break
    x = np.frombuffer(b, dtype=np.int16)  # up to nsamples int16 samples
    # do stuff with audio samples in x
Note that I changed -f wav to -f s16le so only the raw samples are sent to stdout. Then stdout.read(n) is essentially identical to wave.readframes(n) except for what their n's mean.
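For multi-channel audio, a minimal extension of the loop above (assuming FFmpeg is invoked with -ac 2 instead of -ac 1): scale the read size by the channel count and reshape the buffer so each row holds one frame:

nchannels = 2  # assumes '-ac 2' in the FFmpeg output options
b = stdout.read(nsamples * itemsize * nchannels)
x = np.frombuffer(b, dtype=np.int16).reshape(-1, nchannels)  # one row per frame, one column per channel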

Can I pass a list of image into the input method of ffmpeg-python

My task involves using ffmpeg to create a video from an image sequence.
The code below solves the problem:

import ffmpeg

ffmpeg.input('/path/to/images/*.jpg', pattern_type='glob', framerate=20).output('video.mp4').run()
However, since the image data we are getting follows the pattern
1.jpg,
2.jpg,
3.jpg
.
.
20.jpg
.
.
100.jpg
the video gets created with the glob-pattern ordering 1.jpg, 100.jpg, 11.jpg, 12.jpg, ... 2.jpg, 20.jpg, 21.jpg ..., which is very unpleasant to watch.
Is there any way I can pass a list (or anything else aside from a path/glob pattern) where the images are sorted in order?
Also, as a bonus, I would be happy if I could choose which files to pass to the input() method.
You may use the concat demuxer:
Create a file mylist.txt with all the image files in the following format:
file '/path/to/images/1.jpg'
file '/path/to/images/2.jpg'
file '/path/to/images/3.jpg'
file '/path/to/images/20.jpg'
file '/path/to/images/100.jpg'
You may create mylist.txt manually, or create the text file using Python code.
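For example, a minimal sketch that writes mylist.txt in natural numeric order (assuming the directory contains only JPEG files named 1.jpg, 2.jpg, ... as above):

import os

img_dir = '/path/to/images'  # assumed location of the JPEG files
# Sort by the integer value of the file name stem, so 2.jpg comes before 100.jpg.
names = sorted(os.listdir(img_dir), key=lambda n: int(os.path.splitext(n)[0]))

with open('mylist.txt', 'w') as f:
    for name in names:
        f.write("file '%s'\n" % os.path.join(img_dir, name))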
Use the following command (you may select different codec):
ffmpeg.input('mylist.txt', r='20', f='concat', safe='0').output('video.mp4', vcodec='libx264').run()
Second option:
Writing JPEG data into stdin PIPE of FFmpeg sub-process.
Create a list of JPEG file names (Python list).
Execute FFmpeg sub-process, with stdin PIPE as input, and jpeg_pipe input format.
Iterate the list, read the content of each file and write it to stdin PIPE.
Close stdin PIPE.
Here is a code sample:
import ffmpeg

# List of JPEG files
jpeg_files = ['/tmp/0001.jpg', '/tmp/0002.jpg', '/tmp/0003.jpg', '/tmp/0004.jpg', '/tmp/0005.jpg']

# Execute FFmpeg sub-process, with stdin pipe as input, and jpeg_pipe input format
process = ffmpeg.input('pipe:', r='20', f='jpeg_pipe').output('/tmp/video.mp4', vcodec='libx264').overwrite_output().run_async(pipe_stdin=True)

# Iterate jpeg_files, read the content of each file and write it to stdin
for in_file in jpeg_files:
    with open(in_file, 'rb') as f:
        # Read the JPEG file content to jpeg_data (bytes array)
        jpeg_data = f.read()

    # Write JPEG data to stdin pipe of FFmpeg process
    process.stdin.write(jpeg_data)

# Close stdin pipe - FFmpeg finishes encoding the output file.
process.stdin.close()
process.wait()
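If the JPEG images are already in memory (for example, bytes objects returned by cv2.imencode), the file-reading loop above can be replaced with a direct write. A minimal sketch, where jpeg_blobs is a hypothetical list of JPEG-encoded bytes objects:

for jpeg_data in jpeg_blobs:  # jpeg_blobs: hypothetical list of JPEG-encoded bytes
    process.stdin.write(jpeg_data)  # Write each in-memory JPEG to the stdin pipe.

process.stdin.close()
process.wait()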

What is the point of setting `output=False` in the PyAudio `open` method?

Below is a code example from the PyAudio documentation, showing how to play a .wav file.
I understand that setting output=False in the open method prevents the file from playing, but what is the point of this? Is it reserved for debugging purposes?
"""PyAudio Example: Play a wave file."""
import pyaudio
import wave
import sys
CHUNK = 1024
if len(sys.argv) < 2:
print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
# read data
data = wf.readframes(CHUNK)
# play stream (3)
while len(data) > 0:
stream.write(data)
data = wf.readframes(CHUNK)
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()
You can have input and output streams in PyAudio; setting output=False (which I think is the default anyway) just means it's not an output stream.
An output stream would be used (for example) to play an existing file through the sound subsystem (as per your code snippet).
An input stream may be used to pull data from the sound subsystem, e.g. to record to a file.
I can see why someone may wonder why you would ever have both input and output set to False (and indeed, I think that's an error condition), but having output=False is fine provided input is True.
TL;DR: A stream can be input, output, or both. At least one of input or output must be True.
The PyAudio documentation states that the open method of PyAudio opens a new stream:
Open a new stream. See constructor for Stream.__init__() for parameter details.
Looking at Stream.__init__(), we see that both input and output default to False:
input – Specifies whether this is an input stream. Defaults to False.
output – Specifies whether this is an output stream. Defaults to False.
But there is a warning:
Raises:
ValueError – Neither input nor output are set True.
So a stream can be either input (from the computer's sound system to a file), output (from a file to the computer's sound system), or both (callback).
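For contrast with the playback example in the question, here is a minimal sketch of an input stream used for recording (assuming a default input device is available; the rate and the two-second duration are arbitrary choices):

import pyaudio
import wave

CHUNK = 1024
RATE = 16000

p = pyaudio.PyAudio()

# Open an *input* stream: input=True, output stays False (the default).
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

# Record roughly 2 seconds of audio.
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * 2))]

stream.stop_stream()
stream.close()

# Save the recording to a .wav file.
with wave.open('recorded.wav', 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))

p.terminate()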

Why is mp3/wav duration different when I convert a numpy array with ffmpeg into audiofile (python)?

I want to convert a numpy array which should contain 60 s of raw audio into a .wav and an .mp3 file. With ffmpeg (version 3.4.6) I try to convert the array to the desired formats. For comparison I also use the module soundfile.
Only the .wav file created by soundfile has the expected length of exactly 60 s. The .wav file created by ffmpeg is a little shorter, and the .mp3 file is ca. 32 s long.
I want all exports to be the same length. What am I doing wrong?
Here is a sample code:
import subprocess as sp
import numpy as np
import soundfile as sf

def data2audiofile(filename, data):
    out_cmds = ['ffmpeg',
                '-f', 'f64le',   # input: 64-bit float, little endian
                '-ar', '44100',  # input samplerate 44100 Hz
                '-ac', '1',      # input: 1 channel (mono)
                '-i', '-',       # input file via pipe
                '-y',            # overwrite output file if it already exists
                filename]
    pipe = sp.Popen(out_cmds, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
    pipe.stdin.write(data)

data = (np.random.randint(low=-32000, high=32000, size=44100*60)/32678).astype('<f8')
data2audiofile('ffmpeg_mp3.mp3', data)
data2audiofile('ffmpeg_wav.wav', data)
sf.write('sf_wav.wav', data, 44100)
Here are the resulting files displayed in Audacity:
You need to close pipe.stdin and wait for the sub-process to end.
Closing pipe.stdin flushes the stdin pipe.
The subject is explained here: Writing to a python subprocess pipe:
"The key is to close stdin (flush and send EOF) before calling wait."
Add the following code lines after pipe.stdin.write(data):
pipe.stdin.close()
pipe.wait()
You can also try setting a large buffer size in sp.Popen:
pipe = sp.Popen(out_cmds, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE, bufsize=10**8)
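Putting it together, the fixed function looks like this (the code from the question, with the two lines added at the end):

def data2audiofile(filename, data):
    out_cmds = ['ffmpeg',
                '-f', 'f64le',   # input: 64-bit float, little endian
                '-ar', '44100',  # input samplerate 44100 Hz
                '-ac', '1',      # input: 1 channel (mono)
                '-i', '-',       # input file via pipe
                '-y',            # overwrite output file if it already exists
                filename]
    pipe = sp.Popen(out_cmds, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
    pipe.stdin.write(data)
    pipe.stdin.close()  # Flush stdin and send EOF, so FFmpeg finishes encoding.
    pipe.wait()         # Wait for the FFmpeg sub-process to end.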

Using FFMPEG to convert Numpy text file to mp3

I have a text file that contains the output of a numpy array and would like to use FFmpeg to convert it into an mp3 file.
I've been following this tutorial; to put it in perspective, I have essentially written the audio_array that it creates to a text file, which I would like to read back later in my program, feed into FFmpeg, and convert into an mp3 file.
I tried modifying the part below as follows, but it doesn't seem to give me any output:
pipe = sp.Popen([FFMPEG_BIN,
                 '-y',                     # (optional) overwrite the output file if it already exists
                 '-f', 's16le',            # 16-bit input
                 '-acodec', 'pcm_s16le',   # raw 16-bit input
                 '-r', '44100',            # the input will have 44100 Hz
                 '-ac', '2',               # the input will have 2 channels (stereo)
                 '-i', '[path/to/my/text_file.txt]',
                 '-vn',                    # "don't expect any video input"
                 '-acodec', 'libfdk_aac',  # output audio codec
                 '-b', '3000k',            # output bitrate (= quality); here, 3000 kb/second
                 'my_awesome_output_audio_file.mp3'],
                stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
