scipy.io.wavfile.read() the stdout from FFmpeg - python

After searching for a long time, I still cannot find a way to use scipy.io.wavfile.read() to read the bytes from the stdout of FFmpeg 3.3.6.
Here is example code that works perfectly; however, it needs to save a converted file to disk.
import subprocess
import scipy.io.wavfile as wavfile

command = 'ffmpeg -i in.mp3 out.wav'
subprocess.run(command)
with open('out.wav', 'rb') as wf:
    rate, signal = wavfile.read(wf)
print(rate, signal)
And here is the code where I try to get the FFmpeg output from stdout and load it into scipy wavfile.
import io
import subprocess
import scipy.io.wavfile as wavfile
command = 'ffmpeg -i in.mp3 -f wav -'
proc = subprocess.run(command, stdout=subprocess.PIPE)
rate, signal = wavfile.read(io.BytesIO(proc.stdout))
print(rate, signal)
Sadly, it raises a ValueError.
Traceback (most recent call last):
  File ".\err.py", line 8, in <module>
    rate, signal = wavfile.read(io.BytesIO(proc.stdout))
  File "C:\Users\Sean Wu\AppData\Local\Programs\Python\Python36\lib\site-packages\scipy\io\wavfile.py", line 246, in read
    raise ValueError("Unexpected end of file.")
ValueError: Unexpected end of file.
Are there any methods to solve this problem?

Apparently when the output of ffmpeg is sent to stdout, the program does not fill in the RIFF chunk size of the file header. Instead, the four bytes where the chunk size should be are all 0xFF. scipy.io.wavfile.read() expects that value to be correct, so it thinks the length of the chunk is 0xFFFFFFFF bytes.
When you give ffmpeg an output file to write, it correctly fills in the RIFF chunk size, so wavfile.read() is able to read the file in that case.
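You can see this directly by inspecting the first bytes of the piped output (a quick check, reusing proc.stdout from your second script):

print(proc.stdout[:12])  # e.g. b'RIFF\xff\xff\xff\xffWAVE' (the 4-byte size field is all 0xFF)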
A work-around for your code is to patch the RIFF chunk size manually before the data is passed to wavfile.read() via an io.BytesIO() object. Here's a modification of your script that does that. Note: I had to use command.split() for the first argument of subprocess.run(). I'm using Python 3.5.2 on Mac OS X. Also, my test file name is "mpthreetest.mp3".
import io
import subprocess
import scipy.io.wavfile as wavfile

command = 'ffmpeg -i mpthreetest.mp3 -f wav -'
proc = subprocess.run(command.split(), stdout=subprocess.PIPE)

riff_chunk_size = len(proc.stdout) - 8
# Break up the chunk size into four bytes, held in b.
q = riff_chunk_size
b = []
for i in range(4):
    q, r = divmod(q, 256)
    b.append(r)
# Replace bytes 4:8 in proc.stdout with the actual size of the RIFF chunk.
riff = proc.stdout[:4] + bytes(b) + proc.stdout[8:]

rate, signal = wavfile.read(io.BytesIO(riff))
print("rate:", rate)
print("len(signal):", len(signal))
print("signal min and max:", signal.min(), signal.max())

Related

Why does subprocess ffmpeg corrupt the file?

I have the following code that reads a video and saves it to another path. The problem is that the saved file is not playable:
import subprocess
import shlex
from io import BytesIO

file = open("a.mkv", "rb")
with open('a.mkv', 'rb') as fh:
    buf = BytesIO(fh.read())
args = shlex.split('ffmpeg -i pipe: -codec copy -f rawvideo pipe:')
proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate(input=buf.getbuffer())
proc.wait()
f = open("a.mp4", "wb")
f.write(out)
f.close()
I need to keep the buffers so the video has the correct size. How can I solve this?
You could use ffmpeg-python
pip install ffmpeg-python
And then:
import ffmpeg
from io import BytesIO

with open('a.mkv', 'rb') as fh:
    buf = BytesIO(fh.read())

process = (
    ffmpeg
    .input('pipe:')
    .output('a.mp4')
    .overwrite_output()
    .run_async(pipe_stdin=True)
)
process.communicate(input=buf.getbuffer())
And isn't there any way of making a pipe of streaming data, like in Node.js? For example, start a pipe that downloads bytes from S3, processes them with FFmpeg, and uploads the result back to S3. In Node.js I guess this can be done chunk by chunk instead of filling the RAM; in Python the idea would then be to create a temp file on the backend server and write the files there.
Yes, there is, but there is no prescribed mechanism in Python like in Node.js. You need to run your own threads (or asyncio coroutines), one to send data to and the other to receive data from the FFmpeg process. Here is a sketch of what I would do:
from threading import Thread
from queue import Queue
import subprocess as sp

# let's say getting mp4 in and putting mkv out, copying all the streams
# NOTE: you cannot pipe out mp4
args = ['ffmpeg', '-f', 'mp4', '-i', '-', '-c', 'copy', '-f', 'matroska', '-']
proc = sp.Popen(args, stdin=sp.PIPE, stdout=sp.PIPE)

queue = Queue()  # the download code puts data blocks on this queue, None when done

def writer():
    while True:
        # get next downloaded data block
        data = queue.get()
        queue.task_done()
        if data is None:
            break
        try:
            nbytes = proc.stdin.write(data)
        except (OSError, ValueError):
            # stdin stream closed/FFmpeg terminated, end the thread as well
            break
        if not nbytes and proc.stdin.closed:  # just in case
            break

def reader():
    # output block size; set to something reasonable. I use the frame byte
    # size for raw data in, but it would be different for receiving encoded data.
    blocksize = ...
    while True:
        try:
            data = proc.stdout.read(blocksize)
        except (OSError, ValueError):
            # stdout stream closed/FFmpeg terminated, end the thread as well
            break
        if not data:  # done, no more data
            break
        # upload the data
        ...

writer_thread = Thread(target=writer)
reader_thread = Thread(target=reader)
writer_thread.start()
reader_thread.start()
writer_thread.join()  # first wait until all the data are written
proc.stdin.close()    # triggers ffmpeg to stop waiting for input and wrap up its encoding
proc.wait()           # waits for ffmpeg
reader_thread.join()  # wait till all the ffmpeg outputs are processed
I tried out multiple approaches for my ffmpegio.streams.SimpleFilterBase class and settled on this one.
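A side note on the "you cannot pipe out mp4" comment in the sketch: a plain MP4 needs a seekable output, but a fragmented MP4 can be piped, e.g. (the args otherwise mirror the sketch above):

args = ['ffmpeg', '-f', 'mp4', '-i', '-', '-c', 'copy',
        '-movflags', 'frag_keyframe+empty_moov', '-f', 'mp4', '-']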

subprocess.Popen bufsize=1 being ignored

I'm having a similar problem to the one described here:
subprocess stdin buffer not flushing on newline with bufsize=1
My script is not outputting line by line when I specify bufsize=1. In fact, it outputs only when the buffer fills, but I want to read line by line in real time.
I've tried the below on Linux Mint and CentOS 7. test.py is what I run; it calls the 'script' file, which is executable.
test.py
#!/usr/bin/python
import subprocess
import sys

process = subprocess.Popen('/root/script', bufsize=1, stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, close_fds=True, universal_newlines=True)
for line in iter(process.stdout.readline, b''):
    print line
script
#!/usr/bin/python
import time

for i in range(0, 999999):
    print str('This is a short line, ' + str(i) + '\n')
    time.sleep(0.01)
This outputs large numbers of lines at once, with pauses between the chunks. The first large chunk ends at line 154.
However, If I make the output line much larger in the script file:
print str('This is a long line................................................................................................. ' + str(i) + '\n')
The first large chunk ends at line 66.
Why is my buffer size being ignored?
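One likely cause, worth checking: bufsize=1 only affects the parent's side of the pipe, and the parent here already reads with readline(). The chunking comes from the child: Python block-buffers sys.stdout when it is connected to a pipe rather than a terminal, so output accumulates until the child's fixed-size buffer fills, which is also why longer lines flush after fewer lines. Flushing in the child (or running it with python -u, or under stdbuf) makes the lines arrive one at a time. A minimal sketch of the modified script:

#!/usr/bin/python
import sys
import time

for i in range(0, 999999):
    print str('This is a short line, ' + str(i) + '\n')
    sys.stdout.flush()  # push each line through the pipe immediately
    time.sleep(0.01)

Also note that with universal_newlines=True the readline() sentinel should be '' rather than b''.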

Create and pipe a file-like object as input for a command

I'm looking for a better way to do this, if possible:
import subprocess
f = open('temp.file', 'w+')
f.write('hello world')
f.close()
out = subprocess.check_output(['cat', 'temp.file'])
print out
subprocess.check_output(['rm', 'temp.file'])
In this example I'm creating a file and passing it as input to cat (in reality it's not cat I'm running but some other program that parses an input pcap file).
What I'm wondering is: is there a way in Python to create a 'file-like object' with some content, and pipe this file-like object as input to a command-line program? If it is possible, I reckon it would be more efficient than writing a file to the disk and then deleting that file.
check_output takes a stdin input argument to specify a file-like object to connect to the process's standard input.
with open('temp.file') as input:
    out = subprocess.check_output(['cat'], stdin=input)
Also, there's no need to shell out to run rm; you can remove the file directly from Python:
import os
os.remove('temp.file')
You can write to a TemporaryFile
import subprocess
from tempfile import TemporaryFile
f = TemporaryFile("w")
f.write("foo")
f.seek(0)
out = subprocess.check_output(['cat'],stdin=f)
print(out)
b'foo'
If you just want to write to a file like object and get the content:
from io import StringIO
f = StringIO()
f.write("foo")
print(f.getvalue())
If the program is configured to read from stdin, you can use Popen.communicate:
>>> from subprocess import Popen, PIPE
>>> p = Popen('cat', stdout=PIPE, stdin=PIPE, stderr=PIPE)
>>> out, err = p.communicate(input=b"Hello world!")
>>> out
b'Hello world!'
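On Python 3.5+, subprocess.run offers a shorter spelling of the same thing (capture_output requires 3.7+):

>>> from subprocess import run
>>> result = run(['cat'], input=b"Hello world!", capture_output=True)
>>> result.stdout
b'Hello world!'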
If the command accepts only filenames and doesn't read input from its stdin, i.e., if you can't use stdin=PIPE + .communicate() or stdin=real_file, then you could try /dev/fd/# filenames:
#!/usr/bin/env python3
import os
import subprocess
import threading

def pump_input(pipe):
    with pipe:
        for i in range(3):
            print(i, file=pipe)

r, w = os.pipe()
try:
    threading.Thread(target=pump_input, args=[open(w, 'w')]).start()
    out = subprocess.check_output(['cat', '/dev/fd/' + str(r)], pass_fds=[r])
finally:
    os.close(r)
print('got:', out)
No content touches the disk. The input is passed to the subprocess via the pipe directly.
If you have a file-like object that is not a real file (otherwise, just pass its name as the command-line argument), then pump_input() could look like:
import shutil

def pump_input(pipe):
    with pipe:
        shutil.copyfileobj(file_like_object, pipe)

Use StringIO as stdin with Popen

I have the following shell script that I would like to write in Python (of course grep . is actually a much more complex command):
#!/bin/bash
(cat somefile 2>/dev/null || (echo 'somefile not found'; cat logfile)) \
| grep .
I tried this (which lacks an equivalent to cat logfile anyway):
#!/usr/bin/env python
import StringIO
import subprocess

try:
    myfile = open('somefile')
except:
    myfile = StringIO.StringIO('somefile not found')
subprocess.call(['grep', '.'], stdin=myfile)
But I get the error AttributeError: StringIO instance has no attribute 'fileno'.
I know I should use subprocess.communicate() instead of StringIO to send strings to the grep process, but I don't know how to mix both strings and files.
p = subprocess.Popen(['grep', '...'], stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
output, output_err = p.communicate(myfile.read())
Don't use a bare except; it may catch too much. In Python 3:
#!/usr/bin/env python3
from subprocess import check_output

cmd = ['grep', '.']  # stand-in for the real command
try:
    file = open('somefile', 'rb', 0)
except FileNotFoundError:
    output = check_output(cmd, input=b'somefile not found')
else:
    with file:
        output = check_output(cmd, stdin=file)
It works for large files too: the file is redirected at the file descriptor level, so there is no need to load it into memory.
If you have a file-like object without a real .fileno(), you could write to the pipe directly using its .write() method:
#!/usr/bin/env python3
import io
from shutil import copyfileobj
from subprocess import Popen, PIPE
from threading import Thread

try:
    file = open('somefile', 'rb', 0)
except FileNotFoundError:
    file = io.BytesIO(b'somefile not found')

def write_input(source, sink):
    with source, sink:
        copyfileobj(source, sink)

cmd = ['grep', 'o']
with Popen(cmd, stdin=PIPE, stdout=PIPE) as process:
    Thread(target=write_input, args=(file, process.stdin), daemon=True).start()
    output = process.stdout.read()
The following answer uses shutil as well, which is quite efficient, but avoids running a separate thread, which otherwise never ends and goes zombie when stdin ends (as in the answer from @jfs).
import os
import subprocess
import io
from shutil import copyfileobj

file_exists = os.path.isfile(file)
with (open(file) if file_exists else io.StringIO("Some text here ...\n")) as string_io:
    with subprocess.Popen("cat", stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          universal_newlines=True) as process:
        copyfileobj(string_io, process.stdin)
        # The subsequent code is not executed until copyfileobj ends,
        # but the subprocess is effectively consuming the input meanwhile.
        process.stdin.close()  # close, or otherwise the process won't end
        # Do some online processing of process.stdout, for example:
        for line in process.stdout:
            print(line)  # do something
Alternatively to closing stdin and parsing line by line, if the output is known to fit in memory:
...
stdout_text, stderr_text = process.communicate()

How to test a directory of files for gzip and uncompress gzipped files in Python using zcat?

I'm in my 2nd week of Python and I'm stuck on a directory of zipped/unzipped logfiles, which I need to parse and process.
Currently I'm doing this:
import os
import sys
import glob
import operator
import zipfile
import zlib
import gzip
import subprocess

if sys.version.startswith("3."):
    import io
    io_method = io.BytesIO
else:
    import cStringIO
    io_method = cStringIO.StringIO

for f in glob.glob('logs/*'):
    file = open(f, 'rb')
    new_file_name = f + "_unzipped"
    last_pos = file.tell()
    # test for the gzip magic number
    if file.read(2) == b'\x1f\x8b':
        file.seek(last_pos)
        # unzip to new file
        out = open(new_file_name, "wb")
        process = subprocess.Popen(["zcat", f], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        while True:
            if process.poll() != None:
                break
        output = io_method(process.communicate()[0])
        exitCode = process.returncode
        if exitCode == 0:
            print "done"
            out.write(output)
            out.close()
        else:
            raise ProcessException(command, exitCode, output)
which I've "stitched" together from these SO answers (here) and blog posts (here).
However, it does not seem to work: my test file is 2.5GB and the script has been chewing on it for 10+ minutes, and I'm not really sure whether what I'm doing is correct anyway.
Question:
If I don't want to use the gzip module and need to decompress chunk by chunk (the actual files are >10GB), how do I uncompress and save to a file using zcat and subprocess in Python?
Thanks!
This should read the first line of every file in the logs subdirectory, unzipping as required:
#!/usr/bin/env python
import glob
import gzip
import subprocess

for f in glob.glob('logs/*'):
    if f.endswith('.gz'):
        # Open a compressed file. Here is the easy way:
        # file = gzip.open(f, 'rb')
        # Or, here is the hard way:
        proc = subprocess.Popen(['zcat', f], stdout=subprocess.PIPE)
        file = proc.stdout
    else:
        # Otherwise, it must be a regular file
        file = open(f, 'rb')
    # Process file, for example:
    print f, file.readline()
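To actually decompress chunk by chunk and save to a file, as the question asks, you can copy the zcat pipe to the output file in fixed-size blocks, so even a >10GB file never has to fit in memory. A minimal sketch along the same lines:

#!/usr/bin/env python
import glob
import subprocess

for f in glob.glob('logs/*.gz'):
    new_file_name = f + "_unzipped"
    proc = subprocess.Popen(['zcat', f], stdout=subprocess.PIPE)
    with open(new_file_name, 'wb') as out:
        # copy the decompressed stream in 64 KiB blocks
        for chunk in iter(lambda: proc.stdout.read(64 * 1024), b''):
            out.write(chunk)
    proc.wait()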
