Send args to subprocess while using stdin

I'm trying to take a screenshot then run a command on that screenshot without saving to disk.
The actual command I want to run is visgrep image.png pattern.pat
visgrep must have two args: the image file and a .pat file.
Here is what I have so far.
p = subprocess.Popen(['import', '-crop', '305x42+1328+281', '-window', 'root', '-depth', '8', 'png:' ], stdout=subprocess.PIPE,)
cmd = ['visgrep']
subprocess.call(cmd, stdin=p.stdout)
Obviously this fails as visgrep must have two args.
So how can I do visgrep image.png pattern.pat but substituting 'image.png' with the output of ImageMagick's import?
Do I need to use xargs? Is there a better way to accomplish what I'm trying?

On Linux you can pass /dev/stdin as the file name, but it does not always work. If it does not work with visgrep, you will have to use a temporary file (there is no shame in that).
PS. shouldn't png: be png:-?
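The temporary-file fallback could look roughly like the sketch below. The helper name run_on_captured_output and the "{}" placeholder are made up for illustration, and it assumes a POSIX system, where a NamedTemporaryFile can be opened by another process while it is still open.

```python
import subprocess
import tempfile


def run_on_captured_output(producer, consumer, suffix=".png"):
    """Run `producer`, save its stdout to a temporary file, then run
    `consumer` with every "{}" argument replaced by that file's path."""
    data = subprocess.run(producer, stdout=subprocess.PIPE, check=True).stdout
    with tempfile.NamedTemporaryFile(suffix=suffix) as tmp:
        tmp.write(data)
        tmp.flush()  # make sure the bytes are on disk before the consumer reads them
        argv = [tmp.name if arg == "{}" else arg for arg in consumer]
        # The file is deleted when the with-block exits, after the consumer is done.
        return subprocess.run(argv, stdout=subprocess.PIPE)


# Intended use with the commands from the question (not run here):
# result = run_on_captured_output(
#     ["import", "-crop", "305x42+1328+281", "-window", "root",
#      "-depth", "8", "png:-"],
#     ["visgrep", "{}", "pattern.pat"],
# )
```

The screenshot still touches the file system, but only in a temporary file that is cleaned up automatically.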

According to this answer, changing the argument png: to png:- will cause the import command to output to standard out instead of a file. I am unfamiliar with visgrep, so I'm not sure how to tell it to read the source image from stdin.

From the ImageMagick documentation:
STDIN, STDOUT, and file descriptors
Unix and Windows permit the output of one command to be piped to the
input of another. ImageMagick permits image data to be read and
written from the standard streams STDIN (standard in) and STDOUT
(standard out), respectively, using a pseudo-filename of -. In this
example we pipe the output of convert to the display program:
$ convert logo: gif:- | display gif:-
The second explicit format "gif:" is optional in the preceding
example. The GIF image format has a unique signature within the image
so ImageMagick's display command can readily recognize the format as
GIF. The convert program also accepts STDIN as input in this way:
$ convert rose: gif:- | convert - -resize "200%" bigrose.jpg
You can use the same filename convention with the import command.
So, try:
p = subprocess.Popen(['import', '-crop', '305x42+1328+281',
'-window', 'root', '-depth', '8', 'png:-' ],
stdout=subprocess.PIPE,)
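To connect this to the second program, one option is to hand it /dev/stdin as the image file name, as suggested in the other answer. The sketch below assumes Linux and a consumer that is happy to read its input from a pipe; whether visgrep is has not been checked here. The helper name pipe_as_file is made up for illustration.

```python
import subprocess


def pipe_as_file(producer, consumer):
    """Run `producer | consumer`; `consumer` is expected to name /dev/stdin
    wherever it would normally take the input file."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # so the producer gets SIGPIPE if the consumer exits early
    out, _ = p2.communicate()
    p1.wait()
    return p2.returncode, out


# Intended use with the commands from the question (not run here):
# code, out = pipe_as_file(
#     ["import", "-crop", "305x42+1328+281", "-window", "root",
#      "-depth", "8", "png:-"],
#     ["visgrep", "/dev/stdin", "pattern.pat"],
# )
```

If the consumer needs to seek in its input file, this will fail, and a temporary file is the way out.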

Related

How do I pass user-input filenames to ImageMagick safely?

I am generating an ImageMagick bash command using Python. Something like
import subprocess
input_file = "hello.png"
output_file = "world.jpg"
subprocess.run(["convert", input_file, output_file])
where there might be more arguments before input_file or output_file. My question is, if either of the filenames is user provided and the user provides a filename that can be parsed as a command line option for ImageMagick, isn't that unsafe?

If the filename starts with a dash, ImageMagick could indeed treat it as an option instead of a filename. Most programs, including (AFAIK) the ImageMagick command-line tools, follow the convention that a double dash (--) marks the end of the options. If you run
subprocess.run(["convert", "--", input_file, output_file])
you should be safe in this respect.

From the man page (and a few tests), convert requires an input file and an output file. If you only allow two tokens and if a file name is interpreted as an option then convert is going to miss at least one of the files, so you'll get an ugly message but you should be fine.
Otherwise you can prefix any file name that starts with - with ./ (except - itself, which is stdin or stdout depending on position), so that it becomes an unambiguous file path to the same file.
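The ./ prefix idea is small enough to sketch directly. The function name safe_path_arg is made up for illustration; it only deals with the leading-dash problem and does no other validation of user input.

```python
import os


def safe_path_arg(name):
    """Return `name` in a form that cannot be mistaken for an option.

    A lone "-" is left alone because it conventionally means stdin/stdout.
    """
    if name != "-" and name.startswith("-"):
        # "./-foo.png" refers to the same file but no longer starts with a dash.
        return os.path.join(os.curdir, name)
    return name


# Intended use (not run here):
# subprocess.run(["convert", safe_path_arg(input_file), safe_path_arg(output_file)])
```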

How do you use Python Ghostscript's high-level interface to convert a .pdf file into multiple .png files?

I am trying to convert a .pdf file into several .png files using Ghostscript in Python. The other answers on here were pretty old hence this new thread.
The following code was given as an example on pypi.org of the 'high level' interface, and I am trying to model my code after the example code below.
import sys
import locale
import ghostscript
args = [
"ps2pdf", # actual value doesn't matter
"-dNOPAUSE", "-dBATCH", "-dSAFER",
"-sDEVICE=pdfwrite",
"-sOutputFile=" + sys.argv[1],
"-c", ".setpdfwrite",
"-f", sys.argv[2]
]
# arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]
ghostscript.Ghostscript(*args)
Can someone explain what this code is doing? And can it be used somehow to convert a .pdf into .png files?
I am new to this and am truly confused. Thanks so much!

That's calling Ghostscript, obviously. From the arguments it's not spawning a process, it's linked (either dynamically or statically) to the Ghostscript library.
The args are Ghostscript arguments. These are documented in the Ghostscript documentation, you can find it online here. Because it mimics the command line interface, where the first argument is the calling program, the first argument here is meaningless and can be anything you want (as the comment says).
The next three arguments set NOPAUSE so the entire input is processed without pausing between pages, BATCH so that Ghostscript exits on completion instead of returning to the interactive prompt, and SAFER (which prevents some potentially dangerous operations and is now the default anyway).
Then it selects a device. In Ghostscript (due to the PostScript language) devices are what actually output stuff. In this case the device selected is the pdfwrite device, which outputs PDF.
Then there's the OutputFile, you can probably guess that this is the name (and path) of the file where the output is to be written.
The next three arguments, -c .setpdfwrite -f, are frankly archaic and pointless. They were once recommended when using the pdfwrite device (and only the pdfwrite device), but they have no useful effect these days.
The very last argument is, of course, the input file.
Certainly you can use Ghostscript to render PDF files to PNG. You want to use one of the PNG devices, there are several depending on what colour depth you want to support. Unless you have some stranger requirement, just use png16m. If your input file contains more than one page you'll want to set the OutputFile to use %d so that it writes one file per page.
More details on all of this can, of course, be found in the documentation.
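Putting those pieces together, an argument list for PDF-to-PNG rendering might look like the sketch below. The function names, the 150 dpi default and the page-%03d.png pattern are assumptions for illustration; render() reuses the bytes-encoding step from the example in the question and needs the third-party ghostscript package, so it is defined but not called here.

```python
import locale


def pdf_to_png_args(pdf_path, output_pattern="page-%03d.png", dpi=150):
    """Build Ghostscript arguments that render each page to its own PNG."""
    return [
        "gs",                              # program name; the actual value doesn't matter
        "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m",                 # 24-bit colour PNG device
        "-r%d" % dpi,                      # rendering resolution
        "-sOutputFile=" + output_pattern,  # %03d is replaced by the page number
        pdf_path,
    ]


def render(pdf_path, **kwargs):
    """Run the conversion through the python-ghostscript high-level interface."""
    import ghostscript  # third-party package used in the question
    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in pdf_to_png_args(pdf_path, **kwargs)]
    ghostscript.Ghostscript(*args)
```

Calling render("input.pdf") would then be expected to write page-001.png, page-002.png and so on into the current directory.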

Passing piped data to Python program and also an input file

I want to pass data to a Python file using a pipe and also specifying an input file like:
cat file.txt|python script.py -u configuration.txt
I currently have this:
import fileinput
for line in fileinput.input(mode='rU'):
    print(line)
I know there can be something with sys.argv but maybe using fileinput there is a clean way to do it?
Thanks.

From the documentation:
"If a filename is '-', it is also replaced by sys.stdin. To specify an alternative list of filenames, pass it as the first argument to input()."
So you can create a list containing '-' as well as the contents of sys.argv[1:] (the default), and pass that to input(). Or alternatively just put - in the list of arguments of your Python program:
cat file.txt|python script.py -u - configuration.txt
or
cat file.txt|python script.py -u configuration.txt -
depending on whether you want data provided on standard input to be processed before or after the contents of configuration.txt.
If you want to do anything more complicated than just processing the contents of standard input as if it were an input file, you probably should not be using the fileinput module.
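The first suggestion, building the list yourself, could look like the following sketch. The function name read_lines is made up, and handling of options such as -u is deliberately left out: fileinput would treat -u as a file name, so the caller is assumed to have removed it from the list already.

```python
import fileinput


def read_lines(files):
    """Return all lines from `files` in order; the entry '-' means standard input."""
    with fileinput.input(files=files) as stream:
        return [line.rstrip("\n") for line in stream]


# Intended use in script.py, reading piped data first and then the named files:
# import sys
# for line in read_lines(["-"] + sys.argv[1:]):
#     print(line)
```

Placing '-' at the end of the list instead processes the piped data after the files.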

Ffmpeg map and filter_complex subprocess - Python

I want to crop and re-encode videos via ffmpeg from within Python using subprocesses.
I managed to start a subprocess using a plain string command and shell=True, but I want to build more complex commands and would prefer to use shell=False and pass a list of arguments.
So what works is this form (this is a simplified example, there will be multiple streams in the final version):
import subprocess as sp
sp.Popen('ffmpeg.exe -i Test.avi -filter_complex "[0:v]crop=1024:1024:0:0[out1]" -map [out1] out1.mp4', shell=True)
This script produces the expected cropped output video.
For a list of arguments, I tried:
FFMPEG_PATH = 'ffmpeg.exe'
aviP='Test.avi'
sp.Popen([FFMPEG_PATH,
'-i', aviP,
'-filter_complex', '[0:v]crop=1024:1024:0:0[out1]',
'-map', '[out1] out1.mp4'])
When I execute this second version, simply nothing happens. (no error, no output)
I suspect I am messing up something in the map command syntax?

I think I figured it out:
FFMPEG_PATH = 'ffmpeg.exe'
aviP='Test.avi'
sp.Popen([FFMPEG_PATH,
'-i', aviP,
'-filter_complex', '[0:v]crop=1024:1024:0:0[out1]',
'-map', '[out1]', 'out1.mp4'])
is the correct syntax
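The underlying rule is that every token the shell would have passed separately must be its own list element, so '[out1]' and 'out1.mp4' cannot share one string. As a quick check, shlex.split from the standard library shows how a POSIX shell would tokenize the working string command. It is only a sketch for inspecting the tokens; on Windows, paths containing backslashes would need more care.

```python
import shlex

command = ('ffmpeg.exe -i Test.avi '
           '-filter_complex "[0:v]crop=1024:1024:0:0[out1]" '
           '-map [out1] out1.mp4')

# Split the way a POSIX shell would: quotes are removed, spaces separate tokens.
args = shlex.split(command)
for arg in args:
    print(repr(arg))

# The resulting list can be passed to subprocess.Popen(args) without shell=True.
```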

PDF to PNG in Python with pdf2cairo

I have been looking for a good PDF-to-image converter for a long time. I need to convert the PDF to an image in order to print it with Qt. I'm programming in Python/PySide, so if I can convert the PDF to a series of (PNG) images using subprocess, I can print them without problems.
I managed to do this by calling convert.exe from ImageMagick. It works quite well, but it relies on Ghostscript, which is a big package that I want to avoid since it is more complex to integrate.
I also tried MuPDF (from the makers of Ghostscript), but it does not seem to have stdin and stdout options. That's a pity, because I first have to save my file, open it with MuPDF, convert and save it, and then reload it in my Python application. It should be possible without all those steps!
Today I started experimenting with Poppler's pdftocairo. I assumed that it would convert my (multi-page) PDF to a series of images and pipe them to stdout. Unfortunately it doesn't, and I ran into two problems:
It complains that it can only export to stdout when you also use the -singlefile argument. How can I export all pages to stdout?
When I export to stdout I get the error: 'Error opening output file fd://0.png\r\n'
Converting a PDF from stdin to image files is no problem at all.
This is my code which also triggers the error about opening the output file:
import subprocess
pdf = open('test.pdf', 'rb')  # open in binary mode; a PDF is not text
p = subprocess.Popen(['pop/pdftocairo.exe', '-singlefile', '-png', '-', '-'], stdin=pdf, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(p.stderr.read())
print(p.stdout.read())
I've downloaded PDF2Cairo pre-compiled from: http://blog.alivate.com.au/poppler-windows/
The documentation of the command line options of pdf2cairo can be found here: http://manpages.ubuntu.com/manpages/precise/man1/pdftocairo.1.html
Hopefully you can help me out to make this work!
Update
As you can see in the answers below, pdftocairo is buggy and does not work correctly when you want to use stdout. pdftoppm does work; it returns the converted pages as a bytes object:
pdf = open('test.pdf', 'rb')  # open in binary mode; a PDF is not text
p = subprocess.Popen(['pop/pdftoppm.exe', '-png'], stdin=pdf, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
data, error = p.communicate()
The only thing I still need to do is split the byte object into multiple files.
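One way to do that split, assuming pdftoppm writes the pages to stdout as complete PNG files back to back (not verified here), is to walk the PNG chunk structure: each file starts with an 8-byte signature and ends with an IEND chunk. The function name split_png_stream is made up for illustration, and chunk CRCs are not checked.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def split_png_stream(data):
    """Split a bytes object holding concatenated PNG files into a list of PNGs."""
    images = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 8] != PNG_SIGNATURE:
            raise ValueError("no PNG signature at offset %d" % pos)
        start = pos
        pos += 8
        while True:
            if pos + 8 > len(data):
                raise ValueError("truncated PNG data")
            # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
            length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
            pos += 8 + length + 4
            if pos > len(data):
                raise ValueError("truncated PNG data")
            if chunk_type == b"IEND":
                break
        images.append(data[start:pos])
    return images


# Intended use with the `data` returned by p.communicate() (not run here):
# for number, png in enumerate(split_png_stream(data), start=1):
#     with open("page-%d.png" % number, "wb") as out:
#         out.write(png)
```

Parsing chunk lengths rather than searching for the signature avoids false splits if the signature bytes happen to occur inside compressed image data.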

It's a bug in pdftocairo.
The output filename is first passed to getOutputFilename, which returns the special string fd://0 as placeholder for stdout.
But later that string is passed to getImageFilename, which unconditionally adds an extension to the filename, so the subsequent comparison fails and the program tries to open the literal file fd://0.png instead of using stdout.
Unfortunately, the only thing you can do is file a bug report.
As for exporting a multipage document to stdout, that's not supported at all, and it wouldn't work with file types like png or jpeg anyway, because these formats don't support multipage documents. It does work for svg, pdf, eps and ps output files, as these formats do support multipage documents (and the filename is processed correctly for these).

I thought it would be easier to just use os.system and pass the whole command string.
This assumes there are "pdfs" and "imgs" folders; change accordingly.
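import os
import glob
# Convert every PDF in "pdfs" to JPEG files in "imgs".
for pdf_file in glob.glob(os.path.join("pdfs", "*.pdf")):
    # Output root: the PDF's base name without its extension, inside "imgs".
    out_root = os.path.join("imgs", os.path.splitext(os.path.basename(pdf_file))[0])
    cmd_str = 'pdftocairo.exe -jpeg "%s" "%s"' % (pdf_file, out_root)
    print(cmd_str)
    os.system(cmd_str)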
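As an alternative sketch, the same loop can pass an argument list to subprocess.run, which removes the need to quote file names by hand. The function names are made up for illustration, and pdftocairo.exe is assumed to be on the PATH; convert_all is defined but not called here.

```python
import glob
import os
import subprocess


def pdftocairo_command(pdf_file, out_dir, exe="pdftocairo.exe"):
    """Build the argument list for converting one PDF to JPEG files."""
    root = os.path.splitext(os.path.basename(pdf_file))[0]
    return [exe, "-jpeg", pdf_file, os.path.join(out_dir, root)]


def convert_all(pdf_dir="pdfs", out_dir="imgs"):
    """Convert every PDF in `pdf_dir`, writing the images to `out_dir`."""
    for pdf_file in glob.glob(os.path.join(pdf_dir, "*.pdf")):
        # No shell is involved, so spaces and quotes in file names are harmless.
        subprocess.run(pdftocairo_command(pdf_file, out_dir), check=True)
```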
