Python subprocess call to xpdf's pdftotext not working with encoding

I am trying to run pdftotext using Python's subprocess module.
import subprocess
pdf = r"path\to\file.pdf"
txt = r"path\to\out.txt"
pdftotext = r"path\to\pdftotext.exe"
cmd = [pdftotext, pdf, txt, '-enc UTF-8']
response = subprocess.check_output(cmd,
shell=True,
stderr=subprocess.STDOUT)
Traceback:
CalledProcessError: Command '['path\\to\\pdftotext.exe',
'path\\to\\file.pdf', 'path\\to\\out.txt', '-enc UTF-8']'
returned non-zero exit status 99
When I remove last argument '-enc UTF-8' from cmd, it works OK in python.
When I run pdftotext pdf txt -enc UTF-8 in cmd, it works ok.
What am I missing?
Thanks.

subprocess has some complicated rules for handling commands. From the docs:
The shell argument (which defaults to False) specifies whether to use
the shell as the program to execute. If shell is True, it is
recommended to pass args as a string rather than as a sequence.
More details are explained in this answer.
So, as the docs explain, you should convert your command to a string:
cmd = r"""{} "{}" "{}" -enc UTF-8""".format('pdftotext', pdf, txt)
Now, call subprocess as:
subprocess.call(cmd, shell=True, stderr=subprocess.STDOUT)
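As an alternative to the string form, here is a minimal sketch that keeps the list form and skips the shell. It assumes the failure comes from '-enc UTF-8' reaching pdftotext as one argument containing a space. The helper names build_pdftotext_cmd and run_pdftotext are hypothetical, and the argument order is the one from the question.

```python
import subprocess

def build_pdftotext_cmd(exe, pdf, txt, encoding="UTF-8"):
    """Return an argument list in which "-enc" and its value are separate items."""
    # Each list element reaches the program as exactly one argument, so
    # "-enc UTF-8" written as a single element would arrive as one argument
    # with an embedded space instead of an option followed by its value.
    return [exe, pdf, txt, "-enc", encoding]

def run_pdftotext(exe, pdf, txt, encoding="UTF-8"):
    """Run the converter without a shell; raises CalledProcessError on failure."""
    cmd = build_pdftotext_cmd(exe, pdf, txt, encoding)
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT)
```

With the variables from the question, the call would be run_pdftotext(pdftotext, pdf, txt).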

Related

Access output of CommandLine Command in python

I have a command for the commandline which is sent using python.
import subprocess
cmdline = 'tool.exe /j231_' #TOOL COMMAND
dir = r'D:\RC\tool.exe'
rc = subprocess.run("start cmd /K " + cmdline, cwd=dir, shell=True)
The output on the command line is:
Application directory is: D:\RC\tool.exe
Project 231_ loaded succesfully.
=========================================
Operation has terminated successfully.
I need to get the line Project 231_ loaded succesfully., as a string in python. How can I do this? I tried using stdout, stderr but I could not do it successfully.

import subprocess
help(subprocess.run)
Help on function run in module subprocess:
run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs)
Run command with arguments and return a CompletedProcess instance.
The returned instance will have attributes args, returncode,
stdout and stderr. By default, stdout and stderr are not
captured, and those attributes will be None. Pass stdout=PIPE
and/or stderr=PIPE in order to capture them.
…
By default, all communication is in bytes, and therefore any "input"
should be bytes, and the stdout and stderr will be bytes. If in
text mode, any "input" should be a string, and stdout and stderr
will be strings decoded according to locale encoding, or by "encoding"
if set. Text mode is triggered by setting any of text, encoding,
errors or universal_newlines.
The other arguments are the same as for the Popen constructor.
Putting the above together [maybe adding some details from help(subprocess.Popen)], the following code snippet should run, and you can easily extract the desired info from the rc.stdout string:
import subprocess
cmdline = 'tool.exe /j231_' #TOOL COMMAND
dir = r'D:\RC' # \tool.exe' # should be a directory
rc = subprocess.run(["cmd.exe", "/C", cmdline], cwd=dir, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, text=True)
print(rc.stdout)
Note the /C (instead of /K) switch for cmd.exe:
/C Carries out the command specified by string and then terminates
/K Carries out the command specified by string but remains
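To pull a single line out of the captured text, something like the following sketch may help. It assumes Python 3.7+ (for text=True). The helper names are hypothetical, and a child Python process stands in for tool.exe so the example can run anywhere.

```python
import subprocess
import sys

def first_line_starting_with(text, prefix):
    """Return the first line of text that starts with prefix, or None."""
    for line in text.splitlines():
        if line.startswith(prefix):
            return line
    return None

def run_and_find(cmd, prefix):
    """Run cmd, capture its stdout as text, and return the matching line."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return first_line_starting_with(result.stdout, prefix)

# Stand-in for tool.exe: a child Python process printing lines like those in the question.
fake_tool = [sys.executable, "-c",
             "print('Application directory is: D:/RC'); "
             "print('Project 231_ loaded succesfully.')"]
print(run_and_find(fake_tool, "Project"))
```

For the real tool, the same helper could be applied to rc.stdout from the snippet above.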

Run cmd file using python

I have a cmd file "file.cmd" containing 100s of lines of command.
Example
pandoc --extract-media -f docx -t gfm "sample1.docx" -o "sample1.md"
pandoc --extract-media -f docx -t gfm "sample2.docx" -o "sample2.md"
pandoc --extract-media -f docx -t gfm "sample3.docx" -o "sample3.md"
I am trying to run these commands using a script so that I don't have to go to a file and click on it.
This is my code, and it results in no output:
file1 = open('example.cmd', 'r')
Lines = file1.readlines()
# print(Lines)
for i in Lines:
    print(i)
    os.system(i)

You don't need to read the cmd file line by line. You can simply try the following:
import os
os.system('myfile.cmd')
or using the subprocess module:
import subprocess
proc = subprocess.Popen(['myfile.cmd'], shell=True, close_fds=True)
stdout, stderr = proc.communicate()
Example:
myfile.cmd:
@ECHO OFF
ECHO Greetings From Python!
PAUSE
script.py:
import os
os.system('myfile.cmd')
The cmd will open with:
Greetings From Python!
Press any key to continue ...
You can debug the issue by checking the exit code:
import os
return_code=os.system('myfile.cmd')
assert return_code == 0 #asserts that the return code is 0 indicating success!
Note: os.system works by calling system() in C, which can only take up to 65533 arguments after a command (so it is a 16-bit issue). Giving one more argument will result in the return code 32512 (which implies the exit code 127).
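The step from 32512 to 127 can be sketched as follows. This applies to POSIX systems, where os.system returns a wait()-style status; on Windows, os.system returns the shell's exit code directly. The function name is hypothetical.

```python
def exit_code_from_wait_status(status):
    """Decode a POSIX wait()-style status for a process that exited normally."""
    # The exit code lives in the high byte; the low byte carries signal information.
    return (status >> 8) & 0xFF

print(exit_code_from_wait_status(32512))
```

On Python 3.9 and later, os.waitstatus_to_exitcode performs this decoding and also handles processes killed by a signal.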

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function (os.system('command')).
Since it is a command file (cmd) and only the shell can run it, the shell argument must be set to True. Since you are setting shell to True, the command should be a string and not a list.
Use Popen to spawn a new process and communicate to wait on that process (you can time it out as well). If you wish to communicate with the child process, provide the pipes (see my example, but you don't have to!).
The code below is for Python 3.3 and beyond:
import subprocess
proc = subprocess.Popen('myfile.cmd', shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
try:
    outs, errs = proc.communicate(timeout=15)  # timing out the execution, just if you want, you don't have to!
except subprocess.TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()
For older Python versions:
import subprocess
import time
proc = subprocess.Popen('myfile.cmd', shell=True)
t = 10
while proc.poll() is None and t >= 0:
    print('Still waiting')
    time.sleep(1)
    t -= 1
if proc.poll() is None:
    proc.kill()  # still running after the timeout
In both cases (Python versions), if you don't need the timeout feature and you don't need to interact with the child process, then just use:
proc = subprocess.Popen('myfile.cmd', shell=True)
proc.communicate()
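On Python 3.5 and later, subprocess.run can do the spawn, wait and timeout in one call. The sketch below is one possible wrapper, not the answer's own method. The name run_with_timeout is hypothetical, and the caller chooses whether to use the shell.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout, shell=False):
    """Run cmd and return (returncode, stdout); returncode is None on timeout."""
    try:
        done = subprocess.run(cmd, shell=shell, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # run() has already killed the child by the time this is raised.
        return None, exc.output
    return done.returncode, done.stdout
```

For the command file in this question, the call would look like run_with_timeout('myfile.cmd', 15, shell=True).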

Redirecting shell command output to a file does not work using subprocess.Popen in Python

I am using Python 2.6.6 and failed to re-direct the Beeline(Hive) SQL query output returning multiple rows to a file on Unix using ">". For simplicity's sake, I replaced the SQL query with simple "ls" command on current directory and outputting to a text file.
Please ignore syntax of function sendfile. I want help to tweak the function "callcmd" to pipe the stdout onto the text file.
def callcmd(cmd, shl):
    logging.info('> '+' '.join(map(str,cmd)))
    #return 0;
    start_time = time.time()
    command_process = subprocess.Popen(cmd, shell=shl, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)
    command_output = command_process.communicate()[0]
    logging.info(command_output)
    elapsed_time = time.time() - start_time
    logging.info(time.strftime("%H:%M:%S",time.gmtime(elapsed_time))+' = time to complete (hh:mm:ss)')
    if (command_process.returncode != 0):
        logging.error('ERROR ON COMMAND: '+' '.join(map(str,cmd)))
        logging.error('ERROR CODE: '+str(command_process.returncode))
    return command_process.returncode
cmd=['ls', ' >', '/home/input/xyz.txt']
ret_code = callcmd(cmd, False)

Your command (i.e. cmd) could be ['sh', '-c', 'ls > ~/xyz.txt']. That would mean that the output of ls is never passed to Python, it happens entirely in the spawned shell – so you can't log the output. In that case, I'd have used return_code = subprocess.call(cmd), no need for Popen and communicate.
Equivalently, assuming you use bash or similar, you can simply use
subprocess.call('ls > ~/test.txt', shell=True)
If you want to access the output, e.g. for logging, you could use
s = subprocess.check_output(['ls'])
and then write that to a file like you would regularly in Python. To check for a non-zero exit code, handle the CalledProcessError that is raised in such cases.
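A minimal sketch of that last approach, assuming Python 3 and a hypothetical helper name, could look like this. The output is captured in Python, so it could also be logged before being written to the file.

```python
import subprocess
import sys

def save_command_output(cmd, path):
    """Run cmd, write its stdout to path, and return the exit code."""
    try:
        output = subprocess.check_output(cmd)
        code = 0
    except subprocess.CalledProcessError as exc:
        # check_output raises on a non-zero exit; the captured output is still available.
        output = exc.output
        code = exc.returncode
    with open(path, "wb") as f:
        f.write(output)
    return code
```

For the question's case, the call would be save_command_output(['ls'], '/home/input/xyz.txt').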

Here the stdout in command_output is written to a file. You don't need to use any redirection, although an alternative might be to have the Python script print to stdout and then redirect that to a file in your shell.
#!/usr/bin/python
import logging
import subprocess
cmd=['ls']
command_process = subprocess.Popen(
    cmd,
    shell=False,  # the command is a list, so no shell is needed
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)
command_output = command_process.communicate()[0]
if (command_process.returncode != 0):
    logging.error('ERROR ON COMMAND: '+' '.join(map(str,cmd)))
    logging.error('ERROR CODE: '+str(command_process.returncode))
f = open('listing.txt','w')
f.write(command_output)
f.close()

I added this piece of code to my code and it works fine. Thanks to @Snohdo.
f = open('listing.txt','w')
f.write(command_output)
f.close()

Python subprocess.Popen() followed by time.sleep

I want to make a python script that will convert a TEX file to PDF and then open the output file with my document viewer.
I first tried the following:
import subprocess
subprocess.Popen(['xelatex', '--output-directory=Alunos/', 'Alunos/' + aluno + '_pratica.tex'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
subprocess.Popen(['gnome-open', 'Alunos/'+aluno+'_pratica.pdf'], shell=False)
This way, the conversion from TEX to PDF works all right, but, as it takes some time, the second command (open file with Document Viewer) is executed before the output file is created.
So, I tried to make the program wait some seconds before executing the second command. Here's what I've done:
import subprocess
import time
subprocess.Popen(['xelatex', '--output-directory=Alunos/', 'Alunos/' + aluno + '_pratica.tex'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
time.sleep(10)
subprocess.Popen(['gnome-open', 'Alunos/'+aluno+'_pratica.pdf'], shell=False)
But, when I do so, the output PDF file is not created. I can't understand why. The only change was the time.sleep command. Why does it affect the Popen process?
Could anyone give me some help?
EDIT:
I've followed the advice from Faust and Paulo Bu and in both cases the result is the same.
When I run this command...
subprocess.call('xelatex --output-directory=Alunos/ Alunos/{}_pratica.tex'.format(aluno), shell=True)
... or this...
p = subprocess.Popen(['xelatex', '--output-directory=Alunos/', 'Alunos/' + aluno + '_pratica.tex'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p.wait()
...the Xelatex program is run but doesn't make the conversion.
Strangely, when I run the command directly in the shell...
$ xelatex --output-directory=Alunos/ Alunos/name_pratica.tex
... the conversion works perfectly.
Here's what I get when I run the subprocess.call() command:
$ python my_file.py
Enter name:
name
This is XeTeX, Version 3.1415926-2.4-0.9998 (TeX Live 2012/Debian)
restricted \write18 enabled.
entering extended mode
(./Alunos/name_pratica.tex
LaTeX2e <2011/06/27>
Babel <v3.8m> and hyphenation patterns for english, dumylang, nohyphenation, loaded.
)
*
When I write the command directly in the shell, the output is the same, but it is followed automatically by the conversion.
Does anyone know why it happens this way?
PS: Sorry for the bad formatting. I don't know how to post the shell output properly.

If you need to wait for the program to terminate and you are not interested in its output, you should use subprocess.call:
import subprocess
subprocess.call(['xelatex', '--output-directory=Alunos/', 'Alunos/{}_pratica.tex'.format(aluno)])
subprocess.call(['gnome-open', 'Alunos/{}_pratica.pdf'.format(aluno)])
EDIT:
Also it is generally a good thing to use English when you have to name variables or functions.

If the xelatex command works in a shell but fails when you call it from Python, then xelatex might be blocked on output in your Python code. You do not read the pipes despite setting stdout/stderr to PIPE. On my machine the pipe buffer is 64KB, so xelatex should not block if its output is smaller than that.
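If the output is wanted, one way to keep a child from blocking on a full pipe is to read the pipe until the child exits, which is what communicate() does. The sketch below assumes Python 3.3+ (for subprocess.DEVNULL). A child Python process stands in for a chatty program, and the name run_and_drain is hypothetical.

```python
import subprocess
import sys

def run_and_drain(cmd):
    """Run cmd with piped output and read it all so the child cannot block."""
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()  # reads until EOF, then waits for exit
    return proc.returncode, output

# Stand-in for a chatty program: a child that writes more than a 64 KB buffer holds.
noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
code, output = run_and_drain(noisy)
print(code, len(output))
```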
You could redirect the output to os.devnull instead:
import os
import webbrowser
from subprocess import STDOUT, check_call
try:
    from subprocess import DEVNULL # py3k
except ImportError:
    DEVNULL = open(os.devnull, 'w+b')
basename = aluno + '_pratica'
output_dir = 'Alunos'
root = os.path.join(output_dir, basename)
check_call(['xelatex', '--output-directory', output_dir, root+'.tex'],
           stdin=DEVNULL, stdout=DEVNULL, stderr=STDOUT)
webbrowser.open(root+'.pdf')
check_call is used to wait for xelatex and raise an exception on error.

Running Subprocess from Python

I want to run a cmd exe using a python script.
I have the following code:
def run_command(command):
    p = subprocess.Popen(command, shell=True,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    return p.communicate()
Then I use:
run_command(r"C:\Users\user\Desktop\application\uploader.exe")
This returns the option menu where I need to specify additional parameters for the cmd exe to run. So I need to pass additional parameters to the cmd exe.
How do I accomplish this? I've looked at subprocess.communicate but I was unable to understand it.

If uploader.exe accepts command line options, then you could try subprocess.call in the following manner:
If your command is uploader.exe, the directory of uploader is C:\Users\...\application, and the additional parameter is x, you could try
import subprocess
def run_command(command, directory, arg):
    # Build the command line from the arguments instead of the literal word "command".
    return subprocess.call("%s %s" % (command, arg), cwd=directory, shell=True)
run_command("uploader.exe", "C:\\Users\\..\\application", "x")
However, this assumes you do not need to actually interact with uploader.exe after you start it.
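If the program does prompt for input after it starts, one option is to send the answers through stdin with communicate(input=...). The sketch below uses a child Python process as a stand-in for an interactive program. Whether uploader.exe reads its menu choice from stdin is an assumption, and the name run_with_input is hypothetical.

```python
import subprocess
import sys

def run_with_input(cmd, text):
    """Start cmd, send text to its stdin, and return (returncode, output)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    # communicate() writes the text, closes stdin, and reads all output.
    output, _ = proc.communicate(input=text)
    return proc.returncode, output

# Stand-in for an interactive program: a child that reads one menu choice from stdin.
menu = [sys.executable, "-c", "choice = input(); print('You chose ' + choice)"]
code, output = run_with_input(menu, "2\n")
print(output)
```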
