Running pdftotext from Python - python

I am trying to convert a PDF document to a text document using the pdftotext program.
I need to call this application at the command prompt from a Python script to convert the file.
I have the following code:
import os
import subprocess
path = "C:\\Users\\..."
pdffname = "pdffilename.pdf"
txtfname = "txtfilename.txt"
subprocess.call(['pdftotext', '-layout',
os.path.join(path, pdffname),
os.path.join(path, txtfname)])
When I run this code, I get the following error:
File "C:/Users/.../code-1.py", line 44, in <module>
os.path.join(path, txtfname)])
File "C:\Anaconda\lib\subprocess.py", line 522, in call
return Popen(*popenargs, **kwargs).wait()
File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
How can I call the pdftotext application from Python to convert a PDF to a text file?

I had this same error, except with Popen. I fixed it by providing the full path to pdftotext.exe in the subprocess call. Don't forget to escape your backslashes.
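A minimal sketch of that fix, assuming pdftotext.exe lives in a hypothetical folder C:\tools\xpdf (substitute the real install location). A raw string (r"...") avoids having to double every backslash. The helper names build_pdftotext_command and convert are for illustration only.

```python
import os
import subprocess

# Hypothetical install location; replace with the real path to pdftotext.exe.
PDFTOTEXT_EXE = r"C:\tools\xpdf\pdftotext.exe"


def build_pdftotext_command(exe, folder, pdf_name, txt_name):
    """Return the argument list for a layout-preserving conversion."""
    return [
        exe,
        "-layout",
        os.path.join(folder, pdf_name),
        os.path.join(folder, txt_name),
    ]


def convert(exe, folder, pdf_name, txt_name):
    """Run pdftotext by its full path and return its exit code."""
    return subprocess.call(build_pdftotext_command(exe, folder, pdf_name, txt_name))
```

Calling convert(PDFTOTEXT_EXE, path, pdffname, txtfname) would then replace the subprocess.call in the question.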
I do not know much about Anaconda, and I have not tested this myself, but I believe Conda may have an issue referencing scripts on Windows (see the Conda issue titled "fix references to scripts on windows").

Related

Creating an .app file using Platypus failing with subprocess

import os
import subprocess
pathName = "FilePath"
os.chdir(r'Directory Path')
process = subprocess.Popen(["scrapy", "crawl", "homeDepotSpider", "-t" , "csv" , "-o", pathName])
Here's the error message I get:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I can't really find any documentation online, but is Platypus just not compatible with subprocess, or is there something wrong with my code? When I compile it, it runs fine; when I create an .app file, it doesn't work.
Edit: Here is the software that I used to make my program into an executable: https://sveinbjorn.org/platypus
The exception states the reason: "No such file".
The subprocess.Popen line does roughly the same thing as typing scrapy at a command prompt. The error means that no folder on the PATH of the launching process contains the scrapy executable ('scrapy.exe' in Windows terms).
You mention that you used packaging software to bundle your script. The packaging program has not packaged the scrapy executable.
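One way to confirm this diagnosis from inside the bundled app is to look the program up explicitly before launching it. This is only a sketch, assuming Python 3.3 or later (shutil.which is not available in Python 2.7); require_executable is a made-up helper name.

```python
import os
import shutil


def require_executable(name, path=None):
    """Return the full path of `name`, or raise an error that shows the PATH searched."""
    found = shutil.which(name, path=path)
    if found is None:
        searched = path if path is not None else os.environ.get("PATH", "")
        raise FileNotFoundError(
            "{!r} was not found on PATH: {}".format(name, searched)
        )
    return found
```

Passing the result of require_executable("scrapy") as the first element of the Popen argument list would either launch the program by its full path or fail with a message that shows which directories were searched.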

Subprocess No such file or directory error

As part of a larger program, I am trying to write a function that calls latexmk using subprocess, but I consistently get FileNotFoundError: [Errno 2] No such file or directory: 'latexmk': 'latexmk'
However, if I run the command directly in the terminal, everything works: latexmk --pdf test.tex
In case it matters, I am on macOS Mojave 10.14.6, running Python 3.6 in Spyder through Anaconda.
I have checked the following links:
https://askubuntu.com/questions/801493/python-subprocess-call-not-working-as-expected
OSError: [Errno 2] No such file or directory while using python subprocess in Django
Running Bash commands in Python
If anything there solves the problem, I missed it.
To make everyone's life easier, here's a link to a .tex file [you can use your own]:
https://drive.google.com/open?id=1DoJnvg2BmbRCzmRmqFYRVybyTQUtyS-h
After I enter type latexmk in the terminal, it outputs:
latexmk is hashed (/Library/TeX/texbin/latexmk)
Here is the minimal reproducible example (you do need latexmk on your computer though):
import os, subprocess
def pdf(file_path):
    cur_dir = os.getcwd()
    dest_dir = os.path.dirname(file_path)
    basename = os.path.basename(file_path)
    os.chdir(dest_dir)
    main_arg = [basename]
    command = ["latexmk", "--pdf"] + main_arg
    try:
        output = subprocess.check_output(command)
    except subprocess.CalledProcessError as e:
        print(e.output.decode())
        raise
    os.chdir(cur_dir)
pdf("path to your .tex file")
I have a feeling that I am grossly misunderstanding the way subprocess works. Any ideas?
Update: In case it is necessary, here is the full traceback:
Traceback (most recent call last):
File "<ipython-input-90-341a2810ccbf>", line 1, in <module>
pdf('/Users/sergejczan/Desktop/untitled folder/test.tex')
File "/Users/sergejczan/Desktop/Lab/subprocess error reproduction.py", line 23, in pdf
output = subprocess.check_output(command)
File "/anaconda3/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/anaconda3/lib/python3.6/subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'latexmk': 'latexmk'
New Update
Replacing the output = subprocess.check_output(command) line with a version that passes the hardcoded environment I got from echo $PATH worked wonderfully.
output = subprocess.check_output(command,env = {'PATH': '/anaconda3/bin:/Users/sergejczan/anaconda3/bin:/Users/sergejczan/Desktop/Lab/anaconda2/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin'})
Is there a way to make the code find the PATH automatically?
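One possible approach, sketched under the assumption that the missing directory is known (here /Library/TeX/texbin, taken from the type latexmk output above): copy the current environment and append that directory to PATH instead of hard-coding the whole variable. Programs started from a GUI often do not inherit the shell's PATH, which may be what happens with Spyder here, but that is a guess. env_with_extra_path is a hypothetical helper name.

```python
import os


def env_with_extra_path(extra_dirs, base_env=None):
    """Return a copy of the environment whose PATH also contains extra_dirs."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    for folder in extra_dirs:
        if folder not in parts:
            parts.append(folder)  # keep existing entries first, add new ones last
    env["PATH"] = os.pathsep.join(parts)
    return env


# Usage (not executed here because it needs latexmk installed):
# output = subprocess.check_output(
#     command, env=env_with_extra_path(["/Library/TeX/texbin"]))
```

Keeping the rest of the environment intact avoids losing variables that the child process may rely on.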

Python 2.7 sub process

I have a Python script that uses a package called flopy. My script generates a series of inputs to a Fortran executable. Flopy writes these into text files and then calls the Fortran executable, which uses the text files to run a model.
I'm using a Mac (OS X) and I downloaded Python 2.7 from python.org, i.e. I'm not using the Apple system version of Python. The version of Python I'm using is in /Library/Frameworks/Python.framework/
I can run my script if I call it from the Terminal window by typing:
python myscriptname.py
However, if I run my script through IDLE (the version that came with the Python I downloaded), it returns an error:
Traceback (most recent call last):
File "/Users/neilthomas/RotatedModel_v4_Tr_mfnwt.py", line 355, in <module>
success, mfoutput = mf.run_model(silent=False, pause=False)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/flopy/mbase.py", line 638, in run_model
normal_msg=normal_msg)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/flopy/mbase.py", line 1034, in run_model
stdout=sp.PIPE, stderr=sp.STDOUT, cwd=model_ws)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
The file 'mfnwt' absolutely does exist. I'm sure I'm missing something obvious, but is there something I need to do to allow IDLE to run programs/subprocesses via the shell it uses? Thanks.
The problem here is that you have to identify the specific MODFLOW executable file you are calling ('mfnwt' in your case). I do the same with a MODFLOW 2000 file:
mf = flopy.modflow.mf.Modflow(modelname,namefile_ext='nam',version='mf2k',exe_name='/home/MODFLOW-and-related-codes/build-08/bin-windows/mf2k.exe')
In your case, you would do something similar, only replacing the version='mf2k' and exe_name=path to match where you are storing your MODFLOW file.
See the documentation for further details: https://modflowpy.github.io/flopydoc/mf.html
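If the location of the executable differs between machines, the value for exe_name could be looked up first. The sketch below uses only the standard library; find_model_executable and the directories in the usage note are hypothetical, and the demo uses the running Python interpreter as a stand-in because it is certain to exist.

```python
import os
import sys


def find_model_executable(name, candidate_dirs):
    """Return the absolute path of the first executable file called `name`
    found in candidate_dirs, or None if there is no match."""
    for folder in candidate_dirs:
        candidate = os.path.abspath(os.path.join(folder, name))
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Stand-in demo: locate the running interpreter in its own directory.
print(find_model_executable(os.path.basename(sys.executable),
                            [os.path.dirname(sys.executable)]))
```

For the question above, the call would look something like find_model_executable("mfnwt", ["/usr/local/bin", os.path.expanduser("~/bin")]), with the result passed as exe_name.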

Error running python script in terminal: OSError: [Errno 2] No such file or directory

I'm trying to use a piece of software called "bundler_sfm" which is executed using a python script.
The software I'm trying to use is available here, the script is in the utils directory if you want to have a look.
When trying to run it I get the following python error:
File "/usr/lib/python2.7/subprocess.py", line 493, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
The code that leads to this error is as follows:
# Extract SIFT data
if verbose:
    with open(pgm_filename, 'rb') as fp_in:
        with open(key_filename, 'wb') as fp_out:
            subprocess.call(BIN_SIFT, stdin=fp_in, stdout=fp_out)
I've looked at various other answers with similar errors but am still at a loss on how to fix this problem.
I'm trying to run this in the terminal on elementary OS.
Any assistance would be greatly appreciated.
This was already worked out in the comments, but to record it as an answer:
I found where the utility thinks the sift binary is located by printing BIN_SIFT just before the call to subprocess.call().
That path turned out to be incorrect.
As a hackish workaround, hard-code the correct path on line 55 of bundler.py, as a string inside a list:
BIN_SIFT = ["/real/path/to/sift"]
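The "print it and look" step can be made a little more informative by reporting why a configured path cannot be launched. This is a sketch, assuming the command is a list whose first element is a path to the program rather than a bare name to be searched on PATH; check_command is a made-up helper.

```python
import os
import sys


def check_command(command):
    """Describe whether the program named by command[0] can be launched."""
    program = command[0]
    if not os.path.exists(program):
        return "missing: " + program
    if not os.path.isfile(program):
        return "not a regular file: " + program
    if not os.access(program, os.X_OK):
        return "not executable: " + program
    return "ok: " + program


# Demo with the placeholder path from the answer and with a program known to exist.
print(check_command(["/real/path/to/sift"]))
print(check_command([sys.executable]))
```

Calling check_command(BIN_SIFT) right before subprocess.call() would show at a glance whether the path is wrong or the file merely lacks execute permission.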

Is there a way to run Python's subprocess.check_output without the cwd to be the same directory as the exe being called

I am trying to automate some tasks - the process requires I call some exe's and pass parameters. The particular directories for the exe's are in the PATH variable for windows. However, I consistently get a
WindowsError: [Error 2] The system cannot find the file specified
My current workaround is to change the working directory (os.chdir) to the directory with the exe, but that imposes some other limits on how we distribute the code. I want to note that in every case, if I start a cmd window and type the same command I am passing to subprocess.check_output, it works no matter what directory I am in on the computer.
To be clear, my concern is, for example, automating a WinRAR task when WinRAR.exe is in a different folder on the user's computer.
Okay, in response to the comment below, here are the input and the output after I changed the cwd to the root (C:\).
The call to subprocess
rarProcess = check_output('''WinRAR a -r -v700m -sfx -agYYYYMMDD-NN -iiconD:\\RarResources\\de96.ico -iimgd:\\RarResources\\accounting2013.bmp d:\\testFTP\\compressed_test_ d:\\files_to_compress''')
and here is the traceback in all its glory:
Traceback (most recent call last):
File "<pyshell#93>", line 1, in <module>
rarProcess = check_output('''WinRAR a -r -v700m -sfx -agYYYYMMDD-NN -iiconD:\\RarResources\\de96.ico -iimgd:\\RarResources\\accounting2013.bmp d:\\testFTP\\compressed_test_ d:\\files_to_compress''')
File "C:\Program Files (x86)\python\lib\subprocess.py", line 537, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "C:\Program Files (x86)\python\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Program Files (x86)\python\lib\subprocess.py", line 893, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Now I can't prove that this is not a hypothetical question/problem. I get the intended results when I use the same command (adjusting for path separators) through the cmd window and if I change the directory to the directory with the exe before running the command as pasted above.
You don't need to change the working directory before running the process. Instead, pass the folder that contains WinRAR.exe to the subprocess through the env dict. Note that PATH should name the folder, not the .exe file itself:
proc = subprocess.Popen(args, env={'PATH': '/path/to/winrar_folder'})
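An alternative that does not depend on PATH at all is to give the program's absolute path as the first argument. The subprocess documentation notes that on Windows, with shell=False, env cannot override PATH while the executable is being located, so this may be the more reliable route. The sketch below is only illustrative: run_by_full_path is a hypothetical helper, and the Python interpreter stands in for WinRAR.exe so the example can run anywhere.

```python
import subprocess
import sys
import tempfile


def run_by_full_path(exe_path, arguments, cwd=None):
    """Run exe_path (an absolute path) with arguments and return its output as text."""
    command = [exe_path] + list(arguments)
    return subprocess.check_output(command, cwd=cwd).decode()


# Stand-in demo: the working directory is deliberately unrelated to the
# folder that holds the program being launched.
print(run_by_full_path(sys.executable, ["-c", "print('ok')"],
                       cwd=tempfile.gettempdir()))
```

With this pattern the working directory can stay wherever the rest of the code needs it, and only the location of the executable has to be configured per machine.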
