I am new to Python and am setting up some automation for my job, part of which is pulling data from tables in PDF files. The short version: no matter what I try or look up, I cannot get tabula-py to find Java at its path on my portable drive.
I am using a portable IDE setup since I do not have admin privileges on my work computer.
tabula-py throws the usual "cannot find Java, make sure it is in your PATH" error message. I am using Python Portable and jPortable installed to a common directory, with Spyder portable as the IDE. I have run pip install and pip uninstall on both tabula and tabula-py multiple times. I have also imported sys and used sys.path.append to add the file path of my Java bin directory.
Code:
import pandas as pd
import numpy
import tabula
import sys
sys.path.append('E:\CommonFiles\Java\bin')
df = tabula.read_pdf('E:\CommonFiles\Python-Portable-3.9.6\Scripts\Sample.pdf', pages='all')
Error Message:
runfile('E:/CommonFiles/Python-Portable-3.9.6/Scripts/untitled01.py', wdir='E:/CommonFiles/Python-Portable-3.9.6/Scripts')
Traceback (most recent call last):
File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\tabula\io.py", line 80, in _run
result = subprocess.run(
File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in __init__
super(SubprocessPopen, self).__init__(*args, **kwargs)
File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\CommonFiles\Python-Portable-3.9.6\Scripts\untitled01.py", line 15, in <module>
df = tabula.read_pdf('E:\CommonFiles\Python-Portable-3.9.6\Scripts\Sample.pdf', pages='all')
File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\tabula\io.py", line 322, in read_pdf
output = _run(java_options, kwargs, path, encoding)
File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\tabula\io.py", line 91, in _run
raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
I have also attempted to use camelot with a similar frustration over the ghostscript.dll.
Finally I looked into pdfplumber but had even less luck there getting it to find the tables let alone do anything with them.
I am sure this is doable, but my Google-fu is failing me: I have spent the better part of 3 days looking into this and found no solution through Google, StackOverflow, Reddit, etc.
I had the same issue, and the solution I found was to use portable Java and register it in the user environment PATH.
This explains how to install Java from the EXE installer: https://stackoverflow.com/a/6571736/11322275
Then add the folder where you saved Java to the user environment PATH, as explained here: https://stackoverflow.com/a/67844469/11322275
Once you've done the above, make sure you can run java -version from a command prompt.
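If changing the user environment is not possible, a similar effect can be had from inside the script. Note that sys.path only controls where Python looks for modules to import; the `java` command is looked up through the PATH environment variable of the running process. Below is a minimal sketch, assuming the Java bin directory from the question. `prepend_to_path` is a hypothetical helper name, not part of tabula-py.

```python
import os
import shutil


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


if __name__ == "__main__":
    # Raw string, so the "\b" in "\bin" is not read as a backspace escape.
    java_bin = r"E:\CommonFiles\Java\bin"  # path taken from the question
    prepend_to_path(java_bin)
    # shutil.which reports what "java" resolves to on PATH, or None.
    print("java resolves to:", shutil.which("java"))
```

This has to run before the first tabula.read_pdf call, and it only affects the current Python process and the programs it starts.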
I made a script that downloads a PDF from the web (via selenium), then converts the table in that PDF to an Excel file (via tabula). I want to share this script with people in the office, but my team has no Python or programming experience, so I decided to convert the Python file into an executable using Auto-Py-to-EXE. I then added a file (chromedriver) to the build, and the executable downloaded the PDF successfully.
For the conversion I used tabula to convert the PDF to a CSV and an XLSX file. In the notebook/.py the conversion worked, but when I converted the .py into an EXE and ran the executable I got the error below.
File "tabula\io.py", line 80, in _run
File "subprocess.py", line 493, in run
File "subprocess.py", line 858, in __init__
File "subprocess.py", line 1311, in _execute_child
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "erc_scraper.py", line 126, in <module>
File "tabula\io.py", line 322, in read_pdf
File "tabula\io.py", line 91, in _run
tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
[24568] Failed to execute script 'erc_scraper' due to unhandled exception!
So I tried adding my Java path to the environment path by following this link. I've added C:\Program Files (x86)\Java\jre6\bin to JAVA_HOME, JAVA, and PATH.
However, now I'm getting this error when I try to execute the EXE file.
Error from tabula-java:
Unable to access jarfile C:\Users\ur7634o\Desktop\erc_scraper\tabula\tabula-1.0.4-jar-with-dependencies.jar
subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\ur7634o\\Desktop\\erc_scraper\\tabula\\tabula-1.0.4-jar-with-dependencies.jar', '--pages', 'all', '--guess', '--format', 'JSON', 'C:\\Users\\ur7634o\\Desktop\\ERC Data\\pdf\\qualified_contestable_customers_20220221-11-09-36.pdf']'
returned non-zero exit status 1.
[25240] Failed to execute script 'erc_scraper' due to unhandled exception!
Any advice on what to do next? It seems the executable cannot read the jar file. I'm also trying to work out how to make this easy for the end users: I was hoping they could just double-click a shortcut to start the download and conversion of a file.
I just ran into this problem today. I tried this and it worked:
When you compile your executable, use the "One Directory" option.
After compiling, go to the directory where your tabula package is installed and copy that tabula folder into the output folder of auto-py-to-exe.
That should work. What's missing is "tabula\tabula-1.0.4-jar-with-dependencies.jar", just as the error indicated. I'm not sure why auto-py-to-exe doesn't bring the tabula package over like the other packages, but I had to copy it manually.
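To see which folder needs copying, and to confirm that it really contains the jar, you can ask Python where the tabula package is installed. This is a rough sketch: `find_bundled_jars` and `package_dir_of` are hypothetical helpers, and it assumes (based on the path in the error above) that the jar sits directly inside the package's tabula folder.

```python
import glob
import importlib.util
import os


def find_bundled_jars(package_dir):
    """Return the sorted .jar file names found directly in `package_dir`."""
    pattern = os.path.join(package_dir, "*.jar")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def package_dir_of(name):
    """Return the install directory of package `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    source_dir = package_dir_of("tabula")
    if source_dir is None:
        print("tabula is not installed in this interpreter")
    else:
        print("folder to copy into the build output:", source_dir)
        print("jars it contains:", find_bundled_jars(source_dir))
```

Run it with the same interpreter that was used to build the executable, since another environment may hold a different tabula version.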
I'm running Debian 9 Linux and using PyCharm for web scraping; my interpreter is currently Python 3.5.
The script is the following:
from selenium import webdriver
import time
import datetime
from selenium.webdriver.common.keys import Keys
Up to here the script works fine and the packages import properly. But when I try to set up the driver by running the following line:
driver = webdriver.Firefox(executable_path='/home/quant/Desktop/DataDownload/venv/bin/geckodriver')
I get the following error message, related to a format problem:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/quant/Desktop/DataDownload/venv/lib/python3.5/site-packages/selenium/webdriver/firefox/webdriver.py", line 157, in __init__
self.service.start()
File "/home/quant/Desktop/DataDownload/venv/lib/python3.5/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.5/subprocess.py", line 1282, in _execute_child
raise child_exception_type(errno_num, err_msg)
OSError: [Errno 8] Exec format error
Browsing the web, I found that this problem probably occurs when the executable has not been unpacked and made executable correctly. To do that, I ran the steps below in the terminal:
(1) downloaded the compressed file from the official repository on GitHub:
wget [here][1]
(2) unzipped the file:
cd /home/quant/Downloads
tar -xvzf geckodriver-v0.21.0-arm7hf.tar.gz
(3) made the file executable:
chmod +x geckodriver
(4) moved the file to the following path:
mv geckodriver /home/quant/PycharmProject/DataDownloads/venv/bin/
Could someone help me to understand what's wrong, please?
Thanks in advance all!!
[Errno 8] Exec format error
This means you are trying to run a version of geckodriver that was compiled for a different architecture. You downloaded the ARM version (geckodriver-v0.21.0-arm7hf.tar.gz) and are most likely running on an x86/amd64 machine.
Solution:
Go back to the geckodriver releases page and download the correct version for your system: https://github.com/mozilla/geckodriver/releases.
For example, if you are running 64-bit Linux, you want to download: geckodriver-v0.21.0-linux64.tar.gz
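If you are unsure what architecture the machine has, Python can report it. The sketch below maps the value of platform.machine() to a build suffix. The linux64 and arm7hf suffixes come from the file names above; linux32 and the list of machine strings are my assumptions, so verify them against the releases page. `geckodriver_build_for` is a hypothetical helper.

```python
import platform

# linux64 and arm7hf appear in the file names above; linux32 is assumed.
_LINUX_BUILDS = {
    "x86_64": "linux64",
    "amd64": "linux64",
    "i386": "linux32",
    "i686": "linux32",
    "armv7l": "arm7hf",
}


def geckodriver_build_for(machine):
    """Map a platform.machine() value to a geckodriver Linux build suffix."""
    return _LINUX_BUILDS.get(machine.lower())


if __name__ == "__main__":
    machine = platform.machine()
    build = geckodriver_build_for(machine)
    if build is None:
        print("no known build for", machine, "- check the releases page")
    else:
        print("machine reports", machine, "so look for the", build, "archive")
```

Running uname -m in a terminal gives the same information without Python.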
So I've had Python 3.6 on my Windows 10 computer for a while now, and today I just downloaded and installed the graphviz 0.8.2 (https://pypi.python.org/pypi/graphviz) package via the admin commandline with:
pip3 install graphviz
It was only after this point that I downloaded the Graphviz 2.38 MSI installer file and installed the program at:
C:\Program Files (x86)\Graphviz2.38
So then I tried to run this simple Python program:
from graphviz import Digraph
dot = Digraph(comment="The round table")
dot.node('A', 'King Arthur')
dot.node('B', 'Sir Bedevere the Wise')
dot.node('L', 'Sir Lancelot the Brave')
dot.render('round-table.gv', view=True)
Unfortunately, I get the following error when I run my Python program from the command line:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\graphviz\backend.py", line 124, in render
subprocess.check_call(args, startupinfo=STARTUPINFO, stderr=stderr)
File "C:\Program Files\Python36\lib\subprocess.py", line 286, in check_call
retcode = call(*popenargs, **kwargs)
File "C:\Program Files\Python36\lib\subprocess.py", line 267, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Program Files\Python36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Program Files\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\foldername\testing.py", line 11, in <module>
dot.render('round-table.gv', view=True)
File "C:\Program Files\Python36\lib\site-packages\graphviz\files.py", line 176, in render
rendered = backend.render(self._engine, self._format, filepath)
File "C:\Program Files\Python36\lib\site-packages\graphviz\backend.py", line 127, in render
raise ExecutableNotFound(args)
graphviz.backend.ExecutableNotFound: failed to execute ['dot', '-Tpdf', '-O', 'round-table.gv'], make sure the Graphviz executables are on your systems' PATH
Note that this seems VERY similar to the question asked here:
"RuntimeError: Make sure the Graphviz executables are on your system's path" after installing Graphviz 2.38
But for some reason, adding those paths (suggested in the solutions at the link above) to the system variables isn't working, and I don't know why! I also tried restarting the computer after adding the paths, still without success.
Although the other suggested solution, which was to add these few lines in front of my Python code, did work:
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
But here's the issue: I don't understand why adding to the environment variables didn't work, and this is my primary concern. So my question is this: why did adding those lines of code in front of the Python script work but changing the environment variables didn't? What do I need to do to get my script to run without adding those lines of code in front?
Can you please post the output you get when you type SET in a cmd window after setting the PATH environment variable?
Does it contain C:/Program Files (x86)/Graphviz2.38/bin/?
A cmd window must be restarted before updated environment variables take effect!
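A process receives a copy of the environment when it starts, so a change to the system variables is not seen by a cmd window (or a Python process started from it) that was already open. To check what the Python process itself sees, something like the following can help. It is only a diagnostic sketch; `path_has_dir` is a hypothetical helper and the Graphviz directory is the one from the question.

```python
import os
import shutil


def path_has_dir(directory, path_value):
    """True if `directory` is one of the entries of the PATH string."""
    def norm(p):
        # Normalise case, slash direction and trailing separators
        # to the extent the current OS treats them as equivalent.
        return os.path.normcase(os.path.normpath(p))

    target = norm(directory)
    return any(norm(e) == target for e in path_value.split(os.pathsep) if e)


if __name__ == "__main__":
    graphviz_bin = "C:/Program Files (x86)/Graphviz2.38/bin/"  # from the question
    seen = path_has_dir(graphviz_bin, os.environ.get("PATH", ""))
    print("Graphviz bin on this process's PATH:", seen)
    print("dot resolves to:", shutil.which("dot"))
```

If this prints False in a freshly opened cmd window, the variable was probably not saved where you expected (for example, user versus system variables).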
I have a Python script that uses a package called flopy. My script generates a series of inputs to a Fortran executable. Flopy writes these into text files and then calls the Fortran executable, which uses the text files to run a model.
I'm using a Mac (OS X) and I downloaded Python 2.7 from python.org, i.e. I'm not using the Apple system version of Python. The version of Python I'm using is in /Library/Frameworks/Python.framework/
I can run my script if I call it from the Terminal window by typing:
python myscriptname.py
However, if I run my script through IDLE (the version that came with the Python I downloaded), it returns an error:
Traceback (most recent call last):
File "/Users/neilthomas/RotatedModel_v4_Tr_mfnwt.py", line 355, in <module>
success, mfoutput = mf.run_model(silent=False, pause=False)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/flopy/mbase.py", line 638, in run_model
normal_msg=normal_msg)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/flopy/mbase.py", line 1034, in run_model
stdout=sp.PIPE, stderr=sp.STDOUT, cwd=model_ws)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
The file 'mfnwt' absolutely does exist. I'm sure I'm missing something obvious, but is there something I need to do to allow IDLE to run programs/subprocesses via the shell it uses? Thanks.
The problem here is that you have to tell flopy exactly which MODFLOW executable file you are calling ('mfnwt' in your case) by giving its full path. I do the same with a MODFLOW 2000 file:
mf = flopy.modflow.mf.Modflow(modelname,namefile_ext='nam',version='mf2k',exe_name='/home/MODFLOW-and-related-codes/build-08/bin-windows/mf2k.exe')
In your case, you would do something similar, only replacing the version='mf2k' and exe_name=path to match where you are storing your MODFLOW file.
See the documentation for further details: https://modflowpy.github.io/flopydoc/mf.html
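One plausible reason the script works from Terminal but not from IDLE is that the two start with a different PATH or working directory, so the bare name 'mfnwt' resolves in one and not in the other. A small sketch for turning the name into an absolute path before handing it to exe_name follows. `resolve_executable` is a hypothetical helper, and the directories passed to it are placeholders for wherever your executable lives.

```python
import os
import shutil


def resolve_executable(name, extra_dirs=()):
    """Return an absolute path to `name`, or None if it cannot be found."""
    found = shutil.which(name)  # search the PATH of this process first
    if found:
        return os.path.abspath(found)
    for directory in extra_dirs:
        candidate = os.path.join(os.path.expanduser(directory), name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.abspath(candidate)
    return None


if __name__ == "__main__":
    # "~/bin" is a placeholder; list the folders where mfnwt might be stored.
    exe = resolve_executable("mfnwt", extra_dirs=["~/bin", os.getcwd()])
    print("mfnwt resolves to:", exe)
    # The result would then be passed as exe_name=exe, as in the call above.
```

If it prints None inside IDLE but a path in Terminal, the difference in environment is the cause.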
This is probably a simple problem: I downloaded the pywiiuse library from here, along with the examples. However, when I try to run one of the examples I end up with import issues, and I'm not certain I have everything configured properly. This is one error I receive when trying to run example.py:
Press 1&2
Traceback (most recent call last):
File "example.py", line 73, in <module>
wiimotes = wiiuse.init(nmotes)
File "/home/thed0ctor/Descargas/wiiuse-0.12/wiiuse/__init__.py", line 309, in init
dll = ctypes.cdll.LoadLibrary('libwiiuse.so')
File "/usr/lib/python2.7/ctypes/__init__.py", line 431, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python2.7/ctypes/__init__.py", line 353, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libwiiuse.so: cannot open shared object file: No such file or directory
I'm really just starting out with this library and don't really see any documentation on how to configure pywiiuse so any help is much appreciated.
The pywiiuse library is a Python wrapper for the wiiuse C library.
Before you can use the wrapper you will first need to install the library it wraps. Choose the newest version from this download page and download the appropriate installation package for your system (probably the .tar.gz, since you appear to be on Linux).
Add a link to libwiiuse.so in /usr/local/lib.
I also ran into this situation. I know why it happens, but I don't know the underlying reason.
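To check whether the shared library is visible before running the examples, ctypes can be asked to look for it. This is a sketch only: `locate_library` is a hypothetical helper, and /usr/local/lib is used because it is the directory mentioned above.

```python
import ctypes
import ctypes.util
import os


def locate_library(short_name, extra_dirs=()):
    """Return a name or path ctypes can load for lib<short_name>, or None."""
    # find_library asks the platform's usual lookup mechanisms.
    found = ctypes.util.find_library(short_name)
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "lib%s.so" % short_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    target = locate_library("wiiuse", extra_dirs=["/usr/local/lib"])
    if target is None:
        print("libwiiuse.so not found; install the wiiuse C library first")
    else:
        dll = ctypes.cdll.LoadLibrary(target)
        print("loaded", target)
```

If the library is installed but still not found by name, running ldconfig after installation (or loading it by absolute path as above) is the usual remedy on Linux.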