Convert PDF to Image using Python

Convert PDF to Image using Python - python

I am trying to convert a pdf file to image file for this in my ubuntu server i have installed:
python2.7
poppler-utils
pdf2image==1.12.1
My code:
from pdf2image import convert_from_path, convert_from_bytes
images = convert_from_path("/home/user/pdf_file.pdf")
# OR
with open("/home/user/pdf_file.pdf") as pdf:
images = convert_from_bytes(pdf.read())
OUTPUT
When I am using the function "convert_from_path"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator
When I am using the function "convert_from_bytes"
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 268, in convert_from_bytes
paths_only=paths_only,
File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator
I have reinstalled all my utilities then i am facing these problems.

If you want to convert PDF to image you can try Python Ghostscript package:
pip install ghostscript
import ghostscript
import locale
def pdf2jpeg(pdf_input_path, jpeg_output_path):
args = ["pef2jpeg", # actual value doesn't matter
"-dNOPAUSE",
"-sDEVICE=jpeg",
"-r144",
"-sOutputFile=" + jpeg_output_path,
pdf_input_path]
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]
ghostscript.Ghostscript(*args)
pdf2jpeg(
"...Fixate/ActiveState/pdf/a.pdf",
"...Fixate/ActiveState/pdf/a.jpeg",
)

I failed in python2 too, but succeeded in python3.
There's a same issue happened on an other library:
TypeError: 'threadsafe_iter' object is not an iterator
As they said, it's a python 2 vs 3 issue, caused by next() function.
If modify __next__() -> next() in file/home/***/.local/lib/python2.7/site-packages/pdf2image/generators.py , it will run successful in py2.
BTW, i have create a new issue to pdf2image team.
TypeError: ThreadSafeGenerator object is not an iterator #133
Additional
pdf2image readme said it's a python (3.5+) module.
pdf2image v1.7.1 work on py27. try it by pip install pdf2image==1.7.1

Related

Trouble opening old pickle file

I am trying to load an old pickle file containing the airline dataset ( https://arxiv.org/abs/1611.06740 ) . The pickle is very old and I have problems accessing it. If I try:
objects = []
with (open("airline.pickle", "rb")) as openfile:
while True:
try:
objects.append(pickle.load(openfile))
except EOFError:
break
I get the following warning and error:
FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
objects.append(pickle.load(openfile))
Traceback (most recent call last):
File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 9, in <module>
objects.append(pickle.load(openfile))
TypeError: _reconstruct: First argument must be a sub-type of ndarray
Trying with pandas does not work:
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 203, in read_pickle
return pickle.load(handles.handle) # type: ignore[arg-type]
TypeError: _reconstruct: First argument must be a sub-type of ndarray
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 7, in <module>
df = pd.read_pickle('airline.pickle')
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 208, in read_pickle
return pc.load(handles.handle, encoding=None)
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\compat\pickle_compat.py",
line 249, in load
return up.load()
File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1212, in load
dispatch[key[0]](self)
File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1725, in load_build
for k, v in state.items():
AttributeError: 'tuple' object has no attribute 'items'
How can I access the file and save it to csv? I need the data that is contained there. I am using pandas 1.2.4 and python 3.6.

The syntax should be simpler than in your example
with open("airline.pickle", "rb") as f:
objects = pickle.load(f)
If this fails, then I would look at the pickle documentation which covers some of the optional parameters that are useful for decoding pickle files created by python2.

As mentioned in a previous answer, the error TypeError: _reconstruct: First argument must be a sub-type of ndarray is due to a change from pandas version 0.14 to 0.15 (Source). The documentation said that pd.read_pickle would be able to load such old pickle files, but this is not working on recent versions. If you install an older version, I tested 0.17.1 which can be obtained in pypi or conda-forge, it can load that pickle file successfully.
If you are using conda, the following should work:
conda create -n old_pandas -c conda-forge pandas=0.17.* python=3.*
conda activate old_pandas
And then, in a Python prompt,
import pandas as pd
dataset = pd.read_pickle("airline.pickle")

What is IF-MIB: snimpy.mib.SMIException: unable to find b'IF-MIB' (check the path)?

I followed the doc but got the following error:
>>> from snimpy.manager import Manager as M
from snimpy.manager import load
load("IF-MIB")
m = M("localhost")
print(m.ifDescr[0])
Traceback (most recent call last):
File "<input>", line 4, in <module>
File "/home/ed8/projects/coaxis-opt/env/lib/python3.5/site-packages/snimpy/manager.py", line 578, in load
m = mib.load(mibname)
File "/home/ed8/projects/coaxis-opt/env/lib/python3.5/site-packages/snimpy/mib.py", line 605, in load
raise SMIException("unable to find {0} (check the path)".format(mib))
snimpy.mib.SMIException: unable to find b'IF-MIB' (check the path)
Question
What is IF-MIB? Do I need to install some package on my system?
Reference
Doc example fails: snimpy.mib.SMIException: unable to find b'IF-MIB' (check the path)

As provided by snimpy developer:
If you are on Ubuntu, you need to
apt-get install snmp-mibs-downloader

PythonOCC Basic example not working

Using the basic example on their website:
from OCC.Display.SimpleGui import init_display
from OCC.BRepPrimAPI import BRepPrimAPI_MakeBox
display, start_display, add_menu, add_function_to_menu = init_display()
my_box = BRepPrimAPI_MakeBox(10., 20., 30.).Shape()
display.DisplayShape(my_box, update=True)
start_display()
I can't get that to run? Any ideas?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\WinPython-32bit-2.7.6.4\python-2.7.6\lib\site-packages\OCC\Display\SimpleGui.py", line 164, in init_display
win.canva.InitDriver()
File "C:\WinPython-32bit-2.7.6.4\python-2.7.6\lib\site-packages\OCC\Display\pysideDisplay.py", line 79, in InitDriver
self._display = OCCViewer.Viewer3d(self.GetHandle())
File "C:\WinPython-32bit-2.7.6.4\python-2.7.6\lib\site-packages\OCC\Display\pysideDisplay.py", line 55, in GetHandle
return int(self.winId())
TypeError: int() argument must be a string or a number, not 'PyCObject'

That's a duplication of a github issue:
https://github.com/tpaviot/pythonocc-core/issues/68
we need to know which GUI library your using and what verion of pythonocc you are one.
this is an older version of pythonocc, the current you can find here:
Why not try and install that newer version?
Its important that the OCE library you've got installed match.
So pythonocc-core 0.16 goes with OCE 0.16

Python PIL Image.save() error in case of EPS format

I am trying to convert jpeg files to eps ones. I am using the following code:
fp=open("test.jpg",'rb')
im=Image.open(fp)
outf=open('test2.eps','wb')
im.save(outf, 'EPS')
However, I ma getting the following error:
Traceback (most recent call last):
File "im2eps2.py", line 11, in <module> im.save(outf, 'EPS')
File "C:\Python26\Lib\site-packages\PIL\Image.py", line 1465, in save
save_handler(self, fp, filename)
File "C:\Python26\Lib\site-packages\PIL\EpsImagePlugin.py", line 353, in _save
fp = io.TextIOWrapper(NoCloseStream(fp), encoding='latin-1')
File "C:\Python26\Lib\io.py", line 1429, in __init__
self._seekable = self._telling = self.buffer.seekable()
File "C:\Python26\Lib\site-packages\PIL\EpsImagePlugin.py", line 348, in __getattr__
return getattr(self.fp, name)
AttributeError: 'file' object has no attribute 'seekable'
I shall be thankful for suggestions.
Thanks
PS: I reinstall PIL from its main page i.e. http://www.pythonware.com/products/pil/ and it worked :) Earlier, I used windows installer provided by http://www.lfd.uci.edu/~gohlke/pythonlibs. I think the problem was with the binary that I installed earlier. Currently, I have 1.1.7 PIL and it is working fine.
Thanks

According to cgohlke this was a bug in Pillow and has been fixed in 2.4.0, which was released last month (1st April 2014).
From the bug report:
io.TextIOWrapper expects an object with io.IOBase interface, not a Python 2 file object. Using fh = io.open('test.eps', 'wb') passes this stage but io.TextIOWrapper.write() expects unicode strings; the example fails with TypeError: can't write str to text stream.

PyPDF2 TypeError when trying to run example from lib

I've got PyPDF2 lib from here:
https://github.com/mstamy2/PyPDF2/tree/Python3-3
When trying to run script "Example 1:" from from there see it:
PyPDF2 python versions (2.5 - 3.3) compatibility branch
Traceback (most recent call last):
File "1.py", line 6, in <module>
input1 = PdfFileReader(open("document1.pdf", "rb"))
File "C:\Python33\lib\site-packages\PyPDF2\pdf.py", line 595, in __init__
self.read(stream)
File "C:\Python33\lib\site-packages\PyPDF2\pdf.py", line 1097, in read
streamData = StringIO(xrefstream.getData())
TypeError: initial_value must be str or None, not bytes
What is wrong?

It was a problem related to the compatibility within PyPDF2 and Python 3.
In my case, I have solved it by replacing pdf.py and utils.py with the ones you will find here, where they basically control if you are running Python 3 and, in case you are, receive data as bytes instead of strings.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Convert PDF to Image using Python - python

Related

Trouble opening old pickle file

What is IF-MIB: snimpy.mib.SMIException: unable to find b'IF-MIB' (check the path)?

PythonOCC Basic example not working

Python PIL Image.save() error in case of EPS format

PyPDF2 TypeError when trying to run example from lib

Categories

Resources