Trouble opening old pickle file - python

I am trying to load an old pickle file containing the airline dataset (https://arxiv.org/abs/1611.06740). The pickle is very old, and I have problems accessing it. If I try:
import pickle
objects = []
with open("airline.pickle", "rb") as openfile:
    while True:
        try:
            objects.append(pickle.load(openfile))
        except EOFError:
            break
I get the following warning and error:
FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
objects.append(pickle.load(openfile))
Traceback (most recent call last):
File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 9, in <module>
objects.append(pickle.load(openfile))
TypeError: _reconstruct: First argument must be a sub-type of ndarray
Trying with pandas does not work:
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 203, in read_pickle
return pickle.load(handles.handle) # type: ignore[arg-type]
TypeError: _reconstruct: First argument must be a sub-type of ndarray
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 7, in <module>
df = pd.read_pickle('airline.pickle')
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 208, in read_pickle
return pc.load(handles.handle, encoding=None)
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\compat\pickle_compat.py", line 249, in load
return up.load()
File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1212, in load
dispatch[key[0]](self)
File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1725, in load_build
for k, v in state.items():
AttributeError: 'tuple' object has no attribute 'items'
How can I access the file and save it to CSV? I need the data it contains. I am using pandas 1.2.4 and Python 3.9.

The syntax should be simpler than in your example
import pickle
with open("airline.pickle", "rb") as f:
    objects = pickle.load(f)
If this fails, look at the pickle documentation, which covers the optional parameters of pickle.load (notably encoding and fix_imports) that are useful for decoding pickle files created by Python 2.
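For reference, the most relevant parameter is encoding: the pickle docs recommend encoding="latin1" for NumPy arrays and datetime objects pickled under Python 2. A minimal sketch combining this with the loop from the question, so that files holding several objects still load completely (load_all_pickles is a hypothetical helper name). This only addresses Python 2 string decoding; it will not undo the pandas class-layout change described in the next answer.

```python
import pickle


def load_all_pickles(path, encoding="latin1"):
    """Return every object stored one after another in a pickle file.

    encoding="latin1" lets Python 3 decode the 8-bit str objects that
    Python 2 wrote (e.g. inside NumPy arrays); it is harmless for
    pickles written by Python 3.
    """
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f, encoding=encoding))
            except EOFError:
                # No more pickled objects in the file.
                break
    return objects
```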

As mentioned in a previous answer, the error TypeError: _reconstruct: First argument must be a sub-type of ndarray is due to a change between pandas versions 0.14 and 0.15 (Source). The documentation says that pd.read_pickle can load such old pickle files, but this does not work with recent versions. Installing an older version does work: I tested 0.17.1, which is available on PyPI and conda-forge, and it loads that pickle file successfully.
If you are using conda, the following should work:
conda create -n old_pandas -c conda-forge pandas=0.17.* python=3.*
conda activate old_pandas
And then, in a Python prompt,
import pandas as pd
dataset = pd.read_pickle("airline.pickle")
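Since the goal is a CSV file, the conversion can happen entirely inside the old_pandas environment, so a newer pandas never has to read the pickle. A minimal sketch, assuming the pickle holds a DataFrame or something pd.DataFrame() accepts (the exact contents of the airline pickle are not known here, so inspect type(obj) first); pickle_to_csv is a hypothetical helper name. It avoids f-strings so it also runs on the older Python 3 that conda picks for pandas 0.17.

```python
import pandas as pd


def pickle_to_csv(pickle_path, csv_path):
    """Read a legacy pandas pickle and write it out as CSV.

    Meant to run in the old_pandas environment, where read_pickle can
    still rebuild the pre-0.15 index objects.
    """
    obj = pd.read_pickle(pickle_path)
    # Assumption: obj is a DataFrame, or a Series / dict of columns
    # that the DataFrame constructor can wrap.
    if not isinstance(obj, pd.DataFrame):
        obj = pd.DataFrame(obj)
    # Keep the index (the default) in case it carries data such as dates.
    obj.to_csv(csv_path)
    return obj.shape
```

Afterwards, pd.read_csv("airline.csv", index_col=0) should load the data in any pandas version.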

Related

Convert PDF to Image using Python

I am trying to convert a PDF file to an image. On my Ubuntu server I have installed:
python2.7
poppler-utils
pdf2image==1.12.1
My code:
from pdf2image import convert_from_path, convert_from_bytes
images = convert_from_path("/home/user/pdf_file.pdf")
# OR
with open("/home/user/pdf_file.pdf", "rb") as pdf:
    images = convert_from_bytes(pdf.read())
OUTPUT
When I use convert_from_path:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator
When I use convert_from_bytes:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 268, in convert_from_bytes
paths_only=paths_only,
File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator
I have reinstalled all of these utilities, but I still face these problems.
If you want to convert a PDF to an image, you can try the Python ghostscript package:
pip install ghostscript
import ghostscript
import locale
def pdf2jpeg(pdf_input_path, jpeg_output_path):
    args = ["pdf2jpeg",  # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",
            "-r144",
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    # Ghostscript expects the arguments as byte strings
    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in args]
    ghostscript.Ghostscript(*args)
pdf2jpeg(
    "...Fixate/ActiveState/pdf/a.pdf",
    "...Fixate/ActiveState/pdf/a.jpeg",
)
It failed for me in Python 2 too, but succeeded in Python 3.
The same issue occurred in another library:
TypeError: 'threadsafe_iter' object is not an iterator
As explained there, it is a Python 2 vs. 3 issue caused by the next() function: Python 3 iterators implement __next__(), while Python 2 looks for next().
If you rename __next__() to next() in /home/***/.local/lib/python2.7/site-packages/pdf2image/generators.py, it runs successfully in Python 2.
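To illustrate the protocol difference, here is a hypothetical thread-safe iterator (not pdf2image's actual ThreadSafeGenerator code) that works under both versions. Aliasing next = __next__ satisfies Python 3's builtin next(), which calls __next__(), and Python 2's, which calls .next(), without renaming anything.

```python
import threading


class ThreadSafeIterator(object):
    """Hypothetical lock-protected iterator usable from Python 2 and 3."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        with self._lock:
            return next(self._it)

    # Python 2 iterator protocol: the builtin next() calls obj.next().
    next = __next__
```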
By the way, I have opened a new issue with the pdf2image team:
TypeError: ThreadSafeGenerator object is not an iterator #133
Additional
The pdf2image README says it is a Python 3.5+ module.
pdf2image v1.7.1 works on Python 2.7; try it with pip install pdf2image==1.7.1

TypeError: initial_value must be unicode or None, not str

I am using SOAPpy for SOAP WSDL services, following this tutorial. My code is as follows:
from SOAPpy import WSDL
wsdlfile = 'http://track.tcs.com.pk/trackingaccount/track.asmx?WSDL'
server = WSDL.Proxy(wsdlfile)
I am getting this error on the last line of my code
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/adil/Code/mezino/RoyalTag/royalenv/local/lib/python2.7/site-packages/SOAPpy/WSDL.py", line 85, in __init__
self.wsdl = reader.loadFromString(str(wsdlsource))
File "/home/adil/Code/mezino/RoyalTag/royalenv/local/lib/python2.7/site-packages/wstools/WSDLTools.py", line 52, in loadFromString
return self.loadFromStream(StringIO(data))
TypeError: initial_value must be unicode or None, not str
I tried converting the string to Unicode with
wsdlfile = unicode('http://track.tcs.com.pk/trackingaccount/track.asmx?WSDL', "utf-8")
but I still get the same error. What am I missing?
I just ran into this problem with some very old 2.7 code that no longer worked due to the TLS update. After updating to the most recent version of Python 2, I started getting this error.
I was only able to fix this by setting up a new virtual environment, then modifying the wstools package in that virtual environment to use BytesIO instead of StringIO.
Replace every required instance of StringIO. For example:
# WSDLTools.py
...
from io import BytesIO
...
return self.loadFromStream(BytesIO(data))
Not ideal, but it worked. Easier than migrating everything to Python 3...
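Judging by the error message, the failing call is io.StringIO, which in Python 2 only accepts unicode, while the WSDL data arrives as a byte string (str). A small sketch of the same idea that picks the buffer type from the data instead of hard-coding one; make_stream is a hypothetical helper, not part of wstools. On Python 2, bytes is an alias of str, so plain strings go to BytesIO.

```python
import io


def make_stream(data):
    """Wrap data in the in-memory stream type that accepts it.

    io.BytesIO only takes bytes (a plain str on Python 2);
    io.StringIO only takes text (unicode on Python 2, str on Python 3).
    """
    if isinstance(data, bytes):
        return io.BytesIO(data)
    return io.StringIO(data)
```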

How to import time column from snowflake to jupyter notebook dataframe?

I need to import data from Snowflake into Jupyter. The dataset has a time column derived from timestamp values.
Every time I try to import the data, Jupyter reports that the process failed; the error message is below.
How should I get around this issue?
ERROR:snowflake.connector.converter:Failed to convert: field T: TIME::76493.000000000
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/snowflake/connector/converter.py", line 88, in to_python
type_name=type_name))
AttributeError: 'SnowflakeConverter' object has no attribute '_TIME_to_python'
ERROR:snowflake.connector.cursor:failed to convert row to python
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/snowflake/connector/cursor.py", line 658, in __row_to_python
res += (self._connection.converter.to_python(col_desc, col),)
File "/usr/local/lib/python2.7/site-packages/snowflake/connector/converter.py", line 88, in to_python
type_name=type_name))
AttributeError: 'SnowflakeConverter' object has no attribute '_TIME_to_python'
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
Can you check your Python Connector version? The error indicates that your version of the Python Connector does not support the TIME data type. TIME has been supported since v1.0.6. As of this writing, the latest version is 1.2.8:
https://pypi.python.org/pypi/snowflake-connector-python/
Here is an example of TIME data type in Jupyter notebook:
https://gist.github.com/smtakeda/e401c80d71f2da4aa7452d238c5ccffa
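A quick way to run that check from the notebook, as a sketch: compare the installed version against the 1.0.6 threshold mentioned above. parse_version and supports_time_type are hypothetical helpers; the code avoids f-strings so it also runs on the Python 2.7 shown in the traceback.

```python
import re

# TIME support arrived in connector v1.0.6, per the answer above.
MIN_TIME_SUPPORT = (1, 0, 6)


def parse_version(version_string):
    """Turn e.g. "1.2.8" into (1, 2, 8), keeping the first three numbers."""
    numbers = re.findall(r"\d+", version_string)[:3]
    return tuple(int(n) for n in numbers)


def supports_time_type(version_string):
    """True if this connector version should convert TIME columns."""
    return parse_version(version_string) >= MIN_TIME_SUPPORT
```

Assuming the installed connector exposes snowflake.connector.__version__, call supports_time_type(snowflake.connector.__version__); otherwise pip show snowflake-connector-python prints the installed version. If it is too old, pip install --upgrade snowflake-connector-python should bring in TIME support.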

Python PIL Image.save() error in case of EPS format

I am trying to convert jpeg files to eps ones. I am using the following code:
from PIL import Image
fp = open("test.jpg", 'rb')
im = Image.open(fp)
outf = open('test2.eps', 'wb')
im.save(outf, 'EPS')
However, I am getting the following error:
Traceback (most recent call last):
File "im2eps2.py", line 11, in <module>
im.save(outf, 'EPS')
File "C:\Python26\Lib\site-packages\PIL\Image.py", line 1465, in save
save_handler(self, fp, filename)
File "C:\Python26\Lib\site-packages\PIL\EpsImagePlugin.py", line 353, in _save
fp = io.TextIOWrapper(NoCloseStream(fp), encoding='latin-1')
File "C:\Python26\Lib\io.py", line 1429, in __init__
self._seekable = self._telling = self.buffer.seekable()
File "C:\Python26\Lib\site-packages\PIL\EpsImagePlugin.py", line 348, in __getattr__
return getattr(self.fp, name)
AttributeError: 'file' object has no attribute 'seekable'
I shall be thankful for suggestions.
Thanks
PS: I reinstalled PIL from its main page, i.e. http://www.pythonware.com/products/pil/, and it worked :) Earlier, I used the Windows installer provided by http://www.lfd.uci.edu/~gohlke/pythonlibs. I think the problem was with the binary I installed earlier. Currently I have PIL 1.1.7, and it is working fine.
Thanks
According to cgohlke, this was a bug in Pillow that was fixed in 2.4.0, released last month (1st April 2014).
From the bug report:
io.TextIOWrapper expects an object with io.IOBase interface, not a Python 2 file object. Using fh = io.open('test.eps', 'wb') passes this stage but io.TextIOWrapper.write() expects unicode strings; the example fails with TypeError: can't write str to text stream.
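With a fixed Pillow (2.4.0 or later) on Python 3, where open() returns io-based file objects that do have seekable(), the conversion is short. A minimal sketch; jpeg_to_eps is a hypothetical name, and the mode check reflects that Pillow's EPS writer only handles L, RGB and CMYK images.

```python
from PIL import Image


def jpeg_to_eps(src, dst):
    """Convert an image file to EPS; dst may be a path or a binary file object."""
    with Image.open(src) as im:
        # Pillow's EPS writer supports only L, RGB and CMYK modes,
        # so convert anything else (e.g. palette or RGBA) to RGB first.
        if im.mode not in ("L", "RGB", "CMYK"):
            im = im.convert("RGB")
        im.save(dst, "EPS")
```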

PyPDF2 TypeError when trying to run example from lib

I got the PyPDF2 library from here:
https://github.com/mstamy2/PyPDF2/tree/Python3-3
When I try to run the "Example 1:" script from the PyPDF2 python versions (2.5 - 3.3) compatibility branch, I get:
Traceback (most recent call last):
File "1.py", line 6, in <module>
input1 = PdfFileReader(open("document1.pdf", "rb"))
File "C:\Python33\lib\site-packages\PyPDF2\pdf.py", line 595, in __init__
self.read(stream)
File "C:\Python33\lib\site-packages\PyPDF2\pdf.py", line 1097, in read
streamData = StringIO(xrefstream.getData())
TypeError: initial_value must be str or None, not bytes
What is wrong?
It was a compatibility problem between PyPDF2 and Python 3.
In my case, I solved it by replacing pdf.py and utils.py with the ones you will find here; they check whether you are running Python 3 and, if so, handle the data as bytes instead of strings.
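The pattern described there can be sketched as a version check at import time that picks a byte buffer on Python 3. This is a hypothetical illustration, not the code from the replacement pdf.py/utils.py; BufferIO and stream_from_data are made-up names.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: raw PDF stream data is bytes, so use a byte buffer.
    from io import BytesIO as BufferIO
else:
    # Python 2: str is already a byte string.
    from cStringIO import StringIO as BufferIO


def stream_from_data(data):
    """Wrap raw (decoded) PDF stream data in a file-like object."""
    return BufferIO(data)
```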
