I'm trying to save some objects to disk on Windows 7 using Python's pickle. The code fails on pretty much any arbitrary object (the contents of saveobj aren't important; it fails regardless). Below is my test code:
import pickle, os, time
outfile = "foo.pickle"
f = open(outfile, 'wb')
p = pickle.Pickler(f, -1)
saveobj = ( 2,3,4,5,["hat", {"mat": 6}])
p.save(saveobj)
#pickle.dump(saveobj, f)
print "done pickling"
f.close()
g = open(outfile, 'rb')
tup = pickle.load(g)
g.close()
print tup
When I run it, I get the following output/error:
done pickling
Traceback (most recent call last):
File "C:\Users\user\pickletest2.py", line 13, in <module>
tup = pickle.load(g)
File "C:\Python26\lib\pickle.py", line 1370, in load
return Unpickler(file).load()
File "C:\Python26\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python26\lib\pickle.py", line 880, in load_eof
raise EOFError
EOFError
However, if I use pickle.dump() instead of a Pickler object, it works just fine. My reason for using Pickler is that I would like to subclass it so I can perform operations on each object before I pickle it.
Does anybody know why my code is doing this? My searching has revealed that not using 'wb' and 'rb' commonly causes this, as does a missing f.close(), but I have both of those. Is it a problem with using -1 as the protocol? I'd like to keep it, as it can handle objects that define __slots__ without defining a __getstate__ method.
Pickler.save() is a lower-level method that you're not supposed to call directly.
If you call p.dump(saveobj) instead of p.save(saveobj), it works as expected.
Perhaps it should be called _save to avoid confusion. But dump is the method described in the documentation, and it neatly matches up with the module-level pickle.dump.
In general it is better to use cPickle for performance reasons (since cPickle is written in C).
Anyway, using dump it works just fine:
import pickle
import os, time
outfile = "foo.pickle"
f = open(outfile, 'wb')
p = pickle.Pickler(f, -1)
saveobj = ( 2,3,4,5,["hat", {"mat": 6}])
p.dump(saveobj)
#pickle.dump(saveobj, f)
f.close()
print "done pickling"
#f.close()
g = open(outfile, 'rb')
u = pickle.Unpickler(g) #, -1)
tup = u.load()
#tup = pickle.load(g)
g.close()
print tup
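Since the stated goal was to subclass Pickler so that something can be done with each object before it is written, one option (a sketch, not from the original answer) is to override the documented persistent_id() hook on a pure-Python pickle.Pickler subclass: it is consulted while pickling each object, and returning None makes the object get pickled as usual. The class name and print statement below are illustrative only:
import pickle

class InspectingPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Documented hook: consulted while pickling each object.
        # Returning None means "pickle this object normally".
        print "about to pickle:", repr(obj)
        return None

with open("foo.pickle", "wb") as f:
    p = InspectingPickler(f, -1)
    p.dump(saveobj)  # saveobj from the snippet above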
I'm trying to learn to manipulate files in Python, but I can't get the open function to work. I have made a .txt file called foo that holds the content "hello world!" in my user directory (/home/yonatan) and typed this line into the shell:
open('/home/yonatan/foo.txt')
What I get in return is:
<_io.TextIOWrapper name='/home/yonatan/foo.txt' mode='r' encoding='UTF-8'>
I get what that means, but why don't I get the content?
open() returns a file object.
You then need to use read() to read the whole file
f = open('/home/yonatan/foo.txt', 'r')
contents = f.read()
Or you can use readline() to read just one line
line = f.readline()
and don't forget to close the file at the end
f.close()
An example iterating through the lines of the file (using with, which ensures file.close() gets called at the end of its lexical scope):
file_path = '/home/yonatan/foo.txt'
with open(file_path) as file:
    for line in file:
        print(line)
A great resource on I/O and file handling operations.
You haven't specified the mode you want to open it in.
Try:
f = open("home/yonatan/foo.txt", "r")
print(f.read())
I am trying to analyze tensor data, but I could not read the data in the pickled file using np.load(). My Python code is as follows:
import pickle
import numpy as np
import sktensor as skt
import numpy.random as rn
data = np.ones((10, 8, 3), dtype='int32') # 3-mode count tensor of size 10 x 8 x 3
##data = skt.dtensor(data)
with open('data.dat', 'w+') as f: # can be stored as a .dat using pickle
    pickle.dump(data, f)

with open('data.dat', 'r+') as f: # can be loaded back in using pickle.load
    tmp = pickle.load(f)

assert np.allclose(tmp, data)
But when I attempted to use np.load() to load the data in data.dat as follows:
np.load('G:\data.dat')
the following error appears:
Traceback (most recent call last):
File "<pyshell#34>", line 1, in <module>
np.load('D:/GDELT_Tensor/data.dat', mmap_mode = 'r')
File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 416, in load
"Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file 'D:/data.dat' as a pickle.
Can anyone help me?
Don't use the pickle module to save NumPy arrays. Instead, use one of the methods here: http://docs.scipy.org/doc/numpy/reference/routines.io.html
There's even one that uses pickle under the hood, for example:
np.save('data.npy', data)   # np.save appends '.npy' if the name doesn't already end with it
tmp = np.load('data.npy')
Another format like CSV or HDF5 might be more suitable for most applications, especially where you might want to interoperate with non-Python systems.
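For instance, a 2-D array can round-trip through a plain-text CSV using NumPy's own text I/O. This is just a sketch with an arbitrary filename; note that np.savetxt handles only 1-D and 2-D arrays, so the 3-D tensor from the question would need reshaping first:
import numpy as np

arr = np.arange(12, dtype='int32').reshape(4, 3)

# Write a human-readable CSV, then read it back.
np.savetxt('data.csv', arr, fmt='%d', delimiter=',')
tmp = np.loadtxt('data.csv', dtype='int32', delimiter=',')

assert np.array_equal(arr, tmp)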
Goal = Open file, encrypt file, write encrypted file.
Trying to use the PyPDF2 module to accomplish this. I have verified that "input" is a file object. I have researched this error, and it translates to "file not found". I believe it is somehow linked to the file or file path, but I am unsure how to debug or troubleshoot it. I'm getting the following error:
Traceback (most recent call last):
File "CommissionSecurity.py", line 52, in <module>
inputStream = PyPDF2.PdfFileReader(input)
File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1065, in __init__
File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1660, in read
IOError: [Errno 22] Invalid argument
Below is the relevant code. I'm not sure how to correct this issue because I'm not really sure what the issue is. Any guidance is appreciated.
for ID in FileDict:
    if ID in EmailDict:
        path = "C:\\Apps\\CorVu\\DATA\\Reports\\AlliD\\Monthly Commission Reports\\Output\\pdcom1\\"
        #print os.listdir(path)
        file = os.path.join(path + FileDict[ID])
        with open(file, 'rb') as input:
            print type(input)
            inputStream = PyPDF2.PdfFileReader(input)
            output = PyPDF2.PdfFileWriter()
            output = inputStream.encrypt(EmailDict[ID][1])
            with open(file, 'wb') as outputStream:
                output.write(outputStream)
    else:
        continue
I think your problem might be caused by the fact that you use the same filename to both open and write to the file, opening it twice:
with open(file, 'rb') as input :
with open(file, 'wb') as outputStream :
The 'w' mode truncates the file, so the second open truncates your input.
I'm not sure what your intention is, because you can't really read from the (beginning of the) file and overwrite it at the same time. Even if you try to write to the end of the file, you'll have to position the file pointer somewhere.
So create an extra output file that has a different name; you can always rename that output file to your input file after both files are closed, thus overwriting your input file.
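A rough sketch of that approach, reusing the names from the question; the temporary filename, appendPagesFromReader, and the final remove/rename are my additions, so treat this as one possible way rather than the original code:
import os
import PyPDF2

tmp_file = file + '.tmp'  # hypothetical temporary name next to the original

with open(file, 'rb') as input:
    reader = PyPDF2.PdfFileReader(input)
    writer = PyPDF2.PdfFileWriter()
    writer.appendPagesFromReader(reader)   # copy the pages into the writer
    writer.encrypt(EmailDict[ID][1])       # encrypt the writer, not the reader
    with open(tmp_file, 'wb') as outputStream:
        writer.write(outputStream)

# Both files are closed here; replace the original with the encrypted copy.
os.remove(file)
os.rename(tmp_file, file)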
Or you could first read the complete file into memory, then write to it:
with open(file, 'rb') as input:
    inputStream = PyPDF2.PdfFileReader(input)
    output = PyPDF2.PdfFileWriter()
    output = input.encrypt(EmailDict[ID][1])

with open(file, 'wb') as outputStream:
    output.write(outputStream)
Notes:
you assign inputStream, but never use it
you assign PdfFileWriter() to output, and then assign something else to output in the next line. Hence, you never used the result from the first output = line.
Please check carefully what you're doing, because it feels there are numerous other problems with your code.
Alternatively, here are some other tips that may help:
The documentation suggests that you can also pass the filename as the first argument to PdfFileReader:
stream – A File object or an object that supports the standard read
and seek methods similar to a File object. Could also be a string
representing a path to a PDF file.
So try:
inputStream = PyPDF2.PdfFileReader(file)
You can also try to set the strict argument to False:
strict (bool) – Determines whether user should be warned of all
problems and also causes some correctable problems to be fatal.
Defaults to True.
For example:
inputStream = PyPDF2.PdfFileReader(file, strict=False)
Using open(file, 'rb') was causing the issue because PdfFileReader() does that automatically. I just removed the with statement, and that corrected the problem.
with open(file, 'rb') as input:
    inputStream = PyPDF2.PdfFileReader(input)
This error was raised because the PDF file was empty. My PDF file was empty, which is why the error occurred. So first of all I filled my PDF file with some data and then started reading it using PyPDF2.PdfFileReader, and that solved my problem.
Late answer, but you may be opening an invalid PDF file, or an empty file that is merely named x.pdf and that you only think is a PDF.
For the first time, I am having a problem loading a CSV file into Python.
I am trying to do this. My CSV file is identical to his, but longer and with different values.
When I run this,
import collections
path='../data/struc.csv'
answer = collections.defaultdict(list)
with open(path, 'r+') as istream:
    for line in istream:
        line = line.strip()
        try:
            k, v = line.split(',', 1)
            answer[k.strip()].append(v.strip())
        except ValueError:
            print('Ignoring: malformed line: "{}"'.format(line))
print(answer)
Everything runs fine. I get exactly what you would expect.
Without copying and pasting the code from the link here, in both instances I get an error.
In the accepted answer, the terminal spits back ValueError: need more than 1 value to unpack
In the second answer, I get AttributeError: 'file' object has no attribute 'split'. It also does not work if you adjust it to take a list.
I feel like the problem is the CSV file itself. The head of it is:
_id,parent,name,\n
Section,none,America's,\n
Section,none,Europe,\n
Section,none,Asia,\n
Section,none,Africa,\n
Country,America's,United States,\n
Country,America's,Argentina,\n
Country,America's,Bahamas,\n
Country,America's,Bolivia,\n
Country,America's,Brazil,\n
Country,America's,Colombia,\n
Country,America's,Canada,\n
Country,America's,Cayman Islands,\n
Country,America's,Chile,\n
Country,America's,Costa Rica,\n
Country,America's,Dominican Republic,\n
I have read a lot about CSVs, tried the csv module, and still no luck.
Please someone help. Having this kind of problem is the worst.
import re
from collections import defaultdict
parents=defaultdict(list)
path='../data/struc.csv'
with open(path, 'r+') as istream:
    for i, line in enumerate(istream.split(',')):
        if i != 0 and line.strip():
            id_, parent, name = re.findall(r"[\d\w-]+", line)
            parents[parent].append((id_, name))
Traceback (most recent call last):
File "<ipython-input-29-2b2fd98946b3>", line 1, in <module>
runfile('/home/bob/Documents/mega/tree/python/structure.py', wdir='/home/bob/Documents/mega/tree/python')
File "/home/bob/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "/home/bob/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/home/bob/Documents/mega/tree/python/structure.py", line 15, in <module>
for i, line in enumerate(istream.split(',')):
AttributeError: 'file' object has no attribute 'split'
First of all, Python has a dedicated module in its standard library for dealing with CSV files of different flavours. Refer to the documentation.
When a CSV file has headers, csv.DictReader is probably the more intuitive way to parse it:
import collections
import csv
filepath = '../data/struc.csv'
answer = collections.defaultdict(list)
with open(filepath) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        answer[row["_id"].strip()].append(row["parent"].strip())

print(answer)
You can refer to the fields in a row by their names from the header. Here I assumed you would like to use _id and parent, but you get the idea.
Also, dialect=csv.excel_tab can be added as a parameter to DictReader to parse tab-separated files.
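For example (a sketch with a hypothetical tab-separated file):
import csv

with open('data.tsv') as tsvfile:
    reader = csv.DictReader(tsvfile, dialect=csv.excel_tab)
    for row in reader:
        print(row)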
If you plan on doing any analysis of this data, then I would suggest learning the pandas library. It takes care of all the details that seem to be tripping you up, making opening a CSV file a one-liner.
import pandas as pd
csv_file = pd.read_csv(file_path)
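And to reproduce the parent-to-names grouping from the question, something like this should work (a sketch assuming the struc.csv layout shown above):
import pandas as pd

df = pd.read_csv('../data/struc.csv')

# Group the 'name' values by their 'parent' column, similar to the defaultdict version.
parents = df.groupby('parent')['name'].apply(list).to_dict()
print(parents)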
I am getting an interesting error while trying to use Unpickler.load(). Here is the source code:
open(target, 'a').close()
scores = {};
with open(target, "rb") as file:
unpickler = pickle.Unpickler(file);
scores = unpickler.load();
if not isinstance(scores, dict):
scores = {};
Here is the traceback:
Traceback (most recent call last):
File "G:\python\pendu\user_test.py", line 3, in <module>:
save_user_points("Magix", 30);
File "G:\python\pendu\user.py", line 22, in save_user_points:
scores = unpickler.load();
EOFError: Ran out of input
The file I am trying to read is empty.
How can I avoid getting this error, and get an empty variable instead?
Most of the answers here have dealt with how to manage EOFError exceptions, which is really handy if you're unsure whether the pickled object is empty or not.
However, if you're surprised that the pickle file is empty, it could be because you opened the file with 'wb' or some other mode that could have overwritten it.
For example:
filename = 'cd.pkl'
with open(filename, 'wb') as f:
    classification_dict = pickle.load(f)
This will overwrite the pickled file. You might have done this by mistake before using:
...
open(filename, 'rb') as f:
And then got the EOFError because the previous block of code overwrote the cd.pkl file.
When working in Jupyter, or in the console (Spyder), I usually write a wrapper over the reading/writing code and call the wrapper from then on. This avoids common read/write mistakes and saves a bit of time if you're going to be reading the same file multiple times through your travails.
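A minimal sketch of such wrappers (the function and file names here are illustrative, not from the original answer):
import pickle

def save_pickle(obj, path):
    # Always open in binary write mode for dumping.
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

def load_pickle(path):
    # Always open in binary read mode for loading.
    with open(path, 'rb') as f:
        return pickle.load(f)

save_pickle({'a': 1}, 'cd.pkl')
print(load_pickle('cd.pkl'))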
I would check that the file is not empty first:
import os
scores = {} # scores is an empty dict already
if os.path.getsize(target) > 0:
    with open(target, "rb") as f:
        unpickler = pickle.Unpickler(f)
        # if file is not empty scores will be equal
        # to the value unpickled
        scores = unpickler.load()
Also open(target, 'a').close() is doing nothing in your code and you don't need to use ;.
It is very likely that the pickled file is empty.
It is surprisingly easy to overwrite a pickle file if you're copying and pasting code.
For example the following writes a pickle file:
pickle.dump(df,open('df.p','wb'))
And if you copied this code to reopen it, but forgot to change 'wb' to 'rb' then you would overwrite the file:
df=pickle.load(open('df.p','wb'))
The correct syntax is
df=pickle.load(open('df.p','rb'))
As you can see, that's actually a natural error.
A typical construct for reading from an Unpickler object looks like this:
try:
    data = unpickler.load()
except EOFError:
    data = list()  # or whatever you want
EOFError is simply raised because it was reading an empty file; it just means end of file.
You can catch that exception and return whatever you want from there.
open(target, 'a').close()
scores = {};
try:
    with open(target, "rb") as file:
        unpickler = pickle.Unpickler(file);
        scores = unpickler.load();
        if not isinstance(scores, dict):
            scores = {};
except EOFError:
    return {}
from os import path
from pickle import Unpickler

if path.exists(Score_file):
    try:
        with open(Score_file, "rb") as prev_Scr:
            return Unpickler(prev_Scr).load()
    except EOFError:
        return dict()
Had the same issue. It turns out that when I was writing to my pickle file, I had not called file.close(). I inserted that line and the error was gone.
I have encountered this error many times, and it always occurred because I didn't close the file after writing into it. If we don't close the file, the content stays in the buffer and the file stays empty.
To save the content into the file, either the file should be closed or the file object should go out of scope.
That's why loading gives the "Ran out of input" error: the file is empty. So you have two options (see the sketch after this list):
file_object.close()
file_object.flush(): if you don't want to close the file in the middle of the program, you can use flush(), as it forcefully moves the content from the buffer to the file.
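A small sketch of the flush() option (filename and data are made up for illustration):
import pickle

f = open('scores.pkl', 'wb')
pickle.dump({'player1': 30}, f)
f.flush()  # force the buffered bytes out to the file without closing it

# A separate handle can now read the data even though f is still open.
with open('scores.pkl', 'rb') as g:
    print(pickle.load(g))

f.close()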
This error occurs when your pickle file is empty (0 bytes). You need to check the size of your pickle file first. That was the scenario in my case. Hope this helps!
Note that opening the file in 'a' mode (or any other mode that includes 'a', i.e. append) will also cause this error, because the stream position starts at the end of the file, so there is nothing left for pickle.load to read.
pointer = open('makeaafile.txt', 'ab+')
tes = pickle.load(pointer, encoding='utf-8')
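If an append-mode handle really is needed, rewinding it before loading avoids the immediate end-of-file (a sketch based on the snippet above):
import pickle

pointer = open('makeaafile.txt', 'ab+')
pointer.seek(0)  # append mode starts at the end of the file; rewind before reading
tes = pickle.load(pointer)
pointer.close()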
temp_model = os.path.join(models_dir, train_type + '_' + part + '_' + str(pc))
# print(type(temp_model)) # <class 'str'>
filehandler = open(temp_model, "rb")
# print(type(filehandler)) # <class '_io.BufferedReader'>
try:
    pdm_temp = pickle.load(filehandler)
except UnicodeDecodeError:
    filehandler.seek(0)  # rewind: the failed load has already consumed part of the stream
    pdm_temp = pickle.load(filehandler, fix_imports=True, encoding="latin1")
from os.path import getsize as size
from pickle import *
if size(target) > 0:
    with open(target, 'rb') as f:
        scores = {i: j for i, j in enumerate(load(f))}
else: scores = {}
Line 1.
We import the function getsize from the os.path module and rename it with 'as' for a shorter style of writing. The important point here is that we load only the single function we need, not the whole library!
Line 2.
Same idea, but when we don't know at the beginning which names from the module we will use, we can import everything from the library with '*'.
Line 3.
A conditional statement: if the size of your file is > 0 (meaning the file is not empty). 'target' is a variable that should be defined a bit earlier.
Just an example: target = r'd:\dir1\dir.2..\YourDataFile.bin'
Line 4.
'with open(target) as file:' is the usual construction for opening any file; you don't need to call file.close() afterwards, and it helps avoid some typical errors such as "Ran out of input" or permission problems.
The 'rb' mode means 'read binary': you can only read (load) the data from your binary file, you can't modify or rewrite it.
Line 5.
A dict comprehension applied to build the dictionary.
Line 6. If your data file is empty, it will not raise any error; it just returns an empty dictionary.