object loaded by pickle is different from the intended one - python

I am trying to optimize a function with nevergrad and save the resulting recommendation in a pickle. But when I assert saved_obj == loaded_obj, it raises an error.
To reproduce the issue I uploaded per_recommendation.pkl:
https://drive.google.com/file/d/1bqxO2JjrTP2qh23HT-qdr9Mf1Kfe4mtC/view?usp=sharing
import pickle
import nevergrad

# load obj (in the real code this is the result of some optimization)
with open('/home/franchesoni/Downloads/per_recommendation.pkl', 'rb') as f:
    r2 = pickle.load(f)
# r2 = optimizer.minimize(fn)

# save the object
with open('/home/franchesoni/Downloads/per_recommendation2.pkl', 'wb') as f:
    pickle.dump(r2, f)

# load the object
with open('/home/franchesoni/Downloads/per_recommendation2.pkl', 'rb') as f:
    recommendation = pickle.load(f)

# they are different!
assert r2 == recommendation
Is this normal or expected?
Off-topic question: in the Python docs I read that pickle is unsafe. Is it dangerous to open (for example) the file I uploaded? Is it dangerous to reveal paths like /home/franchesoni?
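For context, this is general Python behavior rather than anything specific to nevergrad: a class that does not define __eq__ compares by identity, so an unpickled copy is never == to the original, even when all of its data survived the round trip. A minimal sketch with a hypothetical class:

import pickle

class NoEq:
    # no __eq__ defined, so == falls back to identity comparison
    def __init__(self, x):
        self.x = x

obj = NoEq(1)
copy = pickle.loads(pickle.dumps(obj))
print(obj == copy)      # False: two distinct objects
print(obj.x == copy.x)  # True: the data itself round-tripped fine

If nevergrad's recommendation objects behave like this, the failing assert says nothing about data loss; comparing the relevant attributes (or repr) of the two objects is a more meaningful check.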

Having Trouble Loading a Pickle File

I am trying to create a small game for fun, and I want to save and load previous run scores. I started a test file to mess around and try to figure out how pickling works. I have a pickle file with a small set of numbers. How do I add numbers to the pickle file and save it for the next run?
Currently I have it like this:
new_score = 9
filename = "scoreTest.pk"
outfile = open(filename, 'wb')
infile = open(filename, 'rb')
with infile as f:
    scores = pickle.load(f)
scores.add(new_score)
pickle.dump(scores, outfile)
When I run it like this I get this error:
EOFError: Ran out of input
If someone could please tell me what is wrong and how to do it correctly, that would be great. Apologies for any suboptimal code, I'm new to coding.
You are trying to juggle a reader and a writer on the same file at the same time. The open(filename, 'wb') for the write truncates the file, so there is no data left for the reader. You should only open the file when you really need to use it. And it's better to write to a temporary file and then rename it: if something goes wrong, you haven't lost your data.
import pickle
import os

new_score = 9
filename = "scoreTest.pk"
tmp_filename = "scoreTest.tmp"

try:
    with open(filename, 'rb') as infile:
        scores = pickle.load(infile)
except (IOError, EOFError):
    scores = set()  # or whatever default makes sense for your scores
scores.add(new_score)

with open(tmp_filename, 'wb') as outfile:
    pickle.dump(scores, outfile)
os.replace(tmp_filename, filename)  # atomic; unlike os.rename, also overwrites on Windows

Append new facial encoding to a pickle file

I just want to add a new facial encoding to the pickle file. I've tried the following method, but it's not working.
Creating the pickle file:
import face_recognition
import pickle

all_face_encodings = {}

img1 = face_recognition.load_image_file("ex.jpg")
all_face_encodings["ex"] = face_recognition.face_encodings(img1)[0]

img2 = face_recognition.load_image_file("ex2.jpg")
all_face_encodings["ex2"] = face_recognition.face_encodings(img2)[0]

with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(all_face_encodings, f)
Appending new data to the pickle file:
import pickle

img3 = face_recognition.load_image_file("ex3.jpg")
all_face_encodings["ex3"] = face_recognition.face_encodings(img3)[0]

with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(img3, f)
    pickle.dump(all_face_encodings["ex3"], f)
But It's not working. Is there a way to append it?
I guess your steps should be:
1. Load the old pickled data from the file into the all_face_encodings dictionary;
2. Add the new encodings to the dictionary;
3. Dump the whole dictionary to the pickle file again.
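A minimal sketch of those three steps, reusing the file and variable names from the question (note the second snippet in the question also needs its imports when run in a fresh session):

import pickle
import face_recognition

# 1. Load the old dictionary back from the pickle file
with open('dataset_faces.dat', 'rb') as f:
    all_face_encodings = pickle.load(f)

# 2. Add the new encoding to the dictionary
img3 = face_recognition.load_image_file("ex3.jpg")
all_face_encodings["ex3"] = face_recognition.face_encodings(img3)[0]

# 3. Dump the whole dictionary again, overwriting the old file
with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(all_face_encodings, f)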

How to unpack a pkl file?

I have a pkl file from the MNIST dataset, which consists of handwritten digit images.
I'd like to take a look at each of those digit images, so I need to unpack the pkl file, but I can't figure out how.
Is there a way to unpack/unzip a pkl file?
Generally
Your pkl file is, in fact, a serialized pickle file, which means it has been dumped using Python's pickle module.
To un-pickle the data you can:
import pickle

with open('serialized.pkl', 'rb') as f:
    data = pickle.load(f)
For the MNIST data set
Note gzip is only needed if the file is compressed:
import gzip
import pickle

with gzip.open('mnist.pkl.gz', 'rb') as f:
    # on Python 3, this Python 2-era pickle needs: pickle.load(f, encoding='latin1')
    train_set, valid_set, test_set = pickle.load(f)
Where each set can be further divided (e.g. for the training set):
train_x, train_y = train_set
Those would be the inputs (digits) and outputs (labels) of your sets.
If you want to display the digits:
import matplotlib.cm as cm
import matplotlib.pyplot as plt
plt.imshow(train_x[0].reshape((28, 28)), cmap=cm.Greys_r)
plt.show()
The other alternative would be to look at the original data:
http://yann.lecun.com/exdb/mnist/
But that will be harder, as you'll need to create a program to read the binary data in those files. So I recommend you use Python and load the data with pickle. As you've seen, it's very easy. ;-)
Handy one-liner
pkl() (
    python -c 'import pickle,sys;d=pickle.load(open(sys.argv[1],"rb"));print(d)' "$1"
)
pkl my.pkl
Will print __str__ for the pickled object.
The generic problem of visualizing an object is of course undefined, so if __str__ is not enough, you will need a custom script.
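If you need a bit more than __str__, here is a variant of the same one-liner that pretty-prints nested containers (pprint is in the standard library, so nothing extra to install):

pkl_pp() (
    python -c 'import pickle,pprint,sys;pprint.pprint(pickle.load(open(sys.argv[1],"rb")))' "$1"
)

pkl_pp my.pkl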
In case you want to work with the original MNIST files, here is how you can deserialize them.
If you haven't downloaded the files yet, do that first by running the following in the terminal:
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Then save the following as deserialize.py and run it.
import numpy as np
import gzip

IMG_DIM = 28

def decode_image_file(fname):
    n_bytes_per_img = IMG_DIM * IMG_DIM
    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
    # the IDX image format starts with a 16-byte header
    # (magic number, image count, row count, column count)
    data = bytes_[16:]
    if len(data) % n_bytes_per_img != 0:
        raise Exception('Something wrong with the file')
    return np.frombuffer(data, dtype=np.uint8).reshape(
        len(data) // n_bytes_per_img, n_bytes_per_img)

def decode_label_file(fname):
    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
    # the IDX label format starts with an 8-byte header (magic number, label count)
    data = bytes_[8:]
    return np.frombuffer(data, dtype=np.uint8)

train_images = decode_image_file('train-images-idx3-ubyte.gz')
train_labels = decode_label_file('train-labels-idx1-ubyte.gz')
test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')
The script doesn't normalize the pixel values the way the pickled file does. To do that, all you have to do is:
train_images = train_images/255
test_images = test_images/255
The pickle module (and gzip, if the file is compressed) needs to be used.
NOTE: both are already in the standard Python library, so there is no need to install anything new.

Python Pickle EOFerror when using Pickler (but not with pickle.dump())

So, I'm trying to save some objects to disk on Windows 7 using Python's pickle. I'm using the code below, which fails on pretty much any arbitrary object (the contents of saveobj aren't important, it fails regardless). Below is my test code:
import pickle, os, time
outfile = "foo.pickle"
f = open(outfile, 'wb')
p = pickle.Pickler(f, -1)
saveobj = ( 2,3,4,5,["hat", {"mat": 6}])
p.save(saveobj)
#pickle.dump(saveobj, f)
print "done pickling"
f.close()
g = open(outfile, 'rb')
tup = pickle.load(g)
g.close()
print tup
When I run it, I get the following output/error:
done pickling
Traceback (most recent call last):
File "C:\Users\user\pickletest2.py", line 13, in <module>
tup = pickle.load(g)
File "C:\Python26\lib\pickle.py", line 1370, in load
return Unpickler(file).load()
File "C:\Python26\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python26\lib\pickle.py", line 880, in load_eof
raise EOFError
EOFError
However, if I use pickle.dump() instead of a Pickler object, it works just fine. My reason for using Pickler is that I would like to subclass it so I can perform operations on each object before I pickle it.
Does anybody know why my code is doing this? My searching has revealed that not using 'wb' and 'rb' commonly causes this, as does forgetting f.close(), but I have both of those. Is it a problem with using -1 as the protocol? I'd like to keep it, as it can handle objects that define __slots__ without defining a __getstate__ method.
Pickler.save() is a lower-level method that you're not supposed to call directly.
If you call p.dump(saveobj) instead of p.save(saveobj), it works as expected.
Perhaps it should be called _save to avoid confusion. But dump is the method described in the documentation, and it neatly matches up with the module-level pickle.dump.
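For reference, the documented equivalence: pickle.dump(obj, file, protocol) is shorthand for Pickler(file, protocol).dump(obj), so these two produce the same bytes:

import pickle

saveobj = (2, 3, 4, 5, ["hat", {"mat": 6}])

# module-level convenience function
with open("foo.pickle", "wb") as f:
    pickle.dump(saveobj, f, -1)

# equivalent explicit Pickler
with open("foo.pickle", "wb") as f:
    pickle.Pickler(f, -1).dump(saveobj)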
In general it is better to use cPickle for performance reasons (since cPickle is written in C).
Anyway, using dump it works just fine:
import pickle
import os, time
outfile = "foo.pickle"
f = open(outfile, 'wb')
p = pickle.Pickler(f, -1)
saveobj = ( 2,3,4,5,["hat", {"mat": 6}])
p.dump(saveobj)
#pickle.dump(saveobj, f)
f.close()
print "done pickling"
#f.close()
g = open(outfile, 'rb')
u = pickle.Unpickler(g) #, -1)
tup = u.load()
#tup = pickle.load(g)
g.close()
print tup
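Since the stated goal was to subclass Pickler in order to act on each object before pickling it, here is a sketch of one supported hook: persistent_id, which pickle consults for every object it serializes (returning None means "serialize this object normally"). This avoids touching the internal save() machinery entirely:

import pickle

class LoggingPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # called once per object being pickled; a convenient place
        # to inspect or record objects before serialization
        print("pickling: %r" % (obj,))
        return None  # fall through to normal pickling

with open("foo.pickle", "wb") as f:
    LoggingPickler(f, -1).dump((2, 3, 4, 5, ["hat", {"mat": 6}]))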

Using Pickle object like an API call

I trained a NaiveBayes classifier to do elementary sentiment analysis. The model is 208 MB. I want to load it only once and then use Gearman workers to keep calling the model to get the results. It takes a rather long time to load. How do I load the model only once and then keep calling it?
Some code, hope this helps:
import nltk.data
c=nltk.data.load("/path/to/classifier.pickle")
This remains the loader script.
Now I have a Gearman worker script which should call this "c" object and then classify the text:
c.classify('features')
This is what I want to do.
Thanks.
If the question is how to use pickle, then here's the answer:
import pickle

class Model(object):
    # some crazy array of data

    def getClass(self, sentiment):
        # return the class of the sentiment
        pass

def loadModel(filename):
    f = open(filename, 'rb')
    res = pickle.load(f)
    f.close()
    return res

def saveModel(model, filename):
    f = open(filename, 'wb')
    pickle.dump(model, f)
    f.close()

m = loadModel('bayesian.pickle')
If the problem is loading such a large object in the first place, then I don't know whether pickle can help.
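To make the load-once pattern concrete, here is a minimal sketch; handle_request is a hypothetical hook standing in for whatever dispatches jobs to the worker (e.g. a Gearman task callback), and the path is the one from the question:

import nltk.data

# Load the 208 MB classifier exactly once, at worker start-up
# (module import time), not once per request.
c = nltk.data.load("/path/to/classifier.pickle")

def handle_request(features):
    # every incoming job reuses the classifier already in memory
    return c.classify(features)

As long as the worker process stays alive between jobs, the expensive load is paid a single time.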
