I have a pkl file from the MNIST dataset, which consists of handwritten digit images.
I'd like to take a look at each of those digit images, so I need to unpack the pkl file, but I can't figure out how.
Is there a way to unpack/unzip a pkl file?
Generally
Your pkl file is, in fact, a serialized pickle file, which means it has been dumped using Python's pickle module.
To un-pickle the data you can:
import pickle

with open('serialized.pkl', 'rb') as f:
    data = pickle.load(f)
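You can then take a quick look at what came out before digging deeper:

print(type(data))  # see what kind of object was pickled
print(data)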
For the MNIST data set
Note gzip is only needed if the file is compressed:
import gzip
import pickle

with gzip.open('mnist.pkl.gz', 'rb') as f:
    # On Python 3 you may need pickle.load(f, encoding='latin-1'),
    # since this particular file was pickled with Python 2.
    train_set, valid_set, test_set = pickle.load(f)
Where each set can be further divided (e.g. for the training set):
train_x, train_y = train_set
Those would be the inputs (digits) and outputs (labels) of your sets.
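For example, assuming the standard mnist.pkl split (50,000 training examples):

# Each row of train_x is a flattened 28x28 image; train_y holds the integer labels.
print(train_x.shape)  # (50000, 784)
print(train_y.shape)  # (50000,)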
If you want to display the digits:
import matplotlib.cm as cm
import matplotlib.pyplot as plt
plt.imshow(train_x[0].reshape((28, 28)), cmap=cm.Greys_r)
plt.show()
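If you also want to see the corresponding label, you could add a title (a small optional extension):

plt.imshow(train_x[0].reshape((28, 28)), cmap=cm.Greys_r)
plt.title('Label: %d' % train_y[0])
plt.show()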
The other alternative would be to look at the original data:
http://yann.lecun.com/exdb/mnist/
But that will be harder, as you'll need to write a program to read the binary data in those files. So I recommend using Python and loading the data with pickle. As you've seen, it's very easy. ;-)
Handy one-liner
pkl() (
  python -c 'import pickle,sys;d=pickle.load(open(sys.argv[1],"rb"));print(d)' "$1"
)

pkl my.pkl
This will print the __str__ of the pickled object.
The general problem of visualizing an arbitrary object is of course not well defined, so if __str__ is not enough, you will need a custom script.
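As a starting point for such a script, type() plus pprint already goes a bit further than plain print (a minimal sketch):

import pickle
import pprint
import sys

# Load the pickle named on the command line and pretty-print it:
with open(sys.argv[1], 'rb') as f:
    obj = pickle.load(f)

print(type(obj))
pprint.pprint(obj)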
In case you want to work with the original MNIST files, here is how you can deserialize them.
If you haven't downloaded the files yet, do that first by running the following in the terminal:
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Then save the following as deserialize.py and run it.
import numpy as np
import gzip

IMG_DIM = 28


def decode_image_file(fname):
    n_bytes_per_img = IMG_DIM * IMG_DIM
    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
    data = bytes_[16:]  # skip the 16-byte IDX header (magic, count, rows, cols)
    if len(data) % n_bytes_per_img != 0:
        raise Exception('Something wrong with the file')
    # One flattened 28*28 = 784-byte row per image:
    return np.frombuffer(data, dtype=np.uint8).reshape(
        len(data) // n_bytes_per_img, n_bytes_per_img)


def decode_label_file(fname):
    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
    data = bytes_[8:]  # skip the 8-byte IDX header (magic, count)
    return np.frombuffer(data, dtype=np.uint8)


train_images = decode_image_file('train-images-idx3-ubyte.gz')
train_labels = decode_label_file('train-labels-idx1-ubyte.gz')
test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')
The script doesn't normalize the pixel values like the pickled file does. To do that, all you have to do is:

train_images = train_images / 255.0
test_images = test_images / 255.0
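A quick sanity check on the decoded arrays (the shapes follow from the official MNIST split sizes):

print(train_images.shape)  # (60000, 784)
print(train_labels.shape)  # (60000,)
print(test_images.shape)   # (10000, 784)
print(test_labels.shape)   # (10000,)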
You need the pickle module (and gzip if the file is compressed).
NOTE: both are already in the Python standard library, so there is no need to install anything new.
Related
I just want to add a new facial encoding to the pickle file. I've tried the following method, but it's not working.
Creating the pickle file:
import face_recognition
import pickle

all_face_encodings = {}

img1 = face_recognition.load_image_file("ex.jpg")
all_face_encodings["ex"] = face_recognition.face_encodings(img1)[0]

img2 = face_recognition.load_image_file("ex2.jpg")
all_face_encodings["ex2"] = face_recognition.face_encodings(img2)[0]

with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(all_face_encodings, f)
Appending new data to the pickle file:

import pickle

img3 = face_recognition.load_image_file("ex3.jpg")
all_face_encodings["ex3"] = face_recognition.face_encodings(img3)[0]

with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(img3, f)
    pickle.dump(all_face_encodings["ex3"], f)
But it's not working. Is there a way to append it?
I guess your steps should be:
1. Load the old pickled data into the all_face_encodings dictionary.
2. Add the new encodings to the dictionary.
3. Dump the whole dictionary to the pickle file again.
A minimal sketch of those steps is below.
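Using the file and image names from the question:

import pickle

import face_recognition

# 1. Load the existing dictionary:
with open('dataset_faces.dat', 'rb') as f:
    all_face_encodings = pickle.load(f)

# 2. Add the new encoding:
img3 = face_recognition.load_image_file("ex3.jpg")
all_face_encodings["ex3"] = face_recognition.face_encodings(img3)[0]

# 3. Overwrite the file with the full, updated dictionary:
with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(all_face_encodings, f)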
I'm trying to load metadata from a file with a SigMF specification as a JSON using python. Here's my code so far:
import json

f_path = "test.sigmf-meta"

sigmf_meta_f = open(f_path)
sigmf_data = json.load(sigmf_meta_f)
for i in sigmf_data['sample_text']:
    print(i)
sigmf_meta_f.close()
This doesn't seem to work for some reason.
When I change the file extension from "sigmf-meta" to "json", it works perfectly, but I need to be able to load these SigMF files without having to change all of their extensions.
Are you sure you changed the extension of the file in its properties to sigmf-meta?
I just tried it and it worked just fine. You may not have changed the extension but only the name, so the file is actually test.sigmf-meta.json and no file named test.sigmf-meta exists.
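A quick way to check is to list the directory from Python and look at the full file names, extensions included:

import os

# Shows the real names on disk, e.g. 'test.sigmf-meta.json' vs. 'test.sigmf-meta':
print(os.listdir('.'))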
As per GNU Radio / SigMF: how to read sigmf-data file? #114
import json

import numpy as np

with open("myrecord.sigmf-meta", "r") as f:
    md = json.loads(f.read())

if md["global"]["dtype"] == "cf32_le":
    samples = np.memmap("myrecord.sigmf-data", mode="r", dtype=np.complex64)
elif md["global"]["dtype"] == "ci16_le":
    samples = np.memmap("myrecord.sigmf-data", mode="r", dtype=np.int16)
    # Convert samples to float if you want...
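For the ci16_le case, a minimal sketch of that conversion, assuming the int16 values are interleaved I/Q pairs:

import numpy as np

# Promote to float32, then reinterpret consecutive (I, Q) pairs as complex64,
# scaling to roughly [-1, 1):
iq = samples.astype(np.float32).view(np.complex64) / 32768.0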
I am trying to optimize a function with nevergrad and save the resulting recommendation in a pickle. But when I assert saved_obj == loaded_obj, it raises an error.
To reproduce the issue I uploaded per_recommendation.pkl:
https://drive.google.com/file/d/1bqxO2JjrTP2qh23HT-qdr9Mf1Kfe4mtC/view?usp=sharing
import pickle

import nevergrad

# load obj (in the real code this is the result of some optimization)
with open('/home/franchesoni/Downloads/per_recommendation.pkl', 'rb') as f:
    r2 = pickle.load(f)
    # r2 = optimizer.minimize(fn)

# save the object
with open('/home/franchesoni/Downloads/per_recommendation2.pkl', 'wb') as f:
    pickle.dump(r2, f)

# load the object
with open('/home/franchesoni/Downloads/per_recommendation2.pkl', 'rb') as f:
    recommendation = pickle.load(f)

# they are different!
assert r2 == recommendation
Is this normal or expected?
Off-topic question: in the Python docs I read that pickle is unsafe. Is it dangerous to open (for example) the file I uploaded? Is it dangerous to reveal paths like /home/franchesoni?
I am trying to analyze tensor data, but I could not read the data in the pickled file by using np.load(). My Python code is as follows:
import pickle
import numpy as np
import sktensor as skt
import numpy.random as rn

data = np.ones((10, 8, 3), dtype='int32')  # 3-mode count tensor of size 10 x 8 x 3
##data = skt.dtensor(data)

with open('data.dat', 'w+') as f:  # can be stored as a .dat using pickle
    pickle.dump(data, f)

with open('data.dat', 'r+') as f:  # can be loaded back in using pickle.load
    tmp = pickle.load(f)
    assert np.allclose(tmp, data)
But when I attempted to use np.load() to load the data in data.dat as follows:
np.load('G:\data.dat')
an error appears:
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    np.load('D:/GDELT_Tensor/data.dat', mmap_mode = 'r')
  File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 416, in load
    "Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file 'D:/data.dat' as a pickle.
Can anyone help me?
Don't use the pickle module to save NumPy arrays. Instead, use one of the methods here: http://docs.scipy.org/doc/numpy/reference/routines.io.html
There's even one that can use pickle under the hood, for example:

np.save('data.npy', data)  # np.save appends .npy to the file name if it is missing
tmp = np.load('data.npy')

Another format like CSV or HDF5 might be more suitable for most applications, especially where you might want to interoperate with non-Python systems.
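For instance, a minimal HDF5 sketch using the h5py package (a separate install, not part of the standard library; the file and dataset names here are made up):

import h5py
import numpy as np

data = np.ones((10, 8, 3), dtype='int32')

# Write the array to an HDF5 file...
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('data', data=data)

# ...and read it back:
with h5py.File('data.h5', 'r') as f:
    tmp = f['data'][:]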
I'm reading two files (trainfile, testfile), and I would like to vectorize them with word_vectorizer. The problem is that maybe I'm not reading the files in the right way. This is what I tried:
# -*- coding: utf-8 -*-
import codecs
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
import os, sys

with open('/Users/user/Desktop/train.txt', 'r') as trainfile:
    contenido_del_trainfile = trainfile.read()
    print contenido_del_trainfile

with open('/Users/user/Desktop/test.txt', 'r') as testfile:
    contenido_del_testfile = testfile.read()
    print contenido_del_testfile

print "\nThis is the training corpus:\n", contenido_del_trainfile
print "\nThis is the test corpus:\n", contenido_del_testfile

train = []
word_vectorizer = CountVectorizer(analyzer='word')
trainset = word_vectorizer.fit_transform(codecs.open(trainfile,'r','utf8'))
print word_vectorizer.get_feature_names()
Here is the output:
TypeError: coercing to Unicode: need string or buffer, file found
How can I read the files in the right way in order to print something like this:
[u'word',... ,u'word']
codecs.open expects a path to a file, not a file object itself.
So, instead of
trainset = word_vectorizer.fit_transform(codecs.open(trainfile,'r','utf8'))
Do
trainset = word_vectorizer.fit_transform(codecs.open('/Users/user/Desktop/train.txt','r','utf8'))
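Putting it together, a minimal sketch of the corrected flow (same path as in the question; note that iterating a file yields one line at a time, so each line is treated as one document):

# -*- coding: utf-8 -*-
import codecs
from sklearn.feature_extraction.text import CountVectorizer

word_vectorizer = CountVectorizer(analyzer='word')

# fit_transform accepts any iterable of documents; a file object works directly.
with codecs.open('/Users/user/Desktop/train.txt', 'r', 'utf8') as f:
    trainset = word_vectorizer.fit_transform(f)

print(word_vectorizer.get_feature_names())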