Pickle problem writing to file - python

I have a problem writing a file with Pickle in Python
Here is my code:
test = "TEST"
f1 = open(path+filename, "wb", 0)
pickle.dump(test,f1,0)
f1.close()
return
This gives me the output in the .txt file as VTESTp0. I'm not sure why this is.
Shouldn't it just have been saved as TEST?
I'm very new to pickle and I didn't even know it existed until today, so sorry if I'm asking a silly question.

No, pickle does not write strings just as strings. Pickle is a serialization protocol: it turns objects into strings of bytes so that you can later recreate them, and the V, p0 and . you saw are protocol-0 opcodes wrapped around your data. The actual format depends on which version of the protocol you use, but you should really treat pickle data as an opaque type.
If you want to write the string "TEST" to the file, just write the string itself. Don't bother with pickle.
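A quick way to see both behaviors for yourself (a minimal sketch; the exact bytes depend on the protocol and Python version):

import pickle

# Protocol 0 wraps the text in pickle opcodes:
# V (unicode string), p0 (memo put), . (stop)
print(pickle.dumps("TEST", 0))   # b'VTEST\np0\n.' -- the "VTESTp0" you saw

# Writing the string itself produces exactly the contents you expect
with open("plain.txt", "w") as f:
    f.write("TEST")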

Think of pickling as saving binary data to disk. This is interesting if you have data structures in your program like a big dict or array, which took some time to create. You can save them to a file with pickle and read them in with pickle the next time your program runs, thus saving you the time it took to build the data structure. The downside is that other, non-Python programs will not be able to understand the pickle files.
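For example, a minimal sketch of that save-and-reload pattern (the file name and the dictionary here are just placeholders):

import os
import pickle

CACHE = "table.pkl"  # example cache file

if os.path.exists(CACHE):
    # Cheap path: read the structure back from the previous run
    with open(CACHE, "rb") as f:
        table = pickle.load(f)
else:
    # Expensive path: build the structure, then save it for next time
    table = {i: i ** 2 for i in range(1_000_000)}  # stand-in for a costly build
    with open(CACHE, "wb") as f:
        pickle.dump(table, f)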
As pickle is quite versatile you can of course also write simple text strings to a pickle file. But if you want to process them further, e.g. in a text editor or by another program, you need to store them verbatim, as Thomas Wouters suggests:
test = "TEST"
f1 = open(path+filename, "wb", 0)
f1.write(test)
f1.close()
return

Related

Reading python pickle data before writing it to a file

Background:
I'm currently working on a project that relies mainly on pickles to save the state of objects I have. Here is a code snippet with two functions I've written:
from Kiosk import *  # The main class used for the lockers
import gc            # Garbage collector library, used to get all instances of a class
import pickle        # Library used to store variables in files

def storeData(Lockers: Locker):
    with open('lockerData', 'wb') as File:
        pickle.dump(Lockers, File)

def readData():
    with open('lockerData', 'rb') as File:
        return pickle.load(File)
This pickle data will eventually be sent and received from a server using the Sockets library.
I've done some reading on the topic of Pickles and it seems like everyone agrees that pickles can be quite dangerous to use in some use cases as it's relatively easy to get them to execute unwanted code.
Objective:
For the above-mentioned reasons I want to encrypt my pickle data with AES before writing it to the pickle file; that way the file is always encrypted, even when sent to and received from the server. My main problem now is that I don't know how to get the pickle data without writing it to the pickle file first: pickle.dump() only lets me write the pickle data to a file, it doesn't hand me the data directly.
If I encrypt the data only after it has already been written to the file, there would be a period of time during which the pickle data sits on disk in plain text, and I don't want that to happen.
Pseudocode:
Here is how I'm expecting the task execution to flow:
PickleData = createPickle(Lockers)
PickleDataE = encrypt(PickleData)
with open('textfile.txt', 'wb') as File:
    File.write(PickleDataE)
Question:
So my question is, how can I get the pickle data without writing it to a file?
You don't need to go through a file at all: pickle.dumps() returns the pickle data directly as a bytes object, which you can encrypt and write out yourself, and pickle.loads() turns the decrypted bytes back into an object. Equivalently, after reading and decrypting the file you can wrap the decrypted bytes in an io.BytesIO object and pickle.load() from it just as you would from a file, except it's all in memory now.
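A minimal sketch of that flow. The cipher here is an assumption: the third-party cryptography package's Fernet (which uses AES under the hood) stands in for whatever AES scheme you choose, and the sample object stands in for your Locker instances:

import pickle
from cryptography.fernet import Fernet  # third-party package: cryptography

key = Fernet.generate_key()  # in practice, store or derive this key securely
cipher = Fernet(key)

lockers = {"locker1": "occupied"}  # stand-in for your Locker objects

# Serialize to bytes in memory -- no plaintext ever touches the disk
encrypted = cipher.encrypt(pickle.dumps(lockers))
with open('lockerData', 'wb') as File:
    File.write(encrypted)

# Read it back: decrypt to bytes, then unpickle from memory
with open('lockerData', 'rb') as File:
    restored = pickle.loads(cipher.decrypt(File.read()))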

Writing a dictionary to a file and reading it back - Most efficient method [duplicate]

This question already has answers here:
Why is dumping with `pickle` much faster than `json`?
(3 answers)
Closed 3 years ago.
I wish to write a dictionary to a text file. There are three methods that I've seen, and all of them seem valid, but I am interested in which one will be most efficient for reading and writing, especially when I have a large dictionary with many entries, and why.
new_dict = {}
new_dict["city"] = "Boston"

# Writing to the file by string conversion
with open(r'C:\Users\xy243\Documents\pop.txt', 'w') as new_file:
    new_file.write(str(new_dict))

# Writing to the file using pickle (pickle data is binary,
# so the file must be opened in 'wb' mode)
import pickle
with open(r'C:\Users\xy243\Documents\pop.txt', 'wb') as new_file:
    pickle.dump(new_dict, new_file, protocol=pickle.HIGHEST_PROTOCOL)

# Writing to the file using JSON
import json
with open(r'C:\Users\xy243\Documents\pop.txt', 'w') as new_file:
    json.dump(new_dict, new_file)
The efficiency question has pretty much been covered in the comments. However, if your dataset is large and you might want to replicate this approach, it would probably be worth considering SQL alternatives, made easier in Python with SQLAlchemy. That way you can access your data quickly, but store it neatly in a database.
Objects of some Python classes may not be JSON serializable. If your dictionary contains such objects as values, then you can't use the json module.
Likewise, objects of some Python classes may not be picklable (for example, keras/tensorflow objects), and then you can't use pickle either.
In my experience, more classes fail JSON serialization than fail pickling, so pickle tends to be applicable more widely than JSON.
Efficiency-wise (assuming your dictionary is both JSON serializable and picklable), pickle will generally win, because JSON converts numbers to strings while serializing and back again while deserializing, and pickle avoids that string conversion.
If you need to transport the object to another process or server (especially one written in another language, such as Java), then you have to live with JSON. This applies even if you write to a file and another process reads from that file.
So ... it depends on your use case.
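If you want a rough number for your own data, here is a small timing sketch (results will vary with machine, data shape, and library versions):

import json
import pickle
import timeit

data = {str(i): i for i in range(100_000)}  # sample dictionary

print("pickle:", timeit.timeit(
    lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL), number=10))
print("json:  ", timeit.timeit(
    lambda: json.dumps(data), number=10))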

What is pickle doing?

I have used Python for years. I have used pickle extensively. I cannot figure out what this is doing:
import codecs
import pickle

with codecs.open("huge_picklefile.pc", "rb") as f:
    data = pickle.load(f)
    print(len(data))
    data = pickle.load(f)
    print(len(data))
    data = pickle.load(f)
    print(len(data))
This returns to me:
335
59
12
I am beyond confused. I am used to pickle loading the whole massive file into memory. The object itself is a massive array of arrays (I assume). Could it be comprised of multiple pickle objects? Unfortunately, I didn't create the pickle file and I don't have access to whoever did.
I cannot figure out why pickle is splitting my file into chunks, which isn't the default, and I am not telling it to. What does reloading from the same file do? I honestly never tried, or even came across this as a use case, until now.
I spent a good 5 hours trying to figure out how to even ask this question on Google. Unsurprisingly, searching for "multiple pickle loads on the same document" doesn't yield anything too useful. The Python 3.7 pickle docs do not describe this behavior. I can't figure out how repeatedly loading from a pickle file doesn't (a) crash or (b) load the entire thing into memory and then just reference itself. In my 15 years of using Python I have never run into this problem... so I am taking a leap of faith that this is just weird and we should probably use a database instead.
This is not quite a single pickle file. Someone has dumped multiple pickles into the same file, so the file contents are a concatenation of multiple pickles. When you call pickle.load(f), pickle reads from the current file position until it reaches the end of that pickle (its STOP opcode), so each pickle.load call loads the next pickle in the file.
You can create such a file yourself by calling pickle.dump repeatedly:
import pickle

with open('demofile', 'wb') as f:
    pickle.dump([1, 2, 3], f)
    pickle.dump([10, 20], f)
    pickle.dump([0, 0, 0], f)
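To read such a file back without knowing in advance how many pickles it holds, a common pattern (a small sketch) is to keep loading until the unpickler runs out of input:

import pickle

chunks = []
with open('demofile', 'rb') as f:
    while True:
        try:
            chunks.append(pickle.load(f))  # load the next pickle in the stream
        except EOFError:                   # raised once the file is exhausted
            break

print(chunks)  # [[1, 2, 3], [10, 20], [0, 0, 0]]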

how does pickle know which to pick?

I have my pickle function working properly
with open(self._prepared_data_location_scalar, 'wb') as output:
    pickle.dump(X_scaler, output, pickle.HIGHEST_PROTOCOL)
    pickle.dump(Y_scaler, output, pickle.HIGHEST_PROTOCOL)

with open(self._prepared_data_location_scalar, 'rb') as input_f:
    X_scaler = pickle.load(input_f)
    Y_scaler = pickle.load(input_f)
However, I am very curious: how does pickle know which one to load? Does that mean everything has to be in the same sequence?
What you have is fine. It's a documented feature of pickle:
It is possible to make multiple calls to the dump() method of the same Pickler instance. These must then be matched to the same number of calls to the load() method of the corresponding Unpickler instance.
There is no magic here, pickle is a really simple stack-based language that serializes python objects into bytestrings. The pickle format knows about object boundaries: by design, pickle.dumps('x') + pickle.dumps('y') is not the same bytestring as pickle.dumps('xy').
If you're interested to learn some background on the implementation, this article is an easy read to shed some light on the python pickler.
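You can see those object boundaries in action by dumping into an in-memory buffer and loading sequentially (a minimal sketch):

import io
import pickle

buf = io.BytesIO()
pickle.dump('x', buf)
pickle.dump('y', buf)

buf.seek(0)
print(pickle.load(buf))  # 'x' -- stops at the first pickle's STOP opcode
print(pickle.load(buf))  # 'y' -- resumes where the first load ended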
Wow, I did not even know you could do this, and I have been using Python for a very long time... so that's totally awesome in my book. However, you really should not do this: it will be very hard to work with later (especially if it isn't you working on it).
I would recommend just doing:

pickle.dump({"X": X_scaler, "Y": Y_scaler}, output)
...
data = pickle.load(fp)
print("Y_scaler:", data['Y'])
print("X_scaler:", data['X'])

unless you have a very compelling reason to save and load the data like you were in your question.
Edit, to answer the actual question: it loads from the start of the file to the end, i.e. it loads the pickles in the same order they were dumped.
Yes, pickle loads objects in the order they were saved.
Intuitively, each dump appends to the end of the file, and each load reads the next pickle sequentially from the current position.
Consequently, order is preserved, allowing you to retrieve your data in the exact order you serialized it.

Cpickle invalid load key error with a weird key at the end

I just tried to update a program I wrote, and I needed to add another pickle file. So I created the blank .pkl and then used this command to open it (just as I did with all my others):
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
Only this time around I keep getting this really weird error for no obvious reason:
cPickle.UnpicklingError: invalid load key, 'Γ'.
The pickle file does contain the necessary information to be loaded; it is an exact match to other blank .pkl files I have, and they load fine. I don't know what that last key in the error means, but I suspect it could give me some insight if I did.
So I have figured out the solution to this problem, and I thought I'd take the time to list some examples of what to do and what not to do when using pickle files. Firstly, the solution was to simply make a plain old .txt file and dump the pickle data to it.
If you are under the impression that you have to actually make a new file and save it with a .pkl ending, you would be wrong. I was creating my .pkl files with Notepad++ and saving them as .pkl. From my experience this sometimes works and sometimes doesn't, and if you're semi-new to programming this may cause a fair amount of confusion, as it did for me. All that being said, I recommend just using plain old .txt files. It's the information stored inside the file, not necessarily the extension, that is important here.
# Notice the file hasn't been pickled by pickle itself.
# What not to do: no need to name the file .pkl yourself.
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
The proper way:
# Pickle your new file
with open(filename, 'wb') as output:
    pickle.dump(obj, output, -1)

# Now open with the original .txt extension. DON'T RENAME.
# (Open in binary mode, 'rb', to match the binary protocol used above.)
with open('tryagain.txt', 'rb') as input:
    self.open_multi_clock = pickle.load(input)
My guess is that the raw pickled bytes contain characters that don't survive being read back as text, which throws off portability. I'd suggest base64-encoding the pickled data before writing it to the file. Here's what I ran:
import base64
import pickle

value_p = pickle.dumps("abdfg")
value_p_b64 = base64.b64encode(value_p)

with open("output.pkl", "wb") as f:
    f.write(value_p_b64)

with open("output.pkl", "rb") as f:
    readable = pickle.loads(base64.b64decode(f.read()))

>>> readable
'abdfg'
