Background:
Hi. I'm currently working on a project that mainly relies on pickle to save the state of my objects. Here is a code snippet of two functions I've written:
from Kiosk import *  # The main class used for the lockers
import gc  # Garbage collector library. Used to get all instances of a class
import pickle  # Library used to store variables in files.

def storeData(Lockers: Locker):
    with open('lockerData', 'wb') as File:
        pickle.dump(Lockers, File)

def readData():
    with open('lockerData', 'rb') as File:
        return pickle.load(File)
This pickle data will eventually be sent and received from a server using the Sockets library.
I've done some reading on the topic of Pickles and it seems like everyone agrees that pickles can be quite dangerous to use in some use cases as it's relatively easy to get them to execute unwanted code.
Objective:
For the above-mentioned reasons I want to encrypt my pickle data with AES before writing it to the pickle file, so that the pickle file is always encrypted, even when sent to and received from the server. My main problem now is that I don't know how to get the pickle data without writing it to the pickle file first. pickle.dump() only allows me to write the pickle data to a file; it doesn't let me get the pickle data directly.
If I decide to do encryption after the pickle data has already been written to the file that would mean that there would be a period of time where the pickle data is stored in plain text, and I don't want that to happen.
Pseudocode:
Here is how I'm expecting the task execution to flow:
PickleData = createPickle(Lockers)
PickleDataE = encrypt(PickleData)
with open('textfile.txt', 'wb') as File:
    File.write(PickleDataE)
Question:
So my question is, how can I get the pickle data without writing it to a file?
You can write the encrypted data itself to the file. pickle.dumps() returns the pickled bytes directly, so nothing has to touch the disk before encryption. When you read the encrypted data back, you decrypt it to a variable. If you wrap that variable in an io.BytesIO object (pickle data is bytes, so io.StringIO will not work), you can read it just like you do from a file, except it's in memory now. If you give that a try, I'm sure future questions can help with how to read the decrypted data as if it were pickle data.
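A minimal sketch of that flow, in case it helps. The names toy_cipher, KEY and lockers are made up for illustration, and the XOR function is only a placeholder standing in for real AES (it is not secure); swap in whatever AES library you actually use.

```python
import io
import pickle

KEY = b"not-a-real-key"  # hypothetical key, for illustration only


def toy_cipher(data: bytes, key: bytes = KEY) -> bytes:
    """Placeholder for real AES encryption/decryption. XOR is NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Stand-in for the Locker objects from the question.
lockers = {"locker_1": "free", "locker_2": "occupied"}

# pickle.dumps() returns bytes, so the plain pickle never reaches the disk.
pickle_data = pickle.dumps(lockers)
encrypted = toy_cipher(pickle_data)
# ... write `encrypted` to a file or send it over a socket here ...

# On the receiving side: decrypt in memory, then unpickle.
decrypted = toy_cipher(encrypted)
restored = pickle.loads(decrypted)

# Equivalent, using a file-like object that lives in memory.
restored_via_stream = pickle.load(io.BytesIO(decrypted))
```

pickle.loads() is usually enough; the io.BytesIO variant is only useful when some other code insists on a file-like object.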
Related
So I want to write each element of a list on a new line in a binary file using pickle, and I want to be able to access these dictionaries later as well.
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    print(pickle.load(file))
Output:
{1: 11}
Could someone explain why the rest of the elements aren't being dumped, or suggest another way to write each one on a new line?
I'm using Python 3
They're all being dumped, but each dump is separate; to load them all, you need to match them with load calls.
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    for _ in range(4):
        print(pickle.load(file))
If you don't want to perform multiple loads, pickle them as a single data structure (e.g. the original list of dicts all at once).
In none of these cases are you writing newlines, nor should you be; pickle is a binary protocol, which means newlines are just another byte with independent meaning, and trying to inject newlines into the stream would get in the way of loading the data, and risk splitting up bits of data (if you actually read a line at a time for loading).
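If you don't know in advance how many objects were dumped, one possible approach is to keep calling load until pickle raises EOFError. This is only a sketch: load_all is a hypothetical helper name, and an in-memory io.BytesIO stands in for student.dat so the snippet runs on its own. The last line shows the single-structure alternative mentioned above.

```python
import io
import pickle


def load_all(file_obj):
    """Yield every pickled object in file_obj until the stream runs out."""
    while True:
        try:
            yield pickle.load(file_obj)
        except EOFError:
            # pickle.load raises EOFError when no more data is left.
            return


records = [{1: 11}, {2: 22}, {3: 33}, {4: 44}]

# In-memory stand-in for the binary file: one dump per record.
stream = io.BytesIO()
for record in records:
    pickle.dump(record, stream)
stream.seek(0)

loaded = list(load_all(stream))
print(loaded)

# Alternative: pickle the whole list once, load it back with one call.
single = pickle.loads(pickle.dumps(records))
```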
Is there any option to load a pickle file in chunks?
I know we can save the data in CSV and load it in chunks.
But other than CSV, is there any option to load a pickle file or any python native file in chunks?
Based on the documentation for Python pickle, there is not currently support for chunking.
However, it is possible to split data into chunks and then read in chunks. For example, suppose the original structure is
import pickle
filename = "myfile.pkl"
str_to_save = "myname"
with open(filename,'wb') as file_handle:
    pickle.dump(str_to_save, file_handle)
with open(filename,'rb') as file_handle:
    result = pickle.load(file_handle)
    print(result)
That could be split into two separate pickle files:
import pickle
filename_1 = "myfile_1.pkl"
filename_2 = "myfile_2.pkl"
str_to_save = "myname"
with open(filename_1,'wb') as file_handle:
    pickle.dump(str_to_save[0:4], file_handle)
with open(filename_2,'wb') as file_handle:
    pickle.dump(str_to_save[4:], file_handle)
# Each chunk can now be loaded on its own.
with open(filename_1,'rb') as file_handle:
    result = pickle.load(file_handle)
    print(result)
with open(filename_2,'rb') as file_handle:
    result = pickle.load(file_handle)
    print(result)
As per AKX's comment, dumping multiple objects into a single file also works:
import pickle
filename = "myfile.pkl"
str_to_save = "myname"
with open(filename,'wb') as file_handle:
    pickle.dump(str_to_save[0:4], file_handle)
    pickle.dump(str_to_save[4:], file_handle)
with open(filename,'rb') as file_handle:
    result = pickle.load(file_handle)
    print(result)
    result = pickle.load(file_handle)
    print(result)
I had a similar issue, where I wrote a barrel file descriptor pool, and noticed that my pickle files were getting corrupt when I closed a file descriptor. Although you may do multiple dump() operations to an open file descriptor, it's not possible to subsequently do an open('file', 'ab') to start saving a new set of objects.
I got around this by doing a pickler.dump(None) as a session terminator right before I had to close the file descriptor, and upon re-opening, I instantiated a new Pickler instance to resume writing to the file.
When loading from this file, a None object signified an end-of-session, at which point I instantiated a new Unpickler instance with the file descriptor to continue reading the remainder of the multi-session pickle file.
This only applies if for some reason you have to close the file descriptor, though. Otherwise, any number of dump() calls can be performed for load() later.
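For what it's worth, here is a rough sketch of that session-terminator idea as I understand it. write_session and read_sessions are hypothetical names, the file lives in a temporary directory, and the scheme assumes None is never a real payload value.

```python
import os
import pickle
import tempfile


def write_session(path, objects):
    """Append one session of objects, ending with a None terminator."""
    with open(path, "ab") as fh:
        pickler = pickle.Pickler(fh)  # fresh Pickler (and memo) per session
        for obj in objects:
            pickler.dump(obj)
        pickler.dump(None)  # marks the end of this session


def read_sessions(path):
    """Return a list of sessions, each a list of the objects it held."""
    sessions = []
    with open(path, "rb") as fh:
        while True:
            unpickler = pickle.Unpickler(fh)  # fresh Unpickler per session
            current = []
            try:
                while True:
                    obj = unpickler.load()
                    if obj is None:  # terminator reached
                        break
                    current.append(obj)
            except EOFError:
                break  # no further sessions in the file
            sessions.append(current)
    return sessions


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sessions.pkl")
    write_session(path, [1, 2])   # first session, file is closed afterwards
    write_session(path, ["a"])    # file re-opened in append mode
    sessions = read_sessions(path)

print(sessions)
```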
As far as I understand pickle, load/dump by chunk is not possible.
Pickle intrinsically reads a complete data stream in "chunks" of variable length, depending on flags within the data stream. That is what serialization is all about. The data stream itself could have been cut into chunks earlier (say, for network transfer), but those chunks cannot be pickled/unpickled "on the fly".
But maybe something intermediate can be achieved with pickle "buffers" and the "out-of-band" feature for very large data.
Note that this is not exactly loading/saving a single pickle file in chunks. It only applies to objects met during the serialization process that declare themselves as being "out of band" (serialized separately).
Quoting the Pickler class doc:
If buffer_callback is not None, then it can be called any number of times with a buffer view. If the callback returns a false value (such as None), the given buffer is out-of-band; otherwise the buffer is serialized in-band, i.e. inside the pickle stream.
(emphasis mine)
Quoting the "Out of band" concept doc:
In some contexts, the pickle module is used to transfer massive amounts of data. Therefore, it can be important to minimize the number of memory copies, to preserve performance and resource consumption. However, normal operation of the pickle module, as it transforms a graph-like structure of objects into a sequential stream of bytes, intrinsically involves copying data to and from the pickle stream.
This constraint can be eschewed if both the provider (the implementation of the object types to be transferred) and the consumer (the implementation of the communications system) support the out-of-band transfer facilities provided by pickle protocol 5 and higher.
Example taken from the documentation:
b = ZeroCopyByteArray(b"abc") # NB: class has a special __reduce_ex__ and _reconstruct method
buffers = []
data = pickle.dumps(b, protocol=5, buffer_callback=buffers.append)
# we could do things with these buffers like:
# - writing each to a single file,
# - sending them over network,
# ...
new_b = pickle.loads(data, buffers=buffers) # load in chunks
From this example, we could consider writing each buffer to a file, or sending each one over a network. Unpickling would then be performed by loading those files (or network payloads) and passing them to the unpickler.
But note that we end up with two pieces of serialized data in the example:
data
buffers
That is not really what the OP wants, and not exactly pickle load/dump by chunks.
From a pickle-to-a-single-file perspective, I don't think this gives any benefit, because we would have to define a custom method to pack both data and buffers into one file, i.e. define a new data format. That feels like giving up pickle's initial benefits.
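To make the "two pieces of serialized data" point concrete, here is a small self-contained sketch. It wraps a bytearray in pickle.PickleBuffer directly instead of using the ZeroCopyByteArray class from the docs (whose definition is not shown above); the names record, samples and payloads are made up, and it assumes Python 3.8 or later for protocol 5.

```python
import pickle
from pickle import PickleBuffer

# Stand-in for a large binary payload.
samples = bytearray(b"abc" * 1000)
record = {"name": "demo", "samples": PickleBuffer(samples)}

# list.append returns None (a false value), so every buffer handed to the
# callback is kept out-of-band instead of being copied into the stream.
buffers = []
data = pickle.dumps(record, protocol=5, buffer_callback=buffers.append)

# Simulate shipping each buffer separately (one file or network message
# per buffer) by copying it into plain bytes.
payloads = [buf.raw().tobytes() for buf in buffers]

# The receiver needs BOTH pieces: the pickle stream and the buffers,
# supplied in the same order the callback received them.
restored = pickle.loads(data, buffers=payloads)

print(len(data), len(payloads[0]))
```

The pickle stream itself stays small because the payload travels separately, which is exactly why a single-file layout would need its own packing format.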
Quoting Unpickler constructor doc:
If buffers is not None, it should be an iterable of buffer-enabled objects that is consumed each time the pickle stream references an out-of-band buffer view. Such buffers have been given in order to the buffer_callback of a Pickler object.
Changed in version 3.8: The buffers argument was added.
I have used Python for years. I have used pickle extensively. I cannot figure out what this is doing:
import codecs
import pickle

with codecs.open("huge_picklefile.pc", "rb") as f:
    data = pickle.load(f)
    print(len(data))
    data = pickle.load(f)
    print(len(data))
    data = pickle.load(f)
    print(len(data))
This returns to me:
335
59
12
I am beyond confused. I am used to pickle loading the massive file into memory. The object itself is a massive array of arrays (I assume). Could it be composed of multiple pickle objects? Unfortunately, I didn't create the pickle object and I don't have access to whoever did.
I cannot figure out why pickle is splitting up my file into chunks, which isn't the default, and I am not telling it to. What does reloading the same file do? I honestly never tried or even came across a use case until now.
I spent a good 5 hours trying to figure out how to even ask this question on Google. Unsurprisingly, trying "multiple pickle loads on the same document" doesn't yield anything too useful. The Python 3.7 pickle docs do not describe this behavior. I can't figure out how repeatedly loading a pickle document doesn't (a) crash or (b) load the entire thing into memory and then just reference itself. In my 15 years of using Python I have never run into this problem... so I am taking a leap of faith that this is just weird and we should probably just use a database instead.
This file is not quite a pickle file. Someone has dumped multiple pickles into the same file, resulting in the file contents being a concatenation of multiple pickles. When you call pickle.load(f), pickle will read the file from the current file position until it finds a pickle end, so each pickle.load call will load the next pickle.
You can create such a file yourself by calling pickle.dump repeatedly:
with open('demofile', 'wb') as f:
    pickle.dump([1, 2, 3], f)
    pickle.dump([10, 20], f)
    pickle.dump([0, 0, 0], f)
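To see the "current file position" part in action, here is a small sketch that writes the same three lists and records where the position ends up after each load. An io.BytesIO object stands in for demofile so it runs without touching the disk; lengths and positions are just illustrative names.

```python
import io
import pickle

# In-memory stand-in for 'demofile': three pickles back to back.
f = io.BytesIO()
pickle.dump([1, 2, 3], f)
pickle.dump([10, 20], f)
pickle.dump([0, 0, 0], f)
end = f.tell()  # total size of the concatenated pickles
f.seek(0)

lengths = []
positions = []
while f.tell() < end:
    data = pickle.load(f)       # reads exactly one pickle
    lengths.append(len(data))
    positions.append(f.tell())  # position now sits at the next pickle

print(lengths)    # [3, 2, 3]
print(positions)  # increasing offsets; the last one equals `end`
```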
I just tried to update a program I wrote and I needed to add another pickle file. So I created a blank .pkl and then used this code to open it (just as I did with all my others):
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
Only this time around I keep getting this really weird error for no obvious reason:
cPickle.UnpicklingError: invalid load key, 'Γ'.
The pickle file does contain the necessary information to be loaded; it is an exact match to other blank .pkl files that I have, and they load fine. I don't know what that last key in the error is, but I suspect it could give me some insight if I knew what it means.
So I have figured out the solution to this problem, and I thought I'd take the time to list some examples of what to do and what not to do when using pickle files. Firstly, the solution was simply to make a plain old .txt file and dump the pickle data to it.
If you are under the impression that you have to create a new file yourself and save it with a .pkl ending, you would be wrong. I was creating my .pkl files with Notepad++ and saving them as .pkl. In my experience this sometimes works and sometimes doesn't; if you're semi-new to programming this may cause a fair amount of confusion, as it did for me. All that being said, I recommend just using plain old .txt files. It's the information stored inside the file, not the extension, that is important here.
#Notice file hasn't been pickled.
#What not to do. No need to name the file .pkl yourself.
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
The proper way:
#Pickle your new file
with open('tryagain.txt', 'wb') as output:
    pickle.dump(obj, output, -1)
#Now open with the original .txt ext. DONT RENAME.
#Read in binary mode ('rb') to match the 'wb' used for dumping.
with open('tryagain.txt', 'rb') as input:
    self.open_multi_clock = pickle.load(input)
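A quick way to convince yourself that the extension doesn't matter, as long as the file was actually written by pickle and is opened in binary mode. This is only a sketch under Python 3: clock_state is a made-up stand-in for the real object, and the files live in a temporary directory. It also shows what happens when a hand-made blank file is loaded.

```python
import os
import pickle
import tempfile

clock_state = {"alarm": "07:30", "running": True}  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    # Written by pickle, binary mode both ways, .txt extension: loads fine.
    path = os.path.join(tmp, "tryagain.txt")
    with open(path, "wb") as output:
        pickle.dump(clock_state, output, -1)
    with open(path, "rb") as source:
        loaded = pickle.load(source)

    # A blank file created by hand contains no pickle at all.
    blank_path = os.path.join(tmp, "blank.pkl")
    open(blank_path, "wb").close()
    try:
        with open(blank_path, "rb") as source:
            pickle.load(source)
        blank_error = None
    except EOFError as exc:
        blank_error = exc  # Python 3 reports that it ran out of input

print(loaded)
print(repr(blank_error))
```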
I'm going to guess that the characters in the pickled output are hurting portability. I'd suggest base64-encoding the pickled data before writing it to the file. Here is what I ran:
import base64
import pickle

value_p = pickle.dumps("abdfg")
value_p_b64 = base64.b64encode(value_p)

# b64encode returns bytes, so write and read in binary mode
with open("output.pkl", "wb") as f:
    f.write(value_p_b64)

readable = ""
with open("output.pkl", "rb") as f:
    for line in f:
        readable += pickle.loads(base64.b64decode(line))

>>> readable
'abdfg'
I have a problem writing a file with Pickle in Python
Here is my code:
test = "TEST"
f1 = open(path+filename, "wb", 0)
pickle.dump(test,f1,0)
f1.close()
return
This gives me the output in the .txt file as VTESTp0. I'm not sure why this is?
Shouldn't it just have been saved as TEST?
I'm very new to pickle and I didn't even know it existed until today so sorry if I'm asking a silly question.
No, pickle does not write strings just as strings. Pickle is a serialization protocol; it turns objects into strings of bytes so that you can later recreate them. The actual format depends on which version of the protocol you use, but you should really treat pickle data as an opaque type.
If you want to write the string "TEST" to the file, just write the string itself. Don't bother with pickle.
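You can see the difference without touching a file. A small sketch (the variable names are arbitrary): with protocol 0 the pickled form of "TEST" carries extra opcode bytes around the text, which is where the V and p0 come from, while the plain encoded string is just the four characters.

```python
import pickle

test = "TEST"

# Protocol 0 is the old text-based pickle format.
pickled = pickle.dumps(test, 0)
print(pickled)  # begins with b'VTEST', followed by more opcode bytes

# What actually ends up in a file when you write the string itself.
plain = test.encode("utf-8")
print(plain)  # b'TEST'

# The extra bytes are what let pickle rebuild the original object.
restored = pickle.loads(pickled)
```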
Think of pickling as saving binary data to disk. This is interesting if you have data structures in your program like a big dict or array, which took some time to create. You can save them to a file with pickle and read them in with pickle the next time your program runs, thus saving you the time it took to build the data structure. The downside is that other, non-Python programs will not be able to understand the pickle files.
As pickle is quite versatile you can of course also write simple text strings to a pickle file. But if you want to process them further, e.g. in a text editor or by another program, you need to store them verbatim, as Thomas Wouters suggests:
test = "TEST"
f1 = open(path + filename, "w")  # text mode, so the str is written verbatim
f1.write(test)
f1.close()
return