Difference between pickle and opening a file? - python

What is the difference between using the pickle library and using with open()?
Both have the same functionality where you read and write into the file and I don't see any differences between them.
And why do many people use pickle more than with open() if it is so seemingly similar?

Let me see if I can understand where the point of confusion is and give a useful explanation.
open is how you get a file object, which is the interface between your Python program and an actual file on disk. with is a tool used for ensuring that the file object is closed at the appropriate time.
The file object allows you to read and/or write the file, depending on how it was opened. The built-in way to do this is with the object's own functionality. This lets you write whatever data you want, at the expense that you are responsible for figuring out what that data should be; alternately, it lets you read the data and gives you both the power and responsibility that comes from interpreting that data.
The pickle library builds on top of that functionality, to use the file's contents to represent native Python objects. It does the interpretation (parsing) and data-figuring-out (formatting) work for you, accomplishing something that would be difficult by hand. The trade-off is that it works in a specific way, and is fit for only a specific purpose - you won't, for example, be producing or interpreting plain text files, or images, or JSON data, etc. this way any time soon (which you could by writing the data yourself, or by using a different, special-purpose library - except of course for plain text, where there's no point in doing anything beyond using the built-in functionality).

The difference is in what you put in the file, and who is responsible for the underlying file's format / serialization.
With the builtin open, you receive a raw file handle. You can write whatever you want to it. It doesn't have to be structured, it doesn't have to be consistent, hell it doesn't even have to make sense to an outside observer. You are allowed to write whatever you want to a file.
With pickle, the module itself is responsible for what is written. It serializes Python objects (as far as possible; some classes cannot be pickled) in a consistent, reproducible format that can then be reloaded. That is, you can save the state of actual Python objects in a file, then reload them and end up with equivalent objects the next time the interpreter runs. This has advantages when dealing with stateful programs.
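To make the round trip concrete, here is a minimal sketch. The variable names are made up for illustration, and threading.Lock is just one convenient stand-in for an object that cannot be pickled.

```python
import pickle
import threading

# A nested structure with types that a plain text file would flatten to strings.
state = {"scores": (1, 2.5), "tags": {"a", "b"}, "name": None}

blob = pickle.dumps(state)      # serialize to a bytes object
restored = pickle.loads(blob)   # rebuild an equal object from those bytes

print(restored == state)            # True
print(type(restored["scores"]))     # <class 'tuple'>

# Some objects wrap OS-level state and cannot be pickled at all.
try:
    pickle.dumps(threading.Lock())
    lock_picklable = True
except TypeError:
    lock_picklable = False
print(lock_picklable)               # False
```

The tuple and the set come back as a tuple and a set, which is the part you would otherwise have to hand-code when writing to a raw file handle.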
Bonus: The shelve module serves as a user-friendly frontend to pickle that behaves like a dictionary. When you close the shelf, the contents are serialized to disk. When you re-open the shelf, the objects are deserialized from the file and accessible the same way a dictionary would be.
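A short sketch of what that looks like in practice. The file name demo_shelf and the stored keys are hypothetical, and a temporary directory is used so nothing is left behind.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_shelf")  # hypothetical shelf name

    # Write: the shelf behaves like a dict whose values are pickled for you.
    with shelve.open(path) as shelf:
        shelf["settings"] = {"volume": 7, "muted": False}
        shelf["history"] = [1, 2, 3]

    # Read: reopening the shelf gives the objects back with their types intact.
    with shelve.open(path) as shelf:
        settings = shelf["settings"]
        history = shelf["history"]

print(settings)
print(history)
```

Note that shelf keys must be strings; only the values can be arbitrary picklable objects.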

pickle allows you to conveniently write python objects to it, and load those objects. How would you use open() to write a dictionary into a file, and be able to load it into your python file with one simple line?
For open(), it will be like:
dct = {'a': 1,
       'b': 2,
       'c': 3,
       'd': 4,
       'e': 5}
with open('file.txt', 'w') as f:
    f.write('\n'.join([f"{k}, {v}" for k, v in dct.items()]))
with open('file.txt', 'r') as f:
    dct = {k: int(v) for k, v in [s.split(', ') for s in f.read().splitlines()]}
While with pickle:
import pickle
dct = {'a': 1,
       'b': 2,
       'c': 3,
       'd': 4,
       'e': 5}
with open('file.txt', 'wb') as f:
    pickle.dump(dct, f)
with open('file.txt', 'rb') as f:
    dct = pickle.load(f)
Note the conversion in the first method, where each value has to be converted from a string back to an integer. With pickle, you don't have to worry about that.

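pickle is a standard-library module that converts Python objects into a byte stream and back. You hand it a file object that you have opened yourself (typically in a with open() block), which saves you the effort of designing and parsing a file format.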
Further info: https://docs.python.org/3/library/pickle.html
and here is the code: https://github.com/python/cpython/blob/main/Lib/pickle.py
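A small sketch of the byte-stream conversion: pickle.dumps returns bytes directly, and pickle.dump accepts any object with a write method. The in-memory io.BytesIO buffer here is only a stand-in for a real file.

```python
import io
import pickle

data = {"a": 1, "b": [2, 3]}

# dumps() returns the serialized object as bytes, with no file involved.
blob = pickle.dumps(data)
print(type(blob))                 # <class 'bytes'>

# dump() writes to any binary file-like object, here an in-memory buffer.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)                    # rewind before reading back
from_buffer = pickle.load(buffer)
print(from_buffer == data)        # True
```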

Related

Dump (pickle) in new line Python

So I want to write each element of a list on a new line in a binary file using pickle, and I want to be able to access these dictionaries later as well.
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    print(pickle.load(file))
Output:
{1: 11}
Could someone explain why the rest of the elements aren't being dumped, or suggest another way to write each one on a new line?
I'm using Python 3
They're all being dumped, but each dump is separate; to load them all, you need to match them with load calls.
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    for _ in range(4):
        print(pickle.load(file))
If you don't want to perform multiple loads, pickle them as a single data structure (e.g. the original list of dicts all at once).
In none of these cases are you writing newlines, nor should you be; pickle is a binary protocol, which means newlines are just another byte with independent meaning, and trying to inject newlines into the stream would get in the way of loading the data, and risk splitting up bits of data (if you actually read a line at a time for loading).
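If you don't know in advance how many objects were dumped, one common pattern is to keep calling load until pickle raises EOFError. This is a sketch under that assumption; the temporary directory is only there so the example cleans up after itself.

```python
import os
import pickle
import tempfile

records = [{1: 11}, {2: 22}, {3: 33}, {4: 44}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "student.dat")

    # Several separate dumps into the same file.
    with open(path, "wb") as f:
        for record in records:
            pickle.dump(record, f)

    # One load per dump; EOFError signals that nothing is left to read.
    loaded = []
    with open(path, "rb") as f:
        while True:
            try:
                loaded.append(pickle.load(f))
            except EOFError:
                break

    # Alternative: dump the whole list once and load it once.
    with open(path, "wb") as f:
        pickle.dump(records, f)
    with open(path, "rb") as f:
        loaded_at_once = pickle.load(f)

print(loaded)
print(loaded_at_once)
```

Both approaches give back the same list; the single dump is simpler unless you need to append records over time.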

Writing a dictionary to a file and reading it back - Most efficient method [duplicate]

I wish to write to a text file with a dictionary. There are three methods that I've seen and it seems that they are all valid, but I am interested in which one will be most optimized or efficient for reading/writing, especially when I have a large dictionary with many entries and why.
new_dict = {}
new_dict["city"] = "Boston"
# Writing to the file by string conversion
with open(r'C:\\Users\xy243\Documents\pop.txt', 'w') as new_file:
    new_file.write(str(new_dict))
# Writing to the file using Pickle (pickle writes bytes, so the file must be opened in binary mode)
import pickle
with open(r'C:\\Users\xy243\Documents\pop.txt', 'wb') as new_file:
    pickle.dump(new_dict, new_file, protocol=pickle.HIGHEST_PROTOCOL)
# Writing to the file using JSON
import json
with open(r'C:\\Users\xy243\Documents\pop.txt', 'w') as new_file:
    json.dump(new_dict, new_file)
The efficiency question has been pretty much covered in the comments. However, if your dataset is large and you might want to reuse the approach, it would probably be useful to consider SQL alternatives, made easier in Python with SQLAlchemy. That way, you can access the data quickly while storing it neatly in a database.
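As a rough illustration of the database idea, here is a sketch that uses the standard-library sqlite3 module rather than SQLAlchemy, purely to stay dependency-free. The table name kv, its columns, and the extra "state" entry are made up for the example.

```python
import sqlite3

new_dict = {"city": "Boston", "state": "MA"}  # "state" is a made-up extra entry

# An in-memory database; pass a file path instead to persist it on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO kv (key, value) VALUES (?, ?)", new_dict.items())
conn.commit()

# Look up a single key without loading the whole dictionary.
city = conn.execute("SELECT value FROM kv WHERE key = ?", ("city",)).fetchone()[0]

# Or rebuild the full dictionary from the table.
restored = dict(conn.execute("SELECT key, value FROM kv"))
conn.close()

print(city)
print(restored)
```

The point of the design is that a lookup touches one row instead of deserializing everything, which is where a database starts to pay off for large data.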
Objects of some Python classes are not JSON serializable. If your dictionary contains such objects (as values), then you can't use json.
Likewise, objects of some Python classes can't be pickled (for example, some keras/tensorflow objects). Then, again, you can't use pickle.
In my opinion, there are more classes that can't be JSON serialized than classes that can't be pickled.
That being said, pickle is likely to be applicable more widely than json.
Efficiency-wise (assuming your dictionary is both JSON serializable and picklable), pickle with a binary protocol is usually faster, because no string conversion is involved (number to string while serializing and string to number while deserializing).
If you are trying to transport the object to another process/server (especially one written in another programming language, such as Java), then you have to live with JSON. This applies even if you write to a file and another process reads from that file.
So ... it depends on your use-case.
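A quick sketch of two of the differences mentioned above, using made-up sample data: JSON turns non-string dictionary keys into strings and rejects sets, while pickle round-trips both unchanged.

```python
import json
import pickle

data = {1: "one", 2: "two"}

# JSON object keys are always strings, so integer keys do not survive.
via_json = json.loads(json.dumps(data))
via_pickle = pickle.loads(pickle.dumps(data))
print(via_json)      # {'1': 'one', '2': 'two'}
print(via_pickle)    # {1: 'one', 2: 'two'}

# A set is not JSON serializable, but it pickles fine.
with_set = {"tags": {"a", "b"}}
try:
    json.dumps(with_set)
    json_handles_set = True
except TypeError:
    json_handles_set = False
set_roundtrip = pickle.loads(pickle.dumps(with_set))
print(json_handles_set)           # False
print(set_roundtrip == with_set)  # True
```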

how does pickle know which to pick?

I have my pickle function working properly
with open(self._prepared_data_location_scalar, 'wb') as output:
    # company1 = Company('banana', 40)
    pickle.dump(X_scaler, output, pickle.HIGHEST_PROTOCOL)
    pickle.dump(Y_scaler, output, pickle.HIGHEST_PROTOCOL)
with open(self._prepared_data_location_scalar, 'rb') as input_f:
    X_scaler = pickle.load(input_f)
    Y_scaler = pickle.load(input_f)
However, I am very curious: how does pickle know which one to load? Does it mean that everything has to be in the same sequence?
What you have is fine. It's a documented feature of pickle:
It is possible to make multiple calls to the dump() method of the same Pickler instance. These must then be matched to the same number of calls to the load() method of the corresponding Unpickler instance.
There is no magic here, pickle is a really simple stack-based language that serializes python objects into bytestrings. The pickle format knows about object boundaries: by design, pickle.dumps('x') + pickle.dumps('y') is not the same bytestring as pickle.dumps('xy').
If you're interested to learn some background on the implementation, this article is an easy read to shed some light on the python pickler.
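The object-boundary point can be checked directly. In this sketch an io.BytesIO buffer stands in for the file, and the two pickles are simply concatenated.

```python
import io
import pickle

x = pickle.dumps("x")
y = pickle.dumps("y")

# Two pickles glued together are not the pickle of the glued string.
print(x + y == pickle.dumps("xy"))   # False

# Each load() consumes exactly one pickle and stops at its end marker,
# so the next load() picks up where the previous one finished.
stream = io.BytesIO(x + y)
first = pickle.load(stream)
second = pickle.load(stream)
print(first, second)                 # x y
```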
Wow, I did not even know you could do this, and I have been using Python for a very long time, so that's totally awesome in my book. However, you really should not do this: it will be very hard to work with later (especially if it isn't you working on it).
I would recommend just doing
pickle.dump({"X": X_scaler, "Y": Y_scaler}, output)
...
data = pickle.load(fp)
print("Y_scaler:", data['Y'])
print("X_scaler:", data['X'])
unless you have a very compelling reason to save and load the data like you were in your question ...
edit to answer the actual question...
it loads from the start of the file to the end (ie it loads them in the same order they were dumped)
Yes, pickle loads objects in the order they were saved.
Intuitively, pickle appends to the end when it writes (dumps) to a file,
and reads (loads) the content sequentially from the file.
Consequently, order is preserved, allowing you to retrieve your data in the exact order you serialized it.

Change string list to list

So, I saved a list to a file as a string. In particular, I did:
f = open('myfile.txt','w')
f.write(str(mylist))
f.close()
But, later when I open this file again, take the (string-ified) list, and want to change it back to a list, what happens is something along these lines:
>>> list('[1,2,3]')
['[', '1', ',', '2', ',', '3', ']']
Could I make it so that I got the list [1,2,3] from the file?
There are two easy options here. First, using ast.literal_eval:
>>> import ast
>>> ast.literal_eval('[1,2,3]')
[1, 2, 3]
Unlike eval, this is safe because it only evaluates Python literals, such as lists, dictionaries, None, strings, etc. It raises an error if the string contains any other code.
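A minimal sketch of that behavior. The strings are made-up examples; the second one is the kind of input that eval would happily execute.

```python
import ast

# Nested literals are parsed into real Python objects.
parsed = ast.literal_eval("{'a': [1, 2, 3], 'b': (None, True)}")
print(parsed)

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)   # True
```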
Second, make use of the json module, using json.loads:
>>> import json
>>> json.loads('[1,2,3]')
[1, 2, 3]
A great advantage of using json is that the format is cross-platform and language-independent, and you can also write to a file easily.
with open('data.txt', 'w') as f:
    json.dump([1, 2, 3], f)
In [285]: import ast
In [286]: ast.literal_eval('[1,2,3]')
Out[286]: [1, 2, 3]
Use ast.literal_eval instead of eval whenever possible:
ast.literal_eval:
Safely evaluate[s] an expression node or a string containing a Python
expression. The string or node provided may only consist of the
following Python literal structures: strings, numbers, tuples, lists,
dicts, booleans, and None.
Edit: Also, consider using json. json.loads operates on a different string format, but is generally faster than ast.literal_eval. (So if you use json.load, be sure to save your data using json.dump.) Moreover, the JSON format is language-independent.
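The "different string format" matters as soon as the list contains strings. A small sketch with a made-up list: str() produces single-quoted strings, which are not valid JSON, so the data has to be written with json.dumps in order to be read with json.loads.

```python
import json

mylist = [1, "two", 3.0]

text_repr = str(mylist)          # "[1, 'two', 3.0]"  (Python repr, single quotes)
text_json = json.dumps(mylist)   # '[1, "two", 3.0]'  (JSON, double quotes)

# The repr form is rejected by the JSON parser.
try:
    json.loads(text_repr)
    str_is_valid_json = True
except json.JSONDecodeError:
    str_is_valid_json = False

restored = json.loads(text_json)
print(str_is_valid_json)   # False
print(restored)            # [1, 'two', 3.0]
```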
Python developers traditionally use pickle to serialize their data and write it to a file.
You could do so like so:
import pickle
mylist = [1,2,3]
with open('myfile', 'wb') as f:  # the with block closes the file for you
    pickle.dump(mylist, f)
And then reopen like so:
import pickle
with open('myfile', 'rb') as f:
    mylist = pickle.load(f) # [1,2,3]
I would write Python objects to file using the built-in json encoding or, if you don't like JSON, with pickle (cPickle in Python 2). Both allow for easy deserialization and serialization of data. I'm on my phone but when I get home I'll upload sample code.
EDIT:
Ok, get ready for a ton of Python code, and some opinions...
JSON
Python has builtin support for JSON, or JavaScript Object Notation, a lightweight data interchange format. JSON supports Python's basic data types, such as dictionaries (which JSON calls objects: basically just key-value pairs) and lists (comma-separated values enclosed in [ and ]). For more information on JSON, see this resource. Now to the code:
import json  # Don't forget to import
my_list = [1, 2, "blue", ["sub", "list", 5]]
with open('/path/to/file/', 'w') as f:
    string_to_write = json.dumps(my_list)  # Dump the string
    f.write(string_to_write)
# The character string [1, 2, "blue", ["sub", "list", 5]] is written to the file
Note that the with statement will close the file automatically when the block finishes executing.
To load the string back, use
with open('/path/to/file/', 'r') as f:
    string_from_file = f.read()
my_list = json.loads(string_from_file)  # Load the string
# my_list is now the Python object [1, 2, "blue", ["sub", "list", 5]]
I like JSON. Use JSON unless you really, really have a good reason not to.
CPICKLE
Another method for serializing Python data to a file is called pickling, in which we write more than we 'need' to a file so that we have some meta-information about how the bytes in the file relate to Python objects. In Python 2 there was a pure-Python pickle module and a separate cPickle module, which was implemented in C and much faster (I don't have a citation for how much). In Python 3 there is no separate module to import: import pickle, and it uses the C implementation automatically when it is available. Pickle data is bytes, so the file has to be opened in binary mode. The dumping code then becomes
import pickle  # Don't forget to import
with open('/path/to/file/', 'wb') as f:
    bytes_to_write = pickle.dumps(my_list)  # Dump to a bytes object
    f.write(bytes_to_write)
# An opaque byte string is written to the file, but it does contain the contents of our list
To load, use
with open('/path/to/file/', 'rb') as f:
    bytes_from_file = f.read()
my_list = pickle.loads(bytes_from_file)  # Load the object
# my_list is now the Python object [1, 2, "blue", ["sub", "list", 5]]
Comparison
Note the similarities between the code we wrote using JSON and the pickling code. In fact, the only major difference between the two methods is what actually gets written to the file: readable text for JSON, opaque bytes for pickle. Which one is faster or more compact depends on the data and the pickle protocol, so measure if it matters; pickling is a valid alternative either way. Also, the JSON format is much more universal than the pickle format, which only Python understands.
A note on eval
Please don't use eval() haphazardly. It seems like you're relatively new to Python, and eval can be a risky function to jump right into. It allows for the unchecked evaluation of any Python code, and as such can be a) risky, if you ever are relying on the user to input text, and b) can lead to sloppy, non-Pythonic code.
That last point is just my two cents.
tl;dr: Use JSON to dump and load Python objects to a file
Write to file without brackets: f.write(str(mylist)[1:-1])
After reading the line, split it to get a list: data = line.split(',')
To convert to integers: data = list(map(int, data))
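Put together, those three steps might look like the sketch below. It only works for a flat list of integers, and the temporary directory is just there to keep the example self-contained.

```python
import os
import tempfile

mylist = [1, 2, 3]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.txt")

    # str(mylist) is "[1, 2, 3]"; slicing drops the surrounding brackets.
    with open(path, "w") as f:
        f.write(str(mylist)[1:-1])

    with open(path) as f:
        line = f.readline()

# int() tolerates the space left after each comma.
data = [int(item) for item in line.split(",")]
print(data)   # [1, 2, 3]
```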

Pickle problem writing to file

I have a problem writing a file with Pickle in Python
Here is my code:
test = "TEST"
f1 = open(path+filename, "wb", 0)
pickle.dump(test,f1,0)
f1.close()
return
This gives me the output in the .txt file as VTESTp0. I'm not sure why this is?
Shouldn't it just have been saved as TEST?
I'm very new to pickle and I didn't even know it existed until today so sorry if I'm asking a silly question.
No, pickle does not write strings just as strings. Pickle is a serialization protocol, it turns objects into strings of bytes so that you can later recreate them. The actual format depends on which version of the protocol you use, but you should really treat pickle data as an opaque type.
If you want to write the string "TEST" to the file, just write the string itself. Don't bother with pickle.
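You can see the extra bytes without touching a file. This sketch uses protocol 0, the same protocol as in the question; the exact bytes depend on the protocol, which is one more reason to treat them as opaque.

```python
import pickle

# Protocol 0 is the oldest, ASCII-based pickle format.
data = pickle.dumps("TEST", protocol=0)
print(data)   # the text TEST wrapped in pickle opcodes, not just TEST

# The extra bytes are what lets pickle rebuild the original object.
restored = pickle.loads(data)
print(restored)   # TEST
```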
Think of pickling as saving binary data to disk. This is interesting if you have data structures in your program like a big dict or array, which took some time to create. You can save them to a file with pickle and read them in with pickle the next time your program runs, thus saving you the time it took to build the data structure. The downside is that other, non-Python programs will not be able to understand the pickle files.
As pickle is quite versatile you can of course also write simple text strings to a pickle file. But if you want to process them further, e.g. in a text editor or by another program, you need to store them verbatim, as Thomas Wouters suggests:
test = "TEST"
f1 = open(path+filename, "w")  # text mode: in Python 3 a str cannot be written to a file opened with "wb"
f1.write(test)
f1.close()
return
