I want to write each element of a list on a new line in a binary file using pickle, and I want to be able to access these dictionaries later as well.
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    print(pickle.load(file))
Output:
{1: 11}
Could someone explain why the rest of the elements aren't being dumped, or suggest another way to write each one on a new line?
I'm using Python 3
They're all being dumped, but each dump is separate; to load them all, you need to match them with load calls.
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    for _ in range(4):
        print(pickle.load(file))
If you don't want to perform multiple loads, pickle them as a single data structure (e.g. the original list of dicts all at once).
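A minimal sketch of both options, assuming an in-memory io.BytesIO buffer stands in for student.dat. The helper name load_all is hypothetical (it is not part of pickle); it just keeps calling load until the stream runs out, which helps when you don't know how many objects were dumped.

```python
import io
import pickle

records = [{1: 11}, {2: 22}, {3: 33}, {4: 44}]

# Option 1: one dump, one load (the whole list is a single pickle).
buf = io.BytesIO()  # assumption: in-memory stand-in for the real file
pickle.dump(records, buf)
buf.seek(0)
single = pickle.load(buf)


# Option 2: one dump per record, then load until the stream is exhausted.
def load_all(stream):
    """Yield every pickled object in the stream (hypothetical helper)."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:  # pickle.load raises EOFError at end of input
            return


buf2 = io.BytesIO()
for record in records:
    pickle.dump(record, buf2)
buf2.seek(0)
many = list(load_all(buf2))

print(single)
print(many)
```

Both variables end up holding the original list of dicts; the first option is usually simpler unless you need to append records over time.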
In none of these cases are you writing newlines, nor should you be; pickle is a binary protocol, which means newlines are just another byte with independent meaning, and trying to inject newlines into the stream would get in the way of loading the data, and risk splitting up bits of data (if you actually read a line at a time for loading).
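To make the point about newlines concrete, here is a small sketch. It assumes protocol 2 purely to keep the byte layout simple; the variable names are illustrative.

```python
import pickle

# With a binary protocol, the integer 10 is stored as the single byte 0x0A,
# which happens to be the same byte as "\n".
data = pickle.dumps(10, protocol=2)
contains_newline = b"\n" in data

# Reading "a line at a time" would cut this pickle in two pieces,
# neither of which is a complete pickle on its own.
pieces = data.split(b"\n")

print(contains_newline, len(pieces))
print(pickle.loads(data))  # the intact byte string still loads fine
```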
I wish to write a dictionary to a text file. I've seen three methods, and they all seem valid, but I am interested in which one is most efficient for reading and writing, especially for a large dictionary with many entries, and why.
new_dict = {}
new_dict["city"] = "Boston"
# Writing to the file by string conversion
with open(r'C:\\Users\xy243\Documents\pop.txt', 'w') as new_file:
    new_file.write(str(new_dict))
# Writing to the file using Pickle (pickle needs a binary-mode file in Python 3)
import pickle
with open(r'C:\\Users\xy243\Documents\pop.txt', 'wb') as new_file:
    pickle.dump(new_dict, new_file, protocol=pickle.HIGHEST_PROTOCOL)
# Writing to the file using JSON
import json
with open(r'C:\\Users\xy243\Documents\pop.txt', 'w') as new_file:
    json.dump(new_dict, new_file)
The efficiency question has been pretty much covered in the comments. However, if your dataset is large and you might want to reuse your approach, it would probably be useful to consider SQL alternatives, which are made easier in Python by SQLAlchemy. That way, you can access the data quickly but store it neatly in a database.
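As a rough illustration of the database idea, here is a sketch that uses the standard-library sqlite3 module rather than SQLAlchemy, so it runs without extra installs. The table name settings, its columns, and the in-memory database are all assumptions made for the demo.

```python
import sqlite3

new_dict = {"city": "Boston"}

# Assumption: an in-memory database; pass a file path to persist it instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")

# Store each dictionary entry as one row.
conn.executemany(
    "INSERT INTO settings (key, value) VALUES (?, ?)", new_dict.items()
)
conn.commit()

# Look up a single key without loading the whole dictionary.
row = conn.execute(
    "SELECT value FROM settings WHERE key = ?", ("city",)
).fetchone()
city = row[0]
conn.close()

print(city)
```

The appeal for large dictionaries is that a single key can be read or updated without rewriting the entire file.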
Objects of some Python classes are not JSON serializable. If your dictionary contains such objects (as values), you can't use the json module.
Likewise, objects of some Python classes are not picklable (for example, some Keras/TensorFlow objects), and then you can't use pickle either.
In my opinion, more classes fail to serialize with JSON than fail to pickle, so pickle is applicable more widely than json.
Efficiency-wise (assuming your dictionary is both JSON serializable and picklable), pickle usually wins, because its binary protocols avoid converting numbers to strings when serializing and back to numbers when deserializing.
If you need to hand the object to another process or server, especially one written in another language such as Java, you have to live with JSON. This applies even if you write to a file and another process reads from that file.
So, it depends on your use case.
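A small sketch of the serializability difference described above; the sample dictionary is made up for illustration.

```python
import json
import pickle

data = {1: {"a", "b"}}  # an int key and a set value

# pickle round-trips the structure unchanged.
restored = pickle.loads(pickle.dumps(data))

# json rejects the set outright with a TypeError...
try:
    json.dumps(data)
    json_error = None
except TypeError as exc:
    json_error = exc

# ...and converts non-string keys to strings on the way through.
keys_after_json = list(json.loads(json.dumps({1: 11})).keys())

print(restored)
print(type(json_error).__name__)
print(keys_after_json)
```

So even when JSON accepts a dictionary, the object you read back may not be equal to the one you wrote.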
I have my pickle function working properly
with open(self._prepared_data_location_scalar, 'wb') as output:
    # company1 = Company('banana', 40)
    pickle.dump(X_scaler, output, pickle.HIGHEST_PROTOCOL)
    pickle.dump(Y_scaler, output, pickle.HIGHEST_PROTOCOL)
with open(self._prepared_data_location_scalar, 'rb') as input_f:
    X_scaler = pickle.load(input_f)
    Y_scaler = pickle.load(input_f)
However, I am very curious: how does pickle know which one to load? Does it mean that everything has to be loaded in the same sequence?
What you have is fine. It's a documented feature of pickle:
It is possible to make multiple calls to the dump() method of the same Pickler instance. These must then be matched to the same number of calls to the load() method of the corresponding Unpickler instance.
There is no magic here, pickle is a really simple stack-based language that serializes python objects into bytestrings. The pickle format knows about object boundaries: by design, pickle.dumps('x') + pickle.dumps('y') is not the same bytestring as pickle.dumps('xy').
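The boundary behaviour is easy to check for yourself. This sketch assumes an in-memory io.BytesIO stream in place of a real file.

```python
import io
import pickle

x = pickle.dumps("x")
y = pickle.dumps("y")

# Two concatenated pickles are not the same bytes as one pickle of "xy".
different = (x + y) != pickle.dumps("xy")

# Each load consumes exactly one pickled object and stops at its end marker,
# leaving the stream positioned at the start of the next one.
stream = io.BytesIO(x + y)
first = pickle.load(stream)
second = pickle.load(stream)

print(different, first, second)
```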
If you're interested to learn some background on the implementation, this article is an easy read to shed some light on the python pickler.
Wow, I did not even know you could do this, and I have been using Python for a very long time, so that's totally awesome in my book. However, you really should not do this; it will be very hard to work with later (especially if it isn't you working on it).
I would recommend just doing
pickle.dump({"X": X_scaler, "Y": Y_scaler}, output)
...
data = pickle.load(fp)
print("Y_scaler:", data['Y'])
print("X_scaler:", data['X'])
unless you have a very compelling reason to save and load the data like you were in your question ...
edit to answer the actual question...
It loads from the start of the file to the end (i.e., it loads the objects in the same order they were dumped).
Yes, pickle loads objects in the order they were saved.
Intuitively, pickle appends to the end of the file when it writes (dump),
and reads the content sequentially when it loads (load).
Consequently, order is preserved, allowing you to retrieve your data in exactly the order you serialized it.
So, I saved a list to a file as a string. In particular, I did:
f = open('myfile.txt','w')
f.write(str(mylist))
f.close()
But, later when I open this file again, take the (string-ified) list, and want to change it back to a list, what happens is something along these lines:
>>> list('[1,2,3]')
['[', '1', ',', '2', ',', '3', ']']
Could I make it so that I got the list [1,2,3] from the file?
There are two easy options here. First, using ast.literal_eval:
>>> import ast
>>> ast.literal_eval('[1,2,3]')
[1, 2, 3]
Unlike eval, this is safe because it only evaluates Python literals, such as lists, dictionaries, None, strings, and so on. It raises an error if the string contains any other code.
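For instance, here is a quick sketch of what ast.literal_eval accepts and what it refuses; the input strings are made up for illustration.

```python
import ast

# Nested literals are fine.
parsed = ast.literal_eval("{'a': [1, 2, (3, 4)], 'b': None}")

# Anything that is not a literal (here, a function call) is rejected
# with a ValueError instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(parsed)
print(rejected)
```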
Second, make use of the json module, using json.loads:
>>> import json
>>> json.loads('[1,2,3]')
[1, 2, 3]
A great advantage of using json is that it's cross-platform, and you can also write to a file easily:
with open('data.txt', 'w') as f:
    json.dump([1, 2, 3], f)
In [285]: import ast
In [286]: ast.literal_eval('[1,2,3]')
Out[286]: [1, 2, 3]
Use ast.literal_eval instead of eval whenever possible:
ast.literal_eval:
Safely evaluate[s] an expression node or a string containing a Python
expression. The string or node provided may only consist of the
following Python literal structures: strings, numbers, tuples, lists,
dicts, booleans, and None.
Edit: Also, consider using json. json.loads operates on a different string format, but is generally faster than ast.literal_eval. (So if you use json.load, be sure to save your data using json.dump.) Moreover, the JSON format is language-independent.
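The "different string format" point is worth seeing once. In this sketch (sample list made up for illustration), the text produced by str() uses single quotes, which ast.literal_eval accepts but json.loads does not.

```python
import ast
import json

mylist = [1, "two", 3.0]

as_repr = str(mylist)         # "[1, 'two', 3.0]"  (Python literal syntax)
as_json = json.dumps(mylist)  # '[1, "two", 3.0]'  (JSON syntax)

# Each parser reads back the format written by its own counterpart.
from_repr = ast.literal_eval(as_repr)
from_json = json.loads(as_json)

# Feeding the str() output to json.loads fails because of the single quotes.
try:
    json.loads(as_repr)
    json_accepts_repr = True
except json.JSONDecodeError:
    json_accepts_repr = False

print(from_repr, from_json, json_accepts_repr)
```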
Python developers traditionally use pickle to serialize their data and write it to a file.
You could do so like this:
import pickle
mylist = [1,2,3]
with open('myfile', 'wb') as f:
    pickle.dump(mylist, f)
And then reopen it like this:
import pickle
with open('myfile', 'rb') as f:
    mylist = pickle.load(f) # [1,2,3]
I would write Python objects to a file using the built-in json module or, if you don't like JSON, with pickle or cPickle. Both allow for easy serialization and deserialization of data. I'm on my phone but when I get home I'll upload sample code.
EDIT:
Ok, get ready for a ton of Python code, and some opinions...
JSON
Python has built-in support for JSON, or JavaScript Object Notation, a lightweight data interchange format. JSON supports Python's basic data types, such as dictionaries (which JSON calls objects: basically just key-value pairs) and lists (comma-separated values enclosed by [ and ]). For more information on JSON, see this resource. Now to the code:
import json #Don't forget to import
my_list = [1,2,"blue",["sub","list",5]]
with open('/path/to/file/', 'w') as f:
    string_to_write = json.dumps(my_list) #Dump the string
    f.write(string_to_write)
#The character string [1,2,"blue",["sub","list",5]] is written to the file
Note that the with statement will close the file automatically when the block finishes executing.
To load the string back, use
with open('/path/to/file/', 'r') as f:
    string_from_file = f.read()
    mylist = json.loads(string_from_file) #Load the string
#mylist is now the Python object [1,2,"blue",["sub","list",5]]
I like JSON. Use JSON unless you really, really have a good reason not to.
CPICKLE
Another method for serializing Python data to a file is called pickling, in which we write more than we 'need' to a file so that we have some meta-information about how the bytes in the file relate to Python objects. There is a built-in pickle module, but on Python 2 we'll use cPickle, because it is implemented in C and is much, much faster than pickle (about 100X, but I don't have a citation for that number). On Python 3, cPickle no longer exists; the plain pickle module uses the C implementation automatically. Pickle data is binary, so the file must be opened in binary mode. The dumping code then becomes
try:
    import cPickle as pickle #Python 2: the C implementation
except ImportError:
    import pickle #Python 3: already uses the C implementation when available
with open('/path/to/file/', 'wb') as f:
    bytes_to_write = pickle.dumps(my_list) #Dump to a byte string
    f.write(bytes_to_write)
#A very weird byte string is written to the file, but it does contain the contents of our list
To load, use
with open('/path/to/file/', 'rb') as f:
    bytes_from_file = f.read()
    mylist = pickle.loads(bytes_from_file) #Load the bytes
#mylist is now the Python object [1,2,"blue",["sub","list",5]]
Comparison
Note the similarities between the code we wrote using JSON and the code we wrote using cPickle. In fact, the only major difference between the two methods is what actually gets written to the file. I believe JSON is faster and more space-efficient than cPickle, but cPickle is a valid alternative. Also, the JSON format is much more universal than cPickle's weird syntax.
A note on eval
Please don't use eval() haphazardly. It seems like you're relatively new to Python, and eval can be a risky function to jump right into. It allows for the unchecked evaluation of any Python code, and as such can be a) risky, if you ever are relying on the user to input text, and b) can lead to sloppy, non-Pythonic code.
That last point is just my two cents.
tl;dr: Use JSON to dump and load Python objects to a file.
Write to file without brackets: f.write(str(mylist)[1:-1])
After reading the line, split it to get a list: data = line.split(',')
To convert to integers: data = list(map(int, data)) (in Python 3, map returns an iterator, so wrap it in list())
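Put together, the three steps above might look like the sketch below. It keeps everything in a string instead of a real file, and it only works for a flat list of integers; nested lists or strings containing commas would break the split.

```python
mylist = [1, 2, 3]

# Step 1: drop the surrounding brackets before writing.
line = str(mylist)[1:-1]  # "1, 2, 3"

# Steps 2 and 3: split on commas and convert each piece.
# int() tolerates the leading space left over after each comma.
data = [int(item) for item in line.split(",")]

print(line)
print(data)
```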
I have a problem writing a file with Pickle in Python
Here is my code:
test = "TEST"
f1 = open(path+filename, "wb", 0)
pickle.dump(test,f1,0)
f1.close()
return
This gives me the output in the .txt file as VTESTp0. I'm not sure why this is?
Shouldn't it just have been saved as TEST?
I'm very new to pickle and I didn't even know it existed until today so sorry if I'm asking a silly question.
No, pickle does not write strings just as strings. Pickle is a serialization protocol: it turns objects into strings of bytes so that you can later recreate them. The actual format depends on which version of the protocol you use, but you should really treat pickle data as an opaque type.
If you want to write the string "TEST" to the file, just write the string itself. Don't bother with pickle.
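A short sketch of the difference, assuming protocol 0 as in the question. The exact extra bytes are an implementation detail of the protocol, so the code only checks that they are there rather than relying on what they are.

```python
import pickle

test = "TEST"

# Pickled form: the text plus opcodes that tell pickle how to rebuild the object.
pickled = pickle.dumps(test, 0)

# Plain form: just the characters, encoded as bytes.
plain = test.encode("utf-8")

print(pickled)
print(plain)
print(pickle.loads(pickled))  # round-trips back to the original string
```

The pickled bytes contain the text but are not equal to it, which is why the file shows more than just TEST.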
Think of pickling as saving binary data to disk. This is interesting if you have data structures in your program like a big dict or array, which took some time to create. You can save them to a file with pickle and read them in with pickle the next time your program runs, thus saving you the time it took to build the data structure. The downside is that other, non-Python programs will not be able to understand the pickle files.
As pickle is quite versatile you can of course also write simple text strings to a pickle file. But if you want to process them further, e.g. in a text editor or by another program, you need to store them verbatim, as Thomas Wouters suggests:
test = "TEST"
# Text mode, because we are writing a str rather than bytes
with open(path+filename, "w") as f1:
    f1.write(test)
return