My program creates a probabilistic model that I want to save as a module to import later. How can I save it in a way that it can be directly imported?
JSON is good for dicts, but I have other data structures as well; pickle does not seem to allow a direct import, and pprint does not print the names and assignments of the structures.
I would just like to create some data structures:
states = (
    'Bound',
    'Not-bound'
)

Prob = {
    'Bound': 0.45,
    'Not-bound': 0.55
}
save them somehow to a .py file:
with open('model.py', 'wb') as out:
    save(states)
    save(Prob)
Then, import them later directly:
import model
print(model.states)
Take a look at the pickle module.
The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
It won't be quite the way you want it to be but I think it's a simple and reasonable way of doing what you want.
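A minimal sketch of what that looks like (the filename model.pkl is my choice; you load the data back rather than importing a module):

```python
import pickle

states = ('Bound', 'Not-bound')
Prob = {'Bound': 0.45, 'Not-bound': 0.55}

# Serialize both structures together as a single tuple.
with open('model.pkl', 'wb') as out:
    pickle.dump((states, Prob), out)

# Later, in another script: load them back instead of importing.
with open('model.pkl', 'rb') as f:
    loaded_states, loaded_prob = pickle.load(f)

print(loaded_states)  # ('Bound', 'Not-bound')
```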
Related
I am running a script which takes, say, an hour to generate the data I want. I want to be able to save all of the relevant variables to some external file so I can fiddle with them later without having to run the hour-long calculation over again. Is there an easy way I can save all of the variables I need into one convenient file?
In Matlab I would just contain all of the results of the calculation in a single structure so that later I could just load results.mat and I would have everything I need stored as results.output1, results.output2 or whatever. What is the Python equivalent of this?
In particular, the data that I would like to save includes arrays of complex numbers, which seems to present difficulties for using things like json.
I suggest taking a look at the built-in shelve module, which provides a persistent, dictionary-like object and generally works with all native Python types, so you can do the following.
Write a complex number to a file (in my example it is named mydata) under the key n (keep in mind that keys must be strings):
import shelve

my_number = 2 + 7j
with shelve.open('mydata') as db:
    db['n'] = my_number
Later, retrieve that number from the same file:
import shelve

with shelve.open('mydata') as db:
    my_number = db['n']

print(my_number)  # (2+7j)
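The same idea scales to the Matlab-struct workflow from the question: save each result of the long calculation under its own key. A sketch (the result names and values here are made up):

```python
import shelve

# Hypothetical results of an hour-long calculation, including complex numbers.
results = {'output1': [1 + 2j, 3 - 4j], 'output2': 42}

# Save every result under its own key, Matlab-struct style.
with shelve.open('results') as db:
    for name, value in results.items():
        db[name] = value

# Later, in a fresh session, pull back just what you need.
with shelve.open('results') as db:
    output1 = db['output1']

print(output1)  # [(1+2j), (3-4j)]
```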
You can use the pickle module in Python: use its dump function to write your data to a file, then load it back later. I suggest you read more about pickle.
I would recommend a JSON file. With JSON you can assign values to keys, just like dictionaries in plain Python. The json module is part of the standard library.
import json

data = {"var1": "abcde", "var2": "fghij"}  # keys must be strings
with open(path, "w") as file:
    json.dump(data, file, indent=2, ensure_ascii=False)
You can also load this back from the file using the same API:
with open(path, "r") as file:
    text = file.read()
data = json.loads(text)
Edit: JSON also handles the basic Python container types, so if you want to save a list you can just put it in the dict:
data = {"list1": ["ab", "cd", "ef"]}
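Note that JSON only covers a small set of types; a complex number like the one in the question cannot be dumped directly, but you can store its parts. A sketch (the key names "re" and "im" are my own choice):

```python
import json

n = 2 + 7j

# json.dumps(n) would raise TypeError, so serialize the real and
# imaginary parts separately.
text = json.dumps({"re": n.real, "im": n.imag})

# Rebuild the complex number on the way back in.
d = json.loads(text)
restored = complex(d["re"], d["im"])
print(restored)  # (2+7j)
```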
Someone stored more than one object in a pickle file. Now I want to unpickle that file, but how can I know how many objects are stored in it? Is there any annotation or metadata from which we can get information about the pickle file?
Pickle doesn't store that information and doesn't support storing more than one top-level object in a pickle at once anyway. So the simple answer is: it's always one object. Note that objects can be trivially nested, so you could store a list of objects, for example. That's still a single top-level list.
If you need to add multiple pickles to a file, you have to invent your own metadata, and store that in addition to the pickle data.
For example, you could store both the number of objects and, for each object, pickled separately, the length of the pickle data stream as a fixed-length number:
import pickle
import struct

with open(some_filename, 'wb') as output:
    output.write(struct.pack('I', len(sequence_of_objects)))
    for obj in sequence_of_objects:
        pickled = pickle.dumps(obj)
        output.write(struct.pack('I', len(pickled)))
        output.write(pickled)
The above uses 4-byte unsigned integers to record how many objects there are as well as each pickle's length; adjust the struct format if your object counts or pickle sizes can exceed what 4 bytes can hold.
The above can then be read again with, say, a generator function:
import logging
import pickle
import struct

logger = logging.getLogger(__name__)

def read_objects(filename):
    with open(filename, 'rb') as inf:
        count, = struct.unpack('I', inf.read(4))
        logger.info("Reading up to %d objects from %s", count, filename)
        while True:
            length_bytes = inf.read(4)
            if not length_bytes:
                return
            length, = struct.unpack('I', length_bytes)
            yield pickle.loads(inf.read(length))
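To see the framing scheme work end to end, here is a round-trip sketch using an in-memory io.BytesIO buffer in place of a real file (the sample objects are made up):

```python
import io
import pickle
import struct

objects = [{'a': 1}, [1, 2, 3], 'hello']

# Write: count, then a length-prefixed pickle per object.
buf = io.BytesIO()
buf.write(struct.pack('I', len(objects)))
for obj in objects:
    pickled = pickle.dumps(obj)
    buf.write(struct.pack('I', len(pickled)))
    buf.write(pickled)

# Read: count first, then each length-prefixed pickle.
buf.seek(0)
count, = struct.unpack('I', buf.read(4))
restored = []
for _ in range(count):
    length, = struct.unpack('I', buf.read(4))
    restored.append(pickle.loads(buf.read(length)))

print(restored == objects)  # True
```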
I want to do some testing on a feature of my (Python) program which is computationally very heavy. I could run the code, store the output in a pandas.DataFrame, pickle the df and distribute it with my package so that the tests can be run by users. However, I think this goes against the principles of unit testing, namely that a test should be independent of external sources and self-contained.
An alternative idea would be if I were to store a pickle file as a string within an importable python class then dynamically write the pickle file and clean it up after the test. Is this possible to do and if so how can I do it?
Here's a small bit of code that simply writes a df to pickle.pickle in the current working directory.
import pickle
import os
import pandas
df = pandas.DataFrame([1,2,3,4,5,6])
filename = os.path.join(os.getcwd(), 'pickle.pickle')
df.to_pickle(filename)
Would it then be possible to somehow get a string version of the pickle so that I can store it in a class?
Would it then be possible to somehow get a string version of the pickle so that I can store it in a class?
Just read the full file:
with open(filename, "rb") as f:
    data = f.read()
Then, if you need to, you can unpickle it with loads:
unpickled = pickle.loads(data)
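Since pickled data is binary, a common way to embed it in source code is to base64-encode it first. A sketch under my own assumptions: a plain list stands in for the DataFrame so the example has no dependencies, and TestFixture is a hypothetical class name.

```python
import base64
import pickle

data = [1, 2, 3, 4, 5, 6]  # stand-in for the expensive DataFrame

# Encode the pickle bytes as an ASCII string that can live in a .py file.
encoded = base64.b64encode(pickle.dumps(data)).decode('ascii')

class TestFixture:
    PICKLED = encoded  # in practice you would paste the literal here

# In the test: decode and unpickle, no temporary file needed.
restored = pickle.loads(base64.b64decode(TestFixture.PICKLED))
print(restored)  # [1, 2, 3, 4, 5, 6]
```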
Say I got a dictionary like this:
Test = {"apple": [3, {1: 1, 3: 5, 6: 7}], "banana": [4, {1: 1, 3: 5, 6: 7, 11: 2}]}
Now I want to save this dictionary into a temporary file so that I can reconstruct the dictionary later. (I am doing external sorting XD)
Can anyone help? I know there is a way to save it in CSV format, but this is a special kind of dictionary. Thanks a lot.
In order to save a data structure to a file you need to decide on a serialization format. A serialization format takes an in-memory data structure and turns it into a sequence of bytes that can be written to a file. The process of turning that sequence of bytes back into a data structure is called deserialization.
Python provides a number of options for serialization with different characteristics. Here are two common choices and their tradeoffs:
pickle uses a compact format that can represent almost any Python object. However, it's specific to Python, so only Python programs can (easily) decode the file later. The details of the format can also vary between Python releases, so it's best to use pickle only if you will re-read the data file with the same or a newer version of Python than the one that created it. Pickle is able to deal with recursive data structures. Finally, pickle is inappropriate for reading data provided by other possibly-malicious programs, since decoding a pickled data structure can cause arbitrary code to run in your application in order to reconstruct complex objects.
json uses a human-readable text-based format that can represent only a small set of data types: numbers, strings, None, booleans, lists and dictionaries. However, it is a standard format that can be read by equivalent libraries in many other programming languages, and it does not vary from one Python release to another. JSON does not support recursive data structures, but it is generally safe to decode a JSON object from an unknown source except that of course the resulting data structure might use a lot of memory.
In both of these modules the dump function can write a data structure to a file and the load function can later recover it. The difference is in the format of the data written to the file.
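A side-by-side sketch of that shared dump/load interface (note the string keys: JSON would silently convert the question's integer dict keys to strings, while pickle keeps them intact):

```python
import json
import pickle

data = {"apple": [3, {"x": 1}], "banana": [4]}

# Same interface, different on-disk format.
with open('data.json', 'w') as f:
    json.dump(data, f)       # human-readable text
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)     # compact binary, Python-specific

with open('data.json') as f:
    from_json = json.load(f)
with open('data.pkl', 'rb') as f:
    from_pickle = pickle.load(f)

print(from_json == from_pickle == data)  # True
```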
The pickle module is quite convenient for serializing python data. It is also probably the fastest way to dump and reload a python data structure.
>>> import pickle
>>> Test = {"apple": [3, {1: 1, 3: 5, 6: 7}], "banana": [4, {1: 1, 3: 5, 6: 7, 11: 2}]}
>>> with open('test_file', 'wb') as f:
...     pickle.dump(Test, f)
...
>>> with open('test_file', 'rb') as f:
...     Test2 = pickle.load(f)
...
>>> Test2
{'apple': [3, {1: 1, 3: 5, 6: 7}], 'banana': [4, {1: 1, 3: 5, 6: 7, 11: 2}]}
For me, clearly, the best serializer is msgpack.
http://jmoiron.net/blog/python-serialization/
It offers the same dump and load methods as the others.
I'm using CherryPy and it seems to not behave nicely when it comes to retrieving data from stored files on the server. (I asked for help on that and nobody replied, so I'm on to plan B, or C...) Now I have stored a class containing a bunch of data structures (3 dictionaries and two lists of lists all related) in a MySQL table, and amazingly, it was easier than I thought to insert the binary object (longblob). I turned it into a pickle file and INSERTED it.
However, I can't figure out how to reconstitute the pickle and rebuild the class full of data from it. The database returns a giant string that looks like the pickle, but how do you make a string into a file-like object so that pickle.load(data) will work?
Alternative solutions: How to save the class as a BLOB in database, or some ideas on why I can save a pickle of this class but when I go to load it later, the class seems to be lost. But in SSH / locally, it works - only when calling pickle.load(xxx) from cherrypy do I get errors.
I'm up for plan D - if there's a better way to store a collection of structured data for fast retrieval without pickles or MYSQL blobs...
You can create a file-like in-memory object with io.StringIO (use io.BytesIO for binary data such as pickles):
>>> from io import StringIO
>>> fobj = StringIO('file\ncontent')
>>> for line in fobj:
...     print(line, end='')
...
file
content
But for pickle usage you can dump to and load from a bytes string directly, with no file object at all (have a look at the s in the function names):
>>> import pickle
>>> obj = 1
>>> serialized = pickle.dumps(obj)
>>> type(serialized)
<class 'bytes'>
>>> pickle.loads(serialized)
1
But for structured data stored in a database, I would in general suggest that you use either
a table, preferably with an ORM like SQLAlchemy, so it is directly mapped to a class, or
a dictionary, which can easily be (de)serialized with JSON,
and not use pickle at all.
I struggled with this myself. Convert the string to bytes using the UTF-8 charset and try to load the data into your object:
CurrentShoppingCart.SetCartItems(pickle.loads(bytes(DBCart[0]['Cart'], 'UTF-8')))
Andrew