Restore objects from pickled file by name in python - python

Using Python 2.7.
Is there a way to restore only specified objects from a pickle file?
Using the same example as a previous post:
import pickle
# obj0, obj1, obj2 are created here...
# Saving the objects (pickle files should be opened in binary mode):
with open('objs.pickle', 'wb') as f:
    pickle.dump([obj0, obj1, obj2], f)
Now I would like to restore only, say, obj1.
I am doing the following:
with open('objs.pickle', 'rb') as f:
    obj1 = pickle.load(f)[1]
But let's say I don't know the order of the objects, only their names.
Writing this, I am guessing the names get dropped during pickling?

Instead of storing the objects in a list, you could use a dictionary to provide names for each object:
import pickle
s = pickle.dumps({'obj0': obj0, 'obj1': obj1, 'obj2': obj2})
obj1 = pickle.loads(s)['obj1']
The order of the items no longer matters; in fact, there is no order to rely on, because a dictionary is being restored.
I'm not 100% sure that this is what you wanted. Were you hoping to restore the object of interest only, i.e. without parsing and restoring the other objects? I don't think that can be done with pickles without writing your own parser, or a fair degree of hacking.

No. Python objects do not have "names" (apart from a few exceptions, such as functions and classes, which know their declared names). A name merely points to an object; the object does not know its name even at runtime, so the name cannot be persisted in a pickle either.
Perhaps you need a dictionary instead.
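A minimal sketch of the dictionary approach with a file on disk, as in the question. The three values standing in for obj0, obj1 and obj2 are made up for illustration, and a temporary directory is used so the example does not touch any existing file. Note that the whole dictionary is still unpickled; only the lookup is by name.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the question's obj0, obj1, obj2.
obj0, obj1, obj2 = [1, 2, 3], {"a": 1}, "third"

path = os.path.join(tempfile.mkdtemp(), "objs.pickle")

# Save the objects under explicit names; open the file in binary mode.
with open(path, "wb") as f:
    pickle.dump({"obj0": obj0, "obj1": obj1, "obj2": obj2}, f)

# Load the whole dictionary, then pick the wanted object by name.
with open(path, "rb") as f:
    restored_obj1 = pickle.load(f)["obj1"]

print(restored_obj1)
```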

Related

Calculate hash of object, python

I have a class with string data, and I'm supposed to calculate the hash of the whole object using hashlib.sha256().
I was not able to get the hash directly. For a block c, for example:
Hash = hashlib.sha256(c.encode()).digest()
I want to calculate the hash of the whole object. It was suggested that I add a function to the class that returns the hash of the data inside it. Is that the same as the hash of the whole block? What is the better implementation?
You need to implement the magic method __hash__ for your class. You may then use an instance of your class, for example, as a key of a dictionary. Also, if you just need to get a hash, you can simply use the built-in function hash():
c = MyClass()
c_hash = hash(c)
The current answer makes it appear as if there were a need to define __hash__. As a partial answer, I want to clarify that this is not logically necessary.
It should be possible to implement sha256, or the Python interpreter, so that the full hash of a given class instance is taken.
Reason: with pickle.dumps it is possible to save an object uniquely to a file. The data footprint of that file is digestible; so this is merely a shortcoming of the language and should be fixable with some boilerplate code.
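One way the "function in the class" suggestion from the question might look is sketched below. The Block class and its field names are hypothetical, and JSON with sorted keys is just one possible way to turn the attributes into bytes; it assumes every attribute is JSON-serializable. The built-in hash() is a separate thing: it returns an integer meant for dict and set lookups, not a SHA-256 digest.

```python
import hashlib
import json


class Block(object):
    """Hypothetical class holding string data, as in the question."""

    def __init__(self, index, data, previous_hash):
        self.index = index
        self.data = data
        self.previous_hash = previous_hash

    def sha256_hex(self):
        # Serialize the attributes in a fixed key order so that equal
        # contents always produce the same bytes, then hash those bytes.
        payload = json.dumps(self.__dict__, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


a = Block(1, "hello", "0" * 64)
b = Block(1, "hello", "0" * 64)
c = Block(2, "hello", "0" * 64)

print(a.sha256_hex() == b.sha256_hex())  # True: same contents
print(a.sha256_hex() == c.sha256_hex())  # False: index differs
```

Hashing an explicit serialization keeps the digest tied to the data only. Hashing the output of pickle.dumps is also possible, but the bytes can depend on the pickle protocol and on how objects are shared inside the instance, so treat that variant with care.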

Named Structured Data storage in Python

I am writing an application that calculates various Pandas dataframes over various time periods. Each of these dataframes has additional data that needs to be stored with it.
I can quite easily define a structure using lists or dicts to carry the data, but it would be nice if it were cleanly structured.
I have looked at (and tried) namedtuples. They are great because they simplify the syntax for accessing the information a lot. The problem with tuples is, of course, that they are immutable.
I have gotten around this either by doing all the calculations ahead of time and living with not being able to change them (without jumping through a few hoops), or with the following code:
import pandas
from collections import namedtuple
m = namedtuple("Month", 'df StartDate EndDate DaysInMonth')
m.Month = 2
m.df = pandas.DataFrame()
etc....
This seems to work, but I am actually misusing the namedtuple class: m in the above code is a "type", not an instance. Although it works and I can now assign to it, I am probably going to run into problems later on.
type(m)
>>> type
Any suggestions on whether I could carry on with this structure, or whether I should rather create my own class for the data structure?
Setting m.Month to 2 uses something all classes can do, because they walk and talk like dictionaries.
class Month():
    pass
a = Month()
a.df = 2
This works without doing anything special. If you look inside a's __dict__ attribute
print(a.__dict__)
You'll see something like the following
{'df': 2}
I would probably use the empty class instead of the namedtuple if you want to change the values at a later time. All the namedtuple machinery in the background gets you nothing for your use case.
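A small sketch of the "own class" option, assuming the field names from the question. A plain list stands in for the pandas DataFrame so the example runs without pandas, and the dates are made-up strings. For comparison, it also shows that a namedtuple instance (as opposed to the type) rejects assignment.

```python
from collections import namedtuple


class Month(object):
    """Plain mutable container; field names follow the question."""

    def __init__(self, df=None, StartDate=None, EndDate=None, DaysInMonth=None):
        self.df = df
        self.StartDate = StartDate
        self.EndDate = EndDate
        self.DaysInMonth = DaysInMonth


m = Month(df=[], StartDate="2016-02-01", EndDate="2016-02-29", DaysInMonth=29)
m.DaysInMonth = 28   # attributes can be reassigned freely
m.df = [1, 2, 3]     # a list stands in for a pandas DataFrame here

# For comparison: a namedtuple *instance* does not allow assignment.
MonthTuple = namedtuple("MonthTuple", "df StartDate EndDate DaysInMonth")
t = MonthTuple([], "2016-02-01", "2016-02-29", 29)
try:
    t.DaysInMonth = 28
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace builds a new tuple with the changed field instead.
t2 = t._replace(DaysInMonth=28)
```

Unlike the bare empty class, the explicit __init__ documents which fields a Month is expected to carry, at the cost of a few more lines.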

Persistent references in Python

I would like my program to store data for later use. So far, no problem: there are many ways of doing this in Python.
Things get a little more complicated because I want to keep references between instances. If a list X is a list Y (they have the same ID, so modifying one modifies the other), that should still be true the next time I load the data (in another session of the program, which has stopped in the meantime).
I know one solution: the pickle module keeps track of references and will remember that my X and Y lists are exactly the same (not only their contents, but their references).
Still, the problem with pickle is that it only works if you dump all the data into a single file, which is not really clever if you have a large amount of data.
Do you know another way to handle this problem?
The simplest thing to do is probably to wrap up all the state you wish to save in a dictionary (keyed by variable name, perhaps, or some other unique but predictable identifier), then pickle and unpickle that dictionary. The objects within the dictionary will share references between one another like you want:
>>> import pickle
>>> class X(object):
...     # just some object to be pickled
...     pass
...
>>> l1 = [X(), X(), X()]
>>> l2 = [l1[0], X(), l1[2]]
>>> state = {'l1': l1, 'l2': l2}
>>> saved = pickle.dumps(state)
>>> restored = pickle.loads(saved)
>>> restored['l1'][0] is restored['l2'][0]
True
>>> restored['l1'][1] is restored['l2'][1]
False
I would recommend using shelve over pickle. It offers higher-level functionality and is simpler to use.
http://docs.python.org/library/shelve.html
If you have performance issues because you manipulate very large amounts of data, you may try other libraries such as PyTables:
http://www.pytables.org/moin
ZODB was developed to store persistent Python objects along with all their references. Just inherit your class from Persistent and have fun. http://www.zodb.org/
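One caveat when moving from a single pickle to shelve is worth checking: a shelf pickles each key's value separately, so shared references survive only inside one stored value, not across keys. The sketch below illustrates this with a temporary file; the key names are made up for the example, and contextlib.closing is used so the same code closes the shelf on older and newer Python versions.

```python
import contextlib
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "state")

shared = [1, 2, 3]

with contextlib.closing(shelve.open(path)) as db:
    # One value holding both names: pickled together, so sharing survives.
    db["state"] = {"x": shared, "y": shared}
    # Two separate keys: each value is pickled on its own.
    db["x"] = shared
    db["y"] = shared

with contextlib.closing(shelve.open(path)) as db:
    state = db["state"]
    same_inside_one_value = state["x"] is state["y"]
    same_across_keys = db["x"] is db["y"]

print(same_inside_one_value)
print(same_across_keys)
```

So objects that must stay identical after reloading need to live under the same key, which is essentially the dictionary approach again, one shelf entry at a time.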

What is the Python equivalent of Ruby's "inspect"?

I just want to quickly see the properties and values of an object in Python. How do I do that in the terminal on a Mac (very basic stuff, never used Python)?
Specifically, I want to see what message.attachments are in this Google App Engine MailHandler example (images, videos, docs, etc.).
If you want to dump the entire object, you can use the pprint module to get a pretty-printed version of it.
from pprint import pprint
pprint(my_object)
# If there are many levels of recursion, and you don't want to see them all
# you can use the depth parameter to limit how many levels it goes down
pprint(my_object, depth=2)
Edit: I may have misread what you meant by 'object' - if you're wanting to look at class instances, as opposed to basic data structures like dicts, you may want to look at the inspect module instead.
Use the getmembers function of the inspect module.
It will return a list of (key, value) tuples. It gets the value from obj.__dict__ if available and uses getattr if there is no corresponding entry in obj.__dict__. It can save you from writing a few lines of code for this purpose.
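A short sketch of inspect.getmembers next to pprint. The Message class and its attributes are hypothetical stand-ins for the mail message in the question; the predicate and the underscore filter are only one way to trim the output down to data attributes.

```python
import inspect
from pprint import pprint


class Message(object):
    """Hypothetical stand-in for the mail message in the question."""

    def __init__(self):
        self.subject = "hi"
        self.attachments = ["photo.jpg"]


msg = Message()

# vars(obj) returns obj.__dict__: the instance's own attributes.
pprint(vars(msg))

# getmembers returns (name, value) pairs sorted by name; the predicate
# filters out methods and other routines.
members = inspect.getmembers(msg, lambda value: not inspect.isroutine(value))
public = [(name, value) for name, value in members if not name.startswith("_")]
pprint(public)
```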
Update
There are better ways to do this than dir. See other answers.
Original Answer
Use the built in function dir(fp) to see the attributes of fp.
I'm surprised no one else has mentioned Python's __str__ method, which provides a string representation of an object. Unfortunately, it doesn't seem to print automatically in pdb.
One can also use __repr__ for that, but __repr__ has other requirements: for one thing, you are (at least in theory) supposed to be able to eval() the output of __repr__, though that requirement seems to be enforced only rarely.
Try
repr(obj) # returns a printable representation of the given object
or
dir(obj) # the list of the object's attribute names (methods included)
or
obj.__dict__ # object variables
Or combine Abrer's and Mazur's answers to get:
from pprint import pprint
pprint(my_object.__dict__ )

Accessing Dictionaries VS Accessing Shelves

Currently, I have a dictionary that has a number as the key and a Class as a value. I can access the attributes of that Class like so:
dictionary[str(instantiated_class_id_number)].attribute1
Due to memory issues, I want to use the shelve module. I am wondering if doing so is plausible. Does a shelve dictionary act the exact same as a standard dictionary? If not, how does it differ?
Shelve doesn't act exactly the same as a dictionary, notably when modifying objects that are already in the shelf.
The difference is that when you add an object to a dictionary, a reference is stored, but shelve keeps a pickled (serialized) copy of the object. If you then modify the object, you modify the in-memory copy but not the pickled version. When the shelf is opened with writeback=True, that is handled (mostly) transparently by shelf.sync() and shelf.close(), which write out the cached entries. Making all that work requires tracking every retrieved object that hasn't been written back yet, so you do have to call shelf.sync() to clear the cache.
The problem with shelf.sync() clearing the cache is that you can keep a reference to the object and modify it again.
This code doesn't work as expected with a shelf, but will work with a dictionary:
s["foo"] = MyClass()
s["foo"].X = 8
p = s["foo"] # store a reference to the object
p.X = 9 # update the reference
s.sync() # flushes the cache
p.X = 0
print "value in memory: %d" % p.X # prints 0
print "value in shelf: %d" % s["foo"].X # prints 9
sync() flushes the cache, so the 'p' object is no longer in the cache and the later change to it is never written back.
Yes, it is plausible:
Shelf objects support all methods supported by dictionaries. This eases the transition from dictionary based scripts to those requiring persistent storage.
You need to call shelf.sync() every so often to clear the cache.
EDIT
Take care, it's not exactly a dict. See e.g. Laurion's answer.
Oh, and you can only have str keys.
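The most common surprise can be sketched as follows, assuming a shelf opened with the default writeback=False. A plain dict stands in for the class instance from the question, and the key is built with str() because shelf keys must be strings; the id value 42 is made up.

```python
import contextlib
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "items")
key = str(42)  # shelf keys must be strings

with contextlib.closing(shelve.open(path)) as s:
    s[key] = {"attribute1": 1}

    # s[key] returns a fresh unpickled copy, so this change is lost.
    s[key]["attribute1"] = 8
    after_in_place = s[key]["attribute1"]

    # Fetch, modify, and store back under the same key to persist it.
    item = s[key]
    item["attribute1"] = 8
    s[key] = item
    after_reassign = s[key]["attribute1"]

print(after_in_place)
print(after_reassign)
```

Opening the shelf with writeback=True makes in-place changes stick at the next sync() or close(), at the cost of keeping every accessed entry cached in memory until then, which may work against the memory savings the question is after.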
