I have a class with string data, and I'm supposed to calculate the hash of the whole object using hashlib.sha256().
I was not able to get the hash directly. For a block c, for example:

Hash = hashlib.sha256(c.encode()).digest()

I want to calculate the hash of the whole object. I was advised to add a method to the class that returns the hash of the data inside it. Is that the same as the hash of the whole block? What is the better implementation?
You need to implement the magic method __hash__ for your class. You may then use an instance of your class, for example, as a key of a dictionary. Also, if you just need to get a hash, you can simply use the built-in function hash():
c = MyClass()
c_hash = hash(c)
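For the sha256 use case specifically, a minimal sketch of such a class might look like this (the data attribute name and payload are assumptions, not from the original post):

import hashlib

class MyClass:
    def __init__(self, data):
        self.data = data  # hypothetical: the string payload to hash

    def __hash__(self):
        # Lets instances be used as dict/set keys; note that hash() of a
        # str is randomized per process in Python 3.3+.
        return hash(self.data)

    def sha256(self):
        # Stable, process-independent digest of the object's data.
        return hashlib.sha256(self.data.encode()).digest()

c = MyClass("block contents")
c_hash = hash(c)             # int, for dicts and sets
c_digest = c.sha256().hex()  # hex digest, stable across runs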
The current answer makes it appear as if there were a need to define __hash__. As a partial answer, I hence clarify that this is not logically required.
It should be possible to take the full hash of a given class object, either with sha256 in user code or at the level of the Python interpreter.
Reason: with pickle dumps it is possible to save an object to a file as a byte stream. The data footprint of that file is digestable; thus this is a mere insufficiency of the language and should be fixable through some boilerplate code.
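A sketch of that pickle-based approach (hedged: pickle output is not guaranteed byte-identical across Python versions or pickle protocols, so pinning the protocol helps reproducibility):

import hashlib
import pickle

def object_sha256(obj):
    # Serialize the object to bytes, then digest the byte stream.
    # Caveat: equal objects can still pickle differently (e.g. across
    # Python versions), so treat this as a heuristic fingerprint.
    payload = pickle.dumps(obj, protocol=4)
    return hashlib.sha256(payload).hexdigest()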
Related
Suppose there is an object with an id among other attributes.
Suppose there is a generic function that only needs object.id to do its work (e.g. retrieving the object's value from the database by its id).
Does it cost more in Python 3 (memory-wise or performance-wise) to do:
def my_function(id):
    do_stuff_with(id)

my_function(my_object.id)
or
def my_function(obj):
    do_stuff_with(obj.id)

my_function(my_object)
I imagine the answer depends on the language implementation, so I was wondering whether there is an additional cost to passing the object as an argument in Python 3. Passing the object feels more flexible: if the next version of this function also needs object.name, you don't have to modify the signature.
In both cases, an object reference is passed to the function. I don't know how other implementations do this, but think C pointers to Python objects for CPython (so the same memory is involved in making that call).
Also, the dot look-up makes no difference in this case: you either perform the look-up for id during the function call or you do it inside the body (and, really, attribute look-ups shouldn't be a concern).
In the end, it boils down to what makes more sense: passing the object and letting the function do what it wants with it is better conceptually. It's also more flexible, as you stated, so pass the class instance.
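If you want to measure it yourself, a quick timeit sketch (all names hypothetical) shows the two call styles are essentially equivalent:

import timeit

class Thing:
    def __init__(self):
        self.id = 42

def takes_id(id_):
    return id_

def takes_obj(obj):
    return obj.id

t = Thing()
# The attribute look-up happens either at the call site or in the body;
# the total work is the same, only its location moves.
print(timeit.timeit(lambda: takes_id(t.id)))
print(timeit.timeit(lambda: takes_obj(t)))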
def funcA(x):
return x
Is funcA.__code__.__hash__() a suitable way to check whether funcA has changed?
I know that funcA.__hash__() won't work, as it is the same as id(funcA) / 16. I checked, and this isn't true for __code__.__hash__(). I also tested the behaviour in an IPython terminal and it seemed to hold. But is this guaranteed to work?
Why
I would like to have a way of comparing an old version of function to a new version of the same function.
I'm trying to create a decorator for disk-based/long-term caching. Thus I need a way to identify if a function has changed. I also need to look at the call graph to check that none of the called functions have changed but that is not part of this question.
Requirements:
Needs to be stable over multiple calls and machines. [1] says that in Python 3.3 hash() is randomized on each start of a new interpreter instance, although it also says that "HASH RANDOMIZATION IS DISABLED BY DEFAULT". Ideally, I'd like a function that is stable even with randomization enabled.
Ideally, it would yield the same hash for def funcA(): pass and def funcB(): pass, i.e. when only the name of the function changes. Probably not necessary.
I only care about Python 3.
One alternative would be to hash the text inside the file that contains the given function.
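That source-text alternative could look like this sketch (hedged: it is stable across machines, but it also changes when only comments or whitespace change):

import hashlib
import inspect

def source_digest(func):
    # Hash the function's source text instead of its bytecode.
    # Stable across processes and machines, unlike built-in hash(),
    # but sensitive to edits that don't change behaviour (comments,
    # whitespace, renames).
    src = inspect.getsource(func)
    return hashlib.sha256(src.encode()).hexdigest()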
Yes, it seems that func_a.__code__.__hash__() is unique to the specific functionality of the code. I could not find where this is implemented, or where __code__.__hash__() is defined.
The perfect way would be to use func_a.__code__.co_code.__hash__(), because co_code contains the byte code as a string. Note that in this case the function name is not part of the hash, so two functions with the same code but with the names func_a and func_b will have the same hash.
hash(func_a.__code__.co_code)
Source.
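One caveat worth adding: built-in hash() on bytes is randomized per process in Python 3.3+, so for the disk-based caching in the question a hashlib digest of the code object is a more stable variant. A sketch (bytecode still differs between Python versions, and co_code alone misses changed constants, hence the co_consts term):

import hashlib

def code_digest(func):
    code = func.__code__
    # sha256 over the raw bytecode plus the embedded constants gives a
    # digest that is stable across processes and machines, unlike the
    # built-in hash(). The function name is deliberately excluded.
    payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()

def func_a(x):
    return x

print(code_digest(func_a))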
I have the following in a Python script:
setattr(stringRESULTS, "b", b)
Which gives me the following error:
AttributeError: 'str' object has no attribute 'b'
Can anyone tell me what the problem is here?
Don't do this. To quote the inestimable Greg Hewgill,
"If you ever find yourself using quoted names to refer to variables,
there's usually a better way to do whatever you're trying to do."
[Here you're one level up and using a string variable for the name, but it's the same underlying issue.] Or as S. Lott followed up with in the same thread:
"90% of the time, you should be using a dictionary. The other 10% of
the time, you need to stop what you're doing entirely."
If you're using the contents of stringRESULTS as a pointer to some object fred which you want to setattr, then these objects you want to target must already exist somewhere, and a dictionary is the natural data structure to store them. In fact, depending on your use case, you might be able to use dictionary key/value pairs instead of attributes in the first place.
IOW, my version of what (I'm guessing) you're trying to do would probably look like
d[stringRESULTS].b = b
or
d[stringRESULTS]["b"] = b
depending on whether I want/need to work with an object instance or whether a dictionary would suffice.
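Concretely, a minimal sketch of the dictionary approach (the class and key names are hypothetical):

class Result:
    pass

results = {}                      # maps name -> Result instance
stringRESULTS = "run_1"
results[stringRESULTS] = Result()

b = 42
results[stringRESULTS].b = b      # attribute style on a stored object
# or, with plain nested dicts instead of objects:
nested = {stringRESULTS: {"b": b}}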
(P.S. relatively few people subscribe to the python-3.x tag. You'll usually get more attention by adding the bare 'python' tag as well.)
Since str is a built-in type whose instances don't carry a __dict__, you can't set arbitrary attributes on a string. You probably need either a dict or a subclass of str:
class StringResult(str):
pass
which should behave as you expect:
my_string_result = StringResult("spam_and_eggs")
my_string_result.b = b
EDIT:
If you're trying to do what DSM suggests, i.e. modify a property on a variable that has the same name as the value of the stringRESULTS variable, then this should do the trick:
locals()[stringRESULTS].b = b
Please note that this is an extremely dangerous operation and can wreak all kinds of havoc on your app if you aren't careful.
It seems there are different conventions for what the __repr__ function should return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in Python and they could just dive in and set them anyway, but defining them in __init__ seems to make it more likely that someone will see them and assume it's fine to just pass them in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily accept all the stuff I'd have marked 'private' in C++ as arguments to the constructor, and to make teammates using the class set it up through the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, plus a separate function that's slightly harder to notice, for the purposes of __repr__.
If it makes any difference, I am planning to store these Python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.
The __repr__ method is designed to produce the most useful output for the developer, not the end user, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose -- you don't know how your module is imported anyway. Others may prefer option C.
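A sketch of an option-B-style __repr__ (the attribute set is taken from the question's example output; the constructor signature is an assumption):

class InfoObj:
    def __init__(self, Name, Pants=False, A=None, B=None, C=None, Valid=False):
        self.Name = Name
        self.Pants = Pants
        self.A, self.B, self.C = A, B, C
        self.Valid = Valid

    def __repr__(self):
        # Option B: a constructor-style string that eval() can round-trip,
        # provided the constructor accepts every listed field.
        return ('InfoObj(Name={0.Name!r}, Pants={0.Pants!r}, A={0.A!r}, '
                'B={0.B!r}, C={0.C!r}, Valid={0.Valid!r})').format(self)

bob = InfoObj(Name="Bob", Pants=True, A=7, B=5, C=2, Valid=True)
print(repr(bob))  # InfoObj(Name='Bob', Pants=True, A=7, B=5, C=2, Valid=True)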
However, if you want to store Python objects in a database, use pickle.
>>> import pickle
>>> bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
Bob(...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
In my case, I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables all of type string or list of strings. As I build this dictionary up, my RAM goes up super high. Is there a way to write the dictionary as it is being built to the harddrive rather than the RAM so that I can save some memory? I've heard of something called "pickle" but I don't know if this is a feasible method for what I am doing.
Thanks for your help!
Maybe you should be using a database, but check out the shelve module
If shelve isn't powerful enough for you, there is always the industrial strength ZODB
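A minimal shelve sketch to start from (the filename is hypothetical):

import shelve

# Entries live on disk rather than in RAM; keys must be strings.
with shelve.open("instances.db") as db:
    db["obj_0001"] = {"name": "first", "tags": ["a", "b"]}
    print(db["obj_0001"]["tags"])   # ['a', 'b']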
shelve, as #gnibbler recommends, is what I would no doubt be using, but watch out for two traps: a simple one (all keys must be strings) and a subtle one (as the values don't normally exist in memory, calling mutators on them may not work as you expect).
For the simple problem, it's normally easy to find a workaround (and you do get a clear exception if you forget and try e.g. using an int or whatever as the key, so it's not hard to remember that you do need one).
For the subtle problem, consider for example:
x = d['foo']
x.amutatingmethod()
...much later...
y = d['foo']
# is y "mutated" or not now?
The answer to the question in the last comment depends on whether d is a real dict (in which case y will be mutated, and in fact will be exactly the same object as x) or a shelf (in which case y will be a distinct object from x, in exactly the state you last saved to d['foo']!).
To get your mutations to persist, you need to "save them to disk" by doing
d['foo'] = x
after calling any mutators you want on x (so in particular you cannot just do
d['foo'].mutator()
and expect the mutation to "stick", as you would if d were a dict).
shelve does have an option to cache all fetched items in memory, but of course that can fill up the memory again, and result in long delays when you finally close the shelf object (since all the cached items must be saved back to disk then, just in case they had been mutated). That option was something I originally pushed for (as a Python core committer), but I've since changed my mind and I now apologize for getting it in (ah well, at least it's not the default!-), since the cases it should be used in are rare, and it can often trap the unwary user... sorry.
BTW, in case you don't know what a mutator, or "mutating method", is, it's any method that alters the state of the object you call it on -- e.g. .append if the object is a list, .pop if the object is any kind of container, and so on. No need to worry if the object is immutable, of course (numbers, strings, tuples, frozensets, ...), since it doesn't have mutating methods in that case;-).
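Putting the write-back pattern together as a runnable sketch (filename hypothetical):

import shelve

with shelve.open("data.db") as d:
    d["foo"] = [1, 2, 3]

    x = d["foo"]
    x.append(4)       # mutates only the in-memory copy
    print(d["foo"])   # [1, 2, 3] -- the shelf is unchanged

    d["foo"] = x      # explicit save-back persists the mutation
    print(d["foo"])   # [1, 2, 3, 4]

# shelve.open("data.db", writeback=True) would cache fetched items and
# save them all at close, at the memory cost described above.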
Pickling an entire hash over and over again is bound to run into the same memory pressures that you're facing now -- maybe even worse, with all the data marshaling back and forth.
Instead, using an on-disk database that acts like a hash is probably the best bet; see this page for a quick introduction to using dbm-style databases in your program: http://docs.python.org/library/dbm
They act enough like hashes that it should be a simple transition for you.
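A quick dbm sketch (filename hypothetical; dbm stores only string/bytes values, so anything richer needs to be serialized first, e.g. with pickle):

import dbm
import pickle

with dbm.open("cache.db", "c") as db:   # "c": create the file if needed
    db["obj_0001"] = pickle.dumps(["a", "list", "of", "strings"])
    print(pickle.loads(db["obj_0001"]))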
"""I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables all of type string or list of strings""" ... I presume that you mean: """I have a class with about 1000 attributes all of type str or list of str. I have a dictionary mapping about 6000 keys of unspecified type to corresponding instances of that class.""" If that's not a reasonable translation, please correct it.
For a start, 1000 attributes in a class is mindboggling. You must be treating the vast majority generically using value = getattr(obj, attr_name) and setattr(obj, attr_name, value). Consider using a dict instead of an instance: value = obj[attr_name] and obj[attr_name] = value.
Secondly, what percentage of those 6 million attributes are ""? If sufficiently high, you might like to consider implementing a sparse dict which doesn't physically have entries for those attributes, using the __missing__ hook -- docs here.
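A sketch of such a sparse mapping via __missing__ (the class name is hypothetical):

class SparseDict(dict):
    # Pretend every absent key maps to "" without physically storing
    # those entries, so empty attributes cost no memory.
    def __missing__(self, key):
        return ""

obj = SparseDict()
obj["name"] = "Bob"
print(obj["name"])       # 'Bob'
print(obj["nickname"])   # '' -- never stored
print(len(obj))          # 1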