Why isn't the hash function deterministic?

I'm developing a program using Python 3.6.
I have a problem: when I call the built-in hash function (from the language's standard library) on the same object, the value that is printed differs from one run to the next!
For example:
class Generic:
    def __init__(self, id, name, property):
        self.id = id
        self.name = name
        self.property = property

def main():
    my_object = Generic(3, 'ddkdjsdk', 'casualstring')
    print(hash(my_object))

main()
I would like the output to always be the same (deterministic), but unfortunately different numbers appear on the console:
8765256330262, -9223363264515786864, -9223363262437648366 and others...
Why does this happen? I would like to guarantee determinism with this function throughout my application. How do I solve the problem?

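In this case it's probably easiest to define your own __eq__ and __hash__ methods. The hash then depends on the object's field values rather than on its identity (in CPython the default object hash is derived from id(), i.e. the memory address):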
class Generic:
    def __init__(self, id, name, property):
        self.id = id
        self.name = name
        self.property = property

    def __eq__(self, other):
        # Let Python try other.__eq__ instead of failing on a type mismatch
        if not isinstance(other, Generic):
            return NotImplemented
        return (self.id, self.name, self.property) == (other.id, other.name, other.property)

    def __hash__(self):
        return hash((self.id, self.name, self.property))
It also makes equivalent objects compare equal and hash equal:
>>> obj = Generic(1, 'blah', 'blah')
>>> obj2 = Generic(1, 'blah', 'blah')
>>> obj == obj2
True
>>> hash(obj) == hash(obj2)
True
Hope that helps!
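Note that the tuple-based __hash__ above is only consistent within a single interpreter process. Since Python 3.3, hashes of str and bytes objects are salted with a random per-process value (hash randomization), so hash((3, 'ddkdjsdk', 'casualstring')) can still change between runs. The sketch below uses only the standard library (hash_in_fresh_interpreter is a made-up helper name). It hashes one string in fresh child interpreters to show that fixing the PYTHONHASHSEED environment variable makes the result repeatable:

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed=None):
    """Return hash(text) as computed by a brand-new Python process.

    If seed is None, PYTHONHASHSEED is removed from the child's environment,
    so the child picks a random salt; otherwise the salt is fixed to seed.
    """
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Random salt per process: these two values usually differ.
    print(hash_in_fresh_interpreter("casualstring"))
    print(hash_in_fresh_interpreter("casualstring"))
    # Fixed salt: the same value every time.
    print(hash_in_fresh_interpreter("casualstring", seed=0))
    print(hash_in_fresh_interpreter("casualstring", seed=0))
```

You can set PYTHONHASHSEED for a whole application. If you need a value that stays stable across runs and machines (for example, to store it), a hashlib digest is the safer choice.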

If you need hashes of built-in types, Python's built-in hashlib module might be easier than subclassing to redefine __hash__, and its digests don't change from run to run. Here's an example for strings:
from hashlib import md5

def string_hash(string):
    return md5(string.encode()).hexdigest()
This returns the same hash for different string objects as long as their content is the same. Not every kind of object can be hashed this way directly, but it could save you time depending on your use case.
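Applied to the question's Generic class, one way to get a value that is stable from run to run is to feed a canonical representation of the fields to hashlib. This sketch assumes the fields are ints and strings, whose repr() does not change between runs. stable_hash is a hypothetical method name, and sha256 is used here instead of md5:

```python
import hashlib


class Generic:
    def __init__(self, id, name, property):
        self.id = id
        self.name = name
        self.property = property

    def stable_hash(self):
        """Hex digest that depends only on the field values.

        repr() of a tuple of ints/strs is identical in every run, unlike hash(),
        so the digest is reproducible across interpreter launches.
        """
        payload = repr((self.id, self.name, self.property)).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


my_object = Generic(3, "ddkdjsdk", "casualstring")
print(my_object.stable_hash())
```

This is not a replacement for __hash__ in dicts and sets. It is meant for values you persist or compare across runs. Fields such as sets would need to be sorted first, because their iteration order (and therefore their repr) can depend on the hash seed.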

Related

typecast classes in python: how?

Here, I am attempting to mock up a social media profile as a class "Profile", in which you have a name, a group of friends, and the ability to add and remove friends. There is a method I would like to write that, when invoked, will print the list of friends in alphabetical order.
The issue: I get an error saying that I cannot sort unorderable types. Python is seeing my instance variable as a "Profile object", rather than a list that I can sort and print.
Here is my code:
class Profile(object):
    """
    Represent a person's social profile
    Argument:
    name (string): a person's name - assumed to uniquely identify a person
    Attributes:
    name (string): a person's name - assumed to uniquely identify a person
    statuses (list): a list containing a person's statuses - initialized to []
    friends (set): set of friends for the given person.
    it is the set of profile objects representing these friends.
    """

    def __init__(self, name):
        self.name = name
        self.friends = set()
        self.statuses = []

    def __str__(self):
        return self.name + " is " + self.get_last_status()

    def update_status(self, status):
        self.statuses.append(status)
        return self

    def get_last_status(self):
        if len(self.statuses) == 0:
            return "None"
        else:
            return self.statuses[-1]

    def add_friend(self, friend_profile):
        self.friends.add(friend_profile)
        friend_profile.friends.add(self)
        return self

    def get_friends(self):
        if len(self.friends) == 0:
            return "None"
        else:
            friends_lst = list(self.friends)
            return sorted(friends_lst)
After I fill out a list of friends (from a test module) and invoke the get_friends method, Python tells me:
  File "/home/tjm/Documents/CS021/social.py", line 84, in get_friends
    return sorted(friends_lst)
TypeError: unorderable types: Profile() < Profile()
Why can't I simply typecast the object to get it in list form? What should I be doing instead so that get_friends will return an alphabetically sorted list of friends?

sorted() compares items with the < operator, so the class must define __lt__ for its instances to be sortable. Defining the other rich comparison methods (__eq__, __ne__, __le__, __gt__, __ge__) as well keeps comparisons consistent. You need to override those methods to control how instances compare.
For performance reasons, I'd recommend defining an integer property such as id on your class and comparing on that instead of name, which has string-comparison overhead.
class Profile(object):
    def __eq__(self, profile):
        return self.id == profile.id  # id is a made-up property

    def __lt__(self, profile):
        return self.id < profile.id

    def __hash__(self):
        return hash(self.id)

    ...
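If you'd rather not write all six comparison methods, functools.total_ordering derives the missing ones from __eq__ and one ordering method. Here is a minimal sketch that compares on name (which the question says uniquely identifies a person) instead of the made-up id:

```python
from functools import total_ordering


@total_ordering
class Profile(object):
    def __init__(self, name):
        self.name = name
        self.friends = set()

    def __eq__(self, other):
        if not isinstance(other, Profile):
            return NotImplemented
        return self.name == other.name

    def __lt__(self, other):
        if not isinstance(other, Profile):
            return NotImplemented
        return self.name < other.name

    def __hash__(self):
        # Must agree with __eq__, because profiles are stored in sets.
        return hash(self.name)

    def __repr__(self):
        return "Profile(%r)" % self.name


print(sorted([Profile("carol"), Profile("alice"), Profile("bob")]))
```

total_ordering fills in __le__, __gt__ and __ge__. __hash__ is still written by hand, since defining __eq__ alone would make instances unhashable.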
Alternatively, if you don't want to bother overriding those methods, you can pass a key function to the sort:
>>> # friend_list holds Profile objects with ids 120, 121 and 115
>>> friend_list.sort(key=lambda p: p.id, reverse=True)
Or, using operator.attrgetter:
>>> import operator
>>> new_friend_list = sorted(friend_list, key=operator.attrgetter('id'))
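The question asks for an alphabetical list, and the key-function approach needs no comparison methods at all for that. Below is a sketch of a trimmed-down Profile (other methods omitted) whose get_friends sorts by name. Unlike the original, it returns an empty list rather than the string "None" when there are no friends:

```python
from operator import attrgetter


class Profile(object):
    def __init__(self, name):
        self.name = name
        self.friends = set()

    def add_friend(self, friend_profile):
        self.friends.add(friend_profile)
        friend_profile.friends.add(self)
        return self

    def get_friends(self):
        # Sort the friends by their name attribute; Profile needs no __lt__.
        return sorted(self.friends, key=attrgetter("name"))


tom = Profile("tom")
tom.add_friend(Profile("zoe")).add_friend(Profile("amy"))
print([friend.name for friend in tom.get_friends()])
```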

I think I'll take a crack at this. First, here's the code:
from collections import namedtuple

class Profile(namedtuple("Profile", "name")):
    def __init__(self, name):
        # don't set self.name, it's already set!
        self.friends = set()
        self.statuses = []

    # ... and all the rest the same. Only the base class changes.
What we've done here is create a class with the shape of a tuple. As such, it's orderable (by name), hashable, and so on. You could even drop your __str__() method: namedtuple provides a readable __repr__, which str() falls back to.
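Here is a runnable version of the namedtuple approach, with the question's add_friend and get_friends carried over (other methods omitted). Sorting works because tuples compare element by element, which here means by name alone. Be aware that equality and hashing now depend only on name, and that a Profile also compares equal to a plain tuple such as ("amy",):

```python
from collections import namedtuple


class Profile(namedtuple("Profile", "name")):
    def __init__(self, name):
        # name is already stored in the tuple by namedtuple's __new__;
        # this subclass has no __slots__, so it can hold extra attributes.
        self.friends = set()
        self.statuses = []

    def add_friend(self, friend_profile):
        self.friends.add(friend_profile)
        friend_profile.friends.add(self)
        return self

    def get_friends(self):
        # Tuples are orderable, so no key function is needed.
        return sorted(self.friends)


tom = Profile("tom")
tom.add_friend(Profile("zoe")).add_friend(Profile("amy"))
print(tom.get_friends())
```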

Ruby like DSL in Python

I'm currently writing my first bigger project in Python, and I'm wondering how to define a class method so that it can be called in the class body of a subclass.
First, to give some more context, here is a stripped-down example (I removed everything non-essential for this question) of how I'd do what I'm trying to do in Ruby:
If I define a class Item like this:
class Item
  def initialize(data={})
    @data = data
  end

  def self.define_field(name)
    define_method("#{name}"){ instance_variable_get("@data")[name.to_s] }
    define_method("#{name}=") do |value|
      instance_variable_get("@data")[name.to_s] = value
    end
  end
end
I can use it like this:
class MyItem < Item
  define_field("name")
end

item = MyItem.new
item.name = "World"
puts "Hello #{item.name}!"
So far I've tried to achieve something similar in Python, but I'm not happy with the result:
class ItemField(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, item, owner=None):
        return item.values[self.name]

    def __set__(self, item, value):
        item.values[self.name] = value

    def __delete__(self, item):
        del item.values[self.name]

class Item(object):
    def __init__(self, data=None):
        if data is None:
            data = {}
        self.values = data
        for field in type(self).fields:
            self.values[field.name] = None
            setattr(self, field.name, field)

    @classmethod
    def define_field(cls, name):
        if not hasattr(cls, "fields"):
            cls.fields = []
        cls.fields.append(ItemField(name))
Now I don't know how I can call define_field from within a subclass's body. This is what I wish were possible:
class MyItem(Item):
    define_field("name")

item = MyItem({"name": "World"})
print("Hello {}!".format(item.name))
item.name = "reader"
print("Hello {}!".format(item.name))
There's a similar question, but none of the answers are really satisfying. Somebody recommends calling the function with __func__(), but I guess I can't do that, because I can't get a reference to the class from within its anonymous body (please correct me if I'm wrong about this).
Somebody else pointed out that it's better to use a module-level function for this, which I also think would be the easiest way. However, my main intention is to keep the implementation of subclasses clean, and having to import that module function wouldn't be too nice either. (Also, I'd have to make the function call outside the class body, which I think is messy.)
So basically I think my approach is wrong, because Python wasn't designed to allow this kind of thing. What would be the best way to achieve something like the Ruby example in Python?
(If there's no better way, I've already thought about just having a method in the subclass that returns a list of the arguments for the define_field method.)

Perhaps calling a class method isn't the right route here. Python executes the class body first and only creates the class object afterwards, so the class doesn't exist yet at the point where you'd call the class method to create an attribute.
It looks like you want to create something like a record. First, note that Python allows you to add attributes to instances of your own classes after they have been created:
class Foo(object):
    pass

>>> foo = Foo()
>>> foo.x = 42
>>> foo.x
42
Maybe you want to constrain which attributes the user can set. Here's one way.
class Item(object):
    def __init__(self):
        if type(self) is Item:
            raise NotImplementedError("Item must be subclassed.")

    def __setattr__(self, name, value):
        if name not in self.fields:
            raise AttributeError("Invalid attribute name.")
        else:
            self.__dict__[name] = value

class MyItem(Item):
    fields = ("foo", "bar", "baz")
So that:
>>> m = MyItem()
>>> m.foo = 42 # works
>>> m.bar = "hello" # works
>>> m.test = 12 # raises AttributeError
Lastly, the above still allows the user to subclass Item without defining fields, like so:
class MyItem(Item):
    pass
This will result in a cryptic AttributeError saying that the attribute fields could not be found. You can require that the fields attribute be defined at class-creation time by using a metaclass. Furthermore, you can spare the user from specifying the metaclass by having them inherit from a base class that you've written to use it:
class ItemMetaclass(type):
    def __new__(cls, clsname, bases, dct):
        if "fields" not in dct:
            raise TypeError("Subclass must define 'fields'.")
        return type.__new__(cls, clsname, bases, dct)

# Python 3 syntax; in Python 2, write __metaclass__ = ItemMetaclass in the class body instead.
class Item(object, metaclass=ItemMetaclass):
    fields = None

    def __init__(self):
        if type(self) == Item:
            raise NotImplementedError("Must subclass Item.")

    def __setattr__(self, name, value):
        if name in self.fields:
            self.__dict__[name] = value
        else:
            raise AttributeError("The item has no such attribute.")

class MyItem(Item):
    fields = ("one", "two", "three")
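On Python 3.6 and later, __init_subclass__ is a lighter alternative to a custom metaclass for the same check. It runs once for each subclass, right after that class has been created. Here is a sketch of the Item/MyItem example rewritten that way:

```python
class Item(object):
    fields = None

    def __init_subclass__(cls, **kwargs):
        # Called for every subclass, after its class body has executed.
        super().__init_subclass__(**kwargs)
        if "fields" not in cls.__dict__:
            raise TypeError("Subclass %s must define 'fields'." % cls.__name__)

    def __init__(self):
        if type(self) is Item:
            raise NotImplementedError("Item must be subclassed.")

    def __setattr__(self, name, value):
        if name in self.fields:
            object.__setattr__(self, name, value)
        else:
            raise AttributeError("The item has no such attribute: %r" % name)


class MyItem(Item):
    fields = ("one", "two", "three")


m = MyItem()
m.one = 1
print(m.one)
```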

You're almost there! If I understand you correctly:
class Item(object):
    def __init__(self, data=None):
        data = data or {}
        for field, value in data.items():
            if hasattr(self, field):
                setattr(self, field, value)

    @classmethod
    def define_field(cls, name):
        setattr(cls, name, None)
EDIT: As far as I know, it's not possible to access the class while it is being defined. You can, however, call the method from the __init__ method:
class Something(Item):
    def __init__(self):
        type(self).define_field("name")
But then you're just reinventing the wheel.

When defining a class, you cannot reference the class itself inside its own definition block, so you have to call define_field(...) on MyItem after its definition. For example:
class MyItem(Item):
    pass
MyItem.define_field("name")
item = MyItem({"name": "World"})
print("Hello {}!".format(item.name))
item.name = "reader"
print("Hello {}!".format(item.name))
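On Python 3.6 and later, the closest equivalent of Ruby's define_field inside the class body is arguably to declare each field as a descriptor class attribute. __set_name__ then tells the descriptor which name it was assigned to, so the name isn't repeated. The sketch below reuses the question's ItemField idea. The fields tuple bookkeeping is my own assumption about how Item should learn which fields exist:

```python
class ItemField(object):
    """Descriptor that stores its value in the owning item's `values` dict."""

    def __set_name__(self, owner, name):
        # Called automatically while the owning class is being created.
        self.name = name
        # Build a new tuple so a parent class's fields are not mutated.
        owner.fields = getattr(owner, "fields", ()) + (name,)

    def __get__(self, item, owner=None):
        if item is None:
            return self
        return item.values[self.name]

    def __set__(self, item, value):
        item.values[self.name] = value

    def __delete__(self, item):
        del item.values[self.name]


class Item(object):
    fields = ()

    def __init__(self, data=None):
        # Every declared field starts as None; passed-in data overrides it.
        self.values = {name: None for name in type(self).fields}
        self.values.update(data or {})


class MyItem(Item):
    name = ItemField()


item = MyItem({"name": "World"})
print("Hello {}!".format(item.name))
item.name = "reader"
print("Hello {}!".format(item.name))
```

The class body reads `name = ItemField()` rather than `define_field("name")`. The field is still declared in the same place as in the Ruby version, and nothing needs to be called after the class definition.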

Python "callable" attribute (pseudo-property)

In python, I can alter the state of an instance by directly assigning to attributes, or by making method calls which alter the state of the attributes:
foo.thing = 'baz'
or:
foo.thing('baz')
Is there a nice way to create a class that accepts both of the above forms and scales to large numbers of attributes that behave this way? (Shortly, I'll show an example of an implementation that I don't particularly like.) If you're thinking that this is a stupid API, let me know, but perhaps a more concrete example is in order. Say I have a Document class. Document could have an attribute title. However, title may want to have some state as well (font, fontsize, justification, ...), but the average user might be happy enough just setting the title to a string and being done with it...
One way to accomplish this would be to:
class Title(object):
    def __init__(self, text, font='times', size=12):
        self.text = text
        self.font = font
        self.size = size

    def __call__(self, *text, **kwargs):
        if text:
            self.text = text[0]
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __str__(self):
        return '<title font={font}, size={size}>{text}</title>'.format(text=self.text, size=self.size, font=self.font)

class Document(object):
    _special_attr = set(['title'])

    def __setattr__(self, k, v):
        if k in self._special_attr and hasattr(self, k):
            getattr(self, k)(v)
        else:
            object.__setattr__(self, k, v)

    def __init__(self, text="", title=""):
        self.title = Title(title)
        self.text = text

    def __str__(self):
        return str(self.title) + '<body>' + self.text + '</body>'
Now I can use this as follows:
doc = Document()
doc.title = "Hello World"
print(doc)
doc.title("Goodbye World", font="Helvetica")
print(doc)
This implementation seems a little messy, though (with _special_attr). Maybe that's because this is a messed-up API. I'm not sure. Is there a better way to do this? Or did I leave the beaten path a little too far on this one?
I realize I could use @property for this as well, but that wouldn't scale well at all if I had more than one attribute that behaves this way; I'd need to write a getter and setter for each. Yuck.

It is a bit harder than the previous answers assume.
Any value stored in the descriptor will be shared between all instances, so it is not the right place to store per-instance data.
Also, obj.attrib(...) is performed in two steps:
tmp = obj.attrib
tmp(...)
Python doesn't know in advance that the second step will follow, so you always have to return something that is callable and has a reference to its parent object.
In the following example, that reference is carried by the setter argument:
class CallableString(str):
    def __new__(cls, setter, value):
        inst = str.__new__(cls, value)
        inst._set = setter
        return inst

    def __call__(self, value):
        self._set(value)

class A(object):
    def __init__(self):
        self._attrib = "foo"

    def get_attrib(self):
        return CallableString(self.set_attrib, self._attrib)

    def set_attrib(self, value):
        # Store a plain str, even if a CallableString was passed in
        self._attrib = str(value)

    attrib = property(get_attrib, set_attrib)

a = A()
print(a.attrib)
a.attrib = "bar"
print(a.attrib)
a.attrib("baz")
print(a.attrib)
In short: what you want cannot be done transparently. You'll write better Python code if you don't insist on hacking around this limitation.

You can avoid having to use @property on potentially hundreds of attributes by creating a descriptor class that follows the appropriate rules:
# Warning: untested code ahead
class DocAttribute(object):
    tag_str = "<{tag}{attrs}>{text}</{tag}>"

    def __init__(self, tag_name, default_attrs=None):
        self._tag_name = tag_name
        self._attrs = default_attrs if default_attrs is not None else {}
        self._text = ""

    def __call__(self, *text, **attrs):
        self._text = "".join(text)
        self._attrs.update(attrs)
        return self

    def __get__(self, instance, cls):
        return self

    def __set__(self, instance, value):
        self._text = value

    def __str__(self):
        # Attrs left as an exercise for the reader
        return self.tag_str.format(tag=self._tag_name, attrs="", text=self._text)
Then you can use Document's __setattr__ method to add a descriptor based on this class if it is in a white list of approved names (or not in a black list of forbidden ones, depending on your domain):
class Document(object):
    # prelude

    def __setattr__(self, name, value):
        if self.is_allowed(name):  # Again, left as an exercise for the reader
            object.__setattr__(self, name, DocAttribute(name)(value))
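As the earlier answer points out, values stored on the descriptor itself are shared between all instances. One way to scale the question's Title idea to many attributes is therefore a descriptor that keeps a separate tag object for each Document instance. This is a sketch: Tag and TagAttribute are hypothetical names, and __set_name__ requires Python 3.6+:

```python
class Tag(object):
    """Per-instance state for one tagged attribute: text plus formatting."""

    def __init__(self, tag, text="", font="times", size=12):
        self.tag = tag
        self.text = text
        self.font = font
        self.size = size

    def __call__(self, *text, **kwargs):
        if text:
            self.text = text[0]
        for key, value in kwargs.items():
            setattr(self, key, value)
        return self

    def __str__(self):
        return "<{tag} font={font}, size={size}>{text}</{tag}>".format(**vars(self))


class TagAttribute(object):
    """doc.x = "..." sets the text; doc.x("...", font=...) also sets options."""

    def __set_name__(self, owner, name):
        self.name = name

    def _tag_for(self, instance):
        # Create this instance's Tag lazily and keep it in the instance __dict__.
        tag = instance.__dict__.get(self.name)
        if tag is None:
            tag = instance.__dict__[self.name] = Tag(self.name)
        return tag

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self._tag_for(instance)

    def __set__(self, instance, value):
        self._tag_for(instance)(value)


class Document(object):
    title = TagAttribute()
    subtitle = TagAttribute()

    def __init__(self, text="", title="", subtitle=""):
        self.title = title
        self.subtitle = subtitle
        self.text = text

    def __str__(self):
        return str(self.title) + str(self.subtitle) + "<body>" + self.text + "</body>"


doc = Document()
doc.title = "Hello World"
doc.title("Goodbye World", font="Helvetica")
print(doc)
```

TagAttribute defines __set__, so it is a data descriptor. Attribute lookup therefore goes through it rather than through the entry it stores under the same name in the instance's __dict__, and each new attribute needs only one line in the class body.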

Can you easily create a list-like object in python that uses something like a descriptor for its items?

I'm trying to write an interface that abstracts another interface somewhat.
The bottom interface is somewhat inconsistent about what it requires: sometimes ids, and sometimes names. I'm trying to hide details like these.
I want to create a list-like object that will allow you to add names to it, but internally store the ids associated with those names.
Preferably, I'd like to use something like descriptors for class attributes, except that they would work on list items instead. That is, one function (like __set__) is called for everything added to the list to convert it to the ids I want to store internally, and another function (like __get__) returns objects (that provide convenience methods) instead of the actual ids when items are retrieved from the list.
So that I can do something like this:
def get_thing_id_from_name(name):
    # assume that this is more complicated
    return other_api.get_id_from_name_or_whatever(name)

class Thing(object):
    def __init__(self, thing_id):
        self.id = thing_id
        self.name = other_api.get_name_somehow(thing_id)

    def __eq__(self, other):
        if isinstance(other, basestring):
            return self.name == other
        if isinstance(other, Thing):
            return self.id == other.id
        return NotImplemented

tl = ThingList()
tl.append('thing_one')
tl.append('thing_two')
tl[1] = 'thing_three'
print tl[0].id
print tl[0] == 'thing_one'
print tl[1] == Thing(3)
The documentation recommends defining 17 methods (not including a constructor) for an object that acts like a mutable sequence. I don't think subclassing list is going to help me out at all. It feels like I ought to be able to achieve this just defining a getter and setter somewhere.
UserList is apparently deprecated (although it is in Python 3? I'm using 2.7, though).
Is there a way to achieve this, or something similar, without having to redefine so much functionality?

You don't need to override all the list methods: __setitem__, __init__ and append should be enough, though you may want insert and some others as well. You could write __setitem__ and __getitem__ to call __set__ and __get__ methods on a special "Thing" class, exactly as descriptors do.
Here is a short example - maybe something like what you want:
class Thing(object):
    def __init__(self, thing):
        self.value = thing
        self.name = str(thing)

    id = property(lambda s: id(s))
    #...

    def __repr__(self):
        return "I am a %s" % self.name

class ThingList(list):
    def __init__(self, items):
        for item in items:
            self.append(item)

    def append(self, value):
        list.append(self, Thing(value))

    def __setitem__(self, index, value):
        list.__setitem__(self, index, Thing(value))
Example:
>>> a = ThingList(range(3))
>>> a.append("three")
>>> a
[I am a 0, I am a 1, I am a 2, I am a three]
>>> a[0].id
35242896
>>>
-- edit --
The O.P. commented: "I was really hoping that there would be a way to have all the functionality from list - addition, extending, slices etc. and only have to redefine the get/set item behaviour."
So mote it be: one really has to override all relevant methods in this way. But if what we want to avoid is a lot of boilerplate code with many functions doing almost the same thing, the new, overridden methods can be generated dynamically. All we need is a decorator that turns ordinary objects into Things for all operations that set values:
# Python 2 code, like the question: list.__setslice__ does not exist in Python 3.
class Thing(object):
    # Prevents duplicating the wrapping of objects:
    def __new__(cls, thing):
        if isinstance(thing, cls):
            return thing
        return object.__new__(cls)

    def __init__(self, thing):
        if thing is self:
            # __new__ returned an existing Thing; it is already initialized
            return
        self.value = thing
        self.name = str(thing)

    id = property(lambda s: id(s))
    #...

    def __repr__(self):
        return "I am a %s" % self.name

def converter(func, cardinality=1):
    def new_func(*args):
        if len(args) < 2:
            # e.g. MyList() with no initial items: nothing to convert
            return func(*args)
        # Pick the last item in the argument list, which
        # for all item setter methods on a list is the one
        # which actually contains the values
        if cardinality == 1:
            args = args[:-1] + (Thing(args[-1]),)
        else:
            args = args[:-1] + ([Thing(item) for item in args[-1]],)
        return func(*args)
    new_func.__name__ = func.__name__
    return new_func

my_list_dict = {}
for single_setter in ("__setitem__", "append", "insert"):
    my_list_dict[single_setter] = converter(getattr(list, single_setter), cardinality=1)
for many_setter in ("__setslice__", "__add__", "__iadd__", "__init__", "extend"):
    my_list_dict[many_setter] = converter(getattr(list, many_setter), cardinality="many")
MyList = type("MyList", (list,), my_list_dict)
And it works thus:
>>> a = MyList()
>>> a
[]
>>> a.append(5)
>>> a
[I am a 5]
>>> a + [2,3,4]
[I am a 5, I am a 2, I am a 3, I am a 4]
>>> a.extend(range(4))
>>> a
[I am a 5, I am a 0, I am a 1, I am a 2, I am a 3]
>>> a[1:2] = range(10,12)
>>> a
[I am a 5, I am a 10, I am a 11, I am a 1, I am a 2, I am a 3]
>>>
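The question asks how to do this without redefining so much functionality. The standard library's collections.abc.MutableSequence (collections.MutableSequence on Python 2.7) derives append, extend, pop, remove, reverse, index, count, __contains__, __iter__ and += from five methods: __getitem__, __setitem__, __delitem__, __len__ and insert. Below is a Python 3 sketch in which those five methods do the name-to-id conversion. The dictionaries are hypothetical stand-ins for the question's other_api:

```python
from collections.abc import MutableSequence

# Hypothetical stand-in for the lower-level API's name <-> id lookups.
_ID_BY_NAME = {"thing_one": 1, "thing_two": 2, "thing_three": 3}
_NAME_BY_ID = {v: k for k, v in _ID_BY_NAME.items()}


def get_thing_id_from_name(name):
    return _ID_BY_NAME[name]


class Thing(object):
    def __init__(self, thing_id):
        self.id = thing_id
        self.name = _NAME_BY_ID[thing_id]

    def __eq__(self, other):
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, Thing):
            return self.id == other.id
        return NotImplemented

    def __repr__(self):
        return "Thing(%r)" % self.id


class ThingList(MutableSequence):
    """Stores ids internally; accepts names (or Things) and returns Things."""

    def __init__(self, items=()):
        self._ids = []
        self.extend(items)  # extend() is provided by MutableSequence

    @staticmethod
    def _to_id(value):
        # "Setter" conversion: names (or Things) in, ids out.
        return value.id if isinstance(value, Thing) else get_thing_id_from_name(value)

    def __getitem__(self, index):
        # "Getter" conversion: ids in storage, Things out.
        if isinstance(index, slice):
            return [Thing(i) for i in self._ids[index]]
        return Thing(self._ids[index])

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._ids[index] = [self._to_id(v) for v in value]
        else:
            self._ids[index] = self._to_id(value)

    def __delitem__(self, index):
        del self._ids[index]

    def __len__(self):
        return len(self._ids)

    def insert(self, index, value):
        self._ids.insert(index, self._to_id(value))


tl = ThingList()
tl.append("thing_one")
tl.append("thing_two")
tl[1] = "thing_three"
print(tl[0].id, tl[0] == "thing_one", tl[1] == Thing(3))
```

Unlike a list subclass, this does not provide + between ThingLists, and slicing returns a plain list of Things. Both would need extra methods if you want them.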

Namespaces inside class in Python3

I am new to Python and I wonder if there is any way to aggregate methods into 'subspaces'. I mean something similar to this syntax:
smth = Something()
smth.subspace.do_smth()
smth.another_subspace.do_smth_else()
I am writing an API wrapper and I'm going to have a lot of very similar methods (differing only in the URI), so I thought it would be good to place them in a few subspaces that correspond to the API request categories. In other words, I want to create namespaces inside a class. I don't know if this is even possible in Python, and I have no idea what to search for on Google.
I will appreciate any help.

One way to do this is by defining subspace and another_subspace as properties that return objects that provide do_smth and do_smth_else respectively:
class Something:
    @property
    def subspace(self):
        class SubSpaceClass:
            def do_smth(other_self):
                print('do_smth')
        return SubSpaceClass()

    @property
    def another_subspace(self):
        class AnotherSubSpaceClass:
            def do_smth_else(other_self):
                print('do_smth_else')
        return AnotherSubSpaceClass()
Which does what you want:
>>> smth = Something()
>>> smth.subspace.do_smth()
do_smth
>>> smth.another_subspace.do_smth_else()
do_smth_else
Depending on what you intend to use the methods for, you may want to make SubSpaceClass a singleton, but I doubt the performance gain is worth it.
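For the API-wrapper use case, the nested classes above have no access to the outer object, and a new class is created on every attribute access. A common alternative is plain composition: each namespace is a small object that holds a reference to the client and is created once in __init__. In this sketch, Client, Users, Orders, request and the base URL are all hypothetical, and request only formats a string instead of sending anything:

```python
class _Subspace(object):
    """Base for a group of related API calls; keeps a reference to the client."""

    def __init__(self, client):
        self._client = client


class Users(_Subspace):
    def get(self, user_id):
        return self._client.request("GET", "/users/{}".format(user_id))


class Orders(_Subspace):
    def list_all(self):
        return self._client.request("GET", "/orders")


class Client(object):
    def __init__(self, base_url):
        self.base_url = base_url
        # Each namespace is created once and shares this client's state.
        self.users = Users(self)
        self.orders = Orders(self)

    def request(self, method, path):
        # Placeholder: a real wrapper would send an HTTP request here.
        return "{} {}{}".format(method, self.base_url, path)


client = Client("https://api.example.com")
print(client.users.get(3))
print(client.orders.list_all())
```

Shared state such as a session, credentials or the base URL then lives in one place, and each category's methods only differ in the path they build.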

I had this need a couple years ago and came up with this:
from functools import partial

class Registry(object):
    """Namespace within a class."""

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        else:
            return InstanceRegistry(self, obj)

    def __call__(self, name=None):
        def decorator(f):
            use_name = name or f.__name__
            if hasattr(self, use_name):
                raise ValueError("%s is already registered" % use_name)
            setattr(self, use_name, f)
            return f
        return decorator

class InstanceRegistry(object):
    """
    Helper for accessing a namespace from an instance of the class.
    Used internally by :class:`Registry`. Returns a partial that will pass
    the instance as the first parameter.
    """

    def __init__(self, registry, obj):
        self.__registry = registry
        self.__obj = obj

    def __getattr__(self, attr):
        return partial(getattr(self.__registry, attr), self.__obj)

# Usage:
class Something:
    subspace = Registry()
    another_subspace = Registry()

    @subspace()
    def do_smth(self):
        # `self` will be an instance of Something
        pass

    @another_subspace('do_smth_else')
    def this_can_be_called_anything_and_take_any_parameter_name(obj, other):
        # Call it `obj` or whatever else if `self` outside a class is unsettling
        pass
At runtime:
>>> smth = Something()
>>> smth.subspace.do_smth()
>>> smth.another_subspace.do_smth_else('other')
This is compatible with Py2 and Py3. Some performance optimizations are possible in Py3 because __set_name__ tells us what the namespace is called and allows caching the instance registry.
