Python dictionary with a list as key - python

I'm coding an N-th order Markov chain.
It goes something like this:
class Chain:
    def __init__(self, order):
        self.order = order
        self.state_table = {}

    def train(self, next_state, *prev_states):
        if len(prev_states) != self.order:
            raise ValueError("prev_states does not match chain order")
        if prev_states in self.state_table:
            if next_state in self.state_table[prev_states]:
                self.state_table[prev_states][next_state] += 1
            else:
                self.state_table[prev_states][next_state] = 0
        else:
            self.state_table[prev_states] = {next_state: 0}
Unfortunately, lists and tuples are unhashable, and I cannot use them as keys in dicts...
I have hopefully explained my problem well enough for you to understand what I am trying to achieve.
Any good ideas how I can use multiple values as a dictionary key?
Followup question:
I did not know that tuples are hashable.
But the entropy of the hashes seems low. Are hash collisions possible for tuples?!

Tuples are hashable when their contents are.
>>> a = {}
>>> a[(1,2)] = 'foo'
>>> a[(1,[])]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
As for collisions, when I try a bunch of very similar tuples, I see them being mapped widely apart:
>>> hash((1,2))
3713081631934410656
>>> hash((1,3))
3713081631933328131
>>> hash((2,2))
3713082714462658231
>>> abs(hash((1,2)) - hash((1,3)))
1082525
>>> abs(hash((1,2)) - hash((2,2)))
1082528247575
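Collisions for tuples are nevertheless possible. A minimal CPython-specific sketch (it relies on the implementation detail that hash() never returns -1, so hash(-1) is remapped to -2): two distinct tuples can share a hash, and the dict still keeps them apart by falling back to ==.

```python
# CPython implementation detail: hash() may never return -1 (it is the
# error sentinel in the C API), so hash(-1) is remapped to -2, which
# gives the distinct tuples (-1,) and (-2,) identical hashes.
assert hash(-1) == hash(-2) == -2
assert (-1,) != (-2,) and hash((-1,)) == hash((-2,))

# Equal hashes only mean the same bucket; the dict then compares
# candidate keys with ==, so both entries survive.
d = {(-1,): 'first', (-2,): 'second'}
assert d[(-1,)] == 'first' and d[(-2,)] == 'second'
```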

You can use tuples as dictionary keys; they are hashable as long as their contents are hashable (as @larsman said).
Don't worry about collisions; Python's dict takes care of them.
>>> hash('a')
12416037344
>>> hash(12416037344)
12416037344
>>> hash('a') == hash(12416037344)
True
>>> {'a': 'one', 12416037344: 'two'}
{'a': 'one', 12416037344: 'two'}
In this example I took a string and an integer, but it works the same with tuples. I just didn't have any idea how to find two tuples with identical hashes.
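Putting this back into the original chain: *prev_states already arrives as a tuple, so it can be used as a dict key directly. A minimal sketch of a corrected train() (note the count starts at 1 rather than 0, so the first observation is actually recorded):

```python
class Chain:
    def __init__(self, order):
        self.order = order
        self.state_table = {}  # (prev_1, ..., prev_n) -> {next_state: count}

    def train(self, next_state, *prev_states):
        if len(prev_states) != self.order:
            raise ValueError("prev_states does not match chain order")
        # prev_states is a tuple, hence hashable and a valid dict key
        counts = self.state_table.setdefault(prev_states, {})
        counts[next_state] = counts.get(next_state, 0) + 1

chain = Chain(2)
chain.train('c', 'a', 'b')
chain.train('c', 'a', 'b')
print(chain.state_table)  # {('a', 'b'): {'c': 2}}
```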

Related

What happens when you call set() on a dictionary in Python?

I was trying out some ideas for an unrelated problem when I came across this behavior of set() applied to a dictionary:
a = {"a": 1}
b = {"b": 2}
c = {"a": 1}
set([a, b, c])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-7c1da7b47bae> in <module>
----> 1 set([a, b, c])
TypeError: unhashable type: 'dict'
d = {"a": 1, "b": 2}
set(d)
Out[23]: {'a', 'b'}
set(a)
Out[24]: {'a'}
I kind of understand why dictionaries are unhashable (they're mutable), but the whole thing does not make much sense to me. Why do set(a) and set(d) return just the keys, and how would that be useful?
Thank you!
set converts an arbitrary iterable to a set, which means getting an iterator for its argument. The iterator for a dict returns its keys. It's not so much about set being useful with a dict, but set not caring what its argument is.
As for why a dict iterator returns the keys of the dict, it's a somewhat arbitrary choice made by the language designer, but keep in mind that given the choice of iterating over the keys, the values, or the key-value pairs, iterating over the keys is probably the best compromise between usefulness and simplicity. (All three are available explicitly via d.keys(), d.values(), and d.items(); in some sense iter(d) is a convenience for the common use case of d.keys().)
The function set(x) takes any iterable x and puts all the iterated values into a set.
Dictionaries are iterable. When you iterate through a dictionary, you are given the keys. So set(d) will give you a new set containing all the keys of the dict d.
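A short sketch lining up plain iteration with the three explicit views (assuming Python 3.7+, where dicts preserve insertion order):

```python
d = {'a': 1, 'b': 2}

# Iterating a dict yields its keys, so these all agree:
assert list(iter(d)) == list(d.keys()) == ['a', 'b']
assert set(d) == {'a', 'b'}

# The other two views must be requested explicitly:
assert list(d.values()) == [1, 2]
assert list(d.items()) == [('a', 1), ('b', 2)]

# set() treats any iterable the same way:
assert set(d.values()) == {1, 2}
```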

Hash function for collection of items that disregards ordering

I am using the hash() function to get the hash value of my object, which contains two integers and two strings. Moreover, I have a dictionary where I store these objects; the process is that I check whether an object already exists via its hash value: if yes, I update it, and if not, I insert the new one.
The thing is that when creating the objects I do not know the order of the object's variables, and I want to treat objects as the same no matter the order of these variables.
Is there an alternative function to the hash() function that does not consider the order of the variables?
#Consequently what I want is:
hash((int1,str1,int2,str2)) == hash((int2,str2,int1,str1))
You could use a frozenset instead of a tuple:
>>> hash(frozenset([1, 2, 'a', 'b']))
1190978740469805404
>>>
>>> hash(frozenset([1, 'a', 2, 'b']))
1190978740469805404
>>>
>>> hash(frozenset(['a', 2, 'b', 1]))
1190978740469805404
However, the removal of duplicates from the iterable presents a subtle problem:
>>> hash(frozenset([1,2,1])) == hash(frozenset([1,2,2]))
True
You can fix this by creating a counter from the iterable using collections.Counter, and calling frozenset on the counter's items, thus preserving the count of each item from the original iterable:
>>> from collections import Counter
>>>
>>> hash(frozenset(Counter([1,2,1]).items()))
-307001354391131208
>>> hash(frozenset(Counter([1,1,2]).items()))
-307001354391131208
>>>
>>> hash(frozenset(Counter([1,2,1]).items())) == hash(frozenset(Counter([1,2,2]).items()))
False
Usually for things like this it helps immeasurably if you post some sample code, but I'll assume you've got something like this:
class Foo:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __hash__(self):
        return hash((self.x, self.y))
You're taking a hash of a tuple there, which does care about order. If you want your hash to not care about the order of the ints, then just use a frozenset:
def __hash__(self):
    return hash(frozenset([self.x, self.y]))
If the range of the values is not too great, you could add the item hashes together; that way the order is disregarded, although it does increase the chance of two different collections hashing to the same value:
def hash_list(items):
    value = 0
    for item in items:
        value += hash(item)
    return value

>>> hash_list(['a', 'b', 'c'])
8409777985338339540
>>> hash_list(['b', 'a', 'c'])
8409777985338339540
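Tying the answers above together in a class (the Bag name and fields attribute here are illustrative, not from the question): an order-insensitive __hash__ should be paired with a matching __eq__, and the Counter trick keeps duplicates significant:

```python
from collections import Counter

class Bag:
    """Illustrative container whose identity ignores field order
    but still distinguishes duplicate counts."""
    def __init__(self, *fields):
        self.fields = fields

    def _key(self):
        # frozenset of (item, count) pairs: order-free, duplicate-aware
        return frozenset(Counter(self.fields).items())

    def __eq__(self, other):
        return isinstance(other, Bag) and self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

b1, b2 = Bag(1, 'a', 2, 'b'), Bag(2, 'b', 1, 'a')
assert b1 == b2 and hash(b1) == hash(b2)   # order ignored
assert Bag(1, 1, 2) != Bag(1, 2, 2)        # duplicates still matter
```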

Hash sum of an object in python

In my script I work with a large and complex object (a multi-dimensional list that contains strings, dictionaries, and instances of custom classes). I need to copy, pickle (cache), and unpickle it, as well as send it between child processes through an MPI interface. At some point I became suspicious about whether the data transfer is error-free, i.e. whether I end up with the same object.
Therefore, I want to calculate its hash sum or some other type of fingerprint. I know that there is, for example, hashlib library; however, it is limited in terms of object type:
>>> import hashlib
>>> a = "123"
>>> hashlib.sha224(a.encode()).hexdigest()
'78d8045d684abd2eece923758f3cd781489df3a48e1278982466017f'
>>> a = [1, 2, 3]
>>> hashlib.sha224(a).hexdigest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object supporting the buffer API required
Thus, the question: is there some analog of this function that works with objects of any type?
One option would be to recursively convert all elements of the structure into hashable counterparts, i.e. lists into tuples, dicts and objects into frozensets, and then simply apply hash() to the whole thing. An illustration:
def to_hashable(s):
    if isinstance(s, dict):
        return frozenset((x, to_hashable(y)) for x, y in s.items())
    if isinstance(s, list):
        return tuple(to_hashable(x) for x in s)
    if isinstance(s, set):
        return frozenset(s)
    if isinstance(s, MyObject):
        d = {'__class__': s.__class__.__name__}
        d.update(s.__dict__)
        return to_hashable(d)
    return s

class MyObject:
    pass

class X(MyObject):
    def __init__(self, zzz):
        self.zzz = zzz

my_list = [
    1,
    {'a': [1, 2, 3], 'b': [4, 5, 6]},
    {1, 2, 3, 4, 5},
    X({1: 2, 3: 4}),
    X({5: 6, 7: 8}),
]
print(hash(to_hashable(my_list)))

my_list2 = [
    1,
    {'b': [4, 5, 6], 'a': [1, 2, 3]},
    {5, 4, 3, 2, 1},
    X({3: 4, 1: 2}),
    X({7: 8, 5: 6}),
]
print(hash(to_hashable(my_list2)))  # the same as above
pickle.dumps(...)
returns a bytes object (a str in Python 2), which is hashable. You can do it as follows:
import pickle

a = [1, 2, 3, 4]
h = pickle.dumps(a)
print(hash(h))

# or like this
from hashlib import sha512
print(sha512(h).hexdigest())

c = pickle.loads(h)
assert c == a
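One caveat for the MPI use case: hash() of bytes is randomized per interpreter process (PYTHONHASHSEED), so hash(pickle.dumps(obj)) is not comparable across processes. A hashlib digest of the pickle is stable; a sketch, under the assumption that both sides serialize the object identically (same pickle protocol, same dict insertion order):

```python
import pickle
from hashlib import sha256

def fingerprint(obj, protocol=4):
    # Pin the protocol so both processes produce the same byte stream.
    return sha256(pickle.dumps(obj, protocol=protocol)).hexdigest()

data = [1, 'two', {'three': 4}]
assert fingerprint(data) == fingerprint([1, 'two', {'three': 4}])
```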

Add set to set and make nested sets

In Python I want to make sets consisting of sets, so I get a set of sets (nested sets).
Example:
{{1,2}, {2,3}, {4,5}}
However when I try the following:
s = set()
s.add(set((1,2)))
I get an error:
Traceback (most recent call last):
File "<pyshell#26>", line 1, in <module>
s.add(set((1,2)))
TypeError: unhashable type: 'set'
Can anyone tell me where my mistake is and how I can achieve my goal, please?
Your issue is that sets can only contain hashable objects, and a set is not hashable.
You should use the frozenset type, which is hashable, for the elements of the outer set.
In [3]: s = set([frozenset([1,2]), frozenset([3,4])])
In [4]: s
Out[4]: {frozenset({1, 2}), frozenset({3, 4})}
You cannot have a set of sets because sets are unhashable objects; they can be mutated by adding or removing items from them.
You will need to use a set of frozensets instead:
s = set()
s.add(frozenset((1,2)))
Demo:
>>> s = set()
>>> s.add(frozenset((1,2)))
>>> s.add(frozenset((2,3)))
>>> s.add(frozenset((4,5)))
>>> s
{frozenset({1, 2}), frozenset({2, 3}), frozenset({4, 5})}
>>>
Frozensets are like normal sets in every respect except that they cannot be mutated. This feature makes them hashable and allows you to use them as items of a set or keys of a dictionary.
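One place this shows up in practice (an illustrative example, not from the question): frozensets make a natural key for unordered pairs such as undirected graph edges:

```python
# (1, 2) and (2, 1) describe the same undirected edge, so using
# frozenset keys collapses such duplicates automatically.
edges = [(1, 2), (2, 1), (2, 3), (4, 5), (3, 2)]
unique = {frozenset(e) for e in edges}
assert unique == {frozenset({1, 2}), frozenset({2, 3}), frozenset({4, 5})}
```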

Python: How to compare two lists of dictionaries

Folks,
Relative n00b to python, trying to find out the diff of two lists of dictionaries.
If these were regular lists of hashable items, I could create sets and then use '-' or intersection operations.
However, set() does not work on a list of dictionaries:
>>> l = []
>>> pool1 = {}
>>> l.append(pool1)
>>> s = set(l)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
You need a "hashable" dictionary.
The items() method gives the key-value pairs as tuples. Sort them and wrap the result in tuple() and you have a hashable stand-in for the dictionary.
tuple( sorted( some_dict.items() ) )
You can define your own dict wrapper that defines __hash__ method:
class HashableDict(dict):
    def __hash__(self):
        return hash(tuple(sorted(self.items())))
this wrapper is safe as long as you do not modify the dictionary while finding the intersection.
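A quick usage sketch of this wrapper for the original diff problem (the pool contents below are made up; note that sorted() requires the dict keys to be mutually comparable):

```python
class HashableDict(dict):
    def __hash__(self):
        # Works as long as the keys are mutually sortable (e.g. all strings)
        return hash(tuple(sorted(self.items())))

pool_a = [{'host': 'x', 'port': 80}, {'host': 'y', 'port': 81}]
pool_b = [{'host': 'y', 'port': 81}]

# Wrap each dict, then ordinary set difference works
diff = set(map(HashableDict, pool_a)) - set(map(HashableDict, pool_b))
assert list(diff) == [{'host': 'x', 'port': 80}]
```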
Python won't allow you to use a dictionary as a set element or as a dictionary key because dict defines no __hash__ method. Unfortunately, collections.OrderedDict is also not hashable. There also isn't a built-in dictionary analogue to frozenset. You can either create a subclass of dict with your own hash method, or do something like this:
>>> def dict_item_set(dict_list):
...     return set(tuple(sorted(d.items())) for d in dict_list)
...
>>> a = [{1: 2}, {3: 4}]
>>> b = [{3: 4}, {5: 6}]
>>> [dict(t) for t in dict_item_set(a) - dict_item_set(b)]
[{1: 2}]
>>> [dict(t) for t in dict_item_set(a) & dict_item_set(b)]
[{3: 4}]
Of course, this is neither efficient nor pretty.
