Python: How to compare two lists of dictionaries

Folks,
Relative n00b to python, trying to find out the diff of two lists of dictionaries.
If these were just regular lists, I could create sets and then do a '-'/intersect operation.
However, set operation does not work on lists of dictionaries:
>>> l = []
>>> pool1 = {}
>>> l.append(pool1)
>>> s = set(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

You need a "hashable" dictionary.
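The items() method returns the dictionary's (key, value) pairs. Sort them and wrap the result in tuple() and you have a hashable snapshot of the dictionary (provided every value is itself hashable and the keys can be sorted):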
tuple( sorted( some_dict.items() ) )
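A minimal sketch of how that sorted tuple could drive a difference between two lists of dictionaries. It assumes flat dictionaries with hashable values and mutually sortable keys; `freeze` and `diff` are names made up for this illustration, not part of any library.

```python
def freeze(d):
    """Return a hashable, order-independent snapshot of a flat dict."""
    return tuple(sorted(d.items()))


def diff(left, right):
    """Return the dicts in `left` that have no equal dict in `right`."""
    seen = {freeze(d) for d in right}
    return [d for d in left if freeze(d) not in seen]


a = [{"name": "pool1", "size": 1}, {"name": "pool2", "size": 2}]
b = [{"size": 2, "name": "pool2"}, {"name": "pool3", "size": 3}]

only_in_a = diff(a, b)
only_in_b = diff(b, a)
print(only_in_a)  # [{'name': 'pool1', 'size': 1}]
```

Unlike a plain set difference, this keeps the original dictionaries and their list order, at the cost of freezing each dictionary once.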

You can define your own dict wrapper that defines __hash__ method:
class HashableDict(dict):
    def __hash__(self):
        return hash(tuple(sorted(self.items())))
This wrapper is safe as long as you do not modify the dictionary while finding the intersection, that is, while it is stored in a set or used as a key.
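For illustration, here is how such a wrapper might be used with the ordinary set operators. This is a sketch that assumes every value in the dictionaries is hashable and that nothing is mutated after wrapping; the sample data is invented.

```python
class HashableDict(dict):
    def __hash__(self):
        # Sort the items so that insertion order does not affect the hash.
        return hash(tuple(sorted(self.items())))


a = [{"id": 1}, {"id": 2}]
b = [{"id": 2}, {"id": 3}]

set_a = {HashableDict(d) for d in a}
set_b = {HashableDict(d) for d in b}

common = set_a & set_b   # dictionaries present in both lists
only_a = set_a - set_b   # dictionaries present only in a

# The wrappers still compare equal to plain dicts with the same content.
assert common == {HashableDict({"id": 2})}
assert list(only_a) == [{"id": 1}]
```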

Python won't allow you to use a dictionary as a key in either a set or dictionary because it has no default __hash__ method defined. Unfortunately, collections.OrderedDict is also not hashable. There also isn't a built-in dictionary analogue to frozenset. You can either create a subclass of dict with your own hash method, or do something like this:
>>> def dict_item_set(dict_list):
...     return set(tuple(sorted(d.items())) for d in dict_list)
...
>>> a = [{1:2}, {3:4}]
>>> b = [{3:4}, {5:6}]
>>> [dict(t) for t in dict_item_set(a) - dict_item_set(b)]
[{1: 2}]
>>> [dict(t) for t in dict_item_set(a) & dict_item_set(b)]
[{3: 4}]
Of course, this is neither efficient nor pretty.
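If the lists are small, or the dictionaries hold unhashable values such as lists, a plain membership test avoids hashing altogether. The sketch below relies only on dict equality; it is quadratic in the list lengths, and the sample data is invented for illustration.

```python
a = [{"name": "x", "tags": ["p", "q"]}, {"name": "y", "tags": []}]
b = [{"name": "y", "tags": []}, {"name": "z", "tags": ["r"]}]

# `d in b` compares d against each element of b with ==, so no hashing is needed.
only_in_a = [d for d in a if d not in b]
in_both = [d for d in a if d in b]

print(only_in_a)  # [{'name': 'x', 'tags': ['p', 'q']}]
print(in_both)    # [{'name': 'y', 'tags': []}]
```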

Related

What happens when you set a dictionary in python?

I was trying out some ideas to solve an unrelated problem when I came across this behavior when using set on a dictionary:
a = {"a": 1}
b = {"b": 2}
c = {"a": 1}
set([a, b, c])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-7c1da7b47bae> in <module>
----> 1 set([a, b, c])
TypeError: unhashable type: 'dict'
d = {"a": 1, "b": 2}
set(d)
Out[23]: {'a', 'b'}
set(a)
Out[24]: {'a'}
I kind of understand why the set of dictionaries is unhashable (they're mutable), but the whole thing does not make that much sense to me. Why is set(a) and set(d) returning just the keys and how would that be useful?
Thank you!
set converts an arbitrary iterable to a set, which means getting an iterator for its argument. The iterator for a dict returns its keys. It's not so much about set being useful with a dict, but set not caring what its argument is.
As for why a dict iterator returns the keys of the dict, it's a somewhat arbitrary choice made by the language designer, but keep in mind that given the choice of iterating over the keys, the values, or the key-value pairs, iterating over the keys is probably the best compromise between usefulness and simplicity. (All three are available explicitly via d.keys(), d.values(), and d.items(); in some sense iter(d) is a convenience for the common use case of d.keys().)
The function set(x) takes any iterable x and puts all the iterated values into a set.
Dictionaries are iterable. When you iterate through a dictionary, you are given the keys. So set(d) will give you a new set containing all the keys of the dict d.
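A short sketch of the three views side by side, using the dictionary from the question:

```python
d = {"a": 1, "b": 2}

keys_from_iteration = set(d)   # iterating a dict yields its keys
keys_explicit = set(d.keys())  # same result, stated explicitly
values = set(d.values())
pairs = set(d.items())         # each item is a (key, value) tuple

print(keys_from_iteration == keys_explicit)  # True
```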

Create a set from a list using {}

Sometimes I have a list and I want to do some set actions with it. What I do is to write things like:
>>> mylist = [1,2,3]
>>> myset = set(mylist)
>>> myset
{1, 2, 3}
Today I discovered that from Python 2.7 you can also define a set by directly saying {1,2,3}, and it appears to be an equivalent way to define it.
Then, I wondered if I can use this syntax to create a set from a given list.
{list} fails because it tries to create a set with just one element, the list. And lists are unhashable.
>>> mylist = [1,2,3]
>>> {mylist}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
So, I wonder: is there any way to create a set out of a list using the {} syntax instead of set()?
Basically they are not equivalent: {...} is literal syntax (an expression), while set() is a function call. The brace syntax either lists the set's hashable elements directly or takes the form of a set comprehension (analogous to a list comprehension).
So if you want to create a set from an iterable using {}, you can use a set comprehension like the following:
{item for item in iterable}
Also note that empty braces represent a dictionary in Python, not a set. So if you just want to create an empty set, the proper way is to call set().
I asked a related question recently: Python Set: why is my_set = {*my_list} invalid?. My question contains your answer if you are using Python 3.5 or later:
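A small sketch of both points: the type that empty braces produce, and a set comprehension giving the same result as the set() constructor. The variable names are only illustrative.

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

items = [1, 2, 2, 3]
from_comprehension = {item for item in items}
from_constructor = set(items)
```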
>>> my_list = [1,2,3,4,5]
>>> my_set = {*my_list}
>>> my_set
{1, 2, 3, 4, 5}
It won't work on Python 2 (that was my question)
You can use
>>> ls = [1,2,3]
>>> {i for i in ls}
{1, 2, 3}

Hash sum of an object in python

In my script I work with a large and complex object (a multi-dimensional list that contains strings, dictionaries, and class objects of custom types). I need to copy, pickle (cache) and unpickle it, as well as send it between child processes through an MPI interface. At some points I doubt whether the data transfer is error-free, i.e. whether I end up with the same object.
Therefore, I want to calculate its hash sum or some other type of fingerprint. I know that there is, for example, hashlib library; however, it is limited in terms of object type:
>>> import hashlib
>>> a = "123"
>>> hashlib.sha224(a.encode()).hexdigest()
'78d8045d684abd2eece923758f3cd781489df3a48e1278982466017f'
>>> a = [1, 2, 3]
>>> hashlib.sha224(a).hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object supporting the buffer API required
Thus, the question: is there some analog of this function that works with objects of any type?
One option would be to recursively convert all elements of the structure into hashable counterparts, i.e. lists into tuples, dicts and objects into frozensets, and then simply apply hash() to the whole thing. An illustration:
def to_hashable(s):
    if isinstance(s, dict):
        return frozenset((x, to_hashable(y)) for x, y in s.items())
    if isinstance(s, list):
        return tuple(to_hashable(x) for x in s)
    if isinstance(s, set):
        return frozenset(s)
    if isinstance(s, MyObject):
        # Include the class name so instances of different classes differ.
        d = {'__class__': s.__class__.__name__}
        d.update(s.__dict__)
        return to_hashable(d)
    return s

class MyObject:
    pass

class X(MyObject):
    def __init__(self, zzz):
        self.zzz = zzz

my_list = [
    1,
    {'a': [1,2,3], 'b': [4,5,6]},
    {1,2,3,4,5},
    X({1:2,3:4}),
    X({5:6,7:8})
]
print(hash(to_hashable(my_list)))

my_list2 = [
    1,
    {'b': [4,5,6], 'a': [1,2,3]},
    {5,4,3,2,1},
    X({3:4,1:2}),
    X({7:8,5:6})
]
print(hash(to_hashable(my_list2)))  # the same as above
pickle.dumps(...) returns a byte string (str in Python 2, bytes in Python 3), which is a hashable object. You can do it as follows:
import pickle
from hashlib import sha512

a = [1, 2, 3, 4]
h = pickle.dumps(a)
print(hash(h))
# or like this
print(sha512(h).hexdigest())
c = pickle.loads(h)
assert c == a
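One caveat when comparing across processes: in Python 3 the built-in hash() of str and bytes is salted per interpreter run unless PYTHONHASHSEED is fixed, so its values are not comparable between processes, whereas a hashlib digest is. Below is a sketch of a process-independent fingerprint. It assumes the data is JSON-serializable (so no sets or custom classes without conversion first), and `fingerprint` is a hypothetical helper name, not a library function.

```python
import hashlib
import json


def fingerprint(obj):
    """Return a SHA-256 hex digest of a canonical JSON encoding of obj."""
    # sort_keys makes the encoding independent of dict insertion order.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"a": [1, 2, 3], "b": {"x": "1", "y": None}}
second = {"b": {"y": None, "x": "1"}, "a": [1, 2, 3]}

print(fingerprint(first) == fingerprint(second))  # True
```

JSON is used here instead of pickle because the sketch needs one canonical encoding for equal data regardless of key order; pickle.dumps makes no promise that equal objects produce identical bytes.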

__getitem__ for a list vs a dict

The Dictionary __getitem__ method does not seem to work the same way as it does for List, and it is causing me headaches. Here is what I mean:
If I subclass list, I can overload __getitem__ as:
class myList(list):
    def __getitem__(self, index):
        if isinstance(index, int):
            pass  # do one thing
        if isinstance(index, slice):
            pass  # do another thing
If I subclass dict, however, the __getitem__ does not expose index, but key instead as in:
class myDict(dict):
    def __getitem__(self, key):
        # Here I want to inspect the INDEX, but only have access to key!
        pass
So, my question is how can I intercept the index of a dict, instead of just the key?
Example use case:
a = myDict()
a['scalar'] = 1 # Create dictionary entry called 'scalar', and assign 1
a['vector_1'] = [1,2,3,4,5] # I want all subsequent vectors to be 5 long
a['vector_2'][[0,1,2]] = [1,2,3] # I want to intercept this and force vector_2 to be 5 long
print(a['vector_2'])
[1,2,3,0,0]
a['test'] # This should throw a KeyError
a['test'][[0,2,3]] # So should this
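Dictionaries are not sequences: entries are looked up by key, not by position, so there is no index to pass in (this holds even in Python 3.7+, where dicts preserve insertion order). This is why Python can use the same syntax ([..]) and the same magic method (__getitem__) for both lists and dictionaries.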
When you index a dictionary on an integer like 0, the dictionary treats that like any other key:
>>> d = {'foo': 'bar', 0: 42}
>>> d.keys()
[0, 'foo']
>>> d[0]
42
>>> d['foo']
'bar'
Chained indexing applies to return values; the expression:
a['vector_2'][0, 1, 2]
is executed as:
_result = a['vector_2'] # via a.__getitem__('vector_2')
_result[0, 1, 2] # via _result.__getitem__((0, 1, 2))
so if you want values in your dictionary to behave in a certain way, you must return objects that support those operations.
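As a rough sketch of that idea, the values stored in the dictionary could be instances of a list subclass that accepts a list of positions. `Vector` is a hypothetical class written for this illustration; it does not enforce the fixed length of 5 from the question, it only shows where the second index arrives.

```python
class Vector(list):
    """List that also accepts a list or tuple of positions as its index."""

    def __getitem__(self, index):
        if isinstance(index, (list, tuple)):
            return [list.__getitem__(self, i) for i in index]
        return list.__getitem__(self, index)

    def __setitem__(self, index, value):
        if isinstance(index, (list, tuple)):
            # Assign position by position; extra values are ignored by zip.
            for i, v in zip(index, value):
                list.__setitem__(self, i, v)
        else:
            list.__setitem__(self, index, value)


a = {"vector_2": Vector([0] * 5)}

# The dict only sees the key 'vector_2'; Vector.__setitem__ sees [0, 1, 2].
a["vector_2"][[0, 1, 2]] = [1, 2, 3]
print(a["vector_2"])  # [1, 2, 3, 0, 0]

picked = a["vector_2"][[0, 2]]
```

A missing key such as a['test'] still raises KeyError from the plain dict before any second index is evaluated.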

Python dictionary with list keyword

I'm coding a N'th order markov chain.
It goes something like this:
class Chain:
    def __init__(self, order):
        self.order = order
        self.state_table = {}

    def train(self, next_state, *prev_states):
        if len(prev_states) != self.order:
            raise ValueError("prev_states does not match chain order")
        if prev_states in self.state_table:
            if next_state in self.state_table[prev_states]:
                self.state_table[prev_states][next_state] += 1
            else:
                self.state_table[prev_states][next_state] = 0
        else:
            self.state_table[prev_states] = {next_state: 0}
Unfortunately, lists and tuples are unhashable, and I cannot use them as keys in dicts...
I have hopefully explained my problem well enough for you to understand what I try to achieve.
Any good ideas on how I can use multiple values as a dictionary key?
Followup question:
I did not know that tuples are hashable.
But the entropy of the hashes seems low. Are hash collisions possible for tuples?!
Tuples are hashable when their contents are.
>>> a = {}
>>> a[(1,2)] = 'foo'
>>> a[(1,[])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
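Since a tuple of hashable states is itself hashable, the transition table from the question can be keyed on prev_states directly. The sketch below is one possible variant, not the asker's code: it swaps the nested if/else for collections.defaultdict and Counter, and it starts counts at 1 rather than 0.

```python
from collections import Counter, defaultdict


class Chain:
    def __init__(self, order):
        self.order = order
        # Maps a tuple of previous states to a Counter of next states.
        self.state_table = defaultdict(Counter)

    def train(self, next_state, *prev_states):
        if len(prev_states) != self.order:
            raise ValueError("prev_states does not match chain order")
        # *prev_states is already a tuple, so it can be used as a key as-is.
        self.state_table[prev_states][next_state] += 1


chain = Chain(2)
chain.train("c", "a", "b")
chain.train("c", "a", "b")
chain.train("d", "a", "b")
print(chain.state_table[("a", "b")])  # Counter({'c': 2, 'd': 1})
```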
As for collisions, when I try a bunch of very similar tuples, I see them being mapped widely apart:
>>> hash((1,2))
3713081631934410656
>>> hash((1,3))
3713081631933328131
>>> hash((2,2))
3713082714462658231
>>> abs(hash((1,2)) - hash((1,3)))
1082525
>>> abs(hash((1,2)) - hash((2,2)))
1082528247575
You can use tuples as dictionary keys; they are hashable as long as their content is hashable (as @larsman said).
Don't worry about collisions, Python's dict takes care of it.
>>> hash('a')
12416037344
>>> hash(12416037344)
12416037344
>>> hash('a') == hash(12416037344)
True
>>> {'a': 'one', 12416037344: 'two'}
{'a': 'one', 12416037344: 'two'}
In this example I took a string and an integer. But it works the same with tuples. Just didn't have any idea how to find two tuples with identical hashes.
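One way to see the same behaviour without hunting for real colliding tuples is to force a collision deliberately. `Colliding` is a hypothetical class written for this sketch; every instance returns the same hash, yet the dict still keeps unequal keys apart because it falls back on == when hashes match.

```python
class Colliding:
    """Every instance hashes to the same value, so all keys collide."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.name == other.name


table = {Colliding("a"): "one", Colliding("b"): "two"}

print(len(table))              # 2
print(table[Colliding("a")])   # one
print(table[Colliding("b")])   # two
```

Collisions like this cost lookup speed, not correctness.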
