Creating custom JSONEncoder - python

I'm running Python 2.7 and I'm trying to create a custom FloatEncoder subclass of JSONEncoder. I've followed many examples such as this but none seem to work. Here is my FloatEncoder class:
class FloatEncoder(JSONEncoder):
    def _iterencode(self, obj, markers=None):
        if isinstance(obj, float):
            return (str(obj) for obj in [obj])
        return super(FloatEncoder, self)._iterencode(obj, markers)
And here is where I call json.dumps:
with patch("utils.fileio.FloatEncoder") as float_patch:
    for val, res in ((.00123456, '0.0012'),
                     (.00009, '0.0001'),
                     (0.99999, '1.0000'),
                     ({'hello': 1.00001, 'world': [True, 1.00009]},
                      '{"world": [true, 1.0001], "hello": 1.0000}')):
        untrusted = dumps(val, cls=FloatEncoder)
        self.assertTrue(float_patch._iterencode.called)
        self.assertEqual(untrusted, res)
The first assertion fails, meaning that _iterencode is not being executed. After reading the JSON documentation, I tried overriding the default() method, but that was not being called either.

You seem to be trying to round float values to four decimal places while generating JSON (based on your test examples).
The JSONEncoder shipping with Python 2.7 does not have a _iterencode method, which is why it's not getting called. A quick glance at json/encoder.py also suggests that the class is written in a way that makes it difficult to change the float encoding behavior. It would probably be better to separate concerns and round the floats before doing JSON serialization.
EDIT: Alex Martelli also supplies a monkey-patch solution in a related answer. The problem with that approach is that you're introducing a global modification to the json library's behavior that may unwittingly affect some other piece of code in your application written with the assumption that floats were encoded without rounding.
Try this:
from collections import Mapping, Sequence
from unittest import TestCase, main
from json import dumps

def round_floats(o):
    if isinstance(o, float):
        return round(o, 4)
    elif isinstance(o, basestring):
        return o
    elif isinstance(o, Sequence):
        return [round_floats(item) for item in o]
    elif isinstance(o, Mapping):
        return dict((key, round_floats(value)) for key, value in o.iteritems())
    else:
        return o
class TestFoo(TestCase):
    def test_it(self):
        for val, res in ((.00123456, '0.0012'),
                         (.00009, '0.0001'),
                         (0.99999, '1.0'),
                         ({'hello': 1.00001, 'world': [True, 1.00009]},
                          '{"world": [true, 1.0001], "hello": 1.0}')):
            untrusted = dumps(round_floats(val))
            self.assertEqual(untrusted, res)

if __name__ == '__main__':
    main()

Don't define _iterencode, define default, as shown in the third answer on that page.
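For reference, default is only consulted for objects the encoder cannot already serialize, which is why it never fires for plain floats. A minimal sketch for a type json does not handle natively (Decimal here is just an illustration):
import json
from decimal import Decimal

class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        # default() only sees types the base encoder rejects,
        # so plain floats never reach this hook
        if isinstance(o, Decimal):
            return float(o)
        return super(DecimalEncoder, self).default(o)

print(json.dumps({'price': Decimal('1.5')}, cls=DecimalEncoder))
# {"price": 1.5}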

Related

How to override JSONDecoder to handle certain string matches differently

I want to change certain values in a json file (nested dicts and arrays). I thought a handy way to do that would be to take advantage of the JSONDecoder.
However, it's not working as I'd expect it to. I've done this exact same approach for getting JSONEncoder to convert np.arrays to lists so it wouldn't break the encoder.
After not getting it to do what I wanted, I thought I'd try the Decoder instead. Same issue: it never calls default for strings, it seems. Maybe default is never called when handling a string, just when handling other types of objects?
# key, val are arguments passed in, e.g. ("bar", "2.0rc1")
# Replace the value "2.0rc1" everywhere the "bar" key is found
class StringReplaceDecoder(json.JSONDecoder):
    def default(self, obj):
        if isinstance(obj, str):
            print("Handling obj str: {}".format(obj))
            if obj == key:
                return val
        return json.JSONEncoder.default(self, obj)

json_dump = json.dumps(dict)
json_load = json.loads(json_dump, cls=StringReplaceDecoder)
# Example input
{a:{foo:"", bar:"1.3"}, b:{d:{foo:""}, z:{bar:"1.5"}}}
# Example desired output:
{a:{foo:"", bar:"2.0rc1"}, b:{d:{foo:""}, z:{bar:"2.0rc1"}}}
After finding a solution that worked, I found a far superior one-liner that hadn't come up in previous Google searches. (For the record, the original approach fails because JSONDecoder has no default hook at all; that method belongs to JSONEncoder, so it is never called during decoding.)
The correct answer for my problem is from nested_lookup import nested_update
For what it's worth, I also found that object_hook, which json.loads calls with every decoded JSON object (innermost first), did exactly what I wanted:
def val_hook(obj):
    return_d = {}
    if isinstance(obj, dict):
        for k in obj:
            if in_key == k:
                return_d[k] = in_val
            else:
                return_d[k] = obj[k]
        return return_d
    else:
        return obj

json_dump = json.dumps(in_dict)
json_load = json.loads(json_dump, object_hook=val_hook)
References
https://gist.github.com/douglasmiranda/5127251
https://github.com/russellballestrini/nested-lookup
https://pypi.org/project/nested-lookup/

How to remove a key/value pair from yaml dump, in Python?

Suppose I have a naive class definition:
import yaml

class A:
    def __init__(self):
        self.abc = 1
        self.hidden = 100
        self.xyz = 2

    def __repr__(self):
        return yaml.dump(self)

A()
prints:
!!python/object:__main__.A
abc: 1
hidden: 100
xyz: 2
Is there a clean way to remove a line containing hidden: 100 from yaml dump's printed output? The key name hidden is known in advance, but its numeric value may change.
Desired output:
!!python/object:__main__.A
abc: 1
xyz: 2
FYI: This dump is for display only and will not be loaded.
I suppose one could suppress the key/value pair with key hidden using a custom yaml representer. Another way would be to find hidden: [number] with a regex in the string output.
I looked at the documentation for pyyaml and did not find a way to achieve your objective. A workaround is to delete the attribute hidden, call yaml.dump, then add it back. Note that the restore must happen in a finally block, since a statement placed after the return would never run:
def __repr__(self):
    hidden = self.hidden
    del self.hidden
    try:
        return yaml.dump(self)
    finally:
        self.hidden = hidden
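(For what it's worth, the representer idea from the question can also be made to work; a rough sketch using PyYAML's add_representer, with the tag chosen to match the default output:)
import yaml

def a_representer(dumper, obj):
    # represent A without its 'hidden' attribute
    state = {k: v for k, v in vars(obj).items() if k != 'hidden'}
    return dumper.represent_mapping(
        'tag:yaml.org,2002:python/object:__main__.A', state)

yaml.add_representer(A, a_representer)
print(yaml.dump(A()))
# !!python/object:__main__.A
# abc: 1
# xyz: 2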
Taking a step back, why do you want to use yaml for __repr__? Can you just roll your own instead of relying on yaml?
json is a mature solution and (at the time of writing) has much better docs than pyyaml, so I'd use it instead, since pyyaml's docs are hard to fully understand. As a bonus, YAML is (almost) a superset of JSON, so you'll be able to read your data as YAML without converting it. However, to easily use all the goodies of YAML you will probably have to convert the data to YAML anyway.
json module is unable to serialize custom objects by default, but it can be easily extended:
import json

def default(o):
    if isinstance(o, A):
        result = vars(o).copy()
        del result['hidden']
        result['__class__'] = o.__class__.__name__
        return result
    else:
        return o

json.dumps(A(), default=default)  # => '{"__class__": "A", "xyz": 2, "abc": 1}'
If you don't want to write default=default everywhere you dumps, you can create custom serializer:
dumper = json.JSONEncoder(default=default)
dumper.encode(A()) # => '{"__class__": "A", "xyz": 2, "abc": 1}'
Or, to be able to easily extend it even further via subclassing:
class Dumper(json.JSONEncoder):
    __slots__ = ()

    def default(self, o):
        if isinstance(o, A):
            result = vars(o).copy()
            del result['hidden']
            result['__class__'] = o.__class__.__name__
            return result
        else:
            return super().default(o)

dumper = Dumper()
dumper.encode(A())  # => '{"__class__": "A", "xyz": 2, "abc": 1}'
Note that fields in JSON are unordered.
Also, if you want to use this, I'd advise against serializing a dict that has the key __class__, because it might be hard to distinguish it from a serialized object.

Correct way of unit testing __repr__ with dict

Say I have a class given by:
class MyClass:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __repr__(self):
        return "<%s: %r>" % (self.__class__.__name__, self.kwargs)
__repr__ isn't really important for core functionality, but occasionally gets called for logging, shows up in stack traces, and such, so I'd like to unit-test it.
The problem is similar to that faced when using doctest, but I'd rather keep any complexity (like sorting) in the test function and not in __repr__.
Right now I'm using eval and re to extract the dictionary from the repr() call, but I wanted to check if there were non-eval alternatives out there that people used.
def test_repr():
    retval = repr(MyClass(a=1, b=2, c=3))
    match = re.match(r"^<MyClass: ({.*})>\Z", retval)
    assert match
    assert eval(match.group(1)) == dict(a=1, b=2, c=3)
You only need to check that the kwargs dictionary is correctly being represented in the output, so just pass zero or one keyword arguments:
>>> repr(MyClass(foo='bar')) == "<MyClass: {'foo': 'bar'}>"
True
>>> repr(MyClass()) == '<MyClass: {}>'
True
Then ordering doesn't matter at all.
If you decide to stick with evaluating the extracted dictionary, use ast.literal_eval instead of vanilla eval. I would also use a slice rather than re, as you know the expected format:
>>> '<MyClass: {}>'[10:-1]
'{}'
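A sketch of the question's test rewritten that way, keeping the regex extraction:
import ast
import re

def test_repr():
    retval = repr(MyClass(a=1, b=2, c=3))
    match = re.match(r"^<MyClass: ({.*})>\Z", retval)
    assert match
    # literal_eval accepts only Python literals, so a malformed repr
    # cannot execute arbitrary code the way eval() could
    assert ast.literal_eval(match.group(1)) == dict(a=1, b=2, c=3)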

Is there a way to override python's json handler?

I'm having trouble encoding infinity in json.
json.dumps will serialize this as Infinity, but I would like it to serialize as null or another value of my choosing.
Unfortunately, setting the default argument only seems to work if dumps doesn't already understand the object; otherwise the default handler appears to be bypassed.
Is there a way I can pre-encode the object, change the default way a type/class is encoded, or convert a certain type/class into a different object prior to normal encoding?
Look at the source here: http://hg.python.org/cpython/file/7ec9255d4189/Lib/json/encoder.py
If you subclass JSONEncoder, you can override just the iterencode(self, o, _one_shot=False) method, which has explicit special casing for Infinity (inside an inner function).
To make this reusable, you'll also want to alter the __init__ to take some new options, and store them in the class.
Alternatively, you could pick a json library from pypi which has the appropriate extensibility you are looking for: https://pypi.python.org/pypi?%3Aaction=search&term=json&submit=search
Here's an example:
import json

class FloatEncoder(json.JSONEncoder):
    def __init__(self, nan_str="null", **kwargs):
        super(FloatEncoder, self).__init__(**kwargs)
        self.nan_str = nan_str

    # uses code from the official python json.encoder module.
    # Same licence applies.
    def iterencode(self, o, _one_shot=False):
        """Encode the given object and yield each string
        representation as available.

        For example::

            for chunk in JSONEncoder().iterencode(bigobject):
                mysocket.write(chunk)
        """
        if self.check_circular:
            markers = {}
        else:
            markers = None
        if self.ensure_ascii:
            _encoder = json.encoder.encode_basestring_ascii
        else:
            _encoder = json.encoder.encode_basestring
        if self.encoding != 'utf-8':
            def _encoder(o, _orig_encoder=_encoder,
                         _encoding=self.encoding):
                if isinstance(o, str):
                    o = o.decode(_encoding)
                return _orig_encoder(o)

        def floatstr(o, allow_nan=self.allow_nan,
                     _repr=json.encoder.FLOAT_REPR,
                     _inf=json.encoder.INFINITY,
                     _neginf=-json.encoder.INFINITY,
                     nan_str=self.nan_str):
            # Check for specials. Note that this type of test is
            # processor and/or platform-specific, so do tests which
            # don't depend on the internals.
            if o != o:
                text = nan_str
            elif o == _inf:
                text = 'Infinity'
            elif o == _neginf:
                text = '-Infinity'
            else:
                return _repr(o)
            if not allow_nan:
                raise ValueError(
                    "Out of range float values are not JSON compliant: " +
                    repr(o))
            return text

        _iterencode = json.encoder._make_iterencode(
            markers, self.default, _encoder, self.indent, floatstr,
            self.key_separator, self.item_separator, self.sort_keys,
            self.skipkeys, _one_shot)
        return _iterencode(o, 0)

example_obj = {
    'name': 'example',
    'body': [
        1.1,
        {"3.3": 5, "1.1": float('Nan')},
        [float('inf'), 2.2]
    ]}

print json.dumps(example_obj, cls=FloatEncoder)
ideone: http://ideone.com/dFWaNj
No, there is no simple way to achieve this. In fact, NaN and Infinity floating point values shouldn't be serialized with json at all, according to the standard.
Python uses an extension of the standard. You can make the python encoding standard-compliant passing the allow_nan=False parameter to dumps, but this will raise a ValueError for infinity/nans even if you provide a default function.
You have two ways of doing what you want:
Subclass JSONEncoder and change how these values are encoded. Note that you will have to take into account cases where a sequence can contain an infinity value etc. AFAIK there is no API to redefine how objects of a specific class are encoded.
Make a copy of the object to encode and replace any occurrence of infinity/nan with None or some other object that is encoded the way you want (a minimal sketch follows below).
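A minimal sketch of that second approach, assuming the data contains only plain dicts, lists, and scalars:
import json
import math

def scrub(o):
    # Recursively replace inf/-inf/nan with None before encoding.
    if isinstance(o, float) and (math.isinf(o) or math.isnan(o)):
        return None
    if isinstance(o, dict):
        return dict((k, scrub(v)) for k, v in o.items())
    if isinstance(o, list):
        return [scrub(v) for v in o]
    return o

print(json.dumps(scrub([1, 2, float('inf'), {'x': float('nan')}])))
# [1, 2, null, {"x": null}]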
A less robust yet much simpler solution is to modify the encoded data, for example replacing all Infinity substrings with null:
>>> import re
>>> infty_regex = re.compile(r'\bInfinity\b')
>>> def replace_infinities(encoded):
... regex = re.compile(r'\bInfinity\b')
... return regex.sub('null', encoded)
...
>>> import json
>>> replace_infinities(json.dumps([1, 2, 3, float('inf'), 4]))
'[1, 2, 3, null, 4]'
Obviously you should take into account the text Infinity inside strings etc., so even here a robust solution is not immediate, nor elegant.
Context
I ran into this issue and didn't want to bring an extra dependency into the project just to handle this case. Additionally, my project supports Python 2.6, 2.7, 3.3, and 3.4, as well as users of simplejson. Unfortunately there are three different implementations of iterencode across these versions, so hard-coding a particular version was undesirable.
Hopefully this will help someone else with similar requirements!
Qualifiers
If the encoding time/processing-power surrounding your json.dumps call is small compared to other components of your project, you can un-encode/re-encode the JSON to get your desired result leveraging the parse_constant kwarg.
Benefits
It doesn't matter if the end-user has Python 2.x's json, Python 3.x's json or is using simplejson (e.g., import simplejson as json)
It only uses public json interfaces which are unlikely to change.
Caveats
This will take ~3X as long to encode things
This implementation doesn't handle object_pairs_hook because then it wouldn't work for python 2.6
Invalid separators will fail
Code
class StrictJSONEncoder(json.JSONEncoder):

    def default(self, o):
        """Make sure we don't instantly fail"""
        return o

    def coerce_to_strict(self, const):
        """
        This is used to ultimately *encode* into strict JSON, see `encode`
        """
        # before Python 2.7, 'true', 'false', and 'null' were included here
        if const in ('Infinity', '-Infinity', 'NaN'):
            return None
        else:
            return const

    def encode(self, o):
        """
        Load and then dump the result using the parse_constant kwarg.
        Note that setting invalid separators will cause a failure at this step.
        """
        # this will raise errors in a normal, expected way
        encoded_o = super(StrictJSONEncoder, self).encode(o)
        # now:
        # 1. `loads` to switch Infinity, -Infinity, NaN to None
        # 2. `dumps` again so you get 'null' instead of extended JSON
        try:
            new_o = json.loads(encoded_o, parse_constant=self.coerce_to_strict)
        except ValueError:
            # invalid separators will fail here; raise a helpful exception
            raise ValueError(
                "Encoding into strict JSON failed. "
                "Did you set valid JSON separators?"
            )
        else:
            return json.dumps(new_o, sort_keys=self.sort_keys,
                              indent=self.indent,
                              separators=(self.item_separator,
                                          self.key_separator))
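Usage goes through the standard cls hook; for example:
print(json.dumps({'x': float('inf'), 'y': float('nan')}, cls=StrictJSONEncoder))
# {"x": null, "y": null}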
You could do something along these lines:
import json
import math

target = [1.1, 1, 2.2, float('inf'), float('nan'), 'a string', int(2)]

def ffloat(f):
    if not isinstance(f, float):
        return f
    if math.isnan(f):
        return 'custom NaN'
    if math.isinf(f):
        return 'custom inf'
    return f

print 'regular json:', json.dumps(target)
print 'customized:', json.dumps(map(ffloat, target))
Prints:
regular json: [1.1, 1, 2.2, Infinity, NaN, "a string", 2]
customized: [1.1, 1, 2.2, "custom inf", "custom NaN", "a string", 2]
If you want to handle nested data structures, this is also not that hard:
import json
import math
from collections import Mapping, Sequence

def nested_json(o):
    if isinstance(o, float):
        if math.isnan(o):
            return 'custom NaN'
        if math.isinf(o):
            return 'custom inf'
        return o
    elif isinstance(o, basestring):
        return o
    elif isinstance(o, Sequence):
        return [nested_json(item) for item in o]
    elif isinstance(o, Mapping):
        return dict((key, nested_json(value)) for key, value in o.iteritems())
    else:
        return o

nested_tgt = [1.1, {1.1: float('inf'), 3.3: 5}, (float('inf'), 2.2)]

print 'regular json:', json.dumps(nested_tgt)
print 'nested json:', json.dumps(nested_json(nested_tgt))
Prints:
regular json: [1.1, {"3.3": 5, "1.1": Infinity}, [Infinity, 2.2]]
nested json [1.1, {"3.3": 5, "1.1": "custom inf"}, ["custom inf", 2.2]]

Hashing a dictionary?

For caching purposes I need to generate a cache key from GET arguments which are present in a dict.
Currently I'm using sha1(repr(sorted(my_dict.items()))) (sha1() is a convenience method that uses hashlib internally) but I'm curious if there's a better way.
Using sorted(d.items()) isn't enough to get us a stable repr. Some of the values in d could be dictionaries too, and their keys will still come out in an arbitrary order. As long as all the keys are strings, I prefer to use:
json.dumps(d, sort_keys=True)
That said, if the hashes need to be stable across different machines or Python versions, I'm not certain that this is bulletproof. You might want to add the separators and ensure_ascii arguments to protect yourself from any changes to the defaults there. I'd appreciate comments.
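For example, a sketch of that recipe with the defaults pinned down (sha1 is just an illustration; any hashlib digest works):
import hashlib
import json

def stable_hash(d):
    # Pin separators and ensure_ascii so the serialized form doesn't
    # depend on library defaults changing between versions.
    encoded = json.dumps(d, sort_keys=True, separators=(',', ':'),
                         ensure_ascii=True).encode('utf-8')
    return hashlib.sha1(encoded).hexdigest()

print(stable_hash({'b': 2, 'a': {'y': 1, 'x': [1, 2]}}))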
If your dictionary is not nested, you could make a frozenset with the dict's items and use hash():
hash(frozenset(my_dict.items()))
This is much less computationally intensive than generating the JSON string or representation of the dictionary.
UPDATE: Please see the comments below for why this approach might not produce a stable result.
EDIT: If all your keys are strings, then before continuing to read this answer, please see Jack O'Connor's significantly simpler (and faster) solution (which also works for hashing nested dictionaries).
Although an answer has been accepted, the title of the question is "Hashing a python dictionary", and the answer is incomplete as regards that title. (As regards the body of the question, the answer is complete.)
Nested Dictionaries
If one searches Stack Overflow for how to hash a dictionary, one might stumble upon this aptly titled question, and leave unsatisfied if one is attempting to hash multiply nested dictionaries. The answer above won't work in this case, and you'll have to implement some sort of recursive mechanism to retrieve the hash.
Here is one such mechanism:
import copy

def make_hash(o):
    """
    Makes a hash from a dictionary, list, tuple or set to any level, that
    contains only other hashable types (including any lists, tuples, sets, and
    dictionaries).
    """
    if isinstance(o, (set, tuple, list)):
        return tuple([make_hash(e) for e in o])
    elif not isinstance(o, dict):
        return hash(o)

    new_o = copy.deepcopy(o)
    for k, v in new_o.items():
        new_o[k] = make_hash(v)

    return hash(tuple(frozenset(sorted(new_o.items()))))
Bonus: Hashing Objects and Classes
The hash() function works great when you hash classes or instances. However, here is one issue I found with hash, as regards objects:
class Foo(object): pass
foo = Foo()
print (hash(foo)) # 1209812346789
foo.a = 1
print (hash(foo)) # 1209812346789
The hash is the same, even after I've altered foo. This is because the identity of foo hasn't changed, so the hash is the same. If you want foo to hash differently depending on its current definition, the solution is to hash off whatever is actually changing. In this case, the __dict__ attribute:
class Foo(object): pass
foo = Foo()
print (make_hash(foo.__dict__)) # 1209812346789
foo.a = 1
print (make_hash(foo.__dict__)) # -78956430974785
Alas, when you attempt to do the same thing with the class itself:
print (make_hash(Foo.__dict__)) # TypeError: unhashable type: 'dict_proxy'
The class __dict__ property is not a normal dictionary:
print (type(Foo.__dict__)) # type <'dict_proxy'>
Here is a similar mechanism as previous that will handle classes appropriately:
import copy

DictProxyType = type(object.__dict__)

def make_hash(o):
    """
    Makes a hash from a dictionary, list, tuple or set to any level, that
    contains only other hashable types (including any lists, tuples, sets, and
    dictionaries). In the case where other kinds of objects (like classes) need
    to be hashed, pass in a collection of object attributes that are pertinent.
    For example, a class can be hashed in this fashion:

        make_hash([cls.__dict__, cls.__name__])

    A function can be hashed like so:

        make_hash([fn.__dict__, fn.__code__])
    """
    if type(o) == DictProxyType:
        o2 = {}
        for k, v in o.items():
            if not k.startswith("__"):
                o2[k] = v
        o = o2

    if isinstance(o, (set, tuple, list)):
        return tuple([make_hash(e) for e in o])
    elif not isinstance(o, dict):
        return hash(o)

    new_o = copy.deepcopy(o)
    for k, v in new_o.items():
        new_o[k] = make_hash(v)

    return hash(tuple(frozenset(sorted(new_o.items()))))
You can use this to return a hash tuple of however many elements you'd like:
# -7666086133114527897
print (make_hash(func.__code__))
# (-7666086133114527897, 3527539)
print (make_hash([func.__code__, func.__dict__]))
# (-7666086133114527897, 3527539, -509551383349783210)
print (make_hash([func.__code__, func.__dict__, func.__name__]))
NOTE: all of the above code assumes Python 3.x. I did not test it in earlier versions, although I assume make_hash() will work in, say, 2.7.2. As far as making the examples work, I do know that func.__code__ should be replaced with func.func_code in Python 2.x.
The code below avoids using the Python hash() function because it will not provide hashes that are consistent across restarts of Python (see hash function in Python 3.3 returns different results between sessions). make_hashable() will convert the object into nested tuples and make_hash_sha256() will also convert the repr() to a base64 encoded SHA256 hash.
import hashlib
import base64

def make_hash_sha256(o):
    hasher = hashlib.sha256()
    hasher.update(repr(make_hashable(o)).encode())
    return base64.b64encode(hasher.digest()).decode()

def make_hashable(o):
    if isinstance(o, (tuple, list)):
        return tuple((make_hashable(e) for e in o))
    if isinstance(o, dict):
        return tuple(sorted((k, make_hashable(v)) for k, v in o.items()))
    if isinstance(o, (set, frozenset)):
        return tuple(sorted(make_hashable(e) for e in o))
    return o

o = dict(x=1, b=2, c=[3, 4, 5], d={6, 7})
print(make_hashable(o))
# (('b', 2), ('c', (3, 4, 5)), ('d', (6, 7)), ('x', 1))

print(make_hash_sha256(o))
# fyt/gK6D24H9Ugexw+g3lbqnKZ0JAcgtNW+rXIDeU2Y=
Here is a clearer solution.
def freeze(o):
    if isinstance(o, dict):
        return frozenset({k: freeze(v) for k, v in o.items()}.items())
    if isinstance(o, list):
        return tuple([freeze(v) for v in o])
    return o

def make_hash(o):
    """
    Makes a hash out of anything that contains only list, dict and hashable
    types, including string and numeric types.
    """
    return hash(freeze(o))
MD5 HASH
The method which gave me the most stable results was using md5 hashes and json.dumps:
from typing import Dict, Any
import hashlib
import json

def dict_hash(dictionary: Dict[str, Any]) -> str:
    """MD5 hash of a dictionary."""
    dhash = hashlib.md5()
    # We need to sort arguments so {'a': 1, 'b': 2} is
    # the same as {'b': 2, 'a': 1}
    encoded = json.dumps(dictionary, sort_keys=True).encode()
    dhash.update(encoded)
    return dhash.hexdigest()
While hash(frozenset(x.items())) and hash(tuple(sorted(x.items()))) work, that's doing a lot of work allocating and copying all the key-value pairs. A hash function really should avoid a lot of memory allocation.
A little bit of math can help here. The problem with most hash functions is that they assume that order matters. To hash an unordered structure, you need a commutative operation. Multiplication doesn't work well as any element hashing to 0 means the whole product is 0. Bitwise & and | tend towards all 0's or 1's. There are two good candidates: addition and xor.
from functools import reduce
from operator import xor

class hashable(dict):
    def __hash__(self):
        return reduce(xor, map(hash, self.items()), 0)

    # Alternative:
    # def __hash__(self):
    #     return sum(map(hash, self.items()))
One point: xor works, in part, because dict guarantees keys are unique. And sum works because Python truncates the result of __hash__ to the native word size.
If you want to hash a multiset, sum is preferable. With xor, {a} would hash to the same value as {a, a, a} because x ^ x ^ x = x.
If you really need the guarantees that SHA makes, this won't work for you. But to use a dictionary in a set, this will work fine; Python containers are resiliant to some collisions, and the underlying hash functions are pretty good.
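A quick sanity check of the xor variant (the hash values themselves vary between processes, but the equality holds):
d1 = hashable(a=1, b=2)
d2 = hashable(b=2, a=1)
assert hash(d1) == hash(d2)  # same pairs, any insertion order
print(d2 in {d1})            # True: equal contents, equal hash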
Updated from 2013 reply...
None of the above answers seem reliable to me. The reason is the use of items(). As far as I know, this comes out in a machine-dependent order.
How about this instead?
import hashlib

def dict_hash(the_dict, *ignore):
    if ignore:  # Sometimes you don't care about some items
        interesting = the_dict.copy()
        for item in ignore:
            if item in interesting:
                interesting.pop(item)
        the_dict = interesting
    result = hashlib.sha1(
        ('%s' % sorted(the_dict.items())).encode('utf-8')  # hashlib needs bytes
    ).hexdigest()
    return result
Use DeepHash from the DeepDiff module:
from deepdiff import DeepHash

obj = {'a': '1', 'b': '2'}
hashes = DeepHash(obj)[obj]
To preserve key order, instead of hash(str(dictionary)) or hash(json.dumps(dictionary)), I would prefer a quick-and-dirty solution:
from pprint import pformat
h = hash(pformat(dictionary))
It will work even for types like datetime and others that are not JSON serializable.
You can use the maps library to do this. Specifically, maps.FrozenMap
import maps
fm = maps.FrozenMap(my_dict)
hash(fm)
To install maps, just do:
pip install maps
It handles the nested dict case too:
import maps
fm = maps.FrozenMap.recurse(my_dict)
hash(fm)
Disclaimer: I am the author of the maps library.
You could use the third-party frozendict module to freeze your dict and make it hashable.
from frozendict import frozendict
my_dict = frozendict(my_dict)
For handling nested objects, you could go with:
import collections.abc

def make_hashable(x):
    if isinstance(x, collections.abc.Hashable):
        return x
    elif isinstance(x, collections.abc.Sequence):
        return tuple(make_hashable(xi) for xi in x)
    elif isinstance(x, collections.abc.Set):
        return frozenset(make_hashable(xi) for xi in x)
    elif isinstance(x, collections.abc.Mapping):
        return frozendict({k: make_hashable(v) for k, v in x.items()})
    else:
        raise TypeError("Don't know how to make {} objects hashable".format(type(x).__name__))
If you want to support more types, use functools.singledispatch (Python 3.7+):
import functools

@functools.singledispatch
def make_hashable(x):
    raise TypeError("Don't know how to make {} objects hashable".format(type(x).__name__))

@make_hashable.register
def _(x: collections.abc.Hashable):
    return x

@make_hashable.register
def _(x: collections.abc.Sequence):
    return tuple(make_hashable(xi) for xi in x)

@make_hashable.register
def _(x: collections.abc.Set):
    return frozenset(make_hashable(xi) for xi in x)

@make_hashable.register
def _(x: collections.abc.Mapping):
    return frozendict({k: make_hashable(v) for k, v in x.items()})

# add your own types here
One way to approach the problem is to make a tuple of the dictionary's items:
hash(tuple(my_dict.items()))
This is not a general solution (i.e. only trivially works if your dict is not nested), but since nobody here suggested it, I thought it might be useful to share it.
One can use a (third-party) immutables package and create an immutable 'snapshot' of a dict like this:
from immutables import Map
map = dict(a=1, b=2)
immap = Map(map)
hash(immap)
This seems to be faster than, say, stringification of the original dict.
I learned about this from this nice article.
For nested structures, having string keys at the top level dict, you can use pickle(protocol=5) and hash the bytes object. If you need safety, you can use a safe serializer.
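A sketch of that suggestion; note that pickle output is only guaranteed stable for the same Python and pickle versions, so treat this as a per-environment cache key:
import hashlib
import pickle

def dict_hash(d):
    # protocol=5 requires Python 3.8+
    return hashlib.sha256(pickle.dumps(d, protocol=5)).hexdigest()

print(dict_hash({'a': 1, 'b': [2, 3]}))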
I do it like this:
hash(str(my_dict))
