Python doesn't allow non-hashable objects to be used as keys in other dictionaries. As pointed out by Andrey Vlasovskikh, there is a nice workaround for the special case of using non-nested dictionaries as keys:
frozenset(a.items())  # can be put in the dictionary as a key instead
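For example, a small sketch of the workaround in use (assuming a is a flat dictionary with hashable values):
a = {"x": 1, "y": 2}
cache = {}
cache[frozenset(a.items())] = "result"
# Equal dictionaries produce equal frozensets, regardless of insertion order
print(cache[frozenset({"y": 2, "x": 1}.items())])  # "result"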
Is there a method of using arbitrary objects as keys in dictionaries?
Example:
How would the following be used as a key?
{"a":1, "b":{"c":10}}
It is extremely rare that you will actually have to use something like this in your code. If you think this is the case, consider changing your data model first.
Exact use case
The use case is caching calls to an arbitrary keyword-only function. Each key in the dictionary is a string (the name of the argument) and the objects can be quite complicated, consisting of layered dictionaries, lists, tuples, etc.
Related problems
This sub-problem has been split off from the problem here. Solutions here deal with the case where the dictionary is not layered.
Based on the solution by Chris Lutz again.
import collections
def hashable(obj):
    if isinstance(obj, collections.Hashable):
        # Already hashable - use it unchanged
        items = obj
    elif isinstance(obj, collections.Mapping):
        # Mappings become a frozenset of (key, hashable(value)) pairs
        items = frozenset((k, hashable(v)) for k, v in obj.iteritems())
    elif isinstance(obj, collections.Iterable):
        # Other iterables become a tuple of hashable items
        items = tuple(hashable(item) for item in obj)
    else:
        raise TypeError(type(obj))
    return items
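For instance, with the example from the question (equal dictionaries produce equal frozensets, so lookups behave as expected):
key = hashable({"a": 1, "b": {"c": 10}})
d = {key: "value"}
print(d[hashable({"b": {"c": 10}, "a": 1})])  # prints 'value'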
Don't. I agree with Andrey's comment on the previous question that it doesn't make sense to have dictionaries as keys, and especially not nested ones. Your data model is obviously quite complex, and dictionaries are probably not the right answer. You should try some OO instead.
Based on the solution by Chris Lutz. Note that this doesn't handle objects that are changed by iteration, such as streams, nor does it handle cycles.
import collections
def make_hashable(obj):
    """WARNING: This function only works on a limited subset of objects.
    Make a range of objects hashable.
    Accepts embedded dictionaries, lists or tuples (including namedtuples)."""
    if isinstance(obj, collections.Hashable):
        # Fine to be hashed without any changes
        return obj
    elif isinstance(obj, collections.Mapping):
        # Convert into a frozenset instead; recurse on the values so that
        # nested dictionaries become hashable too
        items = list(obj.items())
        for i, (key, value) in enumerate(items):
            items[i] = (key, make_hashable(value))
        return frozenset(items)
    elif isinstance(obj, collections.Iterable):
        # Convert into a tuple instead, tagged with the original type
        ret = [type(obj)]
        for item in obj:
            ret.append(make_hashable(item))
        return tuple(ret)
    # Use the id of the object
    return id(obj)
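As a sketch of the caching use case described in the question (the decorator name memoize_kwargs is only illustrative, not part of any library):
def memoize_kwargs(func):
    """Cache calls to a keyword-only function, keyed on make_hashable(kwargs)."""
    cache = {}
    def wrapper(**kwargs):
        key = make_hashable(kwargs)
        if key not in cache:
            cache[key] = func(**kwargs)
        return cache[key]
    return wrapper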
I agree with Lennart Regebro that you don't. However, I often find it useful to cache some function calls, callable objects and/or Flyweight objects, since they may take keyword arguments.
But if you really want it, try pickle.dumps (or cPickle on Python 2) as a quick and dirty hack. It is much faster than any of the answers that use recursive calls to make items immutable, and strings are hashable.
import pickle
hashable_str = pickle.dumps(unhashable_object)
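For example (keeping in mind, as discussed further down, that equal dictionaries are not guaranteed to pickle to identical strings):
cache = {}
key = pickle.dumps({"a": 1, "b": {"c": 10}})
cache[key] = "result"  # the pickled string is hashable, so it works as a key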
If you really must, make your objects hashable. Subclass whatever you want to put in as a key, and provide a __hash__ function which returns a unique key for this object.
To illustrate:
>>> ("a",).__hash__()
986073539
>>> {'a': 'b'}.__hash__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable
If your hash is not unique enough you will get collisions. May be slow as well.
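A minimal sketch of that approach (the class name HashableDict is only illustrative; it hashes by content, so its values must themselves be hashable and it must not be mutated after being used as a key):
class HashableDict(dict):
    def __hash__(self):
        # Hash by content; dict equality is inherited, so equal dicts find each other
        return hash(frozenset(self.items()))

d = {HashableDict(a=1): "value"}
print(d[HashableDict(a=1)])  # "value"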
I totally disagree with the comments & answers saying that this shouldn't be done for data-model purity reasons.
A dictionary associates an object with another object, using the former one as a key. Dictionaries can't be used as keys because they're not hashable. This doesn't make it any less meaningful/practical/necessary to map dictionaries to other objects.
As I understand the Python binding system, you can bind any dictionary to a number of variables (or the reverse, depending on your terminology), which means that these variables all know the same unique 'pointer' to that dictionary. Wouldn't it be possible to use that identifier as the hashing key?
If your data model ensures/enforces that you can't have two dictionaries with the same content used as keys then that seems to be a safe technique to me.
I should add that I have no idea whatsoever of how that can/should be done though.
I'm not entirely sure whether this should be an answer or a comment. Please correct me if needed.
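One way it might look (a sketch; the IdentityKey wrapper is hypothetical, and two dictionaries with equal contents are deliberately treated as different keys, per the assumption above):
class IdentityKey(object):
    """Wrap an object so it can be used as a dictionary key by identity."""
    def __init__(self, obj):
        self.obj = obj
    def __hash__(self):
        return id(self.obj)
    def __eq__(self, other):
        return isinstance(other, IdentityKey) and self.obj is other.obj

config = {"a": 1, "b": {"c": 10}}
lookup = {IdentityKey(config): "value"}
print(lookup[IdentityKey(config)])  # "value" - same underlying object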
With recursion!
def make_hashable(h):
    items = list(h.items())
    for i, (key, value) in enumerate(items):
        # Recurse into nested dictionaries so their values become hashable too
        if type(value) == dict:
            items[i] = (key, make_hashable(value))
    return frozenset(items)
You can add other type tests for any other mutable types you want to make hashable. It shouldn't be hard.
I encountered this issue when using a decorator that caches the results of previous calls based on call signature. I do not agree with the comments/answers here to the effect of "you should not do this", but I think it is important to recognize the potential for surprising and unexpected behavior when going down this path. My thought is that since instances are both mutable and hashable, and it does not seem practical to change that, there is nothing inherently wrong with creating hashable equivalents of non-hashable types or objects. But of course that is only my opinion.
For anyone who requires Python 2.5 compatibility, the below may be useful. I based it on the earlier answer.
from itertools import imap

tuplemap = lambda f, data: tuple(imap(f, data))

def make_hashable(obj):
    u"Returns a deep, non-destructive conversion of given object to an equivalent hashable object"
    if isinstance(obj, list):
        return tuplemap(make_hashable, iter(obj))
    elif isinstance(obj, dict):
        return frozenset(tuplemap(make_hashable, obj.iteritems()))
    elif hasattr(obj, '__hash__') and callable(obj.__hash__):
        try:
            obj.__hash__()
        except TypeError:
            # Hashable in principle, but contains unhashable items
            if hasattr(obj, '__iter__') and callable(obj.__iter__):
                return tuplemap(make_hashable, iter(obj))
            else:
                raise NotImplementedError, 'object of type %s cannot be made hashable' % (type(obj),)
        else:
            return obj
    elif hasattr(obj, '__iter__') and callable(obj.__iter__):
        return tuplemap(make_hashable, iter(obj))
    else:
        raise NotImplementedError, 'object of type %s cannot be made hashable' % (type(obj),)
I always use mypy in my Python programs.
What is the type (from typing) for immutable objects, i.e. the ones that can be used as a dictionary key?
To put this back into context, I want to write a class that inherits from dict, and I have the following code:
from typing import Any

class SliceableDict(dict):
    def __getitem__(self, attr: Any) -> Any:
        return super().__getitem__(attr)
Type hints in that case are pretty useless, aren't they?
Thanks
The keys of a dict are hashable (see the first sentence of the docs on dict), and the hashable type is typing.Hashable, an alias for collections.abc.Hashable.
typing.Hashable refers to any type that can serve as a valid key in a dict or value in a set.
It doesn't require true immutability (any object can define __hash__), but in general, hashable things should be things that are "treated as immutable", since it will break things should they be mutated after insertion into a set or after being inserted as a dict key.
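Applied to the class from the question, the annotation might look like:
from typing import Any, Hashable

class SliceableDict(dict):
    def __getitem__(self, attr: Hashable) -> Any:
        return super().__getitem__(attr)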
I am using defaultdict(set) to populate an internal mapping in a very large data structure. After it's populated, the whole structure (including the mapping) is exposed to the client code. At that point, I don't want anyone modifying the mapping.
And nobody does, intentionally. But sometimes, client code may by accident refer to an element that doesn't exist. At that point, a normal dictionary would have raised KeyError, but since the mapping is defaultdict, it simply creates a new element (an empty set) at that key. This is quite hard to catch, since everything happens silently. But I need to ensure this doesn't happen (the semantics actually doesn't break, but the mapping grows to a huge size).
What should I do? I can see these choices:
1. Find all the instances in current and future client code where a dictionary lookup is performed on the mapping, and convert it to mapping.get(k, set()) instead. This is just terrible.
2. "Freeze" defaultdict after the data structure is fully initialized, by converting it to dict. (I know it's not really frozen, but I trust client code to not actually write mapping[k] = v.) Inelegant, and a large performance hit.
3. Wrap defaultdict into a dict interface. What's an elegant way to do that? I'm afraid the performance hit may be huge though (this lookup is heavily used in tight loops).
4. Subclass defaultdict and add a method that "shuts down" all the defaultdict features, leaving it to behave as if it's a regular dict. It's a variant of 3 above, but I'm not sure if it's any faster. And I don't know if it's doable without relying on the implementation details.
5. Use regular dict in the data structure, rewriting all the code there to first check if the element is in the dictionary and adding it if it's not. Not good.
defaultdict docs say for default_factory:
If the default_factory attribute is None, this raises a KeyError
exception with the key as argument.
What if you just set your defaultdict's default_factory to None? E.g.,
>>> d = defaultdict(int)
>>> d['a'] += 1
>>> d
defaultdict(<type 'int'>, {'a': 1})
>>> d.default_factory = None
>>> d['b'] += 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'b'
>>>
Not sure if this is the best approach, but seems to work.
Once you have finished populating your defaultdict, you can simply create a regular dict from it:
my_dict = dict(my_default_dict)
One can optionally use the typing.Final type annotation.
If the default dict is a recursive default dict, see this answer which uses a recursive solution.
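A sketch of the Final annotation mentioned above (the element types here are only illustrative; note that Final only signals intent to type checkers such as mypy and does not prevent mutation at runtime):
from typing import Dict, Final, Set

mapping: Final[Dict[str, Set[int]]] = dict(my_default_dict)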
You could make a class that holds a reference to your dict and prevents __setitem__ from creating new keys:
from collections import Mapping

class MyDict(Mapping):
    def __init__(self, d):
        self.d = d

    def __getitem__(self, k):
        return self.d[k]

    def __len__(self):
        # Required by the Mapping ABC
        return len(self.d)

    def __iter__(self):
        return iter(self.d)

    def __setitem__(self, k, v):
        # Only allow updates to existing keys; never create new ones
        if k not in self.d.keys():
            raise KeyError(k)
        else:
            self.d[k] = v
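Usage might then look like this (a sketch; populated_defaultdict stands for the mapping described in the question, converted to a plain dict before wrapping so lookups no longer auto-create entries):
mapping = MyDict(dict(populated_defaultdict))
mapping['existing_key']   # delegates to the wrapped dict
mapping['missing_key']    # raises KeyError instead of inserting an empty set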
I have a python dictionary that contains iterables, some of which are lists, but most of which are other dictionaries. I'd like to do glob-style assignment similar to the following:
myiter['*']['*.txt']['name'] = 'Woot'
That is, for each element in myiter, look up all elements with keys ending in '.txt' and then set their 'name' item to 'Woot'.
I've thought about sub-classing dict and using the fnmatch module. But, it's unclear to me what the best way of accomplishing this is.
The best way, I think, would be not to do it -- '*' is a perfectly valid key in a dict, so myiter['*'] has a perfectly well defined meaning and usefulness, and subverting that can definitely cause problems. How to "glob" over keys which are not strings, including the exclusively integer "keys" (indices) in elements which are lists and not mappings, is also quite a design problem.
If you nevertheless must do it, I would recommend taking full control by subclassing the abstract base class collections.MutableMapping, and implement the needed methods (__len__, __iter__, __getitem__, __setitem__, __delitem__, and, for better performance, also override others such as __contains__, which the ABC does implement on the base of the others, but slowly) in terms of a contained dict. Subclassing dict instead, as per other suggestions, would require you to override a huge number of methods to avoid inconsistent behavior between the use of "keys containing wildcards" in the methods you do override, and in those you don't.
Whether you subclass collections.MutableMapping, or dict, to make your Globbable class, you have to make a core design decision: what does yourthing[somekey] return when yourthing is a Globbable?
Presumably it has to return a different type when somekey is a string containing wildcards, versus anything else. In the latter case, one would imagine, just what is actually at that entry; but in the former, it can't just return another Globbable -- otherwise, what would yourthing[somekey] = 'bah' do in the general case? For your single "slick syntax" example, you want it to set a somekey entry in each of the items of yourthing (a HUGE semantic break with the behavior of every other mapping in the universe;-) -- but then, how would you ever set an entry in yourthing itself?!
Let's see if the Zen of Python has anything to say about this "slick syntax" for which you yearn...:
>>> import this
...
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Consider for a moment the alternative of losing the "slick syntax" (and all the huge semantic headaches it necessarily implies) in favor of clarity and simplicity (using Python 2.7-and-better syntax here, just for the dict comprehension -- use an explicit dict(...) call instead if you're stuck with 2.6 or earlier), e.g.:
import fnmatch

def match(s, pat):
    try:
        return fnmatch.fnmatch(s, pat)
    except TypeError:
        return False

def sel(ds, pat):
    # Collect the values of all matching keys across a list of dicts
    return [d[k] for d in ds for k in d if match(k, pat)]

def set(ds, k, v):
    # Set key k to v in every dict in the list (shadows the built-in set; rename if that matters)
    for d in ds:
        d[k] = v
so your assignment might become
set(sel(sel([myiter], '*'), '*.txt'), 'name', 'Woot')
(the selection with '*' is redundant if every top-level key is to be included; I've kept it only to parallel your example). Is this so horrible as to be worth the morass of issues I've mentioned above in order to use instead
myiter['*']['*.txt']['name'] = 'Woot'
...? By far the clearest and best-performing way, of course, remains the even-simpler
def match(k, v, pat):
    try:
        # Only a match if the key fits the pattern and the value is a dict we can descend into
        if fnmatch.fnmatch(k, pat):
            return isinstance(v, dict)
        return False
    except TypeError:
        return False

for k, v in myiter.items():
    if match(k, v, '*'):
        for sk, sv in v.items():
            if match(sk, sv, '*.txt'):
                sv['name'] = 'Woot'
but if you absolutely crave conciseness and compactness, despising the Zen of Python's koan "Sparse is better than dense", you can at least obtain them without the various nightmares I mentioned as needed to achieve your ideal "syntax sugar".
The best way is to subclass dict and use the fnmatch module.
subclass dict: adding functionality you want in an object-oriented way.
fnmatch module: reuse of existing functionality.
You could use fnmatch for functionality to match on dictionary keys although you would have to compromise syntax slightly, especially if you wanted to do this on a nested dictionary. Perhaps a custom dictionary-like class with a search method to return wildcard matches would work well.
Here is a VERY BASIC example that comes with a warning that this is NOT RECURSIVE and will not handle nested dictionaries:
from fnmatch import fnmatch

class GlobDict(dict):
    def glob(self, match):
        """match should be a glob-style pattern (e.g. '*.txt')"""
        return dict([(k, v) for k, v in self.items() if fnmatch(k, match)])

# Start with a basic dict
basic_dict = {'file1.jpg': 'image', 'file2.txt': 'text', 'file3.mpg': 'movie',
              'file4.txt': 'text'}
# Create a GlobDict from it
glob_dict = GlobDict(**basic_dict)
# Then get glob-style results!
globbed_results = glob_dict.glob('*.txt')
# => {'file4.txt': 'text', 'file2.txt': 'text'}
As for what way is the best? The best way is the one that works. Don't try to optimize a solution before it's even created!
Following the principle of least magic, perhaps just define a recursive function, rather than subclassing dict:
import fnmatch

def set_dict_with_pat(it, key_patterns, value):
    if len(key_patterns) > 1:
        # More patterns remain: descend into every matching sub-dictionary
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                set_dict_with_pat(it[key], key_patterns[1:], value)
    else:
        # Last pattern: set the value on every matching key
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                it[key] = value
Which could be used like this:
myiter = {'dir1': {'a.txt': {'name': 'Roger'}, 'b.notxt': {'name': 'Carl'}}, 'dir2': {'b.txt': {'name': 'Sally'}}}
set_dict_with_pat(myiter, ['*', '*.txt', 'name'], 'Woot')
print(myiter)
# {'dir2': {'b.txt': {'name': 'Woot'}}, 'dir1': {'b.notxt': {'name': 'Carl'}, 'a.txt': {'name': 'Woot'}}}
Is there some simple method that can check if an input object to some function adheres to a specific structure? For example, I want only a dictionary of string keys and values that are a list of integers.
One method would be to write a recursive function that you pass in the object and you iterate over it, checking at each level it is what you expect. But I feel that there should be a more elegant way to do it than this in python.
Why would you expect Python to provide an "elegant way" to check types, since the whole idea of type-checking is so utterly alien to the Pythonic way of conceiving the world and interacting with it?! Normally in Python you'd use duck typing -- so "an integer" might equally well be an int, a long, a gmpy.mpz -- types with no relation to each other except they all implement the same core signature... just as "a dict" might be any implementation of mapping, and so forth.
The new-in-2.6-and-later concept of "abstract base classes" provides a more systematic way to implement and verify duck typing, and 3.0-and-later function annotations let you interface with such a checking system (third-party, since Python adopts no such system for the foreseeable future). For example, this recipe provides a 3.0-and-later way to perform "kinda but not quite" type checking based on function annotations -- though I doubt it goes anywhere as deep as you desire, but then, it's early times for function annotations, and most of us Pythonistas feel so little craving for such checking that we're unlikely to run flat out to implement such monumental systems in lieu of actually useful code;-).
Short answer, no, you have to create your own function.
Long answer: it's not Pythonic to do what you're asking. There might be some special cases (e.g., marshalling a dict to xmlrpc), but by and large, assume the objects will act like what they're documented to be. If they don't, let the AttributeError bubble up. If you are OK with coercing values, then use str() and int() to convert them. They could, after all, implement __str__, __add__, etc., which make them not descendants of int/str, but still usable.
def dict_of_string_and_ints(obj):
    assert isinstance(obj, dict)
    for key, value in obj.iteritems():  # py2.4
        assert isinstance(key, basestring)
        assert isinstance(value, list)
        assert sum(isinstance(x, int) for x in value) == len(value)
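For example:
dict_of_string_and_ints({"a": [1, 2], "b": [3]})  # passes silently
dict_of_string_and_ints({"a": "not a list"})      # raises AssertionError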
Since Python emphasizes things just working, your best bet is to just assert as you go and trust the users of your library to feed you proper data. Let exceptions happen, if you must; that's on your clients for not reading your docstring.
In your case, something like this:
def myfunction(arrrrrgs):
    assert isinstance(arrrrrgs, dict), "Need a dictionary!"
    for key in arrrrrgs:
        assert type(key) is str, "Need a string!"
        val = arrrrrgs[key]
        assert type(val) is list, "Need a list!"
And so forth.
Really, it isn't worth the effort. Express yourself clearly in the docstring and let your program blow up if it's misused, or throw well-placed exceptions to guide the late-night debugger.
I will take a shot and propose a helper function that can do something like that for you in a more generic+elegant way:
def check_type(value, type_def):
    """
    This validates an object instance <value> against a type template <type_def>
    presented as a simplified object.
    E.g.
    if value is a list of dictionaries that have string keys and integer values:
    >> check_type(value, [{'':0}])
    if value is a list of dictionaries, no restriction on keys/values:
    >> check_type(value, [{}])
    """
    if type(value) != type(type_def):
        return False
    if hasattr(value, '__iter__'):
        if len(type_def) == 0:
            # An empty template means "any contents of this container type"
            return True
        type_def_val = iter(type_def).next()
        for key in value:
            if not check_type(key, type_def_val):
                return False
        if type(value) is dict:
            if not check_type(value.values(), type_def.values()):
                return False
    return True
The comment explains a sample of usage, but you can always go pretty deep, e.g.
>>> check_type({1:['a', 'b'], 2:['c', 'd']}, {0:['']})
True
>>> check_type({1:['a', 'b'], 2:['c', 3]}, {0:['']})
False
P.S. Feel free to modify it if you want one-by-one tuple validation (e.g. validation against ([], '', {0:0}) which is not handled as it is expected now)
Does Pickle always produce the same output for a certain input value? I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories. My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.
I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories.
Right:
>>> pickle.dumps({1: 0, 9: 0}) == pickle.dumps({9: 0, 1: 0})
False
See also: pickle.dumps not suitable for hashing
My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.
There's a number of fundamental problems with this. It's impossible to come up with an object-to-string transformation that maps equality correctly—think of the problem of object identity:
>>> a = object()
>>> b = object()
>>> a == b
False
>>> pickle.dumps(b) == pickle.dumps(a)
True
Depending on your exact requirements, you may be able to transform object hierarchies into ones that you could then hash:
def hashablize(obj):
    """Convert a container hierarchy into one that can be hashed.

    Don't use this with recursive structures!
    Also, this won't be useful if you pass dictionaries with
    keys that don't have a total order.
    Actually, maybe you're best off not using this function at all."""
    try:
        hash(obj)
    except TypeError:
        if isinstance(obj, dict):
            return tuple((k, hashablize(v)) for (k, v) in sorted(obj.iteritems()))
        elif hasattr(obj, '__iter__'):
            return tuple(hashablize(o) for o in obj)
        else:
            raise TypeError("Can't hashablize object of type %r" % type(obj))
    else:
        return obj
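For example, deriving a dictionary key from a nested structure (equal dicts sort to the same tuple, so lookups behave as expected):
cache = {hashablize({"a": 1, "b": {"c": 10}}): "result"}
print(cache[hashablize({"b": {"c": 10}, "a": 1})])  # "result"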
What do you mean by the same output? You should normally always get the same output for a round trip (pickling -> unpickling), but I don't think the serialized format itself is guaranteed to be the same in every condition. Certainly, it may change between platforms and all that.
Within one run of your program, using pickling for memoization should be fine - I have used this scheme several times without trouble, but that was for quite simple problems. One problem is that this does not cover every useful case (functions come to mind: you cannot pickle them, so if your function takes a callable argument, that won't work).
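A minimal sketch of the scheme described in the question, i.e. hashing the pickled arguments with SHA-1 (subject to the equality caveats raised in the other answer):
import hashlib
import pickle

def memoize(func):
    cache = {}
    def wrapper(*args, **kwargs):
        # Sort keyword arguments so the same call pickles the same way
        signature = hashlib.sha1(pickle.dumps((args, sorted(kwargs.items())))).hexdigest()
        if signature not in cache:
            cache[signature] = func(*args, **kwargs)
        return cache[signature]
    return wrapper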