How to check if a variable is a dictionary in Python?

How would you check if a variable is a dictionary in Python?
For example, I'd like it to loop through the values in the dictionary until it finds a dictionary. Then, loop through the one it finds:
dict = {'abc': 'abc', 'def': {'ghi': 'ghi', 'jkl': 'jkl'}}
for k, v in dict.iteritems():
    if ###check if v is a dictionary:
        for k, v in v.iteritems():
            print(k, ' ', v)
    else:
        print(k, ' ', v)

You could use if type(ele) is dict, or use isinstance(ele, dict), which would also work if you had subclassed dict:
d = {'abc': 'abc', 'def': {'ghi': 'ghi', 'jkl': 'jkl'}}
for element in d.values():
    if isinstance(element, dict):
        for k, v in element.items():
            print(k, ' ', v)
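For instance (a small sketch using only stdlib types), the identity test rejects dict subclasses while isinstance follows inheritance:
from collections import OrderedDict

od = OrderedDict(a=1)
print(type(od) is dict)       # False - identity test rejects the subclass
print(isinstance(od, dict))   # True  - isinstance accepts any dict subclass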

How would you check if a variable is a dictionary in Python?
This is an excellent question, but it is unfortunate that the most upvoted answer leads with a poor recommendation, type(obj) is dict.
(Note that you should also not use dict as a variable name - it's the name of the builtin object.)
If you are writing code that will be imported and used by others, do not presume that they will use the dict builtin directly - making that presumption makes your code more inflexible, and in this case creates easily hidden bugs that would not error the program out.
I strongly suggest, for the purposes of correctness, maintainability, and flexibility for future users, never using less flexible, unidiomatic expressions in your code when more flexible, idiomatic expressions are available.
is is a test for object identity. It does not support inheritance, it does not support any abstraction, and it does not support the interface.
So I will provide several options that do.
Supporting inheritance:
This is the first recommendation I would make, because it allows users to supply their own subclass of dict, or an OrderedDict, defaultdict, or Counter from the collections module:
if isinstance(any_object, dict):
But there are even more flexible options.
Supporting abstractions:
from collections.abc import Mapping
if isinstance(any_object, Mapping):
This allows the user of your code to use their own custom implementation of an abstract Mapping, which also includes any subclass of dict, and still get the correct behavior.
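For example, a minimal custom Mapping (a hypothetical class, just to illustrate) is recognized without subclassing dict at all:
from collections.abc import Mapping

class ReadOnlyMap(Mapping):  # hypothetical example class
    def __init__(self, data):
        self._data = dict(data)
    def __getitem__(self, key):
        return self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

m = ReadOnlyMap({'a': 1})
print(isinstance(m, dict))      # False - not a dict subclass
print(isinstance(m, Mapping))   # True  - satisfies the abstraction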
Use the interface
You commonly hear the OOP advice, "program to an interface".
This strategy takes advantage of Python's polymorphism or duck-typing.
So just attempt to access the interface, catching the specific expected errors (AttributeError in case there is no .items and TypeError in case items is not callable) with a reasonable fallback - and now any class that implements that interface will give you its items (note .iteritems() is gone in Python 3):
try:
    items = any_object.items()
except (AttributeError, TypeError):
    non_items_behavior(any_object)
else:  # no exception raised
    for item in items: ...
Perhaps you might think using duck-typing like this goes too far in allowing too many false positives, and it may, depending on your objectives for this code.
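As a sketch of such a false positive (a hypothetical class), any object exposing a callable .items() passes this test, Mapping or not:
class NotAMapping:  # hypothetical - implements only the interface
    def items(self):
        return [('k', 'v')]

any_object = NotAMapping()
try:
    items = any_object.items()
except (AttributeError, TypeError):
    print('no items interface')
else:
    for k, v in items:
        print(k, v)   # prints: k v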
Conclusion
Don't use is to check types for standard control flow. Use isinstance, consider abstractions like Mapping or MutableMapping, and consider avoiding type-checking altogether, using the interface directly.

In Python 3.6:
type_variable = type(variable)
print('comparison', type_variable == dict)
if type_variable == dict:
    print('true')
else:
    print('false')
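Be aware that this equality test has the subclass pitfall discussed above; a quick check:
from collections import OrderedDict

print(type(OrderedDict()) == dict)      # False - type equality rejects subclasses
print(isinstance(OrderedDict(), dict))  # True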

The OP did not exclude the starting variable, so for completeness here is how to handle the generic case of processing a supposed dictionary that may include dictionaries as items. This also follows the pure-Python (3.8) way of testing for a dictionary recommended in the answers above.
from collections.abc import Mapping

my_dict = {'abc': 'abc', 'def': {'ghi': 'ghi', 'jkl': 'jkl'}}

def parse_dict(in_dict):
    if isinstance(in_dict, Mapping):
        for k_outer, v_outer in in_dict.items():
            if isinstance(v_outer, Mapping):
                for k_inner, v_inner in v_outer.items():
                    print(k_inner, v_inner)
            else:
                print(k_outer, v_outer)

parse_dict(my_dict)
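For the sample my_dict above this prints (dicts preserve insertion order in Python 3.7+):
abc abc
ghi ghi
jkl jkl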

My testing has found this to work now that we have type hints (note that typing.Dict is deprecated since Python 3.9 in favor of plain dict):
from typing import Dict

if isinstance(my_dict, Dict):
    print('True')
else:
    print('False')
Side note: there is some discussion about typing.Dict here.

Related

Exposing `defaultdict` as a regular `dict`

I am using defaultdict(set) to populate an internal mapping in a very large data structure. After it's populated, the whole structure (including the mapping) is exposed to the client code. At that point, I don't want anyone modifying the mapping.
And nobody does, intentionally. But sometimes, client code may by accident refer to an element that doesn't exist. At that point, a normal dictionary would have raised KeyError, but since the mapping is defaultdict, it simply creates a new element (an empty set) at that key. This is quite hard to catch, since everything happens silently. But I need to ensure this doesn't happen (the semantics actually doesn't break, but the mapping grows to a huge size).
What should I do? I can see these choices:
1. Find all the instances in current and future client code where a dictionary lookup is performed on the mapping, and convert it to mapping.get(k, {}) instead. This is just terrible.
2. "Freeze" defaultdict after the data structure is fully initialized, by converting it to dict. (I know it's not really frozen, but I trust client code to not actually write mapping[k] = v.) Inelegant, and a large performance hit.
3. Wrap defaultdict into a dict interface. What's an elegant way to do that? I'm afraid the performance hit may be huge though (this lookup is heavily used in tight loops).
4. Subclass defaultdict and add a method that "shuts down" all the defaultdict features, leaving it to behave as if it's a regular dict. It's a variant of 3 above, but I'm not sure if it's any faster. And I don't know if it's doable without relying on the implementation details.
5. Use regular dict in the data structure, rewriting all the code there to first check if the element is in the dictionary and adding it if it's not. Not good.
defaultdict docs say for default_factory:
If the default_factory attribute is None, this raises a KeyError
exception with the key as argument.
What if you just set your defaultdict's default_factory to None? E.g.,
>>> d = defaultdict(int)
>>> d['a'] += 1
>>> d
defaultdict(<type 'int'>, {'a': 1})
>>> d.default_factory = None
>>> d['b'] += 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'b'
>>>
Not sure if this is the best approach, but seems to work.
Once you have finished populating your defaultdict, you can simply create a regular dict from it:
my_dict = dict(my_default_dict)
One can optionally use the typing.Final type annotation.
If the default dict is a recursive default dict, see this answer which uses a recursive solution.
You could make a class that holds a reference to your dict and prevents __setitem__ from adding new keys:
from collections.abc import Mapping

class MyDict(Mapping):
    def __init__(self, d):
        self.d = d

    def __getitem__(self, k):
        return self.d[k]

    def __iter__(self):
        return iter(self.d)

    def __len__(self):
        return len(self.d)

    def __setitem__(self, k, v):
        if k not in self.d:
            raise KeyError(k)
        self.d[k] = v
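A quick usage sketch (assuming the class above; note that if you wrap the defaultdict itself, lookups on it will still auto-create entries, so wrap a plain dict built from it):
wrapped = MyDict(dict(my_default_dict))
wrapped['a'] = 2      # allowed only if 'a' already exists
wrapped['missing']    # raises KeyError instead of creating an entry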

Why aren't Python dicts unified?

After reading this question, I noticed that S. Lott might have liked to use an “ordered defaultdict”, but it doesn't exist. Now, I wonder: Why do we have so many dict classes in Python?
dict
blist.sorteddict
collections.OrderedDict
collections.defaultdict
weakref.WeakKeyDictionary
weakref.WeakValueDictionary
others?
Why not have something like this,
dict(initializer=[], sorted=False, ordered=False, default=None,
     weak_keys=False, weak_values=False)
that unifies everything, and provides every useful combination?
One issue is that making this change would break backward-compatibility, due to this type of constructor usage that exists now:
>>> dict(one=1, two=2)
{'two': 2, 'one': 1}
Those extra options don't come for free. Since 99.9% of Python is built on dict, it is very important to make it as minimal and fast as possible.
Because the implementations differ a lot. You'd basically end up with a dict factory that returns an instance of a _dict (a very fast, low-overhead dictionary - the current dict), ordereddict, defaultdict, ... class. Also, you could not initialize dictionaries with keyword arguments anymore; programs relying on this would fail:
>>> dict(sorted=42)
{'sorted': 42}
# Your proposal would lead to an empty dictionary here (breaking compatibility)
Besides, when it's reasonable, the various classes already inherit from each other:
>>> collections.defaultdict.__bases__
(<type 'dict'>,)
This is why languages have "mixins".
You could try to invent something like the following by defining the right bunch of classes.
class defaultdict( dict, unordered, default_init ): pass
class OrderedDict( dict, ordered, nodefault_init ): pass
class WeakKeyDict( dict, ordered, nodefault_init, weakkey ): pass
class WeakValueDict( dict, ordered, nodefault_init, weakvalue ): pass
Then, once you have those "unified" dictionaries, your applications look like this
groups= defaultdict( list )
No real change to the app, is there?
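As a rough sketch of the mixin idea (the class names here are hypothetical), dict subclasses can compose behavior; for example, defaulting via __missing__:
class DefaultMixin:
    # Adds defaultdict-style behaviour to any dict subclass
    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

class MyDefaultDict(DefaultMixin, dict):
    pass

groups = MyDefaultDict(list)
groups['a'].append(1)
print(groups)   # {'a': [1]}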

How to glob for iterable element

I have a python dictionary that contains iterables, some of which are lists, but most of which are other dictionaries. I'd like to do glob-style assignment similar to the following:
myiter['*']['*.txt']['name'] = 'Woot'
That is, for each element in myiter, look up all elements with keys ending in '.txt' and then set their 'name' item to 'Woot'.
I've thought about sub-classing dict and using the fnmatch module. But, it's unclear to me what the best way of accomplishing this is.
The best way, I think, would be not to do it -- '*' is a perfectly valid key in a dict, so myiter['*'] has a perfectly well defined meaning and usefulness, and subverting that can definitely cause problems. How to "glob" over keys which are not strings, including the exclusively integer "keys" (indices) in elements which are lists and not mappings, is also quite a design problem.
If you nevertheless must do it, I would recommend taking full control by subclassing the abstract base class collections.MutableMapping, and implement the needed methods (__len__, __iter__, __getitem__, __setitem__, __delitem__, and, for better performance, also override others such as __contains__, which the ABC does implement on the base of the others, but slowly) in terms of a contained dict. Subclassing dict instead, as per other suggestions, would require you to override a huge number of methods to avoid inconsistent behavior between the use of "keys containing wildcards" in the methods you do override, and in those you don't.
Whether you subclass collections.MutableMapping, or dict, to make your Globbable class, you have to make a core design decision: what does yourthing[somekey] return when yourthing is a Globbable?
Presumably it has to return a different type when somekey is a string containing wildcards, versus anything else. In the latter case, one would imagine, just what is actually at that entry; but in the former, it can't just return another Globbable -- otherwise, what would yourthing[somekey] = 'bah' do in the general case? For your single "slick syntax" example, you want it to set a somekey entry in each of the items of yourthing (a HUGE semantic break with the behavior of every other mapping in the universe;-) -- but then, how would you ever set an entry in yourthing itself?!
Let's see if the Zen of Python has anything to say about this "slick syntax" for which you yearn...:
>>> import this
...
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Consider for a moment the alternative of losing the "slick syntax" (and all the huge semantic headaches it necessarily implies) in favor of clarity and simplicity (using Python 2.7-and-better syntax here, just for the dict comprehension -- use an explicit dict(...) call instead if you're stuck with 2.6 or earlier), e.g.:
import fnmatch

def match(s, pat):
    try: return fnmatch.fnmatch(s, pat)
    except TypeError: return False

def sel(ds, pat):
    return [d[k] for d in ds for k in d if match(k, pat)]

def set(ds, k, v):
    for d in ds: d[k] = v
so your assignment might become
set(sel(sel([myiter], '*'), '*.txt'), 'name', 'Woot')
(the selection with '*' being redundant if it matches all of myiter's keys, in which case you could just omit it). Is this so horrible as to be worth the morass of issues I've mentioned above in order to use instead
myiter['*']['*.txt']['name'] = 'Woot'
...? By far the clearest and best-performing way, of course, remains the even-simpler
def match(k, v, pat):
    try:
        if fnmatch.fnmatch(k, pat):
            return isinstance(v, dict)
    except TypeError:
        return False

for k, v in myiter.items():
    if match(k, v, '*'):
        for sk, sv in v.items():
            if match(sk, sv, '*.txt'):
                sv['name'] = 'Woot'
but if you absolutely crave conciseness and compactness, despising the Zen of Python's koan "Sparse is better than dense", you can at least obtain them without the various nightmares I mentioned as needed to achieve your ideal "syntax sugar".
The best way is to subclass dict and use the fnmatch module.
subclass dict: adding functionality you want in an object-oriented way.
fnmatch module: reuse of existing functionality.
You could use fnmatch for functionality to match on dictionary keys although you would have to compromise syntax slightly, especially if you wanted to do this on a nested dictionary. Perhaps a custom dictionary-like class with a search method to return wildcard matches would work well.
Here is a VERY BASIC example that comes with a warning that this is NOT RECURSIVE and will not handle nested dictionaries:
from fnmatch import fnmatch

class GlobDict(dict):
    def glob(self, match):
        """match should be a glob-style pattern (e.g. '*.txt')"""
        return dict([(k, v) for k, v in self.items() if fnmatch(k, match)])

# Start with a basic dict
basic_dict = {'file1.jpg': 'image', 'file2.txt': 'text', 'file3.mpg': 'movie',
              'file4.txt': 'text'}
# Create a GlobDict from it
glob_dict = GlobDict(**basic_dict)
# Then get glob-style results!
globbed_results = glob_dict.glob('*.txt')
# => {'file2.txt': 'text', 'file4.txt': 'text'}
As for what way is the best? The best way is the one that works. Don't try to optimize a solution before it's even created!
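If you did need the nested case, one hedged sketch (a hypothetical extension, not part of the answer above) is to recurse into dict values:
from fnmatch import fnmatch

class RecursiveGlobDict(dict):  # hypothetical recursive variant
    def glob(self, pattern):
        matches = {}
        for k, v in self.items():
            if isinstance(k, str) and fnmatch(k, pattern):
                matches[k] = v
            if isinstance(v, dict):
                # nested matches may overwrite same-named outer keys
                matches.update(RecursiveGlobDict(v).glob(pattern))
        return matches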
Following the principle of least magic, perhaps just define a recursive function, rather than subclassing dict:
import fnmatch

def set_dict_with_pat(it, key_patterns, value):
    if len(key_patterns) > 1:
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                set_dict_with_pat(it[key], key_patterns[1:], value)
    else:
        for key in it:
            if fnmatch.fnmatch(key, key_patterns[0]):
                it[key] = value
Which could be used like this:
myiter = {'dir1': {'a.txt': {'name': 'Roger'}, 'b.notxt': {'name': 'Carl'}},
          'dir2': {'b.txt': {'name': 'Sally'}}}
set_dict_with_pat(myiter, ['*', '*.txt', 'name'], 'Woot')
print(myiter)
# {'dir1': {'a.txt': {'name': 'Woot'}, 'b.notxt': {'name': 'Carl'}}, 'dir2': {'b.txt': {'name': 'Woot'}}}

Verifying that an object in python adheres to a specific structure

Is there some simple method that can check if an input object to some function adheres to a specific structure? For example, I want only a dictionary of string keys and values that are a list of integers.
One method would be to write a recursive function to which you pass the object, iterating over it and checking at each level that it is what you expect. But I feel there should be a more elegant way to do this in Python.
Why would you expect Python to provide an "elegant way" to check types, since the whole idea of type-checking is so utterly alien to the Pythonic way of conceiving the world and interacting with it?! Normally in Python you'd use duck typing -- so "an integer" might equally well be an int, a long, a gmpy.mpz -- types with no relation to each other except they all implement the same core signature... just as "a dict" might be any implementation of mapping, and so forth.
The new-in-2.6-and-later concept of "abstract base classes" provides a more systematic way to implement and verify duck typing, and 3.0-and-later function annotations let you interface with such a checking system (third-party, since Python adopts no such system for the foreseeable future). For example, this recipe provides a 3.0-and-later way to perform "kinda but not quite" type checking based on function annotations -- though I doubt it goes anywhere as deep as you desire, but then, it's early times for function annotations, and most of us Pythonistas feel so little craving for such checking that we're unlikely to run flat out to implement such monumental systems in lieu of actually useful code;-).
Short answer, no, you have to create your own function.
Long answer: it's not pythonic to do what you're asking. There might be some special cases (e.g., marshalling a dict to xmlrpc), but by and large, assume the objects will act like what they're documented to be. If they don't, let the AttributeError bubble up. If you are OK with coercing values, then use str() and int() to convert them. They could, after all, implement __str__, __add__, etc. that makes them not descendants of int/str, but still usable.
def dict_of_string_and_ints(obj):
    assert isinstance(obj, dict)
    for key, value in obj.iteritems():  # py2.4
        assert isinstance(key, basestring)
        assert isinstance(value, list)
        assert sum(isinstance(x, int) for x in value) == len(value)
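Hypothetical calls, to show the intent:
dict_of_string_and_ints({'a': [1, 2]})    # passes silently
dict_of_string_and_ints({'a': [1, 'x']})  # raises AssertionError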
Since Python emphasizes things just working, your best bet is to just assert as you go and trust the users of your library to feed you proper data. Let exceptions happen, if you must; that's on your clients for not reading your docstring.
In your case, something like this:
def myfunction(arrrrrgs):
    assert issubclass(type(arrrrrgs), dict), "Need a dictionary!"
    for key in arrrrrgs:
        assert type(key) is str, "Need a string!"
        val = arrrrrgs[key]
        assert type(val) is list, "Need a list!"
And so forth.
Really, it isn't worth the effort; express yourself clearly in the docstring and let your program blow up, or throw well-placed exceptions to guide the late-night debugger.
I will take a shot and propose a helper function that can do something like that for you in a more generic+elegant way:
def check_type(value, type_def):
    """
    This validates an object instance <value> against a type template <type_def>
    presented as a simplified object.
    E.g., if value is a list of dictionaries that have string keys and integer
    values:
    >>> check_type(value, [{'': 0}])
    if value is a list of dictionaries, with no restriction on keys/values:
    >>> check_type(value, [{}])
    """
    if type(value) != type(type_def):
        return False
    if hasattr(value, '__iter__'):
        if len(type_def) == 0:
            return True
        type_def_val = next(iter(type_def))
        for key in value:
            if not check_type(key, type_def_val):
                return False
        if type(value) is dict:
            if not check_type(value.values(), type_def.values()):
                return False
    return True
The docstring explains sample usage, but you can always go pretty deep, e.g.:
>>> check_type({1:['a', 'b'], 2:['c', 'd']}, {0:['']})
True
>>> check_type({1:['a', 'b'], 2:['c', 3]}, {0:['']})
False
P.S. Feel free to modify it if you want one-by-one tuple validation (e.g. validation against ([], '', {0:0}), which is not handled as expected now).

Using non-hashable Python objects as keys in dictionaries

Python doesn't allow non-hashable objects to be used as keys in other dictionaries. As pointed out by Andrey Vlasovskikh, there is a nice workaround for the special case of using non-nested dictionaries as keys:
frozenset(a.items())  # can be put in the dictionary instead
Is there a method of using arbitrary objects as keys in dictionaries?
Example:
How would this be used as a key?
{"a":1, "b":{"c":10}}
It is extremely rare that you will actually have to use something like this in your code. If you think this is the case, consider changing your data model first.
Exact use case
The use case is caching calls to an arbitrary keyword-only function. Each key in the dictionary is a string (the name of the argument) and the objects can be quite complicated, consisting of layered dictionaries, lists, tuples, etc.
Related problems
This sub-problem has been split off from the problem here. Solutions here deal with the case where the dictionaries are not layered.
Based off solution by Chris Lutz again.
import collections

def hashable(obj):
    if isinstance(obj, collections.Hashable):
        items = obj
    elif isinstance(obj, collections.Mapping):
        items = frozenset((k, hashable(v)) for k, v in obj.iteritems())
    elif isinstance(obj, collections.Iterable):
        items = tuple(hashable(item) for item in obj)
    else:
        raise TypeError(type(obj))
    return items
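A usage sketch (Python 2, matching the iteritems call above): the nested dict collapses into nested frozensets, which can then serve as a cache key:
key = hashable({'a': 1, 'b': {'c': 10}})
# e.g. frozenset([('a', 1), ('b', frozenset([('c', 10)]))])
cache = {key: 'result'}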
Don't. I agree with Andrey's comment on the previous question that it doesn't make sense to have dictionaries as keys, and especially not nested ones. Your data model is obviously quite complex, and dictionaries are probably not the right answer. You should try some OO instead.
Based off solution by Chris Lutz. Note that this doesn't handle objects that are changed by iteration, such as streams, nor does it handle cycles.
import collections

def make_hashable(obj):
    """WARNING: This function only works on a limited subset of objects.
    Make a range of objects hashable.
    Accepts embedded dictionaries, lists or tuples (including namedtuples)."""
    if isinstance(obj, collections.Hashable):
        # Fine to be hashed without any changes
        return obj
    elif isinstance(obj, collections.Mapping):
        # Convert into a frozenset of hashable (key, value) pairs instead
        items = list(obj.items())
        for i, (k, v) in enumerate(items):
            items[i] = (make_hashable(k), make_hashable(v))
        return frozenset(items)
    elif isinstance(obj, collections.Iterable):
        # Convert into a tuple instead, tagged with the original type
        ret = [type(obj)]
        for item in obj:
            ret.append(make_hashable(item))
        return tuple(ret)
    # Use the id of the object
    return id(obj)
I agree with Lennart Regebro that you don't. However, I often find it useful to cache some function calls, callable objects and/or Flyweight objects, since they may use keyword arguments.
But if you really want it, try pickle.dumps (or cPickle on Python 2.6) as a quick and dirty hack. It is much faster than any of the answers that use recursive calls to make items immutable, and strings are hashable.
import pickle
hashable_str = pickle.dumps(unhashable_object)
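As a sketch of the caching use case (the decorator and function names here are hypothetical), the pickled argument string can serve directly as a cache key:
import pickle
from functools import wraps

def memoize(func):  # hypothetical quick-and-dirty memoizer
    cache = {}
    @wraps(func)
    def wrapper(*args, **kwargs):
        # pickle output is not guaranteed canonical for objects that merely
        # compare equal, so treat this strictly as a best-effort key
        key = pickle.dumps((args, sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return wrapper

@memoize
def lookup(**options):  # hypothetical expensive function
    return sorted(options)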
If you really must, make your objects hashable. Subclass whatever you want to put in as a key, and provide a __hash__ function which returns a key unique to this object.
To illustrate:
>>> ("a",).__hash__()
986073539
>>> {'a': 'b'}.__hash__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable
If your hash is not unique enough you will get collisions. May be slow as well.
I totally disagree with comments & answers saying that this shouldn't be done for data model purity reason.
A dictionary associates an object with another object, using the former one as a key. Dictionaries can't be used as keys because they're not hashable. This doesn't make it any less meaningful/practical/necessary to map dictionaries to other objects.
As I understand the Python binding system, you can bind any dictionary to a number of variables (or the reverse, depending on your terminology), which means that these variables all know the same unique 'pointer' to that dictionary. Wouldn't it be possible to use that identifier as the hashing key?
If your data model ensures/enforces that you can't have two dictionaries with the same content used as keys then that seems to be a safe technique to me.
I should add that I have no idea whatsoever of how that can/should be done though.
I'm not entirely sure whether this should be an answer or a comment. Please correct me if needed.
With recursion!
def make_hashable(h):
    items = []
    for k, v in h.items():
        if type(v) == dict:
            v = make_hashable(v)
        items.append((k, v))
    return frozenset(items)
You can add other type tests for any other mutable types you want to make hashable. It shouldn't be hard.
I encountered this issue when using a decorator that caches the results of previous calls based on call signature. I do not agree with the comments/answers here to the effect of "you should not do this", but I think it is important to recognize the potential for surprising and unexpected behavior when going down this path. My thought is that since instances are both mutable and hashable, and it does not seem practical to change that, there is nothing inherently wrong with creating hashable equivalents of non-hashable types or objects. But of course that is only my opinion.
For anyone who requires Python 2.5 compatibility, the below may be useful. I based it on the earlier answer.
from itertools import imap

tuplemap = lambda f, data: tuple(imap(f, data))

def make_hashable(obj):
    u"Returns a deep, non-destructive conversion of given object to an equivalent hashable object"
    if isinstance(obj, list):
        return tuplemap(make_hashable, iter(obj))
    elif isinstance(obj, dict):
        return frozenset(tuplemap(make_hashable, obj.iteritems()))
    elif hasattr(obj, '__hash__') and callable(obj.__hash__):
        try:
            obj.__hash__()
        except TypeError:
            if hasattr(obj, '__iter__') and callable(obj.__iter__):
                return tuplemap(make_hashable, iter(obj))
            else:
                raise NotImplementedError, 'object of type %s cannot be made hashable' % (type(obj),)
        else:
            return obj
    elif hasattr(obj, '__iter__') and callable(obj.__iter__):
        return tuplemap(make_hashable, iter(obj))
    else:
        raise NotImplementedError, 'object of type %s cannot be made hashable' % (type(obj),)
