How can I convert a nested dictionary to a nested defaultdict?
from collections import defaultdict
dic = {"a": {"aa": "xxx"}}
default = defaultdict(lambda: None, dic)
print(default["dummy_key"]) # prints None
print(default["a"]["dummy_key"]) # KeyError
You need to either loop or recurse over the nested dictionary, through all of its levels.
Unless it's potentially ridiculously deep (as in hundreds of levels), or so wide that small performance factors make a difference, recursion is probably simplest here:
def defaultify(d):
    if not isinstance(d, dict):
        return d
    return defaultdict(lambda: None, {k: defaultify(v) for k, v in d.items()})
Or if you want it to work with all mappings, not just dicts, you could use collections.abc.Mapping instead of dict in your isinstance check.
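As a rough sketch of that Mapping-based variant: the name defaultify_mapping and the MappingProxyType sample below are made up for illustration, and the function otherwise follows the same recursion as above.

```python
from collections import defaultdict
from collections.abc import Mapping
from types import MappingProxyType

def defaultify_mapping(d):
    # Hypothetical variant of defaultify() that accepts any Mapping,
    # not only dict and its subclasses.
    if not isinstance(d, Mapping):
        return d
    return defaultdict(lambda: None,
                       {k: defaultify_mapping(v) for k, v in d.items()})

# MappingProxyType is a read-only Mapping that is not a dict subclass.
nested = MappingProxyType({"a": {"aa": "xxx"}})
default = defaultify_mapping(nested)
print(default["a"]["aa"])         # existing nested value
print(default["a"]["dummy_key"])  # missing nested key: None instead of KeyError
```

Keep in mind that every lookup of a missing key also inserts that key with the value None, as with any defaultdict.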
Of course this is assuming you have a pure nested dict. If you've got, say, something you parsed from a typical JSON response, where there might be dicts with list values with dict elements, you have to handle the other possibilities too:
def defaultify(d):
    if isinstance(d, dict):
        return defaultdict(lambda: None, {k: defaultify(v) for k, v in d.items()})
    elif isinstance(d, list):
        return [defaultify(e) for e in d]
    else:
        return d
But if this actually is coming from JSON, it's probably better to just use your defaultdict as an object_pairs_hook while the JSON is being parsed, rather than parsing it to a dict and then converting it to a defaultdict later.
There's an example in the docs of using an OrderedDict in place of dict, but that won't quite work for us—unlike OrderedDict and dict, defaultdict can't just take an iterable of pairs as its only argument; it needs the default value factory first. So we can bind that in, using functools.partial:
d = json.loads(jsonstring, object_pairs_hook=partial(defaultdict, lambda: None))
And so on.
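For a complete, runnable version of the parsing approach, here is a minimal sketch. The JSON string is a made-up sample; json.loads, object_pairs_hook and functools.partial are the standard-library pieces named above.

```python
import json
from collections import defaultdict
from functools import partial

# Made-up sample input with a nested object and a list of objects.
jsonstring = '{"a": {"aa": "xxx"}, "items": [{"b": 1}]}'

# Every JSON object is built as defaultdict(lambda: None, pairs),
# so nested objects and objects inside lists are covered too.
d = json.loads(jsonstring,
               object_pairs_hook=partial(defaultdict, lambda: None))

print(d["a"]["aa"])           # existing value
print(d["a"]["dummy_key"])    # None
print(d["items"][0]["nope"])  # None
```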
There's a common problem where I need to keep track of a bunch of collections in a dictionary. Let's say I want to keep track of which items I borrowed from my friends. The defaultdict class is quite useful to do this:
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
# defaultdict(<class 'set'>, {'Peter': {'salt'}, 'Eric': {'jacket', 'car'}})
This allows me to add items to the respective sets without worrying about whether a key is already in the dictionary. Now suppose I return the salt to Peter. This means I owe him nothing and he can be removed from the dictionary. Doing this is slightly more cumbersome:
d['Peter'].remove('salt')
if not d['Peter']:
    del d['Peter']
I know I could put this in some function, but for readability I would like a class that removes the key automatically if the corresponding set is empty. Is there some way to do this?
Edit
Okay, I realized a pretty major problem with this idea when trying to solve it using inheritance and overriding the indexing method. When calling d[index], the value is returned before .remove(something) is called, which makes it impossible for the dictionary to know that the set has been emptied. I'm guessing there's not really a way around using something different.
The problem with using a defaultdict to do what you want is that even accessing a key sets that key using the factory function. Consider:
from collections import defaultdict
d = defaultdict(set)
if d["Peter"]:
    print("I owe something to Peter")
print(d)
# defaultdict(set, {'Peter': set()})
Also, as you've realized, the problem with creating a subclass is that the __getitem__() method is called before the set is ever emptied, so you'd have to call another function that checks whether the set is empty and removes the key.
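One way such an extra function could look is sketched below. The class name SetDict and the method discard_item are hypothetical, not part of collections; the method does the removal and the cleanup in one step.

```python
from collections import defaultdict

class SetDict(defaultdict):
    """Hypothetical defaultdict(set) subclass with an explicit removal helper."""

    def __init__(self):
        super().__init__(set)

    def discard_item(self, key, item):
        # dict.get() does not trigger the default factory,
        # so unknown keys are not created by accident.
        values = self.get(key)
        if values is None:
            return
        values.discard(item)
        if not values:
            del self[key]  # drop the key once its set is empty

d = SetDict()
d['Peter'].add('salt')
d['Eric'].add('car')
d.discard_item('Peter', 'salt')   # Peter's set is now empty, key removed
d.discard_item('Nobody', 'x')     # unknown key, nothing happens
print(dict(d))
```

The trade-off is that callers have to use d.discard_item(key, item) instead of d[key].remove(item).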
A better idea might be to just not include keys with empty sets when you're creating the string representation.
class NewDefaultDict(defaultdict):
    def __repr__(self):
        return (f"NewDefaultDict({repr(self.default_factory)}, {{" +
                ", ".join(f"{repr(k)}: {repr(v)}" for k, v in self.items() if v) +
                "})")
nd = NewDefaultDict(set)
nd["Peter"].add("salt")
nd["Paul"].add("pepper")
nd["Paul"].remove("pepper")
print(nd)
# NewDefaultDict(<class 'set'>, {'Peter': {'salt'}})
You would also need to redefine __contains__() to check if the value is empty, so that e.g. "Paul" in nd returns False:
    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and self[key]
To make it compatible with for ... in nd constructs and dict-unpacking, you can redefine __iter__():
    def __iter__(self):
        for key in defaultdict.__iter__(self):
            if self[key]:
                yield key
Then,
for k in nd:
    print(k)
gives:
Peter
A dictionary comprehension might be useful.
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
d['Peter'].remove('salt')
d2 = {k: v for k, v in d.items() if len(v) > 0}
The d2 dictionary is now:
{'Eric': {'car', 'jacket'}}
Alternatively, use the fact that an empty set is considered false in Python:
d2 = {k: v for k, v in d.items() if v}
Defining a class to implement this logic, similar to the other answer, we can simply ignore keys/values where the value meets a criterion. A function passed as the ignore parameter defines that criterion.
from collections import defaultdict
class default_ignore_dict(defaultdict):
    def __init__(self, factory, ignore, *args, **kwargs):
        defaultdict.__init__(self, factory, *args, **kwargs)
        self.ignore = ignore
    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and not self.ignore(self[key])
    def items(self):
        return ((k, v) for k, v in defaultdict.items(self) if not self.ignore(v))
    def keys(self):
        return (k for k, _ in self.items())
    def values(self):
        return (v for _, v in self.items())
Testing this:
>>> d = default_ignore_dict(set, lambda s: not s)
>>> d['Peter'].add('salt')
>>> d['Peter'].remove('salt')
>>> d['Eric'].add('car')
>>> d['Eric'].add('jacket')
>>>
>>> 'Peter' in d
False
>>> list(d.items())
[('Eric', {'car', 'jacket'})]
>>>
Given a dictionary { k1: v1, k2: v2 ... } I want to get { k1: f(v1), k2: f(v2) ... } provided I pass a function f.
Is there any such built in function? Or do I have to do
dict([(k, f(v)) for (k, v) in my_dictionary.iteritems()])
Ideally I would just write
my_dictionary.map_values(f)
or
my_dictionary.mutate_values_with(f)
That is, it doesn't matter to me if the original dictionary is mutated or a copy is created.
There is no such function; the easiest way to do this is to use a dict comprehension:
my_dictionary = {k: f(v) for k, v in my_dictionary.items()}
In Python 2.7, use the .iteritems() method instead of .items() to save memory; in Python 3, .items() already returns a lightweight view. The dict comprehension syntax wasn't introduced until Python 2.7.
Note that there is no such method on lists either; you'd have to use a list comprehension or the map() function.
As such, you could use the map() function for processing your dict as well:
my_dictionary = dict(map(lambda kv: (kv[0], f(kv[1])), my_dictionary.iteritems()))
but that's not that readable, really.
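If the comprehension is needed in several places, it can be wrapped in a small helper. This is only a sketch: map_values is a hypothetical name borrowed from the question, not a built-in, and the sample data is made up.

```python
def map_values(f, mapping):
    """Return a new dict with f applied to every value (hypothetical helper)."""
    return {k: f(v) for k, v in mapping.items()}

prices = {"a": 1, "b": 2}
doubled = map_values(lambda x: x * 2, prices)
print(doubled)  # {'a': 2, 'b': 4}
print(prices)   # the original dict is left unchanged
```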
These toolz are great for this kind of simple yet repetitive logic.
http://toolz.readthedocs.org/en/latest/api.html#toolz.dicttoolz.valmap
Gets you right where you want to be.
import toolz
def f(x):
    return x+1
toolz.valmap(f, my_dictionary)
Because Python 3 dropped iteritems() in favor of items() (see PEP 3106 and the migration notes in PEP 469) and PEP 3113 removed tuple parameter unpacking, in Python 3.x you should write Martijn Pieters' answer like this:
my_dictionary = dict(map(lambda item: (item[0], f(item[1])), my_dictionary.items()))
You can do this in-place, rather than create a new dict, which may be preferable for large dictionaries (if you do not need a copy).
def mutate_dict(f, d):
    # .items() works in both Python 2 and 3; only values change, so iterating is safe
    for k, v in d.items():
        d[k] = f(v)
my_dictionary = {'a':1, 'b':2}
mutate_dict(lambda x: x+1, my_dictionary)
results in my_dictionary containing:
{'a': 2, 'b': 3}
While my original answer missed the point (by trying to solve this problem with the solution to Accessing key in factory of defaultdict), I have reworked it to propose an actual solution to the present question.
Here it is:
class walkableDict(dict):
    def walk(self, callback):
        try:
            for key in self:
                self[key] = callback(self[key])
        except TypeError:
            return False
        return True
Usage:
>>> d = walkableDict({ k1: v1, k2: v2 ... })
>>> d.walk(f)
The idea is to subclass the original dict to give it the desired functionality: "mapping" a function over all the values.
The plus point is that this dictionary can be used to store the original data as if it was a dict, while transforming any data on request with a callback.
Of course, feel free to name the class and the function the way you want (the name chosen in this answer is inspired by PHP's array_walk() function).
Note: Neither the try-except block nor the return statements are mandatory for the functionality, they are there to further mimic the behavior of the PHP's array_walk.
To avoid doing indexing from inside lambda, like:
rval = dict(map(lambda kv : (kv[0], ' '.join(kv[1])), rval.iteritems()))
In Python 2 only (tuple parameter unpacking was removed in Python 3), you can also do:
rval = dict(map(lambda(k,v) : (k, ' '.join(v)), rval.iteritems()))
Just came across this use case. I implemented gens's answer, adding a recursive approach for handling values that are also dicts:
def mutate_dict_in_place(f, d):
    # .items() works in both Python 2 and 3
    for k, v in d.items():
        if isinstance(v, dict):
            mutate_dict_in_place(f, v)
        else:
            d[k] = f(v)
# Example of handy usage
def utf8_everywhere(d):
    mutate_dict_in_place(
        lambda value: value.decode('utf-8') if isinstance(value, bytes) else value,
        d
    )
my_dict = {'a': b'byte1', 'b': {'c': b'byte2', 'd': b'byte3'}}
utf8_everywhere(my_dict)
print(my_dict)
This can be useful when dealing with JSON or YAML files that encode strings as bytes in Python 2.
My way to map over a dictionary. Note that mapping over bill.values() alone discards the keys, so map over items() and rebuild the pairs:
def f(x): return x + 2
bill = {"Alice": 20, "Bob": 10}
d = dict(map(lambda kv: (kv[0], f(kv[1])), bill.items()))
print('d:', d)
Results
d: {'Alice': 22, 'Bob': 12}
Map over an iterable in the values of a dictionary:
bills = {"Alice": [20, 15, 30], "Bob": [10, 35]}
g = dict(map(lambda kv: (kv[0], sum(kv[1])), bills.items()))
print('g:', g)
Results
g: {'Alice': 65, 'Bob': 45}
I have a dataset that might contain n levels of ordered dictionaries nested inside ordered dictionaries, which might in turn sit inside lists of tuples, tuples, or plain lists. I need to convert all of them into normal dictionaries. Is there an easier method than a recursive search and conversion?
from collections import OrderedDict
def ordered_to_regular_dict(d):
    if isinstance(d, OrderedDict):
        d = {k: ordered_to_regular_dict(v) for k, v in d.items()}
    return d
I got an answer from Stack Overflow that helps with ordered dictionaries of ordered dictionaries, but not with dictionaries inside a list of tuples or an ordered dictionary inside a list or a tuple.
Why not just write an if for every possibility (tuple, list, dict) like this:
from collections import OrderedDict
def ordered_to_regular_dict(d):
    if isinstance(d, dict):  # also matches OrderedDict, which is a dict subclass
        d = {k: ordered_to_regular_dict(v) for k, v in d.items()}
    elif isinstance(d, list):
        # lists and tuples have no .items(); iterate over the elements directly
        d = [ordered_to_regular_dict(v) for v in d]
    elif isinstance(d, tuple):
        d = tuple(ordered_to_regular_dict(v) for v in d)
    return d
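To check a recursive converter of this kind against mixed nesting, a small self-contained sketch follows. The name to_plain and the sample data are made up for illustration, and it assumes the list and tuple branches iterate over their elements directly.

```python
from collections import OrderedDict

def to_plain(obj):
    # Hypothetical helper: rebuild dicts, lists and tuples recursively,
    # turning every OrderedDict (a dict subclass) into a plain dict.
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_plain(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(to_plain(v) for v in obj)
    return obj

data = OrderedDict([
    ("a", [OrderedDict([("x", 1)]), (OrderedDict([("y", 2)]), 3)]),
    ("b", OrderedDict([("c", OrderedDict([("d", 4)]))])),
])
plain = to_plain(data)
print(plain)  # {'a': [{'x': 1}, ({'y': 2}, 3)], 'b': {'c': {'d': 4}}}
```

Since an OrderedDict compares equal to a plain dict with the same items, inspect type(...) rather than == to confirm the conversion. Named tuples would come back as plain tuples with this approach.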
You should leverage Python's builtin copy mechanism.
You can override copying behavior for OrderedDict via Python's copyreg module (also used by pickle). Then you can use Python's builtin copy.deepcopy() function to perform the conversion.
import copy
import copyreg
from collections import OrderedDict
def convert_nested_ordered_dict(x):
    """
    Perform a deep copy of the given object, but convert
    all internal OrderedDicts to plain dicts along the way.
    Args:
        x: Any pickleable object
    Returns:
        A copy of the input, in which all OrderedDicts contained
        anywhere in the input (as iterable items or attributes, etc.)
        have been converted to plain dicts.
    """
    # Temporarily install a custom pickling function
    # (used by deepcopy) to convert OrderedDict to dict.
    orig_pickler = copyreg.dispatch_table.get(OrderedDict, None)
    copyreg.pickle(
        OrderedDict,
        lambda d: (dict, ([*d.items()],))
    )
    try:
        return copy.deepcopy(x)
    finally:
        # Restore the original OrderedDict pickling function (if any)
        del copyreg.dispatch_table[OrderedDict]
        if orig_pickler:
            copyreg.dispatch_table[OrderedDict] = orig_pickler
Merely by using Python's builtin copying infrastructure, this solution has several nice properties:
Works for more than just JSON data.
Works for arbitrary data hierarchies.
Does not require you to implement special logic for each possible element type (e.g. list, tuple, etc.)
deepcopy() will properly handle duplicate objects within the collection:
x = [1,2,3]
d = {'a': x, 'b': x}
assert d['a'] is d['b']
d2 = copy.deepcopy(d)
assert d2['a'] is d2['b']
Since our solution is based on deepcopy() we'll have the same advantage.
This solution also converts attributes that happen to be OrderedDict, not only collection elements:
class C:
    def __init__(self, a):
        self.a = a
    def __repr__(self):
        return f"C(a={self.a})"
c = C(OrderedDict([(1, 'one'), (2, 'two')]))
print("original: ", c)
print("converted:", convert_nested_ordered_dict(c))
original: C(a=OrderedDict([(1, 'one'), (2, 'two')]))
converted: C(a={1: 'one', 2: 'two'})
I have a defaultdict(Set):
from sets import Set
from collections import defaultdict
values = defaultdict(Set)
I want the Set functionality when building it up in order to remove duplicates. Next step I want to store this as json. Since json doesn't support this datastructure I would like to convert the datastructure into a defaultdict(list) but when I try:
defaultdict(list)(values)
I get: TypeError: 'collections.defaultdict' object is not callable, how should I do the conversion?
You can use the following:
>>> values = defaultdict(Set)
>>> values['a'].add(1)
>>> defaultdict(list, ((k, list(v)) for k, v in values.items()))
defaultdict(<type 'list'>, {'a': [1]})
The defaultdict constructor takes default_factory as its first argument, which can be followed by the same arguments as a normal dict. In this case the second argument is a generator expression that yields tuples consisting of a key and a value.
Note that if you only need to store it as JSON, a normal dict will do just fine:
>>> {k: list(v) for k, v in values.items()}
{'a': [1]}
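If the goal is only the JSON output, another option is to leave the sets in place and convert them while serializing. The sketch below makes a few assumptions: it uses the built-in set (the Python 3 replacement for sets.Set), a hypothetical helper named encode_set, and made-up sample data whose elements can be sorted.

```python
import json
from collections import defaultdict

values = defaultdict(set)
values['a'].add(1)
values['a'].add(1)           # duplicate, ignored by the set
values['b'].update([3, 2])

def encode_set(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)   # assumes the elements are orderable
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

text = json.dumps(values, default=encode_set, sort_keys=True)
print(text)  # {"a": [1], "b": [2, 3]}
```

A defaultdict is a dict subclass, so json.dumps accepts it directly; only the set values need the default hook.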
defaultdict(list, values)
The defaultdict constructor works like the dict constructor with a mandatory default_factory argument in front. However, this won't convert any existing values from Sets to lists. If you want to do that, you need to do it manually:
defaultdict(list, ((k, list(v)) for k, v in values.viewitems()))
You might not even want a defaultdict at all at that point, though:
{k: list(v) for k, v in values.viewitems()}
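Say that a = set(), and you have already populated it with unique values. You can convert it to a list with list(a) and store that list as a value. Note that defaultdict(list(a)) does not work: the first argument to defaultdict must be a callable factory (or None), so passing a list raises TypeError.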