Converting ordered dictionaries to dictionary - python

I have a dataset that may contain n levels of ordered dictionaries nested inside ordered dictionaries, which may in turn sit inside lists of tuples, plain tuples, or plain lists. I need to convert all of them into normal dictionaries. Is there an easier method than a recursive search and conversion?
from collections import OrderedDict
def ordered_to_regular_dict(d):
    if isinstance(d, OrderedDict):
        d = {k: ordered_to_regular_dict(v) for k, v in d.items()}
    return d
I got an answer from Stack Overflow that handles an ordered dictionary of ordered dictionaries, but not dictionaries inside a list of tuples, or an ordered dictionary inside a list or a tuple.

Why not just write a branch for every possibility (tuple, list, dict), like this:
from collections import OrderedDict
def ordered_to_regular_dict(d):
    # OrderedDict is a subclass of dict, so one isinstance check covers both
    if isinstance(d, dict):
        d = {k: ordered_to_regular_dict(v) for k, v in d.items()}
    elif isinstance(d, list):
        # lists and tuples have no .items(); iterate over them directly
        d = [ordered_to_regular_dict(v) for v in d]
    elif isinstance(d, tuple):
        # tuple(...) is required; bare parentheses would build a generator
        d = tuple(ordered_to_regular_dict(v) for v in d)
    return d
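To check the recursive approach against the kind of structure described in the question, here is a small self-contained sketch. The name to_plain and the sample data are made up for illustration; they are not from the original posts.

```python
from collections import OrderedDict

def to_plain(obj):
    """Recursively replace dict subclasses (including OrderedDict) with dict."""
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_plain(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(to_plain(v) for v in obj)
    return obj

# Hypothetical sample: OrderedDicts hidden inside a list of tuples and a tuple
data = OrderedDict([
    ("a", [("x", OrderedDict([("y", 1)]))]),
    ("b", (OrderedDict([("z", 2)]), 3)),
])
plain = to_plain(data)
print(plain)  # {'a': [('x', {'y': 1})], 'b': ({'z': 2}, 3)}
```

One caveat of this sketch: a namedtuple would come back as a plain tuple, because tuple(...) discards the subclass.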

You should leverage Python's builtin copy mechanism.
You can override copying behavior for OrderedDict via Python's copyreg module (also used by pickle). Then you can use Python's builtin copy.deepcopy() function to perform the conversion.
import copy
import copyreg
from collections import OrderedDict
def convert_nested_ordered_dict(x):
    """
    Perform a deep copy of the given object, but convert
    all internal OrderedDicts to plain dicts along the way.
    Args:
        x: Any pickleable object
    Returns:
        A copy of the input, in which all OrderedDicts contained
        anywhere in the input (as iterable items or attributes, etc.)
        have been converted to plain dicts.
    """
    # Temporarily install a custom pickling function
    # (used by deepcopy) to convert OrderedDict to dict.
    orig_pickler = copyreg.dispatch_table.get(OrderedDict, None)
    copyreg.pickle(
        OrderedDict,
        lambda d: (dict, ([*d.items()],))
    )
    try:
        return copy.deepcopy(x)
    finally:
        # Restore the original OrderedDict pickling function (if any)
        del copyreg.dispatch_table[OrderedDict]
        if orig_pickler:
            copyreg.dispatch_table[OrderedDict] = orig_pickler
Merely by using Python's builtin copying infrastructure, this solution has several nice properties:
Works for more than just JSON data.
Works for arbitrary data hierarchies.
Does not require you to implement special logic for each possible element type (e.g. list, tuple, etc.)
deepcopy() will properly handle duplicate objects within the collection:
x = [1,2,3]
d = {'a': x, 'b': x}
assert d['a'] is d['b']
d2 = copy.deepcopy(d)
assert d2['a'] is d2['b']
Since our solution is based on deepcopy() we'll have the same advantage.
This solution also converts attributes that happen to be OrderedDict, not only collection elements:
class C:
    def __init__(self, a):
        self.a = a
    def __repr__(self):
        return f"C(a={self.a})"
c = C(OrderedDict([(1, 'one'), (2, 'two')]))
print("original: ", c)
print("converted:", convert_nested_ordered_dict(c))
original: C(a=OrderedDict([(1, 'one'), (2, 'two')]))
converted: C(a={1: 'one', 2: 'two'})

Related

Simple way to remove empty sets from dict

There's a common problem where I need to keep track of a bunch of collections in a dictionary. Let's say I want to keep track of which items I borrowed from my friends. The defaultdict class is quite useful to do this:
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
# defaultdict(<class 'set'>, {'Peter': {'salt'}, 'Eric': {'jacket', 'car'}})
This lets me add items to the respective sets without worrying about whether a key is already in the dictionary. Now suppose I return the salt to Peter. I then owe him nothing, so he can be removed from the dictionary. Doing this is slightly more cumbersome.
d['Peter'].remove('salt')
if not d['Peter']:
    del d['Peter']
I know I could put this in some function, but for readability I would like a class that removes the key automatically if the corresponding set is empty. Is there some way to do this?
Edit
Okay, I realized a pretty major problem with this idea when trying to solve it using inheritance and changing the index function. When calling d[index], the value has already been returned before .remove(something) is called, which makes it impossible for the dictionary to know that the set has been emptied. I'm guessing there's not really a way around using something different.
The problem with using a defaultdict to do what you want is that even accessing a key sets that key using the factory function. Consider:
from collections import defaultdict
d = defaultdict(set)
if d["Peter"]:
    print("I owe something to Peter")
print(d)
# defaultdict(set, {'Peter': set()})
Also, the problem with creating a subclass is, as you've realized, that the __getitem__() method is called before the set is ever emptied, so you'd have to call another function that checks whether the set is empty and removes the key.
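A minimal sketch of that "other function" idea, assuming you are willing to route removals through a method on the dictionary instead of calling .remove() on the set directly. The class name BorrowedItems and the method remove_item are hypothetical, chosen only for this example.

```python
from collections import defaultdict

class BorrowedItems(defaultdict):
    """Hypothetical helper: a defaultdict(set) that drops keys whose set is empty."""

    def __init__(self):
        super().__init__(set)

    def remove_item(self, key, item):
        # .get() does not trigger default_factory, so no empty set is created
        items = self.get(key)
        if items is None:
            return
        items.discard(item)
        if not items:
            # The set is now empty: remove the key itself
            del self[key]

d = BorrowedItems()
d['Peter'].add('salt')
d['Eric'].add('car')
d.remove_item('Peter', 'salt')
print(dict(d))  # {'Eric': {'car'}}
```

The dictionary can only prune the key because the removal goes through it; a direct d['Peter'].remove('salt') would still leave an empty set behind.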
A better idea might be to just not include keys with empty sets when you're creating the string representation.
class NewDefaultDict(defaultdict):
    def __repr__(self):
        return (f"NewDefaultDict({repr(self.default_factory)}, {{" +
                ", ".join(f"{repr(k)}: {repr(v)}" for k, v in self.items() if v) +
                "})")
nd = NewDefaultDict(set)
nd["Peter"].add("salt")
nd["Paul"].add("pepper")
nd["Paul"].remove("pepper")
print(nd)
# NewDefaultDict(<class 'set'>, {'Peter': {'salt'}})
You would also need to redefine __contains__() to check if the value is empty, so that e.g. "Paul" in nd returns False:
    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and self[key]
To make it compatible with for ... in nd constructs and dict-unpacking, you can redefine __iter__():
    def __iter__(self):
        for key in defaultdict.__iter__(self):
            if self[key]: yield key
Then,
for k in nd:
    print(k)
gives:
Peter
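Putting the three method fragments above into one class gives the following self-contained sketch. It keeps the same class name and behavior; the only liberties taken are wrapping the __contains__ result in bool() and rewriting __repr__ as a single f-string.

```python
from collections import defaultdict

class NewDefaultDict(defaultdict):
    def __repr__(self):
        # Only show entries whose value is non-empty
        shown = ", ".join(f"{k!r}: {v!r}" for k, v in self.items() if v)
        return f"NewDefaultDict({self.default_factory!r}, {{{shown}}})"

    def __contains__(self, key):
        # A key counts as present only if it exists and its value is non-empty
        return defaultdict.__contains__(self, key) and bool(self[key])

    def __iter__(self):
        # Skip keys whose value is empty
        for key in defaultdict.__iter__(self):
            if self[key]:
                yield key

nd = NewDefaultDict(set)
nd["Peter"].add("salt")
nd["Paul"].add("pepper")
nd["Paul"].remove("pepper")

print(nd)            # NewDefaultDict(<class 'set'>, {'Peter': {'salt'}})
print("Paul" in nd)  # False
print(list(nd))      # ['Peter']
print(len(nd))       # 2
```

Note the last line: the empty entry is only hidden, not deleted, so len() and the inherited items(), keys() and values() still see it.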
A dictionary comprehension might be useful.
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
d['Peter'].remove('salt')
d2 = {k: v for k, v in d.items() if len(v) > 0}
The d2 dictionary is now:
{'Eric': {'car', 'jacket'}}
Alternatively, using the fact that an empty set is considered false in Python.
d2 = {k: v for k, v in d.items() if v}
We can also define a class to implement this logic, similar to the other answer, that simply ignores keys and values where the value meets some criterion. A function passed as the ignore parameter defines that criterion.
from collections import defaultdict
class default_ignore_dict(defaultdict):
    def __init__(self, factory, ignore, *args, **kwargs):
        defaultdict.__init__(self, factory, *args, **kwargs)
        self.ignore = ignore
    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and not self.ignore(self[key])
    def items(self):
        return ((k, v) for k, v in defaultdict.items(self) if not self.ignore(v))
    def keys(self):
        return (k for k, _ in self.items())
    def values(self):
        return (v for _, v in self.items())
Testing this:
>>> d = default_ignore_dict(set, lambda s: not s)
>>> d['Peter'].add('salt')
>>> d['Peter'].remove('salt')
>>> d['Eric'].add('car')
>>> d['Eric'].add('jacket')
>>>
>>> 'Peter' in d
False
>>> list(d.items())
[('Eric', {'car', 'jacket'})]
>>>

How can I convert nested dictionary to defaultdict?

How can I convert nested dictionary to nested defaultdict?
dic = {"a": {"aa": "xxx"}}
default = defaultdict(lambda: None, dic)
print(default["dummy_key"]) # return None
print(default["a"]["dummy_key"]) # KeyError
You need to either loop or recurse over the nested dictionary, through all of its levels.
Unless it's potentially ridiculously deep (as in hundreds of levels), or so wide that small performance factors make a difference, recursion is probably simplest here:
def defaultify(d):
    if not isinstance(d, dict):
        return d
    return defaultdict(lambda: None, {k: defaultify(v) for k, v in d.items()})
Or if you want it to work with all mappings, not just dicts, you could use collections.abc.Mapping instead of dict in your isinstance check.
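As a sketch of the Mapping variant just mentioned, the only change is the isinstance check. The sample data uses OrderedDict purely as an illustrative mapping type.

```python
from collections import OrderedDict, defaultdict
from collections.abc import Mapping

def defaultify(d):
    # Accept any mapping type, not just dict and its subclasses
    if not isinstance(d, Mapping):
        return d
    return defaultdict(lambda: None, {k: defaultify(v) for k, v in d.items()})

nested = OrderedDict(a=OrderedDict(aa="xxx"))
default = defaultify(nested)
print(default["a"]["aa"])         # xxx
print(default["a"]["dummy_key"])  # None
print(default["dummy_key"])       # None
```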
Of course this is assuming you have a pure nested dict. If you've got, say, something you parsed from a typical JSON response, where there might be dicts with list values with dict elements, you have to handle the other possibilities too:
def defaultify(d):
    if isinstance(d, dict):
        return defaultdict(lambda: None, {k: defaultify(v) for k, v in d.items()})
    elif isinstance(d, list):
        return [defaultify(e) for e in d]
    else:
        return d
But if this actually is coming from JSON, it's probably better to just use your defaultdict as an object_pairs_hook while the JSON is being parsed, rather than parsing it to a dict and then converting it to a defaultdict later.
There's an example in the docs of using an OrderedDict in place of dict, but that won't quite work for us—unlike OrderedDict and dict, defaultdict can't just take an iterable of pairs as its only argument; it needs the default value factory first. So we can bind that in, using functools.partial:
d = json.loads(jsonstring, object_pairs_hook=partial(defaultdict, lambda: None))
And so on.
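For completeness, here is a runnable sketch of the object_pairs_hook approach with the imports it needs. The JSON string is an invented sample, not data from the question.

```python
import json
from collections import defaultdict
from functools import partial

# Hypothetical input: nested objects, including one inside a list
jsonstring = '{"a": {"aa": "xxx"}, "items": [{"b": 1}]}'

# Every JSON object is built as defaultdict(lambda: None, pairs) while parsing
d = json.loads(jsonstring, object_pairs_hook=partial(defaultdict, lambda: None))

print(d["a"]["aa"])              # xxx
print(d["a"]["dummy_key"])       # None
print(d["items"][0]["missing"])  # None
print(d["dummy_key"])            # None
```

Because the hook is applied to every object the parser encounters, dicts nested inside lists are covered without any extra recursion.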

How can I convert defaultdict(Set) to defaultdict(list)?

I have a defaultdict(Set):
from sets import Set
from collections import defaultdict
values = defaultdict(Set)
I want the Set functionality while building it up, in order to remove duplicates. As the next step I want to store this as JSON. Since JSON doesn't support this data structure, I would like to convert it into a defaultdict(list), but when I try:
defaultdict(list)(values)
I get: TypeError: 'collections.defaultdict' object is not callable, how should I do the conversion?
You can use the following:
>>> values = defaultdict(Set)
>>> values['a'].add(1)
>>> defaultdict(list, ((k, list(v)) for k, v in values.items()))
defaultdict(<type 'list'>, {'a': [1]})
The defaultdict constructor takes default_factory as its first argument, which can be followed by the same arguments as a normal dict. In this case the second argument is a generator expression that yields (key, value) tuples.
Note that if you only need to store it as JSON, a normal dict will do just fine:
>>> {k: list(v) for k, v in values.items()}
{'a': [1]}
defaultdict(list, values)
The defaultdict constructor works like the dict constructor with a mandatory default_factory argument in front. However, this won't convert any existing values from Sets to lists. If you want to do that, you need to do it manually:
defaultdict(list, ((k, list(v)) for k, v in values.viewitems()))
You might not even want a defaultdict at all at that point, though:
{k: list(v) for k, v in values.viewitems()}
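The snippets above are Python 2 (the sets module and viewitems() no longer exist in Python 3). A rough Python 3 equivalent, using the built-in set and going all the way to JSON, might look like this. Sorting the values is an added assumption here, made only so the output is deterministic.

```python
import json
from collections import defaultdict

values = defaultdict(set)
values['a'].add(1)
values['a'].add(1)   # duplicate, silently ignored by the set
values['b'].add(2)

# Convert each set to a (sorted) list so that json can serialize it
as_lists = {k: sorted(v) for k, v in values.items()}
print(json.dumps(as_lists))  # {"a": [1], "b": [2]}
```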
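Say that a = set(), and you have already populated it with unique values. You can then convert it with list(a) and store the result in a defaultdict(list). Note that defaultdict(list(a)) itself raises a TypeError, because the first argument of defaultdict must be callable or None.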

Merging two dictionaries with order saving

I have two dictionaries:
a = {u'Anthracite': [u'3/optimized/8593793_fpx.tif'],
u'Black': [u'6/optimized/8593796_fpx.tif'],
u'Cobalt': [u'9/optimized/8593799_fpx.tif'],
u'Fire': [u'2/optimized/8593802_fpx.tif'],
u'Fuschia': [u'5/optimized/8593805_fpx.tif'],
u'Iris': [u'8/optimized/8593808_fpx.tif'],
u'Midnight': [u'1/optimized/8593811_fpx.tif']}
b = {u'Anthracite': [u'5/optimized/8593795_fpx.tif'],
u'Black': [u'8/optimized/8593798_fpx.tif'],
u'Cobalt': [u'1/optimized/8593801_fpx.tif'],
u'Fire': [u'4/optimized/8593804_fpx.tif'],
u'Fuschia': [u'7/optimized/8593807_fpx.tif'],
u'Iris': [u'0/optimized/8593810_fpx.tif'],
u'Midnight': [u'3/optimized/8593813_fpx.tif']}
I need to produce such dict:
c = {u'Anthracite': [u'3/optimized/8593793_fpx.tif', u'5/optimized/8593795_fpx.tif'],
u'Black': [u'6/optimized/8593796_fpx.tif', u'8/optimized/8593798_fpx.tif'],
....
}
So I need to combine the lists that share a key, while keeping the order of the first dictionary.
The dictionaries always have the same keys.
I have tried to do this with zip, but I'm getting a total mess.
Why not just iterate over the dictionaries and copy them to a new dictionary? A defaultdict is used in the following code for simplicity:
from collections import defaultdict
c = defaultdict(list)
a = {"foo": ["bar"]}
b = {"foo": ["baz"], "bah": ["foo"]}
# Python 2; in Python 3 use itertools.chain(a.items(), b.items())
for k, v in a.items() + b.items():
    c[k].extend(v)
If the keys are the same, you can copy the first dictionary and update its content:
d = a.copy()
for k, v in b.iteritems():
    d[k].extend(v)
Note that the latter creates only a shallow copy: d shares its list objects with a, so the lists in a are extended as well.
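The shallow-copy side effect is easy to miss, so here is a short Python 3 sketch that shows it, followed by one possible way to avoid it by building new lists with +. The variable names are illustrative only.

```python
a = {"foo": ["bar"]}
b = {"foo": ["baz"]}

d = a.copy()               # shallow: d["foo"] is the same list object as a["foo"]
for k, v in b.items():
    d[k].extend(v)
print(a)  # {'foo': ['bar', 'baz']}

# Alternative: a[k] + b[k] creates a new list, so a is left untouched
a = {"foo": ["bar"]}
c = {k: a[k] + b[k] for k in a}
print(a)  # {'foo': ['bar']}
print(c)  # {'foo': ['bar', 'baz']}
```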
If you want alphabetical order, use an OrderedDict and sort the keys:
from collections import OrderedDict
srt_keys = sorted(a.keys())
d = OrderedDict()
for k in srt_keys:
    d[k] = a[k]
    d[k] += b[k]
print d
OrderedDict([(u'Anthracite', [u'3/optimized/8593793_fpx.tif', u'5/optimized/8593795_fpx.tif']), (u'Black', [u'6/optimized/8593796_fpx.tif', u'8/optimized/8593798_fpx.tif']), (u'Cobalt', [u'9/optimized/8593799_fpx.tif', u'1/optimized/8593801_fpx.tif']), (u'Fire', [u'2/optimized/8593802_fpx.tif', u'4/optimized/8593804_fpx.tif']), (u'Fuschia', [u'5/optimized/8593805_fpx.tif', u'7/optimized/8593807_fpx.tif']), (u'Iris', [u'8/optimized/8593808_fpx.tif', u'0/optimized/8593810_fpx.tif']), (u'Midnight', [u'1/optimized/8593811_fpx.tif', u'3/optimized/8593813_fpx.tif'])])
How about using an OrderedDict initialized from a list of tuples to set the initial order, and then simply maintaining it?
See my answer here for a nicer dict syntax: Override the {...} notation so i get an OrderedDict() instead of a dict()?
from collections import OrderedDict
#Use an ordered dict, with a tuple list init to maintain initial order
a = OrderedDict([
(u'Anthracite', [u'3/optimized/8593793_fpx.tif']),
(u'Black', [u'6/optimized/8593796_fpx.tif']),
(u'Cobalt', [u'9/optimized/8593799_fpx.tif']),
(u'Fire', [u'2/optimized/8593802_fpx.tif']),
(u'Fuschia', [u'5/optimized/8593805_fpx.tif']),
(u'Iris', [u'8/optimized/8593808_fpx.tif']),
(u'Midnight', [u'1/optimized/8593811_fpx.tif'])
])
#We don't care about b's order
b = {u'Anthracite': [u'5/optimized/8593795_fpx.tif'],
u'Black': [u'8/optimized/8593798_fpx.tif'],
u'Cobalt': [u'1/optimized/8593801_fpx.tif'],
u'Fire': [u'4/optimized/8593804_fpx.tif'],
u'Fuschia': [u'7/optimized/8593807_fpx.tif'],
u'Iris': [u'0/optimized/8593810_fpx.tif'],
u'Midnight': [u'3/optimized/8593813_fpx.tif']}
merge = OrderedDict()
#b has the same keys as a (so we don't need to handle different keys), but we want a's order
for key in a:
    #We insert in order into an OrderedDict, so the same order is maintained
    merge[key] = a[key] + b[key]

Override the {...} notation so i get an OrderedDict() instead of a dict()?

Update: dicts retaining insertion order is guaranteed for Python 3.7+
I want to use a .py file like a config file.
So using the {...} notation I can create a dictionary using strings as keys but the definition order is lost in a standard python dictionary.
My question: is it possible to override the {...} notation so that I get an OrderedDict() instead of a dict()?
I was hoping that simply overriding dict constructor with OrderedDict (dict = OrderedDict) would work, but it doesn't.
Eg:
dict = OrderedDict
dictname = {
'B key': 'value1',
'A key': 'value2',
'C key': 'value3'
}
print dictname.items()
Output:
[('B key', 'value1'), ('A key', 'value2'), ('C key', 'value3')]
Here's a hack that almost gives you the syntax you want:
class _OrderedDictMaker(object):
    def __getitem__(self, keys):
        if not isinstance(keys, tuple):
            keys = (keys,)
        assert all(isinstance(key, slice) for key in keys)
        return OrderedDict([(k.start, k.stop) for k in keys])
ordereddict = _OrderedDictMaker()
from nastyhacks import ordereddict
menu = ordereddict[
"about" : "about",
"login" : "login",
'signup': "signup"
]
Edit: Someone else discovered this independently, and has published the odictliteral package on PyPI that provides a slightly more thorough implementation - use that package instead
To literally get what you are asking for, you have to fiddle with the syntax tree of your file. I don't think it is advisable to do so, but I couldn't resist the temptation to try. So here we go.
First, we create a module with a function my_execfile() that works like the built-in execfile(), except that all occurrences of dictionary displays, e.g. {3: 4, "a": 2} are replaced by explicit calls to the dict() constructor, e.g. dict([(3, 4), ('a', 2)]). (Of course we could directly replace them by calls to collections.OrderedDict(), but we don't want to be too intrusive.) Here's the code:
import ast
class DictDisplayTransformer(ast.NodeTransformer):
    def visit_Dict(self, node):
        self.generic_visit(node)
        list_node = ast.List(
            [ast.copy_location(ast.Tuple(list(x), ast.Load()), x[0])
             for x in zip(node.keys, node.values)],
            ast.Load())
        name_node = ast.Name("dict", ast.Load())
        new_node = ast.Call(ast.copy_location(name_node, node),
                            [ast.copy_location(list_node, node)],
                            [], None, None)
        return ast.copy_location(new_node, node)
def my_execfile(filename, globals=None, locals=None):
    if globals is None:
        globals = {}
    if locals is None:
        locals = globals
    node = ast.parse(open(filename).read())
    transformed = DictDisplayTransformer().visit(node)
    exec compile(transformed, filename, "exec") in globals, locals
With this modification in place, we can modify the behaviour of dictionary displays by overwriting dict. Here is an example:
# test.py
from collections import OrderedDict
print {3: 4, "a": 2}
dict = OrderedDict
print {3: 4, "a": 2}
Now we can run this file using my_execfile("test.py"), yielding the output
{'a': 2, 3: 4}
OrderedDict([(3, 4), ('a', 2)])
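The code above is Python 2 (execfile, the exec statement, the five-argument ast.Call). A rough Python 3 adaptation of the same transformer could look like the sketch below. It is not the answerer's code: it runs a source string instead of a file, and it leaves dict displays that use ** unpacking untouched, which is a simplification of this sketch.

```python
import ast
from collections import OrderedDict

class DictDisplayTransformer(ast.NodeTransformer):
    def visit_Dict(self, node):
        self.generic_visit(node)
        # A None key marks {**other} unpacking; this sketch leaves such displays alone
        if any(k is None for k in node.keys):
            return node
        pairs = ast.List(
            elts=[ast.Tuple(elts=[k, v], ctx=ast.Load())
                  for k, v in zip(node.keys, node.values)],
            ctx=ast.Load())
        call = ast.Call(func=ast.Name(id="dict", ctx=ast.Load()),
                        args=[pairs], keywords=[])
        return ast.copy_location(call, node)

# Hypothetical config source; in practice this would be read from a file
source = 'dict = OrderedDict\nresult = {3: 4, "a": 2}\n'
tree = DictDisplayTransformer().visit(ast.parse(source))
ast.fix_missing_locations(tree)   # fill in line numbers for the new nodes

namespace = {"OrderedDict": OrderedDict}
exec(compile(tree, "<config>", "exec"), namespace)
print(namespace["result"])  # OrderedDict([(3, 4), ('a', 2)])
```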
Note that for simplicity, the above code doesn't touch dictionary comprehensions, which should be transformed to generator expressions passed to the dict() constructor. You'd need to add a visit_DictComp() method to the DictDisplayTransformer class. Given the above example code, this should be straight-forward.
Again, I don't recommend this kind of messing around with the language semantics. Did you have a look into the ConfigParser module?
OrderedDict is not "standard python syntax", however, an ordered set of key-value pairs (in standard python syntax) is simply:
[('key1 name', 'value1'), ('key2 name', 'value2'), ('key3 name', 'value3')]
To explicitly get an OrderedDict:
OrderedDict([('key1 name', 'value1'), ('key2 name', 'value2'), ('key3 name', 'value3')])
Another alternative, is to sort dictname.items(), if that's all you need:
sorted(dictname.items())
As of CPython 3.6, dictionaries preserve insertion order. In 3.6 this was an implementation detail of dict that should not be relied upon; it became a guaranteed part of the language in Python 3.7.
Insertion order is always preserved in the new dict implementation:
>>> x = {'a': 1, 'b':2, 'c':3 }
>>> list(x.keys())
['a', 'b', 'c']
As of python 3.6 **kwargs order [PEP468] and class attribute order [PEP520] are preserved. The new compact, ordered dictionary implementation is used to implement the ordering for both of these.
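A tiny sketch of the **kwargs ordering described above. The function name show is made up for this example.

```python
def show(**kwargs):
    # kwargs is a regular dict; its keys come back in call order (PEP 468)
    return list(kwargs)

order = show(b=1, a=2, c=3)
print(order)  # ['b', 'a', 'c']
```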
What you are asking for is impossible, but if a config file in JSON syntax is sufficient you can do something similar with the json module:
>>> import json, collections
>>> d = json.JSONDecoder(object_pairs_hook = collections.OrderedDict)
>>> d.decode('{"a":5,"b":6}')
OrderedDict([(u'a', 5), (u'b', 6)])
The one solution I found is to patch python itself, making the dict object remember the order of insertion.
This then works for all kind of syntaxes:
x = {'a': 1, 'b':2, 'c':3 }
y = dict(a=1, b=2, c=3)
etc.
I have taken the ordereddict C implementation from https://pypi.python.org/pypi/ruamel.ordereddict/ and merged back into the main python code.
If you do not mind re-building the python interpreter, here is a patch for Python 2.7.8:
https://github.com/fwyzard/cpython/compare/2.7.8...ordereddict-2.7.8.diff
If what you are looking for is a way to get easy-to-use initialization syntax - consider creating a subclass of OrderedDict and adding operators to it that update the dict, for example:
from collections import OrderedDict
class OrderedMap(OrderedDict):
    def __add__(self,other):
        self.update(other)
        return self
d = OrderedMap()+{1:2}+{4:3}+{"key":"value"}
d will then be: OrderedMap([(1, 2), (4, 3), ('key', 'value')])
Another possible syntactic-sugar example using the slicing syntax:
class OrderedMap(OrderedDict):
    def __getitem__(self, index):
        if isinstance(index, slice):
            self[index.start] = index.stop
            return self
        else:
            return OrderedDict.__getitem__(self, index)
d = OrderedMap()[1:2][6:4][4:7]["a":"H"]
