Reusing function results in a Python dict

I have the following (very simplified) dict. The get_details function is an API call that I would like to avoid doing twice.
ret = {
    'a': a,
    'b': [{
        'c': item.c,
        'e': item.get_details()[0].e,
        'h': [func_h(detail) for detail in item.get_details()],
    } for item in items]
}
I could of course rewrite the code like this:
b = []
for item in items:
    details = item.get_details()
    b.append({
        'c': item.c,
        'e': details[0].e,
        'h': [func_h(detail) for detail in details],
    })
ret = {
    'a': a,
    'b': b
}
but would like to use the first approach since it seems more pythonic.

You could use an intermediary generator to extract the details from your items. Something like this:
ret = {
    'a': a,
    'b': [{
        'c': item.c,
        'e': details[0].e,
        'h': [func_h(detail) for detail in details],
    } for (item, details) in ((item, item.get_details()) for item in items)]
}

I don't find the second one particularly un-pythonic; you have a complex initialization, and you shouldn't expect it to boil down to a single simple expression. That said, you don't need the temporary list b; you can work directly with ret['b']:
ret = {
    'a': a,
    'b': []
}
for item in items:
    details = item.get_details()
    d = details[0]
    ret['b'].append({
        'c': item.c,
        'e': d.e,
        'h': map(func_h, details)
    })
This is also a case where I would choose map over a list comprehension. (If this were Python 3, you would need to wrap that in an additional call to list.)
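For completeness, a minimal sketch of the same append in Python 3, using the same item, d, details, and func_h names as the snippet above:

# Python 3: map() returns a lazy iterator, so wrap it in list()
ret['b'].append({
    'c': item.c,
    'e': d.e,
    'h': list(map(func_h, details))
})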

I wouldn't try too hard to be more pythonic if it means looking like your first approach. I would take your second approach a step further, and just use a separate function:
ret = {
    'a': a,
    'b': get_b_from_items(items)
}
I think that's as clean as it can get. Use comments/docstrings to indicate what 'b' is, test the function, and then the next person who comes along can quickly read and trust your code. I know you know how to write the function, but for the sake of completeness, here's how I would do it:
# and add this in where you want it
def get_b_from_items(items):
    """Return a list of (your description here)."""
    result = []
    for item in items:
        details = item.get_details()
        result.append({
            'c': item.c,
            'e': details[0].e,
            'h': [func_h(detail) for detail in details],
        })
    return result
That is plenty pythonic (note the docstring, which is itself very pythonic) and very readable. It also has the advantage of being more granularly testable, it keeps the complex logic abstracted away from the higher-level logic, and it brings all the other benefits of using functions.
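As a quick illustration of that testability point, here is a minimal sketch of how get_b_from_items could be exercised in isolation. FakeItem, FakeDetail, and the func_h stub are hypothetical stand-ins, and the sketch assumes the function above lives in the same module/file:

# Hypothetical stand-ins for the real item objects and the func_h helper
class FakeDetail:
    def __init__(self, e):
        self.e = e

class FakeItem:
    def __init__(self, c, details):
        self.c = c
        self._details = details

    def get_details(self):
        # the real method would hit an API; the stub returns canned data
        return self._details

def func_h(detail):
    return detail.e.upper()

items = [FakeItem('c1', [FakeDetail('e1'), FakeDetail('e2')])]
assert get_b_from_items(items) == [{'c': 'c1', 'e': 'e1', 'h': ['E1', 'E2']}]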

Related

Building sub-dict from large dict using recursion

I have a dictionary that links various species in a parent-daughter decay chain. For example:
d = {
    'A': {'daughter': ['B']},
    'B': {'daughter': ['C']},
    'C': {'daughter': ['D']},
    'D': {'daughter': [None]},
    'E': {'daughter': ['F']},
    'F': {'daughter': ['G']},
    'G': {'daughter': ['H']},
    'H': {'daughter': [None]}
}
In this dictionary, the top level key is the 'parent' and the 'daughter' (i.e. what the parent decays to in the chain) is defined as a key:value item in the dictionary attached to the parent key. When None is given for the daughter, that is considered to be the end of the chain.
I want a function to return a sub dictionary containing the items in the chain according to the users input for the starting parent. I would also like to know the position of each item in the chain. In the sub-dictionary this can be a second field ('position').
For example, if the user wants to start the chain at 'A', I would like the function to return:
{'A': {'position': 1, 'daughter': ['B']},
 'B': {'position': 2, 'daughter': ['C']},
 'C': {'position': 3, 'daughter': ['D']},
 'D': {'position': 4, 'daughter': [None]}}
Similarly, if the starting value was 'E', I would like it to return:
{'E': {'position': 1, 'daughter': ['F']},
 'F': {'position': 2, 'daughter': ['G']},
 'G': {'position': 3, 'daughter': ['H']},
 'H': {'position': 4, 'daughter': [None]}}
This is relatively easy when the linking is one-to-one i.e. one item decays into another, into another etc.
If I now use a more complex example, as below, you can see that 'B' actually decays into both 'C' and 'D' and from there onwards the chains are separate.
A => B => C => E => G and A => B => D => F => H
d = {
    'A': {'daughter': ['B']},
    'B': {'daughter': ['C', 'D']},
    'C': {'daughter': ['E']},
    'D': {'daughter': ['F']},
    'E': {'daughter': ['G']},
    'F': {'daughter': ['H']},
    'G': {'daughter': [None]},
    'H': {'daughter': [None]}
}
In this case I would like a function to return the following output. You'll notice that, because the chain diverges in two, the position values are close to the level in the hierarchy (e.g. C = 3 and D = 4) but not exactly the same. I don't want to follow the C chain all the way down and then repeat for the D chain.
{'A': {'position': 1, 'daughter': ['B']},
 'B': {'position': 2, 'daughter': ['C']},
 'C': {'position': 3, 'daughter': ['E']},
 'D': {'position': 4, 'daughter': ['F']},
 'E': {'position': 5, 'daughter': ['G']},
 'F': {'position': 6, 'daughter': ['H']},
 'G': {'position': 8, 'daughter': [None]},
 'H': {'position': 9, 'daughter': [None]}}
Any thoughts? The function should be able to cope with more than one diversion in the chain.
Mark
If you don't want to go all the way down from C, then breadth-first search may help.
def bfs(d, start):
    answer = {}
    queue = [start]
    head = 0
    while head < len(queue):
        # Fetch the next unprocessed element from the queue
        now = queue[head]
        answer[now] = {
            'position': head + 1,
            'daughter': d[now]['daughter']
        }
        # Add daughters to the queue
        for nxt in d[now]['daughter']:
            if nxt is None:
                continue
            queue.append(nxt)
        head += 1
    return answer
d = {
    'A': {'daughter': ['B']},
    'B': {'daughter': ['C', 'D']},
    'C': {'daughter': ['E']},
    'D': {'daughter': ['F']},
    'E': {'daughter': ['G']},
    'F': {'daughter': ['H']},
    'G': {'daughter': [None]},
    'H': {'daughter': [None]}
}

print(bfs(d, 'A'))
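For reference, running that call on the sample data yields positions in plain BFS order (so G and H come out as 7 and 8 rather than the 8 and 9 in the question's example; on Python 3.7+ the keys also appear in BFS order):

{'A': {'position': 1, 'daughter': ['B']},
 'B': {'position': 2, 'daughter': ['C', 'D']},
 'C': {'position': 3, 'daughter': ['E']},
 'D': {'position': 4, 'daughter': ['F']},
 'E': {'position': 5, 'daughter': ['G']},
 'F': {'position': 6, 'daughter': ['H']},
 'G': {'position': 7, 'daughter': [None]},
 'H': {'position': 8, 'daughter': [None]}}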

Avoiding key error storing values in nested dictionary (Python)

Introduction
The following dictionary has three levels of keys and then a value.
d = {
    1: {
        'A': {
            'i': 100,
            'ii': 200
        },
        'B': {
            'i': 300
        }
    },
    2: {
        'A': {
            'ii': 500
        }
    }
}
Examples of entries that need to be added:
d[1]['B']['ii'] = 600 # OK
d[2]['C']['iii'] = 700 # KeyError on 'C'
d[3]['D']['iv'] = 800 # KeyError on 3
Problem Statement
I wanted to create code that would create the necessary nested keys and avoid any key errors.
Solution 1
The first solution I came up with, was:
def NewEntry_1(d, lv1, lv2, lv3, value):
    if lv1 in d:
        if lv2 in d[lv1]:
            d[lv1][lv2][lv3] = value
        else:
            d[lv1][lv2] = {lv3: value}
    else:
        d[lv1] = {lv2: {lv3: value}}
Seems legit, but embedding this in other pieces of code made it mind-boggling. I explored Stack Overflow for other solutions and read up on the get() and setdefault() functions.
Solution 2
There is plenty of material about get() and setdefault(), but not so much on nested dictionaries. Ultimately I was able to come up with:
def NewEntry_2(d, lv1, lv2, lv3, value):
    return d.setdefault(lv1, {}).setdefault(lv2, {}).setdefault(lv3, value)
It is one line of code, so it is not really necessary to make it a function, and it is easily modified to include operations:
d[lv1][lv2][lv3] = d.setdefault(lv1, {}).setdefault(lv2, {}).setdefault(lv3, 0) + value
Seems perfect?
Question
When adding large quantities of entries and doing many modifications, is option 2 better than option 1? Or should I define function 1 and call it? The answers I'm looking for should take speed and/or potential for errors into account.
Examples
NewEntry_1(d, 1, 'B', 'ii', 600)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}}}
NewEntry_1(d, 2, 'C', 'iii', 700)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}, 'C': {'iii': 700}}}
NewEntry_1(d, 3, 'D', 'iv', 800)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}, 'C': {'iii': 700}}, 3: {'D': {'iv': 800}}}
More background
I'm a business analyst exploring the use of Python for creating a graph DB that would help me with very specific analysis. The dictionary structure is used to store the influence one node has on one of its neighbors:
lv1 is Node From
lv2 is Node To
lv3 is Iteration
value is Influence (in %)
In the first iteration Node 1 has direct influence on Node 2. In the second iteration Node 1 influences all the Nodes that Node 2 is influencing.
I'm aware of packages that can help me with it (networkx), but I'm trying to understand Python/GraphDB before I want to start using them.
As for the nested dictionaries, you should take a look at defaultdict. Using it will save you a lot of the function-calling overhead. The nested defaultdict construction resorts to lambda functions for their default factories:
from collections import defaultdict

d = defaultdict(lambda: defaultdict(lambda: defaultdict(int))) # new, shiny, empty
d[1]['B']['ii'] = 600 # OK
d[2]['C']['iii'] = 700 # OK
d[3]['D']['iv'] = 800 # OK
Update: a useful trick for creating a deeply nested defaultdict is the following:
def tree():
    return defaultdict(tree)

d = tree()
# now any depth is possible
# d[1][2][3][4][5][6][7][8] = 9
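One practical follow-up (a minimal sketch; to_plain_dict is a helper name invented here): the tree defaultdict prints quite noisily, so a small recursive conversion makes the result easier to inspect or serialise.

from collections import defaultdict

def tree():
    return defaultdict(tree)

def to_plain_dict(node):
    """Recursively convert a tree() defaultdict into ordinary dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

d = tree()
d[1]['B']['ii'] = 600
d[2]['C']['iii'] = 700

print(to_plain_dict(d))  # {1: {'B': {'ii': 600}}, 2: {'C': {'iii': 700}}}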

How to assert a dict contains another dict without assertDictContainsSubset in python? [duplicate]

This question already has answers here:
Python unittest's assertDictContainsSubset recommended alternative [duplicate]
(4 answers)
Closed 1 year ago.
I know assertDictContainsSubset can do this in python 2.7, but for some reason it's deprecated in python 3.2. So is there any way to assert a dict contains another one without assertDictContainsSubset?
This does not seem good:
for item in dic2:
    self.assertIn(item, dic)
Any other good way? Thanks
Although I'm using pytest, I found the following idea in a comment. It worked really well for me, so I thought it could be useful here.
Python 3:
assert dict1.items() <= dict2.items()
Python 2:
assert dict1.viewitems() <= dict2.viewitems()
It works with non-hashable items, but you can't know exactly which item eventually fails.
>>> d1 = dict(a=1, b=2, c=3, d=4)
>>> d2 = dict(a=1, b=2)
>>> set(d2.items()).issubset( set(d1.items()) )
True
And the other way around:
>>> set(d1.items()).issubset( set(d2.items()) )
False
Limitation: the dictionary values have to be hashable.
The big problem with the accepted answer is that it does not work if you have non-hashable values among your object's values. The second thing is that you get no useful output: the test passes or fails, but it doesn't tell you which field within the object is different.
As such, it is easier to simply create a subset dictionary and then test that. This way you can use the TestCase.assertDictEqual() method, which will give you very useful formatted output in your test runner showing the diff between the actual and the expected.
I think the most pleasing and pythonic way to do this is with a simple dictionary comprehension as such:
from unittest import TestCase
actual = {}
expected = {}
subset = {k:v for k, v in actual.items() if k in expected}
TestCase().assertDictEqual(subset, expected)
NOTE: obviously, if you are running your test in a method of a class that inherits from TestCase (as you almost certainly are), then it is just self.assertDictEqual(subset, expected)
John1024's solution worked for me. However, in case of a failure it only tells you False instead of showing you which keys are not matching. So, I tried to avoid the deprecated assert method by using other assertion methods that will output helpful failure messages:
expected = {}
response_keys = set(response.data.keys())
for key in input_dict.keys():
    self.assertIn(key, response_keys)
    expected[key] = response.data[key]
self.assertDictEqual(input_dict, expected)
You can use assertGreaterEqual or assertLessEqual.
user = {'id': 28027, 'email': 'chungs.lama#gmail.com', 'created_at': '2005-02-13'}
data = {"email": "chungs.lama#gmail.com"}
self.assertGreaterEqual(user.items(), data.items())
self.assertLessEqual(data.items(), user.items()) # Reversed alternative
Be sure to specify .items() or it won't work.
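To see why the .items() call matters (a quick sketch reusing the user and data dicts above): item views support subset comparison, while comparing the dicts themselves raises a TypeError in Python 3.

user = {'id': 28027, 'email': 'chungs.lama#gmail.com', 'created_at': '2005-02-13'}
data = {'email': 'chungs.lama#gmail.com'}

print(data.items() <= user.items())  # True: data is a sub-dict of user

try:
    data <= user  # comparing the dicts themselves
except TypeError as exc:
    print(exc)  # dicts are not orderable in Python 3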
In Python 3 and Python 2.7, you can create a set-like "item view" of a dict without copying any data. This allows you to use comparison operators to test for a subset relationship.
In Python 3, this looks like:
# Test if d1 is a sub-dict of d2
d1.items() <= d2.items()
# Get items in d1 not found in d2
difference = d1.items() - d2.items()
In Python 2.7 you can use the viewitems() method in place of items() to achieve the same result.
In Python 2.6 and below, your best bet is to iterate over the keys in the first dict and check for inclusion in the second.
# Test if d1 is a subset of d2
all(k in d2 and d2[k] == d1[k] for k in d1)
This answers a slightly broader question than the one you asked, but I use it in my test harnesses to see if the container dictionary contains something that looks like the contained dictionary. It checks keys and values. Additionally, you can use the keyword 'ANYTHING' to indicate that you don't care how it matches.
def contains(container, contained):
    '''Ensure that `contained` is present somewhere in `container`.

    EXAMPLES:
    contains(
        {'a': 3, 'b': 4},
        {'a': 3}
    )  # True

    contains(
        {'a': [3, 4, 5]},
        {'a': 3},
    )  # True

    contains(
        {'a': 4, 'b': {'a': 3}},
        {'a': 3}
    )  # True

    contains(
        {'a': 4, 'b': {'a': 3, 'c': 5}},
        {'a': 3, 'c': 5}
    )  # True

    # if `contained` has a list, then every item from that list must be present
    # in the corresponding `container` list
    contains(
        {'a': [{'b': 1}, {'b': 2}, {'b': 3}], 'c': 4},
        {'a': [{'b': 1}, {'b': 2}], 'c': 4},
    )  # True

    # You can also use the string literal 'ANYTHING' to match anything
    contains(
        {'a': [{'b': 3}]},
        {'a': 'ANYTHING'},
    )  # True

    # You can use 'ANYTHING' as a dict key and it indicates to match the
    # corresponding value anywhere below the current point
    contains(
        {'a': [{'x': 1, 'b1': {'b2': {'c': 'SOMETHING'}}}]},
        {'a': {'ANYTHING': 'SOMETHING', 'x': 1}},
    )  # True

    contains(
        {'a': [{'x': 1, 'b': 'SOMETHING'}]},
        {'a': {'ANYTHING': 'SOMETHING', 'x': 1}},
    )  # True
    '''
    ANYTHING = 'ANYTHING'
    if contained == ANYTHING:
        return True
    if container == contained:
        return True
    if isinstance(container, list):
        if not isinstance(contained, list):
            contained = [contained]
        true_count = 0
        for contained_item in contained:
            for item in container:
                if contains(item, contained_item):
                    true_count += 1
                    break
        if true_count == len(contained):
            return True
    if isinstance(contained, dict) and isinstance(container, dict):
        contained_keys = set(contained.keys())
        if ANYTHING in contained_keys:
            contained_keys.remove(ANYTHING)
            if not contains(container, contained[ANYTHING]):
                return False
        container_keys = set(container.keys())
        if len(contained_keys - container_keys) == 0:
            # then all the contained keys are in this container ~ recursive check
            if all(
                contains(container[key], contained[key])
                for key in contained_keys
            ):
                return True
    # well, we're here, so I guess we didn't find a match yet
    if isinstance(container, dict):
        for value in container.values():
            if contains(value, contained):
                return True
    return False
Here is a comparison that works even if you have lists in the dictionaries:
superset = {'a': 1, 'b': 2}
subset = {'a': 1}
common = { key: superset[key] for key in set(superset.keys()).intersection(set(subset.keys())) }
self.assertEqual(common, subset)

How Can I Get a Subset of an Object's Properties as a Python Dictionary?

Short Version:
In Python is there a way to (cleanly/elegantly) say "Give me these 5 (or however many) properties of an object, and nothing else, as a dictionary"?
Longer Version:
Using the Javascript Underscore library, I can reduce a bunch of objects/dictionaries (in JS they're the same thing) to a bunch of subsets of their properties like so:
var subsets = _(someObjects).map(function(someObject) {
    return _(someObject).pick(['a', 'd']);
});
If I want to do the same thing with a Python object (not a dictionary) however it seems like the best I can do is use a list comprehension and manually set each property:
subsets = [{"a": x.a, "d": x.d} for x in someObjects]
That doesn't look so bad when there are only two properties, and they're both one letter, but it gets uglier fast if I start having more/longer properties (plus I feel wrong whenever I write a multi-line list comprehension). I could turn the whole thing into a function that uses a for loop, but before I do that, is there any cool built-in Python utility that I can use to do this as cleanly (or even more cleanly) than the JS version?
This can be done simply by combining a list comprehension with a dictionary comprehension.
subsets = [{attr: getattr(x, attr) for attr in ["a", "d"]}
           for x in someObjects]
Naturally, you could distill out that comprehension if you wanted to:
def pick(x, *attrs):
    return {attr: getattr(x, attr) for attr in attrs}

subsets = [pick(x, "a", "d") for x in someObjects]
>>> A = ['a', 'c']
>>> O = [{'a': 1, 'b': 2, 'c': 3}, {'a': 11, 'b': 22, 'c': 33, 'd': 44}]
>>> [{a: o[a] for a in A} for o in O]
[{'a': 1, 'c': 3}, {'a': 11, 'c': 33}]
>>> list(map(lambda o: {a: o[a] for a in A}, O))
[{'a': 1, 'c': 3}, {'a': 11, 'c': 33}]

Python "extend" for a dictionary

What is the best way to extend a dictionary with another one while avoiding the use of a for loop? For instance:
>>> a = { "a" : 1, "b" : 2 }
>>> b = { "c" : 3, "d" : 4 }
>>> a
{'a': 1, 'b': 2}
>>> b
{'c': 3, 'd': 4}
Result:
{ "a" : 1, "b" : 2, "c" : 3, "d" : 4 }
Something like:
a.extend(b) # This does not work
a.update(b)
See dict.update() in the latest Python standard library documentation.
A beautiful gem in this closed question:
The "oneliner way", altering neither of the input dicts, is
basket = dict(basket_one, **basket_two)
Learn what **basket_two (the **) means here.
In case of conflict, the items from basket_two will override the ones from basket_one. As one-liners go, this is pretty readable and transparent, and I have no compunction against using it any time a dict that's a mix of two others comes in handy (any reader who has trouble understanding it will in fact be very well served by the way this prompts him or her towards learning about dict and the ** form;-). So, for example, uses like:
x = mungesomedict(dict(adict, **anotherdict))
are reasonably frequent occurrences in my code.
Originally submitted by Alex Martelli
Note: In Python 3, this will only work if every key in basket_two is a string.
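To make that caveat concrete, here is a small sketch (the exact error message may vary between Python versions):

a = {'a': 1, 'b': 2}

dict(a, **{'c': 3})  # fine: {'a': 1, 'b': 2, 'c': 3}
dict(a, **{1: 'x'})  # TypeError in Python 3: keywords must be strings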
Have you tried merging them with dictionary unpacking:
a = {'a': 1, 'b': 2}
b = {'c': 3, 'd': 4}
c = {**a, **b}
# c = {"a": 1, "b": 2, "c": 3, "d": 4}
Another way of doing is by Using dict(iterable, **kwarg)
c = dict(a, **b)
# c = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
In Python 3.9+ you can merge two dicts using the union operator |:
# use the merging operator |
c = a | b
# c = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
a.update(b)
Will add keys and values from b to a, overwriting if there's already a value for a key.
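For example, a quick sketch of the overwrite behaviour with an overlapping key:

a = {'a': 1, 'b': 2}
b = {'b': 3, 'c': 4}

a.update(b)
print(a)  # {'a': 1, 'b': 3, 'c': 4}: b's value wins for the shared key 'b'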
As others have mentioned, a.update(b) for some dicts a and b will achieve the result you've asked for in your question. However, I want to point out that I have often seen the extend method of mapping/set objects defined so that, in the syntax a.extend(b), a's values are NOT overwritten by b's values. a.update(b) overwrites a's values, and so isn't a good choice for extend.
Note that some languages call this method defaults or inject, as it can be thought of as a way of injecting b's values (which might be a set of default values) in to a dictionary without overwriting values that might already exist.
Of course, you could simply note that a.extend(b) is nearly the same as b.update(a); a = b. To remove the assignment, you could do it thus:
def extend(a, b):
    """Create a new dictionary with a's properties extended by b,
    without overwriting.

    >>> extend({'a':1,'b':2},{'b':3,'c':4})
    {'a': 1, 'c': 4, 'b': 2}
    """
    return dict(b, **a)
Thanks to Tom Leys for that smart idea using a side-effect-less dict constructor for extend.
Notice that since Python 3.9 a much easier syntax was introduced (Union Operators):
d1 = {'a': 1}
d2 = {'b': 2}
extended_dict = d1 | d2
>> {'a':1, 'b': 2}
Pay attention: if the first dict shares keys with the second dict, order matters!
d1 = {'b': 1}
d2 = {'b': 2}
d1 | d2
>> {'b': 2}
The relevant PEP is PEP 584.
You can also use Python's collections.ChainMap, which was introduced in Python 3.3.
from collections import ChainMap
c = ChainMap(a, b)
c['a'] # returns 1
This has a few possible advantages, depending on your use-case. They are explained in more detail here, but I'll give a brief overview:
A chainmap only uses views of the dictionaries, so no data is actually copied. This results in faster chaining (but slower lookup)
No keys are actually overwritten so, if necessary, you know whether the data comes from a or b.
This mainly makes it useful for things like configuration dictionaries.
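A small sketch of those points (b is given an overlapping key here, unlike in the question, just to show the lookup precedence):

from collections import ChainMap

a = {'a': 1, 'b': 2}
b = {'b': 99, 'c': 3}

c = ChainMap(a, b)
print(c['b'])   # 2: for shared keys, the first mapping (a) wins
print(c['c'])   # 3: falls through to b

a['a'] = 42
print(c['a'])   # 42: nothing was copied, so c sees later changes to a

flat = dict(c)  # materialise a plain dict if you need one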
In terms of efficiency, it seems faster to use the unpacking operation ({**a, **b}) than the update method, based on a quick timing test I did (the screenshot of the results is not reproduced here).
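If you want to reproduce that comparison yourself, a minimal timeit sketch along these lines should do (absolute numbers will vary by machine and Python version):

import timeit

setup = "a = {'a': 1, 'b': 2}; b = {'c': 3, 'd': 4}"

print(timeit.timeit('c = {**a, **b}', setup=setup))            # unpacking
print(timeit.timeit('c = dict(a); c.update(b)', setup=setup))  # copy + update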
