Using a loop to .setdefault on Dict Creates Nested Dict

Using a loop to .setdefault on Dict Creates Nested Dict - python

I'm trying to understand why
tree = {}
def add_to_tree(root, value_string):
"""Given a string of characters `value_string`, create or update a
series of dictionaries where the value at each level is a dictionary of
the characters that have been seen following the current character.
"""
for character in value_string:
root = root.setdefault(character, {})
add_to_tree(tree, 'abc')
creates {'a': {'b': {'c': {}}}}
while
root = {}
root.setdefault('a', {})
root.setdefault('b', {})
root.setdefault('c', {})
creates {'a': {}, 'b': {}, 'c': {}}
What is putting us into the assigned dict value on each iteration of the loop?

root.setdefault(character, {}) returns root[character] if character is a key in root or it returns the empty dict {}. It is the same as root.get(character, {}) except that it also assigns root[character] = {} if character is not already a key in root.
root = root.setdefault(character, {})
reassigns root to a new dict if character is not already a key in the original root.
In [4]: root = dict()
In [5]: newroot = root.setdefault('a', {})
In [6]: root
Out[6]: {'a': {}}
In [7]: newroot
Out[7]: {}
In contrast, using root.setdefault('a', {}) without reassigning its return value to root works:
tree = {}
def add_to_tree(root, value_string):
"""Given a string of characters `value_string`, create or update a
series of dictionaries where the value at each level is a dictionary of
the characters that have been seen following the current character.
"""
for character in value_string:
root.setdefault(character, {})
add_to_tree(tree, 'abc')
print(tree)
# {'a': {}, 'c': {}, 'b': {}}

For anyone else who is as slow as me. The answer to, "Why does the (above) function produce {'a': {'b': {'c': {}, 'd': {}}}} and not {'a': {}, 'b': {}, 'c': {}}?" is:
Because we’re looping within a function and reassigning the the result to root each time, it’s kind of like in the TV infomercials where they keep saying, “but wait! there’s more!”. So when .setdefault gets called on 'a', before that gets returned, it’s result, {'a': {}} is held {inside the loop} while it’s run on 'b', which yields {'b': {}}, within {'a': {}} and that is held to the side and {'a': {}} is run, then the whole thing is returned from the loop and applied to tree. Note that each time, what is actually returned by .setdefault IS the default, which in this case is {}. Here is a Python Visualizer illustration of the process.

Related

Python: update dict string that has placeholders?

Consider this string: "{'a': A, 'b': B, 'c': 10}". Now I want to update this "string" and add new key d with let say value 20, so result would be "{'a': A, 'b': B, 'c': 10, 'd': 20}"
Normally, you could just eval string (eval or literal_eval) into dict, update the way you want and convert it back to string. But in this case, there are placeholders, which would not be recognized when evaluating.
What would be best way to update it, so old values are kept the same, but "dict-string" is updated properly?

For a more robust solution that properly parses the dict, you can subclass lib2to3.refactor.RefactoringTool to refactor the code using a fixer that is a subclass of lib2to3.fixer_base.BaseFix with a pattern that looks for a dictsetmaker node, and a transform method that extends the children list with leaf nodes that consist of the tokens that will make for a new key-value pair in the dict:
from lib2to3 import fixer_base, refactor, pytree
from lib2to3.pgen2 import token
class AddKeyValue(fixer_base.BaseFix):
PATTERN = "dictsetmaker"
def transform(self, node, results):
node.children.extend((
pytree.Leaf(token.COMMA, ','),
pytree.Leaf(token.STRING, "'d'", prefix=' '),
pytree.Leaf(token.COLON, ':'),
pytree.Leaf(token.NUMBER, 20, prefix=' ')
))
return node
class Refactor(refactor.RefactoringTool):
def __init__(self, fixers):
self._fixers= [cls(None, None) for cls in fixers]
super().__init__(None)
def get_fixers(self):
return self._fixers, []
s = "{'a': A, 'b': B, 'c': 10}"
print(Refactor([AddKeyValue]).refactor_string(s + '\n', ''))
This outputs:
{'a': A, 'b': B, 'c': 10, 'd': 20}
lib2to3 is round-trip stable so all white spaces are preserved after the transformation, and a new node should be specified with a prefix if whitespaces are to be inserted before it.
You can find the definition of the Python grammar in Grammar.txt of the lib2to3 module.
Demo: https://repl.it/#blhsing/RudeLimegreenConcentrate

This by no means a best solution but here is one approach:
import re
dict_str = "{'a': A, 'b': B, 'c': 10}"
def update_dict(dict_str, **keyvals):
"""creates an updated dict_str
Parameters:
dict_str (str): current dict_str
**keyvals: variable amounts of key-values
Returns:
str:updated string
"""
new_entries = ", ".join(map(lambda keyval: f"'{keyval[0]}': {keyval[1]}", keyvals.items())) # create a string representation for each key-value and join by ','
return dict_str.replace("}", f", {new_entries}{'}'}") # update the dict_str by removing the last '}' and add the new entries
output:
updated = update_dict(dict_str,
d = 20,
e = 30
)
print(updated)
{'a': A, 'b': B, 'c': 10, 'd': 20, 'e': 30}
some_dict = {
'g': 2,
'h': 3
}
updated = update_dict(dict_str,
**some_dict
)
print(updated)
{'a': A, 'b': B, 'c': 10, 'g': 2, 'h': 3}

I think that you can:
Option 1 - Adding
Insert the new string ", key: value" at the end of the string, before the "}".
Option 2 - RagEx for adding/updating
1 - use find() and search for the key. If it exist use the regex to substitute:
re.replace(regex_search,regex_replace,contents)
So using something like:
string = re.sub(r'key: (.+),', 'key: value', article)
2 - if the find() fail, use the add of the option 1

If it's just about adding at the end of the string...
this_string = "{'a': A, 'b': B, 'c': 10}"
this_add = "'d': 20"
this_string = f"{this_string[:-1]}, {this_add}{this_string[-1]}"
print(this_string)
will output
{'a': A, 'b': B, 'c': 10, 'd': 20}
If you need to insert the new string in between you can do something similar using string.find to locate the index and use that index number instead.
It's basically rewriting the entire string but strings are immutable what can we do.

What does defaultdict(list, {}) or dict({}) do?

I saw this online and I'm confused on what the second argument would do:
defaultdict(list, {})
Looking at what I get on the console, it seems to simply create a defaultdict where values are lists by default. If so, is this exactly equivalent to running defaultdict(list)?
From I read online:
The first argument provides the initial value for the default_factory attribute; it defaults to None. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.
which also makes me wonder about the difference between:
my_dict = dict({})
my_dict = dict()

the argument to the dict class in python is the instantiation values.. so passing an {} creates an empty dictionary.
Its the same case with defaultdict, except that the first argument is the default type of the values for every key.

dict({...}) just makes a dict:
>>> dict({'a': 1, 'b': 2})
{'a': 1, 'b': 2}
Which is equal to this:
>>> dict(a=1, b=2)
{'a': 1, 'b': 2}
or
>>> {'a': 1, 'b': 2}
{'a': 1, 'b': 2}
The same applies for defualtdict.

Python dict.fromkeys return same id element

When i do this on python i update all keys in one time.
>>> base = {}
>>> keys = ['a', 'b', 'c']
>>> base.update(dict.fromkeys(keys, {}))
>>> base.get('a')['d'] = {}
>>> base
{'a': {'d': {}}, 'c': {'d': {}}, 'b': {'d': {}}}
>>> map(id, base.values())
[140536040273352, 140536040273352, 140536040273352]
If instead of .get i use [] operator this not happen:
>>> base['a']['d'] = {}
>>> base
{'a': {'d': {}}, 'c': {}, 'b': {}}
Why?

When you initialize the value for the new keys as {} a new dictionary is created and a reference to this dictionary is becoming the values. There is only one dictionary and so if you change one, you will change "all".

I tried it with both Python 2.7.6 and 3.4.3. I get the same answer when either get('a') or ['a'] is used. Appreciate if you can verify this at your end. Python does object reuse. Thus, dict.fromkeys() reuses the same empty dict is to initialize. To make each one a separate object, you can do this:
base.update(zip(keys, ({} for _ in keys)))

How to assert a dict contains another dict without assertDictContainsSubset in python? [duplicate]

This question already has answers here:
Python unittest's assertDictContainsSubset recommended alternative [duplicate]
(4 answers)
Closed 1 year ago.
I know assertDictContainsSubset can do this in python 2.7, but for some reason it's deprecated in python 3.2. So is there any way to assert a dict contains another one without assertDictContainsSubset?
This seems not good:
for item in dic2:
self.assertIn(item, dic)
any other good way? Thanks

Although I'm using pytest, I found the following idea in a comment. It worked really great for me, so I thought it could be useful here.
Python 3:
assert dict1.items() <= dict2.items()
Python 2:
assert dict1.viewitems() <= dict2.viewitems()
It works with non-hashable items, but you can't know exactly which item eventually fails.

>>> d1 = dict(a=1, b=2, c=3, d=4)
>>> d2 = dict(a=1, b=2)
>>> set(d2.items()).issubset( set(d1.items()) )
True
And the other way around:
>>> set(d1.items()).issubset( set(d2.items()) )
False
Limitation: the dictionary values have to be hashable.

The big problem with the accepted answer is that it does not work if you have non hashable values in your objects values. The second thing is that you get no useful output - the test passes or fails but doesn't tell you which field within the object is different.
As such it is easier to simply create a subset dictionary then test that. This way you can use the TestCase.assertDictEquals() method which will give you very useful formatted output in your test runner showing the diff between the actual and the expected.
I think the most pleasing and pythonic way to do this is with a simple dictionary comprehension as such:
from unittest import TestCase
actual = {}
expected = {}
subset = {k:v for k, v in actual.items() if k in expected}
TestCase().assertDictEqual(subset, expected)
NOTE obviously if you are running your test in a method that belongs to a child class that inherits from TestCase (as you almost certainly should be) then it is just self.assertDictEqual(subset, expected)

John1024's solution worked for me. However, in case of a failure it only tells you False instead of showing you which keys are not matching. So, I tried to avoid the deprecated assert method by using other assertion methods that will output helpful failure messages:
expected = {}
response_keys = set(response.data.keys())
for key in input_dict.keys():
self.assertIn(key, response_keys)
expected[key] = response.data[key]
self.assertDictEqual(input_dict, expected)

You can use assertGreaterEqual or assertLessEqual.
users = {'id': 28027, 'email': 'chungs.lama#gmail.com', 'created_at': '2005-02-13'}
data = {"email": "chungs.lama#gmail.com"}
self.assertGreaterEqual(user.items(), data.items())
self.assertLessEqual(data.items(), user.items()) # Reversed alternative
Be sure to specify .items() or it won't work.

In Python 3 and Python 2.7, you can create a set-like "item view" of a dict without copying any data. This allows you can use comparison operators to test for a subset relationship.
In Python 3, this looks like:
# Test if d1 is a sub-dict of d2
d1.items() <= d2.items()
# Get items in d1 not found in d2
difference = d1.items() - d2.items()
In Python 2.7 you can use the viewitems() method in place of items() to achieve the same result.
In Python 2.6 and below, your best bet is to iterate over the keys in the first dict and check for inclusion in the second.
# Test if d1 is a subset of d2
all(k in d2 and d2[k] == d1[k] for k in d1)

This answers a little broader question than you're asking but I use this in my test harnesses to see if the container dictionary contains something that looks like the contained dictionary. This checks keys and values. Additionally you can use the keyword 'ANYTHING' to indicate that you don't care how it matches.
def contains(container, contained):
'''ensure that `contained` is present somewhere in `container`
EXAMPLES:
contains(
{'a': 3, 'b': 4},
{'a': 3}
) # True
contains(
{'a': [3, 4, 5]},
{'a': 3},
) # True
contains(
{'a': 4, 'b': {'a':3}},
{'a': 3}
) # True
contains(
{'a': 4, 'b': {'a':3, 'c': 5}},
{'a': 3, 'c': 5}
) # True
# if an `contained` has a list, then every item from that list must be present
# in the corresponding `container` list
contains(
{'a': [{'b':1}, {'b':2}, {'b':3}], 'c':4},
{'a': [{'b':1},{'b':2}], 'c':4},
) # True
# You can also use the string literal 'ANYTHING' to match anything
contains(
{'a': [{'b':3}]},
{'a': 'ANYTHING'},
) # True
# You can use 'ANYTHING' as a dict key and it indicates to match the corresponding value anywhere
# below the current point
contains(
{'a': [ {'x':1,'b1':{'b2':{'c':'SOMETHING'}}}]},
{'a': {'ANYTHING': 'SOMETHING', 'x':1}},
) # True
contains(
{'a': [ {'x':1, 'b':'SOMETHING'}]},
{'a': {'ANYTHING': 'SOMETHING', 'x':1}},
) # True
contains(
{'a': [ {'x':1,'b1':{'b2':{'c':'SOMETHING'}}}]},
{'a': {'ANYTHING': 'SOMETHING', 'x':1}},
) # True
'''
ANYTHING = 'ANYTHING'
if contained == ANYTHING:
return True
if container == contained:
return True
if isinstance(container, list):
if not isinstance(contained, list):
contained = [contained]
true_count = 0
for contained_item in contained:
for item in container:
if contains(item, contained_item):
true_count += 1
break
if true_count == len(contained):
return True
if isinstance(contained, dict) and isinstance(container, dict):
contained_keys = set(contained.keys())
if ANYTHING in contained_keys:
contained_keys.remove(ANYTHING)
if not contains(container, contained[ANYTHING]):
return False
container_keys = set(container.keys())
if len(contained_keys - container_keys) == 0:
# then all the contained keys are in this container ~ recursive check
if all(
contains(container[key], contained[key])
for key in contained_keys
):
return True
# well, we're here, so I guess we didn't find a match yet
if isinstance(container, dict):
for value in container.values():
if contains(value, contained):
return True
return False

Here is a comparison that works even if you have lists in the dictionaries:
superset = {'a': 1, 'b': 2}
subset = {'a': 1}
common = { key: superset[key] for key in set(superset.keys()).intersection(set(subset.keys())) }
self.assertEquals(common, subset)

Setting a value to a dictionary's dictionary value

The code:
>>> mydict = {}
>>> keylist = ['a','b','c']
>>> mydict=dict.fromkeys(keylist,{})
>>> mydict['a']['sample'] = 1
>>> mydict
{'a': {'sample': 1}, 'c': {'sample': 1}, 'b': {'sample': 1}}
I was expecting mydict['a']['sample'] = 1 would set the value just for a's dictionary value and would get this: {'a': {'sample': 1}, 'c': {}, 'b': {}}.
What am I missing here? What should I have to do to get the expected output?

The problem is that you added the same dictionary to mydict for every key. You want to add different dictionaries, like so:
mydict = dict((key, {}) for key in keylist)
In the above code, you create a new dictionary to pair with each key. In your original code, the function fromkeys took the argument (the empty dictionary you provided) and added that exact argument - that single empty dictionary you created to pass in to the function - to each of the keys. When that one dictionary was changed, then, that change showed up everywhere.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Using a loop to .setdefault on Dict Creates Nested Dict - python

Related

Python: update dict string that has placeholders?

What does defaultdict(list, {}) or dict({}) do?

Python dict.fromkeys return same id element

How to assert a dict contains another dict without assertDictContainsSubset in python? [duplicate]

Setting a value to a dictionary's dictionary value

Categories

Resources