Python Nested Dictionary append - Solved with defaultdict(lambda: defaultdict(set)) - python

I've done this millions of times in other languages, but the Python method for this is escaping me.
Basically, I'm reading some data from a database. It includes columns like ID, Fruits, Colors, Name.
I need to create an object / dictionary keyed on Name. Name then holds lists / dictionaries of Fruits and Colors.
{"greenhouse1": {"fruits": {"apples", "oranges"},
                 "colors": {"red", "orange"}}}
I'm iterating through the db by row, and the Name key will appear more than once. When it does, fruits or colors may already be populated and I need to append to them.
namedict = {}
db_rows = execute_query(cursor, args.store_proc)
for row in db_rows:
    db_name = row.id
    db_fruit = row.fruit
    db_color = row.color
    if db_name in namedict:
        namedict[db_name]["fruits"].append(db_fruit)
        namedict[db_name]["color"].append(db_color)
    else:
        namedict[db_name]["fruits"] = [db_fruit]
        namedict[db_name]["color"] = [db_color]

collections.defaultdict is your friend: If you access a new key of a defaultdict, it is automatically initialised with the provided function (list or set in this case). Because you want a dictionary (e.g. "greenhouse1") of dictionaries ("fruits", "colors") with lists as values (separate fruits and colors), we need a nested defaultdict. The following should work:
from collections import defaultdict
db = defaultdict(lambda: defaultdict(list)) # alternative: set instead of list
db['greenhouse1']['fruits'].append('apples') # use `add` for sets
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse1']['colors'] = ["red", "orange"]
db['greenhouse2']['fruits'].append('banana')
print(db)
# defaultdict(<function __main__.<lambda>()>,
# {'greenhouse1': defaultdict(list,
# {'fruits': ['apples', 'oranges'],
# 'colors': ['red', 'orange']}),
# 'greenhouse2': defaultdict(list, {'fruits': ['banana']})})
A defaultdict works like a regular dict, so don't get confused with the strange looking output. E.g. to access the fruits of greenhouse1 you can simply write db['greenhouse1']['fruits'] and you get back a list (or set).
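If the printed defaultdict wrappers get in the way (for logging or serialising, say), the nested structure can be converted back to plain dicts. This is only a sketch; the helper name to_plain is made up for illustration and is not part of collections.

```python
from collections import defaultdict

def to_plain(d):
    """Recursively convert (default)dicts to plain dicts; leave other values alone."""
    if isinstance(d, dict):
        return {key: to_plain(value) for key, value in d.items()}
    return d

db = defaultdict(lambda: defaultdict(list))
db['greenhouse1']['fruits'].append('apples')
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse2']['fruits'].append('banana')

plain = to_plain(db)
print(plain)
# {'greenhouse1': {'fruits': ['apples', 'oranges']}, 'greenhouse2': {'fruits': ['banana']}}
```

The result no longer creates keys on access, which is usually what you want once the data has been collected.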

First, you need quotes around fruits and color in the dictionary keys.
Second, you can't create a nested dictionary element until you create the dictionary.
You can simplify the code by using collections.defaultdict. Since you want nested dictionaries, you need nested defaultdicts.
from collections import defaultdict
namedict = defaultdict(lambda: defaultdict(set))
for row in db_rows:
    namedict[row.id]['fruits'].add(row.fruit)
    namedict[row.id]['colors'].add(row.color)
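To try this without a database, here is a self-contained sketch. The Row namedtuple and the sample rows are assumptions standing in for whatever the cursor returns; only the attribute names id, fruit and color come from the question.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the database rows; real rows would come from the cursor.
Row = namedtuple('Row', ['id', 'fruit', 'color'])
db_rows = [
    Row('greenhouse1', 'apples', 'red'),
    Row('greenhouse1', 'oranges', 'orange'),
    Row('greenhouse2', 'banana', 'yellow'),
    Row('greenhouse1', 'apples', 'red'),  # repeated row, absorbed by the sets
]

namedict = defaultdict(lambda: defaultdict(set))
for row in db_rows:
    namedict[row.id]['fruits'].add(row.fruit)
    namedict[row.id]['colors'].add(row.color)

print(sorted(namedict['greenhouse1']['fruits']))  # ['apples', 'oranges']
print(sorted(namedict['greenhouse2']['colors']))  # ['yellow']
```

Using set means repeated rows do not produce duplicate entries; use list and append instead if duplicates or insertion order matter.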

Related

Creating a function to initialise multiple dictionaries with defaultlist()?

Just a general question....
I need to initialise at least 50 different dictionaries, each of which is then passed as one of the arguments to a function I wrote (make_somefunction) that returns a dictionary with customised keys and lists as values.
Is there a way to initialise a dictionary and pass it directly to the function?
Something like this?
from collections import defaultdict
def initialise(dict):
    dict = defaultdict(list)
    return (dict)
initialise(dict).make_somefunction(dict, custom_keys, custom_listofvalues)
Instead of
dict1 = defaultdict(list)
dict2 = defaultdict(list)
dict3 = defaultdict(list)
...
dict49 = defaultdict(list)
dict50 = defaultdict(list)
which would individually go as an argument creating different customised dictionaries
make_somefunction(dict1, animals, foods)
make_somefunction(dict2, patients, drugs)
...
make_somefunction(dict50, fridge, fruits)
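You can pass a list of defaultdicts built with a list comprehension. Give defaultdict a factory such as list; a bare defaultdict() raises KeyError on missing keys, just like a normal dict.
make_somefunction([defaultdict(list) for _ in range(50)])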
By the way, read the python docs to get a better understanding of python.
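Another option is to create each dictionary at the call site, so no named dict1 ... dict50 are needed at all. The sketch below is only illustrative: this make_somefunction is a hypothetical stand-in (the real one was not shown), and the sample keys and values are made up.

```python
from collections import defaultdict

def make_somefunction(target, keys, values):
    """Hypothetical stand-in: file each value under its key and return the dict."""
    for key, value in zip(keys, values):
        target[key].append(value)
    return target

# Each pair is (custom_keys, custom_listofvalues); the data is made up.
inputs = [
    (['cat', 'dog', 'cat'], ['fish', 'bones', 'milk']),
    (['alice', 'bob'], ['aspirin', 'ibuprofen']),
]

# A fresh defaultdict(list) is created for every call.
results = [make_somefunction(defaultdict(list), keys, values)
           for keys, values in inputs]

print(dict(results[0]))  # {'cat': ['fish', 'milk'], 'dog': ['bones']}
print(dict(results[1]))  # {'alice': ['aspirin'], 'bob': ['ibuprofen']}
```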

Accessing list values in a defaultdict(list) type

I have a defaultdict(list) dictionary and I'm trying to access the stored values to perform some operations on them. I've never had to do this before, so I'm not quite sure how to access them given a key and a list index.
listdict = defaultdict(list)
listdict = {'Cake':['cheesecake','icecream cake','oreo-cheesecake']}
So, for example, say I wanted to use the "Cake" key to access the "oreo-cheesecake" string at index 2 in the list.
You are overwriting your defaultdict. It mostly works as a normal dict. We set elements:
listdict = defaultdict(list)
listdict['Cake'] = ['cheesecake','icecream cake','oreo-cheesecake']
And we recover them:
print(listdict['Cake'][2])
# oreo-cheesecake
Unlike with a normal dict, you can also append to a key that does not exist yet:
listdict['nonexistent'].append('stuff')
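One thing to keep in mind is that merely indexing a missing key creates it. A small sketch of which operations do and do not trigger the factory (the key 'Pie' is just an example):

```python
from collections import defaultdict

listdict = defaultdict(list)
listdict['Cake'] = ['cheesecake', 'icecream cake', 'oreo-cheesecake']

print('Pie' in listdict)    # False: membership tests do not create keys
print(listdict.get('Pie'))  # None: .get() does not call the factory either
print(listdict['Pie'])      # []: indexing creates the key with an empty list
print('Pie' in listdict)    # True
```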

How to associate elements in a set with multiple dict entries

I'm extracting instances of three elements from an XML file: ComponentStr, keyID, and valueStr. Whenever I find a ComponentStr, I want to add/associate the keyID:valueStr to it. ComponentStr values are not unique. As multiple occurrences of a ComponentStr are read, I want to accumulate the keyID:valueStr for that ComponentStr group. The resulting accumulated data structure after reading the XML file might look like this:
ComponentA: key1:value1, key2:value2, key3:value3
ComponentB: key4:value4
ComponentC: key5:value5, key6:value6
After I generate the final data structure, I want to sort the keyID:valueStr entries within each ComponentStr and also sort all the ComponentStrs.
I'm trying to structure this data in Python 2. ComponentStr seems to work well as a set. The keyID:valueStr is clearly a dict. But how do I associate a ComponentStr entry in a set with its dict entries?
Alternatively, is there a better way to organize this data besides a set and associated dict entries? Each keyID is unique. Perhaps I could have one dict of keyID:some combo of ComponentStr and valueStr? After the data structure was built, I could sort it based on ComponentStr first, then perform some type of slice to group the keyID:valueStr and then sort again on the keyID? Seems complicated.
How about a dict of dicts?
data = {
'ComponentA': {'key1':'value1', 'key2':'value2', 'key3':'value3'},
'ComponentB': {'key4':'value4'},
'ComponentC': {'key5':'value5', 'key6':'value6'},
}
It maintains your data structure and mapping. Interestingly enough, the underlying implementation of dicts is similar to the implementation of sets.
This could easily be constructed along the lines of this pseudo-code:
data = {}
for file in files:
    data[get_component(file)] = {}
    for key, value in get_data(file):
        data[get_component(file)][key] = value
In the case where you have repeated components, you need the sub-dict as the default, but should add to the previous one if it's there. I prefer setdefault to other solutions like a defaultdict or subclassing dict with a __missing__ method, as long as I only have to do it once or twice in my code:
data = {}
for file in files:
    for key, value in get_data(file):
        # the component itself is the key; a list here would be unhashable
        data.setdefault(get_component(file), {})[key] = value
It works like this:
>>> d = {}
>>> d.setdefault('foo', {})['bar'] = 'baz'
>>> d
{'foo': {'bar': 'baz'}}
>>> d.setdefault('foo', {})['ni'] = 'ichi'
>>> d
{'foo': {'ni': 'ichi', 'bar': 'baz'}}
Alternatively, since your comment on the other answer says you need simple code, you can keep it really simple with some more verbose and less optimized code:
data = {}
for file in files:
    for key, value in get_data(file):
        if get_component(file) not in data:
            data[get_component(file)] = {}
        data[get_component(file)][key] = value
You can then sort when you're done collecting the data.
for component in sorted(data):
    print(component)
    print('-----')
    for key in sorted(data[component]):
        print(key, data[component][key])
"I want to accumulate the keyID:valueStr for that ComponentStr group"
In this case, the keys of your dictionary should be the ComponentStr values. "Accumulating" immediately suggests a list to me, and lists are easy to sort.
"Each keyID is unique. Perhaps I could have one dict of keyID:some combo of ComponentStr and valueStr?"
You should store your data in a manner that is the most efficient when you want to retrieve it. Since you will be accessing your data by the component, even though your keys are unique there is no point in having a dictionary that is accessed by your key (since this is not how you are going to "retrieve" the data).
So, with that - how about using a defaultdict with a list, since you really want all items associated with the same component:
from collections import defaultdict
d = defaultdict(list)
with open('somefile.xml', 'r') as f:
    for component, key, value in parse_xml(f):
        d[component].append((key, value))
Now you have for each component, a list of tuples which are the associated key and values.
If you want to keep the components in the order that they are read from the file, you can use an OrderedDict (also from the collections module), but if you want to sort them in any arbitrary order, then stick with a normal dictionary.
To get a list of sorted component names, just sort the keys of the dictionary:
component_sorted = sorted(d.keys())
For a use case of printing the sorted components with their associated key/value pairs, sorted by their keys:
for key in component_sorted:
    values = d[key]
    sorted_values = sorted(values, key=lambda x: x[0])  # Sort by the keys
    print('Pairs for {}'.format(key))
    for k, v in sorted_values:
        print('{} {}'.format(k, v))
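For a version that runs without an XML file, here is a sketch under stated assumptions: the records list is made-up data standing in for the (component, key, value) triples an XML parser would yield, and report is a name introduced only for this example.

```python
from collections import defaultdict

# Hypothetical (component, key, value) triples standing in for parsed XML.
records = [
    ('ComponentC', 'key6', 'value6'),
    ('ComponentA', 'key2', 'value2'),
    ('ComponentB', 'key4', 'value4'),
    ('ComponentA', 'key1', 'value1'),
    ('ComponentC', 'key5', 'value5'),
]

d = defaultdict(list)
for component, key, value in records:
    d[component].append((key, value))

# Sort the components, then sort each component's pairs (tuples sort by key first).
report = [(component, sorted(d[component])) for component in sorted(d)]
for component, pairs in report:
    print(component, pairs)
# ComponentA [('key1', 'value1'), ('key2', 'value2')]
# ComponentB [('key4', 'value4')]
# ComponentC [('key5', 'value5'), ('key6', 'value6')]
```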

How do I create a single large dictionary rather than a bunch of little ones

I am having an issue. The solution might be straightforward, but I am not seeing it. The code below returns a bunch of individual dictionaries as opposed to one large dictionary. I then iterate through these small dictionaries to pull out values. The problem is that I would much rather sort through one LARGE dictionary as opposed to a bunch of small ones. "objFunctions.getAttributes" returns a dictionary. "objFunctions.getRelationships" returns a pointer.
This is the output:
{1:value}
{2:value}
{3:value}
This is what I want:
{1:value,2:value,3:value}
for object in objList:
    relationships = objFunctions.getRelationships(object)
    for relPtr in relationships:
        uglyDict = objFunctions.getAttributes(relPtr)
Use the .update() method to merge dicts:
attributes = {}
for object in objList:
    relationships = objFunctions.getRelationships(object)
    for relPtr in relationships:
        attributes.update(objFunctions.getAttributes(relPtr))
Note that if a key is repeated across different invocations of .getAttributes, the value stored in attributes in the end will be the last one returned for that key.
If you would rather keep every value instead of only the last one, you'll have to merge your dicts manually, appending the values one by one to lists in a defaultdict:
from collections import defaultdict
attributes = defaultdict(list)
for object in objList:
    relationships = objFunctions.getRelationships(object)
    for relPtr in relationships:
        # .items() is needed to get (key, value) pairs out of the dict
        for key, value in objFunctions.getAttributes(relPtr).items():
            attributes[key].append(value)
Now your attributes dict will contain a list for each key, with the various values collected together. You could use a set as well: use defaultdict(set) and attributes[key].add(value) instead.
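The difference between the two approaches shows up as soon as a key repeats. A minimal sketch, assuming made-up small dicts in place of what getAttributes returns:

```python
from collections import defaultdict

# Hypothetical small dicts, standing in for what getAttributes returns.
small_dicts = [{1: 'a'}, {2: 'b'}, {1: 'c'}]

merged = {}
collected = defaultdict(list)
for small in small_dicts:
    merged.update(small)              # later values overwrite earlier ones
    for key, value in small.items():
        collected[key].append(value)  # every value is kept

print(merged)           # {1: 'c', 2: 'b'}
print(dict(collected))  # {1: ['a', 'c'], 2: ['b']}
```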
>>> from collections import defaultdict
>>> x = defaultdict(list)
>>> y = defaultdict(list)
>>> x[1].append("value1")
>>> x[2].append("value2")
>>> y[1].append("value3")
>>> y[2].append("value4")
>>> for k in y:
... x[k].extend(y[k])
...
>>> print(x)
defaultdict(<class 'list'>, {1: ['value1', 'value3'], 2: ['value2', 'value4']})

Most efficient way to add new keys or append to old keys in a dictionary during iteration in Python?

Here's a common situation when compiling data in dictionaries from different sources:
Say you have a dictionary that stores lists of things, such as things I like:
likes = {
'colors': ['blue','red','purple'],
'foods': ['apples', 'oranges']
}
and a second dictionary with some related values in it:
favorites = {
'colors':'yellow',
'desserts':'ice cream'
}
You then want to iterate over the "favorites" object and either append the items in that object to the list with the appropriate key in the "likes" dictionary or add a new key to it with the value being a list containing the value in "favorites".
There are several ways to do this:
for key in favorites:
    if key in likes:
        likes[key].append(favorites[key])
    else:
        # [value] wraps the value; list(value) would split a string into characters
        likes[key] = [favorites[key]]
or
for key in favorites:
    try:
        likes[key].append(favorites[key])
    except KeyError:
        likes[key] = [favorites[key]]
And many more as well...
I generally use the first syntax because it feels more pythonic, but if there are other, better ways, I'd love to know what they are. Thanks!
Use collections.defaultdict, where the default value is a new list instance.
>>> import collections
>>> mydict = collections.defaultdict(list)
In this way calling .append(...) will always succeed, because in case of a non-existing key append will be called on a fresh empty list.
You can instantiate the defaultdict with a previously generated dict, in case you get the dict likes from another source, like so:
>>> mydict = collections.defaultdict(list, likes)
Note that using list as the default_factory attribute of a defaultdict is also discussed as an example in the documentation.
Use collections.defaultdict:
import collections
likes = collections.defaultdict(list)
for key, value in favorites.items():
likes[key].append(value)
defaultdict takes a factory as its first argument, used to create values for unknown keys on demand. list is such a function; it creates empty lists.
And iterating over .items() will save you from using the key to get the value.
Besides defaultdict, the regular dict offers one possibility (that might look a bit strange): dict.setdefault(k[, d]):
for key, val in favorites.items():
    likes.setdefault(key, []).append(val)
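Run against the dictionaries from the question, the setdefault approach looks like this. It is only a sketch, but it shows that likes stays a plain dict throughout:

```python
likes = {
    'colors': ['blue', 'red', 'purple'],
    'foods': ['apples', 'oranges'],
}
favorites = {
    'colors': 'yellow',
    'desserts': 'ice cream',
}

for key, val in favorites.items():
    # Returns the existing list, or stores and returns a new empty one.
    likes.setdefault(key, []).append(val)

print(likes['colors'])    # ['blue', 'red', 'purple', 'yellow']
print(likes['desserts'])  # ['ice cream']
print(type(likes))        # <class 'dict'>
```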
>>> from collections import defaultdict
>>> d = defaultdict(list, likes)
>>> d
defaultdict(<class 'list'>, {'colors': ['blue', 'red', 'purple'], 'foods': ['apples', 'oranges']})
>>> for i, j in favorites.items():
...     d[i].append(j)
...
>>> d
defaultdict(<class 'list'>, {'desserts': ['ice cream'], 'colors': ['blue', 'red', 'purple', 'yellow'], 'foods': ['apples', 'oranges']})
All of the answers are defaultdict, but I'm not sure that's the best way to go about it. Giving out defaultdict to code that expects a dict can be bad. (See: How do I make a defaultdict safe for unexpecting clients? ) I'm personally torn on the matter. (I actually found this question looking for an answer to "which is better, dict.get() or defaultdict") Someone in the other thread said that you don't want a defaultdict if you don't want this behavior all the time, and that might be true. Maybe using defaultdict for the convenience is the wrong way to go about it. I think there are two needs being conflated here:
"I want a dict whose default values are empty lists." to which defaultdict(list) is the correct solution.
and
"I want to append to the list at this key if it exists and create a list if it does not exist." to which my_dict.setdefault('foo', []) with append() is the answer (my_dict.get('foo', []) would return a new list without storing it in the dict).
What do you guys think?
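One detail worth checking for the second need: dict.get() with a default never stores that default, so appending to its result is lost when the key is missing, whereas dict.setdefault() stores the list first. A quick sketch:

```python
my_dict = {}

# .get() returns a temporary list that is never stored in the dict.
my_dict.get('foo', []).append('lost')
print(my_dict)  # {}

# .setdefault() stores the new list under the key before returning it.
my_dict.setdefault('foo', []).append('kept')
print(my_dict)  # {'foo': ['kept']}
```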
