How to create a double dictionary in Python? - python

I'd like to create a dictionary inside a dictionary in Python using the setdefault() method.
I'm trying to combine names, birth months and dates of birth using the following dictionaries:
names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}
dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02'}
I was trying to use setdefault() to achieve this:
res_dict = dict()
for v, k in names.items():
    for v1, k1 in dates.items():
        res_dict.setdefault(v, {}).append(k)
        res_dict.setdefault(v1, {}).append(k1)
return res_dict
but it gives me an error.
The result should be:
res_dict = {'Will': {'january': '7/01'}, 'Mary' : {'february': '2/02'} ,'George': {'march': '21/03'}, 'Steven': {'april': '14/03'}, 'Peter': {'may': ''}}
How can I get the desired result using setdefault()?

You could try this:
In [17]: results = {}
In [18]: for k, v in names.items():
   ....:     results[k] = {v: dates.setdefault(k, '')}
   ....:
In [20]: results
Out[20]:
{'George': {'march': '21/03'},
 'Mary': {'february': '2/02'},
 'Peter': {'may': ''},
 'Steven': {'april': '14/03'},
 'Will': {'january': '7/01'}}
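Since the question asks specifically about setdefault(), here is a minimal sketch (assuming Python 3) that calls it on the result dictionary instead of on dates, so the input dictionaries are left unmodified. The name inner is only illustrative.

```python
names = {'Will': 'january', 'Mary': 'february', 'George': 'march',
         'Steven': 'april', 'Peter': 'may'}
dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02'}

res_dict = {}
for name, month in names.items():
    # setdefault returns the inner dict for this name, creating it on first use.
    inner = res_dict.setdefault(name, {})
    # dict.get does not insert anything into dates when the name is missing.
    inner[month] = dates.get(name, '')

print(res_dict)
```

The error in the original attempt comes from calling append() on the value returned by setdefault(v, {}): that value is a dict, and dicts have no append method.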
And as to your comment regarding adding month and day, you can add them similarly:
In [28]: for k, v in names.items():
   ....:     results[k] = {'month': v, 'day': dates.setdefault(k, '')}
   ....:
In [30]: results
Out[30]:
{'George': {'day': '21/03', 'month': 'march'},
 'Mary': {'day': '2/02', 'month': 'february'},
 'Peter': {'day': '', 'month': 'may'},
 'Steven': {'day': '14/03', 'month': 'april'},
 'Will': {'day': '7/01', 'month': 'january'}}
And if you want to omit day completely in the case where a value doesn't exist:
In [8]: results = {}
In [9]: for k, v in names.items():
   ...:     results[k] = {'month': v}
   ...:     if k in dates:
   ...:         results[k]['day'] = dates[k]
   ...:
In [10]: results
Out[10]:
{'George': {'day': '21/03', 'month': 'march'},
'Mary': {'day': '2/02', 'month': 'february'},
'Peter': {'month': 'may'},
'Steven': {'day': '14/03', 'month': 'april'},
'Will': {'day': '7/01', 'month': 'january'}}
And in the odd case where you know the date but not the month, iterating through the set of the keys (as @KayZhu suggested) with a defaultdict may be the easiest solution:
In [1]: from collections import defaultdict
In [2]: names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}
In [3]: dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02', 'Marat': '27/03'}
In [4]: results = defaultdict(dict)
In [5]: for name in set(names) | set(dates):
   ...:     if name in names:
   ...:         results[name]['month'] = names[name]
   ...:     if name in dates:
   ...:         results[name]['day'] = dates[name]
   ...:
In [6]: for k, v in results.items():
   ...:     print(k, v)
   ...:
George {'day': '21/03', 'month': 'march'}
Will {'day': '7/01', 'month': 'january'}
Marat {'day': '27/03'}
Steven {'day': '14/03', 'month': 'april'}
Peter {'month': 'may'}
Mary {'day': '2/02', 'month': 'february'}

A simple one-liner:
In [38]: names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}
In [39]: dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02'}
In [40]: dict((name,{names[name]:dates.get(name,'')}) for name in names)
Out[40]:
{'George': {'march': '21/03'},
'Mary': {'february': '2/02'},
'Peter': {'may': ''},
'Steven': {'april': '14/03'},
'Will': {'january': '7/01'}}

You will need to get the union of the keys from names and dates first:
>>> res_dict = {}
>>> for k in set(names) | set(dates):
...     res_dict[k] = {names.setdefault(k, ''): dates.setdefault(k, None)}
...
>>> res_dict
{'Will': {'january': '7/01'}, 'Steven': {'april': '14/03'}, 'Peter': {'may': None},
'Mary': {'february': '2/02'}, 'George': {'march': '21/03'}}
Otherwise, you will miss results whose keys are in dates but not in names.
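In Python 3 the key views returned by keys() support set operations directly, so the union can be taken without building intermediate lists. The following is a sketch under that assumption. It uses get() rather than setdefault() so that neither input is modified, and the extra 'Marat' entry is borrowed from the defaultdict example above to show a name that has a date but no month.

```python
names = {'Will': 'january', 'Mary': 'february', 'George': 'march',
         'Steven': 'april', 'Peter': 'may'}
dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03',
         'Mary': '2/02', 'Marat': '27/03'}

# dict key views behave like sets, so | yields the union of both key sets.
all_names = names.keys() | dates.keys()

res_dict = {
    name: {names.get(name, ''): dates.get(name)}
    for name in sorted(all_names)  # sorted only to make the order predictable
}

print(res_dict)
```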

Related

How to code for the value of "month" which produced the highest profit in a year?

Out of all the months in the year, I need to code the month with largest total balance (it's June as all together June has the biggest "amount" value)
lst = [
{'account': 'x\\*', 'amount': 300, 'day': 3, 'month': 'June'},
{'account': 'y\\*', 'amount': 550, 'day': 9, 'month': 'May'},
{'account': 'z\\*', 'amount': -200, 'day': 21, 'month': 'June'},
{'account': 'g', 'amount': 80, 'day': 10, 'month': 'May'},
{'account': 'x\\*', 'amount': 30, 'day': 16, 'month': 'August'},
{'account': 'x\\*', 'amount': 100, 'day': 5, 'month': 'June'},
]
The problem is that both "amount" and the name of the months are values.
I tried to find the total for each month, but I need to use for loop to code the highest month "amount".
My attempt:
get_sum = lambda my_list, month: sum(
    d['amount'] for d in my_list if d['month'] == month)
total_June = get_sum(lst, 'June')
total_August = get_sum(lst, 'August')
A simple solution with pandas.
import pandas as pd
lst = [
{'account': 'x\\*', 'amount': 300, 'day': 3, 'month': 'June'},
{'account': 'y\\*', 'amount': 550, 'day': 9, 'month': 'May'},
{'account': 'z\\*', 'amount': -200, 'day': 21, 'month': 'June'},
{'account': 'g', 'amount': 80, 'day': 10, 'month': 'May'},
{'account': 'x\\*', 'amount': 30, 'day': 16, 'month': 'August'},
{'account': 'x\\*', 'amount': 100, 'day': 5, 'month': 'June'},
]
# convert list of dictionaries to dataframe
df = pd.DataFrame(lst)
# Get the row / series that has max amount.
# idxmax returns an index for loc.
max_series_by_amount = df.loc[df['amount'].idxmax(axis="index")]
# Get only month and amount in a plain list
print(max_series_by_amount[["month", "amount"]].tolist())
['May', 550]
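Please note that pandas adds a substantial number of dependencies to a project. That said, it is commonly imported anyway for data science or data manipulation tasks. Pierre D's solutions here are definitely faster. Also note that this picks the single row with the largest amount, not the month with the largest total; for this data both happen to be May.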
One possibility (among many):
from itertools import groupby
from operator import itemgetter
mo_total = {
k: sum([d.get('amount', 0) for d in v])
for k, v in groupby(sorted(lst, key=itemgetter('month')), key=itemgetter('month'))
}
>>> mo_total
{'August': 30, 'June': 200, 'May': 630}
>>> max(mo_total.items(), key=lambda kv: kv[1])
('May', 630)
Without itemgetter:
bymonth = lambda d: d.get('month')
mo_total = {
k: sum([d.get('amount', 0) for d in v])
for k, v in groupby(sorted(lst, key=bymonth), key=bymonth)
}
Yet another way, using defaultdict:
from collections import defaultdict
tot = defaultdict(int)
for d in lst:
    tot[d['month']] += d.get('amount', 0)
>>> tot
defaultdict(int, {'June': 200, 'May': 630, 'August': 30})
>>> max(tot, key=lambda k: tot[k])
'May'
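As a variation on the defaultdict approach, collections.Counter can hold the running totals as well. This is only a sketch (the names totals, best_month and best_total are illustrative) and it assumes every entry has both a 'month' and an 'amount' key.

```python
from collections import Counter

lst = [
    {'account': 'x\\*', 'amount': 300, 'day': 3, 'month': 'June'},
    {'account': 'y\\*', 'amount': 550, 'day': 9, 'month': 'May'},
    {'account': 'z\\*', 'amount': -200, 'day': 21, 'month': 'June'},
    {'account': 'g', 'amount': 80, 'day': 10, 'month': 'May'},
    {'account': 'x\\*', 'amount': 30, 'day': 16, 'month': 'August'},
    {'account': 'x\\*', 'amount': 100, 'day': 5, 'month': 'June'},
]

# Accumulate the amount per month; missing months start at 0.
totals = Counter()
for entry in lst:
    totals[entry['month']] += entry['amount']

# Pick the (month, total) pair with the largest total.
best_month, best_total = max(totals.items(), key=lambda kv: kv[1])
print(best_month, best_total)  # May 630
```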

Flatten/merge a list of dictionaries in python

I have a list of dictionaries:
data = [{'average': 2, 'day': '2022-01-01'},
{'average': 3, 'day': '2022-01-02'},
{'average': 5, 'day': '2022-01-03'},
{'sum': 8, 'day': '2022-01-01'},
{'sum': 15, 'day': '2022-01-02'},
{'sum': 9, 'day': '2022-01-03'},
{'total_value': 19, 'day': '2022-01-01'},
{'total_value': 99, 'day': '2022-01-02'},
{'total_value': 15, 'day': '2022-01-03'}]
I want my output as:
output = [{'average': 2, 'sum': 8, 'total_value': 19, 'day': '2022-01-01'},
{'average': 3, 'sum': 15, 'total_value': 99, 'day': '2022-01-02'},
{'average': 5, 'sum': 9, 'total_value': 15, 'day': '2022-01-03'}]
The output puts the values together based off their date. My approaches so far have been to try and separate everything out into different dictionaries (date_dict, sum_dict, etc.) and then bringing them all together, but that doesn't seem to work and is extremely sloppy.
You could iterate over data and create a dictionary using day as key:
data = [{'average': 2, 'day': '2022-01-01'},
{'average': 3, 'day': '2022-01-02'},
{'average': 5, 'day': '2022-01-03'},
{'sum': 8, 'day': '2022-01-01'},
{'sum': 15, 'day': '2022-01-02'},
{'sum': 9, 'day': '2022-01-03'},
{'total_value': 19, 'day': '2022-01-01'},
{'total_value': 99, 'day': '2022-01-02'},
{'total_value': 15, 'day': '2022-01-03'}]
output = {}
for item in data:
    if item['day'] not in output:
        # copy, so the dictionaries in data are not modified by update()
        output[item['day']] = dict(item)
    else:
        output[item['day']].update(item)
print(list(output.values()))
Out:
[
{'average': 2, 'day': '2022-01-01', 'sum': 8, 'total_value': 19},
{'average': 3, 'day': '2022-01-02', 'sum': 15, 'total_value': 99},
{'average': 5, 'day': '2022-01-03', 'sum': 9, 'total_value': 15}
]
Had a bit of fun and made it with dict/list comprehension. Check out that neat | operator in python 3.9+ :-)
Python <3.9
from collections import ChainMap
data_grouped_by_day = {
day : dict(ChainMap(*[d for d in data if d["day"] == day ]))
for day in {d["day"] for d in data }
}
for day, group_data in data_grouped_by_day.items():
    group_data.update(day=day)
result = list(data_grouped_by_day.values())
Python 3.9+
from collections import ChainMap
result = [
dict(ChainMap(*[d for d in data if d["day"] == day ])) | {"day" : day}
for day in {d["day"] for d in data}
]
The output in both cases is (key and element order may vary):
[{'total_value': 99, 'day': '2022-01-02', 'sum': 15, 'average': 3},
{'total_value': 15, 'day': '2022-01-03', 'sum': 9, 'average': 5},
{'total_value': 19, 'day': '2022-01-01', 'sum': 8, 'average': 2}]
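If a single pass over data and a stable output order are preferred, a defaultdict(dict) keyed by day is another option. This is a sketch rather than a drop-in replacement for the answers above; the name merged is illustrative, and it relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

data = [{'average': 2, 'day': '2022-01-01'},
        {'average': 3, 'day': '2022-01-02'},
        {'average': 5, 'day': '2022-01-03'},
        {'sum': 8, 'day': '2022-01-01'},
        {'sum': 15, 'day': '2022-01-02'},
        {'sum': 9, 'day': '2022-01-03'},
        {'total_value': 19, 'day': '2022-01-01'},
        {'total_value': 99, 'day': '2022-01-02'},
        {'total_value': 15, 'day': '2022-01-03'}]

# Each day gets its own fresh dict, so the input dictionaries stay untouched.
merged = defaultdict(dict)
for item in data:
    merged[item['day']].update(item)

# Days appear in the order they were first seen in data.
output = list(merged.values())
print(output)
```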

How to select only specific key-value pairs from a list of dictionaries?

This is my example :
dictlist = [{'Name': 'James', 'city': 'paris','type': 'A' },
{'Name': 'James','city': 'Porto','type': 'B'},
{'Name': 'Christian','city': 'LA','type': 'A'}]
I want to filter specific keys and values.
For example:
desiredKey = ['Name', 'type']
desiredoutput = [{'Name': 'James', 'type': 'A' },
{'Name': 'James', 'type': 'B'},
{'Name': 'Christian','type': 'A'}]
I tried this, but it doesn't work:
keys = dictlist[0].keys()
output= [d for d in dictlist if d.keys in desiredKey]
You can try something like this:
In [1]: dictlist = [{'Name': 'James', 'city': 'paris','type': 'A' },
...: {'Name': 'James','city': 'Porto','type': 'B'},
...: {'Name': 'Christian','city': 'LA','type': 'A'}]
In [2]: keys = ["Name","type"]
In [3]: res = []
In [5]: for dict1 in dictlist:
   ...:     result = dict((k, dict1[k]) for k in keys if k in dict1)
   ...:     res.append(result)
   ...:
In [6]: res
Out[6]:
[{'Name': 'James', 'type': 'A'},
{'Name': 'James', 'type': 'B'},
{'Name': 'Christian', 'type': 'A'}]
It's a bit more complicated than this answer but you can also use zip and itemgetter.
In [42]: from operator import itemgetter
In [43]: list_of_dicts = [
...: {"a":1, "b":1, "c":1, "d":1},
...: {"a":2, "b":2, "c":2, "d":2},
...: {"a":3, "b":3, "c":3, "d":3},
...: {"a":4, "b":4, "c":4, "d":4}
...: ]
In [44]: allowed_keys = ("a", "c")
In [45]: filter_func = itemgetter(*allowed_keys)
In [46]: list_of_filtered_dicts = [
...: dict(zip(allowed_keys, filter_func(d)))
...: for d in list_of_dicts
...: ]
In [47]: list_of_filtered_dicts
Out[47]: [{'a': 1, 'c': 1}, {'a': 2, 'c': 2}, {'a': 3, 'c': 3}, {'a': 4, 'c': 4}]
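The same filtering can also be written as a nested comprehension. A minimal sketch, assuming Python 3 and using the question's data; desired_keys and filtered are illustrative names. Unlike the itemgetter version, it silently skips keys that a dictionary does not have instead of raising KeyError.

```python
dictlist = [{'Name': 'James', 'city': 'paris', 'type': 'A'},
            {'Name': 'James', 'city': 'Porto', 'type': 'B'},
            {'Name': 'Christian', 'city': 'LA', 'type': 'A'}]

desired_keys = ['Name', 'type']

# For every dict keep only the desired keys that are actually present.
filtered = [{k: d[k] for k in desired_keys if k in d} for d in dictlist]

print(filtered)
```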

Summarize a list of dictionaries based on common key values

I have a list of dictionaries like so:
dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
{'day': 1, 'start': '10:00am', 'end': '7:00pm'},
{'day': 2, 'start': '8:00am', 'end': '5:00pm'},
{'day': 3, 'start': '10:00am', 'end': '7:00pm'},
{'day': 4, 'start': '8:00am', 'end': '5:00pm'},
{'day': 5, 'start': '11:00am', 'end': '1:00pm'}]
I want to summarize days that share the same 'start' and 'end' times.
For example,
summarylist = [([0, 2, 4], '8:00am', '5:00pm'),
([1, 3], '10:00am', '7:00pm'),
([5], '11:00am', '1:00pm')]
I have tried to adapt some other StackOverflow solutions re: sets and intersections to achieve this with no luck. I was trying to re-purpose the solution to this question to no avail. Hoping someone can point me in the right direction.
If you don't need the exact format that you provided, you could use defaultdict:
dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
{'day': 1, 'start': '10:00am', 'end': '7:00pm'},
{'day': 2, 'start': '8:00am', 'end': '5:00pm'},
{'day': 3, 'start': '10:00am', 'end': '7:00pm'},
{'day': 4, 'start': '8:00am', 'end': '5:00pm'},
{'day': 5, 'start': '11:00am', 'end': '1:00pm'}]
from collections import defaultdict
dd = defaultdict(list)
for d in dictlist:
    dd[(d['start'], d['end'])].append(d['day'])
Result:
>>> dd
defaultdict(<class 'list'>, {('8:00am', '5:00pm'): [0, 2, 4], ('10:00am', '7:00pm'): [1, 3], ('11:00am', '1:00pm'): [5]})
And if the format is important to you, you could do:
>>> my_list = [(v, k[0], k[1]) for k, v in dd.items()]
>>> my_list
[([0, 2, 4], '8:00am', '5:00pm'), ([1, 3], '10:00am', '7:00pm'), ([5], '11:00am', '1:00pm')]
>>> # If you need the output sorted:
>>> sorted_my_list = sorted(my_list, key = lambda k : len(k[0]), reverse=True)
>>> sorted_my_list
[([0, 2, 4], '8:00am', '5:00pm'), ([1, 3], '10:00am', '7:00pm'), ([5], '11:00am', '1:00pm')]
With itertools.groupby:
In [1]: %paste
dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
{'day': 1, 'start': '10:00am', 'end': '7:00pm'},
{'day': 2, 'start': '8:00am', 'end': '5:00pm'},
{'day': 3, 'start': '10:00am', 'end': '7:00pm'},
{'day': 4, 'start': '8:00am', 'end': '5:00pm'},
{'day': 5, 'start': '11:00am', 'end': '1:00pm'}]
## -- End pasted text --
In [2]: from itertools import groupby
In [3]: tuplist = [(d['day'], (d['start'], d['end'])) for d in dictlist]
In [4]: key = lambda x: x[1]
In [5]: summarylist = [(sorted(e[0] for e in g),) + k
...: for k, g in groupby(sorted(tuplist, key=key), key=key)]
In [6]: summarylist
Out[6]:
[([1, 3], '10:00am', '7:00pm'),
([5], '11:00am', '1:00pm'),
([0, 2, 4], '8:00am', '5:00pm')]
You can use itertools.groupby like this.
source code:
from itertools import groupby
for k, grp in groupby(sorted(dictlist, key=lambda x: (x['end'], x['start'])), key=lambda x: (x['start'], x['end'])):
    print([i['day'] for i in grp], k)
output:
[5] ('11:00am', '1:00pm')
[0, 2, 4] ('8:00am', '5:00pm')
[1, 3] ('10:00am', '7:00pm')
But I think using defaultdict (@Akavall's answer) is the right way in this particular case.
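Building on the defaultdict approach, the exact summarylist layout from the question can be produced by sorting the groups by their earliest day. This is a sketch, assuming the days within each group arrive in ascending order as they do in the sample data; days_by_hours is an illustrative name.

```python
from collections import defaultdict

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

# Group the day numbers under their (start, end) pair.
days_by_hours = defaultdict(list)
for d in dictlist:
    days_by_hours[(d['start'], d['end'])].append(d['day'])

# Turn each group into (days, start, end) and order by the first day in it.
summarylist = sorted(
    ((days, start, end) for (start, end), days in days_by_hours.items()),
    key=lambda row: row[0][0],
)

print(summarylist)
```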

Python, collect data from an array of dicts

I'm new to Python and I have this structure retrieved from a DB:
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
I know that I can get the id of a record with:
data[i]['id']
but I need to collect by ID in a proper data structure, in order to get the average values for every ID.
What is the best choice for this?
I'm thinking of building a new dict for every ID set, but the number of IDs can grow, and I can't figure out how to tackle this problem. If someone can give me some ideas I would be very grateful.
Assuming your data is sorted by ID as it appears in your data variable, you can try using itertools.groupby, which can be instructed to group by id. You can then create a new dictionary that has keys equal to the id numbers and values equal to the means:
In [1]: from itertools import groupby
In [2]: data=[
...: {'Value': '0.2', 'id': 1},
...: {'Value': '1.2', 'id': 1},
...: {'Value': '33.34', 'id': 2},
...: {'Value': '44.3', 'id': 3},
...: {'Value': '33.23', 'id': 3},
...: {'Value': '21.1', 'id': 4},
...: {'Value': '5.33', 'id': 4},
...: {'Value': '33.3', 'id': 5},
...: {'Value': '12.2', 'id': 5},
...: {'Value': '1.22', 'id': 5},
...: {'Value': '1.23', 'id': 6}
...: ]
In [3]: means = {}
In [4]: for k, g in groupby(data, key=lambda x: x['id']):
   ...:     g = list(g)
   ...:     means[k] = sum(float(x['Value']) for x in g) / len(g)
   ...:
In [5]: means
Out[5]:
{1: 0.69999999999999996,
2: 33.340000000000003,
3: 38.765000000000001,
4: 13.215,
5: 15.573333333333332,
6: 1.23}
(Updated: after DSM's comment.)
You could reshape the data like this:
from collections import defaultdict
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
newdata = defaultdict(list)
for r in data:
    newdata[r['id']].append(float(r['Value']))
This would yield:
In [2]: newdata
Out[2]: defaultdict(<type 'list'>, {1: [0.2, 1.2], 2: [33.34], 3: [44.3, 33.23], 4: [21.1, 5.33], 5: [33.3, 12.2, 1.22], 6: [1.23]})
(Update 2)
Calculating the means is now simple with a dictionary comprehension:
mean = {id: sum(values) / len(values) for id, values in newdata.items()}
Which gives:
In [4]: mean
Out[4]: {1: 0.7, 2: 33.34, 3: 38.765, 4: 13.215, 5: 15.573333333333332, 6: 1.23}
If you have numpy, you could use it for this easily:
import numpy
numpy.mean([x['id'] for x in data])
Otherwise, it would be as simple as:
from __future__ import division # if python2.7
ids = [x['id'] for x in data]
print(sum(ids) / len(ids))
You can simply create a list of IDs after all have been collected:
id_list = [element['id'] for element in data]
From there you can calculate whatever you want.
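If the goal is the average of Value for each ID, as the question describes, the grouping step and the standard library's statistics.fmean (available from Python 3.8) can be combined. A minimal sketch; values_by_id and means are illustrative names, and it assumes every Value string parses as a float. Unlike groupby, it does not require data to be sorted by id.

```python
from collections import defaultdict
from statistics import fmean

data = [
    {'Value': '0.2', 'id': 1},
    {'Value': '1.2', 'id': 1},
    {'Value': '33.34', 'id': 2},
    {'Value': '44.3', 'id': 3},
    {'Value': '33.23', 'id': 3},
    {'Value': '21.1', 'id': 4},
    {'Value': '5.33', 'id': 4},
    {'Value': '33.3', 'id': 5},
    {'Value': '12.2', 'id': 5},
    {'Value': '1.22', 'id': 5},
    {'Value': '1.23', 'id': 6},
]

# Collect the numeric values per id; new ids are added automatically.
values_by_id = defaultdict(list)
for row in data:
    values_by_id[row['id']].append(float(row['Value']))

# One mean per id.
means = {id_: fmean(values) for id_, values in values_by_id.items()}

print(means)
```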
