Pandas groupby and then apply to_dict('records') - python

Suppose I have the following data frame:
df = pd.DataFrame({'a': [1,1,1,2], 'b': ['a', 'a', 'b', 'c'], 'd': [1, 2, 3, 4]})
And I want to end with the following dict:
{1: [{'b':'a', 'd': 1}, {'b': 'a', 'd': 2}, {'b': 'b', 'd': 3}], 2: [{'b': 'c', 'd': 4}]}
Basically, I want to group by column a and apply to_dict('records') to each group's data frame.
What I tried was the following:
# dict ok but not a list
df.groupby('a').agg(list).to_dict('index')
{1: {'b': ['a', 'a', 'b'], 'd': [1, 2, 3]}, 2: {'b': ['c'], 'd': [4]}}
# the index disappears
df.groupby('a').agg(list).to_dict('records')
[{'b': ['a', 'a', 'b'], 'd': [1, 2, 3]}, {'b': ['c'], 'd': [4]}]
df.set_index('a').to_dict('index')
ValueError: DataFrame index must be unique for orient='index'
I think I can do it using a for-loop but I'm almost sure there is a pythonic way to do it.
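For reference, the loop mentioned above can be written as a dict comprehension, because iterating a GroupBy object yields (group key, sub-DataFrame) pairs. This is a minimal sketch that assumes a is the only column to leave out of the records:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2], 'b': ['a', 'a', 'b', 'c'], 'd': [1, 2, 3, 4]})

# Each iteration gives the group key and the rows belonging to that group;
# drop the grouping column before turning the group into a list of records
out = {key: group.drop(columns='a').to_dict('records')
       for key, group in df.groupby('a')}
print(out)
```

It needs no apply() at all, so it also sidesteps any warnings about apply() seeing the grouping column.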

You could do:
(df.assign(dicts=df.drop(columns="a").to_dict("records"))
   .groupby("a")["dicts"]
   .agg(list)
   .to_dict())
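Broken into separate steps, the one-liner above builds one record per row first and only then groups those records; a sketch with the intermediate results named (the variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2], 'b': ['a', 'a', 'b', 'c'], 'd': [1, 2, 3, 4]})

# Step 1: one dict per row, without the grouping column
records = df.drop(columns="a").to_dict("records")

# Step 2: store each row's dict in a new object column next to 'a'
with_records = df.assign(dicts=records)

# Step 3: collect the dicts of each group into a list, keyed by the value of 'a'
out = with_records.groupby("a")["dicts"].agg(list).to_dict()
print(out)
```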

Here is a way using groupby() and apply():
df.groupby('a').apply(lambda x: x[['b','d']].to_dict('records')).to_dict()
Output:
{1: [{'b': 'a', 'd': 1}, {'b': 'a', 'd': 2}, {'b': 'b', 'd': 3}],
2: [{'b': 'c', 'd': 4}]}
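A variant of the same idea, assuming a recent pandas version: selecting the value columns on the GroupBy object before apply() means the lambda never receives the grouping column a. As far as I know this also avoids the DeprecationWarning that pandas 2.2+ emits when DataFrameGroupBy.apply() operates on the grouping columns.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2], 'b': ['a', 'a', 'b', 'c'], 'd': [1, 2, 3, 4]})

# Each sub-frame passed to the lambda contains only 'b' and 'd',
# so it can be converted to records directly
out = df.groupby('a')[['b', 'd']].apply(lambda g: g.to_dict('records')).to_dict()
print(out)
```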

Following your logic, one way to avoid an explicit for-loop is to use GroupBy.apply with zip inside a list comprehension, iterating over both columns in parallel:
out = df.groupby("a").apply(lambda x: [{"b": y, "d": z}
                                       for y, z in zip(x["b"], x["d"])]).to_dict()
If you need to zip more than two columns dynamically, use this variant (it assumes the grouping column a is the first column of each group):
out = df.groupby("a").apply(lambda x: [dict(zip(x.columns[1:], row))
                                       for row in x[x.columns[1:]].to_numpy()]).to_dict()
Output:
print(out)
#{1: [{'b': 'a', 'd': 1}, {'b': 'a', 'd': 2}, {'b': 'b', 'd': 3}], 2: [{'b': 'c', 'd': 4}]}
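The zip part of these answers works the same way on plain Python lists. In this sketch the lists stand in for x["b"] and x["d"] of the group a == 1 (hypothetical standalone data, no pandas involved):

```python
# Column names and values for one group, laid out like x["b"] and x["d"]
columns = ["b", "d"]
b_values = ["a", "a", "b"]
d_values = [1, 2, 3]

# zip(b_values, d_values) walks both columns in parallel and yields one row tuple
# at a time; zipping each row with the column names turns it into a record dict
records = [dict(zip(columns, row)) for row in zip(b_values, d_values)]
print(records)  # [{'b': 'a', 'd': 1}, {'b': 'a', 'd': 2}, {'b': 'b', 'd': 3}]
```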

Related

How to merge list of dictionaries in python in shortest and fastest way possible?

I want to merge a list of dictionaries in Python. The number of dictionaries in the list is not fixed, and the dictionaries are merged on both shared and different keys. The dictionaries within the list do not contain nested dictionaries. The values for the same key can be stored in a list.
My code is:
list_of_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a': 3, 'b': 5}, {'k': 5, 'j': 5}, {'a': 3, 'k': 5, 'd': 4}, {'a': 3} ...... ]
output = {}
for i in list_of_dict:
    for k, v in i.items():
        if k in output:
            output[k].append(v)
        else:
            output[k] = [v]
Is there a shorter and faster way of implementing this?
I'm trying to find the fastest way to do this, because the list of dictionaries is very large and there are many rows of such data.
One way using collections.defaultdict:
from collections import defaultdict
res = defaultdict(list)
for d in list_of_dict:
    for k, v in d.items():
        res[k].append(v)
Output:
defaultdict(list,
{'a': [1, 3, 3, 3],
'b': [2, 5],
'c': [3],
'k': [5, 5],
'j': [5],
'd': [4]})
items() is a dictionary method, but list_of_dict is a list. You need a nested loop so you can loop over the dictionaries and then loop over the items of each dictionary.
output = {}
for d in list_of_dict:
    for key, value in d.items():
        output.setdefault(key, []).append(value)
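Another, shorter version (note that it scans the whole list once per distinct key, so on large inputs it does more work than the single-pass loops above):
list_of_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a': 3, 'b': 5}, {'k': 5, 'j': 5}, {'a': 3, 'k': 5, 'd': 4}, {'a': 3}]
output = {
    k: [d[k] for d in list_of_dict if k in d]
    for k in set().union(*list_of_dict)
}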
print(output)
{'d': [4], 'k': [5, 5], 'a': [1, 3, 3, 3], 'j': [5], 'c': [3], 'b': [2, 5]}
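Since the question is about speed, a rough way to compare the single-pass defaultdict loop with the per-key comprehension is to time both on data shaped like yours. This is a sketch only: the function names and the random test data are assumptions, and no timings are claimed here; run it on your own data.

```python
import random
import string
import timeit
from collections import defaultdict


def merge_single_pass(dicts):
    # Visits every (key, value) pair exactly once
    out = defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            out[k].append(v)
    return dict(out)


def merge_per_key(dicts):
    # Scans the whole list once for every distinct key
    return {k: [d[k] for d in dicts if k in d] for k in set().union(*dicts)}


def make_data(n_dicts=10_000, seed=0):
    # Hypothetical data: each dict holds 1-5 random lowercase keys with small ints
    rng = random.Random(seed)
    return [
        {rng.choice(string.ascii_lowercase): rng.randint(0, 9)
         for _ in range(rng.randint(1, 5))}
        for _ in range(n_dicts)
    ]


if __name__ == "__main__":
    data = make_data()
    # Both approaches must agree before their speed is worth comparing
    assert merge_single_pass(data) == merge_per_key(data)
    for fn in (merge_single_pass, merge_per_key):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```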
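In Python 3.9+ you can use the dict merge operator (|=). Note, however, that it keeps only the last value seen for each key instead of collecting all values into a list, so it does not produce the output asked for here:
def merge_dicts(dicts):
    result = dict()
    for _dict in dicts:
        result |= _dict
    return result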
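Applied to the sample list from the question, the |= loop returns a single value per key (the last one seen), not a list of values; a quick check, assuming Python 3.9+:

```python
def merge_dicts(dicts):
    # Later dicts overwrite values from earlier ones for repeated keys
    result = dict()
    for _dict in dicts:
        result |= _dict
    return result

list_of_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a': 3, 'b': 5}, {'k': 5, 'j': 5}, {'a': 3, 'k': 5, 'd': 4}, {'a': 3}]
print(merge_dicts(list_of_dict))
# {'a': 3, 'b': 5, 'c': 3, 'k': 5, 'j': 5, 'd': 4}
```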
One of the shortest ways is to
build a list/set of all the keys from all the dictionaries
and then look up each key in every dictionary in the list.
list_of_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a': 3, 'b': 5}, {'k': 5, 'j': 5}, {'a': 3, 'k': 5, 'd': 4}, {'a': 3}]
# prepare a list/set of all the keys from all the dictionaries
# method 1: use sum (builds a flat list, duplicates included)
all_keys = sum([[a for a in x.keys()] for x in list_of_dict], [])
# method 2: use itertools (same list as method 1)
import itertools
all_keys = list(itertools.chain.from_iterable(list_of_dict))
print(all_keys)
# ['a', 'b', 'c', 'a', 'b', 'k', 'j', 'a', 'k', 'd', 'a']
# convert the list to a set to remove duplicates
all_keys = set(all_keys)
print(all_keys)
# {'a', 'k', 'c', 'd', 'b', 'j'}
# method 3: use a set union, which gives the unique keys directly
all_keys = set().union(*list_of_dict)
# now merge the dictionaries
merged = {k: [d.get(k) for d in list_of_dict if k in d] for k in all_keys}
print(merged)
# {'a': [1, 3, 3, 3], 'k': [5, 5], 'c': [3], 'd': [4], 'b': [2, 5], 'j': [5]}
In short:
all_keys = set().union(*list_of_dict)
merged = {k: [d.get(k) for d in list_of_dict if k in d] for k in all_keys}
print(merged)
# {'a': [1, 3, 3, 3], 'k': [5, 5], 'c': [3], 'd': [4], 'b': [2, 5], 'j': [5]}

Turn a list into multiple dictionaries in a list

I have a list like this:
original_list= ['A', 'B', 'C', 'D', 'E']
How would I be able to write a function to convert it into this:
converted_list= [{'A': 1}, {'B': 1}, {'C': 1}, {'D': 1}, {'E': 1}]
All the values in each dictionary are 1.
Thank you!
Use a list comprehension:
converted_list = [{s: 1} for s in original_list]
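Since the question asks for a function, the same comprehension can be wrapped in one. The name to_singleton_dicts and the value parameter are hypothetical, added here only for illustration:

```python
def to_singleton_dicts(items, value=1):
    # Build one single-key dict per item, all sharing the same value
    return [{item: value} for item in items]

original_list = ['A', 'B', 'C', 'D', 'E']
print(to_singleton_dicts(original_list))
# [{'A': 1}, {'B': 1}, {'C': 1}, {'D': 1}, {'E': 1}]
```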

Create multiple dictionaries from lists partitions

So I have a list:
[ 1, 2, 3, 4, 5 ]
And two lists of the form
['A', 'B', 'C'] [ 'D', 'E']
whose combined length equals the length of the original list (i.e. they partition it). How can I obtain the following dictionaries in Python:
{'A': 1, 'B': 2, 'C': 3 } {'D': 4, 'E': 5}
Thanks
You can use next with iter:
values = [ 1, 2, 3, 4, 5 ]
lists = [['A', 'B', 'C'], ['D', 'E']]
itr = iter(values)
result = [{key: next(itr) for key in lst} for lst in lists]
Output:
[{'A': 1, 'B': 2, 'C': 3}, {'D': 4, 'E': 5}]
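The next/iter answer works because every comprehension pulls from the same iterator, so each dict starts where the previous one stopped. The same idea can be written with itertools.islice, here with an explicit check that the key lists really partition the values; split_into_dicts is a hypothetical helper name:

```python
from itertools import islice

def split_into_dicts(values, key_lists):
    # Guard: the key lists must cover the values exactly
    if sum(len(keys) for keys in key_lists) != len(values):
        raise ValueError("key lists must have the same total length as values")
    itr = iter(values)
    # islice consumes exactly len(keys) items from the shared iterator
    return [dict(zip(keys, islice(itr, len(keys)))) for keys in key_lists]

print(split_into_dicts([1, 2, 3, 4, 5], [['A', 'B', 'C'], ['D', 'E']]))
# [{'A': 1, 'B': 2, 'C': 3}, {'D': 4, 'E': 5}]
```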

python lambda expression with dynamic fields

I have a list of dictionaries. I want to be able to filter this list with a dynamic list of fields, so that:
my_list = [{'a': 1, 'b': 1, 'c': 1}, {'a': 1, 'b': 1, 'c': 2}, {'a': 1, 'b': 2, 'c': 2}]
reference_row = {'a': 1, 'b': 1, 'c': 1}
compare_fields = ['a'] # Compares only field 'a' of reference row with rows in my_list
# Magical filter expression results in [{'a': 1, 'b': 1, 'c': 1}, {'a': 1, 'b': 1, 'c': 2}, {'a': 1, 'b': 2, 'c': 2}]
compare_fields = ['a', 'b'] # Compares fields 'a' and 'b' of reference row with rows in my_list
# Magical filter expression results in [{'a': 1, 'b': 1, 'c': 1}, {'a': 1, 'b': 1, 'c': 2}]
compare_fields = ['a', 'b', 'c'] # Compares fields 'a', 'b' and 'c' of reference row with rows in my_list
# Magical filter expression results in [{'a': 1, 'b': 1, 'c': 1}]
I've tried something like the following but it did not work:
list(filter(lambda d: (d[field] == reference_row[field] for field in compare_fields ), my_list))
I do not want to go over the items in compare_fields and filter by one field in each iteration. Any neat way of doing this?
You need all(), which returns True only if every element of the iterable is truthy. Without it, your lambda returns a generator object, and a generator object is always truthy, so every input to your filter passed.
list(filter(lambda d: all(d[field] == reference_row[field] for field in compare_fields), my_list))
I think this is a little cleaner:
[d for d in my_list if
 all(d[field] == reference_row[field] for field in compare_fields)]
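Another option, swapping all() for operator.itemgetter: itemgetter returns a single value for one field and a tuple for several, so applying it to both the row and the reference row gives directly comparable results. This sketch assumes compare_fields is non-empty (itemgetter() with no arguments raises TypeError); filter_rows is a hypothetical helper name:

```python
from operator import itemgetter

my_list = [{'a': 1, 'b': 1, 'c': 1}, {'a': 1, 'b': 1, 'c': 2}, {'a': 1, 'b': 2, 'c': 2}]
reference_row = {'a': 1, 'b': 1, 'c': 1}

# Why the original attempt kept every row: a generator object is truthy
# no matter what values it would produce
assert bool(x for x in [False])

def filter_rows(rows, reference, fields):
    # Extract the compared fields once from the reference row
    key = itemgetter(*fields)
    target = key(reference)
    return [row for row in rows if key(row) == target]

print(filter_rows(my_list, reference_row, ['a', 'b']))
# [{'a': 1, 'b': 1, 'c': 1}, {'a': 1, 'b': 1, 'c': 2}]
```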
