I am working on a scenario of converting Excel to nested JSON, where the group by needs to apply to the header as well as the items.
Here is what I have tried.
I am able to apply transformation rules using pandas:
df['Header'] = df[['A','B']].to_dict('records')
df['Item'] = df[['A', 'C', 'D']].to_dict('records')
This lets me separate the records into separate data frames.
Then I apply:
data_groupedby = data.groupby(['A', 'B']).agg(list).reset_index()
result = data_groupedby[['A', 'B', 'Item']].to_json(orient='records')
This gives me the required JSON with the header and a nested drill-down of items.
With groupby I am able to group the header fields, but I am not able to apply the group by to the respective items, so they are not grouped correctly.
Any idea how I can achieve this?
Example DS:
Excel:
A B C D
100 Test1 XX10 L
100 Test1 XX10 L
100 Test1 XX20 L
101 Test2 XX10 L
101 Test2 XX20 L
101 Test2 XX20 L
Current output:
[
{
"A": 100,
"B": "Test1",
"Item": [
{
"A": 100,
"C": "XX10",
"D": "L"
},
{
"A": 100,
"C": "XX10",
"D": "L"
},
{
"A": 100,
"C": "XX20",
"D": "L"
}
]
},
{
"A": 101,
"B": "Test2",
"Item": [
{
"A": 101,
"C": "XX10",
"D": "L"
},
{
"A": 101,
"C": "XX20",
"D": "L"
},
{
"A": 101,
"C": "XX20",
"D": "L"
}
]
}
]
If you look at the Item arrays, identical values are not grouped and are repeated.
Thanks
TC
You can drop_duplicates and then groupby, then apply the to_dict transformation for columns C and D, and then clean up with reset_index and rename.
(data.drop_duplicates()
.groupby(["A", "B"])
.apply(lambda x: x[["C", "D"]].to_dict("records"))
.to_frame()
.reset_index()
.rename(columns={0: "Item"})
.to_dict("records"))
Output:
[{'A': 100,
'B': 'Test1',
'Item': [{'C': 'XX10', 'D': 'L'}, {'C': 'XX20', 'D': 'L'}]},
{'A': 101,
'B': 'Test2',
'Item': [{'C': 'XX10', 'D': 'L'}, {'C': 'XX20', 'D': 'L'}]}]
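If you need an actual JSON string (as in the original question) rather than a list of dicts, you can serialize the result with the standard json module. A minimal sketch, assuming the list above has been assigned to a variable named result:
import json

# result is the list of records produced by the pipeline above
json_str = json.dumps(result, indent=2)
print(json_str)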
I have a nested dict something like this:
my_dict = {
    'name1': {'code1': {'brand1': 2}},
    'name2': {
        'code2.1': {'brand2.1': 2, 'brand2.2': 8, 'brand2.3': 5, 'brand2.4': 4},
        'code2.2': {'brand2.1': 2, 'brand1': 1, 'brand2.5': 25},
    },
    'name3': {'code1': {'brand2.1': 2}, 'code3': {'brand4': 1, 'brand3.1': 2}},
}
I need to sort at the "code" level based on the sum of the "brand" values. For example,
target_dict = {
    'name1': {'code1': {'brand1': 2}},
    'name2': {
        'code2.2': {'brand2.1': 2, 'brand1': 1, 'brand2.5': 25},
        'code2.1': {'brand2.1': 2, 'brand2.2': 8, 'brand2.3': 5, 'brand2.4': 4},
    },
    'name3': {'code3': {'brand4': 1, 'brand3.1': 2}, 'code1': {'brand2.1': 2}},
}
# 'code2.2' first because 2+1+25=28 > 2+8+5+4=19
# 'code3' first because 1+2=3 > 2
I can sum the "brand" values per "code" with
sum_values = [[[i, sum(v[i].values())] for i in v.keys()] for x,y in v.items() for k,v in my_dict.items()]
and tried to combine it with the sort function as
target_dict = sorted(my_dict.items(), key=lambda i: [[[i, sum(v[i].values())] for i in v.keys()] for x,y in v.items() for k,v in my_dict.items()], reverse=True)
Thanks for your attention and help!
Try this (assuming a Python version that preserves dict insertion order, i.e. 3.7+):
my_dict = {
"name1": {"code1": {"brand1": 2}},
"name2": {
"code2.1": {"brand2.1": 2, "brand2.2": 8, "brand2.3": 5, "brand2.4": 4},
"code2.2": {"brand2.1": 2, "brand1": 1, "brand2.5": 25},
},
"name3": {"code1": {"brand2.1": 2}, "code3": {"brand4": 1, "brand3.1": 2}},
}
out = {
k: dict(sorted(v.items(), key=lambda d: sum(d[1].values()), reverse=True))
for k, v in my_dict.items()
}
print(out)
Prints:
{
"name1": {"code1": {"brand1": 2}},
"name2": {
"code2.2": {"brand2.1": 2, "brand1": 1, "brand2.5": 25},
"code2.1": {"brand2.1": 2, "brand2.2": 8, "brand2.3": 5, "brand2.4": 4},
},
"name3": {"code3": {"brand4": 1, "brand3.1": 2}, "code1": {"brand2.1": 2}},
}
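If the lambda gets hard to read, the sort key can be pulled out into a named helper. A small sketch; the function name code_total is an assumption:
def code_total(item):
    # item is a (code, brands) pair; sort by the summed brand values
    return sum(item[1].values())

out = {
    k: dict(sorted(v.items(), key=code_total, reverse=True))
    for k, v in my_dict.items()
}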
I got a dict of dicts which looks like this:
d = {
    1: {
        'a': 'aaa',
        'b': 'bbb',
        'c': 'ccc'
    },
    2: {
        'd': 'dddd',
        'a': 'abc',
        'c': 'cca'
    },
    3: {
        'e': 'eee',
        'a': 'ababa',
        'b': 'bebebe'
    }
}
I want to convert my dict to look like this:
d = {
    'a': [1, 2, 3],
    'b': [1, 3],
    'c': [1, 2],
    'd': [2],
    'e': [3]
}
How can I achieve this? I tried reversing it, but it throws an "unhashable type: 'dict'" error.
a = {
1: {
"a": "aaa",
"b": "bbb",
"c": "ccc"
},
2: {
"d": "ddd",
"a": "abc",
"c": "cca"
},
3: {
"e": "eee",
"a": "ababa",
"b": "bebebe"
}
}
from collections import defaultdict
b = defaultdict(list)
for i, v in a.items():
    for j in v:
        b[j].append(i)
The result b is:
defaultdict(<class 'list'>, {'a': [1, 2, 3], 'b': [1, 3], 'c': [1, 2], 'd': [2], 'e': [3]})
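If you prefer a plain dict in the final result, you can convert it afterwards; a minimal sketch:
plain = dict(b)
print(plain)
# on Python 3.7+ the keys appear in insertion order:
# {'a': [1, 2, 3], 'b': [1, 3], 'c': [1, 2], 'd': [2], 'e': [3]}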
You just need to figure out the logic for it. Iterate through the main dictionary, and use the keys of the sub dictionaries to build your new dict.
d = {
1: {
'a': 'aaa',
'b': 'bbb',
'c': 'ccc'
},
2: {
'd': 'dddd',
'a': 'abc',
'c': 'cca'
},
3: {
'e': 'eee',
'a': 'ababa',
'b': 'bebebe'
}
}
newdict = {}
for k, v in d.items():
    for keys in v:
        newdict.setdefault(keys, []).append(k)
print(newdict)
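On Python 3.7+ (insertion-ordered dicts) this prints:
{'a': [1, 2, 3], 'b': [1, 3], 'c': [1, 2], 'd': [2], 'e': [3]}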
I have multiple lists of dictionaries like this:
list_of_dictionaries_1 = [{
'a':1,
'b':2
}, {
'a':3,
'b':4
}]
list_of_dictionaries_2 = [{
'c':1,
'd':2
}, {
'c':3,
'd':4
}]
And I want to add each element into a new dictionary.
new_dictionary = {
'data': [{
'a':1,
'b':2
}, {
'a':3,
'b':4
}, {
'c':1,
'd':2
}, {
'c':3,
'd':4
}]
}
So I made this for each list of dictionaries:
for dictionary_ in list_of_dictionaries_1:
    new_dictionary['data'] = dictionary_
But this just returns the last element in the list of dictionaries.
new_dictionary = {
'data': [{
'c':3,
'd':4
}]
}
How can I add all the dictionaries to the new dictionary?
If I understood correctly, you could do it like this:
new_dictionary = {'data': []}
for elem in list_of_dictionaries_1 + list_of_dictionaries_2:
    new_dictionary['data'].append(elem)
print(new_dictionary)
Output:
{'data': [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'c': 1, 'd': 2}, {'c': 3, 'd': 4}]}
You can use itertools.chain to merge the two lists:
from itertools import chain
new_dictionary = {'data': list(chain(list_of_dictionaries_1, list_of_dictionaries_2))}
new_dictionary becomes:
{'data': [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'c': 1, 'd': 2}, {'c': 3, 'd': 4}]}
Your dictionary structure looks inconsistent, but you can do the following to achieve what you are trying for.
list_of_dictionaries_1 = [{'a':1, 'b':2 }, {'a':3, 'b':4}]
list_of_dictionaries_2 = [{'c':1, 'd':2 }, {'c':3, 'd':4 }]
list_of_dictionaries_1.extend(list_of_dictionaries_2)
print(list_of_dictionaries_1)
Output:
[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'c': 1, 'd': 2}, {'c': 3, 'd': 4}]
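If you still need the {'data': ...} shape from the question, you can wrap the merged list; a minimal sketch building on the extend call above:
new_dictionary = {'data': list_of_dictionaries_1}
print(new_dictionary)
# {'data': [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'c': 1, 'd': 2}, {'c': 3, 'd': 4}]}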
I have 2 nested dictionary variables that have similar keys, each defining different values:
data1 = {
"2010":{
'A':2,
'B':3,
'C':5
},
"2011":{
'A':1,
'B':2,
'C':3
},
"2012":{
'A':1,
'B':2,
'C':4
}
}
data2 = {
"2010":{
'A':4,
'B':4,
'C':5
},
"2011":{
'A':1,
'B':1,
'C':3
},
"2012":{
'A':3,
'B':2,
'C':4
}
}
In my case, I need to sum the values of both dictionaries based on the same keys, so the answer will look like this:
data3 = {
"2010":{
'A':6,
'B':7,
'C':10
},
"2011":{
'A':2,
'B':3,
'C':6
},
"2012":{
'A':4,
'B':4,
'C':8
}
}
How can I do that?
Given that the structure of the two dictionaries is the same, you can use a dictionary comprehension for that:
data3 = {key:{key2:val1+data2[key][key2] for key2,val1 in subdic.items()} for key,subdic in data1.items()}
In the REPL:
>>> {key:{key2:val1+data2[key][key2] for key2,val1 in subdic.items()} for key,subdic in data1.items()}
{'2010': {'B': 7, 'C': 10, 'A': 6}, '2012': {'B': 4, 'C': 8, 'A': 4}, '2011': {'B': 3, 'C': 6, 'A': 2}}
The comprehension works as follows: in the outer loop, we iterate over the key, subdic pairs of data1. So in your case, key is a year and subdic is the dictionary (of data1) for that year.
Now for each of these years, we iterate over the items of the subdic, where key2 is 'A', 'B' and 'C' and val1 is the value we find in data1 for these keys. We get the other value by querying data2[key][key2], sum the two, and construct new dictionaries from the results.
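For readers who prefer explicit loops, the comprehension is roughly equivalent to the following sketch (it assumes, as stated, that both dictionaries share exactly the same keys):
data3 = {}
for key, subdic in data1.items():      # key is a year, subdic is data1's inner dict
    data3[key] = {}
    for key2, val1 in subdic.items():  # key2 is 'A', 'B' or 'C'
        data3[key][key2] = val1 + data2[key][key2]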
I hope this helps:
data1 = { "2010":{ 'A':2, 'B':3, 'C':5 }, "2011":{ 'A':1, 'B':2, 'C':3 }, "2012":{ 'A':1, 'B':2, 'C':4 } }
data2 = { "2010":{ 'A':4, 'B':4, 'C':5 }, "2011":{ 'A':1, 'B':1, 'C':3 }, "2012":{ 'A':3, 'B':2, 'C':4 } }
data3 = {}
for data in [data1, data2]:
    for year in data.keys():
        for x, y in data[year].items():
            if year not in data3:
                data3[year] = {x: y}
            else:
                if x not in data3[year]:
                    data3[year].update({x: y})
                else:
                    data3[year].update({x: data3[year][x] + y})
print(data3)
This works for arbitrary lengths of the inner and outer dictionaries.
Another solution :)
You can also use zip to get both data1 and data2 in the same for loop, and then use collections.Counter to add the values of each dict.
from collections import Counter
>>> {k1: Counter(v1) + Counter(v2) for (k1, v1), (k2, v2) in zip(sorted(data1.items()), sorted(data2.items()))}
{'2011': Counter({'C': 6, 'B': 3, 'A': 2}), '2010': Counter({'C': 10, 'B': 7, 'A': 6}), '2012': Counter({'C': 8, 'A': 4, 'B': 4})}
You will end up with Counter dicts, but since Counter is a subclass of dict you can still use the same methods as with a regular dict.
If you add dict() to Max Chrétien's nice short solution from above, you will end up with regular dictionaries:
data3 = {k1: dict(Counter(v1) + Counter(v2)) for (k1, v1), (k2, v2) in
zip(data1.items(), data2.items())}
This will, however, only work correctly if both dictionaries share exactly the same keys, as already discussed above. Willem Van Onsem's solution will not work either if there are any keys not shared by both dictionaries (it will result in an error, whereas Max Chrétien's solution will in that case merge items incorrectly). Since you mentioned you are using JSON data that always contains the same structure with similar keys, this should not be a problem, and Max Chrétien's solution should work nicely.
In case you do want to make sure only keys shared by both dictionaries (and their subdictionaries) are used, the following will work. Notice how I added 'X': 111111 as a key-value pair to the 2012 subdictionary and "1999": { 'Z': 999999 } as an entire subdictionary.
def sum_two_nested_dicts(d1, d2):
    dicts = [d1, d2]
    d_sum = {}
    for topkey in dicts[0]:
        if topkey in dicts[1]:
            d_sum[topkey] = {}
            for key in dicts[0][topkey]:
                if key in dicts[1][topkey]:
                    new_val = sum([d[topkey][key] for d in dicts])
                    d_sum[topkey][key] = new_val
    return d_sum
data1 = {
"2010": {
'A': 2,
'B': 3,
'C': 5
},
"2011": {
'A': 1,
'B': 2,
'C': 3
},
"2012": {
'A': 1,
'B': 2,
'C': 4,
'X': 111111
},
"1999": {
'Z': 999999
}
}
data2 = {
"2010": {
'A': 4,
'B': 4,
'C': 5
},
"2011": {
'A': 1,
'B': 1,
'C': 3
},
"2012": {
'A': 3,
'B': 2,
'C': 4
}
}
data3 = sum_two_nested_dicts(data1, data2)
print(data3)
# different order of arguments
data4 = sum_two_nested_dicts(data2, data1)
print(data4)
# {'2010': {'C': 10, 'A': 6, 'B': 7}, '2012': {'C': 8, 'A': 4, 'B': 4}, '2011': {'C': 6, 'A': 2, 'B': 3}}
# {'2010': {'C': 10, 'A': 6, 'B': 7}, '2012': {'C': 8, 'A': 4, 'B': 4}, '2011': {'C': 6, 'A': 2, 'B': 3}}
I realize this is far from concise and elegant, but since I already wrote it anyway, I am posting it here in case someone is trying to achieve this particular functionality.
Long and bloated version which retains unshared keys/values, just because I already wrote it...
def sum_nested_dicts(dic1, dic2):
    # create list of both dictionaries
    dicts = [dic1, dic2]
    # create a set of all unique keys from both dictionaries
    topkeys = set(sum([list(dic.keys()) for dic in dicts], []))
    # this is the merged dictionary to be returned
    d_sum = {}
    for topkey in topkeys:
        # if topkey is shared by both dictionaries
        if topkey in dic1 and topkey in dic2:
            d_sum[topkey] = {}
            keys = set(sum([list(dic[topkey].keys()) for dic in dicts], []))
            for key in keys:
                # if key is shared by both subdictionaries
                if key in dic1[topkey] and key in dic2[topkey]:
                    new_val = sum([d[topkey][key] for d in dicts])
                    d_sum[topkey][key] = new_val
                # if key is only contained in one subdictionary
                elif key in dic1[topkey]:
                    d_sum[topkey][key] = dic1[topkey][key]
                elif key in dic2[topkey]:
                    d_sum[topkey][key] = dic2[topkey][key]
        # if topkey is only contained in one dictionary
        elif topkey in dic1:
            d_sum[topkey] = dic1[topkey]
        elif topkey in dic2:
            d_sum[topkey] = dic2[topkey]
    return d_sum
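A quick usage sketch with the modified data1/data2 from above (the variable name data5 is arbitrary, and the key order may vary because the function iterates over sets):
data5 = sum_nested_dicts(data1, data2)
print(data5)
# e.g. {'2010': {'A': 6, 'B': 7, 'C': 10}, '2011': {'A': 2, 'B': 3, 'C': 6},
#       '2012': {'A': 4, 'B': 4, 'C': 8, 'X': 111111}, '1999': {'Z': 999999}}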
See Crystal's solution for what seems to be the most concise and functional solution posted thus far.