Dropping duplicates from json deep structure pandas

I am working on converting an Excel sheet to nested JSON, using a group-by that should extend to the header as well as to the items.
Here is what I tried. I was able to apply transformation rules using pandas:
df['Header'] = df[['A','B']].to_dict('records')
df['Item'] = df[['A', 'C', 'D']].to_dict('records')
This way I can separate the records into header and item columns.
Then I applied the following:
data_groupedby = data.groupby(['A', 'B']).agg(list).reset_index()
result = data_groupedby[['A', 'B', 'Item']].to_json(orient='records')
This gives me the required JSON, with the header and the items nested below it as a deep structure.
With groupby, I can group the header fields, but I cannot apply the grouping to the respective items, so they are not grouped correctly.
Any idea how to achieve this?
Example data (Excel):
A B C D
100 Test1 XX10 L
100 Test1 XX10 L
100 Test1 XX20 L
101 Test2 XX10 L
101 Test2 XX20 L
101 Test2 XX20 L
Current output:
[
{
"A": 100,
"B": "Test1",
"Item": [
{
"A": 100,
"C": "XX10",
"D": "L"
},
{
"A": 100,
"C": "XX10",
"D": "L"
},
{
"A": 100,
"C": "XX20",
"D": "L"
}
]
},
{
"A": 101,
"B": "Test2",
"Item": [
{
"A": 101,
"C": "XX10",
"D": "L"
},
{
"A": 101,
"C": "XX20",
"D": "L"
},
{
"A": 101,
"C": "XX20",
"D": "L"
}
]
}
]
If you look at the Item arrays, identical entries are not grouped and are repeated.
Thanks
TC
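For comparison, here is a minimal sketch of the question's own pipeline with the two bracket typos fixed and drop_duplicates moved to the start. It assumes the sample rows above and the frame name df. The Item column is built only after deduplicating, because once a column holds dicts the rows can no longer be hashed for duplicate detection.

```python
import json
import pandas as pd

# Sample rows from the question's Excel sheet
df = pd.DataFrame({
    "A": [100, 100, 100, 101, 101, 101],
    "B": ["Test1", "Test1", "Test1", "Test2", "Test2", "Test2"],
    "C": ["XX10", "XX10", "XX20", "XX10", "XX20", "XX20"],
    "D": ["L", "L", "L", "L", "L", "L"],
})

# Deduplicate first, while every column is still hashable
deduped = df.drop_duplicates().reset_index(drop=True)

# Same transformation as in the question, now applied to unique rows only
deduped["Item"] = deduped[["A", "C", "D"]].to_dict("records")

# Collect each (A, B) group's item dicts into one list
grouped = deduped.groupby(["A", "B"])["Item"].agg(list).reset_index()

result = grouped.to_json(orient="records")
print(result)
```

Unlike the answer below, this keeps "A" inside each item, as in the question's current output.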

You can drop_duplicates and then groupby, then apply the to_dict transformation for columns C and D, and then clean up with reset_index and rename.
(data.drop_duplicates()
.groupby(["A", "B"])
.apply(lambda x: x[["C", "D"]].to_dict("records"))
.to_frame()
.reset_index()
.rename(columns={0: "Item"})
.to_dict("records"))
Output:
[{'A': 100,
'B': 'Test1',
'Item': [{'C': 'XX10', 'D': 'L'}, {'C': 'XX20', 'D': 'L'}]},
{'A': 101,
'B': 'Test2',
'Item': [{'C': 'XX10', 'D': 'L'}, {'C': 'XX20', 'D': 'L'}]}]
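If you would rather avoid groupby.apply (some recent pandas versions warn when apply operates on the grouping columns), a minimal sketch that iterates over the groups directly should give the same structure. It rebuilds the sample rows from the question; records is just an illustrative name. The int() call converts the NumPy integer group key so that json.dumps can serialize it.

```python
import json
import pandas as pd

# Sample rows from the question's Excel sheet
data = pd.DataFrame({
    "A": [100, 100, 100, 101, 101, 101],
    "B": ["Test1", "Test1", "Test1", "Test2", "Test2", "Test2"],
    "C": ["XX10", "XX10", "XX20", "XX10", "XX20", "XX20"],
    "D": ["L", "L", "L", "L", "L", "L"],
})

records = []
# Grouping by two columns yields ((A, B), sub-frame) pairs
for (a, b), group in data.drop_duplicates().groupby(["A", "B"]):
    records.append({
        "A": int(a),  # NumPy int64 -> plain int for json.dumps
        "B": b,
        "Item": group[["C", "D"]].to_dict("records"),
    })

print(json.dumps(records, indent=2))
```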

Related

GroupBy results to list of dictionaries, Using the grouped by object in it

My DataFrame looks like so:
Date Column1 Column2
1.1 A 1
1.1 B 3
1.1 C 4
2.1 A 2
2.1 B 3
2.1 C 5
3.1 A 1
3.1 B 2
3.1 C 2
And I'm looking to group it by Date and extract that data to a list of dictionaries so it appears like this:
[
{
"Date": "1.1",
"A": 1,
"B": 3,
"C": 4
},
{
"Date": "2.1",
"A": 2,
"B": 3,
"C": 5
},
{
"Date": "3.1",
"A": 1,
"B": 2,
"C": 2
}
]
This is my code so far:
df.groupby('Date')[['Column1', 'Column2']].apply(lambda g: {k: v for k, v in g.values}).to_list()
With this method I can't use the grouped-by key (Date) inside the apply itself, so the Date is missing from the output:
[
{
"A": 1,
"B": 3,
"C": 4
},
{
"A": 2,
"B": 3,
"C": 5
},
{
"A": 1,
"B": 2,
"C": 2
}
]
Using to_dict() lets me reach the grouped-by key, but I can't shape the result the way I need.
Does anyone know an elegant way to solve this?
Thanks!!

You could first reshape your data using df.pivot, reset the index, and then apply to_dict to the new shape with the orient parameter set to "records". So:
import pandas as pd
data = {'Date': ['1.1', '1.1', '1.1', '2.1', '2.1', '2.1', '3.1', '3.1', '3.1'],
'Column1': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
'Column2': [1, 3, 4, 2, 3, 5, 1, 2, 2]}
df = pd.DataFrame(data)
df_pivot = df.pivot(index='Date',columns='Column1',values='Column2')\
.reset_index(drop=False)
result = df_pivot.to_dict('records')
target = [{'Date': '1.1', 'A': 1, 'B': 3, 'C': 4},
{'Date': '2.1', 'A': 2, 'B': 3, 'C': 5},
{'Date': '3.1', 'A': 1, 'B': 2, 'C': 2}]
print(result == target)
# True
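Since the question asks specifically how to get at the grouped-by key, here is a minimal alternative sketch without pivot: iterating over a GroupBy yields (key, sub-frame) pairs, so the Date can be added to each dict directly. It assumes the same sample data and that each (Date, Column1) pair occurs only once; a repeated pair would silently keep the last value.

```python
import pandas as pd

data = {'Date': ['1.1', '1.1', '1.1', '2.1', '2.1', '2.1', '3.1', '3.1', '3.1'],
        'Column1': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
        'Column2': [1, 3, 4, 2, 3, 5, 1, 2, 2]}
df = pd.DataFrame(data)

# Each iteration gives the Date key plus the rows for that Date
result = [
    {"Date": date, **dict(zip(group["Column1"], group["Column2"].tolist()))}
    for date, group in df.groupby("Date")
]
print(result)
```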

Create nested json from list of dicts and list of strings

I have 3 lists of dicts and a list of strings that I need to convert into JSON format. TBH, I am not sure where to begin, but if anyone can point me in the right direction, that would be super helpful.
lst_strings = ['one', 'two']
d1 = [{'a':1, 'b':2, 'c':3}]
d2 = [{'d':4, 'e':5, 'f':6}, {'d':7, 'e':8, 'f':9}]
d3 = [{'z': '0'}, {'g': 'false'}]
The json output I am looking for is formatted like this:
{
"one": [{
"a": "1",
"b": "2",
"c": "3",
"two": [{
"d": "4",
"e": "5",
"f": "6"
}, {
"d": "7",
"e": "8",
"f": "9"
}],
"z": "0"
}],
"g": false
}
What would be the best and most efficient way to achieve this result? I know I should post what I have done so far, but honestly, my brain is not working at the moment.

Try this:
import json
lst_strings = ['one', 'two', 'three']
d1 = [{'a':1, 'b':2, 'c':3}]
d2 = [{'d':4, 'e':5, 'f':6}, {'d':7, 'e':8, 'f':9}]
d3 = [{'z': '0'}, {'g': 'false'}]
d = {}
# Pair each label with its list of dicts directly instead of building variable names for eval()
for s, lst in zip(lst_strings, [d1, d2, d3]):
    d[s] = lst
j = json.dumps(d)
print(j)
Output:
{"one": [{"a": 1, "b": 2, "c": 3}], "two": [{"d": 4, "e": 5, "f": 6}, {"d": 7, "e": 8, "f": 9}], "three": [{"z": "0"}, {"g": "false"}]}
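The answer above maps each string to a whole list, which does not reproduce the nesting shown in the question. Here is a minimal sketch of that exact target, under assumptions read off the desired output rather than stated by the asker: d2 nests inside the first d1 dict under lst_strings[1], d3[0] is merged into that same dict, d3[1] goes to the top level, numbers become strings, and the string 'false' becomes a JSON boolean. as_strings and coerce_bool are hypothetical helper names.

```python
import json

lst_strings = ['one', 'two']
d1 = [{'a': 1, 'b': 2, 'c': 3}]
d2 = [{'d': 4, 'e': 5, 'f': 6}, {'d': 7, 'e': 8, 'f': 9}]
d3 = [{'z': '0'}, {'g': 'false'}]

def as_strings(d):
    # The target output shows every number as a string
    return {k: str(v) for k, v in d.items()}

def coerce_bool(value):
    # Map the strings 'true'/'false' to JSON booleans; leave anything else alone
    return {"true": True, "false": False}.get(value, value)

# Inner object: d1's fields, then d2 nested under the second label, then d3[0]
inner = {**as_strings(d1[0]), lst_strings[1]: [as_strings(x) for x in d2], **d3[0]}

# Outer object: the inner object wrapped in a list under the first label,
# plus d3[1] at the top level
outer = {lst_strings[0]: [inner]}
outer.update({k: coerce_bool(v) for k, v in d3[1].items()})

print(json.dumps(outer, indent=2))
```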

Python Pandas: Nested Dictionary

I have a list of dictionaries that I wish to manipulate using Pandas. Say:
m = [{"topic": "A", "type": "InvalidA", "count": 1}, {"topic": "A", "type": "InvalidB", "count": 1}, {"topic": "A", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidB", "count": 1}]
1) First, create a DataFrame using the constructor:
df = pd.DataFrame(m)
2) Group by the columns 'topic' and 'type' and count:
df_group = df.groupby(['topic', 'type']).count()
I end up with:
count
topic type
A InvalidA 2
InvalidB 1
B InvalidA 2
InvalidB 1
I want to now convert this to a nested dict:
{ "A" : {"InvalidA" : 2,
"InvalidB" : 1},
"B" : {"InvalidA" : 2,
"InvalidB": 1}
}
Any suggestions on how to get from df_group to a nested dict?

Using unstack + to_dict
df_group['count'].unstack(0).to_dict()
Out[446]: {'A': {'InvalidA': 2, 'InvalidB': 1}, 'B': {'InvalidA': 2, 'InvalidB': 1}}
Alternatively, replace your groupby with crosstab:
pd.crosstab(df.type,df.topic).to_dict()
Out[449]: {'A': {'InvalidA': 2, 'InvalidB': 1}, 'B': {'InvalidA': 2, 'InvalidB': 1}}
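If you want to stay with a groupby result instead of unstack, a minimal sketch is to count with size() and split the resulting MultiIndex Series by its first level. It assumes the m list from the question; counts and nested are illustrative names. Unlike unstack, this does not insert NaN entries for (topic, type) pairs that never occur.

```python
import pandas as pd

m = [{"topic": "A", "type": "InvalidA", "count": 1}, {"topic": "A", "type": "InvalidB", "count": 1},
     {"topic": "A", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidA", "count": 1},
     {"topic": "B", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidB", "count": 1}]
df = pd.DataFrame(m)

# size() counts rows per (topic, type) pair; the index is a two-level MultiIndex
counts = df.groupby(["topic", "type"]).size()

# For each topic, drop the topic level and turn the type -> count Series into a dict
nested = {
    topic: group.droplevel(0).to_dict()
    for topic, group in counts.groupby(level=0)
}
print(nested)
```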

Nested Dicts in Python keys and values

I have a dict of dicts that looks like this:
d {
1: {
a: 'aaa',
b: 'bbb',
c: 'ccc'
}
2: {
d: 'dddd',
a: 'abc',
c: 'cca'
}
3: {
e: 'eee',
a: 'ababa',
b: 'bebebe'
}
}
I want to convert my dict into this:
d {
a: 1,2,3
b: 1,3
c: 1,2
d: 2
e: 3
}
How can I achieve this? I tried reversing it, but it raises an "unhashable type: 'dict'" error.

a = {
1: {
"a": "aaa",
"b": "bbb",
"c": "ccc"
},
2: {
"d": "ddd",
"a": "abc",
"c": "cca"
},
3: {
"e": "eee",
"a": "ababa",
"b": "bebebe"
}
}
from collections import defaultdict

b = defaultdict(list)
for i, v in a.items():
    for j in v:
        b[j].append(i)
The result b is:
defaultdict(<class 'list'>, {'a': [1, 2, 3], 'b': [1, 3], 'c': [1, 2], 'd': [2], 'e': [3]})

You just need to work out the logic for it. Iterate through the main dictionary, and use the keys of the sub-dictionaries to build your new dict.
d = {
1: {
'a': 'aaa',
'b': 'bbb',
'c': 'ccc'
},
2: {
'd': 'dddd',
'a': 'abc',
'c': 'cca'
},
3: {
'e': 'eee',
'a': 'ababa',
'b': 'bebebe'
}
}
newdict = {}
for k, v in d.items():
    for keys in v:
        newdict.setdefault(keys, []).append(k)
print(newdict)
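The "unhashable" error in the question most likely comes from trying to use the inner dicts themselves as keys; only their keys are needed. Here is a minimal sketch of the same inversion as comprehensions, which also produces the comma-joined form shown in the question's target. It assumes the sample d above; inverted and as_text are illustrative names.

```python
d = {
    1: {'a': 'aaa', 'b': 'bbb', 'c': 'ccc'},
    2: {'d': 'dddd', 'a': 'abc', 'c': 'cca'},
    3: {'e': 'eee', 'a': 'ababa', 'b': 'bebebe'},
}

# Every inner key once, in first-seen order (dict.fromkeys drops repeats but keeps order)
inner_keys = dict.fromkeys(key for inner in d.values() for key in inner)

# For each inner key, the outer keys whose sub-dict contains it
inverted = {key: [outer for outer, inner in d.items() if key in inner] for key in inner_keys}

# The question's "a: 1,2,3" style, as comma-joined strings
as_text = {key: ",".join(map(str, outers)) for key, outers in inverted.items()}

print(inverted)
print(as_text)
```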

Add one property to json in python

I have the JSON below, and I want to add one more property to each object:
[ {
"A": 1,
"B": "str"
},
{
"A": 2,
"B": "str2"
},
{
"A": 3,
"B": "str3"
}
]
So I want something like this:
[ {
"A": 1,
"B": "str",
"C": "X"
},
{
"A": 2,
"B": "str2",
"C": "X"
},
{
"A": 3,
"B": "str3",
"C": "X"
}
]
What is the best way to do this?

Loop through each dict in the list and add the key-value pair you want:
List before:
list1 = [
{
"A": 1,
"B": "str"
},
{
"A": 2,
"B": "str2"
},
{
"A": 3,
"B": "str3"
}
]
The code:
for l in list1:
    l['C'] = 'X'
print(list1)
List after (i.e., the output):
[{'A': 1, 'B': 'str', 'C': 'X'}, {'A': 2, 'B': 'str2', 'C': 'X'}, {'A': 3, 'B': 'str3', 'C': 'X'}]

>>> j = [ { "A": 1, "B": "str" }, { "A": 2, "B": "str2" }, { "A": 3, "B": "str3" } ]
>>> [i.update({'C': 'X'}) for i in j]
[None, None, None]
>>> j
[{'A': 1, 'B': 'str', 'C': 'X'}, {'A': 2, 'B': 'str2', 'C': 'X'}, {'A': 3, 'B': 'str3', 'C': 'X'}]
Or, as per coldspeed's comment:
>>> for item in j:
...     item['C'] = 'X'
...
>>> j
[{'A': 1, 'B': 'str', 'C': 'X'}, {'A': 2, 'B': 'str2', 'C': 'X'}, {'A': 3, 'B': 'str3', 'C': 'X'}]
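If the data starts out as a JSON string rather than a Python list, a minimal sketch is to parse it with json.loads, build new dicts with the extra key, and serialize again with json.dumps. Building new dicts with {**item, "C": "X"} leaves the parsed originals untouched; raw and updated are illustrative names.

```python
import json

raw = '[{"A": 1, "B": "str"}, {"A": 2, "B": "str2"}, {"A": 3, "B": "str3"}]'

records = json.loads(raw)

# New dicts with the extra property; the parsed records are not modified
updated = [{**item, "C": "X"} for item in records]

print(json.dumps(updated, indent=2))
```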
