List to a Readable Representation using Python

I have data as
[{'name': 'A', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
{'name': 'B', 'subsets': ['B_1', 'B_A'], 'cluster': 2},
{'name': 'C', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
{'name': 'D', 'subsets': ['D_1', 'D_2', 'D_3', 'D_4'], 'cluster': 1}]
I need to represent it as
Cluster Number Subset Name
0 ['X_1', 'X_A', 'X_B'] A, C
1 ['D_1', 'D_2', 'D_3', 'D_4'] D
2 ['B_1', 'B_A'] B

For the sake of completeness, it is worth mentioning that in your case you can build the DataFrame without json_normalize and apply the same groupby:
import pandas as pd
data = [{'name': 'A', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
{'name': 'B', 'subsets': ['B_1', 'B_A'], 'cluster': 2},
{'name': 'C', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
{'name': 'D', 'subsets': ['D_1', 'D_2', 'D_3', 'D_4'], 'cluster': 1}]
df = (pd.DataFrame(data)
        .groupby('cluster')
        .agg({'subsets': 'first', 'name': ', '.join})
        .reset_index()
        .set_index('cluster')
        .rename_axis('Cluster Number'))
subsets name
Cluster Number
0 [X_1, X_A, X_B] A, C
1 [D_1, D_2, D_3, D_4] D
2 [B_1, B_A] B

You can use json_normalize + groupby "cluster" and apply join to "name" and first to "subsets":
df = pd.json_normalize(data).groupby('cluster').agg({'subsets':'first','name':', '.join}).reset_index()
Output:
cluster subsets name
0 0 [X_1, X_A, X_B] A, C
1 1 [D_1, D_2, D_3, D_4] D
2 2 [B_1, B_A] B
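If the exact headers shown in the question are wanted, a rename step can be chained onto the same groupby (a sketch; the header strings 'Cluster Number', 'Subset', 'Name' are taken from the desired output above):

```python
import pandas as pd

data = [{'name': 'A', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
        {'name': 'B', 'subsets': ['B_1', 'B_A'], 'cluster': 2},
        {'name': 'C', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
        {'name': 'D', 'subsets': ['D_1', 'D_2', 'D_3', 'D_4'], 'cluster': 1}]

# Same aggregation as above, then rename to the requested headers
df = (pd.json_normalize(data)
        .groupby('cluster')
        .agg({'subsets': 'first', 'name': ', '.join})
        .reset_index()
        .rename(columns={'cluster': 'Cluster Number',
                         'subsets': 'Subset',
                         'name': 'Name'}))
print(df)
```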

Related

pandas way to turn DataFrame of sets into DataFrame of dictionaries with value in corresponding cell in other DataFrame

It's hard to explain what I'm trying to do so I'll give an example. In the example below, I am trying to get df3. I have done it with the code below but it is very "anti-pandas" and I am looking for a better (faster, cleaner, more pandas-esque) way to do it:
import pandas as pd
df1 = pd.DataFrame({"begin": [{"a", "b"}, {"b"}, {"c"}], "end": [{"x"}, {"z", "y"}, {"z"}]})
df2 = pd.DataFrame(
{"a": [10, 10, 15], "b": [15, 20, 30], "c": [8, 12, 10], "x": [1, 2, 3], "y": [1, 3, 4], "z": [1, 3, 1]}
)
df3 = df1.copy()
for i in range(len(df1)):
    for j in range(len(df1.loc[i])):
        df3.at[i, df1.columns[j]] = []
        for v in df1.loc[i][j]:
            df3.at[i, df1.columns[j]].append({"letter": v, "value": df2.loc[i][v]})
print(df3)
Here's my goal (which this code does, just probably not in the best way):
begin end
0 [{'letter': 'b', 'value': 15}, {'letter': 'a', 'value': 10}] [{'letter': 'x', 'value': 1}]
1 [{'letter': 'b', 'value': 20}] [{'letter': 'y', 'value': 3}, {'letter': 'z', 'value': 3}]
2 [{'letter': 'c', 'value': 10}] [{'letter': 'z', 'value': 1}]
Here is one way to approach the problem using pandas
# Reshape and explode the dataframe
s = df1.stack().explode().reset_index(name='letter')
# Map the values corresponding to the letters
s['value'] = s.set_index(['level_0', 'letter']).index.map(df2.stack())
# Assign list of records
s['records'] = s[['letter', 'value']].to_dict('records')
# Pivot with aggfunc as list
s = s.pivot_table('records', 'level_0', 'level_1', aggfunc=list)
print(s)
level_1 begin end
level_0
0 [{'letter': 'a', 'value': 10}, {'letter': 'b', 'value': 15}] [{'letter': 'x', 'value': 1}]
1 [{'letter': 'b', 'value': 20}] [{'letter': 'z', 'value': 3}, {'letter': 'y', 'value': 3}]
2 [{'letter': 'c', 'value': 10}] [{'letter': 'z', 'value': 1}]
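The same result can also be sketched with a plain dictionary comprehension over the columns of df1, avoiding the stack/pivot round trip (note: set iteration order is arbitrary, so the order of dicts inside each cell may vary):

```python
import pandas as pd

df1 = pd.DataFrame({"begin": [{"a", "b"}, {"b"}, {"c"}], "end": [{"x"}, {"z", "y"}, {"z"}]})
df2 = pd.DataFrame(
    {"a": [10, 10, 15], "b": [15, 20, 30], "c": [8, 12, 10], "x": [1, 2, 3], "y": [1, 3, 4], "z": [1, 3, 1]}
)

# For each cell (a set of letters), look up each letter's value in df2's matching row
df3 = pd.DataFrame({
    col: [
        [{"letter": v, "value": int(df2.at[i, v])} for v in cell]
        for i, cell in df1[col].items()
    ]
    for col in df1.columns
}, index=df1.index)
print(df3)
```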

Reverse the group/items in Python

I have a table like this:
Group  Item
A      a, b, c
B      b, c, d
And I want to convert to like this:
Item  Group
a     A
b     A, B
c     A, B
d     B
What is the best way to achieve this?
Thank you!!
If you are working in pandas, you can use 'explode' to unpack the items, and a 'tolist' lambda for the grouping stage.
Here is some info on the 'explode' method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html
import pandas as pd
df = pd.DataFrame(data={'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]})
Exploding
df.explode('Item').reset_index(drop=True).to_dict(orient='records')
[{'Group': 'A', 'Item': 'a'},
{'Group': 'A', 'Item': 'b'},
{'Group': 'A', 'Item': 'c'},
{'Group': 'B', 'Item': 'b'},
{'Group': 'B', 'Item': 'c'},
{'Group': 'B', 'Item': 'd'}]
Exploding and then using 'to_list' lambda
df.explode('Item').groupby('Item')['Group'].apply(lambda x: x.tolist()).reset_index().to_dict(orient='records')
[{'Item': 'a', 'Group': ['A']},
{'Item': 'b', 'Group': ['A', 'B']},
{'Item': 'c', 'Group': ['A', 'B']},
{'Item': 'd', 'Group': ['B']}]
Not the most efficient, but very short:
>>> table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
>>> reversed_table = {v: [k for k, vs in table.items() if v in vs] for v in set(v for vs in table.values() for v in vs)}
>>> print(reversed_table)
{'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B'], 'a': ['A']}
With dictionaries, you would typically approach it like this:
table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
revtable = dict()
for v, keys in table.items():
    for k in keys:
        revtable.setdefault(k, []).append(v)
print(revtable)
# {'a': ['A'], 'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B']}
Assuming that your tables are in the form of a pandas dataframe, you could try something like this:
import pandas as pd
import numpy as np
# Create initial dataframe
data = {'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]}
df = pd.DataFrame(data=data)
Group Item
0 A [a, b, c]
1 B [b, c, d]
# Expand number of rows based on list column ("Item") contents
list_col = 'Item'
df = pd.DataFrame({
    col: np.repeat(df[col].values, df[list_col].str.len())
    for col in df.columns.drop(list_col)}
).assign(**{list_col: np.concatenate(df[list_col].values)})[df.columns]
Group Item
0 A a
1 A b
2 A c
3 B b
4 B c
5 B d
*Above snippet taken from here, which includes a more detailed explanation of the code
# Perform groupby operation
df = df.groupby('Item')['Group'].apply(list).reset_index(name='Group')
Item Group
0 a [A]
1 b [A, B]
2 c [A, B]
3 d [B]
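Since the desired output in the question shows comma-separated strings ('A, B') rather than lists, the groupby stage can join the groups directly (a sketch reusing the same explode approach):

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'B'], 'Item': [['a', 'b', 'c'], ['b', 'c', 'd']]})

# Explode the item lists, then join the groups per item into one string
out = (df.explode('Item')
         .groupby('Item', as_index=False)['Group']
         .agg(', '.join))
print(out)
```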

Remove duplicates from a list of dicts

I have a list of dicts like this:
[{'ID': 'a', 'Number': 2}, {'ID': 'b', 'Number': 5} , {'ID': 'a', 'Number': 6}, {'ID': 'a', 'Number': 8}, {'ID': 'c', 'Number': 3}]
I want to remove the dicts that share the same ID and keep only the one with the smallest Number. The expected result should be:
[{'ID': 'a', 'Number': 2}, {'Id': 'b', 'Number': 5}, {'ID': 'c', 'Number': 3}]
The most efficient solution is to use a temporary lookup dictionary, keyed by ID, whose values are the dicts with the lowest Number seen so far for each ID.
l = [{'ID': 'a', 'Number': 2},
{'ID': 'b', 'Number': 5}, # note that I corrected a typo Id --> ID
{'ID': 'a', 'Number': 6},
{'ID': 'a', 'Number': 8},
{'ID': 'c', 'Number': 3}]
lookup_dict = {}
for d in l:
    if d['ID'] not in lookup_dict or d['Number'] < lookup_dict[d['ID']]['Number']:
        lookup_dict[d['ID']] = d
output = list(lookup_dict.values())
which gives output as:
[{'ID': 'a', 'Number': 2}, {'ID': 'b', 'Number': 5}, {'ID': 'c', 'Number': 3}]
A piece of advice: given your final data structure, I wonder if you may be better off now representing this final data as a dictionary - with the IDs as keys since these are now unique. This would allow for more convenient data access.
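Following that advice, a minimal sketch of the dictionary representation (each ID mapped directly to its smallest Number; the variable names here are illustrative):

```python
l = [{'ID': 'a', 'Number': 2},
     {'ID': 'b', 'Number': 5},
     {'ID': 'a', 'Number': 6},
     {'ID': 'a', 'Number': 8},
     {'ID': 'c', 'Number': 3}]

smallest = {}
for d in l:
    # keep the minimum Number seen so far for each ID
    smallest[d['ID']] = min(smallest.get(d['ID'], d['Number']), d['Number'])

print(smallest)  # {'a': 2, 'b': 5, 'c': 3}
```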

Convert Multiple pandas Column into json

My dataframe df is like:
col_1 col_2 col_3
A Product 1
B product 2
C Offer 1
D Product 1
What I want is to convert all these columns to JSON, such that in each row col_2 and col_1 form a key-value pair. I have tried the following:
df['col_1_2'] = df.apply(lambda row: {row['col_2']:row['col_1']}, axis=1)
df['final_col'] = df[['col_1_2','col_3']].to_dict('records')
My first row of df['final_col'] is:
{'col_1_2': {'product': A}, 'value': 1.0},
But what I want is:
{'product': A, 'value': 1.0}
Add the missing key with the value taken from col_3:
df['final_col'] = df.apply(lambda row: {row['col_2']: row['col_1'], 'value': row['col_3']},
                           axis=1)
print (df)
col_1 col_2 col_3 final_col
0 A Product 1 {'Product': 'A', 'value': 1}
1 B product 2 {'product': 'B', 'value': 2}
2 C Offer 1 {'Offer': 'C', 'value': 1}
3 D Product 1 {'Product': 'D', 'value': 1}
If need output in list:
L = [{b:a, 'value':c} for a,b,c in zip(df['col_1'], df['col_2'], df['col_3'])]
print (L)
[{'Product': 'A', 'value': 1},
{'product': 'B', 'value': 2},
{'Offer': 'C', 'value': 1},
{'Product': 'D', 'value': 1}]
Or json:
import json
j = json.dumps([{b:a, 'value':c} for a,b,c in zip(df['col_1'], df['col_2'], df['col_3'])])
print (j)
[{"Product": "A", "value": 1},
{"product": "B", "value": 2},
{"Offer": "C", "value": 1},
{"Product": "D", "value": 1}]

pandas dataframe convert values in array of objects

I want to convert the pandas DataFrame below
data = pd.DataFrame([[1,2], [5,6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name= 'Age Group'
print(data)
Age Group 10+ 20+
City
A 1 2
B 5 6
in to an array of dictionaries, like
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
I am able to get the expected result above using the following loops:
result = []
cols_name = data.columns.name
index_names = data.index.name
for index in data.index:
    for col in data.columns:
        result.append({cols_name: col, index_names: index, 'count': data.loc[index, col]})
Is there a better way of doing this? Since my original data will have a large number of records, using for loops will take more time.
I think you can use stack with reset_index for reshape and last to_dict:
print (data.stack().reset_index(name='count'))
City Age Group count
0 A 10+ 1
1 A 20+ 2
2 B 10+ 5
3 B 20+ 6
print (data.stack().reset_index(name='count').to_dict(orient='records'))
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
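An equivalent route is reset_index followed by melt, which skips the intermediate MultiIndex entirely (a sketch; melt emits the records column by column, so sort afterwards if row order matters):

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [5, 6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name = 'Age Group'

# Turn the index into a column, then unpivot the age-group columns
records = (data.reset_index()
               .melt(id_vars='City', var_name='Age Group', value_name='count')
               .to_dict(orient='records'))
print(records)
```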
