Convert multiple pandas columns into JSON - python

My dataframe df looks like:

col_1  col_2    col_3
A      Product  1
B      product  2
C      Offer    1
D      Product  1
What I want is to convert all these columns to JSON, with the condition that in each row col_2 and col_1 should form a key-value pair. I have tried the following:
df['col_1_2'] = df.apply(lambda row: {row['col_2']: row['col_1']}, axis=1)
df['final_col'] = df[['col_1_2', 'col_3']].to_dict('records')
My first row of df['final_col'] is:

{'col_1_2': {'Product': 'A'}, 'value': 1.0}

But what I want is:

{'Product': 'A', 'value': 1.0}

Add the missing key, with its value taken from col_3:

df['final_col'] = df.apply(lambda row: {row['col_2']: row['col_1'], 'value': row['col_3']},
                           axis=1)
print(df)
  col_1    col_2  col_3                     final_col
0     A  Product      1  {'Product': 'A', 'value': 1}
1     B  product      2  {'product': 'B', 'value': 2}
2     C    Offer      1    {'Offer': 'C', 'value': 1}
3     D  Product      1  {'Product': 'D', 'value': 1}
If you need the output as a list:

L = [{b: a, 'value': c} for a, b, c in zip(df['col_1'], df['col_2'], df['col_3'])]
print(L)
[{'Product': 'A', 'value': 1},
{'product': 'B', 'value': 2},
{'Offer': 'C', 'value': 1},
{'Product': 'D', 'value': 1}]
Or as JSON:

import json
j = json.dumps([{b: a, 'value': c} for a, b, c in zip(df['col_1'], df['col_2'], df['col_3'])])
print(j)
[{"Product": "A", "value": 1},
{"product": "B", "value": 2},
{"Offer": "C", "value": 1},
{"Product": "D", "value": 1}]

Related

List to a Readable Representation using Python

I have data as:

[{'name': 'A', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
 {'name': 'B', 'subsets': ['B_1', 'B_A'], 'cluster': 2},
 {'name': 'C', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
 {'name': 'D', 'subsets': ['D_1', 'D_2', 'D_3', 'D_4'], 'cluster': 1}]
I need to represent it as:

Cluster Number  Subset                        Name
0               ['X_1', 'X_A', 'X_B']         A, C
1               ['D_1', 'D_2', 'D_3', 'D_4']  D
2               ['B_1', 'B_A']                B
For the sake of completeness, I think it is fair to mention that in your case you can actually create a DataFrame without json_normalize and apply groupby, as originally shown here:
import pandas as pd

data = [{'name': 'A', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
        {'name': 'B', 'subsets': ['B_1', 'B_A'], 'cluster': 2},
        {'name': 'C', 'subsets': ['X_1', 'X_A', 'X_B'], 'cluster': 0},
        {'name': 'D', 'subsets': ['D_1', 'D_2', 'D_3', 'D_4'], 'cluster': 1}]

df = (pd.DataFrame(data)
        .groupby('cluster')
        .agg({'subsets': 'first', 'name': ', '.join})
        .reset_index()
        .set_index('cluster')
        .rename_axis('Cluster Number'))
                             subsets  name
Cluster Number
0                    [X_1, X_A, X_B]  A, C
1               [D_1, D_2, D_3, D_4]     D
2                         [B_1, B_A]     B
You can use json_normalize + groupby "cluster" and apply join to "name" and first to "subsets":
df = pd.json_normalize(data).groupby('cluster').agg({'subsets':'first','name':', '.join}).reset_index()
Output:
   cluster               subsets  name
0        0       [X_1, X_A, X_B]  A, C
1        1  [D_1, D_2, D_3, D_4]     D
2        2            [B_1, B_A]     B
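In either variant, if you want the exact headers from the desired output, a small follow-up sketch (the target names 'Subset' and 'Name' are taken from the question):

# Align the aggregated column names with the requested layout.
df = df.rename(columns={'subsets': 'Subset', 'name': 'Name'})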

pandas way to turn a DataFrame of sets into a DataFrame of dictionaries with values from the corresponding cells of another DataFrame

It's hard to explain what I'm trying to do, so I'll give an example. In the example below, I am trying to get df3. I have done it with the code below, but it is very "anti-pandas" and I am looking for a better (faster, cleaner, more pandas-esque) way to do it:
import pandas as pd

df1 = pd.DataFrame({"begin": [{"a", "b"}, {"b"}, {"c"}], "end": [{"x"}, {"z", "y"}, {"z"}]})
df2 = pd.DataFrame(
    {"a": [10, 10, 15], "b": [15, 20, 30], "c": [8, 12, 10], "x": [1, 2, 3], "y": [1, 3, 4], "z": [1, 3, 1]}
)

df3 = df1.copy()
for i in range(len(df1)):
    for j in range(len(df1.loc[i])):
        df3.at[i, df1.columns[j]] = []
        for v in df1.loc[i][j]:
            df3.at[i, df1.columns[j]].append({"letter": v, "value": df2.loc[i][v]})
print(df3)
Here's my goal (which this code achieves, just probably not in the best way):

                                                           begin                                                          end
0  [{'letter': 'b', 'value': 15}, {'letter': 'a', 'value': 10}]                                [{'letter': 'x', 'value': 1}]
1                                 [{'letter': 'b', 'value': 20}]  [{'letter': 'y', 'value': 3}, {'letter': 'z', 'value': 3}]
2                                 [{'letter': 'c', 'value': 10}]                                [{'letter': 'z', 'value': 1}]
Here is one way to approach the problem using pandas:
# Reshape and explode the dataframe
s = df1.stack().explode().reset_index(name='letter')
# Map the values corresponding to the letters
s['value'] = s.set_index(['level_0', 'letter']).index.map(df2.stack())
# Assign list of records
s['records'] = s[['letter', 'value']].to_dict('records')
# Pivot with aggfunc as list
s = s.pivot_table('records', 'level_0', 'level_1', aggfunc=list)
print(s)
level_1                                                          begin                                                          end
level_0
0        [{'letter': 'a', 'value': 10}, {'letter': 'b', 'value': 15}]                                [{'letter': 'x', 'value': 1}]
1                                       [{'letter': 'b', 'value': 20}]  [{'letter': 'z', 'value': 3}, {'letter': 'y', 'value': 3}]
2                                       [{'letter': 'c', 'value': 10}]                                [{'letter': 'z', 'value': 1}]
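As a small cosmetic follow-up, a sketch for clearing the leftover level_0/level_1 axis names that stack introduced, so the result prints like df3:

# Remove the residual axis names produced by stack()/pivot_table().
s.index.name = None
s.columns.name = None
print(s)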

Matching a pandas dataframe and a list

I have a dataframe df as follows:

Number  PT
5       AA
64      BB
7       CC
Then I have another list of objects:
myList = [{'label': 'AA', 'value': 'AA', 'group': 'A'}, {'label': 'BB', 'value': 'BB', 'group': 'B'}]
I want every PT to get the associated group (when available) from the list, so the result should look like:

Number  PT  group
5       AA  A
64      BB  B
7       CC  NOT_MATCHED
import pandas as pd

d = {'Number': [5, 64, 7], 'PT': ["AA", "BB", "CC"]}
df = pd.DataFrame(data=d)
myList = [{'label': 'AA', 'value': 'AA', 'group': 'A'}, {'label': 'BB', 'value': 'BB', 'group': 'B'}]

for i, row in df.iterrows():
    for item in myList:
        if item['value'] == df['PT'][i]:
            df.at[i, 'Group'] = item['group']
            break
    else:
        df.at[i, 'Group'] = "NOT_MATCHED"
Try:

df['group'] = df.PT.map({tuple(i.values())[0]: tuple(i.values())[2]
                         for i in myList}).fillna('NOT_MATCHED')
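Note that tuple(i.values()) relies on each dict's key order; here is a sketch of the same mapping built from the named fields instead, which survives reordered keys:

# Build the lookup from the labeled fields rather than positional tuple indices.
mapping = {d['value']: d['group'] for d in myList}
df['group'] = df['PT'].map(mapping).fillna('NOT_MATCHED')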

What is the most efficient way to sum a dict with multiple keys by one key?

I have the following dict structure.
product1 = {'product_tmpl_id': product_id,
            'qty': product_uom_qty,
            'price': price_unit,
            'subtotal': price_subtotal,
            'total': price_total,
            }
And then I have a list of products, where each item is a dict with the above structure:

list_ = [product1, product2, product3, ...]
I need to sum the items in the list, grouped by the key product_tmpl_id. I'm using collections.defaultdict, but it only sums the qty key; I need to sum every key except product_tmpl_id, which is the grouping criterion:
from collections import defaultdict

c = defaultdict(float)
for d in list_:
    c[d['product_tmpl_id']] += d['qty']
c = [{'product_id': id, 'qty': qty} for id, qty in c.items()]
I know how to do it with a for loop, but I'm looking for a more Pythonic way. Thanks.
EDIT:
What I need is to go from this:
lst = [
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'B', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
]
to this:

group_lst = [
    {'Name': 'A', 'qty': 300, 'price': 30},
    {'Name': 'B', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 200, 'price': 20},
]
Using basic Python, this doesn't get a whole lot better. You could hack something together with itertools.groupby, but it'd be ugly, probably slower, and certainly less clear (a sketch is included after the pandas version below).
As #9769953 suggested, though, Pandas is a good package to handle this sort of structured, tabular data.
In [1]: import pandas as pd

In [2]: df = pd.DataFrame(lst)

In [3]: df
Out[3]:
  Name  price  qty
0    A     10  100
1    A     10  100
2    A     10  100
3    B     10  100
4    C     10  100
5    C     10  100
In [4]: df.groupby('Name').agg(sum)
Out[4]:
      price  qty
Name
A        30  300
B        10  100
C        20  200
You just need a little extra mojo if you don't want to keep the data as a dataframe:

In [5]: grouped = df.groupby('Name', as_index=False).agg(sum)

In [6]: list(grouped.T.to_dict().values())
Out[6]:
[{'Name': 'A', 'price': 30, 'qty': 300},
 {'Name': 'B', 'price': 10, 'qty': 100},
 {'Name': 'C', 'price': 20, 'qty': 200}]
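For comparison, here is roughly what the itertools.groupby route mentioned above could look like; a sketch only, and note that it needs a pre-sort and that each group iterator must be materialized before being summed twice:

from itertools import groupby
from operator import itemgetter

key = itemgetter('Name')
group_lst = []
# groupby only merges adjacent items, so the list must be sorted by the same key first.
for name, grp in groupby(sorted(lst, key=key), key=key):
    items = list(grp)  # the group iterator can only be consumed once
    group_lst.append({'Name': name,
                      'qty': sum(d['qty'] for d in items),
                      'price': sum(d['price'] for d in items)})
print(group_lst)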
On the verbose side, but gets the job done:

from pprint import pprint

group_lst = []
for item in lst:
    qty_total = 0
    price_total = 0
    # Get names that have already been totalled
    lst_of_names = [item_get_name['Name'] for item_get_name in group_lst]
    if item['Name'] in lst_of_names:
        continue
    for item2 in lst:
        if item['Name'] == item2['Name']:
            qty_total += item2['qty']
            price_total += item2['price']
    group_lst.append(
        {
            'Name': item['Name'],
            'qty': qty_total,
            'price': price_total
        }
    )
pprint(group_lst)
Output:
[{'Name': 'A', 'price': 30, 'qty': 300},
{'Name': 'B', 'price': 10, 'qty': 100},
{'Name': 'C', 'price': 20, 'qty': 200}]
You can use defaultdict and Counter (updating each Counter with the whole dict also concatenates the 'Name' strings, but dict(v, **{'Name': k}) overwrites that key at the end):
>>> from collections import Counter, defaultdict
>>> from pprint import pprint
>>> cntr = defaultdict(Counter)
>>> for d in lst:
...     cntr[d['Name']].update(d)
...
>>> res = [dict(v, **{'Name': k}) for k, v in cntr.items()]
>>> pprint(res)
[{'Name': 'A', 'price': 30, 'qty': 300},
 {'Name': 'C', 'price': 20, 'qty': 200},
 {'Name': 'B', 'price': 10, 'qty': 100}]
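If the temporary string concatenation on the 'Name' key bothers you, a minimal variant of the same idea that feeds only the numeric fields into each Counter:

from collections import Counter, defaultdict

cntr = defaultdict(Counter)
for d in lst:
    # Feed only the non-key fields into the Counter.
    cntr[d['Name']].update({k: v for k, v in d.items() if k != 'Name'})
res = [{'Name': k, **v} for k, v in cntr.items()]
print(res)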

Convert pandas dataframe values into an array of objects

I want to convert the below pandas data frame

import pandas as pd

data = pd.DataFrame([[1, 2], [5, 6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name = 'Age Group'
print(data)

Age Group  10+  20+
City
A            1    2
B            5    6
into an array of dictionaries, like:
[
    {'Age Group': '10+', 'City': 'A', 'count': 1},
    {'Age Group': '20+', 'City': 'A', 'count': 2},
    {'Age Group': '10+', 'City': 'B', 'count': 5},
    {'Age Group': '20+', 'City': 'B', 'count': 6}
]
I am able to get the above expected result using the following loops:
result = []
cols_name = data.columns.name
index_names = data.index.name
for index in data.index:
    for col in data.columns:
        result.append({cols_name: col, index_names: index, 'count': data.loc[index, col]})
Is there any better way of doing this? Since my original data will have a large number of records, using for loops will take more time.
I think you can use stack with reset_index to reshape, and finally to_dict:
print(data.stack().reset_index(name='count'))

  City Age Group  count
0    A       10+      1
1    A       20+      2
2    B       10+      5
3    B       20+      6
print(data.stack().reset_index(name='count').to_dict(orient='records'))
[
    {'Age Group': '10+', 'City': 'A', 'count': 1},
    {'Age Group': '20+', 'City': 'A', 'count': 2},
    {'Age Group': '10+', 'City': 'B', 'count': 5},
    {'Age Group': '20+', 'City': 'B', 'count': 6}
]
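An equivalent route, as a sketch, goes through melt; note that melt emits all of the '10+' records before the '20+' ones, so the record order differs from the stack version:

# reset_index turns the 'City' index into a regular column so melt can keep it.
out = data.reset_index().melt(id_vars='City', var_name='Age Group', value_name='count')
print(out.to_dict(orient='records'))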
