Here is the original dataframe of 2 rows, consisting of ID and ColumnA. Each row holds a nested dictionary in ColumnA, and rows may have a different number of detail entries.
ID ColumnA
1 {'1': {'Order': '0', 'Result': ''},
'2': {'Order': 'Yellow', 'Result': 'Red'},
'3': {'Order': 'Clear', 'Result': 'Tight'},
'4': {'Order': '1.000-1.030', 'Result': '1.015'}}
2 {'1': {'Order': '0', 'Result': '1.015'},
'4': {'Order': '1.000-1.030', 'Result': '2.4'},
'5': {'Order': '6.0-7.0', 'Result': ''},
'6': {'Order': 'Negative', 'Result': 'Negative'},
'7': {'Order': 'Negative', 'Result': 'Negative'},
'8': {'Order': 'Negative', 'Result': 'Positive'},
'9': {'Order': 'Negative', 'Result': ''}}
I want to extract ColumnA into a new dataframe like this:
ID Column_ID Column_Order ColumnD_Result
1 1 0
1 2 Yellow Red
1 3 Clear Tight
1 4 1.000-1.030 1.015
2 1 0 1.015
2 4 1.000-1.030 2.4
2 5 6.0-7.0
2 6 Negative Negative
2 7 Negative Negative
2 8 Negative Positive
2 9 Negative
How do I write the dictionary extraction?
Extract it by looping over the dictionary items:
import pandas as pd
data = [
    ['1', {'1': {'Order': '0', 'Result': ''},
           '2': {'Order': 'Yellow', 'Result': 'Red'},
           '3': {'Order': 'Clear', 'Result': 'Tight'},
           '4': {'Order': '1.000-1.030', 'Result': '1.015'}}],
    ['2', {'1': {'Order': '0', 'Result': '1.015'},
           '4': {'Order': '1.000-1.030', 'Result': '2.4'},
           '5': {'Order': '6.0-7.0', 'Result': ''},
           '6': {'Order': 'Negative', 'Result': 'Negative'},
           '7': {'Order': 'Negative', 'Result': 'Negative'},
           '8': {'Order': 'Negative', 'Result': 'Positive'},
           '9': {'Order': 'Negative', 'Result': ''}}],
]
df = pd.DataFrame(data, columns=['ID', 'ColumnA'])
dfColumnA = pd.DataFrame([], columns=['ID', 'Column_ID', 'Column_Order', 'ColumnD_Result'])
i = 0
for index, row in df.iterrows():
    dictColumnA = row['ColumnA']
    # one output row per entry in the nested dictionary
    for column_ID, v in dictColumnA.items():
        dfColumnA.loc[i] = [row['ID'], column_ID, v['Order'], v['Result']]
        i += 1
print(dfColumnA)
Output:
ID Column_ID Column_Order ColumnD_Result
0 1 1 0
1 1 2 Yellow Red
2 1 3 Clear Tight
3 1 4 1.000-1.030 1.015
4 2 1 0 1.015
5 2 4 1.000-1.030 2.4
6 2 5 6.0-7.0
7 2 6 Negative Negative
8 2 7 Negative Negative
9 2 8 Negative Positive
10 2 9 Negative
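As a side note, building all rows as a list of records and constructing the DataFrame once is generally faster than assigning with .loc inside the loop; a minimal sketch using the same df as above:
records = [
    {'ID': row['ID'], 'Column_ID': column_ID,
     'Column_Order': v['Order'], 'ColumnD_Result': v['Result']}
    for _, row in df.iterrows()
    for column_ID, v in row['ColumnA'].items()
]
dfColumnA = pd.DataFrame(records,
                         columns=['ID', 'Column_ID', 'Column_Order', 'ColumnD_Result'])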
Related
I have the following DataFrame:
a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)
print(df)
I need to count the distinct values in column order and put them in a new column order_count, grouping by columns name and date while summing the values in column sum.
I need to get the following result:
In your case, do:
out = df.groupby(['name','date'],as_index=False).agg({'sum':'sum','order':'nunique'})
Out[652]:
name date sum order
0 A 20220501 34.1 2
1 B 20220502 77.6 3
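If the count column should be named order_count as in the question, a rename can be chained onto the same aggregation; a small sketch:
out = (df.groupby(['name', 'date'], as_index=False)
         .agg({'sum': 'sum', 'order': 'nunique'})
         .rename(columns={'order': 'order_count'}))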
import pandas as pd

# Sum the "sum" column per group, then join the distinct count of "order" as order_count.
(df[['name', 'date', 'sum']]
   .groupby(['name', 'date']).sum()
   .join(df.groupby(['name', 'date'])['order'].nunique().rename('order_count'))
   .reset_index())
I have a list of dicts like:
data = [
{'ID': '000681', 'type': 'B:G+', 'testA': '11'},
{'ID': '000682', 'type': 'B:G+', 'testA': '-'},
{'ID': '000683', 'type': 'B:G+', 'testA': '13'},
{'ID': '000684', 'type': 'B:G+', 'testA': '14'},
{'ID': '000681', 'type': 'B:G+', 'testB': '15'},
{'ID': '000682', 'type': 'B:G+', 'testB': '16'},
{'ID': '000683', 'type': 'B:G+', 'testB': '17'},
{'ID': '000684', 'type': 'B:G+', 'testB': '-'}
]
How to use Pandas to get data like:
data = [
{'ID': '000683', 'type': 'B:G+', 'testA': '13', 'testB': '17'},
{'ID': '000681', 'type': 'B:G+', 'testA': '11', 'testB': '15'},
{'ID': '000684', 'type': 'B:G+', 'testA': '14', 'testB': '-'},
{'ID': '000682', 'type': 'B:G+', 'testA': '-', 'testB': '16'}
]
Rows with the same ID and type should be merged into one row, sorted by the testA and testB values.
Sorted: rows where both testA and testB have a value come first, with the larger testA+testB at the top.
First convert the columns to numeric, coercing non-numeric values to NaN, and then aggregate with sum:
df = pd.DataFrame(data)
c = ['testA','testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))
df1 = df.groupby(['ID','type'])[c].sum(min_count=1).sort_values(c).fillna('-').reset_index()
print (df1)
ID type testA testB
0 000681 B:G+ 11 15
1 000683 B:G+ 13 17
2 000684 B:G+ 14 -
3 000682 B:G+ - 16
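A short note on min_count=1, illustrated on a toy Series: by default an all-NaN group sums to 0, while min_count=1 keeps it as NaN so the later fillna('-') can show the dash.
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])
print(s.sum())             # 0.0 - an all-NaN sum collapses to zero by default
print(s.sum(min_count=1))  # nan - stays missing, so fillna('-') can mark it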
If you want to sort by the sum of both columns, use Series.argsort:
df = pd.DataFrame(data)
c = ['testA','testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))
df2 = df.groupby(['ID','type'])[c].sum(min_count=1)
df2 = df2.iloc[(-df2).sum(axis=1).argsort()].fillna('-').reset_index()
print (df2)
ID type testA testB
0 000683 B:G+ 13 17
1 000681 B:G+ 11 15
2 000682 B:G+ - 16
3 000684 B:G+ 14 -
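An equivalent, arguably more readable variant (a sketch on the same grouped frame) sorts on a temporary total column instead of using argsort:
df2 = df.groupby(['ID', 'type'])[c].sum(min_count=1)
df2 = (df2.assign(total=df2.sum(axis=1))      # helper column with testA + testB
          .sort_values('total', ascending=False)
          .drop(columns='total')
          .fillna('-')
          .reset_index())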
I have dictionaries within a tuple and I want to know how to access them and create a dataframe merging the dictionary values into a single row.
Example:
({'Id': '4', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv1331', 'DT': '08/1/19', 'AMT': '1500'}, {'Id': '9', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv4321', 'DT': '02/6/19', 'AMT': '1000'})
Expected Result:
Id_1 BU_1 V_ID_1 INV_1 DT_1 AMT_1 Id_2 BU_2 V_ID_2 INV_2 DT_2 AMT_2
---------------------------------------------------------------------------------------------
4 usa 44 inv1331 08/1/19 1500 9 usa 44 inv4321 02/6/19 1000
import pandas as pd

x = ({'Id': '4', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv1331', 'DT': '08/1/19', 'AMT': '1500'}, {'Id': '9', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv4321', 'DT': '02/6/19', 'AMT': '1000'})
# Flatten each dictionary's items into one record, suffixing keys with their position in the tuple.
data = {f"{k}_{i+1}": v for i, d in enumerate(x) for k, v in d.items()}
df = pd.DataFrame(data, index=[0])
Output:
>>> df
Id_1 BU_1 V_ID_1 INV_1 DT_1 ... BU_2 V_ID_2 INV_2 DT_2 AMT_2
0 4 usa 44 inv1331 08/1/19 ... usa 44 inv4321 02/6/19 1000
[1 rows x 12 columns]
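An alternative that scales to any number of dictionaries in the tuple is to build one single-row frame per dictionary, suffix its columns by position, and concatenate along the columns; a sketch using the same x as above:
df = pd.concat(
    [pd.DataFrame([d]).add_suffix(f'_{i+1}') for i, d in enumerate(x)],
    axis=1
)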
I am creating a function that grabs data from an ERP system to display to the end user.
I want to unpack an object of dictionaries and create a range of Pandas DataFrames with them.
For example, I have:
troRows
{0: [{'productID': 134336, 'price': '10.0000', 'amount': '1', 'cost': 0}],
1: [{'productID': 142141, 'price': '5.5000', 'amount': '4', 'cost': 0}],
2: [{'productID': 141764, 'price': '5.5000', 'amount': '1', 'cost': 0}],
3: [{'productID': 81661, 'price': '4.5000', 'amount': '1', 'cost': 0}],
4: [{'productID': 146761, 'price': '5.5000', 'amount': '1', 'cost': 0}],
5: [{'productID': 143585, 'price': '5.5900', 'amount': '9', 'cost': 0}],
6: [{'productID': 133018, 'price': '5.0000', 'amount': '1', 'cost': 0}],
7: [{'productID': 146250, 'price': '13.7500', 'amount': '5', 'cost': 0}],
8: [{'productID': 149986, 'price': '5.8900', 'amount': '2', 'cost': 0},
{'productID': 149790, 'price': '4.9900', 'amount': '2', 'cost': 0},
{'productID': 149972, 'price': '5.2900', 'amount': '2', 'cost': 0},
{'productID': 149248, 'price': '2.0000', 'amount': '2', 'cost': 0},
{'productID': 149984, 'price': '4.2000', 'amount': '2', 'cost': 0}]}
Each time, the function will need to unpack some number of dictionaries, each of which may have a different number of rows, into a range of DataFrames.
So for example, this range of Dictionaries would return
DF0, DF1, DF2, DF3, DF4, DF5, DF6, DF7, DF8.
I can unpack a single Dictionary with:
pd.DataFrame(troRows[8])
which returns
amount cost price productID
0 2 0 5.8900 149986
1 2 0 4.9900 149790
2 2 0 5.2900 149972
3 2 0 2.0000 149248
4 2 0 4.2000 149984
How can I structure my code so that it does this for all the dictionaries for me?
Solution for a dictionary of DataFrames - use a dictionary comprehension, keyed by the keys of the original dictionary:
dfs = {k: pd.DataFrame(v) for k, v in troRows.items()}
print (dfs)
{0: amount cost price productID
0 1 0 10.0000 134336, 1: amount cost price productID
0 4 0 5.5000 142141, 2: amount cost price productID
0 1 0 5.5000 141764, 3: amount cost price productID
0 1 0 4.5000 81661, 4: amount cost price productID
0 1 0 5.5000 146761, 5: amount cost price productID
0 9 0 5.5900 143585, 6: amount cost price productID
0 1 0 5.0000 133018, 7: amount cost price productID
0 5 0 13.7500 146250, 8: amount cost price productID
0 2 0 5.8900 149986
1 2 0 4.9900 149790
2 2 0 5.2900 149972
3 2 0 2.0000 149248
4 2 0 4.2000 149984}
print (dfs[8])
amount cost price productID
0 2 0 5.8900 149986
1 2 0 4.9900 149790
2 2 0 5.2900 149972
3 2 0 2.0000 149248
4 2 0 4.2000 149984
Solutions for one DataFrame:
Use a list comprehension to flatten the nested lists and pass the result to the DataFrame constructor:
troRows = pd.Series([[{'productID': 134336, 'price': '10.0000', 'amount': '1', 'cost': 0}],
[{'productID': 142141, 'price': '5.5000', 'amount': '4', 'cost': 0}],
[{'productID': 141764, 'price': '5.5000', 'amount': '1', 'cost': 0}],
[{'productID': 81661, 'price': '4.5000', 'amount': '1', 'cost': 0}],
[{'productID': 146761, 'price': '5.5000', 'amount': '1', 'cost': 0}],
[{'productID': 143585, 'price': '5.5900', 'amount': '9', 'cost': 0}],
[{'productID': 133018, 'price': '5.0000', 'amount': '1', 'cost': 0}],
[{'productID': 146250, 'price': '13.7500', 'amount': '5', 'cost': 0}],
[{'productID': 149986, 'price': '5.8900', 'amount': '2', 'cost': 0},
{'productID': 149790, 'price': '4.9900', 'amount': '2', 'cost': 0},
{'productID': 149972, 'price': '5.2900', 'amount': '2', 'cost': 0},
{'productID': 149248, 'price': '2.0000', 'amount': '2', 'cost': 0},
{'productID': 149984, 'price': '4.2000', 'amount': '2', 'cost': 0}]])
df = pd.DataFrame([y for x in troRows for y in x])
Another way to flatten your data is to use chain.from_iterable:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(troRows)))
print (df)
amount cost price productID
0 1 0 10.0000 134336
1 4 0 5.5000 142141
2 1 0 5.5000 141764
3 1 0 4.5000 81661
4 1 0 5.5000 146761
5 9 0 5.5900 143585
6 1 0 5.0000 133018
7 5 0 13.7500 146250
8 2 0 5.8900 149986
9 2 0 4.9900 149790
10 2 0 5.2900 149972
11 2 0 2.0000 149248
12 2 0 4.2000 149984
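If each flattened row should also keep track of which key of troRows it came from, the key can be emitted alongside each record; a sketch that works whether troRows is the dict from the question or the Series above (the column name 'key' is just illustrative):
df = pd.DataFrame(
    [{'key': k, **rec} for k, v in troRows.items() for rec in v]  # 'key' holds the original 0..8 key
)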
I have to analyze some complex data which is in a Pandas DataFrame. I am not aware of the exact structure of the data inside the dataframe. I have pulled the data from a JSON file and used head to look at the top-level data.
If I want to extract the group, manufacturer, or nutrients columns into a separate dataframe, how should I go about doing that so that I can do some statistical analysis?
with open("nutrients.json") as f:
objects = [json.loads(line) for line in f]
df = pd.DataFrame(objects)
print(df.head())
group manufacturer \
0 Dairy and Egg Products
1 Dairy and Egg Products
2 Dairy and Egg Products
3 Dairy and Egg Products
4 Dairy and Egg Products
meta \
0 {'langual': [], 'nitrogen_factor': '6.38', 're...
1 {'langual': [], 'nitrogen_factor': '6.38', 're...
2 {'langual': [], 'nitrogen_factor': '6.38', 're...
3 {'langual': [], 'nitrogen_factor': '6.38', 're...
4 {'langual': [], 'nitrogen_factor': '6.38', 're...
name \
0 {'long': 'Butter, salted', 'sci': '', 'common'...
1 {'long': 'Butter, whipped, with salt', 'sci': ...
2 {'long': 'Butter oil, anhydrous', 'sci': '', '...
3 {'long': 'Cheese, blue', 'sci': '', 'common': []}
4 {'long': 'Cheese, brick', 'sci': '', 'common':...
nutrients \
0 [{'code': '203', 'value': '0.85', 'units': 'g'...
1 [{'code': '203', 'value': '0.85', 'units': 'g'...
2 [{'code': '203', 'value': '0.28', 'units': 'g'...
3 [{'code': '203', 'value': '21.40', 'units': 'g...
4 [{'code': '203', 'value': '23.24', 'units': 'g...
portions
0 [{'g': '227', 'amt': '1', 'unit': 'cup'}, {'g'...
1 [{'g': '151', 'amt': '1', 'unit': 'cup'}, {'g'...
2 [{'g': '205', 'amt': '1', 'unit': 'cup'}, {'g'...
3 [{'g': '28.35', 'amt': '1', 'unit': 'oz'}, {'g...
4 [{'g': '132', 'amt': '1', 'unit': 'cup, diced'...
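No answer is attached here, but as a hedged sketch of one approach, assuming (as the head() output suggests) that every record has a nutrients list of dicts and a name dict with a 'long' key, pd.json_normalize can explode the nested nutrients into their own dataframe, while flat columns such as group and manufacturer can simply be selected:
import pandas as pd

# One row per nutrient, keeping each food's group and long name for later grouping.
nutrients = pd.json_normalize(
    objects,                        # the list of parsed JSON records loaded above
    record_path='nutrients',
    meta=['group', ['name', 'long']]
)
print(nutrients.head())

# The simple columns can be pulled straight from df for separate analysis.
groups = df[['group', 'manufacturer']]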