Sum the values in selected rows of a data frame - python

I have a list of dictionaries:
mylist = [{'Date': '01/02/2020', 'Value': '13'},
{'Date': '01/03/2020', 'Value': '2'},
{'Date': '10/3/2020', 'Value': '4'},
{'Date': '12/25/2020', 'Value': '2'}]
I want to sum the Value entries whose Date falls between 01/01/2020 and 01/04/2020. I tried the following to select the rows within the date range:
from datetime import datetime
dfmylist = pd.DataFrame(mylist)
dfmylist['Date'] = pd.to_datetime(dfmylist['Date'])
dfmylistnew = (dfmylist['Date'] > '01/01/2020') & (dfmylist['Date'] <= '01/04/2020')
dfmylistnew1 = dfmylist.loc[dfmylistnew]
dfmylistnew1
I got the output data frame:
Date Value
0 2020-01-02 13
1 2020-01-03 2
I want to get the sum Value from the above data frame, which is 15
I tried:
total = dfmylistnew1['Value'].sum()
but the output is 132, instead of 15

Starting from your data, convert the columns to the right types first:
mylist = [{'Date': '01/02/2020', 'Value': '13'},
{'Date': '01/03/2020', 'Value': '2'},
{'Date': '10/3/2020', 'Value': '4'},
{'Date': '12/25/2020', 'Value': '2'}]
df = pd.DataFrame(mylist).astype({'Date': 'datetime64[ns]', 'Value': 'int'})
total = df.loc[df['Date'].between('01/01/2020', '01/04/2020', inclusive='right'),
               'Value'].sum()
print(total)
# Output
15
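Why the original attempt printed 132: the Value column holds strings, so .sum() concatenates them instead of adding. A minimal demonstration:

```python
import pandas as pd

# 'Value' was stored as strings, so .sum() concatenates ('13' + '2' -> '132')
s = pd.Series(['13', '2'])
print(s.sum())              # concatenated string '132'
print(s.astype(int).sum())  # numeric addition after conversion: 15
```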

Related

How to distinct (count), group by and sum data in DataFrame in Python?

I have the next DataFrame:
a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)
print(df)
I need to count the distinct values in column order and put the result in a new column order_count, grouping by columns name and date and summing the values in column sum.
I need to get the next result:
In your case do
out = (df.groupby(['name','date'], as_index=False)
         .agg({'sum':'sum','order':'nunique'})
         .rename(columns={'order':'order_count'}))
print(out)
  name      date   sum  order_count
0    A  20220501  34.1            2
1    B  20220502  77.6            3
import pandas as pd
# Alternative: compute the grouped sum and the distinct-order count separately,
# then join them side by side (nunique, not count, so duplicate orders count once)
sums = df.groupby(['name','date'], as_index=False)['sum'].sum()
counts = (df.groupby(['name','date'])['order'].nunique()
            .reset_index(drop=True).rename('order_count'))
out = sums.join(counts)
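The same result can also be written with named aggregation (pandas >= 0.25), which produces the requested order_count column name in one step; a sketch using the question's data:

```python
import pandas as pd

a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)

# Named aggregation: each output column gets an explicit (source column, func)
# pair, so 'order_count' comes out with the right name directly
out = df.groupby(['name', 'date'], as_index=False).agg(
    sum=('sum', 'sum'),
    order_count=('order', 'nunique'))
print(out)
```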

unpack variable length dictionary from pandas column and create separate columns

I have a pandas dataframe in which one column custom consists of dictionaries within a list. The list may be empty or have one or more dictionary objects within it. For example:
id custom
1 []
2 [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'}, {'key': 'proposed_status','name': 'PROPOSED Status [temporary]', 'value': 'pqr', 'type': 'string'}]
3 [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]
I'm interested in extracting the data from the JSON into separate columns based on the dict keys named 'key' and 'value'!
for example: here, the output df will have additional columns impact and proposed_status:
id custom impact proposed_status
1 ... NA NA
2 ... abc pqr
3 ... xyz NA
Could the smart people of StackOverflow please guide me on the right way to solve this? Thanks!
The approach is in the comments
df = pd.DataFrame({'id': [1, 2, 3],
'custom': [[],
[{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'},
{'key': 'proposed_status',
'name': 'PROPOSED Status [temporary]',
'value': 'pqr',
'type': 'string'}],
[{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]]})
# expand out lists, reset_index() so join() will work
df2 = df.explode("custom").reset_index(drop=True)
# join to keep "id"
df2 = (df2.join(df2["custom"]
                # expand embedded dict
                .apply(pd.Series))
          .loc[:, ["id", "key", "value"]]
          # empty lists generate spurious NaN rows, remove them
          .dropna()
          # turn the key attribute into columns
          .set_index(["id", "key"]).unstack(1)
          # clean up the multi-index columns
          .droplevel(0, axis=1)
          # bring "id" back as a column so merge() can use it
          .reset_index())
df.merge(df2, on="id", how="left")
   id                                             custom impact proposed_status
0   1                                                 []    NaN             NaN
1   2  [{'key': 'impact', 'name': 'Impact', 'value'...    abc             pqr
2   3  [{'key': 'impact', 'name': 'Impact', 'value'...    xyz             NaN
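An alternative that avoids the explode/unstack machinery: build one plain {key: value} dict per row with a comprehension and let the DataFrame constructor expand it into columns. A sketch on the same data, assuming every inner dict has 'key' and 'value' entries:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'custom': [[],
                              [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'},
                               {'key': 'proposed_status', 'name': 'PROPOSED Status [temporary]',
                                'value': 'pqr', 'type': 'string'}],
                              [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]]})

# One {key: value} dict per row; empty lists become empty dicts -> all-NaN rows
extracted = pd.DataFrame([{d['key']: d['value'] for d in row} for row in df['custom']],
                         index=df.index)
out = df.join(extracted)
print(out)
```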

Python Pandas, how to group list of dict and sort

I have a list of dict like:
data = [
{'ID': '000681', 'type': 'B:G+', 'testA': '11'},
{'ID': '000682', 'type': 'B:G+', 'testA': '-'},
{'ID': '000683', 'type': 'B:G+', 'testA': '13'},
{'ID': '000684', 'type': 'B:G+', 'testA': '14'},
{'ID': '000681', 'type': 'B:G+', 'testB': '15'},
{'ID': '000682', 'type': 'B:G+', 'testB': '16'},
{'ID': '000683', 'type': 'B:G+', 'testB': '17'},
{'ID': '000684', 'type': 'B:G+', 'testB': '-'}
]
How to use Pandas to get data like:
data = [
{'ID': '000683', 'type': 'B:G+', 'testA': '13', 'testB': '17'},
{'ID': '000681', 'type': 'B:G+', 'testA': '11', 'testB': '15'},
{'ID': '000684', 'type': 'B:G+', 'testA': '14', 'testB': '-'},
{'ID': '000682', 'type': 'B:G+', 'testA': '-', 'testB': '16'}
]
Rows with the same ID and type should be combined into one row, and the result sorted by the testA and testB values.
sorted: rows where both testA and testB have a value, and with the larger testA+testB total, come first.
First convert the columns to numeric, coercing non-numeric values to NaN, and then aggregate with sum:
df = pd.DataFrame(data)
c = ['testA','testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))
df1 = df.groupby(['ID','type'])[c].sum(min_count=1).sort_values(c).fillna('-').reset_index()
print (df1)
ID type testA testB
0 000681 B:G+ 11 15
1 000683 B:G+ 13 17
2 000684 B:G+ 14 -
3 000682 B:G+ - 16
If you want to sort by the sum of both columns, use Series.argsort:
df = pd.DataFrame(data)
c = ['testA','testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))
df2 = df.groupby(['ID','type'])[c].sum(min_count=1)
df2 = df2.iloc[(-df2).sum(axis=1).argsort()].fillna('-').reset_index()
print (df2)
ID type testA testB
0 000683 B:G+ 13 17
1 000681 B:G+ 11 15
2 000682 B:G+ - 16
3 000684 B:G+ 14 -
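The argsort step can be hard to read; an equivalent sketch sorts on a temporary total column instead:

```python
import pandas as pd

data = [
    {'ID': '000681', 'type': 'B:G+', 'testA': '11'},
    {'ID': '000682', 'type': 'B:G+', 'testA': '-'},
    {'ID': '000683', 'type': 'B:G+', 'testA': '13'},
    {'ID': '000684', 'type': 'B:G+', 'testA': '14'},
    {'ID': '000681', 'type': 'B:G+', 'testB': '15'},
    {'ID': '000682', 'type': 'B:G+', 'testB': '16'},
    {'ID': '000683', 'type': 'B:G+', 'testB': '17'},
    {'ID': '000684', 'type': 'B:G+', 'testB': '-'},
]

df = pd.DataFrame(data)
c = ['testA', 'testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))

df2 = df.groupby(['ID', 'type'], as_index=False)[c].sum(min_count=1)
# sum(axis=1) skips NaN, so rows missing a value sort by the value they do have
df2 = (df2.assign(total=df2[c].sum(axis=1))
          .sort_values('total', ascending=False)
          .drop(columns='total')
          .fillna('-')
          .reset_index(drop=True))
print(df2)
```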

Convert JSON API response to pandas Dataframe

I'm struggling to convert a JSON API response into a pandas Dataframe object. I've read answers to similar questions/documentation but nothing has helped. My closest attempt is below:
r = requests.get('https://api.xxx')
data = r.text
df = pd.read_json(data, orient='records')
Which returns the following format:
0 {'type': 'bid', 'price': 6.193e-05, ...},
1 {'type': 'bid', 'price': 6.194e-05, ...},
3 {'type': 'bid', 'price': 6.149e-05, ...} etc
The original format of the data is:
{'abc': [{'type': 'bid',
'price': 6.194e-05,
'amount': 2321.37952545,
'tid': 8577050,
'timestamp': 1498649162},
{'type': 'bid',
'price': 6.194e-05,
'amount': 498.78993587,
'tid': 8577047,
'timestamp': 1498649151},
...]}
I'm happy to be directed to good documentation.
I think you need json_normalize:
from pandas import json_normalize
import requests
r = requests.get('https://api.xxx')
data = r.json()  # parse the JSON body into a dict (r.text is just a string)
df = json_normalize(data, 'abc')
print (df)
amount price tid timestamp type
0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
For multiple keys, it is possible to use concat with a list comprehension and the DataFrame constructor:
d = {'abc': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545, 'tid': 8577050, 'timestamp': 1498649162}, {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587, 'tid': 8577047, 'timestamp': 1498649151}],
'def': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545, 'tid': 8577050, 'timestamp': 1498649162}, {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587, 'tid': 8577047, 'timestamp': 1498649151}]}
df = pd.concat([pd.DataFrame(v) for k,v in d.items()], keys=d)
print (df)
amount price tid timestamp type
abc 0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
def 0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
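To try json_normalize without hitting a live endpoint, the same call works on an in-memory dict shaped like the response; the payload below is made up for illustration:

```python
import pandas as pd

# Hypothetical payload mirroring the API response shape from the question
payload = {'abc': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545,
                    'tid': 8577050, 'timestamp': 1498649162},
                   {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587,
                    'tid': 8577047, 'timestamp': 1498649151}]}

# record_path picks out the list of records under the 'abc' key
df = pd.json_normalize(payload, record_path='abc')
print(df)
```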

pandas dataframe convert values in array of objects

I want to convert the below pandas data frame
data = pd.DataFrame([[1,2], [5,6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name= 'Age Group'
print(data)
Age Group 10+ 20+
City
A 1 2
B 5 6
in to an array of dictionaries, like
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
I am able to get the above expected result using the following loops
result = []
cols_name = data.columns.name
index_names = data.index.name
for index in data.index:
    for col in data.columns:
        result.append({cols_name: col, index_names: index, 'count': data.loc[index, col]})
Is there any better way of doing this? Since my original data will have a large number of records, the for loops will take more time.
I think you can use stack with reset_index to reshape, and finally to_dict:
print (data.stack().reset_index(name='count'))
City Age Group count
0 A 10+ 1
1 A 20+ 2
2 B 10+ 5
3 B 20+ 6
print (data.stack().reset_index(name='count').to_dict(orient='records'))
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
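melt is another reshape that reaches the same records; note it emits rows in column-major order, so the ordering differs from the stack version. A sketch on the same frame:

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [5, 6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name = 'Age Group'

# melt un-pivots the age columns into (variable, value) pairs;
# var_name/value_name set the output column names explicitly
records = (data.reset_index()
               .melt(id_vars='City', var_name='Age Group', value_name='count')
               .to_dict(orient='records'))
print(records)
```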
