Sum the values in selected rows of a data frame - python

I have a list of dictionaries:
mylist = [{'Date': '01/02/2020', 'Value': '13'},
{'Date': '01/03/2020', 'Value': '2'},
{'Date': '10/3/2020', 'Value': '4'},
{'Date': '12/25/2020', 'Value': '2'}]
I want to sum the Value entries whose Date falls between 01/01/2020 and 01/04/2020. I tried the following to select the rows within the date range:
from datetime import datetime
dfmylist = pd.DataFrame(mylist)
dfmylist['Date'] = pd.to_datetime(dfmylist['Date'])
dfmylistnew = (dfmylist['Date'] > '01/01/2020') & (dfmylist['Date'] <= '01/04/2020')
dfmylistnew1 = dfmylist.loc[dfmylistnew]
dfmylistnew1
I got the output data frame:
Date Value
0 2020-01-02 13
1 2020-01-03 2
I want to get the sum Value from the above data frame, which is 15
I tried:
total = dfmylistnew1['Value'].sum()
but the output is 132, instead of 15

Starting from your data, convert the columns to the right types first:
mylist = [{'Date': '01/02/2020', 'Value': '13'},
{'Date': '01/03/2020', 'Value': '2'},
{'Date': '10/3/2020', 'Value': '4'},
{'Date': '12/25/2020', 'Value': '2'}]
df = pd.DataFrame(mylist).astype({'Date': 'datetime64[ns]', 'Value': 'int'})
total = df.loc[df['Date'].between('01/01/2020', '01/04/2020', inclusive='right'),
               'Value'].sum()
print(total)
# Output
15
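Why the original attempt printed 132: the Value column holds strings, so .sum() concatenates them instead of adding. A minimal demonstration:

```python
import pandas as pd

# 'Value' was stored as strings, so .sum() concatenates ('13' + '2' -> '132')
s = pd.Series(['13', '2'])
print(s.sum())              # concatenated string '132'
print(s.astype(int).sum())  # numeric addition after conversion: 15
```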

Related

How to distinct (count), group by and sum data in DataFrame in Python?

I have the next DataFrame:
a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)
print(df)
I need to count the distinct values in column order and put the result in a new column order_count, grouping by columns name and date and summing the values in column sum.
I need to get the next result:
In your case do
out = (df.groupby(['name','date'], as_index=False)
         .agg({'sum':'sum','order':'nunique'})
         .rename(columns={'order':'order_count'}))
print(out)
  name      date   sum  order_count
0    A  20220501  34.1            2
1    B  20220502  77.6            3
import pandas as pd
# Alternative: compute the grouped sum and the distinct-order count separately,
# then join them side by side (nunique, not count, so duplicate orders count once)
sums = df.groupby(['name','date'], as_index=False)['sum'].sum()
counts = (df.groupby(['name','date'])['order'].nunique()
            .reset_index(drop=True).rename('order_count'))
out = sums.join(counts)
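The same result can also be written with named aggregation (pandas >= 0.25), which produces the requested order_count column name in one step; a sketch using the question's data:

```python
import pandas as pd

a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)

# Named aggregation: each output column gets an explicit (source column, func)
# pair, so 'order_count' comes out with the right name directly
out = df.groupby(['name', 'date'], as_index=False).agg(
    sum=('sum', 'sum'),
    order_count=('order', 'nunique'))
print(out)
```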

unpack variable length dictionary from pandas column and create separate columns

I have a pandas dataframe in which one column custom consists of dictionaries within a list. The list may be empty or have one or more dictionary objects within it. For example:
id custom
1 []
2 [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'}, {'key': 'proposed_status','name': 'PROPOSED Status [temporary]', 'value': 'pqr', 'type': 'string'}]
3 [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]
I'm interested in extracting the data from the JSON into separate columns based on the dict keys named 'key' and 'value'!
for example: here, the output df will have additional columns impact and proposed_status:
id custom impact proposed_status
1 ... NA NA
2 ... abc pqr
3 ... xyz NA
Could the smart people of StackOverflow please guide me on the right way to solve this? Thanks!
The approach is in the comments
df = pd.DataFrame({'id': [1, 2, 3],
'custom': [[],
[{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'},
{'key': 'proposed_status',
'name': 'PROPOSED Status [temporary]',
'value': 'pqr',
'type': 'string'}],
[{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]]})
# expand out lists, reset_index() so join() will work
df2 = df.explode("custom").reset_index(drop=True)
# join to keep "id"
df2 = (df2.join(df2["custom"]
                # expand embedded dict
                .apply(pd.Series))
          .loc[:, ["id", "key", "value"]]
          # empty lists generate spurious NaN rows, remove them
          .dropna()
          # turn the key attribute into columns
          .set_index(["id", "key"]).unstack(1)
          # clean up the multi-index columns
          .droplevel(0, axis=1)
          # bring "id" back as a column so merge() can use it
          .reset_index())
df.merge(df2, on="id", how="left")
   id                                             custom impact proposed_status
0   1                                                 []    NaN             NaN
1   2  [{'key': 'impact', 'name': 'Impact', 'value'...    abc             pqr
2   3  [{'key': 'impact', 'name': 'Impact', 'value'...    xyz             NaN
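An alternative that avoids the explode/unstack machinery: build one plain {key: value} dict per row with a comprehension and let the DataFrame constructor expand it into columns. A sketch on the same data, assuming every inner dict has 'key' and 'value' entries:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'custom': [[],
                              [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'},
                               {'key': 'proposed_status', 'name': 'PROPOSED Status [temporary]',
                                'value': 'pqr', 'type': 'string'}],
                              [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]]})

# One {key: value} dict per row; empty lists become empty dicts -> all-NaN rows
extracted = pd.DataFrame([{d['key']: d['value'] for d in row} for row in df['custom']],
                         index=df.index)
out = df.join(extracted)
print(out)
```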

Python Pandas, how to group list of dict and sort

I have a list of dict like:
data = [
{'ID': '000681', 'type': 'B:G+', 'testA': '11'},
{'ID': '000682', 'type': 'B:G+', 'testA': '-'},
{'ID': '000683', 'type': 'B:G+', 'testA': '13'},
{'ID': '000684', 'type': 'B:G+', 'testA': '14'},
{'ID': '000681', 'type': 'B:G+', 'testB': '15'},
{'ID': '000682', 'type': 'B:G+', 'testB': '16'},
{'ID': '000683', 'type': 'B:G+', 'testB': '17'},
{'ID': '000684', 'type': 'B:G+', 'testB': '-'}
]
How to use Pandas to get data like:
data = [
{'ID': '000683', 'type': 'B:G+', 'testA': '13', 'testB': '17'},
{'ID': '000681', 'type': 'B:G+', 'testA': '11', 'testB': '15'},
{'ID': '000684', 'type': 'B:G+', 'testA': '14', 'testB': '-'},
{'ID': '000682', 'type': 'B:G+', 'testA': '-', 'testB': '16'}
]
Rows with the same ID and type should be combined into one row, and the result sorted by the testA and testB values.
sorted: rows where both testA and testB have a value, and with the larger testA+testB total, come first.
First convert the columns to numeric, coercing non-numeric values to NaN, and then aggregate with sum:
df = pd.DataFrame(data)
c = ['testA','testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))
df1 = df.groupby(['ID','type'])[c].sum(min_count=1).sort_values(c).fillna('-').reset_index()
print (df1)
ID type testA testB
0 000681 B:G+ 11 15
1 000683 B:G+ 13 17
2 000684 B:G+ 14 -
3 000682 B:G+ - 16
If you want to sort by the sum of both columns, use Series.argsort:
df = pd.DataFrame(data)
c = ['testA','testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))
df2 = df.groupby(['ID','type'])[c].sum(min_count=1)
df2 = df2.iloc[(-df2).sum(axis=1).argsort()].fillna('-').reset_index()
print (df2)
ID type testA testB
0 000683 B:G+ 13 17
1 000681 B:G+ 11 15
2 000682 B:G+ - 16
3 000684 B:G+ 14 -
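The argsort step can be hard to read; an equivalent sketch sorts on a temporary total column instead:

```python
import pandas as pd

data = [
    {'ID': '000681', 'type': 'B:G+', 'testA': '11'},
    {'ID': '000682', 'type': 'B:G+', 'testA': '-'},
    {'ID': '000683', 'type': 'B:G+', 'testA': '13'},
    {'ID': '000684', 'type': 'B:G+', 'testA': '14'},
    {'ID': '000681', 'type': 'B:G+', 'testB': '15'},
    {'ID': '000682', 'type': 'B:G+', 'testB': '16'},
    {'ID': '000683', 'type': 'B:G+', 'testB': '17'},
    {'ID': '000684', 'type': 'B:G+', 'testB': '-'},
]

df = pd.DataFrame(data)
c = ['testA', 'testB']
df[c] = df[c].apply(lambda x: pd.to_numeric(x, errors='coerce'))

df2 = df.groupby(['ID', 'type'], as_index=False)[c].sum(min_count=1)
# sum(axis=1) skips NaN, so rows missing a value sort by the value they do have
df2 = (df2.assign(total=df2[c].sum(axis=1))
          .sort_values('total', ascending=False)
          .drop(columns='total')
          .fillna('-')
          .reset_index(drop=True))
print(df2)
```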

Convert JSON API response to pandas Dataframe

I'm struggling to convert a JSON API response into a pandas Dataframe object. I've read answers to similar questions/documentation but nothing has helped. My closest attempt is below:
r = requests.get('https://api.xxx')
data = r.text
df = pd.read_json(data, orient='records')
Which returns the following format:
0 {'type': 'bid', 'price': 6.193e-05, ...},
1 {'type': 'bid', 'price': 6.194e-05, ...},
3 {'type': 'bid', 'price': 6.149e-05, ...} etc
The original format of the data is:
{'abc': [{'type': 'bid',
'price': 6.194e-05,
'amount': 2321.37952545,
'tid': 8577050,
'timestamp': 1498649162},
{'type': 'bid',
'price': 6.194e-05,
'amount': 498.78993587,
'tid': 8577047,
'timestamp': 1498649151},
...]}
I'm happy to be directed to good documentation.
I think you need json_normalize:
from pandas import json_normalize
import requests
r = requests.get('https://api.xxx')
data = r.json()  # parse the JSON body into a dict (r.text is just a string)
df = json_normalize(data, 'abc')
print (df)
amount price tid timestamp type
0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
For multiple keys, it is possible to use concat with a list comprehension and the DataFrame constructor:
d = {'abc': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545, 'tid': 8577050, 'timestamp': 1498649162}, {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587, 'tid': 8577047, 'timestamp': 1498649151}],
'def': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545, 'tid': 8577050, 'timestamp': 1498649162}, {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587, 'tid': 8577047, 'timestamp': 1498649151}]}
df = pd.concat([pd.DataFrame(v) for k,v in d.items()], keys=d)
print (df)
amount price tid timestamp type
abc 0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
def 0 2321.379525 0.000062 8577050 1498649162 bid
1 498.789936 0.000062 8577047 1498649151 bid
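To try json_normalize without hitting a live endpoint, the same call works on an in-memory dict shaped like the response; the payload below is made up for illustration:

```python
import pandas as pd

# Hypothetical payload mirroring the API response shape from the question
payload = {'abc': [{'type': 'bid', 'price': 6.194e-05, 'amount': 2321.37952545,
                    'tid': 8577050, 'timestamp': 1498649162},
                   {'type': 'bid', 'price': 6.194e-05, 'amount': 498.78993587,
                    'tid': 8577047, 'timestamp': 1498649151}]}

# record_path picks out the list of records under the 'abc' key
df = pd.json_normalize(payload, record_path='abc')
print(df)
```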

pandas dataframe convert values in array of objects

I want to convert the below pandas data frame
data = pd.DataFrame([[1,2], [5,6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name= 'Age Group'
print(data)
Age Group 10+ 20+
City
A 1 2
B 5 6
in to an array of dictionaries, like
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
I am able to get the above expected result using the following loops
result = []
cols_name = data.columns.name
index_names = data.index.name
for index in data.index:
    for col in data.columns:
        result.append({cols_name: col, index_names: index, 'count': data.loc[index, col]})
Is there any better way of doing this? Since my original data will have a large number of records, the for loops will take more time.
I think you can use stack with reset_index to reshape, and finally to_dict:
print (data.stack().reset_index(name='count'))
City Age Group count
0 A 10+ 1
1 A 20+ 2
2 B 10+ 5
3 B 20+ 6
print (data.stack().reset_index(name='count').to_dict(orient='records'))
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
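melt is another reshape that reaches the same records; note it emits rows in column-major order, so the ordering differs from the stack version. A sketch on the same frame:

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [5, 6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name = 'Age Group'

# melt un-pivots the age columns into (variable, value) pairs;
# var_name/value_name set the output column names explicitly
records = (data.reset_index()
               .melt(id_vars='City', var_name='Age Group', value_name='count')
               .to_dict(orient='records'))
print(records)
```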
