I'm trying to reindex a dataframe relative to the second level of an index. I have a dataframe where the first level of the index is user id and the second level is date. For example:
pd.DataFrame({
'id': 3*['A'] + 5*['B'] + 4*['C'],
'date': ['01-01-2010', '02-01-2010', '12-01-2010',
'04-01-2015', '05-01-2015', '03-01-2016', '04-01-2016', '05-01-2016',
'01-01-2015', '02-01-2015', '03-01-2015', '04-01-2015'],
'value': np.random.randint(10,100, 12)})\
.set_index(['id', 'date'])
I want to reindex the dates to fill in the missing dates, but only for the dates between the max and min dates for each "id" group.
For example user "A" should have continuous monthly data from January to December 2010 and user "B" should have continuous dates between April 2015 through May 2016. For simplicity let's assume I want to fill the NaNs with zeros.
Other questions similar to this assume that I want to use the same date_range for all users, which doesn't work in this use case. Any ideas?
I think you need reset_index + groupby + resample + asfreq + fillna:
np.random.seed(123)
df = pd.DataFrame({
'id': 3*['A'] + 5*['B'] + 4*['C'],
'date': ['01-01-2010', '02-01-2010', '12-01-2010',
'04-01-2015', '05-01-2015', '03-01-2016', '04-01-2016', '05-01-2016',
'01-01-2015', '02-01-2015', '03-01-2015', '04-01-2015'],
'value': np.random.randint(10,100, 12)})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['id', 'date'])
print (df)
value
id date
A 2010-01-01 76
2010-02-01 27
2010-12-01 93
B 2015-04-01 67
2015-05-01 96
2016-03-01 57
2016-04-01 83
2016-05-01 42
C 2015-01-01 56
2015-02-01 35
2015-03-01 93
2015-04-01 88
df1 = df.reset_index(level='id').groupby('id')['value'].resample('D').asfreq().fillna(0)
print (df1.head(10))
value
id date
A 2010-01-01 76.0
2010-01-02 0.0
2010-01-03 0.0
2010-01-04 0.0
2010-01-05 0.0
2010-01-06 0.0
2010-01-07 0.0
2010-01-08 0.0
2010-01-09 0.0
2010-01-10 0.0
But if need process only max and min dates first need select data with agg by idxmax
idxmin with loc:
df = df.reset_index()
df1 = df.loc[df.groupby('id')['date'].agg(['idxmin', 'idxmax']).stack()]
print (df1)
id date value
0 A 2010-01-01 76
2 A 2010-12-01 93
3 B 2015-04-01 67
7 B 2016-05-01 42
8 C 2015-01-01 56
11 C 2015-04-01 88
df1 = df1.set_index('date').groupby('id')['value'].resample('MS').asfreq().fillna(0)
print (df1.head(10))
Is that what you want?
In [52]: (df.reset_index().groupby('id')
...: .apply(lambda x: x.set_index('date').resample('D').mean().fillna(0))
...: )
Out[52]:
value
id date
A 2010-01-01 91.0
2010-01-02 0.0
2010-01-03 0.0
2010-01-04 0.0
2010-01-05 0.0
2010-01-06 0.0
2010-01-07 0.0
2010-01-08 0.0
2010-01-09 0.0
2010-01-10 0.0
... ...
C 2015-03-23 0.0
2015-03-24 0.0
2015-03-25 0.0
2015-03-26 0.0
2015-03-27 0.0
2015-03-28 0.0
2015-03-29 0.0
2015-03-30 0.0
2015-03-31 0.0
2015-04-01 11.0
[823 rows x 1 columns]
PS i have converted date to datetime dtype first...
use groupby and agg to get 'start' and 'end' dates and build set up tuples to reindex with.
m = dict(min='start', max='end')
df = df.reset_index().groupby('id').date.agg(['min', 'max']).rename(columns=m)
idx = [(i, d) for i, row in d2.iterrows() for d in pd.date_range(freq='MS', **row)]
df.reindex(idx, fill_value=0)
value
id date
A 2010-01-01 27
2010-02-01 15
2010-03-01 0
2010-04-01 0
2010-05-01 0
2010-06-01 0
2010-07-01 0
2010-08-01 0
2010-09-01 0
2010-10-01 0
2010-11-01 0
2010-12-01 11
B 2015-04-01 10
2015-05-01 94
2015-06-01 0
2015-07-01 0
2015-08-01 0
2015-09-01 0
2015-10-01 0
2015-11-01 0
2015-12-01 0
2016-01-01 0
2016-02-01 0
2016-03-01 42
2016-04-01 15
2016-05-01 71
C 2015-01-01 17
2015-02-01 51
2015-03-01 99
2015-04-01 58
Related
I have a text file which I read into a pandas DataFrame.
File A:
2009,7,1,3,101,13.03,89.33,0.6,287.69,0
2009,7,1,6,102,19.3,55,1,288.67,0
2009,7,1,9,103,22.33,39.67,1,289.6,0
2009,7,1,12,104,21.97,41,1,295.68,0
Read this into a DataFrame
>>> import pandas as pd
>>> from datetime import datetime as dtdt
>>> par3 = lambda x: dtdt.strptime(x, '%Y %m %d %H')
>>> df3=pd.read_csv('fileA.txt',header=None,parse_dates={'Date': [0,1,2,3]}, date_parser=par3, index_col='Date')
>>> df3
4 5 6 7 8 9
Date
2009-07-01 03:00:00 101 13.03 89.33 0.6 287.69 0
2009-07-01 06:00:00 102 19.30 55.00 1.0 288.67 0
2009-07-01 09:00:00 103 22.33 39.67 1.0 289.60 0
2009-07-01 12:00:00 104 21.97 41.00 1.0 295.68 0
Then, I have new data to be appended into df3 as a new row
bb = '2009-07-01 15:00:00'
cc = '105 18.11 44.55 1.2 300.12 0'
Question how do I append this new row to get
>>> new_df3
4 5 6 7 8 9
Date
2009-07-01 03:00:00 101 13.03 89.33 0.6 287.69 0
2009-07-01 06:00:00 102 19.30 55.00 1.0 288.67 0
2009-07-01 09:00:00 103 22.33 39.67 1.0 289.60 0
2009-07-01 12:00:00 104 21.97 41.00 1.0 295.68 0
2009-07-01 15:00:00 105 18.11 44.55 1.2 300.12 0
This How to append dictionary to DataFrame as a row? did not work in my case, I get either messy results or error messages.
I am aware of the docs https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html,
(join): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join
(merge): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge
but my head already has pretty much exploded. Solution would be very much appreciated.
pandas indexer can make row or column
df.loc[bb] = c.split()
I have contiguous periods of NaN values by code. I want to count NaN values from periods of contiguous NaN values by code, and also i want the start and end date of the contiguos period of NaN values.
df :
CODE TMIN
1998-01-01 00:00:00 12 2.5
1999-01-01 00:00:00 12 NaN
2000-01-01 00:00:00 12 NaN
2001-01-01 00:00:00 12 2.2
2002-01-01 00:00:00 12 NaN
1998-01-01 00:00:00 41 NaN
1999-01-01 00:00:00 41 NaN
2000-01-01 00:00:00 41 5.0
2001-01-01 00:00:00 41 9.0
2002-01-01 00:00:00 41 8.0
1998-01-01 00:00:00 52 2.0
1999-01-01 00:00:00 52 NaN
2000-01-01 00:00:00 52 NaN
2001-01-01 00:00:00 52 NaN
2002-01-01 00:00:00 52 1.0
1998-01-01 00:00:00 91 NaN
Expected results :
Start_Date End date CODE number of contiguous missing values
1999-01-01 00:00:00 2000-01-01 00:00:00 12 2
2002-01-01 00:00:00 2002-01-01 00:00:00 12 1
1998-01-01 00:00:00 1999-01-01 00:00:00 41 2
1999-01-01 00:00:00 2001-01-01 00:00:00 52 3
1998-01-01 00:00:00 1998-01-01 00:00:00 91 1
How can i solve this? Thanks!
You can try groupby the cumsum of non-null:
df['group'] = df.TMIN.notna().cumsum()
(df[df.TMIN.isna()]
.groupby(['group','CODE'])
.agg(Start_Date=('group', lambda x: x.index.min()),
End_Date=('group', lambda x: x.index.max()),
cont_missing=('TMIN', 'size')
)
)
Output:
Start_Date End_Date cont_missing
group CODE
1 12 1999-01-01 00:00:00 2000-01-01 00:00:00 2
2 12 2002-01-01 00:00:00 2002-01-01 00:00:00 1
41 1998-01-01 00:00:00 1999-01-01 00:00:00 2
6 52 1999-01-01 00:00:00 2001-01-01 00:00:00 3
7 91 1998-01-01 00:00:00 1998-01-01 00:00:00 1
I have two dataframes (df and df1) like as shown below
df = pd.DataFrame({'person_id': [101,101,101,101,202,202,202],
'start_date':['5/7/2013 09:27:00 AM','09/08/2013 11:21:00 AM','06/06/2014 08:00:00 AM', '06/06/2014 05:00:00 AM','12/11/2011 10:00:00 AM','13/10/2012 12:00:00 AM','13/12/2012 11:45:00 AM']})
df.start_date = pd.to_datetime(df.start_date)
df['end_date'] = df.start_date + timedelta(days=5)
df['enc_id'] = ['ABC1','ABC2','ABC3','ABC4','DEF1','DEF2','DEF3']
df1 = pd.DataFrame({'person_id': [101,101,101,101,101,101,101,202,202,202,202,202,202,202,202],'date_1':['07/07/2013 11:20:00 AM','05/07/2013 02:30:00 PM','06/07/2013 02:40:00 PM','08/06/2014 12:00:00 AM','11/06/2014 12:00:00 AM','02/03/2013 12:30:00 PM','13/06/2014 12:00:00 AM','12/11/2011 12:00:00 AM','13/10/2012 07:00:00 AM','13/12/2015 12:00:00 AM','13/12/2012 12:00:00 AM','13/12/2012 06:30:00 PM','13/07/2011 10:00:00 AM','18/12/2012 10:00:00 AM', '19/12/2013 11:00:00 AM']})
df1['date_1'] = pd.to_datetime(df1['date_1'])
df1['within_id'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF','DEF','DEF','DEF',np.nan]
What I would like to do is
a) Pick each person from df1 who doesnt have NA in 'within_id' column and check whether their date_1 is between (df.start_date - 1) and (df.end_date + 1) of the same person in df and for the same within_idor enc_id
ex: for subject = 101 and within_id = ABC, we have date_1 is 7/7/2013, you check whether they are between 4/7/2013 (df.start_date - 1) and 11/7/2013 (df.end_date + 1).
As the first-row comparison itself gave us the result, we don't have to compare our date_1 with rest of the records in df for subject 101. If not, we need to find/scan until we find the interval within which date_1 falls.
b) If date interval found, then assign the corresponding enc_id from df to the within_id in df1
c) If not then assign, "Out of Range"
I tried the below
t1 = df.groupby('person_id').apply(pd.DataFrame.sort_values, 'start_date')
t2 = df1.groupby('person_id').apply(pd.DataFrame.sort_values, 'date_1')
t3= pd.concat([t1, t2], axis=1)
t3['within_id'] = np.where((t3['date_1'] >= t3['start_date'] && t3['person_id'] == t3['person_id_x'] && t3['date_2'] >= t3['end_date']),enc_id]
I expect my output (also see 14th row at the bottom of my screenshot) to be as shown below. As I intend to apply the solution on big data (4/5 million records and there might be 5000-6000 unique person_ids), any efficient and elegant solution is helpful
14 202 2012-12-13 11:00:00 NA
Let's do:
d = df1.merge(df.assign(within_id=df['enc_id'].str[:3]),
on=['person_id', 'within_id'], how='left', indicator=True)
m = d['date_1'].between(d['start_date'] - pd.Timedelta(days=1),
d['end_date'] + pd.Timedelta(days=1))
d = df1.merge(d[m | d['_merge'].ne('both')], on=['person_id', 'date_1'], how='left')
d['within_id'] = d['enc_id'].fillna('out of range').mask(d['_merge'].eq('left_only'))
d = d[df1.columns]
Details:
Left merge the dataframe df1 with df on person_id and within_id:
print(d)
person_id date_1 within_id start_date end_date enc_id _merge
0 101 2013-07-07 11:20:00 ABC 2013-05-07 09:27:00 2013-05-12 09:27:00 ABC1 both
1 101 2013-07-07 11:20:00 ABC 2013-09-08 11:21:00 2013-09-13 11:21:00 ABC2 both
2 101 2013-07-07 11:20:00 ABC 2014-06-06 08:00:00 2014-06-11 08:00:00 ABC3 both
3 101 2013-07-07 11:20:00 ABC 2014-06-06 05:00:00 2014-06-11 10:00:00 DEF1 both
....
47 202 2012-12-18 10:00:00 DEF 2012-10-13 00:00:00 2012-10-18 00:00:00 DEF2 both
48 202 2012-12-18 10:00:00 DEF 2012-12-13 11:45:00 2012-12-18 11:45:00 DEF3 both
49 202 2013-12-19 11:00:00 NaN NaT NaT NaN left_only
Create a boolean mask m to represent the condition where date_1 is between df.start_date - 1 days and df.end_date + 1 days:
print(m)
0 False
1 False
2 False
3 False
...
47 False
48 True
49 False
dtype: bool
Again left merge the dataframe df1 with the dataframe filtered using mask m on columns person_id and date_1:
print(d)
person_id date_1 within_id_x within_id_y start_date end_date enc_id _merge
0 101 2013-07-07 11:20:00 ABC NaN NaT NaT NaN NaN
1 101 2013-05-07 14:30:00 ABC ABC 2013-05-07 09:27:00 2013-05-12 09:27:00 ABC1 both
2 101 2013-06-07 14:40:00 ABC NaN NaT NaT NaN NaN
3 101 2014-08-06 00:00:00 ABC NaN NaT NaT NaN NaN
4 101 2014-11-06 00:00:00 ABC NaN NaT NaT NaN NaN
5 101 2013-02-03 12:30:00 ABC NaN NaT NaT NaN NaN
6 101 2014-06-13 00:00:00 ABC NaN NaT NaT NaN NaN
7 202 2011-12-11 00:00:00 DEF DEF 2011-12-11 10:00:00 2011-12-16 10:00:00 DEF1 both
8 202 2012-10-13 07:00:00 DEF DEF 2012-10-13 00:00:00 2012-10-18 00:00:00 DEF2 both
9 202 2015-12-13 00:00:00 DEF NaN NaT NaT NaN NaN
10 202 2012-12-13 00:00:00 DEF DEF 2012-12-13 11:45:00 2012-12-18 11:45:00 DEF3 both
11 202 2012-12-13 18:30:00 DEF DEF 2012-12-13 11:45:00 2012-12-18 11:45:00 DEF3 both
12 202 2011-07-13 10:00:00 DEF NaN NaT NaT NaN NaN
13 202 2012-12-18 10:00:00 DEF DEF 2012-12-13 11:45:00 2012-12-18 11:45:00 DEF3 both
14 202 2013-12-19 11:00:00 NaN NaN NaT NaT NaN left_only
Populate the values in within_id column from enc_id and using Series.fillna fill the NaN excluding the ones that doesn't match from df with out of range, finally filter the columns to get the result:
print(d)
person_id date_1 within_id
0 101 2013-07-07 11:20:00 out of range
1 101 2013-05-07 14:30:00 ABC1
2 101 2013-06-07 14:40:00 out of range
3 101 2014-08-06 00:00:00 out of range
4 101 2014-11-06 00:00:00 out of range
5 101 2013-02-03 12:30:00 out of range
6 101 2014-06-13 00:00:00 out of range
7 202 2011-12-11 00:00:00 DEF1
8 202 2012-10-13 07:00:00 DEF2
9 202 2015-12-13 00:00:00 out of range
10 202 2012-12-13 00:00:00 DEF3
11 202 2012-12-13 18:30:00 DEF3
12 202 2011-07-13 10:00:00 out of range
13 202 2012-12-18 10:00:00 DEF3
14 202 2013-12-19 11:00:00 NaN
I used df and df1 as provided above.
The basic approach is to iterate over df1 and extract the matching values of enc_id.
I added a 'rule' column, to show how each value got populated.
Unfortunately, I was not able to reproduce the expected results. Perhaps the general approach will be useful.
df1['rule'] = 0
for t in df1.itertuples():
person = (t.person_id == df.person_id)
b = (t.date_1 >= df.start_date) & (t.date_2 <= df.end_date)
c = (t.date_1 >= df.start_date) & (t.date_2 >= df.end_date)
d = (t.date_1 <= df.start_date) & (t.date_2 <= df.end_date)
e = (t.date_1 <= df.start_date) & (t.date_2 <= df.start_date) # start_date at BOTH ends
if (m := person & b).any():
df1.at[t.Index, 'within_id'] = df.loc[m, 'enc_id'].values[0]
df1.at[t.Index, 'rule'] += 1
elif (m := person & c).any():
df1.at[t.Index, 'within_id'] = df.loc[m, 'enc_id'].values[0]
df1.at[t.Index, 'rule'] += 10
elif (m := person & d).any():
df1.at[t.Index, 'within_id'] = df.loc[m, 'enc_id'].values[0]
df1.at[t.Index, 'rule'] += 100
elif (m := person & e).any():
df1.at[t.Index, 'within_id'] = 'out of range'
df1.at[t.Index, 'rule'] += 1_000
else:
df1.at[t.Index, 'within_id'] = 'impossible!'
df1.at[t.Index, 'rule'] += 10_000
df1['within_id'] = df1['within_id'].astype('Int64')
The results are:
print(df1)
person_id date_1 date_2 within_id rule
0 11 1961-12-30 00:00:00 1962-01-01 00:00:00 11345678901 1
1 11 1962-01-30 00:00:00 1962-02-01 00:00:00 11345678902 1
2 12 1962-02-28 00:00:00 1962-03-02 00:00:00 34567892101 100
3 12 1989-07-29 00:00:00 1989-07-31 00:00:00 34567892101 1
4 12 1989-09-03 00:00:00 1989-09-05 00:00:00 34567892101 10
5 12 1989-10-02 00:00:00 1989-10-04 00:00:00 34567892103 1
6 12 1989-10-01 00:00:00 1989-10-03 00:00:00 34567892103 1
7 13 1999-03-29 00:00:00 1999-03-31 00:00:00 56432718901 1
8 13 1999-04-20 00:00:00 1999-04-22 00:00:00 56432718901 10
9 13 1999-06-02 00:00:00 1999-06-04 00:00:00 56432718904 1
10 13 1999-06-03 00:00:00 1999-06-05 00:00:00 56432718904 1
11 13 1999-07-29 00:00:00 1999-07-31 00:00:00 56432718905 1
12 14 2002-02-03 10:00:00 2002-02-05 10:00:00 24680135791 1
13 14 2002-02-03 10:00:00 2002-02-05 10:00:00 24680135791 1
If we can divide time of a day from 00:00:00 hrs to 23:59:00 into 15 min blocks we will have 96 blocks. we can number them from 0 to 95.
I want to add a "timeblock" column to the dataframe, where i can number each row with a timeblock number that time stamp sits in as shown below.
tagdatetime tagvalue timeblock
2020-01-01 00:00:00 47.874423 0
2020-01-01 00:01:00 14.913561 0
2020-01-01 00:02:00 56.368034 0
2020-01-01 00:03:00 16.555687 0
2020-01-01 00:04:00 42.138176 0
... ... ...
2020-01-01 00:13:00 47.874423 0
2020-01-01 00:14:00 14.913561 0
2020-01-01 00:15:00 56.368034 0
2020-01-01 00:16:00 16.555687 1
2020-01-01 00:17:00 42.138176 1
... ... ...
2020-01-01 23:55:00 18.550685 95
2020-01-01 23:56:00 51.219147 95
2020-01-01 23:57:00 15.098951 95
2020-01-01 23:58:00 37.863191 95
2020-01-01 23:59:00 51.380950 95
I think there's a better way to do it, but I think it's possible below.
import pandas as pd
import numpy as np
tindex = pd.date_range('2020-01-01 00:00:00', '2020-01-01 23:59:00', freq='min')
tvalue = np.random.randint(1,50, (1440,))
df = pd.DataFrame({'tagdatetime':tindex, 'tagvalue':tvalue})
min15 = pd.date_range('2020-01-01 00:00:00', '2020-01-01 23:59:00', freq='15min')
tblock = np.arange(96)
df2 = pd.DataFrame({'min15':min15, 'timeblock':tblock})
df3 = pd.merge(df, df2, left_on='tagdatetime', right_on='min15', how='outer')
df3.ffill(axis=0, inplace=True)
df3 = df3.drop('min15', axis=1)
df3.iloc[10:20,]
tagdatetime tagvalue timeblock
10 2020-01-01 00:10:00 20 0.0
11 2020-01-01 00:11:00 25 0.0
12 2020-01-01 00:12:00 42 0.0
13 2020-01-01 00:13:00 45 0.0
14 2020-01-01 00:14:00 11 0.0
15 2020-01-01 00:15:00 15 1.0
16 2020-01-01 00:16:00 38 1.0
17 2020-01-01 00:17:00 23 1.0
18 2020-01-01 00:18:00 5 1.0
19 2020-01-01 00:19:00 32 1.0
Currently I have a df with a location_key & year_month multiIndex. I want to create a sum using a rolling window for 3 months.
(pd.DataFrame(df.groupby(['LOCATION_KEY','YEAR_MONTH'])['SALES'].count()).sort_index()).groupby(level=(0)).apply(lambda x: x.rolling(window=3).sum())
The window is working properly the issue is that in months where there were no sales instead of counting an empty month instead it counts the another month.
e.g. in the data below, 2016-03 Sales is the sum of 2016-03, 2016-01, 2015-12 as opposed to what I would like: 2016-03, 2016-02, 2016-01.
LOCATION_KE YEAR_MONTH SALES
A 2015-10 NaN
2015-11 NaN
2015-12 200
2016-01 220
2016-03 180
B 2015-04 NaN
2015-05 NaN
2015-06 119
2015-07 120
Basically you have to get your index set up how you want so the rolling window has zeros to process.
df
LOCATION_KE YEAR_MONTH SALES
0 A 2015-10-01 NaN
1 A 2015-11-01 NaN
2 A 2015-12-01 200.0
3 A 2016-01-01 220.0
4 A 2016-03-01 180.0
5 B 2015-04-01 NaN
6 B 2015-05-01 NaN
7 B 2015-06-01 119.0
8 B 2015-07-01 120.0
df['SALES'] = df['SALES'].fillna(0)
df.index = [df["LOCATION_KE"], df["YEAR_MONTH"]]
df
LOCATION_KE YEAR_MONTH SALES
LOCATION_KE YEAR_MONTH
A 2015-10-01 A 2015-10-01 0.0
2015-11-01 A 2015-11-01 0.0
2015-12-01 A 2015-12-01 200.0
2016-01-01 A 2016-01-01 220.0
2016-03-01 A 2016-03-01 180.0
B 2015-04-01 B 2015-04-01 0.0
2015-05-01 B 2015-05-01 0.0
2015-06-01 B 2015-06-01 119.0
2015-07-01 B 2015-07-01 120.0
df = df.reindex(pd.MultiIndex.from_product([df['LOCATION_KE'],
pd.date_range("20150101", periods=24, freq='MS')],
names=['location', 'month']))
df['SALES'].fillna(0).reset_index(level=0).groupby('location').rolling(3).sum().fillna(0)
location SALES
location month
A 2015-01-01 A 0.0
2015-02-01 A 0.0
2015-03-01 A 0.0
2015-04-01 A 0.0
2015-05-01 A 0.0
2015-06-01 A 0.0
2015-07-01 A 0.0
2015-08-01 A 0.0
2015-09-01 A 0.0
2015-10-01 A 0.0
2015-11-01 A 0.0
2015-12-01 A 200.0
2016-01-01 A 420.0
2016-02-01 A 420.0
2016-03-01 A 400.0
2016-04-01 A 180.0
2016-05-01 A 180.0
2016-06-01 A 0.0
2016-07-01 A 0.0
2016-08-01 A 0.0
2016-09-01 A 0.0
2016-10-01 A 0.0
2016-11-01 A 0.0
2016-12-01 A 0.0
2015-01-01 A 0.0
2015-02-01 A 0.0
2015-03-01 A 0.0
2015-04-01 A 0.0
2015-05-01 A 0.0
2015-06-01 A 0.0
... ... ...
B 2016-07-01 B 0.0
2016-08-01 B 0.0
2016-09-01 B 0.0
2016-10-01 B 0.0
2016-11-01 B 0.0
2016-12-01 B 0.0
2015-01-01 B 0.0
2015-02-01 B 0.0
2015-03-01 B 0.0
2015-04-01 B 0.0
2015-05-01 B 0.0
2015-06-01 B 119.0
2015-07-01 B 239.0
2015-08-01 B 239.0
2015-09-01 B 120.0
2015-10-01 B 0.0
2015-11-01 B 0.0
2015-12-01 B 0.0
2016-01-01 B 0.0
2016-02-01 B 0.0
2016-03-01 B 0.0
2016-04-01 B 0.0
2016-05-01 B 0.0
2016-06-01 B 0.0
2016-07-01 B 0.0
2016-08-01 B 0.0
2016-09-01 B 0.0
2016-10-01 B 0.0
2016-11-01 B 0.0
2016-12-01 B 0.0
I think if you have a up to date pandas you can leave out the reset_index.