How to create a dataframe by randomly selecting from another dataframe? - python
DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
(0.519) (1.117) (1.152) 0.772 1.490 (0.850) (1.189) (0.759)
0.030 0.047 0.632 (0.608) (0.322) 0.939 0.346 0.651
1.290 (0.179) 0.006 0.850 (1.141) 0.758 0.682
1.500 (1.228) 1.840 (1.594) (0.282) (0.907)
(1.540) 0.689 (0.683) 0.005 0.543
(0.197) (0.664) (0.636) 0.878
(0.942) 0.764 (0.137)
0.693 1.647
0.197
I have the above dataframe. I need to build the dataframe below, where each cell is a random value drawn from the dataframe above:
DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
(0.664) 1.290 0.682 0.030 (0.683) (0.636) (0.683) 1.840 (1.540)
1.490 (0.907) (0.850) (0.197) (1.228) 0.682 1.290 0.939
0.047 0.682 0.346 0.689 (0.137) 1.490 0.197
0.047 0.878 0.651 0.047 0.047 (0.197)
(1.141) 0.758 0.878 1.490 0.651
1.647 1.490 0.772 1.490
(0.519) 0.693 0.346
(0.137) 0.850
0.197
I've tried this code:

df2 = df1.sample(len(df1))
print(df2)

But sample() only draws whole rows, so the result is just the original rows in shuffled order (note the permuted index) rather than a resampling of individual values:
DP1 DP2 DP3 DP4 DP5 DP6 DP7 DP8 DP9
OP8 0.735590 1.762630 NaN NaN NaN NaN NaN NaN NaN
OP7 -0.999665 0.817949 -0.147698 NaN NaN NaN NaN NaN NaN
OP2 0.031430 0.049994 0.682040 -0.667445 -0.360034 1.089516 0.426642 0.916619 NaN
OP3 1.368955 -0.191781 0.006623 0.932736 -1.277548 0.880056 0.841018 NaN NaN
OP1 -0.551065 -1.195305 -1.243199 0.847178 1.668630 -0.986300 -1.465904 -1.069986 NaN
OP4 1.592201 -1.314628 1.985683 -1.749389 -0.315828 -1.052629 NaN NaN NaN
OP6 -0.208647 -0.710424 -0.686654 0.963221 NaN NaN NaN NaN NaN
OP10 NaN NaN NaN NaN NaN NaN NaN NaN NaN
OP9 0.209244 NaN NaN NaN NaN NaN NaN NaN NaN
OP5 -1.635306 0.737937 -0.736907 0.005545 0.607974 NaN NaN NaN NaN
You can use np.random.choice() for the sampling.
Assuming df is something like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'DP 1': ['(0.519)', '0.030', '1.290', '1.500', '(1.540)', '(0.197)', '(0.942)', '0.693', '0.197'],
    'DP 2': ['(1.117)', '0.047', '(0.179)', '(1.228)', '0.689', '(0.664)', '0.764', '1.647', np.nan],
    'DP 3': ['(1.152)', '0.632', '0.006', '1.840', '(0.683)', '(0.636)', '(0.137)', np.nan, np.nan],
    'DP 4': ['0.772', '(0.608)', '0.850', '(1.594)', '0.005', '0.878', np.nan, np.nan, np.nan],
    'DP 5': ['1.490', '(0.322)', '(1.141)', '(0.282)', '0.543', np.nan, np.nan, np.nan, np.nan],
    'DP 6': ['(0.850)', '0.939', '0.758', '(0.907)', np.nan, np.nan, np.nan, np.nan, np.nan],
    'DP 7': ['(1.189)', '0.346', '0.682', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'DP 8': ['(0.759)', '0.651', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'DP 9': [np.nan] * 9,
    'DP 10': [np.nan] * 9,
})
# DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
# 0 (0.519) (1.117) (1.152) 0.772 1.490 (0.850) (1.189) (0.759) NaN NaN
# 1 0.030 0.047 0.632 (0.608) (0.322) 0.939 0.346 0.651 NaN NaN
# 2 1.290 (0.179) 0.006 0.850 (1.141) 0.758 0.682 NaN NaN NaN
# 3 1.500 (1.228) 1.840 (1.594) (0.282) (0.907) NaN NaN NaN NaN
# 4 (1.540) 0.689 (0.683) 0.005 0.543 NaN NaN NaN NaN NaN
# 5 (0.197) (0.664) (0.636) 0.878 NaN NaN NaN NaN NaN NaN
# 6 (0.942) 0.764 (0.137) NaN NaN NaN NaN NaN NaN NaN
# 7 0.693 1.647 NaN NaN NaN NaN NaN NaN NaN NaN
# 8 0.197 NaN NaN NaN NaN NaN NaN NaN NaN NaN
First extract the choices from all non-null values of df:
choices = df.values[~pd.isnull(df.values)]
# array(['(0.519)', '(1.117)', '(1.152)', '0.772', '1.490', '(0.850)',
# '(1.189)', '(0.759)', '0.030', '0.047', '0.632', '(0.608)',
# '(0.322)', '0.939', '0.346', '0.651', '1.290', '(0.179)', '0.006',
# '0.850', '(1.141)', '0.758', '0.682', '1.500', '(1.228)', '1.840',
# '(1.594)', '(0.282)', '(0.907)', '(1.540)', '0.689', '(0.683)',
# '0.005', '0.543', '(0.197)', '(0.664)', '(0.636)', '0.878',
# '(0.942)', '0.764', '(0.137)', '0.693', '1.647', '0.197'],
# dtype=object)
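As an aside (not in the original answer), an equivalent way to build the same pool is via stack(); the explicit dropna() keeps this correct across pandas versions, since newer stack() implementations no longer drop NaN by default:

choices = df.stack().dropna().to_numpy()   # same 1-D pool of all non-null values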
Then take an np.random.choice() from choices for every non-null cell:
df = df.applymap(lambda x: np.random.choice(choices) if not pd.isnull(x) else x)
# DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
# 0 (0.179) 0.682 0.758 (1.152) (0.137) (1.152) 0.939 (0.759) NaN NaN
# 1 1.500 (1.152) (0.197) 0.772 1.840 1.840 0.772 (0.850) NaN NaN
# 2 0.878 0.005 (1.540) 0.764 (0.519) 0.682 (1.152) NaN NaN NaN
# 3 0.758 (0.137) 1.840 1.647 1.647 (0.942) NaN NaN NaN NaN
# 4 0.693 (0.683) (0.759) 1.500 (0.197) NaN NaN NaN NaN NaN
# 5 0.006 (0.137) 0.764 (1.117) NaN NaN NaN NaN NaN NaN
# 6 (0.664) 0.632 (1.141) NaN NaN NaN NaN NaN NaN NaN
# 7 0.543 (0.664) NaN NaN NaN NaN NaN NaN NaN NaN
# 8 (0.137) NaN NaN NaN NaN NaN NaN NaN NaN NaN
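A portability note: applymap() was deprecated in pandas 2.1 in favour of the elementwise DataFrame.map(). A minimal end-to-end sketch of the same two steps on a recent pandas, with a seed added only so the draw is reproducible:

import numpy as np
import pandas as pd

np.random.seed(0)                           # optional: make the random draw reproducible

choices = df.values[~pd.isnull(df.values)]  # pool of all non-null values
# DataFrame.map (pandas >= 2.1) is the replacement for applymap
df2 = df.map(lambda x: np.random.choice(choices) if not pd.isnull(x) else x)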
Related
DataFrame.fillna method - filling the NaN values with df.mean(axis=1)
Hi, I am trying to fill my dataframe's NaN values through the fillna method. After applying fillna with value=df.mean(axis=1), I am still getting some NaN values in some columns. Can anyone explain how it is filling up the NaN values?
Try:

df.fillna(df.mean())

This fills every NaN with the mean of its column. (Passing df.mean(axis=1) does not do what you expect: a Series handed to fillna() is aligned against the column labels, not the row labels, so only columns whose names happen to match row labels get filled, which is why some NaN remain.)

Given df:

       0      1      2      3      4
0  804.0  271.0  690.0  401.0  158.0
1  352.0  995.0  770.0  616.0  791.0
2  381.0  824.0   61.0  152.0    NaN
3  907.0  607.0    NaN  488.0  180.0
4  981.0  938.0  378.0  957.0  176.0
5    NaN    NaN    NaN    NaN    NaN

Output:

       0      1       2      3       4
0  804.0  271.0  690.00  401.0  158.00
1  352.0  995.0  770.00  616.0  791.00
2  381.0  824.0   61.00  152.0  326.25
3  907.0  607.0  474.75  488.0  180.00
4  981.0  938.0  378.00  957.0  176.00
5  685.0  727.0  474.75  522.8  326.25
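If the intent really was row means, a common workaround (a sketch, not part of the original answer) is to transpose, fill, and transpose back, since after .T the original rows become columns and the row-means Series then aligns correctly:

# fill each NaN with the mean of its own row
filled = df.T.fillna(df.mean(axis=1)).T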
Calculating multiple columns using Pandas' mask() and diff() with multiple conditions
I have a dataframe df:

Date     Type  AVG1  AVG2  AVG3  AVG4  AVG5
2022-05  ROL1  0.33  0.45  0.12  0.96  1.33
2022-05  ROL2  1.43  0.11  0.75  1.99  3.01
2022-05  ROL3  0.11  0.32  0.55  1.26  4.22
2022-04  ROL1  1.66  0.71  0.87  5.88  1.11
2022-04  ROL2  2.31  0.89  2.20  4.36  4.87
2022-04  ROL3  5.40  1.22  4.45  0.01  0.31

And I need to create the columns AVG1_ROL1_MoM, AVG1_ROL2_MoM, AVG3_ROL1_MoM, and so on, where AVG1_ROL1_MoM is the difference in AVG1 where Type = ROL1 from one month to the next:

Date     Type  AVG1  AVG2  AVG3  AVG4  AVG5  AVG1_ROL1_MoM  AVG1_ROL2_MoM
2022-05  ROL1  0.33  0.45  0.12  0.96  1.33          -1.33            NaN
2022-05  ROL2  1.43  0.11  0.75  1.99  3.01            NaN          -0.88
2022-05  ROL3  0.11  0.32  0.55  1.26  4.22            NaN            NaN
2022-04  ROL1  1.66  0.71  0.87  5.88  1.11            NaN            NaN
2022-04  ROL2  2.31  0.89  2.20  4.36  4.87            NaN            NaN
2022-04  ROL3  5.40  1.22  4.45  0.01  0.31            NaN            NaN

I tried to do that with mask() and shift(), but it didn't work:

df['AVG1_ROL1_MoM'] = df.mask(df['Type']=="ROL1", df['AVG1'] - df['AVG1'].shift(), inplace=True)

This complains that an axis must be defined, but when I define an axis it returns: "Cannot do inplace boolean setting on mixed-types with a non np.nan value". What would be the best approach for this?
The approach:

1. melt the dataframe to get all the values in a single column
2. create the new column names
3. groupby to find the monthly differences
4. pivot to get back the original structure
5. merge with the original dataframe

melted = df.melt(["Date","Type"])
melted["column"] = melted["variable"]+"_"+melted["Type"]+"_MoM"
melted["diff"] = melted.groupby(["Type","variable"])["value"].diff(-1)
pivoted = melted.pivot(["Date","Type"],"column","diff").sort_index(ascending=[False,True]).reset_index()
output = df.merge(pivoted, on=["Date","Type"])

>>> output
      Date  Type  AVG1  ...  AVG5_ROL1_MoM  AVG5_ROL2_MoM  AVG5_ROL3_MoM
0  2022-05  ROL1  0.33  ...           0.22            NaN            NaN
1  2022-05  ROL2  1.43  ...            NaN          -1.86            NaN
2  2022-05  ROL3  0.11  ...            NaN            NaN           3.91
3  2022-04  ROL1  1.66  ...            NaN            NaN            NaN
4  2022-04  ROL2  2.31  ...            NaN            NaN            NaN
5  2022-04  ROL3  5.40  ...            NaN            NaN            NaN

[6 rows x 22 columns]
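A version note on the answer above: since pandas 2.0, DataFrame.pivot() takes keyword-only arguments, so on recent versions the pivot step would be spelled as follows (same logic, only the call style changes):

pivoted = (melted.pivot(index=["Date","Type"], columns="column", values="diff")
                 .sort_index(ascending=[False,True])
                 .reset_index())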
IIUC, you can try grouping by the Type column, then subtract each subgroup's shifted AVG values and rename the outcome columns:

out = (df.filter(like='AVG')
         .groupby(df['Type'])
         .apply(lambda g: (g - g.shift(-1)).rename(columns=lambda col: f'{col}_{g.name}_MOM'))
      )
print(out)

   AVG1_ROL1_MOM  AVG2_ROL1_MOM  AVG3_ROL1_MOM  AVG4_ROL1_MOM  AVG5_ROL1_MOM  \
0          -1.33          -0.26          -0.75          -4.92           0.22
1            NaN            NaN            NaN            NaN            NaN
2            NaN            NaN            NaN            NaN            NaN
3            NaN            NaN            NaN            NaN            NaN
4            NaN            NaN            NaN            NaN            NaN
5            NaN            NaN            NaN            NaN            NaN

   AVG1_ROL2_MOM  AVG2_ROL2_MOM  AVG3_ROL2_MOM  AVG4_ROL2_MOM  AVG5_ROL2_MOM  \
0            NaN            NaN            NaN            NaN            NaN
1          -0.88          -0.78          -1.45          -2.37          -1.86
2            NaN            NaN            NaN            NaN            NaN
3            NaN            NaN            NaN            NaN            NaN
4            NaN            NaN            NaN            NaN            NaN
5            NaN            NaN            NaN            NaN            NaN

   AVG1_ROL3_MOM  AVG2_ROL3_MOM  AVG3_ROL3_MOM  AVG4_ROL3_MOM  AVG5_ROL3_MOM
0            NaN            NaN            NaN            NaN            NaN
1            NaN            NaN            NaN            NaN            NaN
2          -5.29           -0.9           -3.9           1.25           3.91
3            NaN            NaN            NaN            NaN            NaN
4            NaN            NaN            NaN            NaN            NaN
5            NaN            NaN            NaN            NaN            NaN

out = pd.concat([df, out], axis=1)
print(out)

      Date  Type  AVG1  AVG2  AVG3  AVG4  AVG5  AVG1_ROL1_MOM  AVG2_ROL1_MOM  \
0  2022-05  ROL1  0.33  0.45  0.12  0.96  1.33          -1.33          -0.26
1  2022-05  ROL2  1.43  0.11  0.75  1.99  3.01            NaN            NaN
2  2022-05  ROL3  0.11  0.32  0.55  1.26  4.22            NaN            NaN
3  2022-04  ROL1  1.66  0.71  0.87  5.88  1.11            NaN            NaN
4  2022-04  ROL2  2.31  0.89  2.20  4.36  4.87            NaN            NaN
5  2022-04  ROL3  5.40  1.22  4.45  0.01  0.31            NaN            NaN

   AVG3_ROL1_MOM  AVG4_ROL1_MOM  AVG5_ROL1_MOM  AVG1_ROL2_MOM  AVG2_ROL2_MOM  \
0          -0.75          -4.92           0.22            NaN            NaN
1            NaN            NaN            NaN          -0.88          -0.78
2            NaN            NaN            NaN            NaN            NaN
3            NaN            NaN            NaN            NaN            NaN
4            NaN            NaN            NaN            NaN            NaN
5            NaN            NaN            NaN            NaN            NaN

   AVG3_ROL2_MOM  AVG4_ROL2_MOM  AVG5_ROL2_MOM  AVG1_ROL3_MOM  AVG2_ROL3_MOM  \
0            NaN            NaN            NaN            NaN            NaN
1          -1.45          -2.37          -1.86            NaN            NaN
2            NaN            NaN            NaN          -5.29           -0.9
3            NaN            NaN            NaN            NaN            NaN
4            NaN            NaN            NaN            NaN            NaN
5            NaN            NaN            NaN            NaN            NaN

   AVG3_ROL3_MOM  AVG4_ROL3_MOM  AVG5_ROL3_MOM
0            NaN            NaN            NaN
1            NaN            NaN            NaN
2           -3.9           1.25           3.91
3            NaN            NaN            NaN
4            NaN            NaN            NaN
5            NaN            NaN            NaN
Calculate maximum difference of rolling interval of n columns
I have a dataset df:

                         Time   Spot  Ubalance
0   2017-01-01T00:00:00+01:00  20.96       NaN
1   2017-01-01T01:00:00+01:00  20.90     29.40
2   2017-01-01T02:00:00+01:00  18.13     24.73
3   2017-01-01T03:00:00+01:00  16.03     24.73
4   2017-01-01T04:00:00+01:00  16.43     27.89
5   2017-01-01T05:00:00+01:00  13.75     28.26
6   2017-01-01T06:00:00+01:00  11.10     30.43
7   2017-01-01T07:00:00+01:00  15.47     32.85
8   2017-01-01T08:00:00+01:00  16.88     33.91
9   2017-01-01T09:00:00+01:00  21.81     28.58
10  2017-01-01T10:00:00+01:00  26.24     28.58

I want to generate a series/dataframe with the difference between the highest and lowest value of the last n rows, taken across multiple columns. For the last 10 rows above, that maximum difference is 33.91 (the highest value, found in Ubalance) - 11.10 (the lowest, in Spot) = 22.81. I've tried .rolling(), but it apparently does not offer a difference method. Expected outcome:

                         Time   Spot  Ubalance   Diff
0   2017-01-01T00:00:00+01:00  20.96       NaN    NaN
1   2017-01-01T01:00:00+01:00  20.90     29.40    NaN
2   2017-01-01T02:00:00+01:00  18.13     24.73    NaN
3   2017-01-01T03:00:00+01:00  16.03     24.73    NaN
4   2017-01-01T04:00:00+01:00  16.43     27.89    NaN
5   2017-01-01T05:00:00+01:00  13.75     28.26    NaN
6   2017-01-01T06:00:00+01:00  11.10     30.43    NaN
7   2017-01-01T07:00:00+01:00  15.47     32.85    NaN
8   2017-01-01T08:00:00+01:00  16.88     33.91    NaN
9   2017-01-01T09:00:00+01:00  21.81     28.58    NaN
10  2017-01-01T10:00:00+01:00  26.24     28.58  22.81
Use Rolling.aggregate and then subtract:

df1 = df['Spot'].rolling(10).agg(['min','max'])
print(df1)
     min    max
0    NaN    NaN
1    NaN    NaN
2    NaN    NaN
3    NaN    NaN
4    NaN    NaN
5    NaN    NaN
6    NaN    NaN
7    NaN    NaN
8    NaN    NaN
9   11.1  21.81
10  11.1  26.24

df['dif'] = df1['max'].sub(df1['min'])
print(df)
                         Time   Spot  Ubalance    dif
0   2017-01-01T00:00:00+01:00  20.96       NaN    NaN
1   2017-01-01T01:00:00+01:00  20.90     29.40    NaN
2   2017-01-01T02:00:00+01:00  18.13     24.73    NaN
3   2017-01-01T03:00:00+01:00  16.03     24.73    NaN
4   2017-01-01T04:00:00+01:00  16.43     27.89    NaN
5   2017-01-01T05:00:00+01:00  13.75     28.26    NaN
6   2017-01-01T06:00:00+01:00  11.10     30.43    NaN
7   2017-01-01T07:00:00+01:00  15.47     32.85    NaN
8   2017-01-01T08:00:00+01:00  16.88     33.91    NaN
9   2017-01-01T09:00:00+01:00  21.81     28.58  10.71
10  2017-01-01T10:00:00+01:00  26.24     28.58  15.14

Or a custom function with lambda:

df['diff'] = df['Spot'].rolling(10).agg(lambda x: x.max() - x.min())

EDIT: For processing all columns from a list, use:

cols = ['Spot','Ubalance']
N = 10
df['dif'] = (df[cols].stack(dropna=False)
                     .rolling(len(cols) * N)
                     .agg(lambda x: x.max() - x.min())
                     .groupby(level=0)
                     .max())
print(df)
                         Time   Spot  Ubalance    dif
0   2017-01-01T00:00:00+01:00  20.96       NaN    NaN
1   2017-01-01T01:00:00+01:00  20.90     29.40    NaN
2   2017-01-01T02:00:00+01:00  18.13     24.73    NaN
3   2017-01-01T03:00:00+01:00  16.03     24.73    NaN
4   2017-01-01T04:00:00+01:00  16.43     27.89    NaN
5   2017-01-01T05:00:00+01:00  13.75     28.26    NaN
6   2017-01-01T06:00:00+01:00  11.10     30.43    NaN
7   2017-01-01T07:00:00+01:00  15.47     32.85    NaN
8   2017-01-01T08:00:00+01:00  16.88     33.91    NaN
9   2017-01-01T09:00:00+01:00  21.81     28.58    NaN
10  2017-01-01T10:00:00+01:00  26.24     28.58  22.81
You could use a rolling window like this:

n = 10
df.rolling(n).apply(func=lambda x: x.max() - x.min())

You can select beforehand the column you want to apply the rolling window to.
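For the multi-column case, a simpler alternative sketch (not from the original answers): reduce each row to its max/min first, then roll over those. Note that the row-wise max/min skip NaN, so the first valid window can appear earlier than with the stack-based approach above when NaNs are present:

n = 10
cols = ['Spot', 'Ubalance']
hi = df[cols].max(axis=1).rolling(n).max()   # highest value in each n-row window, across both columns
lo = df[cols].min(axis=1).rolling(n).min()   # lowest value in each n-row window
df['Diff'] = hi - lo                         # 22.81 for the last row of the example above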
Pandas: How to concatenate or merge groups using the groupby function and populate a single table or dataframe?
df =

name  description  curve      tenor  rates
IND   3M           ZAR_3M     0.25   6.808000088
IND   2Y           ZAR_3M     2      6.483012199
IND   3Y           ZAR_3M     3      6.565002918
IND   4Y           ZAR_3M     4      6.694129944
IND   5Y           ZAR_3M     5      6.83951807
IND   3M           CAD_OIS    0.25   1.738620043
BHU   6M           CAD_OIS    0.5    1.718042016
IND   9M           CAD_OIS    0.75   1.697247028
IND   1Y           CAD_OIS    1      1.67719996
IND   18M          CAD_OIS    1.5    1.631257057
IND   2Y           CAD_3M     2      1.906309009
IND   3y           CAD_3M     3      1.855569959
IND   4Y           CAD_3M     4      1.830132961
BHU   5Y           CAD_3M     5      1.817605019
BHU   6y           CAD_3M     6      1.814880013
IND   7Y           CAD_3M     7      1.821526051
BHU   TND          CZK_Curve  0.01   0.02
BHU   1WK          CZK_Curve  0.03   0.0203
BHU   1M           CZK_Curve  0.09   0.021
BHU   2M           CZK_Curve  0.18   0.0212
BHU   3M           CZK_Curve  0.26   0.0214
BHU   6M           CZK_Curve  0.51   0.0212
BHU   9M           CZK_Curve  0.76   0.02045
BHU   12M          CZK_Curve  1.01   0.01985
BHU   2Y           CZK_Curve  2.01   0.020033333
BHU   3Y           CZK_Curve  3.02   0.018816667
BHU   4Y           CZK_Curve  4.02   0.017666667
BHU   5Y           CZK_Curve  5.02   0.016616667
BHU   6Y           CZK_Curve  6.02   0.015766667
BHU   7Y           CZK_Curve  7.02   0.015216667
BHU   8Y           CZK_Curve  8.02   0.014616667
BHU   9Y           CZK_Curve  9.02   0.014358333

Above is my dataframe (df) with 5 variables. I would like to populate a table based on 'curve' and rename the rates columns with the curve names. Following is my expected output. I tried using the groupby function to generate the groups and concatenate them side by side based on 'tenor', but my code seems incomplete. Please suggest how to produce that output.

df_tenor = df_tenor[['Tenor']].drop_duplicates()
df_tenor = df_tenor.sort_values(by=['tenor'])
gb = df.groupby('curve')
df.rename(columns={'rates': str([df.curve.unique() for g in gb])}, inplace=True)
df_final = pd.concat([g[1].merge(df_tenor, how='outer', on='Tenor') for g in gb], axis=1)
df_final.to_csv('testconcat.csv', index=False)
Use pandas.pivot_table():

pd.pivot_table(df, index='tenor', values='rates', columns='curve')

Output:

curve    CAD_3M   CAD_OIS  CZK_Curve    ZAR_3M
tenor
0.01        NaN       NaN   0.020000       NaN
0.03        NaN       NaN   0.020300       NaN
0.09        NaN       NaN   0.021000       NaN
0.18        NaN       NaN   0.021200       NaN
0.25        NaN  1.738620        NaN  6.808000
0.26        NaN       NaN   0.021400       NaN
0.50        NaN  1.718042        NaN       NaN
0.51        NaN       NaN   0.021200       NaN
0.75        NaN  1.697247        NaN       NaN
0.76        NaN       NaN   0.020450       NaN
1.00        NaN  1.677200        NaN       NaN
1.01        NaN       NaN   0.019850       NaN
1.50        NaN  1.631257        NaN       NaN
2.00   1.906309       NaN        NaN  6.483012
2.01        NaN       NaN   0.020033       NaN
3.00   1.855570       NaN        NaN  6.565003
3.02        NaN       NaN   0.018817       NaN
4.00   1.830133       NaN        NaN  6.694130
4.02        NaN       NaN   0.017667       NaN
5.00   1.817605       NaN        NaN  6.839518
5.02        NaN       NaN   0.016617       NaN
6.00   1.814880       NaN        NaN       NaN
6.02        NaN       NaN   0.015767       NaN
7.00   1.821526       NaN        NaN       NaN
7.02        NaN       NaN   0.015217       NaN
8.02        NaN       NaN   0.014617       NaN
9.02        NaN       NaN   0.014358       NaN
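One design note: pivot_table() silently aggregates duplicate (tenor, curve) pairs, using the mean by default. If duplicates should be an error instead, DataFrame.pivot() raises on them (a sketch, assuming the same df):

out = df.pivot(index='tenor', columns='curve', values='rates')  # raises if a (tenor, curve) pair repeats
out = out.reset_index()  # optional: make 'tenor' a regular column again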
Python Dataframe: How to groupby weeks over years
I have a dataset like below:

date
2012-01-01    NaN    NaN     NaN
2012-01-02    NaN    NaN     NaN
2012-01-03    NaN    NaN     NaN
2012-01-04  0.880  2.981 -0.0179
2012-01-05  0.857  2.958 -0.0261
2012-01-06  0.858  2.959  0.0012
2012-01-07    NaN    NaN     NaN
2012-01-08    NaN    NaN     NaN
2012-01-09  0.880  2.981  0.0256
2012-01-10  0.905  3.006  0.0284
2012-01-11  0.905  3.006  0.0000
2012-01-12  0.902  3.003 -0.0033
2012-01-13  0.880  2.981 -0.0244
2012-01-14    NaN    NaN     NaN
2012-01-15    NaN    NaN     NaN
2012-01-16  0.858  2.959 -0.0250
2012-01-17  0.891  2.992  0.0385
2012-01-18  0.878  2.979 -0.0146
2012-01-19  0.887  2.988  0.0103
2012-01-20  0.899  3.000  0.0135
2012-01-21    NaN    NaN     NaN
2012-01-22    NaN    NaN     NaN
2012-01-23    NaN    NaN     NaN
2012-01-24    NaN    NaN     NaN
2012-01-25    NaN    NaN     NaN
2012-01-26    NaN    NaN     NaN
2012-01-27    NaN    NaN     NaN
2012-01-28    NaN    NaN     NaN
2012-01-29    NaN    NaN     NaN
2012-01-30  0.892  2.993 -0.0078
...           ...    ...     ...
2016-12-02  1.116  3.417 -0.0124
2016-12-03    NaN    NaN     NaN
2016-12-04    NaN    NaN     NaN
2016-12-05  1.111  3.412 -0.0045
2016-12-06  1.111  3.412  0.0000
2016-12-07  1.120  3.421  0.0081
2016-12-08  1.113  3.414 -0.0063
2016-12-09  1.109  3.410 -0.0036
2016-12-10    NaN    NaN     NaN
2016-12-11    NaN    NaN     NaN
2016-12-12  1.072  3.373 -0.0334
2016-12-13  1.075  3.376  0.0028
2016-12-14  1.069  3.370 -0.0056
2016-12-15  1.069  3.370  0.0000
2016-12-16  1.073  3.374  0.0037
2016-12-17    NaN    NaN     NaN
2016-12-18    NaN    NaN     NaN
2016-12-19  1.071  3.372 -0.0019
2016-12-20  1.067  3.368 -0.0037
2016-12-21  1.076  3.377  0.0084
2016-12-22  1.076  3.377  0.0000
2016-12-23  1.066  3.367 -0.0093
2016-12-24    NaN    NaN     NaN
2016-12-25    NaN    NaN     NaN
2016-12-26  1.041  3.372  0.0047
2016-12-27  1.042  3.373  0.0010
2016-12-28  1.038  3.369 -0.0038
2016-12-29  1.035  3.366 -0.0029
2016-12-30  1.038  3.369  0.0029
2016-12-31  1.038  3.369  0.0000

When I do:

in_range_df = Days_Count_Sum["2012-01-01":"2016-12-31"]
print("In range: ", in_range_df)
Week_Count = in_range_df.groupby(in_range_df.index.week)
print("in_range_df.index.week: ", in_range_df.index.week)
print("Group by Week: ", Week_Count.sum())

the result is always grouped into weeks 1 to 53. The printout shows:

in_range_df.index.week: [52  1  1 ...,  52 52 52]

I realized the week numbers simply wrap around after the first year of this range (2012). How can I group by weeks over a range spanning more than one year?
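The week number restarts at 1 every year, which is why grouping on index.week alone merges week 1 of 2012 with week 1 of every later year. A sketch of one possible fix (not from the original thread, and assuming a DatetimeIndex; isocalendar() replaces the deprecated .week attribute in pandas 1.1+) is to group on (year, week) pairs:

# ISO year/week labels for every date in the index
iso = in_range_df.index.isocalendar()

# group on (year, week) so week 1 of 2012 and week 1 of 2013 stay separate
Week_Count = in_range_df.groupby([iso.year, iso.week]).sum()

Alternatively, in_range_df.resample('W').sum() aggregates directly by calendar week without computing week numbers at all.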