How to create dataframe by randomly selecting from another dataframe? - python

DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
(0.519) (1.117) (1.152) 0.772 1.490 (0.850) (1.189) (0.759)
0.030 0.047 0.632 (0.608) (0.322) 0.939 0.346 0.651
1.290 (0.179) 0.006 0.850 (1.141) 0.758 0.682
1.500 (1.228) 1.840 (1.594) (0.282) (0.907)
(1.540) 0.689 (0.683) 0.005 0.543
(0.197) (0.664) (0.636) 0.878
(0.942) 0.764 (0.137)
0.693 1.647
0.197
I have the dataframe above.
I need the dataframe below, filled with random values drawn from the one above:
DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
(0.664) 1.290 0.682 0.030 (0.683) (0.636) (0.683) 1.840 (1.540)
1.490 (0.907) (0.850) (0.197) (1.228) 0.682 1.290 0.939
0.047 0.682 0.346 0.689 (0.137) 1.490 0.197
0.047 0.878 0.651 0.047 0.047 (0.197)
(1.141) 0.758 0.878 1.490 0.651
1.647 1.490 0.772 1.490
(0.519) 0.693 0.346
(0.137) 0.850
0.197
I've tried this code:
df2 = df1.sample(len(df1))
print(df2)
But the output is:
DP1 DP2 DP3 DP4 DP5 DP6 DP7 DP8 DP9
OP8 0.735590 1.762630 NaN NaN NaN NaN NaN NaN NaN
OP7 -0.999665 0.817949 -0.147698 NaN NaN NaN NaN NaN NaN
OP2 0.031430 0.049994 0.682040 -0.667445 -0.360034 1.089516 0.426642 0.916619 NaN
OP3 1.368955 -0.191781 0.006623 0.932736 -1.277548 0.880056 0.841018 NaN NaN
OP1 -0.551065 -1.195305 -1.243199 0.847178 1.668630 -0.986300 -1.465904 -1.069986 NaN
OP4 1.592201 -1.314628 1.985683 -1.749389 -0.315828 -1.052629 NaN NaN NaN
OP6 -0.208647 -0.710424 -0.686654 0.963221 NaN NaN NaN NaN NaN
OP10 NaN NaN NaN NaN NaN NaN NaN NaN NaN
OP9 0.209244 NaN NaN NaN NaN NaN NaN NaN NaN
OP5 -1.635306 0.737937 -0.736907 0.005545 0.607974 NaN NaN NaN NaN

You can use np.random.choice() for the sampling.
Assuming df is something like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'DP 1': ['(0.519)','0.030','1.290','1.500','(1.540)','(0.197)','(0.942)','0.693','0.197'],
    'DP 2': ['(1.117)','0.047','(0.179)','(1.228)','0.689','(0.664)','0.764','1.647',np.nan],
    'DP 3': ['(1.152)','0.632','0.006','1.840','(0.683)','(0.636)','(0.137)',np.nan,np.nan],
    'DP 4': ['0.772','(0.608)','0.850','(1.594)','0.005','0.878',np.nan,np.nan,np.nan],
    'DP 5': ['1.490','(0.322)','(1.141)','(0.282)','0.543',np.nan,np.nan,np.nan,np.nan],
    'DP 6': ['(0.850)','0.939','0.758','(0.907)',np.nan,np.nan,np.nan,np.nan,np.nan],
    'DP 7': ['(1.189)','0.346','0.682',np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
    'DP 8': ['(0.759)','0.651',np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
    'DP 9': [np.nan]*9,
    'DP 10': [np.nan]*9,
})
# DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
# 0 (0.519) (1.117) (1.152) 0.772 1.490 (0.850) (1.189) (0.759) NaN NaN
# 1 0.030 0.047 0.632 (0.608) (0.322) 0.939 0.346 0.651 NaN NaN
# 2 1.290 (0.179) 0.006 0.850 (1.141) 0.758 0.682 NaN NaN NaN
# 3 1.500 (1.228) 1.840 (1.594) (0.282) (0.907) NaN NaN NaN NaN
# 4 (1.540) 0.689 (0.683) 0.005 0.543 NaN NaN NaN NaN NaN
# 5 (0.197) (0.664) (0.636) 0.878 NaN NaN NaN NaN NaN NaN
# 6 (0.942) 0.764 (0.137) NaN NaN NaN NaN NaN NaN NaN
# 7 0.693 1.647 NaN NaN NaN NaN NaN NaN NaN NaN
# 8 0.197 NaN NaN NaN NaN NaN NaN NaN NaN NaN
First extract the choices from all non-null values of df:
choices = df.values[~pd.isnull(df.values)]
# array(['(0.519)', '(1.117)', '(1.152)', '0.772', '1.490', '(0.850)',
# '(1.189)', '(0.759)', '0.030', '0.047', '0.632', '(0.608)',
# '(0.322)', '0.939', '0.346', '0.651', '1.290', '(0.179)', '0.006',
# '0.850', '(1.141)', '0.758', '0.682', '1.500', '(1.228)', '1.840',
# '(1.594)', '(0.282)', '(0.907)', '(1.540)', '0.689', '(0.683)',
# '0.005', '0.543', '(0.197)', '(0.664)', '(0.636)', '0.878',
# '(0.942)', '0.764', '(0.137)', '0.693', '1.647', '0.197'],
# dtype=object)
Then take a np.random.choice() from choices for all non-null cells:
df = df.applymap(lambda x: np.random.choice(choices) if not pd.isnull(x) else x)
# DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
# 0 (0.179) 0.682 0.758 (1.152) (0.137) (1.152) 0.939 (0.759) NaN NaN
# 1 1.500 (1.152) (0.197) 0.772 1.840 1.840 0.772 (0.850) NaN NaN
# 2 0.878 0.005 (1.540) 0.764 (0.519) 0.682 (1.152) NaN NaN NaN
# 3 0.758 (0.137) 1.840 1.647 1.647 (0.942) NaN NaN NaN NaN
# 4 0.693 (0.683) (0.759) 1.500 (0.197) NaN NaN NaN NaN NaN
# 5 0.006 (0.137) 0.764 (1.117) NaN NaN NaN NaN NaN NaN
# 6 (0.664) 0.632 (1.141) NaN NaN NaN NaN NaN NaN NaN
# 7 0.543 (0.664) NaN NaN NaN NaN NaN NaN NaN NaN
# 8 (0.137) NaN NaN NaN NaN NaN NaN NaN NaN NaN
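For reference, a self-contained sketch of the same approach using Series.map with na_action="ignore" (equivalent to the applymap call above, and it also works on pandas versions where applymap was renamed to DataFrame.map); the small frame here is a stand-in, not the question's full data:

```python
import numpy as np
import pandas as pd

# Small stand-in frame with the same ragged-NaN shape as the question
df = pd.DataFrame({
    "DP 1": ["(0.519)", "0.030", "1.290"],
    "DP 2": ["(1.117)", "0.047", np.nan],
    "DP 3": ["(1.152)", np.nan, np.nan],
})

# Pool of every non-null value in the frame
choices = df.values[~pd.isnull(df.values)]

rng = np.random.default_rng(0)

# Replace each non-null cell with a random draw from the pool;
# na_action="ignore" leaves the NaN cells untouched
sampled = df.apply(
    lambda col: col.map(lambda x: rng.choice(choices), na_action="ignore")
)
print(sampled)
```

Because the NaN cells are skipped, the ragged shape of the original frame survives the resampling.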

Related

DataFrame.fillna method - filling the NaN values with Df.mean(axis =1)

Hi, I am trying to fill my dataframe's NaN values with the fillna method.
After applying fillna with value=df.mean(axis=1), I am still getting NaN values in some columns.
Can anyone explain how it fills the NaN values?
Try:
df.fillna(df.mean())
This fills every NaN with the mean of its column.
Given df,
0 1 2 3 4
0 804.0 271.0 690.0 401.0 158.0
1 352.0 995.0 770.0 616.0 791.0
2 381.0 824.0 61.0 152.0 NaN
3 907.0 607.0 NaN 488.0 180.0
4 981.0 938.0 378.0 957.0 176.0
5 NaN NaN NaN NaN NaN
Output:
0 1 2 3 4
0 804.0 271.0 690.00 401.0 158.00
1 352.0 995.0 770.00 616.0 791.00
2 381.0 824.0 61.00 152.0 326.25
3 907.0 607.0 474.75 488.0 180.00
4 981.0 938.0 378.00 957.0 176.00
5 685.0 727.0 474.75 522.8 326.25
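As a runnable sketch with a tiny made-up frame: df.fillna(df.mean()) works because fillna aligns a Series on the column labels. That is also why the asker's df.mean(axis=1) variant leaves NaNs behind, since row means are indexed by row labels, not column labels; transposing first is one workaround.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, 6.0],
})

# Column means: a -> 2.0, b -> 4.0; the Series aligns on column labels
col_filled = df.fillna(df.mean())

# Row-mean fill: the row means are indexed 0, 1, 2 (row labels),
# so they must be applied through a transpose to align correctly
row_filled = df.T.fillna(df.mean(axis=1)).T
print(col_filled)
print(row_filled)
```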

Calculating multiple columns using Panda's mask() and diff() with multiple conditions

I have a dataframe df
df:
Date     Type  AVG1  AVG2  AVG3  AVG4  AVG5
2022-05  ROL1  0.33  0.45  0.12  0.96  1.33
2022-05  ROL2  1.43  0.11  0.75  1.99  3.01
2022-05  ROL3  0.11  0.32  0.55  1.26  4.22
2022-04  ROL1  1.66  0.71  0.87  5.88  1.11
2022-04  ROL2  2.31  0.89  2.20  4.36  4.87
2022-04  ROL3  5.40  1.22  4.45  0.01  0.31
And I need to create the columns AVG1_ROL1_MoM, AVG1_ROL2_MoM, AVG3_ROL1_MoM, AVG3_ROL2_MoM and so on, where AVG1_ROL1_MoM is the difference in AVG1 where Type = ROL1 from one month to the next:
Date     Type  AVG1  AVG2  AVG3  AVG4  AVG5  AVG1_ROL1_MoM  AVG1_ROL2_MoM
2022-05  ROL1  0.33  0.45  0.12  0.96  1.33  -1.33          NaN
2022-05  ROL2  1.43  0.11  0.75  1.99  3.01  NaN            -0.88
2022-05  ROL3  0.11  0.32  0.55  1.26  4.22  NaN            NaN
2022-04  ROL1  1.66  0.71  0.87  5.88  1.11  NaN            NaN
2022-04  ROL2  2.31  0.89  2.20  4.36  4.87  NaN            NaN
2022-04  ROL3  5.40  1.22  4.45  0.01  0.31  NaN            NaN
I tried to do that with mask() and shift(), but it didn't work:
df['AVG1_ROL1_MoM'] = df.mask(df['Type']=="ROL1", df['AVG1'] - df['AVG1'].shift(), inplace=True)
This returns that an axis must be defined, but when I define an axis it returns:
"Cannot do inplace boolean setting on mixed-types with a non np.nan value"
What would be the best approach for this?
1. melt the dataframe to get all the values in a single column
2. create the new column names
3. groupby to find the monthly differences
4. pivot to get back the original structure
5. merge with the original dataframe
melted = df.melt(["Date","Type"])
melted["column"] = melted["variable"]+"_"+melted["Type"]+"_MoM"
melted["diff"] = melted.groupby(["Type","variable"])["value"].diff(-1)
pivoted = melted.pivot(index=["Date","Type"], columns="column", values="diff").sort_index(ascending=[False,True]).reset_index()
output = df.merge(pivoted, on=["Date","Type"])
>>> output
Date Type AVG1 ... AVG5_ROL1_MoM AVG5_ROL2_MoM AVG5_ROL3_MoM
0 2022-05 ROL1 0.33 ... 0.22 NaN NaN
1 2022-05 ROL2 1.43 ... NaN -1.86 NaN
2 2022-05 ROL3 0.11 ... NaN NaN 3.91
3 2022-04 ROL1 1.66 ... NaN NaN NaN
4 2022-04 ROL2 2.31 ... NaN NaN NaN
5 2022-04 ROL3 5.40 ... NaN NaN NaN
[6 rows x 22 columns]
IIUC, you can try grouping by the Type column, comparing each subgroup's AVG values with their shifted values, and renaming the resulting columns:
out = (df.filter(like='AVG')
.groupby(df['Type'])
.apply(lambda g: (g-g.shift(-1)).rename(columns=lambda col: f'{col}_{g.name}_MOM'))
)
print(out)
AVG1_ROL1_MOM AVG2_ROL1_MOM AVG3_ROL1_MOM AVG4_ROL1_MOM AVG5_ROL1_MOM \
0 -1.33 -0.26 -0.75 -4.92 0.22
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
AVG1_ROL2_MOM AVG2_ROL2_MOM AVG3_ROL2_MOM AVG4_ROL2_MOM AVG5_ROL2_MOM \
0 NaN NaN NaN NaN NaN
1 -0.88 -0.78 -1.45 -2.37 -1.86
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
AVG1_ROL3_MOM AVG2_ROL3_MOM AVG3_ROL3_MOM AVG4_ROL3_MOM AVG5_ROL3_MOM
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 -5.29 -0.9 -3.9 1.25 3.91
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
out = pd.concat([df, out], axis=1)
print(out)
Date Type AVG1 AVG2 AVG3 AVG4 AVG5 AVG1_ROL1_MOM AVG2_ROL1_MOM \
0 2022-05 ROL1 0.33 0.45 0.12 0.96 1.33 -1.33 -0.26
1 2022-05 ROL2 1.43 0.11 0.75 1.99 3.01 NaN NaN
2 2022-05 ROL3 0.11 0.32 0.55 1.26 4.22 NaN NaN
3 2022-04 ROL1 1.66 0.71 0.87 5.88 1.11 NaN NaN
4 2022-04 ROL2 2.31 0.89 2.20 4.36 4.87 NaN NaN
5 2022-04 ROL3 5.40 1.22 4.45 0.01 0.31 NaN NaN
AVG3_ROL1_MOM AVG4_ROL1_MOM AVG5_ROL1_MOM AVG1_ROL2_MOM AVG2_ROL2_MOM \
0 -0.75 -4.92 0.22 NaN NaN
1 NaN NaN NaN -0.88 -0.78
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
AVG3_ROL2_MOM AVG4_ROL2_MOM AVG5_ROL2_MOM AVG1_ROL3_MOM AVG2_ROL3_MOM \
0 NaN NaN NaN NaN NaN
1 -1.45 -2.37 -1.86 NaN NaN
2 NaN NaN NaN -5.29 -0.9
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
AVG3_ROL3_MOM AVG4_ROL3_MOM AVG5_ROL3_MOM
0 NaN NaN NaN
1 NaN NaN NaN
2 -3.9 1.25 3.91
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN

Calculate maximum difference of rolling interval of n columns

I have a dataset
df
Time Spot Ubalance
0 2017-01-01T00:00:00+01:00 20.96 NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40
2 2017-01-01T02:00:00+01:00 18.13 24.73
3 2017-01-01T03:00:00+01:00 16.03 24.73
4 2017-01-01T04:00:00+01:00 16.43 27.89
5 2017-01-01T05:00:00+01:00 13.75 28.26
6 2017-01-01T06:00:00+01:00 11.10 30.43
7 2017-01-01T07:00:00+01:00 15.47 32.85
8 2017-01-01T08:00:00+01:00 16.88 33.91
9 2017-01-01T09:00:00+01:00 21.81 28.58
10 2017-01-01T10:00:00+01:00 26.24 28.58
I want to generate a series/dataframe in which I calculate the maximum difference between the highest and lowest value of the last n rows within multiple columns, i.e., the maximum difference of these "last" 10 rows would be
33.91 (highest is here in "ubalance") - 11.10 (lowest is in "Spot") = 22.81
I've tried .rolling() but it apparently does not contain a difference attribute.
Expected outcome:
Time Spot Ubalance Diff
0 2017-01-01T00:00:00+01:00 20.96 NaN NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00+01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00+01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00+01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00+01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00+01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00+01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00+01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00+01:00 21.81 28.58 NaN
10 2017-01-01T10:00:00+01:00 26.24 28.58 22.81
Use Rolling.aggregate and then subtract:
df1 = df['Spot'].rolling(10).agg(['min','max'])
print (df1)
min max
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 11.1 21.81
10 11.1 26.24
df['dif'] = df1['max'].sub(df1['min'])
print (df)
Time Spot Ubalance dif
0 2017-01-01T00:00:00+01:00 20.96 NaN NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00+01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00+01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00+01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00+01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00+01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00+01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00+01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00+01:00 21.81 28.58 10.71
10 2017-01-01T10:00:00+01:00 26.24 28.58 15.14
Or custom function with lambda:
df['diff'] = df['Spot'].rolling(10).agg(lambda x: x.max() - x.min())
EDIT:
To process all columns from a list, use:
cols = ['Spot','Ubalance']
N = 10
df['dif'] = (df[cols].stack(dropna=False)
.rolling(len(cols) * N)
.agg(lambda x: x.max() - x.min())
.groupby(level=0)
.max())
print (df)
Time Spot Ubalance dif
0 2017-01-01T00:00:00+01:00 20.96 NaN NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00+01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00+01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00+01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00+01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00+01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00+01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00+01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00+01:00 21.81 28.58 NaN
10 2017-01-01T10:00:00+01:00 26.24 28.58 22.81
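An alternative sketch without stacking: take the row-wise max/min first, then roll over those. Note the NaN handling differs slightly from the stacked version above — row-wise max/min skip NaN, so a window that contains NaN cells can produce a value one row earlier:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Spot":     [20.96, 20.90, 18.13, 16.03, 16.43, 13.75,
                 11.10, 15.47, 16.88, 21.81, 26.24],
    "Ubalance": [np.nan, 29.40, 24.73, 24.73, 27.89, 28.26,
                 30.43, 32.85, 33.91, 28.58, 28.58],
})
cols = ["Spot", "Ubalance"]
N = 10

# Highest value anywhere in the last N rows minus the lowest one
df["dif"] = (df[cols].max(axis=1).rolling(N).max()
             - df[cols].min(axis=1).rolling(N).min())
print(df["dif"])
```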
You could use a rolling window like this:
n = 10
df.rolling(n).apply(func=lambda x: x.max() - x.min())
You can also restrict this to a single column, e.g. df['Spot'].rolling(n).

Pandas : How to concatenate or merge the groups using groupby function and populate single table or dataframe?

df =
name description curve tenor rates
IND 3M ZAR_3M 0.25 6.808000088
IND 2Y ZAR_3M 2 6.483012199
IND 3Y ZAR_3M 3 6.565002918
IND 4Y ZAR_3M 4 6.694129944
IND 5Y ZAR_3M 5 6.83951807
IND 3M CAD_OIS 0.25 1.738620043
BHU 6M CAD_OIS 0.5 1.718042016
IND 9M CAD_OIS 0.75 1.697247028
IND 1Y CAD_OIS 1 1.67719996
IND 18M CAD_OIS 1.5 1.631257057
IND 2Y CAD_3M 2 1.906309009
IND 3y CAD_3M 3 1.855569959
IND 4Y CAD_3M 4 1.830132961
BHU 5Y CAD_3M 5 1.817605019
BHU 6y CAD_3M 6 1.814880013
IND 7Y CAD_3M 7 1.821526051
BHU TND CZK_Curve 0.01 0.02
BHU 1WK CZK_Curve 0.03 0.0203
BHU 1M CZK_Curve 0.09 0.021
BHU 2M CZK_Curve 0.18 0.0212
BHU 3M CZK_Curve 0.26 0.0214
BHU 6M CZK_Curve 0.51 0.0212
BHU 9M CZK_Curve 0.76 0.02045
BHU 12M CZK_Curve 1.01 0.01985
BHU 2Y CZK_Curve 2.01 0.020033333
BHU 3Y CZK_Curve 3.02 0.018816667
BHU 4Y CZK_Curve 4.02 0.017666667
BHU 5Y CZK_Curve 5.02 0.016616667
BHU 6Y CZK_Curve 6.02 0.015766667
BHU 7Y CZK_Curve 7.02 0.015216667
BHU 8Y CZK_Curve 8.02 0.014616667
BHU 9Y CZK_Curve 9.02 0.014358333
Above is my dataframe (df) with 5 variables. I would like to pivot the table based on 'curve' and name each rates column after its curve. Following is my expected output. I tried using groupby to generate the groups and concatenate them side by side on 'tenor', but my code seems incomplete. Please suggest how to produce the output below.
df_tenor = df_tenor[['Tenor']].drop_duplicates()
df_tenor = df_tenor.sort_values(by=['tenor'])
gb = df.groupby('curve')
df.rename(columns={'rates': str([df.curve.unique() for g in gb])}, inplace=True)
df_final= pd.concat([g[1].merge(df_tenor, how='outer', on='Tenor') for g in gb], axis=1)
df_final.to_csv('testconcat.csv', index = False)
Use pandas.pivot_table():
pd.pivot_table(df, index='tenor', values='rates', columns='curve')
Output
curve CAD_3M CAD_OIS CZK_Curve ZAR_3M
tenor
0.01 NaN NaN 0.020000 NaN
0.03 NaN NaN 0.020300 NaN
0.09 NaN NaN 0.021000 NaN
0.18 NaN NaN 0.021200 NaN
0.25 NaN 1.738620 NaN 6.808000
0.26 NaN NaN 0.021400 NaN
0.50 NaN 1.718042 NaN NaN
0.51 NaN NaN 0.021200 NaN
0.75 NaN 1.697247 NaN NaN
0.76 NaN NaN 0.020450 NaN
1.00 NaN 1.677200 NaN NaN
1.01 NaN NaN 0.019850 NaN
1.50 NaN 1.631257 NaN NaN
2.00 1.906309 NaN NaN 6.483012
2.01 NaN NaN 0.020033 NaN
3.00 1.855570 NaN NaN 6.565003
3.02 NaN NaN 0.018817 NaN
4.00 1.830133 NaN NaN 6.694130
4.02 NaN NaN 0.017667 NaN
5.00 1.817605 NaN NaN 6.839518
5.02 NaN NaN 0.016617 NaN
6.00 1.814880 NaN NaN NaN
6.02 NaN NaN 0.015767 NaN
7.00 1.821526 NaN NaN NaN
7.02 NaN NaN 0.015217 NaN
8.02 NaN NaN 0.014617 NaN
9.02 NaN NaN 0.014358 NaN
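A tiny runnable version of the same pivot_table call (only a few illustrative rows, not the full data):

```python
import pandas as pd

df = pd.DataFrame({
    "name":        ["IND", "IND", "IND"],
    "description": ["3M", "2Y", "3M"],
    "curve":       ["ZAR_3M", "ZAR_3M", "CAD_OIS"],
    "tenor":       [0.25, 2.0, 0.25],
    "rates":       [6.808, 6.483, 1.739],
})

# One column per curve, indexed by tenor; missing
# (tenor, curve) combinations become NaN
wide = pd.pivot_table(df, index="tenor", values="rates", columns="curve")
print(wide)
```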

Python Dataframe How to groupby weeks over years

I have a dataset like below :
date =
2012-01-01 NaN NaN NaN
2012-01-02 NaN NaN NaN
2012-01-03 NaN NaN NaN
2012-01-04 0.880 2.981 -0.0179
2012-01-05 0.857 2.958 -0.0261
2012-01-06 0.858 2.959 0.0012
2012-01-07 NaN NaN NaN
2012-01-08 NaN NaN NaN
2012-01-09 0.880 2.981 0.0256
2012-01-10 0.905 3.006 0.0284
2012-01-11 0.905 3.006 0.0000
2012-01-12 0.902 3.003 -0.0033
2012-01-13 0.880 2.981 -0.0244
2012-01-14 NaN NaN NaN
2012-01-15 NaN NaN NaN
2012-01-16 0.858 2.959 -0.0250
2012-01-17 0.891 2.992 0.0385
2012-01-18 0.878 2.979 -0.0146
2012-01-19 0.887 2.988 0.0103
2012-01-20 0.899 3.000 0.0135
2012-01-21 NaN NaN NaN
2012-01-22 NaN NaN NaN
2012-01-23 NaN NaN NaN
2012-01-24 NaN NaN NaN
2012-01-25 NaN NaN NaN
2012-01-26 NaN NaN NaN
2012-01-27 NaN NaN NaN
2012-01-28 NaN NaN NaN
2012-01-29 NaN NaN NaN
2012-01-30 0.892 2.993 -0.0078
... ... ... ...
2016-12-02 1.116 3.417 -0.0124
2016-12-03 NaN NaN NaN
2016-12-04 NaN NaN NaN
2016-12-05 1.111 3.412 -0.0045
2016-12-06 1.111 3.412 0.0000
2016-12-07 1.120 3.421 0.0081
2016-12-08 1.113 3.414 -0.0063
2016-12-09 1.109 3.410 -0.0036
2016-12-10 NaN NaN NaN
2016-12-11 NaN NaN NaN
2016-12-12 1.072 3.373 -0.0334
2016-12-13 1.075 3.376 0.0028
2016-12-14 1.069 3.370 -0.0056
2016-12-15 1.069 3.370 0.0000
2016-12-16 1.073 3.374 0.0037
2016-12-17 NaN NaN NaN
2016-12-18 NaN NaN NaN
2016-12-19 1.071 3.372 -0.0019
2016-12-20 1.067 3.368 -0.0037
2016-12-21 1.076 3.377 0.0084
2016-12-22 1.076 3.377 0.0000
2016-12-23 1.066 3.367 -0.0093
2016-12-24 NaN NaN NaN
2016-12-25 NaN NaN NaN
2016-12-26 1.041 3.372 0.0047
2016-12-27 1.042 3.373 0.0010
2016-12-28 1.038 3.369 -0.0038
2016-12-29 1.035 3.366 -0.0029
2016-12-30 1.038 3.369 0.0029
2016-12-31 1.038 3.369 0.0000
when I do :
in_range_df = Days_Count_Sum["2012-01-01":"2016-12-31"]
print("In range: ",in_range_df)
Week_Count = in_range_df.groupby(in_range_df.index.week)
print("in_range_df.index.week: ",in_range_df.index.week)
print("Group by Week: ",Week_Count.sum())
I found the result is always grouped into a list of weeks 1 to 53.
When printed: in_range_df.index.week: [52 1 1 ..., 52 52 52]
I realized the week numbers wrap around after the first year of this range (2012), so the same week number from different years falls into one group.
How can I group by week across a range of more than one year?
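No answer is shown above, but one way to keep the years apart is to group on (year, week) pairs rather than the bare week number, e.g. via DatetimeIndex.isocalendar() (available in pandas >= 1.1). A sketch with placeholder data, not the asker's frame:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({"value": np.ones(len(idx))}, index=idx)

# isocalendar() returns year/week/day columns; grouping on
# (year, week) keeps week 1 of 2013 separate from week 1 of 2012
iso = df.index.isocalendar()
weekly = df.groupby([iso.year, iso.week]).sum()
print(weekly.head())
```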
