Calculating Accrued Interest with Pandas - Python

I'm trying to calculate the amount of interest that would have accrued over a period of time. My starting DataFrame is below:
MONTH_BEG_D  NO_OF_DAYS  RATE
1/10/2017    31          5.22
1/11/2017    30          5.22
1/12/2017    31          5.22
1/1/2018     31          3.5
1/2/2018     28          3.5
1/3/2018     31          3.5
If the starting value is 20, I would like the outcome to be:
FORMULA: INTEREST = (PRINCIPAL_A * RATE * NO_OF_DAYS) / 36500
PRINCIPAL_A  MONTH_BEG_D  RATE  NO_OF_DAYS  INTEREST    NEW_BALANCE
20           1/10/2017    5.22  31          0.08866849  20.08866849
20.08866849  1/11/2017    5.22  30          0.08618864  20.17485713
20.17485713  1/12/2017    5.22  31          0.08944371  20.26430084
20.26430084  1/1/2018     3.5   31          0.06023772  20.32453856
20.32453856  1/2/2018     3.5   28          0.05456999  20.37910855
Just to explain: the 36500 comes from the 365 days in the year for NO_OF_DAYS combined with the 0.01 multiplier that converts RATE from a percentage. I can easily add or modify columns for these two variables, so that is no problem. My problem lies in how to carry NEW_BALANCE over as the next month's PRINCIPAL_A.
This is basically a cumprod between each column with a cumsum between each row. Is there an easier way of doing this while avoiding loops?

Here you go; not the cleanest solution, but it does what you require!
# Ensure NumPy and pandas are imported.
import numpy as np
import pandas as pd

# Increase the displayed precision.
pd.options.display.precision = 7

# Insert a PRINCIPAL_A column of NaNs; the initial value is set next.
df.insert(0, 'PRINCIPAL_A', np.nan)

# Set the initial value of 20 in the first row only.
df.iloc[0, 0] = 20

# Loop over the DataFrame for the roll-over.
for row in range(1, len(df)):
    df['INTEREST'] = (df['PRINCIPAL_A'] * df['RATE'] * df['NO_OF_DAYS']) / 36500
    df['NEW_BALANCE'] = df['PRINCIPAL_A'] + df['INTEREST']
    # Roll the previous row's NEW_BALANCE over as this row's PRINCIPAL_A.
    df.iloc[row, 0] = df['NEW_BALANCE'].iloc[row - 1]
Here is the output (slightly different from yours due to display rounding):
PRINCIPAL_A MONTH_BEG_D NO_OF_DAYS RATE INTEREST NEW_BALANCE
0 20.0000000 01/10/2017 31 5.22 0.0886685 20.0886685
1 20.0886685 01/11/2017 30 5.22 0.0861886 20.1748571
2 20.1748571 01/12/2017 31 5.22 0.0894437 20.2643008
3 20.2643008 01/01/2018 31 3.50 0.0602377 20.3245386
4 20.3245386 01/02/2018 28 3.50 0.0545700 20.3791086
5 20.3791086 01/03/2018 31 3.50 NaN NaN
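For reference, since each month simply scales the balance by a growth factor, the roll-over can also be computed without a loop, which is the cumprod the question hints at. A minimal sketch, assuming the same df and a starting principal of 20:

# Each month scales the balance by (1 + RATE * NO_OF_DAYS / 36500),
# so the running balance is the starting principal times a cumulative product.
growth = 1 + df['RATE'] * df['NO_OF_DAYS'] / 36500
df['NEW_BALANCE'] = 20 * growth.cumprod()
df['PRINCIPAL_A'] = df['NEW_BALANCE'] / growth  # balance before this month's interest
df['INTEREST'] = df['NEW_BALANCE'] - df['PRINCIPAL_A']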

Related

Determining average values over irregular number of rows in a csv file

I have a csv file with days of the year in one column and temperature in another. The days are split into sections and I want to find the average temperature for each day, e.g. day 0, 1, 2, 3, etc.
The temperature measurements were taken irregularly, meaning there is a different number of measurements at certain times for each day.
Typically I would use df.groupby(np.arange(len(df)) // n).mean(), but n, the number of rows, will vary in this case.
I have an example of what the data is like.
Days   Temp
0.75   19
0.80   18
1.20   18
1.25   18
1.75   19
3.05   18
3.55   21
3.60   21
3.90   18
4.50   20
You could convert Days to an integer and use that to group.
>>> df.groupby(df["Days"].astype(int)).mean()
Days Temp
Days
0 0.775 18.500000
1 1.400 18.333333
3 3.525 19.500000
4 4.500 20.000000
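For completeness, a minimal sketch that reproduces the output above, assuming the data is already in a DataFrame:

import pandas as pd

df = pd.DataFrame({
    "Days": [0.75, 0.80, 1.20, 1.25, 1.75, 3.05, 3.55, 3.60, 3.90, 4.50],
    "Temp": [19, 18, 18, 18, 19, 18, 21, 21, 18, 20],
})

# Truncating the fractional day number to an integer labels each row
# with the day it belongs to; grouping on that label averages per day.
print(df.groupby(df["Days"].astype(int)).mean())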

Slicing pandas dataframe by ordered values into clusters

I have a pandas dataframe in which there are longer gaps in time, and I want to slice it into smaller dataframes where the time "clusters" stay together:
Time Value
0 56610.41341 8.55
1 56587.56394 5.27
2 56590.62965 6.81
3 56598.63790 5.47
4 56606.52203 6.71
5 56980.44206 4.75
6 56592.53327 6.53
7 57335.52837 0.74
8 56942.59094 6.96
9 56921.63669 9.16
10 56599.52053 6.14
11 56605.50235 5.20
12 57343.63828 3.12
13 57337.51641 3.17
14 56593.60374 5.69
15 56882.61571 9.50
I tried sorting this and taking the time difference of two consecutive points with:
df = df.sort_values("Time")
df['t_dif'] = df['Time'] - df['Time'].shift(-1)
And it gives
Time Value t_dif
1 56587.56394 5.27 -3.06571
2 56590.62965 6.81 -1.90362
6 56592.53327 6.53 -1.07047
14 56593.60374 5.69 -5.03416
3 56598.63790 5.47 -0.88263
10 56599.52053 6.14 -5.98182
11 56605.50235 5.20 -1.01968
4 56606.52203 6.71 -3.89138
0 56610.41341 8.55 -272.20230
15 56882.61571 9.50 -39.02098
9 56921.63669 9.16 -20.95425
8 56942.59094 6.96 -37.85112
5 56980.44206 4.75 -355.08631
7 57335.52837 0.74 -1.98804
13 57337.51641 3.17 -6.12187
12 57343.63828 3.12 NaN
Let's say I want to slice this dataframe into smaller dataframes where the time difference between two consecutive points is smaller than 40; how would I go about doing this?
I could loop over the rows, but this is frowned upon, so is there a smarter solution?
Edit: Here is an example:
df1:
Time Value t_dif
1 56587.56394 5.27 -3.06571
2 56590.62965 6.81 -1.90362
6 56592.53327 6.53 -1.07047
14 56593.60374 5.69 -5.03416
3 56598.63790 5.47 -0.88263
10 56599.52053 6.14 -5.98182
11 56605.50235 5.20 -1.01968
4 56606.52203 6.71 -3.89138
df2:
0 56610.41341 8.55 -272.20230
df3:
15 56882.61571 9.50 -39.02098
9 56921.63669 9.16 -20.95425
8 56942.59094 6.96 -37.85112
...
etc.
I think you can just:
df1 = df[df['t_dif'] < 30]
df2 = df[df['t_dif'] >= 30]
Or, wrapped up as a general function:
def split_dataframe(df, value):
    df = df.sort_values("Time")
    df = df.reset_index()
    df['t_dif'] = (df['Time'] - df['Time'].shift(-1)).abs()
    # Positions of rows after which the gap to the next point exceeds `value`.
    indxs = df.index[df['t_dif'] > value].tolist()
    indxs.append(-1)
    indxs.append(len(df))
    indxs.sort()
    frames = []
    for i in range(1, len(indxs)):
        # Each frame runs from just after the previous gap up to and including this one.
        val = df.iloc[indxs[i - 1] + 1 : indxs[i] + 1]
        frames.append(val)
    return frames
This returns the correct dataframes as a list.
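As a loop-free alternative (a sketch assuming the same 40-unit gap threshold), you can label each cluster with a cumulative sum over the gap mask and let groupby do the splitting:

df = df.sort_values('Time').reset_index()
# A gap larger than 40 from the previous point starts a new cluster;
# cumsum over the boolean mask turns gap positions into cluster labels.
cluster_id = (df['Time'].diff() > 40).cumsum()
frames = [group for _, group in df.groupby(cluster_id)]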

Filtering pandas dataframe for a steady speed condition

Below is a sample dataframe similar to mine, except the one I am working on has 200,000 data points.
import pandas as pd
import numpy as np

df = pd.DataFrame([
    [10.07, 5], [10.24, 5], [12.85, 5], [11.85, 5],
    [11.10, 5], [14.56, 5], [14.43, 5], [14.85, 5],
    [14.95, 5], [10.41, 5], [15.20, 5], [15.47, 5],
    [15.40, 5], [15.31, 5], [15.43, 5], [15.65, 5]
], columns=['speed', 'delta_t'])
df
speed delta_t
0 10.07 5
1 10.24 5
2 12.85 5
3 11.85 5
4 11.10 5
5 14.56 5
6 14.43 5
7 14.85 5
8 14.95 5
9 10.41 5
10 15.20 5
11 15.47 5
12 15.40 5
13 15.31 5
14 15.43 5
15 15.65 5
std_dev = df.iloc[0:3, 0].std()  # this will give ~1.56
print(std_dev)
I have two columns, 'speed' and 'delta_t'. delta_t is the difference in time between subsequent rows in my actual data (which has date and time). The operating speed keeps varying, and what I want to achieve is to filter out all data points where the speed is nearly steady, say by filtering for a standard deviation of < 0.5 and a delta_t of >= 15 min. For example, starting with the first speed, the code should keep jumping to the next speeds, calculating the standard deviation as it goes, and if it is less than 0.5 and the delta_t sums to 30 min or more, that data should be copied into a new dataframe.
So for this dataframe I would be left with indices 5 to 8 and 10 to 15.
Is this possible? Could you please give me some suggestions on how to do it? Sorry, I am stuck; it seems too complicated to me.
Thank you.
Best regards, Arun
Let's use rolling, shift and std:
Calculate the rolling std for a window of 3, then find the stds less than 0.5, and use shift(-2) to also pick up the rows at the start of each window whose std was less than 0.5. Using boolean indexing with | (or), we get the entire steady-state range.
df_std = df['speed'].rolling(3).std()
mask = df_std < 0.5
df_ss = df[mask | mask.shift(-2, fill_value=False)]
df_ss
Output:
speed delta_t
5 14.56 5
6 14.43 5
7 14.85 5
8 14.95 5
10 15.20 5
11 15.47 5
12 15.40 5
13 15.31 5
14 15.43 5
15 15.65 5
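To generalize the shift(-2) trick to any window length, and to tie it to the question's duration condition, one sketch (assuming delta_t is a constant 5 min here, so a 3-row window spans 15 min) ORs the mask with its shifts across the window so that every row inside a qualifying window is kept:

window = 3  # 3 rows * 5 min per row = 15 min of steady running
mask = df['speed'].rolling(window).std() < 0.5
steady = mask.copy()
for k in range(1, window):
    # Mark rows that sit earlier inside a window whose std qualified.
    steady |= mask.shift(-k, fill_value=False)
df_ss = df[steady]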

Format pivot table with even columns as column names

I'm trying to parse scraped data from finviz quote profile pages. I think I can use pandas' pivot table mechanisms to get the output I need, but I have never used pivot tables, so I'm unsure if or how to format the output.
The table I receive is below. I would like each even-numbered column to supply the column headers of the output table, ending with a dataframe of one row and 72 columns, as there are 72 values. Unless anyone can recommend a better output structure and how to access the values?
0 1 2 3 4 5 6 7 8 9 10 11
0 Index S&P 500 P/E 23.06 EPS (ttm) 3.15 Insider Own 0.20% Shs Outstand 1.01B Perf Week 3.94%
1 Market Cap 73.24B Forward P/E 21.29 EPS next Y 3.41 Insider Trans -34.81% Shs Float 996.66M Perf Month 4.85%
2 Income 3.21B PEG 2.31 EPS next Q 0.81 Inst Own 89.30% Short Float 2.05% Perf Quarter 4.47%
3 Sales 13.15B P/S 5.57 EPS this Y 9.70% Inst Trans 0.42% Short Ratio 4.21 Perf Half Y 25.19%
4 Book/sh 10.26 P/B 7.08 EPS next Y 7.88% ROA 20.10% Target Price 74.34 Perf Year 28.65%
5 Cash/sh 3.11 P/C 23.35 EPS next 5Y 10.00% ROE 32.10% 52W Range 45.51 - 72.28 Perf YTD 36.02%
6 Dividend 2 P/FCF 31.29 EPS past 5Y 1.50% ROI 21.60% 52W High 0.44% Beta 1.26
7 Dividend % 2.75% Quick Ratio 2.5 Sales past 5Y -1.40% Gross Margin 60.70% 52W Low 59.54% ATR 1.34
8 Employees 29977 Current Ratio 3.3 Sales Q/Q 7.20% Oper. Margin 35.20% RSI (14) 67.26 Volatility 1.65% 1.93%
9 Optionable Yes Debt/Eq 0.35 EPS Q/Q 23.80% Profit Margin 24.40% Rel Volume 1.13 Prev Close 72.08
10 Shortable Yes LT Debt/Eq 0.29 Earnings Oct 26 AMC Payout 47.70% Avg Volume 4.84M Price 72.6
11 Recom 2.5 SMA20 3.86% SMA50 4.97% SMA200 16.49% Volume 5476883 Change 0.72%
I know the formatting is hard to see.
Try reshaping:
d1 = pd.DataFrame(df.values.reshape(-1, 2), columns=['key', 'value'])
d1.set_index('key').T
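A minimal sketch of the idea on a toy frame (hypothetical values, not the real finviz scrape):

import pandas as pd

df = pd.DataFrame([['Index', 'S&P 500', 'P/E', 23.06],
                   ['Market Cap', '73.24B', 'Forward P/E', 21.29]])

# Flatten the table row by row into (label, value) pairs, then transpose
# so the labels become the column headers of a single-row frame.
d1 = pd.DataFrame(df.values.reshape(-1, 2), columns=['key', 'value'])
print(d1.set_index('key').T)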

weighted average based on a variable window in pandas

I would like to take a weighted average of "cycle" using "day" as the window. The window is not always the same size. How do I compute a weighted average in pandas?
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: data = {'cycle': [34.1, 41, 49.0, 53.9, 35.8, 49.3, 38.6, 51.2, 44.8],
   ...:         'day': [6, 6, 6, 13, 13, 20, 20, 20, 20]}

In [4]: df = pd.DataFrame(data, index=np.arange(9), columns=['cycle', 'day'])

In [5]: df
In [5]: df
Out[5]:
cycle day
0 34.1 6
1 41.0 6
2 49.0 6
3 53.9 13
4 35.8 13
5 49.3 20
6 38.6 20
7 51.2 20
8 44.8 20
I would expect three values (if I have done this correctly):
34.1 * 1/3 + 41 * 1/3 + 49 * 1/3 = 41.36
cycle day
41.36 6
6.90 13
45.90 20
If I'm understanding correctly, I think you just want:
df.groupby(['day']).mean()
Alternatively, group on day and apply a lambda function that calculates the sum of the group and divides it by the number of non-null values within the group:
>>> df.groupby('day').cycle.apply(lambda group: group.sum() / group.count())
day
6 41.366667
13 44.850000
20 45.975000
Name: cycle, dtype: float64
Although you say weighted average, I don't believe there are any weights involved; it appears to be a simple average of the cycle values for a particular day, so a simple mean should suffice.
Also, I believe the value for day 13 should be calculated as 53.9 * 1/2 + 35.8 * 1/2, which yields 44.85. The same approach applies for day 20.
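If real weights ever do come into play, a hedged sketch of a per-day weighted average (using a hypothetical weight column that is not part of the original data) would be:

# 'weight' is a hypothetical column; with equal weights this
# reduces to the plain mean shown above.
df['weight'] = 1.0
wavg = ((df['cycle'] * df['weight']).groupby(df['day']).sum()
        / df['weight'].groupby(df['day']).sum())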
