Python For Loop, Use Index To Conditionally Compare Between Values In a List - python

I am using python 3.7 in a Jupyter Notebook on Windows.
The GSheet below shows a sample of what the final data set should look like: https://docs.google.com/spreadsheets/d/1y8hlJYjk8-sY-sIMeJldZ2DFMFLpy2MWFVdYTeAbV9s/edit#gid=0
My Issue:
Within my for loop, I would like the adjustment of the payment and total investment to be driven dynamically by the repayment terms defined in the variables at the start of the script. Currently it is fixed, with each elif block compared against my list by hand. Can this be made dynamic to the list of repayment terms? Say I was making $8,000 repayments every 5 years: how can I structure the for loop to behave the same way without having to add each elif block manually?
Ideally, this will NOT be hard coded like it is now. While the script works and produces the desired table, it will not flex if different inputs are established for the repayment terms.
Code I Have So Far:
import pandas as pd
import numpy as np
## Link To Sample Desired OutPut On Below GSHEET, Rows Are Highlighted Where Changes Occur ##
## https://docs.google.com/spreadsheets/d/1y8hlJYjk8-sY-sIMeJldZ2DFMFLpy2MWFVdYTeAbV9s/edit#gid=0
## How Many Years is The Loan Over ##
years = 12
## How Many "Periods" aka "Months" is the Loan Over ##
total_periods = years * 12
## Total Initial Investment ##
initial_investment = 40000
## Monthly Interest To Be Paid To Investor ##
payment_ratio = 100
## Investors Initial Investment Is Repaid In Installments Across The Years Of The Loan. In This Case, We Repay 25% Every 3 Years ##
graduated_repayment = .25
repayment_years = 3
## Repayment_Periods represents the "Period" or "Month" that we will need to pay our installment to the investor ##
## In this Example, since it's every 3 years, that means every 36 months ##
repayment_periods = repayment_years * 12
## With Each Repayment Every 36 Months, Our Monthly Interest Payment Decreases, in this case, by 50% ##
payment_reduction = .5
## Generating A List Of Integers That Represent The Periods In Which We Make a Repayment and Calculations Will Change ##
payback_periods = [i for i in range(0, total_periods + 1, repayment_periods)
                   if total_periods % repayment_periods == 0 and i != 0]
payback_periods.insert(0, 0)
print(payback_periods)
cols = ['Period', 'Payment', 'Total_Investment', 'Annualized_Return']
n_periods = np.arange(years * 12) + 1
df_initialize = pd.DataFrame({"Period": n_periods})
df = pd.DataFrame(df_initialize, columns = cols)
## This is Still Fixed. A New Elif Block Would Need To Be Made Or Removed Depending On The Repayment Terms ##
for i in range(len(df)):
    cur_payment = df.loc[i, 'Payment']
    cur_invested = df.loc[i, 'Total_Investment']
    cur_period = df.loc[i, 'Period']
    pay_reduction = payment_ratio * payment_reduction
    if payback_periods[0] <= i <= payback_periods[1]:
        df.loc[i, 'Payment'] = initial_investment / payment_ratio
        df.loc[i, 'Total_Investment'] = initial_investment
        df.loc[i, 'Annualized_Return'] = (cur_payment * 12) / cur_invested
    elif payback_periods[1] <= i <= payback_periods[2]:
        df.loc[i, 'Payment'] = df.loc[payback_periods[1], 'Payment'] - pay_reduction
        df.loc[i, 'Total_Investment'] = df.loc[payback_periods[1], 'Total_Investment'] - (initial_investment * graduated_repayment)
        df.loc[i, 'Annualized_Return'] = (cur_payment * 12) / cur_invested
    elif payback_periods[2] <= i <= payback_periods[3]:
        df.loc[i, 'Payment'] = df.loc[payback_periods[2], 'Payment'] - pay_reduction
        df.loc[i, 'Total_Investment'] = df.loc[payback_periods[2], 'Total_Investment'] - (initial_investment * graduated_repayment)
        df.loc[i, 'Annualized_Return'] = (cur_payment * 12) / cur_invested
    elif i > payback_periods[3]:
        df.loc[i, 'Payment'] = df.loc[payback_periods[3], 'Payment'] - pay_reduction
        df.loc[i, 'Total_Investment'] = df.loc[payback_periods[3], 'Total_Investment'] - (initial_investment * graduated_repayment)
        df.loc[i, 'Annualized_Return'] = (cur_payment * 12) / cur_invested
    else:
        pass
df['Running_Earnings'] = df['Payment'].cumsum()
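One way to make the schedule fully dynamic (a sketch of what I take the intent to be, not guaranteed to match every edge of the GSheet): compute each row's repayment segment with integer division and derive every column from it, so changing the terms at the top is all that's needed. The `i - 1` keeps a repayment month itself in the preceding segment, matching the elif boundaries above. Note that `Annualized_Return` here uses the row's own updated values; the posted loop reads them before assigning, which yields NaN.

```python
import numpy as np
import pandas as pd

# Same terms as the script above
years = 12
total_periods = years * 12
initial_investment = 40000
payment_ratio = 100
graduated_repayment = 0.25
repayment_years = 3
repayment_periods = repayment_years * 12   # a repayment every 36 months
payment_reduction = 0.5

df = pd.DataFrame({'Period': np.arange(total_periods) + 1})

# Repayment segment for each row: 0 until the first repayment month,
# 1 after it, and so on; a repayment month stays in the prior segment.
segment = np.maximum(df.index.to_numpy() - 1, 0) // repayment_periods

pay_reduction = payment_ratio * payment_reduction
df['Payment'] = initial_investment / payment_ratio - segment * pay_reduction
df['Total_Investment'] = initial_investment - segment * (initial_investment * graduated_repayment)
df['Annualized_Return'] = (df['Payment'] * 12) / df['Total_Investment']
df['Running_Earnings'] = df['Payment'].cumsum()
```

With $8,000 repayments every 5 years you would set the per-segment reduction amounts from those terms instead; the segment trick itself is unchanged.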

Related

What's the best way to generate a table of the top N drawdowns in a return index with Python?

I have a time series with stock market returns, dates, and an index that starts at 100 based on the returns. Example data below.
I am trying to get a table output that shows the top N drawdowns in a time series, the date range for each drawdown, and the duration. I have a working script for this, but it's very messy and the accuracy is off at times.
df['drawdown'] = df['index_100'] / df['index_100'].expanding().max() - 1
throw_away_df = df[df['drawdown'] == 0].copy()
throw_away_df = throw_away_df[['Date']]
throw_away_df['duration'] = throw_away_df['Date'].diff().dt.days
throw_away_df['start_date'] = throw_away_df['Date'].shift(1)
throw_away_df = throw_away_df[throw_away_df['duration'] > 35]
throw_away_df['rank'] = throw_away_df['duration'].rank(ascending=False)
throw_away_df['drawdown'] = 0
throw_away_df = throw_away_df.reset_index(drop=True)
for i in throw_away_df.index:
    x = max(abs(df['drawdown'][(df['Date'] > throw_away_df['start_date'].loc[i]) & (df['Date'] < throw_away_df['Date'].loc[i])]))
    throw_away_df['drawdown'].loc[i] = -x
throw_away_df = throw_away_df[throw_away_df['rank'] <= N]
throw_away_df = throw_away_df.sort_values(by=['rank'])
throw_away_df = throw_away_df[['start_date', 'Date', 'duration', 'drawdown']]
throw_away_df.columns = ['start_date', 'end_date', 'duration', 'drawdown']
Here's the output (shown as an image in the original post).
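A cleaner approach, sketched on synthetic data (a stand-in for the question's series, which was only shown as an image): every new running high starts a new "episode", so grouping the under-water rows by a cumulative count of new highs yields one row per drawdown. Note this dates each drawdown from its first under-water day rather than from the preceding peak.

```python
import numpy as np
import pandas as pd

# Synthetic return index standing in for the question's data
dates = pd.date_range('2020-01-01', periods=300, freq='D')
rng = np.random.default_rng(0)
index_100 = 100 * (1 + pd.Series(rng.normal(0.0005, 0.01, 300))).cumprod()
df = pd.DataFrame({'Date': dates, 'index_100': index_100.values})

# Drawdown relative to the running peak
peak = df['index_100'].cummax()
df['drawdown'] = df['index_100'] / peak - 1

# Every new running high starts a new drawdown episode
episode = (df['index_100'] == peak).cumsum()

# Summarize each episode: start/end dates and maximum depth
under_water = df[df['drawdown'] < 0]
table = (under_water.groupby(episode)
         .agg(start_date=('Date', 'min'),
              end_date=('Date', 'max'),
              drawdown=('drawdown', 'min')))
table['duration'] = (table['end_date'] - table['start_date']).dt.days + 1

# Top N deepest drawdowns (most negative first)
N = 5
top_n = table.nsmallest(N, 'drawdown').reset_index(drop=True)
print(top_n)
```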

DataFrame parallel vector calculations (Python/Pandas)

I'm working on a trading backtester: I give it an initial capital, a historical price dataframe, and randomly generated entry and exit signals stored in the dataframe.
   price  signal  entry  exit  shares  profit  capital
0     22       0    nan   nan    0.00    0.00  1000.00
1     24       1     24   nan   41.66    0.00  1000.00
2     22       1    nan   nan    0.00    0.00  1000.00
3     22       0    nan    22    0.00  -83.33   916.67
4     24       1     24   nan   41.66    0.00   916.67
When ['signal'] turns from 0 to 1, ['entry'] stores the entry price and ['shares'] stores ['capital'] / ['entry'].
When ['signal'] turns from 1 to 0, ['exit'] stores the exit price, ['profit'] stores (['exit'] - ['entry']) * ['shares'], and ['capital'] stores the previous ['capital'] value plus the cumulative profit: ['capital'] + ['profit'].cumsum().
So in this case:
entry = 24,
shares = 1000 / 24 = 41.66,
exit = 22,
profit = (22 - 24) * 41.66 = -83.33,
capital = 1000 + (-83.33)
All of these operations are done with pandas vectorization because, according to this article (https://towardsdatascience.com/how-to-make-your-pandas-loop-71-803-times-faster-805030df4f06), vectorization is much faster than for loops (especially with huge dataframes), so I absolutely need to avoid them.
Now, the calculations are written sequentially with the pandas .loc vectorized assignments, but THE PROBLEM is:
the shares calculation comes before the capital calculation, so the shares calculation only sees the "initial" capital values, not the updated capital. In fact, if you LOOK AT THE DATAFRAME, on the last row the shares are 1000/24 = 41.66, when they should instead be 916.67/24 = 38.19.
Can I fix this without loops?
PART OF THE CODE:
# ENTRY/EXIT CONDITIONS
buy = ((data['signal'].shift(1) != 1) & (data['signal'] == 1))
buyclose = ((data['signal'].shift(1) == 1) & (data['signal'] != 1))
# ENTRY PRICE
data['entry'].loc[buy] = data['price'].shift(-1)
# SHARES
data['shares'].loc[buy] = (data['capital'].loc[buy].values / data['entry'].loc[buy].values)
# EXIT PRICE
data['exit'].loc[buyclose] = data['price'].shift(-1)
# PROFIT
data['profit'].loc[buyclose] = (data['exit'].loc[buyclose].values - data['entry'].loc[buy].values) * data['shares'].loc[buy].values
# CAPITAL
data['capital'] = data['capital'].values + data['profit'].cumsum().values
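Strictly, the capital -> shares -> profit -> capital chain is recursive: each trade's size depends on the capital left by the previous trade, so it has no purely column-wise expression in general. A common compromise is to loop only over trades instead of every bar. And because the full capital is reinvested on every trade, capital here actually compounds multiplicatively (capital *= exit/entry), which does admit a loop-free cumprod. A minimal sketch on the example table's numbers (fills at the signal bar's price for brevity, without spread or leverage, unlike the posted code):

```python
import numpy as np
import pandas as pd

# The question's example table (decimal commas read as points)
data = pd.DataFrame({'price':  [22, 24, 22, 22, 24],
                     'signal': [0, 1, 1, 0, 1]})

buy = (data['signal'].shift(1, fill_value=0) != 1) & (data['signal'] == 1)
buyclose = (data['signal'].shift(1, fill_value=0) == 1) & (data['signal'] != 1)
entries = data.index[buy]
exits = data.index[buyclose]

# Loop over trades, not bars: each trade's size depends on the capital
# left by the previous trade, which is what defeats column-wise math.
capital = 1000.0
rows = []
for i_in, i_out in zip(entries, exits):
    entry_price = data.loc[i_in, 'price']
    exit_price = data.loc[i_out, 'price']
    shares = capital / entry_price
    profit = (exit_price - entry_price) * shares
    capital += profit
    rows.append((i_in, i_out, shares, profit, capital))
trades = pd.DataFrame(rows, columns=['entry_idx', 'exit_idx', 'shares', 'profit', 'capital'])

# Since the full capital is reinvested each trade, the same path
# falls out of a cumulative product of per-trade returns:
entry_prices = data.loc[entries[:len(exits)], 'price'].to_numpy()
exit_prices = data.loc[exits, 'price'].to_numpy()
capital_path = 1000.0 * np.cumprod(exit_prices / entry_prices)
print(trades)
print(capital_path)
```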
EDIT FULL CODE:
from tvDatafeed import TvDatafeed, Interval
import numpy as np
import pandas as pd
capital = 1000
leverage = 1
spread = 0.03
### DATA ###
data = TvDatafeed().get_hist(symbol='OIL_CRUDE',exchange='CAPITALCOM',interval=Interval.in_30_minute,n_bars=5000)
data = data.drop(['symbol','close','volume'], axis=1)
data = data.rename_axis('date').reset_index()
data['date'] = pd.to_datetime(data['date'], unit='D')#.dt.date
data = data.rename_axis('index').reset_index()
data['position'] = np.random.randint(3, size=len(data)) - 1
data.loc[0, 'position'] = 0 ### TEMPORARY
new_entry = pd.DataFrame()
data['entry'] = np.zeros((len(data)), int)
data['entry date'] = np.zeros((len(data)), int)
data['exit'] = np.zeros((len(data)), int)
data['exit date'] = np.zeros((len(data)), int)
data['shares'] = np.zeros((len(data)), int)
data['profit'] = np.zeros((len(data)), int)
data['capital'] = np.full((len(data)), capital)
### FIX LAST BUY/SELL ###
index = 1
if data['position'].iloc[-1] != 0:
    fix = data['position'] != data['position'].iloc[-1]
    index = len(data.loc[fix[fix].index[-1]:]) - 1
### CONDITIONS ###
nlast = (data['index'] < data['index'].iloc[-index]) #EXCLUDE LAST ENTRY
buy = ((data['position'].shift(1) != 1) & (data['position'] == 1) & (nlast))
sell = ((data['position'].shift(1) != -1) & (data['position'] == -1) & (nlast))
buyclose = ((data['position'].shift(1) == 1) & (data['position'] != 1))
sellclose = ((data['position'].shift(1) == -1) & (data['position'] != -1))
##### ENTRIES #####
# ENTRY PRICE
data['entry'].loc[buy] = data['open'].shift(-1) + spread
data['entry'].loc[sell] = data['open'].shift(-1) - spread
##### EXITS #####
# EXIT PRICE
data['exit'].loc[buyclose] = data['open'].shift(-1)
data['exit'].loc[sellclose] = data['open'].shift(-1)
data.loc[0, 'exit'] = 0
##### CALCULATIONS #####
# SHARES
data['shares'].loc[buy] = (data['capital'].loc[buy].values / data['entry'].loc[buy].values) * leverage
data['shares'].loc[sell] = (data['capital'].loc[sell].values / data['entry'].loc[sell].values) * leverage
# PROFIT
data['profit'].loc[buyclose] = (data['exit'].loc[buyclose].values - data['entry'].loc[buy].values) * data['shares'].loc[buy].values
data['profit'].loc[sellclose] = (data['entry'].loc[sell].values - data['exit'].loc[sellclose].values) * data['shares'].loc[sell].values
# CAPITAL
data['capital'] = data['capital'].values + data['profit'].cumsum().values
print(data)

How to Loop Through Grouped Dataframe

I'm super new to Python, but diving right in to try to figure out a real-world problem using analysis tools like pandas.
I've imported the data from a csv already, but here's a small replication of the data:
df2 = pd.DataFrame({'SKU': [22335, 22335, 22335, 22335, 33442, 33442, 33442, 33442],
                    'Date': ['2019-12-31', '2020-01-07', '2020-01-14', '2020-01-21', '2019-12-31', '2020-01-07', '2020-01-14', '2020-01-21'],
                    'Urgent': [10, 8, 4, 20, 50, 45, 65, 32],
                    'Delivered': [4, 7, 12, 10, 35, 75, 23, 42]})
There are two item SKU numbers, 22335 and 33442, a week-starting date, weekly urgent requests for equipment, and weekly delivered quantities. At this point I have figured out how to run, over the entire data set, a for loop in which each row references the previous row's calculated value:
# Create new numeric column 'Result'
df['Result'] = np.nan
# Assign initial value for the first row of 'Result' (Should be first row in each SKU group)
df.loc[0, 'Result'] = df.loc[0, 'Delivered'] + df.loc[1, 'Delivered'] - df.loc[0, 'Urgent']
# Loop through each row except for last row to calculate
for i in range(1, len(df)-1):
    df.loc[i, 'Result'] = max(df.loc[i-1, 'Result'], 0) + df.loc[i+1, 'Delivered'] - df.loc[i, 'Urgent']
print(df)
However, the next step for me is to perform the above only for each individual SKU (22335 and then 33442 separately). I have tried ranking each row by date within its SKU using groupby, but I can't figure out how to reference this in my loop:
# Convert Date datatype
df['Date'] = pd.to_datetime(df['Date'])
# Use groupby to create ranking by SKU and Date
df['SKURank'] = df.groupby('SKU')['Date'].rank(ascending = True).astype('int64')
I've tried unsuccessfully to define a function that can reference its own output on each iteration and then call it with an .apply-style loop, but to be honest I'm totally lost on that.
I've also attempted to abide by the split, apply, combine principle and group my data by SKU, apply the loop, and then combine all rows back together, but again I really don't know where to start.
Here are my main questions:
What kind of loop should I use to perform the same task as the above code (return an initial value for the first row in the group, then loop through each subsequent row) for each individual SKU group?
If the recommended form of loop (regardless of performance; I'm not that high-speed yet) requires that I define a function beforehand, how would I create a function that references its own output for each row after the first row of each SKU?
UPDATE:
oh god. dear god what have I created... it's... it's disgusting...
Yes, I created a giant for loop with nested if statements. And yes, it's horrendous. And no, it doesn't do everything I need it to, like performing the loop on the last row of the dataframe. If any part of the below makes sense and you can point me toward making it actually functional, I'd appreciate some advice.
import pandas as pd
import numpy as np
# Create dataframe for two SKUs, a weekly process date, urgent requested quantity, and delivered quantity
df = pd.DataFrame({'SKU': [22335, 22335, 22335, 22335, 33442, 33442, 33442, 33442],
                   'Date': ['2019-12-31', '2020-01-07', '2020-01-14', '2020-01-21', '2019-12-31', '2020-01-07', '2020-01-14', '2020-01-21'],
                   'Urgent': [10, 8, 4, 20, 50, 45, 65, 32],
                   'Delivered': [4, 7, 12, 10, 35, 75, 23, 42]})
# Create new numeric column 'Result'
df['Result'] = np.nan
# Convert Date datatype and create 3 necessary columns
df['Date'] = pd.to_datetime(df['Date'])
df['Result'] = np.nan
df['WeeklyMiss'] = np.nan
df['Logic'] = ''
# Create list of unique SKUs in dataframe
skulst = df.SKU.unique()
print(skulst)
# Set initial index values
skunum = 0
i = 0
# While loop with nested for loop to iterate over the dataframe
while skunum <= len(skulst):
    for i in range(0, len(df)-1):
        # Calculate first SKU row
        if i == 0 and df.loc[i, 'SKU'] == skulst[skunum]:
            df.loc[i, 'Result'] = max(df.loc[i, 'Delivered'] + df.loc[i+1, 'Delivered'] - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'WeeklyMiss'] = min(df.loc[i, 'Delivered'] + df.loc[i+1, 'Delivered'] - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'Logic'] = 'First Row'
        # Calculate next SKU rows
        elif i > 0 and df.loc[i, 'SKU'] == skulst[skunum] and df.loc[i+1, 'SKU'] == skulst[skunum]:
            df.loc[i, 'Result'] = max(df.loc[i+1, 'Delivered'] + min(df.loc[i-1, 'Result'], df.loc[i, 'Delivered']) - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'WeeklyMiss'] = min(df.loc[i-1, 'Result'] + df.loc[i+1, 'Delivered'] - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'Logic'] = 'Next SKU Row'
        # Calculate last SKU row
        elif i > 0 and df.loc[i, 'SKU'] == skulst[skunum] and (df.loc[i+1, 'SKU'] != skulst[skunum] or i == len(df)):
            df.loc[i, 'Result'] = max(df.loc[i-1, 'Result'] - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'WeeklyMiss'] = min(df.loc[i-1, 'Result'] - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'Logic'] = 'Last SKU Row'
        # Calculate first SKU row and switch to next SKU
        elif i > 0 and i < len(df) and df.loc[i, 'SKU'] != skulst[skunum] and df.loc[i-1, 'SKU'] == skulst[skunum]:
            df.loc[i, 'Result'] = max(df.loc[i, 'Delivered'] + df.loc[i+1, 'Delivered'] - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'WeeklyMiss'] = min(df.loc[i, 'Delivered'] + df.loc[i+1, 'Delivered'] - df.loc[i, 'Urgent'], 0)
            df.loc[i, 'Logic'] = 'First SKU Row'
            if skunum + 1 <= len(skulst):
                skunum += 1
            else:
                df.loc[i, 'Result'] = max(df.loc[i-1, 'Result'] - df.loc[i, 'Urgent'], 0)
                df.loc[i, 'WeeklyMiss'] = min(df.loc[i-1, 'Result'] - df.loc[i, 'Urgent'], 0)
                df.loc[i, 'Logic'] = 'Last SKU Row'
                continue
        else:
            print(df)
            break
See the Group By: split-apply-combine guide in the pandas documentation for how to iterate over groups.
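One way to run the original recursive loop per SKU (a sketch, using the question's own formula): wrap the loop in a function and let groupby().apply() hand it one SKU at a time. Like the original code, each group's last row is left as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'SKU': [22335] * 4 + [33442] * 4,
                   'Date': pd.to_datetime(['2019-12-31', '2020-01-07', '2020-01-14', '2020-01-21'] * 2),
                   'Urgent': [10, 8, 4, 20, 50, 45, 65, 32],
                   'Delivered': [4, 7, 12, 10, 35, 75, 23, 42]})

def add_result(group):
    # Work on a positional copy so each SKU starts from its own first row
    g = group.reset_index(drop=True)
    g['Result'] = np.nan
    # First row: no previous Result to carry forward
    g.loc[0, 'Result'] = g.loc[0, 'Delivered'] + g.loc[1, 'Delivered'] - g.loc[0, 'Urgent']
    # Middle rows: previous Result (floored at 0) + next Delivered - this Urgent
    for i in range(1, len(g) - 1):
        g.loc[i, 'Result'] = max(g.loc[i-1, 'Result'], 0) + g.loc[i+1, 'Delivered'] - g.loc[i, 'Urgent']
    g.index = group.index   # restore the original row labels
    return g

df = (df.sort_values(['SKU', 'Date'])
        .groupby('SKU', group_keys=False)
        .apply(add_result))
print(df)
```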

Replace for loop with dataframe.apply()

The objective is to replace a for loop with DataFrame.apply().
Below is the code for the for loop:
# ma is the moving-average window in days, say 100
# days is the number of days for which stock data is available
for d in range(ma - 1, days):
    # Buy if stock price > moving average & if not bought yet
    if df['Close'].iloc[d] > df['ma'].iloc[d] and cash == 1:
        buyPrice = closingprices[d + 1]  # buy next day
        buy_data.append(buyPrice)
        cash = 0
        stock = 1
    # Sell if stock price < moving average & if currently holding
    if df['Close'].iloc[d] < df['ma'].iloc[d] and stock == 1:
        sellPrice = closingprices[d + 1]
        sell_data.append(sellPrice)
        cash = 1
        stock = 0
I'm unable to get a correct solution.
Question: how do I take care of setting up the toggle (the cash indicator) and referencing the next row's element?
df is the complete dataset, and buy_data is the result I want:
buy_data = df.apply(lambda x: (x['Close' + 1]) if (x['Close'] > x['ma'] and cash == 1) else 0)
This raises KeyErrors, among other errors.
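df.apply maps each row independently, so a cash/stock toggle can't be threaded through the lambda, which is why apply isn't the right tool here. For a price-vs-MA strategy, though, the buy and sell rows can be found without loops via shifted boolean comparisons. A sketch on a toy series where the crossings are known (all names illustrative):

```python
import pandas as pd

# Toy price series that steps between 1 and 2 so the crossings are known
df = pd.DataFrame({'Close': [1.0] * 5 + [2.0] * 5 + [1.0] * 5 + [2.0] * 5})
df['ma'] = df['Close'].rolling(3).mean()

above = df['Close'] > df['ma']          # NaN comparisons evaluate to False
# Crossovers: above flips False->True (buy) or True->False (sell)
cross_up = above & ~above.shift(1, fill_value=False)
cross_dn = ~above & above.shift(1, fill_value=False)
valid = df['ma'].notna()

buy_idx = df.index[cross_up & valid]
sell_idx = df.index[cross_dn & valid]
# Crossings of one series against its own MA strictly alternate, so the
# cash/stock toggle reduces to dropping a sell that precedes the first buy.
if len(sell_idx) and len(buy_idx) and sell_idx[0] < buy_idx[0]:
    sell_idx = sell_idx[1:]

# "Buy/sell next day": take the following row's close for each signal
buy_prices = df['Close'].shift(-1)[buy_idx]
sell_prices = df['Close'].shift(-1)[sell_idx]
print(list(buy_idx), list(sell_idx))
```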

Numpy repeat to odd length index

I have a dataframe:
proj_length = 6
nb_date = pd.Period("2017-10-13")
rb_date = pd.Period("2017-11-13")
rev_length = proj_length * 30
end_date = rb_date + (rev_length * 2)
df_index = pd.period_range(start=nb_date, end=end_date)
df = pd.DataFrame(data=[[0, 0]] * len(df_index),
                  index=df_index,
                  columns=["in", "out"])
len(df) == 392
And I'm trying to group by 30 days at a time, so my initial thought was to just create a new grouping column:
groups = (end_date - nb_date) // 30
gb_key = np.repeat(np.arange(groups), 30)
len(gb_key) == 390
This is good so far, but I cannot figure out a pythonic way to assign the overflow (the 392 - 390 = 2 leftover rows) to group 13.
Non-numpy/pandas way:
arr = np.zeros(len(df))
for i, idx in enumerate(range(0, len(df), 30)):
    arr[idx:] = i
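A numpy sketch of putting the 2-row overflow into a final short group (group 13): over-build the key with ceiling division and truncate, or simply integer-divide the row positions.

```python
import numpy as np

n = 392       # len(df) from the question
width = 30

# Ceiling division counts the short tail as its own group; repeating the
# group labels and truncating to n drops the overflow rows into group 13.
n_groups = -(-n // width)
gb_key = np.repeat(np.arange(n_groups), width)[:n]

# Equivalent one-liner: each row's group is its position // width
gb_key2 = np.arange(n) // width
```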
