I have a dataframe similar to the one below;
Price
return
indicator
5
0.05
1
6
0.20
-1
5
-0.16
1
Where the indicator is based upon the forecasted return on the following day.
what I would like to achieve is a strategy where when the indicator is positive 1, I buy the stock at the price on that date/row. Then if the indicator is negative we sell at that price. Then I would like to create a new column with represents the value of the portfolio on each day. Assuming I have $1000 to invest the value of the portfolio should equal the holdings and cash amount. Im assuming that any fraction of Stock can be purchased.
Im unsure where to start with this one. I tried calculating a the Buy/Hold strategy using;
df['Holding'] = df['return'].add(1).cumprod().*5000
this worked for a buy hold strategy but to modify it to the new strategy seems difficult.
I tried;
df['HOLDINg'] = (df['return'].add(1).cumprod()* 5000 * df['Indicator])
#to get the value of the buy or the sell
#then using
df['HOLDING'] = np.where(df['HOLDING'] >0, df['HOLDING'] , df['HON HOLDING 2']*-1)
#my logic was, if its positive its the value of the stock holding, and if its negative it is a cash inflow therefore I made it positive as it would be cash.
the issue is, my logic is massively flawed, as if the holding is cash the return shouldn't apply to it. further I don't think using the cumprod is correct with this strategy.
Has anyone used this strategy before and can offer tips of how to make it work?
thank you
I'm not sure about the returns and prices being in the correct place (they shouldn't really be in the same row if they represent the buying price (presumably yesterday's close), and the daily return (assuming the position was held for the whole day). But anyway...
import pandas as pd
# the data you provided
df = pd.read_csv("Data.csv", header=0)
# an initial starting row (explanation provided)
starting = pd.DataFrame({'Price': [0], 'return': [0], 'indicator': [0]})
# concatenate so starting is first row
df = pd.concat([starting, df]).reset_index(drop=True)
# setting holding to 0 at start (no shares), and cash at 1000 (therefore portfolio = 1000)
df[["Holding", "Cash", "Portfolio"]] = [0, 1000, 1000]
# buy/sell is the difference (explanation provided)
df["BuySell"] = df["indicator"].diff()
# simulating every day
for i in range(1, len(df)):
# buying
if df["BuySell"].iloc[i] > 0:
df["Holding"].iloc[i] += df["Cash"].iloc[i-1] / df["Price"].iloc[i]
df["Cash"].iloc[i] = 0
# selling
elif df["BuySell"].iloc[i] < 0:
df["Cash"].iloc[i] = df["Holding"].iloc[i-1] * df["Price"].iloc[i]
df["Holding"].iloc[i] = 0
# holding position
else:
df["Cash"].iloc[i] = df["Cash"].iloc[i-1]
df["Holding"].iloc[i] = df["Holding"].iloc[i-1]
# multiply holding by return (assuming all-in, so holding=0 not affected)
df["Holding"].iloc[i] *= (1 + df["return"].iloc[i])
df["Portfolio"].iloc[i] = df["Holding"].iloc[i] * df["Price"].iloc[i] + df["Cash"].iloc[i]
Explanations:
Starting row:
This is needed so that the loop can refer to the previous holdings and cash (it would be more of an inconvenience to add in an if statement in the loop if i=0).
Buy/Sell:
The difference is necessary here, as if the position changes from buy to sell, then obviously selling the shares (and vice versa). However, if the previous was buy/sell, the same as the current row, there would be no change (diff=0), with no shares bought or sold.
Portfolio:
This is an "equivalent" amount (the amount you would hold if you converted all shares to cash at the time).
Holding:
This is the number of shares held.
NOTE: from what I understood of your question, this is an all-in strategy - there is no percentage in, which has made this strategy more simplistic, but easier to code.
Output:
#Out:
# Price return indicator Holding Cash Portfolio BuySell
#0 0 0.00 0 0.00 1000 1000.0 NaN
#1 5 0.05 1 210.00 0 1050.0 1.0
#2 6 0.20 -1 0.00 1260 1260.0 -2.0
#3 5 -0.16 1 211.68 0 1058.4 2.0
Hopefully this will give you a good starting point to create something more to your specification and more advanced, such as with multiple shares, or being a certain percentage exposed, etc.
Related
I'm trying to come up with a formula to calculate the average entry/position price to further update my stop loss and take profit.
For example opened BTC buy position with amount of 1 when price was 20000.
Later when price dropped down to 19000 we made another buy using the same amount of 1, "avereging" the position to the middle, so end up with position at 19500 with amount of 2.
Where I'm struggling is what if we want to increase the order size on each price.
Say 1 at 20000, 1.5 at 19500, 2 at 19000 and so on.
Or made new buys of the same amount but shorter distance between.
Inital buy at 20000. then 19000 then 19150
Or combine these two variants.
I use mainly Python and Pandas. Maybe the latter one has some built-in function which I'm not aware of. I checked the official Pandas docs, but found only regular mean function.
Thanks to Yuri's suggestion to look into VWAP, I came up with the following code, which is more advanced and allows you to use different contract/volume sizes and increase/decrease "distance" between orders.
As an example here I used avarage price of BTC 20000 and increased steps distance using 1.1 multiplier as well as increased volume. Operated in Binance futures terms, where you can buy minimum 1 contract for 10$.
The idea is to find sweet spot for orders distance, volume, stop loss and take profit while avereging down.
# initial entry price
initial_price = 20000
# bottom price
bottom_price = 0
# enter on every 5% price drop
step = int(initial_price*0.05)
# 1.1 to increase distance between orders, 0.9 to decrease
step_multiplier = 1.1
# initial volume size in contracts
initial_volume = 1
# volume_multiplier, can't be less than 1, in case of use float, will be rounded to decimal number
volume_multiplier = 1.1
# defining empty arrays
prices = []
volumes = []
# checking if we are going to use simple approach with 1 contract volume and no sep or volume multiplier
if step_multiplier == 1 and volume_multiplier == 1:
prices = range(initial_price,bottom_price,-step)
else:
# defining current price and volume vars
curr_price = initial_price
curr_volume = initial_volume
# Checking if current price is still bigger then defined bottom price
while curr_price > bottom_price:
# adding current price to the list
prices.append(curr_price)
# calulating next order price
curr_price = curr_price-step*step_multiplier
# checking if volume multiplier is bigger then 1
if volume_multiplier > 1:
# adding current volume to the list
volumes.append(int(curr_volume))
# calulating next order volume
curr_volume = curr_volume*volume_multiplier
print("Prices:")
for price in prices:
print(price)
print("Volumes:")
for volume in volumes:
print(volume)
print("Prices array length", len(prices))
print("Volumes array length", len(volumes))
a = [item1 * item2 for item1, item2 in zip(prices, volumes)]
b = volumes
print("Average position price when price will reach",prices[-1], "is", sum(a)/sum(b))
Dataframe
(Disregard the two index columns)
level_0
index
Year
Month
Day
Open
High
Low
Close
Volume
Length
Polarity
Sentiment_Negative
Sentiment_Neutral
Sentiment_Positive
Target_variable
Predicted
0
0
0
2020
1
19
8941.45
9164.36
8620.08
8706.25
3.42173e+10
937.167
0.0884653
0
0
1
0
0
1
1
1
2020
1
18
8927.21
9012.2
8827.33
8942.81
3.23378e+10
1177.5
0.176394
0
0
1
1
1
2
2
2
2020
1
17
8725.21
8958.12
8677.32
8929.04
3.63721e+10
1580
0.216762
0
0
1
0
0
3
3
3
2020
1
16
8812.48
8846.46
8612.1
8723.79
3.1314e+10
1336.33
0.182707
0
0
1
0
0
Description
The value of the target_variable is 1 if todays closing price is greater than yesterdays closing price
The value of the target_variable is 0 if todays closing price is less than yesterdays closing price
The predicted value is the output of my classifier.
Problem
I need to run some code that tracks how much money is gained if I invest when the classifier tells me to invest
I have started to code this
credit = 10000
for index, row in df.iterrows():
if row["Predicted"] == 1:
#print(row["Percentage_diff"])
credit = credit - 100
credit = credit + (100 * row["Percentage_diff"])
print(credit)
The idea is that I start off with a balance of 10,000 and invest 100 every time the classifier signals to. The only problem is that when I lose 8000 credits. Is the code correct and the classifier is very poor?
Or have I made an error in the code?
I am not a trading expert, so I assume that every day the classifier tells you to trade, you will buy with the opening price and sell with the close price.
You can start by calculating the percentage of profit or loss when the classifier tells you to trade. You can do that by subtracting the closing price from the opening and dividing it by the opening price.
df["perc_diff"] = (df["Close"] - df["Open"])/df["open"]
Of course, this will be negative when the classifier is wrong. To compute the cumulative profits/losses, all you want to do is to iteratively add/subtract your profit/loss to your capital. This means at a day with a profit/loss percentage of r, if you invest x dollars, your new credit is (1+r)*x. So a simple for loop can do it like that:
credit = 1 # your capital
for label, row in df.iterrows():
credit = (1 + row["Predicted"] * r) * row["perc_diff"]
print(credit)
Edit to address your updated problem:
If you want to specify an amount to invest rather than all your capital, then you can use this:
credit = 1 # your capital
to_invest = 0.1 # money to invest
for label, row in df.iterrows():
# update invest
invest_update = (1 + row["Predicted"] * row["perc_diff"]) * to_invest
credit = credit - to_invest + invest_update
print(credit)
The last two lines can be combined into one line:
credit = credit + row["Predicted"] * row["perc_diff"] * to_invest
I think the code is correct, and if you lose, then it is probably due to poor performance from your classifier, but this should be evident from your evaluation of the model (like accuracy and precision metrics). Also, if it is a classifier that is not made for time series (e.g. logistic regression), then it is very reasonable that it performs poorly.
Solution
df["Percentage_diff"] = (df["Close"] - df["Open"])/df["Open"]
credit = 10000
for index, row in df.iterrows():
if row["Predicted"] == 1:
#print(row["Percentage_diff"])
credit = credit - 100
credit = credit + ((100 * row["Percentage_diff"]) + 100)
print(credit)
This was the solution thanks to Ahmed.
If I start with an original balance of 10000 every time the classifier signals to invest I invest 100 dollars at opening and withdraw at close this calculates the balance.
My goal is to write a function that returns a vector of portfolio returns for each period (i.e. day) from a pandas dataframe of security prices. For simplicity, let's assume that the initial weights are equally split between securities A and B. Prices are given by the following dataframe:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=20)
prices = pd.DataFrame({'A': np.linspace(20, 50, num=20),
'B': np.linspace(100, 200, num=20)},
index=dates)
Further, we assume that asset A is the asset where we initiate a short position and we go long asset B.
Calculating discrete returns from a "zero-investment position" like a short position (i.e. in asset A) in a first step and overall portfolio returns from the weighted returns of single assets that constitute the portfolio in a second step is not trivial, and before I put my so far attempt, which is not working correctly (key problem being the loss from the short position in asset A exceeding -100% on 2013-01-14), I am greatful for any kind of help - may it be theoretical or code.
You are forgetting asset “C”, as in collateral. No matter how generous your broker might be (not!), most exchanges and national regulatory organizations would require collateral. you might read about some wealthy hedge fund guy doing these trades with just long and short positions, but when it goes south you also read about the HF guy losing his art collection— which was the collateral.
Equity margin requirements in the USA would require 50% collateral to start, and at least 25% maintenance margin while the short trades were open. This is enforced by exchanges and regulatory authorities. Treasury bonds might have more favorable requirements, but even then the margin (collateral) is not zero.
Since a long/short doubles your risk (what happens if the long position goes down, and short position goes up?), your broker likely would require more margin than the minimum.
Add asset “C”, collateral, to your calculations and the portfolio returns become straight forward
Thank you for your answer #Stripedbass. Based on your comment, can the return process of a portfolio consisting of the two stocks be described by the following equations?:
The terms with + and - are the market values of the long and short position respectively such that the difference of them represents the net value of the two positions. If we assume that we want to be "market neutral" at the beginning of the trade t=0, the net value is zero.
For t > 0 these net positions represent the unrealised gains or losses of the long and short position that were opened and have not yet been closed. The term C denotes the money that we actually hold. It consists of the initial collateral and the cumulative gains and losses from the stock positions.
The overall return per period from trading the two securtities is then calculated as the simple return of the account V.
Based on this, you could define the following function and for short posititions choose option type='shares':
def weighted_return(type, df, weights):
capital = 100
#given the input dataframe contains return series
if type == "returns":
# create price indices
df.fillna(0, inplace=True)
df_price_index = pd.DataFrame(index=df.index, columns=df.columns)
df_price_index.iloc[0] = 100 + df.iloc[0]
for i in np.arange(1, len(df_price_index)):
for col in df_price_index.columns:
df_price_index[col].iloc[i] = df_price_index[col].iloc[i - 1] * (1 + df[col].iloc[i])
n = 0
ind_acc = []
for stock in df.columns:
ind_capital = capital * weights[n]
moves = (df_price_index[stock].diff()) * ind_capital / df_price_index[stock][0]
ind_acc.append(moves)
n += 1
pair_ind_accounts = pd.concat(ind_acc, axis=1)
portfolio_acc = pair_ind_accounts.sum(1).cumsum() + capital
df_temp_returns_combined = portfolio_acc.pct_change()
df_temp_returns_combined[0] = np.sum(weights * df.iloc[0].values)
df_temp_returns_combined = pd.DataFrame(df_temp_returns_combined)
df_temp_returns_combined.columns = ["combinedReturns"]
#given the input dataframe contains price series
if type == "prices":
n = 0
ind_acc = []
for stock in df.columns:
ind_capital = capital * weights[n]
moves = (df[stock].diff()) * ind_capital / df[stock][0]
ind_acc.append(moves)
n += 1
pair_ind_accounts = pd.concat(ind_acc, axis=1)
portfolio_acc = pair_ind_accounts.sum(1).cumsum() + capital
df_temp_returns_combined = portfolio_acc.pct_change()
df_temp_returns_combined[0] = np.NaN
df_temp_returns_combined = pd.DataFrame(df_temp_returns_combined)
df_temp_returns_combined.columns = ["combinedReturns"]
#given the input dataframe contains return series and the strategy is long/short
if type == "shares":
exposures = []
for stock in df.columns:
shares = 1/df[stock][0]
exposure = df[stock] * shares
exposures.append(exposure)
df_temp = pd.concat(exposures, axis=1)
index_long = np.where(np.array(weights) == 1)
index_short = np.where(np.array(weights) == -1)
df_temp_account = pd.DataFrame(df_temp.iloc[:,index_long[0]].values - df_temp.iloc[:,index_short[0]].values) + 1
df_temp_returns_combined = df_temp_account.pct_change()
df_temp_returns_combined.columns = ["combinedReturns"]
df_temp_returns_combined.index = df.index
return pd.DataFrame(df_temp_returns_combined)
I am trying to build an equity curve in Python using Pandas. For those not in the know, an equity curve is a cumulative tally of investing profits/losses day by day. The code below works but it is incredibly slow. I've tried to build an alternate using Pandas .iloc and such but nothing is working. I'm not sure if it is possible to do this outside of a loop given how I have to reference the prior row(s).
for today in range(len(f1)): #initiate a loop that runs the length of the "f1" dataframe
if today == 0: #if the index value is zero (aka first row in the dataframe) then...
f1.loc[today,'StartAUM'] = StartAUM #Set intial assets
f1.loc[today,'Shares'] = 0 #dummy placeholder for shares; no trading on day 1
f1.loc[today,'PnL'] = 0 #dummy placeholder for P&L; no trading day 1
f1.loc[today,'EndAUM'] = StartAUM #set ending AUM; should be beginning AUM since no trades
continue #and on to the second row in the dataframe
yesterday = today - 1 #used to reference the rows (see below)
f1.loc[today,'StartAUM'] = f1.loc[yesterday,'EndAUM'] #todays starting aseets are yesterday's ending assets
f1.loc[today,'Shares'] = f1.loc[yesterday,'EndAUM']//f1.loc[yesterday,'Shareprice'] #today's shares to trade = yesterday's assets/yesterday's share price
f1.loc[today,'PnL'] = f1.loc[today,'Shares']*f1.loc[today,'Outcome1'] #Our P&L should be the shares traded (see prior line) multiplied by the outcome for 1 share
#Note Outcome1 came from the dataframe before this loop >> for the purposes here it's value is irrelevant
f1.loc[today,'EndAUM'] = f1.loc[today,'StartAUM']+f1.loc[today,'PnL'] #ending assets are starting assets + today's P&L
There is a good example here: http://www.pythonforfinance.net/category/basic-data-analysis/ and I know that there is an example in Wes McKinney's book Python for Data Analysis. You might be able to find it here: http://wesmckinney.com/blog/python-for-financial-data-analysis-with-pandas/
Have you tried using iterrows() to construct the for loop?
for index, row in f1.iterrows():
if today == 0:
row['StartAUM'] = StartAUM #Set intial assets
row['Shares'] = 0 #dummy placeholder for shares; no trading on day 1
row['PnL'] = 0 #dummy placeholder for P&L; no trading day 1
row['EndAUM'] = StartAUM #set ending AUM; should be beginning AUM since no trades
continue #and on to the second row in the dataframe
yesterday = row[today] - 1 #used to reference the rows (see below)
row['StartAUM'] = row['EndAUM'] #todays starting aseets are yesterday's ending assets
row['Shares'] = row['EndAUM']//['Shareprice'] #today's shares to trade = yesterday's assets/yesterday's share price
row['PnL'] = row['Shares']*row['Outcome1'] #Our P&L should be the shares traded (see prior line) multiplied by the outcome for 1 share
#Note Outcome1 came from the dataframe before this loop >> for the purposes here it's value is irrelevant
row['EndAUM'] = row['StartAUM']+row['PnL'] #ending assets are starting assets + today's P&L
Probably the code is so slow as loc goes through f1 from beginning every time. iterrows() uses the same dataframe as it loops through it row by row.
See more details about iterrows() here.
You need to vectorize the operations (don't iterate with for but rather compute whole column at once)
# fill the initial values
f1['StartAUM'] = StartAUM # Set intial assets
f1['Shares'] = 0 # dummy placeholder for shares; no trading on day 1
f1['PnL'] = 0 # dummy placeholder for P&L; no trading day 1
f1['EndAUM'] = StartAUM # s
#do the computations (vectorized)
f1['StartAUM'].iloc[1:] = f1['EndAUM'].iloc[:-1]
f1['Shares'].iloc[1:] = f1['EndAUM'].iloc[:-1] // f1['Shareprice'].iloc[:-1]
f1['PnL'] = f1['Shares'] * f1['Outcome1']
f1['EndAUM'] = f1['StartAUM'] + f1 ['PnL']
EDIT: this will not work correctly since StartAUM, EndAUM, Shares depend on each other and cannot be computed one without another. I didn't notice that before.
Can you try the following:
#import relevant modules
import pandas as pd
import numpy as np
from pandas_datareader import data
import matplotlib.pyplot as plt
#download data into DataFrame and create moving averages columns
f1 = data.DataReader('AAPL', 'yahoo',start='1/1/2017')
StartAUM = 1000000
#populate DataFrame with starting values
f1['Shares'] = 0
f1['PnL'] = 0
f1['EndAUM'] = StartAUM
#Set shares held to be the previous day's EndAUM divided by the previous day's closing price
f1['Shares'] = f1['EndAUM'].shift(1) / f1['Adj Close'].shift(1)
#Set the day's PnL to be the number of shares held multiplied by the change in closing price from yesterday to today's close
f1['PnL'] = f1['Shares'] * (f1['Adj Close'] - f1['Adj Close'].shift(1))
#Set day's ending AUM to be previous days ending AUM plus daily PnL
f1['EndAUM'] = f1['EndAUM'].shift(1) + f1['PnL']
#Plot the equity curve
f1['EndAUM'].plot()
Does the above solve your issue?
The solution was to use the Numba package. It performs the loop task in a fraction of the time.
https://numba.pydata.org/
The arguments/dataframe can be passed to the numba module/function. I will try to write up a more detailed explanation with code when time permits.
Thanks to all
In case others come across this, you can definitely make an equity curve without loops.
Dummy up some data
import pandas as pd
import numpy as np
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (13, 10)
# Some data to work with
np.random.seed(1)
stock = pd.DataFrame(
np.random.randn(100).cumsum() + 10,
index=pd.date_range('1/1/2020', periods=100, freq='D'),
columns=['Close']
)
stock['ma_5'] = stock['Close'].rolling(5).mean()
stock['ma_15'] = stock['Close'].rolling(15).mean()
Holdings: simple long/short based on moving average crossover signals
longs = stock['Close'].where(stock['ma_5'] > stock['ma_15'], np.nan)
shorts = stock['Close'].where(stock['ma_5'] < stock['ma_15'], np.nan)
# Quick plot
stock.plot()
longs.plot(lw=5, c='green')
shorts.plot(lw=5, c='red')
EQUITY CURVE:
Identify which side (l/s) has first holding (ie: first trade, in this case, short), then keep the initial trade price and subsequently cumulatively sum the daily changes (there would normally be more nan's in the series if you have exit rules as well for when you are out of the market), and finally forward fill over the nan values and fill any last remaining nans with zeros. Its basically the same for the second opposite holdings (in this case, long) except don't keep the starting price. The other important thing is to invert the short daily changes (ie: negative changes should be positive to the PnL).
lidx = np.where(longs > 0)[0][0]
sidx = np.where(shorts > 0)[0][0]
startdx = min(lidx, sidx)
# For first holding side, keep first trade price, then calc daily change fwd and ffill nan's
# For second holdng side, get cumsum of daily changes, ffill and fillna(0) (make sure short changes are inverted)
if lidx == startdx:
lcurve = longs.diff() # get daily changes
lcurve[lidx] = longs[lidx] # put back initial starting price
lcurve = lcurve.cumsum().ffill() # add dialy changes/ffill to build curve
scurve = -shorts.diff().cumsum().ffill().fillna(0) # get daily changes (make declines positive changes)
else:
scurve = -shorts.diff() # get daily changes (make declines positive changes)
scurve[sidx] = shorts[sidx] # put back initial starting price
scurve = scurve.cumsum().ffill() # add dialy changes/ffill to build curve
lcurve = longs.diff().cumsum().ffill().fillna(0) # get daily changes
Add the 2 long/short curves together to get the final equity curve
eq_curve = lcurve + scurve
# quick plot
stock.iloc[:, :3].plot()
longs.plot(lw=5, c='green', label='Long')
shorts.plot(lw=5, c='red', label='Short')
eq_curve.plot(lw=2, ls='dotted', c='orange', label='Equity Curve')
plt.legend()
I was wondering if there is a more efficient/cleaner way of doing the following. Say I have a dataframe that contains 2 columns, the percentage, (base on previous price) and the action, play/buy (1) or not play/sell (-1). Its basically about stocks.
For simplicity, consider the example df:
Percent Action
1.25 1
1.20 1
0.50 -1
0.75 1
I would like to generate the following. I only care about the final money amount, I am just showing this table for reference. Say we started with $100 and a state of not playing. Thus we should get the money amount of:
Playing Percent Action Money
No 1.25 1 $100
Yes 1.20 1 $120
Yes 0.50 -1 $60
No 0.75 1 $60
Yes ... ... ...
The amount didnt change in the first row since we weren't playing yet. Since the action is 1, we will play the next one. The percentage went up 20%, thus we get $120. The next action is still a 1, so we'll still be in the next one. The percentage went down to 50% so we end up with $60. Next action is -1, thus we will not play. The percentage went down to 75%, but since we weren't playing, our money stayed the same. And so on.
Currently, I have the code below. It works fine, but just wondering if there is a more efficient way using numpy/pandas functions. Mine basically iterates through each row and calculate the value.
playing = False
money = 10000
for index, row in df.iterrows():
## UPDATE MONEY IF PLAYING
if index > 0 and playing == True:
money = float(format(money*row['Percent'],'.2f'))
## BUY/SELL
if row['Action'] == 1:
if playing == False:
playing = True ## Buy, playing after this
elif row['Action'] == -1:
if playing == True:
playing = False ## Sell, not playing after this
You could try this:
# decide whether to play based on action
df['Playing'] = df.Action.shift().eq(1)
# replace Percent for not playing row with 1 and then calculate the cumulative product
df['Money'] = '$' + df.Percent.where(df.Playing, 1).cumprod().mul(100).astype(str)
df
#Percent Action Playing Money
#0 1.25 1 False $100.0
#1 1.20 1 True $120.0
#2 0.50 -1 True $60.0
#3 0.75 1 False $60.0