zipline: target_order not executed in handle_data - python

I'm trying to develop a monthly rotational trading strategy with Zipline and data from the Quandl bundle.
The strategy is supposed to hold a number ("topn") of assets with the highest momentum score and keep them until they drop below a certain momentum rank ("keepn").
When I run the following code through zipline, it works for a couple of months, then suddenly starts holding more and more positions, selling the same positions repeatedly without actually removing the positions from the portfolio. This happens with Quandl data as well as with a custom bundle.
I'm guessing there's a fundamental flaw in my strategy, but even after going through it with a debugger, I really can't find it.
Any help is appreciated!
Thank you.
Dirk
from zipline.api import (date_rules, get_datetime, order_target_percent,
                         schedule_function, set_max_leverage, symbol,
                         time_rules)


def initialize(context):
    # List of all assets to choose from
    context.tickers = ["AAPL", "YELP", "YHOO", "MMM",
                       "ABT", "AMD", "AMZN", "GOOG",
                       "AXP", "AMGN", "BBY", "BLK",
                       "CAT"]
    context.universe = [symbol(ticker) for ticker in context.tickers]
    context.momentum_lookback = 256
    # Hold (topn) 3 assets, as long as they are in the (keepn) top 5 momentum_rank
    context.topn = 3
    context.keepn = 5
    # Schedule the trading routine for once per month
    schedule_function(handle_data, date_rules.month_start(), time_rules.market_close())
    # Allow no leverage (set_max_leverage is a function, not a variable)
    set_max_leverage(1.0)


def momentum_score(ts):
    # Simplified momentum score: Last price / price 256 days ago
    return ts.iloc[-1] / ts.iloc[0]


def handle_data(context, data):
    # String with today's date for logging purposes
    today = get_datetime().date().strftime('%d/%m/%Y')
    # Create 256 days (context.momentum_lookback) history for all equities
    hist = data.history(context.universe,
                        "close",
                        context.momentum_lookback,
                        "1d")
    # How much to hold of each equity
    target_percent = 100 / context.topn
    # Rank equities by momentum score
    ranking_table = hist.apply(momentum_score).sort_values(ascending=False)
    top_assets = ranking_table[:context.topn]
    grace_assets = ranking_table[:context.keepn]
    # List of equities being held in current portfolio
    kept_positions = list(context.portfolio.positions.keys())
    # Sell logic
    # ==========
    # Sell current holdings no longer in grace assets
    for holding in context.portfolio.positions:
        if holding not in grace_assets:
            if data.can_trade(holding):
                print(today + " [Sell] " + holding.symbol)
                order_target_percent(holding, 0.0)
                kept_positions.remove(holding)
    # Buy Logic
    # =========
    # Determine how many new assets to buy
    replacements = context.topn - len(kept_positions)
    # Remove currently held positions from the top list
    buy_list = ranking_table.loc[~ranking_table.index.isin(kept_positions)][:replacements]
    # Buy new entities and rebalance "kept_positions" to the desired weights
    new_portfolio = list(buy_list.index) + kept_positions
    # Buy/rebalance assets
    for asset in new_portfolio:
        if data.can_trade(asset):
            print(today + " [BUY] " + asset.symbol)
            order_target_percent(asset, target_percent)

Ok, so I figured out what the problem is. Basic math failure on my end.
This is the troublesome code:
# How much to hold of each equity
target_percent = 100 / context.topn
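It should have been target_percent = 1.0 / context.topn instead, since order_target_percent expects a fraction of the portfolio value (0.5 means 50%). With topn = 3, 100 / 3 asks for roughly 33 times the portfolio value in every asset. facepalm
I assume this leads to orders that can't be filled properly, which causes the described behavior.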
Lesson learned:
Check for open orders and cancel them, if needed
Keep an eye on leverage and position sizes and check against restrictions during the algo run
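The fix comes down to passing order_target_percent a fraction rather than a percentage. Below is a minimal, zipline-free sketch of the same topn/keepn selection, assuming plain ticker strings instead of zipline Asset objects and a precomputed Series of momentum scores (the scores are made-up illustrative numbers). select_targets is a hypothetical helper, not part of zipline. It also caps the kept holdings at topn, an added safeguard that is not in the original code, so the target weights can never add up to more than 1.0.

```python
import pandas as pd


def select_targets(scores: pd.Series, held: list, topn: int, keepn: int) -> dict:
    """Target weights (fractions of portfolio value) for a topn/keepn momentum rotation."""
    ranking = scores.sort_values(ascending=False)
    # Keep holdings still ranked inside the top keepn, best-ranked first, capped at topn
    kept = [asset for asset in ranking.index[:keepn] if asset in held][:topn]
    # Fill the free slots with the best-ranked assets that are not already kept
    replacements = topn - len(kept)
    buys = [asset for asset in ranking.index if asset not in kept][:replacements]
    # Equal weight as a fraction of the portfolio: 1.0 / topn, not 100 / topn
    weight = 1.0 / topn
    targets = {asset: weight for asset in kept + buys}
    # Anything held but neither kept nor bought gets a target of 0.0, i.e. it is sold
    for asset in held:
        targets.setdefault(asset, 0.0)
    # Leverage check: target weights must never add up to more than 100 %
    if sum(targets.values()) > 1.0 + 1e-9:
        raise ValueError("target weights exceed 100% of the portfolio")
    return targets


# Hypothetical momentum scores, for illustration only
scores = pd.Series({"AAPL": 1.50, "AMZN": 1.40, "GOOG": 1.30,
                    "MMM": 1.20, "CAT": 1.10, "AMD": 1.00})
print(select_targets(scores, held=["MMM", "AMD"], topn=3, keepn=5))
```

Inside a zipline algorithm, each (asset, weight) pair could then be passed to order_target_percent(asset, weight).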

Related

Calculate average asset price when using netting instead of hedging

I'm trying to come up with a formula to calculate the average entry/position price to further update my stop loss and take profit.
For example opened BTC buy position with amount of 1 when price was 20000.
Later, when the price dropped to 19000, we made another buy of the same amount (1), "averaging" the position to the middle, so we end up with a position at 19500 with an amount of 2.
Where I'm struggling is what happens if we want to increase the order size at each price.
Say 1 at 20000, 1.5 at 19500, 2 at 19000 and so on.
Or make new buys of the same amount but with a shorter distance between them.
Initial buy at 20000, then 19000, then 19150.
Or combine these two variants.
I use mainly Python and Pandas. Maybe the latter one has some built-in function which I'm not aware of. I checked the official Pandas docs, but found only regular mean function.
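The average entry price asked for here is a volume-weighted average: the sum of price × amount over all buys divided by the total amount. NumPy's np.average computes exactly this through its weights argument, and a cumulative-sum version in pandas gives the running average after every fill. A minimal sketch; average_entry_price and the fills DataFrame are illustrative names:

```python
import numpy as np
import pandas as pd


def average_entry_price(prices, volumes):
    """Volume-weighted average entry price: sum(price * volume) / sum(volume)."""
    return float(np.average(np.asarray(prices, dtype=float),
                            weights=np.asarray(volumes, dtype=float)))


print(average_entry_price([20000, 19000], [1, 1]))  # 19500.0
print(average_entry_price([20000, 19500, 19000], [1, 1.5, 2]))

# Running average after each fill, e.g. to move the stop loss / take profit after every buy
fills = pd.DataFrame({"price": [20000, 19000, 19150], "qty": [1, 1, 1]})
fills["avg_price"] = (fills["price"] * fills["qty"]).cumsum() / fills["qty"].cumsum()
print(fills)
```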
Thanks to Yuri's suggestion to look into VWAP, I came up with the following code, which is more advanced: it allows you to use different contract/volume sizes and to increase or decrease the "distance" between orders.
As an example, I used an average BTC price of 20000 and increased the step distance with a 1.1 multiplier, as well as the volume. This works in Binance futures terms, where the minimum you can buy is 1 contract for $10.
The idea is to find the sweet spot for order distance, volume, stop loss and take profit while averaging down.
# initial entry price
initial_price = 20000
# bottom price
bottom_price = 0
# enter on every 5% price drop
step = int(initial_price*0.05)
# 1.1 to increase distance between orders, 0.9 to decrease
step_multiplier = 1.1
# initial volume size in contracts
initial_volume = 1
# volume_multiplier can't be less than 1; fractional volumes are truncated to whole contracts by int()
volume_multiplier = 1.1
# defining empty lists
prices = []
volumes = []
# defining current price and volume vars
curr_price = initial_price
curr_volume = initial_volume
# keep adding orders while the current price is still above the defined bottom price
# (with both multipliers equal to 1 this is range(initial_price, bottom_price, -step)
# with initial_volume contracts per order, so no separate branch is needed)
while curr_price > bottom_price:
    # adding current price and volume to the lists (always both, so they stay the same length)
    prices.append(curr_price)
    volumes.append(int(curr_volume))
    # calculating next order price
    curr_price = curr_price - step*step_multiplier
    # calculating next order volume
    curr_volume = curr_volume*volume_multiplier
print("Prices:")
for price in prices:
    print(price)
print("Volumes:")
for volume in volumes:
    print(volume)
print("Prices array length", len(prices))
print("Volumes array length", len(volumes))
a = [item1 * item2 for item1, item2 in zip(prices, volumes)]
b = volumes
print("Average position price when price will reach", prices[-1], "is", sum(a)/sum(b))

Is there any way to get rid of for loops and get the number of stocks as a variable of cash in hand?

I got the following code from DataCamp to create a portfolio of returns for trading in the stock market.
# Set the initial capital
initial_capital= float(100000)
positions = pd.DataFrame(index=final.index).fillna(0.0)
number_of_stocks = 3
positions['signal'] = number_of_stocks*final['signal'] #Buy shares
portfolio = positions.multiply(final['Close'], axis=0)
pos_diff = positions.diff()
portfolio['holdings'] = (positions.multiply(final['Close'], axis=0)).sum(axis=1) # holding amount
portfolio['cash'] = initial_capital - (pos_diff.multiply(final['Close'], axis=0)).sum(axis=1).cumsum() # cash amount
portfolio['total'] = portfolio['cash'] + portfolio['holdings'] # total amount
portfolio['returns'] = portfolio['total'].pct_change() # Return percentages
portfolio['diff'] = pos_diff
portfolio['positions'] = positions['signal']
portfolio.tail()
All I want to do now is to convert number_of_stocks into a variable based on the cash in hand, so that I can trade with all the cash I have each time I buy or sell.
I have tried using nested for loops but did not get any good output. Is there any way to get this done without a complex looping structure?
Thanks for your aid.
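One loop-free way to trade with all available cash is to compound the strategy's daily returns instead of tracking a fixed share count, then derive shares and cash from the account value. This is a minimal sketch assuming fractional shares, no transaction costs, trades at the close, and a signal that is 1 while invested and 0 while in cash; all_in_portfolio is a hypothetical helper, and the output columns follow the DataCamp-style code above. If you need whole shares (floor(cash / price)), each day's share count depends on the rounding of the previous trade, and a simple loop (or Numba) becomes the practical option.

```python
import pandas as pd


def all_in_portfolio(close: pd.Series, signal: pd.Series, initial_capital: float) -> pd.DataFrame:
    """Portfolio that invests all available cash whenever signal == 1 (fractional shares assumed)."""
    # The position held during bar t is the one decided at the close of bar t-1
    held = signal.shift(1).fillna(0.0)
    # Strategy return: the stock's return while invested, 0 while in cash
    strategy_returns = held * close.pct_change().fillna(0.0)
    total = initial_capital * (1.0 + strategy_returns).cumprod()
    # After trading at today's close, the whole account is in the stock if signal == 1
    shares = signal * total / close
    holdings = shares * close
    return pd.DataFrame({
        "positions": shares,
        "holdings": holdings,
        "cash": total - holdings,
        "total": total,
        "returns": total.pct_change(),
    })


close = pd.Series([100.0, 110.0, 121.0, 110.0])
signal = pd.Series([1, 1, 0, 0])
print(all_in_portfolio(close, signal, initial_capital=1000.0))
```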

Compound Interest Calculator - Variable Frequency of Deposits, Deposit Amounts and Daily Variable Interest Rate

I am trying to build a calculator that computes compound interest but with a few quirks. Specifically:
a) I want the deposited amount to vary within a normal distribution for every month until the end of the investment
b) I want the rate of interest to vary within a normal distribution for every day until the end of the investment
I started off with the basics:
# -*- coding: utf-8 -*-
"""
Spyder Editor
This is a temporary script file.
"""
import numpy as np
from tabulate import tabulate
tabulate.PRESERVE_WHITESPACE = True
year = 1
Principal = 1050
Prev_Principal = 0
n = 365
Total_New = 0
FV_prev = 0
for year in range(1, 5):
    RoR = 0.01*np.random.normal(7.43, 4.172, 1)
    PMT = np.random.normal(575, 85.39, 1)
    FV = PMT*(12/n)*((1+(RoR/n))**(n*year)-1)/(RoR/n)
    Total = FV + Principal*(1+(RoR/n))**(n*year)
    Total_New = Total - Total_New
    Net_Gain = Total - (year*12*PMT+Principal)
    print(tabulate([["YEAR", "FV", "TOTAL", "CI", "R%"], [year, np.round(FV, 1), np.round(Total, 1), np.round(Net_Gain, 1), np.round(100*RoR, 1)]], headers="firstrow", tablefmt='fancy_grid'))
This basic version just outputs the total at the end of each year, including interest, as well as the net interest for that year. Unfortunately, although it varies the interest rate, it only does so once per year. I plan on doing the following, but I am not sure whether it's correct, both in terms of programming and mathematically. So I want to take these:
FV = PMT*(12/n)*((1+(RoR/n))-1)/(RoR/n)
Total = FV + Principal*(1+(RoR/n))**(n*year)
and use
RoR_Array = 0.01*np.random.normal(7.43,4.172, 365)
math.prod((1+RoR_Array/365))
With RoR_Array I am basically trying to create a 1D array with 365 elements, one for each day of the year, which represents the daily interest rate.
With math.prod I am trying to overcome the following issue:
If n=365, that means that interest is fixed for the year and compounded daily so
Total = FV + Principal*(1+(RoR/365))**(365*1)
But since I want a variable daily RoR, what's the best way of doing it?
Hence why I am using math.prod.
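A minimal sketch of the daily-variable-rate idea, assuming each day's annual rate is applied as a factor 1 + r_d/365 and the 12 monthly deposits land at the start of roughly evenly spaced days (the deposit_days choice is an illustrative assumption). The product of all 365 factors is what math.prod (or np.prod) computes; with a constant rate it collapses to (1 + RoR/365)**365, the n = 365 case above. A reversed cumulative product gives, for every day, how much money present at the start of that day grows by year end, so each deposit is compounded only from its own day onwards. balance_after_year is a hypothetical helper name.

```python
import numpy as np


def balance_after_year(principal, daily_rates, deposits, deposit_days, n=365):
    """Year-end balance with a different annual rate on every day and deposits on given days.

    daily_rates[d] is the annual rate used on day d (0.0743 means 7.43 %);
    deposits[k] is paid in at the start of day deposit_days[k].
    """
    # One compounding factor per day, e.g. (1 + RoR/365) when the rate is constant
    growth = 1.0 + np.asarray(daily_rates, dtype=float) / n
    # tail[d] = growth[d] * growth[d + 1] * ... * growth[-1]
    tail = np.cumprod(growth[::-1])[::-1]
    # tail[0] equals np.prod(growth), i.e. the math.prod(...) from the question
    return principal * tail[0] + np.dot(deposits, tail[np.asarray(deposit_days)])


rng = np.random.default_rng(0)
RoR_Array = 0.01 * rng.normal(7.43, 4.172, 365)  # one annual rate per day
PMT = rng.normal(575, 85.39, 12)                 # one deposit per month
# Assumption: deposits on the first day of 12 roughly equal "months"
deposit_days = np.linspace(0, 365, 12, endpoint=False).astype(int)
total = balance_after_year(1050, RoR_Array, PMT, deposit_days)
print(round(total, 2))
```

For a multi-year run, each year's result can be passed back in as the principal for the next year.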

Calculating returns with short positions (backtest)

My goal is to write a function that returns a vector of portfolio returns for each period (i.e. day) from a pandas dataframe of security prices. For simplicity, let's assume that the initial weights are equally split between securities A and B. Prices are given by the following dataframe:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=20)
prices = pd.DataFrame({'A': np.linspace(20, 50, num=20),
'B': np.linspace(100, 200, num=20)},
index=dates)
Further, we assume that asset A is the asset where we initiate a short position and we go long asset B.
Calculating discrete returns from a "zero-investment position" such as a short position (here in asset A) in a first step, and the overall portfolio returns from the weighted returns of the single assets that make up the portfolio in a second step, is not trivial. Before I post my attempt so far, which does not work correctly (the key problem being that the loss on the short position in asset A exceeds -100% on 2013-01-14), I would be grateful for any kind of help, whether theoretical or code.
You are forgetting asset “C”, as in collateral. No matter how generous your broker might be (not!), most exchanges and national regulatory organizations would require collateral. You might read about some wealthy hedge fund guy doing these trades with just long and short positions, but when it goes south you also read about the HF guy losing his art collection, which was the collateral.
Equity margin requirements in the USA would require 50% collateral to start, and at least 25% maintenance margin while the short trades were open. This is enforced by exchanges and regulatory authorities. Treasury bonds might have more favorable requirements, but even then the margin (collateral) is not zero.
Since a long/short doubles your risk (what happens if the long position goes down, and short position goes up?), your broker likely would require more margin than the minimum.
Add asset “C”, collateral, to your calculations and the portfolio returns become straightforward.
Thank you for your answer @Stripedbass. Based on your comment, can the return process of a portfolio consisting of the two stocks be described by the following equations?
The terms with + and - are the market values of the long and short positions respectively, such that their difference represents the net value of the two positions. If we want to be "market neutral" at the beginning of the trade (t = 0), the net value is zero.
For t > 0, these net positions represent the unrealised gains or losses of the long and short positions that were opened and have not yet been closed. The term C denotes the money that we actually hold; it consists of the initial collateral and the cumulative gains and losses from the stock positions.
The overall return per period from trading the two securities is then calculated as the simple return of the account V.
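A minimal sketch of the account V described here, using the example prices from the question: each leg is sized so both are worth notional at t = 0, the unrealised P&L of the long and short legs is added to the collateral C, and the per-period return is the simple return of V. The names long_short_account, notional and collateral are illustrative, and the sketch ignores borrow fees, interest on the collateral and margin calls. Because returns are measured on V rather than on the zero-value net position, a period return can only fall to -100% if the whole account is wiped out.

```python
import numpy as np
import pandas as pd


def long_short_account(prices, long_col, short_col, notional=1.0, collateral=1.0):
    """Account value V_t = C + long P&L + short P&L and its simple per-period returns."""
    long_shares = notional / prices[long_col].iloc[0]
    short_shares = notional / prices[short_col].iloc[0]
    # Unrealised P&L of each leg relative to its opening value (both are zero at t = 0)
    long_pnl = long_shares * prices[long_col] - notional
    short_pnl = notional - short_shares * prices[short_col]  # a rising price is a loss
    account = collateral + long_pnl + short_pnl
    return pd.DataFrame({"account": account, "returns": account.pct_change()})


dates = pd.date_range('20130101', periods=20)
prices = pd.DataFrame({'A': np.linspace(20, 50, num=20),
                       'B': np.linspace(100, 200, num=20)},
                      index=dates)
# Short A, long B, 1 unit of notional per leg backed by 1 unit of collateral
result = long_short_account(prices, long_col='B', short_col='A')
print(result.tail())
```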
Based on this, you could define the following function and, for short positions, choose the option type='shares':
import numpy as np
import pandas as pd


def weighted_return(type, df, weights):
    capital = 100
    weights = np.asarray(weights, dtype=float)
    # given the input dataframe contains return series
    if type == "returns":
        # create price indices (fillna returns a copy, so the caller's df is left untouched)
        df = df.fillna(0)
        df_price_index = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)
        df_price_index.iloc[0] = 100 + df.iloc[0]
        for i in np.arange(1, len(df_price_index)):
            # grow every column by that period's return (row-wise, no chained assignment)
            df_price_index.iloc[i] = df_price_index.iloc[i - 1] * (1 + df.iloc[i])
        ind_acc = []
        for n, stock in enumerate(df.columns):
            ind_capital = capital * weights[n]
            moves = df_price_index[stock].diff() * ind_capital / df_price_index[stock].iloc[0]
            ind_acc.append(moves)
        pair_ind_accounts = pd.concat(ind_acc, axis=1)
        portfolio_acc = pair_ind_accounts.sum(axis=1).cumsum() + capital
        df_temp_returns_combined = portfolio_acc.pct_change()
        df_temp_returns_combined.iloc[0] = np.sum(weights * df.iloc[0].values)
        df_temp_returns_combined = pd.DataFrame(df_temp_returns_combined)
        df_temp_returns_combined.columns = ["combinedReturns"]
    # given the input dataframe contains price series
    if type == "prices":
        ind_acc = []
        for n, stock in enumerate(df.columns):
            ind_capital = capital * weights[n]
            moves = df[stock].diff() * ind_capital / df[stock].iloc[0]
            ind_acc.append(moves)
        pair_ind_accounts = pd.concat(ind_acc, axis=1)
        portfolio_acc = pair_ind_accounts.sum(axis=1).cumsum() + capital
        df_temp_returns_combined = portfolio_acc.pct_change()
        df_temp_returns_combined.iloc[0] = np.nan
        df_temp_returns_combined = pd.DataFrame(df_temp_returns_combined)
        df_temp_returns_combined.columns = ["combinedReturns"]
    # given the input dataframe contains price series and the strategy is long/short
    if type == "shares":
        exposures = []
        for stock in df.columns:
            # hold 1 unit of currency worth of each stock at t = 0
            shares = 1 / df[stock].iloc[0]
            exposure = df[stock] * shares
            exposures.append(exposure)
        df_temp = pd.concat(exposures, axis=1)
        index_long = np.where(weights == 1)
        index_short = np.where(weights == -1)
        # long exposure minus short exposure plus 1 unit of collateral
        df_temp_account = pd.DataFrame(df_temp.iloc[:, index_long[0]].values - df_temp.iloc[:, index_short[0]].values) + 1
        df_temp_returns_combined = df_temp_account.pct_change()
        df_temp_returns_combined.columns = ["combinedReturns"]
        df_temp_returns_combined.index = df.index
    return pd.DataFrame(df_temp_returns_combined)

How do I avoid a loop with Python/Pandas to build an equity curve?

I am trying to build an equity curve in Python using Pandas. For those not in the know, an equity curve is a cumulative tally of investing profits/losses day by day. The code below works, but it is incredibly slow. I've tried to build an alternative using Pandas .iloc and the like, but nothing is working. I'm not sure if it is possible to do this outside of a loop, given how I have to reference the prior row(s).
for today in range(len(f1)):  # initiate a loop that runs the length of the "f1" dataframe
    if today == 0:  # if the index value is zero (aka first row in the dataframe) then...
        f1.loc[today, 'StartAUM'] = StartAUM  # set initial assets
        f1.loc[today, 'Shares'] = 0  # dummy placeholder for shares; no trading on day 1
        f1.loc[today, 'PnL'] = 0  # dummy placeholder for P&L; no trading day 1
        f1.loc[today, 'EndAUM'] = StartAUM  # set ending AUM; should be beginning AUM since no trades
        continue  # and on to the second row in the dataframe
    yesterday = today - 1  # used to reference the rows (see below)
    f1.loc[today, 'StartAUM'] = f1.loc[yesterday, 'EndAUM']  # today's starting assets are yesterday's ending assets
    f1.loc[today, 'Shares'] = f1.loc[yesterday, 'EndAUM'] // f1.loc[yesterday, 'Shareprice']  # today's shares to trade = yesterday's assets / yesterday's share price
    f1.loc[today, 'PnL'] = f1.loc[today, 'Shares'] * f1.loc[today, 'Outcome1']  # our P&L should be the shares traded (see prior line) multiplied by the outcome for 1 share
    # Note: Outcome1 came from the dataframe before this loop; for the purposes here its value is irrelevant
    f1.loc[today, 'EndAUM'] = f1.loc[today, 'StartAUM'] + f1.loc[today, 'PnL']  # ending assets are starting assets + today's P&L
There is a good example here: http://www.pythonforfinance.net/category/basic-data-analysis/ and I know that there is an example in Wes McKinney's book Python for Data Analysis. You might be able to find it here: http://wesmckinney.com/blog/python-for-financial-data-analysis-with-pandas/
Have you tried using iterrows() to construct the for loop?
for today, (index, row) in enumerate(f1.iterrows()):
    if today == 0:
        start_aum = [StartAUM]  # set initial assets
        shares = [0]  # dummy placeholder for shares; no trading on day 1
        pnl = [0]  # dummy placeholder for P&L; no trading day 1
        end_aum = [StartAUM]  # set ending AUM; should be beginning AUM since no trades
    else:
        start_aum.append(end_aum[-1])  # today's starting assets are yesterday's ending assets
        shares.append(end_aum[-1] // prev_row['Shareprice'])  # yesterday's assets / yesterday's share price
        pnl.append(shares[-1] * row['Outcome1'])  # shares traded multiplied by the outcome for 1 share
        # Note: Outcome1 came from the dataframe before this loop; for the purposes here its value is irrelevant
        end_aum.append(start_aum[-1] + pnl[-1])  # ending assets are starting assets + today's P&L
    prev_row = row  # keep yesterday's row for the next iteration
# iterrows() yields copies of the rows, so write the results back to f1 in one go
f1['StartAUM'] = start_aum
f1['Shares'] = shares
f1['PnL'] = pnl
f1['EndAUM'] = end_aum
The original loop is probably slow because every f1.loc lookup and assignment carries indexing overhead. iterrows() walks through the DataFrame row by row instead, but it hands out copies of the rows, so changes made to row are not stored in f1; that is why the results are collected in lists and assigned at the end.
See the pandas documentation of DataFrame.iterrows() for more details.
You need to vectorize the operations (don't iterate with for; compute each whole column at once):
# fill the initial values
f1['StartAUM'] = StartAUM  # set initial assets
f1['Shares'] = 0  # dummy placeholder for shares; no trading on day 1
f1['PnL'] = 0  # dummy placeholder for P&L; no trading day 1
f1['EndAUM'] = StartAUM  # set ending AUM
#do the computations (vectorized)
f1['StartAUM'].iloc[1:] = f1['EndAUM'].iloc[:-1]
f1['Shares'].iloc[1:] = f1['EndAUM'].iloc[:-1] // f1['Shareprice'].iloc[:-1]
f1['PnL'] = f1['Shares'] * f1['Outcome1']
f1['EndAUM'] = f1['StartAUM'] + f1 ['PnL']
EDIT: this will not work correctly since StartAUM, EndAUM, Shares depend on each other and cannot be computed one without another. I didn't notice that before.
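If fractional shares are acceptable, i.e. dropping the floor division //, that mutual dependency can be resolved: Shares_t = EndAUM_{t-1} / Shareprice_{t-1}, so EndAUM_t = EndAUM_{t-1} * (1 + Outcome1_t / Shareprice_{t-1}), which is a cumulative product. A minimal sketch under that assumption; equity_curve_fractional is a hypothetical name and it expects the Shareprice and Outcome1 columns used in the question. With whole shares the rounding breaks the product form, and a loop remains necessary.

```python
import pandas as pd


def equity_curve_fractional(f1: pd.DataFrame, start_aum: float) -> pd.DataFrame:
    """Vectorised equity curve, assuming fractional shares (no // rounding)."""
    out = f1.copy()
    prev_price = out['Shareprice'].shift(1)
    # Daily growth factor 1 + Outcome1_t / Shareprice_{t-1}; no trading on day 1, so factor 1
    growth = (1.0 + out['Outcome1'] / prev_price).fillna(1.0)
    out['EndAUM'] = start_aum * growth.cumprod()
    out['StartAUM'] = out['EndAUM'].shift(1).fillna(start_aum)
    out['Shares'] = (out['StartAUM'] / prev_price).fillna(0.0)
    out['PnL'] = out['EndAUM'] - out['StartAUM']
    return out


f1 = pd.DataFrame({'Shareprice': [10.0, 20.0, 25.0],
                   'Outcome1': [0.0, 2.0, -1.0]})
print(equity_curve_fractional(f1, start_aum=100.0))
```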
Can you try the following:
#import relevant modules
import pandas as pd
import numpy as np
from pandas_datareader import data
import matplotlib.pyplot as plt
#download data into DataFrame
f1 = data.DataReader('AAPL', 'yahoo',start='1/1/2017')
StartAUM = 1000000
#populate DataFrame with starting values
f1['Shares'] = 0
f1['PnL'] = 0
f1['EndAUM'] = StartAUM
#Set shares held to be the previous day's EndAUM divided by the previous day's closing price
f1['Shares'] = f1['EndAUM'].shift(1) / f1['Adj Close'].shift(1)
#Set the day's PnL to be the number of shares held multiplied by the change in closing price from yesterday to today's close
f1['PnL'] = f1['Shares'] * (f1['Adj Close'] - f1['Adj Close'].shift(1))
#Set day's ending AUM to be previous days ending AUM plus daily PnL
f1['EndAUM'] = f1['EndAUM'].shift(1) + f1['PnL']
#Plot the equity curve
f1['EndAUM'].plot()
Does the above solve your issue?
The solution was to use the Numba package. It performs the loop task in a fraction of the time.
https://numba.pydata.org/
The arguments/dataframe can be passed to the numba module/function. I will try to write up a more detailed explanation with code when time permits.
Thanks to all
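A minimal sketch of that approach, assuming the loop is moved into a function that works on plain NumPy arrays (Numba compiles functions like this, but not code that indexes a DataFrame with .loc). equity_curve_loop is a hypothetical name; the logic mirrors the loop in the question, including the floor division for whole shares.

```python
import numpy as np


def equity_curve_loop(start_aum, share_price, outcome):
    """Row-by-row equity curve on plain NumPy arrays (the kind of loop Numba can compile)."""
    n = len(share_price)
    start = np.empty(n)
    shares = np.zeros(n)
    pnl = np.zeros(n)
    end = np.empty(n)
    # Day 1: no trading, so ending AUM equals starting AUM
    start[0] = start_aum
    end[0] = start_aum
    for t in range(1, n):
        start[t] = end[t - 1]                         # today's start = yesterday's end
        shares[t] = end[t - 1] // share_price[t - 1]  # whole shares bought with yesterday's assets
        pnl[t] = shares[t] * outcome[t]               # P&L = shares * outcome for 1 share
        end[t] = start[t] + pnl[t]                    # end = start + P&L
    return start, shares, pnl, end


start, shares, pnl, end_aum = equity_curve_loop(100.0, np.array([10.0, 20.0, 25.0]),
                                                np.array([0.0, 2.0, -1.0]))
print(end_aum)
```

With Numba installed, numba.njit(equity_curve_loop) should compile the same function, and the results can be written back with f1['StartAUM'], f1['Shares'], f1['PnL'], f1['EndAUM'] = equity_curve_loop(StartAUM, f1['Shareprice'].to_numpy(), f1['Outcome1'].to_numpy()).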
In case others come across this, you can definitely make an equity curve without loops.
Dummy up some data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (13, 10)
# Some data to work with
np.random.seed(1)
stock = pd.DataFrame(
np.random.randn(100).cumsum() + 10,
index=pd.date_range('1/1/2020', periods=100, freq='D'),
columns=['Close']
)
stock['ma_5'] = stock['Close'].rolling(5).mean()
stock['ma_15'] = stock['Close'].rolling(15).mean()
Holdings: simple long/short based on moving average crossover signals
longs = stock['Close'].where(stock['ma_5'] > stock['ma_15'], np.nan)
shorts = stock['Close'].where(stock['ma_5'] < stock['ma_15'], np.nan)
# Quick plot
stock.plot()
longs.plot(lw=5, c='green')
shorts.plot(lw=5, c='red')
EQUITY CURVE:
Identify which side (long or short) holds the first position (i.e. the first trade; in this case, short). For that side, keep the initial trade price, cumulatively sum the subsequent daily changes, and forward-fill over the NaN values (there would normally be more NaNs in the series if you also have exit rules for when you are out of the market). The second, opposite side (in this case, long) is handled the same way, except that the starting price is not kept and any remaining NaNs are filled with zeros. The other important point is to invert the short side's daily changes (i.e. price declines must count as positive P&L).
lidx = np.where(longs > 0)[0][0]
sidx = np.where(shorts > 0)[0][0]
startdx = min(lidx, sidx)
# For the first holding side, keep the first trade price, then calc daily changes forward and ffill NaNs
# For the second holding side, get the cumsum of daily changes, ffill and fillna(0) (make sure short changes are inverted)
if lidx == startdx:
    lcurve = longs.diff()  # get daily changes
    lcurve.iloc[lidx] = longs.iloc[lidx]  # put back initial starting price
    lcurve = lcurve.cumsum().ffill()  # add daily changes/ffill to build curve
    scurve = -shorts.diff().cumsum().ffill().fillna(0)  # get daily changes (make declines positive changes)
else:
    scurve = -shorts.diff()  # get daily changes (make declines positive changes)
    scurve.iloc[sidx] = shorts.iloc[sidx]  # put back initial starting price
    scurve = scurve.cumsum().ffill()  # add daily changes/ffill to build curve
    lcurve = longs.diff().cumsum().ffill().fillna(0)  # get daily changes
Add the 2 long/short curves together to get the final equity curve
eq_curve = lcurve + scurve
# quick plot
stock.iloc[:, :3].plot()
longs.plot(lw=5, c='green', label='Long')
shorts.plot(lw=5, c='red', label='Short')
eq_curve.plot(lw=2, ls='dotted', c='orange', label='Equity Curve')
plt.legend()
