Low volatility portfolio construction - python

I want to test the low volatility factor for some market other than equities. Contradicting Finance 101, it has been shown that low volatility stocks outperform high volatility stocks (see, for example, Baker, Malcolm, Brendan Bradley, and Jeffrey Wurgler (2011), "Benchmarks as Limits to Arbitrage: Understanding the Low-Volatility Anomaly", Financial Analysts Journal, Vol. 67, No. 1, pp. 40-54).
So what I want to do is construct the low volatility factor following the methodology of Jegadeesh and Titman (1993): rank stocks according to their historical volatility over the previous j months, short the top 30% (the most volatile), go long the bottom 30% (the least volatile), and hold that long-short portfolio for k periods. A 3-3 j-k portfolio would therefore mean looking at the past 3 months of historical volatility (j) and holding the resulting portfolio for the following 3 months (k).
I have written some code, and the j part can be easily managed by simply increasing or decreasing the window of the rolling volatility calculation. The part I am struggling with is the k part and how it could be done (one possible approach is sketched after the code below); unfortunately, I couldn't find many examples online.
In addition, I was wondering if my code is correct or if I made any mistakes, since it surprisingly did not work regardless of the dataset I used. I am not sure whether this is the right place to ask, but if someone could take a look at it, that would be great and might be helpful to others planning to implement a strategy like this as well.
Below is a simple working example with a handful of stocks. As I said, I want to implement it for other assets, but this code should work on them too. You just have to insert your own API key in the quandl.ApiConfig.api_key line. Thanks a lot!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import quandl
import pickle
import scipy.optimize as sco
from scipy.ndimage.interpolation import shift

##################
# Low volatility #
##################
quandl.ApiConfig.api_key = 'Your key here'
stocks = ['MSFT','AAPL','AMZN','FB','BRK.B','JPM','GOOG','JNJ','V','PG','XOM']
data = quandl.get_table('WIKI/PRICES', ticker = stocks,
                        qopts = { 'columns': ['date', 'ticker', 'adj_close'] },
                        date = { 'gte': '2016-1-1', 'lte': '2019-11-3' }, paginate=True)
# with open("data.pkl", "wb") as pickle_file:
#     pickle.dump(data, pickle_file)
# with open("data.pkl", "rb") as pickle_file:
#     data = pickle.load(pickle_file)
data = data.pivot_table(index='date', columns='ticker', values='adj_close')
data = data.groupby(pd.Grouper(freq="M")).mean() # convert from daily to monthly prices
returns = (np.log(data) - np.log(data.shift(1))).dropna()
stds = returns.rolling(12).std()
stds = stds.values # convert to numpy array

list = []
for x in range(0, stds.shape[0]): # for each row in the std matrix, create decile buckets (dec -> breakpoint to the next bucket)
    for y in range(0, 100, 10):
        dec = np.percentile(stds[x], y)
        list.append(dec)
list = np.array(list) # convert list to numpy array
list = np.reshape(list, (stds.shape[0], -1)) # reshape the array such that it has the same format as returns (here: (26,10))

inds = []
for x in range(0, stds.shape[0]): # if the volatility is in the bottom 30%, allocate a 1 (long); if it is in the top 30%, a -1 (short); 0 otherwise
    ind = np.digitize(stds[x], list[x])
    for x in range(0, ind.shape[0]):
        if ind[x] <= 3:
            ind[x] = 1
        elif ind[x] >= 8:
            ind[x] = -1
        else:
            ind[x] = 0
    inds.append(ind)
inds = np.array(inds)
inds = inds.astype(np.float32)

for x in inds: # divide the -1, 1 and 0 entries by their respective counts, such that the legs sum to -1 and 1 (beta-neutral long-short)
    ones = np.count_nonzero(x == 1) # count the number of 1
    minus_ones = np.count_nonzero(x == -1) # count the number of -1
    zeros = np.count_nonzero(x == 0) # count the number of 0
    for y in range(0, inds.shape[1]):
        if x[y] == 1:
            x[y] = x[y] / ones
        elif x[y] == -1:
            x[y] = x[y] / minus_ones
        else:
            x[y] = x[y] / zeros

returns = returns.shift(periods=-1).values # shift returns one period back, and create numpy array
pf_returns = np.sum((inds*returns), axis=1) # multiply returns with weights, and sum up
pf_returns = pd.DataFrame(pf_returns)
print("---")
print(pf_returns.describe())

# Plot
pf_returns_indexed = 100 * (1 + pf_returns).cumprod()
pf_returns_indexed = pf_returns_indexed.plot(linewidth=1.2) # change line width
plt.show()
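One way to implement the k part is the overlapping-portfolio scheme of Jegadeesh and Titman: each month the strategy holds the equal-weighted average of the weight vectors formed over the last k months, so every formation month's portfolio is held for k months. Below is a minimal sketch reusing the inds weight matrix and the shifted returns array built above; this is one common reading of the methodology (ignoring transaction costs), not the only one:

def overlapping_weights(weights, k):
    # average each month's weights with those of the previous k-1 formation months
    return pd.DataFrame(weights).rolling(k, min_periods=1).mean().values

k = 3  # holding period in months
held = overlapping_weights(inds, k)
pf_returns_k = np.nansum(held * returns, axis=1)  # next-month returns, as above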

Related

Changing interval at which simulation fits ARIMA (help w/ for-loop)

I'm currently working on a trading strategy simulator that fits an ARIMA to stock return data, makes a next day prediction, then buys/sells based on that prediction. It continues to accumulate shares until a sell signal is generated, at which point the program will liquidate the accumulated position and begin again.
Right now, I specify an interval of dates, then the loop will start by fitting an ARIMA to the first 14 days of return data, making a prediction for day 15, acting on the prediction, then it will begin again with the first 15 days, fitting a new ARIMA. It will continue this until it gets to the end of the range of dates specified, with each new iteration adding the previous day's sample.
So, basically n increases by 1 for every iteration of the loop. I don't want this. I want it to repeatedly fit to an interval of a fixed length. For example, say I'm testing a strategy over 500 trading days. For the first iteration I want the loop to take the 50 days prior to day 1 of the specified interval and fit an ARIMA, and then trade in the same manner as before, but for the next iteration of the loop, I don't want it to fit to 51 days, I want to fit the 50 days prior to the current date every time.
Here's the start of the simulation function where the for-loop is specified. I can't seem to figure out how to change the loop to accomplish my goal. Any help would be greatly appreciated!!
def run_simulation(returns, prices, amt, order, thresh, verbose=True, plot=True):
    if type(order) == float:
        thresh = None
    curr_holding = False
    sum_list = []
    events_list = []
    sharpe_list = []
    init_amt = amt

    # go through dates
    for date, r in tqdm(returns.iloc[14:].items(), total=len(returns.iloc[14:])):
        # get data til just before current date
        curr_data = returns[:date]

        # check if using ARIMA from order
        if type(order) == tuple:
            # fit model
            model = ARIMA(curr_data, order=order).fit()
            print(model.summary())

            # get forecast
            pred = model.forecast()
            print(pred)
            float_pred = float(pred)
Here's the full script for context:
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
import numpy as np
import seaborn as sns
from tqdm import tqdm
import pandas as pd
from statsmodels.tools.sm_exceptions import ValueWarning, HessianInversionWarning, ConvergenceWarning
import warnings

# in practice do not suppress these warnings, they carry important information about the status of your model
warnings.filterwarnings('ignore', category=ValueWarning)
warnings.filterwarnings('ignore', category=HessianInversionWarning)
warnings.filterwarnings('ignore', category=ConvergenceWarning)

tickerSymbol = 'SPY'
data = yf.Ticker(tickerSymbol)
prices = data.history(start='2021-01-01', end='2022-01-03').Close
returns = prices.pct_change().dropna()

def std_dev(data):
    # Get number of observations
    n = len(data)
    # Calculate mean
    mean = sum(data) / n
    # Calculate deviations from the mean
    deviations = sum([(x - mean)**2 for x in data])
    # Calculate Variance & Standard Deviation
    variance = deviations / (n - 1)
    s = variance**(1/2)
    return s

# Sharpe Ratio From Scratch
def sharpe_ratio(data, risk_free_rate=0):
    # Calculate Average Daily Return
    mean_daily_return = sum(data) / len(data)
    print(f"mean daily return = {mean_daily_return}")
    # Calculate Standard Deviation
    s = std_dev(data)
    # Calculate Daily Sharpe Ratio
    daily_sharpe_ratio = (mean_daily_return - risk_free_rate) / s
    # Annualize Daily Sharpe Ratio
    sharpe_ratio = 252**(1/2) * daily_sharpe_ratio
    return sharpe_ratio

def run_simulation(returns, prices, amt, order, thresh, verbose=True, plot=True):
    if type(order) == float:
        thresh = None
    curr_holding = False
    sum_list = []
    events_list = []
    sharpe_list = []
    init_amt = amt

    # go through dates
    for date, r in tqdm(returns.iloc[14:].items(), total=len(returns.iloc[14:])):
        # get data til just before current date
        curr_data = returns[:date]

        # check if using ARIMA from order
        if type(order) == tuple:
            # fit model
            model = ARIMA(curr_data, order=order).fit()
            print(model.summary())
            # get forecast
            pred = model.forecast()
            print(pred)
            float_pred = float(pred)

        # if you predict a high enough return and not holding, buy stock
        # order for random strat and tuple for ARIMA
        if float_pred > thresh \
                or (order == 'last' and curr_data[-1] > 0):
            buy_price = prices.loc[date]
            events_list.append(('b', date))
            int_buy_price = int(buy_price)
            sum_list.append(int_buy_price)
            curr_holding = True
            if verbose:
                print('Bought at $%s' % buy_price)
                print('Predicted Return: %s' % round(pred, 4))
                print(f"Current holdings = {sum(sum_list)}")
                print('=======================================')
            continue

        # if you predict below the threshold return, sell the stock
        if (curr_holding) and \
                ((type(order) == float and np.random.random() < order)
                 or (type(order) == tuple and float_pred < thresh)
                 or (order == 'last' and curr_data[-1] > 0)):
            sell_price = prices.loc[date]
            total_return = len(sum_list) * sell_price
            ret = (total_return - sum(sum_list)) / sum(sum_list)
            amt *= (1 + ret)
            events_list.append(('s', date, ret))
            sharpe_list.append(ret)
            sum_list.clear()
            curr_holding = False
            if verbose:
                print('Sold at $%s' % sell_price)
                print('Predicted Return: %s' % round(pred, 4))
                print('Actual Return: %s' % (round(ret, 4)))
                print('=======================================')

    if verbose:
        sharpe = sharpe_ratio(sharpe_list, risk_free_rate=0.004)
        print('Total Amount: $%s' % round(amt, 2))
        print(f"Sharpe Ratio: {sharpe}")

    # graph
    if plot:
        plt.figure(figsize=(10, 4))
        plt.plot(prices[14:])
        y_lims = (int(prices.min() * .95), int(prices.max() * 1.05))
        shaded_y_lims = int(prices.min() * .5), int(prices.max() * 1.5)
        for idx, event in enumerate(events_list):
            plt.axvline(event[1], color='k', linestyle='--', alpha=0.4)
            if event[0] == 's':
                color = 'green' if event[2] > 0 else 'red'
                plt.fill_betweenx(range(*shaded_y_lims),
                                  event[1], events_list[idx-1][1], color=color, alpha=0.1)
        tot_return = round(100 * (amt / init_amt - 1), 2)
        sharpe = sharpe_ratio(sharpe_list, risk_free_rate=0)
        tot_return = str(tot_return) + '%'
        plt.title("%s Price Data\nThresh=%s\nTotal Amt: $%s\nTotal Return: %s" % (tickerSymbol, thresh, round(amt, 2), tot_return), fontsize=20)
        plt.ylim(*y_lims)
        plt.show()

    print(sharpe)
    return amt

# A model with a dth difference to fit an ARMA(p,q) model is called an ARIMA process
# of order (p,d,q). You can select p, d, and q with a wide range of methods,
# including AIC, BIC, and empirical autocorrelations (Petris, 2009).
for thresh in [0.001]:
    run_simulation(returns, prices, 100000, (7, 0, 0), thresh, verbose=True)
Solution: slice a fixed-length window off the end of the accumulating series before fitting:

curr_data = returns[:date]
curr_data_sliced = curr_data[-14:]
...
model = ARIMA(curr_data_sliced, ...)

Change the index of the slice to set the range of dates to use, e.g. [-50:] to train on the 50 most recent data points at each iteration.
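A minimal sketch of the resulting loop, assuming the same returns series and ARIMA order as in the script above and a hypothetical 50-day window:

window = 50
for date, r in tqdm(returns.iloc[window:].items(), total=len(returns.iloc[window:])):
    # always fit to the `window` most recent observations up to `date`
    curr_data = returns[:date].iloc[-window:]
    model = ARIMA(curr_data, order=(7, 0, 0)).fit()
    pred = model.forecast()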

A faster way to compute percentage correlation between two filter functions

I wrote this function to compute the normalized percentage correlation between two filter functions (with one shifted). The function works but takes about 8 to 12 minutes depending on the number of elements in nbs. I would like to know if there is another way to make this operation faster. Here is my code below:
import numpy as np

DT = 0.08

def corr_g(*nbs, Np=10000, sf=0.5):
    wb = 0.25 # bandwidth in Hz
    freq = (1/DT)*np.linspace(-0.5, 0.5-1/Np, Np) # frequency vector
    dCg_norms = np.zeros((Np, len(nbs)))
    for idx, nb in enumerate(nbs): # nb is the filter parameter
        d_k_vector = np.linspace(-Np*sf, Np*sf, Np) # indices vector
        dCg = d_k_vector*0 # array to hold correlation
        g = ((1+np.exp(-nb))**2)/((1+np.exp(-nb*(freq+wb)/wb))*(1+np.exp(nb*(freq-wb)/wb))) # filter function
        for index2, d_k in enumerate(d_k_vector): # loop through the new indices vector
            for index, sth in enumerate(g):
                # form a new array from g using the indices vector, using only values within
                # the limits of g, then do a dot-product operation
                if (index+d_k) < Np and (index+d_k) >= 0:
                    dCg[index2] += g[index] * g[index+int(d_k)]
        dCg_norm = dCg/np.max(dCg)*100 # normalized correlation
        dCg_norms[:, idx] = dCg_norm # add to allocated array
    return dCg_norms

my_arr = corr_g(*[2, 4, 8, 16])

import matplotlib.pyplot as plt
Np = 10000
DT = 0.08
d_k_vector = np.linspace(-5000, 5000, Np)
plt.plot(d_k_vector/(10000*DT)/0.25, my_arr[:,1])
You should not calculate the correlation yourself; use np.correlate(g, g, 'same') instead. There are small differences between your result and mine, and I am pretty sure the error is on your side.
def corr_g2(*nbs, Np=10000, sf=0.5):
    wb = 0.25 # bandwidth in Hz
    freq = (1/DT)*np.linspace(-0.5, 0.5-1/Np, Np) # frequency vector
    dCg_norms = np.zeros((Np, len(nbs)))
    for idx, nb in enumerate(nbs): # nb is the filter parameter
        g = ((1+np.exp(-nb))**2)/((1+np.exp(-nb*(freq+wb)/wb))*(1+np.exp(nb*(freq-wb)/wb))) # filter function
        dCg = np.correlate(g, g, 'same')
        dCg_norm = dCg/np.max(dCg)*100 # normalized correlation
        dCg_norms[:, idx] = dCg_norm # add to allocated array
    return dCg_norms

def main():
    my_arr = corr_g(*[2, 4], Np=Np)
    my_arr2 = corr_g2(*[2, 4], Np=Np)

    # import matplotlib.pyplot as plt
    # d_k_vector = np.linspace(-Np / 2, Np / 2 - 1, Np)
    # plt.plot(d_k_vector/(10000*DT)/0.25, my_arr[:,1])
    # plt.plot(d_k_vector/(10000*DT)/0.25, my_arr2[:,1])
    # plt.show()

if __name__ == '__main__':
    main()
Profiling results for Np=1000:
Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
    39                                             # do_profile()
    40                                             def main():
    41         1   14419637.0 14419637.0    100.0      my_arr = corr_g(*[2,4], Np=Np)
    42         1       1598.0     1598.0      0.0      my_arr2 = corr_g2(*[2,4], Np=Np)

ZigZag Indicator Metastock Formula to Python

I would like to create a zigzag indicator for stocks in Python. I have this Metastock Formula.
I decided to post this problem here because I don't know of a better forum.
I saw two Stack Overflow posts with something like this, but they are wrong.
As you can see, the indicator takes the close prices.
Thanks for your help.
Python code:
from __future__ import division
import matplotlib.pyplot as plt
import numpy as np

def islocalmax(x):
    """Both neighbors are lower,
    assumes a centered window of size 3"""
    return (x[0] < x[1]) & (x[2] < x[1])

def islocalmin(x):
    """Both neighbors are higher,
    assumes a centered window of size 3"""
    return (x[0] > x[1]) & (x[2] > x[1])

def isextrema(x):
    return islocalmax(x) or islocalmin(x)

def create_zigzag(col, p=0.05):
    # Find the local min/max
    # converting to bool converts NaN to True, which makes it include the endpoints
    ext_loc = col.rolling(3, center=True).apply(isextrema, raw=False).astype(np.bool_)

    # extract values at local min/max
    ext_val = col[ext_loc]

    # filter locations based on threshold
    thres_ext_loc = (ext_val.diff().abs() > (ext_val.shift(-1).abs() * p))

    # Keep the endpoints
    thres_ext_loc.iloc[0] = True
    thres_ext_loc.iloc[-1] = True
    thres_ext_loc = thres_ext_loc[thres_ext_loc]

    # extract values at filtered locations
    thres_ext_val = col.loc[thres_ext_loc.index]

    # again search the extrema to force the zigzag to always go from high > low or vice versa,
    # never low > low, or high > high
    ext_loc = thres_ext_val.rolling(3, center=True).apply(isextrema, raw=False).astype(np.bool_)
    thres_ext_val = thres_ext_val[ext_loc]

    return thres_ext_val

from pandas_datareader import data

# Only get the adjusted close.
serie = data.DataReader(
    "AAPL", start='2018-1-1', end='2020-12-31', data_source='yahoo'
)
dfzigzag = serie.apply(create_zigzag)
data1_zigzag = dfzigzag['Close'].dropna()

fig, axs = plt.subplots(figsize=(10, 3))
axs.plot(serie.Close, '-', ms=4, label='original')
axs.plot(data1_zigzag, 'ro-', ms=4, label='zigzag')
axs.legend()
plt.show()
(Plot of the Python output and an image of the target indicator omitted.)
Metastock Formula:
{ Copyright (c) 2004, John Bruns and Financial Trading Inc. }
reversal:=Input("Reversal",0,100,5);
pc:=Input("Use Percentage?",0,1,1);
z:=If(pc,Zig(CLOSE,reversal,%),Zig(CLOSE,reversal,$));
peakbar:=LastValue(BarsSince((z>Ref(z,-1)AND Ref(Z,-1)<Ref(Z,-2)) OR (z<Ref(z,-1))AND Ref(Z,-1)>Ref(Z,-2)))+1;
lastpeak:=LastValue(Ref(z,-peakbar));
lastend:=LastValue(z);
bars:=Cum(1);
invalid:=If(pc,If(Abs(lastend-lastpeak)*100/lastpeak<reversal,1,0),If(Abs(lastend-lastpeak)<reversal,1,0));
If(bars>=LastValue(bars)-peakbar AND invalid,lastpeak,z);
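For reference, here is a minimal Python sketch of the core percentage-reversal logic. It is not a line-for-line translation of the MetaStock Zig above (in particular it does not implement the formula's invalidation of the final, unconfirmed leg), and `closes` is assumed to be a pandas Series such as serie.Close:

import pandas as pd

def zigzag_percent(closes, reversal=5.0):
    """Keep a pivot once price retraces `reversal` percent from the running extreme."""
    pivots = [closes.index[0]]
    direction = 0                      # +1 during an up leg, -1 during a down leg
    last_ext = closes.iloc[0]          # running extreme of the current leg
    last_ext_idx = closes.index[0]
    for idx, price in closes.items():
        if direction >= 0 and price > last_ext:
            last_ext, last_ext_idx = price, idx        # extend the up leg
        elif direction <= 0 and price < last_ext:
            last_ext, last_ext_idx = price, idx        # extend the down leg
        elif abs(price - last_ext) / last_ext * 100 >= reversal:
            pivots.append(last_ext_idx)                # reversal confirmed: keep the pivot
            direction = 1 if price > last_ext else -1
            last_ext, last_ext_idx = price, idx
    pivots.append(last_ext_idx)                        # final, unconfirmed leg
    return closes.loc[pivots]

For the formula's default 5% reversal on close prices this would be called as zigzag_percent(serie.Close, reversal=5.0).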

Use a pandas DataFrame created inside a function outside of the function

I am a Python beginner and wrote a function for a simple moving average strategy. I created a portfolio DataFrame inside the function, and now I want to use this DataFrame outside of the function for plotting some graphs. My solution is return portfolio, but that does not work. Can anybody help me?
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import a data source - FSE data with index 'Date'
all_close_prices = pd.read_csv('FSE_daily_close.csv')
all_close_prices = all_close_prices.set_index('Date')

# Fill NaN values with the last available stock price - except for Zalando
all_close_prices = all_close_prices.fillna(method='ffill')

# Import ticker symbols
ticker_list = list(all_close_prices)

# Zalando 'FSE/ZO1_X' (position row 99) - doesn't begin in 2004
# Drop Zalando
all_close_prices.drop('FSE/ZO1_X', axis=1)
# Also from the ticker list
ticker_list.remove('FSE/ZO1_X')

# Create an empty signal dataframe with a datetime index equivalent to the stocks
signals = pd.DataFrame(index=all_close_prices.index)

def ma_strategy(ticker, long_window, short_window):
    # Calculate the moving averages
    moving_avg_long = all_close_prices.rolling(window=long_window, min_periods=1).mean()
    moving_avg_short = all_close_prices.rolling(window=short_window, min_periods=1).mean()

    # Add the two MAs for the stocks in the ticker_list to the signals dataframe
    for i in ticker_list:
        signals['moving_avg_short_' + i] = moving_avg_short[i]
        signals['moving_avg_long_' + i] = moving_avg_long[i]

    # Set up the signals
    for i in ticker_list:
        signals['signal_' + i] = np.where(signals['moving_avg_short_' + i] > signals['moving_avg_long_' + i], 1, 0)
        signals['positions_' + i] = signals['signal_' + i].diff(periods=1)

    # Backtest
    initial_capital = float(100000)

    # On the days that the signal is 1 (the short moving average crosses above the long
    # moving average), buy 100 shares; on the days the signal is 0, the final result will
    # be 0 as a result of the operation 100*signals['signal']
    positions = 100 * signals[['signal_' + ticker]]

    # Store the difference in shares owned - same as the positions column in signals
    pos_diff = positions.diff()

    # Add `holdings` to portfolio
    # DataFrame.multiply(other, axis='columns', fill_value=None) - element-wise multiplication
    portfolio = pd.DataFrame(index=all_close_prices.index)
    portfolio['holdings'] = (positions.multiply(all_close_prices[ticker], axis=0)).sum(axis=1)

    # Add `cash` to portfolio
    portfolio['cash'] = initial_capital - (pos_diff.multiply(all_close_prices[ticker], axis=0)).sum(axis=1).cumsum()

    # Add `total` to portfolio
    portfolio['total'] = portfolio['cash'] + portfolio['holdings']

    # Add `returns` to portfolio
    portfolio['return'] = portfolio['total'].pct_change()
    portfolio['return_cum'] = portfolio['total'].pct_change().cumsum()

    return portfolio

ma_strategy('FSE/VOW3_X', 20, 5)

# Visualize the total value of the portfolio
portfolio_value = plt.figure(figsize=(12, 8))
ax1 = portfolio_value.add_subplot(1, 1, 1, ylabel='Portfolio value in $')
# Plot the equity curve in dollars
portfolio['total'].plot(ax=ax1, lw=2.)
You need to assign your function return value to a variable. The line which says
ma_strategy('FSE/VOW3_X',20,5)
probably needs to change to
portfolio = ma_strategy('FSE/VOW3_X',20,5)
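With that assignment in place, the plotting section at the bottom of the script has an actual portfolio object to work with, e.g. (reusing the plotting lines from the question):

portfolio = ma_strategy('FSE/VOW3_X', 20, 5)

# Visualize the total value of the portfolio
portfolio_value = plt.figure(figsize=(12, 8))
ax1 = portfolio_value.add_subplot(1, 1, 1, ylabel='Portfolio value in $')
portfolio['total'].plot(ax=ax1, lw=2.)
plt.show()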

Exponential Moving Average by time interval [duplicate]

I have a range of dates and a measurement on each of those dates. I'd like to calculate an exponential moving average for each of the dates. Does anybody know how to do this?
I'm new to python. It doesn't appear that averages are built into the standard python library, which strikes me as a little odd. Maybe I'm not looking in the right place.
So, given the following code, how could I calculate the moving weighted average of IQ points for calendar dates?
from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]
(there's probably a better way to structure the data, any advice would be appreciated)
EDIT:
It seems that the mov_average_expw() function from the scikits.timeseries.lib.moving_funcs submodule of SciKits (add-on toolkits that complement SciPy) better suits the wording of your question.
To calculate an exponential smoothing of your data with a smoothing factor alpha (it is (1 - alpha) in Wikipedia's terms):
>>> alpha = 0.5
>>> assert 0 < alpha <= 1.0
>>> av = sum(alpha**n.days * iq
...          for n, iq in map(lambda (day, iq), today=max(days): (today-day, iq),
...                           sorted(zip(days, IQ), key=lambda p: p[0], reverse=True)))
>>> av
95.0
(Note: the tuple-unpacking lambda above is Python 2 syntax.)
The above is not pretty, so let's refactor it a bit:
from collections import namedtuple
from operator import itemgetter

def smooth(iq_data, alpha=1, today=None):
    """Perform exponential smoothing with factor `alpha`.

    Time period is a day.
    Each time period the value of `iq` drops `alpha` times.
    The most recent data is the most valuable one.
    """
    assert 0 < alpha <= 1
    if alpha == 1:  # no smoothing
        return sum(map(itemgetter(1), iq_data))
    if today is None:
        today = max(map(itemgetter(0), iq_data))
    return sum(alpha**((today - date).days) * iq for date, iq in iq_data)

IQData = namedtuple("IQData", "date iq")

if __name__ == "__main__":
    from datetime import date
    days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
    IQ = [110, 105, 90]
    iqdata = list(map(IQData, days, IQ))
    print("\n".join(map(str, iqdata)))
    print(smooth(iqdata, alpha=0.5))
Example:
$ python26 smooth.py
IQData(date=datetime.date(2008, 1, 1), iq=110)
IQData(date=datetime.date(2008, 1, 2), iq=105)
IQData(date=datetime.date(2008, 1, 7), iq=90)
95.0
I'm always calculating EMAs with pandas. Here is an example of how to do it (note: pd.ewma, used in the original answer, was removed in later pandas releases; the .ewm() accessor below is the current API):

import pandas as pd
import numpy as np

def ema(values, period):
    values = np.array(values)
    return pd.Series(values).ewm(span=period).mean().iloc[-1]

values = [9, 5, 10, 16, 5]
period = 5
print(ema(values, period))

More info about pandas EWMA:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ewma.html
I did a bit of googling and I found the following sample code (http://osdir.com/ml/python.matplotlib.general/2005-04/msg00044.html):
from numpy import array  # needed for array(s) below

def ema(s, n):
    """
    returns an n period exponential moving average for
    the time series s

    s is a list ordered from oldest (index 0) to most
    recent (index -1)
    n is an integer

    returns a numeric array of the exponential
    moving average
    """
    s = array(s)
    ema = []
    j = 1

    # get n sma first and calculate the next n period ema
    sma = sum(s[:n]) / n
    multiplier = 2 / float(1 + n)
    ema.append(sma)

    # EMA(current) = ((Price(current) - EMA(prev)) x Multiplier) + EMA(prev)
    ema.append(((s[n] - sma) * multiplier) + sma)

    # now calculate the rest of the values
    for i in s[n+1:]:
        tmp = ((i - ema[j]) * multiplier) + ema[j]
        j = j + 1
        ema.append(tmp)

    return ema
You can also use the SciPy filter method because the EMA is an IIR filter. This will have the benefit of being approximately 64 times faster as measured on my system using timeit on large data sets when compared to the enumerate() approach.
import numpy as np
from scipy.signal import lfilter
x = np.random.normal(size=1234)
alpha = .1 # smoothing coefficient
zi = [x[0]] # seed the filter state with first value
# filter can process blocks of continuous data if <zi> is maintained
y, zi = lfilter([1.-alpha], [1., -alpha], x, zi=zi)
I don't know Python, but for the averaging part, do you mean an exponentially decaying low-pass filter of the form
y_new = y_old + (input - y_old)*alpha
where alpha = dt/tau, dt = the timestep of the filter, tau = the time constant of the filter? (the variable-timestep form of this is as follows, just clip dt/tau to not be more than 1.0)
y_new = y_old + (input - y_old)*dt/tau
If you want to filter something like a date, make sure you convert to a floating-point quantity like # of seconds since Jan 1 1970.
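For what it's worth, here is a minimal sketch of that variable-timestep filter in Python; the (t_seconds, value) pair format for the input is just an assumption for illustration:

def lowpass(samples, tau):
    """samples: iterable of (t_seconds, value) pairs, oldest first."""
    it = iter(samples)
    t_prev, y = next(it)
    out = [(t_prev, y)]
    for t, x in it:
        alpha = min((t - t_prev) / tau, 1.0)  # clip dt/tau so it never exceeds 1.0
        y = y + (x - y) * alpha
        out.append((t, y))
        t_prev = t
    return out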
My python is a little bit rusty (anyone can feel free to edit this code to make corrections, if I've messed up the syntax somehow), but here goes....
def movingAverageExponential(values, alpha, epsilon=0):
    if not 0 < alpha < 1:
        raise ValueError("out of range, alpha='%s'" % alpha)
    if not 0 <= epsilon < alpha:
        raise ValueError("out of range, epsilon='%s'" % epsilon)

    result = [None] * len(values)

    for i in range(len(result)):
        currentWeight = 1.0
        numerator = 0
        denominator = 0
        for value in values[i::-1]:
            numerator += value * currentWeight
            denominator += currentWeight
            currentWeight *= alpha
            if currentWeight < epsilon:
                break
        result[i] = numerator / denominator

    return result
For each position, this function moves backward from that element toward the beginning of the list, accumulating the exponentially weighted numerator and denominator until the weight coefficient for an element drops below the given epsilon, at which point the inner loop stops early.
The result list is pre-allocated to full size and filled in place, so no reversal is needed at the end.
(SIDE NOTE: in Python lists, appending is much less expensive than prepending, which is worth keeping in mind if you restructure this to build the result incrementally. Please correct me if I'm wrong.)
The 'alpha' argument is the decay factor on each iteration. For example, if you used an alpha of 0.5, then today's moving average value would be composed of the following weighted values:
today: 1.0
yesterday: 0.5
2 days ago: 0.25
3 days ago: 0.125
...etc...
Of course, if you've got a huge array of values, the values from ten or fifteen days ago won't contribute very much to today's weighted average. The 'epsilon' argument lets you set a cutoff point, below which you will cease to care about old values (since their contribution to today's value will be insignificant).
You'd invoke the function something like this:
result = movingAverageExponential(values, 0.75, 0.0001)
The matplotlib.org examples (http://matplotlib.org/examples/pylab_examples/finance_work2.html) provide a good example of an Exponential Moving Average (EMA) function using numpy:
def moving_average(x, n, type):
    x = np.asarray(x)
    if type == 'simple':
        weights = np.ones(n)
    else:
        weights = np.exp(np.linspace(-1., 0., n))
    weights /= weights.sum()
    a = np.convolve(x, weights, mode='full')[:len(x)]
    a[:n] = a[n]
    return a
I found the code snippet by @earino above pretty useful - but I needed something that could continuously smooth a stream of values - so I refactored it to this:

def exponential_moving_average(period=1000):
    """ Exponential moving average. Smooths the values over the period.
    Send in values - at first it'll return a simple average, but as soon as
    it's gathered 'period' values, it'll start to use the exponential moving
    average to smooth the values.
    period: int - how many values to smooth over (default=1000). """
    multiplier = 2 / float(1 + period)
    cum_temp = yield None  # We are being primed

    # Start by just returning the simple average until we have enough data.
    for i in range(1, period + 1):
        cum_temp += yield cum_temp / float(i)

    # Grab the simple average
    ema = cum_temp / period

    # and start calculating the exponentially smoothed average
    while True:
        ema = (((yield ema) - ema) * multiplier) + ema
and I use it like this:
def temp_monitor(pin):
    """ Read from the temperature monitor - and smooth the value out.
    The sensor is noisy, so we use exponential smoothing. """
    ema = exponential_moving_average()
    next(ema)  # Prime the generator
    while True:
        yield ema.send(val_to_temp(pin.read()))
(where pin.read() produces the next value I'd like to consume).
Maybe the shortest:

# Specify decay in terms of span
# data_series should be a DataFrame or Series
ema = data_series.ewm(span=5, adjust=False).mean()
import pandas_ta as ta

data["EMA3"] = ta.ema(data["close"], length=3)

pandas_ta is a technical analysis library: https://github.com/twopirllc/pandas-ta. The code above calculates the exponential moving average (EMA) for a series; you can specify the lag value using length. Specifically, the code above calculates the 3-day EMA.
Here is a simple sample I worked up based on http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages
Note that unlike in their spreadsheet, I don't calculate the SMA, and I don't wait to generate the EMA after 10 samples. This means my values differ slightly, but if you chart it, it follows exactly after 10 samples. During the first 10 samples, the EMA I calculate is appropriately smoothed.
def emaWeight(numSamples):
    return 2 / float(numSamples + 1)

def ema(close, prevEma, numSamples):
    return ((close - prevEma) * emaWeight(numSamples)) + prevEma

samples = [
    22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29,
    22.15, 22.39, 22.38, 22.61, 23.36, 24.05, 23.75, 23.83, 23.95, 23.63,
    23.82, 23.87, 23.65, 23.19, 23.10, 23.33, 22.68, 23.10, 22.40, 22.17,
]

emaCap = 10
e = samples[0]
for s in range(len(samples)):
    numSamples = emaCap if s > emaCap else s
    e = ema(samples[s], e, numSamples)
    print(e)
I'm a little late to the party here, but none of the solutions given were what I was looking for. Nice little challenge using recursion and the exact formula given on Investopedia.
No numpy or pandas required.
prices = [{'i': 1, 'close': 24.5}, {'i': 2, 'close': 24.6}, {'i': 3, 'close': 24.8}, {'i': 4, 'close': 24.9},
          {'i': 5, 'close': 25.6}, {'i': 6, 'close': 25.0}, {'i': 7, 'close': 24.7}]

def rec_calculate_ema(n):
    k = 2 / (n + 1)
    price = prices[n]['close']
    if n == 1:
        return price
    res = (price * k) + (rec_calculate_ema(n - 1) * (1 - k))
    return res

print(rec_calculate_ema(3))
A fast way (copy-pasted from here) is the following:
import numpy as np

def ExpMovingAverage(values, window):
    """ Numpy implementation of EMA
    """
    weights = np.exp(np.linspace(-1., 0., window))
    weights /= weights.sum()
    a = np.convolve(values, weights, mode='full')[:len(values)]
    a[:window] = a[window]
    return a
I am using a list and a rate of decay as inputs. I hope this little function with just two lines may help you here, considering deep recursion is not stable in Python.
def expma(aseries, ratio):
    return sum([ratio*aseries[-x-1]*((1-ratio)**x) for x in range(len(aseries))])
More simply, using pandas:

def EMA(tw):
    for x in tw:
        data["EMA{}".format(x)] = data['close'].ewm(span=x, adjust=False).mean()

EMA([10, 50, 100])
Papahaba's answer was almost what I was looking for (thanks!) but I needed to match initial conditions. Using an IIR filter with scipy.signal.lfilter is certainly the most efficient. Here's my redux:
Given a NumPy vector, x
import numpy as np
from scipy import signal
period = 12
b = np.array((1,), 'd')
a = np.array((period, 1-period), 'd')
zi = signal.lfilter_zi(b, a)
y, zi = signal.lfilter(b, a, x, zi=zi*x[0:1])
Get the N-point EMA (here, 12) returned in the vector y
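For example, with a made-up random-walk vector standing in for x (illustration only, reusing the snippet above):

import numpy as np
from scipy import signal

x = 100 + np.random.normal(size=256).cumsum()  # hypothetical price vector

period = 12
b = np.array((1,), 'd')
a = np.array((period, 1 - period), 'd')
zi = signal.lfilter_zi(b, a)
y, zi = signal.lfilter(b, a, x, zi=zi*x[0:1])
print(y[:5])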
